Minimize the jumps of all derivatives up to order k across each face:
Weights
Local system
Solved iteratively — options below.
Three inner-product choices
Reconstruction iteration schemes (VariationalReconstruction.hpp:938-1031)
- DoReconstructionIter: Jacobi / SOR sweep (tests use Jacobi).
- DoReconstructionIterDiff: Jacobian-vector product (GMRES inner).
- DoReconstructionIterSOR: SOR with optional reverse pass.
- DoReconstruction2nd, DoReconstruction2ndGrad.

Construct* calls

template <int dim = 2>
class VariationalReconstruction : public FiniteVolume {
public:
void ConstructMetrics(); // via FiniteVolume
void ConstructBaseAndWeight(tFGetBoundaryWeight id2faceDircWeight = …); // basis + cached diff values
void ConstructRecCoeff(); // A, B, A^-1 B, secondary
// …
};
ConstructMetrics builds the FiniteVolume metric cache.

ConstructBaseAndWeight builds
- cellBaseMoment: basis moments per cell.
- faceAlignedScales, faceMajorCoordScale.
- cellDiffBaseCache, faceDiffBaseCache: cached derivative values at all quadrature points, for every neighbor in the stencil.
- bndVRCaches: boundary-face caches for BC-weighted VR.

ConstructRecCoeff builds
- matrixAB, vectorB: per-neighbor RHS blocks.
- matrixAAInvB, vectorAInvB: precomputed A^-1 B.
- matrixSecondary, matrixAHalf_GG: auxiliary reconstruction systems.
- matrixA, matrixACholeskyL, volIntCholeskyL: full system + Cholesky factor for dense local solves.

All arrays are ArrayEigenMatrix* or ArrayEigenUniMatrixBatch*, i.e., Eigen maps over an MPI-aware distributed memory block.
FiniteVolume: the metric cache

class FiniteVolume : public DeviceTransferable<FiniteVolume> {
real sumVolume, minVolume{veryLargeReal}, maxVolume, volGlobal;
tScalarPair volumeLocal; // per-cell volume
tScalarPair faceArea; // per-face area
tRecAtrPair cellAtr, faceAtr; // (NDOF, NDIFF, Order, intOrder)
tCoeffPair cellIntJacobiDet, faceIntJacobiDet;
t3VecsPair faceUnitNorm; // normal at each face quadrature pt
t3VecPair faceMeanNorm;
t3VecPair cellBary, faceCent, cellCent;
t3VecsPair cellIntPPhysics, faceIntPPhysics;
t3VecPair cellAlignedHBox, cellMajorHBox;
t3MatPair cellMajorCoord, cellInertia;
tScalarPair cellSmoothScale;
int axisSymmetric = 0; // wedge axisymmetry
std::set<index> axisFaces;
// CRTP: to_device(), to_host(), device(), deviceView<B>()
};
CUDA-transferable. FiniteVolume (and therefore VariationalReconstruction) inherits from DeviceTransferable<FiniteVolume>. One call to fv.to_device() migrates the entire metric cache to the GPU as a device-side view.
enum RiemannSolverType {
UnknownRS = 0,
Roe = 1, HLLC = 2, HLLEP = 3, HLLEP_V1 = 21,
Roe_M1 = 11, Roe_M2 = 12, Roe_M3 = 13, Roe_M4 = 14, Roe_M5 = 15,
Roe_M6 = 16, Roe_M7 = 17, Roe_M8 = 18, Roe_M9 = 19,
};
| Variant | Entropy-fix / eigenvalue scheme |
|---|---|
| Roe | standard Roe + Harten–Yee |
| Roe_M1 | cLLF (central + Local Lax–Friedrichs) |
| Roe_M2 | Lax–Friedrichs |
| Roe_M3 | LD Roe (low-dissipation) |
| Roe_M4 | ID Roe (intermediate dissipation) |
| Roe_M5 | LD cLLF |
| Roe_M6 | H-correction only |
| Roe_M7 | Harten–Yee only, no H-correction |
| Roe_M8 | H-correction + Harten–Yee |
| Roe_M9 | Reserved (eigScheme 9, currently asserts false) |
| HLLC | Harten–Lax–van Leer–Contact |
| HLLEP | HLLE with pressure fix |
| HLLEP_V1 | HLLEP variant 1 |
// Shared helper
template <int dim>
RoePreamble<dim> ComputeRoePreamble(ULm, URm, gamma, dumpInfo);
RoePreamble: the shared middle

template <int dim>
struct RoePreamble {
TVec veloLm, veloRm; // primitive velocities
real rhoLm, rhoRm, pLm, pRm, HLm, HRm; // primitive state
real veloLm0, veloRm0; // normal velocity components
TVec veloRoe; // Roe-averaged velocity
real sqrtRhoLm, sqrtRhoRm;
real vsqrRoe, HRoe, asqrRoe, rhoRoe, aRoe;
};
template <int dim, int eigScheme>
void RoeFlux(UL, UR, ULm, URm, n, vgN,
/*out*/ flux,
/*out*/ dLambda,
fixScale, gamma, dumpInfo);
template <int dim, int type>
void HLLEPFlux_IdealGas(UL, UR, ULm, URm, n, vgN,
flux, …, gamma, dumpInfo);
template <int dim>
void HLLCFlux(UL, UR, ULm, URm, n, vgN, …);
All 13 variants share ComputeRoePreamble, which computes the Roe average. The eigScheme template parameter then selects the dissipation / entropy-fix strategy.
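To make the shared step concrete, here is a minimal Python sketch of the density-weighted Roe average for an ideal gas in one dimension. It is illustrative only: the function name `roe_average_1d` and its argument layout are assumptions, not the signature of `ComputeRoePreamble`; only the returned keys mirror the `RoePreamble` field names.

```python
import math

def roe_average_1d(rho_l, u_l, p_l, rho_r, u_r, p_r, gamma=1.4):
    """Density-weighted (Roe) average of two ideal-gas states in 1D.

    Hypothetical helper for illustration; returns a dict whose keys
    follow the RoePreamble field names.
    """
    # Total specific enthalpy H = (E + p) / rho for an ideal gas.
    h_l = gamma / (gamma - 1.0) * p_l / rho_l + 0.5 * u_l * u_l
    h_r = gamma / (gamma - 1.0) * p_r / rho_r + 0.5 * u_r * u_r

    sqrt_l, sqrt_r = math.sqrt(rho_l), math.sqrt(rho_r)
    w = sqrt_l + sqrt_r

    velo_roe = (sqrt_l * u_l + sqrt_r * u_r) / w
    h_roe = (sqrt_l * h_l + sqrt_r * h_r) / w
    vsqr_roe = velo_roe * velo_roe
    asqr_roe = (gamma - 1.0) * (h_roe - 0.5 * vsqr_roe)
    return {
        "rhoRoe": sqrt_l * sqrt_r,
        "veloRoe": velo_roe,
        "HRoe": h_roe,
        "vsqrRoe": vsqr_roe,
        "asqrRoe": asqr_roe,
        "aRoe": math.sqrt(asqr_roe),
    }
```

When both states are identical the average reduces to that state, which is a convenient sanity check before any entropy fix is layered on top.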
- dim, eigScheme) keeps code size bounded.
- NSFluxInvis<dim>, NSFluxVis<dim>(U, gradU, T, mu, n, flux, adiabaticWall, useQCR).

- FWBAP_L2_Multiway: generic Eigen arrays.
- FWBAP_L2_Multiway_Polynomial2D: 2D polynomial-weighted norm.
- FWBAP_L2_Multiway_PolynomialOrth: orthogonal variant.
- FMEMM_Multiway_Polynomial2D: Modified Extremum-Monotone Mixer.

Power parameter: p = 4; verySmallReal_pDiP = std::pow(verySmallReal, 1.0/p) stabilises near zero.
- FWBAP_L2_Biway
- FWBAP_L2_Cut_Biway: sign-cutoff
- FMINMOD_Biway
- FVanLeer_Biway
- FWBAP_L2_Biway_PolynomialNorm<dim, nVarsFixed>
- FMEMM_Biway_PolynomialNorm<dim, nVarsFixed>
- FWBAP_L2_Biway_PolynomialOrth

Configuration
"limiterProcedure": 0 // WBAP (V2)
"limiterProcedure": 1 // CWBAP (V3) ← recommended
"usePPRecLimiter": true
Positivity preservation: LimiterUGrad (Euler side) clamps gradients; EvaluateURecBeta enforces cell-mean positivity on reconstructed values; EvaluateCellRHSAlpha enforces CFL-consistent per-cell RHS scaling.
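The idea behind a per-cell β can be illustrated with a scalar sketch: shrink the reconstructed deviations from the cell mean just enough that every sampled value stays above a floor. This follows the generic linear-scaling approach to positivity preservation and is an assumption about the flavor of limiter involved, not a transcription of EvaluateURecBeta; the function names and the `eps` floor are hypothetical.

```python
def positivity_beta(mean, rec_values, eps=1e-12):
    """Largest beta in [0, 1] with mean + beta * (v - mean) >= eps for all v.

    `mean` is the cell average (must itself be >= eps); `rec_values` are
    reconstructed point values, e.g. density at quadrature points.
    """
    if mean < eps:
        raise ValueError("cell mean violates the positivity floor")
    v_min = min(rec_values)
    if v_min >= eps:
        return 1.0  # reconstruction already admissible
    # Solve mean + beta * (v_min - mean) = eps for beta.
    return (mean - eps) / (mean - v_min)

def apply_beta(mean, rec_values, beta):
    """Scale deviations from the mean; the cell average is unchanged."""
    return [mean + beta * (v - mean) for v in rec_values]
```

Because only the deviation from the mean is scaled, the limited reconstruction keeps the same cell average, so conservation is not affected.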
template <int nVarsFixed>
void DoLimiterWBAP_C(tUDof<nVarsFixed> &u,
tURec<nVarsFixed> &uRec,
tURec<nVarsFixed> &uRecNew,
tURec<nVarsFixed> &uRecBuf,
tSmoothIndicator &si,
bool ifAll,
tFM FM, // cons → char transform
tFMI FMI, // char → cons transform
bool putIntoNew = false);
template <int nVarsFixed>
void DoLimiterWBAP_3(...); // 3-mode variant
- si.
- FM).
- FMI).
- uRecNew (double-buffer for iterative schemes).

- DoCalculateSmoothIndicator<nVarsFixed, nVarsSee=2>(si, uRec, u, varsSee): classical indicator over a subset of variables.
- DoCalculateSmoothIndicatorV1<nVarsFixed>(si, uRec, u, varsSee, FPost): V1 with user-provided post-processing.

All integrators descend from:
template <class TDATA, class TDTAU>
class ImplicitDualTimeStep {
using Frhs = std::function<void(TDATA&, TDATA&, TDTAU&, int, real, int)>;
using Fdt = std::function<void(TDATA&, TDTAU&, real, int)>;
using Fsolve = std::function<void(TDATA&, TDATA&, TDATA&, TDTAU&, real, real, TDATA&, int, real, int)>;
using Fstop = std::function<bool(int, TDATA&, int)>;
using Fincrement = std::function<void(TDATA&, TDATA&, real, int)>;
virtual void Step(TDATA &x, TDATA &xinc, const Frhs&, const Fdt&, const Fsolve&,
int maxIter, const Fstop&, const Fincrement&, real dt) = 0;
};
| odeCode | Class | Scheme |
|---|---|---|
| 103 | ImplicitEulerDualTimeStep | Backward Euler |
| 0 | ImplicitBDFDualTimeStep | BDF2 / BDF-k |
| (none) | ImplicitVBDFDualTimeStep | Variable-step BDF-k |
| 1 | ImplicitSDIRK4DualTimeStep (schemeCode 0…4) | SDIRK-4 · ESDIRK2/3 · Trapezoidal |
| 101 | (alias for 1) | (backward-compat odeCode) |
| 401 | ImplicitHermite3SimpleJacobianDualStep | HM3 + p-Multigrid |
| 2 | ExplicitSSPRK3TimeStepAsImplicitDualTimeStep | SSP-RK3 |
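As a rough illustration of what dual time stepping means for the simplest row of the table, the sketch below advances the scalar test equation du/dt = λu by one backward-Euler step and solves the implicit equation with pseudo-time iterations. It is a toy under stated assumptions: `dual_time_backward_euler` is a hypothetical name, the pseudo-time update uses the exact scalar Jacobian, and none of the ImplicitDualTimeStep callbacks are modelled.

```python
def dual_time_backward_euler(u_n, lam, dt, d_tau, max_iter=200, tol=1e-12):
    """One backward-Euler step of du/dt = lam * u via pseudo-time iteration.

    Drives the unsteady residual  r(w) = lam * w - (w - u_n) / dt  to zero.
    Each inner iteration solves (1/d_tau + 1/dt - lam) * dw = r(w).
    Returns the converged value and the number of inner iterations used.
    """
    w = u_n  # initial guess: previous physical-time solution
    for it in range(1, max_iter + 1):
        residual = lam * w - (w - u_n) / dt
        if abs(residual) < tol:
            return w, it - 1
        dw = residual / (1.0 / d_tau + 1.0 / dt - lam)
        w += dw
    return w, max_iter

u_next, n_inner = dual_time_backward_euler(u_n=1.0, lam=-50.0, dt=0.1, d_tau=1.0)
print(u_next, n_inner)
```

The converged value equals the closed-form backward-Euler result u_n / (1 − λ·dt); a larger pseudo-time step d_tau brings the inner loop closer to a Newton iteration.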
SetExtraParams(json) exposes scheme-specific knobs (e.g. nMG, incFScale).
HM3 (Hermite-3) is a 3rd-order A-stable implicit scheme with three modes:
Inside ImplicitHermite3SimpleJacobianDualStep::Step() a nonzero nMG triggers p-multigrid smoothing cycles:
// pseudocode inside the inner solve (lines 1250-1251)
fdt (xMG, dTau, 1.0, /*upos=*/2); // lower-order pseudo-timestep
frhs(rhsbuf[1], xMG, dTau, iter, 1.0, /*upos=*/2);
The upos=2 argument tells the evaluator to evaluate at a lower polynomial order (level-transition). VR provides DownCastURecOrder(curOrder, iCell, uRec, downCastMethod) to project reconstruction coefficients between orders.
- tpMG: toggle for multigrid in the outer dual-time loop.
- incFScale: incremental flux scaling on lower MG levels; integrated into the entropy fix path (RELEASE_NOTES.md).
- LimiterUGrad: prevents the lower-order coarse-grid correction from producing negative density / pressure.

SDIRK4 codes
- schemeCode = 0: Nørsett 3-stage SDIRK-4
- schemeCode = 1: 6-stage ARK-family SDIRK
- schemeCode = 2: Kennedy–Carpenter ESDIRK3
- schemeCode = 3: Trapezoidal
- schemeCode = 4: ESDIRK2, γ = 1 − √2/2

template <class TDATA>
class GMRES_LeftPreconditioned {
public:
GMRES_LeftPreconditioned(index dofSize);
void setSpace(int kSpace);
bool solve(const TDATA &rhs, TDATA &x,
FMatVec Ax, FPCApply PC,
int maxIter, real tol);
};
template <class TDATA, class TScalar>
class PCG_PreconditionedRes { … };
Matrix-free: the caller supplies Ax and PC functors.
Provided by EulerEvaluator:
void LUSGSMatrixInit(JDiag, JSource, dTau, dt, alphaDiag, u, uRec, jacCode, t);
void LUSGSMatrixVec(alphaDiag, t, u, uInc, JDiag, AuInc);
void LUSGSMatrixToJacobianLU(alphaDiag, t, u, JDiag, jacLU);
void UpdateSGS(alphaDiag, t, rhs, u, uInc, uIncNew, JDiag,
forward, gsUpdate, sumInc, uIncIsZero = false);
void LUSGSMatrixSolveJacobianLU(alphaDiag, t, rhs, u, uInc, uIncNew,
bBuf, JDiag, jacLU,
uIncIsZero, sumInc);
void UpdateSGSWithRec(alphaDiag, t, rhs, u, uRec, uInc, uRecInc,
JDiag, forward, sumInc);
Selector
"gmresCode": 0 // LUSGS only (cheap, robust)
"gmresCode": 1 // GMRES (matrix-free Krylov)
"gmresCode": 2 // LUSGS + GMRES (LUSGS as PC for GMRES)
Direct path for small blocks: src/Solver/Direct.hpp (LU / LDLT). Optional SuperLU_dist via the cfd_externals submodule.
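The forward/backward structure behind a symmetric Gauss-Seidel update can be seen on a small dense system. The sketch below is generic textbook SGS in plain Python, offered as an illustration under simplifying assumptions: it uses scalar entries rather than per-cell Jacobian blocks, and `sgs_sweep` and `residual_norm` are not functions of the solver.

```python
def sgs_sweep(a, b, x):
    """One symmetric Gauss-Seidel sweep (forward then backward) on a x = b.

    `a` is a dense square matrix as a list of rows with nonzero diagonal;
    `x` is updated in place and returned.
    """
    n = len(b)
    for order in (range(n), range(n - 1, -1, -1)):  # forward, then backward
        for i in order:
            off_diag = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diag) / a[i][i]
    return x

def residual_norm(a, b, x):
    """Max-norm of b - a x."""
    n = len(b)
    return max(abs(b[i] - sum(a[i][j] * x[j] for j in range(n)))
               for i in range(n))

a = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x = [0.0, 0.0, 0.0]
for _ in range(20):
    sgs_sweep(a, b, x)
print(residual_norm(a, b, x))
```

Used as a preconditioner, one or two such sweeps approximate the inverse cheaply; used as a standalone solver, the sweeps are simply repeated until the residual is small enough.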
Setup is collective and expensive. Communication is local and cheap.
Build-once phase — collective
trans.setFatherSon(father, son);
trans.createFatherGlobalMapping();
// collective: MPI_Allgather over local sizes
trans.createGhostMapping(pullGlobal);
// collective: sorts + dedups pullGlobal IN PLACE
// (save a copy first if you need the original)
trans.createMPITypes();
// local: MPI_Type_create_hindexed describes
// the scattered rows to send/recv
// — ALSO resizes the son array to hold them
trans.initPersistentPull();
// local: MPI_Recv_init + MPI_Send_init
The derived MPI datatypes persist with the transformer, so there is no teardown cost until the transformer is destroyed.
Hot-loop phase — local only
for (int step = 0; step < N; ++step) {
trans.startPersistentPull(); // MPI_Startall
computeFluxes(/* reads ghosts */);
trans.waitPersistentPull(); // MPI_Waitall
}
trans.clearPersistentPull();
v0.2.0 bug-fix:
globalSize() used to be collective and could deadlock when some ranks took short-cut paths. It's now cached at createFatherGlobalMapping time — fully local.
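What the collective setup produces can be pictured without MPI. The sketch below simulates the all-gathered local sizes, builds the exclusive prefix sum of row starts, and resolves which rank owns each requested global index after sorting and de-duplicating the request list. Names such as `build_row_starts`, `owner_of` and `prepare_pull` are hypothetical; the real transformer keeps equivalent data internally.

```python
from bisect import bisect_right
from itertools import accumulate

def build_row_starts(local_sizes):
    """Exclusive prefix sum: row_starts[r] is the first global row of rank r."""
    return [0] + list(accumulate(local_sizes))

def owner_of(global_index, row_starts):
    """Return (rank, local_index) of a global row index."""
    if not 0 <= global_index < row_starts[-1]:
        raise IndexError("global index out of range")
    rank = bisect_right(row_starts, global_index) - 1
    return rank, global_index - row_starts[rank]

def prepare_pull(pull_global):
    """Sort and de-duplicate the requested ghost indices (returns a new list)."""
    return sorted(set(pull_global))

row_starts = build_row_starts([3, 0, 2])   # rank 1 owns no rows
for g in prepare_pull([4, 2, 4, 3]):
    print(g, owner_of(g, row_starts))
```

Once the row starts are cached, both the global size (`row_starts[-1]`) and every owner lookup are purely local operations, which is the property the v0.2.0 fix relies on.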
MPI::CommStrategy::Instance().GetArrayStrategy() selects:
HIndexed (default)

MPI_Type_create_hindexed(count, blocklengths, displacements,
                         base_type, &new_type);

InSituPack

inSituBuffer[rank].clear();
for (index i : pushingIndexLocal[rank])
    inSituBuffer[rank].append(row(i));
MPI_Isend(inSituBuffer[rank].data(), ...);

- HIndexed on some older MPI stacks and on CUDA-aware MPI with GPU-Direct where the driver prefers flat buffers.

Both strategies live behind the same public API. The choice is a tuning knob: no application-level changes needed.
BorrowGGIndexing: avoid collective setup twice

// Primary array: does the full collective setup
ArrayTransformer<real, 5> cellUTrans;
cellUTrans.setFatherSon(uFather, uSon);
cellUTrans.createFatherGlobalMapping();
cellUTrans.createGhostMapping(pullGlobal);
cellUTrans.createMPITypes();
// Secondary array: reuses the *global + ghost* mapping.
// Only the MPI datatypes (which depend on the row size) are rebuilt.
ArrayTransformer<real, DynamicSize> recTrans;
recTrans.setFatherSon(uRecFather, uRecSon);
recTrans.BorrowGGIndexing(cellUTrans); // <-- key line
recTrans.createMPITypes();
recTrans.initPersistentPull();
Consequence. In the Euler pipeline every DOF array (u, uPrev, uInc, uRec, uRecInc, uRecB, …) shares a single ghost map established from the cell2cell adjacency. Only the MPI datatypes differ, keyed on the row size of each array.
-DDNDS_DIST_MT_USE_OMP=ON activates threaded paths throughout:
- EigenVecMin, EigenVecSum fold per thread, then combine.
- toLocalOMP / toGlobalOMP / bootstrapToLocalOMP parallelize over the rows of adjacency arrays.
- ConstructX() methods in FiniteVolume loop over cells / faces with #pragma omp parallel for.
- DoReconstructionIter has an OMP variant.

CI default OMP_NUM_THREADS=2 (override at configure time via DNDS_TEST_OMP_THREADS). MPI-rank count per test configurable via DNDS_TEST_NP_LIST.
Typical production deployment: 1 MPI rank per NUMA node × OMP threads within. MPI handles cross-socket / cross-node; OMP handles within.
DeviceTransferable CRTP

template <class TDerived>
class DeviceTransferable {
public:
// Derived implements: device_array_list() returning a tuple of host-device arrays
void to_device(DeviceBackend B = DeviceBackend::CUDA);
void to_host();
DeviceBackend device() const;
template <DeviceBackend B> auto deviceView();
};
// Example user
class FiniteVolume : public DeviceTransferable<FiniteVolume> {
auto device_array_list() {
return std::tie(volumeLocal, faceArea, faceUnitNorm, cellBary,
cellInertia, cellIntJacobiDet, /* ... */);
}
};
fv.to_device();
auto dv = fv.deviceView<CUDA>();
launchKernel<<<blocks, threads>>>(dv);
fv.to_host();
- UnstructuredMesh (connectivity)
- FiniteVolume (metrics)
- VariationalReconstruction (via base)
- VRDefines DOF arrays

Build: cmake --preset cuda → -DDNDS_USE_CUDA=ON · Thrust fixes via CMAKE_CUDA_ARCHITECTURES=native.
Problem: the stock Euler evaluator uses Eigen with compile-time nVars; Eigen matrix ops do not cleanly lower to device-callable scalar loops. CUDA kernel launches over tiny matrices cost more than the math.
Solution: a parallel-track evaluator in src/EulerP/ that:
- nVars.
- EvaluatorDeviceView<B> with B ∈ {Host, CUDA}: same interface, two implementations compiled in separate translation units (.cpp and .cu).
- *_Arg structs (e.g. RecGradient_Arg, Flux2nd_Arg) so the launching host code doesn't need to know argument order.

template <DeviceBackend B>
struct EvaluatorDeviceView {
FiniteVolume::t_deviceView<B> fv;
BCHandlerDeviceView<B> bc;
PhysicsDeviceView<B> physics;
};
Python driver: python/DNDSR/EulerP/EulerP_Solver.py orchestrates the full EulerP pipeline from Python with CUDA selected by runtime flag.
class Evaluator {
ssp<CFV::FiniteVolume> fv;
ssp<BCHandler> bcHandler;
ssp<Physics> physics;
// face buffers (dense packed from ghost father+son)
tUFaceBuffer u_face_bufferL, u_face_bufferR;
tUScalarFaceBuffer uScalar_face_bufferL, uScalar_face_bufferR;
public:
// Setup
void BuildFaceBufferDof(TUDof &u);
void BuildFaceBufferDofScalar(TUScalar &u);
void PrepareFaceBuffer(int nVarsScalar);
// Pipeline kernels (each host-or-device via Evaluator_impl<B>)
void RecGradient (RecGradient_Arg &arg); // Green-Gauss + Barth-Jespersen
void Cons2PrimMu (Cons2PrimMu_Arg &arg);
void Cons2Prim (Cons2Prim_Arg &arg);
void RecFace2nd (RecFace2nd_Arg &arg); // 2nd-order face reconstruction
void Flux2nd (Flux2nd_Arg &arg); // inviscid + viscous face flux
};
- Evaluator_impl<B>).
- EvaluateRHS regardless of backend.
- src/Geom/Mesh/BenchmarkFiniteVolume.cu exercises the metric arrays on-device with varied block sizes.
- host_device_vector<T>: a vector that can shadow itself on device; used throughout FiniteVolume / UnstructuredMesh.
- to_device / to_host): no hidden synchronization.
- CMAKE_CUDA_ARCHITECTURES=native fixes a class of compile errors in Thrust's internal machinery.
- to_device: a bug in the face-buffer creation path was copying host buffers to device needlessly; fixed in v0.2.0.
- py::classh holders: ensure safe Python
- Euler evaluator to CUDA (not just EulerP).
- MPI_Type_create_hindexed over pinned device memory.

| Layer | Responsibility |
|---|---|
| SerializerBase | Abstract scalar / vector / byte-array interface |
| SerializerH5 | MPI-parallel HDF5 (collective I/O) |
| SerializerJSON | Per-rank JSON (IsPerRank() == true), no MPI coordination |
| Array | Per-array metadata, structure tags, flat data buffer |
| ParArray | Global offsets, EvenSplit, CSR global row-starts |
| ArrayPair | Father-son bundle · ReadSerializeRedistributed |
| ArrayRedistributor | Rendezvous redistribution via ArrayTransformer |
Key property. Every method in SerializerH5 is MPI-collective — every rank must call them in the same order, even when that rank has size == 0. Failing to participate causes a hang, not a crash.
SerializerBase: the public interface

// File lifecycle
virtual void OpenFile(const std::string &fName, bool read) = 0;
virtual void CloseFile() = 0;
// Path navigation (think HDF5 group structure)
virtual void CreatePath(const std::string &p) = 0;
virtual void GoToPath(const std::string &p) = 0;
virtual std::string GetCurrentPath() = 0;
virtual std::set<std::string> ListCurrentPath() = 0;
virtual bool IsPerRank() = 0; // true for JSON
virtual int GetMPIRank() = 0;
virtual int GetMPISize() = 0;
virtual const MPIInfo &getMPI() = 0;
// Scalars (per-rank)
virtual void WriteInt(const std::string &name, int64_t v) = 0;
virtual void WriteIndex/WriteReal/WriteString(...) = 0;
virtual void ReadInt /ReadIndex / ReadReal / ReadString(...) = 0;
// Vectors (COLLECTIVE under H5)
virtual void WriteIndexVector(const std::string &name, const std::vector<index> &v,
ArrayGlobalOffset offset) = 0;
virtual void ReadIndexVector (const std::string &name, std::vector<index> &v,
ArrayGlobalOffset &offset) = 0; // offset is in/out
// ... Rowsize, Real, SharedIndex, SharedRowsize
virtual void WriteUint8Array(const std::string &name, const uint8_t *data,
index size, ArrayGlobalOffset offset) = 0;
virtual void ReadUint8Array (const std::string &name, uint8_t *data,
index &size, ArrayGlobalOffset &offset) = 0;
ArrayGlobalOffset: five offset modes

static const index Offset_Parts = -1;
static const index Offset_One = -2;
static const index Offset_EvenSplit = -3;
static const index Offset_Unknown = UnInitIndex;
class ArrayGlobalOffset {
index _size{0};
index _offset{0};
public:
ArrayGlobalOffset(index sz, index ofs);
index size() const;
index offset() const;
ArrayGlobalOffset operator*(index R) const; // scales size (and offset if real)
ArrayGlobalOffset operator/(index R) const;
void CheckMultipleOf(index R) const;
bool operator==(const ArrayGlobalOffset &other) const;
bool isDist() const; // _offset >= 0
};
extern ArrayGlobalOffset ArrayGlobalOffset_Unknown, _One, _Parts, _EvenSplit;
| Sentinel | Meaning |
|---|---|
| Unknown | Auto-detect from companion rank_offsets dataset |
| Parts | Compute offset via MPI_Scan over local sizes |
| One | Rank 0 writes / reads the whole dataset |
| EvenSplit | Read: each rank gets ~nGlobal / nRanks rows |
| isDist() | Explicit {localSize, globalStart} |
When nGlobal < nRanks (5 entries across 8 ranks), EvenSplitRange assigns 0 rows to some ranks. Collective HDF5 calls still demand every rank participates — and std::vector<>::data() on an empty vector may return nullptr.
std::vector<index> v(size); // size may be 0
ReadDataVector<index>(name, v.data(), ...); // may pass nullptr → hang
Caller-side helpers like __ReadSerializerData and ReadUint8Array would skip the H5Dread when buf == nullptr, and the collective hangs.
Every caller in SerializerBase.cpp passes a stack-allocated dummy pointer when size == 0:
index dummy;
ReadDataVector<index>(name,
size == 0 ? &dummy : v.data(),
...);
ReadUint8Array exposes the two-pass pattern:
- data = nullptr, returns the size.

All collectives proceed with 0-count hyperslabs on the empty ranks, with no application-level branching.
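The empty-rank case is easy to reproduce. The following sketch splits nGlobal rows across nRanks as evenly as possible; the exact rounding rule used by EvenSplitRange is not shown in this document, so the remainder-first convention below is an assumption and `even_split_range` is a hypothetical stand-in.

```python
def even_split_range(rank, n_ranks, n_global):
    """Half-open [start, end) row range for `rank`.

    The first (n_global % n_ranks) ranks receive one extra row.
    """
    base, extra = divmod(n_global, n_ranks)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end

ranges = [even_split_range(r, 8, 5) for r in range(8)]
sizes = [end - start for start, end in ranges]
# With 5 rows on 8 ranks, the trailing ranks own zero rows
# but must still join every collective call.
print(sizes)
```

Whatever the rounding rule, the ranges have to tile the global row space without gaps or overlaps, and ranks with an empty range still need a valid (dummy) buffer pointer.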
Consequence. EulerSolver::ReadRestart is a single call. The user writes from 4 ranks on a login node, restarts on 1024 ranks on a compute partition, and the same JSON config runs. Ranks with localRows == 0 participate in every collective with empty buffers.
DNDS_DECLARE_CONFIG

struct ImplicitCFLControl {
real CFL = 10.0;
int nForceLocalStartStep = INT_MAX;
bool useLocalDt = true;
real RANSRelax = 1.0;
DNDS_DECLARE_CONFIG(ImplicitCFLControl) {
DNDS_FIELD(CFL, "CFL for implicit local dt");
DNDS_FIELD(nForceLocalStartStep, "Step to force local dt",
DNDS::Config::range(0));
DNDS_FIELD(useLocalDt, "Use local (vs uniform) dTau");
DNDS_FIELD(RANSRelax, "RANS under-relaxation factor",
DNDS::Config::range(0.0, 1.0));
config.check([](const T &s) -> DNDS::CheckResult {
if (s.RANSRelax <= 0) return {false, "RANSRelax must be positive"};
return {true, ""};
});
}
};
What the macro gives you. No base class, no virtual members, no per-instance data — the struct stays a POD safe for CUDA. Underneath, a static _dnds_do_register() method is generated that fills a ConfigRegistry<T> singleton with FieldMeta records.
// Simple scalars & bounded scalars
DNDS_FIELD(CFL, "CFL number");
DNDS_FIELD(nInternalIt, "Inner iterations", DNDS::Config::range(0));
DNDS_FIELD(relax, "Relaxation factor", DNDS::Config::range(0.0, 1.0));
// Enum with value names (appears in schema as enum constraint)
DNDS_FIELD(rsType, "Riemann solver type",
DNDS::Config::enum_values({"Roe","HLLC","HLLEP","HLLEP_V1",
"Roe_M1","Roe_M2","Roe_M3","Roe_M4",
"Roe_M5","Roe_M6","Roe_M7","Roe_M8","Roe_M9"}));
// Documentation kwargs — emitted as "x-..." extensions in schema
DNDS_FIELD(CFL, "CFL number", DNDS::Config::info("units", "nondim"),
DNDS::Config::info("ref", "Jameson 1985"));
// Nested sub-section
config.field_section(&T::frameRotation, "frameConstRotation", "Rotating frame");
// Arrays / maps of sub-objects
config.field_array_of<BoxInit> (&T::boxInits, "boxInitializers", "Box initializers");
config.field_map_of<CoarseCtrl> (&T::coarseList, "coarseGridList", "Per-level controls");
// Opaque JSON (for scheme-specific extras)
config.field_json (&T::extra, "odeSettingsExtra", "Opaque ODE scheme settings");
// Renaming / aliases (backward compatibility)
config.field_alias (&T::rsType, "riemannSolverType", "Riemann solver type");
// Emit the schema (run-time or ahead-of-time)
nlohmann::ordered_json schema = ConfigRegistry<EulerConfig>::Instance().emitSchema("Euler solver config");
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"description": "Euler solver config",
"properties": {
"CFL": { "type": "number", "default": 10.0, "description": "..." },
"rsType": { "type": "string", "default": "Roe",
"enum": ["Roe","HLLC",...,"Roe_M9"] }
}
}
./build/app/euler.exe --emit-schema > euler_schema.json
# drops ~107 KB per-solver schema
VS Code + any JSON-schema-aware editor give autocompletion and in-line validation. Pre-computed schemas ship in cases/euler_schema.json, eulerSA3D_schema.json, etc.
auto &reg = ConfigRegistry<EulerConfig>::Instance();
reg.readFromJson(j, cfg); // deserialize + range checks
reg.validate(cfg); // cross-field
reg.validateWithContext(cfg, ctx); // uses nVars, dim, modelCode
reg.validateKeys(userJson); // throws on unknown keys
validateKeys is automatic — no hand-maintained list of allowed fields.
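The registry idea (fields described once, then reused for defaults, range checks, schema emission and unknown-key detection) can be mimicked in a few lines of Python. This is a conceptual sketch, not the DNDS implementation: `Field`, `Registry` and their methods are invented for illustration, and only a numeric range constraint is modelled.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Field:
    name: str
    default: Any
    description: str
    minimum: Optional[float] = None
    maximum: Optional[float] = None

class Registry:
    def __init__(self, fields):
        self.fields = {f.name: f for f in fields}

    def validate_keys(self, user_json):
        """Reject keys that no registered field declares."""
        unknown = sorted(set(user_json) - set(self.fields))
        if unknown:
            raise KeyError(f"unknown config keys: {unknown}")

    def read(self, user_json):
        """Merge user values over defaults and apply range checks."""
        self.validate_keys(user_json)
        cfg = {}
        for name, f in self.fields.items():
            value = user_json.get(name, f.default)
            if f.minimum is not None and value < f.minimum:
                raise ValueError(f"{name}={value} is below {f.minimum}")
            if f.maximum is not None and value > f.maximum:
                raise ValueError(f"{name}={value} is above {f.maximum}")
            cfg[name] = value
        return cfg

    def emit_schema(self, description):
        """Build a JSON-Schema-like dict from the same field records."""
        props = {}
        for name, f in self.fields.items():
            kind = "boolean" if isinstance(f.default, bool) else "number"
            entry = {"type": kind, "default": f.default,
                     "description": f.description}
            if f.minimum is not None:
                entry["minimum"] = f.minimum
            if f.maximum is not None:
                entry["maximum"] = f.maximum
            props[name] = entry
        return {"type": "object", "description": description,
                "properties": props}

reg = Registry([
    Field("CFL", 10.0, "CFL for implicit local dt"),
    Field("useLocalDt", True, "Use local (vs uniform) dTau"),
    Field("RANSRelax", 1.0, "RANS under-relaxation factor", 0.0, 1.0),
])
print(reg.read({"CFL": 50.0}))
```

Because the allowed keys come from the same records that drive deserialization, a misspelled key such as `"CLF"` is rejected without any separately maintained whitelist.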
from DNDSR import DNDS
↓
python/DNDSR/__init__.py
↓
python/DNDSR/DNDS/__init__.py
├── _loader.preload("dnds") # ctypes.CDLL · RTLD_GLOBAL
├── from ._ext.dnds_pybind11 import * # pybind11 extension
└── _init_mpi() # MPI_Init_thread AT IMPORT TIME
_loader.py loads external dependencies with RTLD_GLOBAL before the pybind11 extension opens. If they were loaded later with the default RTLD_LOCAL, the extension would not find the symbols it depends on.
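A preload step of this kind typically looks like the following. The sketch uses only the standard `ctypes` module; the `preload` function shown here, its return value and the library names passed to it are assumptions for illustration, and the actual `_loader.py` may resolve paths differently.

```python
import ctypes
import ctypes.util

def preload(names):
    """Load shared libraries with RTLD_GLOBAL so later extensions see their symbols.

    Returns a dict of the handles that loaded; names that cannot be
    located or opened are skipped.
    """
    handles = {}
    for name in names:
        path = ctypes.util.find_library(name)
        if path is None:
            continue  # not installed or not on the search path
        try:
            handles[name] = ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            continue  # found but could not be opened
    return handles
```

The order matters: the preload has to run before the extension module is imported, since symbol resolution happens when the extension's shared object is opened.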
- DNDSR.DNDS: arrays, MPI, serializer
- DNDSR.Geom: mesh reading & manipulation
- DNDSR.CFV: finite volume / VR / Fourier analysis
- DNDSR.EulerP: GPU-friendly Euler evaluator

The top-level __init__.py imports all four so a single from DNDSR import * works.
from DNDSR import DNDS
from DNDSR.Geom.utils import read_mesh, prepare_mesh, build_bnd_mesh
# 1. MPI bootstrap (implicit MPI_Init_thread already ran at import)
mpi = DNDS.MPIInfo(); mpi.setWorld()
# 2. Read a CGNS mesh with elevation and bisection
result = read_mesh(
"data/mesh/UniformSquare_10.cgns",
mpi = mpi,
dim = 2,
elevation = "O2", # Quad4 → Quad9
bisect = 1, # one round of h-refinement
)
# 3. Finish the mesh (build ghosts, interpolate faces, reorder cells)
prepare_mesh(result.mesh, result.reader)
# 4. Extract surface mesh and dump VTK
bnd = build_bnd_mesh(result.mesh)
result.mesh.BuildVTKConnectivity()
PEP 561 compliant. A py.typed marker ships in the package; .pyi stubs are auto-generated by pybind11-stubgen during cmake --install. Pyright, mypy, and Pylance see full C++ type signatures.
import numpy as np
from DNDSR.CFV import ModelEvaluator # pure-Python wrapper over pybind11 class
me = ModelEvaluator(mesh, fv, vr)
me.set_order(3)
# Fourier analysis: plug in a plane wave, read back the complex amplification
kx_range = np.linspace(-np.pi, np.pi, 200)
for kx in kx_range:
    lam = me.fourier_amplification_factor(kx)
    print(kx, lam.real, lam.imag)
Why this matters for a research code. VR's dispersion/dissipation properties depend on order, limiter, and inner-product choice. Having a Python harness to sweep them over a discrete Fourier spectrum means parameter studies (limiter combinations, inner-product choices, derivative weights) are done in hours, not weeks.
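The kind of quantity such a sweep returns can be illustrated independently of the package. For the first-order upwind discretization of u_t + u_x = 0 on a unit-spaced grid, inserting the plane wave exp(i k j) gives the semi-discrete eigenvalue λ(k) = −(1 − exp(−i k)); its real part measures dissipation and its imaginary part dispersion. The sketch below covers this textbook case only and makes no claim about what `fourier_amplification_factor` computes internally for VR.

```python
import cmath
import math

def upwind_eigenvalue(k):
    """Semi-discrete Fourier symbol of first-order upwind for u_t + u_x = 0 (h = 1)."""
    return -(1.0 - cmath.exp(-1j * k))

for k in (0.0, math.pi / 4, math.pi / 2, math.pi):
    lam = upwind_eigenvalue(k)
    # The exact operator has lam = -1j * k: zero real part, imaginary part -k.
    print(f"k={k:.3f}  Re={lam.real:+.4f}  Im={lam.imag:+.4f}  exact Im={-k:+.4f}")
```

A higher-order scheme shows up in such a table as a real part that stays near zero, and an imaginary part that tracks −k, over a wider range of wavenumbers.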
Other Python-exposed bits:
- ArrayPair, ArrayEigenMatrix/Vector/Batch
- BuildUDof / BuildURec / BuildUGrad (typed constructors)
- to_device / to_host
- MeshAdjState enum and AdjPairTracked::idx queries (query-only, no mutation from Python; intentional)

// app/Euler/euler.cpp: the entire file
#include "EulerSolver.hpp"
int main(int argc, char *argv[]) {
return DNDS::Euler::RunSingleBlockConsoleApp<
DNDS::Euler::NS>(argc, argv);
}
EulerModel

enum EulerModel {
NS, // 2D Navier-Stokes
NS_3D,
NS_SA, // 2D Spalart-Allmaras
NS_SA_3D,
NS_2D, // alias for NS
NS_2EQ, // k-omega two-equation
NS_2EQ_3D,
NS_EX, // reactive / multi-species
NS_EX_3D,
};
Template dispatch on EulerModel produces one binary per solver: shared source, separate object files.
enum RANSModel {
RANS_None,
RANS_SA, // Spalart-Allmaras (IDDES capable)
RANS_KOWilcox, // Wilcox k-omega
RANS_KOSST, // Menter k-omega SST
RANS_RKE, // Realizable k-epsilon
};
Each has a RANSModelTraits<> specialization with its own wall BC, source terms, and spectral radius.
EulerSolver: the top-level conductor

The Euler module extends CFV's generic tUDof/tURec aliases with
solver-specific array types that add higher-level operators
(initialization, boundary anchors, positivity-preserving limiters):
- ArrayDOFV<N> inherits from CFV::tUDof<N> (= ArrayDof<N,1>).
- ArrayRECV<N> inherits from CFV::tURec<N> (= ArrayDof<DynamicSize,N>).

template <EulerModel model>
class EulerSolver {
typedef EulerEvaluator<model> TEval;
static const int nVarsFixed = TEval::nVarsFixed;
MPIInfo mpi;
ssp<Geom::UnstructuredMesh> mesh, meshBnd;
TpVFV vfv; // VariationalReconstruction
ssp<Geom::UnstructuredMeshSerialRW> reader, readerBnd;
ssp<EulerEvaluator<model>> pEval;
ssp<BoundaryHandler<model>> pBCHandler;
// Solver state (DOF arrays)
ArrayDOFV<nVarsFixed> u, uIncBufODE, wAveraged, uAveraged;
ObjectPool<ArrayDOFV<nVarsFixed>> uPool; // rent/return buffers
ArrayRECV<nVarsFixed> uRec, uRecLimited, uRecNew, uRecNew1,
uRecOld, uRec1, uRecInc, uRecInc1,
uRecB, uRecB1;
JacobianDiagBlock<nVarsFixed> JD, JD1, JDTmp, JSource, JSource1, JSourceTmp;
ssp<JacobianLocalLU<nVarsFixed>> JLocalLU;
ArrayDOFV<1> alphaPP, alphaPP1, betaPP, betaPP1,
alphaPP_tmp, dTauTmp;
// Config + output
Configuration config; // nested sub-configs
nlohmann::ordered_json gSetting;
std::string output_stamp;
// ... outDist* / outSerial* / outDist2SerialTrans* for VTK
};
Configuration — everything that tunes a runEvery sub-section uses DNDS_DECLARE_CONFIG so the full JSON schema is auto-generated.
TimeMarchControl — dtImplicit, nTimeStep, steadyQuit, useRestart, useImplicitPP, odeCode, odeSetting1..4, odeSettingsExtra (opaque JSON), dtCFLLimitScale, …ImplicitReconstructionControl — useExplicit, nInternalRecStep, recLinearScheme (0 = SOR, 1 = GMRES), nGmresSpace/Iter, fpcgReset*, recThreshold.OutputControl — outputIntervalStep, outputFormat (VTK, PLT, VTKHDF, series), parallel vs serial write.CFLControl — initial / max CFL, ramping schedule.ConvergenceControl — residual thresholds, monitor variables.DataIOControl — read/write paths, restart checkpointing.BoundaryDefinition — per-face-zone BC types, free-stream state.LimiterControl — limiterProcedure, usePPRecLimiter, WBAP order.LinearSolverControl — gmresCode, Krylov sub-space, iterations.TimeAverageControl — long-time averaging for statistics.EvaluatorSettings wraps EulerEvaluatorSettings<model>.VFVSettings wraps VRSettings.
--emit-schemadumps the entire tree as a single JSON Schema document —euler_schema.json/eulerSA3D_schema.json/ etc., each ~107 KB.
EulerEvaluator<model>: the spatial operator

void EvaluateRHS(ArrayDOFV<nVarsFixed> &rhs,
JacobianDiagBlock<nVarsFixed> &JSource,
ArrayDOFV<nVarsFixed> &u,
ArrayRECV<nVarsFixed> &uRecUnlim,
ArrayRECV<nVarsFixed> &uRec,
ArrayDOFV<1> &uRecBeta,
ArrayDOFV<1> &cellRHSAlpha,
bool onlyOnHalfAlpha,
real t,
uint64_t flags = RHS_No_Flags);
- RHS_Ignore_Viscosity
- RHS_Dont_Update_Integration
- RHS_Dont_Record_Bud_Flux
- RHS_Direct_2nd_Rec: bypass VR, use GG-based 2nd-order
- RHS_Direct_2nd_Rec_1st_Conv: 2nd-order rec but 1st-order convective
- RHS_Direct_2nd_Rec_use_limiter
- RHS_Direct_2nd_Rec_already_have_uGradBufNoLim
- RHS_Recover_IncFScale

Flags compose bitwise; they cover fallback / diagnostic modes used by p-MG and PP sub-steps.
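Bitwise composition of such flags works as in the following sketch. The flag names are taken from the list above, but the numeric bit positions are invented for illustration and `wants` is a hypothetical helper; the real values are defined in the solver's headers.

```python
from enum import IntFlag

class RHSFlag(IntFlag):
    # Bit positions are placeholders, not the values used by the solver.
    RHS_No_Flags = 0
    RHS_Ignore_Viscosity = 1 << 0
    RHS_Dont_Update_Integration = 1 << 1
    RHS_Direct_2nd_Rec = 1 << 2

def wants(flags, flag):
    """True when `flag` is set inside the combined `flags` word."""
    return bool(flags & flag)

flags = RHSFlag.RHS_Ignore_Viscosity | RHSFlag.RHS_Direct_2nd_Rec
print(wants(flags, RHSFlag.RHS_Direct_2nd_Rec))
print(wants(flags, RHSFlag.RHS_Dont_Update_Integration))
```

Keeping each option on its own bit lets a caller request several fallback modes in a single argument while the default of zero means the full, unmodified evaluation.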
- EvaluateDt(...): CFL-based local dt, spectral-radius based.
- EvaluateURecBeta: PP limiter β per cell.
- EvaluateCellRHSAlpha: per-cell RHS scaling for PP.
- LimiterUGrad: gradient limiter, optional shock detection.
- LUSGSMatrixInit/Vec/ToJacobianLU and UpdateSGS(WithRec).
- GetWallDist_AABB, GetWallDist_BatchedAABB, GetWallDist_Poisson.
- muEff(U, T) with Sutherland or constant models.

Each BC is a class implementing a common interface; BoundaryHandler<model> routes face-zone IDs to BC instances at runtime.
| BC | Use |
|---|---|
| BCWall | No-slip wall (adiabatic) |
| BCWallIsothermal | No-slip wall at fixed temperature |
| BCWallInvis | Slip / symmetry |
| BCSym | Explicit symmetry plane |
| BCFarField | Riemann-invariant farfield |
| BCIn | Specified inflow |
| BCOut / BCOutP | Specified outflow / pressure-outflow |
| BCPeriodic | Standard periodic |
| BCPeriodicRot | Rotating periodic (turbomachinery) |
| BCProfileIn | Tabulated profile (boundary layer, RANS) |
| BCActuator | Actuator disk source term |
Specialized turbomachinery BCs: BCTotalInlet, BCRadialEqOutlet, BCMixingPlane, and the CL driver for AoA-adaptive lift matching (pCLDriver in the evaluator).
- euler_config_1DRiemann.json
- euler_config_1DRiemann_LeBlanc.json
- euler_config_1DSedov.json
- euler_config_2DSedov.json
- euler3D_config_Noh.json
- euler_config_blast.json
- euler_config_M2000Jet.json
- euler_config_M5Diffraction.json
- euler_config_cylinderHS.json
- euler3D_config_SphereShock.json
- euler_config_IV.json: convergence study
- euler3D_config_TGV.json, euler3D_config_BenchTGV.json
- config_cylinderInvis_mg_bench.json
- eulerSA_config_0012_AOA15.json) and k-ω (euler2EQ/...) variants, with O2 elevation (..._Elev.json) and MG benchmarks (config_0012_mg_bench.json).
- eulerSA_config_30p30n.json.
- eulerSA3D_config_Rotor37.json.
- eulerSA3D_config_FanA1.json.

- tpMG, incFScale, positivity-preserving coupling with LimiterUGrad.
- euler2EQ / euler2EQ3D executables.
- BCWallIsothermal).
- LimiterUGrad.
- source2nd, mergeMultiResidual, normOrd, restartOutAtInit, resBaseType options.

RunImplicitEuler

void RunImplicitEuler() {
InitializeRunningEnvironment(env);
// optional restart
if (config.restartState.useRestart)
ReadRestart(config.dataIO.readRestart);
for (int step = 1; step <= config.timeMarch.nTimeStep; ++step) {
EvaluateDt(dt, u, uRec, CFL, dtMinAll, config.timeMarch.dtImplicit,
config.cflControl.useLocalDt, t);
// Inner pseudo-time loop (driven by the chosen ODE integrator)
odeIntegrator.Step(
u, uInc,
/*frhs*/ [&](rhs, u, dTau, iter, alpha, upos) { pEval->EvaluateRHS(...); },
/*fdt */ [&](u, dTau, alpha, upos) { pEval->EvaluateDt(...); },
/*fsolve*/ [&](x, rhs, uInc, dTau, alpha, ...) { Krylov + LUSGS; },
maxInnerIter, fStop, fIncrement, config.timeMarch.dtImplicit);
UpdateCFL();
if (step % config.outputControl.outputIntervalStep == 0)
PrintData(fname, series, …);
if (step % config.outputControl.restartInterval == 0)
PrintRestart(fname);
if (Converged() && config.timeMarch.steadyQuit) break;
}
}
The lambdas above are where EulerEvaluator, GMRES_LeftPreconditioned, and LUSGSMatrix* plug in — the ODE integrator never knows which solver is instantiating it.
| Module | C++ executables | test cases | Python tests | np values |
|---|---|---|---|---|
| DNDS | 8 | 249 | 9 | 1, 2, 4, 8 |
| Geom | 9 | 193 | 2 | 1, 2, 4, 8 |
| CFV | 4 | 67 | 43 | 1, 2, 4, 8 |
| Euler | 4 | 62 | 4 | 1, 2, 4, 8 |
| Solver | 4 | 29 | — | 1 |
Totals. 29 C++ executables, 600 test cases, 58 Python tests across 82 CTest registrations. All MPI-aware tests are CTest-registered at each np value. Serial tests have a 60–120 s timeout; parallel tests 120–600 s depending on module.
# Build + run everything
cmake -B build -DDNDS_BUILD_TESTS=ON
cmake --build build -t all_unit_tests -j8
ctest --test-dir build --output-on-failure
- test_array: layouts, row views, iterators
- test_mpi: MPI wrapper, collective ops
- test_array_transformer: father/son ghost exchange
- test_array_derived: AdjacencyRow, EigenMap rows
- test_array_dof: vector-space ops, norms, AXPY
- test_index_mapping: global
- test_serializer: H5 + JSON, redistribute
- test_permutation_transfer: MPL renumber compression / decompression
- test_elements: shape functions, jacobians
- test_quadrature: orders, weights
- test_mesh_index_conversion: state transitions
- test_mesh_pipeline: full build chain
- test_mesh_distributed_read: ParMetis repartition
- test_mesh_connectivity: Inverse / Compose DSL
- test_mesh_connectivity_ghost: GhostSpec BFS
- test_mesh_connectivity_interpolate: face interp
- test_mesh_reorder: reverse Cuthill-McKee / Hilbert ordering

TEST_CASE("ArrayTransformer: round-trip ghost pull" *
doctest::description("np=1,2,4") *
doctest::timeout(120.0)) {
MPIInfo mpi; mpi.setWorld();
auto father = make_ssp<ParArray<real, 5>>();
auto son = make_ssp<ParArray<real, 5>>();
father->Resize(localN); father->createGlobalMapping();
// ... populate father ...
ArrayTransformer<real, 5> trans;
trans.setFatherSon(father, son);
trans.createFatherGlobalMapping();
trans.createGhostMapping(pullGlobal);
trans.createMPITypes();
trans.initPersistentPull();
trans.pullOnce();
CHECK(son->operator[](0).isApprox(expected, 1e-14));
}
- test_reconstruction · tests of VR convergence on analytic fields.
- test_reconstruction3d · 3D variants; Jacobi/SOR comparison.
- test_limiters · WBAP / CWBAP on contrived data; exercises the full limiter menu.
- test_device_transferable (CUDA only) · round-trip of FiniteVolume to GPU and back.
- test_gas_thermo · ideal gas Cv/Cp, T/p relations, Mach→state.
- test_riemann_solvers · 13 variants, exact-solution agreement on 1D Riemann problems.
- test_rans · SA + k-ω source terms, wall distance integration, trip location.
- test_evaluator_pipeline · full EvaluateRHS on a fixed mesh; golden values.
- test_ode · BDF / SDIRK / HM3 on ODE benchmarks (Van der Pol, stiff scalar).
- test_linear · GMRES + PCG convergence on canonical matrices.
- test_direct · small-block LU / LDLT correctness.
- test_scalar · scalar transport advection-diffusion regression.

Many tests compare computed results against pre-captured golden values with relative tolerance 1e-6 to 1e-8. For this to be meaningful, runs must be byte-stable across re-executions.

- metisSeed = 42 (fixed).

When a golden value has not yet been captured, the test stores the sentinel 1e300:
const real gold_kinetic = 1e300; // TODO: capture
const real computed = evaluate();
if (gold_kinetic < 1e299)
CHECK(computed == doctest::Approx(gold_kinetic).epsilon(1e-8));
else
CHECK(std::isfinite(computed) && computed >= 0);
So the first run of a new test is a finite/non-negative sanity check, and the developer updates the golden in a follow-up commit.
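The sentinel pattern above can be factored into a small helper. This is a minimal sketch under stated assumptions: `MatchesGoldenOrSane` and `kGoldenSentinel` are hypothetical names, not part of DNDSR or doctest, and the comparison is a plain relative check rather than the exact formula used by `doctest::Approx`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical sentinel meaning "golden value not captured yet".
constexpr double kGoldenSentinel = 1e300;

// Returns true if `computed` agrees with a captured golden value to the given
// relative tolerance, or, when the golden is still the sentinel, if `computed`
// passes the finite / non-negative sanity check.
inline bool MatchesGoldenOrSane(double computed, double gold, double relTol) {
    if (gold < 1e299) {
        // Golden value captured: relative comparison.
        const double scale = std::max(std::abs(gold), std::abs(computed));
        return std::abs(computed - gold) <= relTol * scale;
    }
    // No golden yet: sanity check only.
    return std::isfinite(computed) && computed >= 0.0;
}
```

In a test this would sit inside a single `CHECK(...)`, so that replacing the sentinel with a captured number is the only change needed in the follow-up commit.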
- test/DNDS/test_basic.py (9 tests): import chain, MPIInfo, small array round-trip.
- test/Geom/test_basic_geom.py (2 tests): CGNS read, elevation, bisection.
- test/CFV/test_fv_correctness.py (16 tests): cell volume / face area / jacobian correctness on wall meshes.
- test/CFV/test_vr_correctness.py (16 tests): VR order convergence on sin(x)sin(y).
- test/CFV/test_basic_fv.py + test_basic_cfv.py + test_cfv_dissdisp.py (11 tests): FV/CFV smoke tests and dissipation-dispersion analysis.
- test/EulerP/test_basic_eulerP.py (1 test): host + CUDA round-trip.
- test/Euler/test_restart_redistribute.py (3 tests): solver restart with MPI repartition.

# Serial
pytest test/DNDS/test_basic.py -v
# MPI
mpirun -np 4 python -m pytest test/DNDS/test_basic.py
# Some tests support standalone
python test/DNDS/test_basic.py
mpirun -np 2 python test/DNDS/test_basic.py
# 1. Rebuild pybind11 shared libs
cmake --build build -t dnds_pybind11 geom_pybind11 \
cfv_pybind11 eulerP_pybind11 -j32
# 2. Reinstall into python/DNDSR/ (MANDATORY)
cmake --install build --component py
# 3. Only now, run tests
source venv/bin/activate
PYTHONPATH=<root>/python pytest test/ -v
Skipping the install step after changing C++ source leaves stale .so files and produces misleading segfaults that look like code bugs. git checkout changes source but does not rebuild binaries.
{
"configurePresets": [
{
"name": "release-test",
"generator": "Ninja",
"binaryDir": "${sourceDir}/build",
"cacheVariables": {
"CMAKE_BUILD_TYPE": "Release",
"DNDS_BUILD_TESTS": "ON",
"DNDS_USE_OMP": "ON"
}
},
{ "name": "debug", "inherits": "release-test",
"cacheVariables": { "CMAKE_BUILD_TYPE": "Debug" } },
{ "name": "cuda", "inherits": "release-test",
"cacheVariables": { "DNDS_USE_CUDA": "ON",
"CMAKE_CUDA_ARCHITECTURES": "native" } },
{ "name": "ci", "inherits": "release-test",
"cacheVariables": { "DNDS_TEST_NP_LIST": "1;2;4",
"DNDS_TEST_OMP_THREADS": "2" } }
]
}
Aggregate targets: dnds_unit_tests, geom_unit_tests, cfv_unit_tests, euler_unit_tests, solver_unit_tests, all_unit_tests — all EXCLUDE_FROM_ALL so plain cmake --build stays fast.
scikit-build-core

# pyproject.toml
[build-system]
requires = ["scikit-build-core>=0.8", "pybind11", "pybind11-stubgen"]
build-backend = "scikit_build_core.build"
[project]
name = "DNDSR"
version = "0.2.0" # synchronized with VERSION file + git describe
[tool.scikit-build]
cmake.args = ["-DDNDS_BUILD_PYTHON=ON", "-DDNDS_PYBIND11_NO_LTO=ON"]
install.components = ["py"] # only install the py component
CC=mpicc CXX=mpicxx \
CMAKE_BUILD_PARALLEL_LEVEL=32 \
pip install -e .
- *_pybind11 targets.
- python/DNDSR/*/_ext/.
- pybind11-stubgen to produce .pyi files.
- python/DNDSR/_lib/.

"Conda/Anaconda Python embeds an RPATH to conda's bundled libstdc++, which may be older than what the MPI compiler produces. System Python uses the system libstdc++ and avoids this conflict." (README.md)
macOS has a dedicated fmtlib workaround, also shipped.
- omp.h include issue.
- .clang-tidy rationale preserved in docs/dev/clang_tidy_plan.md.

.clang-tidy disables (representative)

- cppcoreguidelines-pro-bounds-pointer-arithmetic: unavoidable in CSR / row-flat arrays.
- fuchsia-default-arguments-declarations: MPI defaults.
- llvm-header-guard: we use #pragma once.
- modernize-use-trailing-return-type: style preference.

# Per-module histogram
python scripts/run_clang_tidy.py DNDS
python scripts/run_clang_tidy.py Geom
python scripts/run_clang_tidy.py CFV
python scripts/run_clang_tidy.py Euler
python scripts/run_clang_tidy.py Solver
Solver / Geom / CFV / Euler / EulerP are not yet sanitised — same recipe to apply. The .clang-tidy disables carry forward.
- doxygen_compat.py.
- /doxygen/ on the Sphinx site.

| Trigger | Time |
|---|---|
| No-op rebuild | < 1 s |
| Markdown-only edit | ~10 s |
| Full (Doxygen + Sphinx) | ~2.5 min |
cmake --build build -t serve-docs
# → http://localhost:8000 with hot reload
- cfd_externals binary libraries (HDF5, CGNS, Metis, ParMetis).
- .clang-format ships at repo root; CI checks a diff in a separate job.
- POSIX index() ambiguity guard: code style requires DNDS::index whenever using namespace DNDS; is active (documented in docs/tests/overview.md).
- VERSION file at repo root (0.2.0).
- git describe --tags --long.
- DNDS_VERSION_STRING.
- DNDSR.__version__ (PEP 440 compliant).
- x-version field.

# Bump VERSION file
echo 0.2.0 > VERSION
git tag v0.2.0
git push --tags
# Pages workflow + release notes kick off.
Compressible Taylor–Green vortex at Re = 1600, 100 iterations, fixed ~4k cells per rank on a single HPC node.
| Series | Solver | Rank range |
|---|---|---|
| BSSCA | DNDSR /BSSCA | 64 → 10240 ranks |
| BSSCT | DNDSR /BSSCA | 96 → 1920 ranks |
| CS | DNDSR /JS | 32 → 256 ranks |
kCI/s = kilo cell-iterations per second; one cell-iteration is one RHS evaluation on one cell.
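As a worked instance of the definition above, the throughput metric is cells times RHS evaluations, divided by wall time and by 1000. The helper below is a sketch; the name `KiloCellItersPerSecond` and the timing used in the example are hypothetical and are not measurements from the benchmark.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// kCI/s = (number of cells * number of RHS evaluations) / wall seconds / 1000.
// Hypothetical helper name; it only restates the definition of the metric.
inline double KiloCellItersPerSecond(std::int64_t nCells,
                                     std::int64_t nRhsEvals,
                                     double wallSeconds) {
    return static_cast<double>(nCells) * static_cast<double>(nRhsEvals) /
           wallSeconds / 1000.0;
}
```

For example, 64 ranks at 4000 cells per rank give 256000 cells; 100 evaluations completed in an assumed 10 s would correspond to 2560 kCI/s.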

mesh

AOA = 5°, Mach number

AOA = 15°, Mach number

AOA = 15°,
Density at t = 0.2, Mach 10 shock on 30° wedge.

DITR U2R2,

BDF2
Mach 3 inviscid flow over a 15° compression corner.

mesh

Density, Re

Pressure, Re

Density, Re = 100
Time-averaged results.

mesh

Time-averaged

Explicit time step comparison

t = 0.1 Mach, 32 isolines (explicit 2nd-order FV)

Mesh

Vorticity
Q-criterion iso-surfaces coloured by Mach number.

mesh

Q-criterion iso-surfaces

Density

Density along diagonal

Density at t = 0.6

Density along diagonal
Mach 2000 jet




Residual convergence
Wing-body

Surface mesh

Surface

Surface

Force-coefficient convergence
struct GhostRequirement {
int cellRings = 1; // # of cell2cell rings
bool nodeNeighbor = true; // cell2cell by vertex share vs face share
bool complementNodes = true; // ghost cells keep all their nodes
bool complementBnds = true; // owned nodes keep all their bnds
};
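To make the `cellRings` field concrete, the sketch below collects ghost cells by breadth-first expansion over a cell2cell adjacency list. It is an illustration only: `CollectGhostRings` is a hypothetical function, not the DNDSR implementation, and the adjacency is taken as given (whether neighbors are defined by shared vertices or shared faces, as `nodeNeighbor` selects, is decided before this step).

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical sketch: return all cells reachable from the owned set within
// `cellRings` hops of the cell2cell graph, excluding the owned cells.
inline std::set<int> CollectGhostRings(
    const std::vector<std::vector<int>> &cell2cell,
    const std::set<int> &owned, int cellRings) {
    std::set<int> visited(owned);  // owned cells are never ghosts
    std::set<int> frontier(owned); // cells whose neighbors are expanded next
    std::set<int> ghosts;
    for (int ring = 0; ring < cellRings; ++ring) {
        std::set<int> next;
        for (int c : frontier)
            for (int nb : cell2cell[static_cast<std::size_t>(c)])
                if (visited.insert(nb).second) { // first time this cell is seen
                    next.insert(nb);
                    ghosts.insert(nb);
                }
        frontier = std::move(next);
    }
    return ghosts;
}
```

On a 1D chain of five cells with cells 0 and 1 owned, one ring yields cell 2 and two rings yield cells 2 and 3.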
- nGhostLayers only adjusts depth, not the kind of neighbor.
- edge2node, cell2edge, node2edge with the same AdjPairTracked discipline.
- useCone × useClosure Boolean matrix.
- docs/architecture/MeshDAGDesign.md (765 lines).
- NS_EX): maturity pass, published validation cases.
- EulerP to the full Euler evaluator.
- docs/dev/.
- array_infrastructure.md: bottom-up tour of Array → ArrayTransformer → ArrayPair → ArrayDof.
- MeshConnectivity.md: the AdjPairTracked state machine, the ghost-spec DSL, the DAG roadmap.
- Serialization.md: layer-cake I/O, cross-np restart, offset modes.
- Paradigm.md: the delayed-abstraction philosophy contrasted with OpenFOAM / SU2.
- Variational_Reconstruction.md / .pdf: full derivation of the facial functional, inner-product choices, and local system.
- Shape_Functions.md: per-element shape functions and quadrature.
- building.md: externals, headers, CMake presets.
- array_usage.md: how to write code with Array / ArrayDof.
- geom_usage.md: mesh construction and VR pipeline.
- python_geom_guide.md: full Python Geom API reference.
- serialization_usage.md: HDF5 checkpoints, redistribution.
- style_guide.md: C++ and Python conventions.
- examples.md: runnable examples/ex_*.cpp programs.
- docs/tests/overview.md: golden values, determinism, suite totals.
- docs/tests/{dnds,geom,cfv,euler,solver}_unit_tests.md.

Try it in three commands
cmake --preset release-test
cmake --build build -t euler -j32
mpirun -np 4 ./build/app/euler.exe cases/euler_config_IV.json
- Code · github.com/CFDLAB-THU/DNDSR
- Docs · cfdlab-thu.github.io/DNDSR
- Release notes · RELEASE_NOTES.md (v0.2.0)
CFD Lab, Tsinghua University