DNDSR 0.1.0.dev1+gcd065ad
Distributed Numeric Data Structure for CFV
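Eigen can create temporaries that use dynamically sized buffers.
For example, with dynamic-sized maps a and b, assigning the matrix product b*b to a first evaluates b*b into a temporary value, which is then assigned to a.
Using a form that writes the product directly into the destination (in Eigen, noalias() does this for matrix products) would be fine even for dynamic-sized maps.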
Each thread should preferably carry a small computational load.
For a 3x5 linear reconstruction problem, using 1 thread per cell is OK, but using 5 threads per cell is far better.
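The 5-threads-per-cell layout can be sketched as plain index arithmetic. The block below is a minimal host-side C++ sketch, assuming the five threads correspond to the five columns of the 3x5 problem. That reading, and the names kThreadsPerCell, CellWork, mapThread and threadsNeeded, are assumptions made for illustration and are not taken from DNDSR.

```cpp
#include <cstddef>

// Assumption: each cell owns one 3x5 problem, split as one thread per column.
constexpr std::size_t kThreadsPerCell = 5;

// Work item assigned to a single thread (hypothetical helper type).
struct CellWork
{
    std::size_t cell;   // index of the cell this thread contributes to
    std::size_t column; // which of the 5 columns this thread handles
};

// Maps a flat thread index to (cell, column).
// In a CUDA kernel the flat index would typically be
// blockIdx.x * blockDim.x + threadIdx.x.
constexpr CellWork mapThread(std::size_t globalThread)
{
    return {globalThread / kThreadsPerCell, globalThread % kThreadsPerCell};
}

// Total number of threads needed to cover nCells cells.
constexpr std::size_t threadsNeeded(std::size_t nCells)
{
    return nCells * kThreadsPerCell;
}
```

With this mapping, flat thread 7 works on column 2 of cell 1, and each thread touches only 3 entries instead of all 15.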
It seems the CUDA toolchain can sometimes be corrupted such that dynamic linking or separate linking of cub::reduce causes runtime errors.
Changing the installation from 12.1 to 12.0 fixed this.
Complicated type systems on the device need very careful treatment.
Sometimes a constructor or assignment operator used for trivial copy / construction somewhere on the call chain (for example in a base class), or some member function (see the array iterator's getView()), is missing the __device__ mark, which may cause the compiler to silently default-initialize the object. Normally the compiler would emit an error, but with CRTP and/or relocatable device code this can fail silently.
Always ensure full __device__ coverage along the device-side call chain.
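Full coverage of the call chain could look like the minimal sketch below. The macro SKETCH_HD and the types ViewBase, ArrayView and ArrayIterator are hypothetical names for illustration, not DNDSR's actual classes. The macro expands to the CUDA qualifiers only under nvcc, so the same code also compiles as ordinary C++.

```cpp
// Expands to CUDA execution-space qualifiers under nvcc, to nothing otherwise.
#if defined(__CUDACC__)
#define SKETCH_HD __host__ __device__
#else
#define SKETCH_HD
#endif

// CRTP base: its constructors and assignment are on the call chain of every
// copy of a derived object, so they carry the mark too.
template <class Derived>
struct ViewBase
{
    int size = 0;
    SKETCH_HD ViewBase() {}
    SKETCH_HD explicit ViewBase(int n) : size(n) {}
    SKETCH_HD ViewBase(const ViewBase &other) : size(other.size) {}
    SKETCH_HD ViewBase &operator=(const ViewBase &other)
    {
        size = other.size;
        return *this;
    }
    SKETCH_HD int Size() const { return static_cast<const Derived &>(*this).SizeImpl(); }
};

// Derived view: every constructor and accessor is marked as well.
struct ArrayView : ViewBase<ArrayView>
{
    const double *data = nullptr;
    SKETCH_HD ArrayView() {}
    SKETCH_HD ArrayView(const double *p, int n) : ViewBase<ArrayView>(n), data(p) {}
    SKETCH_HD int SizeImpl() const { return size; }
    SKETCH_HD double operator[](int i) const { return data[i]; }
};

// Iterator whose getView() returns a view by value, which invokes the copy chain above.
struct ArrayIterator
{
    ArrayView view;
    SKETCH_HD explicit ArrayIterator(const ArrayView &v) : view(v) {}
    SKETCH_HD ArrayView getView() const { return view; }
};
```

Marking the base-class special members explicitly, rather than relying on what the compiler infers, keeps each step of a by-value copy inside a kernel visibly device-callable.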
NVCC can generate PTX code that is then JIT-compiled by the driver. If the driver's JIT compiler is buggy, some errors can happen SILENTLY.
Safe thing to do: use: