Debugging and Catalyst Replay
To simplify debugging of in situ pipelines, Catalyst now supports the
serialization of conduit_nodes. During each API call, the params
argument passed to that call can be written out to disk. Then, using
catalyst_replay, the nodes are read back in and each API call is invoked
again. This means users do not need to re-run their simulation when
debugging.
Serializing Nodes and Writing to Disk
To use the catalyst_replay command, nodes must first be written to disk.
The steps to do this are simple:
1. Set the environment variable CATALYST_DATA_DUMP_DIRECTORY to the
   directory where the node data for each API invocation should be saved.
2. Invoke the stub implementation in your custom API implementation.
   This will write the conduit_node passed into the API call out to
   CATALYST_DATA_DUMP_DIRECTORY.

The conduit_nodes are written out as .conduit_bin files. They follow the
general pattern <stage>_params.conduit_bin.<num_ranks>.<rank>, where:

- <stage> is one of initialize, execute, or finalize.
- <num_ranks> is the number of MPI ranks that the simulation was run with.
- <rank> is the 0-based index of the rank used to generate this file.
Files for the execute stage also include the invocation number, since
catalyst_execute can be called multiple times. For example,
execute_invc0_params.conduit_bin.2.1 contains the params passed into
invocation 0 (the first call) of catalyst_execute on rank 1, which is the
second of two ranks because rank indices are 0-based.
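
To make the naming scheme concrete, here is a minimal Python sketch that splits a dump file name into its parts. It is not part of Catalyst: the names parse_dump_name and DumpFile are hypothetical, and the invc<N> token for the invocation number is inferred from the single example file name above, so treat the pattern as an assumption.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the naming scheme described in this section.
# The "_invc<N>" token is taken from the one example file name and is
# an assumption about the general format.
_DUMP_NAME = re.compile(
    r"^(?P<stage>initialize|execute|finalize)"
    r"(?:_invc(?P<invocation>\d+))?"
    r"_params\.conduit_bin\.(?P<num_ranks>\d+)\.(?P<rank>\d+)$"
)


class DumpFile(NamedTuple):
    """Parts of a dump file name (hypothetical helper type)."""
    stage: str
    invocation: Optional[int]  # None when the name has no invocation number
    num_ranks: int
    rank: int


def parse_dump_name(name: str) -> Optional[DumpFile]:
    """Return the parts of a dump file name, or None if it does not match."""
    match = _DUMP_NAME.match(name)
    if match is None:
        return None
    invocation = match.group("invocation")
    return DumpFile(
        stage=match.group("stage"),
        invocation=int(invocation) if invocation is not None else None,
        num_ranks=int(match.group("num_ranks")),
        rank=int(match.group("rank")),
    )


if __name__ == "__main__":
    print(parse_dump_name("execute_invc0_params.conduit_bin.2.1"))
```

A helper like this can be handy for checking that a dump directory holds one file per rank for every stage before attempting a replay.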
Replaying API Calls with catalyst_replay
After the node data has been written out to disk, the catalyst_replay
command can be used to read the node data back into memory and execute the
same API calls. Find the catalyst_replay executable in the
RUNTIME_OUTPUT_DIRECTORY generated by CMake (this is usually bin/).
Run catalyst_replay with the same number of MPI ranks as the simulation
used to generate the data, and pass the value of
CATALYST_DATA_DUMP_DIRECTORY as a command-line argument. This invokes each
API method with the corresponding node data. For an example, see the
examples/replay directory.
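
As a rough illustration of matching the rank count, the Python sketch below reads the <num_ranks> field from the dumped file names and builds a launch command from it. The function name replay_command is hypothetical, and both the mpiexec launcher and its -n flag are assumptions about the local MPI installation; the default bin/catalyst_replay path follows the usual CMake output location mentioned above and may differ in your build.

```python
import os
import re
from typing import List

# Matches the trailing ".conduit_bin.<num_ranks>.<rank>" of a dump file name.
_SUFFIX = re.compile(r"\.conduit_bin\.(\d+)\.(\d+)$")


def replay_command(
    dump_dir: str,
    replay_exe: str = "bin/catalyst_replay",  # assumed build location
    launcher: str = "mpiexec",                # assumed MPI launcher
) -> List[str]:
    """Build a replay command whose rank count matches the dumped files."""
    rank_counts = set()
    for name in os.listdir(dump_dir):
        match = _SUFFIX.search(name)
        if match:
            rank_counts.add(int(match.group(1)))
    if len(rank_counts) != 1:
        # Either no dump files were found or runs with different rank
        # counts are mixed in the same directory.
        raise ValueError(
            f"expected exactly one rank count in {dump_dir!r}, "
            f"found {sorted(rank_counts)}"
        )
    num_ranks = rank_counts.pop()
    return [launcher, "-n", str(num_ranks), replay_exe, dump_dir]
```

The returned list can be passed to subprocess.run(cmd, check=True). Raising an error when the directory mixes rank counts is a deliberate choice in this sketch, since the replay is expected to use the same number of ranks as the original run.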