Debugging and Catalyst Replay

To simplify debugging of in-situ pipelines, Catalyst now supports the serialization of conduit_nodes. On each API call, the params argument can be written out to disk. The catalyst_replay command then reads the nodes back in and invokes each API call again, so users do not need to re-run their simulation while debugging.

Serializing Nodes and Writing to Disk

To use the catalyst_replay command, nodes must first be written to disk. The steps to do this are simple:

Set the environment variable CATALYST_DATA_DUMP_DIRECTORY to the directory where the node data for each API invocation should be saved.
Invoke the stub implementation from within your custom API implementation.

With both steps in place, the conduit_node passed into each API call is written to CATALYST_DATA_DUMP_DIRECTORY as a .conduit_bin file. File names follow the general pattern <stage>_params.conduit_bin.<num_ranks>.<rank>, where:

<stage> is one of initialize, execute or finalize.
<num_ranks> is the number of MPI ranks that the simulation was run with.
<rank> is the 0-based index of the rank that generated this file.

Files for the execute stage also include the invocation number, since catalyst_execute can be called multiple times. For example, execute_invc0_params.conduit_bin.2.1 contains the params passed into the 0th invocation of catalyst_execute, as called by the second of two ranks (rank indices are 0-based).
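
The naming scheme above can be checked mechanically when inspecting a dump directory. The following is a minimal Python sketch, not part of Catalyst itself: the names DumpFile, parse_dump_filename and dump_filename are hypothetical helpers, and the regular expression assumes the only variation on the general pattern is the _invc<N> infix used by the execute stage.

```python
import re
from typing import NamedTuple, Optional


class DumpFile(NamedTuple):
    """Fields encoded in a dump file name (hypothetical helper type)."""
    stage: str
    invocation: Optional[int]  # only present for the execute stage
    num_ranks: int
    rank: int


# Assumed from the naming scheme described in the text:
#   <stage>_params.conduit_bin.<num_ranks>.<rank>
# with an extra "_invc<N>" after the stage for execute files.
_DUMP_RE = re.compile(
    r"^(initialize|execute|finalize)(?:_invc(\d+))?"
    r"_params\.conduit_bin\.(\d+)\.(\d+)$"
)


def parse_dump_filename(name: str) -> DumpFile:
    """Split a dump file name into stage, invocation, rank count and rank."""
    match = _DUMP_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized dump file name: {name!r}")
    stage, invocation, num_ranks, rank = match.groups()
    return DumpFile(
        stage=stage,
        invocation=int(invocation) if invocation is not None else None,
        num_ranks=int(num_ranks),
        rank=int(rank),
    )


def dump_filename(stage: str, num_ranks: int, rank: int,
                  invocation: Optional[int] = None) -> str:
    """Build the file name expected for a given stage and rank."""
    infix = f"_invc{invocation}" if invocation is not None else ""
    return f"{stage}{infix}_params.conduit_bin.{num_ranks}.{rank}"
```

For instance, parse_dump_filename("execute_invc0_params.conduit_bin.2.1") yields stage "execute", invocation 0, 2 ranks and rank index 1, matching the example in the text.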

Replaying API Calls with catalyst_replay

After the node data has been written out to disk, the catalyst_replay command can be used to read the node data back into memory and execute the same API calls. Find the catalyst_replay executable in the RUNTIME_OUTPUT_DIRECTORY generated by CMake (this is usually bin/). Run catalyst_replay with the same number of MPI ranks that the simulation used to generate the data, and pass the value of CATALYST_DATA_DUMP_DIRECTORY as a command-line argument. This invokes each API method with the corresponding node data. For an example, see the examples/replay directory.
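
Because the rank count is encoded in every dump file name, the replay command line can be derived from the dump directory itself. The Python sketch below illustrates this under stated assumptions: ranks_in_dump and replay_command are hypothetical helper names, the launcher is assumed to be mpirun with its -np option (some systems use mpiexec, srun or another launcher instead), and bin/catalyst_replay is only the usual location mentioned above, not a guaranteed path.

```python
import re
from pathlib import Path

# Matches the trailing "<num_ranks>.<rank>" part of a dump file name.
_RANKS_RE = re.compile(r"_params\.conduit_bin\.(\d+)\.(\d+)$")


def ranks_in_dump(dump_dir):
    """Return the <num_ranks> value shared by all dump files in dump_dir."""
    counts = set()
    for path in Path(dump_dir).iterdir():
        match = _RANKS_RE.search(path.name)
        if match:
            counts.add(int(match.group(1)))
    if len(counts) != 1:
        raise ValueError(
            f"expected exactly one rank count in {dump_dir}, found {sorted(counts)}"
        )
    return counts.pop()


def replay_command(dump_dir, replay_exe="bin/catalyst_replay", launcher="mpirun"):
    """Build an argument list that replays dump_dir with the matching rank count.

    The launcher and its "-np" flag are assumptions; adjust for your MPI setup.
    """
    num_ranks = ranks_in_dump(dump_dir)
    return [launcher, "-np", str(num_ranks), str(replay_exe), str(dump_dir)]
```

The resulting list can be passed to subprocess.run(..., check=True) or printed and run by hand. Raising an error when the directory mixes several rank counts is a design choice of this sketch, meant to catch dumps from different runs that ended up in the same directory.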