
Simulation.hdf5

We provide, for each run, a simulation.hdf5 file that uses HDF5 virtual datasets (available from HDF5 version 1.10 onward) to hide the subdivision of some output files into chunks. The simulation.hdf5 file contains a virtual, unchunked version of the snapshots, group and subhalo catalogs, offset files and Cartesian outputs. For this to work, the original (chunked) data has to be downloaded and placed within the same relative directory structure as in the public THESAN data release folder.
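To make the mechanism concrete, here is a minimal, self-contained sketch with h5py (assuming h5py >= 2.9 built against HDF5 >= 1.10, plus NumPy). It writes a few small "chunk" files, then a master file whose virtual dataset maps each chunk onto one slice of a single contiguous array. The file names (`chunk.N.hdf5`, `virtual.hdf5`), the helper `build_virtual_file` and the single `Density` dataset are hypothetical and do not reproduce the actual THESAN layout.

```python
import os
import tempfile

import h5py
import numpy as np


def build_virtual_file(directory, n_chunks=3, chunk_len=4):
    """Write n_chunks small chunk files plus one master file whose virtual
    dataset 'Density' concatenates them. All names here are hypothetical."""
    chunk_paths = []
    for i in range(n_chunks):
        path = os.path.join(directory, f"chunk.{i}.hdf5")
        with h5py.File(path, "w") as f:
            # Each chunk holds one consecutive piece of the full array.
            f["Density"] = np.arange(i * chunk_len, (i + 1) * chunk_len, dtype="f8")
        chunk_paths.append(path)

    # The layout describes the shape of the "unchunked" dataset.
    layout = h5py.VirtualLayout(shape=(n_chunks * chunk_len,), dtype="f8")
    for i, path in enumerate(chunk_paths):
        # Map this chunk's dataset onto its slice of the virtual dataset.
        # Only a reference to the source file is stored, not the data itself.
        layout[i * chunk_len:(i + 1) * chunk_len] = h5py.VirtualSource(
            path, "Density", shape=(chunk_len,)
        )

    master_path = os.path.join(directory, "virtual.hdf5")
    with h5py.File(master_path, "w", libver="latest") as f:
        f.create_virtual_dataset("Density", layout, fillvalue=np.nan)
    return master_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        master = build_virtual_file(tmp)
        with h5py.File(master, "r") as f:
            dset = f["Density"]
            # Reads transparently from all three chunk files.
            print(dset.is_virtual, dset[()])
```

In this sketch the source paths are absolute. The requirement to reproduce the release's relative directory structure suggests that simulation.hdf5 instead stores relative paths to the chunk files, which is why moving or omitting them breaks the virtual view.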

The simulation.hdf5 file contains the following HDF5 groups:

Group Description
Cartesians Contains the virtual datasets of the Cartesian outputs. Each output is stored as a subgroup named after its number (e.g. 0, 1, ..., 400, no zero-padding). Each subgroup contains a Header group and virtual datasets replicating those found in the original files.
Config Group without datasets that stores, as attributes, the configuration flags used to run the simulation.
Groups Contains the virtual datasets of the halo and subhalo catalogs. Each (unchunked) catalog is stored as a subgroup named after its number (e.g. 0, 1, ..., 80, no zero-padding). Within each subgroup, the structure of the corresponding halo and subhalo catalog (including its division into Group and Subhalo subgroups) is replicated.
Header Group without datasets that stores, as attributes, information about the simulation common to all other files (e.g. BoxSize).
Offsets Contains the virtual datasets of the offset files. Each offset file is stored as a subgroup named after its number (e.g. 0, 1, ..., 80, no zero-padding). Within each subgroup, the structure of the corresponding offset file is replicated.
Parameters Group without datasets that stores, as attributes, the parameters used to run the simulation.
Snapshots Contains the virtual datasets of the snapshots. Each (unchunked) snapshot is stored as a subgroup named after its number (e.g. 0, 1, ..., 80, no zero-padding). Within each subgroup, the structure of the corresponding snapshot is reproduced.
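Two details of this layout are easy to trip over. Output subgroups are named by their bare number, and h5py by default lists group members in name order, so "10" comes before "2". In addition, Header, Config and Parameters carry their content as attributes rather than datasets. The helpers below are a sketch of one way to handle both; the function names `output_numbers` and `read_attrs` are hypothetical.

```python
import h5py


def output_numbers(simfile, kind):
    """Return the output numbers stored under a top-level group such as
    'Snapshots', 'Groups', 'Offsets' or 'Cartesians', sorted numerically.
    Subgroup names are not zero-padded, so a plain string sort would put
    '10' before '2'."""
    return sorted(int(name) for name in simfile[kind])


def read_attrs(simfile, group_name):
    """Return the attributes of 'Header', 'Config' or 'Parameters' as a
    plain dict (these groups hold no datasets, only attributes)."""
    return dict(simfile[group_name].attrs)
```

With the real file opened as `simfile` (as in the example that follows), one could call, for instance, `output_numbers(simfile, "Snapshots")` or `read_attrs(simfile, "Header")["BoxSize"]`.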
Using this file, datasets can be accessed as if they were fully contained in a single HDF5 file. For example, one can do the following:
import h5py

cartesian_output_number = 230
cartesian_output_field_name = 'Density'

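with h5py.File("simulation.hdf5", 'r') as simfile:
    # Indexing with [()] reads the whole virtual dataset into a NumPy array,
    # gathering the pieces stored in the individual chunk files.
    density_cartesian = simfile[f'Cartesians/{cartesian_output_number}/{cartesian_output_field_name}'][()]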
and completely ignore the fact that the Density field is split across different chunks of the Cartesian output.
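Two refinements may be worth considering, sketched below under stated assumptions. First, because a virtual dataset only references the chunk files, HDF5 typically does not raise an error when a chunk was not downloaded; the affected region is instead returned as the fill value. Checking `Dataset.virtual_sources()` before reading guards against silently incomplete data. Second, a virtual dataset can be sliced like any h5py dataset, so a sub-block can be read without loading the whole field. The function names `missing_sources` and `read_checked`, the `selection` argument, and the assumption that relative source paths resolve against the directory containing simulation.hdf5 are all hypothetical.

```python
import os

import h5py


def missing_sources(dset, base_dir):
    """List source files referenced by a virtual dataset that are absent.
    Relative names are checked against base_dir (an assumption about how
    they are resolved)."""
    missing = set()
    for vmap in dset.virtual_sources():
        name = vmap.file_name
        if name == ".":  # "." means the source lives in the same file
            continue
        path = name if os.path.isabs(name) else os.path.join(base_dir, name)
        if not os.path.exists(path):
            missing.add(name)
    return sorted(missing)


def read_checked(filename, dataset_path, selection=()):
    """Read dataset_path[selection] from filename, refusing to read a
    virtual dataset whose source files are not all present."""
    base_dir = os.path.dirname(os.path.abspath(filename))
    with h5py.File(filename, "r") as f:
        dset = f[dataset_path]
        if dset.is_virtual:
            missing = missing_sources(dset, base_dir)
            if missing:
                raise FileNotFoundError(f"missing source files: {missing}")
        # The default () reads everything; a tuple of slices reads a sub-block.
        return dset[selection]
```

For example, `read_checked("simulation.hdf5", "Cartesians/230/Density", (slice(0, 64),))` would read only the first 64 planes along the first axis, assuming the field is at least one-dimensional.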