Controlling Enzo data output
There are 6 timing methods, 2 output formats, and 2 pitfalls to be aware of when deciding how to output data from your Enzo simulation.
Data Formats and Files
There are two output formats for Enzo data. In both cases, each data dump gets its own directory.
Each data dump writes several key files. NNNN denotes the dump number (e.g. 0001) and basename is something like RedshiftOutput, data, or DD.
All output files are also restart files. It's not necessarily wise to write in 32 bit format if you're computing in 64, though, as you'll lose all the extra precision when you restart. (These are makefile flags.)
basenameNNNN::
The parameter file. This contains general simulation parameters, dump time, cycle, and all the parameters defined here. It's worth your time to be familiar with what's in this file.
basenameNNNN.hierarchy::
The hierarchy file. Contains a description of the hierarchy, with one entry for each grid, including information like the grid size, its position in the volume, and its position in the hierarchy.
basenameNNNN.boundary::
A description of the boundary (plain text). Basically a meta description and the filename for the next file.
basenameNNNN.boundary.hdf5::
Actually contains the boundary information.
Other versions may output other files. It would be nice to have a description of them.
Packed AMR
This is the default. Each processor outputs all the grids it owns. This is done to avoid the hassle that comes with a 500,000 grid, 512 processor AMR sim. 512 files are much easier to deal with than 500,000.
In addition to the parameter, hierarchy, and boundary files described above, data is output in one basenameNNNN.taskmapCCCC file for each processor, which contains a map between grid number and hdf5 file, and one basenameNNNN.cpuCCCC file for each processor. NNNN and CCCC are the dump number and cpu number, respectively.
basenameNNNN.cpuCCCC is an hdf5 file which contains an hdf5 group for each grid. Each grid in turn contains a dataset for each of the fields in the simulation.
~/DD0100>h5ls data0100.cpu0003
Grid00000002 Group
Grid00000026 Group
~/DD0100>h5ls data0100.cpu0003/Grid00000002
Density Dataset {16, 16, 32}
z-velocity Dataset {16, 16, 32}
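As a rough illustration of the naming scheme described above, the following Python sketch lists the files one would expect to find in a packed AMR dump directory. The function name and its arguments are hypothetical, and the four-digit zero padding is inferred from the examples on this page (data0100.cpu0003) rather than taken from the Enzo source.

```python
def expected_dump_files(basename, dump_number, n_cpus):
    """List the file names expected for one packed AMR data dump.

    Assumption: dump and cpu numbers are zero padded to four digits,
    as in the data0100.cpu0003 example.
    """
    root = "%s%04d" % (basename, dump_number)
    files = [
        root,                     # parameter file
        root + ".hierarchy",      # hierarchy file
        root + ".boundary",       # boundary description (plain text)
        root + ".boundary.hdf5",  # boundary data
    ]
    for cpu in range(n_cpus):
        files.append("%s.taskmap%04d" % (root, cpu))  # grid number to hdf5 file map
        files.append("%s.cpu%04d" % (root, cpu))      # grids owned by this processor
    return files


for name in expected_dump_files("data", 100, 2):
    print(name)
```

A list like this can be compared against the actual directory contents to spot an incomplete dump before attempting a restart.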
Not Packed AMR
The second output format is not packed. Not packed AMR is included for legacy reasons only. It writes one hdf5 file per grid. Its use is strongly discouraged, and it will likely be removed in subsequent versions.
Pitfall - Hard Coded Pathnames
Pathnames are hard coded into the parameter, boundary, and hierarchy files. If you need to move a simulation, put the data where you want it, and run the following script in the directory containing the data.
#!/bin/tcsh
if ( `ls -1 *.hierarchy |wc -l ` != 1 ) exit
set paramfile = `basename *.hierarchy .hierarchy` # Find the parameter file.
set dir1 = `grep GlobalDir $paramfile | awk '{print $3}'` # Find the hard-coded directory
set dir2 = `cd ..; pwd` # The working directory
foreach i ($paramfile $paramfile.hierarchy $paramfile.boundary) # Find the files to alter
  sed -e "s:"$dir1":"$dir2":g" $i > tmp; mv tmp $i # Replace the directory. sed is rad.
end
This script is included in the Enzo source distribution as bin/update_path. Assuming you have this in your path, here's how to loop over all of the subdirectories with a particular prefix and update them. You can modify the prefix (e.g. DD) to match other prefixes, such as RD. Please note that this is bash syntax, so you may need to adjust it for other shells.
for d in `find . -type 'd' -name 'DD*'`
do
  cd $d
  update_path
  cd ../
done
If sed scares you, here's a snippet of Python to iterate over a file, replace a string and write to a new file. It could be a bit more terse, but hopefully it is clear.
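hierarchy = 'RD0033.hierarchy'
old, new = '/dsgpfs/harkness/NewL7/Dumps/RD0033/', '/gpfs/ux455215/L7/RD0033/'
# Lazily replace the old path with the new one on every line of the file.
new_lines = (line.replace(old, new) for line in open(hierarchy))
# Write the result to a new file; the output must be opened in write mode.
open("%s.new" % hierarchy, "w").writelines(new_lines)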
Timing Methods
There are 6 ways to trigger output from Enzo.
Cycle Based Output
CycleSkipDataDump = N
CycleLastDataDump = W
DataDumpName = data
One can trigger output every N cycles starting with cycle W using CycleSkipDataDump and CycleLastDataDump. Outputs are put in the directory DD0000 (or DD0001, etc.) and the basename is determined by DataDumpName.
CycleSkipDataDump <= 0 means cycle based output is skipped. The default is 0.
Pitfall 2: CycleLastDataDump defaults to zero and is incremented by CycleSkipDataDump every time output is done. If you change the value of CycleSkipDataDump and neglect to change CycleLastDataDump, Enzo will dump as long as CycleNumber >= CycleSkipDataDump + CycleLastDataDump. (So if you change CycleSkipDataDump from 0 to 10 when restarting from a redshift dump at cycle n=70, you'll get an output every timestep for 7 timesteps, until CycleLastDataDump catches up with the cycle number.)
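The pitfall described above can be reproduced with a small simulation of the dump condition. This is a minimal sketch that models only the rule as stated on this page (dump when CycleNumber >= CycleSkipDataDump + CycleLastDataDump, then add CycleSkipDataDump to CycleLastDataDump); the function and argument names are hypothetical, and Enzo's actual bookkeeping may differ in detail.

```python
def simulate_cycle_dumps(start_cycle, n_cycles, cycle_skip, cycle_last=0):
    """Return the cycle numbers at which a dump would be triggered.

    Models the rule described in the text, not Enzo's source code.
    """
    dumps = []
    if cycle_skip <= 0:
        return dumps  # cycle based output is switched off
    for cycle in range(start_cycle, start_cycle + n_cycles):
        if cycle >= cycle_last + cycle_skip:
            dumps.append(cycle)
            cycle_last += cycle_skip  # incremented, not reset to the current cycle
    return dumps


# Restart at cycle 70 with CycleSkipDataDump = 10 and CycleLastDataDump left at 0.
print(simulate_cycle_dumps(70, 25, 10))
# Same restart with CycleLastDataDump set to 70 by hand.
print(simulate_cycle_dumps(70, 25, 10, cycle_last=70))
```

Under this model the first call dumps on every cycle from 70 through 76 before settling into the intended spacing, while the second call dumps only at cycles 80 and 90.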
Time Based Output
TimeLastDataDump = V
dtDataDump = W
Exactly like Cycle based output, but triggered whenever time >= TimeLastDataDump + dtDataDump. The same pitfall applies.
Redshift Based Output
CosmologyOutputRedshift[ 0 ] = 12
CosmologyOutputRedshiftName[ 0 ] = Redshift12
RedshiftDumpName = RedshiftOutput
Outputs at the specified redshift. Any number of these can be specified.
CosmologyOutputRedshift[ i ] is the only necessary parameter, and is the ith redshift to output.
Any output with CosmologyOutputRedshiftName[ i ] specified uses that name for the output, and no number is appended. (So if CosmologyOutputRedshiftName[ 6 ] = BaconHat, the outputs will be BaconHat, BaconHat.hierarchy, etc.)
If CosmologyOutputRedshiftName[ i ] is omitted, RedshiftDumpName is used for the basename, and the output number is taken from the array index. (So with CosmologyOutputRedshift[19] = 2.34 and RedshiftDumpName = MonkeyOnFire, a dump will be made at z=2.34 with files called MonkeyOnFire0019.hierarchy, etc.)
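The two naming rules above can be summarised in a few lines of Python. This is only a sketch of the behaviour as described here; the function name and the dictionary of explicit names are hypothetical, and the four-digit padding is inferred from the MonkeyOnFire0019 example.

```python
def redshift_output_basename(index, redshift_dump_name="RedshiftOutput", names=None):
    """Return the basename used for the redshift output with the given index.

    names maps an index i to CosmologyOutputRedshiftName[i], if one was set.
    """
    names = names or {}
    if index in names:
        # An explicit name is used verbatim; no number is appended.
        return names[index]
    # Otherwise the number comes from the array index.
    return "%s%04d" % (redshift_dump_name, index)


print(redshift_output_basename(6, names={6: "BaconHat"}))
print(redshift_output_basename(19, redshift_dump_name="MonkeyOnFire"))
```

The two calls reproduce the BaconHat and MonkeyOnFire0019 examples from the text.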
Force Output Now
The following two options are run time driven. These are especially useful for very deep simulations that spend the majority of their time on lower levels.
To force an output as soon as the simulation finishes the next step on the finest resolution, make a file called outputNow:
touch outputNow
Enzo will remove the file as soon as the output has finished.
Sub Cycle Based Output
To get the simulation to output every 10 subcycles (again at the finest level of resolution), put the number of subcycles to skip in a file called subcycleCount:
echo 10 > subcycleCount
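If you drive your runs from a Python script rather than a shell, both run-time triggers can be written with the standard library. This is a minimal sketch; the function names are hypothetical, and it assumes Enzo looks for the trigger files in the directory the simulation is running in, which you pass as run_dir.

```python
import os


def request_output_now(run_dir="."):
    """Create an empty outputNow file, equivalent to `touch outputNow`."""
    path = os.path.join(run_dir, "outputNow")
    open(path, "a").close()  # append mode creates the file without truncating
    return path


def set_subcycle_count(n_subcycles, run_dir="."):
    """Write the subcycle count, equivalent to `echo N > subcycleCount`."""
    path = os.path.join(run_dir, "subcycleCount")
    with open(path, "w") as f:
        f.write("%d\n" % n_subcycles)
    return path
```

For example, request_output_now("/path/to/run") asks for a single dump, and set_subcycle_count(10, "/path/to/run") mirrors the echo command above; the path shown is a placeholder.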
Time Based Interpolated Output
Even when you are running simulations with a long dtDataDump, you may sometimes want to see or analyze interim data dumps. With dtInterpolatedDataDump, Enzo checks whether it should output interpolated data based on the time elapsed (dtInterpolatedDataDump < dtDataDump).
dtDataDump = 1e-4
dtInterpolatedDataDump = 1e-5
This is mostly for making movies or looking at interim data where the TopGrid dt is too long. In principle, this output shouldn't be used for restart.
Friendly Note on Data Output
Enzo is content to output enough data to fill up a hard drive -- for instance, your home directory. This should be noted before output parameters are set, particularly the Sub Cycle outputs, as Enzo has no prohibition against causing problems with quotas and file system size.
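One way to act on this note is to estimate the total output volume before launching a run. The sketch below is a back-of-the-envelope check only: the function names are hypothetical, the size of a single dump has to be measured from a test run, and it assumes one time based dump per dtDataDump interval with no other output method active.

```python
import math
import shutil


def count_time_dumps(start_time, stop_time, dt_data_dump):
    """Estimate how many time based dumps fall between two simulation times."""
    if dt_data_dump <= 0:
        return 0
    return int(math.floor((stop_time - start_time) / dt_data_dump))


def fits_on_disk(n_dumps, bytes_per_dump, path="."):
    """Check whether n_dumps of the given size fit in the free space at path."""
    free = shutil.disk_usage(path).free
    return n_dumps * bytes_per_dump <= free


n = count_time_dumps(0.0, 10.0, 2.0)
print(n, fits_on_disk(n, 10 * 1024**3))
```

Sub cycle output is the hardest case to bound this way, since the number of subcycles depends on how deep the hierarchy becomes during the run.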
