Version 3 (modified by bwoshea, 5 years ago)

Estimate of Enzo problem sizes and computation times

Estimating problem sizes for most Enzo calculations is at best an inexact science, given the nature of adaptive mesh refinement (AMR) simulations. The fundamental issue with an AMR calculation in cosmology, or in any astrophysical situation where gravitational collapse is important, is memory. The amount of memory used at the beginning of the simulation (when there is a single grid or a handful of grids) is far less than the memory consumption at the end of the simulation, when there can be hundreds of grids per processor. The amount of memory required can easily grow by an order of magnitude over the course of a cosmological simulation, so it is very important to make sure that enough memory is available. It is also important to realize that Enzo achieves reasonable scaling by overlapping computation and communication as much as possible. In general, one should keep as much data per processing core as memory allows, so that individual cores are never data-starved (which causes poor scaling, since the CPUs then sit idle while waiting for data from other computing nodes). Computational fluid dynamics simulations are notoriously communication-heavy, making this a challenging corner of parameter space to operate in.

This page contains some rules of thumb that will help you along your way, based on data collected up to the release of Enzo v1.5 (so up to fall 2008), when supercomputers typically had 1-2 GB of memory per processing unit (a dual-processor node with two cores per processor would have had 4-8 GB of memory, for example).

Cosmology or non-cosmology unigrid (non-AMR) simulations. These are actually quite straightforward to predict, given that in a unigrid simulation the grid is partitioned in an approximately equal fashion and then left alone. Experimentation shows that, for machines with 1-2 GB of memory per core, one gets near-ideal scaling with 128^3 cells per core (so a 512^3 cell calculation should be run on 64 processors, and a 1024^3 cell run should be done on 512 processors). This is comfortably within memory limits for non-cosmology runs, and there is no danger of running up against a node's memory ceiling (which causes tremendous slowdown, if not outright program failure). Unigrid cosmology runs have a further complication due to the dark matter particles: these move around in space, and thus move from processor to processor. Areas where halos and other cosmological structures form will correspond to regions with greater than average memory consumption. Keeping 128^3 cells and particles per core seems to scale extremely efficiently up to thousands of processors, though if one is using a machine like an IBM Blue Gene, which typically has far less memory per core than other computers, one might have to go to 64^3 cells/particles per core so that nodes corresponding to dense regions of the universe don't run out of memory.
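The unigrid rule of thumb above is simple arithmetic, and a minimal Python sketch makes it easy to apply to other grid sizes. The function name cores_for_unigrid and its arguments are hypothetical helpers for this page, not part of Enzo. It assumes a cubic root grid whose side is an exact multiple of the per-core tile side.

```python
def cores_for_unigrid(root_dim: int, per_core_dim: int = 128) -> int:
    """Return the core count that gives per_core_dim^3 cells per core.

    root_dim     -- cells along one side of the cubic root grid
    per_core_dim -- cells along one side of each core's share
                    (128 is the rule of thumb quoted on this page)
    """
    if root_dim <= 0 or per_core_dim <= 0:
        raise ValueError("dimensions must be positive")
    if root_dim % per_core_dim != 0:
        raise ValueError("root_dim must be a multiple of per_core_dim")
    tiles_per_axis = root_dim // per_core_dim
    return tiles_per_axis ** 3


if __name__ == "__main__":
    for n in (512, 1024):
        print(f"{n}^3 cells at 128^3 per core: {cores_for_unigrid(n)} cores")
    # Low-memory machines: drop to 64^3 cells per core.
    print(f"512^3 cells at 64^3 per core: {cores_for_unigrid(512, 64)} cores")
```

With the default of 128^3 cells per core this reproduces the 64-processor and 512-processor figures quoted above; passing 64 as the second argument gives the core count for a low-memory machine.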

Cosmology adaptive mesh simulations. Scaling and problem size are much more difficult to predict for an AMR cosmology run than for its unigrid equivalent. As discussed above, the amount of memory consumed can grow strongly over time. For example, a 512^3 root grid simulation with seven levels of adaptive mesh refinement started out with 512 root grid tiles, and ended up with over 400,000 grids! This calculation was run on 512 processors, though memory consumption grew to the point that it had to be run on a system where half of the cores per node were kept idle so as to provide enough memory for the computation. Furthermore, there is significant memory overhead in extremely large calculations relating to the data structures used to keep track of the grid hierarchy, and there are some scaling bottlenecks due to Enzo's load balancing scheme. These will both be addressed in versions of the code after Enzo v1.5.
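Leaving cores idle trades node count for memory per process. The sketch below illustrates that trade-off; the helper names are hypothetical, and the 4-core, 8 GB node is only an assumed example taken from the typical node sizes mentioned earlier on this page, not the machine actually used for the run described above.

```python
import math


def memory_per_process_gb(node_memory_gb: float, active_cores_per_node: int) -> float:
    """Memory available to each process if the node's memory is shared
    evenly among the cores that actually run a process."""
    if active_cores_per_node <= 0:
        raise ValueError("need at least one active core per node")
    return node_memory_gb / active_cores_per_node


def nodes_needed(n_processes: int, active_cores_per_node: int) -> int:
    """Number of nodes required to host n_processes."""
    if active_cores_per_node <= 0:
        raise ValueError("need at least one active core per node")
    return math.ceil(n_processes / active_cores_per_node)


if __name__ == "__main__":
    node_memory_gb = 8.0  # assumed example node: 4 cores, 8 GB
    for active in (4, 2):  # all cores busy, then half of them idle
        print(
            f"{active} active cores/node: "
            f"{memory_per_process_gb(node_memory_gb, active)} GB per process, "
            f"{nodes_needed(512, active)} nodes for 512 processes"
        )
```

Under these assumptions, idling half the cores doubles the memory available to each process at the cost of doubling the number of nodes requested.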

Empirically, it appears that simulations with 5-7 levels of refinement, where refinement takes place everywhere in the simulation volume, scale reasonably well when each root grid tile is kept between 32^3 and 64^3 cells. A reasonable compromise would be 64x64x32 cells per tile, or 128 processors for a simulation with a 256^3 root grid and seven levels of adaptive mesh refinement. Going to a significantly smaller root grid tile will result in individual cores being data-starved, and thus give poor scaling results.
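The tile-size guideline can be checked with a few lines of Python. This is a minimal sketch with hypothetical helper names (nothing here is an Enzo API); it assumes one root grid tile per processor and a root grid that divides evenly into tiles along each axis.

```python
def processors_for_tiling(root_dims, tile_dims):
    """Number of root grid tiles (one per processor) for a given tile shape."""
    count = 1
    for root, tile in zip(root_dims, tile_dims):
        if tile <= 0 or root % tile != 0:
            raise ValueError("each root dimension must be a multiple of the tile dimension")
        count *= root // tile
    return count


def tile_in_recommended_range(tile_dims, low=32, high=64):
    """True if the tile holds between low^3 and high^3 cells (inclusive)."""
    cells = 1
    for d in tile_dims:
        cells *= d
    return low ** 3 <= cells <= high ** 3


if __name__ == "__main__":
    root = (256, 256, 256)
    tile = (64, 64, 32)  # the compromise tile shape quoted above
    print("processors:", processors_for_tiling(root, tile))
    print("tile size within 32^3..64^3 cells:", tile_in_recommended_range(tile))
```

For the 256^3 root grid and 64x64x32 tiles this gives the 128 processors quoted above, and the same two functions can be used to test other candidate tile shapes before submitting a run.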

AMR calculations with nested-grid initial conditions, where adaptive mesh refinement is only allowed to occur on a small section of the simulation volume, tend to scale relatively poorly. This is due to a combination of factors, including difficulty in load balancing. A fundamental issue is that there are generally not that many grids on the highest level of refinement, since there are only a few halos in the refinement region. This makes scaling to large numbers of processors extremely hard since AMR calculations with adaptive timestepping tend to spend most of their time at the highest level of refinement. As an example, simulations with a 128^3 root grid and three levels of static nested grids (with the third-level grid being 256^3) tend to run quite well on 32 or 64 processors on a Linux cluster using an InfiniBand interconnect. Scaling is clearly affected at larger processor counts by the small number of high-level grids, and at low processor counts there is not enough available memory. Careful experimentation is required for this sort of simulation!

Non-cosmology adaptive mesh simulations. There is a far wider variety of non-cosmology AMR calculations than there are cosmology AMR calculations, making a general guideline much more difficult. For example, one can refine by a factor of four instead of two when not using particles, allowing fewer grids that are overall larger. One can opt to not use gravity at all (when doing certain hydrodynamics problems, such as compressible turbulence), which also strongly affects scaling. The scaling guidelines listed for AMR cosmology calculations will broadly be useful, but some experimentation will undoubtedly be required for new kinds of calculations.