SGI Power Challenge Array (4x16 R8000 CPUs)

by streeter last modified 2007-03-30 04:44
  • These machines are connected via HIPPI with a full crossbar switch.

  • These tests were run in dedicated mode.

  • All ZEUS-MP routines were compiled with: f77 -c -O3 -g3 -w

  • Most ZEUS-3D routines were compiled with: f77 -c -O3 -w1 -g3 -pfa list -WK,-ro=3,-so=3,-o=5,-as=l,-chs=16

  • Note that ZEUS-3D cannot be run on more than one POWERnode because the system's shared memory does not extend across POWERnodes.

WORK IS SCALED WITH THE NUMBER OF PROCESSORS





 
GRID: 32 x 32 x 32 per processor(tile) 
COMMENT: On just a few processors, ZEUS-3D is faster for this small problem size because 
         a substantial fraction of the data (less than 7 MB) fits in the 4 MB cache, so the 
         ZEUS-MP optimizations that improve reuse of cached data are largely wasted. 
COMMENT: When communicating across POWERnodes (more than 16 threads), SGI native MPI uses 
         HIPPI for message sizes above 8 KB and sockets for smaller messages.  Although 
         HIPPI is the faster network, it has a longer latency than sockets, so it takes 
         longer for short messages.  Even with this small tile size, most messages exceed 
         8 KB (see discussion above under COMMUNICATION), so the long latency of HIPPI 
         is probably responsible for the poor scaling across more than 16 threads. 
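
The trade-off described in the comment above can be illustrated with the usual
latency-plus-bandwidth cost model, t = latency + size / bandwidth. This is only a
sketch: the latency and bandwidth figures below are hypothetical placeholders, not
measurements of HIPPI or sockets on this machine, and the function names are
invented for the example.

```python
def transfer_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple cost model: fixed startup latency plus size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def crossover_size(lat_hi_bw, bw_hi_bw, lat_lo_lat, bw_lo_lat):
    """Message size (bytes) above which the high-bandwidth link is quicker.

    The first pair describes the high-bandwidth, high-latency link; the
    second pair describes the low-latency, low-bandwidth link.
    """
    return (lat_hi_bw - lat_lo_lat) / (1.0 / bw_lo_lat - 1.0 / bw_hi_bw)


# HYPOTHETICAL numbers, chosen only to make the arithmetic easy to follow.
HIGH_BW = dict(latency_s=500e-6, bandwidth_bytes_per_s=80e6)
LOW_LAT = dict(latency_s=100e-6, bandwidth_bytes_per_s=10e6)

size = crossover_size(HIGH_BW["latency_s"], HIGH_BW["bandwidth_bytes_per_s"],
                      LOW_LAT["latency_s"], LOW_LAT["bandwidth_bytes_per_s"])
print(f"crossover at about {size:.0f} bytes")
for n in (1024, 8192, 65536):
    print(n, transfer_time(n, **HIGH_BW), transfer_time(n, **LOW_LAT))
```

Under this model, a message only slightly above the switch-over size still pays the
full startup latency of the high-bandwidth link, which is the effect the comment
suggests may be limiting scaling beyond 16 threads.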
 
ZEUS-MP (10 steps) 
                                                                                   Speedup/
 Processors  Layout  Wall Clock  tused(s) Zone-Cycles/sec   MFLOPS   C90s  Speedup Processor 
      1       1x1x1       4.68       4.64        70334       61.47   0.32   1.00     1.00 
      2       2x1x1       5.03       4.97       130881      114.39   0.60   1.86     0.93 
      4       2x2x1       5.00       4.95       263300      230.13   1.21   3.74     0.94 
      8       2x2x2       5.33       5.25       495453      433.04   2.28   7.04     0.88 
     16       4x2x2       6.26       6.03       864381      755.50   3.98  12.29     0.77 
     32       4x4x2      16.50      16.07       648345      566.68   2.98   9.22     0.29 
     64       4x4x4      45.61      42.76       467040      408.21   2.15   6.64     0.10 
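
The Speedup and Speedup/Processor columns in the table above appear to be ratios of
the Zone-Cycles/sec column relative to the 1-processor run; the listed values are
reproduced by the sketch below. The function name is hypothetical and the rounding
to two decimals is an assumption made to match the table.

```python
# Zone-Cycles/sec values copied from the ZEUS-MP 32 x 32 x 32 table above.
zone_cycles_per_sec = {
    1: 70334, 2: 130881, 4: 263300, 8: 495453,
    16: 864381, 32: 648345, 64: 467040,
}


def scaling(rates):
    """Return {nproc: (speedup, speedup_per_processor)} relative to 1 processor."""
    base = rates[1]
    result = {}
    for nproc, rate in sorted(rates.items()):
        speedup = rate / base
        result[nproc] = (round(speedup, 2), round(speedup / nproc, 2))
    return result


for nproc, (speedup, per_proc) in scaling(zone_cycles_per_sec).items():
    print(f"{nproc:3d}  {speedup:6.2f}  {per_proc:5.2f}")
```

Because the work is scaled with the processor count in these runs, perfect scaling
would show a constant wall clock time and a Speedup/Processor of 1.00 in every row.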
 
ZEUS-3D (20 steps) (same layout)

                                                                          Speedup/
 Processors  Wall Clock  tused(s) Zone-Cycles/sec MFLOPS   C90s  Speedup  Processor 
      1         8.00       8.10        80932       70.74   0.37    1.00     1.00 
      2         9.00       8.18       160142      136.60   0.50    1.98     0.99 
      4        12.00      10.05       260945      218.15   0.77    3.22     0.81 
      8        17.00      15.12       346802      283.81   0.97    4.29     0.54 
     16        18.00      15.85       661638      535.75   1.49    8.18     0.51 



 
GRID: 64 x 64 x 64 per processor(tile) 
COMMENT: ZEUS-MP outperforms ZEUS-3D even on 1 processor.  
 
ZEUS-MP (10 steps) 
                                                                                    Speedup/
 Processors  Layout  Wall Clock  tused(s) Zone-Cycles/sec  MFLOPS   C90s   Speedup  Processor 
      1       1x1x1     39.80      39.38        66115       54.11   0.18     1.00     1.00 
      2       2x1x1     40.80      40.34       129099      105.65   0.36     1.95     0.98 
      4       2x2x1     41.69      41.24       252549      206.67   0.71     3.82     0.95 
      8       2x2x2     45.65      45.07       461475      377.65   1.29     6.98     0.87 
     16       4x2x2     68.46      66.65       621333      508.47   1.74     9.40     0.59 
     32       4x4x2     75.70      73.13      1126290      921.70   3.15    17.04     0.53 
     64       4x4x4    113.49     109.70      1503090     1230.06   4.20    22.73     0.36 
 
ZEUS-3D (10 steps)(same layout)
                                                                           Speedup/ 
 Processors  Wall Clock  tused(s) Zone-Cycles/sec  MFLOPS   C90s  Speedup  Processor 
      1         53.00      52.40        50026       40.94   0.14    1.00     1.00 
      2         53.00      51.30       102201       82.75   0.23    2.04     1.02 
      4         77.00      56.88       184355      149.28   0.42    3.69     0.92 
      8        121.00      79.90       262467      212.53   0.59    5.25     0.66 
     16        203.00     189.84       220941      178.90   0.50    4.42     0.28 



 
GRID: 128 x 64 x 64 per processor(tile) 
COMMENT: I tried to do a 128-cubed problem with ZEUS-MP, but the system kept crashing 
         (no error messages) for 16 or more processors.   
         Each 128-cubed tile requires about 256 MB, so running with this tile size on 
         16 processors would use about 4 GB.  In fact, I had no success with 
         any MPI run with a total memory requirement over 2 GB.  Moreover, the 
         32 processor run below does not get the correct answer -- the timestep is 0! 
         These problems have been fixed for IRIX 6.2. 
COMMENT: For ZEUS-MP, the speedup is nearly the same as it is for the 64-cubed tiles, 
         so the long latency of HIPPI apparently has no impact on scaling here. 
         HIPPI is probably just not fast enough to keep up with the processors for 64-cubed 
         or larger tiles. 
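
The memory estimate in the first comment above is simple arithmetic and can be
checked as follows. The extrapolation to the 128 x 64 x 64 tile assumes that memory
per tile scales linearly with the number of zones, which the comments do not state;
the function name is hypothetical.

```python
MB_PER_128_CUBED_TILE = 256      # "about 256 MB", taken from the comment above
ZONES_128_CUBED = 128 ** 3


def total_memory_gb(tile, nproc):
    """Estimate total memory in GB.

    ASSUMPTION: memory per tile scales linearly with the zone count.
    """
    nx, ny, nz = tile
    mb_per_tile = MB_PER_128_CUBED_TILE * (nx * ny * nz) / ZONES_128_CUBED
    return mb_per_tile * nproc / 1024


print(total_memory_gb((128, 128, 128), 16))  # 4.0
print(total_memory_gb((128, 64, 64), 32))    # 2.0
```

Under that assumption, the 32-processor run with 128 x 64 x 64 tiles sits right at
the 2 GB mark mentioned in the comment.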
 
ZEUS-MP (10 steps) 

                                                                                 Speedup/
 Processors  Layout  Wall Clock  tused(s) Zone-Cycles/sec MFLOPS   C90s  Speedup Processor 
      1       1x1x1    75.50      74.68        69705       54.16   0.33    1.00    1.00 
      2       1x1x2    75.50      74.57       139601      108.46   0.67    2.00    1.00 
      4       1x2x2    76.69      75.79       274691      213.42   1.31    3.94    0.99 
      8       2x2x2    83.56      82.53       504319      391.83   2.40    7.24    0.90 
     16       2x4x2   148.44     129.34       640063      497.32   3.03    9.18    0.57 
     32       2x4x4   149.24     140.89      1169500      908.69   5.54   16.77    0.52 
 
ZEUS-3D (7 or 10 steps) (same layout)
                                                                         Speedup/
 Processors  Wall Clock  tused(s) Zone-Cycles/sec MFLOPS  C90s   Speedup Processor 
      1         106.00     104.23     50302       40.73   0.11     1.00    1.00 
      2         111.00     109.03     96169       77.87   0.22     1.91    0.96 
      4         124.00     120.21    174458      141.26   0.39     3.47    0.87 
      8         194.00     183.53    228535      185.05   0.52     4.54    0.57 
     16        5371.00    1123.20     52281       42.33   0.12     1.04    0.06 

WORK IS CONSTANT




 
GRID: Tile size adjusted to make the full mesh 128 x 128 x 128 
COMMENT: The ZEUS-MP tile size is 32-cubed for 64 processors. 
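
For the constant-work runs, the per-processor tile follows from dividing the full
mesh by the processor layout. A small sketch of that arithmetic (the helper is
hypothetical; it assumes each layout factor divides the corresponding mesh
dimension, and the mapping of layout factors to axes is also an assumption):

```python
from math import prod


def tile_shape(mesh, layout):
    """Split a full mesh evenly over a processor layout; return (nproc, tile)."""
    if any(m % p for m, p in zip(mesh, layout)):
        raise ValueError("layout does not divide mesh evenly")
    tile = tuple(m // p for m, p in zip(mesh, layout))
    return prod(layout), tile


print(tile_shape((128, 128, 128), (4, 4, 4)))  # (64, (32, 32, 32))
print(tile_shape((128, 128, 128), (2, 4, 4)))  # (32, (64, 32, 32))
```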
 
ZEUS-MP (10 steps) 

                                                                                Speedup/
 Processors  Layout  Wall Clock  tused(s) Zone-Cycles/sec MFLOPS  C90s  Speedup Processor 
      1       1x1x1    350.65     346.84      60077       47.66   0.31    1.00    1.00 
      2       1x1x2    146.91     145.21     143305      113.70   0.75    2.39    1.19 
      4       1x2x2     76.78      75.92     274217      217.56   1.43    4.56    1.14 
      8       2x2x2     45.65      45.07     461475      366.13   2.41    7.68    0.96 
     16       2x2x4     32.23      31.57     657477      521.64   3.43   10.94    0.68 
     32       2x4x4     26.26      25.18     817647      648.72   4.27   13.61    0.42 
     64       4x4x4     45.61      42.76     467040      370.55   2.44    7.77    0.12 
 
ZEUS-3D (10 steps) (same layout)
                                                                            Speedup/ 
 Processors  Wall Clock   tused(s) Zone-Cycles/sec  MFLOPS   C90s   Speedup Processor 
      1         410.00     408.70        51313       41.55   0.12     1.00    1.00 
      2         214.00     211.09        99351       80.45   0.22     1.94    0.97 
      3         152.00     148.65       141080      114.24   0.32     2.75    0.92 
      4         119.00     114.88       182557      147.82   0.41     3.56    0.89 
      5         104.00      99.72       210300      170.29   0.47     4.10    0.82 
      6          92.00      88.70       236437      191.45   0.53     4.61    0.77 
      7          88.00      84.71       247569      200.46   0.56     4.82    0.69 
      8          85.00      79.98       262212      212.32   0.59     5.11    0.64 
      9          83.00      78.19       268202      217.17   0.60     5.23    0.58 
     10          80.00      75.97       276053      223.53   0.62     5.38    0.54 
     11          79.00      74.33       282132      228.45   0.64     5.50    0.50 
     12          79.00      73.29       286162      231.71   0.65     5.58    0.46 
     13          76.00      71.33       294009      238.07   0.66     5.73    0.44 
     14          75.00      70.19       298785      241.93   0.67     5.82    0.42 
     15          70.00      64.69       324182      262.50   0.73     6.32    0.42 
     16          67.00      60.77       345097      279.43   0.78     6.73    0.42 



 
GRID: Tile size adjusted to make the full mesh 256 x 256 x 256 
COMMENT: The ZEUS-MP tile size is 64-cubed for 64 processors. 
COMMENT: The ZEUS-3D data was obtained from an ordinary batch job (not dedicated). 
 
ZEUS-MP (10 steps) 

                                                                                   Speedup/
 Processors  Layout  Wall Clock  tused(s) Zone-Cycles/sec  MFLOPS   C90s   Speedup Processor 
      1       1x1x1    666.38     659.53        63180       49.09   0.30     1.00    1.00 
      2       1x1x2    671.53     664.60       125393       97.42   0.60     1.98    0.99 
      4       1x2x2    679.97     672.85       247688      192.44   1.18     3.92    0.98 
      8       2x2x2    394.17     389.89       427417      332.10   2.11     6.77    0.85 
     16       2x2x4    241.92     237.11       701097      544.74   3.32    11.10    0.69 
     32       2x4x4    149.24     140.89      1169500      908.69   5.55    18.51    0.59 
     64       4x4x4    113.49     109.70      1503090     1167.88   7.14    23.79    0.37 
 
ZEUS-3D (3 to 10 steps) (same layout)

                                                                        Speedup/
 Processors Wall Clock  tused(s) Zone-Cycles/sec  MFLOPS  C90s  Speedup Processor 
      1       1005.00    1000.50      50308       40.74   0.11    1.00    1.00 
      2       1014.00    1005.10     100154       81.10   0.23    1.99    1.00 
      4        912.00     890.21     188464      152.60   0.43    3.75    0.94 
      8        672.00     647.26     259202      209.88   0.58    5.15    0.64 
     16        643.00     602.86     278295      225.34   0.63    5.53    0.35 

