
HP/Convex Exemplar SPP-1200

by streeter last modified 2007-03-30 04:41
  • Isom Crawford's Fortran-callable interface to the Exemplar's thread timing routines is available here.

  • This 4-HYPERnode system was configured with several HYPERnodes devoted to processing one batch job at a time (dedicated batch queue).

  • All ZEUS-MP routines were compiled with: f77 -c +O3

  • All ZEUS-3D routines were compiled with: fc -c -nw -O3 -or none

  • Parallelization directives were inserted into ZEUS-3D above most loop nests.

  • ZEUS-MP can also be compiled with fc instead of f77, but its performance is then 20 to 30 percent worse.

WORK IS SCALED WITH THE NUMBER OF PROCESSORS



 
GRID: 32 x 32 x 32 per processor (tile)
 
ZEUS-MP (10 Steps) 

                                                                                  Speedup/
 Processors  Layout  Wall Clock  tused(s) Zone-Cycles/sec MFLOPS   C90s   Speedup Processor 
      1       1x1x1    13.23      12.90        25247       23.11    .21     1.00    1.00 
      2       2x1x1    13.15      12.64        51553       47.20    .43     2.04    1.02 
      2       1x2x1    13.01      12.56        51903       47.52    .43     2.06    1.03 
      2       1x1x2    12.89      12.59        51753       47.38    .43     2.05    1.02 
      4       2x2x1    13.57      12.75       102294       93.66    .85     4.05    1.01 
      4       2x1x2    13.47      12.65       103073       94.37    .86     4.08    1.02 
      4       1x2x2    13.75      12.93       100848       92.33    .84     3.99    1.00 
      8       2x2x2    15.45      13.65       190981      174.86   1.59     7.56     .95 
     12       3x2x2    16.98      15.23       253011      231.65   2.11    10.02     .84 
     12       2x3x2    17.88      16.26       236404      216.44   1.97     9.36     .78 
     12       2x2x3    16.54      15.12       256747      235.07   2.14    10.17     .85 
     16       4x2x2    16.26      14.58       355321      325.32   2.96    14.07     .88 
     16       2x4x2    16.54      14.27       363743      333.03   3.03    14.41     .90 
     16       2x2x4    16.41      14.10       369519      338.32   3.08    14.64     .91 
     24       4x3x2    19.80      17.90       431783      395.33   3.59    17.10     .71 
     24       4x2x3    20.61      18.78       410831      376.14   3.42    16.27     .68 
     24       3x4x2    20.23      17.78       433639      397.03   3.61    17.18     .72 
     24       3x2x4    19.59      16.84       449868      411.88   3.74    17.82     .74 
     24       2x4x3    19.42      17.25       449888      411.90   3.74    17.82     .74 
     24       2x3x4    19.73      17.62       441053      403.81   3.67    17.47     .73 
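The Speedup and Speedup/Processor columns appear to be derived from the Zone-Cycles/sec column: each rate is divided by the 1-processor rate, and that ratio is then divided by the processor count. The sketch below is an assumption based on the numbers in the table above, not a description of the original timing scripts; the function names are hypothetical.

```python
# Zone-cycles/sec values copied from the 32 x 32 x 32 ZEUS-MP table above.
BASELINE = 25247  # 1 processor, layout 1x1x1


def speedup(zone_cycles_per_sec, baseline=BASELINE):
    """Ratio of a run's zone-cycle rate to the 1-processor rate (assumed definition)."""
    return zone_cycles_per_sec / baseline


def speedup_per_processor(zone_cycles_per_sec, processors, baseline=BASELINE):
    """Speedup divided by the number of processors used."""
    return speedup(zone_cycles_per_sec, baseline) / processors


# (processors, layout, zone-cycles/sec) for three rows of the table.
rows = [(8, "2x2x2", 190981), (16, "2x2x4", 369519), (24, "2x4x3", 449888)]
for procs, layout, rate in rows:
    print(f"{procs:3d} {layout} "
          f"speedup={speedup(rate):6.2f} "
          f"per-processor={speedup_per_processor(rate, procs):5.2f}")
```

For these three rows the sketch gives values that round to the tabulated 7.56/.95, 14.64/.91 and 17.82/.74.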

ZEUS-3D (10 Steps) 
                                                                                 Speedup/
 Processors        Wall Clock  tused(s) Zone-Cycles/sec  MFLOPS   C90s   Speedup Processor 
      1              21.54      19.42        16872       14.75    .08     1.00     1.00 
      2              20.63      17.74        36954       31.52    .12     2.19     1.10 
      4              24.17      19.90        65874       55.07    .19     3.90      .98 
      8              34.30      27.33        95911       78.49    .27     5.68      .71 
     12              40.04      31.81       123633      108.08    .30     7.33      .61 
     16              49.79      35.90       146053      118.26    .33     8.66      .54 
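Because the work is scaled, the total number of zones grows with the processor count. For the ZEUS-3D rows above, Zone-Cycles/sec is reproduced to within rounding by (zones per tile x processors x steps) / tused. This is an inference from the tabulated numbers, and the function name is hypothetical. The ZEUS-MP rows spot-checked differ from this formula by about one percent or less, so ZEUS-MP may use a slightly different time base.

```python
def zone_cycles_per_sec(tile, processors, steps, tused):
    """Assumed definition: total zones updated per second of tused."""
    nx, ny, nz = tile
    total_zones = nx * ny * nz * processors  # work scales with processor count
    return total_zones * steps / tused


TILE = (32, 32, 32)
STEPS = 10

# (processors, tused in seconds, tabulated zone-cycles/sec) from the ZEUS-3D table.
rows = [(1, 19.42, 16872), (4, 19.90, 65874), (16, 35.90, 146053)]
for procs, tused, tabulated in rows:
    estimate = zone_cycles_per_sec(TILE, procs, STEPS, tused)
    print(f"{procs:3d} processors: estimate {estimate:9.0f}, table {tabulated}")
```

The small residual differences are what one would expect from tused being printed to only two decimal places.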



 
GRID: 64 x 64 x 64 per processor (tile)
 
ZEUS-MP (10 Steps) 

                                                                                   Speedup/
 Processors  Layout  Wall Clock  tused(s) Zone-Cycles/sec  MFLOPS   C90s   Speedup Processor 
      1       1x1x1    106.25     104.52        24980       19.85    .13     1.00    1.00 
      2       2x1x1    106.92     104.57        49944       39.69    .26     2.00    1.00 
      2       1x2x1    106.09     103.77        50321       39.99    .26     2.01    1.01 
      2       1x1x2    105.64     103.86        50259       39.94    .26     2.01    1.01 
      4       2x2x1    112.14     108.39        96210       76.45    .50     3.85     .96 
      4       2x1x2    124.16     121.02        86254       68.54    .45     3.45     .86 
      4       1x2x2    112.23     109.14        95720       76.06    .50     3.83     .96 
      8       2x2x2    133.61     125.29       166092      131.99    .87     6.65     .83 
     12       3x2x2    162.52     155.99       198564      157.79   1.04     7.95     .66 
     12       2x3x2    130.79     124.72       248235      197.26   1.30     9.94     .83 
     12       2x2x3    157.01     149.75       205874      163.60   1.08     8.24     .69 
     16       4x2x2    158.87     150.34       277229      220.30   1.45    11.10     .69 
     16       2x4x2    146.82     137.35       303961      241.54   1.59    12.17     .76 
     16       2x2x4    164.91     155.49       268922      213.70   1.41    10.77     .67 
     24       4x3x2    169.09     170.45       363080      288.52   1.90    14.53     .61 
     24       4x2x3    162.37     151.38       405550      322.27   2.12    16.23     .68 
     24       3x4x2    160.57     150.35       410881      326.51   2.15    16.45     .69 
     24       3x2x4    168.81     158.54       393492      312.69   2.06    15.75     .66 
     24       2x4x3    150.04     141.05       443702      352.59   2.32    17.76     .74 
     24       2x3x4    179.96     167.63       365722      290.62   1.91    14.64     .61 
 
ZEUS-3D (10 Steps)
                                                                                 Speedup/ 
 Processors        Wall Clock  tused(s) Zone-Cycles/sec  MFLOPS   C90s   Speedup Processor 
      1              160.30     154.34        16985       13.90    .05     1.00    1.00 
      2              149.66     137.18        38218       30.95    .09     2.25    1.13 
      4              185.32     165.53        63347       51.29    .14     3.73     .93 
      8              259.98     217.56        96395       78.05    .22     5.68     .71 
     12              315.24     255.90       122927      100.60    .28     7.24     .60 
     16              348.87     285.02       147156      119.16    .33     8.66     .54 
 



 
GRID: 128 x 64 x 64 per processor (tile)
COMMENT: For 16 processors (512 x 128 x 128 full mesh), ZEUS-3D needs nearly all of
         the global shared memory available in the dedicated batch queue (1536 MB),
         and consequently runs extremely slowly.
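To put the memory limit in perspective, the arithmetic below works out the full-mesh zone count for 16 tiles and the memory budget per zone. It assumes that 1 MB means 2**20 bytes and that the whole 1536 MB is spread evenly over the zones; neither is stated on this page, and the function name is hypothetical.

```python
def bytes_per_zone(tile, processors, memory_mb):
    """Memory budget per zone, assuming 1 MB = 2**20 bytes (an assumption)."""
    nx, ny, nz = tile
    total_zones = nx * ny * nz * processors
    return memory_mb * 2**20 / total_zones


tile = (128, 64, 64)
processors = 16

total_zones = tile[0] * tile[1] * tile[2] * processors
print("total zones:", total_zones)  # same count as a 512 x 128 x 128 mesh
print("bytes per zone:", bytes_per_zone(tile, processors, 1536))
```

Under these assumptions the budget is 192 bytes per zone, which is 24 eight-byte words or 48 four-byte words of storage per zone.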
 
ZEUS-MP (10 Steps) 

                                                                                   Speedup/
 Processors  Layout  Wall Clock  tused(s) Zone-Cycles/sec  MFLOPS   C90s   Speedup Processor 
      1       1x1x1    180.37     177.20        29465       22.89    .14     1.00    1.00 
      2       2x1x1    184.40     180.38        57894       44.98    .28     1.96     .98 
      2       1x2x1    183.32     178.76        58418       45.39    .28     1.98     .99 
      2       1x1x2    185.76     182.05        57355       44.56    .27     1.95     .97 
      4       2x2x1    191.42     184.98       112802       87.64    .54     3.83     .96 
      4       2x1x2    190.85     185.37       112522       87.42    .54     3.82     .95 
      4       1x2x2    239.22     232.97        89528       69.56    .43     3.04     .76 
      8       2x2x2    277.92     273.16       152968      118.85    .73     5.19     .65 
     12       3x2x2    274.88     264.13       235129      182.68   1.12     7.98     .66 
     12       2x3x2    282.95     270.50       227820      177.00   1.09     7.73     .64 
     12       2x2x3    303.91     290.85       211659      164.45   1.01     7.18     .60 
     16       4x2x2    311.14     289.58       282359      219.38   1.35     9.58     .60 
     16       2x4x2    280.57     261.56       315084      244.80   1.50    10.69     .67 
     16       2x2x4    289.44     272.04       305104      237.05   1.45    10.35     .65 
     24       4x3x2    321.47     306.12       409177      317.91   1.95    13.89     .58 
     24       4x2x3    301.44     281.84       437455      339.88   2.09    14.85     .62 
     24       3x4x2    279.58     263.18       473942      368.23   2.26    16.08     .67 
     24       3x2x4    315.52     295.54       418151      324.88   1.99    14.19     .59 
     24       2x4x3    299.11     281.96       443263      344.39   2.11    15.04     .63 
     24       2x3x4    286.94     267.67       461444      358.52   2.20    15.66     .65 
 
ZEUS-3D (10 Steps) 
                                                                                 Speedup/
 Processors       Wall Clock  tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Processor 
      1              302.59     287.39        18243       14.77    .04     1.00    1.00 
      2              326.27     301.93        34729       28.12    .08     1.90     .95 
      4              465.30     424.18        49440       40.03    .11     2.71     .68 
      8              618.57     540.37        77619       62.85    .18     4.25     .53 
     12              742.10     625.32       100612       81.47    .23     5.52     .46 



WORK IS CONSTANT


GRID: Tile size adjusted to make the full mesh 128 x 128 x 128
 
ZEUS-MP (10 steps) 

                                                                                   Speedup/
 Processors  Layout  Wall Clock  tused(s) Zone-Cycles/sec  MFLOPS   C90s   Speedup Processor 
      1       1x1x1    776.79     764.66        27324       21.23    .13     1.00    1.00 
      2       1x1x2    405.45     398.34        52456       40.76    .25     1.92     .96 
      4       1x2x2    191.16     185.20       112779       87.62    .54     4.13    1.03 
      8       2x2x2    126.93     118.20       176414      140.19    .92     6.50     .81 
     16       2x2x4     57.16      51.95       400037      310.81   1.91    14.64     .92 
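With constant work, the tile each processor owns shrinks as processors are added. The sketch below shows one way the tile size could follow from the full 128 x 128 x 128 mesh and the Layout column. It assumes the mesh is divided evenly and that the three layout factors apply to the three mesh axes in order; the helper name is hypothetical.

```python
def tile_size(full_mesh, layout):
    """Split a full mesh evenly across a processor layout such as '2x2x4'."""
    counts = [int(n) for n in layout.split("x")]
    if len(counts) != len(full_mesh):
        raise ValueError("layout and mesh must have the same number of axes")
    tile = []
    for zones, procs in zip(full_mesh, counts):
        if zones % procs:
            raise ValueError(
                f"{zones} zones do not divide evenly among {procs} processors")
        tile.append(zones // procs)
    return tuple(tile)


FULL_MESH = (128, 128, 128)

# Layouts taken from the ZEUS-MP constant-work table above.
for layout in ["1x1x1", "1x1x2", "1x2x2", "2x2x2", "2x2x4"]:
    print(layout, tile_size(FULL_MESH, layout))
```

For the 16-processor 2x2x4 layout this gives a 64 x 64 x 32 tile, one sixteenth of the full mesh.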
 
ZEUS-3D (10 steps) 

                                                                              Speedup/
 Processors    Wall Clock  tused(s) Zone-Cycles/sec   MFLOPS   C90s   Speedup Processor 
      1          1063.78    1021.20        16430       13.30    .04     1.00    1.00 
      2           696.51     658.44        31851       25.79    .07     1.94     .97 
      3           496.41     462.76        45318       36.70    .10     2.76     .92 
      4           390.13     354.02        59238       47.97    .13     3.61     .90 
      5           338.13     303.39        69124       55.97    .16     4.21     .84 
      6           300.06     267.19        78489       63.55    .18     4.78     .80 
      7           274.28     241.24        86931       70.39    .20     5.29     .76 
      8           254.72     220.12        95272       77.14    .21     5.80     .72 
      9           239.49     203.01       103302       83.65    .23     6.29     .70 
     10           234.77     178.46       117517       95.16    .27     7.15     .72 
     11           243.52     166.28       126125      102.13    .28     7.68     .70 
     12           257.49     159.27       131675      106.62    .30     8.01     .67 
     13           264.45     141.05       148686      120.39    .34     9.05     .70 
     14           237.86     136.99       153090      123.96    .35     9.32     .67 
     15           266.59     151.92       138044      111.78    .31     8.40     .56 
     16           235.00     143.43       146214      118.39    .33     8.90     .56 

