Inconsistent results on bluegene

Yu-Heng Tseng yhtseng at lbl.gov
Thu Apr 13 15:48:07 CDT 2006


Jim,

Would you mind testing it with 2, 8, and 16 nodes? I got correct results
with 32, 64, and 128 nodes but wrong results with 2, 8, and 16 nodes. That's
really interesting and may be machine-dependent. Thanks for your
investigation!

Yu-heng
Jim Edwards wrote:

> Yuheng,
>
> I am unable to reproduce your problem. Here are the compile commands
> and the resulting output:
>
> jedwards at fr0101ge:~/src/pnetcdf/test/fandc> make
> blrts_xlf -o pnctestf -g -qfullpath -O2 -qsuffix=cpp=F 
> -I/bgl/BlueLight/ppcfloor/bglsys/include   -qsuffix=cpp=F 
> -I/bgl/BlueLight/ppcfloor/bglsys/include ./pnctestf.F 
> -I../../src/libf/ -L/contrib/bgl/pnetcdf/lib -lpnetcdf -lm 
> -L/bgl/BlueLight/ppcfloor/bglsys/lib -lmpich.rts -lmsglayer.rts
> -lrts.rts -ldevices.rts
> ** pnf_test   === End of Compilation 1 ===
> 1501-510  Compilation successful for file pnctestf.F.
> blrts_xlc -o pnctest -g -qfullpath -O2 
> -I/bgl/BlueLight/ppcfloor/bglsys/include   -o pnctest ./pnctest.c 
> -I./../../src/lib -L/contrib/bgl/pnetcdf/lib -lpnetcdf -lm 
> -L/bgl/BlueLight/ppcfloor/bglsys/lib -lmpich.rts -lmsglayer.rts
> -lrts.rts -ldevices.rts
>
>
> mype  pe_coords    totsiz_3d         locsiz_3d       kstart,jstart,istart
>   0    0  0  0   256  256  256     256  256    2        1      1      1
>  42    0  0 42   256  256  256     256  256    2       85      1      1
>   6    0  0  6   256  256  256     256  256    2       13      1      1
>  72    0  0 72   256  256  256     256  256    2      145      1      1
>  76    0  0 76   256  256  256     256  256    2      153      1      1
>   8    0  0  8   256  256  256     256  256    2       17      1      1
>  12    0  0 12   256  256  256     256  256    2       25      1      1
>  64    0  0 64   256  256  256     256  256    2      129      1      1
>  68    0  0 68   256  256  256     256  256    2      137      1      1
>  33    0  0 33   256  256  256     256  256    2       67      1      1
>   4    0  0  4   256  256  256     256  256    2        9      1      1
>  75    0  0 75   256  256  256     256  256    2      151      1      1
> 102    0  0 **   256  256  256     256  256    2      205      1      1
>   1    0  0  1   256  256  256     256  256    2        3      1      1
>  13    0  0 13   256  256  256     256  256    2       27      1      1
> 105    0  0 **   256  256  256     256  256    2      211      1      1
>  77    0  0 77   256  256  256     256  256    2      155      1      1
>  34    0  0 34   256  256  256     256  256    2       69      1      1
>   5    0  0  5   256  256  256     256  256    2       11      1      1
>  65    0  0 65   256  256  256     256  256    2      131      1      1
>  69    0  0 69   256  256  256     256  256    2      139      1      1
>  40    0  0 40   256  256  256     256  256    2       81      1      1
>  38    0  0 38   256  256  256     256  256    2       77      1      1
>  74    0  0 74   256  256  256     256  256    2      149      1      1
> 103    0  0 **   256  256  256     256  256    2      207      1      1
>  10    0  0 10   256  256  256     256  256    2       21      1      1
>  45    0  0 45   256  256  256     256  256    2       91      1      1
> 104    0  0 **   256  256  256     256  256    2      209      1      1
> 109    0  0 **   256  256  256     256  256    2      219      1      1
>   9    0  0  9   256  256  256     256  256    2       19      1      1
>  31    0  0 31   256  256  256     256  256    2       63      1      1
>  98    0  0 98   256  256  256     256  256    2      197      1      1
>  78    0  0 78   256  256  256     256  256    2      157      1      1
>  43    0  0 43   256  256  256     256  256    2       87      1      1
>  44    0  0 44   256  256  256     256  256    2       89      1      1
>  73    0  0 73   256  256  256     256  256    2      147      1      1
> 100    0  0 **   256  256  256     256  256    2      201      1      1
>  41    0  0 41   256  256  256     256  256    2       83      1      1
>  46    0  0 46   256  256  256     256  256    2       93      1      1
>  67    0  0 67   256  256  256     256  256    2      135      1      1
>  71    0  0 71   256  256  256     256  256    2      143      1      1
>   2    0  0  2   256  256  256     256  256    2        5      1      1
>  37    0  0 37   256  256  256     256  256    2       75      1      1
>  97    0  0 97   256  256  256     256  256    2      195      1      1
> 101    0  0 **   256  256  256     256  256    2      203      1      1
>  32    0  0 32   256  256  256     256  256    2       65      1      1
>  15    0  0 15   256  256  256     256  256    2       31      1      1
> 106    0  0 **   256  256  256     256  256    2      213      1      1
> 110    0  0 **   256  256  256     256  256    2      221      1      1
>   3    0  0  3   256  256  256     256  256    2        7      1      1
>  36    0  0 36   256  256  256     256  256    2       73      1      1
>  96    0  0 96   256  256  256     256  256    2      193      1      1
> 108    0  0 **   256  256  256     256  256    2      217      1      1
>  11    0  0 11   256  256  256     256  256    2       23      1      1
>  14    0  0 14   256  256  256     256  256    2       29      1      1
> 114    0  0 **   256  256  256     256  256    2      229      1      1
>  86    0  0 86   256  256  256     256  256    2      173      1      1
>  49    0  0 49   256  256  256     256  256    2       99      1      1
>   7    0  0  7   256  256  256     256  256    2       15      1      1
>  66    0  0 66   256  256  256     256  256    2      133      1      1
>  70    0  0 70   256  256  256     256  256    2      141      1      1
>  35    0  0 35   256  256  256     256  256    2       71      1      1
>  54    0  0 54   256  256  256     256  256    2      109      1      1
>  99    0  0 99   256  256  256     256  256    2      199      1      1
>  79    0  0 79   256  256  256     256  256    2      159      1      1
>  27    0  0 27   256  256  256     256  256    2       55      1      1
>  47    0  0 47   256  256  256     256  256    2       95      1      1
> 113    0  0 **   256  256  256     256  256    2      227      1      1
> 118    0  0 **   256  256  256     256  256    2      237      1      1
>  50    0  0 50   256  256  256     256  256    2      101      1      1
>  61    0  0 61   256  256  256     256  256    2      123      1      1
> 107    0  0 **   256  256  256     256  256    2      215      1      1
> 111    0  0 **   256  256  256     256  256    2      223      1      1
>  17    0  0 17   256  256  256     256  256    2       35      1      1
>  39    0  0 39   256  256  256     256  256    2       79      1      1
>  83    0  0 83   256  256  256     256  256    2      167      1      1
> 125    0  0 **   256  256  256     256  256    2      251      1      1
>  18    0  0 18   256  256  256     256  256    2       37      1      1
>  22    0  0 22   256  256  256     256  256    2       45      1      1
>  81    0  0 81   256  256  256     256  256    2      163      1      1
> 126    0  0 **   256  256  256     256  256    2      253      1      1
>  56    0  0 56   256  256  256     256  256    2      113      1      1
>  21    0  0 21   256  256  256     256  256    2       43      1      1
>  82    0  0 82   256  256  256     256  256    2      165      1      1
>  85    0  0 85   256  256  256     256  256    2      171      1      1
>  58    0  0 58   256  256  256     256  256    2      117      1      1
>  62    0  0 62   256  256  256     256  256    2      125      1      1
> 120    0  0 **   256  256  256     256  256    2      241      1      1
> 127    0  0 **   256  256  256     256  256    2      255      1      1
>  24    0  0 24   256  256  256     256  256    2       49      1      1
>  52    0  0 52   256  256  256     256  256    2      105      1      1
> 122    0  0 **   256  256  256     256  256    2      245      1      1
> 124    0  0 **   256  256  256     256  256    2      249      1      1
>  51    0  0 51   256  256  256     256  256    2      103      1      1
>  55    0  0 55   256  256  256     256  256    2      111      1      1
>  80    0  0 80   256  256  256     256  256    2      161      1      1
>  94    0  0 94   256  256  256     256  256    2      189      1      1
>  48    0  0 48   256  256  256     256  256    2       97      1      1
>  28    0  0 28   256  256  256     256  256    2       57      1      1
>  91    0  0 91   256  256  256     256  256    2      183      1      1
>  92    0  0 92   256  256  256     256  256    2      185      1      1
>  26    0  0 26   256  256  256     256  256    2       53      1      1
>  30    0  0 30   256  256  256     256  256    2       61      1      1
> 112    0  0 **   256  256  256     256  256    2      225      1      1
>  87    0  0 87   256  256  256     256  256    2      175      1      1
>  25    0  0 25   256  256  256     256  256    2       51      1      1
>  60    0  0 60   256  256  256     256  256    2      121      1      1
>  90    0  0 90   256  256  256     256  256    2      181      1      1
> 116    0  0 **   256  256  256     256  256    2      233      1      1
>  19    0  0 19   256  256  256     256  256    2       39      1      1
>  23    0  0 23   256  256  256     256  256    2       47      1      1
>  89    0  0 89   256  256  256     256  256    2      179      1      1
>  95    0  0 95   256  256  256     256  256    2      191      1      1
>  57    0  0 57   256  256  256     256  256    2      115      1      1
>  29    0  0 29   256  256  256     256  256    2       59      1      1
> 123    0  0 **   256  256  256     256  256    2      247      1      1
>  93    0  0 93   256  256  256     256  256    2      187      1      1
>  59    0  0 59   256  256  256     256  256    2      119      1      1
>  53    0  0 53   256  256  256     256  256    2      107      1      1
> 121    0  0 **   256  256  256     256  256    2      243      1      1
> 117    0  0 **   256  256  256     256  256    2      235      1      1
>  16    0  0 16   256  256  256     256  256    2       33      1      1
>  63    0  0 63   256  256  256     256  256    2      127      1      1
> 115    0  0 **   256  256  256     256  256    2      231      1      1
> 119    0  0 **   256  256  256     256  256    2      239      1      1
>  20    0  0 20   256  256  256     256  256    2       41      1      1
>  88    0  0 88   256  256  256     256  256    2      177      1      1
>  84    0  0 84   256  256  256     256  256    2      169      1      1
> write 1:  .149E-01  .149E+01
> write 2:  .954E+00  .131E+01
> write 3:  .240E+00  .149E+01
> write 4:  .227E+00  .142E+01
> write 5:  .205E+00  .134E+01
>  read 1:  .132E-01  .621E+00
> diff, delmax, delmin =  .000E+00  .000E+00  .000E+00
>  read 2:  .178E-01  .624E+00
>  read 3:  .195E-01  .629E+00
>  read 4:  .184E-01  .638E+00
>  read 5:  .210E-01  .659E+00
> File size:   .671E+02 MB
>     Write:    51.398 MB/s  (eff.,    50.819 MB/s)
>     Read :   108.142 MB/s  (eff.,   105.886 MB/s)
> Total number PEs:  128
>    .149E-01   .131E+01   50.819   .132E-01   .621E+00  105.886
>
>
>
> On 4/13/06, Yu-Heng Tseng <YHTseng at lbl.gov> wrote:
>
>     Hi Rob,
>
>     I am testing it on NCAR's BG/L. These questions are really beyond my
>     understanding. Sidd at NCAR should be able to answer your questions.
>     Thanks!
>
>     Yuheng
>     ---------------------------------------------------
>     Yu-Heng Tseng
>
>     Computational Research Division
>     Lawrence Berkeley National Laboratory
>     One Cyclotron Rd, MS: 50F-1650
>     Berkeley, CA 94720
>     YHTseng at lbl.gov
>     510.495.2904
>
>     ----- Original Message -----
>     From: Rob Ross <rross at mcs.anl.gov>
>     Date: Thursday, April 13, 2006 6:23 am
>     Subject: Re: Inconsistent results on bluegene
>
>     > Hi,
>     >
>     > I see. Can you tell us what BG/L you are running on, what file
>     > system it has, and how it is mounted to the I/O nodes?
>     >
>     > If not, can you CC someone that would know?
>     >
>     > This could be an NFS thing.
>     >
>     > Thanks,
>     >
>     > Rob
>     >
>     > Yu-Heng Tseng wrote:
>     > > Hi,
>     > >
>     > > Here is the description of the problem.
>     > > In the test directory test/fandc/ there is a Fortran test file
>     > > called "pnf_test.F". This file generates a 256x256x256 array,
>     > > writes it to a pnf_test.nc file, and reads it back.
>     > >
>     > > The test compares the original array with the array read back
>     > > from the file and prints three values: diff, delmax, and delmin.
>     > >
>     > > These measure the difference between the new and old arrays, so
>     > > they should all be zero, as they are on other machines or when the
>     > > library is compiled with gcc. Unfortunately, I got non-zero values
>     > > when using 2, 8, or 16 processors, which means the array read back
>     > > is not identical to the original.
>     > > Only the 1-processor run gives the correct result: the array read
>     > > back is identical to the original.
>     > >
>     > > I am not sure whether something is wrong with the compiler or
>     > > there is some other problem. It looks fine when I compile with
>     > > gcc. Thank you so much for your help!
>     > >
>     > > Yu-heng
>     > > ---------------------------------------------------
>     > > Yu-Heng Tseng
>     > >
>     > > Computational Research Division
>     > > Lawrence Berkeley National Laboratory
>     > > One Cyclotron Rd, MS: 50F-1650
>     > > Berkeley, CA 94720
>     > > YHTseng at lbl.gov
>     > > 510.495.2904
>     > >
>     > > ----- Original Message -----
>     > > From: Rob Ross <rross at mcs.anl.gov>
>     > > Date: Wednesday, April 12, 2006 7:28 pm
>     > > Subject: Re: Inconsistent results on bluegene
>     > >
>     > >> Hi,
>     > >>
>     > >> Can you describe how the results are inconsistent?
>     > >>
>     > >> Thanks,
>     > >>
>     > >> Rob
>     > >>
>     > >> Yu-Heng Tseng wrote:
>     > >>> Hi,
>     > >>>
>     > >>> I am doing some testing on BlueGene. Surprisingly, the results
>     > >>> are not consistent when I use multiple processors. I believe
>     > >>> there were no errors when I compiled and installed the library.
>     > >>> I use the test case from /fandc/pnf_test.F
>     > >>>
>     > >>> The compiler I am using is "blrts_xlc". I am not sure what causes
>     > >>> the problem, but when I compile it with gcc, the results are
>     > >>> consistent for multiple processors.
>     > >>> Please give me some advice! Thanks!
>     > >>>
>     > >>> Yu-heng
>     > >>>
>     > >
>     >
>     >
>
>




More information about the parallel-netcdf mailing list