Inconsistent results on bluegene
Yu-Heng Tseng
yhtseng at lbl.gov
Thu Apr 13 15:48:07 CDT 2006
Jim,
Would you mind testing it using 2, 8, and 16 nodes? I got correct results
using 32, 64, and 128 nodes but wrong results using 2, 8, and 16 nodes. This
is interesting and may be machine-dependent. Thanks for looking into it!
Yu-heng
Jim Edwards wrote:
> Yuheng,
>
> I am unable to reproduce your problem. Here are the compile commands
> and the resulting output:
>
> jedwards at fr0101ge:~/src/pnetcdf/test/fandc> make
> blrts_xlf -o pnctestf -g -qfullpath -O2 -qsuffix=cpp=F
> -I/bgl/BlueLight/ppcfloor/bglsys/include -qsuffix=cpp=F
> -I/bgl/BlueLight/ppcfloor/bglsys/include ./pnctestf.F
> -I../../src/libf/ -L/contrib/bgl/pnetcdf/lib -lpnetcdf -lm
> -L/bgl/BlueLight/ppcfloor/bglsys/lib -lmpich.rts -lmsglayer.rts
> -lrts.rts -ldevices.rts
> ** pnf_test === End of Compilation 1 ===
> 1501-510 Compilation successful for file pnctestf.F.
> blrts_xlc -o pnctest -g -qfullpath -O2
> -I/bgl/BlueLight/ppcfloor/bglsys/include -o pnctest ./pnctest.c
> -I./../../src/lib -L/contrib/bgl/pnetcdf/lib -lpnetcdf -lm
> -L/bgl/BlueLight/ppcfloor/bglsys/lib -lmpich.rts -lmsglayer.rts
> -lrts.rts -ldevices.rts
>
>
> mype pe_coords totsiz_3d locsiz_3d kstart,jstart,istart
> 0 0 0 0 256 256 256 256 256 2 1 1 1
> 42 0 0 42 256 256 256 256 256 2 85 1 1
> 6 0 0 6 256 256 256 256 256 2 13 1 1
> 72 0 0 72 256 256 256 256 256 2 145 1 1
> 76 0 0 76 256 256 256 256 256 2 153 1 1
> 8 0 0 8 256 256 256 256 256 2 17 1 1
> 12 0 0 12 256 256 256 256 256 2 25 1 1
> 64 0 0 64 256 256 256 256 256 2 129 1 1
> 68 0 0 68 256 256 256 256 256 2 137 1 1
> 33 0 0 33 256 256 256 256 256 2 67 1 1
> 4 0 0 4 256 256 256 256 256 2 9 1 1
> 75 0 0 75 256 256 256 256 256 2 151 1 1
> 102 0 0 ** 256 256 256 256 256 2 205 1 1
> 1 0 0 1 256 256 256 256 256 2 3 1 1
> 13 0 0 13 256 256 256 256 256 2 27 1 1
> 105 0 0 ** 256 256 256 256 256 2 211 1 1
> 77 0 0 77 256 256 256 256 256 2 155 1 1
> 34 0 0 34 256 256 256 256 256 2 69 1 1
> 5 0 0 5 256 256 256 256 256 2 11 1 1
> 65 0 0 65 256 256 256 256 256 2 131 1 1
> 69 0 0 69 256 256 256 256 256 2 139 1 1
> 40 0 0 40 256 256 256 256 256 2 81 1 1
> 38 0 0 38 256 256 256 256 256 2 77 1 1
> 74 0 0 74 256 256 256 256 256 2 149 1 1
> 103 0 0 ** 256 256 256 256 256 2 207 1 1
> 10 0 0 10 256 256 256 256 256 2 21 1 1
> 45 0 0 45 256 256 256 256 256 2 91 1 1
> 104 0 0 ** 256 256 256 256 256 2 209 1 1
> 109 0 0 ** 256 256 256 256 256 2 219 1 1
> 9 0 0 9 256 256 256 256 256 2 19 1 1
> 31 0 0 31 256 256 256 256 256 2 63 1 1
> 98 0 0 98 256 256 256 256 256 2 197 1 1
> 78 0 0 78 256 256 256 256 256 2 157 1 1
> 43 0 0 43 256 256 256 256 256 2 87 1 1
> 44 0 0 44 256 256 256 256 256 2 89 1 1
> 73 0 0 73 256 256 256 256 256 2 147 1 1
> 100 0 0 ** 256 256 256 256 256 2 201 1 1
> 41 0 0 41 256 256 256 256 256 2 83 1 1
> 46 0 0 46 256 256 256 256 256 2 93 1 1
> 67 0 0 67 256 256 256 256 256 2 135 1 1
> 71 0 0 71 256 256 256 256 256 2 143 1 1
> 2 0 0 2 256 256 256 256 256 2 5 1 1
> 37 0 0 37 256 256 256 256 256 2 75 1 1
> 97 0 0 97 256 256 256 256 256 2 195 1 1
> 101 0 0 ** 256 256 256 256 256 2 203 1 1
> 32 0 0 32 256 256 256 256 256 2 65 1 1
> 15 0 0 15 256 256 256 256 256 2 31 1 1
> 106 0 0 ** 256 256 256 256 256 2 213 1 1
> 110 0 0 ** 256 256 256 256 256 2 221 1 1
> 3 0 0 3 256 256 256 256 256 2 7 1 1
> 36 0 0 36 256 256 256 256 256 2 73 1 1
> 96 0 0 96 256 256 256 256 256 2 193 1 1
> 108 0 0 ** 256 256 256 256 256 2 217 1 1
> 11 0 0 11 256 256 256 256 256 2 23 1 1
> 14 0 0 14 256 256 256 256 256 2 29 1 1
> 114 0 0 ** 256 256 256 256 256 2 229 1 1
> 86 0 0 86 256 256 256 256 256 2 173 1 1
> 49 0 0 49 256 256 256 256 256 2 99 1 1
> 7 0 0 7 256 256 256 256 256 2 15 1 1
> 66 0 0 66 256 256 256 256 256 2 133 1 1
> 70 0 0 70 256 256 256 256 256 2 141 1 1
> 35 0 0 35 256 256 256 256 256 2 71 1 1
> 54 0 0 54 256 256 256 256 256 2 109 1 1
> 99 0 0 99 256 256 256 256 256 2 199 1 1
> 79 0 0 79 256 256 256 256 256 2 159 1 1
> 27 0 0 27 256 256 256 256 256 2 55 1 1
> 47 0 0 47 256 256 256 256 256 2 95 1 1
> 113 0 0 ** 256 256 256 256 256 2 227 1 1
> 118 0 0 ** 256 256 256 256 256 2 237 1 1
> 50 0 0 50 256 256 256 256 256 2 101 1 1
> 61 0 0 61 256 256 256 256 256 2 123 1 1
> 107 0 0 ** 256 256 256 256 256 2 215 1 1
> 111 0 0 ** 256 256 256 256 256 2 223 1 1
> 17 0 0 17 256 256 256 256 256 2 35 1 1
> 39 0 0 39 256 256 256 256 256 2 79 1 1
> 83 0 0 83 256 256 256 256 256 2 167 1 1
> 125 0 0 ** 256 256 256 256 256 2 251 1 1
> 18 0 0 18 256 256 256 256 256 2 37 1 1
> 22 0 0 22 256 256 256 256 256 2 45 1 1
> 81 0 0 81 256 256 256 256 256 2 163 1 1
> 126 0 0 ** 256 256 256 256 256 2 253 1 1
> 56 0 0 56 256 256 256 256 256 2 113 1 1
> 21 0 0 21 256 256 256 256 256 2 43 1 1
> 82 0 0 82 256 256 256 256 256 2 165 1 1
> 85 0 0 85 256 256 256 256 256 2 171 1 1
> 58 0 0 58 256 256 256 256 256 2 117 1 1
> 62 0 0 62 256 256 256 256 256 2 125 1 1
> 120 0 0 ** 256 256 256 256 256 2 241 1 1
> 127 0 0 ** 256 256 256 256 256 2 255 1 1
> 24 0 0 24 256 256 256 256 256 2 49 1 1
> 52 0 0 52 256 256 256 256 256 2 105 1 1
> 122 0 0 ** 256 256 256 256 256 2 245 1 1
> 124 0 0 ** 256 256 256 256 256 2 249 1 1
> 51 0 0 51 256 256 256 256 256 2 103 1 1
> 55 0 0 55 256 256 256 256 256 2 111 1 1
> 80 0 0 80 256 256 256 256 256 2 161 1 1
> 94 0 0 94 256 256 256 256 256 2 189 1 1
> 48 0 0 48 256 256 256 256 256 2 97 1 1
> 28 0 0 28 256 256 256 256 256 2 57 1 1
> 91 0 0 91 256 256 256 256 256 2 183 1 1
> 92 0 0 92 256 256 256 256 256 2 185 1 1
> 26 0 0 26 256 256 256 256 256 2 53 1 1
> 30 0 0 30 256 256 256 256 256 2 61 1 1
> 112 0 0 ** 256 256 256 256 256 2 225 1 1
> 87 0 0 87 256 256 256 256 256 2 175 1 1
> 25 0 0 25 256 256 256 256 256 2 51 1 1
> 60 0 0 60 256 256 256 256 256 2 121 1 1
> 90 0 0 90 256 256 256 256 256 2 181 1 1
> 116 0 0 ** 256 256 256 256 256 2 233 1 1
> 19 0 0 19 256 256 256 256 256 2 39 1 1
> 23 0 0 23 256 256 256 256 256 2 47 1 1
> 89 0 0 89 256 256 256 256 256 2 179 1 1
> 95 0 0 95 256 256 256 256 256 2 191 1 1
> 57 0 0 57 256 256 256 256 256 2 115 1 1
> 29 0 0 29 256 256 256 256 256 2 59 1 1
> 123 0 0 ** 256 256 256 256 256 2 247 1 1
> 93 0 0 93 256 256 256 256 256 2 187 1 1
> 59 0 0 59 256 256 256 256 256 2 119 1 1
> 53 0 0 53 256 256 256 256 256 2 107 1 1
> 121 0 0 ** 256 256 256 256 256 2 243 1 1
> 117 0 0 ** 256 256 256 256 256 2 235 1 1
> 16 0 0 16 256 256 256 256 256 2 33 1 1
> 63 0 0 63 256 256 256 256 256 2 127 1 1
> 115 0 0 ** 256 256 256 256 256 2 231 1 1
> 119 0 0 ** 256 256 256 256 256 2 239 1 1
> 20 0 0 20 256 256 256 256 256 2 41 1 1
> 88 0 0 88 256 256 256 256 256 2 177 1 1
> 84 0 0 84 256 256 256 256 256 2 169 1 1
> write 1: .149E-01 .149E+01
> write 2: .954E+00 .131E+01
> write 3: .240E+00 .149E+01
> write 4: .227E+00 .142E+01
> write 5: .205E+00 .134E+01
> read 1: .132E-01 .621E+00
> diff, delmax, delmin = .000E+00 .000E+00 .000E+00
> read 2: .178E-01 .624E+00
> read 3: .195E-01 .629E+00
> read 4: .184E-01 .638E+00
> read 5: .210E-01 .659E+00
> File size: .671E+02 MB
> Write: 51.398 MB/s (eff., 50.819 MB/s)
> Read : 108.142 MB/s (eff., 105.886 MB/s)
> Total number PEs: 128
> .149E-01 .131E+01 50.819 .132E-01 .621E+00 105.886
>
>
>
> On 4/13/06, Yu-Heng Tseng <YHTseng at lbl.gov> wrote:
>
> Hi Ross
>
> I am testing it on NCAR's BG/L. These questions are beyond my
> expertise, but Sidd at NCAR should be able to answer them.
> Thanks!
>
> Yuheng
> ---------------------------------------------------
> Yu-Heng Tseng
>
> Computational Research Division
> Lawrence Berkeley National Laboratory
> One Cyclotron Rd, MS: 50F-1650
> Berkeley, CA 94720
> YHTseng at lbl.gov
> 510.495.2904
>
> ----- Original Message -----
> From: Rob Ross <rross at mcs.anl.gov>
> Date: Thursday, April 13, 2006 6:23 am
> Subject: Re: Inconsistent results on bluegene
>
> > Hi,
> >
> > I see. Can you tell us which BG/L you are running on, what file
> > system it has, and how it is mounted on the I/O nodes?
> >
> > If not, can you CC someone who would know?
> >
> > This could be an NFS issue.
> >
> > Thanks,
> >
> > Rob
> >
> > Yu-Heng Tseng wrote:
> > > Hi,
> > >
> > > Here is a description of the problem.
> > > In the test directory test/fandc/ there is a Fortran test file called
> > > "pnf_test.F". It generates a 256x256x256 array, writes it to the file
> > > pnf_test.nc, and reads it back.
> > >
> > > It then compares the original array with the array read back and
> > > prints diff, delmax, and delmin, which measure the difference between
> > > the new and the original array. All three should be zero, as they are
> > > on other machines or when the library is compiled with gcc.
> > > Unfortunately, I get non-zero values when I use 2, 8, or 16
> > > processors, which means the array read back is not identical to the
> > > original. Only the 1-processor run gives the correct result: there,
> > > the array read back is identical to the original.
> > >
> > > I am not sure whether the compiler or something else is at fault.
> > > Everything looks fine when I compile with gcc. Thank you very much
> > > for your help!
> > >
> > > Yu-heng
> > >
> > > ----- Original Message -----
> > > From: Rob Ross <rross at mcs.anl.gov>
> > > Date: Wednesday, April 12, 2006 7:28 pm
> > > Subject: Re: Inconsistent results on bluegene
> > >
> > >> Hi,
> > >>
> > >> Can you describe how the results are inconsistent?
> > >>
> > >> Thanks,
> > >>
> > >> Rob
> > >>
> > >> Yu-Heng Tseng wrote:
> > >>> Hi,
> > >>>
> > >>> I am doing some testing on Blue Gene. Surprisingly, the results are
> > >>> not consistent when I use multiple processors. As far as I can tell,
> > >>> there were no errors when I compiled and installed the library.
> > >>> I use the test case from /fandc/pnf_test.F.
> > >>>
> > >>> The compiler I am using is "blrts_xlc", and I am not sure what causes
> > >>> the problem. When I compile with gcc instead, the results are
> > >>> consistent across multiple processors.
> > >>> Please give me some advice! Thanks!
> > >>>
> > >>> Yu-heng
> > >>>
> > >
> >
> >
>
>
More information about the parallel-netcdf mailing list