Houston, we have a problem
Jianwei Li
jianwei at cheetah.cpdc.ece.nwu.edu
Fri Aug 1 12:30:16 CDT 2003
OK, I got the full output from my test after making those modifications
to the Fortran binding code.
The result looks good, doesn't it?
###############################################################
standard output:
mype pe_coords totsiz_3d locsiz_3d kstart,jstart,istart
0 0 0 0 256 256 256 16 256 256 0 0 0
1 1 0 0 256 256 256 16 256 256 16 0 0
3 3 0 0 256 256 256 16 256 256 48 0 0
4 4 0 0 256 256 256 16 256 256 64 0 0
7 7 0 0 256 256 256 16 256 256 112 0 0
2 2 0 0 256 256 256 16 256 256 32 0 0
5 5 0 0 256 256 256 16 256 256 80 0 0
6 6 0 0 256 256 256 16 256 256 96 0 0
8 8 0 0 256 256 256 16 256 256 128 0 0
9 9 0 0 256 256 256 16 256 256 144 0 0
11 11 0 0 256 256 256 16 256 256 176 0 0
14 14 0 0 256 256 256 16 256 256 224 0 0
10 10 0 0 256 256 256 16 256 256 160 0 0
12 12 0 0 256 256 256 16 256 256 192 0 0
13 13 0 0 256 256 256 16 256 256 208 0 0
15 15 0 0 256 256 256 16 256 256 240 0 0
write 1: 1.312E+00 1.562E+00
write 2: 1.000E+00 1.188E+00
write 3: 8.750E-01 1.438E+00
write 4: 8.125E-01 1.062E+00
write 5: 8.125E-01 1.000E+00
read 1: 1.250E-01 3.750E-01
diff, delmax, delmin = 0.000E+00 0.000E+00 0.000E+00
read 2: 1.250E-01 5.000E-01
read 3: 1.250E-01 3.750E-01
read 4: 2.500E-01 3.750E-01
read 5: 1.250E-01 5.000E-01
File size: 1.342E+02 MB
Write: 134.218 MB/s (eff., 74.051 MB/s)
Read : 357.914 MB/s (eff., 268.435 MB/s)
Total number PEs: 16
8.125E-01 1.000E+00 74.051 1.250E-01 3.750E-01 268.435
####################################################################
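(For reference, the rates above appear to be just file size divided by
I/O time, using the numbers echoed on the last summary line: write
134.218 MB / 1.000 s = 134.218 MB/s, or 134.218 / (0.8125 + 1.000) =
74.051 MB/s effective once header I/O is included; read 134.218 / 0.375
= 357.914 MB/s, or 134.218 / (0.125 + 0.375) = 268.435 MB/s effective.)
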
ncdump pnf_test.nc | more
netcdf pnf_test {
dimensions:
level = 256 ;
latitude = 256 ;
longitude = 256 ;
variables:
float tt(level, latitude, longitude) ;
data:
tt =
65.794, 65.795, 65.796, 65.797, 65.798, 65.799, 65.8, 65.801, 65.802,
65.803, 65.804, 65.805, 65.806, 65.807, 65.808, 65.809, 65.81, 65.811,
65.812, 65.813, 65.814, 65.815, 65.816, 65.817, 65.818, 65.819, 65.82,
65.821, 65.822, 65.823, 65.824, 65.825, 65.826, 65.827, 65.828, 65.829,
65.83, 65.831, 65.832, 65.833, 65.834, 65.835, 65.836, 65.837, 65.838,
65.839, 65.84, 65.841, 65.842, 65.843, 65.844, 65.845, 65.846, 65.847,
65.848, 65.849, 65.85, 65.851, 65.852, 65.853, 65.854, 65.855, 65.856,
65.857, 65.858, 65.859, 65.86, 65.861, 65.862, 65.863, 65.864, 65.865,
65.866, 65.867, 65.868, 65.869, 65.87, 65.871, 65.872, 65.873, 65.874,
65.875, 65.876, 65.877, 65.878, 65.879, 65.88, 65.881, 65.882, 65.883,
65.884, 65.885, 65.886, 65.887, 65.888, 65.889, 65.89, 65.891, 65.892,
65.893, 65.894, 65.895, 65.896, 65.897, 65.898, 65.899, 65.9, 65.901,
65.902, 65.903, 65.904, 65.905, 65.906, 65.907, 65.908, 65.909, 65.91,
65.911, 65.912, 65.913, 65.914, 65.915, 65.916, 65.917, 65.918, 65.919,
65.92, 65.921, 65.922, 65.923, 65.924, 65.925, 65.926, 65.927, 65.928,
65.929, 65.93, 65.931, 65.932, 65.933, 65.934, 65.935, 65.936, 65.937,
65.938, 65.939, 65.94, 65.941, 65.942, 65.943, 65.944, 65.945, 65.946,
65.947, 65.948, 65.949, 65.95, 65.951, 65.952, 65.953, 65.954, 65.955,
65.956, 65.957, 65.958, 65.959, 65.96, 65.961, 65.962, 65.963, 65.964,
65.965, 65.966, 65.967, 65.968, 65.969, 65.97, 65.971, 65.972, 65.973,
65.974, 65.975, 65.976, 65.977, 65.978, 65.979, 65.98, 65.981, 65.982,
65.983, 65.984, 65.985, 65.986, 65.987, 65.988, 65.989, 65.99, 65.991,
65.992, 65.993, 65.994, 65.995, 65.996, 65.997, 65.998, 65.999, 66,
66.001, 66.002, 66.003, 66.004, 66.005, 66.006, 66.007, 66.008, 66.009,
66.01, 66.011, 66.012, 66.013, 66.014, 66.015, 66.016, 66.017, 66.018,
66.019, 66.02, 66.021, 66.022, 66.023, 66.024, 66.025, 66.026, 66.027,
66.028, 66.029, 66.03, 66.031, 66.032, 66.033, 66.034, 66.035, 66.036,
66.037, 66.038, 66.039, 66.04, 66.041, 66.042, 66.043, 66.044, 66.045,
66.046, 66.047, 66.048, 66.049,
...
Jianwei
On Fri, 1 Aug 2003, Jianwei Li wrote:
>
> I think I know where the problem is now, after looking into the
> Fortran binding code as a "human" :)
>
> The automatically generated Fortran binding mistakes (*start)[],
> (*count)[], and (*stride)[] for *start[], *count[], *stride[].
> After changing the fortran binding interface code from
>
> FORTRAN_API void FORT_CALL nfmpi_put_vara_float_all_ ( int *v1, int *v2,
> int * v3[], int * v4[], float*v5, MPI_Fint *ierr ){
> *ierr = ncmpi_put_vara_float_all( *v1, *v2, (const size_t *)(*v3),
> (const size_t *)(*v4), v5 );
> }
>
> to
>
> FORTRAN_API void FORT_CALL nfmpi_put_vara_float_all_ ( int *v1, int *v2,
> int (* v3)[], int (* v4)[], float*v5, MPI_Fint *ierr ){
> *ierr = ncmpi_put_vara_float_all( *v1, *v2, (const size_t *)(*v3),
> (const size_t *)(*v4), v5 );
> }
>
> in file "parallel-netcdf-0.8.8/src/libf/put_vara_float_allf.c"
>
> and doing the same to the other Fortran binding functions that deal
> with start[], count[], and stride[],
> I got the Fortran test running successfully.
> That also explains why the original code fails only for put_vara/get_vara :)
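>
> A minimal sketch of the difference (a hypothetical standalone example,
> not pnetcdf code): as a parameter, "int *v[]" means "int **v", an
> array of pointers, while "int (*v)[]" is a pointer to one integer
> array, which matches the single address Fortran passes by reference.
>
> #include <stdio.h>
>
> /* Broken view: v is an array of pointers, so *v would reinterpret
>  * the first Fortran integer (e.g. 16) as an address. */
> void wrong_view(int *v[]) {
>     printf("wrong: got address %p, would misread *v as an int*\n",
>            (void *)v);
> }
>
> /* Fixed view: *v is the integer array itself, so (*v)[0] is the
>  * first start/count value Fortran actually passed. */
> void right_view(int (*v)[]) {
>     printf("right: first element = %d\n", (*v)[0]);
> }
>
> int main(void) {
>     int start[3] = {16, 0, 0};         /* what a Fortran start(3) holds */
>     right_view(&start);                /* prints 16 */
>     wrong_view((int **)(void *)start); /* same address, wrong type */
>     return 0;
> }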
>
> Another way might be to just use **start, **count, **stride?
>
> But I don't know how to modify these automatically :(
>
> #############################################################################
> This is the netcdf data file generated, running with 8 processes:
>
> ncdump pnf_test.nc | more
> netcdf pnf_test {
> dimensions:
> level = 256 ;
> latitude = 256 ;
> longitude = 256 ;
> variables:
> float tt(level, latitude, longitude) ;
> data:
>
> tt =
> 65.794, 65.795, 65.796, 65.797, 65.798, 65.799, 65.8, 65.801, 65.802,
> 65.803, 65.804, 65.805, 65.806, 65.807, 65.808, 65.809, 65.81, 65.811,
> 65.812, 65.813, 65.814, 65.815, 65.816, 65.817, 65.818, 65.819, 65.82,
> 65.821, 65.822, 65.823, 65.824, 65.825, 65.826, 65.827, 65.828, 65.829,
> 65.83, 65.831, 65.832, 65.833, 65.834, 65.835, 65.836, 65.837, 65.838,
> 65.839, 65.84, 65.841, 65.842, 65.843, 65.844, 65.845, 65.846, 65.847,
> 65.848, 65.849, 65.85, 65.851, 65.852, 65.853, 65.854, 65.855, 65.856,
> 65.857, 65.858, 65.859, 65.86, 65.861, 65.862, 65.863, 65.864, 65.865,
> 65.866, 65.867, 65.868, 65.869, 65.87, 65.871, 65.872, 65.873, 65.874,
> ...
>
> #####################################################################
> And the standard output looks like:
> poe pnf_test -nodes 1 -tasks_per_node 8 -rmpool 1 -euilib ip -euidevice en0
> mype pe_coords totsiz_3d locsiz_3d kstart,jstart,istart
> 0 0 0 0 256 256 256 32 256 256 0 0 0
> 1 1 0 0 256 256 256 32 256 256 32 0 0
> 2 2 0 0 256 256 256 32 256 256 64 0 0
> 3 3 0 0 256 256 256 32 256 256 96 0 0
> 4 4 0 0 256 256 256 32 256 256 128 0 0
> 5 5 0 0 256 256 256 32 256 256 160 0 0
> 6 6 0 0 256 256 256 32 256 256 192 0 0
> 7 7 0 0 256 256 256 32 256 256 224 0 0
> write 1: 0.000E+00 7.040E+02
>
> ... It's still running; I'll post the full output later to confirm
> my thought :)
>
> Jianwei
>
> On Thu, 31 Jul 2003, John Tannahill wrote:
>
> > Jianwei,
> >
> > This is what I was suspecting as well.
> >
> > John
> >
> > Jianwei Li wrote:
> > > After a careful look at the standard output, I think I was wrong.
> > >
> > > operation header I/O time data I/O time
> > > write 2: 1.250E-01 0.000E+00
> > > read 2: 6.250E-02 0.000E+00
> > >
> > > It seems that Nfmpi_Put_Vara_Float_All/Nfmpi_Get_Vara_Float_All
> > > are not running properly in this case; a data I/O time of
> > > 0.000E+00 suggests the calls return without moving any data.
> > >
> > > We should look for problems in more detail ...
> > >
> > > Jianwei
> > >
> > > On Thu, 31 Jul 2003, Jianwei Li wrote:
> > >
> > >
> > >>John,
> > >>
> > >>I did a quick run of your attached Fortran code using pnetcdf 0.8.8
> > >>on SDSC's IBM SP (bluehorizon). The code ran pretty well
> > >>and generated this output:
> > >>
> > >>#######################################################################
> > >>standard output:
> > >>
> > >>mype pe_coords totsiz_3d locsiz_3d kstart,jstart,istart
> > >> 0 0 0 0 256 256 256 16 256 256 0 0 0
> > >> 1 1 0 0 256 256 256 16 256 256 16 0 0
> > >> 13 13 0 0 256 256 256 16 256 256 208 0 0
> > >> 2 2 0 0 256 256 256 16 256 256 32 0 0
> > >> 8 8 0 0 256 256 256 16 256 256 128 0 0
> > >> 5 5 0 0 256 256 256 16 256 256 80 0 0
> > >> 9 9 0 0 256 256 256 16 256 256 144 0 0
> > >> 6 6 0 0 256 256 256 16 256 256 96 0 0
> > >> 10 10 0 0 256 256 256 16 256 256 160 0 0
> > >> 4 4 0 0 256 256 256 16 256 256 64 0 0
> > >> 11 11 0 0 256 256 256 16 256 256 176 0 0
> > >> 12 12 0 0 256 256 256 16 256 256 192 0 0
> > >> 14 14 0 0 256 256 256 16 256 256 224 0 0
> > >> 15 15 0 0 256 256 256 16 256 256 240 0 0
> > >> 3 3 0 0 256 256 256 16 256 256 48 0 0
> > >> 7 7 0 0 256 256 256 16 256 256 112 0 0
> > >>write 1: 2.500E-01 6.250E-02
> > >>write 2: 1.250E-01 0.000E+00
> > >>write 3: 1.250E-01 6.250E-02
> > >>write 4: 1.875E-01 0.000E+00
> > >>write 5: 1.250E-01 0.000E+00
> > >> read 1: 6.250E-02 0.000E+00
> > >>diff, delmax, delmin = 1.009E+00 1.738E+00 1.701E-02
> > >> read 2: 6.250E-02 0.000E+00
> > >> read 3: 6.250E-02 0.000E+00
> > >> read 4: 6.250E-02 0.000E+00
> > >> read 5: 6.250E-02 0.000E+00
> > >>File size: 1.342E+02 MB
> > >> Write: INF MB/s (eff., 1073.742 MB/s)
> > >> Read : INF MB/s (eff., 2147.484 MB/s)
> > >>Total number PEs: 16
> > >> 1.250E-01 0.000E+00 1073.742 6.250E-02 0.000E+00 2147.484
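> > >>
> > >>(The INF rates are just the 1.342E+02 MB file size divided by a data
> > >>I/O time of 0.000E+00, one more sign that the vara calls returned
> > >>without actually transferring any data.)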
> > >>
> > >>##########################################################################
> > >>netcdf file <pnf_test.nc>:
> > >>ncdump pnf_test.nc | more
> > >>netcdf pnf_test {
> > >>dimensions:
> > >> level = 256 ;
> > >> latitude = 256 ;
> > >> longitude = 256 ;
> > >>variables:
> > >> float tt(level, latitude, longitude) ;
> > >>data:
> > >>
> > >> tt =
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> > >>...
> > >>
> > >>I think it's a successful run, right?
> > >>
> > >>So what now? Is it a Fortran binding problem specific to the Frost
> > >>platform, or something else?
> > >>
> > >>BTW, I built my pnetcdf lib as below; maybe you want to try this:
> > >>
> > >>setenv CC xlc
> > >>setenv FC xlf
> > >>setenv F90 xlf90
> > >>setenv CXX xlC
> > >>setenv FFLAGS '-d -O2'
> > >>setenv MPICC mpcc_r
> > >>setenv MPIF77 mpxlf_r
> > >>
> > >>#make
> > >>#make install
> > >>
> > >>What else can I do? :)
> > >>
> > >>Jianwei
> > >>
> > >>On Thu, 31 Jul 2003, John Tannahill wrote:
> > >>
> > >>
> > >>>Rob,
> > >>>
> > >>>I am hoping that I can catch you before you leave, so that you can
> > >>>pass this on to someone, but if you are already gone, can anyone
> > >>>else take a look at this?
> > >>>
> > >>>I have graduated to my original, bigger test case; the C version
> > >>>works, but the Fortran version doesn't. It's certainly possible that
> > >>>I've screwed up the translation from C to Fortran, and I will be
> > >>>looking at that, but I wanted to pass this back to you folks so that
> > >>>you can take a look at it too.
> > >>>
> > >>>I am using 0.8.8. Attached are two tar files that should be pretty
> > >>>self-explanatory, but let me know if you have questions.
> > >>>
> > >>>Regards,
> > >>>John
> > >>>
> > >>>--
> > >>>============================
> > >>>John R. Tannahill
> > >>>Lawrence Livermore Nat. Lab.
> > >>>P.O. Box 808, M/S L-103
> > >>>Livermore, CA 94551
> > >>>925-423-3514
> > >>>Fax: 925-423-4908
> > >>>============================
> > >>>
> > >>
> > >
> > >
> >
> >
> > --
> > ============================
> > John R. Tannahill
> > Lawrence Livermore Nat. Lab.
> > P.O. Box 808, M/S L-103
> > Livermore, CA 94551
> > 925-423-3514
> > Fax: 925-423-4908
> > ============================
> >
>