[MPICH] tests are failing with different numbers
Rajeev Thakur
thakur at mcs.anl.gov
Mon Sep 26 11:04:44 CDT 2005
If you use the MPI-IO functions for writing, and fcntl file locks work
correctly on the NFS installation, you should get the right data in the
files.
Rajeev
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Jerry Mersel
> Sent: Monday, September 26, 2005 3:37 AM
> To: Rob Ross
> Cc: jerry.mersel at weizmann.ac.il; Ashley Pittman;
> mpich-discuss at mcs.anl.gov
> Subject: Re: [MPICH] tests are failing with different numbers
>
> The data written to the exported/mounted directory gets written and
> does not interfere with what's going on with other machines.
>
> Regards,
> Jerry
>
> e.g. certain uw-imap folder formats are not NFS safe
>
>
> > What do you mean by that?
> >
> > Rob
> >
> > Jerry Mersel wrote:
> >> Another question...
> >>
> >> Is MPICH (and,or MPICH2) NFS safe?
> >>
> >> Regards,
> >> Jerry
> >>
> >>
> >>
> >>>On Tue, 2005-09-20 at 16:54 +0300, Jerry Mersel wrote:
> >>>
> >>>>Hi:
> >>>>  I've installed MPICH 1.2.6 on a cluster which consists of several
> >>>>dual Opteron machines running Red Hat AS 4.0.
> >>>>  A user, while running an application using 4 processors, has
> >>>>brought to my attention that 2 runs with the same binary result in
> >>>>2 different sets of results.
> >>>>
> >>>>  I then ran the tests that come with MPICH (I know I should have
> >>>>done it before; we just won't tell anybody) and came up with errors.
> >>>>Here they are:
> >>>>  Differences in issendtest.out
> >>>>  Differences in structf.out
> >>>>  *** Checking for differences from expected output ***
> >>>>  Differences in issendtest.out
> >>>>  Differences in structf.out
> >>>>  p0_3896:  p4_error: net_recv read:  probable EOF on socket: 1
> >>>>  p0_10524:  p4_error: : 972
> >>>>  0 - MPI_ADDRESS : Address of location given to MPI_ADDRESS does
> >>>>not fit in Fortran integer
> >>>>
> >>>>  I've tried this with the gcc compiler and the pgi compiler, with
> >>>>no options and with many different options - the results are the
> >>>>same.
> >>>
> >>>> I tried using MPICH from the pgi site; the user still got
> >>>> different results on different runs.
> >>>
> >>>I've seen something similar to this. Firstly, structf is testing
> >>>something that is mathematically impossible, i.e. storing a (in your
> >>>case 64-bit) pointer in a 32-bit integer. This sometimes works
> >>>(depending on what the pointer is) but often doesn't. We have some
> >>>patches for this (I believe PGI also ships them), but they're still
> >>>not a 100% cure. Unless you actually have an application that
> >>>suffers from this, you don't need the patch.
> >>>
> >>>Secondly, Red Hat AS 4.0 has some odd features that effectively
> >>>mean you're supposed to get different results from running the same
> >>>program twice. In particular it has exec-shield-randomize enabled,
> >>>which moves the stack about between runs, and it runs a cron job
> >>>overnight which randomises the load addresses of your installed
> >>>shared libraries. This means that the same binary on two different
> >>>nodes will have a different address for the stack and a different
> >>>address for mmap()/shared libraries.
> >>>
> >>>Most applications, however, should hide this away from you, and MPI
> >>>itself is designed to be independent of this type of configuration
> >>>change. It is fairly easy to introduce artificial dependencies
> >>>without meaning to, though.
> >>>
> >>>And of course there are the normal problems of floating-point
> >>>accuracy; some apps aren't actually supposed to get identical
> >>>results between runs, merely similar ones...
> >>>
> >>>Ashley,
> >>>
> >>>
> >>
> >>
> >
> >
>
>