[MPICH] Behaviour if MPI_File_open fails on some nodes
Rajeev Thakur
thakur at mcs.anl.gov
Mon Aug 20 08:35:59 CDT 2007
If the file is duplicated on the local file system of each node, then it is
not a common shared file that all processes access. In that case you must
open it with MPI_COMM_SELF instead of MPI_COMM_WORLD.
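
For example, a minimal sketch (the filename here is just a placeholder):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_File fh;

        MPI_Init(&argc, &argv);
        /* Each process opens its own local copy independently; no
           other process takes part in this open. */
        MPI_File_open(MPI_COMM_SELF, "myfile", MPI_MODE_RDONLY,
                      MPI_INFO_NULL, &fh);
        /* ... independent reads ... */
        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }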
Rajeev
> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov [mailto:owner-mpich-
> discuss at mcs.anl.gov] On Behalf Of James S Perrin
> Sent: Monday, August 20, 2007 5:41 AM
> To: Robert Latham
> Cc: mpich
> Subject: Re: [MPICH] Behaviour if MPI_File_open fails on some nodes
>
> Hi,
> Thanks for your comments; however, I think I need to clarify my
> problem:
>
> * The file may not be on a shared filesystem but instead be duplicated on
> the local file system of each node. It may even have a different filename.
> * The file is "expected" to exist for all processes. If the file does not
> exist for any of the processes, that is an error.
> * MPI_File_open is called on all processes in the comm.
>
> If the file does not exist for one of the processes I'd expect either:
>
> 1. The specific process to return an ERROR that the file can't be opened
> 2. All processes to return an ERROR
>
> I assumed case 1), and I do an Allgather so that all the processes know
> whether all the other processes were successful and can proceed accordingly
> (sketched after the example below). This is the behaviour I get with other
> MPI implementations, and it only causes problems when the directory path
> to the file doesn't exist, i.e.
>
> if on the nodes I have the following dirs and files:
>
> node0: ~/mydir/mydata/myfile
> node1: ~/mydir/mydata
>
> node0 returns SUCCESS and node1 an ERROR. However, if the dir is also
> missing from node1:
>
> node0: ~/mydir/mydata/myfile
> node1: ~/mydir/
>
> then node0 locks up and node1 returns an ERROR as expected.
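>
> For reference, here is a minimal sketch of the check I do (the filename
> and variable names are illustrative):
>
>     #include <stdlib.h>
>     #include <mpi.h>
>
>     int main(int argc, char **argv)
>     {
>         MPI_File fh;
>         int err, my_ok, everyone_ok, nprocs, i, *all_ok;
>
>         MPI_Init(&argc, &argv);
>         MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>
>         /* Files default to MPI_ERRORS_RETURN; set it explicitly so a
>            failed open reliably comes back as an error code. */
>         MPI_File_set_errhandler(MPI_FILE_NULL, MPI_ERRORS_RETURN);
>
>         err = MPI_File_open(MPI_COMM_WORLD, "mydir/mydata/myfile",
>                             MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
>         my_ok = (err == MPI_SUCCESS);
>
>         /* Every process learns whether every other open succeeded. */
>         all_ok = malloc(nprocs * sizeof(int));
>         MPI_Allgather(&my_ok, 1, MPI_INT, all_ok, 1, MPI_INT,
>                       MPI_COMM_WORLD);
>         for (i = 0, everyone_ok = 1; i < nprocs; i++)
>             if (!all_ok[i])
>                 everyone_ok = 0;   /* some process failed: all bail out */
>
>         /* ... proceed with I/O only if everyone_ok ... */
>
>         if (my_ok)
>             MPI_File_close(&fh);
>         free(all_ok);
>         MPI_Finalize();
>         return 0;
>     }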
>
> Regards
> James
>
>
> Robert Latham wrote:
> > On Fri, Aug 17, 2007 at 06:32:26PM +0100, James S Perrin wrote:
> >> The code below, when run on a cluster where the filepath exists only on
> >> the head node, locks up with 2 or more processes:
> >
> > You can do this, but it requires some tricks.
> >
> >> Am I making a mistake in my use of MPI_File_open(), or should I be
> >> testing exists() before calling it?
> >
> > MPI_File_open is a collective call. You'll have to make the call on
> > all processes.
> >
> > Ok, so about that trick I mentioned. We call it "deferred open". Not
> > sure how much you know about ROMIO's collective I/O optimizations, but
> > with the right hints you can specify that only certain processes,
> > called "aggregators", carry out collective I/O. If you also hint that
> > you won't be doing independent I/O, then when you open the file, only
> > the aggregators will attempt to open it (and only the aggregators will
> > care whether the file actually exists).
> >
> > The relevant hints:
> > cb_config_list or cb_nodes
> > romio_no_indep_rw
> >
> > So you could set the hints "cb_config_list" to "node0:1" and
> > "romio_no_indep_rw" to "true", and then as long as you carry out only
> > collective I/O (those routines ending in _all), your other nodes will
> > not care if the file exists only on one process, even if they have
> > data to write.
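> >
> > As a rough sketch (the filename is a placeholder), the hints go into
> > an MPI_Info object passed to the open:
> >
> >     MPI_Info info;
> >     MPI_File fh;
> >
> >     MPI_Info_create(&info);
> >     /* One aggregator, on node0, performs all the actual I/O. */
> >     MPI_Info_set(info, "cb_config_list", "node0:1");
> >     /* Promise collective-only I/O so non-aggregators can skip
> >        opening the file. */
> >     MPI_Info_set(info, "romio_no_indep_rw", "true");
> >     MPI_File_open(MPI_COMM_WORLD, "myfile", MPI_MODE_RDONLY,
> >                   info, &fh);
> >     MPI_Info_free(&info);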
> >
> > I'll be happy to explain in more detail if any of that is unclear to
> > you.
> >
> > ==rob
> >
>
>
> --
> ------------------------------------------------------------------------
> James S. Perrin, | email: james.perrin at manchester.ac.uk
> Research Computing Services, | web: www.mc.manchester.ac.uk
> Kilburn Building, The University, | tel: +44 161 275 6945
> Manchester, England. M13 9PL. | fax: +44 161 275 0637
> ------------------------------------------------------------------------
> "The test of intellect is the refusal to belabour the obvious"
> - Alfred Bester
> ------------------------------------------------------------------------