[MPICH] Behaviour if MPI_File_open fails on some nodes
James S Perrin
james.s.perrin at manchester.ac.uk
Mon Aug 20 05:40:39 CDT 2007
Hi,
Thanks for your comments; however, I think I need to clarify my problem:
* The file may not be on a shared filesystem but duplicated on the local
file system. It may even have a different filename.
* The file is "expected" to exist for all processes. If the file does not
exist for any one of the processes, that is an error.
* MPI_File_open is called on all processes in the comm.
If the file does not exist for one of the processes I'd expect either:
1. The specific process to return an ERROR that the file can't be opened
2. All processes to return an ERROR
I assumed case 1), so I do an Allgather so that every process knows
whether all the other processes were successful and can proceed accordingly.
This seems to be the behaviour I get with other MPI implementations, and it
only causes problems if the directory path to the file doesn't exist.
If the nodes have the following dirs and files:
node0: ~/mydir/mydata/myfile
node1: ~/mydir/mydata
node0 returns SUCCESS and node1 ERROR; however, if the dir is also missing
from node1:
node0: ~/mydir/mydata/myfile
node1: ~/mydir/
then node0 locks up and node1 returns an ERROR as expected.
Regards
James
Robert Latham wrote:
> On Fri, Aug 17, 2007 at 06:32:26PM +0100, James S Perrin wrote:
>> The code below when run on a cluster where a filepath exists only on
>> the head node locks up on 2 or more processes:
>
> You can do this, but it requires some tricks.
>
>> Am I making a mistake in my use of MPI_File_open()? Should I be testing
>> exists() before calling it?
>
> MPI_File_open is a collective call. You'll have to make the call on
> all processes.
>
> Ok, so about that trick I mentioned. We call it "deferred open". Not
> sure how much you know about ROMIO's collective I/O optimizations, but
> with the right hints you can specify that only certain processes,
> called "aggregators", carry out collective I/O. If you also hint that
> you won't be doing independent I/O, then when you open the file, only
> aggregators will attempt to open it (and only aggregators will care
> whether the file actually exists).
>
> The relevant hints:
> cb_config_list or cb_nodes
> romio_no_indep_rw
>
> So you could set the hints "cb_config_list" to "node0:1" and
> "romio_no_indep_rw" to "true", and then as long as you carry out only
> collective I/O (those routines ending in _all), your other nodes will
> not care if the file exists only on one process, even if they have
> data to write.
>
> I'll be happy to explain in more detail if any of that is unclear to
> you.
>
> ==rob
>
--
------------------------------------------------------------------------
James S. Perrin, | email: james.perrin at manchester.ac.uk
Research Computing Services, | web: www.mc.manchester.ac.uk
Kilburn Building, The University, | tel: +44 161 275 6945
Manchester, England. M13 9PL. | fax: +44 161 275 0637
------------------------------------------------------------------------
"The test of intellect is the refusal to belabour the obvious"
- Alfred Bester
------------------------------------------------------------------------