[MPICH] Behaviour if MPI_File_open fails on some nodes

James S Perrin james.s.perrin at manchester.ac.uk
Mon Aug 20 05:40:39 CDT 2007


Hi,
Thanks for your comments; however, I think I need to clarify my problem:

* The file may not be on a shared filesystem but instead duplicated on
each node's local filesystem. It may even have a different filename.
* The file is "expected" to exist for all processes. If the file does not
exist for any of the processes, that is an error.
* MPI_File_open is called on all processes in the comm.

If the file does not exist for one of the processes, I'd expect either:

1. The specific process to return an ERROR saying that the file can't be opened, or
2. All processes to return an ERROR

I assumed case 1, so I do an Allgather so that every process knows
whether all the other processes were successful and can proceed
accordingly. This is the behaviour I see with other MPI implementations,
and here it only causes problems if the directory path to the file
doesn't exist, i.e.

if the nodes have the following directories and files:

node0: ~/mydir/mydata/myfile
node1: ~/mydir/mydata

node0 returns SUCCESS and node1 returns ERROR. However, if the directory
is also missing from node1:

node0: ~/mydir/mydata/myfile
node1: ~/mydir/

then node0 locks up and node1 returns an ERROR as expected.

Regards
James


Robert Latham wrote:
> On Fri, Aug 17, 2007 at 06:32:26PM +0100, James S Perrin wrote:
>> The code below, when run on a cluster where a file path exists only on
>> the head node, locks up with 2 or more processes:
> 
> You can do this, but it requires some tricks.
> 
>> Am I making a mistake in my use of MPI_File_open()? Should I be testing
>> exists() before calling it?
> 
> MPI_File_open is a collective call. You'll have to make the call on
> all processes.  
> 
> Ok, so about that trick I mentioned.  We call it "deferred open".  Not
> sure how much you know about ROMIO's collective I/O optimizations, but
> with the right hints you can specify that only certain processes,
> called "aggregators", carry out collective I/O.  If you also hint that
> you won't be doing independent I/O, then when you open the file only
> the aggregators will attempt to open it (and only aggregators will
> care whether the file actually exists).
> 
> The relevant hints:
> cb_config_list or cb_nodes
> romio_no_indep_rw
> 
> So you could set the hint "cb_config_list" to "node0:1" and
> "romio_no_indep_rw" to "true"; then, as long as you carry out only
> collective I/O (those routines ending in _all), your other nodes will
> not care that the file exists on only one node, even if they have
> data to write.
> 
> I'll be happy to explain in more detail if any of that is unclear to
> you.
> 
> ==rob
> 


-- 
------------------------------------------------------------------------
James S. Perrin,                  | email: james.perrin at manchester.ac.uk
Research Computing Services,      | web:   www.mc.manchester.ac.uk
Kilburn Building, The University, | tel:   +44 161 275 6945
Manchester, England. M13 9PL.     | fax:   +44 161 275 0637
------------------------------------------------------------------------
"The test of intellect is the refusal to belabour the obvious"
- Alfred Bester
------------------------------------------------------------------------




More information about the mpich-discuss mailing list