[MPICH] Behaviour if MPI_File_open fails on some nodes

Robert Latham robl at mcs.anl.gov
Tue Aug 21 11:43:50 CDT 2007


On Tue, Aug 21, 2007 at 10:11:47AM -0500, Robert Latham wrote:
> On Tue, Aug 21, 2007 at 10:52:12AM +0100, James S Perrin wrote:
> > Sorry to be pedantic, but did you use a leading dir path that didn't 
> > exist on the other processors? I get the correct behaviour if the dir 
> > paths exist on all processors but the files don't.
> 
> Thanks! That was the missing factor.   Ok, that should definitely not
> happen, and since I can reproduce it here, I should be able to fix it
> soon.  I'll send you a patch.

Please try this patch.  I'm a little wary of putting yet another
Allreduce in the open path, but I don't see another way to keep every
process from hanging when only some of them fail.

Index: src/mpi/romio/adio/common/ad_fstype.c
===================================================================
RCS file: /home/MPI/cvsMaster/romio/adio/common/ad_fstype.c,v
retrieving revision 1.54
diff -u -w -p -r1.54 ad_fstype.c
--- src/mpi/romio/adio/common/ad_fstype.c       12 Mar 2007 20:40:40 -0000     1.54
+++ src/mpi/romio/adio/common/ad_fstype.c       21 Aug 2007 16:41:56 -0000
@@ -503,7 +503,7 @@ tables in a reasonable way. -- Rob, 06/0
 void ADIO_ResolveFileType(MPI_Comm comm, char *filename, int *fstype, 
                          ADIOI_Fns **ops, int *error_code)
 {
-    int myerrcode, file_system, min_code;
+    int myerrcode, file_system, min_code, max_code;
     char *tmp;
     static char myname[] = "ADIO_RESOLVEFILETYPE";
 
@@ -514,6 +514,15 @@ void ADIO_ResolveFileType(MPI_Comm comm,
        ADIO_FileSysType_fncall(filename, &file_system, &myerrcode);
        if (myerrcode != MPI_SUCCESS) {
            *error_code = myerrcode;
+       }
+
+       /* the check for file system type will hang if any process got an error
+        * in ADIO_FileSysType_fncall (this could happen if a full path exists
+        * on one node but not on others, and no prefix like ufs: was provided)
+        */
+       MPI_Allreduce(error_code, &max_code, 1, MPI_INT, MPI_MAX, comm);
+       if (max_code != MPI_SUCCESS)  {
+               *error_code = max_code;
            return;
        }

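The hang the patch fixes can be illustrated with a single-process simulation. This is a sketch, not ROMIO code, and the helper names are hypothetical. Each array entry stands for one rank's result from the file-system probe, with 0 playing the role of MPI_SUCCESS. Without a reduction, a failing rank returns early while the others enter the next collective and wait forever. With an MPI_MAX reduction, every rank sees the same nonzero code and returns together. Taking the maximum assumes, as the patch does, that MPI_SUCCESS is 0 and error codes are positive.

```c
/* Single-process simulation of the agreement step; not ROMIO code.
 * codes[i] is rank i's result from the file-system probe:
 * 0 stands in for MPI_SUCCESS, a positive value for an error code. */
#define SIM_SUCCESS 0

/* What MPI_Allreduce(&local, &max, 1, MPI_INT, MPI_MAX, comm) would hand
 * to every rank: the largest code across all ranks. */
static int simulate_allreduce_max(const int *codes, int nranks)
{
    int max = codes[0];
    for (int i = 1; i < nranks; i++)
        if (codes[i] > max)
            max = codes[i];
    return max;
}

/* Unpatched behaviour: each rank decides alone, so only the ranks whose
 * own probe succeeded go on to the next collective call. */
static int ranks_entering_collective_unpatched(const int *codes, int nranks)
{
    int n = 0;
    for (int i = 0; i < nranks; i++)
        if (codes[i] == SIM_SUCCESS)
            n++;
    return n;
}

/* Patched behaviour: every rank acts on the same reduced value, so either
 * all ranks continue or all of them return with the error. */
static int ranks_entering_collective_patched(const int *codes, int nranks)
{
    int agreed = simulate_allreduce_max(codes, nranks);
    return agreed == SIM_SUCCESS ? nranks : 0;
}

/* A collective hangs when some, but not all, ranks call it. */
static int would_hang(int entering, int nranks)
{
    return entering > 0 && entering < nranks;
}
```

With codes {0, 17, 0, 0}, the unpatched path sends three of four ranks into the collective (a hang), while the patched path sends none and every rank reports error 17.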

> 
> ==rob
> 

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B




More information about the mpich-discuss mailing list