[MPICH] Behaviour if MPI_File_open fails on some nodes
Robert Latham
robl at mcs.anl.gov
Tue Aug 21 11:43:50 CDT 2007
On Tue, Aug 21, 2007 at 10:11:47AM -0500, Robert Latham wrote:
> On Tue, Aug 21, 2007 at 10:52:12AM +0100, James S Perrin wrote:
> > Sorry to be pedantic, but did you use a leading dir path that didn't
> > exist on the other processors? I get the correct behaviour if the dir
> > paths exist on all processors but the files don't.
>
> Thanks! That was the missing factor. Ok, that should definitely not
> happen, and since I can reproduce it here, I should be able to fix it
> soon. I'll send you a patch.
Please try this patch. I'm a little wary of putting yet another
Allreduce in the open path, but I don't think there's another way.
Index: src/mpi/romio/adio/common/ad_fstype.c
===================================================================
RCS file: /home/MPI/cvsMaster/romio/adio/common/ad_fstype.c,v
retrieving revision 1.54
diff -u -w -p -r1.54 ad_fstype.c
--- src/mpi/romio/adio/common/ad_fstype.c 12 Mar 2007 20:40:40 -0000 1.54
+++ src/mpi/romio/adio/common/ad_fstype.c 21 Aug 2007 16:41:56 -0000
@@ -503,7 +503,7 @@ tables in a reasonable way. -- Rob, 06/0
 void ADIO_ResolveFileType(MPI_Comm comm, char *filename, int *fstype,
                           ADIOI_Fns **ops, int *error_code)
 {
-    int myerrcode, file_system, min_code;
+    int myerrcode, file_system, min_code, max_code;
     char *tmp;
     static char myname[] = "ADIO_RESOLVEFILETYPE";
@@ -514,6 +514,15 @@ void ADIO_ResolveFileType(MPI_Comm comm,
         ADIO_FileSysType_fncall(filename, &file_system, &myerrcode);
         if (myerrcode != MPI_SUCCESS) {
             *error_code = myerrcode;
+        }
+
+        /* the check for file system type will hang if any process got an error
+         * in ADIO_FileSysType_fncall (this could happen if a full path exists
+         * on one node but not on others, and no prefix like ufs: was provided)
+         */
+        MPI_Allreduce(error_code, &max_code, 1, MPI_INT, MPI_MAX, comm);
+        if (max_code != MPI_SUCCESS) {
+            *error_code = max_code;
             return;
         }
>
> ==rob
>
--
Rob Latham
Mathematics and Computer Science Division A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA B29D F333 664A 4280 315B