[mpich-discuss] RE: MPI_File_open() fails on local + NFS file system

Rajeev Thakur thakur at mcs.anl.gov
Tue Apr 26 16:27:10 CDT 2011


ROMIO uses file system calls such as statvfs to determine the type of file system. Only if that method doesn't work for some reason on a particular system is the explicit prefix on the file name needed. So you shouldn't hard-code "nfs:" into the code and then run it everywhere, such as on Lustre or Windows.
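
For example, a minimal sketch of the two forms (the file name "datafile" is a placeholder; error handling omitted):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;

    MPI_Init(&argc, &argv);

    /* Normal, portable call: ROMIO determines the file system type itself
       (e.g. with statvfs) and selects the matching ADIO driver. */
    MPI_File_open(MPI_COMM_WORLD, "datafile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_close(&fh);

    /* Explicit override: force the NFS driver; only needed when the
       automatic detection fails on a particular system. */
    MPI_File_open(MPI_COMM_WORLD, "nfs:datafile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}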

Rajeev


On Apr 26, 2011, at 4:12 PM, Audet, Martin wrote:

> Hi,
> 
> Thanks for your response.
> 
> We did try what you suggested, that is, adding the "nfs:" prefix to the file name passed to MPI_File_open(), and it did remove the error in both cases (case 1: the process of rank 0 is on the master node using the local file system; case 2: the process of rank 0 is on a compute node accessing the file via NFS).
> 
> However, we think this solution is not appropriate for us, because in our case the processes may sometimes all run on the same local Unix file system, on the same Lustre file system, or even on the same Windows file system when the code is compiled on Windows.
> 
> What would happen if the "nfs:" prefix were hardcoded into every call to MPI_File_open() and the code were then run on a cluster using Lustre or on a Windows machine (in particular, on Windows would "nfs:" be interpreted as a drive name?)?
> 
> Moreover, we don't like this idea because these file-name prefixes aren't part of the MPI standard and we want to keep our application portable.
> 
> We think it is the job of the MPI implementation to identify the current file system and select the appropriate driver (the ADIO driver, in the case of ROMIO).
> 
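> To illustrate our concern: if a prefix had to be used at all, we would rather keep it out of the source, for example with a small hypothetical helper like the one below (the environment variable name MPIIO_FILE_PREFIX is just an example, not a real MPICH or ROMIO setting):
> 
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> /* Hypothetical sketch: take an optional ROMIO driver prefix (e.g. "nfs:")
>    from the environment instead of hardcoding it in every call. */
> static int open_with_optional_prefix(MPI_Comm comm, const char *name,
>                                      int amode, MPI_File *fh)
> {
>     const char *prefix = getenv("MPIIO_FILE_PREFIX");   /* may be unset */
>     char full[4096];
> 
>     snprintf(full, sizeof(full), "%s%s", prefix ? prefix : "", name);
>     return MPI_File_open(comm, full, amode, MPI_INFO_NULL, fh);
> }
> 
> int main(int argc, char **argv)
> {
>     MPI_File fh;
> 
>     MPI_Init(&argc, &argv);
>     open_with_optional_prefix(MPI_COMM_WORLD, "datafile",
>                               MPI_MODE_CREATE | MPI_MODE_WRONLY, &fh);
>     MPI_File_close(&fh);
>     MPI_Finalize();
>     return 0;
> }
> 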
> We looked in the documentation to see what happens when the processes access the same file via different file systems (UFS and NFS in this case) and we found nothing.
> 
> So the question I was asking in my first e-mail still stands (unfortunately it was phrased badly there):
> 
>  We would like to know whether the configuration we use is legal or not.
> 
> Regards,
> 
> Martin Audet
> 
> P.S. Here is an example of the error messages we get when no "nfs:" prefix is added.
> 
> case 1: rank 0 on master node (UFS), rank 1 on compute node (NFS) 
> 
> The program hangs and prints on stderr:
> 
> Fatal error in PMPI_Bcast: Message truncated, error stack:
> PMPI_Bcast(1429)........................: MPI_Bcast(buf=0x608fd8, count=1, MPI_CHAR, root=0, comm=0x84000000) failed
> MPIR_Bcast_impl(1272)...................: 
> MPIR_Bcast_intra(1106)..................: 
> MPIR_Bcast_binomial(143)................: 
> MPIDI_CH3_PktHandler_EagerShortSend(350): Message from rank 0 and tag 2 truncated; 4 bytes received but buffer size is 1
> [cli_1]: aborting job:
> Fatal error in PMPI_Bcast: Message truncated, error stack:
> PMPI_Bcast(1429)........................: MPI_Bcast(buf=0x608fd8, count=1, MPI_CHAR, root=0, comm=0x84000000) failed
> MPIR_Bcast_impl(1272)...................: 
> MPIR_Bcast_intra(1106)..................: 
> MPIR_Bcast_binomial(143)................: 
> MPIDI_CH3_PktHandler_EagerShortSend(350): Message from rank 0 and tag 2 truncated; 4 bytes received but buffer size is 1
> Fatal error in PMPI_Barrier: Other MPI error, error stack:
> PMPI_Barrier(425).........................: MPI_Barrier(comm=0x84000000) failed
> MPIR_Barrier_impl(331)....................: Failure during collective
> MPIR_Barrier_impl(313)....................: 
> MPIR_Barrier_intra(83)....................: 
> MPIC_Sendrecv(195)........................: 
> MPIC_Wait(540)............................: 
> MPIDI_CH3i_Progress_wait(213).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(421): 
> MPIDU_Socki_handle_read(651)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
> [cli_0]: aborting job:
> Fatal error in PMPI_Barrier: Other MPI error, error stack:
> PMPI_Barrier(425).........................: MPI_Barrier(comm=0x84000000) failed
> MPIR_Barrier_impl(331)....................: Failure during collective
> MPIR_Barrier_impl(313)....................: 
> MPIR_Barrier_intra(83)....................: 
> MPIC_Sendrecv(195)........................: 
> MPIC_Wait(540)............................: 
> MPIDI_CH3i_Progress_wait(213).............: an error occurred while handling an event returned by MPIDU_Sock_Wait()
> MPIDI_CH3I_Progress_handle_sock_event(421): 
> MPIDU_Socki_handle_read(651)..............: connection failure (set=0,sock=1,errno=104:Connection reset by peer)
> mpiexec: Warning: tasks 0-1 exited with status 1.
> [audet at mc1 experiences]$ 
> 
> case 2: rank 1 on master node (UFS), rank 0 on compute node (NFS) 
> 
> The program freezes in MPI_File_open(). By attaching to the process (rank 1) and doing a backtrace we get:
> 
> GNU gdb Red Hat Linux (6.6-8.fc7rh)
> Copyright (C) 2006 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu"...
> (no debugging symbols found)
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Attaching to program: /home/audet/experiences/create_toto, process 19616
> Reading symbols from /home/publique/mpich2-ch3_sock-1.4rc2/lib/libmpich.so.3...done.
> Loaded symbols for /usr/local/mpich2-ch3_sock-1.4rc2/lib/libmpich.so.3
> Reading symbols from /home/publique/mpich2-ch3_sock-1.4rc2/lib/libopa.so.1...done.
> Loaded symbols for /usr/local/mpich2-ch3_sock-1.4rc2/lib/libopa.so.1
> Reading symbols from /home/publique/mpich2-ch3_sock-1.4rc2/lib/libmpl.so.1...done.
> Loaded symbols for /usr/local/mpich2-ch3_sock-1.4rc2/lib/libmpl.so.1
> Reading symbols from /lib64/librt.so.1...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libpthread.so.0...done.
> [Thread debugging using libthread_db enabled]
> [New Thread 46912505079232 (LWP 19616)]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libc.so.6...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> Reading symbols from /lib64/libnss_files.so.2...done.
> Loaded symbols for /lib64/libnss_files.so.2
> 0x0000003cd6eda8ef in poll () from /lib64/libc.so.6
> (gdb) bt
> #0  0x0000003cd6eda8ef in poll () from /lib64/libc.so.6
> #1  0x00002aaaaac27dcf in MPIDU_Sock_wait (sock_set=0x6024f8, millisecond_timeout=-1, eventp=0x7fff25242f40) at sock_wait.i:124
> #2  0x00002aaaaab41eb8 in MPIDI_CH3I_Progress (blocking=<value optimized out>, state=0x7fff25242fb0) at ch3_progress.c:185
> #3  0x00002aaaaabb39b8 in MPIC_Wait (request_ptr=0x2aaaaaec4fa0) at helper_fns.c:539
> #4  0x00002aaaaabb6349 in MPIC_Recv (buf=0x7fff252434c4, count=1, datatype=1275069445, source=0, tag=<value optimized out>, comm=-2080374784, status=0x1) at helper_fns.c:103
> #5  0x00002aaaaabb6832 in MPIC_Recv_ft (buf=0x7fff252434c4, count=1, datatype=1275069445, source=0, tag=2, comm=-2080374784, status=0x1, errflag=0x7fff25243384) at helper_fns.c:615
> #6  0x00002aaaaab2ee0f in MPIR_Bcast_binomial (buffer=0x7fff252434c4, count=1, datatype=1275069445, root=0, comm_ptr=<value optimized out>, errflag=0x7fff25243384) at bcast.c:138
> #7  0x00002aaaaab3043b in MPIR_Bcast_intra (buffer=0x7fff252434c4, count=1, datatype=1275069445, root=0, comm_ptr=0x2aaaaaec7440, errflag=0x7fff25243384) at bcast.c:1102
> #8  0x00002aaaaab310e6 in MPIR_Bcast_impl (buffer=0x605928, count=2, datatype=-1, root=-1, comm_ptr=0x0, errflag=0x2aaaaaec52d0) at bcast.c:1271
> #9  0x00002aaaaab3149e in PMPI_Bcast (buffer=0x7fff252434c4, count=1, datatype=1275069445, root=0, comm=-2080374784) at bcast.c:1414
> #10 0x00002aaaaab0b090 in ADIOI_GEN_OpenColl (fd=0x608a48, rank=<value optimized out>, access_mode=5, error_code=0x7fff252434c4) at ad_opencoll.c:52
> #11 0x00002aaaaab0ad3a in ADIO_Open (orig_comm=1140850688, comm=-2080374784, filename=0x2 <Address 0x2 out of bounds>, file_system=152, ops=0x2aaaaaec0520, access_mode=5, disp=0, 
>    etype=1275068685, filetype=1275068685, info=469762048, perm=-1, error_code=0x7fff252434c4) at ad_open.c:147
> #12 0x00002aaaaabe8f00 in PMPI_File_open (comm=1140850688, filename=0x400888 "toto", amode=5, info=469762048, fh=0x7fff25243518) at open.c:152
> #13 0x0000000000400764 in main ()
> (gdb) detach
> 
> 
> Note: For these two cases we used mpich2-1.4rc2 with the ch3:sock device, but the output is similar with older versions and other devices (we tested many combinations).
> 
> 
> ________________________________________
> From: mpich-discuss-bounces at mcs.anl.gov [mpich-discuss-bounces at mcs.anl.gov] on behalf of Rajeev Thakur [thakur at mcs.anl.gov]
> Sent: April 21, 2011 17:31
> To: mpich-discuss at mcs.anl.gov
> Cc: Charland, Denis
> Subject: Re: [mpich-discuss] MPI_File_open() fails on local + NFS file system
> 
> Try adding the prefix "nfs:" to the file name passed to MPI_File_open.
> 
> Rajeev
> 
> On Apr 21, 2011, at 4:20 PM, Audet, Martin wrote:
> 
>> Hi MPICH_Developers,
>> 
>> We are unable to use MPI_File_open() on a cluster where the first node (the master node) mounts a local file system and exports it via NFS to a few compute nodes, so that /home on both the master node and the compute nodes refers to the same directory.
>> 
>> When a job composed of one (or more) process on the master node and one (or more) process on a compute node is started, calling MPI_File_open() to create a new file either makes the program abort (if the process of rank 0 is on the master node using the local file system) or freeze (if the process of rank 0 is on a compute node accessing the file via NFS).
>> 
>> When the program freezes, an inspection with gdb shows that the process of rank 0 is stuck in an MPI_Bcast() called by MPI_File_open().
>> 
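>> For reference, a minimal sketch of the kind of program involved (the file name and the trailing write are illustrative; error handling omitted):
>> 
>> #include <mpi.h>
>> 
>> int main(int argc, char **argv)
>> {
>>     MPI_File fh;
>>     int      rank;
>>     char     byte = 'x';
>> 
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> 
>>     /* Collective create/open of a file in the NFS-exported /home directory;
>>        this call aborts or hangs when one rank sees the directory as a local
>>        file system and the other sees it over NFS. */
>>     MPI_File_open(MPI_COMM_WORLD, "toto",
>>                   MPI_MODE_CREATE | MPI_MODE_WRONLY,
>>                   MPI_INFO_NULL, &fh);
>> 
>>     /* Each rank writes one byte at its own offset (illustrative only). */
>>     MPI_File_write_at(fh, (MPI_Offset)rank, &byte, 1, MPI_CHAR,
>>                       MPI_STATUS_IGNORE);
>> 
>>     MPI_File_close(&fh);
>>     MPI_Finalize();
>>     return 0;
>> }
>> 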
>> Note that this happens with many mpich2 versions from 1.0.7 to 1.4rc2.
>> 
>> So we would like to know if the configuration we use is legal or not.
>> 
>> Thanks,
>> 
>> Martin Audet
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> 
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


