[mpich-discuss] Program hangs when accessing MPI file

Pablo Guaza Peces pabloguaza at ugr.es
Wed Sep 19 02:32:51 CDT 2012


That's right, the head node is the one exporting the NFS share to the rest of the nodes, so it detects the file system as local while the others detect it as remote.

What kind of real parallel file systems could I use? I guess that implies having some specific hardware (a SAN or similar), is that right?

Cheers!

On 18/09/12 14:53, Rajeev Thakur wrote:
> It should normally work without the prefix because MPICH tries to detect the file system automatically. However, in this case one process detects the file system as a local file system (ufs), whereas the others detect it as remotely mounted via NFS, hence the problem.
>
> The solution is to use a real parallel file system, not nfs :-).
>
>
> On Sep 18, 2012, at 2:14 AM, Pablo Guaza Peces wrote:
>
>> Wow!
>> It worked!!
>>
>> Do you know if there's a way to make it work like that but without specifying in the code that the file is on an NFS share? Most of the users of this cluster run their programs in different data centers, and I'd like the programs to work in all of them without changing the code.
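>>
>> A possible workaround (just a sketch; DATAFILE_PREFIX is a made-up environment variable, not an MPICH feature) could be to build the file name at run time from a variable set per data center, so the "nfs:" prefix never has to be hardcoded:
>>
>> #include "mpi.h"
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char **argv)
>> {
>>     MPI_File fh;
>>     char filename[256];
>>     /* e.g. export DATAFILE_PREFIX=nfs: in the PBS script on this cluster,
>>        and leave it unset where auto-detection already works */
>>     const char *prefix = getenv("DATAFILE_PREFIX");
>>
>>     snprintf(filename, sizeof(filename), "%sdatafile", prefix ? prefix : "");
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_File_open(MPI_COMM_WORLD, filename,
>>                   MPI_MODE_CREATE | MPI_MODE_RDWR,
>>                   MPI_INFO_NULL, &fh);
>>     MPI_File_close(&fh);
>>     MPI_Finalize();
>>     return 0;
>> }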
>>
>> Thanks so much!! :)
>>
>> On Mon, Sep 17 2012, 16:46:46 CEST, Rajeev Thakur wrote:
>>> Try prefixing the file name with nfs: as "nfs:datafile"
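>>>
>>> With the test program quoted below, only the file name string changes (a minimal sketch of that suggestion):
>>>
>>> /* The "nfs:" prefix tells ROMIO to use its NFS driver on every process
>>>    instead of auto-detecting the file system type per process. */
>>> MPI_File_open(MPI_COMM_WORLD, "nfs:datafile",
>>>               MPI_MODE_CREATE | MPI_MODE_RDWR,
>>>               MPI_INFO_NULL, &fh);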
>>>
>>>
>>> On Sep 17, 2012, at 3:53 AM, Pablo Guaza Peces wrote:
>>>
>>>> Hi everybody!
>>>> I've been having this problem for a while now and I haven't been able to solve it:
>>>> Whenever I access an MPI shared file, my program freezes and gives no output or errors. I wrote this very simple program in C to test it:
>>>>
>>>> #include "mpi.h"
>>>> #include <stdio.h>
>>>>
>>>> int main(int argc, char **argv)
>>>> {
>>>>     MPI_File fh;
>>>>
>>>>     MPI_Init(&argc,&argv);
>>>>
>>>>     MPI_File_open(MPI_COMM_WORLD, "datafile",
>>>>                   MPI_MODE_CREATE | MPI_MODE_RDWR,
>>>>                   MPI_INFO_NULL, &fh);
>>>>
>>>>     MPI_File_close(&fh);
>>>>
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>>
>>>>
>>>> As I said, it freezes and I have to kill it myself with the qdel command. It actually creates the file "datafile", and there is no output in the error or output files other than the messages related to the job being killed manually.
>>>>
>>>> I submit this program to Torque with this PBS script:
>>>> #! /bin/bash
>>>> #PBS -S /bin/bash
>>>> #PBS -A batch
>>>> #PBS -N test_mpi_file
>>>> #PBS -l nodes=2:ppn=2
>>>> #PBS -l walltime=00:02:50
>>>> #PBS -j oe
>>>>
>>>> cd $PBS_O_WORKDIR
>>>>
>>>> mpiexec.hydra -rmk pbs /home/pablo/Programs/mbg/c/test_mpi_file
>>>>
>>>> I have the following software configuration:
>>>> - mpich2 1.2.1 using Hydra
>>>> - Torque 2.5.7
>>>> - Maui 3.2.6
>>>>
>>>> Maybe it has something to do with the NFS home directory that is shared with all the nodes, because I can run the program without any problem on a single machine, whether it is the head node or any other. It only fails when two or more machines access the file.
>>>>
>>>> Is there any way I could try to debug the program while it is running on at least two nodes?
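>>>>
>>>> One small aid (a sketch, not tested on this setup) is to print each rank's host name and check the return code of MPI_File_open, since MPI I/O routines return error codes by default instead of aborting; dropping this into the test program above in place of the plain open call shows, if the open fails on some node rather than hanging, which rank on which host reported what:
>>>>
>>>> int rank, len, err;
>>>> char host[MPI_MAX_PROCESSOR_NAME], msg[MPI_MAX_ERROR_STRING];
>>>>
>>>> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>> MPI_Get_processor_name(host, &len);
>>>> printf("rank %d running on %s\n", rank, host);
>>>> fflush(stdout);
>>>>
>>>> err = MPI_File_open(MPI_COMM_WORLD, "datafile",
>>>>                     MPI_MODE_CREATE | MPI_MODE_RDWR,
>>>>                     MPI_INFO_NULL, &fh);
>>>> if (err != MPI_SUCCESS) {
>>>>     /* Translate the error code into a readable message */
>>>>     MPI_Error_string(err, msg, &len);
>>>>     printf("rank %d (%s): MPI_File_open failed: %s\n", rank, host, msg);
>>>> }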
>>>>
>>>> Any help would be much appreciated!
>>>>
>>>> Thanks



More information about the mpich-discuss mailing list