My program gets stuck ... Bug?

BERCHET ADRIEN adrien.berchet at univ-poitiers.fr
Tue Apr 29 14:59:21 CDT 2014


  

Here is the code without Boost::mpi (the Boost calls are just commented out and replaced by plain MPI functions).

I run it with the command:

for i in {1..100}; do echo $i && mpiexec -n 20 ./pnetcdf_test; done

I have 8 cores available and the filesystem is ext4.
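
Since the hang only shows up on some runs, the loop above can be wrapped so that it stops at the first run that exceeds a time limit instead of blocking forever. This is only a sketch: it assumes GNU coreutils `timeout` is installed (it exits with status 124 when the limit is hit), and the helper name `run_until_hang` and the 60-second limit are made up for illustration.

```shell
# Hypothetical helper: run a command up to RUNS times and stop at the
# first run that exceeds LIMIT seconds (assumes GNU coreutils `timeout`).
run_until_hang() {
    local runs=$1 limit=$2 i=1 status
    shift 2
    while [ "$i" -le "$runs" ]; do
        echo "$i"
        status=0
        # timeout kills the command after LIMIT seconds and returns 124
        timeout "$limit" "$@" || status=$?
        if [ "$status" -eq 124 ]; then
            echo "run $i exceeded ${limit}s (possible hang)" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

It would be called as `run_until_hang 100 60 mpiexec -n 20 ./pnetcdf_test`; the number of the stuck run is printed on stderr.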
---

Adrien Berchet

Institut P'
Cnrs - Université de Poitiers - Ensma
UPR 3346
Département D2 : Fluides, Thermique, Combustion
Axe Hydrodynamique et Écoulements Environnementaux
SP2MI - Téléport 2
Boulevard Marie et Pierre Curie, BP 30179
F86962 Futuroscope Chasseneuil Cedex - France

Office: 165
Mail: adrien.berchet at univ-poitiers.fr [4]
Phone: 05 49 49 69 51

On Tue, 29 Apr 2014 14:25:46 -0500, Wei-keng Liao wrote:

> Hi, Adrien
>
> Can you send me the code with Boost::mpi removed?
>
> Also, please let me know the command line you used.
> Essentially, I need to know how many cores are available and how
> many you used. Are you using a parallel file system? (What file
> system do you use?)
>
> Wei-keng
>
> On Apr 29, 2014, at 2:21 PM, BERCHET ADRIEN wrote:
>
>> Hi,
>>
>> Thank you for your quick answer.
>>
>> I am using version 1.4.1 of PnetCDF, and mpiexec --version says:
>> mpiexec (OpenRTE) 1.4.3. I am currently testing on Ubuntu 12.04,
>> 64-bit.
>>
>> I tried the sample program you pointed to and it works fine: I ran
>> it several hundred times with no issue. I also tried removing
>> Boost::mpi from my code, but it does not change anything.
>>
>> Regards,
>> ---
>> Adrien Berchet
>> Mail: adrien.berchet at univ-poitiers.fr [3]
>>
>> On Tue, 29 Apr 2014 13:20:59 -0500, Wei-keng Liao wrote:
>>
>>> Hi, Adrien
>>>
>>> Please let us know which version of PnetCDF you are using, and
>>> also the version of MPI.
>>>
>>> Your code looks fine to me (at least the PnetCDF part). If the
>>> problem only happens when the number of MPI processes is larger
>>> than the number of available cores, it may be caused by MPI-IO.
>>> Have you seen the same problem with a pure MPI-IO program? A
>>> sample MPI-IO program can be found at
>>> https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpi/romio/test/coll_perf.c [2]
>>>
>>> Wei-keng
>>>
>>> On Apr 29, 2014, at 12:35 PM, BERCHET ADRIEN wrote:
>>>
>>>> Hi there,
>>>>
>>>> I am not sure this is the right place to ask, but I don't know
>>>> where else I can get help ...
>>>>
>>>> I wrote a very small program to write NetCDF files using PnetCDF.
>>>> Most of the time it works well and the NetCDF file is properly
>>>> generated (I checked with ncdump). But sometimes the program just
>>>> gets stuck and runs indefinitely (it seems to happen only when
>>>> the number of MPI processes is larger than the number of
>>>> available cores, but I am not sure about this). The program gets
>>>> stuck when it calls ncmpi_put_vara_double_all().
>>>>
>>>> Could someone have a look at the code and tell me what is wrong,
>>>> please? I looked for a solution for hours but could not find
>>>> anything.
>>>>
>>>> Thank you very much!
>>>>
>>>> Adrien
>>>> --
>>>> Adrien Berchet
>>>> Mail: adrien.berchet at univ-poitiers.fr [1]
 

Links:
------
[1] mailto:adrien.berchet at univ-poitiers.fr
[2] https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpi/romio/test/coll_perf.c
[3] mailto:adrien.berchet at univ-poitiers.fr
[4] mailto:adrien.berchet at univ-poitiers.fr
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pnetcdf_test.cpp
Type: text/x-c
Size: 4329 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/parallel-netcdf/attachments/20140429/0e7351a0/attachment.bin>


More information about the parallel-netcdf mailing list