<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
<html><body>
<p>Hi,</p>
<p>OK, I will try to build a newer version of OpenMPI and tell you what happens (I am not sure I will have time this week, so my answer may come late). If that does not fix the problem, I will also try to generate a backtrace.</p>
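<p>Below is a minimal bash sketch of one way to catch a hang for the backtrace. It replaces the plain <code>for i in {1..100}</code> loop quoted further down in this thread. Each run starts in the background and is polled once per second. The loop stops at the first run that exceeds a time limit and leaves that run alive, so gdb can still be attached to its processes. The helper name <code>repeat_until_hang</code> and the time limit are hypothetical choices. They are not part of OpenMPI or PnetCDF.</p>

```shell
#!/usr/bin/env bash
# Sketch: rerun a command until one run exceeds a time limit, then stop and
# leave that run alive so a debugger can be attached to it.
# repeat_until_hang is a hypothetical helper name, not part of any MPI tool.
#
# Usage: repeat_until_hang RUNS LIMIT_SECONDS COMMAND [ARGS...]
# Returns 0 if every run finished within the limit, 1 at the first
# suspected hang (that run is left running), 2 on bad arguments.
repeat_until_hang() {
  if [ "$#" -lt 3 ]; then
    echo "usage: repeat_until_hang RUNS LIMIT_SECONDS COMMAND [ARGS...]" >&2
    return 2
  fi
  local runs=$1 limit=$2
  shift 2
  local i pid waited status
  for ((i = 1; i <= runs; i++)); do
    echo "run $i"
    "$@" &                 # start this run in the background
    pid=$!
    waited=0
    # Poll once per second while the run is still alive.
    while kill -0 "$pid" 2>/dev/null; do
      if [ "$waited" -ge "$limit" ]; then
        echo "run $i still running after ${limit}s; PID $pid left alive" >&2
        return 1
      fi
      sleep 1
      waited=$((waited + 1))
    done
    # The run finished within the limit; report a non-zero exit status.
    status=0
    wait "$pid" || status=$?
    if [ "$status" -ne 0 ]; then
      echo "run $i exited with status $status" >&2
    fi
  done
  return 0
}
```

<p>For example, <code>repeat_until_hang 100 300 mpiexec -n 20 ./pnetcdf_test</code> would repeat the original 20-process run up to 100 times. The 300-second limit is only a guess. Runs were reported to slow down after about 50 iterations, so the limit should be generous enough that a slow run is not mistaken for a hang.</p>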
<p>Thanks again!</p>
<div>---<br />
<pre><span style="font-size: x-small; color: #808080;"><span style="font-size: small; color: #000000;">Adrien Berchet<br /></span><br /></span></pre>
<pre><span style="font-size: x-small; color: #808080;">Institut P'<br />Cnrs - Université de Poitiers - Ensma<br />UPR 3346<br />Département D2 : Fluides, Thermique, Combustion<br />Axe Hydrodynamique et Écoulements Environnementaux<br />SP2MI - Téléport 2<br />Boulevard Marie et Pierre Curie, BP 30179<br />F86962 Futuroscope Chasseneuil Cedex - France<br /><br />Bureau : 165<br />Mail : <a href="mailto:adrien.berchet@univ-poitiers.fr" target="_blank">adrien.berchet@univ-poitiers.fr</a><br />Téléphone : 05 49 49 69 51</span></pre>
</div>
<p>On Tue, 29 Apr 2014 17:04:48 -0500, Rob Latham wrote:</p>
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">
<pre>On 04/29/2014 04:49 PM, Wei-keng Liao wrote:</pre>
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">Hi Adrien,<br /><br />I tried your code and run script twice with MPICH and once with OpenMPI, and did not see a hang. The first 50 iterations ran fast and then started to slow down, but they all finished eventually with no error. The problem you encountered may be related to OpenMPI: version 1.4.3 is rather old. Could you try the latest version? Mine is OpenMPI 1.6.5. Rob Latham has contributed many fixes to the MPI-IO module of OpenMPI since 1.4.3, and I believe they solved many problems, possibly including the one you are seeing.</blockquote>
<pre>I concur with Wei-keng, although I can't think offhand of which OpenMPI
fix since 1.4.3 would address this...
If you cannot upgrade, try to get your program stuck in one of these
hangs, then attach gdb to one or several of the processes (rank 0 might
be of most interest) and generate a backtrace. That will at least give
us an idea of why your program is hanging.
==rob</pre>
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">Wei-keng<br /><br />On Apr 29, 2014, at 2:59 PM, BERCHET ADRIEN wrote:
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">Here is the code without Boost::mpi (the Boost::mpi calls are just commented out and replaced by plain MPI functions). I run it with the command:<br /><code>for i in {1..100}; do echo $i && mpiexec -n 20 ./pnetcdf_test; done</code><br />I have 8 cores available and the file system is ext4.<br /><br />On Tue, 29 Apr 2014 14:25:46 -0500, Wei-keng Liao wrote:
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">Hi Adrien,<br /><br />Can you send me the code with Boost::mpi removed? Also, please let me know the command line you used. In particular, I need to know how many cores are available and how many processes you used. Are you using a parallel file system? (Which file system do you use?)<br /><br />Wei-keng<br /><br />On Apr 29, 2014, at 2:21 PM, BERCHET ADRIEN wrote:
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">Hi,<br /><br />Thank you for your quick answer. I am using version 1.4.1 of PnetCDF, and <code>mpiexec --version</code> says: mpiexec (OpenRTE) 1.4.3. I am currently testing on Ubuntu 12.04 (64-bit). I tried the sample program you suggested and it works fine: I ran it several hundred times with no issue. I also tried removing Boost::mpi from my code, but it does not change anything.<br /><br />Regards,<br />Adrien Berchet<br /><br />On Tue, 29 Apr 2014 13:20:59 -0500, Wei-keng Liao wrote:
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">Hi Adrien,<br /><br />Could you let us know which version of PnetCDF you are using, and which version of MPI? Your code looks fine to me (at least the PnetCDF part). If the problem only happens when the number of MPI processes is larger than the number of available cores, it may be caused by MPI-IO. Have you seen the same problem with a pure MPI-IO program? A sample MPI-IO program can be found at <a href="https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpi/romio/test/coll_perf.c">https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpi/romio/test/coll_perf.c</a><br /><br />Wei-keng<br /><br />On Apr 29, 2014, at 12:35 PM, BERCHET ADRIEN wrote:
<blockquote type="cite" style="padding-left:5px; border-left:#1010ff 2px solid; margin-left:5px; width:100%">Hi there,<br /><br />I am not sure this is the right place to ask, but I don't know where else to get help with this. I wrote a very small program that writes NetCDF files using PnetCDF. Most of the time it works well and the NetCDF file is properly generated (I checked with ncdump). But sometimes the program just gets stuck and runs indefinitely (it seems to happen only when the number of MPI processes is larger than the number of available cores, but I am not sure about this). The program gets stuck when it calls <code>ncmpi_put_vara_double_all()</code>. Could someone have a look at the code and tell me what is wrong, please? I looked for a solution for hours but could not find anything.<br /><br />Thank you very much!<br />Adrien</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre>
<span class="sig">-- Rob Latham Mathematics and Computer Science Division Argonne National Lab, IL USA </span></pre>
</blockquote>
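<p>Below is a bash sketch for the backtrace step Rob describes. It assumes gdb, pgrep (procps) and GNU grep are installed. The hypothetical helper <code>dump_backtraces</code> attaches gdb in batch mode to every running process with a given name and prints a backtrace of all its threads. With Open MPI, each process should also carry an <code>OMPI_COMM_WORLD_RANK</code> environment variable. The sketch prints it so that rank 0 can be picked out. On Ubuntu, the Yama <code>ptrace_scope</code> setting may block attaching to a process that is not a child of the debugger, so this may need to be run with sudo.</p>

```shell
#!/usr/bin/env bash
# Sketch: dump a backtrace of every thread of every running process named
# NAME (e.g. pnetcdf_test), using gdb in batch mode.
# dump_backtraces is a hypothetical helper, not part of any MPI distribution.
# Assumes gdb, pgrep (procps) and GNU grep are available.
#
# Usage: dump_backtraces NAME
# Returns 0 after dumping, 1 if no process matches, 2 on bad arguments.
dump_backtraces() {
  if [ "$#" -ne 1 ]; then
    echo "usage: dump_backtraces NAME" >&2
    return 2
  fi
  local pids pid rank
  # -x matches the process name exactly, so the launcher (named mpiexec)
  # is not picked up along with the MPI ranks.
  pids=$(pgrep -x "$1")
  if [ -z "$pids" ]; then
    echo "no running process named '$1'" >&2
    return 1
  fi
  for pid in $pids; do
    # Open MPI exports each process's rank in its environment
    # (NUL-separated in /proc); rank stays empty if it is not found.
    rank=$(grep -z '^OMPI_COMM_WORLD_RANK=' "/proc/$pid/environ" 2>/dev/null | tr -d '\0')
    echo "===== PID $pid ${rank:-rank unknown} ====="
    # -batch runs the -ex command and exits; gdb detaches from an attached
    # process when it exits, so the process keeps running afterwards.
    gdb -p "$pid" -batch -ex "thread apply all bt" 2>&1
  done
  return 0
}
```

<p>For example, while a run is stuck, <code>dump_backtraces pnetcdf_test &gt; backtraces.txt</code> would collect one backtrace per rank into a single file that could be posted to the list.</p>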
</body></html>