pNetCDF problem with WRF on large core counts

Regimbal, Kevin Kevin.Regimbal at nrel.gov
Mon Aug 5 11:17:44 CDT 2013


Intel MPI version 4.1.0
OpenMPI version 1.6.4
MVAPICH2 version 1.8.1

We're running Lustre 1.8.9 on the clients and 2.1.3 on the MDS/OSSs.

I was running the perform-test-pnetcdf.c I found on this page:
http://software.intel.com/en-us/forums/topic/373166

I tested both scaling the number of processes and Lustre stripe sizes
with parallel-netcdf version 1.3.1.

The test runs successfully at lfs stripe sizes of 1, 64, and 100 in all
three MPIs.
The test runs successfully for 2048 cores in all three MPIs.
The test only runs successfully on mvapich2 for 4096 cores.
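
For reference, what that test boils down to is a collective PnetCDF write
from every MPI rank.  A stripped-down sketch of the same pattern (this is
not the actual perform-test-pnetcdf.c; the file name, per-rank element
count, and stripe-count hint are placeholders) looks like:

/* Minimal collective-write sketch in the spirit of perform-test-pnetcdf.c.
 * Each rank writes its own contiguous slab of a 1-D double variable. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <pnetcdf.h>

#define CHECK(err) do { if ((err) != NC_NOERR) { \
    fprintf(stderr, "PnetCDF error: %s\n", ncmpi_strerror(err)); \
    MPI_Abort(MPI_COMM_WORLD, 1); } } while (0)

int main(int argc, char **argv)
{
    int rank, nprocs, ncid, dimid, varid, err;
    MPI_Offset i, start[1], count[1];
    const MPI_Offset nper = 1048576;       /* placeholder per-rank count */
    double *buf;
    MPI_Info info;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Striping can also be requested through ROMIO hints instead of an
     * lfs setstripe on the directory; "4" here is just an example value. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "4");

    err = ncmpi_create(MPI_COMM_WORLD, "testfile.nc",
                       NC_CLOBBER | NC_64BIT_OFFSET, info, &ncid);
    CHECK(err);
    err = ncmpi_def_dim(ncid, "x", nper * nprocs, &dimid);
    CHECK(err);
    err = ncmpi_def_var(ncid, "var", NC_DOUBLE, 1, &dimid, &varid);
    CHECK(err);
    err = ncmpi_enddef(ncid);
    CHECK(err);

    buf = malloc(nper * sizeof(double));
    for (i = 0; i < nper; i++) buf[i] = (double)rank;
    start[0] = (MPI_Offset)rank * nper;
    count[0] = nper;

    /* Collective write: every rank in the communicator must call it. */
    err = ncmpi_put_vara_double_all(ncid, varid, start, count, buf);
    CHECK(err);

    err = ncmpi_close(ncid);
    CHECK(err);
    free(buf);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}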

Kevin


On 8/5/13 7:50 AM, "Rob Latham" <robl at mcs.anl.gov> wrote:

>On Mon, Aug 05, 2013 at 06:38:56AM -0600, Michalakes, John wrote:
>> Hi,
>> 
>> We're running into problems running WRF with pNetCDF and it may have
>>something to do with which MPI implementation we use.  Both Intel MPI
>>and OpenMPI fail (hang processes) on MPI task counts greater than 256.
>>Mvapich2 works, however.  This is using the Lustre file system on a
>>Sandy Bridge Linux cluster here.  Are you aware of any task limits
>>associated with MPI-IO in these implementations that might be causing
>>the problem? Any ideas for reconfiguring?   There's a little more
>>information in the email stream below.
>
>"n0224:c7f1:167b2700: 82084370 us(82084370 us!!!):  CONN_RTU read: sk 77
>ERR 0x68, rcnt=-1, v=7 -> 172.20.5.34 PORT L-bac4 R-e1c6 PID L-0 R-0"
>
>
>Something timed out.  Either the infiniband layer or the Lustre layer
>is -- rightly -- astonished that a request took 82 seconds.
>
>So, what's different about Intel MPI, OpenMPI and Mvapich2 with
>respect to Lustre?  Can you give me version numbers for the three
>packages?   I'm asking because we have over the years improved the
>Lustre driver in MPICH's ROMIO thanks to community contributions.  I
>*thought* those changes had made it into the various "downstream" MPI
>implementations.
>
>MVAPICH2 at one point (and maybe still) had an alternate Lustre
>driver, which may explain why it performs well.  As it turns out, when
>an MPI implementation pays attention to a file system, good things
>happen.  Go figure!
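>
>One quick way to see what each MPI is actually doing on Lustre is to
>open a file there and print back the hints the MPI-IO layer reports;
>ROMIO-based stacks and an alternate Lustre driver tend to expose
>different hint sets (striping_factor, cb_nodes, romio_ds_write, and so
>on).  A small sketch, with a placeholder path:
>
>  /* Print the MPI-IO hints in effect for a file on Lustre. */
>  #include <stdio.h>
>  #include <mpi.h>
>
>  int main(int argc, char **argv)
>  {
>      MPI_File fh;
>      MPI_Info info;
>      int rank, nkeys, i, flag;
>      char key[MPI_MAX_INFO_KEY], val[MPI_MAX_INFO_VAL];
>
>      MPI_Init(&argc, &argv);
>      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>      /* placeholder path on the Lustre file system */
>      MPI_File_open(MPI_COMM_WORLD, "/lustre/scratch/hints_probe",
>                    MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
>      MPI_File_get_info(fh, &info);
>
>      if (rank == 0) {
>          MPI_Info_get_nkeys(info, &nkeys);
>          for (i = 0; i < nkeys; i++) {
>              MPI_Info_get_nthkey(info, i, key);
>              MPI_Info_get(info, key, MPI_MAX_INFO_VAL - 1, val, &flag);
>              if (flag) printf("%s = %s\n", key, val);
>          }
>      }
>
>      MPI_Info_free(&info);
>      MPI_File_close(&fh);
>      MPI_Finalize();
>      return 0;
>  }
>
>Running that under Intel MPI, OpenMPI and MVAPICH2 should show quickly
>whether they are even driving Lustre through the same code path.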
>
>==rob
>
>> 
>> -----Original Message-----
>> From: Regimbal, Kevin
>> Sent: Sunday, August 04, 2013 1:33 PM
>> To: Michalakes, John
>> 
>> Hi John,
>> 
>> I've been playing with parallel-netcdf this weekend.  As far as I can
>>tell, pnetcdf does not work at large core counts (e.g. 4096) for either
>>Intel MPI or OpenMPI.  It does work with mvapich2 at 4096 cores.  I
>>added a build for mvapich2, plus pnetcdf builds tied to Intel MPI and mvapich2.
>> 
>> It's probably going to take a while to track down why large core counts
>>don't work.  Not sure whether the issue is pnetcdf or MPI-IO in the other MPIs.
>> 
>> Kevin
>> ________________________________________
>> From: Michalakes, John
>> Sent: Thursday, August 01, 2013 3:42 PM
>> Cc: Regimbal, Kevin
>> 
>> [...]
>> 
>> Regarding pNetCDF, this time the executable just hung reading the first
>>input file.  I then did another run and made sure to put an lfs
>>setstripe -c 4 . command in the runscript.  It hung again but this time
>>at least one of the tasks output this strange message before hanging:
>> 
>> n0224:c7f1:167b2700: 82084370 us(82084370 us!!!):  CONN_RTU read: sk 77
>>ERR 0x68, rcnt=-1, v=7 -> 172.20.5.34 PORT L-bac4 R-e1c6 PID L-0 R-0
>> 
>> [...]
>> 
>> I tried another run, this time copying the input data into the
>>directory instead of accessing it via a symlink.  Then I saw this:
>> 
>> n0240:c9e5:50cde700: 64137185 us(64137185 us!!!):  CONN_REQUEST:
>>SOCKOPT ERR No route to host -> 172.20.3.55 53644 - ABORTING 5
>> n0240:c9e5:50cde700: 64137227 us(42 us): dapl_evd_conn_cb() unknown
>>event 0x0
>> n0240:c9e5:50cde700: 64173162 us(35935 us):  CONN_REQUEST: SOCKOPT ERR
>>No route to host -> 172.20.3.56 53631 - ABORTING 5
>> n0240:c9e5:50cde700: 64173192 us(30 us): dapl_evd_conn_cb() unknown
>>event 0x0
>> 
>
>-- 
>Rob Latham
>Mathematics and Computer Science Division
>Argonne National Lab, IL USA


