problem with 1.1.0 on SGI ALTIX

Joeckel, Patrick Patrick.Joeckel at dlr.de
Tue Dec 8 05:56:40 CST 2009


Dear parallel-netcdf developers and users,

With the same code that I recently ran successfully on an IBM power6
(SLES 10, poe, GPFS; see my e-mail to this list with the subject
"bad performance"), I am running into trouble on an SGI ALTIX:

uname -a
Linux a01 2.6.16.60-0.42.5.547.0.PTF.352893-default #1 SMP Mon Aug 24 09:41:41 UTC 2009 ia64 ia64 ia64 GNU/Linux
The parallel environment is "MPI on ALTIX", which is "MPT version 1.17".
The parallel file system is CXFS.
The F90-compiler is: ifort (IFORT) 10.1 20090817

First, I had to set
        MPI_TYPE_DEPTH=16
        MPI_TYPE_MAX=131072
in the run-time environment; otherwise the model terminates with error
messages about the respective limits.
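
For reference, a minimal shell sketch of how these two settings could be
applied to a single launch without touching the rest of the job
environment. The helper name run_with_mpt_limits is hypothetical and is
not taken from the original job script; only the two variable names and
values come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: run the given command with the two MPT datatype
# limits quoted above set only for that command.
run_with_mpt_limits() {
    env MPI_TYPE_DEPTH=16 MPI_TYPE_MAX=131072 "$@"
}
```

A launch would then look like `run_with_mpt_limits mpirun -np 4 ./model`,
where the process count and the executable name are placeholders.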

Then I receive
*** glibc detected ***  double free or corruption (!prev): 0x6000000020bc0000 ***
with a traceback pointing to an issue in
/opt/sgi/mpt/mpt-1.22/lib/libmpi.so

After some "research" I found that setting
         MALLOC_CHECK_=0
in the run-time environment avoids this abort, but now I receive
      Rank 0, Process 1353414 received signal SIGSEGV(11)

All other processes keep waiting until they are killed by the queuing
system when the time limit is reached.
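
Regarding the MALLOC_CHECK_ setting above: as documented in mallopt(3),
the value 0 only tells glibc to ignore the heap errors it detects and to
continue, so memory may still be corrupted afterwards. The opposite
setting can be sketched as follows, assuming a glibc that still honours
MALLOC_CHECK_ and a queuing system that permits core files. The helper
name run_with_heap_checks is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run the given command so that glibc aborts at the
# first heap error it detects instead of ignoring it.
run_with_heap_checks() {
    (
        # Allow core files; carry on if the hard limit forbids it.
        ulimit -c unlimited 2>/dev/null || true
        # 3 = print a diagnostic (bit 0) and then abort (bit 1).
        MALLOC_CHECK_=3
        export MALLOC_CHECK_
        "$@"
    )
}
```

The subshell keeps both settings local to the one command, so the rest of
the job script is unaffected. A traceback or core file obtained this way
may point closer to the first bad free than the later SIGSEGV does.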

With serial netcdf output the code runs fine!

Has anyone seen the same or a similar problem, or, even better, does
anyone have a solution or an idea what the issue might be?

Any help is very much appreciated.

Yours,

Patrick


-- 
------------------------------------------------------------------
 Dr. Patrick Joeckel
 Deutsches Zentrum fuer Luft- und Raumfahrt e.V.
 in der Helmholtz-Gemeinschaft
 Institut fuer Physik der Atmosphaere

 Muenchner Strasse 20
 Oberpfaffenhofen, D-82234 Wessling

 Phone : +49-8153-28-2565
 Fax   : +49-8153-28-1841
 E-mail: Patrick.Joeckel at dlr.de
 Web   : http://www.dlr.de/ipa/
------------------------------------------------------------------


