problem with 1.1.0 on SGI ALTIX
Joeckel, Patrick
Patrick.Joeckel at dlr.de
Tue Dec 8 05:56:40 CST 2009
Dear parallel-netcdf developers and users,
With the same code that I recently ran successfully on an IBM Power6
(SLES 10, POE, GPFS; see my e-mail to this list with the subject
"bad performance"), I am running into trouble on an SGI ALTIX:
uname -a
Linux a01 2.6.16.60-0.42.5.547.0.PTF.352893-default #1 SMP Mon Aug 24 09:41:41 UTC 2009 ia64 ia64 ia64 GNU/Linux
The parallel environment is "MPI on ALTIX", which is "MPT version 1.17".
The parallel file system is CXFS.
The F90-compiler is: ifort (IFORT) 10.1 20090817
First, I had to set the following in the run-time environment:
MPI_TYPE_DEPTH=16
MPI_TYPE_MAX=131072
otherwise the model terminates with error messages about the respective
limits.
Then I receive
*** glibc detected *** double free or corruption (!prev): 0x6000000020bc0000 ***
with a traceback pointing to an issue in
/opt/sgi/mpt/mpt-1.22/lib/libmpi.so
After some "research" I figured out that with
MALLOC_CHECK_=0
in the run-time environment, this issue could be avoided, but now I receive
Rank 0, Process 1353414 received signal SIGSEGV(11)
All other processes keep waiting until they are killed by the
queuing engine once the time limit is reached.
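For reference, the run-time settings described above could be collected in a job-script fragment like the one below. This is only a sketch: the three variable names and values are the ones quoted in this message, while the launch line is a hypothetical placeholder (launcher, process count, and executable name are all assumptions) and is left commented out.

```shell
#!/bin/sh
# MPT settings quoted in the message; without them the model
# terminates with error messages about the respective limits.
export MPI_TYPE_DEPTH=16
export MPI_TYPE_MAX=131072

# Tells glibc to ignore detected heap corruption. This suppresses the
# "double free or corruption" abort but does not remove its cause.
export MALLOC_CHECK_=0

# Print the settings so that they show up in the job log.
echo "MPI_TYPE_DEPTH=$MPI_TYPE_DEPTH"
echo "MPI_TYPE_MAX=$MPI_TYPE_MAX"
echo "MALLOC_CHECK_=$MALLOC_CHECK_"

# Hypothetical launch line (launcher, process count, and executable
# name are assumptions, not taken from the message):
# mpirun -np 4 ./model.exe
```

Note that MALLOC_CHECK_=0 only silences the glibc check; whatever triggered it is still present.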
With serial netcdf output the code runs fine!
Has anyone seen the same or a similar problem, or (even better) does
anyone have a solution or an idea of what the issue might be?
Any help is very much appreciated.
Yours,
Patrick
--
------------------------------------------------------------------
Dr. Patrick Joeckel
Deutsches Zentrum fuer Luft- und Raumfahrt e.V.
in der Helmholtz-Gemeinschaft
Institut fuer Physik der Atmosphaere
Muenchner Strasse 20
Oberpfaffenhofen, D-82234 Wessling
Phone : +49-8153-28-2565
Fax : +49-8153-28-1841
E-mail: Patrick.Joeckel at dlr.de
Web : http://www.dlr.de/ipa/
------------------------------------------------------------------