[mpich-discuss] Mpi-collective abort:

Thejna Tharammal ttharammal at marum.de
Sat Aug 28 15:57:24 CDT 2010


Thank you Dave and Dr. Correa,
I'll try it with the latest version of MPI then :-)
Thejna.
 
----------------original message-----------------
From: "Gus Correa" gus at ldeo.columbia.edu
To: "Mpich Discuss" mpich-discuss at mcs.anl.gov
Date: Fri, 27 Aug 2010 11:58:13 -0400
-------------------------------------------------
 
 
> Hi Thejna
> 
> The MPI distributed with the PGI compilers is MPICH-1, which is
> too old, no longer maintained, and doesn't play well with current Linux
> kernels.
> 
> I suggest that you build MPICH2 from source code and use it.
> You should test if it works correctly with the cpi.c program,
> before you move to CCSM, which is a very complex program.
> 
> For specific help on how to build CCSM3 or CCSM4, you may want
> to post your questions on the NCAR CGD Bulletin Board:
> 
> http://bb.cgd.ucar.edu/
> 
> Most problems with CCSM are *not* related to MPI,
> but to the build scripts.
> 
> I hope this helps.
> 
> Gus Correa
> 
> 
> Thejna Tharammal wrote:
>> Hi,
>> I'm using MPICH version 1.0.5 on a Linux cluster running Scientific Linux
>> (kernel 2.6.18-128.1.6.el5, x86_64).
>> 
>> Thank you,
>> Thejna.
>> ----------------original message-----------------
>> From: "Rajeev Thakur" thakur at mcs.anl.gov
>> To: mpich-discuss at mcs.anl.gov
>> Date: Thu, 26 Aug 2010 15:54:32 -0500
>> -------------------------------------------------
>> 
>> 
>>> Which version of MPICH2 are you using and on what platform and OS?
>>>
>>> Rajeev
>>>
>>> On Aug 26, 2010, at 8:22 AM, Thejna Tharammal wrote:
>>>
>>>> I'm trying to run a model (CCSM, MPMD with 5 executables) on a 64-bit
>>>> Intel Xeon cluster with 6 nodes (3 GHz each, Ethernet). I use the MPI2
>>>> distributed with pgi-7.2-5. The model compiles without errors, but
>>>> during the run it shows:
>>>>
>>>>
>>>> 1: (cpl_domain_compare) domain #1 name: atm contract domain
>>>> 1: (cpl_domain_compare) domain #2 name: lnd contract domain
>>>> 1: (cpl_domain_compare) domain #1 name: ocn contract domain
>>>> 1: (cpl_domain_compare) domain #2 name: ice contract domain
>>>> rank 0 in job 1 xxxx_35825 caused collective abort of all ranks
>>>> exit status of rank 0: killed by signal 9
>>>>
>>>> What could be the reason for this?
>>>>
>>>> Thank you,
>>>>
>>>> Thejna.
>>>>
>>>> _______________________________________________
>>>> mpich-discuss mailing list
>>>> mpich-discuss at mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>
>> 
>> 
> 
> 
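The advice quoted above is to check a fresh MPICH2 build with the cpi.c example before moving on to CCSM. As a rough illustration of what that test computes, here is a minimal serial sketch in C. It is not cpi.c itself and makes no MPI calls: the names partial_pi and estimate_pi are made up for this sketch, and the loop over "ranks" only stands in for the MPI_Reduce step of the real program. The midpoint-rule integral of 4/(1+x^2) on [0,1] and the strided split of intervals across ranks follow the usual cpi.c scheme.

```c
#include <assert.h>

/* Hypothetical helper: the share of the pi integral that one rank would
 * compute. Rank r handles intervals r+1, r+1+size, r+1+2*size, ... */
double partial_pi(int rank, int size, int n)
{
    double h = 1.0 / (double)n;   /* width of one interval */
    double sum = 0.0;

    for (int i = rank + 1; i <= n; i += size) {
        double x = h * ((double)i - 0.5);   /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }
    return h * sum;
}

/* Hypothetical helper: adds up the partial results of all simulated ranks.
 * In the real MPI program this sum is done by a reduction onto rank 0. */
double estimate_pi(int size, int n)
{
    assert(size > 0 && n > 0);

    double total = 0.0;
    for (int rank = 0; rank < size; rank++)
        total += partial_pi(rank, size, n);
    return total;
}
```

Calling estimate_pi(4, 10000) from a small main() should give a value close to pi whatever the number of simulated ranks. If the real cpi.c, run under the new MPICH2 build, prints a similar value on all nodes, the MPI installation itself is probably sound.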



