Hi everyone,

I want to run the Community Climate System Model (CCSM) on our machine under MPICH2. I compiled it successfully; however, I get some MPI error messages while running it.

1) In the run script I ask for 32 CPUs (using the PBS batch system). After starting up the mpd daemons, I launch the model with the following line (a sketch of the surrounding job script is included after point 3 below):

    /mnt/storage-space/disk1/mpich/bin/mpiexec -l -n 2 $EXEROOT/all/cpl : -n 2 $EXEROOT/all/csim : -n 8 $EXEROOT/all/clm : -n 4 $EXEROOT/all/pop : -n 16 $EXEROOT/all/cam

The job exits almost immediately after I qsub it, with error messages like:

    rank 5 in job 1 compute-0-10.local_46741 caused collective abort of all ranks
      exit status of rank 5: return code 1

and

    14: Fatal error in MPI_Cart_shift: Invalid communicator, error stack:
    14: MPI_Cart_shift(172): MPI_Cart_shift(MPI_COMM_NULL, direction=1, displ=1, source=0x2582aa0, dest=0x2582aa4) failed
    14: MPI_Cart_shift(80).: Null communicator
    15: Fatal error in MPI_Cart_shift: Invalid communicator, error stack:
    15: MPI_Cart_shift(172): MPI_Cart_shift(MPI_COMM_NULL, direction=1, displ=1, source=0x2582aa0, dest=0x2582aa4) failed
    5: Assertion failed in file helper_fns.c at line 337: 0
    15: MPI_Cart_shift(80).: Null communicator
    5: memcpy argument memory ranges overlap, dst_=0xf2c37f4 src_=0xf2c37f4 len_=4
    9: Assertion failed in file helper_fns.c at line 337: 0
    5:
    9: memcpy argument memory ranges overlap, dst_=0x1880ce64 src_=0x1880ce64 len_=4
    5: internal ABORT - process 5
    9:
    9: internal ABORT - process 9
    4: Assertion failed in file helper_fns.c at line 337: 0
    4: memcpy argument memory ranges overlap, dst_=0x1c9615d0 src_=0x1c9615d0 len_=4
    4:
    4: internal ABORT - process 4

2) What puzzles me is that if I drop any one of the five components (cpl, csim, clm, pop, cam), the model runs successfully. For example, if I remove "cpl" and write

    /mnt/storage-space/disk1/mpich/bin/mpiexec -l -n 2 $EXEROOT/all/csim : -n 8 $EXEROOT/all/clm : -n 4 $EXEROOT/all/pop : -n 16 $EXEROOT/all/cam

everything is fine. But as soon as I run all five at the same time, the errors shown above appear.

3) I guessed that asking for a few more CPUs might help, so I gave it a try: I requested 34 CPUs while still using only 2+2+8+4+16 = 32 of them, but the MPI errors are still there.
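For completeness, here is roughly what the relevant part of my PBS job script looks like. This is only a sketch: the PBS directives, the machinefile handling, and the mpdboot options are approximations of my setup, and EXEROOT is set earlier in the CCSM scripts; the mpiexec line itself is exactly the one quoted in point 1.

    #!/bin/csh -f
    #PBS -N ccsm_run
    #PBS -l nodes=8:ppn=4        # 32 CPUs in total (approximate; actual node layout may differ)

    cd $PBS_O_WORKDIR

    # Build a machinefile with one line per node for mpdboot
    sort -u $PBS_NODEFILE > mpd.hosts
    set NNODES = `wc -l < mpd.hosts`

    # Start the mpd daemons on the allocated nodes
    /mnt/storage-space/disk1/mpich/bin/mpdboot -n $NNODES -f mpd.hosts

    # Launch the five CCSM components with MPMD syntax (exact line from point 1;
    # EXEROOT is set earlier in the CCSM run script)
    /mnt/storage-space/disk1/mpich/bin/mpiexec -l -n 2 $EXEROOT/all/cpl : -n 2 $EXEROOT/all/csim : -n 8 $EXEROOT/all/clm : -n 4 $EXEROOT/all/pop : -n 16 $EXEROOT/all/cam

    # Shut the mpd ring down when the run finishes
    /mnt/storage-space/disk1/mpich/bin/mpdallexit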
te " <BR>/mnt/storage-space/disk1/mpich/bin/mpiexec -l -n 2 $EXEROOT/all/csim : -n 8 $EXEROOT/all/clm : -n 4 $EXEROOT/all/pop : -n 16 $EXEROOT/all/cam" will be ok.<BR>but if I run all of the five at the same time, the error message as mentioned above will appear.<BR> <BR>3) If ask for a few more cpus, things may become better, I guess. So I have a try . Ask for 34 cpus but still use 2+2+8+4+16=32 cpus, mpi error message still exists.<BR> <BR>How should I solve the problem?<BR>Anyone can give some suggestions?<BR> <BR>Thanks in advace!<BR> <BR> <BR>L. S<BR>                                            <br /><hr />聊天+搜索+邮箱 想要轻松出游,手机MSN帮你搞定! <a href='http://3g.msn.cn/' target='_new'>立刻下载!</a></body>