[mpich-discuss] mpi runs for 15 hours, using 7 mins cpu

Rajeev Thakur thakur at mcs.anl.gov
Wed Feb 4 11:28:02 CST 2009


Hmm... Not sure what is going on here. Is your job expected to take 15
hours? You may also want to try using the Nemesis communication channel in
MPICH2, which will use shared memory for communication within a node and TCP
(or other network) across nodes. Configure with --with-device=ch3:nemesis.

Rajeev


> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Mary 
> Ellen Fitzpatrick
> Sent: Wednesday, February 04, 2009 10:46 AM
> To: mpich-discuss at mcs.anl.gov; Mary Ellen Fitzpatrick
> Subject: [mpich-discuss] mpi runs for 15 hours, using 7 mins cpu
> 
> I have a dual dual-core Opteron cluster running CentOS 5, torque-2.3.6, 
> maui-3.2.6p21, mpich2-1.0.8 (64-bit) and a docking program, parallel 
> dock6.  I installed dock6 serial as 32-bit, then installed dock6 
> parallel as 32-bit.
> I have configured my queues and scripts to run the dock mpi jobs and 
> they do run to completion without errors.
> 
> The problem I am seeing is that my MPI job runs for a total of 
> 15 hours, but uses only ~7 minutes of CPU time.
> outfile
> Limits:         ncpus=4,neednodes=1,nodes=1,walltime=48:00:00
> Resources:      cput=00:06:55,mem=9964kb,vmem=279836kb,walltime=15:12:46
> 
> When the job is running, I log into the node and can see the CPUs at 
> 100%, so it is not sitting idle, and there is no NFS traffic to 
> speak of.
> 
> Anyone run into this issue before?  Is this an mpi issue?
> 
> -- 
> Thanks
> Mary Ellen
> 
> 
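
To put the quoted accounting line in proportion, here is a minimal sketch that turns the reported cput and walltime into a utilization figure. The helper names are hypothetical and not part of Torque or MPICH2, and dividing by ncpus assumes all four requested cores were meant to be busy for the whole run.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string, as printed in the job summary, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cput: str, walltime: str, ncpus: int) -> float:
    """Fraction of the available core-seconds that was accounted as CPU time.

    Assumption: the job could have used `ncpus` cores for the full walltime.
    """
    return hms_to_seconds(cput) / (hms_to_seconds(walltime) * ncpus)


if __name__ == "__main__":
    # Values copied from the quoted job summary.
    eff = cpu_efficiency("00:06:55", "15:12:46", 4)
    print(f"{eff:.2%}")  # prints 0.19%
```

A figure this low does not match CPUs observed at 100%, so either the processes spend their time outside accounted CPU work or the scheduler is not attributing their CPU time to the job. The thread does not settle which.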


