[mpich-discuss] mpi runs for 15 hours, using 7 mins cpu

Mary Ellen Fitzpatrick mfitzpat at bu.edu
Wed Feb 4 12:13:14 CST 2009


Thanks. I recompiled with Nemesis, using the additional configure options below:
$ ./configure --prefix=/usr/local/mpich2.nemesis --enable-cxx \
    --enable-threads=multiple --with-thread-package=posix --enable-shared \
    --enable-sharedlibs=gcc --with-device=ch3:nemesis \
    --with-python=/usr/bin/python

I ran my MPI jobs on a smaller dataset: the run time was ~17 minutes, with
only 8 seconds of CPU usage.

Job run time and CPU usage with the Nemesis build:
Session:        13944
Limits:         ncpus=4,neednodes=1,nodes=1,walltime=48:00:00
Resources:      cput=00:00:08,mem=9960kb,vmem=279864kb,walltime=00:17:32
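To put the Resources line above in proportion, the Python sketch below computes cput / (walltime * ncpus), i.e. the fraction of the four allocated cores that the job's accounted CPU time would fill. The helper names (`hms_to_seconds`, `parse_resources`, `cpu_utilization`) are hypothetical illustrations, not part of Torque; the parsing assumes the comma-separated `key=value` layout shown in the output above.

```python
# Hypothetical helpers: parse a Torque-style "Resources:" line and report how
# much of the allocated CPU capacity the accounted CPU time represents.


def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_resources(line: str) -> dict:
    """Split 'key=value,key=value' pairs, dropping a leading 'Resources:' label."""
    stripped = line.strip()
    if stripped.startswith("Resources:"):
        stripped = stripped[len("Resources:"):]
    fields = {}
    for pair in stripped.strip().split(","):
        key, _, value = pair.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def cpu_utilization(resources_line: str, ncpus: int) -> float:
    """Fraction of ncpus cores kept busy: cput / (walltime * ncpus)."""
    fields = parse_resources(resources_line)
    cput = hms_to_seconds(fields["cput"])
    walltime = hms_to_seconds(fields["walltime"])
    return cput / (walltime * ncpus)


if __name__ == "__main__":
    nemesis = "Resources:      cput=00:00:08,mem=9960kb,vmem=279864kb,walltime=00:17:32"
    original = "Resources:      cput=00:06:55,mem=9964kb,vmem=279836kb,walltime=15:12:46"
    for label, line in (("nemesis", nemesis), ("original", original)):
        print(f"{label}: {cpu_utilization(line, ncpus=4):.2%} of 4 cores")
```

For the Nemesis run this works out to 8 / (1052 * 4), about 0.19% of four cores. The original 15-hour run quoted below (415 s of cput over 54766 s of walltime) comes out at essentially the same ratio, roughly 0.76% of a single core in both cases.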


Basically the same issue: long run times with minimal CPU usage.
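A rough way to cross-check Torque's cput figure is to time a command directly. The sketch below (Python, POSIX only; `measure` is an illustrative function, not a Torque or MPICH2 tool) runs a command to completion and compares its wall-clock time with the user plus system CPU time that `resource.getrusage(resource.RUSAGE_CHILDREN)` reports for finished child processes.

```python
# Sketch: run a command, then compare its wall-clock time with the CPU time
# accumulated by its terminated, waited-for child processes (POSIX only).
import resource
import subprocess
import sys
import time


def measure(cmd: list) -> tuple:
    """Return (wall_seconds, child_cpu_seconds) for running cmd to completion."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User + system CPU time added by children that exited during this call.
    # Processes started by some other, already-running daemon are NOT counted.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu


if __name__ == "__main__":
    # Default stand-in workload: a short busy loop in a child Python process.
    cmd = sys.argv[1:] or [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    wall, cpu = measure(cmd)
    print(f"wall={wall:.2f}s cpu={cpu:.2f}s ratio={cpu / wall:.2f}")
```

One caveat: RUSAGE_CHILDREN only counts descendants that have terminated and been waited for. Processes launched by a separate, already-running daemon (for example, the mpd daemons that MPICH2 1.0.x uses by default to start ranks) are not descendants of the wrapped command, so their CPU time would not appear in this figure.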


Rajeev Thakur wrote:
> Hmm... Not sure what is going on here. Is your job expected to take 15
> hours? You may also want to try using the Nemesis communication channel in
> MPICH2, which will use shared memory for communication within a node and TCP
> (or other network) across nodes. Configure with --with-device=ch3:nemesis.
>
> Rajeev
>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov 
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Mary 
>> Ellen Fitzpatrick
>> Sent: Wednesday, February 04, 2009 10:46 AM
>> To: mpich-discuss at mcs.anl.gov; Mary Ellen Fitzpatrick
>> Subject: [mpich-discuss] mpi runs for 15 hours, using 7 mins cpu
>>
>> I have a dual-socket, dual-core Opteron cluster running CentOS 5,
>> torque-2.3.6, maui-3.2.6p21, mpich2-1.0.8 (64-bit) and a docking
>> program, parallel dock6. I installed dock6 serial as 32-bit, then
>> installed dock6 parallel as 32-bit.
>> I have configured my queues and scripts to run the dock MPI jobs, and
>> they do run to completion without errors.
>>
>> The problem I am seeing is that my MPI job runs for a total of
>> 15 hours but uses only ~7 minutes of CPU time.
>> From the output file:
>> Limits:         ncpus=4,neednodes=1,nodes=1,walltime=48:00:00
>> Resources:      cput=00:06:55,mem=9964kb,vmem=279836kb,walltime=15:12:46
>>
>> While the job is running, I can log into the node and see the CPUs
>> at 100%, so it is not sitting idle, and there is no NFS traffic to
>> speak of.
>>
>> Has anyone run into this issue before? Is this an MPI issue?
>>
>> -- 
>> Thanks
>> Mary Ellen
>>

-- 
Thanks
Mary Ellen


