[mpich-discuss] mpi runs for 15 hours, using 7 mins cpu
Mary Ellen Fitzpatrick
mfitzpat at bu.edu
Wed Feb 4 13:45:09 CST 2009
Recompiled without --enable-threads=multiple.
Same result. I am now checking whether Torque is reporting CPU time correctly.
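
For comparing the two runs quoted below, here is a minimal sketch that turns a Torque "Resources" line into a CPU-time to wall-time ratio. The helper names (parse_hms, parse_resources, cpu_utilization) are made up for illustration and are not part of Torque or MPICH2; the input strings are the ones from the job output in this thread, and dividing by ncpus assumes all four requested cores were meant to be busy.

```python
# Sketch only: helper names are hypothetical, not Torque or MPICH2 tools.

def parse_hms(value):
    """Convert an HH:MM:SS string (as printed by Torque) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_resources(line):
    """Split a 'key=value,key=value' Resources line into a dict of strings."""
    return dict(item.split("=", 1) for item in line.strip().split(","))


def cpu_utilization(resources_line, ncpus=1):
    """Return reported CPU seconds divided by (wall seconds * ncpus)."""
    res = parse_resources(resources_line)
    cput = parse_hms(res["cput"])
    wall = parse_hms(res["walltime"])
    return cput / (wall * ncpus)


if __name__ == "__main__":
    # Resource lines copied from the two job reports in this thread.
    runs = {
        "15-hour run": "cput=00:06:55,mem=9964kb,vmem=279836kb,walltime=15:12:46",
        "nemesis run": "cput=00:00:08,mem=9960kb,vmem=279864kb,walltime=00:17:32",
    }
    for name, line in runs.items():
        one_core = cpu_utilization(line)
        four_cores = cpu_utilization(line, ncpus=4)  # ncpus=4 from the Limits line
        print(f"{name}: {one_core:.2%} of one core, {four_cores:.2%} of four cores")
```

Both runs come out at well under 1% of a single core, which does not match the CPUs showing 100% on the node while the job runs.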
Rajeev Thakur wrote:
> Can you just try without the --enable-threads=multiple option? It is not
> needed. The default option is --enable-threads=runtime, which is more
> efficient. I am not sure if it will make any difference, but worth a try.
>
> It is also possible that Torque isn't reporting the CPU time correctly.
>
> Rajeev
>
>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Mary
>> Ellen Fitzpatrick
>> Sent: Wednesday, February 04, 2009 12:13 PM
>> To: mpich-discuss at mcs.anl.gov; Mary Ellen Fitzpatrick
>> Subject: Re: [mpich-discuss] mpi runs for 15 hours, using 7 mins cpu
>>
>> Thanks, I recompiled with nemesis with additional configs below.
>> $ ./configure --prefix=/usr/local/mpich2.nemesis --enable-cxx
>> --enable-threads=multiple --with-thread-package=posix --enable-shared
>> --enable-sharedlibs=gcc --with-device=ch3:nemesis
>> --with-python=/usr/bin/python
>>
>> Ran my mpi jobs on a smaller dataset. Run time ~17 minutes with 8
>> seconds of cpu usage....
>>
>> The job runtime and CPU usage with the nemesis build:
>> Session: 13944
>> Limits: ncpus=4,neednodes=1,nodes=1,walltime=48:00:00
>> Resources:
>> cput=00:00:08,mem=9960kb,vmem=279864kb,walltime=00:17:32
>>
>>
>> Basically the same issue: long run times with minimal CPU usage.
>>
>>
>> Rajeev Thakur wrote:
>>
>>> Hmm... Not sure what is going on here. Is your job expected to take 15
>>> hours? You may also want to try using the Nemesis communication channel in
>>> MPICH2, which will use shared memory for communication within a node and TCP
>>> (or other network) across nodes. Configure with --with-device=ch3:nemesis.
>>>
>>> Rajeev
>>>
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: mpich-discuss-bounces at mcs.anl.gov
>>>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Mary
>>>> Ellen Fitzpatrick
>>>> Sent: Wednesday, February 04, 2009 10:46 AM
>>>> To: mpich-discuss at mcs.anl.gov; Mary Ellen Fitzpatrick
>>>> Subject: [mpich-discuss] mpi runs for 15 hours, using 7 mins cpu
>>>>
>>>> I have a dual dual-core Opteron cluster running CentOS 5, torque-2.3.6,
>>>> maui-3.2.6p21, mpich2-1.0.8 (64-bit) and a docking program, parallel
>>>> dock6. I installed dock6 serial as 32-bit, then installed dock6
>>>> parallel as 32-bit.
>>>> I have configured my queues and scripts to run the dock MPI jobs and
>>>> they do run to completion without errors.
>>>>
>>>> The problem I am seeing is that my MPI job is running for a total of
>>>> 15 hours, but is using only ~7 minutes of CPU time.
>>>> From the outfile:
>>>> Limits: ncpus=4,neednodes=1,nodes=1,walltime=48:00:00
>>>> Resources:
>>>> cput=00:06:55,mem=9964kb,vmem=279836kb,walltime=15:12:46
>>>>
>>>> When the job is running, I log into the node and can see the CPUs at
>>>> 100%, so it is not sitting idle, and there is no NFS traffic to
>>>> speak of.
>>>>
>>>> Anyone run into this issue before? Is this an mpi issue?
>>>>
>>>> --
>>>> Thanks
>>>> Mary Ellen
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>> --
>> Thanks
>> Mary Ellen
>>
>>
>>
>
>
>
--
Thanks
Mary Ellen