[mpich-discuss] mpi runs for 15 hours, using 7 mins cpu
Rajeev Thakur
thakur at mcs.anl.gov
Wed Feb 4 12:32:05 CST 2009
Can you just try without the --enable-threads=multiple option? It is not
needed. The default option is --enable-threads=runtime, which is more
efficient. I am not sure if it will make any difference, but worth a try.
It is also possible that Torque isn't reporting the CPU time correctly.
Rajeev
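
One way to put a number on the mismatch is to compare the cput and walltime
fields from the Torque "Resources:" lines quoted below. The following is a
minimal sketch, not part of Torque or MPICH2: the helper names
hms_to_seconds and cpu_utilization are made up for illustration, and it
assumes the fields are comma-separated key=value pairs with times in
HH:MM:SS, as they appear in this thread.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string (as printed by Torque here) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_utilization(resources: str, ncpus: int) -> float:
    """Return cput / (walltime * ncpus) for a Torque-style resources string.

    `resources` is assumed to look like
    'cput=00:06:55,mem=9964kb,vmem=279836kb,walltime=15:12:46'.
    """
    fields = dict(item.split("=", 1) for item in resources.strip().split(","))
    cput = hms_to_seconds(fields["cput"])
    walltime = hms_to_seconds(fields["walltime"])
    return cput / (walltime * ncpus)


if __name__ == "__main__":
    # The two jobs reported in this thread, both run with ncpus=4.
    jobs = {
        "original run": "cput=00:06:55,mem=9964kb,vmem=279836kb,walltime=15:12:46",
        "nemesis run": "cput=00:00:08,mem=9960kb,vmem=279864kb,walltime=00:17:32",
    }
    for name, resources in jobs.items():
        print(f"{name}: {cpu_utilization(resources, ncpus=4):.2%} of available CPU time")
```

For both jobs the reported CPU time works out to roughly 0.2% of what four
fully busy cores would accumulate, which does not match CPUs observed at
100% and is consistent with the accounting, rather than the job, being off.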
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Mary
> Ellen Fitzpatrick
> Sent: Wednesday, February 04, 2009 12:13 PM
> To: mpich-discuss at mcs.anl.gov; Mary Ellen Fitzpatrick
> Subject: Re: [mpich-discuss] mpi runs for 15 hours, using 7 mins cpu
>
> Thanks, I recompiled with Nemesis, using the configure options below.
> $ ./configure --prefix=/usr/local/mpich2.nemesis --enable-cxx
> --enable-threads=multiple --with-thread-package=posix --enable-shared
> --enable-sharedlibs=gcc --with-device=ch3:nemesis
> --with-python=/usr/bin/python
>
> I ran my MPI jobs on a smaller dataset. Run time was ~17 minutes with 8
> seconds of CPU usage.
>
> The job runtime/CPU usage with the Nemesis build:
> Session: 13944
> Limits: ncpus=4,neednodes=1,nodes=1,walltime=48:00:00
> Resources:
> cput=00:00:08,mem=9960kb,vmem=279864kb,walltime=00:17:32
>
>
> Basically, the same issue: long run times with minimal CPU usage.
>
>
> Rajeev Thakur wrote:
> > Hmm... Not sure what is going on here. Is your job expected to take 15
> > hours? You may also want to try using the Nemesis communication channel in
> > MPICH2, which will use shared memory for communication within a node and TCP
> > (or other network) across nodes. Configure with --with-device=ch3:nemesis.
> >
> > Rajeev
> >
> >
> >
> >> -----Original Message-----
> >> From: mpich-discuss-bounces at mcs.anl.gov
> >> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Mary
> >> Ellen Fitzpatrick
> >> Sent: Wednesday, February 04, 2009 10:46 AM
> >> To: mpich-discuss at mcs.anl.gov; Mary Ellen Fitzpatrick
> >> Subject: [mpich-discuss] mpi runs for 15 hours, using 7 mins cpu
> >>
> >> I have a dual dual-core Opteron cluster running CentOS 5, torque-2.3.6,
> >> maui-3.2.6p21, mpich2-1.0.8 (64-bit) and a docking program, parallel
> >> dock6. I installed dock6 serial as 32-bit, then installed dock6
> >> parallel as 32-bit.
> >> I have configured my queues and scripts to run the dock MPI jobs and
> >> they do run to completion without errors.
> >>
> >> The problem I am seeing is that my MPI job is running for a total of
> >> 15 hours, but is using only ~7 minutes of CPU time.
> >> outfile
> >> Limits: ncpus=4,neednodes=1,nodes=1,walltime=48:00:00
> >> Resources:
> >> cput=00:06:55,mem=9964kb,vmem=279836kb,walltime=15:12:46
> >>
> >> When the job is running, I log into the node and can see the CPUs at
> >> 100%, so it is not sitting idle, and there is no NFS traffic to
> >> speak of.
> >>
> >> Anyone run into this issue before? Is this an mpi issue?
> >>
> >> --
> >> Thanks
> >> Mary Ellen
> >>
> >>
> >>
> >
> >
> >
>
> --
> Thanks
> Mary Ellen
>
>