[MPICH] mpich and pbs
Steve Young
chemadm at hamilton.edu
Wed Jun 20 13:29:32 CDT 2007
Well, in vasp, when the job runs in parallel you get output like the
following:
vasp.4.6.28 25Jul05 complex
executed on LinuxIFC date 2007.06.20 10:20:59
running on 8 nodes
distr: one band on 2 nodes, 4 groups
That is what we expect to see, since it shows that the job is using
both of the nodes it was allocated.
Using the OSC mpiexec, I get the following with the same job. I do see
8 processes running, but they seem to be 8 independent serial processes.
vasp.4.6.28 25Jul05 complex
executed on LinuxIFC date 2007.06.20 12:32:48
running on 1 nodes
distr: one band on 1 nodes, 1 groups
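A check like this can be scripted from the job output. The following is only a sketch: it assumes the header keeps the "running on N nodes" form shown above, and the function names and sample strings are made up for illustration.

```python
import re

def vasp_rank_count(output):
    """Return N from the first 'running on N nodes' line, or None if absent."""
    match = re.search(r"running on\s+(\d+)\s+nodes", output)
    return int(match.group(1)) if match else None

def looks_serial(output, requested):
    """True when the header reports fewer processes than the job requested."""
    count = vasp_rank_count(output)
    return count is not None and count < requested

# Sample headers copied from the two runs above.
good = "running on 8 nodes\ndistr: one band on 2 nodes, 4 groups\n"
bad = "running on 1 nodes\ndistr: one band on 1 nodes, 1 groups\n"
print(looks_serial(good, 8), looks_serial(bad, 8))
```

Here `requested` is the number of processes asked for in the job (8 in this case); a header that reports fewer is flagged.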
For amber, if I run a typical job, it acts like it is running properly and
I see all the proper processes started. But I believe it is also running
8 serial amber processes. When we try running a different part of the
amber code, it fails with:
Error: specified more groups ( 4 ) than the number of processors ( 1 ) !
This makes me believe that it, too, is running 8 serial processes. When I
use the mpiexec or mpirun from mpich, the job works fine. I only get
this with the mpiexec from OSC.
I'll try running some of the examples and see what I can come up with.
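For the cpi test, something like the sketch below could summarize the output. It assumes cpi prints one line per process of the form "Process R of N is on HOST", which should be checked against the actual output; the function name and the sample output are invented for illustration.

```python
import re
from collections import Counter

# Assumed cpi line format: "Process <rank> of <size> is on <host>"
LINE = re.compile(r"Process (\d+) of (\d+) is on (\S+)")

def summarize_cpi(output):
    """Return (set of communicator sizes seen, Counter of processes per host)."""
    sizes, hosts = set(), Counter()
    for _rank, size, host in LINE.findall(output):
        sizes.add(int(size))
        hosts[host] += 1
    return sizes, hosts

# Made-up output for a launch where every process started as its own
# one-process job: 8 lines, 4 per host, but each one says "0 of 1".
broken = "".join(f"Process 0 of 1 is on node{1 + i // 4}\n" for i in range(8))
sizes, hosts = summarize_cpi(broken)
print(sizes, dict(hosts))
```

If the only size seen is 1 while 8 lines were printed, the processes were started on the right hosts but not as one 8-process MPI job, which would match what vasp and amber report.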
-Steve
On Wed, 2007-06-20 at 12:47 -0500, Rajeev Thakur wrote:
> > However, now it appears that the program being run is in serial.
> > For example, an 8 cpu job gets started on two nodes (each node
> > has 4 cpu's - 2 dual core opterons). We see all 8 processes running on
> > the nodes. But in looking at the output it appears like a
> > serial job. I get the same results trying to use vasp and amber.
>
> What do you mean by "it appears like a serial job"? Do you mean
> performance-wise?
>
> Try running the cpi example from the examples directory on 8 processes. If
> you see 4 hostnames from 1 machine and 4 from the other, the job should be
> running ok. It's up to the OS to schedule the 4 processes on each machine.
> MPI doesn't do that.
>
> Rajeev
>
>
>
> > -----Original Message-----
> > From: owner-mpich-discuss at mcs.anl.gov
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Steve Young
> > Sent: Wednesday, June 20, 2007 10:58 AM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: [MPICH] mpich and pbs
> >
> > Hello everyone,
> > I still seem to be having an issue with getting mpich to work properly.
> > I have version mpich2-1.0.5 compiled. This works as expected when I use
> > mpiexec or mpirun. However, the nodes that jobs run on aren't in sync
> > with the nodes that PBS allocates to the job. In posting to the list
> > before I was informed to use the mpiexec from OSC that works with PBS. I
> > installed that and jobs now get started on the proper nodes that PBS
> > allocates. However, now it appears that the program being run is in
> > serial. For example, an 8 cpu job gets started on two nodes (each node
> > has 4 cpu's - 2 dual core opterons). We see all 8 processes running on
> > the nodes. But in looking at the output it appears like a serial job. I
> > get the same results trying to use vasp and amber. So I'm not sure what
> > I could do to correct this. Any ideas?
> >
> > -Steve
> >
> >
> >
>