[MPICH] Re: [torqueusers] Re: [gt-user] Run a wrapped command with MPI

Matias Alberto Gavinowich mattlistas at gmail.com
Fri May 18 09:48:27 CDT 2007


Mehdi,

OK. I clicked on a link at MPICH-G2's site and ended up at MPICH2; that's
why I assumed it was based on it.

Still, I don't think the change I made to pbs.pm will help in your case
(you can apply the patch anyway if you want; I just replaced the lines
I described in my previous email).
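
To illustrate what the mpirun branch of that change is for, here is a
minimal shell sketch. It is not the wrapper that pbs.pm generates:
fake_cpi is a hypothetical stand-in for the MPI binary so the snippet
runs without MPICH, and the -p4pg/-p4wd arguments are only an assumed
example of what an MPICH1 mpirun might append to the command line.

```shell
#!/bin/sh
# Hypothetical stand-in for the real MPI executable (e.g. ./cpi).
# It only reports the arguments it was started with.
fake_cpi() {
    printf 'argc=%s\n' "$#"
    for a in "$@"; do
        printf 'arg=[%s]\n' "$a"
    done
}

# Wrapper in the style of the original report: whatever mpirun
# appended to the command line is silently dropped.
wrapper_without_args() {
    fake_cpi
}

# Wrapper that forwards its arguments unchanged. "$@" expands to one
# word per original argument, so embedded spaces survive.
wrapper_with_args() {
    fake_cpi "$@"
}

# Assumed example arguments, for illustration only.
wrapper_without_args -p4pg "/tmp/PI file" -p4wd /tmp
wrapper_with_args -p4pg "/tmp/PI file" -p4wd /tmp
```

The first call reports argc=0 and the second argc=4, which is the
difference between a wrapper that hides mpirun's arguments from the
executable and one that passes them through.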

I don't see where in your RSL you are telling MPI which executable to
actually run. In MPICH1, when mpirun is used through Globus GRAM's
jobmanager-pbs, it is the jobmanager that calls mpirun, and in the RSL
you indicate the actual name of the executable.

I am copying your question to the mpich list (please subscribe if you
have not done so).

Regards,

Matt


On 5/18/07, Mehdi Sheikhalishahi <mehdi.alishahi at gmail.com> wrote:
> Dear Matias,
>  No, MPICH-G2 is based on MPICH1, not MPICH2. At each local site, MPICH-G2
> uses that site's own MPICH, for example MPICH1. MPICH-G2 only coordinates
> between the local MPICH schedulers.
>
>
>
>  On 5/17/07, Matias Alberto Gavinowich <mattlistas at gmail.com> wrote:
> > Mehdi,
> >
> > I don't think my patch to pbs.pm will help with your problem. I am
> > running an old version of mpich, while MPICH-G2 is, as far as I know,
> > based on the new MPICH2.
> >
> > I have not tested MPICH-G2, but I see you are specifying that the job
> > type is MPI and also specifying that the executable is from MPI. Where
> > are you stating your own executable name? jobmanager-pbs should be the
> > one invoking the MPI command; at least that is how it is done with
> > mpirun and mpiexec (I don't know about MPICH-G2).
> >
> > Regards,
> >
> > Matt
> >
> >
> > On 5/17/07, Mehdi Sheikhalishahi < mehdi.alishahi at gmail.com> wrote:
> > > Dear Matias,
> > >   After submitting the following mpich-g2 job to the Torque server via
> > > Globus, I periodically get the following messages on the Torque server,
> > > and my simple mpich-g2 job does not finish.
> > > Can you please help me? Can your patch to pbs.pm resolve my problem for
> > > mpich-g2 jobs?
> > > --------------- mpich-g2 job ---------------
> > > +
> > > (
> > >  &(resourceManagerContact="Server.eng4.shirazu.ac.ir/jobmanager-pbs")
> > >    (count=2)
> > >    (jobtype=mpi)
> > >    (label="subjob 0")
> > >    (environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
> > >                 (LD_LIBRARY_PATH /usr/local/globus-4.0.3/lib/))
> > >    (directory="/home/grid/globusTest/MPICH-G2")
> > >    (executable="/home/grid/globusTest/MPICH-G2/ring")
> > >    (stdout=TorqueOut)
> > >    (stderr=TorqueErr)
> > > )
> > >
> > > ------------------------------------------------------------
> > > 05/15/2007 11:58:56;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir, sock=11
> > > 05/15/2007 11:59:06;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir,
> > > sock=14
> > > 05/15/2007 11:59:06;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir, sock=11
> > > 05/15/2007 11:59:16;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir,
> > > sock=14
> > > 05/15/2007 11:59:16;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir , sock=11
> > > 05/15/2007 11:59:26;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir,
> > > sock=14
> > > 05/15/2007 11:59:26;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir , sock=11
> > > 05/15/2007 11:59:36;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir,
> > > sock=14
> > > 05/15/2007 11:59:36;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir, sock=11
> > > 05/15/2007 11:59:47;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir ,
> > > sock=14
> > > 05/15/2007 11:59:47;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir , sock=11
> > > 05/15/2007 11:59:57;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir,
> > > sock=14
> > > 05/15/2007 11:59:57;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir, sock=11
> > > 05/15/2007 12:00:07;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir,
> > > sock=14
> > > 05/15/2007 12:00:07;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir , sock=11
> > > 05/15/2007 12:00:17;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir,
> > > sock=14
> > > 05/15/2007 12:00:17;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir, sock=11
> > > 05/15/2007 12:00:27;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir,
> > > sock=14
> > > 05/15/2007 12:00:27;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir , sock=11
> > > 05/15/2007 12:00:37;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir ,
> > > sock=14
> > > 05/15/2007 12:00:37;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir, sock=11
> > > 05/15/2007 12:00:47;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir,
> > > sock=14
> > > 05/15/2007 12:00:47;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir , sock=11
> > > 05/15/2007 12:00:57;0100;PBS_Server;Req;;Type
> > > AuthenticateUser request received from grid at Server.eng4.shirazu.ac.ir,
> > > sock=14
> > > 05/15/2007 12:00:57;0100;PBS_Server;Req;;Type StatusJob
> > > request received from grid at Server.eng4.shirazu.ac.ir , sock=11
> > >
> > >
> > > On 5/16/07, Matias Alberto Gavinowich <mattlistas at gmail.com > wrote:
> > > >
> > > > Hello:
> > > >
> > > > Thank you all for your help. The problem was indeed the missing
> > > > command line arguments.
> > > >
> > > > I patched my pbs.pm script in the
> > > > $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager directory in the
> > > > way I describe below. It is a bit untidy, but I got it to work.
> > > >
> > > > Replaced:
> > > >
> > > >             print CMD "#!/bin/sh\n";
> > > >             print CMD 'cd ', $description->directory(), "\n";
> > > >             print CMD "$rsh_env\n";
> > > >             print CMD $description->executable(), " $args\n";
> > > >             close(CMD);
> > > >             chmod 0700, $cmd_script_name;
> > > >
> > > > With:
> > > >
> > > >             print CMD "#!/bin/sh\n";
> > > >             print CMD 'cd ', $description->directory(), "\n";
> > > >             print CMD "$rsh_env\n";
> > > >
> > > >         if ($description->jobtype() eq "mpi")
> > > >         {
> > > >             if ($mpiexec ne 'no')
> > > >             {
> > > >                 # below is the line as it originally was
> > > >                 print CMD $description->executable(), " $args\n";
> > > >             }
> > > >             else
> > > >             {
> > > >                 # mpirun case
> > > >                 print CMD $description->executable(), " \"\$@\"\n";
> > > >             }
> > > >         }
> > > >         else
> > > >         {
> > > >             # below is the line as it originally was
> > > >             print CMD $description->executable(), " $args\n";
> > > >         }
> > > >
> > > >             close(CMD);
> > > >             chmod 0700, $cmd_script_name;
> > > >
> > > >
> > > > And replaced:
> > > >
> > > >        if ($description->jobtype() eq "mpi")
> > > >         {
> > > >             if ($mpiexec ne 'no')
> > > >             {
> > > >                 my $machinefilearg = "";
> > > >                 if ($cluster)
> > > >                 {
> > > >                     $machinefilearg = ' -machinefile $PBS_NODEFILE';
> > > >                 }
> > > >                 print JOB "$mpiexec $machinefilearg -n " . $description->count();
> > > >             }
> > > >             else
> > > >             {
> > > >                 print JOB "$mpirun -np " . $description->count();
> > > >                 if ($cluster)
> > > >                 {
> > > >                     print JOB ' -machinefile $PBS_NODEFILE';
> > > >                 }
> > > >             }
> > > >
> > > >             print JOB " $cmd_script_name < " . $description->stdin() . "\n";
> > > >         }
> > > >
> > > > With:
> > > >
> > > >         if ($description->jobtype() eq "mpi")
> > > >         {
> > > >             if ($mpiexec ne 'no')
> > > >             {
> > > >                 my $machinefilearg = "";
> > > >                 if ($cluster)
> > > >                 {
> > > >                     $machinefilearg = ' -machinefile $PBS_NODEFILE';
> > > >                 }
> > > >                 print JOB "$mpiexec $machinefilearg -n " . $description->count();
> > > >                 # this line is copied from below, I separated the two
> > > >                 # cases and only modified the mpirun case
> > > >                 print JOB " $cmd_script_name < " . $description->stdin() . "\n";
> > > >             }
> > > >             else
> > > >             {
> > > >                 print JOB "$mpirun -np " . $description->count();
> > > >                 if ($cluster)
> > > >                 {
> > > >                     print JOB ' -machinefile $PBS_NODEFILE';
> > > >                 }
> > > >                 print JOB " $cmd_script_name $args < " . $description->stdin() . "\n";
> > > >             }
> > > >
> > > >             # I broke the following line into two cases above
> > > >             # print JOB " $cmd_script_name < " . $description->stdin() . "\n";
> > > >         }
> > > >
> > > > Thanks again,
> > > >
> > > > Matt
> > > >
> > > >
> > > > On 5/15/07, Brian R. Toonen < toonen at mcs.anl.gov> wrote:
> > > > > Some implementations of mpirun pass command line arguments to the
> > > > > executable, which are then used during MPI_Init() to determine how
> > > > > many processes to start, etc.  Your script isn't passing those
> > > > > command line arguments to the cpi executable.
> > > > >
> > > > > Try the following script.  It's a bit ugly, but it should work.
> > > > >
> > > > > #!/bin/sh
> > > > >
> > > > > IFS=
> > > > > args=""
> > > > > while test $# -gt 0 ; do
> > > > >     args="$args '"`echo "$1" | sed -e "s/'/'"'"'"'"'"'"'/g"`"'"
> > > > >     shift
> > > > > done
> > > > > args=`echo "$args" | sed 's/^ *//'`
> > > > >
> > > > > eval ./cpi "$args"
> > > > >
> > > > > --brian
> > > > >
> > > > > |-----Original Message-----
> > > > > |From: owner-gt-user at globus.org [mailto:owner-gt-user at globus.org]
> > > > > |On Behalf Of Matias Alberto Gavinowich
> > > > > |Sent: Monday, May 14, 2007 14:43
> > > > > |To: mpich-discuss at mcs.anl.gov; torqueusers at supercluster.org;
> > > > > |gt-user at globus.org
> > > > > |Subject: [gt-user] Run a wrapped command with MPI
> > > > > |
> > > > > |Hello:
> > > > > |
> > > > > |I am having the following problem.
> > > > > |
> > > > > |I can run a command through MPI by invoking:
> > > > > |
> > > > > |mpirun -np 2 ./cpi   (I am using my machines.LINUX default file).
> > > > > |
> > > > > |Two processes are started, as expected.
> > > > > |
> > > > > |Then, I write a wrapper script that looks like:
> > > > > |
> > > > > |#!/bin/sh
> > > > > |./cpi             (I also tried with a full path)
> > > > > |
> > > > > |and I run:
> > > > > |
> > > > > |mpirun -np 2 ./cpiwrapper
> > > > > |
> > > > > |Only one process is started in this case.
> > > > > |
> > > > > |The trick is I need it to work with a wrapper, because
> > > > > |jobmanager-pbs from globus invokes it this way (with a wrapper
> > > > > |script).
> > > > > |
> > > > > |AFAIK, I am running MPICH1.
> > > > > |
> > > > > |Has anyone come across this?
> > > > > |
> > > > > |Thank you,
> > > > > |
> > > > > |Matt
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best Regards,
> > > S.Mehdi Sheikhalishahi,
> > > Web: http://www.cse.shirazu.ac.ir/~alishahi/
> > > Bye.
> >
>
>
>
> --
>
> Best Regards,
> S.Mehdi Sheikhalishahi,
> Web: http://www.cse.shirazu.ac.ir/~alishahi/
> Bye.



