[mpich-discuss] problem with mpiexec while running parallel execution

Elie M elie.moujaes at hotmail.co.uk
Mon Feb 6 08:33:04 CST 2012


Thanks very much for the help. Actually, I could not run mpiexec -info (there was no -info option); however, a colleague gave me another (rather more complicated) script to do the parallel run. Now I get a different error, which is the following:
"/var/log/slurm/slurmd/job785314/slurm_script:
line 95: cd: /storage/fis718/ananias/GB72scfPH.785314: Input/output error
/home_cluster/fis718/eliemouj/espresso-4.3.2/bin/./pw.x: symbol lookup error:
/home_cluster/fis718/eliemouj/espresso-4.3.2/bin/./pw.x: undefined symbol:
mpi_init_

/home_cluster/fis718/eliemouj/espresso-4.3.2/bin/./pw.x: symbol lookup error:
/home_cluster/fis718/eliemouj/espresso-4.3.2/bin/./pw.x: undefined symbol:
mpi_init_"


I have googled that error but could not understand much about what the possible solution could be. I am sorry to bother you with this, but I am a Linux newbie and these problems are complicated for me at this stage. Could you please help me with this by posting a detailed solution of what can be done, if possible?
Regards
Elie
> From: jhammond at alcf.anl.gov
> Date: Sun, 5 Feb 2012 16:37:58 -0600
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] problem with mpiexec while running parallel execution
> 
> You're using mpibull2-1.3.9-18.s, which is not readily identified as a
> version of MPICH2 (maybe the developers know it), although it seems to
> be a derivative thereof.  Can you run
> "/opt/mpi/mpibull2-1.3.9-18.s/bin/mpiexec -info" to generate detailed
> version information on your MPICH2 installation?
> 
> Regardless of the version of MPICH2 you are using, your problem has to
> do with MPD, but MPD is no longer supported.  You can refer to
> http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_I_don.27t_like_.3CWHATEVER.3E_about_mpd.2C_or_I.27m_having_a_problem_with_mpdboot.2C_can_you_fix_it.3F
> for more information.
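> 
> (If you do need to stay with the MPD-based mpibull2 for now, the usual
> pattern is to start an mpd ring yourself before calling mpiexec, roughly
> like this, assuming mpibull2 ships the standard mpd tools in the same bin
> directory as its mpiexec:
> 
>     # mpd.hosts is only an example hostfile name, one hostname per line
>     /opt/mpi/mpibull2-1.3.9-18.s/bin/mpdboot -n <number_of_hosts> -f mpd.hosts
>     /opt/mpi/mpibull2-1.3.9-18.s/bin/mpiexec -n <number_of_ranks> ./pw.x
>     /opt/mpi/mpibull2-1.3.9-18.s/bin/mpdallexit
> 
> but I would not spend much time on this.)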
> 
> Hydra is the replacement for MPD and it is an excellent process
> manager.  The system administrators at your site should install a more
> recent version of MPICH2 that will have Hydra as the default process
> manager.  If your machine has Infiniband, recent versions of MVAPICH2
> (which is derived from MPICH2) will also have Hydra support.
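> 
> For example, with a Hydra-based mpiexec the launch line in your batch
> script would look something like the following (the install path and
> process count here are placeholders, not your actual setup):
> 
>     /path/to/new/mpich2/bin/mpiexec -n 8 \
>         /home_cluster/fis718/eliemouj/espresso-4.3.2/bin/pw.x \
>         < GB72ph.scf.in > GB72ph.scf.out
> 
> Hydra also recognizes SLURM allocations, so it can pick up the node list
> from the job environment without any mpd ring.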
> 
> Best,
> 
> Jeff
> 
> 
> On Sun, Feb 5, 2012 at 4:03 PM, Elie M <elie.moujaes at hotmail.co.uk> wrote:
> > Dear sir/madam,
> >
> >
> >
> > I am running a parallel executable (pw.x) on a SLURM-based Linux cluster, and
> > once I run the command sbatch filename.srm, the calculation starts running
> > and then stops with the following error:
> >
> >
> >
> > "mpiexec_veredas5: cannot connect to local mpd
> >
> >  (/tmp/mpd2.console_sushil); possible causes:
> >
> >   1. no mpd is running on this host
> >
> >  2. an mpd is running but was started without a "console" (-n option)
> >
> >  In case 1, you can start an mpd on this host with:
> >
> >     mpd &
> >
> >  and you will be able to run jobs just on this host.
> >
> >  For more details on starting mpds on a set of hosts, see
> >
> >  the MPICH2 Installation Guide."
> >
> >
> > The executable is part of the Quantum ESPRESSO (QE) package. Below is the
> > script I am using to run QE; the architecture is an Intel-based cluster.
> >
> >
> > " #!/bin/bash
> >
> > #SBATCH -o /home_cluster/fis718/eliemouj/espresso-4.3.2/GB72/GB72-script.scf.out
> > #SBATCH -N 1
> > #SBATCH --nodelist=veredas13
> > #SBATCH -J scf-GB72-ph
> > #SBATCH --account=fis718
> > #SBATCH --partition=long
> > #SBATCH --get-user-env
> > #SBATCH -e GB72ph.scf.fit.err
> >
> > /opt/mpi/mpibull2-1.3.9-18.s/bin/mpiexec /home_cluster/fis718/eliemouj/espresso-4.3.2/bin/pw.x <GB72ph.scf.in >GB72ph.scf.out
> >
> > "
> >
> > Could anyone please tell me what might be going wrong and how to fix it? I
> > am not that proficient in Linux; I would appreciate a fairly detailed
> > solution to the problem or, if possible, a pointer to where I can find one.
> > Hope to hear from you soon.
> >
> >
> > Regards
> >
> >
> > Elie Moujaes
> >
> >
> >
> >
> > _______________________________________________
> > mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> > To manage subscription options or unsubscribe:
> > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> >
> 
> 
> 
> -- 
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhammond at alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki.alcf.anl.gov/old/index.php/User:Jhammond
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
 		 	   		  