[mpich-discuss] problem with mpiexec while running parallel execution

Jeff Hammond jhammond at alcf.anl.gov
Sun Feb 5 16:37:58 CST 2012


You're using mpibull2-1.3.9-18.s, which is not readily identifiable as a
version of MPICH2 (perhaps its developers can say which one), although it
appears to be a derivative.  Can you run
"/opt/mpi/mpibull2-1.3.9-18.s/bin/mpiexec -info" to generate detailed
version information about your MPICH2 installation?

Regardless of the version of MPICH2 you are using, your problem has to
do with MPD, but MPD is no longer supported.  You can refer to
http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_I_don.27t_like_.3CWHATEVER.3E_about_mpd.2C_or_I.27m_having_a_problem_with_mpdboot.2C_can_you_fix_it.3F
for more information.
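If you are stuck on this MPD-based installation for now, a common stopgap
is to boot an MPD ring inside the batch job itself before calling mpiexec.
This is only a sketch: the paths mirror the ones in your script, and it
assumes the mpdboot/mpdallexit binaries sit next to your mpiexec and that
SLURM's node list is available in the job environment.

```shell
#!/bin/bash
# Stopgap sketch: start an MPD ring inside the SLURM allocation.
# The install prefix and SLURM variables are assumptions for your site.
MPI_BIN=/opt/mpi/mpibull2-1.3.9-18.s/bin

# Build a hostfile from the nodes SLURM allocated to this job.
scontrol show hostnames "$SLURM_JOB_NODELIST" > mpd.hosts

# Boot one mpd per node, run the job, then tear the ring down.
"$MPI_BIN/mpdboot" -n "$SLURM_NNODES" -f mpd.hosts
"$MPI_BIN/mpiexec" -n "$SLURM_NTASKS" ./pw.x < pw.in > pw.out
"$MPI_BIN/mpdallexit"
```

This only papers over the "cannot connect to local mpd" error; the real
fix is to move off MPD entirely, as described below.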

Hydra is the replacement for MPD and it is an excellent process
manager.  The system administrators at your site should install a more
recent version of MPICH2 that will have Hydra as the default process
manager.  If your machine has Infiniband, recent versions of MVAPICH2
(which is derived from MPICH2) will also have Hydra support.
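With a Hydra-based MPICH2 the whole MPD dance disappears: Hydra detects
the SLURM allocation on its own, so the batch script shrinks to a plain
mpiexec call.  A sketch, assuming a hypothetical install prefix for a
recent MPICH2 (your administrators would supply the real path):

```shell
#!/bin/bash
#SBATCH -N 1
#SBATCH -J scf-hydra
# Sketch only: /opt/mpich2 is a placeholder for wherever a
# Hydra-based MPICH2 gets installed on your cluster.
MPIEXEC=/opt/mpich2/bin/mpiexec

# No mpd, no mpdboot, no hostfile: Hydra reads the SLURM environment.
"$MPIEXEC" -n "$SLURM_NTASKS" ./pw.x < pw.in > pw.out
```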

Best,

Jeff


On Sun, Feb 5, 2012 at 4:03 PM, Elie M <elie.moujaes at hotmail.co.uk> wrote:
> Dear sir/madam,
>
>
>
> I am running a parallel execution (pw.x) on a SLURM Linux cluster, and
> once I run the command sbatch filename.srm, the calculation starts running
> and then stops with the following error:
>
>
>
> "mpiexec_veredas5: cannot connect to local mpd
>
>  (/tmp/mpd2.console_sushil); possible causes:
>
>   1. no mpd is running on this host
>
>  2. an mpd is running but was started without a "console" (-n option)
>
>  In case 1, you can start an mpd on this host with:
>
>     mpd &
>
>  and you will be able to run jobs just on this host.
>
>  For more details on starting mpds on a set of hosts, see
>
>  the MPICH2 Installation Guide."
>
>
> The script runs the Quantum ESPRESSO (QE) package. You will find below
> the script I am using to run QE. The architecture is an Intel-based
> cluster.
>
>
> " #!/bin/bash
>
> #SBATCH -o
> /home_cluster/fis718/eliemouj/espresso-4.3.2/GB72/GB72-script.scf.out
> #SBATCH -N 1
> #SBATCH --nodelist=veredas13
> #SBATCH -J scf-GB72-ph
> #SBATCH --account=fis718
> #SBATCH --partition=long
> #SBATCH --get-user-env
> #SBATCH -e GB72ph.scf.fit.err
>
> /opt/mpi/mpibull2-1.3.9-18.s/bin/mpiexec
> /home_cluster/fis718eliemouj/espresso-4.3-2/bin/pw.x <GB72ph.scf.in
>>GB72ph.scf.out
>
> "
>
> Please can anyone tell me what might be going wrong and how to fix it? I
> am not very experienced with Linux, so I would appreciate a fairly detailed
> solution, or a pointer to where I can find one.
> Hope to hear from you soon.
>
>
> Regards
>
>
> Elie Moujaes
>
>
>
>
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/old/index.php/User:Jhammond

