[mpich2-dev] configuration problem

Pavan Balaji balaji at mcs.anl.gov
Fri Apr 16 16:30:50 CDT 2010


It looks like one of the application processes died for some reason.

You can pass the -print-all-exitcodes option to mpiexec to see the exit
code of each process. Once you know which process is the culprit,
you can run it under a debugger to figure out what's going on.

For example, if you find that the second process is behaving badly, you 
can use:

% mpiexec -f host -np 1 ./abinit : -np 1 ddd ./abinit : -np 1 ./abinit < tnlo.files

This will bring up a ddd debugger window for the second process only.
Alternatively, you can open a separate debugger window for each process
using:

% mpiexec -f host -np 3 ddd ./abinit < tnlo.files
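
If you need to move the debugger to a different process, writing the
colon-separated form by hand gets tedious. Here is a minimal sketch of a
helper that prints such a command line. The function name debug_rank_cmd
and its argument order are my own invention, not part of mpiexec. It
assumes a host file named "host" as in the commands above, and that
processes are numbered in the order the segments appear (so index 1 is
the second process).

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH2): print an mpiexec command
# line that runs exactly one process under a debugger.
# Usage: debug_rank_cmd NPROCS INDEX DEBUGGER PROGRAM
#   INDEX is zero-based; it assumes processes are numbered in the
#   order the colon-separated segments appear.
debug_rank_cmd() {
    nprocs=$1
    index=$2
    debugger=$3
    prog=$4
    cmd="mpiexec -f host"   # "host" is the host file used above
    sep=""
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        if [ "$i" -eq "$index" ]; then
            # This segment is wrapped in the debugger.
            cmd="$cmd$sep -np 1 $debugger $prog"
        else
            cmd="$cmd$sep -np 1 $prog"
        fi
        sep=" :"
        i=$((i + 1))
    done
    printf '%s\n' "$cmd"
}
```

For example, "debug_rank_cmd 3 1 ddd ./abinit" prints the command from
the first example, without the input redirection; append
"< tnlo.files" yourself when you run it.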

  -- Pavan

On 04/16/2010 04:22 PM, lagoun brahim wrote:
> 
> Hello :-)
> Thank you, Pavan, for your reply.
> My problem was with the firewall.
> Now I have another problem.
> I created the host file (br:2/dft:2). When I start the calculation with
> the following command: mpiexec -f host -np 3(4) apl, I get the following
> error message:
> mpiexec -f host -np 3 abinit<tnlo.files
>   ABINIT
> 
>   Give name for formatted input file:
> tnlo2.in
>   Give name for formatted output file:
> tnlo2.out
>   Give root name for generic input files:
> tnlo2i
>   Give root name for generic output files:
> tnlo2o
>   Give root name for generic temporary files:
> tnlo2
> Fatal error in PMPI_Barrier: Other MPI error, error stack:
> PMPI_Barrier(476).................: MPI_Barrier(MPI_COMM_WORLD) failed
> MPIR_Barrier(82)..................:
> MPIC_Sendrecv(161)................:
> MPIC_Wait(513)....................:
> MPIDI_CH3I_Progress(150)..........:
> MPID_nem_mpich2_blocking_recv(948):
> MPID_nem_tcp_connpoll(1709).......: Communication error
> Fatal error in PMPI_Barrier: Other MPI error, error stack:
> PMPI_Barrier(476).................: MPI_Barrier(MPI_COMM_WORLD) failed
> MPIR_Barrier(82)..................:
> MPIC_Sendrecv(161)................:
> MPIC_Wait(513)....................:
> MPIDI_CH3I_Progress(150)..........:
> MPID_nem_mpich2_blocking_recv(948):
> MPID_nem_tcp_connpoll(1709).......: Communication error
> Terminated (signal 15)
>  
> Thank you again for your help.
> 
> 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

