[mpich-discuss] Problems running mpi application on different CPUs

Rajeev Thakur thakur at mcs.anl.gov
Mon Sep 28 13:13:07 CDT 2009


Try debugging with the mpdcheck utility, as described in the appendix of
the installation guide. Run the test between the server and one client.
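
If mpdcheck is not at hand, the same basic idea (one side listens, the
other side connects) can be sketched in a few lines of Python. This is
only an illustrative stand-in, not mpdcheck itself: the function names
start_listener, echo_once and can_connect are made up for this example,
and the loopback self-test at the bottom merely shows that the script
runs.

```python
import socket
import threading


def start_listener(bind_addr="", port=0):
    """Open a listening TCP socket; port=0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((bind_addr, port))
    srv.listen(1)
    return srv, srv.getsockname()[1]


def echo_once(srv):
    """Accept one connection, echo back what it sends, then close."""
    conn, _peer = srv.accept()
    with conn:
        conn.sendall(conn.recv(1024))
    srv.close()


def can_connect(host, port, timeout=5.0):
    """Return True if a small TCP round trip to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"ping")
            return s.recv(1024) == b"ping"
    except OSError:
        # Refused, timed out, unreachable, or name lookup failed.
        return False


if __name__ == "__main__":
    # Loopback self-test only; hypothetical host names would replace
    # 127.0.0.1 when testing a real server/client pair.
    srv, port = start_listener("127.0.0.1")
    t = threading.Thread(target=echo_once, args=(srv,), daemon=True)
    t.start()
    print("round trip ok:", can_connect("127.0.0.1", port))
    t.join(timeout=5)
```

To test a real pair of machines, one would run the listener on the
server, call can_connect with the server's name and the reported port
from the client, and then repeat with the roles swapped, since MPI
processes need to reach each other in both directions.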

Rajeev 

> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of 
> Gaetano Bellanca
> Sent: Monday, September 28, 2009 6:00 AM
> Cc: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Problems running mpi application 
> on different CPUs
> 
> Dear Rajeev,
> 
> Thanks for your help. I disabled the firewall on the server (the only
> machine running one) and tried every other combination.
> All the clients together run correctly. The same is true for the
> processors on the server by themselves.
> The problem appears only when I mix processes on the server and on
> the clients.
> 
> When I run mpdtrace on the server, all the CPUs respond correctly.
> The same happens if I run 'hostname' in parallel.
> 
> It is probably a problem in my code, but the code works on a cluster
> of 10 Pentium IV PEs.
> I have noticed some strange behavior:
> 1) running the code with the server as the first machine in the pool,
> the code hangs with the error I reported previously;
> 2) if I put the server as the second machine in the pool, the code
> starts and runs normally up to the output routines, opens the first
> file, and then waits indefinitely for something.
> 
> Should I compile mpich2 with some particular communication device? I
> have nemesis at the moment.
> This is how I configure mpich2:
> ./configure --prefix=/opt/mpich2/1.1/intel11.1 --enable-cxx \
>     --enable-f90 --enable-fast --enable-traceback --with-mpe \
>     --enable-f90modules --enable-cache --enable-romio \
>     --with-file-system=nfs+ufs+pvfs2 --with-device=ch3:nemesis \
>     --with-pvfs2=/usr/local \
>     --with-java=/usr/lib/jvm/java-6-sun-1.6.0.07/ --with-pm=mpd:hydra
> 
> Regards
> 
> Gaetano
> 
> Rajeev Thakur wrote:
> > Try running on smaller subsets of the machines to debug the problem.
> > It is possible that a process on some machine cannot connect to
> > another because of some firewall settings.
> >
> > Rajeev
> >
> >  
> >> -----Original Message-----
> >> From: mpich-discuss-bounces at mcs.anl.gov 
> >> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of 
> >> Gaetano Bellanca
> >> Sent: Saturday, September 26, 2009 7:10 AM
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: [mpich-discuss] Problems running mpi application on 
> >> different CPUs
> >>
> >> Hi,
> >>
> >> I'm sorry, but I posted my previous message with the wrong subject!
> >>
> >> I have a small cluster of
> >> a) 1 server: dual processor / quad core Intel(R) Xeon(R) CPU E5345
> >> b) 4 clients: single processor / dual core Intel(R) Core(TM)2 Duo
> >>    CPU E8400
> >> connected with a 1 Gbit/s Ethernet network.
> >>
> >> I compiled mpich2-1.1.1p1 on the first system (a) and share mpich
> >> with the other computers via NFS. I have mpd running as root on all
> >> the computers (Ubuntu 8.04, kernel 2.6.24-24-server).
> >>
> >> When I run my code in parallel on the first system, it works
> >> correctly; the same happens when I run the same code in parallel on
> >> the other computers (always launching the code from the server).
> >> However, when I run the code using processors from both the server
> >> and the clients at the same time with the command:
> >>
> >> mpiexec -machinefile machinefile -n 4 my_parallel_code
> >>
> >> I receive this error message:
> >>
> >> Fatal error in MPI_Init: Other MPI error, error stack:
> >> MPIR_Init_thread(394): Initialization failed
> >> (unknown)(): Other MPI error
> >> rank 3 in job 8  c1_4545   caused collective abort of all ranks
> >>  exit status of rank 3: return code 1
> >>
> >> Should I use some particular flags in compilation or at run time?
> >>
> >> Regards.
> >>
> >> Gaetano
> >>
> >> -- 
> >> Gaetano Bellanca - Department of Engineering - University of Ferrara
> >> Via Saragat, 1 - 44100 - Ferrara - ITALY
> >> Voice (VoIP): +39 0532 974809 Fax: +39 0532 974870
> >> mailto:gaetano.bellanca at unife.it
> >>
> >> Education is expensive? They are trying ignorance!
> >>
> >>
> >>     
> >
> >
> >   
> 
> -- 
> Gaetano Bellanca - Department of Engineering - University of Ferrara
> Via Saragat, 1 - 44100 - Ferrara - ITALY
> Voice (VoIP): +39 0532 974809 Fax: +39 0532 974870
> mailto:gaetano.bellanca at unife.it
> 
> Education is expensive? They are trying ignorance!
> 
> 
> 


