[mpich-discuss] [mpich2-maint] Problems with time and stdin

chan at mcs.anl.gov
Thu Nov 19 11:55:28 CST 2009


----- "ephi maglaras-cssi" <ephi.maglaras-cssi at snecma.fr> wrote:

> Can someone from support give me an answer that solves (or does not
> solve) my problems, please? Thanks!

I am going to try again...

> Do you mean that "time" returns the time of communications between
> processors ? So we probably don't have the same "time" command.

Which time command you are using is irrelevant here.  Your original
toto.f90 has no MPI communication calls; all the program does is fetch
the rank and size of MPI_COMM_WORLD and run a dummy DO loop, i.e.
everything is a local call as far as MPICH2 is concerned.
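To illustrate, toto.f90 presumably looks something like the sketch below.  This is my own reconstruction from your program's output, not your exact source — the loop bound and the way "result" is computed are guesses.  The point is that every MPI call in it is local:

```fortran
program toto
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, i, result

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)    ! local call, no communication
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)  ! local call, no communication
  write(*,'(A,I0,A,I0,A)') 'Hello, I am processor n° ', rank+1, '/', nprocs, '.'

  ! dummy DO loop, entirely local work (exact body is a guess)
  result = rank + 1
  do i = 1, 200000
     result = result + 1
  end do
  write(*,'(A,I0,A,I0,A,I0,A)') 'Processor n° ', rank+1, '/', nprocs, &
       ' has finished, the result is ', result, '.'

  call MPI_FINALIZE(ierr)
end program toto
```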

> I gave the comparison with Intel MPI in order to show the results of
> "time" on our system ("man time"); it's not a problem with Intel MPI:
>  - "real time" is the wall time, i.e. the elapsed real time between
> invocation and termination. It's about 10 seconds for the job, with
> MPI or MPICH2.

Are you using a really old machine?  Either toto.f90 or loop.f90 should
take less than 1 second on a typical machine.  I compiled both programs
with ifort:

/homes/chan/tmp> /usr/bin/time -p loop_ifort
      110100
real 0.00
user 0.00
sys 0.01
/homes/chan/tmp> /usr/bin/time -p /homes/chan/mpich_work/install_linux64_svn_intel/bin/mpiexec -n 4 toto_ifort
Hello, I am processor n° 3/4.
Processor n° 3/4 has finished, the result is 200003.
Hello, I am processor n° 1/4.
Processor n° 1/4 has finished, the result is 200001.
Hello, I am processor n° 4/4.
Processor n° 4/4 has finished, the result is 200004.
Hello, I am processor n° 2/4.
Processor n° 2/4 has finished, the result is 200002.
real 0.29
user 0.06
sys 0.02

>  - "user time" is the CPU time, i.e. the sum of the CPU time of all
> processors. We are interested in this time for our statistics. So with 4
> processors, it's nearly 4*(real time). You can verify the result of
> "time" with the simple program "loop.f90": compile it and run it under
> the "time" command. Here it gives 64.959s for the real time and 63.537s
> for the user CPU time. On our system, "time" returns the correct value
> with an MPI job but not with an MPICH2 one. That is the problem which
> prevents us from making statistics.

Again, as I said earlier, I have no access to Intel MPI, so I can't say
anything for sure.  My guess is that Intel MPI's MPI_Init() may spend time
forming the connections among the 4 processes, while MPICH2 delays the
connection setup until the 1st communication call.  That may explain the
"4 * real time".  Another possible reason is that Intel MPI somehow uses
threads internally on all 4 cores.  In any case, 64 seconds for your
toto.f90, which does basically nothing, is too much even if all 4
processes are running on a single-core machine instead of 4 separate
nodes.  The best way to find out the reason for the "4 * real time" is to
contact Intel MPI support.
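If you want to test the MPI_Init() guess yourself, here is a small sketch of mine (untested, and assuming you are willing to build a throwaway program) that times MPI_Init() using the Fortran intrinsic SYSTEM_CLOCK, since MPI_WTIME is only guaranteed to be usable after MPI_INIT returns:

```fortran
program init_time
  use mpi
  implicit none
  integer :: ierr, rank
  integer(kind=8) :: c0, c1, rate

  call system_clock(c0, rate)   ! wall-clock reading before MPI_Init
  call MPI_INIT(ierr)
  call system_clock(c1)         ! wall-clock reading after MPI_Init

  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  print *, 'rank', rank, ': MPI_Init took', &
           real(c1 - c0) / real(rate), 'seconds'
  call MPI_FINALIZE(ierr)
end program init_time
```

Run it under /usr/bin/time with both MPI implementations: if Intel MPI's user time comes mostly from MPI_Init, this will show it directly.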
 
> As for the stdin problem, the test case I sent you (tutu.f90) does not
> have an overly large stdin file, only 1000 lines, so why is stdin not
> supported (error messages)? It runs well with native MPI, with mpich (1),
> and with lammpi, but not with mpich2.


> We don't have hydra and I can't modify the sources of my "real" program
> since it runs well with other MPIs.
> [attachment "loop.f90" removed by Ephi MAGLARAS-CSSI/EXT/SPS]

Hydra is a process manager: instead of using the default mpiexec from
mpich2, you use mpiexec.hydra to launch your MPI program, e.g.
"mpiexec.hydra -n 4 toto_ifort < input.txt".  Using hydra does not
require you to modify your program, but you do need a newer version of
MPICH2 that ships with hydra.  Also, the default device in newer MPICH2
is nemesis, which is optimized for multi-core machines.  The latest
version is 1.2.1, so 1.0.8p1 is considered old.
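If stdin keeps giving you trouble, the workaround I mentioned in the earlier mail avoids stdin entirely: open and read the file on rank 0, then broadcast the data to the other ranks.  A rough sketch (the file name "input.dat", the unit number, and the record count are placeholders for your real input):

```fortran
program read_bcast
  use mpi
  implicit none
  integer, parameter :: n = 1000        ! e.g. your 1000-line input
  integer :: ierr, rank, i
  double precision :: data(n)

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  ! only rank 0 touches the file, so no stdin forwarding is needed
  if (rank == 0) then
     open(unit=10, file='input.dat', status='old')
     do i = 1, n
        read(10,*) data(i)
     end do
     close(10)
  end if

  ! every rank receives the same data from rank 0
  call MPI_BCAST(data, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  call MPI_FINALIZE(ierr)
end program read_bcast
```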

> Thank you.
> Best regards.
> Ephi.
> 
> chan at mcs.anl.gov
> 12/11/2009 21:58
> Please reply to
> Anthony Chan <chan at mcs.anl.gov>
> 
> 
> To
> Ephi MAGLARAS-CSSI/EXT/SPS at SPS
> cc
> Alain DEGUEIL/FP/SPS at SPS, Yves CHAMPAGNAC/FP/SPS at SPS,
> mpich-discuss at mcs.anl.gov
> Subject
> Re: [mpich2-maint] Problems with time and stdin
> 
> 
> Since this is a general MPICH usage question, I am forwarding
> to mpich-discuss.
> 
> It seems you are comparing Intel MPI with MPICH2-1.0.8p1.
> I don't have access to Intel MPI, so I don't know why Intel
> MPI uses so much user time on your simple program, which
> does not do any communication. Maybe you could contact Intel
> MPI support for the answer.
> 
> What do you mean by "stdin files not well supported"?
> If the default process manager, mpd, does not fit your
> needs, you could try other process managers like hydra
> (i.e. mpiexec.hydra), which is available through the latest
> MPICH2 release, 1.2.1rc1.  If you want to read a large amount
> of data on rank 0 (or all ranks), you may want to open/read
> the file on rank 0 (then broadcast the data to every rank).
> 
> A.Chan
> 
> ----- "ephi maglaras-cssi" <ephi.maglaras-cssi at snecma.fr> wrote:
> 
> > Dear,
> > I am encountering 2 kinds of problems when running an mpich2 job:
> >   - the "time" command does not give the user time
> >   - stdin files are not well supported
> > I compared mpich2 with native mpi. You can find test cases below
> > (zip files); see the README files for the problem descriptions.
> > How can I solve these problems?
> > Thanks.
> > Best regards.
> > Ephi MAGLARAS.
> >
> > _______________________________________________
> > mpich2-maint mailing list
> > mpich2-maint at lists.mcs.anl.gov
> > https://lists.mcs.anl.gov/mailman/listinfo/mpich2-maint
> [attachment "time.zip" removed by Ephi MAGLARAS-CSSI/EXT/SPS]
> [attachment "stdin.zip" removed by Ephi MAGLARAS-CSSI/EXT/SPS]
> 


More information about the mpich-discuss mailing list