[MPICH] RE: Romio status and mailing list

Ashley Pittman ashley at quadrics.com
Thu Mar 16 10:39:42 CST 2006


Sylvain,

I think the advice on MPI_ERRORS_ARE_FATAL is spot on.  If that doesn't
work for you, however, you should probably look at using MPI_Reduce,
or even better MPI_Allreduce, rather than MPI_Gather to detect errors.

Since MPI_SUCCESS is defined to be 0, taking the MAX of the return
codes across the ranks is all that's needed.

Ashley

On Thu, 2006-03-16 at 09:56 -0600, Rajeev Thakur wrote:
> Sylvain,
>         One issue is that the MPI Standard defines the default error handler
> to be MPI_ERRORS_RETURN for I/O, whereas it is MPI_ERRORS_ARE_FATAL for the
> rest of MPI. Are you checking the error returns from the MPI-IO functions?
> 
> Rajeev 
> 
> PS: To post to mpich-discuss, you need to subscribe to the list. See
> http://www-unix.mcs.anl.gov/mpi/mpich2/maillist.htm
> 
> 
> 
> > -----Original Message-----
> > From: Sylvain Jeaugey [mailto:sylvain.jeaugey at bull.net] 
> > Sent: Thursday, March 16, 2006 2:28 AM
> > To: Rajeev Thakur
> > Cc: 'Sylvain Jeaugey'; mpich-discuss at mcs.anl.gov
> > Subject: RE: Romio status and mailing list
> > 
> > Rajeev,
> > 
> > Thanks for your mail.
> > It would seem fine to me if the other processes caused an abort.
> > Still, that doesn't happen (did I misconfigure MPICH?), and it often
> > causes a hang (process 0 being in a barrier, send/recv or finalize),
> > which is much more of an issue from my point of view.
> > 
> > I think that aborting would be enough to guarantee a "clean" job 
> > termination and still keep good performance (gather being often much 
> > slower than bcast).
> > 
> > Comments welcome.
> > 
> > Cheers,
> > Sylvain
> > 
> > PS: I added mpich-discuss to CCs and removed *@mcs.anl.gov.
> > 
> > On Tue, 14 Mar 2006, Rajeev Thakur wrote:
> > 
> > > Sylvain,
> > >         With MPI_Bcast, at least one process will detect the
> > > inconsistent parameter and complain, not necessarily the root. It's
> > > not essential that the root be the complainer, I think.
> > > 
> > > > From: Sylvain Jeaugey [mailto:sylvain.jeaugey at bull.net] 
> > > > Sent: Tuesday, March 14, 2006 5:51 AM
> > > > Subject: Romio status and mailing list
> > > > 
> > > > We have a few concerns about the code behaviour in error cases,
> > > > especially on the argument checks done with MPI_Bcast. Indeed, using
> > > > this function, the root process won't be able to detect an
> > > > inconsistency. Thus, we're wondering if MPI_Gather wouldn't do the
> > > > job better than Bcast (though it would be worse in terms of
> > > > performance).
> > > > 
> > > > Sylvain
> > > > 
> > > > 
> > > > 
> > > 
> > 
> > 
> 



