[MPICH] RE: Romio status and mailing list
Ashley Pittman
ashley at quadrics.com
Thu Mar 16 10:39:42 CST 2006
Sylvain,
I think the advice on MPI_ERRORS_ARE_FATAL is spot on. If that doesn't
work for you, however, then you should probably look at using MPI_Reduce
or, even better, MPI_Allreduce rather than MPI_Gather to detect errors.
Since MPI_SUCCESS is defined to be 0, taking the MAX of the return codes
is all that's needed: a nonzero result means at least one process saw an
error.
Ashley,
On Thu, 2006-03-16 at 09:56 -0600, Rajeev Thakur wrote:
> Sylvain,
> One issue is that the MPI Standard defines the default error handler
> to be MPI_ERRORS_RETURN for I/O, whereas it is MPI_ERRORS_ARE_FATAL for the
> rest of MPI. Are you checking the error returns from the MPI-IO functions?
>
> Rajeev
>
> PS: To post to mpich-discuss, you need to subscribe to the list. See
> http://www-unix.mcs.anl.gov/mpi/mpich2/maillist.htm
>
>
>
> > -----Original Message-----
> > From: Sylvain Jeaugey [mailto:sylvain.jeaugey at bull.net]
> > Sent: Thursday, March 16, 2006 2:28 AM
> > To: Rajeev Thakur
> > Cc: 'Sylvain Jeaugey'; mpich-discuss at mcs.anl.gov
> > Subject: RE: Romio status and mailing list
> >
> > Rajeev,
> >
> > Thanks for your mail.
> > It would seem fine to me if the other processes caused an abort.
> > Still, it doesn't happen (did I misconfigure MPICH?), and it often
> > causes a hang (process 0 being in a barrier, send/recv or finalize),
> > and this is much more of an issue from my point of view.
> >
> > I think that aborting would be enough to guarantee a "clean" job
> > termination and still keep good performance (gather being often much
> > slower than bcast).
> >
> > Comments welcome.
> >
> > Cheers,
> > Sylvain
> >
> > PS: I added mpich-discuss to CCs and removed *@mcs.anl.gov.
> >
> > On Tue, 14 Mar 2006, Rajeev Thakur wrote:
> >
> > > Sylvain,
> > > With MPI_Bcast, at least one process will detect the inconsistent
> > > parameter and complain, not necessarily the root. It's not essential
> > > that the root be the complainer, I think.
> > >
> > > > From: Sylvain Jeaugey [mailto:sylvain.jeaugey at bull.net]
> > > > Sent: Tuesday, March 14, 2006 5:51 AM
> > > > Subject: Romio status and mailing list
> > > >
> > > > We have a few concerns about the code behaviour in error cases,
> > > > especially on the argument checks done with MPI_Bcast. Indeed,
> > > > using this function, the root process won't be able to detect an
> > > > inconsistency. Thus, we're wondering if MPI_Gather wouldn't do the
> > > > job better than Bcast (though it would be worse in terms of
> > > > performance).
> > > >
> > > > Sylvain
> > > >
> > > >
> > > >
> > >
> >
> >
>