[MPICH] MPI hint cb_nodes used in both open and setview
Wei-keng Liao
wkliao at ece.northwestern.edu
Sat Feb 9 10:12:41 CST 2008
Hi, Rob,
The simplest fix is to ignore the subsequent values for cb_nodes. If ROMIO
chooses this ignore strategy, the subsequent cb_config_list hints should
also be ignored. This is because both cb_nodes and cb_config_list
determine what processes must open the file during MPI_File_open.
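
A minimal sketch of the ignore strategy, for illustration only. The struct and the function names below are hypothetical stand-ins, not ROMIO's actual hint-processing code; the only things taken from this thread are the two hint keys and the rule that they stop being honored once the file is open.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the per-file hint state.
 * This is NOT ROMIO's real structure. */
typedef struct {
    int  cb_nodes;             /* aggregator count fixed at open time */
    char cb_config_list[64];   /* aggregator placement fixed at open time */
    bool opened;               /* set once the open-time hints are final */
} sketch_hints;

/* Call once the open has finished processing its info object. */
void sketch_mark_opened(sketch_hints *h)
{
    h->opened = true;
}

/* Apply one key/value hint. Returns true if the hint was accepted and
 * false if it was ignored. Keys other than the two shown are accepted
 * but not stored in this sketch. */
bool sketch_apply_hint(sketch_hints *h, const char *key, const char *value)
{
    bool is_cb_nodes  = strcmp(key, "cb_nodes") == 0;
    bool is_cb_config = strcmp(key, "cb_config_list") == 0;

    /* Both hints decide which processes open the file, so neither may
     * change after the open. */
    if ((is_cb_nodes || is_cb_config) && h->opened)
        return false;

    if (is_cb_nodes)
        h->cb_nodes = atoi(value);
    else if (is_cb_config)
        snprintf(h->cb_config_list, sizeof h->cb_config_list, "%s", value);
    return true;
}
```

With this guard, a cb_nodes value passed again through a later setview would leave the value chosen at open untouched, while unrelated hints would still be processed.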
The recalculate-aggregators approach can be complicated. Say a different
set of processes in cb_nodes or cb_config_list is used in a subsequent
setview: the file must be closed by the old aggregators that are not in the
new set and opened by the new aggregators that were not in the old set.
(Some other settings may need to change, too.) Data consistency is also an
issue here.
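
To make the bookkeeping concrete, here is a rough sketch of the per-process decision a recalculation would need. It assumes that only aggregators hold the file open, as described above; the enum and function names are hypothetical and do not exist in ROMIO.

```c
#include <stdbool.h>

/* Hypothetical result of comparing the old and new aggregator sets. */
typedef enum {
    SKETCH_NO_CHANGE,   /* rank keeps its current state */
    SKETCH_MUST_CLOSE,  /* rank was an aggregator and no longer is */
    SKETCH_MUST_OPEN    /* rank becomes an aggregator */
} sketch_action;

/* Return true if rank appears in ranklist[0..n-1]. */
bool sketch_is_aggregator(const int *ranklist, int n, int rank)
{
    for (int i = 0; i < n; i++)
        if (ranklist[i] == rank)
            return true;
    return false;
}

/* Decide what one rank has to do when the aggregator list changes. */
sketch_action sketch_reopen_action(const int *old_list, int n_old,
                                   const int *new_list, int n_new,
                                   int rank)
{
    bool was_agg = sketch_is_aggregator(old_list, n_old, rank);
    bool is_agg  = sketch_is_aggregator(new_list, n_new, rank);

    if (was_agg && !is_agg)
        return SKETCH_MUST_CLOSE;
    if (!was_agg && is_agg)
        return SKETCH_MUST_OPEN;
    return SKETCH_NO_CHANGE;
}
```

This covers only the open/close decision. The other settings and the data consistency concern mentioned above are not addressed by the sketch.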
Ideally, an MPI-IO hint should take effect when provided through a
setview and MPI allows a program to call any number of setviews between
file open and close. Fortunately, MPI also allows an implementation to
ignore the hints :). In this case, we are not ignoring the hints
completely, but choosing where to ignore them.
Wei-keng
On Sat, 9 Feb 2008, Rob Ross wrote:
> Hi Wei-keng,
>
> Thanks for pointing out this bug. So what do you think that it should do?
> Ignore the subsequent values for cb_nodes, or recalculate aggregators?
>
> Rob
>
> On Feb 8, 2008, at 11:28 PM, Wei-keng Liao wrote:
>
> >
> >I found that when a program uses the ROMIO hint cb_nodes in an MPI info
> >object and the info object is used in both MPI_File_open and
> >MPI_File_set_view, the latter will overwrite the value of
> >fd->hints->cb_nodes that first was set in MPI_File_open.
> >
> >I think MPI does not say one cannot use the same info in both open and
> >set_view.
> >
> >For example, a program allocates 10 MPI processes on a cluster with 2
> >processes per compute node. If the user sets cb_nodes hint to 7,
> >MPI_File_open will "intelligently" set fd->hints->cb_nodes to 5 and
> >allocate space for fd->hints->ranklist[] (of size 5), but later
> >MPI_File_set_view will change fd->hints->cb_nodes back to 7. So, when
> >fd->hints->ranklist[] is referenced in ADIOI_Calc_aggregator(), it may
> >access fd->hints->ranklist[5] and ranklist[6], which are outside the
> >array bounds.
> >
> >I have a simple program that produces a core dump because of this.
> >
> >Wei-keng
> >
>
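
The arithmetic in the quoted example can be sketched as follows. This is a plain C model, not ROMIO code: the struct and function names are hypothetical, and it assumes the process count divides evenly by the processes per node and that open caps cb_nodes at the number of compute nodes, as in the example.

```c
/* Hypothetical model of the two fields involved in the bug. */
typedef struct {
    int cb_nodes;      /* value the aggregator lookup will use */
    int ranklist_len;  /* entries actually allocated for ranklist[] */
} demo_hints;

/* Model of open: cap the request at the node count and size the
 * ranklist to match. */
void demo_open(demo_hints *h, int requested, int nprocs, int procs_per_node)
{
    int num_nodes = nprocs / procs_per_node;

    h->cb_nodes = requested < num_nodes ? requested : num_nodes;
    h->ranklist_len = h->cb_nodes;
}

/* Model of the buggy setview: cb_nodes is overwritten, while the
 * ranklist keeps the length chosen at open. */
void demo_set_view_buggy(demo_hints *h, int requested)
{
    h->cb_nodes = requested;
}

/* Number of ranklist indices that are reachable through cb_nodes but
 * lie past the end of the allocation. */
int demo_out_of_bounds_indices(const demo_hints *h)
{
    if (h->cb_nodes > h->ranklist_len)
        return h->cb_nodes - h->ranklist_len;
    return 0;
}
```

For 10 processes, 2 per node, and a requested cb_nodes of 7, the model ends up with cb_nodes of 5 after open and two out-of-bounds indices (5 and 6) after the buggy setview.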