[MPICH] MPI hint cb_nodes used in both open and setview

Wei-keng Liao wkliao at ece.northwestern.edu
Sat Feb 9 10:12:41 CST 2008


Hi, Rob,

The simplest fix is to ignore the subsequent values for cb_nodes. If ROMIO 
chooses this ignore strategy, the subsequent cb_config_list hints should 
also be ignored. This is because both cb_nodes and cb_config_list 
determine what processes must open the file during MPI_File_open.
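
To make the ignore strategy concrete, here is a minimal sketch in plain C. It is not ROMIO code: the struct, the function names, and the fixed_at_open flag are all hypothetical, and the rule that open clamps cb_nodes to the number of compute nodes is only an assumption taken from the 10-process example quoted below.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hint fields discussed here;
 * this is NOT ROMIO's actual hints structure. */
typedef struct {
    int  cb_nodes;       /* number of aggregators actually in use     */
    int *ranklist;       /* aggregator ranks, cb_nodes entries long   */
    int  fixed_at_open;  /* nonzero once open has settled cb_nodes    */
} sketch_hints;

/* Open time: clamp the requested cb_nodes to the number of compute
 * nodes (assumption) and size ranklist[] to match. */
int sketch_hints_open(sketch_hints *h, int requested,
                      int nprocs, int procs_per_node)
{
    assert(requested > 0 && nprocs > 0 && procs_per_node > 0);
    int num_nodes = (nprocs + procs_per_node - 1) / procs_per_node;
    int n = requested < num_nodes ? requested : num_nodes;
    int *list = malloc((size_t)n * sizeof *list);
    if (list == NULL)
        return -1;
    for (int i = 0; i < n; i++)
        list[i] = i * procs_per_node; /* first rank per node (assumed layout) */
    h->cb_nodes = n;
    h->ranklist = list;
    h->fixed_at_open = 1;
    return 0;
}

/* Set-view time: the ignore strategy. Once open has fixed cb_nodes,
 * a later request is dropped so cb_nodes never exceeds the length
 * of ranklist[]. */
void sketch_hints_set_view(sketch_hints *h, int requested)
{
    if (h->fixed_at_open)
        return;
    h->cb_nodes = requested;
}

/* Release the ranklist and reset the sketch to its initial state. */
void sketch_hints_free(sketch_hints *h)
{
    free(h->ranklist);
    h->ranklist = NULL;
    h->cb_nodes = 0;
    h->fixed_at_open = 0;
}
```

With 10 processes at 2 per node and a request of 7, the open step in this sketch settles on 5, and a later set-view request of 7 leaves it at 5, so ranklist[] is never indexed past its end.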

The recalculating approach can be complicated. Say a subsequent setview 
supplies a cb_nodes or cb_config_list that selects a different set of 
processes: the file must be closed by the processes that are no longer 
aggregators and opened by the processes that were not aggregators before. 
(Some other settings may need to change, too.) Data consistency is also an 
issue here.
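
As a rough illustration of the bookkeeping this would need, the sketch below computes which ranks would have to close and which would have to open when the aggregator list changes. The helper names are hypothetical and this is not ROMIO code. It only shows the set differences, not the close/open calls or the consistency handling.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if rank appears in list[0..n-1], otherwise 0. */
int sketch_in_list(const int *list, int n, int rank)
{
    for (int i = 0; i < n; i++)
        if (list[i] == rank)
            return 1;
    return 0;
}

/* Writes every rank of a[] that is not in b[] into out[] and returns
 * how many were written. out[] must have room for na entries. */
int sketch_set_difference(const int *a, int na,
                          const int *b, int nb, int *out)
{
    assert(a != NULL && b != NULL && out != NULL);
    int count = 0;
    for (int i = 0; i < na; i++)
        if (!sketch_in_list(b, nb, a[i]))
            out[count++] = a[i];
    return count;
}
```

Calling sketch_set_difference(old, nold, new, nnew, to_close) gives the ranks that would have to close the file, and swapping the two lists gives the ranks that would have to open it.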

Ideally, an MPI-IO hint should take effect when provided through a 
setview, and MPI allows a program to call any number of setviews between 
file open and close. Fortunately, MPI also allows an implementation to 
ignore the hints :). In this case, we are not ignoring the hints 
completely, but choosing where to ignore them.

Wei-keng



On Sat, 9 Feb 2008, Rob Ross wrote:
> Hi Wei-keng,
> 
> Thanks for pointing out this bug. So what do you think that it should do?
> Ignore the subsequent values for cb_nodes, or recalculate aggregators?
> 
> Rob
> 
> On Feb 8, 2008, at 11:28 PM, Wei-keng Liao wrote:
> 
> >
> >I found that when a program uses the ROMIO hint cb_nodes in an MPI info
> >object and the info object is used in both MPI_File_open and
> >MPI_File_set_view, the latter will overwrite the value of
> >fd->hints->cb_nodes that first was set in MPI_File_open.
> >
> >I think MPI does not say one cannot use the same info in both open and
> >set_view.
> >
> >For example, a program allocates 10 MPI processes on a cluster with 2
> >processes per compute node. If the user sets cb_nodes hint to 7,
> >MPI_File_open will "intelligently" set fd->hints->cb_nodes to 5 and
> >allocate space for fd->hints->ranklist[] (of size 5), but later
> >MPI_File_set_view will change fd->hints->cb_nodes back to 7. So, when
> >fd->hints->ranklist[] is referenced in ADIOI_Calc_aggregator(), it may
> >access fd->hints->ranklist[5] or [6], which is beyond the array bounds.
> >
> >I have a simple program that generates a core dump because of this.
> >
> >Wei-keng
> >
> 



