[mpich2-dev] patch for ticket #1511

Wed May 23 15:31:37 CDT 2012

I think you should interpret the "fixed" resolution of this ticket as meaning that your patch was accepted. ;)

I guess I just didn't notice that you used your proper SVN account to report the ticket, so you didn't get any email when I added comments to it.  I'll try to watch out for that in the future and add your email to the CC list in that case.

-Dave

On May 23, 2012, at 3:04 PM CDT, Jeff Hammond wrote:

> Should I interpret the lack of response as "tl;dr" or something else?
> 
> Thanks,
> 
> Jeff
> 
> On Thu, May 3, 2012 at 8:35 AM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>> I submit the attached patch as a fix for ticket #1511
>> (https://trac.mcs.anl.gov/projects/mpich2/ticket/1511), which Dave
>> assigned to me a while ago at my request.
>> 
>> This patch allows all the RMA tests to run on an arbitrarily large
>> number of processes, rather than just 1, 2, or 3, as was required for
>> 10 or so tests.  There may be other tests that need this treatment as
>> well, but the RMA tests were the most critical ones for BGQ
>> acceptance, hence that is where I started.
>> 
>> I have verified that the entire set of RMA tests passes with 4 procs
>> (this was impossible before) with these changes, at least on my laptop.
>> Clang didn't produce any compiler errors.
>> 
>> The solution implemented in my patch is simple: all unnecessary ranks
>> do nothing for the test.  I have chosen this approach because there
>> was no discernible value in replicating the test across all pairs or
>> triples of nodes, yet the bookkeeping required was non-trivial.  It
>> wasn't onerous either, but the benefit/cost was near zero.  I do not
>> believe that the correctness of MPICH2 should depend on what rank runs
>> the test code.
>> 
>> Regarding the suggestion by Pavan that I instead parse all tests for
>> required number of processes, store that in a database, then write a
>> custom script to use that database for invocation on all possible
>> schedulers (PBS, Maui, LoadLeveler, Cobalt, etc.), I decided against
>> this because I did not want to be restricted to running the MPICH2
>> test suite using a particular script.  In fact, I currently cannot
>> figure out how to use the runtests script (a README would be helpful),
>> so I just invoke all the tests with "find -perm 755 -exec mpiexec -np
>> 4 {} \;" and find this highly satisfactory (although execvp barfs on
>> directories).
>> 
>> I believe that it is important to allow a test to be run completely
>> standalone on any number of processes (greater than the minimum, of
>> course) without knowledge of a special database.  It seems silly to
>> require a database query to launch a single test.
>> 
>> Anyways, I hope that this patch is accepted and I can focus on the
>> rest of the BGQ MPI-related acceptance tests.
>> 
>> Best,
>> 
>> Jeff
>> 
>> 
>> 
>> --
>> Jeff Hammond
>> Argonne Leadership Computing Facility
>> University of Chicago Computation Institute
>> jhammond at alcf.anl.gov / (630) 252-5381
>> http://www.linkedin.com/in/jeffhammond
>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond (in-progress)
>> https://wiki.alcf.anl.gov/old/index.php/User:Jhammond (deprecated)
>> https://wiki-old.alcf.anl.gov/index.php/User:Jhammond (deprecated)
> 
> 
> 
> -- 
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhammond at alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
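
On the "find -perm 755 -exec mpiexec -np 4 {} \;" invocation quoted above: directories usually carry mode 755 as well, so find passes them to the launcher, which is the likely cause of the execvp failures on directories. A minimal sketch of one way around that, assuming the goal is simply to launch every regular file with mode 755. The function name run_all_tests and the LAUNCHER override variable are hypothetical and are not part of the MPICH2 test suite.

```shell
# Hypothetical helper, not part of the MPICH2 test suite:
# launch every regular file with mode 755 under a directory.
run_all_tests() {
    dir=${1:-.}        # directory to search (default: current directory)
    nprocs=${2:-4}     # number of processes per test (default: 4)
    # LAUNCHER is an assumed override hook; it defaults to mpiexec.
    launcher=${LAUNCHER:-mpiexec}
    # -type f restricts matches to regular files, so directories with
    # mode 755 are never handed to the launcher.
    find "$dir" -type f -perm 755 -exec "$launcher" -np "$nprocs" {} \;
}
```

Calling "run_all_tests . 4" from the top of the test tree would behave like the original one-liner minus the directory matches. Note that -perm 755 still matches only files whose mode is exactly 755, as in the original command.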