[mpich2-dev] patch for ticket #1511

Jeff Hammond jhammond at alcf.anl.gov
Wed May 23 15:04:31 CDT 2012


Should I interpret the lack of response as "tl;dnr" or something else?

Thanks,

Jeff

On Thu, May 3, 2012 at 8:35 AM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
> I submit the attached patch as a fix for ticket #1511
> (https://trac.mcs.anl.gov/projects/mpich2/ticket/1511), which Dave
> assigned to me a while ago at my request.
>
> This patch allows all the RMA tests to run on an arbitrarily large
> number of processes, rather than on exactly 1, 2, or 3, as 10 or so
> tests previously required.  There may be other tests that need this
> treatment as well, but the RMA tests were the most critical ones for
> BGQ acceptance, hence that is where I started.
>
> I have verified that, with these changes, the entire set of RMA tests
> passes with 4 procs (this was impossible before), at least on my
> laptop.  Clang didn't produce any compiler errors.
>
> The solution implemented in my patch is simple: all unnecessary ranks
> do nothing for the test.  I have chosen this approach because there
> was no discernible value in replicating the test across all pairs or
> triples of nodes, yet the bookkeeping required was non-trivial.  It
> wasn't onerous either, but the benefit/cost was near zero.  I do not
> believe that the correctness of MPICH2 should depend on what rank runs
> the test code.
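
A minimal sketch of the "extra ranks do nothing" idea follows. It is not
the code from the patch. To keep it buildable without an MPI
installation, the rank is a plain argument; in a real test it would come
from MPI_Comm_rank(). The names REQUIRED_RANKS, rank_participates,
run_test_body, run_rank, and simulate_job are all hypothetical.

```c
#include <assert.h>

/* Hypothetical: the number of ranks this particular test needs. */
#define REQUIRED_RANKS 2

/* Returns 1 if `rank` should run the test body, 0 if it should sit idle. */
static int rank_participates(int rank, int required)
{
    assert(required >= 1);
    return rank >= 0 && rank < required;
}

/* Placeholder for the real test body; returns the number of errors. */
static int run_test_body(int rank)
{
    (void)rank;
    return 0;
}

/* What a single rank does.  Sets *ran_body to 1 if this rank executed
 * the test body and returns the errors it found (0 for idle ranks). */
static int run_rank(int rank, int *ran_body)
{
    *ran_body = rank_participates(rank, REQUIRED_RANKS);
    if (!*ran_body)
        return 0;   /* extra rank: do nothing, report nothing */
    return run_test_body(rank);
}

/* Simulates a job of `nprocs` ranks.  Returns how many ranks ran the
 * test body and stores the total error count in *errs. */
static int simulate_job(int nprocs, int *errs)
{
    int active = 0;
    *errs = 0;
    for (int rank = 0; rank < nprocs; rank++) {
        int ran;
        *errs += run_rank(rank, &ran);
        active += ran;
    }
    return active;
}
```

With REQUIRED_RANKS set to 2, simulate_job(4, &errs) runs the body on
ranks 0 and 1 only. In a real test, the idle ranks would still have to
join any collective calls made on MPI_COMM_WORLD.
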
>
> Regarding the suggestion by Pavan that I instead parse all tests for
> required number of processes, store that in a database, then write a
> custom script to use that database for invocation on all possible
> schedulers (PBS, Maui, LoadLeveler, Cobalt, etc.), I decided against
> this because I did not want to be restricted to running the MPICH2
> test suite using a particular script.  In fact, I currently cannot
> figure out how to use the runtests script (a README would be helpful),
> so I just invoke all the tests with "find -perm 755 -exec mpiexec -np
> 4 {} \;" and find this highly satisfactory (although execvp barfs on
> directories).
>
> I believe that it is important to allow a test to be run completely
> standalone on any number of processes (greater than the minimum, of
> course) without knowledge of a special database.  It seems silly to
> require a database query to launch a single test.
>
> Anyways, I hope that this patch is accepted and I can focus on the
> rest of the BGQ MPI-related acceptance tests.
>
> Best,
>
> Jeff
>
>
>
> --
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhammond at alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond (in-progress)
> https://wiki.alcf.anl.gov/old/index.php/User:Jhammond (deprecated)
> https://wiki-old.alcf.anl.gov/index.php/User:Jhammond (deprecated)



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond

