[mpich-discuss] Faster MPI_Attr_get?

Jeff Hammond jhammond at alcf.anl.gov
Fri May 11 16:41:53 CDT 2012


On Fri, May 11, 2012 at 4:38 PM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> On Fri, May 11, 2012 at 4:20 PM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>>
>> that's probably greatly underestimating the cost of this function,
>> since i assume that in this test both the dcache and icache hit on
>> every call.
>
>
> Actually, that's kinda the case I'm interested in. If the threads pull in
> enough data to knock the attribute out of L1, chances are that they will
> take at least a few microseconds (except in pathological cases that hit
> associativity). Suppose I'm doing BLAS level 1 flavor of vector operations
> with vectors of length a couple thousand, so just big enough to be out of L1
> if done in serial. But with 16 or 32 threads, the actual work is very fast
> (order of 100 cycles) because we have enough L1 and the operations don't
> conflict with other cache lines. There are no atomic instructions in
> launching a kernel, though there is a write from one thread and a read from
> another, so the writer needs to get exclusive access to a cache line and
> then the reader needs to get the line back. But that line shuffling doesn't
> affect whether MPI's attribute table stays in cache.
>
> I can make an example to either validate or refute the discussion above.

if you're happy with the benchmark, who am i to question it?

>> Is "-O2" a suboption to "-pipe" or are you giving the compiler
>> conflicting flags?
>
>
> No, MPICH2 slaps its own -O2 on the end of whatever the user asked for.
> -pipe is irrelevant for optimization and perhaps not useful any more (it
> used to reduce file system traffic by having gcc use pipes instead of
> temporary files). The last optimization option is used in any case, so it
> doesn't matter.

okay, sorry for the distraction.

>> > MPICH2 Version:     1.5b1
>> > MPICH2 Release date: unreleased development copy
>> > MPICH2 Device:     ch3:nemesis
>> > MPICH2 configure: --prefix=/homes/jedbrown/usr/mpich-intel
>> > --enable-shared
>> > --enable-error-checking=runtime --enable-error-messages=all
>> > --enable-timer-type=clock_gettime CC=icc CXX=icpc --enable-fc=0
>> > --enable-f77=0 FC= F77=
>> > MPICH2 CC: icc    -O2
>> > MPICH2 CXX: icpc   -O2
>> > MPICH2 F77: gfortran
>> > MPICH2 FC: gfortran
>>
>> "--enable-error-checking=runtime --enable-error-messages=all" would
>> seem to be the kind of thing Dave is talking about that affects
>> performance.
>
>
> Right, so should I turn that off? This is a development environment, so I
> definitely want those options. I can build a different MPI for profiling,
> but it's been irrelevant in other tests. Can't the overhead of run-time
> error checking amount to a few unlikely conditionals?
>
> I could also build MPICH2 with all error checking turned off, but with
> debugging symbols so that I can determine which lines are sucking up the
> time?

--enable-g=dbg gives you debug symbols.  is that sufficient?

jeff

-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond (in-progress)
https://wiki.alcf.anl.gov/old/index.php/User:Jhammond (deprecated)
https://wiki-old.alcf.anl.gov/index.php/User:Jhammond (deprecated)

