[mpich-discuss] Trouble in getting the logging interface to work

Anthony Chan chan at mcs.anl.gov
Tue Mar 25 14:41:11 CDT 2008


MPE profiling libraries are written on top of PMPI, so they work
differently from PERUSE.  I'm not familiar with PERUSE's event-based model.
Wouldn't it mean that you need to implement an event-based callback
mechanism in your modified MPICH2 libraries?  If that is the case, I would
think that you'd make the MPE calls in your PERUSE callback function.  In
other words, you should compile everything with "mpicc -mpe=log",
NOT "mpicc -mpe=mpilog".
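
A rough sketch of that arrangement, with the logging done inside the PERUSE callback itself, might look like the following. It is written to compile without MPICH2 or MPE: `log_event` is only a stand-in for MPE's `MPE_Log_event()`, and the callback name, its signature, and the event IDs are hypothetical, not taken from the PERUSE specification or from the modified library.

```c
#include <stddef.h>

/* Hypothetical event IDs; with MPE they would be obtained from the
   library at run time rather than hard-coded. */
enum { EV_REQ_ACTIVATE = 100, EV_REQ_COMPLETE = 101 };

/* Stand-in for MPE_Log_event(): records the event ID in a small buffer
   so the sketch builds and runs without the MPE library. */
#define MAX_LOGGED 32
int logged[MAX_LOGGED];
size_t n_logged = 0;

int log_event(int event_id)
{
    if (n_logged >= MAX_LOGGED)
        return -1;              /* buffer full, event dropped */
    logged[n_logged++] = event_id;
    return 0;
}

/* Hypothetical PERUSE-style callback: the library calls it when a
   communication event fires, and the callback does the logging itself.
   No second definition of a PERUSE function is needed in the MPE library. */
int on_comm_event(int peruse_event, void *user_data)
{
    (void)user_data;            /* unused in this sketch */
    return log_event(peruse_event);
}
```

In a real build the file would include mpe.h, call the MPE logging routine in place of the stand-in, and be compiled with "mpicc -mpe=log" so that only the MPE logging API is linked in, not the MPI wrapper library.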

On Tue, 25 Mar 2008, Krishna Chaitanya wrote:

>> The 2 MPI_Init definitions are defined in 2 different libraries as
>> you have found.
>      True. But, when the MPI application is being compiled, we notice
> the order "-lmpe -lmpich". But, how is it that a conflict is not
> reported?
>
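
For reference, the name-shifting pattern behind the MPI standard's profiling interface can be sketched in plain C. The names `PDEMO_Init` and `DEMO_Init` are made up for illustration and stand for `PMPI_Init` and `MPI_Init`; the counter stands in for MPE's logging. In MPICH2 builds the library's own `MPI_` entry points are typically weak aliases for the `PMPI_` ones (where the toolchain supports weak symbols), so a strong `MPI_Init` from the MPE library earlier on the link line takes precedence without a multiple-definition error. An ordinary function that is strongly defined in both archives, as PERUSE_TRACE_COMM_EVENT appears to be here, gets no such treatment.

```c
/* Number of calls the wrapper has logged; stands in for MPE's log buffer. */
int demo_log_count = 0;

/* "Real" implementation, reachable under the shifted name
   (the role PMPI_Init plays in the MPI library). */
int PDEMO_Init(int *argc, char ***argv)
{
    (void)argc;
    (void)argv;
    return 0;   /* 0 plays the role of MPI_SUCCESS */
}

/* Profiling wrapper under the public name (the role MPI_Init plays in
   the MPE library): log the call, then forward to the shifted name. */
int DEMO_Init(int *argc, char ***argv)
{
    demo_log_count++;
    return PDEMO_Init(argc, argv);
}
```

The application only ever calls the public name; which definition it reaches is decided at link time, not by any mapping routine inside the library.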
> I am using PERUSE to profile the MPICH library. So, as of now, I am
> having PERUSE_TRACE_COMM_EVENT defined in libmpich.a. However, that is
> currently generating text output. I wish to graphically represent the
> PERUSE events in conjunction with the other MPI events.
>
> Looking at the way the MPE library has been built, I understood that I
> need to have PERUSE_TRACE_COMM_EVENT defined in liblmpe.a. This way,
> when I compile the PERUSE-MPICH application with the -mpe=mpilog
> switch, the function in the MPE library is invoked, the
> occurrence is logged, and then the actual PERUSE function defined in
> libmpich.a is invoked.
> Have I got that right?
>
>
> Thanks,
> Krishna Chaitanya
>
>
> On 3/25/08, Anthony Chan <chan at mcs.anl.gov> wrote:
>>
>>
>> On Mon, 24 Mar 2008, Krishna Chaitanya wrote:
>>
>>> That answers a lot of my questions.
>>> Sticking to MPI_Init for this discussion, the function is defined in two
>>> places. So, there will be a reference to MPI_Init in lmpich and also a
>>> reference in lmpe. But how is it that when an MPI application is being
>>> compiled, mpicc doesn't complain that there were two references to
>>> MPI_Init?
>>> What exactly has been done to solve this?
>>
>> log_mpi_core.c uses the PMPI interface defined in MPI standard.  The 2
>> MPI_Init definitions are defined in 2 different libraries as you have
>> found.
>>
>>> I am actually facing this problem with the PERUSE function that I have
>>> defined in:
>>> 1 > /src/peruse/peruse.c             and
>>> 2 > src/mpe2/src/wrappers/src/log_mpi_core.c.
>>>
>>>    The first one goes into lmpich and the second one is a part of lmpe.
>>> However, when I compile my MPI program, I get the following message :
>>> /home/kc/mpich-install//lib/libmpich.a(peruse.o): In function
>>> `PERUSE_TRACE_COMM_EVENT':
>>> /home/kc/mpich-src/src/peruse/peruse.c:324: multiple definition of
>>> `PERUSE_TRACE_COMM_EVENT'
>>> /home/kc/mpich-install//lib/liblmpe.a(log_mpi_core.o):/home/kc/mpich-src/src/mpe2/src/wrappers/src/log_mpi_core.c:6830:
>>> first defined here
>>> collect2: ld returned 1 exit status
>>
>> Why did you define PERUSE_TRACE_COMM_EVENT twice, once in libmpich.a
>> and once in liblmpe.a?  Are you trying to use MPE to profile PERUSE or to
>> use PERUSE to profile MPE?
>>
>> A.Chan
>>
>>>
>>>
>>> Thanks for your time,
>>> Krishna Chaitanya K
>>>
>>>
>>> On Mon, Mar 24, 2008 at 10:22 AM, Anthony Chan <chan at mcs.anl.gov> wrote:
>>>
>>>>
>>>>
>>>> On Mon, 24 Mar 2008, Krishna Chaitanya wrote:
>>>>
>>>>>   Sorry for re-posting.
>>>>>> I took a look at the documentation at src/util/multichannel/mpi.c
>>>>>           I guess this is only for Windows.
>>>>>           It would be of great help if someone could point me to the
>>>>> function that takes care of mapping MPI_Init to its wrapper, defined in
>>>>> src/mpe2/src/wrappers/src/log_mpi_core.c, instead of the function
>>>>> defined in src/mpi/init/init.c, when the library is compiled with the
>>>>> --enable-mpe switch.
>>>>
>>>> The function in src/mpi/init/init.c is defined when linked with -lmpich,
>>>> i.e. mpicc. The functions in log_mpi_core.c are defined when linked with
>>>> "mpicc -mpe=mpilog".  Running "mpicc .... -show" will show you which
>>>> libraries are used and their link order.
>>>>
>>>> A.Chan
>>>>
>>>>>
>>>>> Krishna Chaitanya K
>>>>>
>>>>> On Sun, Mar 23, 2008 at 2:29 PM, Krishna Chaitanya
>>>>> <kris.c1986 at gmail.com> wrote:
>>>>>
>>>>>>> See section "CUSTOMIZING LOGFILES" in mpich2-xxx/src/mpe2/README.
>>>>>> Correct me if I am wrong:
>>>>>> Since I am dealing with PERUSE events, whenever such an event occurs, a
>>>>>> PERUSE function, defined in <mpich-dir>/src/peruse/peruse.c, is invoked
>>>>>> by the MPI library. I am trying to get this event displayed in the
>>>>>> jumpshot output. For this to be done, I need to define a wrapper
>>>>>> function which gets invoked when a PERUSE event occurs, to log the
>>>>>> event and then to call the actual PERUSE function, similar to the way
>>>>>> the wrapper function in log_mpi_core.c is called when MPI_Init is
>>>>>> called.
>>>>>>
>>>>>> Could you please clarify the dynamic mapping?
>>>>>> I took a look at the documentation at src/util/multichannel/mpi.c. I
>>>>>> think I understood what is going on in LoadFunctions() and the way the
>>>>>> function pointers are assigned addresses depending on the dll that is
>>>>>> being used.
>>>>>>
>>>>>> Krishna Chaitanya K
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 23, 2008 at 12:59 PM, Anthony Chan <chan at mcs.anl.gov>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> See section "CUSTOMIZING LOGFILES" in mpich2-xxx/src/mpe2/README.
>>>>>>> You don't need to modify MPE libraries.
>>>>>>>
>>>>>>> A.Chan
>>>>>>>
>>>>>>> On Sun, 23 Mar 2008, Krishna Chaitanya wrote:
>>>>>>>
>>>>>>>> I have modified the MPE library to log the events that I am
>>>>>>>> interested in monitoring. But I am a bit hazy about how a function
>>>>>>>> like MPI_Init is actually linked to the MPI_Init routine in the file
>>>>>>>> log_mpi_core.c when we compile the MPI application with the
>>>>>>>> -mpe=mpilog switch. Could someone point me to the routine that takes
>>>>>>>> care of such a mapping?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Krishna Chaitanya K
>>>>>>>>
>>>>>>>> On Sat, Mar 22, 2008 at 3:01 AM, Krishna Chaitanya
>>>>>>>> <kris.c1986 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks a lot. I installed the latest jdk version and I am now able
>>>>>>>>> to look at the jumpshot output.
>>>>>>>>>
>>>>>>>>> Krishna Chaitanya K
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Mar 22, 2008 at 1:45 AM, Anthony Chan <chan at mcs.anl.gov>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The error that you showed earlier does not suggest the problem is
>>>>>>>>>> with running jumpshot on your machine with limited memory.  If your
>>>>>>>>>> clog2 file isn't too bad, send it to me.
>>>>>>>>>>
>>>>>>>>>> On Fri, 21 Mar 2008, Krishna Chaitanya wrote:
>>>>>>>>>>
>>>>>>>>>>> I resolved that issue.
>>>>>>>>>>> My computer (Intel Centrino, 32-bit, 256 MB RAM; dated, I agree)
>>>>>>>>>>> hangs each time I launch jumpshot with the slogfile. Since this
>>>>>>>>>>> is an independent project, I am constrained when it comes to the
>>>>>>>>>>> availability of machines. Would you recommend that I give it a try
>>>>>>>>>>> on a 64-bit AMD with 512 MB RAM? (I will have to start by
>>>>>>>>>>> installing Linux on this machine. Is it worth the effort?) If it
>>>>>>>>>>> requires a higher configuration, would you please suggest a
>>>>>>>>>>> lighter graphical tool that I can use to present the occurrence of
>>>>>>>>>>> events and the corresponding times?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Krishna Chaitanya K
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Mar 21, 2008 at 8:23 PM, Anthony Chan <chan at mcs.anl.gov>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, 21 Mar 2008, Krishna Chaitanya wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The file block pointer to the Tree Directory is NOT initialized!,
>>>>>>>>>>>>> can't read it.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> That means the slog2 file isn't generated completely.  Something
>>>>>>>>>>>> went wrong in the conversion process (assuming your clog2 file is
>>>>>>>>>>>> complete). If your MPI program doesn't finish MPI_Finalize
>>>>>>>>>>>> normally, your clog2 file will be incomplete.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>         Is there any environment variable that needs to be
>>>>>>>>>>>>> initialised?
>>>>>>>>>>>>
>>>>>>>>>>>> Nothing needs to be initialized by hand.
>>>>>>>>>>>>
>>>>>>>>>>>> A.Chan
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Krishna Chaitanya K
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Mar 20, 2008 at 4:56 PM, Dave Goodell <goodell at mcs.anl.gov>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It's pretty hard to debug this issue via email.  However, you
>>>>>>>>>>>>>> could try running valgrind on your modified MPICH2 to see if any
>>>>>>>>>>>>>> obvious bugs pop out.  When you do, make sure that you configure
>>>>>>>>>>>>>> with "--enable-g=dbg,meminit" in order to avoid spurious warnings
>>>>>>>>>>>>>> and to be able to see stack traces.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Dave
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mar 19, 2008, at 1:05 PM, Krishna Chaitanya wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The problem seems to be with the communicator in MPI_Bcast()
>>>>>>>>>>>>>>> (/src/mpi/coll/bcast.c).
>>>>>>>>>>>>>>> The comm_ptr is initialized to NULL and after a call to
>>>>>>>>>>>>>>> MPID_Comm_get_ptr( comm, comm_ptr );, the comm_ptr points to the
>>>>>>>>>>>>>>> communicator object which was created through MPI_Init().
>>>>>>>>>>>>>>> However, MPID_Comm_valid_ptr( comm_ptr, mpi_errno ) returns with a
>>>>>>>>>>>>>>> value other than MPI_SUCCESS.
>>>>>>>>>>>>>>> During some traces, it used to crash at this point itself. On some
>>>>>>>>>>>>>>> other traces, it used to go into the progress engine as I described
>>>>>>>>>>>>>>> in my previous mails.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What could be the reason? Hope someone chips in. I haven't been able
>>>>>>>>>>>>>>> to figure this out for some time now.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Krishna Chaitanya K
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Mar 19, 2008 at 8:44 AM, Krishna Chaitanya
>>>>>>>>>>>>>>> <kris.c1986 at gmail.com> wrote:
>>>>>>>>>>>>>>> This might help :
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the MPID_Comm structure, I have included the following line for
>>>>>>>>>>>>>>> the peruse place-holder :
>>>>>>>>>>>>>>>  struct mpich_peruse_handle_t** c_peruse_handles;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> And in the function MPID_Init_thread(), I have the line
>>>>>>>>>>>>>>>  MPIR_Process.comm_world->c_peruse_handles = NULL;
>>>>>>>>>>>>>>>  when the rest of the members of the comm_world structure are being
>>>>>>>>>>>>>>> populated.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Krishna Chaitanya K
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Mar 19, 2008 at 8:19 AM, Krishna Chaitanya
>>>>>>>>>>>>>>> <kris.c1986 at gmail.com> wrote:
>>>>>>>>>>>>>>> Thanks for the help. I am facing a weird problem right now. To
>>>>>>>>>>>>>>> incorporate the PERUSE component, I have modified the communicator
>>>>>>>>>>>>>>> data structure to include the PERUSE handles. The program executes
>>>>>>>>>>>>>>> as expected when compiled without the "mpe=mpilog" flag. When I
>>>>>>>>>>>>>>> compile it with the mpe component, the program gives this output:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Fatal error in MPI_Bcast: Invalid communicator, error stack:
>>>>>>>>>>>>>>> MPI_Bcast(784): MPI_Bcast(buf=0x9260f98, count=1, MPI_INT, root=0,
>>>>>>>>>>>>>>> MPI_COMM_WORLD) failed
>>>>>>>>>>>>>>> MPI_Bcast(717): Invalid communicator
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On tracing further, I understood this :
>>>>>>>>>>>>>>> MPI_Init () (  log_mpi_core.c )
>>>>>>>>>>>>>>>  -- >  PMPI_Init ( the communicator object is created here )
>>>>>>>>>>>>>>>  -- >  MPE_Init_log ()
>>>>>>>>>>>>>>>         -- > CLOG_Local_init()
>>>>>>>>>>>>>>>               -- > CLOG_Buffer_init4write ()
>>>>>>>>>>>>>>>                     -- > CLOG_Preamble_env_init()
>>>>>>>>>>>>>>>                           -- >   MPI_Bcast ()  (bcast.c)
>>>>>>>>>>>>>>>                                   -- > MPIR_Bcast ()
>>>>>>>>>>>>>>>                                          -- >  MPIC_Recv ()  / MPIC_Send()
>>>>>>>>>>>>>>>                                          -- >  MPIC_Wait()
>>>>>>>>>>>>>>>                                       < Program crashes >
>>>>>>>>>>>>>>>      The MPIC_Wait function is invoking the progress engine, which
>>>>>>>>>>>>>>> works properly without the mpe component.
>>>>>>>>>>>>>>>       Even within the progress engine, MPIDU_Sock_wait() and
>>>>>>>>>>>>>>> MPIDI_CH3I_Progress_handle_sock_event() are executed a couple of
>>>>>>>>>>>>>>> times before the program crashes in the MPIDU_Socki_handle_read()
>>>>>>>>>>>>>>> or the MPIDU_Socki_handle_write() functions. (The read() and the
>>>>>>>>>>>>>>> write() functions work two times, I think.)
>>>>>>>>>>>>>>>      I am finding it very hard to work out why the program crashes
>>>>>>>>>>>>>>> with mpe. Could you please suggest where I need to look to sort
>>>>>>>>>>>>>>> this issue out?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Krishna Chaitanya K
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, Mar 19, 2008 at 2:20 AM, Anthony Chan <chan at mcs.anl.gov>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, 19 Mar 2008, Krishna Chaitanya wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>          I tried configuring MPICH2 by doing :
>>>>>>>>>>>>>>>> ./configure --prefix=/home/kc/mpich-install/ --enable-mpe
>>>>>>>>>>>>>>>> --with-logging=SLOG  CC=gcc CFLAGS=-g   && make && make install
>>>>>>>>>>>>>>>>          It flashed an error message saying :
>>>>>>>>>>>>>>>> configure: error: ./src/util/logging/SLOG does not exist.
>>>>>>>>>>>>>>>> Configure aborted
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The --with-logging option is for MPICH2's internal logging, not MPE's
>>>>>>>>>>>>>>> logging.
>>>>>>>>>>>>>>> What you did below is fine.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>          After that, I tried :
>>>>>>>>>>>>>>>> ./configure --prefix=/home/kc/mpich-install/ --enable-mpe CC=gcc
>>>>>>>>>>>>>>>> CFLAGS=-g && make && make install
>>>>>>>>>>>>>>>>         The installation was normal. When I tried compiling an
>>>>>>>>>>>>>>>> example program by doing :
>>>>>>>>>>>>>>>> mpicc -mpilog -o sample  sample.c
>>>>>>>>>>>>>>>> cc1: error: unrecognized command line option "-mpilog"
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Do "mpicc -mpe=mpilog -o sample sample.c" instead.  For more details,
>>>>>>>>>>>>>>> see "mpicc -mpe=help" and see mpich2/src/mpe2/README.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A.Chan
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>          Can anyone please tell me what needs to be done to use the
>>>>>>>>>>>>>>>> SLOG logging format?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Krishna Chaitanya K
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> In the middle of difficulty, lies opportunity
>>>>>>>>>>>>>>>>



