[mpich-discuss] Is there any change in mpich2 compilers between ver. 1.2 and ver. 1.4.1p1?

Darius Buntinas buntinas at mcs.anl.gov
Sun May 20 19:46:33 CDT 2012


The message you're seeing is the result of a segmentation fault in one of your processes.  This is usually caused by a bug in the application.  The best way to diagnose it is to rerun the application with core files enabled, then open the resulting core file in a debugger to see where the segmentation fault occurred.  E.g.,

    ulimit -c unlimited
    mpiexec ...

Then look for a file called core.XXX (where XXX is the pid of the failed process) and open it in a debugger, e.g.:

    gdb executable core.XXX

In gdb, give the command
    bt
to print the backtrace and see where the error occurred.
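
If you want to confirm that core files are actually being generated before chasing the real bug, a tiny MPI program that deliberately segfaults makes a convenient test.  This is just an illustrative sketch (the file name crashme.c and everything in it are made up for this example):

    /* crashme.c -- hypothetical test program: rank 0 dereferences a
     * NULL pointer, so that process dies with SIGSEGV (which mpiexec
     * reports as exit code 11) and leaves a core file behind. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            volatile int *p = 0;  /* volatile so the store isn't optimized away */
            *p = 42;              /* intentional segmentation fault */
        }
        MPI_Finalize();
        return 0;
    }

Compile with -g so the backtrace has file and line information, then run it with core dumps enabled:

    mpicc -g crashme.c -o crashme
    ulimit -c unlimited
    mpiexec -n 2 ./crashme

Note that ulimit only affects the shell it's set in; for processes launched on remote nodes you may need to set it in the shell startup files on those nodes as well.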

If you're running this on a Mac, the core file will be located in /cores, and if there are multiple core files in there already, you can find the one you're looking for by its creation time.
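
Also, to compare how the two installations were built, the mpich2version utility in the install's bin directory prints the version and the configure options that were used.  For example (path inferred from the -prefix mentioned below):

    /home/octofous2/mpich2-install/bin/mpich2version

If you can run the same command on one of the old machines, the output should show whether options such as --with-pm or --enable-fast differ between the two installs.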

-d


On May 18, 2012, at 10:02 PM, 유경완 wrote:

> Hi, thanks for reading this mail.
> 
> First of all, I really appreciate your work on the mpich2 programs; I have used them very productively for clustering. Thank you very much for them.
> 
> However, I have a small problem using it, so I would like to ask something. Sorry to bother you.
> 
> The problem is that when I upgraded the cluster computers, I also upgraded mpich2 from version 1.2 to 1.4.1p1.
> 
> The installation finished, and mpiexec worked well with mpich2 1.4.1p1.
> 
> Then I tested compiling with mpicxx, and it seemed to work well with no errors.
> 
> But when I ran mpiexec with the freshly compiled files, errors like this appeared...
> 
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> [root@octofous2 yookw]# ./odengmorun 8
> 
> =====================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 11
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> =====================================================================================
> [proxy:0:1@n002] HYD_pmcd_pmip_control_cmd_cb (/home/octofous2/libraries/mpich2-1.4.1p1/src/pm/hydra/pm/pmiserv/pmip_cb.c:928): assert (!closed) failed
> [proxy:0:1@n002] HYDT_dmxu_poll_wait_for_event (/home/octofous2/libraries/mpich2-1.4.1p1/src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:1@n002] main (/home/octofous2/libraries/mpich2-1.4.1p1/src/pm/hydra/pm/pmiserv/pmip.c:226): demux engine error waiting for event
> [mpiexec@octofous2.psl] HYDT_bscu_wait_for_completion (/home/octofous2/libraries/mpich2-1.4.1p1/src/pm/hydra/tools/bootstrap/utils/bscu_wait.c:70): one of the processes terminated badly; aborting
> [mpiexec@octofous2.psl] HYDT_bsci_wait_for_completion (/home/octofous2/libraries/mpich2-1.4.1p1/src/pm/hydra/tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec@octofous2.psl] HYD_pmci_wait_for_completion (/home/octofous2/libraries/mpich2-1.4.1p1/src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:191): launcher returned error waiting for completion
> [mpiexec@octofous2.psl] main (/home/octofous2/libraries/mpich2-1.4.1p1/src/pm/hydra/ui/mpich/mpiexec.c:405): process manager error waiting for completion
> 8 cpus
> 
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> where odengmorun is:
> 
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> #!/bin/bash
> 
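> # launch $1 MPI processes on the hosts listed in machine.list;
> # the final echo produces the "8 cpus" line in the output above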
> mpiexec -f ./machine.list -n $1 ./yoo.out
> echo $1 cpus
> 
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> 
> This is weird to me, because when I compiled the same code on the old computers, which had mpich2 version 1.2, mpiexec worked...
> 
> So the only changes are the version and maybe some configure options. (Sorry, but I recently took over as administrator of the clusters, so I don't know the old computers' configure options.... This time, the configure options for the new computer were
> 
> --with-pm=hydra:gforker:smpd --enable-fast=O3 -prefix=/home/octofous2/mpich2-install )
> 
> Sorry to ask like this, but are there any changes in compilation between versions 1.2 and 1.4.1p1 that could be a clue to this problem?
> 
> Thanks for reading.
> 
> Best regards
> 


