[mpich-discuss] Fatal error in MPI_Comm_spawn

ZhangXP soaliap at 126.com
Wed Dec 7 08:19:45 CST 2011


Hi,

I compiled mvapich2-1.7 with the default interface on a single computer with 4 cores, running Red Hat Enterprise Linux Server release 6.0. "./configure", "make" and "make install" all appeared to succeed, but "make installcheck" and "make testing" both produced errors.


1. make installcheck
    (1). When I executed "make installcheck", there were some errors:
    ========================================================================================
        CMA: unable to get RDMA device list
        librdmacm: couldn't read ABI version.
        librdmacm: assuming: 4
    ========================================================================================
    (2). I executed "make uninstalled", "make clean" and "make distclean", and then ran "./configure --disable-rdma-cm", "make", "make install" and "make installcheck" again. But there were still some errors:
    ========================================================================================
        Running installation runtest for C collchk program...
        *** Test C program with the MPI collective/datatype checking library ..... No.
        The failed command is :
        /usr/local/mvapich2-1.7/bin/mpiexec -n 4 ./wrong_int_byte
        Starting MPI Collective and Datatype Checking!
        Backtrace of the callstack at rank 3:
        At [0]: ./wrong_int_byte(CollChk_err_han+0x16f)[0x41e86f]
        [cli_3]: aborting job:
        Fatal error in MPI_Comm_call_errhandler:
        Collective Checking: BCAST (Rank 3) --> Inconsistent datatype signatures detected between rank 3 and rank 0.

        Running installation runtest for Fortran collchk program...
        *** Test F77 program with the MPI collective/datatype checking library ... Yes.
    ========================================================================================
    I found a description of errors like the ones above in the old document "mpich2-doc-user.pdf". It says "The error message here shows that the MPI_Bcast has been used with inconsistent datatype in the program wrong_reals.f". The relevant code in wrong_int_byte.c is:
    ========================================================================================
        if ( rank == size-1 )
            /* Create pathological case */
            MPI_Bcast( &ibuff, sizeof(int), MPI_BYTE, 0, MPI_COMM_WORLD );
        else
            MPI_Bcast( &ibuff, 1, MPI_INT, 0, MPI_COMM_WORLD );
    ========================================================================================
    What confuses me:
    a). Does the file name "wrong_int_byte" mean that the test is expected to fail?
    b). Why did "wrong_int_byte" fail while "wrong_reals" did not?

2. make testing
(1). When I executed "make testing", the same error appeared many times:
========================================================================================
Unexpected output in spawn1: [cli_0]: aborting job:
Unexpected output in spawn1: Fatal error in MPI_Comm_spawn:
Unexpected output in spawn1: Other MPI error
Unexpected output in spawn1: 
Program spawn1 exited without No Errors
========================================================================================
I found that these errors always occurred when running the test examples in the "test/mpi/spawn" directory.
(2). I found that if I configure with the TCP/IP-CH3 interface, "make testing" succeeds. But I saw 
   "the MVAPICH team strongly recommends the use of following interfaces for different adapters: 
    ... 
    5) Shared-Memory-CH3 for single node SMP system and laptop"
in "mvapich2-1.7_user_guide.pdf", and I don't understand why a build configured with the default interface fails!



Can anybody help me? Thanks!


