[MPICH2-dev] A task produces segmentation fault in case of a wrong datatype

Ezhov, Dmitry dmitry.ezhov at intel.com
Fri Jun 29 01:47:29 CDT 2007


Dear Sirs,

 

I'm from the Intel MPI team.

 

The following problem was found:

If a wrong datatype is used for an MPI operation, the program produces a segmentation fault:

 

rank 0 in job 3  host0_35059   caused collective abort of all ranks
  exit status of rank 0: killed by signal 11

 

However, the error handling code appears to be designed to report this kind of error rather than crash. I investigated and found that the segmentation fault comes from the following code in src/mpid/common/datatype/mpid_type_get_envelope.c:

...
    MPID_Datatype_get_ptr(datatype, dtp);

    *combiner      = dtp->contents->combiner;
    *num_integers  = dtp->contents->nr_ints;
    *num_addresses = dtp->contents->nr_aints;
    *num_datatypes = dtp->contents->nr_types;
    }
...

 

MPID_Datatype_get_ptr() can set dtp to 0 (when the datatype is wrong), but the next statement then reads a field through dtp as if it were a valid pointer to a structure. I added a check for whether dtp is 0:

...
    MPID_Datatype_get_ptr(datatype, dtp);

    if (dtp) {
        *combiner      = dtp->contents->combiner;
        *num_integers  = dtp->contents->nr_ints;
        *num_addresses = dtp->contents->nr_aints;
        *num_datatypes = dtp->contents->nr_types;
    }
    }
...

 

After this change, the error handling procedure reports the wrong datatype correctly:

 

[cli_0]: aborting job:
Fatal error in MPI_Isend: Invalid datatype, error stack:
MPI_Isend(145): MPI_Isend(buf=0x7fffffffd4e0, count=255, dtype=USER<0x0000000b>, dest=1, tag=0, MPI_COMM_WORLD, request=0x7fffffffccdc) failed
MPI_Isend(109): Null Datatype pointer
rank 0 in job 2  host0_35059   caused collective abort of all ranks
  exit status of rank 0: return code 1
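
For illustration, the guarded-lookup pattern can be sketched in isolation. This is a standalone model and not MPICH2 code: the demo_* types, the demo_get_ptr() lookup, the handle value 1 and the DEMO_* return codes are all hypothetical names introduced only for this sketch. Unlike the patch above, which simply skips the assignments, the sketch also reports the failure through a return code; that is an assumption of the sketch, not something the patch does.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the datatype structures; not the MPICH2 types. */
typedef struct {
    int combiner;
    int nr_ints;
    int nr_aints;
    int nr_types;
} demo_contents;

typedef struct {
    demo_contents *contents;
} demo_datatype;

/* Hypothetical return codes for the sketch. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_TYPE = 1 };

static demo_contents demo_vec_contents = { 3, 3, 0, 1 };
static demo_datatype demo_vec_type = { &demo_vec_contents };

/* Hypothetical lookup: handle 1 is the only valid handle in this model.
 * Any other handle yields NULL, modelling the case where the real
 * lookup leaves dtp set to 0 for a wrong datatype. */
demo_datatype *demo_get_ptr(int handle)
{
    return (handle == 1) ? &demo_vec_type : NULL;
}

/* Fills the four outputs only when the lookup succeeded.
 * On a failed lookup the outputs are left untouched and an error
 * code is returned instead of dereferencing a null pointer. */
int demo_get_envelope(int handle, int *combiner, int *num_integers,
                      int *num_addresses, int *num_datatypes)
{
    demo_datatype *dtp = demo_get_ptr(handle);

    if (dtp == NULL || dtp->contents == NULL) {
        return DEMO_ERR_TYPE;
    }

    *combiner      = dtp->contents->combiner;
    *num_integers  = dtp->contents->nr_ints;
    *num_addresses = dtp->contents->nr_aints;
    *num_datatypes = dtp->contents->nr_types;
    return DEMO_SUCCESS;
}
```

Calling demo_get_envelope() with handle 1 fills the outputs and returns DEMO_SUCCESS; calling it with any other handle, for example 0x0b, returns DEMO_ERR_TYPE and leaves the outputs unchanged, so the caller can go on to report the invalid datatype.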

 

I attached a small reproducer of such a situation.

The issue was reproduced with mpich2-1.0.5p4.

 

What do you think about this issue?

 

Thank you in advance.

 

-- 

Best regards.

 

Dmitry Ezhov

Software Engineer,

 

Intel SSG/ESSD/Cluster Software & Technologies Lab,

Russia, Sarov.

 

 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: reproducer.c
Type: application/octet-stream
Size: 513 bytes
Desc: reproducer.c
URL: <https://lists.mcs.anl.gov/mailman/private/mpich2-dev/attachments/20070629/98043a07/attachment.obj>

