[MPICH2-dev] A task produces segmentation fault in case of a wrong datatype
Ezhov, Dmitry
dmitry.ezhov at intel.com
Fri Jun 29 01:47:29 CDT 2007
Dear Sirs,
I'm from the Intel MPI team.
The following problem was found:
If an invalid datatype is passed to an MPI operation, the program crashes with a segmentation fault:
rank 0 in job 3 host0_35059 caused collective abort of all ranks
exit status of rank 0: killed by signal 11
However, the error handling procedure seems intended to treat this kind of error differently. I investigated and found that the following code in src/mpid/common/datatype/mpid_type_get_envelope.c causes the segmentation fault:
...
    MPID_Datatype_get_ptr(datatype, dtp);
    *combiner = dtp->contents->combiner;
    *num_integers = dtp->contents->nr_ints;
    *num_addresses = dtp->contents->nr_aints;
    *num_datatypes = dtp->contents->nr_types;
}
...
MPID_Datatype_get_ptr() can set dtp to 0 when the datatype is invalid, but the next statement dereferences dtp on the assumption that it points to a valid structure. I added a check for dtp being 0:
...
    MPID_Datatype_get_ptr(datatype, dtp);
    if (dtp) {
        *combiner = dtp->contents->combiner;
        *num_integers = dtp->contents->nr_ints;
        *num_addresses = dtp->contents->nr_aints;
        *num_datatypes = dtp->contents->nr_types;
    }
}
...
With this change, the error handling procedure reports the invalid datatype correctly:
[cli_0]: aborting job:
Fatal error in MPI_Isend: Invalid datatype, error stack:
MPI_Isend(145): MPI_Isend(buf=0x7fffffffd4e0, count=255, dtype=USER<0x0000000b>, dest=1, tag=0, MPI_COMM_WORLD, request=0x7fffffffccdc) failed
MPI_Isend(109): Null Datatype pointer
rank 0 in job 2 host0_35059 caused collective abort of all ranks
exit status of rank 0: return code 1
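
For illustration only, the guard can be sketched as a self-contained C fragment that does not depend on MPICH2. Everything prefixed with sketch_ or SKETCH_ is a hypothetical stand-in and not an MPICH2 definition; only the field names (combiner, nr_ints, nr_aints, nr_types) are taken from the excerpt above. Unlike the patch, this sketch returns a status code, which is my own assumption about how a caller might learn that the outputs were left unset.

```c
#include <stddef.h>

/* Hypothetical status codes; these are NOT MPI or MPICH2 constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_TYPE = 1 };

/* Stand-in for the "contents" record; field names follow the excerpt. */
typedef struct {
    int combiner;
    int nr_ints;
    int nr_aints;
    int nr_types;
} sketch_contents;

/* Stand-in for the datatype object that dtp points to. */
typedef struct {
    sketch_contents *contents;
} sketch_datatype;

/* Hypothetical handle table: an index plays the role of a datatype handle. */
#define SKETCH_TABLE_SIZE 4
static sketch_datatype *sketch_table[SKETCH_TABLE_SIZE];

/* Returns NULL for an unknown handle, mirroring the behaviour described
 * for MPID_Datatype_get_ptr() when the datatype is invalid. */
static sketch_datatype *sketch_get_ptr(int handle)
{
    if (handle < 0 || handle >= SKETCH_TABLE_SIZE)
        return NULL;
    return sketch_table[handle];
}

/* Fills the four outputs only when the lookup succeeded; otherwise the
 * outputs are left untouched and an error status is returned. */
int sketch_get_envelope(int handle, int *combiner, int *num_integers,
                        int *num_addresses, int *num_datatypes)
{
    sketch_datatype *dtp = sketch_get_ptr(handle);

    if (dtp == NULL || dtp->contents == NULL)
        return SKETCH_ERR_TYPE;

    *combiner = dtp->contents->combiner;
    *num_integers = dtp->contents->nr_ints;
    *num_addresses = dtp->contents->nr_aints;
    *num_datatypes = dtp->contents->nr_types;
    return SKETCH_SUCCESS;
}
```

Calling sketch_get_envelope() with a handle that is out of range or not registered returns SKETCH_ERR_TYPE without touching the output arguments, so the caller can report the error instead of reading unset values.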
I have attached a small reproducer for this case.
The issue was reproduced with mpich2-1.0.5p4.
What do you think about this issue?
Thank you in advance.
--
Best regards.
Dmitry Ezhov
Software Engineer,
Intel SSG/ESSD/Cluster Software & Technologies Lab,
Russia, Sarov.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reproducer.c
Type: application/octet-stream
Size: 513 bytes
Desc: reproducer.c
URL: <https://lists.mcs.anl.gov/mailman/private/mpich2-dev/attachments/20070629/98043a07/attachment.obj>