[Fwd: pnetcdf & Open MPI]
Rob Ross
rross at mcs.anl.gov
Fri May 5 09:59:07 CDT 2006
Hi Dries, all,
The problem is that the Open MPI group takes ROMIO as-is from an older
release. They then pass MPI_COMBINER_SUBARRAY straight through to that
ROMIO, which in that release does not understand it, and ROMIO blows up.
I consider this an Open MPI bug.
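For reference, ROMIO's flattening code decodes each datatype with
MPI_Type_get_envelope and switches on the combiner it gets back; a
combiner it does not recognize falls through to the "Unsupported
datatype passed to ADIOI_Count_contiguous_blocks" error quoted below.
Here is a minimal sketch (mine, not taken from the pnetcdf or ROMIO
sources) showing that a subarray type reports exactly that combiner:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* a 4x4 block at offset (2,2) in a 10x10 array of doubles */
    int sizes[2]    = {10, 10};
    int subsizes[2] = {4, 4};
    int starts[2]   = {2, 2};
    MPI_Datatype sub;
    MPI_Type_create_subarray(2, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &sub);
    MPI_Type_commit(&sub);

    /* the ROMIO-style question: which constructor built this type? */
    int ni, na, nd, combiner;
    MPI_Type_get_envelope(sub, &ni, &na, &nd, &combiner);
    printf("combiner is MPI_COMBINER_SUBARRAY: %s\n",
           combiner == MPI_COMBINER_SUBARRAY ? "yes" : "no");

    MPI_Type_free(&sub);
    MPI_Finalize();
    return 0;
}

A ROMIO old enough to predate MPI_COMBINER_SUBARRAY hits the default
case of that switch and aborts the I/O call.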
Rob
Jianwei Li wrote:
> Hi Dries,
>
> Thank you for your test!
>
> It looks like the Open MPI you are using does not support
> MPI_COMBINER_SUBARRAY.
>
> Can you double-check that in your configure output?
> For example, my system gives the following configuration:
>
> checking if MPI includes MPI_COMBINER_DUP... yes
> checking if MPI includes MPI_COMBINER_HVECTOR_INTEGER... yes
> checking if MPI includes MPI_COMBINER_HINDEXED_INTEGER... yes
> checking if MPI includes MPI_COMBINER_SUBARRAY... yes
> checking if MPI includes MPI_COMBINER_DARRAY... yes
> checking if MPI includes MPI_COMBINER_RESIZED... yes
> checking if MPI includes MPI_COMBINER_STRUCT_INTEGER... yes
> checking if MPI includes MPI_COMBINER_INDEXED_BLOCK... yes
> checking if MPI includes MPI_COMBINER_F90_REAL... yes
> checking if MPI includes MPI_COMBINER_F90_INTEGER... yes
> checking if MPI includes MPI_COMBINER_F90_COMPLEX... yes
> checking if MPI includes MPI_CHARACTER... no
> checking if MPI includes MPI_REAL... no
> checking if MPI includes MPI_INTEGER... no
> checking if MPI includes MPI_DOUBLE_PRECISION... no
> checking if MPI includes MPI_INTEGER1... no
> checking if MPI includes MPI_INTEGER2... no
> checking if MPI includes MPI_INTEGER4... no
> checking if MPI includes MPI_INTEGER8... no
> checking if MPI includes MPI_INTEGER16... no
> checking if MPI includes MPI_REAL4... no
> checking if MPI includes MPI_REAL8... no
> checking if MPI includes MPI_REAL16... no
> checking if MPI includes MPI_COMPLEX8... no
> checking if MPI includes MPI_COMPLEX16... no
> checking if MPI includes MPI_COMPLEX32... no
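>
> (Each of those lines comes from a small compile test; a hypothetical
> sketch of the kind of program configure tries to build is below. If
> <mpi.h> does not define MPI_COMBINER_SUBARRAY, it fails to compile
> and the check reports "no".)
>
> #include <mpi.h>
>
> int main(void)
> {
>     /* compiles only if the MPI headers define this combiner */
>     int c = MPI_COMBINER_SUBARRAY;
>     (void)c;
>     return 0;
> }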
>
> Thanks!
>
> Jianwei
>
>
> On Thu, 2006-05-04 at 12:38 +0200, Dries Kimpe wrote:
>> The facts:
>>
>> * Parallel netCDF compiles with both Open MPI (svn trunk) and MPICH2.
>> * With Open MPI, all tests fail with the following message:
>>
>> Testing write ... Error: Unsupported datatype passed to
>> ADIOI_Count_contiguous_blocks
>> [lts.mydomain.be:26763] [0,0,0] ORTE_ERROR_LOG: Not found in file
>> ../../../../orte/mca/pls/base/pls_base_proxy.c at line 189
>>
>> (both independent & collective writes, no matter what the underlying variable type is).
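>>
>> (Untested sketch on my side, pnetcdf not involved -- the file name
>> repro.dat is made up. Setting a subarray filetype as the file view
>> and then writing should push ROMIO through the same flattening path,
>> so something like this might work as a standalone reproducer:)
>>
>> #include <mpi.h>
>>
>> int main(int argc, char **argv)
>> {
>>     MPI_Init(&argc, &argv);
>>
>>     /* 4x4 subarray of an 8x8 array of doubles as the file view */
>>     int sizes[2]    = {8, 8};
>>     int subsizes[2] = {4, 4};
>>     int starts[2]   = {0, 0};
>>     MPI_Datatype filetype;
>>     MPI_Type_create_subarray(2, sizes, subsizes, starts,
>>                              MPI_ORDER_C, MPI_DOUBLE, &filetype);
>>     MPI_Type_commit(&filetype);
>>
>>     MPI_File fh;
>>     MPI_File_open(MPI_COMM_SELF, "repro.dat",
>>                   MPI_MODE_CREATE | MPI_MODE_WRONLY,
>>                   MPI_INFO_NULL, &fh);
>>     MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native",
>>                       MPI_INFO_NULL);
>>
>>     /* 16 doubles fill the 4x4 subarray exactly */
>>     double buf[16] = {0};
>>     MPI_File_write_all(fh, buf, 16, MPI_DOUBLE, MPI_STATUS_IGNORE);
>>
>>     MPI_File_close(&fh);
>>     MPI_Type_free(&filetype);
>>     MPI_Finalize();
>>     return 0;
>> }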
>>
>> Steps to reproduce the problem:
>>
>> 1) build Open MPI trunk revision 9809
>>
>> Special configure options used: --enable-static --enable-shared
>> gcc:
>> gcc (GCC) 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)
>> Copyright (C) 2003 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions. There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>> 2) verify that the correct compiler is being called
>>
>> lts@mhd3 ~/work/pnetcdf/parallel-netcdf-1.0.1/build/ompi $ mpic++ -showme
>> g++ -I/home/lts/openmpi/include -I/home/lts/openmpi/include/openmpi -pthread -L/home/lts/openmpi/lib
>> -lmpi_cxx -lmpi -lorte -lopal -lrt -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
>> lts@mhd3 ~/work/pnetcdf/parallel-netcdf-1.0.1/build/ompi $ mpicc -showme
>> gcc -I/home/lts/openmpi/include -I/home/lts/openmpi/include/openmpi -pthread -L/home/lts/openmpi/lib
>> -lmpi -lorte -lopal -lrt -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
>> lts@mhd3 ~/work/pnetcdf/parallel-netcdf-1.0.1/build/ompi $ mpicc --version
>> gcc (GCC) 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)
>> Copyright (C) 2003 Free Software Foundation, Inc.
>> This is free software; see the source for copying conditions. There is NO
>> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>>
>>
>> 3) unpack parallel-netcdf-1.0.1.tar.bz2
>> mkdir build
>> cd build
>> mkdir openmpi
>> cd openmpi
>> ../../configure --prefix=/home/lts/openmpi --disable-fortran CC=mpicc
>> make
>> make install
>>
>>
>> 4) go to test directory
>> make
>> (-> the Fortran tests fail to compile, which is 'normal' given --disable-fortran)
>>
>> Try out test/test_double:
>>
>> lts@mhd3 ~/work/pnetcdf/parallel-netcdf-1.0.1/build/ompi/test/test_double $ ./test_write test.nc
>> Testing write ... ADIOI_GEN_DELETE (line 22): **io No such file or directoryError: Unsupported
>> datatype passed to ADIOI_Count_contiguous_blocks
>> [mhd3:24861] [0,0,0] ORTE_ERROR_LOG: Not found in file
>> ../../../../orte/mca/pls/base/pls_base_proxy.c at line 189
>>
>> verify the correct libraries are being used:
>> lts@mhd3 ~/work/pnetcdf/parallel-netcdf-1.0.1/build/ompi/test/test_double $ ldd test_write
>> linux-gate.so.1 => (0xffffe000)
>> libmpi.so.0 => /home/lts/openmpi/lib/libmpi.so.0 (0xb7e3d000)
>> liborte.so.0 => /home/lts/openmpi/lib/liborte.so.0 (0xb7da5000)
>> libopal.so.0 => /home/lts/openmpi/lib/libopal.so.0 (0xb7d71000)
>> librt.so.1 => /lib/librt.so.1 (0xb7d47000)
>> libdl.so.2 => /lib/libdl.so.2 (0xb7d43000)
>> libnsl.so.1 => /lib/libnsl.so.1 (0xb7d2e000)
>> libutil.so.1 => /lib/libutil.so.1 (0xb7d29000)
>> libm.so.6 => /lib/libm.so.6 (0xb7d07000)
>> libpthread.so.0 => /lib/libpthread.so.0 (0xb7cf5000)
>> libc.so.6 => /lib/libc.so.6 (0xb7be2000)
>> /lib/ld-linux.so.2 (0xb7f53000)
>>
>>
>> Same behaviour on multiple CPUs:
>> lts@mhd3 ~/work/pnetcdf/parallel-netcdf-1.0.1/build/ompi/test/test_double $ mpirun -np 2
>> ./test_write test.nc
>> Testing write ... ADIOI_GEN_DELETE (line 22): **io No such file or directoryError: Unsupported
>> datatype passed to ADIOI_Count_contiguous_blocks
>> Error: Unsupported datatype passed to ADIOI_Count_contiguous_blocks
>> 1 additional process aborted (not shown)
>>
>>
>>
>> With test_dtype:
>> lts@mhd3 ~/work/pnetcdf/parallel-netcdf-1.0.1/build/ompi/test/test_dtype $ ./test_nonblocking test.nc
>> testing memory subarray layout ...
>> ADIOI_GEN_DELETE (line 22): **io No such file or directory Filesize = 2.024MB, MAX_Memory_needed =
>> 4.048MB
>>
>> Initialization: NDIMS = 3, NATIVE_ETYPE = float, NC_TYPE = NC_DOUBLE
>>
>> NC Var_1 Shape: [17, 51, 153] Always ORDER_C
>> NC Var_2 Shape: [153, 51, 17] Always ORDER_C
>> Memory Array Shape: [17, 51, 153] MPI_ORDER_C
>> Memory Array Copys: buf1 for write, buf2 for read back (and compare)
>>
>> Logical Array Partition: BLOCK partition along all dimensions
>>
>> Access Pattern (subarray): NPROCS = 1
>>
>> Proc 0 of 1: starts = [ 0, 0, 0], counts = [17, 51, 153]
>>
>> TEST1:
>> [nonblocking] all procs writing their subarrays into Var_1 ...
>> Error: Unsupported datatype passed to ADIOI_Count_contiguous_blocks
>> [mhd3:27627] [0,0,0] ORTE_ERROR_LOG: Not found in file
>> ../../../../orte/mca/pls/base/pls_base_proxy.c at line 189
>> lts@mhd3 ~/work/pnetcdf/parallel-netcdf-1.0.1/build/ompi/test/test_dtype $
>>
>>
>> I already searched Google for this kind of error, but found nothing useful.
>>
>> Greetings,
>> Dries
>>
>>