[petsc-users] Unable to read in values thru namelist in Fortran after using PETSc 64bit in linux

TAY wee-beng zonexo at gmail.com
Wed Apr 3 00:48:11 CDT 2019


Hi,

I just encountered an MPI_ALLGATHERV problem when using 64-bit PETSc:

call MPI_ALLGATHERV(tmp_mpi_data,counter,MPIU_INTEGER,tmp_mpi_data2,counter_global,idisp,MPIU_INTEGER,MPI_COMM_WORLD,ierr)

The error is:

Fatal error in PMPI_Allgatherv: Message truncated, error stack:
PMPI_Allgatherv(1452).....: MPI_Allgatherv(sbuf=0x75cfdc0, scount=1119, dtype=0x4c000831, rbuf=0x75d8600, rcounts=0x7ffffffdb880, displs=0x7ffffffdb860, dtype=0x4c000831, MPI_COMM_WORLD) failed
MPIR_Allgatherv_impl(1013): fail failed
MPIR_Allgatherv(967)......: fail failed

MPIR_Allgatherv_intra(222): fail failed
MPIR_Localcopy(107).......: Message truncated; 8952 bytes received but buffer size is 8864

The variables are all defined as PetscInt. The strange thing is that I 
made 2-3 MPI_ALLGATHERV calls that are exactly the same, but only the last 
one fails. Should I change all these variables to PetscMPIInt?

Also, can I change all PetscInt in the code to PetscMPIInt, except if 
it's labeled void *?
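
One possible reading of the error above, offered as a sketch rather than a diagnosis: 8952 bytes is exactly scount=1119 elements of 8 bytes each, and 8864 bytes is 1108 such elements, so the data itself looks like 64-bit integers. MPI_Allgatherv, however, takes its receive counts and displacements as arrays of default-size integers. If counter_global and idisp are declared PetscInt (8 bytes in a 64-bit build), MPI would read them 4 bytes at a time. The C fragment below mimics that effect without MPI; view_as_int32 and elements_in are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the first n 4-byte slots of an array of 64-bit
   counts, i.e. what a routine expecting 4-byte `int` counts would see.
   The source array must hold at least n * 4 bytes. */
void view_as_int32(const int64_t *counts64, int32_t *seen, size_t n)
{
    assert(counts64 != NULL && seen != NULL);
    memcpy(seen, counts64, n * sizeof(int32_t));
}

/* Number of elements in a message of `bytes` bytes made of
   `elem_size`-byte elements. */
long elements_in(long bytes, long elem_size)
{
    assert(elem_size > 0 && bytes % elem_size == 0);
    return bytes / elem_size;
}
```

On a little-endian machine, 64-bit counts {1119, 1108} passed through view_as_int32 would be seen as {1119, 0}: every second slot is the zero upper half of the previous count, so the counts for the other ranks no longer match what those ranks send. That would also be consistent with earlier, similar-looking calls appearing to work by chance, although the thread does not confirm this.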


Thank you very much.

Yours sincerely,

================================================
TAY Wee-Beng (Zheng Weiming) 郑伟明
Personal research webpage: http://tayweebeng.wixsite.com/website
Youtube research showcase: https://www.youtube.com/channel/UC72ZHtvQNMpNs2uRTSToiLA
linkedin: www.linkedin.com/in/tay-weebeng
================================================

On 21/9/2018 2:57 AM, Smith, Barry F. wrote:
>     Yes, you need to go through your code and check each MPI call and make sure you use PetscMPIInt for integer arguments and PetscInt for the void* arguments and also make sure that the data type you use in the MPI calls (when communicating PetscInt) is MPIU_INT.
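
Following the rule quoted above, a count held in a 64-bit PetscInt has to be narrowed to a default-size integer before it is handed to MPI. PETSc's C API provides PetscMPIIntCast for that purpose; the stand-alone function below is only a hypothetical illustration of the same idea (to_mpi_int is not a PETSc routine), assuming PetscInt is 64 bits and PetscMPIInt is a plain int.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit PetscInt -> int (PetscMPIInt) conversion.
   Returns 0 and stores the narrowed value on success. Returns 1 and leaves
   *out untouched if the value does not fit in an int, in which case it
   cannot be used as an MPI count. */
int to_mpi_int(int64_t value, int *out)
{
    assert(out != NULL);
    if (value < INT_MIN || value > INT_MAX) {
        return 1;
    }
    *out = (int)value;
    return 0;
}
```

The data buffers themselves stay PetscInt and are described to MPI with MPIU_INT (MPIU_INTEGER in Fortran); only the counts, displacements, ranks and sizes go through a conversion like this.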
>
>      You should not need a fancy debugger to find out the crash point. Just a basic debugger like gdb, lldb, or dbx will do.
>
>     Barry
>
>
>> On Sep 20, 2018, at 8:57 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>
>> Hi,
>>
>> Sorry, I'm still a bit confused. My 64-bit code still doesn't work once I use more than 1 proc. It just aborts at some point. I've been trying to use the ARM Forge MPI debugging tool to find the error, but it's a bit difficult to backtrace.
>>
>> So I should carefully inspect each mpi subroutine or function, is that correct?
>>
>> If it's INT, then I should use PetscMPIInt. If it's labeled void *, I should use PetscInt. Is that so?
>>
>> Thank you very much
>>
>> Yours sincerely,
>>
>> TAY Wee-Beng 郑伟明 (Zheng Weiming)
>>
>> On 19/9/2018 12:49 AM, Smith, Barry F. wrote:
>>>     PetscMPIInt (or integer) is for all lengths passed to MPI functions; simply look at the prototypes for the MPI function you care about and they will tell you which arguments are integer. The DATA you are passing into the MPI arrays (which are labeled void * in the manual pages) should be PetscInt.
>>>
>>>
>>>
>>>> On Sep 18, 2018, at 1:39 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> In that case, does it apply to all MPI subroutines such as MPI_ALLGATHER?
>>>>
>>>> In other words, must I assign local_array_length etc as PetscMPIInt?
>>>>
>>>> call MPI_ALLGATHER(local_array_length,1,MPIU_INTEGER,array_length,1,MPIU_INTEGER,MPI_COMM_WORLD,ierr)
>>>>
>>>> Or is it ok to change all integers from PetscInt to PetscMPIInt?
>>>>
>>>> With the exception of ierr - PetscErrorCode
>>>>
>>>>
>>>> Thank you very much.
>>>>
>>>> Yours sincerely,
>>>>
>>>> TAY Wee-Beng (Zheng Weiming) 郑伟明
>>>>
>>>> On 18/9/2018 1:39 PM, Balay, Satish wrote:
>>>>> https://www.mpich.org/static/docs/v3.1/www3/MPI_Comm_size.html
>>>>>
>>>>> int MPI_Comm_size( MPI_Comm comm, int *size )
>>>>>
>>>>> i.e. there is no PetscInt here. [MPI does not know about PETSc datatypes]
>>>>>
>>>>> For convenience we provide PetscMPIInt to keep track of such variables
>>>>> [similarly PetscBLASInt]. For example, check src/vec/vec/examples/tests/ex2f.F
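
To illustrate the prototype above: a routine that stores its result through an `int *` writes only as many bytes as an int occupies. If the caller's variable is 8 bytes wide, the remaining bytes keep whatever they held before. The sketch below mimics that without MPI; store_size_as_int32 and read_back_as_int64 are hypothetical names, and a 4-byte int is assumed. It does not establish why 0 in particular was observed in the report quoted below, only that the value read back need not be the one written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a routine that, like MPI_Comm_size, stores its
   result through an `int *`: it writes exactly sizeof(int32_t) bytes. */
void store_size_as_int32(void *size_out, int32_t nprocs)
{
    assert(size_out != NULL);
    memcpy(size_out, &nprocs, sizeof nprocs);
}

/* Simulate passing an 8-byte variable whose bytes start out as `initial`,
   then reading it back as a 64-bit integer. */
int64_t read_back_as_int64(int64_t initial, int32_t nprocs)
{
    int64_t num_procs = initial;
    store_size_as_int32(&num_procs, nprocs);
    return num_procs;
}
```

With a correctly sized 4-byte variable the stored value is read back unchanged; with an 8-byte variable the result depends on byte order and on the previous contents of the untouched half.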
>>>>>
>>>>> Satish
>>>>>
>>>>> On Tue, 18 Sep 2018, TAY wee-beng wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I managed to find the error appearing after using PETSc 64bit in linux -
>>>>>>
>>>>>> call MPI_COMM_SIZE(MPI_COMM_WORLD, num_procs, ierr)
>>>>>>
>>>>>> I had declared num_procs as PetscInt and I got 0 instead of 1 (for 1 proc).
>>>>>>
>>>>>> Declaring num_procs as integer solves the problem.
>>>>>>
>>>>>> Is this supposed to be the case? Or is it a bug?
>>>>>>
>>>>>> Thank you very much.
>>>>>>
>>>>>> Yours sincerely,
>>>>>>
>>>>>> TAY Wee-Beng (Zheng Weiming) 郑伟明
>>>>>>
>>>>>> On 8/9/2018 1:14 AM, Smith, Barry F. wrote:
>>>>>>>         You can try valgrind
>>>>>>>         http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>
>>>>>>>     Barry
>>>>>>>
>>>>>>>
>>>>>>>> On Sep 7, 2018, at 1:44 AM, TAY wee-beng <zonexo at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I found that I am unable to read in values thru namelist in Fortran after
>>>>>>>> using PETSc 64bit in linux.
>>>>>>>>
>>>>>>>> I have a parameter txt file which is read in using namelist in Fortran:
>>>>>>>>
>>>>>>>> namelist /body_input/ no_body, convex_body, motion_type, hover, wing_config
>>>>>>>> ...
>>>>>>>>
>>>>>>>> open (unit = 44 , FILE = "ibm3d_input.txt" , status = "old", iostat = openstatus(4))
>>>>>>>>
>>>>>>>>           if (openstatus(4) > 0) then
>>>>>>>>               print *, "ibm3d_input file not present or wrong filename."
>>>>>>>>               stop
>>>>>>>>           end if
>>>>>>>>
>>>>>>>>           read (44,nml = solver_input)
>>>>>>>>           read (44,nml = grid_input)
>>>>>>>>           read (44,nml = body_input)...
>>>>>>>>
>>>>>>>>
>>>>>>>> After using PETSc 64bit, my code aborts and I realise that it is because
>>>>>>>> the values have become NaN. Strangely, it does not occur in Windows with
>>>>>>>> VS2008.
>>>>>>>>
>>>>>>>> I wonder if it's a bug with the Intel Fortran compiler 2018.
>>>>>>>>
>>>>>>>> Has anyone had a similar experience?
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Thank you very much.
>>>>>>>>
>>>>>>>> Yours sincerely,
>>>>>>>>
>>>>>>>> TAY Wee-Beng (Zheng Weiming) 郑伟明
>>>>>>>>