[petsc-users] [petsc-maint] DMSwarm on multiple processors

Barry Smith bsmith at petsc.dev
Thu Oct 26 09:58:55 CDT 2023


   Please run with the -malloc_debug option or, even better, run under Valgrind; see https://petsc.org/release/faq/
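
   A rough sketch of what those two runs might look like for the job quoted below. The process count (2), the log-file name, and the particular Valgrind flags are assumptions chosen for illustration, not taken from the original report; the solver options are copied from the quoted mpiexec line. The script only prints the commands (a dry run), so they can be checked before being pasted into a job script.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints the two debugging command lines.
# NP (process count) and the log-file name are assumptions for illustration.
NP=2
APP=./cobpor
OPTS="-ksp_type cg -pc_type pfmg -dm_mat_type hyprestruct"

# 1) Let PETSc's own allocator check for corrupted or doubly freed memory.
MALLOC_CMD="mpiexec -n $NP $APP $OPTS -malloc_debug"

# 2) Run every rank under Valgrind memcheck; %p in --log-file expands to
#    the process ID, so each rank writes its own log.
VALGRIND_CMD="mpiexec -n $NP valgrind --tool=memcheck -q --num-callers=20 --track-origins=yes --log-file=valgrind.log.%p $APP $OPTS"

echo "$MALLOC_CMD"
echo "$VALGRIND_CMD"
```

   Starting with the smallest process count that still fails keeps the Valgrind logs manageable, since each rank produces its own file.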



> On Oct 26, 2023, at 10:35 AM, Joauma Marichal via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> Hello, 
>  
> Here is a very simple version that still shows the problem.
>  
> I run it as follows:
>  
> cd Grid_generation 
> make clean 
> make all
> ./grid_generation 
> cd ..
> make clean 
> make all
> ./cobpor # on 1 proc
> # OR
> mpiexec ./cobpor -ksp_type cg -pc_type pfmg -dm_mat_type hyprestruct -pc_pfmg_skip_relax 1 -pc_pfmg_rap_time non-Galerkin # on multiple procs
>  
> The error that I get is the following:
> munmap_chunk(): invalid pointer
> [cns266:2552391] *** Process received signal ***
> [cns266:2552391] Signal: Aborted (6)
> [cns266:2552391] Signal code:  (-6)
> [cns266:2552391] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7fd7fd194b20]
> [cns266:2552391] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7fd7fd194a9f]
> [cns266:2552391] [ 2] /lib64/libc.so.6(abort+0x127)[0x7fd7fd167e05]
> [cns266:2552391] [ 3] /lib64/libc.so.6(+0x91037)[0x7fd7fd1d7037]
> [cns266:2552391] [ 4] /lib64/libc.so.6(+0x9819c)[0x7fd7fd1de19c]
> [cns266:2552391] [ 5] /lib64/libc.so.6(+0x9844c)[0x7fd7fd1de44c]
> [cns266:2552391] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscFreeAlign+0xe)[0x7fd7fe63d50e]
> [cns266:2552391] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetMatType+0x3d)[0x7fd7feab87ad]
> [cns266:2552391] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetFromOptions+0x109)[0x7fd7feab8b59]
> [cns266:2552391] [ 9] ./cobpor[0x402df9]
> [cns266:2552391] [10] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7fd7fd180cf3]
> [cns266:2552391] [11] ./cobpor[0x40304e]
> [cns266:2552391] *** End of error message ***
>  
>  
> Thanks a lot for your help. 
>  
> Best regards, 
>  
> Joauma
>  
>  
>  
> From: Matthew Knepley <knepley at gmail.com>
> Date: Wednesday, 25 October 2023 at 14:45
> To: Joauma Marichal <joauma.marichal at uclouvain.be>
> Cc: petsc-maint at mcs.anl.gov, petsc-users at mcs.anl.gov
> Subject: Re: [petsc-maint] DMSwarm on multiple processors
> 
> On Wed, Oct 25, 2023 at 8:32 AM Joauma Marichal via petsc-maint <petsc-maint at mcs.anl.gov> wrote:
> Hello, 
>  
> I am using the DMSwarm library in an Eulerian-Lagrangian approach to model vapor bubbles in water. 
> I have obtained nice results recently and wanted to perform bigger simulations. Unfortunately, when I increase the number of processors used to run the simulation, I get the following error:
>  
> free(): invalid size
> [cns136:590327] *** Process received signal ***
> [cns136:590327] Signal: Aborted (6)
> [cns136:590327] Signal code:  (-6)
> [cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20]
> [cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f]
> [cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05]
> [cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037]
> [cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c]
> [cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac]
> [cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64]
> [cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(+0x841642)[0x7f56cea83642]
> [cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e]
> [cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde]
> [cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8]
> [cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448]
> [cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20]
> [cns136:590327] [13] ./cobpor[0x4418dc]
> [cns136:590327] [14] ./cobpor[0x408b63]
> [cns136:590327] [15] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f56cd4b5cf3]
> [cns136:590327] [16] ./cobpor[0x40bdee]
> [cns136:590327] *** End of error message ***
> --------------------------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code. Per user-direction, the job has been aborted.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited on signal 6 (Aborted).
> --------------------------------------------------------------------------
>  
> When I reduce the number of processors, the error disappears, and the code also works when I run it without the vapor bubbles.
> The problem seems to occur at this point:
>  
>     DMCreate(PETSC_COMM_WORLD,swarm);
>     DMSetType(*swarm,DMSWARM);
>     DMSetDimension(*swarm,3);
>     DMSwarmSetType(*swarm,DMSWARM_PIC);
>     DMSwarmSetCellDM(*swarm,*dmcell);
>  
>  
> Thanks a lot for your help. 
>  
> Things that would help us track this down:
>  
> 1) The smallest example where it fails
>  
> 2) The smallest number of processes where it fails
>  
> 3) A stack trace of the failure
>  
> 4) A simple example that we can run that also fails
>  
>   Thanks,
>  
>      Matt
>  
> Best regards, 
>  
> Joauma
> 
>  
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>  
> https://www.cse.buffalo.edu/~knepley/