[petsc-users] [petsc-maint] DMSwarm on multiple processors
Joauma Marichal
joauma.marichal at uclouvain.be
Mon Dec 18 04:09:36 CST 2023
Hello,
Sorry for the delay. I have attached the output file that I obtained when running the code in debug mode.
Thanks for your help.
Best regards,
Joauma
From: Matthew Knepley <knepley at gmail.com>
Date: Thursday, 23 November 2023 at 15:32
To: Joauma Marichal <joauma.marichal at uclouvain.be>
Cc: petsc-maint at mcs.anl.gov, petsc-users at mcs.anl.gov
Subject: Re: [petsc-maint] DMSwarm on multiple processors
On Thu, Nov 23, 2023 at 9:01 AM Joauma Marichal <joauma.marichal at uclouvain.be> wrote:
Hello,
My problem persists… Is there anything I could try?
Yes. The failure appears to come from a call inside PetscSFSetUpRanks(). That function allocates memory, the failure
is in libc, and it only happens on larger examples, so I suspect an allocation problem. Can you rebuild with debugging and run this example? Then we can see whether the allocation fails.
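For context on why a debugging build can help here: "free(): invalid size" is glibc reporting that its heap bookkeeping looks inconsistent at the time of a free, which is typically the result of an earlier write past the end of an allocation. Debug-oriented allocators try to catch this by placing a known pattern next to each block and checking it later. The sketch below illustrates that general idea in plain C. The names guarded_malloc, guarded_check, and guarded_free are hypothetical, and this is not PETSc's implementation, only a minimal illustration under those assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical guard band: a few bytes with a known value placed
   directly after the bytes handed to the caller. */
#define GUARD_BYTES   8
#define GUARD_PATTERN 0xA5

/* Header stored in front of the user block. The union with max_align_t
   keeps the user pointer suitably aligned for any object type. */
typedef union {
  size_t      size;  /* number of bytes the caller asked for */
  max_align_t align; /* padding only, never read */
} GuardHeader;

/* Allocate n bytes plus a header and a trailing guard band. */
void *guarded_malloc(size_t n)
{
  if (n > SIZE_MAX - sizeof(GuardHeader) - GUARD_BYTES) return NULL;
  GuardHeader *hdr = malloc(sizeof(GuardHeader) + n + GUARD_BYTES);
  if (!hdr) return NULL;
  hdr->size = n;
  unsigned char *user = (unsigned char *)(hdr + 1);
  memset(user + n, GUARD_PATTERN, GUARD_BYTES);
  return user;
}

/* Return 0 if the guard band is intact, -1 if it was overwritten. */
int guarded_check(const void *p)
{
  const GuardHeader   *hdr   = (const GuardHeader *)p - 1;
  const unsigned char *guard = (const unsigned char *)p + hdr->size;
  for (size_t i = 0; i < GUARD_BYTES; i++) {
    if (guard[i] != GUARD_PATTERN) return -1;
  }
  return 0;
}

/* Check the guard band, report any overrun, then release the block.
   Returns the result of the check so callers can react to corruption. */
int guarded_free(void *p)
{
  if (!p) return 0;
  int status = guarded_check(p);
  if (status != 0) fprintf(stderr, "guarded_free: write past end of block %p detected\n", p);
  free((GuardHeader *)p - 1);
  return status;
}
```

With this scheme, a write one byte past the end of a block is reported at the next check that touches that block, rather than surfacing later as an abort inside an unrelated free(). It only catches overruns that land in the guard band, so it is a diagnostic aid rather than a complete detector.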
Thanks,
Matt
Thanks a lot.
Best regards,
Joauma
From: Matthew Knepley <knepley at gmail.com>
Date: Wednesday, 25 October 2023 at 14:45
To: Joauma Marichal <joauma.marichal at uclouvain.be>
Cc: petsc-maint at mcs.anl.gov, petsc-users at mcs.anl.gov
Subject: Re: [petsc-maint] DMSwarm on multiple processors
On Wed, Oct 25, 2023 at 8:32 AM Joauma Marichal via petsc-maint <petsc-maint at mcs.anl.gov> wrote:
Hello,
I am using the DMSwarm library in an Eulerian-Lagrangian approach to model vapor bubbles in water.
I recently obtained good results and wanted to run larger simulations. Unfortunately, when I increase the number of processors used to run the simulation, I get the following error:
free(): invalid size
[cns136:590327] *** Process received signal ***
[cns136:590327] Signal: Aborted (6)
[cns136:590327] Signal code: (-6)
[cns136:590327] [ 0] /lib64/libc.so.6(+0x4eb20)[0x7f56cd4c9b20]
[cns136:590327] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f56cd4c9a9f]
[cns136:590327] [ 2] /lib64/libc.so.6(abort+0x127)[0x7f56cd49ce05]
[cns136:590327] [ 3] /lib64/libc.so.6(+0x91037)[0x7f56cd50c037]
[cns136:590327] [ 4] /lib64/libc.so.6(+0x9819c)[0x7f56cd51319c]
[cns136:590327] [ 5] /lib64/libc.so.6(+0x99aac)[0x7f56cd514aac]
[cns136:590327] [ 6] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUpRanks+0x4c4)[0x7f56cea71e64]
[cns136:590327] [ 7] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(+0x841642)[0x7f56cea83642]
[cns136:590327] [ 8] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(PetscSFSetUp+0x9e)[0x7f56cea7043e]
[cns136:590327] [ 9] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(VecScatterCreate+0x164e)[0x7f56cea7bbde]
[cns136:590327] [10] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA_3D+0x3e38)[0x7f56cee84dd8]
[cns136:590327] [11] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp_DA+0xd8)[0x7f56cee9b448]
[cns136:590327] [12] /gpfs/home/acad/ucl-tfl/marichaj/marha/lib_petsc/lib/libpetsc.so.3.019(DMSetUp+0x20)[0x7f56cededa20]
[cns136:590327] [13] ./cobpor[0x4418dc]
[cns136:590327] [14] ./cobpor[0x408b63]
[cns136:590327] [15] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f56cd4b5cf3]
[cns136:590327] [16] ./cobpor[0x40bdee]
[cns136:590327] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that process rank 84 with PID 590327 on node cns136 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
When I reduce the number of processors, the error disappears, and when I run my code without the vapor bubbles, it also works.
The problem seems to occur at this point:
DMCreate(PETSC_COMM_WORLD,swarm);
DMSetType(*swarm,DMSWARM);
DMSetDimension(*swarm,3);
DMSwarmSetType(*swarm,DMSWARM_PIC);
DMSwarmSetCellDM(*swarm,*dmcell);
Thanks a lot for your help.
Things that would help us track this down:
1) The smallest example where it fails
2) The smallest number of processes where it fails
3) A stack trace of the failure
4) A simple example that we can run that also fails
Thanks,
Matt
Best regards,
Joauma
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: slurm-3184479.out
Type: application/octet-stream
Size: 55415 bytes
Desc: slurm-3184479.out
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20231218/ee164d94/attachment-0001.obj>