[petsc-users] Is PETSc using internet?
Barry Smith
bsmith at petsc.dev
Tue Jul 21 19:38:09 CDT 2020
Here is one type of hang.
$ petscmpiexec -n 2 ./ex1
then in another window
$ ps | grep ex1
12015 ttys000 0:00.01 /bin/csh -f /Users/barrysmith/Src/petsc/lib/petsc/bin/petscmpiexec -n 2 ./ex1
12038 ttys000 0:00.01 mpiexec -n 2 ./ex1
12193 ttys001 0:00.00 grep ex1
~/Src/petsc/src/snes/tests (barry/2020-07-12/factor-view-no-malloc *=)
$ lldb -p 12038
(lldb) process attach --pid 12038
Process 12038 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x00007fff6dbe73d6 libsystem_kernel.dylib`poll + 10
libsystem_kernel.dylib`poll:
-> 0x7fff6dbe73d6 <+10>: jae 0x7fff6dbe73e0 ; <+20>
0x7fff6dbe73d8 <+12>: movq %rax, %rdi
0x7fff6dbe73db <+15>: jmp 0x7fff6dbe222d ; cerror
0x7fff6dbe73e0 <+20>: retq
Target 0: (mpiexec) stopped.
Executable module set to "/Users/barrysmith/soft/clang-ifort/bin/mpiexec".
Architecture set to: x86_64h-apple-macosx-.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
* frame #0: 0x00007fff6dbe73d6 libsystem_kernel.dylib`poll + 10
frame #1: 0x0000000106f35ff1 mpiexec`HYDT_dmxu_poll_wait_for_event + 737
frame #2: 0x0000000106f35897 mpiexec`HYDT_dmx_wait_for_event + 23
frame #3: 0x0000000106ef7208 mpiexec`HYD_pmci_wait_for_completion + 984
frame #4: 0x0000000106ecbe67 mpiexec`main + 8391
frame #5: 0x00007fff6da9fcc9 libdyld.dylib`start + 1
It indicates some kind of "network" problem: mpiexec is sitting in poll() waiting for events, even though I intend to run both processes on my Mac.
It has nothing to do with PETSc; it comes from the network state of your machine (even when disconnected from the network) and MPICH.
Where does your run hang if you attach a debugger like above?
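Because this kind of hang usually comes from hostname resolution, a quick check on Linux (for example Fedora) is to time how long the system resolver takes to look up `localhost` and the machine's own hostname. The sketch below assumes glibc's `getent` is available (it is on Fedora, not on macOS); `check_lookup` is a hypothetical helper name. Unlike `nslookup`, which queries a DNS server directly, `getent hosts` goes through the same NSS configuration (`/etc/nsswitch.conf`, `/etc/hosts`, then DNS) that gethostbyname() uses.

```shell
#!/bin/sh
# Sketch only: time a hostname lookup through the system resolver (NSS).
# Assumes glibc's `getent` (present on Fedora, absent on macOS).
# check_lookup is a hypothetical helper, not part of PETSc or MPICH.
check_lookup() {
    host="$1"
    start=$(date +%s)
    # getent consults /etc/hosts and DNS in the order set in /etc/nsswitch.conf
    if getent hosts "$host" > /dev/null 2>&1; then
        result="ok"
    else
        result="FAILED"
    fi
    end=$(date +%s)
    # Whole-second resolution is enough: a resolver timeout takes several seconds.
    echo "$host: $result in $((end - start)) s"
}

check_lookup localhost
check_lookup "$(hostname)"
```

If either lookup reports FAILED or takes several seconds, that points at name resolution on the machine rather than at PETSc.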
Barry
> On Jul 21, 2020, at 11:57 AM, Satish Balay via petsc-users <petsc-users at mcs.anl.gov> wrote:
>
> You can run it in gdb to see whether it is hanging in a call to gethostbyname() or somewhere else.
>
> Satish
>
> On Tue, 21 Jul 2020, Eda Oktay wrote:
>
>> Dear Lawrence,
>>
>> By the way, the problem is not an error: my program keeps waiting for
>> something without stopping, and it gives no error. It just does nothing.
>>
>> Is the problem still caused by the hostname?
>>
>> Thanks!
>>
>> Eda
>>
>> On Tue, Jul 21, 2020, 1:16 PM Lawrence Mitchell <wencel at gmail.com> wrote:
>>
>>>
>>>
>>>> On 21 Jul 2020, at 11:06, Eda Oktay <eda.oktay at metu.edu.tr> wrote:
>>>>
>>>> Dear Lawrence,
>>>>
>>>> I am using MPICC, but on Fedora 25 rather than a Mac. If it will still
>>>> work, I will try that.
>>>>
>>>> Thanks!
>>>
>>> It might be. When you observe the error, does "nslookup localhost"
>>> take a long time?
>>>
>>> Lawrence
>>
>