[petsc-users] Is PETSc using internet?

Barry Smith bsmith at petsc.dev
Tue Jul 21 19:38:09 CDT 2020


   Here is one type of hang. 

    $ petscmpiexec -n 2  ./ex1

then in another window

$ ps  | grep ex1
12015 ttys000    0:00.01 /bin/csh -f /Users/barrysmith/Src/petsc/lib/petsc/bin/petscmpiexec -n 2 ./ex1
12038 ttys000    0:00.01 mpiexec -n 2 ./ex1
12193 ttys001    0:00.00 grep ex1
~/Src/petsc/src/snes/tests (barry/2020-07-12/factor-view-no-malloc *=) 
$ lldb -p 12038
(lldb) process attach --pid 12038
Process 12038 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007fff6dbe73d6 libsystem_kernel.dylib`poll + 10
libsystem_kernel.dylib`poll:
->  0x7fff6dbe73d6 <+10>: jae    0x7fff6dbe73e0            ; <+20>
    0x7fff6dbe73d8 <+12>: movq   %rax, %rdi
    0x7fff6dbe73db <+15>: jmp    0x7fff6dbe222d            ; cerror
    0x7fff6dbe73e0 <+20>: retq   
Target 0: (mpiexec) stopped.

Executable module set to "/Users/barrysmith/soft/clang-ifort/bin/mpiexec".
Architecture set to: x86_64h-apple-macosx-.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff6dbe73d6 libsystem_kernel.dylib`poll + 10
    frame #1: 0x0000000106f35ff1 mpiexec`HYDT_dmxu_poll_wait_for_event + 737
    frame #2: 0x0000000106f35897 mpiexec`HYDT_dmx_wait_for_event + 23
    frame #3: 0x0000000106ef7208 mpiexec`HYD_pmci_wait_for_completion + 984
    frame #4: 0x0000000106ecbe67 mpiexec`main + 8391
    frame #5: 0x00007fff6da9fcc9 libdyld.dylib`start + 1

This is indicative of some "network" problem, even though I intend to run both processes on my Mac. 
It has nothing to do with PETSc; it comes from the network state of your machine (even when disconnected from the network) and MPICH.
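
One rough way to check the name-resolution side of this, outside of MPI entirely, is to time how long the machine takes to resolve "localhost" and its own hostname. The sketch below is only an illustration in Python; time_lookup is a made-up helper name, not part of PETSc or MPICH, and the sketch assumes that slow or failing hostname lookups are what is behind the hang, which may not be the case on your machine.

```python
import socket
import time


def time_lookup(name):
    """Return (address or None, seconds taken) for resolving `name`."""
    start = time.perf_counter()
    try:
        address = socket.gethostbyname(name)
    except OSError:  # socket.gaierror (resolution failure) is a subclass of OSError
        address = None
    return address, time.perf_counter() - start


if __name__ == "__main__":
    # Check both the loopback name and whatever this machine calls itself.
    for name in ("localhost", socket.gethostname()):
        address, seconds = time_lookup(name)
        print(f"{name!r}: {address} in {seconds:.3f} s")
```

If either lookup takes seconds or comes back as None, that would point at the hostname configuration of the machine rather than at your code.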

Where do you get the hang if you run like above?

 Barry


> On Jul 21, 2020, at 11:57 AM, Satish Balay via petsc-users <petsc-users at mcs.anl.gov> wrote:
> 
> You can run in gdb to see if it's hanging in a call to gethostbyname() or somewhere else.
> 
> Satish
> 
> On Tue, 21 Jul 2020, Eda Oktay wrote:
> 
>> Dear Lawrence,
>> 
>> The problem is not an error, by the way. My program just waits for something
>> without stopping, and it does not give any error. It simply does nothing.
>> 
>> Is the problem still caused by the hostname?
>> 
>> Thanks!
>> 
>> Eda
>> 
>> On Tue, Jul 21, 2020, 1:16 PM Lawrence Mitchell <wencel at gmail.com> wrote:
>> 
>>> 
>>> 
>>>> On 21 Jul 2020, at 11:06, Eda Oktay <eda.oktay at metu.edu.tr> wrote:
>>>> 
>>>> Dear Lawrence,
>>>> 
>>>> I am using MPICC, not on a Mac but on Fedora 25. If it will still work, I will try that.
>>>> 
>>>> Thanks!
>>> 
>>> It might be the case. When you observe the error, does "nslookup
>>> localhost" take a long time?
>>> 
>>> Lawrence
>> 
> 
