[Nek5000-users] help me!

nek5000-users at lists.mcs.anl.gov nek5000-users at lists.mcs.anl.gov
Sat Dec 5 17:49:18 CST 2015


Would you mind sharing what kind of setup you have? How is your OpenMPI
configured?
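For instance, the output of ompi_info from one of the compute nodes would tell us the version and build options (the exact commands below are only suggestions, not a required procedure):

    ompi_info | head -n 30           # version, compiler, configure options
    ompi_info --param btl openib     # openib BTL settings, if that component is built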
On Dec 5, 2015 5:41 AM, <nek5000-users at lists.mcs.anl.gov> wrote:

> Hi Neks,
> I am still trying to run my first simulation; every time something goes
> wrong, so it is becoming stressful.
> In my last run, the output contained these errors:
>
>
>
> WARNING: It appears that your OpenFabrics subsystem is configured to only
> allow registering part of your physical memory.  This can cause MPI jobs to
> run with erratic performance, hang, and/or crash.
>
> This may be caused by your OpenFabrics vendor limiting the amount of
> physical memory that can be registered.  You should investigate the
> relevant Linux kernel module parameters that control how much physical
> memory can be registered, and increase them to allow registering all
> physical memory on your machine.
>
> See this Open MPI FAQ item for more information on these Linux kernel
> module
> parameters:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>
>   Local host:              node9
>   Registerable memory:     32768 MiB
>   Total memory:            65457 MiB
>
> Your MPI job will continue, but may be behave poorly and/or hang.
> --------------------------------------------------------------------------
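The registerable-memory warning above is exactly what the linked FAQ item covers. A minimal sketch of what it suggests checking, assuming a Mellanox mlx4 HCA (the module and parameter names are assumptions and differ for other vendors; on a shared cluster the administrator normally has to change them):

    # current limits on node9
    cat /sys/module/mlx4_core/parameters/log_num_mtt
    cat /sys/module/mlx4_core/parameters/log_mtts_per_seg

    # registerable memory is roughly 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE,
    # so e.g. log_num_mtt=22, log_mtts_per_seg=3 with 4 KiB pages gives 128 GiB,
    # enough to cover (twice over) the 64 GiB of RAM reported above.  The values
    # go in a modprobe config, e.g. /etc/modprobe.d/mlx4_core.conf:
    #   options mlx4_core log_num_mtt=22 log_mtts_per_seg=3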
> /----------------------------------------------------------\
> |      _   __ ______ __ __  ______  ____    ____    ____   |
> |     / | / // ____// //_/ / ____/ / __ \ / __ \ / __ \    |
> |    /  |/ // __/  / ,<   /___ \  / / / // / / // / / /    |
> |   / /|  // /___ / /| | ____/ / / /_/ // /_/ // /_/ /     |
> |  /_/ |_//_____//_/ |_|/_____/  \____/ \____/ \____/      |
> |                                                          |
> |----------------------------------------------------------|
> |                                                          |
> | NEK5000:  Open Source Spectral Element Solver            |
> | COPYRIGHT (c) 2008-2010 UCHICAGO ARGONNE, LLC            |
> | Version:  1.0rc1 / SVN  r1050                            |
> | Web:      http://nek5000.mcs.anl.gov                     |
> |                                                          |
> \----------------------------------------------------------/
>
>
>  Number of processors:          64
>  REAL    wdsize      :           8
>  INTEGER wdsize      :           4
>
>
>   Beginning session:
> /datastor/eng/gualtieri/SIMULATIONS_NEK/RIB_3D_01/curve.rea
>
>
>  timer accuracy:   0.0000000E+00 sec
>
>  read .rea file
> --------------------------------------------------------------------------
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process.  Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption.  The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
>   Local host:          node9 (PID 72129)
>   MPI_COMM_WORLD rank: 0
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> --------------------------------------------------------------------------
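The fork() warning is usually harmless here: the fork appears to come from gfortran's own error handling (the traceback below ends in _gfortran_generate_error), not from Nek5000's time stepping. If you are sure of that, it can be silenced the way the message says; the mpirun line below is only an illustration of the flag, adjust it to your launch script:

    mpirun --mca mpi_warn_on_fork 0 -np 64 ./nek5000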
> Thread 4 (Thread 0x2b167717b700 (LWP 72146)):
> #0  0x00000034136df343 in poll () from /lib64/libc.so.6
> #1  0x00002b16754a2d46 in poll_dispatch () from
> /opt/shared/openmpi-1.8.1_gnu/lib/libopen-pal.so.6
> #2  0x00002b167549a7db in opal_libevent2021_event_base_loop () from
> /opt/shared/openmpi-1.8.1_gnu/lib/libopen-pal.so.6
> #3  0x00002b16751f830e in orte_progress_thread_engine () from
> /opt/shared/openmpi-1.8.1_gnu/lib/libopen-rte.so.7
> #4  0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0
> #5  0x00000034136e8b6d in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x2b167af9e700 (LWP 72162)):
> #0  0x00000034136e15e3 in select () from /lib64/libc.so.6
> #1  0x00002b16783e31c3 in service_thread_start () from
> /opt/shared/openmpi-1.8.1_gnu/lib/openmpi/mca_btl_openib.so
> #2  0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0
> #3  0x00000034136e8b6d in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x2b1683560700 (LWP 72190)):
> #0  0x00000034136df343 in poll () from /lib64/libc.so.6
> #1  0x00002b16783e1902 in btl_openib_async_thread () from
> /opt/shared/openmpi-1.8.1_gnu/lib/openmpi/mca_btl_openib.so
> #2  0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0
> #3  0x00000034136e8b6d in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x2b1675708200 (LWP 72129)):
> #0  0x0000003413a0f203 in wait () from /lib64/libpthread.so.0
> #1  0x00002b1674ee100d in ?? () from /usr/lib64/libgfortran.so.3
> #2  0x00002b1674ee282e in ?? () from /usr/lib64/libgfortran.so.3
> #3  0x00002b1674ee2a81 in _gfortran_generate_error () from
> /usr/lib64/libgfortran.so.3
> #4  0x00002b1674f878a5 in ?? () from /usr/lib64/libgfortran.so.3
> #5  0x00002b1674f87b53 in _gfortran_st_open () from
> /usr/lib64/libgfortran.so.3
> #6  0x0000000000432446 in readat_ ()
> #7  0x00000000004030b5 in nek_init_ ()
> #8  0x0000000000402974 in main ()
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing the job to be terminated. The first process to do so was:
>
>   Process name: [[17614,1],0]
>   Exit code:    2
> --------------------------------------------------------------------------
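Note where the traceback ends: _gfortran_st_open called from readat_, i.e. a Fortran OPEN failed right after "read .rea file". That points at the run setup (a file the solver cannot open) rather than at Open MPI itself. A quick sanity check, using the path printed in the log (the commands are only a sketch; run them with the filesystem view of node9, from the directory the job was launched in):

    ls -l /datastor/eng/gualtieri/SIMULATIONS_NEK/RIB_3D_01/curve.rea
    cat SESSION.NAME    # the file the legacy Nek5000 launch scripts write to name the case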
>
> My question is: are these errors related to the software and OpenMPI (as I
> suspect), or are they related to my setup?
> Thanks to everyone.
> _______________________________________________
> Nek5000-users mailing list
> Nek5000-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users
>