[Nek5000-users] help me!
nek5000-users at lists.mcs.anl.gov
Sat Dec 5 17:49:18 CST 2015
Would you mind sharing what kind of setup you have? How is your OpenMPI
configured?
On Dec 5, 2015 5:41 AM, <nek5000-users at lists.mcs.anl.gov> wrote:
> Hi Neks,
> I am still trying to run my first simulation; every time, something goes
> wrong, so it is becoming stressful. In my last run, the output contained
> these errors:
>
>
>
> WARNING: It appears that your OpenFabrics subsystem is configured to only
> allow registering part of your physical memory. This can cause MPI jobs to
> run with erratic performance, hang, and/or crash.
>
> This may be caused by your OpenFabrics vendor limiting the amount of
> physical memory that can be registered. You should investigate the
> relevant Linux kernel module parameters that control how much physical
> memory can be registered, and increase them to allow registering all
> physical memory on your machine.
>
> See this Open MPI FAQ item for more information on these Linux kernel
> module
> parameters:
>
> http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
>
> Local host: node9
> Registerable memory: 32768 MiB
> Total memory: 65457 MiB
>
> Your MPI job will continue, but may be behave poorly and/or hang.
> --------------------------------------------------------------------------
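[Editor's note] The 32768 MiB figure in the warning above comes from the mlx4 memory-translation-table parameters described in the linked FAQ entry. A minimal sketch of that formula, assuming the common mlx4_core parameters `log_num_mtt` and `log_mtts_per_seg` and a 4 KiB page size (the values below are assumptions chosen to reproduce the reported number; read the real ones from /sys/module/mlx4_core/parameters/ on the node):

```python
# Sketch of the registerable-memory formula from the Open MPI FAQ
# (ib-locked-pages):  max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * page_size
# The parameter values are assumptions that reproduce the 32768 MiB in the log.
def registerable_mib(log_num_mtt, log_mtts_per_seg, page_size=4096):
    """Return registerable memory in MiB for the given mlx4 parameters."""
    return (2 ** log_num_mtt) * (2 ** log_mtts_per_seg) * page_size // (1024 * 1024)

print(registerable_mib(20, 3))  # 32768 MiB, matching the warning
print(registerable_mib(21, 3))  # 65536 MiB, enough to cover the node's 65457 MiB
```

Raising `log_num_mtt` by one doubles the registerable memory; the parameter is set via the mlx4_core module options and takes effect after a driver reload.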
> /----------------------------------------------------------\
> | _ __ ______ __ __ ______ ____ ____ ____ |
> | / | / // ____// //_/ / ____/ / __ \ / __ \ / __ \ |
> | / |/ // __/ / ,< /___ \ / / / // / / // / / / |
> | / /| // /___ / /| | ____/ / / /_/ // /_/ // /_/ / |
> | /_/ |_//_____//_/ |_|/_____/ \____/ \____/ \____/ |
> | |
> |----------------------------------------------------------|
> | |
> | NEK5000: Open Source Spectral Element Solver |
> | COPYRIGHT (c) 2008-2010 UCHICAGO ARGONNE, LLC |
> | Version: 1.0rc1 / SVN r1050 |
> | Web: http://nek5000.mcs.anl.gov |
> | |
> \----------------------------------------------------------/
>
>
> Number of processors: 64
> REAL wdsize : 8
> INTEGER wdsize : 4
>
>
> Beginning session:
> /datastor/eng/gualtieri/SIMULATIONS_NEK/RIB_3D_01/curve.rea
>
>
> timer accuracy: 0.0000000E+00 sec
>
> read .rea file
> --------------------------------------------------------------------------
> An MPI process has executed an operation involving a call to the
> "fork()" system call to create a child process. Open MPI is currently
> operating in a condition that could result in memory corruption or
> other system errors; your MPI job may hang, crash, or produce silent
> data corruption. The use of fork() (or system() or other calls that
> create child processes) is strongly discouraged.
>
> The process that invoked fork was:
>
> Local host: node9 (PID 72129)
> MPI_COMM_WORLD rank: 0
>
> If you are *absolutely sure* that your application will successfully
> and correctly survive a call to fork(), you may disable this warning
> by setting the mpi_warn_on_fork MCA parameter to 0.
> --------------------------------------------------------------------------
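[Editor's note] If the fork() really is benign (some Nek5000 builds shell out via system() during startup), this warning can be suppressed exactly as the message suggests. A hedged sketch of the launch line; the binary name and process count are assumptions taken from this log:

```shell
# Suppress only the fork() warning; this does not address the crash below.
# "./nek5000" and "-np 64" are assumptions based on this log.
mpirun --mca mpi_warn_on_fork 0 -np 64 ./nek5000
```

Note that this only silences the diagnostic; the job still aborts for the reason shown in the backtrace that follows.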
> Thread 4 (Thread 0x2b167717b700 (LWP 72146)):
> #0 0x00000034136df343 in poll () from /lib64/libc.so.6
> #1 0x00002b16754a2d46 in poll_dispatch () from
> /opt/shared/openmpi-1.8.1_gnu/lib/libopen-pal.so.6
> #2 0x00002b167549a7db in opal_libevent2021_event_base_loop () from
> /opt/shared/openmpi-1.8.1_gnu/lib/libopen-pal.so.6
> #3 0x00002b16751f830e in orte_progress_thread_engine () from
> /opt/shared/openmpi-1.8.1_gnu/lib/libopen-rte.so.7
> #4 0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0
> #5 0x00000034136e8b6d in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x2b167af9e700 (LWP 72162)):
> #0 0x00000034136e15e3 in select () from /lib64/libc.so.6
> #1 0x00002b16783e31c3 in service_thread_start () from
> /opt/shared/openmpi-1.8.1_gnu/lib/openmpi/mca_btl_openib.so
> #2 0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0
> #3 0x00000034136e8b6d in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x2b1683560700 (LWP 72190)):
> #0 0x00000034136df343 in poll () from /lib64/libc.so.6
> #1 0x00002b16783e1902 in btl_openib_async_thread () from
> /opt/shared/openmpi-1.8.1_gnu/lib/openmpi/mca_btl_openib.so
> #2 0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0
> #3 0x00000034136e8b6d in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x2b1675708200 (LWP 72129)):
> #0 0x0000003413a0f203 in wait () from /lib64/libpthread.so.0
> #1 0x00002b1674ee100d in ?? () from /usr/lib64/libgfortran.so.3
> #2 0x00002b1674ee282e in ?? () from /usr/lib64/libgfortran.so.3
> #3 0x00002b1674ee2a81 in _gfortran_generate_error () from
> /usr/lib64/libgfortran.so.3
> #4 0x00002b1674f878a5 in ?? () from /usr/lib64/libgfortran.so.3
> #5 0x00002b1674f87b53 in _gfortran_st_open () from
> /usr/lib64/libgfortran.so.3
> #6 0x0000000000432446 in readat_ ()
> #7 0x00000000004030b5 in nek_init_ ()
> #8 0x0000000000402974 in main ()
> -------------------------------------------------------
> Primary job terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status,
> thus causing
> the job to be terminated. The first process to do so was:
>
> Process name: [[17614,1],0]
> Exit code: 2
> --------------------------------------------------------------------------
>
> My question is: are these errors related to the software and OpenMPI (as I
> suppose), or are they related to my setup?
> Thanks to everyone.
> _______________________________________________
> Nek5000-users mailing list
> Nek5000-users at lists.mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users
>
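[Editor's note] On the actual failure: the rank-0 backtrace ends in _gfortran_st_open called from readat_(), i.e. a Fortran OPEN statement failed while Nek5000 was reading the case file. That usually means the .rea file (or a file it references) is missing or unreadable from the compute node. A quick diagnostic, using the session path printed in the log and the SESSION.NAME convention of this Nek5000 version (treat the exact filenames as assumptions):

```shell
# Verify the session file points at the intended case, and that the .rea
# file is readable from the run directory on the compute nodes (node9),
# not only on the login node. Paths are taken from the log above.
cat SESSION.NAME
ls -l /datastor/eng/gualtieri/SIMULATIONS_NEK/RIB_3D_01/curve.rea
```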