[Nek5000-users] help me!
nek5000-users at lists.mcs.anl.gov
Sat Dec 5 04:40:41 CST 2015
Hi Neks,
I am still trying to run my first simulation. Every time, something goes
wrong, so it is becoming stressful.
In my last run, the output contained these errors:
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory. This can cause MPI jobs to
run with erratic performance, hang, and/or crash.
This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered. You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.
See this Open MPI FAQ item for more information on these Linux kernel module
parameters:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
Local host: node9
Registerable memory: 32768 MiB
Total memory: 65457 MiB
Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------
/----------------------------------------------------------\\
| _ __ ______ __ __ ______ ____ ____ ____ |
| / | / // ____// //_/ / ____/ / __ \\ / __ \\ / __ \\ |
| / |/ // __/ / ,< /___ \\ / / / // / / // / / / |
| / /| // /___ / /| | ____/ / / /_/ // /_/ // /_/ / |
| /_/ |_//_____//_/ |_|/_____/ \\____/ \\____/ \\____/ |
| |
|----------------------------------------------------------|
| |
| NEK5000: Open Source Spectral Element Solver |
| COPYRIGHT (c) 2008-2010 UCHICAGO ARGONNE, LLC |
| Version: 1.0rc1 / SVN r1050 |
| Web: http://nek5000.mcs.anl.gov |
| |
\\----------------------------------------------------------/
Number of processors: 64
REAL wdsize : 8
INTEGER wdsize : 4
Beginning session:
/datastor/eng/gualtieri/SIMULATIONS_NEK/RIB_3D_01/curve.rea
timer accuracy: 0.0000000E+00 sec
read .rea file
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
The process that invoked fork was:
Local host: node9 (PID 72129)
MPI_COMM_WORLD rank: 0
If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
Thread 4 (Thread 0x2b167717b700 (LWP 72146)):
#0 0x00000034136df343 in poll () from /lib64/libc.so.6
#1 0x00002b16754a2d46 in poll_dispatch () from
/opt/shared/openmpi-1.8.1_gnu/lib/libopen-pal.so.6
#2 0x00002b167549a7db in opal_libevent2021_event_base_loop () from
/opt/shared/openmpi-1.8.1_gnu/lib/libopen-pal.so.6
#3 0x00002b16751f830e in orte_progress_thread_engine () from
/opt/shared/openmpi-1.8.1_gnu/lib/libopen-rte.so.7
#4 0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00000034136e8b6d in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x2b167af9e700 (LWP 72162)):
#0 0x00000034136e15e3 in select () from /lib64/libc.so.6
#1 0x00002b16783e31c3 in service_thread_start () from
/opt/shared/openmpi-1.8.1_gnu/lib/openmpi/mca_btl_openib.so
#2 0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0
#3 0x00000034136e8b6d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x2b1683560700 (LWP 72190)):
#0 0x00000034136df343 in poll () from /lib64/libc.so.6
#1 0x00002b16783e1902 in btl_openib_async_thread () from
/opt/shared/openmpi-1.8.1_gnu/lib/openmpi/mca_btl_openib.so
#2 0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0
#3 0x00000034136e8b6d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b1675708200 (LWP 72129)):
#0 0x0000003413a0f203 in wait () from /lib64/libpthread.so.0
#1 0x00002b1674ee100d in ?? () from /usr/lib64/libgfortran.so.3
#2 0x00002b1674ee282e in ?? () from /usr/lib64/libgfortran.so.3
#3 0x00002b1674ee2a81 in _gfortran_generate_error () from
/usr/lib64/libgfortran.so.3
#4 0x00002b1674f878a5 in ?? () from /usr/lib64/libgfortran.so.3
#5 0x00002b1674f87b53 in _gfortran_st_open () from
/usr/lib64/libgfortran.so.3
#6 0x0000000000432446 in readat_ ()
#7 0x00000000004030b5 in nek_init_ ()
#8 0x0000000000402974 in main ()
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:
Process name: [[17614,1],0]
Exit code: 2
--------------------------------------------------------------------------
My question is: are these errors related to the software and Open MPI (as I
suppose), or are they related to my setup?
Thanks to everyone.