<p dir="ltr">Would you mind sharing what kind of setup you have? How is your OpenMPI configured?</p>
<div class="gmail_quote">On Dec 5, 2015 5:41 AM, <<a href="mailto:nek5000-users@lists.mcs.anl.gov">nek5000-users@lists.mcs.anl.gov</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Neks,<br>
I am still trying to run my first simulation; every time something goes wrong, and it is becoming stressful.<br>
In my last run, the output contains the following errors:<br>
<br>
<br>
<br>
WARNING: It appears that your OpenFabrics subsystem is configured to only<br>
allow registering part of your physical memory. This can cause MPI jobs to<br>
run with erratic performance, hang, and/or crash.<br>
<br>
This may be caused by your OpenFabrics vendor limiting the amount of<br>
physical memory that can be registered. You should investigate the<br>
relevant Linux kernel module parameters that control how much physical<br>
memory can be registered, and increase them to allow registering all<br>
physical memory on your machine.<br>
<br>
See this Open MPI FAQ item for more information on these Linux kernel module<br>
parameters:<br>
<br>
<a href="http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages" rel="noreferrer" target="_blank">http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages</a><br>
<br>
Local host: node9<br>
Registerable memory: 32768 MiB<br>
Total memory: 65457 MiB<br>
<br>
Your MPI job will continue, but may be behave poorly and/or hang.<br>
--------------------------------------------------------------------------<br>
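As a sketch of what the linked FAQ describes (the module name and values below are assumptions for a Mellanox mlx4 HCA; check which InfiniBand driver your nodes actually use), the registerable memory is roughly 2^(log_num_mtt + log_mtts_per_seg) * PAGE_SIZE, so the limit is raised by increasing log_num_mtt in the driver options; the locked-memory limit should also be unlimited:<br>
<br>
# /etc/modprobe.d/mlx4_core.conf -- assumed mlx4 driver; with 4 KiB pages these<br>
# values allow registering 128 GiB, i.e. about twice the 64 GiB reported above<br>
options mlx4_core log_num_mtt=22 log_mtts_per_seg=3<br>
<br>
# verify after reloading the module (or rebooting the node)<br>
cat /sys/module/mlx4_core/parameters/log_num_mtt<br>
ulimit -l   # should report "unlimited" for MPI jobs<br>
<br>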
/----------------------------------------------------------\<br>
| _ __ ______ __ __ ______ ____ ____ ____ |<br>
| / | / // ____// //_/ / ____/ / __ \ / __ \ / __ \ |<br>
| / |/ // __/ / ,< /___ \ / / / // / / // / / / |<br>
| / /| // /___ / /| | ____/ / / /_/ // /_/ // /_/ / |<br>
| /_/ |_//_____//_/ |_|/_____/ \____/ \____/ \____/ |<br>
| |<br>
|----------------------------------------------------------|<br>
| |<br>
| NEK5000: Open Source Spectral Element Solver |<br>
| COPYRIGHT (c) 2008-2010 UCHICAGO ARGONNE, LLC |<br>
| Version: 1.0rc1 / SVN r1050 |<br>
| Web: <a href="http://nek5000.mcs.anl.gov" rel="noreferrer" target="_blank">http://nek5000.mcs.anl.gov</a> |<br>
| |<br>
\----------------------------------------------------------/<br>
<br>
<br>
Number of processors: 64<br>
REAL wdsize : 8<br>
INTEGER wdsize : 4<br>
<br>
<br>
Beginning session:<br>
/datastor/eng/gualtieri/SIMULATIONS_NEK/RIB_3D_01/curve.rea<br>
<br>
<br>
timer accuracy: 0.0000000E+00 sec<br>
<br>
read .rea file<br>
--------------------------------------------------------------------------<br>
An MPI process has executed an operation involving a call to the<br>
"fork()" system call to create a child process. Open MPI is currently<br>
operating in a condition that could result in memory corruption or<br>
other system errors; your MPI job may hang, crash, or produce silent<br>
data corruption. The use of fork() (or system() or other calls that<br>
create child processes) is strongly discouraged.<br>
<br>
The process that invoked fork was:<br>
<br>
Local host: node9 (PID 72129)<br>
MPI_COMM_WORLD rank: 0<br>
<br>
If you are *absolutely sure* that your application will successfully<br>
and correctly survive a call to fork(), you may disable this warning<br>
by setting the mpi_warn_on_fork MCA parameter to 0.<br>
--------------------------------------------------------------------------<br>
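For example (the flag is the MCA parameter named above; the executable name nek5000 is an assumption based on the usual Nek5000 binary name, and the 64 ranks come from the "Number of processors" line above), the warning can be silenced at launch time once you are sure the fork() usage is harmless:<br>
<br>
# disable only the fork() warning, nothing else<br>
mpirun --mca mpi_warn_on_fork 0 -np 64 ./nek5000<br>
<br>
Note that this would only suppress the warning; judging from the backtrace that follows, the actual abort appears to come from a Fortran OPEN error inside readat_ (i.e. the .rea file could not be opened), which is a separate problem from the MPI warnings.<br>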
Thread 4 (Thread 0x2b167717b700 (LWP 72146)):<br>
#0 0x00000034136df343 in poll () from /lib64/libc.so.6<br>
#1 0x00002b16754a2d46 in poll_dispatch () from /opt/shared/openmpi-1.8.1_gnu/lib/libopen-pal.so.6<br>
#2 0x00002b167549a7db in opal_libevent2021_event_base_loop () from /opt/shared/openmpi-1.8.1_gnu/lib/libopen-pal.so.6<br>
#3 0x00002b16751f830e in orte_progress_thread_engine () from /opt/shared/openmpi-1.8.1_gnu/lib/libopen-rte.so.7<br>
#4 0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0<br>
#5 0x00000034136e8b6d in clone () from /lib64/libc.so.6<br>
Thread 3 (Thread 0x2b167af9e700 (LWP 72162)):<br>
#0 0x00000034136e15e3 in select () from /lib64/libc.so.6<br>
#1 0x00002b16783e31c3 in service_thread_start () from /opt/shared/openmpi-1.8.1_gnu/lib/openmpi/mca_btl_openib.so<br>
#2 0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0<br>
#3 0x00000034136e8b6d in clone () from /lib64/libc.so.6<br>
Thread 2 (Thread 0x2b1683560700 (LWP 72190)):<br>
#0 0x00000034136df343 in poll () from /lib64/libc.so.6<br>
#1 0x00002b16783e1902 in btl_openib_async_thread () from /opt/shared/openmpi-1.8.1_gnu/lib/openmpi/mca_btl_openib.so<br>
#2 0x0000003413a079d1 in start_thread () from /lib64/libpthread.so.0<br>
#3 0x00000034136e8b6d in clone () from /lib64/libc.so.6<br>
Thread 1 (Thread 0x2b1675708200 (LWP 72129)):<br>
#0 0x0000003413a0f203 in wait () from /lib64/libpthread.so.0<br>
#1 0x00002b1674ee100d in ?? () from /usr/lib64/libgfortran.so.3<br>
#2 0x00002b1674ee282e in ?? () from /usr/lib64/libgfortran.so.3<br>
#3 0x00002b1674ee2a81 in _gfortran_generate_error () from /usr/lib64/libgfortran.so.3<br>
#4 0x00002b1674f878a5 in ?? () from /usr/lib64/libgfortran.so.3<br>
#5 0x00002b1674f87b53 in _gfortran_st_open () from /usr/lib64/libgfortran.so.3<br>
#6 0x0000000000432446 in readat_ ()<br>
#7 0x00000000004030b5 in nek_init_ ()<br>
#8 0x0000000000402974 in main ()<br>
-------------------------------------------------------<br>
Primary job terminated normally, but 1 process returned<br>
a non-zero exit code.. Per user-direction, the job has been aborted.<br>
-------------------------------------------------------<br>
--------------------------------------------------------------------------<br>
mpirun detected that one or more processes exited with non-zero status, thus causing<br>
the job to be terminated. The first process to do so was:<br>
<br>
Process name: [[17614,1],0]<br>
Exit code: 2<br>
--------------------------------------------------------------------------<br>
<br>
My question is: are these errors related to the software and Open MPI (as I suppose), or to my setup?<br>
Thanks to everyone.<br>
_______________________________________________<br>
Nek5000-users mailing list<br>
<a href="mailto:Nek5000-users@lists.mcs.anl.gov" target="_blank">Nek5000-users@lists.mcs.anl.gov</a><br>
<a href="https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users" rel="noreferrer" target="_blank">https://lists.mcs.anl.gov/mailman/listinfo/nek5000-users</a><br>
</blockquote></div>