[mpich-discuss] A problem of carrying out mpich2

Dave Goodell goodell at mcs.anl.gov
Wed Jun 16 11:40:07 CDT 2010


Are you saying that redirecting standard input is not eliminating the warning about redirecting standard input?
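
For reference, the redirect that the mpiexec warning asks for can be sketched as a small shell helper. This is only an illustration: run_detached, the log file name, and the ./vasp path are hypothetical and not taken from the original job.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH2): start a command in the
# background with stdin taken from /dev/null and all output sent to a log.
run_detached() {
    log="$1"; shift
    "$@" < /dev/null > "$log" 2>&1 &
    RUN_DETACHED_PID=$!   # PID of the background job, for a later `wait`
}

# Assumed usage for the job in this thread (binary path is a guess):
#   run_detached vasp.log mpiexec -n 8 ./vasp
#   wait "$RUN_DETACHED_PID"
```

If the warning still appears with stdin redirected this way, that would suggest the message is coming from something other than a missing redirect.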

As for your crash, if you aren't getting a useful core dump, then you should pursue support from the VASP folks.
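
A minimal sketch of checking for a core dump, assuming a Linux system and a Bourne-style shell. The helper names enable_core_dumps and find_cores are made up for illustration, and the name and location of core files depend on the system configuration, so matching "core" and "core.*" in the working directory is an assumption. With mpd, the application processes may inherit their limits from the mpd daemon rather than from the shell that runs mpiexec, so the limit may need to be raised before mpd is started.

```shell
#!/bin/sh
# Try to raise the core-file size limit for this shell and its children.
# This can fail if the hard limit set by the administrator is lower.
enable_core_dumps() {
    if ! ulimit -c unlimited 2>/dev/null; then
        echo "could not raise core limit; current value: $(ulimit -c)" >&2
        return 1
    fi
}

# List candidate core files directly inside a directory (default: current).
find_cores() {
    find "${1:-.}" -maxdepth 1 -type f \( -name 'core' -o -name 'core.*' \)
}

# Assumed workflow (executable path and core file name are placeholders):
#   enable_core_dumps
#   mpiexec -n 8 ./vasp < /dev/null
#   find_cores .
#   gdb ./vasp core.12345     # then type `bt` for a backtrace
```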

-Dave

On Jun 15, 2010, at 7:24 PM PDT, Dr. Qian-Lin Tang wrote:

> Hi, Yuriy and Dave,
> 
> I ran "ulimit -c unlimited" beforehand to eliminate memory stacking. The same error message still appears when running the parallel version of VASP. How can I resolve this problem? Thanks.
> 
> Best regards,
> 
> Qian-Lin
> 
> ======= 2010-06-16 02:21:32, you wrote: =======
> 
>> Signal 9 is SIGKILL, and in the context of MPI (or at least MPICH2+mpd) usually means that the process manager killed the application because of some other problem. The problem is often another process crashing, such as from a signal 11 (SIGSEGV). Unfortunately, the other failing process does not always show up in the output. If you enable core dumps (usually "ulimit -c unlimited") then sometimes you'll get a core dump for the offending process.
>> 
>> Our new process manager, hydra, does a better job with the error reporting and process cleanup.  However you need to use a fairly modern version of MPICH2 in order to use hydra: http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
>> 
>> -Dave
>> 
>> On Jun 15, 2010, at 10:45 AM PDT, Yuriy Khalak wrote:
>>
>>> Dear Qianlin,
>>> 
>>> Signal 9, which seems to be causing the crash, usually indicates a segmentation fault in the code being run by mpiexec. I had something similar happen to me recently with C++ and mpich. It turned out I was trying to delete the contents of a pointer twice.
>>> 
>>> In my case Linux did a core dump, which I could trace with gdb and determine approximately where the segmentation fault occurred. So if you can find a core dump in your program's working directory, the problem is probably in the VASP code, not in mpich2.
>>> 
>>> Keep in mind that I'm just a user of mpich2 and very well could be wrong.
>>> 
>>> Regards,
>>>             Yuriy
>>> 
>>>> Dear mpich2-support,
>>>> 
>>>> Using mpif90, I have installed the parallel version of the commercial code VASP. Sometimes mpich2 works well with VASP, but it also fails for some VASP jobs with the following error messages:
>>>> -----------------------------------
>>>> running on    8 nodes
>>>> distr:  one band on    1 nodes,    8 groups
>>>> vasp.4.6.21  23Feb03 complex
>>>> POSCAR found :  3 types and   30 ions
>>>> LDA part: xc-table for Ceperly-Alder, Vosko type interpolation para-ferro
>>>> POSCAR, INCAR and KPOINTS ok, starting setup
>>>> WARNING: wrap around errors must be expected
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> FFT: planning ...            2
>>>> reading WAVECAR
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>> WARNING: random wavefunctions but no delay for mixing, default for NELMDL
>>>> entering main loop
>>>>      N       E                     dE             d eps       ncg     rms          rms(c)
>>>> rank 6 in job 36  qltang1_54199   caused collective abort of all ranks
>>>> exit status of rank 6: killed by signal 9
>>>> rank 3 in job 36  qltang1_54199   caused collective abort of all ranks
>>>> exit status of rank 3: killed by signal 9
>>>> -----------------------------
>>>> I would like to know how to solve the above problem. Thank you a lot.
>>>> 
>>>> Best regards,
>>>> Qianlin
>>> _______________________________________________
>>> mpich-discuss mailing list
>>> mpich-discuss at mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>> 
> 
> = = = = = = = = = = = = = = = = = = = =
> 
> Best regards,
> 
>         Dr. Qian-Lin Tang
>         qltang at xidian.edu.cn
>           2010-06-16
> 
