[mpich-discuss] A problem of carrying out mpich2

Dr. Qian-Lin Tang qltang at xidian.edu.cn
Thu Jun 17 02:30:55 CDT 2010


Hi, Dave,

I will do what you suggested. Thanks :-)

Best regards,

Qian-Lin

======= On 2010-06-17 11:16:31, you wrote: =======

>I'm not sure what's going on with the redirection issue.  Please try the hydra process manager instead: http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
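[Editor's note: the two workarounds discussed in this thread can be sketched as below. The binary name "vasp" and the process count are illustrative, and on older MPICH2 builds hydra's launcher may be installed as mpiexec.hydra rather than replacing mpiexec; check your installation.]

```shell
# 1) Redirect stdin from /dev/null when running in the background,
#    exactly as the mpd warning in the log suggests:
mpiexec -n 8 ./vasp < /dev/null &

# 2) Or launch with the hydra process manager, which reports errors
#    and cleans up crashed ranks more reliably (launcher name varies
#    by MPICH2 version and configure options):
mpiexec.hydra -n 8 ./vasp < /dev/null &
```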
>
>"ulimit -c unlimited" is the usual means to enable core dumps.  If that's not working for you, then either your VASP program isn't dumping core or core dumps must be enabled some other way.  You'll have to google for the appropriate way to enable them on your platform.
>
>We really don't know anything about VASP; I would recommend contacting the VASP group for support with VASP crashes: vasp.materialphysik at univie.ac.at
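[Editor's note: the "ulimit -c unlimited" advice above can be sketched as follows. This is a Linux-flavoured sketch; how core dumps are enabled persistently, and how the core file is named, varies by platform.]

```shell
# Remove the core-file size limit for this shell session, so a crashing
# rank can leave a core dump behind:
ulimit -c unlimited

# Verify the new limit; this should print "unlimited":
ulimit -c

# On Linux, this file controls how and where core files are named:
cat /proc/sys/kernel/core_pattern
```

Note that the limit applies per shell session, so it must be set in the shell (or batch script) that actually launches mpiexec.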
>
>-Dave
>
>On Jun 16, 2010, at 7:20 PM PDT, Dr. Qian-Lin Tang wrote:
>
>> Hi, Dave,
>> 
>> Yes, redirecting standard input is not eliminating the warning about redirecting standard input. For some VASP jobs mpich2 works well, while for other jobs it sometimes breaks down. I don't know how to get a useful core dump. Please give me a hint. Thanks a lot.
>> 
>> Best regards,
>> 
>> Qianlin
>> 
>> 	======= On 2010-06-17 00:40:07, you wrote: =======
>> 
>>> Are you saying that redirecting standard input is not eliminating the warning about redirecting standard input?
>>> 
>>> As for your crash, if you aren't getting a useful core dump, then you should pursue support from the VASP folks.
>>> 
>>> -Dave
>>> 
>>> On Jun 15, 2010, at 7:24 PM PDT, Dr. Qian-Lin Tang wrote:
>>> 
>>>> Hi, Yuriy and Dave,
>>>> 
>>>> I used the command "ulimit -c unlimited" beforehand to enable core dumps. The same error message pops up when running the parallel version of VASP. How can I resolve my problem? Thanks.
>>>> 
>>>> Best regards,
>>>> 
>>>> Qian-Lin                                                                       
>>>> 
>>>> ======= On 2010-06-16 02:21:32, you wrote: =======
>>>> 
>>>>> Signal 9 is SIGKILL, and in the context of MPI (or at least MPICH2+mpd) usually means that the process manager killed the application because of some other problem.  The  problem is often another process crashing, such as from a signal 11 (SIGSEGV).  Unfortunately, the other failing process does not always show up in the output.  If you enable  core dumps (usually "ulimit -c unlimited") then sometimes you'll get a core dump for the offending process.
>>>>> 
>>>>> Our new process manager, hydra, does a better job with the error reporting and process cleanup.  However you need to use a fairly modern version of MPICH2 in order to use hydra: http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
>>>>> 
>>>>> -Dave
>>>>> 
>>>>> On Jun 15, 2010, at 10:45 AM PDT, Yuriy Khalak wrote:
>>>>> 
>>>>>> Dear Qianlin,
>>>>>> 
>>>>>> Signal 9, which seems to be causing the crash, usually indicates a segmentation fault in the code being run by mpiexec. I had something similar happen to me recently under C++ and MPICH. It turned out I was trying to delete the contents of a pointer twice.
>>>>>> 
>>>>>> In my case Linux did a core dump, which I could trace with gdb and determine approximately where the segmentation fault occurred. So if you can find a core dump in your program's working directory, the problem is probably in the VASP code, not in mpich2.
>>>>>> 
>>>>>> Keep in mind that I'm just a user of mpich2 and very well could be wrong.
>>>>>> 
>>>>>> Regards,
>>>>>>            Yuriy
>>>>>> 
>>>>>>> Dear mpich2-support,
>>>>>>> 
>>>>>>> Based on mpif90, I have installed the parallel version of the commercial code VASP. Sometimes mpich2 works well with VASP, but it has also failed for some VASP jobs with the following error messages:
>>>>>>> -----------------------------------
>>>>>>> running on    8 nodes
>>>>>>> distr:  one band on    1 nodes,    8 groups
>>>>>>> vasp.4.6.21  23Feb03 complex
>>>>>>> POSCAR found :  3 types and   30 ions
>>>>>>> LDA part: xc-table for Ceperly-Alder, Vosko type interpolation para-ferro
>>>>>>> POSCAR, INCAR and KPOINTS ok, starting setup
>>>>>>> WARNING: wrap around errors must be expected
>>>>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>>>>> [previous two lines repeated several more times]
>>>>>>> FFT: planning ...            2
>>>>>>> reading WAVECAR
>>>>>>> mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run in background, redirect from /dev/null
>>>>>>> mpiexec_qltang1 (handle_stdin_input 1090):     e.g.: mpiexec -n 4 a.out < /dev/null &
>>>>>>> [previous two lines repeated many more times]
>>>>>>> WARNING: random wavefunctions but no delay for mixing, default for NELMDL
>>>>>>> entering main loop
>>>>>>>     N       E                     dE             d eps       ncg     rms          rms(c)
>>>>>>> rank 6 in job 36  qltang1_54199   caused collective abort of all ranks
>>>>>>> exit status of rank 6: killed by signal 9
>>>>>>> rank 3 in job 36  qltang1_54199   caused collective abort of all ranks
>>>>>>> exit status of rank 3: killed by signal 9
>>>>>>> -----------------------------
>>>>>>> I want to know how to solve the above problem. Thank you very much.
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> Qianlin
>>>>>> _______________________________________________
>>>>>> mpich-discuss mailing list
>>>>>> mpich-discuss at mcs.anl.gov
>>>>>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>>>>> 
>>>> 
>>>> = = = = = = = = = = = = = = = = = = = =
>>>> 
>>>> With best regards,
>>>> 
>>>>         Dr. Qian-Lin Tang
>>>>         qltang at xidian.edu.cn
>>>>           2010-06-16
>>>> 
>>> 
>> 
>> = = = = = = = = = = = = = = = = = = = =
>> 
>> With best regards,
>> 
>>         Dr. Qian-Lin Tang
>>         qltang at xidian.edu.cn
>>           2010-06-17
>> 
>

= = = = = = = = = = = = = = = = = = = =

With best regards,

        Dr. Qian-Lin Tang
        qltang at xidian.edu.cn
          2010-06-17


