[mpich-discuss] A problem of carrying out mpich2
Joe C
thejcbejcrew at gmail.com
Tue Jun 15 19:17:10 CDT 2010
Thanks for all the help, but I've decided not to connect the two computers.
Thanks though,
Bernie
On 6/15/10, Dave Goodell <goodell at mcs.anl.gov> wrote:
> Signal 9 is SIGKILL, and in the context of MPI (or at least MPICH2+mpd)
> usually means that the process manager killed the application because of
> some other problem. The other problem is often another process crashing,
> such as from a signal 11 (SIGSEGV). Unfortunately, the other failing
> process does not always show up in the output. If you enable core dumps
> (usually "ulimit -c unlimited") then sometimes you'll get a core dump for
> the offending process.
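
A minimal sketch of the two checks described above, written in Python purely for illustration: it names the signal numbers and raises the core-file limit. The helper names `signal_name` and `enable_core_dumps` are made up for this example and are not part of MPICH2. It assumes a Unix-like system, and the limit set here only applies to this process and the children it starts; whether it reaches the MPI ranks depends on how the process manager launches them.

```python
import resource
import signal


def signal_name(number):
    """Return the symbolic name for a signal number, e.g. 9 -> 'SIGKILL'."""
    return signal.Signals(number).name


def enable_core_dumps():
    """Raise the soft core-file limit to the hard limit for this process.

    Roughly what "ulimit -c unlimited" does in a shell, except that it
    never asks for more than the hard limit allows.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    for number in (9, 11):
        print(number, signal_name(number))
    print("core limit (soft, hard):", enable_core_dumps())
```

Raising the soft limit only as far as the hard limit avoids a permission error on systems where the hard limit is not unlimited.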
>
> Our new process manager, hydra, does a better job with the error reporting
> and process cleanup. However you need to use a fairly modern version of
> MPICH2 in order to use hydra:
> http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
>
> -Dave
>
> On Jun 15, 2010, at 10:45 AM PDT, Yuriy Khalak wrote:
>
>> Dear Qianlin,
>>
>> Signal 9, which seems to be causing the crash, usually indicates a
>> segmentation fault in the code being run by mpiexec. I had something
>> similar happen to me recently under C++ and mpich. It turned out I was
>> trying to delete the contents of a pointer twice.
>>
>> In my case Linux did a core dump, which I could trace with gdb and
>> determine approximately where the segmentation fault occurred. So if you
>> can find a core dump in your program's working directory, the problem is
>> probably in the VASP code, not in mpich2.
>>
>> Keep in mind that I'm just a user of mpich2 and very well could be wrong.
>>
>> Regards,
>> Yuriy
>>
>> > Dear mpich2-support,
>> >
>> > Using mpif90, I have installed the parallel version of the commercial
>> > code VASP. Sometimes mpich2 works well with VASP, but it fails for
>> > some VASP jobs with the following error messages:
>> > -----------------------------------
>> > running on 8 nodes
>> > distr: one band on 1 nodes, 8 groups
>> > vasp.4.6.21 23Feb03 complex
>> > POSCAR found : 3 types and 30 ions
>> > LDA part: xc-table for Ceperly-Alder, Vosko type interpolation
>> > para-ferro
>> > POSCAR, INCAR and KPOINTS ok, starting setup
>> > WARNING: wrap around errors must be expected
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > FFT: planning ... 2
>> > reading WAVECAR
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > mpiexec_qltang1 (handle_stdin_input 1089): stdin problem; if pgm is run
>> > in background, redirect from /dev/null
>> > mpiexec_qltang1 (handle_stdin_input 1090): e.g.: mpiexec -n 4 a.out
>> > < /dev/null &
>> > WARNING: random wavefunctions but no delay for mixing, default for
>> > NELMDL
>> > entering main loop
>> > N E dE d eps ncg
>> > rms rms(c)
>> > rank 6 in job 36 qltang1_54199 caused collective abort of all ranks
>> > exit status of rank 6: killed by signal 9
>> > rank 3 in job 36 qltang1_54199 caused collective abort of all ranks
>> > exit status of rank 3: killed by signal 9
>> > -----------------------------
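
The abort lines at the end of the log above can be summarised mechanically. The following is only a sketch in Python, assuming the "exit status of rank N: killed by signal M" wording shown in this log; the function name `killed_ranks` is hypothetical and the pattern may not match output from other MPICH2 versions or process managers.

```python
import re
import signal

# Matches e.g. "exit status of rank 6: killed by signal 9", with or without
# leading mail quote markers such as ">> > ".
KILLED_RE = re.compile(r"exit status of rank (\d+): killed by signal (\d+)")


def killed_ranks(log_text):
    """Map each rank reported as killed to the name of the signal."""
    result = {}
    for match in KILLED_RE.finditer(log_text):
        rank, signum = int(match.group(1)), int(match.group(2))
        result[rank] = signal.Signals(signum).name
    return result


# The four abort lines copied from the log in this message.
sample = """\
rank 6 in job 36 qltang1_54199 caused collective abort of all ranks
exit status of rank 6: killed by signal 9
rank 3 in job 36 qltang1_54199 caused collective abort of all ranks
exit status of rank 3: killed by signal 9
"""
print(killed_ranks(sample))
```

A result containing only SIGKILL entries suggests the reported ranks were cleaned up by the process manager, so the original failure has to be looked for elsewhere, for example in a core file.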
>> > I want to know how to solve the above problem. Thank you a lot.
>> >
>> > Best regards,
>> > Qianlin
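
The repeated "stdin problem" lines in the log carry their own hint: redirect standard input from /dev/null when the job is not attached to a terminal. For a job started from a script rather than a shell, one way to do the same thing is sketched below in Python. The helper name `run_without_stdin` is made up for this example, the probe command is a stand-in so the sketch runs without MPI installed, and unlike the shell's `&` this version waits for the command to finish.

```python
import subprocess
import sys


def run_without_stdin(command):
    """Run a command with its standard input connected to /dev/null."""
    return subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Stand-in command: it reads all of stdin and reports how many
    # characters arrived. For a real job the list would be something like
    # ["mpiexec", "-n", "4", "a.out"], as in the log's own hint.
    probe = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
    print(run_without_stdin(probe).stdout.strip())
```

Whether this silences the warning for a particular mpd setup is untested here; it only shows the redirection itself.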
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss