[mpich-discuss] MPI PBS Error
Bharath Pattabiraman
bharath650 at gmail.com
Sat Mar 24 13:17:20 CDT 2012
Hi,
I am getting the following error with my application when I run it on 64 nodes (proc per node).
=>> PBS: job killed: node 20 (qnode0553) requested job terminate, 'EOF' (code 1099) - received SISTER_EOF attempting to communicate with sister MOM's
mpirun: killing job...
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 26 in communicator MPI_COMM_WORLD
with errorcode 15.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
qnode0660
qnode0638
qnode0632
qnode0630
qnode0616
qnode0592
qnode0690
qnode0519
qnode0724
qnode0544
qnode0669
qnode0522
qnode0526
qnode0527
qnode0534
qnode0537
qnode0541
qnode0543
qnode0549
qnode0553
qnode0555
qnode0559
qnode0561
qnode0566
qnode0569
qnode0570
qnode0574
qnode0578
qnode0581
qnode0587
qnode0593
qnode0595
qnode0598
qnode0602
qnode0609
qnode0618
qnode0621
qnode0622
.
.
.
.
.
Regards,
Bharat