[mpich-discuss] MPI PBS Error

Bharath Pattabiraman bharath650 at gmail.com
Sat Mar 24 13:17:20 CDT 2012


Hi,

I get the following error when I run my application on 64 nodes (proc per node):

=>> PBS: job killed: node 20 (qnode0553) requested job terminate, 'EOF' (code 1099) - received SISTER_EOF attempting to communicate with sister MOM's
mpirun: killing job...

--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 26 in communicator MPI_COMM_WORLD
with errorcode 15.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        qnode0660
        qnode0638
        qnode0632
        qnode0630
        qnode0616
        qnode0592
        qnode0690
        qnode0519
        qnode0724
        qnode0544
        qnode0669
        qnode0522
        qnode0526
        qnode0527
        qnode0534
        qnode0537
        qnode0541
        qnode0543
        qnode0549
        qnode0553
        qnode0555
        qnode0559
        qnode0561
        qnode0566
        qnode0569
        qnode0570
        qnode0574
        qnode0578
        qnode0581
        qnode0587
        qnode0593
        qnode0595
        qnode0598
        qnode0602
        qnode0609
        qnode0618
        qnode0621
        qnode0622
.
.
.
.
.

Regards,
Bharat

