[mpich-discuss] submission error on IBM cluster

aiswarya.pawar at gmail.com aiswarya.pawar at gmail.com
Wed Dec 14 09:17:12 CST 2011


I am using mpich2-1.2.7.

-----Original Message-----
From: aiswarya pawar <aiswarya.pawar at gmail.com>
Date: Wed, 14 Dec 2011 19:13:17 
To: <mpich-discuss at mcs.anl.gov>
Subject: submission error on IBM cluster

Hi users,

I have a LoadLeveler submission script for running GROMACS on an IBM cluster,
but I get an error when I run it. The script is:

#!/bin/sh
# @ error   = job1.$(Host).$(Cluster).$(Process).err
# @ output  = job1.$(Host).$(Cluster).$(Process).out
# @ class = ptask32
# @ job_type = parallel
# @ node = 1
# @ tasks_per_node = 4
# @ queue

echo "_____________________________________"
echo "LOADL_STEP_ID=$LOADL_STEP_ID"
echo "_____________________________________"

machine_file="/tmp/machinelist.$LOADL_STEP_ID"
rm -f "$machine_file"
for node in $LOADL_PROCESSOR_LIST
do
    echo "$node" >> "$machine_file"
done
machine_count=`wc -l < "$machine_file"`
echo $machine_count
echo MachineList:
cat "$machine_file"
echo "_____________________________________"
unset LOADLBATCH
env | grep LOADLBATCH
cd /home/staff/1adf/ || exit 1
# Keep the poe command on one logical line (the trailing backslash continues
# it); if the line were really broken here, -procs and -hostfile would never
# reach poe.
/usr/bin/poe /home/gromacs-4.5.5/bin/mdrun -deffnm /home/staff/1adf/md \
    -procs $machine_count -hostfile "$machine_file"
rm -f "$machine_file"
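A minimal pre-flight check could run just before poe, so the job stops with a readable message instead of a POE teardown. This is a sketch under stated assumptions: the function name preflight is hypothetical, the expected host count of 4 comes from node = 1 and tasks_per_node = 4 in the # @ directives, and the writability test is only one possible cause of the "couldn't backup" message.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the LoadLeveler job script above.
# Arguments: host file, expected number of hosts, GROMACS run input (.tpr).
preflight() {
    hostfile=$1
    expected=$2
    tpr=$3
    if [ ! -s "$hostfile" ]; then
        echo "preflight: host file $hostfile is missing or empty" >&2
        return 1
    fi
    # tr strips the leading blanks some wc implementations print
    actual=$(wc -l < "$hostfile" | tr -d ' ')
    if [ "$actual" -ne "$expected" ]; then
        echo "preflight: expected $expected hosts, found $actual" >&2
        return 1
    fi
    if [ ! -r "$tpr" ]; then
        echo "preflight: cannot read run input $tpr" >&2
        return 1
    fi
    # mdrun writes its outputs (and #name.N# backups) next to -deffnm
    outdir=$(dirname "$tpr")
    if [ ! -w "$outdir" ]; then
        echo "preflight: output directory $outdir is not writable" >&2
        return 1
    fi
    echo "preflight: $actual hosts, input $tpr OK"
}
```

In the job script it would be called right before the poe line as: preflight "$machine_file" 4 /home/staff/1adf/md.tpr || exit 1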


The output file shows:
_____________________________________
LOADL_STEP_ID=cnode39.97541.0
_____________________________________
4
MachineList:
cnode62
cnode7
cnode4
cnode8
_____________________________________
p0_25108:  p4_error: interrupt SIGx: 4
p0_2890:  p4_error: interrupt SIGx: 4
p0_2901:  p4_error: interrupt SIGx: 15
p0_22760:  p4_error: interrupt SIGx: 15
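The number after "SIGx:" in these p4_error lines is a signal number. A small sketch (the function name decode_p4_signals is hypothetical) can translate them with the shell's own kill -l; for the lines above it would report SIGILL (illegal instruction) for two processes and SIGTERM for the other two. The SIGTERMs are plausibly the remaining tasks being killed by POE, as the 0031-300 message in the error file suggests, but that is an interpretation, not something the log states.

```shell
#!/bin/sh
# Read job output on stdin and print "<process> SIG<NAME>" for every
# "p4_error: interrupt SIGx: N" line, using kill -l to name signal N.
decode_p4_signals() {
    awk '/p4_error: interrupt SIGx:/ { sub(/:$/, "", $1); print $1, $NF }' |
    while read -r proc num; do
        printf '%s SIG%s\n' "$proc" "$(kill -l "$num")"
    done
}
```

For example, running decode_p4_signals on the job1.*.out file lists each p0_* process with its signal name.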


and the error file shows:

Reading file /home/staff/1adf/md.tpr, VERSION 4.5.4 (single precision)
Sorry couldn't backup /home/staff/1adf/md.log to /home/staff/1adf/#md.log.14#

Back Off! I just backed up /home/staff/1adf/md.log to /home/staff/1adf/#md.log.14#
ERROR: 0031-300  Forcing all remote tasks to exit due to exit code 1 in task 0
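mdrun renames an existing md.log to #md.log.N# before writing a new one. Here one attempt to create #md.log.14# failed while another succeeded under the same name, which might mean more than one process tried the same backup. One way to take backups out of the picture while debugging is to move earlier outputs aside before submitting. This is a sketch only: the function name archive_outputs and the list of output extensions are assumptions, and the .tpr input is deliberately left in place.

```shell
#!/bin/sh
# Hypothetical helper: move earlier mdrun outputs for a -deffnm prefix
# (plus any #prefix.*# backups) into a timestamped subdirectory.
archive_outputs() {
    dir=$1      # run directory, e.g. /home/staff/1adf
    name=$2     # -deffnm basename, e.g. md
    dest="$dir/old.$(date +%Y%m%d%H%M%S)"
    moved=0
    for f in "$dir/$name".log "$dir/$name".edr "$dir/$name".trr \
             "$dir/$name".xtc "$dir/$name".gro "$dir/$name".cpt \
             "$dir"/"#$name".*"#"; do
        [ -e "$f" ] || continue          # skip absent files and unmatched globs
        mkdir -p "$dest" || return 1
        mv "$f" "$dest"/ || return 1
        moved=$((moved + 1))
    done
    echo "archived $moved file(s) from $dir"
}
```

Run it once before llsubmit, for example as archive_outputs /home/staff/1adf md, so the next mdrun starts with no existing md.* outputs to back up.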

Can anyone please help with this error?

Thanks


