[mpich-discuss] submission error on IBM cluster

Jeff Hammond jhammond at alcf.anl.gov
Wed Dec 14 08:43:58 CST 2011


What version of MPICH2 are you using?
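
The p4_error lines in your output look like what the ch_p4 device of MPICH-1 prints, so it is worth confirming which MPI your mdrun was actually built against. Below is a rough sketch of how you might check, not a definitive recipe. It assumes that mpich2version (shipped with MPICH2) or mpichversion (the name used by other MPICH releases, as far as I know) is on your PATH and that ldd is available on your nodes. The function name report_mpich_version is made up for illustration, and the mdrun path is copied from your script.

```shell
#!/bin/sh
# Hypothetical helper: print the version reported by whichever MPICH
# version tool is found first on PATH.
report_mpich_version() {
    for tool in mpich2version mpichversion; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found $tool:"
            "$tool"
            return 0
        fi
    done
    echo "no MPICH version tool found on PATH"
    return 1
}

report_mpich_version || true

# List the MPI libraries the mdrun binary is dynamically linked against.
# The path comes from the quoted job script; nothing is printed if the
# binary is missing or has no MPI library among its dependencies.
MDRUN=/home/gromacs-4.5.5/bin/mdrun
if [ -x "$MDRUN" ]; then
    ldd "$MDRUN" | grep -i mpi || true
fi
```

If the version tool and the ldd output disagree, or if neither mentions the MPI that poe expects, that mismatch is the first thing to sort out.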

Jeff

On Wed, Dec 14, 2011 at 7:43 AM, aiswarya pawar <aiswarya.pawar at gmail.com> wrote:

> Hi users,
>
> I have a submission script for running GROMACS on an IBM cluster,
> but I get an error when I run it. The script is:
>
> #!/bin/sh
> # @ error   = job1.$(Host).$(Cluster).$(Process).err
> # @ output  = job1.$(Host).$(Cluster).$(Process).out
> # @ class = ptask32
> # @ job_type = parallel
> # @ node = 1
> # @ tasks_per_node = 4
> # @ queue
>
> echo "_____________________________________"
> echo "LOADL_STEP_ID=$LOADL_STEP_ID"
> echo "_____________________________________"
>
> machine_file="/tmp/machinelist.$LOADL_STEP_ID"
> rm -f $machine_file
> for node in $LOADL_PROCESSOR_LIST
> do
> echo $node >> $machine_file
> done
> machine_count=`cat /tmp/machinelist.$LOADL_STEP_ID|wc -l`
> echo $machine_count
> echo MachineList:
> cat /tmp/machinelist.$LOADL_STEP_ID
> echo "_____________________________________"
> unset LOADLBATCH
> env  |grep LOADLBATCH
> cd /home/staff/1adf/
> /usr/bin/poe /home/gromacs-4.5.5/bin/mdrun -deffnm /home/staff/1adf/md \
>     -procs $machine_count -hostfile /tmp/machinelist.$LOADL_STEP_ID
> rm /tmp/machinelist.$LOADL_STEP_ID
>
>
> The output file contains:
> _____________________________________
> LOADL_STEP_ID=cnode39.97541.0
> _____________________________________
> 4
> MachineList:
> cnode62
> cnode7
> cnode4
> cnode8
> _____________________________________
> p0_25108:  p4_error: interrupt SIGx: 4
> p0_2890:  p4_error: interrupt SIGx: 4
> p0_2901:  p4_error: interrupt SIGx: 15
> p0_22760:  p4_error: interrupt SIGx: 15
>
>
> The error file contains:
>
> Reading file /home/staff/1adf/md.tpr, VERSION 4.5.4 (single precision)
> Sorry couldn't backup /home/staff/1adf/md.log to
> /home/staff/1adf/#md.log.14#
>
> Back Off! I just backed up /home/staff/1adf/md.log to
> /home/staff/1adf/#md.log.14#
> ERROR: 0031-300  Forcing all remote tasks to exit due to exit code 1 in
> task 0
>
> Can anyone please help with this error?
>
> Thanks
>


-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki-old.alcf.anl.gov/index.php/User:Jhammond