[Swift-user] hung submission
Mihael Hategan
hategan at mcs.anl.gov
Sun May 3 15:30:37 CDT 2015
Hi,
That's actually good since we eliminated lots of moving parts.
~/Scratch seems to be the right spot according to
https://wiki.rc.ucl.ac.uk/wiki/Managing_Data_on_Legion
What I suspect might be happening is that the mountpoints are different
between login nodes and compute nodes.
Can you try running these on both the login node and a compute node:
    mount (or df)
    ls -al $HOME/Scratch
and then pasting the outputs back in an email.
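
A minimal sketch for gathering all of this in one go, assuming bash and the standard df, ls and readlink tools are available on both kinds of node. The output file name is made up for illustration, and the readlink step is an extra that was not in the list above; it only shows where $HOME/Scratch resolves to on each host. On a compute node the script would have to run from inside a job rather than interactively.

```shell
#!/bin/bash
# Collect mount and Scratch information for the current host.
# Run once on a login node and once on a compute node, then compare the files.
out="mounts-$(uname -n).txt"   # hypothetical output file name
{
  echo "== host: $(uname -n)"
  echo "== df"
  df -P 2>&1                         # POSIX output format, one filesystem per line
  echo "== ls -al \$HOME/Scratch"
  ls -al "$HOME/Scratch" 2>&1        # errors are captured too, they are informative here
  echo "== resolved path of \$HOME/Scratch"
  readlink -f "$HOME/Scratch" 2>&1   # where the Scratch link points on this host
} > "$out"
echo "wrote $out"
```

If the resolved path or the list of mounted filesystems differs between the two files, that would support the mountpoint theory.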
Mihael
On Sun, 2015-05-03 at 20:18 +0000, Altaweel, Mark wrote:
> If I do a qsub on the script I get the same error message:
>
> job_number: 6597054
> exec_file: job_scripts/6597054
> submission_time: Sun May 3 21:15:23 2015
> owner: tcrnma3
> uid: 147447
> group: users
> gid: 1002
> sge_o_home: /home/tcrnma3/
> sge_o_log_name: tcrnma3
> sge_o_path: /shared/ucl/apps/mrxvt/0.5.4/bin:/shared/ucl/apps/nedit/5.6/bin:/shared/ucl/apps/gerun/i:/usr/mpi/qlogic//sbin:/usr/mpi/qlogic//bin:/usr/lib64/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/shared/ucl/apps/bin:/cm/shared/apps/intel/toolkit/Compiler/11.1/072//bin:/cm/shared/apps/intel/toolkit/Compiler/11.1/072//bin/intel64:/cm/shared/apps/sge/6.2u3/bin/lx26-amd64:/home/tcrnma3//bin
> sge_o_shell: /bin/bash
> sge_o_workdir: /imports/home1/tcrnma3/Scratch/UrbanModel/run005/scripts
> sge_o_host: login06
> account: ucl_jsv4h;S=0;T=1.0;W=1.0;X=1.0;Y=1.0;V=0;Z=1.0;U=1.0
> stderr_path_list: NONE:NONE:/imports/home1/tcrnma3/Scratch/UrbanModel/run005/scripts/SGE7948718974736431209.submit.stderr
> hard resource_list: batch=true,bonus=0,h_rt=540,jcs=0,jct=1,jcu=1,jcv=0,jcw=1,jcx=1,jcy=1,jcz=1,maxversion=2,memory=1M,penalty=604801,s_rt=530
> mail_list: tcrnma3 at login06.data.legion.ucl.ac.uk
> notify: FALSE
> job_name: B0503-3707460-0
> stdout_path_list: NONE:NONE:/imports/home1/tcrnma3/Scratch/UrbanModel/run005/scripts/SGE7948718974736431209.submit.stdout
> jobshare: 0
> restart: n
> shell_list: NONE:/bin/ksh
> env_list: WORKER_LOGGING_LEVEL=NONE,XAUTHORITY=/scratch/scratch/tcrnma3/.Xauthority,PAID=0,GPU=0,OMP_NUM_THREADS=1,MICCOUNT=0,SCRATCH_SPACE=10737418240,MEMPERSLOT=1048576,SGE_SHARENODE=1,IFS=
> script_file: SGE7948718974736431209.submit
> project: AllUsers
> error reason 1: 05/03/2015 21:15:57 [147447:18805]: error: can't open output file "/imports/home1/tcrnma3/Scratch/Ur
> scheduling info: (Collecting of scheduler job information is turned off)
>
> Mark
>
> On May 3, 2015, at 9:06 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>
> It seems that it is more likely that the error message gets truncated
> rather than the path itself. After all, stdout_path_list does contain
> what seems to be the correct path.
>
> There should be a
> script: /imports/home1/tcrnma3/Scratch/UrbanModel/run005/scripts/SGE7948718974736431209.submit
> (or similar) that should be available while a swift run is in progress.
>
> I think one way to troubleshoot things would be to copy that script and
> submit it manually.
>
> Mihael
>
> On Sun, 2015-05-03 at 19:20 +0000, Altaweel, Mark wrote:
> Yes, so I do import swift in the shell script that gets distributed. However, the result seems to be the same. I don’t understand why it truncates the path, unless the full path is there but only a certain number of characters get written.
>
> This is added to the script:
>
> export PATH=$PATH:~/Scratch/swift-0.96-sge-mod/bin
> module load java/1.7.0_45
>
> So java is included. If I remove it, the same thing happens, though.
>
> Mark
>
>
>
> On May 3, 2015, at 8:07 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>
> On Sun, 2015-05-03 at 18:43 +0000, Altaweel, Mark wrote:
> error reason 1: 05/03/2015 19:38:15 [147447:22761]: error: can't open output file "/imports/home1/tcrnma3/Scratch/Ur
>
> ... aaand my PE suggestion had little to do with the problem.
>
> Is /imports mounted on compute nodes?
>
> Mihael
>