[Swift-user] hung submission

Mihael Hategan hategan at mcs.anl.gov
Sun May 3 15:30:37 CDT 2015


Hi,

That's actually good since we eliminated lots of moving parts.

~/Scratch seems to be the right spot according to
https://wiki.rc.ucl.ac.uk/wiki/Managing_Data_on_Legion

What I suspect might be happening is that the mountpoints are different
between login nodes and compute nodes.

Can you try running these on both the login node and a compute node:

mount (or df)
ls -al $HOME/Scratch

and then pasting the outputs back in an email.
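
If it is easier, the checks above could be wrapped in a small script so that the
same commands run on both kinds of node. This is only a sketch: the function name
collect_fs_info is made up for illustration, and it assumes bash, df and ls are
available on the compute nodes. It also prints the physical path behind
$HOME/Scratch, which is an extra check I am adding, not something from the wiki.

```shell
#!/bin/bash
# Hypothetical helper (name is illustrative): print filesystem details so the
# output from a login node and a compute node can be compared side by side.
collect_fs_info() {
    echo "== host: $(uname -n)"

    echo "== df"
    # -P gives one line per filesystem, which is easier to diff
    df -P 2>&1

    echo "== ls -al \$HOME/Scratch"
    ls -al "$HOME/Scratch" 2>&1

    echo "== physical path of \$HOME/Scratch"
    # pwd -P resolves symlinks; a failure here means the directory is not
    # reachable from this node
    ( cd "$HOME/Scratch" 2>/dev/null && pwd -P ) || echo "(cannot cd into \$HOME/Scratch)"
}

# Write to stdout so nothing depends on a writable working directory
collect_fs_info
```

Run it once on the login node and once on a compute node, then compare the two
outputs. If the df lines or the physical path differ, that would support the
mountpoint theory.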

Mihael

On Sun, 2015-05-03 at 20:18 +0000, Altaweel, Mark wrote:
> If I do a qsub on the script I get the same error message:
> 
> job_number:                 6597054
> exec_file:                  job_scripts/6597054
> submission_time:            Sun May  3 21:15:23 2015
> owner:                      tcrnma3
> uid:                        147447
> group:                      users
> gid:                        1002
> sge_o_home:                 /home/tcrnma3/
> sge_o_log_name:             tcrnma3
> sge_o_path:                 /shared/ucl/apps/mrxvt/0.5.4/bin:/shared/ucl/apps/nedit/5.6/bin:/shared/ucl/apps/gerun/i:/usr/mpi/qlogic//sbin:/usr/mpi/qlogic//bin:/usr/lib64/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/sbin:/usr/sbin:/shared/ucl/apps/bin:/cm/shared/apps/intel/toolkit/Compiler/11.1/072//bin:/cm/shared/apps/intel/toolkit/Compiler/11.1/072//bin/intel64:/cm/shared/apps/sge/6.2u3/bin/lx26-amd64:/home/tcrnma3//bin
> sge_o_shell:                /bin/bash
> sge_o_workdir:              /imports/home1/tcrnma3/Scratch/UrbanModel/run005/scripts
> sge_o_host:                 login06
> account:                    ucl_jsv4h;S=0;T=1.0;W=1.0;X=1.0;Y=1.0;V=0;Z=1.0;U=1.0
> stderr_path_list:           NONE:NONE:/imports/home1/tcrnma3/Scratch/UrbanModel/run005/scripts/SGE7948718974736431209.submit.stderr
> hard resource_list:         batch=true,bonus=0,h_rt=540,jcs=0,jct=1,jcu=1,jcv=0,jcw=1,jcx=1,jcy=1,jcz=1,maxversion=2,memory=1M,penalty=604801,s_rt=530
> mail_list:                  tcrnma3 at login06.data.legion.ucl.ac.uk
> notify:                     FALSE
> job_name:                   B0503-3707460-0
> stdout_path_list:           NONE:NONE:/imports/home1/tcrnma3/Scratch/UrbanModel/run005/scripts/SGE7948718974736431209.submit.stdout
> jobshare:                   0
> restart:                    n
> shell_list:                 NONE:/bin/ksh
> env_list:                   WORKER_LOGGING_LEVEL=NONE,XAUTHORITY=/scratch/scratch/tcrnma3/.Xauthority,PAID=0,GPU=0,OMP_NUM_THREADS=1,MICCOUNT=0,SCRATCH_SPACE=10737418240,MEMPERSLOT=1048576,SGE_SHARENODE=1,IFS=
> script_file:                SGE7948718974736431209.submit
> project:                    AllUsers
> error reason    1:          05/03/2015 21:15:57 [147447:18805]: error: can't open output file "/imports/home1/tcrnma3/Scratch/Ur
> scheduling info:            (Collecting of scheduler job information is turned off)
> 
> Mark
> 
> On May 3, 2015, at 9:06 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> 
> It seems that it is more likely that the error message gets truncated
> rather than the path itself. After all, stdout_path_list does contain
> what seems to be the correct path.
> 
> There should be a
> script: /imports/home1/tcrnma3/Scratch/UrbanModel/run005/scripts/SGE7948718974736431209.submit
> (or similar) that should be available while a swift run is in progress.
> 
> I think one way to troubleshoot things would be to copy that script and
> submit it manually.
> 
> Mihael
> 
> On Sun, 2015-05-03 at 19:20 +0000, Altaweel, Mark wrote:
> Yes, so I do import swift in the shell script that gets distributed. However, the result seems to be the same. I don’t understand why it truncates the path, unless the full path is there but only a certain number of characters get written.
> 
> This is added to the script:
> 
> export PATH=$PATH:~/Scratch/swift-0.96-sge-mod/bin
> module load java/1.7.0_45
> 
> So java is included. If I remove it, the same thing happens, though.
> 
> Mark
> 
> 
> 
> On May 3, 2015, at 8:07 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> 
> On Sun, 2015-05-03 at 18:43 +0000, Altaweel, Mark wrote:
> error reason    1:          05/03/2015 19:38:15 [147447:22761]: error: can't open output file "/imports/home1/tcrnma3/Scratch/Ur
> 
> ... aaand my PE suggestion had little to do with the problem.
> 
> Is /imports mounted on compute nodes?
> 
> Mihael
> 
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> 
> 
> 
> 




