[Swift-devel] Re: coaster error on ranger
Mihael Hategan
hategan at mcs.anl.gov
Thu Jun 11 13:09:20 CDT 2009
On Thu, 2009-06-11 at 13:04 -0500, Zhao Zhang wrote:
> No, I don't specify any wall time.
Well, you need to specify one.
> The last entry is for the run_ampl script.
>
> zhao
>
> login3% cat tc.data
> #This is the transformation catalog.
> #
> #It comes pre-configured with a number of simple transformations with
> #paths that are likely to work on a linux box. However, on some systems,
> #the paths to these executables will be different (for example, sometimes
> #some of these programs are found in /usr/bin rather than in /bin)
> #
> #NOTE WELL: fields in this file must be separated by tabs, not spaces; and
> #there must be no trailing whitespace at the end of each line.
> #
> # sitename transformation path INSTALLED platform profiles
> bgps echo /bin/echo INSTALLED INTEL32::LINUX null
> bgp000 cat /bin/cat INSTALLED INTEL32::LINUX null
> localhost sleep /bin/sleep INSTALLED INTEL32::LINUX null
> localhost echo /bin/echo INSTALLED INTEL32::LINUX null
> localhost ls /bin/ls INSTALLED INTEL32::LINUX null
> localhost wc /bin/wc INSTALLED INTEL32::LINUX null
> localhost grep /bin/grep INSTALLED INTEL32::LINUX null
> localhost sort /bin/sort INSTALLED INTEL32::LINUX null
> localhost paste /bin/paste INSTALLED INTEL32::LINUX null
> localhost date /bin/date INSTALLED INTEL32::LINUX null
> localhost db /home/wilde/angle/data/db INSTALLED INTEL32::LINUX null
> localhost set1 /home/wilde/angle/data/set1 INSTALLED INTEL32::LINUX null
> localhost set3 /home/wilde/angle/data/set3 INSTALLED INTEL32::LINUX null
> localhost run_ampl /share/home/00946/zzhang/SEE-work/static/run_ampl INSTALLED INTEL32::LINUX null
> tgtacc run_ampl /share/home/00946/zzhang/SEE-work/static/run_ampl INSTALLED INTEL32::LINUX null
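The header comment of tc.data states the format rules: six fields (sitename, transformation, path, INSTALLED, platform, profiles) separated by tabs, with no trailing whitespace. The sketch below, using hypothetical helpers `hhmmss`, `tc_entry` and `check_tc_line`, builds an entry and checks a line against those rules. It also assumes, without confirmation from this thread, that a per-job walltime can be placed in the profiles column as `GLOBUS::maxwalltime="hh:mm:ss"`; check that syntax against the Swift user guide for your version. The 210-minute value is only an illustrative margin over the 2-3 hour runs mentioned in the thread.

```python
# Sketch only. ASSUMPTION: a per-job walltime can be given in the sixth
# (profiles) column as GLOBUS::maxwalltime="hh:mm:ss"; verify against the
# Swift user guide for your version before relying on it.


def hhmmss(minutes):
    """Format a whole number of minutes as an hh:mm:ss string."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}:00"


def tc_entry(site, transformation, path, platform, profiles="null"):
    """Join the six tc.data fields with tabs, as the file header requires."""
    return "\t".join([site, transformation, path, "INSTALLED", platform, profiles])


def check_tc_line(line):
    """Return a list of format problems for one tc.data line (empty if none)."""
    line = line.rstrip("\n")
    if not line.strip() or line.lstrip().startswith("#"):
        return []  # blank lines and comments are not checked
    problems = []
    if line != line.rstrip():
        problems.append("trailing whitespace")
    fields = line.rstrip().split("\t")
    if len(fields) != 6:
        problems.append(f"expected 6 tab-separated fields, found {len(fields)}")
    return problems


if __name__ == "__main__":
    # Hypothetical limit: 210 minutes (3.5 h), a margin over a 2-3 h run.
    entry = tc_entry(
        "tgtacc",
        "run_ampl",
        "/share/home/00946/zzhang/SEE-work/static/run_ampl",
        "INTEL32::LINUX",
        f'GLOBUS::maxwalltime="{hhmmss(210)}"',
    )
    print(entry)
    print(check_tc_line(entry))
    # A space-separated line, as the archived listing shows, is flagged.
    print(check_tc_line("tgtacc run_ampl /bin/true INSTALLED INTEL32::LINUX null"))
```

Running the checker over every line of a tc.data file is a quick way to catch entries where tabs were converted to spaces, for example after pasting the file through email.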
>
>
> Mihael Hategan wrote:
> > Your jobs don't seem to have a walltime specified. Can you post your
> > tc.data?
> >
> > On Thu, 2009-06-11 at 10:37 -0500, Zhao Zhang wrote:
> >
> >> Hi, Mihael
> >>
> >> The coaster log is at /home/zzhang/see/logs/coasters.log. The latest
> >> record should be the run that failed last night.
> >>
> >> best
> >> zhao
> >>
> >> Mihael Hategan wrote:
> >>
> >>> On Thu, 2009-06-11 at 09:24 -0500, Zhao Zhang wrote:
> >>>
> >>>
> >>>> Hi, Mike and Mihael
> >>>>
> >>>> Here is the error. I think it is related to the job walltime in the
> >>>> coaster settings.
> >>>>
> >>>> Mihael, could you give me some suggestions on how to set the parameters
> >>>> for coasters on ranger?
> >>>>
> >>>>
> >>> I need to know what the problem is first. And for that I need to take a
> >>> look at the coaster log (and possibly gram logs). So if you could copy
> >>> that to some shared space in the CI, that would be good.
> >>>
> >>>
> >>>
> >>>> For now I am running 100 jobs, and each job could take 2-3 hours. Thanks.
> >>>>
> >>>> best
> >>>> zhao
> >>>>
> >>>> Execution failed:
> >>>> Exception in run_ampl:
> >>>> Arguments: [run70, template, armington.mod, armington_process.cmd,
> >>>> armington_output.cmd, subproblems/producer_tree.mod, ces.so]
> >>>> Host: tgtacc
> >>>> Directory: ampl-20090611-0122-hzktisu5/jobs/h/run_ampl-h92ap3cj
> >>>> stderr.txt:
> >>>>
> >>>> stdout.txt:
> >>>> ----
> >>>>
> >>>> Caused by:
> >>>> Shutting down worker
> >>>> Cleaning up...
> >>>> Shutting down service at https://129.114.50.163:58556
> >>>>
> >>>> And here is my sites.xml
> >>>> bash-3.00$ cat tgranger-sge-gram2.xml
> >>>> <config>
> >>>>   <pool handle="tgtacc" >
> >>>>     <gridftp url="gsiftp://gridftp.ranger.tacc.teragrid.org" />
> >>>>     <execution provider="coaster" url="gatekeeper.ranger.tacc.teragrid.org" jobManager="gt2:gt2:SGE"/>
> >>>>     <!-- <profile namespace="globus" key="project">TG-DBS080004N</profile> -->
> >>>>     <profile namespace="globus" key="project">TG-CCR080022N</profile>
> >>>>     <workdirectory >/work/00946/zzhang/work</workdirectory>
> >>>>     <profile namespace="env" key="SWIFT_JOBDIR_PATH">/tmp/zzhang/jobdir</profile>
> >>>>     <profile namespace="globus" key="coastersPerNode">16</profile>
> >>>>     <profile namespace="globus" key="queue">development</profile>
> >>>>     <profile namespace="karajan" key="initialScore">100</profile>
> >>>>     <profile namespace="karajan" key="jobThrottle">10</profile>
> >>>>     <profile namespace="globus" key="slots">20</profile>
> >>>>     <profile namespace="globus" key="lowOverAllocation">5</profile>
> >>>>     <profile namespace="globus" key="highOverAllocation">1</profile>
> >>>>     <profile namespace="globus" key="maxNodes">5</profile>
> >>>>   </pool>
> >>>> </config>
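The walltime could alternatively be set once for the whole pool in sites.xml rather than per transformation in tc.data. The sketch below uses Python's standard `xml.etree.ElementTree` to update or append a `<profile>` element on the `tgtacc` pool. The helper `set_pool_profile` is hypothetical, and the assumption that a `globus` namespace `maxwalltime` key is honoured for coaster jobs is not confirmed in this thread; the `03:30:00` value is illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the tgtacc pool definition; only the parts needed here.
SITES_XML = """<config>
  <pool handle="tgtacc">
    <profile namespace="globus" key="queue">development</profile>
    <profile namespace="globus" key="maxNodes">5</profile>
  </pool>
</config>"""


def set_pool_profile(root, handle, namespace, key, value):
    """Update the matching <profile> on the given pool, or append a new one."""
    pool = root.find(f"pool[@handle='{handle}']")
    if pool is None:
        raise KeyError(f"no pool with handle {handle!r}")
    for prof in pool.findall("profile"):
        if prof.get("namespace") == namespace and prof.get("key") == key:
            prof.text = value  # overwrite instead of adding a duplicate key
            return prof
    prof = ET.SubElement(pool, "profile", namespace=namespace, key=key)
    prof.text = value
    return prof


if __name__ == "__main__":
    root = ET.fromstring(SITES_XML)
    # ASSUMPTION: globus::maxwalltime applies to coaster jobs; value is illustrative.
    set_pool_profile(root, "tgtacc", "globus", "maxwalltime", "03:30:00")
    print(ET.tostring(root, encoding="unicode"))
```

Updating an existing element rather than always appending keeps the file free of two conflicting values for the same key; editing the XML by hand works just as well for a single change.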