[Swift-devel] Problem with 0.92 sending jobs to OSG via Condor-G
Michael Wilde
wilde at mcs.anl.gov
Thu Jan 13 10:20:34 CST 2011
Allan, you are right!
So the code in provider-condor is an obsolete fossil?
My earlier diffs were wrong because I diffed trunk against 0.92, but the problem occurred in the merge of stable *to* trunk (obviously now ;)
The error I think is in rev 2989:
--- modules/provider-localscheduler/src/org/globus/cog/abstraction/impl/scheduler/condor/CondorExecutor.java (revision 2988)
+++ modules/provider-localscheduler/src/org/globus/cog/abstraction/impl/scheduler/condor/CondorExecutor.java (working copy)
The working trunk version generates this Condor submit file:
universe = grid
grid_resource = gt2 ff-grid3.unl.edu/jobmanager-pbs
stream_output = False
stream_error = False
Transfer_Executable = false
output = /home/wilde/.globus/scripts/Condor50896.submit.stdout
error = /home/wilde/.globus/scripts/Condor50896.submit.stderr
remote_initialdir = /panfs/panasas/CMS/data/engage/tmp/ff-grid3.unl.edu/catsn-20110113-1059-4xb6b31h
executable = /bin/bash
arguments = /panfs/panasas/CMS/data/engage/tmp/ff-grid3.unl.edu/catsn-20110113-1059-4xb6b31h/shared/_swiftwrap cat-fmk15f4k -jobdir f -scratch -e /bin/cat -out outdir/f.0001.out -err stderr.txt -i -d outdir -if data.txt -of outdir/f.0001.out -k -cdmfile -status file -a data.txt
notification = Never
leave_in_queue = TRUE
queue
while the failing 0.92 version generates this:
universe = grid
grid_resource = gt2 belhaven-1.renci.org/jobmanager-condor
stream_output = False
stream_error = False
Transfer_Executable = false
output = /home/wilde/.globus/scripts/Condor43688.submit.stdout
error = /home/wilde/.globus/scripts/Condor43688.submit.stderr
remote_initialdir = /nfs/osg-data/engage/tmp/belhaven-1.renci.org/catsn-20110113-1050-eskyjcb5
executable = /bin/bash
arguments = /nfs/osg-data/engage/tmp/belhaven-1.renci.org/catsn-20110113-1050-eskyjcb5/shared/_swiftwrap cat-kbmn4f4k -jobdir k -scratch "" -e /bin/cat -out outdir/f.0001.out -err stderr.txt -i -d outdir -if data.txt -of outdir/f.0001.out -k "" -cdmfile "" -status file -a data.txt
notification = Never
leave_in_queue = TRUE
queue
It is not yet clear to me whether the older code works because it *failed* to escape the quotes on the arguments line with \", or because it *omitted* the "". I need to look more closely to see if I'm being fooled by the .submit file text I pasted above (i.e., whether the \" is really there, or the "" is missing entirely).
At any rate - Mihael, can you sync up with me on this (i.e., whichever of us gets to it first should fix it)? Or Sarah, David, Justin, or Allan?
Mihael, I think your top prio should be the coaster staging timing issue that Allan and Justin are both encountering (we think).
We need to add a test for how this works and verify that it's creating a valid submit file.
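As a possible starting point for such a test, here is a minimal sketch in Java. It rests on an assumption I have not verified against the Condor sources: that in the old-style arguments syntax a double quote is only legal when it is preceded by a backslash, which is what the "Found illegal unescaped double-quote" error suggests. The class and method names (SubmitFileCheck, firstUnescapedQuote) are hypothetical and are not part of CoG or Swift.

```java
public class SubmitFileCheck {

    /**
     * Returns the index of the first double quote that is not preceded
     * by a backslash, or -1 if there is none.
     * Simplified on purpose: it does not try to model every Condor
     * quoting rule, only the condition the error message points at.
     */
    static int firstUnescapedQuote(String arguments) {
        for (int i = 0; i < arguments.length(); i++) {
            boolean isQuote = arguments.charAt(i) == '"';
            boolean escaped = i > 0 && arguments.charAt(i - 1) == '\\';
            if (isQuote && !escaped) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Shortened versions of the two arguments lines from the submit files.
        String failing = "-jobdir k -scratch \"\" -e /bin/cat";
        String working = "-jobdir f -scratch -e /bin/cat";

        System.out.println("failing: " + firstUnescapedQuote(failing));
        System.out.println("working: " + firstUnescapedQuote(working));
    }
}
```

A real test would read the arguments line out of the generated .submit file rather than use hard-coded strings, and would fail if the returned index is not -1.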
Thanks,
- Mike
The diffs are below:
--- modules/provider-localscheduler/src/org/globus/cog/abstraction/impl/scheduler/condor/CondorExecutor.java (revision 2988)
+++ modules/provider-localscheduler/src/org/globus/cog/abstraction/impl/scheduler/condor/CondorExecutor.java (working copy)
@@ -116,97 +116,6 @@
wr.close();
}
- private static final boolean[] TRIGGERS;
-
- static {
- TRIGGERS = new boolean[128];
- TRIGGERS[' '] = true;
- TRIGGERS['\n'] = true;
- TRIGGERS['\t'] = true;
- TRIGGERS['\\'] = true;
- TRIGGERS['>'] = true;
- TRIGGERS['<'] = true;
- TRIGGERS['"'] = true;
- }
-
- protected String quote(String s) {
- if ("".equals(s)) {
- return "";
- }
- boolean quotes = false;
- for (int i = 0; i < s.length(); i++) {
- char c = s.charAt(i);
- if (c < 128 && TRIGGERS[c]) {
- quotes = true;
- break;
- }
- }
- if (!quotes) {
- return s;
- }
- StringBuffer sb = new StringBuffer();
- if (quotes) {
- sb.append('\\');
- sb.append('"');
- }
- for (int i = 0; i < s.length(); i++) {
- char c = s.charAt(i);
- if (c == '"' || c == '\\') {
- sb.append('\\');
- }
- sb.append(c);
- }
- if (quotes) {
- sb.append('\\');
- sb.append('"');
- }
- return sb.toString();
- }
-
- protected String replaceVars(String str) {
- StringBuffer sb = new StringBuffer();
- boolean escaped = false;
- for (int i = 0; i < str.length(); i++) {
- char c = str.charAt(i);
- if (c == '\\') {
- if (escaped) {
- sb.append('\\');
- }
- else {
- escaped = true;
- }
- }
- else {
- if (c == '$' && !escaped) {
- if (i == str.length() - 1) {
- sb.append('$');
- }
- else {
- int e = str.indexOf(' ', i);
- if (e == -1) {
- e = str.length();
- }
- String name = str.substring(i + 1, e);
- Object attr = getSpec().getAttribute(name);
- if (attr != null) {
- sb.append(attr.toString());
- }
- else {
- sb.append('$');
- sb.append(name);
- }
- i = e;
- }
- }
- else {
- sb.append(c);
- }
- escaped = false;
- }
- }
- return sb.toString();
- }
-
protected String getName() {
return "Condor";
}
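To see what the removed quote() method does with an empty argument, its logic can be exercised in isolation. The sketch below copies the body of quote() from the diff above into a hypothetical standalone class (QuoteSketch and its main method are mine, and the method is made static so it runs without CondorExecutor). It is only meant as a way to poke at the behavior, not as the fix.

```java
public class QuoteSketch {

    // Characters that force quoting, same set as the removed TRIGGERS table.
    private static final boolean[] TRIGGERS = new boolean[128];

    static {
        for (char c : " \n\t\\><\"".toCharArray()) {
            TRIGGERS[c] = true;
        }
    }

    // Same logic as the removed quote(), made static for this sketch.
    static String quote(String s) {
        if ("".equals(s)) {
            return ""; // an empty argument yields no text at all
        }
        boolean quotes = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128 && TRIGGERS[c]) {
                quotes = true;
                break;
            }
        }
        if (!quotes) {
            return s;
        }
        StringBuilder sb = new StringBuilder();
        sb.append('\\').append('"'); // opening \"
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\'); // escape embedded quotes and backslashes
            }
            sb.append(c);
        }
        sb.append('\\').append('"'); // closing \"
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + quote("") + "]");
        System.out.println("[" + quote("data.txt") + "]");
        System.out.println("[" + quote("a b") + "]");
    }
}
```

In this sketch an empty string comes back as an empty string, so nothing is written for that argument. That would be consistent with the working submit file, where -scratch appears to be followed directly by -e, but it should be confirmed against the real .submit file on disk.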
----- Original Message -----
> I think my diffs were wrong. Please ignore this thread till I re-do
> them.
>
> - Mike
>
> ----- Original Message -----
> > ----- Original Message -----
> > > Shouldn't we be looking at the diffs in provider-localscheduler?
> >
> > I don't *think* so - my tests were using Condor-G directly:
> >
> > <profile namespace="globus" key="jobType">grid</profile>
> > <profile namespace="globus" key="gridResource">gt2
> > ff-grid3.unl.edu/jobmanager-pbs</profile>
> >
> > But in any case, I diff'ed the entire cog and swift trees, and saw
> > almost *no* diffs (see later msg). The only one I am suspicious of
> > at
> > the moment is the @Override patch.
> >
> > I need to find when that change was made and whether I somehow
> > compiled *with* the Overrides in place in the older working copy.
> >
> > - Mike
> >
> > >
> > > -Allan (mobile)
> > >
> > > On Jan 13, 2011 11:17 AM, "Michael Wilde" < wilde at mcs.anl.gov >
> > > wrote:
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > > I need to check what local mods I had applied, but I think
> > > > > > it's
> > > > > > more
> > > > > > likely that some Condor submit file quoting fix fell off in
> > > > > > 0.92
> > > > > > integration.
> > > > >
> > > > > Yeah. A svn diff > somefile would help.
> > > >
> > > > Hmmm. So far svn diffs show no changes within provider-condor,
> > > > neither between trunk and 0.92 branch nor within my working
> > > > copies
> > > > of those two on engage-submit, which seem to behave differently
> > > > regarding Condor quoting.
> > > >
> > > > Could the change(s) that were made a long time ago to fix Condor
> > > > quoting be in a different module than provider-condor? If so,
> > > > what's
> > > > a likely place to look?
> > > >
> > > > I'll check vdl-int.k next.
> > > >
> > > > - Mike
> > > >
> > > > > >
> > > > > > So Marc, sorry - this release is not usable for you yet.
> > > > > >
> > > > > > - Mike
> > > > > >
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > I'm trying my first tests of 0.92 on engage-submit, sending
> > > > > > > 100
> > > > > > > trivial
> > > > > > > cat jobs to 10 OSG sites.
> > > > > > >
> > > > > > > My jobs seem to be all dying with the error "Found illegal
> > > > > > > unescaped
> > > > > > > double-quote" (see below).
> > > > > > >
> > > > > > > Has anyone successfully run a Condor-G job on OSG with
> > > > > > > 0.92?
> > > > > > >
> > > > > > > I'll dig deeper and try the same test with the older
> > > > > > > version
> > > > > > > of
> > > > > > > trunk
> > > > > > > that Marc has been using here with better success. Will
> > > > > > > also
> > > > > > > try a
> > > > > > > single job run and capture a simpler log and the condor-g
> > > > > > > submit
> > > > > > > file.
> > > > > > >
> > > > > > > Allan, have you tried 0.92 against Condor-G? If not, could
> > > > > > > you?
> > > > > > >
> > > > > > > Sarah, we should add some Condor-G-to-GT2 testing to 0.92
> > > > > > > validation I
> > > > > > > think.
> > > > > > >
> > > > > > > - Mike
> > > > > > >
> > > > > > > --
> > > > > > >
> > > > > > > Caused by:
> > > > > > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException:
> > > > > > > Cannot submit job: Could not submit job (condor_submit
> > > > > > > reported an
> > > > > > > exit code of 1). Submitting job(s)
> > > > > > > Found illegal unescaped double-quote: "" -e /bin/cat -out
> > > > > > > outdir/f.0065.out -err stderr.txt -i -d outdir -if
> > > > > > > data.txt
> > > > > > > -of
> > > > > > > outdir/f.0065.out -k "" -cdmfile "" -status file -a
> > > > > > > data.txt The
> > > > > > > full
> > > > > > > arguments you specified were:
> > > > > > > /osg/data/engage/tmp/
> > > > > > > osg.hpc.ufl.edu/catsn-20110113-0025-vv4p4up3/shared/_swiftwrap
> > > > > > > cat-ajxnee4k -jobdir a -scratch "" -e /bin/cat -out
> > > > > > > outdir/f.0065.out
> > > > > > > -err stderr.txt -i -d outdir -if data.txt -of
> > > > > > > outdir/f.0065.out -k
> > > > > > > ""
> > > > > > > -cdmfile "" -status file -a data.txt
> > > > > > >
> > > > > > >
> > > > > > > Script is:
> > > > > > >
> > > > > > > e$ cat catsn.swift
> > > > > > > type file;
> > > > > > >
> > > > > > > app (file o) cat (file i)
> > > > > > > {
> > > > > > > cat @i stdout=@o;
> > > > > > > }
> > > > > > >
> > > > > > > file out[]<simple_mapper; location="outdir",
> > > > > > > prefix="f.",suffix=".out">;
> > > > > > > foreach j in [1:@toint(@arg("n","1"))] {
> > > > > > > file data<"data.txt">;
> > > > > > > out[j] = cat(data);
> > > > > > > }
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Michael Wilde
> > > > > > > Computation Institute, University of Chicago
> > > > > > > Mathematics and Computer Science Division
> > > > > > > Argonne National Laboratory
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Swift-devel mailing list
> > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > > > >
> > > >
> > > >
> >
>
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory