[Swift-user] cannot submit on fusion

Michael Wilde wilde at mcs.anl.gov
Sat Mar 13 12:08:55 CST 2010


Marcin, I took the liberty of poking around your work dir on fusion.

The problem seems to be that qsub is rejecting the job that Swift is submitting to it:

Caused by: Cannot submit job: Could not submit job (qsub reported an exit code of 1). no error output

Now we need to find out why that is.

I see that your tc.data file does not end in a newline. Let's try to get rid of the message about the ":" to eliminate that as a possibility.
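
As a rough aid for checking the file, here is a minimal Python sketch. It makes assumptions that are not confirmed anywhere in this thread: that tc.data columns are separated by whitespace, that lines starting with "#" are comments, and that the most common column count in the file is the intended one. The function name check_tc_data is purely illustrative. It warns when the file lacks a trailing newline and when a line (such as a profile entry that slipped onto its own line) has a different number of columns than the rest.

```python
from collections import Counter
from pathlib import Path


def check_tc_data(text):
    """Return a list of human-readable warnings for the given tc.data text."""
    warnings = []
    if text and not text.endswith("\n"):
        warnings.append("file does not end in a newline")

    # Collect (line number, column count) for non-blank, non-comment lines.
    # Assumption: columns are whitespace-separated and "#" starts a comment.
    rows = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        rows.append((number, len(stripped.split())))

    if rows:
        # Assumption: the most common column count is the intended one.
        expected = Counter(count for _, count in rows).most_common(1)[0][0]
        for number, count in rows:
            if count != expected:
                warnings.append(
                    f"line {number} has {count} columns, expected {expected}"
                )
    return warnings


if __name__ == "__main__":
    path = Path("tc.data")  # assumed to be in the current directory
    for warning in check_tc_data(path.read_text()):
        print(warning)
```

Run it from the directory that holds tc.data; no output means neither condition was detected, which does not by itself prove the file is valid.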

Can you also do these things:

- do a quick qsub test of an "echo hi" script to ensure that your Fusion PBS project is still valid and that qsub is working for you. In this test, set a max wall time the same as what you're trying to set via tc.data (but which I think is being ignored because Swift is unable to parse the GLOBUS namespace declaration from that line)

- see if there are recent files under $HOME/.globus/scripts or other directories under .globus (which I cannot access) which may contain a clue as to why PBS is rejecting the job.
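
The two checks above could be scripted roughly as follows. This is only a sketch under stated assumptions: the wall time "00:20:00" is a placeholder for whatever value is in your tc.data, the "-A" option is the standard PBS way to name an account and may or may not be how Fusion selects a project, and the helper names (build_qsub_command, submit_echo_test, recent_files) are made up for illustration. The only qsub options used are "-l walltime=..." and "-A".

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def build_qsub_command(script_path, walltime, project=None):
    """Build a qsub command line with an explicit wall time limit."""
    cmd = ["qsub", "-l", f"walltime={walltime}"]
    if project:
        # Assumption: the site uses the PBS account option for projects.
        cmd += ["-A", project]
    cmd.append(str(script_path))
    return cmd


def submit_echo_test(walltime, project=None):
    """Submit a trivial 'echo hi' job; return (exit code, stdout, stderr)."""
    if shutil.which("qsub") is None:
        raise RuntimeError("qsub not found on PATH")
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write("#!/bin/sh\necho hi\n")
    result = subprocess.run(
        build_qsub_command(f.name, walltime, project),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout, result.stderr


def recent_files(root, max_age_days=2, now=None):
    """List files under root modified within the last max_age_days days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime >= cutoff
    )


if __name__ == "__main__":
    # Placeholder wall time; use the value from your own tc.data.
    print(submit_echo_test("00:20:00"))
    for path in recent_files(Path.home() / ".globus"):
        print(path)
```

If qsub exits non-zero here as well, the returned stderr text is the first thing to look at, since Swift reported "no error output" for its own submission.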

- Mike

----- "Marcin Hitczenko" <marcin at galton.uchicago.edu> wrote:

> Hi Mike,
> 
> When I look at tc.data it seems to be fine (I made sure and ran again and
> got the same error). I also have not changed tc.data since I ran it last
> and I seem to remember getting the same error about the illegal character
> before.
> 
> I am quite sure I haven't changed anything since I ran last, so I am
> wondering if it might be some changes in fusion which I need to update
> for?
> 
...

> >
> > I see two problems, the second likely being the result of the first:
> >
> > In your output file: [ERROR] Parsing profiles on line 21 Illegal character
> > ':'at position 22 :Illegal character ':'
> > is referring to your tc.data file. I think your Globus MaxWallTime profile
> > entry got moved to a separate line, instead of being separated by tabs as
> > the last column of the previous line.
> >
> > I suspect that may have caused jobs to get submitted to PBS with defaults
> > that were invalid for the default queue that your jobs are going into,
> > thus causing the second error: Cannot submit job: Could not submit job
> > (qsub reported an exit code of 1). no error output
> >
> > So fix tc.data, and see if this fixes the problem.
> >
> > - Mike
> >
> > ----- "Marcin Hitczenko" <marcin at galton.uchicago.edu> wrote:
> >
> >> Hi Mike,
> >>
> >> Thanks for your response. I am attaching the .log file. Also, the
> >> swift.out file I included in the original email has the output of my
> >> run. I am including it again.
> >>
> >> Best,
> >>
> >> Marcin
> >>
> >> > Marcin, I forgot to point out: "failed to transfer wrapper log" is just a
> >> > catch-all error message which means "something went wrong with an app()
> >> > job that Swift ran, and the job did not return the expected log file that
> >> > comes from the wrapper script under which Swift runs the job." We need to
> >> > improve the text of this message.
> >> >
> >> > Also, if you can, always run swift with standard output and error
> >> > redirected into a file, and send that file as well when you report a
> >> > problem.
> >> >
> >> > Thanks,
> >> >
> >> > Mike
> >> >
> >> > ----- "Michael Wilde" <wilde at mcs.anl.gov> wrote:
> >> >
> >> >> Marcin, can you also post the .log file from this run? (it will be
> >> >> named getcoefs*.log where * is a long unique id including the date)
> >> >>
> >> >> - Mike
> >> >>
> >> >> ----- "Marcin Hitczenko" <marcin at galton.uchicago.edu> wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > I am trying to run swift scripts on fusion and I am encountering an
> >> >> > error I have never had before (The scripts I am running have worked
> >> >> > before). It seems it is having trouble submitting job because it "failed
> >> >> > to transfer wrapper log". I am including my swift script, tc.data,
> >> >> > sites.xml and the output file of when I ran it.
> >> >> >
> >> >> > I am not sure if I need to change anything? Like I said, I am sure the
> >> >> > same script worked a month or so ago.
> >> >> >
> >> >> > Thanks for your help.
> >> >> >
> >> >> > Best,
> >> >> >
> >> >> > Marcin
> >> >> >
> >> >> > _______________________________________________
> >> >> > Swift-user mailing list
> >> >> > Swift-user at ci.uchicago.edu
> >> >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> >> >>
> >> >> --
> >> >> Michael Wilde
> >> >> Computation Institute, University of Chicago
> >> >> Mathematics and Computer Science Division
> >> >> Argonne National Laboratory
> >> >>
> >> >
> >> >
> >
> >

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



