[Swift-devel] Problems with coasters and managedfork jobmanager
Allan Espinosa
aespinosa at cs.uchicago.edu
Fri Feb 6 02:57:24 CST 2009
Hi Mike,
I think Greg posted about an OSG stack upgrade this week, so GRAM won't
be available. That's why I just used local:pbs for my runs today.
-Allan
On Thu, Feb 5, 2009 at 11:22 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
> I was able to run with coasters on teraport last week, using gt2:gt2:pbs,
> but not today.
>
> I see the error "Failed to parse command file (line 21)" in my swift output
> and in the gram log (excerpt of the latter, below).
>
> This line number was originally 17. I added some comment lines to
> bootstrap.sh to see if the line number would move, and indeed it did. That
> suggests something in the jobmanager is unable to handle the text of the
> bootstrap script embedded in its RSL. But I don't think the line in this
> error is the line in the bootstrap script.
>
> Does anyone know how to find the script text that the jobmanager is
> complaining about?
>
> As far as I can tell, something changed on teraport (or my config?), as my
> gram logs from last week indicate that the plain fork jobmanager was being
> used. (I've got an email in to teraport support to probe this.)
>
> I see Mats's note in a prior mail about concern that the managed-fork
> mechanism may kill the coaster service, but no comments about script parsing
> errors.
>
> I'll send more logs on this tomorrow if I haven't found it yet.
>
> Thanks,
>
> Mike
>
>
>
> Thu Feb 5 21:17:51 2009 JM_SCRIPT: Error file is not empty, and submission
> failed
>
> Thu Feb 5 21:17:51 2009 JM_SCRIPT: Error text is
> ERROR: Failed to parse command file (line 21).
>
> Thu Feb 5 21:17:51 2009 JM_SCRIPT: Writing extended error information to
> stderr
> 2/5 21:17:51 JM: GT3 extended error message:
> GRAM_SCRIPT_GT3_FAILURE_MESSAGE: ERROR: Failed to parse command file (line
> 21).
> 2/5 21:17:51 JMI: while return_buf = GRAM_SCRIPT_GT3_FAILURE_MESSAGE =
> ERROR: Failed to parse command file (line 21).
> 2/5 21:17:51 JMI: while return_buf = GRAM_SCRIPT_ERROR = 17
> 2/5 21:17:51 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_SUBMIT
> 2/5 21:17:51 JM: in globus_gram_job_manager_reporting_file_create()
> 2/5 21:17:51 JM: not reporting job information
> 2/5 21:17:51 JM: in globus_gram_job_manager_history_file_create()
> 2/5 21:17:51 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED
> 2/5 21:17:51 closing destination
> https://128.135.125.17:50002/dev/stdout-urn:cog-1233890265644
> 2/5 21:17:51 JM: exiting
> globus_l_gram_job_manager_output_destination_close()
> 2/5 21:18:00 closing destination
> https://128.135.125.17:50002/dev/stderr-urn:cog-1233890265644
> 2/5 21:18:00 JM: exiting
> globus_l_gram_job_manager_output_destination_close()
> 2/5 21:18:00 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_CLOSE_OUTPUT
> 2/5 21:18:00 JM: NOT empty client callback list.
> 2/5 21:18:00 JM: sending callback of status 4 (failure code 155) to
> https://128.135.125.17:50003/1233890268457.
> 2/5 21:18:00 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE
> 2/5 21:18:00 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE_COMMITTED
> 2/5 21:18:00 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_FILE_CLEAN_UP
> 2/5 21:18:00 Job Manager State Machine (entering):
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_SCRATCH_CLEAN_UP
> 2/5 21:18:00 JMI: testing job manager scripts for type managedfork exist and
> permissions are ok.
> 2/5 21:18:00 JMI: completed script validation: job manager type is
> managedfork.
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>
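For comparing gram logs between runs, here is a minimal sketch (not part of Swift or Globus) that pulls the parse-error line number and the job manager type out of a log excerpt like the one quoted above. The function names `unquote` and `summarize_gram_log` are made up for illustration, and the two patterns are taken only from the message text in this excerpt, so they may need adjusting for other log formats. Because the mail client wrapped long log lines, the sketch joins the lines before matching.

```python
import re

# Patterns copied from the wording in the quoted log excerpt (assumed stable).
PARSE_ERROR = re.compile(r"Failed to parse command file \(line\s+(\d+)\)")
JM_TYPE = re.compile(r"job manager type is\s+(\w+)")


def unquote(lines):
    """Strip leading '>' mail-quote markers and surrounding whitespace."""
    return [re.sub(r"^(>\s?)+", "", line).strip() for line in lines]


def summarize_gram_log(text):
    """Return the distinct parse-error line numbers and job manager types."""
    # Join wrapped lines with single spaces so a message split across
    # two lines (e.g. "(line" / "21).") still matches.
    joined = re.sub(r"\s+", " ", " ".join(unquote(text.splitlines())))
    error_lines = sorted({int(n) for n in PARSE_ERROR.findall(joined)})
    jm_types = sorted(set(JM_TYPE.findall(joined)))
    return {"parse_error_lines": error_lines, "job_manager_types": jm_types}


sample = """\
> Thu Feb 5 21:17:51 2009 JM_SCRIPT: Error text is
> ERROR: Failed to parse command file (line 21).
> 2/5 21:17:51 JMI: while return_buf = GRAM_SCRIPT_GT3_FAILURE_MESSAGE =
> ERROR: Failed to parse command file (line
> 21).
> 2/5 21:18:00 JMI: completed script validation: job manager type is
> managedfork.
"""

print(summarize_gram_log(sample))
```

Running the same function over last week's log and today's would show at a glance whether the job manager type changed from fork to managedfork between the two runs.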
--
Allan M. Espinosa <http://allan.88-mph.net/blog>
PhD student, Computer Science
University of Chicago <http://people.cs.uchicago.edu/~aespinosa>