[Swift-devel] Problems with coasters and managedfork jobmanager
Michael Wilde
wilde at mcs.anl.gov
Thu Feb 5 23:22:11 CST 2009
I was able to run with coasters on teraport last week, using
gt2:gt2:pbs, but not today.
I see the error "Failed to parse command file (line 21)" in my Swift
output and in the GRAM log (an excerpt of the latter is below).
The line number was originally 17. I added some comment lines to
bootstrap.sh to see whether the line number would move, and indeed it did.
That suggests something in the jobmanager that's unable to handle the text
of the bootstrap script embedded in its RSL. But I don't think the line
in this error is the line in the bootstrap script.
Does anyone know how to find the script text that the jobmanager is
complaining about?
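Once you have a copy of whatever file the jobmanager is actually parsing (where that command file lives is not known here, so the file name in the usage comment is only a placeholder), a small helper can print the lines around the reported line number. A minimal Python sketch, assuming the file is readable locally:

```python
import sys


def show_context(lines, target, radius=3):
    """Return numbered lines around 1-based line `target`; the target is marked '>>'."""
    start = max(1, target - radius)
    end = min(len(lines), target + radius)
    out = []
    for n in range(start, end + 1):
        marker = ">>" if n == target else "  "
        out.append(f"{marker} {n:4d}: {lines[n - 1].rstrip()}")
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (placeholder file name): python show_line.py command_file.txt 21
    path, target = sys.argv[1], int(sys.argv[2])
    with open(path) as f:
        for row in show_context(f.readlines(), target):
            print(row)
```

Pointing it at bootstrap.sh itself is a quick way to check the suspicion that line 21 does not correspond to a line of the script.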
As far as I can tell, something changed on teraport (or in my config?),
as my gram logs from last week indicate that the plain fork jobmanager was
being used. (I've got an email in to teraport support to probe this.)
I saw Mats's note in a prior mail expressing concern that the managed-fork
mechanism may kill the coaster service, but no comments about script
parsing errors.
I'll send more logs on this tomorrow if I haven't found it by then.
Thanks,
Mike
Thu Feb 5 21:17:51 2009 JM_SCRIPT: Error file is not empty, and submission failed
Thu Feb 5 21:17:51 2009 JM_SCRIPT: Error text is
ERROR: Failed to parse command file (line 21).
Thu Feb 5 21:17:51 2009 JM_SCRIPT: Writing extended error information to stderr
2/5 21:17:51 JM: GT3 extended error message:
GRAM_SCRIPT_GT3_FAILURE_MESSAGE: ERROR: Failed to parse command file (line 21).
2/5 21:17:51 JMI: while return_buf = GRAM_SCRIPT_GT3_FAILURE_MESSAGE = ERROR: Failed to parse command file (line 21).
2/5 21:17:51 JMI: while return_buf = GRAM_SCRIPT_ERROR = 17
2/5 21:17:51 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_SUBMIT
2/5 21:17:51 JM: in globus_gram_job_manager_reporting_file_create()
2/5 21:17:51 JM: not reporting job information
2/5 21:17:51 JM: in globus_gram_job_manager_history_file_create()
2/5 21:17:51 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED
2/5 21:17:51 closing destination https://128.135.125.17:50002/dev/stdout-urn:cog-1233890265644
2/5 21:17:51 JM: exiting globus_l_gram_job_manager_output_destination_close()
2/5 21:18:00 closing destination https://128.135.125.17:50002/dev/stderr-urn:cog-1233890265644
2/5 21:18:00 JM: exiting globus_l_gram_job_manager_output_destination_close()
2/5 21:18:00 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_CLOSE_OUTPUT
2/5 21:18:00 JM: NOT empty client callback list.
2/5 21:18:00 JM: sending callback of status 4 (failure code 155) to https://128.135.125.17:50003/1233890268457.
2/5 21:18:00 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE
2/5 21:18:00 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE_COMMITTED
2/5 21:18:00 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_FILE_CLEAN_UP
2/5 21:18:00 Job Manager State Machine (entering): GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_SCRATCH_CLEAN_UP
2/5 21:18:00 JMI: testing job manager scripts for type managedfork exist and permissions are ok.
2/5 21:18:00 JMI: completed script validation: job manager type is managedfork.
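When comparing runs (for example, last week's plain-fork logs against today's managedfork ones), it can help to pull the state transitions, the reported parse-error line, and the callback status out of a GRAM jobmanager log automatically. A rough Python sketch; the regular expressions are modeled only on the message text in the excerpt above and tolerate messages wrapped onto the next line. The script name and log path in the usage comment are placeholders:

```python
import re
import sys

# Patterns based on the message text in the excerpt above; \s+ lets a
# message that was wrapped onto the next line still match.
STATE_RE = re.compile(
    r"Job Manager State Machine \(entering\):\s+(GLOBUS_GRAM_JOB_MANAGER_STATE_\w+)")
PARSE_ERR_RE = re.compile(r"Failed to parse command file\s+\(line (\d+)\)")
CALLBACK_RE = re.compile(r"callback of status (\d+) \(failure code (\d+)\)")


def summarize_gram_log(text):
    """Collect state transitions, parse-error line numbers and the first callback."""
    states = STATE_RE.findall(text)
    # The same error is repeated several times in the log; keep each line number once.
    parse_lines = sorted({int(n) for n in PARSE_ERR_RE.findall(text)})
    cb = CALLBACK_RE.search(text)
    callback = (int(cb.group(1)), int(cb.group(2))) if cb else None
    return {"states": states, "parse_error_lines": parse_lines, "callback": callback}


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (placeholder path): python gram_summary.py gram_job_mgr.log
    with open(sys.argv[1]) as f:
        summary = summarize_gram_log(f.read())
    for state in summary["states"]:
        print(state)
    print("parse error at line(s):", summary["parse_error_lines"])
    print("callback (status, failure code):", summary["callback"])
```

Running it on the logs from both weeks would show side by side whether the parse error and the managedfork job manager type appear only in the failing runs.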