[Swift-devel] Problems with coasters and managedfork jobmanager

Michael Wilde wilde at mcs.anl.gov
Thu Feb 5 23:22:11 CST 2009


I was able to run with coasters on teraport last week, using 
gt2:gt2:pbs, but not today.

I see the error "Failed to parse command file (line 21)" in my swift 
output and in the gram log (excerpt of the latter, below).

The line number was originally 17. I added some comment lines to 
bootstrap.sh to see whether the line number would move, and it did. 
This suggests that something in the jobmanager is unable to handle the 
text of the bootstrap script embedded in its RSL. But I don't think the 
line number in this error refers to a line in the bootstrap script itself.

Does anyone know how to find the script text that the jobmanager is 
complaining about?
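
If a candidate file turns up (for example, a copy of whatever command 
file the jobmanager generates), a small helper like the one below could 
show the reported line with some context. This is only a sketch: the 
function name show_context and the idea that such a file can be captured 
at all are my assumptions, not something the gram log confirms.

```python
def show_context(text, line_no, radius=2):
    """Return numbered lines around line_no (1-indexed) of text.

    The reported line is marked with '>>' so it stands out.
    """
    lines = text.splitlines()
    # Clamp the window to the bounds of the file
    start = max(1, line_no - radius)
    end = min(len(lines), line_no + radius)
    out = []
    for n in range(start, end + 1):
        marker = ">>" if n == line_no else "  "
        out.append(f"{marker} {n:4d}: {lines[n - 1]}")
    return out
```

Calling show_context(open(path).read(), 21) on each candidate and 
printing the result should make it quick to see which file, if any, has 
something unparseable at the reported line.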

As far as I can tell, something changed on teraport (or in my config?), 
as my gram logs from last week indicate that the plain fork jobmanager 
was being used. (I've sent an email to teraport support to look into this.)

I see Mats's note in a prior mail about the concern that the managed-fork 
mechanism may kill the coaster service, but no comments about script 
parsing errors.

I'll send more logs on this tomorrow if I haven't found the cause by then.

Thanks,

Mike



Thu Feb  5 21:17:51 2009 JM_SCRIPT: Error file is not empty, and 
submission failed

Thu Feb  5 21:17:51 2009 JM_SCRIPT: Error text is
ERROR: Failed to parse command file (line 21).

Thu Feb  5 21:17:51 2009 JM_SCRIPT: Writing extended error information 
to stderr
2/5 21:17:51 JM: GT3 extended error message: 
GRAM_SCRIPT_GT3_FAILURE_MESSAGE: ERROR: Failed to parse command file 
(line 21).
2/5 21:17:51 JMI: while return_buf = GRAM_SCRIPT_GT3_FAILURE_MESSAGE = 
ERROR: Failed to parse command file (line 21).
2/5 21:17:51 JMI: while return_buf = GRAM_SCRIPT_ERROR = 17
2/5 21:17:51 Job Manager State Machine (entering): 
GLOBUS_GRAM_JOB_MANAGER_STATE_SUBMIT
2/5 21:17:51 JM: in globus_gram_job_manager_reporting_file_create()
2/5 21:17:51 JM: not reporting job information
2/5 21:17:51 JM: in globus_gram_job_manager_history_file_create()
2/5 21:17:51 Job Manager State Machine (entering): 
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED
2/5 21:17:51 closing destination 
https://128.135.125.17:50002/dev/stdout-urn:cog-1233890265644
2/5 21:17:51 JM: exiting 
globus_l_gram_job_manager_output_destination_close()
2/5 21:18:00 closing destination 
https://128.135.125.17:50002/dev/stderr-urn:cog-1233890265644
2/5 21:18:00 JM: exiting 
globus_l_gram_job_manager_output_destination_close()
2/5 21:18:00 Job Manager State Machine (entering): 
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_CLOSE_OUTPUT
2/5 21:18:00 JM: NOT empty client callback list.
2/5 21:18:00 JM: sending callback of status 4 (failure code 155) to 
https://128.135.125.17:50003/1233890268457.
2/5 21:18:00 Job Manager State Machine (entering): 
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE
2/5 21:18:00 Job Manager State Machine (entering): 
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE_COMMITTED
2/5 21:18:00 Job Manager State Machine (entering): 
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_FILE_CLEAN_UP
2/5 21:18:00 Job Manager State Machine (entering): 
GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_SCRATCH_CLEAN_UP
2/5 21:18:00 JMI: testing job manager scripts for type managedfork exist 
and permissions are ok.
2/5 21:18:00 JMI: completed script validation: job manager type is 
managedfork.
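
To compare this log against last week's logs, the key facts can be 
pulled out of each one mechanically. The sketch below is an assumption 
on my part (the function name summarize_gram_log and the three patterns 
are derived only from the excerpt above, not from any GRAM 
documentation). It collapses the hard-wrapped lines before matching, 
since the messages are split across lines in the excerpt.

```python
import re

# Patterns taken from the wording of the excerpt above
PARSE_ERR = re.compile(r"Failed to parse command file \(line (\d+)\)")
FAILURE = re.compile(r"status (\d+) \(failure code (\d+)\)")
JM_TYPE = re.compile(r"job manager type is (\w+)")

def summarize_gram_log(text):
    """Collect parse-error line numbers, failure codes and jobmanager types."""
    # Collapse all whitespace so messages wrapped across lines still match
    flat = " ".join(text.split())
    return {
        "parse_error_lines": sorted({int(n) for n in PARSE_ERR.findall(flat)}),
        "failure_codes": sorted({int(code) for _, code in FAILURE.findall(flat)}),
        "jobmanager_types": sorted(set(JM_TYPE.findall(flat))),
    }
```

Running this over both the old and new gram logs should show at a glance 
whether the jobmanager type really changed from fork to managedfork 
between the two runs.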


