[Swift-devel] Problems with coasters and managedfork jobmanager

Mihael Hategan hategan at mcs.anl.gov
Thu Feb 5 23:50:51 CST 2009


This particular line seems troubling to me:

2/5 21:18:00 JMI: testing job manager scripts for type managedfork
exist and permissions are ok.

Does this mean that managed fork is now in use on TP? Is there any way
to still use plain fork?
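
One way to check which job manager a given run used is to scan the GRAM log for the script-validation message. This is only a minimal sketch in Python, assuming the log wording matches the excerpt quoted below; the function name job_manager_types is made up for illustration and is not part of Swift or Globus.

```python
import re

def job_manager_types(log_text):
    """Return the job manager types named in a GRAM job manager log.

    Assumes the log reports the type as "job manager type is <name>",
    as in the excerpt quoted in this thread.
    """
    # Drop mail quote prefixes ("> ") and rejoin lines wrapped by the mailer.
    lines = [re.sub(r"^\s*(?:>\s?)+", "", line) for line in log_text.splitlines()]
    flat = " ".join(line.strip() for line in lines)
    return re.findall(r"job manager type is\s+([\w-]+)", flat)

sample = """\
> 2/5 21:18:00 JMI: completed script validation: job manager type is
> managedfork.
"""
print(job_manager_types(sample))
```

Running the same function over last week's logs and today's would show whether the type really changed from fork to managedfork.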

On Thu, 2009-02-05 at 23:22 -0600, Michael Wilde wrote:
> I was able to run with coasters on teraport last week, using 
> gt2:gt2:pbs, but not today.
> 
> I see the error "Failed to parse command file (line 21)" in my swift 
> output and in the gram log (excerpt of the latter, below).
> 
> The line number was originally 17. I added some comment lines to 
> bootstrap.sh to see if the line number would move, and indeed it did. So 
> it suggests something in the jobmanager that's unable to handle the text 
> of the bootstrap script embedded in its RSL. But I don't think the line 
> in this error is the line in the bootstrap script.
> 
> Does anyone know how to find the script text that the jobmanager is 
> complaining about?
> 
> As far as I can tell, something changed on teraport (or in my config?), as 
> my gram logs from last week indicate that the plain fork jobmanager was 
> being used. (I've got an email in to teraport support to probe this.)
> 
> I see Mats's note in a prior mail about concern that the managed-fork 
> mechanism may kill the coaster service, but no comments about script 
> parsing errors.
> 
> I'll send more logs on this tomorrow if I haven't found it yet.
> 
> Thanks,
> 
> Mike
> 
> 
> 
> Thu Feb  5 21:17:51 2009 JM_SCRIPT: Error file is not empty, and 
> submission failed
> 
> Thu Feb  5 21:17:51 2009 JM_SCRIPT: Error text is
> ERROR: Failed to parse command file (line 21).
> 
> Thu Feb  5 21:17:51 2009 JM_SCRIPT: Writing extended error information 
> to stderr
> 2/5 21:17:51 JM: GT3 extended error message: 
> GRAM_SCRIPT_GT3_FAILURE_MESSAGE: ERROR: Failed to parse command file 
> (line 21).
> 2/5 21:17:51 JMI: while return_buf = GRAM_SCRIPT_GT3_FAILURE_MESSAGE = 
> ERROR: Failed to parse command file (line 21).
> 2/5 21:17:51 JMI: while return_buf = GRAM_SCRIPT_ERROR = 17
> 2/5 21:17:51 Job Manager State Machine (entering): 
> GLOBUS_GRAM_JOB_MANAGER_STATE_SUBMIT
> 2/5 21:17:51 JM: in globus_gram_job_manager_reporting_file_create()
> 2/5 21:17:51 JM: not reporting job information
> 2/5 21:17:51 JM: in globus_gram_job_manager_history_file_create()
> 2/5 21:17:51 Job Manager State Machine (entering): 
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED
> 2/5 21:17:51 closing destination 
> https://128.135.125.17:50002/dev/stdout-urn:cog-1233890265644
> 2/5 21:17:51 JM: exiting 
> globus_l_gram_job_manager_output_destination_close()
> 2/5 21:18:00 closing destination 
> https://128.135.125.17:50002/dev/stderr-urn:cog-1233890265644
> 2/5 21:18:00 JM: exiting 
> globus_l_gram_job_manager_output_destination_close()
> 2/5 21:18:00 Job Manager State Machine (entering): 
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_CLOSE_OUTPUT
> 2/5 21:18:00 JM: NOT empty client callback list.
> 2/5 21:18:00 JM: sending callback of status 4 (failure code 155) to 
> https://128.135.125.17:50003/1233890268457.
> 2/5 21:18:00 Job Manager State Machine (entering): 
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE
> 2/5 21:18:00 Job Manager State Machine (entering): 
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_TWO_PHASE_COMMITTED
> 2/5 21:18:00 Job Manager State Machine (entering): 
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_FILE_CLEAN_UP
> 2/5 21:18:00 Job Manager State Machine (entering): 
> GLOBUS_GRAM_JOB_MANAGER_STATE_FAILED_SCRATCH_CLEAN_UP
> 2/5 21:18:00 JMI: testing job manager scripts for type managedfork exist 
> and permissions are ok.
> 2/5 21:18:00 JMI: completed script validation: job manager type is 
> managedfork.



