[Swift-devel] Re: latest attempt with GRAM4

Michael Wilde wilde at mcs.anl.gov
Tue Feb 12 17:41:18 CST 2008


That would certainly explain why this clobbered the head node.
I'm sorry that we all missed this last week.

If true, we would have seen the applications running on the head node.
I wonder if anyone noticed?

Mike, here's a sample entry I've used in the past for UC-TG:

<pool handle="UC" gridlaunch="/home/wilde/swift/tools/mystart" sysinfo="INTEL32::LINUX">
     <gridftp url="gsiftp://tg-gridftp.uc.teragrid.org" storage="/home/wilde/swiftdata/UC/storage" major="2" minor="2" />
     <jobmanager universe="vanilla" url="tg-grid.uc.teragrid.org/jobmanager-pbs" major="2" minor="2"/>
     <workdirectory>/home/wilde/swiftdata/UC/work</workdirectory>
</pool>


The missing part is the "/jobmanager-pbs" suffix in the url= attribute
of the <jobmanager> element.
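
A quick way to catch this in a sites file is sketched below. This is
only a rough Python check, not part of Swift: the function name is
made up for illustration, and it assumes the file has no XML namespace
and that every pre-WS <jobmanager> url should carry an explicit
"/jobmanager-<scheduler>" suffix.

```python
import xml.etree.ElementTree as ET


def jobmanagers_without_scheduler(sites_xml: str) -> list[str]:
    """Return handles of pools whose <jobmanager> url lacks a
    "/jobmanager-<scheduler>" suffix (hypothetical helper)."""
    root = ET.fromstring(sites_xml)
    flagged = []
    # iter() also yields the root itself, so a bare <pool> document
    # and a wrapper element containing several pools both work.
    for pool in root.iter("pool"):
        for jm in pool.iter("jobmanager"):
            url = jm.get("url", "")
            # Assumption: an explicit scheduler always appears as
            # "/jobmanager-<name>", e.g. "/jobmanager-pbs".
            if "/jobmanager-" not in url:
                flagged.append(pool.get("handle", "?"))
    return flagged


# Trimmed version of the entry above, with and without the suffix.
GOOD = ('<pool handle="UC"><jobmanager universe="vanilla" '
        'url="tg-grid.uc.teragrid.org/jobmanager-pbs" '
        'major="2" minor="2"/></pool>')
BAD = GOOD.replace("/jobmanager-pbs", "")
```

Calling jobmanagers_without_scheduler(BAD) flags the "UC" pool, while
GOOD comes back clean. A flagged entry would presumably be the case
Mihael asks about below, where the job goes to fork rather than PBS.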

- mikew


On 2/12/08 5:17 PM, Mihael Hategan wrote:
> Are you sure you were using PBS with pre-ws GRAM and not fork?
> 
> On Tue, 2008-02-12 at 15:14 -0800, Mike Kubal wrote:
>> With pre-WS-GRAM, it doesn't seem to matter which
>> account/project id I use where. I can have the
>> TG-MCB010025N specified in the sites-files and
>> TG-MCA01S018 specified in ~kubal/.tg_default_project
>> on the uc teragrid, or vice versa and it still works,
>> or having them match in both places.
>>
>> With WS-GRAM, I have to use TG-MCB010025N, the local
>> uc-teragrid project id, in both places. Using
>> TG-MCA01S018, the teragrid wide charge number/account
>> number, causes the qsub failure error.
>>
>>
>>
>>
>> --- Michael Wilde <wilde at mcs.anl.gov> wrote:
>>
>>> Mike, did you do a recent test with pre-WS-GRAM with
>>> the 
>>> .tg_default_project file set *incorrectly*?
>>>
>>> I think the puzzle was why this would cause WS-GRAM
>>> to fail but not 
>>> pre-WS-GRAM, as it would seem they would both get
>>> the TG account to use 
>>> in the same manner.
>>>
>>> - mikew
>>>
>>> On 2/12/08 4:43 PM, Mike Kubal wrote:
>>>> Just to be sure I tested with pre-WS and it worked
>>>> also. 
>>>>  
>>>> --- Mihael Hategan <hategan at mcs.anl.gov> wrote:
>>>>
>>>>> Would it be worth trying to find out why it
>>> worked
>>>>> with pre-WS GRAM?
>>>>>
>>>>> Mihael
>>>>>
>>>>> On Tue, 2008-02-12 at 14:20 -0800, Mike Kubal
>>> wrote:
>>>>>> Thanks Joe. This solved the account id problem.
>>>>>>
>>>>>> --- joseph insley <insley at mcs.anl.gov> wrote:
>>>>>>
>>>>>>> Mike K,
>>>>>>>
>>>>>>> looks like you have the wrong value in your
>>>>>>> .tg_default_project file:
>>>>>>>
>>>>>>> insley at tg-viz-login1:~> more
>>>>>>> ~kubal/.tg_default_project
>>>>>>> TG-MCA01S018
>>>>>>>
>>>>>>> you should be using: TG-MCB010025N
>>>>>>>
>>>>>>> insley at tg-viz-login1:~> tgusage -i -u kubal
>>>>>>>
>>>>>>> [snip]
>>>>>>>
>>>>>>> Account: TG-MCA01S018
>>>>>>> Title: Computational Studies of Complex
>>>>> Processes in
>>>>>>> Biological  
>>>>>>> Macromolecular Systems
>>>>>>> Resource: teragrid
>>>>>>>
>>>>>>> ****
>>>>>>> Local project name on dtf.anl.teragrid is
>>>>>>> TG-MCB010025N
>>>>>>> ****
>>>>>>>
>>>>>>> Allocation Period: 2007-08-03 to 2008-03-31
>>>>>>>
>>>>>>> Name (Last First) or Account       Total       Remaining        Usage
>>>>>>> ----------------------------     ----------   ------------   ----------
>>>>>>>     Kubal  Michael                101880 SU     99358 SU       296 SU
>>>>>>> ----------------------------------------------------------------------
>>>>>>>     TG-MCA01S018                  101880 SU     99358 SU      2522 SU
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Feb 12, 2008, at 2:37 PM, Mihael Hategan
>>>>> wrote:
>>>>>>>> You should probably remove the line
>>>>> completely.
>>>>>>>> Did you choose a default project on the login
>>>>> node
>>>>>>> with tgprojects?
>>>>>>>> On Tue, 2008-02-12 at 12:34 -0800, Mike Kubal
>>>>>>> wrote:
>>>>>>>>> I tried running with the account id removed
>>>>> from
>>>>>>> the
>>>>>>>>> sites.file as in the following line:
>>>>>>>>>
>>>>>>>>> <profile namespace="globus" key=""></profile>
>>>>>>>>>
>>>>>>>>> but received the same error.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --- Mihael Hategan <hategan at mcs.anl.gov>
>>>>> wrote:
>>>>>>>>>> Is this the same for pre-WS GRAM?
>>>>>>>>>>
>>>>>>>>>> On Tue, 2008-02-12 at 14:20 -0600, Stuart
>>>>> Martin
>>>>>>>>>> wrote:
>>>>>>>>>>> that's right, qsub is used for PBS (and
>>>>> some
>>>>>>>>>> others too)
>>>>>>>>>>> bsub is LSF
>>>>>>>>>>> condor_q for condor
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> -Stu
>>>>>>>>>>>
>>>>>>>>>>> On Feb 12, 2008, at Feb 12, 2:15 PM, Mihael
>>>>>>>>>> Hategan wrote:
>>>>>>>>>>>> On Tue, 2008-02-12 at 12:09 -0800, Mike
>>>>> Kubal
>>>>>>>>>> wrote:
>>>>>>>>>>>>> I'll give it a try.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When using GRAM4, is qsub the method used
>>>>> to
>>>>>>>>>>>>> ultimately put the job in the queue?
>>>>>>>>>>>> Looks like it. I also believe it's the
>>>>> case
>>>>>>> with
>>>>>>>>>> pre-ws gram. Stu
>>>>>>>>>>>> may be
>>>>>>>>>>>> able to clarify.
>>>>>>>>>>>>
>>>>>>>>>>>>> MikeK
>>>>>>>>>>>>> --- Mihael Hategan <hategan at mcs.anl.gov>
>>>>>>> wrote:
>>>>>>>>>>>>>> While this doesn't solve the underlying
>>>>>>>>>> problem, it
>>>>>>>>>>>>>> may help you get
>>>>>>>>>>>>>> this to work: log into tg-login1.uc...,
>>>>> set
>>>>>>>>>> this
>>>>>>>>>>>>>> project as default,
>>>>>>>>>>>>>> then remove the project spec from the
>>>>> sites
>>>>>>>>>> file and
>>>>>>>>>>>>>> try again.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mihael
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, 2008-02-12 at 11:36 -0800, Mike
>>>>>>> Kubal
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Yes, I believe you are right. The
>>>>> kickstart
>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>> may be only a warning. After digging a
>>>>>>> little
>>>>>>>>>>>>>> deeper
>>>>>>>>>>>>>>> it appears the job is failing due to a
>>>>>>>>>>>>>> project/account
>>>>>>>>>>>>>>> id problem. I get the following error:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Caused by:
>>>>>>>>>>>>>>>        The executable could not be
>>>>>>> started.,
>>>>>>>>>>>>>> qsub:
>>>>>>>>>>>>>>> Invalid Account MSG=invalid account
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am specifying the same TG-account in
>>>>> my
>>>>>>>>>>>>>> site-file
>>>>>>>>>>>>>>> for the gram4 run that fails, as in the
>>>>>>>>>> site-file
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> the pre-ws job that succeeds. This is
>>>>> the
>>>>>>> same
>>>>>>>>>>>>>> project,
>>>>>>>>>>>>>>> TG-MCA01S018, that is set in my
>>>>>>>>>>>>>> .tg_default_project
>>>>>>>>>>>>>>> file in ~kubal/ on the UC teragrid.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --- Ben Clifford <benc at hawaga.org.uk>
>>>>>>> wrote:
>>>>>>>>>>>>>>>> yeah, run that same without kickstart.
>>>>> the
>>>>>>>>>> error
>>>>>>>>>>>>>>>> reported is that
>>>>>>>>>>>>>>>> kickstart didn't work right - but
>>>>> there's
>>>>>>>>>>>>>> perhaps
>>>>>>>>>>>>>>>> some underlying error.
>>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>> === message truncated ===
>>
>>
>>
>>
> 
> 


