[Swift-devel] Notes from 0.93 meeting

Jonathan Monette jonmon at mcs.anl.gov
Fri Aug 26 18:54:29 CDT 2011


Did you set GLOBUS_HOSTNAME to communicado.ci.uchicago.edu or, probably better, to the IP address of communicado?
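
A minimal sketch of what I mean, assuming a bash-like shell on communicado and that the variable is exported in the same session that launches swift. The hostname is the one from this thread; whether the name or the IP address works better depends on what the Ranger side can resolve and reach, so treat both as things to try rather than a known fix.

```shell
# Assumption: bash-like shell; run this in the session that will start swift.
# The hostname below is taken from this thread; substitute the machine's
# IP address if the remote side cannot resolve the name.
export GLOBUS_HOSTNAME=communicado.ci.uchicago.edu

# Confirm the value that child processes (such as swift) will inherit.
env | grep '^GLOBUS_HOSTNAME='
```
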
On Aug 26, 2011, at 6:52 PM, David Kelly wrote:

> I tried setting GLOBUS_HOSTNAME on communicado. The gram log file is no longer created, but I still don't see any jobs being submitted.
> 
> There is a new set of logs at www.ci.uchicago.edu/~davidk/ranger-gt2-logs2.tar.gz
> 
> David
> 
> ----- Original Message -----
>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
>> To: "David Kelly" <davidk at ci.uchicago.edu>
>> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Jonathan Monette" <jonmon at mcs.anl.gov>
>> Sent: Friday, August 26, 2011 1:42:13 PM
>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
>> "The job manager failed to open stderr" tends to happen when you have
>> GLOBUS_HOSTNAME set incorrectly.
>> 
>> On Fri, 2011-08-26 at 13:38 -0500, David Kelly wrote:
>>> When I am trying to run the script now, Swift does not seem to be
>>> submitting the jobs correctly. Nothing is showing up in qstat.
>>> 
>>> I noticed that a gram log gets created in my home directory that says:
>>> ts=2011-08-26T17:30:03.910618Z id=27215 event=gram.job.end level=ERROR gramid=/16145868447994515851/17606392074284884670/ job_status=4 status=-73 reason="the job manager failed to open stdout"
>>> 
>>> I'm guessing this is the cause of the problem. Bugs #153 and #215
>>> were related to similar problems with stdout and gt2/sge.
>>> 
>>> The full logs are at
>>> http://www.ci.uchicago.edu/~davidk/ranger-gt2-logs.tar.gz
>>> 
>>> Thanks,
>>> David
>>> 
>>> 
>>> ----- Original Message -----
>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
>>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
>>>> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>
>>>> Sent: Thursday, August 25, 2011 5:31:34 PM
>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
>>>> On Thu, 2011-08-25 at 17:18 -0500, Jonathan Monette wrote:
>>>>> I can send mail to ci support and cc mike to it and ask what they
>>>>> can do.
>>>>> 
>>>>> Mihael, is there any way for Swift to give a little more feedback
>>>>> besides unknown CA or is that a jglobus problem?
>>>> 
>>>> It's a jglobus problem.
>>>> 
>>>> That in itself may not be a big issue, but jglobus is now being heavily
>>>> re-organized by the globus team, so I'm not sure what the best long-term
>>>> strategy is here.
>>>>> 
>>>>> ----- Reply message -----
>>>>> From: "Sarah Kenny" <skenny at uchicago.edu>
>>>>> Date: Thu, Aug 25, 2011 5:11 pm
>>>>> Subject: [Swift-devel] Notes from 0.93 meeting
>>>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
>>>>> Cc: "Mihael Hategan" <hategan at mcs.anl.gov>, "swift-devel Devel"
>>>>> <swift-devel at ci.uchicago.edu>
>>>>> 
>>>>> 
>>>>> 
>>>>> if i had a nickel for every time i dealt with this i'd be rich! :)
>>>>> actually, now that i'm looking at our uci machines i actually have
>>>>> them updating hourly...so, maybe you want to ask the admins to do that
>>>>> to avoid a full day of confusion whenever they expire :P
>>>>> 
>>>>> *usually* i can't gsissh either if the certs have expired but, yeah,
>>>>> they must be using different CAs now for that on ranger as mihael
>>>>> suggests...
>>>>> 
>>>>> On Thu, Aug 25, 2011 at 2:46 PM, Jonathan Monette <jonmon at mcs.anl.gov> wrote:
>>>>>        True. I did not think that each mechanism would use different
>>>>>        CAs. We might want to ask ci support to update the grid certs
>>>>>        more frequently then to avoid this situation.
>>>>> 
>>>>> 
>>>>>        On Aug 25, 2011, at 4:42 PM, Mihael Hategan wrote:
>>>>> 
>>>>>> On Thu, 2011-08-25 at 16:40 -0500, Jonathan Monette wrote:
>>>>>>> That is weird. If you were able to gsissh to ranger I would assume
>>>>>>> that you are able to globus-url-copy to ranger.
>>>>>> 
>>>>>> Not if the two use different CAs. Or if a password was typed at the
>>>>>> ssh login.
>>>>>> 
>>>>>>> Anyways, what Sarah said should work. I would assume that ci would
>>>>>>> update more frequently to avoid this problem.
>>>>>>> On Aug 25, 2011, at 4:38 PM, Sarah Kenny wrote:
>>>>>>> 
>>>>>>>> communicado's certs (/etc/grid-security/certificates) are
>>>>>>>> out-of-date...if you copy ranger's /etc/grid-security/certificates
>>>>>>>> directory to communicado and point yr X509_CERT_DIR to it you can
>>>>>>>> get a job thru (a simple globus-job-run with my valid cert fails
>>>>>>>> from communicado at the moment if i don't do this).
>>>>>>>> 
>>>>>>>> i set our machines at uci to update daily...i think it's less
>>>>>>>> frequently at ci...
>>>>>>>> 
>>>>>>>> On Thu, Aug 25, 2011 at 2:17 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>>>>>>>>       Can you try a globus-url-copy to gridftp.ranger?
>>>>>>>> 
>>>>>>>>       gridftp.ranger seems to have the NCSA myproxy CA. You say you
>>>>>>>>       have the proper certificates dir in your X509_CERT_DIR, and
>>>>>>>>       that directory contains the TACC root cert. So it should work.
>>>>>>>>       And so should swift.
>>>>>>>> 
>>>>>>>>       Though I think that jglobus should be more clear about
>>>>>>>>       "Unknown ca" errors. At least the name of the unknown CA should
>>>>>>>>       be part of the error message.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>       On Thu, 2011-08-25 at 15:55 -0500, David Kelly wrote:
>>>>>>>>> $ grid-proxy-info -all
>>>>>>>>> subject : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
>>>>>>>>> issuer : /C=US/O=National Center for Supercomputing Applications/OU=Certificate Authorities/CN=MyProxy
>>>>>>>>> identity : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
>>>>>>>>> type : end entity credential
>>>>>>>>> strength : 1024 bits
>>>>>>>>> path : /tmp/x509up_u1878
>>>>>>>>> timeleft : 9:56:53
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ----- Original Message -----
>>>>>>>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
>>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
>>>>>>>>>> Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>, "swift-devel Devel" <swift-devel at ci.uchicago.edu>
>>>>>>>>>> Sent: Thursday, August 25, 2011 3:42:57 PM
>>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
>>>>>>>>>> Odd. Can you paste the output of 'grid-proxy-info -all'?
>>>>>>>>>> 
>>>>>>>>>> On Thu, 2011-08-25 at 15:18 -0500, David Kelly wrote:
>>>>>>>>>>> Sure, here is the full log:
>>>>>>>>>>> 
>>>>>>>>>>> http://www.ci.uchicago.edu/~davidk/001-catsn-ranger-20110825-1515-5tydro91.log
>>>>>>>>>>> 
>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
>>>>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
>>>>>>>>>>>> Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>, "swift-devel Devel" <swift-devel at ci.uchicago.edu>
>>>>>>>>>>>> Sent: Thursday, August 25, 2011 2:43:31 PM
>>>>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
>>>>>>>>>>>> It's possible that the CA dir on Ranger is not properly set up.
>>>>>>>>>>>> Can you post the full log?
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, 2011-08-25 at 13:56 -0500, David Kelly wrote:
>>>>>>>>>>>>> Those environment variables were not set up. I have them defined
>>>>>>>>>>>>> now, but I'm still getting the same error.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [davidk at communicado ranger]$ env |grep 509
>>>>>>>>>>>>> X509_CERT_DIR=/opt/osg-1.2.16/globus/TRUSTED_CA
>>>>>>>>>>>>> X509_CADIR=/opt/osg-1.2.16/globus/TRUSTED_CA
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
>>>>>>>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
>>>>>>>>>>>>> 
>>>>>>>>>>>>> RunID: 20110825-1352-f1v940b4
>>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:52:59 -0500
>>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:53:00 -0500 Selecting site:7 Initializing site shared directory:3
>>>>>>>>>>>>> Execution failed:
>>>>>>>>>>>>>     Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>> From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
>>>>>>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
>>>>>>>>>>>>>> Cc: "Jonathan Monette" <jonmon at mcs.anl.gov>, "swift-devel Devel" <swift-devel at ci.uchicago.edu>
>>>>>>>>>>>>>> Sent: Thursday, August 25, 2011 1:32:50 PM
>>>>>>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Are your CADIR and CACERT env vars set up?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> [communicado:swiftgrid]$ echo $X509_CADIR
>>>>>>>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> [communicado:swiftgrid]$ echo $X509_CERT_DIR
>>>>>>>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, Aug 25, 2011 at 1:29 PM, David Kelly <davidk at ci.uchicago.edu> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks Jon,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Here is what happens when I try this from communicado:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> [davidk at communicado ~]$ myproxy-logon -l dkelly -s myproxy.teragrid.org
>>>>>>>>>>>>>> Enter MyProxy pass phrase:
>>>>>>>>>>>>>> A credential has been received for user dkelly in /tmp/x509up_u1878.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
>>>>>>>>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> RunID: 20110825-1326-o3e38fe0
>>>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:43 -0500
>>>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:44 -0500 Selecting site:8 Initializing site shared directory:2
>>>>>>>>>>>>>> Execution failed:
>>>>>>>>>>>>>> Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Swift-devel mailing list
>>>>>>>>>>>>>> Swift-devel at ci.uchicago.edu
>>>>>>>>>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Ketan
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Sarah Kenny
>>>>>>>> Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
>>>>>>>> University of California Irvine, Dept. of Neurology ~ 773-818-8300
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Sarah Kenny
>>>>> Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
>>>>> University of California Irvine, Dept. of Neurology ~
>>>>> 773-818-8300
>>>>> 
>>>> 
>>>> 



