[Swift-devel] Notes from 0.93 meeting

Mihael Hategan hategan at mcs.anl.gov
Fri Aug 26 20:23:14 CDT 2011


Can you try GT2:SGE instead of GT2:GT2:SGE?

On Fri, 2011-08-26 at 19:13 -0500, David Kelly wrote:
> I set it to communicado.ci.uchicago.edu. I'll try again with IP address.
> 
> ----- Original Message -----
> > From: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > To: "David Kelly" <davidk at ci.uchicago.edu>
> > Cc: "Mihael Hategan" <hategan at mcs.anl.gov>, "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Friday, August 26, 2011 6:54:29 PM
> > Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > Did you set GLOBUS_HOSTNAME to communicado.ci.uchicago.edu or probably
> > better the ip-address of communicado?
> > On Aug 26, 2011, at 6:52 PM, David Kelly wrote:
> > 
> > > I tried setting GLOBUS_HOSTNAME on communicado. The gram log file is
> > > no longer created, but I still don't see any jobs being submitted?
> > >
> > > There is a new set of logs at
> > > www.ci.uchicago.edu/~davidk/ranger-gt2-logs2.tar.gz
> > >
> > > David
> > >
> > > ----- Original Message -----
> > >> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > >> To: "David Kelly" <davidk at ci.uchicago.edu>
> > >> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Jonathan
> > >> Monette" <jonmon at mcs.anl.gov>
> > >> Sent: Friday, August 26, 2011 1:42:13 PM
> > >> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > >> "The job manager failed to open stderr" tends to happen when you have
> > >> GLOBUS_HOSTNAME set incorrectly.
> > >>
> > >> On Fri, 2011-08-26 at 13:38 -0500, David Kelly wrote:
> > >>> When I am trying to run the script now, Swift does not seem to be
> > >>> submitting the jobs correctly. Nothing is showing up in qstat.
> > >>>
> > >>> I noticed that a gram log gets created in my home directory that says:
> > >>> ts=2011-08-26T17:30:03.910618Z id=27215 event=gram.job.end level=ERROR gramid=/16145868447994515851/17606392074284884670/ job_status=4 status=-73 reason="the job manager failed to open stdout"
> > >>>
> > >>> I'm guessing this is the cause of the problem. Bugs #153 and #215
> > >>> were related to similar problems with stdout and gt2/sge.
> > >>>
> > >>> The full logs are at
> > >>> http://www.ci.uchicago.edu/~davidk/ranger-gt2-logs.tar.gz
> > >>>
> > >>> Thanks,
> > >>> David
> > >>>
> > >>>
> > >>> ----- Original Message -----
> > >>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > >>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > >>>> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > >>>> Sent: Thursday, August 25, 2011 5:31:34 PM
> > >>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > >>>> On Thu, 2011-08-25 at 17:18 -0500, Jonathan Monette wrote:
> > >>>>> I can send mail to ci support and cc mike to it and ask what they can
> > >>>>> do.
> > >>>>>
> > >>>>> Mihael, is there any way for Swift to give a little more feedback
> > >>>>> besides unknown CA or is that a jglobus problem?
> > >>>>
> > >>>> It's a jglobus problem.
> > >>>>
> > >>>> That in itself may not be a big issue, but jglobus is now being heavily
> > >>>> re-organized by the globus team, so I'm not sure what the best long-term
> > >>>> strategy is here.
> > >>>>>
> > >>>>> ----- Reply message -----
> > >>>>> From: "Sarah Kenny" <skenny at uchicago.edu>
> > >>>>> Date: Thu, Aug 25, 2011 5:11 pm
> > >>>>> Subject: [Swift-devel] Notes from 0.93 meeting
> > >>>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > >>>>> Cc: "Mihael Hategan" <hategan at mcs.anl.gov>, "swift-devel Devel"
> > >>>>> <swift-devel at ci.uchicago.edu>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> if i had a nickel for every time i dealt with this i'd be rich! :)
> > >>>>> actually, now that i'm looking at our uci machines i actually have
> > >>>>> them updating hourly...so, maybe you want to ask the admins to do that
> > >>>>> to avoid a full day of confusion whenever they expire :P
> > >>>>>
> > >>>>> *usually* i can't gsissh either if the certs have expired but, yeah,
> > >>>>> they must be using different CA's now for that on ranger as mihael
> > >>>>> suggests...
> > >>>>>
> > >>>>> On Thu, Aug 25, 2011 at 2:46 PM, Jonathan Monette
> > >>>>> <jonmon at mcs.anl.gov> wrote:
> > >>>>>        True. I did not think that each mechanism would use different
> > >>>>>        CAs. We might want to ask ci support to update the grid certs
> > >>>>>        more frequently then to avoid this situation.
> > >>>>>
> > >>>>>
> > >>>>>        On Aug 25, 2011, at 4:42 PM, Mihael Hategan wrote:
> > >>>>>
> > >>>>>> On Thu, 2011-08-25 at 16:40 -0500, Jonathan Monette wrote:
> > >>>>>>> That is weird. If you were able to gsissh to ranger I would assume
> > >>>>>>> that you are able to globus-url-copy to ranger.
> > >>>>>>
> > >>>>>> Not if the two use different CAs. Or if a password was typed at the
> > >>>>>> ssh login.
> > >>>>>>
> > >>>>>>> Anyways, what Sarah said should work. I would assume that ci would
> > >>>>>>> update more frequently to avoid this problem.
> > >>>>>>> On Aug 25, 2011, at 4:38 PM, Sarah Kenny wrote:
> > >>>>>>>
> > >>>>>>>> communicado's certs (/etc/grid-security/certificates) are
> > >>>>>>>> out-of-date...if you copy ranger's /etc/grid-security/certificates
> > >>>>>>>> directory to communicado and point yr X509_CERT_DIR to it you can
> > >>>>>>>> get a job thru (a simple globus-job-run with my valid cert fails
> > >>>>>>>> from communicado at the moment if i don't do this).
> > >>>>>>>>
> > >>>>>>>> i set our machines at uci to update daily...i think it's less
> > >>>>>>>> frequently at ci...
> > >>>>>>>>
> > >>>>>>>> On Thu, Aug 25, 2011 at 2:17 PM, Mihael Hategan
> > >>>>>>>> <hategan at mcs.anl.gov> wrote:
> > >>>>>>>>       Can you try a globus-url-copy to gridftp.ranger?
> > >>>>>>>>
> > >>>>>>>>       gridftp.ranger seems to have the NCSA myproxy CA. You say you
> > >>>>>>>>       have the proper certificates dir in your X509_CERT_DIR, and
> > >>>>>>>>       that directory contains the TACC root cert. So it should work.
> > >>>>>>>>       And so should swift.
> > >>>>>>>>
> > >>>>>>>>       Though I think that jglobus should be more clear about
> > >>>>>>>>       "Unknown ca" errors. At least the name of the unknown CA
> > >>>>>>>>       should be part of the error message.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>       On Thu, 2011-08-25 at 15:55 -0500, David Kelly wrote:
> > >>>>>>>>> $ grid-proxy-info -all
> > >>>>>>>>> subject : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
> > >>>>>>>>> issuer : /C=US/O=National Center for Supercomputing Applications/OU=Certificate Authorities/CN=MyProxy
> > >>>>>>>>> identity : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
> > >>>>>>>>> type : end entity credential
> > >>>>>>>>> strength : 1024 bits
> > >>>>>>>>> path : /tmp/x509up_u1878
> > >>>>>>>>> timeleft : 9:56:53
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> ----- Original Message -----
> > >>>>>>>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > >>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> > >>>>>>>>>> Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>,
> > >>>>>>>>>> "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > >>>>>>>>>> Sent: Thursday, August 25, 2011 3:42:57 PM
> > >>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > >>>>>>>>>> Odd. Can you paste the output of 'grid-proxy-info -all'?
> > >>>>>>>>>>
> > >>>>>>>>>> On Thu, 2011-08-25 at 15:18 -0500, David Kelly wrote:
> > >>>>>>>>>>> Sure, here is the full log:
> > >>>>>>>>>>>
> > >>>>>>>>>>> http://www.ci.uchicago.edu/~davidk/001-catsn-ranger-20110825-1515-5tydro91.log
> > >>>>>>>>>>>
> > >>>>>>>>>>> ----- Original Message -----
> > >>>>>>>>>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > >>>>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> > >>>>>>>>>>>> Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>,
> > >>>>>>>>>>>> "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > >>>>>>>>>>>> Sent: Thursday, August 25, 2011 2:43:31 PM
> > >>>>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > >>>>>>>>>>>> It's possible that the CA dir on Ranger is not properly set up.
> > >>>>>>>>>>>> Can you post the full log?
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Thu, 2011-08-25 at 13:56 -0500, David Kelly wrote:
> > >>>>>>>>>>>>> Those environment variables were not set up. I have them defined
> > >>>>>>>>>>>>> now, but I'm still getting the same error.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> [davidk at communicado ranger]$ env |grep 509
> > >>>>>>>>>>>>> X509_CERT_DIR=/opt/osg-1.2.16/globus/TRUSTED_CA
> > >>>>>>>>>>>>> X509_CADIR=/opt/osg-1.2.16/globus/TRUSTED_CA
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
> > >>>>>>>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> RunID: 20110825-1352-f1v940b4
> > >>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:52:59 -0500
> > >>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:53:00 -0500 Selecting site:7 Initializing site shared directory:3
> > >>>>>>>>>>>>> Execution failed:
> > >>>>>>>>>>>>>     Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> ----- Original Message -----
> > >>>>>>>>>>>>>> From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > >>>>>>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> > >>>>>>>>>>>>>> Cc: "Jonathan Monette" <jonmon at mcs.anl.gov>,
> > >>>>>>>>>>>>>> "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > >>>>>>>>>>>>>> Sent: Thursday, August 25, 2011 1:32:50 PM
> > >>>>>>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > >>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Are your CADIR and CACERT env vars set up?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> [communicado:swiftgrid]$ echo $X509_CADIR
> > >>>>>>>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> [communicado:swiftgrid]$ echo $X509_CERT_DIR
> > >>>>>>>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Thu, Aug 25, 2011 at 1:29 PM, David Kelly
> > >>>>>>>>>>>>>> <davidk at ci.uchicago.edu> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thanks Jon,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Here is what happens when I try this from communicado:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> [davidk at communicado ~]$ myproxy-logon -l dkelly -s myproxy.teragrid.org
> > >>>>>>>>>>>>>> Enter MyProxy pass phrase:
> > >>>>>>>>>>>>>> A credential has been received for user dkelly in /tmp/x509up_u1878.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
> > >>>>>>>>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> RunID: 20110825-1326-o3e38fe0
> > >>>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:43 -0500
> > >>>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:44 -0500 Selecting site:8 Initializing site shared directory:2
> > >>>>>>>>>>>>>> Execution failed:
> > >>>>>>>>>>>>>> Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Any ideas?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>>>> David
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> _______________________________________________
> > >>>>>>>>>>>>>> Swift-devel mailing list
> > >>>>>>>>>>>>>> Swift-devel at ci.uchicago.edu
> > >>>>>>>>>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> --
> > >>>>>>>>>>>>>> Ketan
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Sarah Kenny
> > >>>>>>>> Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> > >>>>>>>> University of California Irvine, Dept. of Neurology ~ 773-818-8300
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Sarah Kenny
> > >>>>> Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> > >>>>> University of California Irvine, Dept. of Neurology ~
> > >>>>> 773-818-8300
> > >>>>>
> > >>>>
> > >>>>
> > >>>> _______________________________________________
> > >>>> Swift-devel mailing list
> > >>>> Swift-devel at ci.uchicago.edu
> > >>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
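
For anyone finding this thread later, here is a minimal sketch of the two environment settings discussed above: pointing X509_CERT_DIR at an up-to-date copy of the CA certificates, and setting GLOBUS_HOSTNAME to an IP address rather than a hostname. The function names, the example directory and the example address are hypothetical and not taken from anyone's actual setup. The checks are only sanity checks (the address check looks at the dotted-quad shape only, not the 0-255 range); they do not prove that the certificates are current or that the address is reachable from the remote side.

```sh
#!/bin/sh
# Sketch only: function names and example values below are hypothetical.

# Point X509_CERT_DIR at a CA directory, but only if it holds at least one
# hashed CA certificate file (Globus names these <hash>.0).
use_cert_dir() {
    dir=$1
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    for f in "$dir"/*.0; do
        # If the glob matched nothing, $f is the literal pattern and -e fails.
        if [ -e "$f" ]; then
            X509_CERT_DIR=$dir
            export X509_CERT_DIR
            return 0
        fi
    done
    echo "no CA certificates (*.0) found in $dir" >&2
    return 1
}

# Export GLOBUS_HOSTNAME only if the value looks like a dotted-quad IPv4
# address: digits and dots only, exactly four non-empty parts.
set_globus_hostname() {
    case $1 in
        ''|*[!0-9.]*|.*|*.|*..*|*.*.*.*.*)
            echo "not an IPv4 address: $1" >&2
            return 1 ;;
        *.*.*.*)
            GLOBUS_HOSTNAME=$1
            export GLOBUS_HOSTNAME ;;
        *)
            echo "not an IPv4 address: $1" >&2
            return 1 ;;
    esac
}

# Example use (hypothetical path and documentation-range address):
#   use_cert_dir "$HOME/ranger-certificates"
#   set_globus_hostname 192.0.2.10
```

After both calls succeed, the variables are exported in the current shell, so a swift run started from that shell would inherit them.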




