[Swift-devel] Notes from 0.93 meeting
David Kelly
davidk at ci.uchicago.edu
Fri Aug 26 18:52:16 CDT 2011
I tried setting GLOBUS_HOSTNAME on communicado. The gram log file is no longer created, but I still don't see any jobs being submitted.
There is a new set of logs at www.ci.uchicago.edu/~davidk/ranger-gt2-logs2.tar.gz
David
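
Note for anyone reproducing this: below is a minimal sketch of a sanity check on GLOBUS_HOSTNAME before resubmitting. It assumes the value should be a fully qualified name (or an IP address) that the remote job manager can connect back to. The function name check_globus_hostname and the "contains a dot" heuristic are illustrative assumptions, not part of Globus or Swift.

```shell
#!/bin/sh
# Hypothetical helper (not part of Globus or Swift): report whether
# GLOBUS_HOSTNAME is set and at least looks like a fully qualified
# name or an IP address. "Contains a dot" is only a heuristic; it does
# not prove the name is reachable from the remote site.
check_globus_hostname() {
    name="${GLOBUS_HOSTNAME:-}"
    if [ -z "$name" ]; then
        echo "unset"
        return 1
    fi
    case "$name" in
        *.*) echo "ok: $name"; return 0 ;;
        *)   echo "not fully qualified: $name"; return 2 ;;
    esac
}
```

Something like `check_globus_hostname || echo "fix GLOBUS_HOSTNAME first"` could be run in the same shell that launches swift, since the variable has to be exported in that environment to have any effect.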
----- Original Message -----
> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> To: "David Kelly" <davidk at ci.uchicago.edu>
> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Jonathan Monette" <jonmon at mcs.anl.gov>
> Sent: Friday, August 26, 2011 1:42:13 PM
> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> "The job manager failed to open stderr" tends to happen when you have
> GLOBUS_HOSTNAME set incorrectly.
>
> On Fri, 2011-08-26 at 13:38 -0500, David Kelly wrote:
> > When I am trying to run the script now, Swift does not seem to be
> > submitting the jobs correctly. Nothing is showing up in qstat.
> >
> > I noticed that a gram log gets created in my home directory that
> > says:
> > ts=2011-08-26T17:30:03.910618Z id=27215 event=gram.job.end level=ERROR gramid=/16145868447994515851/17606392074284884670/ job_status=4 status=-73 reason="the job manager failed to open stdout"
> >
> > I'm guessing this is the cause of the problem. Bugs #153 and #215
> > were related to similar problems with stdout and gt2/sge.
> >
> > The full logs are at
> > http://www.ci.uchicago.edu/~davidk/ranger-gt2-logs.tar.gz
> >
> > Thanks,
> > David
> >
> >
> > ----- Original Message -----
> > > From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > > Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > > Sent: Thursday, August 25, 2011 5:31:34 PM
> > > Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > > On Thu, 2011-08-25 at 17:18 -0500, Jonathan Monette wrote:
> > > > > I can send mail to ci support and cc mike to it and ask what they
> > > > > can do.
> > > >
> > > > > Mihael, is there any way for Swift to give a little more feedback
> > > > > besides unknown CA, or is that a jglobus problem?
> > >
> > > It's a jglobus problem.
> > >
> > > > That in itself may not be a big issue, but jglobus is now being
> > > > heavily re-organized by the globus team, so I'm not sure what the
> > > > best long-term strategy is here.
> > > >
> > > > ----- Reply message -----
> > > > From: "Sarah Kenny" <skenny at uchicago.edu>
> > > > Date: Thu, Aug 25, 2011 5:11 pm
> > > > Subject: [Swift-devel] Notes from 0.93 meeting
> > > > To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > > > Cc: "Mihael Hategan" <hategan at mcs.anl.gov>, "swift-devel Devel"
> > > > <swift-devel at ci.uchicago.edu>
> > > >
> > > >
> > > >
> > > > > if i had a nickel for every time i dealt with this i'd be rich! :)
> > > > > actually, now that i'm looking at our uci machines i actually have
> > > > > them updating hourly...so, maybe you want to ask the admins to do
> > > > > that to avoid a full day of confusion whenever they expire :P
> > > >
> > > > > *usually* i can't gsissh either if the certs have expired but, yeah,
> > > > > they must be using different CA's now for that on ranger as mihael
> > > > > suggests...
> > > >
> > > > On Thu, Aug 25, 2011 at 2:46 PM, Jonathan Monette
> > > > <jonmon at mcs.anl.gov>
> > > > wrote:
> > > > > True. I did not think that each mechanism would use different
> > > > > CAs. We might want to ask ci support to update the grid certs
> > > > > more frequently then to avoid this situation.
> > > >
> > > >
> > > > On Aug 25, 2011, at 4:42 PM, Mihael Hategan wrote:
> > > >
> > > > > > On Thu, 2011-08-25 at 16:40 -0500, Jonathan Monette wrote:
> > > > > >> That is weird. If you were able to gsissh to ranger I would assume
> > > > > >> that you are able to globus-url-copy to ranger.
> > > > > >
> > > > > > Not if the two use different CAs. Or if a password was typed at the
> > > > > > ssh login.
> > > > > >
> > > > > >> Anyways, what Sarah said should work. I would assume that ci would
> > > > > >> update more frequently to avoid this problem.
> > > > >> On Aug 25, 2011, at 4:38 PM, Sarah Kenny wrote:
> > > > >>
> > > > > >>> communicado's certs (/etc/grid-security/certificates) are
> > > > > >>> out-of-date...if you copy ranger's /etc/grid-security/certificates
> > > > > >>> directory to communicado and point yr X509_CERT_DIR to it you can
> > > > > >>> get a job thru (a simple globus-job-run with my valid cert fails
> > > > > >>> from communicado at the moment if i don't do this).
> > > > >>>
> > > > > >>> i set our machines at uci to update daily...i think it's less
> > > > > >>> frequently at ci...
> > > > >>>
> > > > >>> On Thu, Aug 25, 2011 at 2:17 PM, Mihael Hategan
> > > > >>> <hategan at mcs.anl.gov> wrote:
> > > > > >>> Can you try a globus-url-copy to gridftp.ranger?
> > > > > >>>
> > > > > >>> gridftp.ranger seems to have the NCSA myproxy CA. You say you
> > > > > >>> have the proper certificates dir in your X509_CERT_DIR, and that
> > > > > >>> directory contains the TACC root cert. So it should work. And so
> > > > > >>> should swift.
> > > > > >>>
> > > > > >>> Though I think that jglobus should be more clear about
> > > > > >>> "Unknown ca" errors. At least the name of the unknown CA should
> > > > > >>> be part of the error message.
> > > > >>>
> > > > >>>
> > > > > >>> On Thu, 2011-08-25 at 15:55 -0500, David Kelly wrote:
> > > > > >>>> $ grid-proxy-info -all
> > > > > >>>> subject : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
> > > > > >>>> issuer : /C=US/O=National Center for Supercomputing Applications/OU=Certificate Authorities/CN=MyProxy
> > > > > >>>> identity : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
> > > > > >>>> type : end entity credential
> > > > > >>>> strength : 1024 bits
> > > > > >>>> path : /tmp/x509up_u1878
> > > > > >>>> timeleft : 9:56:53
> > > > >>>>
> > > > >>>>
> > > > >>>> ----- Original Message -----
> > > > >>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > >>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > >>>>> Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>,
> > > > > >>>>> "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > > > > >>>>> Sent: Thursday, August 25, 2011 3:42:57 PM
> > > > > >>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > > > > >>>>> Odd. Can you paste the output of 'grid-proxy-info -all'?
> > > > > >>>>>
> > > > > >>>>> On Thu, 2011-08-25 at 15:18 -0500, David Kelly wrote:
> > > > >>>>>> Sure, here is the full log:
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>
> > > > http://www.ci.uchicago.edu/~davidk/001-catsn-ranger-20110825-1515-5tydro91.log
> > > > >>>>>>
> > > > >>>>>> ----- Original Message -----
> > > > >>>>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > > >>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > >>>>>>> Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>,
> > > > > >>>>>>> "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > > > > >>>>>>> Sent: Thursday, August 25, 2011 2:43:31 PM
> > > > > >>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > > > > >>>>>>> It's possible that the CA dir on Ranger is not properly set up.
> > > > > >>>>>>> Can you post the full log?
> > > > > >>>>>>>
> > > > > >>>>>>> On Thu, 2011-08-25 at 13:56 -0500, David Kelly wrote:
> > > > > >>>>>>>> Those environment variables were not set up. I have them defined
> > > > > >>>>>>>> now, but I'm still getting the same error.
> > > > > >>>>>>>>
> > > > > >>>>>>>> [davidk at communicado ranger]$ env |grep 509
> > > > > >>>>>>>> X509_CERT_DIR=/opt/osg-1.2.16/globus/TRUSTED_CA
> > > > > >>>>>>>> X509_CADIR=/opt/osg-1.2.16/globus/TRUSTED_CA
> > > > > >>>>>>>>
> > > > > >>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
> > > > > >>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
> > > > > >>>>>>>>
> > > > > >>>>>>>> RunID: 20110825-1352-f1v940b4
> > > > > >>>>>>>> Progress: time: Thu, 25 Aug 2011 13:52:59 -0500
> > > > > >>>>>>>> Progress: time: Thu, 25 Aug 2011 13:53:00 -0500 Selecting site:7 Initializing site shared directory:3
> > > > > >>>>>>>> Execution failed:
> > > > > >>>>>>>> Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> ----- Original Message -----
> > > > > >>>>>>>>> From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > > > > >>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > >>>>>>>>> Cc: "Jonathan Monette" <jonmon at mcs.anl.gov>,
> > > > > >>>>>>>>> "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > > > > >>>>>>>>> Sent: Thursday, August 25, 2011 1:32:50 PM
> > > > > >>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > > > >>>>>>>>> Hi,
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> Are your CADIR and CACERT env vars set up?
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> [communicado:swiftgrid]$ echo $X509_CADIR
> > > > >>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> [communicado:swiftgrid]$ echo $X509_CERT_DIR
> > > > >>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > > >>>>>>>>> On Thu, Aug 25, 2011 at 1:29 PM, David Kelly
> > > > > >>>>>>>>> <davidk at ci.uchicago.edu> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thanks Jon,
> > > > >>>>>>>>>
> > > > > >>>>>>>>> Here is what happens when I try this from communicado:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> [davidk at communicado ~]$ myproxy-logon -l dkelly -s myproxy.teragrid.org
> > > > > >>>>>>>>> Enter MyProxy pass phrase:
> > > > > >>>>>>>>> A credential has been received for user dkelly in /tmp/x509up_u1878.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
> > > > > >>>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> RunID: 20110825-1326-o3e38fe0
> > > > > >>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:43 -0500
> > > > > >>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:44 -0500 Selecting site:8 Initializing site shared directory:2
> > > > > >>>>>>>>> Execution failed:
> > > > > >>>>>>>>> Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
> > > > >>>>>>>>>
> > > > >>>>>>>>> Any ideas?
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thanks,
> > > > >>>>>>>>> David
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> _______________________________________________
> > > > >>>>>>>>> Swift-devel mailing list
> > > > >>>>>>>>> Swift-devel at ci.uchicago.edu
> > > > >>>>>>>>>
> > > > >>>
> > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> --
> > > > >>>>>>>>> Ketan
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>> Sarah Kenny
> > > > > >>> Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> > > > > >>> University of California Irvine, Dept. of Neurology ~ 773-818-8300
> > > > >>>
> > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Sarah Kenny
> > > > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> > > > > University of California Irvine, Dept. of Neurology ~ 773-818-8300
> > > >
> > >
> > >