[Swift-devel] Notes from 0.93 meeting

David Kelly davidk at ci.uchicago.edu
Fri Aug 26 13:38:56 CDT 2011


When I try to run the script now, Swift does not seem to be submitting the jobs correctly. Nothing is showing up in qstat.

I noticed that a gram log gets created in my home directory that says:
ts=2011-08-26T17:30:03.910618Z id=27215 event=gram.job.end level=ERROR gramid=/16145868447994515851/17606392074284884670/ job_status=4 status=-73 reason="the job manager failed to open stdout"

I'm guessing this is the cause of the problem. Bugs #153 and #215 were related to similar problems with stdout and gt2/sge.
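The GRAM log line above is a run of space-separated key=value fields, with double quotes around values that contain spaces. Below is a minimal Python sketch for pulling the status code and reason out of such a line. It assumes every field follows that key=value layout; the parse_gram_log_line helper is hypothetical and is not part of Globus or Swift.

```python
import shlex


def parse_gram_log_line(line: str) -> dict:
    """Split a GRAM log line made of key=value fields into a dict.

    Hypothetical helper: assumes fields are space-separated and that values
    containing spaces are wrapped in double quotes, as in the log above.
    """
    fields = {}
    # shlex.split honours the double quotes, so reason="..." stays a single
    # token (with the quotes removed).
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore any token that is not key=value
            fields[key] = value
    return fields


if __name__ == "__main__":
    log_line = (
        'ts=2011-08-26T17:30:03.910618Z id=27215 event=gram.job.end '
        'level=ERROR gramid=/16145868447994515851/17606392074284884670/ '
        'job_status=4 status=-73 reason="the job manager failed to open stdout"'
    )
    entry = parse_gram_log_line(log_line)
    # Prints the GRAM status code followed by the reason text.
    print(entry["status"], entry["reason"])
```

Splitting on the first "=" only keeps values such as gramid intact even though they contain slashes.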

The full logs are at http://www.ci.uchicago.edu/~davidk/ranger-gt2-logs.tar.gz

Thanks,
David


----- Original Message -----
> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> Sent: Thursday, August 25, 2011 5:31:34 PM
> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> On Thu, 2011-08-25 at 17:18 -0500, Jonathan Monette wrote:
> > I can send mail to ci support, cc Mike on it, and ask what they can
> > do.
> >
> > Mihael, is there any way for Swift to give a little more feedback
> > besides "unknown CA", or is that a jglobus problem?
> 
> It's a jglobus problem.
> 
> That in itself may not be a big issue, but jglobus is now being
> heavily re-organized by the globus team, so I'm not sure what the best
> long-term strategy is here.
> >
> > ----- Reply message -----
> > From: "Sarah Kenny" <skenny at uchicago.edu>
> > Date: Thu, Aug 25, 2011 5:11 pm
> > Subject: [Swift-devel] Notes from 0.93 meeting
> > To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > Cc: "Mihael Hategan" <hategan at mcs.anl.gov>, "swift-devel Devel"
> > <swift-devel at ci.uchicago.edu>
> >
> >
> >
> > if i had a nickel for every time i dealt with this i'd be rich! :)
> > actually, now that i'm looking at our uci machines, i have them
> > updating hourly...so maybe you want to ask the admins to do that
> > to avoid a full day of confusion whenever they expire :P
> >
> > *usually* i can't gsissh either if the certs have expired but, yeah,
> > they must be using different CAs now for that on ranger as mihael
> > suggests...
> >
> > On Thu, Aug 25, 2011 at 2:46 PM, Jonathan Monette <jonmon at mcs.anl.gov> wrote:
> >         True. I did not think that each mechanism would use different
> >         CAs. We might want to ask ci support to update the grid certs
> >         more frequently, then, to avoid this situation.
> >
> >
> >         On Aug 25, 2011, at 4:42 PM, Mihael Hategan wrote:
> >
> >         > On Thu, 2011-08-25 at 16:40 -0500, Jonathan Monette wrote:
> >         >> That is weird. If you were able to gsissh to ranger, I would
> >         >> assume that you are able to globus-url-copy to ranger.
> >         >
> >         > Not if the two use different CAs. Or if a password was typed
> >         > at the ssh login.
> >         >
> >         >> Anyway, what Sarah said should work. I would assume that ci
> >         >> would update more frequently to avoid this problem.
> >         >> On Aug 25, 2011, at 4:38 PM, Sarah Kenny wrote:
> >         >>
> >         >>> communicado's certs (/etc/grid-security/certificates) are
> >         >>> out-of-date...if you copy ranger's /etc/grid-security/certificates
> >         >>> directory to communicado and point yr X509_CERT_DIR to it, you can
> >         >>> get a job thru (a simple globus-job-run with my valid cert fails
> >         >>> from communicado at the moment if i don't do this).
> >         >>>
> >         >>> i set our machines at uci to update daily...i think it's less
> >         >>> frequently at ci...
> >         >>>
> >         >>> On Thu, Aug 25, 2011 at 2:17 PM, Mihael Hategan
> >         >>> <hategan at mcs.anl.gov> wrote:
> >         >>>        Can you try a globus-url-copy to gridftp.ranger?
> >         >>>
> >         >>>        gridftp.ranger seems to have the NCSA myproxy CA. You say
> >         >>>        you have the proper certificates dir in your X509_CERT_DIR,
> >         >>>        and that directory contains the TACC root cert. So it
> >         >>>        should work. And so should swift.
> >         >>>
> >         >>>        Though I think that jglobus should be more clear about
> >         >>>        "Unknown ca" errors. At least the name of the unknown CA
> >         >>>        should be part of the error message.
> >         >>>
> >         >>>
> >         >>>        On Thu, 2011-08-25 at 15:55 -0500, David Kelly wrote:
> >         >>>> $ grid-proxy-info -all
> >         >>>> subject : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
> >         >>>> issuer : /C=US/O=National Center for Supercomputing Applications/OU=Certificate Authorities/CN=MyProxy
> >         >>>> identity : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
> >         >>>> type : end entity credential
> >         >>>> strength : 1024 bits
> >         >>>> path : /tmp/x509up_u1878
> >         >>>> timeleft : 9:56:53
> >         >>>>
> >         >>>>
> >         >>>> ----- Original Message -----
> >         >>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> >         >>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> >         >>>>> Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>, "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> >         >>>>> Sent: Thursday, August 25, 2011 3:42:57 PM
> >         >>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> >         >>>>> Odd. Can you paste the output of 'grid-proxy-info -all'?
> >         >>>>>
> >         >>>>> On Thu, 2011-08-25 at 15:18 -0500, David Kelly wrote:
> >         >>>>>> Sure, here is the full log:
> >         >>>>>>
> >         >>>>>>
> >         >>>>>> http://www.ci.uchicago.edu/~davidk/001-catsn-ranger-20110825-1515-5tydro91.log
> >         >>>>>>
> >         >>>>>> ----- Original Message -----
> >         >>>>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> >         >>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> >         >>>>>>> Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>, "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> >         >>>>>>> Sent: Thursday, August 25, 2011 2:43:31 PM
> >         >>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> >         >>>>>>> It's possible that the CA dir on Ranger is not properly set up.
> >         >>>>>>> Can you post the full log?
> >         >>>>>>>
> >         >>>>>>> On Thu, 2011-08-25 at 13:56 -0500, David Kelly wrote:
> >         >>>>>>>> Those environment variables were not set up. I have them defined
> >         >>>>>>>> now, but I'm still getting the same error.
> >         >>>>>>>>
> >         >>>>>>>> [davidk at communicado ranger]$ env |grep 509
> >         >>>>>>>> X509_CERT_DIR=/opt/osg-1.2.16/globus/TRUSTED_CA
> >         >>>>>>>> X509_CADIR=/opt/osg-1.2.16/globus/TRUSTED_CA
> >         >>>>>>>>
> >         >>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
> >         >>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
> >         >>>>>>>>
> >         >>>>>>>> RunID: 20110825-1352-f1v940b4
> >         >>>>>>>> Progress: time: Thu, 25 Aug 2011 13:52:59 -0500
> >         >>>>>>>> Progress: time: Thu, 25 Aug 2011 13:53:00 -0500 Selecting site:7
> >         >>>>>>>> Initializing site shared directory:3
> >         >>>>>>>> Execution failed:
> >         >>>>>>>>      Authentication failed [Caused by: Failure unspecified at
> >         >>>>>>>>      GSS-API level [Caused by: Unknown CA]]
> >         >>>>>>>>
> >         >>>>>>>>
> >         >>>>>>>> ----- Original Message -----
> >         >>>>>>>>> From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> >         >>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> >         >>>>>>>>> Cc: "Jonathan Monette" <jonmon at mcs.anl.gov>, "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> >         >>>>>>>>> Sent: Thursday, August 25, 2011 1:32:50 PM
> >         >>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> >         >>>>>>>>> Hi,
> >         >>>>>>>>>
> >         >>>>>>>>> Are your CADIR and CACERT env vars set up?
> >         >>>>>>>>>
> >         >>>>>>>>> [communicado:swiftgrid]$ echo $X509_CADIR
> >         >>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA
> >         >>>>>>>>>
> >         >>>>>>>>> [communicado:swiftgrid]$ echo $X509_CERT_DIR
> >         >>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA
> >         >>>>>>>>>
> >         >>>>>>>>> On Thu, Aug 25, 2011 at 1:29 PM, David Kelly <davidk at ci.uchicago.edu> wrote:
> >         >>>>>>>>>
> >         >>>>>>>>>
> >         >>>>>>>>> Thanks Jon,
> >         >>>>>>>>>
> >         >>>>>>>>> Here is what happens when I try this from communicado:
> >         >>>>>>>>>
> >         >>>>>>>>> [davidk at communicado ~]$ myproxy-logon -l dkelly -s myproxy.teragrid.org
> >         >>>>>>>>> Enter MyProxy pass phrase:
> >         >>>>>>>>> A credential has been received for user dkelly in /tmp/x509up_u1878.
> >         >>>>>>>>>
> >         >>>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
> >         >>>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
> >         >>>>>>>>>
> >         >>>>>>>>> RunID: 20110825-1326-o3e38fe0
> >         >>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:43 -0500
> >         >>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:44 -0500 Selecting site:8
> >         >>>>>>>>> Initializing site shared directory:2
> >         >>>>>>>>> Execution failed:
> >         >>>>>>>>> Authentication failed [Caused by: Failure unspecified at GSS-API
> >         >>>>>>>>> level [Caused by: Unknown CA]]
> >         >>>>>>>>>
> >         >>>>>>>>> Any ideas?
> >         >>>>>>>>>
> >         >>>>>>>>> Thanks,
> >         >>>>>>>>> David
> >         >>>>>>>>>
> >         >>>>>>>>>
> >         >>>>>>>>>
> >         >>>>>>>>> _______________________________________________
> >         >>>>>>>>> Swift-devel mailing list
> >         >>>>>>>>> Swift-devel at ci.uchicago.edu
> >         >>>>>>>>>
> >         >>>>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >         >>>>>>>>>
> >         >>>>>>>>>
> >         >>>>>>>>>
> >         >>>>>>>>>
> >         >>>>>>>>> --
> >         >>>>>>>>> Ketan
> >         >>>
> >         >>>
> >         >>>
> >         >>>
> >         >>> --
> >         >>> Sarah Kenny
> >         >>> Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> >         >>> University of California Irvine, Dept. of Neurology ~ 773-818-8300
> >         >>>
> >         >>
> >         >
> >         >
> >
> >
> >
> >
> >
> > --
> > Sarah Kenny
> > Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> > University of California Irvine, Dept. of Neurology ~ 773-818-8300
> >
> 
> 


