[Swift-devel] Notes from 0.93 meeting

David Kelly davidk at ci.uchicago.edu
Fri Aug 26 21:19:02 CDT 2011


Getting closer now.

When I changed it from gt2:gt2:sge to gt2:sge, the coaster log showed more detail, including an error saying that no valid pe (parallel environment) was specified. I defined a pe in my sites.xml and tried again. An SGE submit file is now being created, but it sets the pe to 'threaded' rather than my value, 16way.

http://www.ci.uchicago.edu/~davidk/ranger-gt2-sge3.tar.gz

David
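A quick way to see which parallel environment actually reaches SGE is to scan the generated submit file for its `-pe` directive. The Python sketch below assumes the submit file uses the standard SGE embedded-directive form `#$ -pe <name> <slots>`. The `find_pe` helper and the sample file contents are hypothetical, not part of Swift.

```python
import re
from typing import Optional, Tuple

# Matches an SGE embedded directive such as "#$ -pe 16way 16".
PE_DIRECTIVE = re.compile(r"^#\$\s+-pe\s+(\S+)\s+(\S+)")

def find_pe(submit_text: str) -> Optional[Tuple[str, str]]:
    """Return (pe_name, slots) from the first -pe directive, or None if there is none."""
    for line in submit_text.splitlines():
        match = PE_DIRECTIVE.match(line.strip())
        if match:
            return match.group(1), match.group(2)
    return None

if __name__ == "__main__":
    # Hypothetical submit file resembling the one described above.
    sample = "#!/bin/bash\n#$ -N swift-job\n#$ -pe threaded 16\n/bin/true\n"
    pe = find_pe(sample)
    print(pe)  # ('threaded', '16')
    if pe is not None and pe[0] != "16way":
        print("the pe from sites.xml was not applied")
```

Checking the file directly separates "Swift ignored the setting" from "SGE rejected the setting".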

----- Original Message -----
> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> To: "David Kelly" <davidk at ci.uchicago.edu>
> Cc: "Jonathan Monette" <jonmon at mcs.anl.gov>, "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> Sent: Friday, August 26, 2011 8:23:14 PM
> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> Can you try GT2:SGE instead of GT2:GT2:SGE?
> 
> On Fri, 2011-08-26 at 19:13 -0500, David Kelly wrote:
> > I set it to communicado.ci.uchicago.edu. I'll try again with IP
> > address.
> >
> > ----- Original Message -----
> > > From: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > Cc: "Mihael Hategan" <hategan at mcs.anl.gov>,
> > > "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > > Sent: Friday, August 26, 2011 6:54:29 PM
> > > Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > > Did you set GLOBUS_HOSTNAME to communicado.ci.uchicago.edu or,
> > > probably better, to the IP address of communicado?
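Following the IP-address suggestion, here is a minimal Python sketch that resolves a host name and exports the address as GLOBUS_HOSTNAME for a child process. The helper name is hypothetical. The sketch assumes the resolved address is one the remote GRAM service can reach back to. A private or NAT address would not help.

```python
import os
import socket

def env_with_globus_hostname(host: str) -> dict:
    """Return a copy of the environment with GLOBUS_HOSTNAME set to host's IPv4 address."""
    env = dict(os.environ)  # copy, so the caller's own environment is left untouched
    env["GLOBUS_HOSTNAME"] = socket.gethostbyname(host)
    return env
```

Usage would be along the lines of `subprocess.run(["swift", "-sites.file", "sites.xml", "-tc.file", "tc.data", "001-catsn-ranger.swift"], env=env_with_globus_hostname("communicado.ci.uchicago.edu"))`.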
> > > On Aug 26, 2011, at 6:52 PM, David Kelly wrote:
> > >
> > > > I tried setting GLOBUS_HOSTNAME on communicado. The GRAM log file
> > > > is no longer created, but I still don't see any jobs being
> > > > submitted.
> > > >
> > > > There is a new set of logs at
> > > > www.ci.uchicago.edu/~davidk/ranger-gt2-logs2.tar.gz
> > > >
> > > > David
> > > >
> > > > ----- Original Message -----
> > > >> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > >> To: "David Kelly" <davidk at ci.uchicago.edu>
> > > >> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>,
> > > >> "Jonathan Monette" <jonmon at mcs.anl.gov>
> > > >> Sent: Friday, August 26, 2011 1:42:13 PM
> > > >> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > > >> "The job manager failed to open stderr" tends to happen when you
> > > >> have GLOBUS_HOSTNAME set incorrectly.
> > > >>
> > > >> On Fri, 2011-08-26 at 13:38 -0500, David Kelly wrote:
> > > >>> When I try to run the script now, Swift does not seem to be
> > > >>> submitting the jobs correctly. Nothing is showing up in qstat.
> > > >>>
> > > >>> I noticed that a GRAM log gets created in my home directory that
> > > >>> says:
> > > >>> ts=2011-08-26T17:30:03.910618Z id=27215 event=gram.job.end level=ERROR gramid=/16145868447994515851/17606392074284884670/ job_status=4 status=-73 reason="the job manager failed to open stdout"
> > > >>>
> > > >>> I'm guessing this is the cause of the problem. Bugs #153 and #215
> > > >>> were related to similar problems with stdout and gt2/sge.
> > > >>>
> > > >>> The full logs are at
> > > >>> http://www.ci.uchicago.edu/~davidk/ranger-gt2-logs.tar.gz
> > > >>>
> > > >>> Thanks,
> > > >>> David
> > > >>>
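The GRAM log record quoted above is a flat list of key=value fields. Spaces are allowed inside a double-quoted value. Below is a minimal Python sketch for pulling `status` and `reason` out of such records. It assumes, on the strength of this one sample only, that every record follows the same layout. `parse_gram_log_line` is a hypothetical helper.

```python
import shlex

def parse_gram_log_line(line: str) -> dict:
    """Split a key=value GRAM log record into a dict; double-quoted values keep their spaces."""
    fields = {}
    for token in shlex.split(line):  # shlex strips the quotes and keeps quoted text as one token
        key, sep, value = token.partition("=")  # split on the first '=' only
        if sep:
            fields[key] = value
    return fields

record = ('ts=2011-08-26T17:30:03.910618Z id=27215 event=gram.job.end '
          'level=ERROR gramid=/16145868447994515851/17606392074284884670/ '
          'job_status=4 status=-73 reason="the job manager failed to open stdout"')
fields = parse_gram_log_line(record)
print(fields["status"], fields["reason"])  # -73 the job manager failed to open stdout
```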
> > > >>>
> > > >>> ----- Original Message -----
> > > >>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > >>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > > >>>> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > > >>>> Sent: Thursday, August 25, 2011 5:31:34 PM
> > > >>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > > >>>> On Thu, 2011-08-25 at 17:18 -0500, Jonathan Monette wrote:
> > > >>>>> I can send mail to CI support, cc Mike, and ask what they can do.
> > > >>>>>
> > > >>>>> Mihael, is there any way for Swift to give a little more feedback
> > > >>>>> besides "Unknown CA", or is that a jglobus problem?
> > > >>>>
> > > >>>> It's a jglobus problem.
> > > >>>>
> > > >>>> That in itself may not be a big issue, but jglobus is now being
> > > >>>> heavily reorganized by the Globus team, so I'm not sure what the
> > > >>>> best long-term strategy is here.
> > > >>>>>
> > > >>>>> ----- Reply message -----
> > > >>>>> From: "Sarah Kenny" <skenny at uchicago.edu>
> > > >>>>> Date: Thu, Aug 25, 2011 5:11 pm
> > > >>>>> Subject: [Swift-devel] Notes from 0.93 meeting
> > > >>>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > > >>>>> Cc: "Mihael Hategan" <hategan at mcs.anl.gov>,
> > > >>>>> "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> if i had a nickel for every time i dealt with this i'd be rich! :)
> > > >>>>> actually, now that i'm looking at our uci machines, i have them
> > > >>>>> updating hourly...so maybe you want to ask the admins to do that
> > > >>>>> to avoid a full day of confusion whenever they expire :P
> > > >>>>>
> > > >>>>> *usually* i can't gsissh either if the certs have expired but,
> > > >>>>> yeah, they must be using different CAs now for that on ranger, as
> > > >>>>> mihael suggests...
> > > >>>>>
> > > >>>>> On Thu, Aug 25, 2011 at 2:46 PM, Jonathan Monette
> > > >>>>> <jonmon at mcs.anl.gov> wrote:
> > > >>>>>        True. I did not think that each mechanism would use different
> > > >>>>>        CAs. We might want to ask CI support to update the grid certs
> > > >>>>>        more frequently, then, to avoid this situation.
> > > >>>>>
> > > >>>>>
> > > >>>>>        On Aug 25, 2011, at 4:42 PM, Mihael Hategan wrote:
> > > >>>>>
> > > >>>>>> On Thu, 2011-08-25 at 16:40 -0500, Jonathan Monette wrote:
> > > >>>>>>> That is weird. If you were able to gsissh to ranger, I would assume
> > > >>>>>>> that you are able to globus-url-copy to ranger.
> > > >>>>>>
> > > >>>>>> Not if the two use different CAs. Or if a password was typed at
> > > >>>>>> the ssh login.
> > > >>>>>>
> > > >>>>>>> Anyway, what Sarah said should work. I would have assumed that
> > > >>>>>>> CI would update more frequently to avoid this problem.
> > > >>>>>>> On Aug 25, 2011, at 4:38 PM, Sarah Kenny wrote:
> > > >>>>>>>
> > > >>>>>>>> communicado's certs (/etc/grid-security/certificates) are
> > > >>>>>>>> out-of-date...if you copy ranger's /etc/grid-security/certificates
> > > >>>>>>>> directory to communicado and point yr X509_CERT_DIR to it, you can
> > > >>>>>>>> get a job thru (a simple globus-job-run with my valid cert fails
> > > >>>>>>>> from communicado at the moment if i don't do this).
> > > >>>>>>>>
> > > >>>>>>>> i set our machines at uci to update daily...i think it's less
> > > >>>>>>>> frequent at ci...
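Sarah's workaround can be scripted. Below is a minimal Python sketch. It assumes ranger's /etc/grid-security/certificates has already been fetched to a local directory, for example with scp. The helper name and paths are hypothetical. The `*.0` check only confirms that the copy looks like an OpenSSL-style hashed CA directory. It does not prove that the copy holds the right CAs.

```python
import os
import shutil
from pathlib import Path

def use_cert_dir(src: str, dest: str) -> dict:
    """Copy a trusted-CA directory to dest and return an environment
    whose X509_CERT_DIR points at the copy."""
    shutil.copytree(src, dest, dirs_exist_ok=True)  # dirs_exist_ok needs Python 3.8+
    # Trusted-CA directories name CA certificates by hash, e.g. "1a2b3c4d.0".
    if not any(Path(dest).glob("*.0")):
        raise ValueError(f"no hashed CA certificates (*.0) found in {dest}")
    env = dict(os.environ)
    env["X509_CERT_DIR"] = dest
    return env
```

The returned `env` would then be passed to whatever launches globus-job-run or Swift, e.g. `subprocess.run(..., env=env)`.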
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Aug 25, 2011 at 2:17 PM, Mihael Hategan
> > > >>>>>>>> <hategan at mcs.anl.gov> wrote:
> > > >>>>>>>>       Can you try a globus-url-copy to gridftp.ranger?
> > > >>>>>>>>
> > > >>>>>>>>       gridftp.ranger seems to have the NCSA MyProxy CA. You say
> > > >>>>>>>>       your X509_CERT_DIR points to the proper certificates
> > > >>>>>>>>       directory, and that directory contains the TACC root cert.
> > > >>>>>>>>       So it should work, and so should Swift.
> > > >>>>>>>>
> > > >>>>>>>>       Though I think that jglobus should be clearer about
> > > >>>>>>>>       "Unknown CA" errors. At least the name of the unknown CA
> > > >>>>>>>>       should be part of the error message.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>       On Thu, 2011-08-25 at 15:55 -0500, David Kelly wrote:
> > > >>>>>>>>> $ grid-proxy-info -all
> > > >>>>>>>>> subject : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
> > > >>>>>>>>> issuer : /C=US/O=National Center for Supercomputing Applications/OU=Certificate Authorities/CN=MyProxy
> > > >>>>>>>>> identity : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
> > > >>>>>>>>> type : end entity credential
> > > >>>>>>>>> strength : 1024 bits
> > > >>>>>>>>> path : /tmp/x509up_u1878
> > > >>>>>>>>> timeleft : 9:56:53
> > > >>>>>>>>>
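Until jglobus names the unknown CA in its error, the CA names can at least be pulled out by hand. Below is a minimal Python sketch that parses `grid-proxy-info -all` output like the one quoted above and extracts the issuer's CN. The helper names are hypothetical. This only shows which CA signed the client proxy, here NCSA's MyProxy CA. An "Unknown CA" raised on the client side may instead concern the server's certificate, which this sketch does not inspect.

```python
def parse_proxy_info(output: str) -> dict:
    """Parse the 'key : value' lines printed by grid-proxy-info -all into a dict."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")  # first colon only, so "9:56:53" survives
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info

def last_cn(dn: str) -> str:
    """Return the final CN component of a slash-separated DN like '/C=US/.../CN=MyProxy'."""
    return dn.rsplit("/CN=", 1)[-1]

sample = """subject : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
issuer : /C=US/O=National Center for Supercomputing Applications/OU=Certificate Authorities/CN=MyProxy
type : end entity credential
strength : 1024 bits
timeleft : 9:56:53"""

info = parse_proxy_info(sample)
print(last_cn(info["issuer"]))  # MyProxy
```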
> > > >>>>>>>>>
> > > >>>>>>>>> ----- Original Message -----
> > > >>>>>>>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > >>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> > > >>>>>>>>>> Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>,
> > > >>>>>>>>>> "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > > >>>>>>>>>> Sent: Thursday, August 25, 2011 3:42:57 PM
> > > >>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > > >>>>>>>>>> Odd. Can you paste the output of 'grid-proxy-info -all'?
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Thu, 2011-08-25 at 15:18 -0500, David Kelly wrote:
> > > >>>>>>>>>>> Sure, here is the full log:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> http://www.ci.uchicago.edu/~davidk/001-catsn-ranger-20110825-1515-5tydro91.log
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> ----- Original Message -----
> > > >>>>>>>>>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> > > >>>>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> > > >>>>>>>>>>>> Cc: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>,
> > > >>>>>>>>>>>> "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > > >>>>>>>>>>>> Sent: Thursday, August 25, 2011 2:43:31 PM
> > > >>>>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > > >>>>>>>>>>>> It's possible that the CA dir on Ranger is not properly set up.
> > > >>>>>>>>>>>> Can you post the full log?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Thu, 2011-08-25 at 13:56 -0500, David Kelly wrote:
> > > >>>>>>>>>>>>> Those environment variables were not set up. I have them defined
> > > >>>>>>>>>>>>> now, but I'm still getting the same error.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> [davidk at communicado ranger]$ env |grep 509
> > > >>>>>>>>>>>>> X509_CERT_DIR=/opt/osg-1.2.16/globus/TRUSTED_CA
> > > >>>>>>>>>>>>> X509_CADIR=/opt/osg-1.2.16/globus/TRUSTED_CA
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
> > > >>>>>>>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> RunID: 20110825-1352-f1v940b4
> > > >>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:52:59 -0500
> > > >>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:53:00 -0500 Selecting site:7 Initializing site shared directory:3
> > > >>>>>>>>>>>>> Execution failed:
> > > >>>>>>>>>>>>>     Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> ----- Original Message -----
> > > >>>>>>>>>>>>>> From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > > >>>>>>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu>
> > > >>>>>>>>>>>>>> Cc: "Jonathan Monette" <jonmon at mcs.anl.gov>,
> > > >>>>>>>>>>>>>> "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> > > >>>>>>>>>>>>>> Sent: Thursday, August 25, 2011 1:32:50 PM
> > > >>>>>>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting
> > > >>>>>>>>>>>>>> Hi,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Are your X509_CADIR and X509_CERT_DIR env vars set up?
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> [communicado:swiftgrid]$ echo $X509_CADIR
> > > >>>>>>>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> [communicado:swiftgrid]$ echo $X509_CERT_DIR
> > > >>>>>>>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Thu, Aug 25, 2011 at 1:29 PM, David Kelly <davidk at ci.uchicago.edu> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Thanks Jon,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Here is what happens when I try this from communicado:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> [davidk at communicado ~]$ myproxy-logon -l dkelly -s myproxy.teragrid.org
> > > >>>>>>>>>>>>>> Enter MyProxy pass phrase:
> > > >>>>>>>>>>>>>> A credential has been received for user dkelly in /tmp/x509up_u1878.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
> > > >>>>>>>>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> RunID: 20110825-1326-o3e38fe0
> > > >>>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:43 -0500
> > > >>>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:44 -0500 Selecting site:8 Initializing site shared directory:2
> > > >>>>>>>>>>>>>> Execution failed:
> > > >>>>>>>>>>>>>> Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Any ideas?
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>> David
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> _______________________________________________
> > > >>>>>>>>>>>>>> Swift-devel mailing list
> > > >>>>>>>>>>>>>> Swift-devel at ci.uchicago.edu
> > > >>>>>>>>>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>> Ketan
> > > >>>>>>>>
> > > >>>>>>>> --
> > > >>>>>>>> Sarah Kenny
> > > >>>>>>>> Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> > > >>>>>>>> University of California Irvine, Dept. of Neurology ~ 773-818-8300
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>
> > > >>>>