[Swift-devel] Notes from 0.93 meeting

Michael Wilde wilde at mcs.anl.gov
Fri Aug 26 20:09:59 CDT 2011


David, Jon - see also this thread for possible pointers: 


http://lists.ci.uchicago.edu/pipermail/swift-devel/2011-July/008541.html 

- Mike 

----- Original Message -----


From: "Jonathan Monette" <jonmon at mcs.anl.gov> 
To: "David Kelly" <davidk at ci.uchicago.edu> 
Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu> 
Sent: Friday, August 26, 2011 8:02:30 PM 
Subject: Re: [Swift-devel] Notes from 0.93 meeting 

I don't think it matters but it was a thought. When I was working with Swift and Globus Online I had to set it to an IP address and not the DNS name. But that might have been for a different reason. 
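
A minimal sketch of setting GLOBUS_HOSTNAME to an IP address rather than the DNS name, not something tested against GRAM. The helper names is_ipv4 and set_globus_hostname are made up for illustration, and 192.0.2.10 is a documentation placeholder, not communicado's real address. The helper only exports the variable when the value looks like a dotted-quad IPv4 address, so a DNS name is rejected.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 looks like a dotted-quad IPv4 address.
is_ipv4() {
    case $1 in
        # Reject anything with non-digit/non-dot characters, empty fields,
        # or a leading/trailing dot.
        *[!0-9.]*|*..*|.*|*.) return 1 ;;
    esac
    # Split the value on dots and require exactly four fields.
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    # Each field must be in the range 0-255.
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
}

# Hypothetical helper: export GLOBUS_HOSTNAME only for an IPv4 value.
set_globus_hostname() {
    if is_ipv4 "$1"; then
        GLOBUS_HOSTNAME=$1
        export GLOBUS_HOSTNAME
    else
        echo "not an IPv4 address: $1" >&2
        return 1
    fi
}

# Example (placeholder address, replace with the submit host's real IP):
# set_globus_hostname 192.0.2.10
```

Whether the IP form is actually needed is still an open question here; the sketch only makes the "IP, not DNS name" choice explicit.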

----- Reply message ----- 
From: "David Kelly" <davidk at ci.uchicago.edu> 
Date: Fri, Aug 26, 2011 7:13 pm 
Subject: [Swift-devel] Notes from 0.93 meeting 
To: "Jonathan Monette" <jonmon at mcs.anl.gov> 
Cc: "Mihael Hategan" <hategan at mcs.anl.gov>, "swift-devel Devel" <swift-devel at ci.uchicago.edu> 


I set it to communicado.ci.uchicago.edu. I'll try again with IP address. 

----- Original Message ----- 
> From: "Jonathan Monette" <jonmon at mcs.anl.gov> 
> To: "David Kelly" <davidk at ci.uchicago.edu> 
> Cc: "Mihael Hategan" <hategan at mcs.anl.gov>, "swift-devel Devel" <swift-devel at ci.uchicago.edu> 
> Sent: Friday, August 26, 2011 6:54:29 PM 
> Subject: Re: [Swift-devel] Notes from 0.93 meeting 
> Did you set GLOBUS_HOSTNAME to communicado.ci.uchicago.edu or, probably
> better, the IP address of communicado?
> On Aug 26, 2011, at 6:52 PM, David Kelly wrote: 
> 
> > I tried setting GLOBUS_HOSTNAME on communicado. The gram log file is
> > no longer created, but I still don't see any jobs being submitted.
> > 
> > There is a new set of logs at 
> > www.ci.uchicago.edu/~davidk/ranger-gt2-logs2.tar.gz 
> > 
> > David 
> > 
> > ----- Original Message ----- 
> >> From: "Mihael Hategan" <hategan at mcs.anl.gov> 
> >> To: "David Kelly" <davidk at ci.uchicago.edu> 
> >> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>, "Jonathan 
> >> Monette" <jonmon at mcs.anl.gov> 
> >> Sent: Friday, August 26, 2011 1:42:13 PM 
> >> Subject: Re: [Swift-devel] Notes from 0.93 meeting 
> >> "The job manager failed to open stderr" tends to happen when you
> >> have GLOBUS_HOSTNAME set incorrectly.
> >> 
> >> On Fri, 2011-08-26 at 13:38 -0500, David Kelly wrote: 
> >>> When I am trying to run the script now, Swift does not seem to be 
> >>> submitting the jobs correctly. Nothing is showing up in qstat.
> >>> 
> >>> I noticed that a gram log gets created in my home directory that 
> >>> says: 
> >>> ts=2011-08-26T17:30:03.910618Z id=27215 event=gram.job.end 
> >>> level=ERROR gramid=/16145868447994515851/17606392074284884670/ 
> >>> job_status=4 status=-73 reason="the job manager failed to open 
> >>> stdout" 
> >>> 
> >>> I'm guessing this is the cause of the problem. Bugs #153 and #215 
> >>> were related to similar problems with stdout and gt2/sge. 
> >>> 
> >>> The full logs are at 
> >>> http://www.ci.uchicago.edu/~davidk/ranger-gt2-logs.tar.gz 
> >>> 
> >>> Thanks, 
> >>> David 
> >>> 
> >>> 
> >>> ----- Original Message ----- 
> >>>> From: "Mihael Hategan" <hategan at mcs.anl.gov> 
> >>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov> 
> >>>> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu> 
> >>>> Sent: Thursday, August 25, 2011 5:31:34 PM 
> >>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting 
> >>>> On Thu, 2011-08-25 at 17:18 -0500, Jonathan Monette wrote: 
> >>>>> I can send mail to ci support and cc mike to it and ask what they
> >>>>> can do.
> >>>>>
> >>>>> Mihael, is there any way for Swift to give a little more feedback
> >>>>> besides unknown CA or is that a jglobus problem?
> >>>> 
> >>>> It's a jglobus problem. 
> >>>> 
> >>>> That in itself may not be a big issue, but jglobus is now being
> >>>> heavily re-organized by the globus team, so I'm not sure what the
> >>>> best long-term strategy is here.
> >>>>> 
> >>>>> ----- Reply message ----- 
> >>>>> From: "Sarah Kenny" <skenny at uchicago.edu> 
> >>>>> Date: Thu, Aug 25, 2011 5:11 pm 
> >>>>> Subject: [Swift-devel] Notes from 0.93 meeting 
> >>>>> To: "Jonathan Monette" <jonmon at mcs.anl.gov> 
> >>>>> Cc: "Mihael Hategan" <hategan at mcs.anl.gov>, "swift-devel Devel" 
> >>>>> <swift-devel at ci.uchicago.edu> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> if i had a nickel for every time i dealt with this i'd be rich! :)
> >>>>> actually, now that i'm looking at our uci machines i actually have
> >>>>> them updating hourly...so, maybe you want to ask the admins to do
> >>>>> that to avoid a full day of confusion whenever they expire :P
> >>>>>
> >>>>> *usually* i can't gsissh either if the certs have expired but,
> >>>>> yeah, they must be using different CA's now for that on ranger as
> >>>>> mihael suggests...
> >>>>> 
> >>>>> On Thu, Aug 25, 2011 at 2:46 PM, Jonathan Monette 
> >>>>> <jonmon at mcs.anl.gov> 
> >>>>> wrote: 
> >>>>> True. I did not think that each mechanism would use different
> >>>>> CAs. We might want to ask ci support to update the grid certs
> >>>>> more frequently then to avoid this situation.
> >>>>> 
> >>>>> 
> >>>>> On Aug 25, 2011, at 4:42 PM, Mihael Hategan wrote: 
> >>>>> 
> >>>>>> On Thu, 2011-08-25 at 16:40 -0500, Jonathan Monette wrote:
> >>>>>>> That is weird. If you were able to gsissh to ranger I would
> >>>>>>> assume that you are able to globus-url-copy to ranger.
> >>>>>>
> >>>>>> Not if the two use different CAs. Or if a password was typed at
> >>>>>> the ssh login.
> >>>>>>
> >>>>>>> Anyways, what Sarah said should work. I would assume that ci
> >>>>>>> would update more frequently to avoid this problem.
> >>>>>>> On Aug 25, 2011, at 4:38 PM, Sarah Kenny wrote: 
> >>>>>>> 
> >>>>>>>> communicado's certs (/etc/grid-security/certificates) are
> >>>>>>>> out-of-date...if you copy ranger's
> >>>>>>>> /etc/grid-security/certificates directory to communicado and
> >>>>>>>> point yr X509_CERT_DIR to it you can get a job thru (a simple
> >>>>>>>> globus-job-run with my valid cert fails from communicado at
> >>>>>>>> the moment if i don't do this).
> >>>>>>>>
> >>>>>>>> i set our machines at uci to update daily...i think it's less
> >>>>>>>> frequently at ci...
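
A rough sketch of the workaround described above, assuming ranger's /etc/grid-security/certificates has already been copied to a local directory. The path $HOME/ranger-certs and the helper name use_cert_dir are made up for illustration. The helper only checks that the directory holds hashed CA files (<hash>.0, the naming Globus uses for trusted CA certificates) before exporting X509_CERT_DIR; it does not verify that the certificates are current.

```shell
#!/bin/sh
# Hypothetical helper: point X509_CERT_DIR at a directory of trusted CA
# certificates, refusing directories that contain no <hash>.0 files.
use_cert_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Look for at least one hashed CA certificate file.
    found=0
    for f in "$dir"/*.0; do
        if [ -e "$f" ]; then
            found=1
            break
        fi
    done
    if [ "$found" -ne 1 ]; then
        echo "no CA certificates (*.0) in $dir" >&2
        return 1
    fi
    X509_CERT_DIR=$dir
    export X509_CERT_DIR
}

# Example (assumed local copy of ranger's certificates directory):
# use_cert_dir "$HOME/ranger-certs"
```

This only affects the current shell, so it is a per-user stopgap until the system-wide certificates on communicado are refreshed.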
> >>>>>>>> 
> >>>>>>>> On Thu, Aug 25, 2011 at 2:17 PM, Mihael Hategan 
> >>>>>>>> <hategan at mcs.anl.gov> wrote: 
> >>>>>>>> Can you try a globus-url-copy to gridftp.ranger?
> >>>>>>>>
> >>>>>>>> gridftp.ranger seems to have the NCSA myproxy CA. You say you
> >>>>>>>> have the proper certificates dir in your X509_CERT_DIR, and
> >>>>>>>> that directory contains the TACC root cert. So it should work.
> >>>>>>>> And so should swift.
> >>>>>>>>
> >>>>>>>> Though I think that jglobus should be more clear about
> >>>>>>>> "Unknown ca" errors. At least the name of the unknown CA
> >>>>>>>> should be part of the error message.
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> On Thu, 2011-08-25 at 15:55 -0500, David Kelly wrote:
> >>>>>>>>> $ grid-proxy-info -all
> >>>>>>>>> subject : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
> >>>>>>>>> issuer : /C=US/O=National Center for Supercomputing Applications/OU=Certificate Authorities/CN=MyProxy
> >>>>>>>>> identity : /C=US/O=National Center for Supercomputing Applications/CN=David Kelly
> >>>>>>>>> type : end entity credential
> >>>>>>>>> strength : 1024 bits
> >>>>>>>>> path : /tmp/x509up_u1878
> >>>>>>>>> timeleft : 9:56:53
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> ----- Original Message ----- 
> >>>>>>>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov> 
> >>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu> 
> >>>>>>>>>> Cc: "Ketan Maheshwari" 
> >>>>>>>>>> <ketancmaheshwari at gmail.com>, 
> >>>>>>>> "swift-devel Devel" 
> >>>>>>>> <swift-devel at ci.uchicago.edu> 
> >>>>>>>>>> Sent: Thursday, August 25, 2011 3:42:57 PM 
> >>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 meeting 
> >>>>>>>>>> Odd. Can you paste the output of 'grid-proxy-info 
> >>>>>>>>>> -all'? 
> >>>>>>>>>> 
> >>>>>>>>>> On Thu, 2011-08-25 at 15:18 -0500, David Kelly 
> >>>>>>>>>> wrote: 
> >>>>>>>>>>> Sure, here is the full log:
> >>>>>>>>>>>
> >>>>>>>>>>> http://www.ci.uchicago.edu/~davidk/001-catsn-ranger-20110825-1515-5tydro91.log
> >>>>>>>>>>> 
> >>>>>>>>>>> ----- Original Message ----- 
> >>>>>>>>>>>> From: "Mihael Hategan" <hategan at mcs.anl.gov> 
> >>>>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu> 
> >>>>>>>>>>>> Cc: "Ketan Maheshwari" 
> >>>>>>>>>>>> <ketancmaheshwari at gmail.com>, 
> >>>>>>>> "swift-devel 
> >>>>>>>>>>>> Devel" <swift-devel at ci.uchicago.edu> 
> >>>>>>>>>>>> Sent: Thursday, August 25, 2011 2:43:31 PM 
> >>>>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 
> >>>>>>>>>>>> meeting 
> >>>>>>>>>>>> It's possible that the CA dir on Ranger is not properly set up.
> >>>>>>>>>>>> Can you post the full log?
> >>>>>>>>>>>> 
> >>>>>>>>>>>> On Thu, 2011-08-25 at 13:56 -0500, David Kelly wrote:
> >>>>>>>>>>>>> Those environment variables were not set up. I have them defined
> >>>>>>>>>>>>> now, but I'm still getting the same error.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [davidk at communicado ranger]$ env |grep 509
> >>>>>>>>>>>>> X509_CERT_DIR=/opt/osg-1.2.16/globus/TRUSTED_CA
> >>>>>>>>>>>>> X509_CADIR=/opt/osg-1.2.16/globus/TRUSTED_CA
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
> >>>>>>>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> RunID: 20110825-1352-f1v940b4
> >>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:52:59 -0500
> >>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:53:00 -0500 Selecting site:7 Initializing site shared directory:3
> >>>>>>>>>>>>> Execution failed:
> >>>>>>>>>>>>> Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> 
> >>>>>>>>>>>>> ----- Original Message ----- 
> >>>>>>>>>>>>>> From: "Ketan Maheshwari" 
> >>>>>>>> <ketancmaheshwari at gmail.com> 
> >>>>>>>>>>>>>> To: "David Kelly" <davidk at ci.uchicago.edu> 
> >>>>>>>>>>>>>> Cc: "Jonathan Monette" <jonmon at mcs.anl.gov>, 
> >>>>>>>> "swift-devel 
> >>>>>>>>>>>>>> Devel" 
> >>>>>>>>>>>>>> <swift-devel at ci.uchicago.edu> 
> >>>>>>>>>>>>>> Sent: Thursday, August 25, 2011 1:32:50 PM 
> >>>>>>>>>>>>>> Subject: Re: [Swift-devel] Notes from 0.93 
> >>>>>>>> meeting 
> >>>>>>>>>>>>>> Hi, 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> Are your CADIR and CACERT env vars set up? 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> [communicado:swiftgrid]$ echo $X509_CADIR 
> >>>>>>>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> [communicado:swiftgrid]$ echo $X509_CERT_DIR 
> >>>>>>>>>>>>>> /opt/osg-1.2.16/globus/TRUSTED_CA 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> On Thu, Aug 25, 2011 at 1:29 PM, David Kelly <davidk at ci.uchicago.edu> wrote:
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> Thanks Jon, 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> Here is what happens when I try this from communicado:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [davidk at communicado ~]$ myproxy-logon -l dkelly -s myproxy.teragrid.org
> >>>>>>>>>>>>>> Enter MyProxy pass phrase:
> >>>>>>>>>>>>>> A credential has been received for user dkelly in /tmp/x509up_u1878.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> [davidk at communicado ranger]$ swift -sites.file sites.xml -tc.file tc.data 001-catsn-ranger.swift
> >>>>>>>>>>>>>> Swift svn swift-r4987 (swift modified locally) cog-r3229
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> RunID: 20110825-1326-o3e38fe0
> >>>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:43 -0500
> >>>>>>>>>>>>>> Progress: time: Thu, 25 Aug 2011 13:26:44 -0500 Selecting site:8 Initializing site shared directory:2
> >>>>>>>>>>>>>> Execution failed:
> >>>>>>>>>>>>>> Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> Any ideas? 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> Thanks, 
> >>>>>>>>>>>>>> David 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> _______________________________________________ 
> >>>>>>>>>>>>>> Swift-devel mailing list 
> >>>>>>>>>>>>>> Swift-devel at ci.uchicago.edu 
> >>>>>>>>>>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> 
> >>>>>>>>>>>>>> -- 
> >>>>>>>>>>>>>> Ketan 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> -- 
> >>>>>>>> Sarah Kenny 
> >>>>>>>> Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III
> >>>>>>>> University of California Irvine, Dept. of Neurology ~
> >>>>>>>> 773-818-8300
> >>>>>>>> 
> >>>>>>> 
> >>>>>> 
> >>>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> 
> >>>>> -- 
> >>>>> Sarah Kenny 
> >>>>> Programmer ~ Brain Circuits Laboratory ~ Rm 2224 Bio Sci III 
> >>>>> University of California Irvine, Dept. of Neurology ~ 
> >>>>> 773-818-8300 
> >>>>> 
> >>>> 
> >>>> 






-- 
Michael Wilde 
Computation Institute, University of Chicago 
Mathematics and Computer Science Division 
Argonne National Laboratory 
