[Swift-devel] Cant get auto-coasters to run from midway to beagle

Michael Wilde wilde at mcs.anl.gov
Sun Mar 10 12:01:53 CDT 2013


Here's run034: it seems a bit better, but still dies. This is with a throttle of 48 jobs on 48 cores (2 nodes), from swift.rcc to beagle, with 17MB files. Curiously, it still dies about 4 minutes into the run, which suggests some kind of timeout is still lurking.

- Mike

Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)

RunID: 20130310-1639-kyb8hca9
Progress:  time: Sun, 10 Mar 2013 16:39:45 +0000
Progress:  time: Sun, 10 Mar 2013 16:39:56 +0000  Selecting site:269  Submitting:47  Submitted:1
Progress:  time: Sun, 10 Mar 2013 16:40:01 +0000  Selecting site:269  Stage in:1  Submitted:47
Progress:  time: Sun, 10 Mar 2013 16:40:15 +0000  Selecting site:269  Stage in:48
Progress:  time: Sun, 10 Mar 2013 16:40:45 +0000  Selecting site:269  Stage in:48
Progress:  time: Sun, 10 Mar 2013 16:41:15 +0000  Selecting site:269  Stage in:48
Progress:  time: Sun, 10 Mar 2013 16:41:45 +0000  Selecting site:269  Stage in:48
Progress:  time: Sun, 10 Mar 2013 16:42:11 +0000  Selecting site:269  Stage in:47  Active:1
Progress:  time: Sun, 10 Mar 2013 16:42:12 +0000  Selecting site:269  Stage in:41  Active:7
Progress:  time: Sun, 10 Mar 2013 16:42:13 +0000  Selecting site:269  Stage in:23  Active:25
Progress:  time: Sun, 10 Mar 2013 16:42:15 +0000  Selecting site:269  Active:48
Progress:  time: Sun, 10 Mar 2013 16:42:17 +0000  Selecting site:269  Active:47  Stage out:1
Progress:  time: Sun, 10 Mar 2013 16:42:18 +0000  Selecting site:268  Stage in:1  Active:46  Stage out:1  Finished successfully:1
Progress:  time: Sun, 10 Mar 2013 16:42:19 +0000  Selecting site:265  Stage in:3  Submitted:1  Active:42  Stage out:2  Finished successfully:4
Progress:  time: Sun, 10 Mar 2013 16:42:20 +0000  Selecting site:258  Stage in:6  Submitting:5  Active:23  Stage out:13  Finished successfully:12
Progress:  time: Sun, 10 Mar 2013 16:42:21 +0000  Selecting site:244  Stage in:24  Submitting:1  Active:20  Stage out:3  Finished successfully:25
Progress:  time: Sun, 10 Mar 2013 16:42:23 +0000  Selecting site:241  Stage in:25  Submitting:3  Stage out:19  Finished successfully:29
Progress:  time: Sun, 10 Mar 2013 16:42:24 +0000  Selecting site:221  Stage in:28  Submitting:19  Submitted:1  Finished successfully:48
Progress:  time: Sun, 10 Mar 2013 16:42:45 +0000  Selecting site:221  Stage in:48  Finished successfully:48
Progress:  time: Sun, 10 Mar 2013 16:42:54 +0000  Selecting site:221  Stage in:47  Active:1  Finished successfully:48
Progress:  time: Sun, 10 Mar 2013 16:43:00 +0000  Selecting site:221  Stage in:47  Stage out:1  Finished successfully:48
Progress:  time: Sun, 10 Mar 2013 16:43:02 +0000  Selecting site:221  Stage in:47  Finished successfully:49
Progress:  time: Sun, 10 Mar 2013 16:43:05 +0000  Selecting site:220  Stage in:47  Submitted:1  Finished successfully:49
Progress:  time: Sun, 10 Mar 2013 16:43:15 +0000  Selecting site:220  Stage in:48  Finished successfully:49
Progress:  time: Sun, 10 Mar 2013 16:43:45 +0000  Selecting site:220  Stage in:48  Finished successfully:49
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] -> BufferingChannel, null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] -> BufferingChannel}
Context: service-60859
Meta context: service-60519
Progress:  time: Sun, 10 Mar 2013 16:43:59 +0000  Selecting site:220  Stage in:47  Active:1  Finished successfully:49
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] -> BufferingChannel, null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] -> BufferingChannel}
Context: service-60663
Meta context: service-60519
Progress:  time: Sun, 10 Mar 2013 16:44:05 +0000  Selecting site:220  Stage in:47  Stage out:1  Finished successfully:49
Progress:  time: Sun, 10 Mar 2013 16:44:07 +0000  Selecting site:220  Stage in:47  Finished successfully:50
Channels: {null at https://192.5.86.107:50000=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], /C=US/O=JavaCoG/OU=AutoCA/CN=User at https://192.5.86.107:50000=MetaChannel[service-60519] -> BufferingChannel, null at id://u7b315f9a-13d552c3f68--7fff-u-362f30fc-13d552c3f50--7fffC=MetaChannel[https://192.5.86.107:50000] -> GSSCChannel-https://192.5.86.107:50000(2)[https://192.5.86.107:50000], null at id://u-362f30fc-13d552c3f50--8000-u7b315f9a-13d552c3f68--8000S=MetaChannel[service-60519] -> BufferingChannel}
Context: service-60081
Meta context: service-60519
Progress:  time: Sun, 10 Mar 2013 16:44:09 +0000  Selecting site:219  Stage in:45  Submitting:1  Active:2  Finished successfully:50
Execution failed:
	Exception in getlanduse:
    Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h02v11.rgb]
    Host: beagle
    Directory: modis02-20130310-1639-kyb8hca9/jobs/9/getlanduse-90fyse6l

Caused by:
	Shutting down worker
	getLandUse, modis02.swift, line 20
error null

real	4m27.007s
user	2m44.221s
sys	0m3.448s
+ mv /home/wilde/.swift/runs/current/run034.1362933583 /home/wilde/.swift/runs/completed
midway001$ 
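
To see where a run like this sits idle, the Progress lines can be scanned for the longest stretch in which the state counts do not change. This is only a rough sketch and not part of Swift: the function names are made up for illustration, the sample lines are copied from the run034 output above, and it assumes the timestamp format shown there.

```python
import re
from datetime import datetime

# Sample Progress lines copied from the run034 output above.
LOG = """\
Progress:  time: Sun, 10 Mar 2013 16:40:15 +0000  Selecting site:269  Stage in:48
Progress:  time: Sun, 10 Mar 2013 16:40:45 +0000  Selecting site:269  Stage in:48
Progress:  time: Sun, 10 Mar 2013 16:41:15 +0000  Selecting site:269  Stage in:48
Progress:  time: Sun, 10 Mar 2013 16:41:45 +0000  Selecting site:269  Stage in:48
Progress:  time: Sun, 10 Mar 2013 16:42:11 +0000  Selecting site:269  Stage in:47  Active:1
"""

# Timestamp, then the rest of the line holding "State name:count" pairs.
LINE = re.compile(r"^Progress:\s+time:\s+(?P<ts>.+? \+0000)(?P<rest>.*)$")
STATE = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")


def parse_progress(line):
    """Return (timestamp, {state: count}) or None for non-Progress lines."""
    m = LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%a, %d %b %Y %H:%M:%S %z")
    states = {name.strip(): int(n) for name, n in STATE.findall(m.group("rest"))}
    return ts, states


def longest_stall(lines):
    """Return (seconds, states) for the longest run of unchanged counts."""
    best = (0.0, None)
    start_ts = prev_states = None
    for line in lines:
        parsed = parse_progress(line)
        if parsed is None:
            continue
        ts, states = parsed
        if states != prev_states:
            # Counts changed, so a new candidate stall starts here.
            start_ts, prev_states = ts, states
        stalled = (ts - start_ts).total_seconds()
        if stalled > best[0]:
            best = (stalled, states)
    return best


seconds, states = longest_stall(LOG.splitlines())
print(seconds, states)
```

On the sample lines this reports a 90-second window in which all 48 jobs sat in "Stage in". Running it over a full log would show whether the stalls line up with the point where the run dies.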


----- Original Message -----
> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> To: "Michael Wilde" <wilde at mcs.anl.gov>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Sunday, March 10, 2013 1:36:26 AM
> Subject: Re: [Swift-devel] Cant get auto-coasters to run from midway to beagle
> 
> Please try now. I made some changes:
> 
> 1. start the service with "-l" so that things in your .profile (such
> as module load sun-java) get picked up. However, this also means
> that you should unset the X509_* variables, or the sshcl proxy
> forwarding will not work properly.
> 
> 2. I fixed a bug that caused an extra connection to the coaster
> service. Normally the service connects back to the client and both
> use that connection. However, due to some changes in the way
> credentials were set for jobs, and the fact that connections were
> looked up based on both hostname and credential, the coaster client
> would ignore the existing connection and create another one. The
> initial one would then time out at some point, causing the service
> to crash.
> 
> Mihael
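
The lookup problem described in item 2 can be illustrated with a toy cache. This is only a sketch under stated assumptions: the class and function names are hypothetical and are not the CoG or coaster API, and keying on the host alone is shown as one possible remedy, not necessarily the change that was actually made. The host URL and the "null" versus credential distinction mirror the "Channels:" dump in the run034 log.

```python
class ChannelCache:
    """Toy connection cache; hypothetical, not the CoG/coaster classes."""

    def __init__(self, key_fn):
        self.key_fn = key_fn      # how a (host, credential) pair becomes a key
        self.channels = {}
        self.opened = 0           # number of connections actually created

    def get(self, host, credential):
        key = self.key_fn(host, credential)
        if key not in self.channels:
            # Cache miss: a brand new connection is opened.
            self.opened += 1
            self.channels[key] = f"channel-{self.opened}"
        return self.channels[key]


def by_host_and_credential(host, credential):
    # Lookup keyed on both values, as in the behaviour described above.
    return (host, credential)


def by_host_only(host, credential):
    # Assumed alternative: ignore the credential when looking up a channel.
    return host


HOST = "https://192.5.86.107:50000"
CRED = "/C=US/O=JavaCoG/OU=AutoCA/CN=User"

buggy = ChannelCache(by_host_and_credential)
buggy.get(HOST, None)   # connection registered without a credential ("null")
buggy.get(HOST, CRED)   # job lookup carries a credential: miss, second connection

fixed = ChannelCache(by_host_only)
fixed.get(HOST, None)
fixed.get(HOST, CRED)   # same host: the existing connection is reused

print(buggy.opened, fixed.opened)
```

With the two-part key the second lookup misses and a second connection is opened, leaving the first one idle until it times out. With the host-only key both lookups share one connection.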
> 
> On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote:
> > An update on this provider staging related issue: reducing filesize
> > from 17MB to 600KB runs well.
> > 
> > So seems like some kind of flow control or buffer management
> > problem, possibly?
> > 
> > May need to take that problem offline - would be a perfect test
> > case for Yadu to develop a new stress test for.
> > 
> > - Mike
> > 
> > 
> > ----- Forwarded Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "David Kelly" <davidk at ci.uchicago.edu>
> > Sent: Saturday, March 9, 2013 5:21:49 PM
> > Subject: Re: runs for OSG talk
> > 
> > OK, much better: with 600K files (5x5 reduction or 25X smaller) it
> > works well, and fast (from midway to beagle!)
> > 
> > Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
> > 
> > RunID: 20130309-2319-5zq0jrfg
> > Progress:  time: Sat, 09 Mar 2013 23:19:45 +0000
> > Progress:  time: Sat, 09 Mar 2013 23:19:56 +0000  Selecting
> > site:269  Submitting:47  Submitted:1
> > Progress:  time: Sat, 09 Mar 2013 23:20:05 +0000  Selecting
> > site:269  Stage in:1  Submitted:47
> > Progress:  time: Sat, 09 Mar 2013 23:20:09 +0000  Selecting
> > site:269  Stage in:47  Active:1
> > Progress:  time: Sat, 09 Mar 2013 23:20:10 +0000  Selecting
> > site:269  Stage in:46  Active:1  Stage out:1
> > Progress:  time: Sat, 09 Mar 2013 23:20:11 +0000  Selecting
> > site:250  Stage in:19  Active:28  Stage out:1  Finished
> > successfully:19
> > Progress:  time: Sat, 09 Mar 2013 23:20:12 +0000  Selecting
> > site:229  Stage in:18  Submitting:21  Active:1  Stage out:7
> >  Finished successfully:41
> > Progress:  time: Sat, 09 Mar 2013 23:20:13 +0000  Selecting
> > site:220  Stage in:41  Submitting:1  Active:5  Stage out:1
> >  Finished successfully:49
> > Progress:  time: Sat, 09 Mar 2013 23:20:14 +0000  Selecting
> > site:220  Stage in:38  Active:1  Stage out:9  Finished
> > successfully:49
> > Progress:  time: Sat, 09 Mar 2013 23:20:15 +0000  Selecting
> > site:212  Stage in:30  Submitting:8  Stage out:9  Finished
> > successfully:58
> > Progress:  time: Sat, 09 Mar 2013 23:20:16 +0000  Selecting
> > site:203  Stage in:38  Submitting:8  Submitted:1  Finished
> > successfully:67
> > Progress:  time: Sat, 09 Mar 2013 23:20:18 +0000  Selecting
> > site:202  Stage in:19  Stage out:28  Finished successfully:68
> > Progress:  time: Sat, 09 Mar 2013 23:20:19 +0000  Selecting
> > site:172  Stage in:33  Submitting:2  Submitted:6  Active:5  Stage
> > out:2  Finished successfully:97
> > Progress:  time: Sat, 09 Mar 2013 23:20:20 +0000  Selecting
> > site:170  Stage in:31  Submitting:2  Stage out:14  Finished
> > successfully:100
> > Progress:  time: Sat, 09 Mar 2013 23:20:21 +0000  Selecting
> > site:162  Stage in:30  Submitting:10  Stage out:6  Finished
> > successfully:109
> > Progress:  time: Sat, 09 Mar 2013 23:20:22 +0000  Selecting
> > site:154  Stage in:39  Submitting:5  Submitted:3  Active:1
> >  Finished successfully:115
> > Progress:  time: Sat, 09 Mar 2013 23:20:23 +0000  Selecting
> > site:154  Stage in:21  Active:10  Stage out:16  Finished
> > successfully:116
> > Progress:  time: Sat, 09 Mar 2013 23:20:24 +0000  Selecting
> > site:126  Stage in:20  Submitting:25  Submitted:1  Stage out:2
> >  Finished successfully:143
> > Progress:  time: Sat, 09 Mar 2013 23:20:25 +0000  Selecting
> > site:124  Stage in:31  Active:2  Stage out:15  Finished
> > successfully:145
> > Progress:  time: Sat, 09 Mar 2013 23:20:26 +0000  Selecting
> > site:110  Stage in:30  Submitting:14  Stage out:3  Finished
> > successfully:160
> > Progress:  time: Sat, 09 Mar 2013 23:20:27 +0000  Selecting
> > site:106  Stage in:43  Submitting:1  Submitted:1  Active:1  Stage
> > out:2  Finished successfully:163
> > Progress:  time: Sat, 09 Mar 2013 23:20:28 +0000  Selecting
> > site:104  Stage in:20  Submitting:2  Active:7  Stage out:19
> >  Finished successfully:165
> > Progress:  time: Sat, 09 Mar 2013 23:20:29 +0000  Selecting site:78
> >  Stage in:29  Submitting:16  Submitted:1  Stage out:2  Finished
> > successfully:191
> > Progress:  time: Sat, 09 Mar 2013 23:20:31 +0000  Selecting site:76
> >  Stage in:30  Stage out:17  Finished successfully:194
> > Progress:  time: Sat, 09 Mar 2013 23:20:32 +0000  Selecting site:58
> >  Stage in:29  Submitting:18  Active:1  Finished successfully:211
> > Progress:  time: Sat, 09 Mar 2013 23:20:33 +0000  Selecting site:58
> >  Stage in:33  Active:3  Stage out:12  Finished successfully:211
> > Progress:  time: Sat, 09 Mar 2013 23:20:34 +0000  Selecting site:46
> >  Stage in:18  Submitting:11  Submitted:1  Active:2  Stage out:14
> >  Finished successfully:225
> > Progress:  time: Sat, 09 Mar 2013 23:20:35 +0000  Selecting site:30
> >  Stage in:29  Active:14  Stage out:3  Finished successfully:241
> > Progress:  time: Sat, 09 Mar 2013 23:20:36 +0000  Selecting site:28
> >  Stage in:28  Submitting:2  Stage out:17  Finished
> > successfully:242
> > Progress:  time: Sat, 09 Mar 2013 23:20:37 +0000  Selecting site:10
> >  Stage in:30  Submitting:17  Submitted:1  Finished
> > successfully:259
> > Progress:  time: Sat, 09 Mar 2013 23:20:38 +0000  Selecting site:10
> >  Stage in:35  Stage out:13  Finished successfully:259
> > Progress:  time: Sat, 09 Mar 2013 23:20:39 +0000  Stage in:21
> >  Submitting:6  Submitted:3  Stage out:15  Finished
> > successfully:272
> > Progress:  time: Sat, 09 Mar 2013 23:20:40 +0000  Stage in:10
> >  Active:5  Stage out:14  Finished successfully:288
> > Final status: Sat, 09 Mar 2013 23:20:41 +0000  Finished
> > successfully:317
> > 
> > real	0m58.953s
> > user	0m32.573s
> > sys	0m1.263s
> > + mv /home/wilde/.swift/runs/current/run029.1362871183
> > /home/wilde/.swift/runs/completed
> > midway001$
> > 
> > 
> > 
> > ----- Original Message -----
> > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > Sent: Saturday, March 9, 2013 5:12:59 PM
> > > Subject: Re: runs for OSG talk
> > > 
> > > 
> > > Yep - I had a version where the input files were in a very
> > > similar
> > > format (PGM, 1 byte per pixel). I'll add that back, but without
> > > the
> > > small PGM header in the files.
> > > 
> > > ----- Original Message -----
> > > 
> > > 
> > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > Sent: Saturday, March 9, 2013 5:04:43 PM
> > > Subject: Re: runs for OSG talk
> > > 
> > > I think we need to cut down the size of these files for a demo
> > > (although they are great for a stress test).
> > > 
> > > First, the RGB format by itself uses 3 bytes per pixel when it
> > > only
> > > needs one (for land use)
> > > 
> > > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4).
> > > 
> > > I tried that using simple convert statements, but it always seems
> > > to
> > > yield a file exactly double what it should be.
> > > 
> > > More on this later; was hoping to get things working "as is"
> > > first.
> > > 
> > > I assume you could get the perl code to work on
> > > one-byte-per-pixel
> > > instead of the default 3 for the convert rgb format?
> > > 
> > > - Mike
> > > 
> > > ----- Original Message -----
> > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > Sent: Saturday, March 9, 2013 4:36:30 PM
> > > > Subject: Re: runs for OSG talk
> > > > 
> > > > 
> > > > That would probably be a good idea for a new script, to show
> > > > how to
> > > > stage apps like that. For now I updated the scripts on lustre..
> > > > hopefully that helps.
> > > > 
> > > > ----- Original Message -----
> > > > 
> > > > 
> > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > Sent: Saturday, March 9, 2013 4:29:14 PM
> > > > Subject: Re: runs for OSG talk
> > > > 
> > > > OK, I see that it's trying to run getlanduse.sh from your
> > > > /lustre dir on beagle, which is different from the one I've got
> > > > checked out. It seems to get an error in a stderr redirect???
> > > > Let me see what I need to do to get the beagle side in sync.
> > > > 
> > > > Seems like since these are perl scripts, we should make the
> > > > app()
> > > > /bin/sh and send the script as data, perhaps?
> > > > 
> > > > - Mike
> > > > 
> > > > ----- Original Message -----
> > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > Sent: Saturday, March 9, 2013 4:19:31 PM
> > > > > Subject: Re: runs for OSG talk
> > > > > 
> > > > > OK, making progress. Now I dialed down the throttle and node
> > > > > counts
> > > > > to 48 jobs.
> > > > > 
> > > > > Now I get further, for ./demo and site=4 script=2:
> > > > > 
> > > > > RunID: 20130309-2214-1oi3rvea
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting
> > > > > site:269
> > > > > Submitting:47 Submitted:1
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting
> > > > > site:269
> > > > > Stage in:1 Submitted:47
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting
> > > > > site:269
> > > > > Stage in:25 Submitted:23
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting
> > > > > site:269
> > > > > Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting
> > > > > site:269
> > > > > Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting
> > > > > site:269
> > > > > Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting
> > > > > site:269
> > > > > Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting
> > > > > site:269
> > > > > Stage in:47 Active:1
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting
> > > > > site:269
> > > > > Stage in:36 Active:12
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting
> > > > > site:269
> > > > > Stage in:24 Active:24
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting
> > > > > site:269
> > > > > Stage in:24 Active:23 Stage out:1
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting
> > > > > site:269
> > > > > Stage in:14 Active:33 Stage out:1
> > > > > Execution failed:
> > > > > Exception in getlanduse:
> > > > > Arguments:
> > > > > [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb]
> > > > > Host: beagle
> > > > > Directory:
> > > > > modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l
> > > > > 
> > > > > Caused by:
> > > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh
> > > > > failed
> > > > > with an exit code of 1
> > > > > getLandUse, modis02.swift, line 20
> > > > > 
> > > > > real 2m31.463s
> > > > > user 1m33.238s
> > > > > sys 0m2.160s
> > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244
> > > > > /home/wilde/.swift/runs/completed
> > > > > midway001$
> > > > > 
> > > > > 
> > > > > ----- Original Message -----
> > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > Sent: Saturday, March 9, 2013 3:55:30 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > > 
> > > > > > 
> > > > > > ok, I'll take a look at that. The run dir I used was
> > > > > > /scratch/midway/davidkelly999/modis/run011
> > > > > > 
> > > > > > 
> > > > > > ----- Original Message -----
> > > > > > 
> > > > > > 
> > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > Sent: Saturday, March 9, 2013 3:52:28 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > > 
> > > > > > I just tried this, but it didn't work - same problem.
> > > > > > 
> > > > > > But if its working for you now, we must be close.
> > > > > > 
> > > > > > Not yet sure what the diff is...
> > > > > > 
> > > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021
> > > > > > 
> > > > > > - Mike
> > > > > > 
> > > > > > ----- Original Message -----
> > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > > 
> > > > > > > 
> > > > > > > Had to make sure I was using the IP address on eth4
> > > > > > > (128.135.112.71
> > > > > > > for midway-login1), not a local address or an infiniband
> > > > > > > address.
> > > > > > > 
> > > > > > > ----- Original Message -----
> > > > > > > 
> > > > > > > 
> > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > > 
> > > > > > > 
> > > > > > > I just got it working. I had to adjust for the
> > > > > > > differences in
> > > > > > > my
> > > > > > > username on Beagle/Midway, then I had to set
> > > > > > > GLOBUS_HOSTNAME
> > > > > > > on
> > > > > > > Midway to the IP address, rather than the full hostname
> > > > > > > 
> > > > > > > ----- Original Message -----
> > > > > > > 
> > > > > > > 
> > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > ----- Original Message -----
> > > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Is your username the same on beagle and midway?
> > > > > > > 
> > > > > > > Yes. And I verified that I can ssh to login4 on beagle
> > > > > > > from
> > > > > > > my
> > > > > > > midway
> > > > > > > session (as indeed the scp's of the proxy files seem to
> > > > > > > be
> > > > > > > working)
> > > > > > > 
> > > > > > > - Mike
> > > > > > > 
> > > > > > > > 
> > > > > > > > ----- Original Message -----
> > > > > > > > 
> > > > > > > > 
> > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > 
> > > > > > > > OK.
> > > > > > > > 
> > > > > > > > Ignore what I said about "problem finding java" - that's
> > > > > > > > code in the very long escaped shell command that gets
> > > > > > > > sent to the remote side. I don't *think* that's the
> > > > > > > > problem.
> > > > > > > > 
> > > > > > > > I also verified that beagle can connect to ports 50001
> > > > > > > > etc
> > > > > > > > on
> > > > > > > > swift.rcc, and that seems OK.
> > > > > > > > 
> > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu
> > > > > > > > on
> > > > > > > > the
> > > > > > > > midway
> > > > > > > > side. And the beagle side seems to be connecting there.
> > > > > > > > 
> > > > > > > > Im a bit confused about the timestamps I see for the
> > > > > > > > proxy
> > > > > > > > expiration
> > > > > > > > time, but am not yet suspicious of that (although it
> > > > > > > > seems
> > > > > > > > less
> > > > > > > > than
> > > > > > > > 5 hours past GMT... not sure.)
> > > > > > > > 
> > > > > > > > - Mike
> > > > > > > > 
> > > > > > > > ----- Original Message -----
> > > > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM
> > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I'm seeing the same error now.. looking into it
> > > > > > > > > 
> > > > > > > > > ----- Original Message -----
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM
> > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > 
> > > > > > > > > Looking deeper I see that the logs show problems with
> > > > > > > > > finding Java, I assume on beagle, and also the service
> > > > > > > > > ending (presumably the coaster service on the midway
> > > > > > > > > host).
> > > > > > > > > 
> > > > > > > > > I'll dig into these two.
> > > > > > > > > 
> > > > > > > > > I see that it scp's the proxies to beagle which I
> > > > > > > > > think
> > > > > > > > > answers
> > > > > > > > > my
> > > > > > > > > question about security.
> > > > > > > > > 
> > > > > > > > > - Mike
> > > > > > > > > 
> > > > > > > > > ----- Original Message -----
> > > > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > 
> > > > > > > > > > OK. Any thoughts about beagle?
> > > > > > > > > > 
> > > > > > > > > > I've been experimenting but still can't get it to
> > > > > > > > > > work; same error (can't connect to bootstrap port).
> > > > > > > > > > 
> > > > > > > > > > When you tried ssh-cl to beagle with automatic
> > > > > > > > > > coasters, what configuration (sites, env, etc.) did
> > > > > > > > > > you use?
> > > > > > > > > > 
> > > > > > > > > > I verified that beagle can connect back to the
> > > > > > > > > > midway
> > > > > > > > > > hosts
> > > > > > > > > > and
> > > > > > > > > > ports.
> > > > > > > > > > 
> > > > > > > > > > Do we need to specify security or create a proxy
> > > > > > > > > > etc?
> > > > > > > > > > 
> > > > > > > > > > Thanks,
> > > > > > > > > > 
> > > > > > > > > > - Mike
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > One way you can override/customize the default
> > > > > > > > > > > templates
> > > > > > > > > > > is
> > > > > > > > > > > to
> > > > > > > > > > > create
> > > > > > > > > > > them in $HOME/.swift/sites (I'm not sure if
> > > > > > > > > > > that's
> > > > > > > > > > > what
> > > > > > > > > > > you
> > > > > > > > > > > mean
> > > > > > > > > > > by
> > > > > > > > > > > a local sites dir or not). But you are right
> > > > > > > > > > > about
> > > > > > > > > > > Midway
> > > > > > > > > > > -
> > > > > > > > > > > I
> > > > > > > > > > > have
> > > > > > > > > > > noticed that when using modis it will sometimes
> > > > > > > > > > > get
> > > > > > > > > > > stuck
> > > > > > > > > > > when
> > > > > > > > > > > it
> > > > > > > > > > > goes to a queue that is busy. Ideally swift
> > > > > > > > > > > replication
> > > > > > > > > > > would
> > > > > > > > > > > be
> > > > > > > > > > > able to help better handle that, but I haven't
> > > > > > > > > > > had
> > > > > > > > > > > much
> > > > > > > > > > > luck
> > > > > > > > > > > with
> > > > > > > > > > > that yet. Another way around this may be to add
> > > > > > > > > > > this
> > > > > > > > > > > to
> > > > > > > > > > > the
> > > > > > > > > > > template:
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > <profile namespace="globus"
> > > > > > > > > > > key="slurm.exclusive">false</profile>
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > The swift.log issue was never fixed. It went to
> > > > > > > > > > > swift-devel
> > > > > > > > > > > for
> > > > > > > > > > > discussion but was never fixed. I think it is
> > > > > > > > > > > relatively
> > > > > > > > > > > simple
> > > > > > > > > > > though.. probably worth fixing before release.
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > 
> > > > > > > > > > > OK, sounds good re the trip plan. Feel free to
> > > > > > > > > > > stay
> > > > > > > > > > > Tue
> > > > > > > > > > > night
> > > > > > > > > > > to
> > > > > > > > > > > avoid a 4hr drive after a long day.
> > > > > > > > > > > 
> > > > > > > > > > > I'm trying the modis demo.
> > > > > > > > > > > 
> > > > > > > > > > > I tried to create a local sites/ dir so I can
> > > > > > > > > > > modify the sites templates; that's not working
> > > > > > > > > > > for me either yet.
> > > > > > > > > > > 
> > > > > > > > > > > For midway, we need to force jobs to westmere or
> > > > > > > > > > > sandyb (but not both) and ensure 1-node jobs,
> > > > > > > > > > > because either queue can get filled and not
> > > > > > > > > > > yield an idle node for a long time. Maybe we
> > > > > > > > > > > need to fiddle jobsPerNode to get at least 1
> > > > > > > > > > > core when the system is busy and *pretend* that
> > > > > > > > > > > it's a node.
> > > > > > > > > > > 
> > > > > > > > > > > So to get a response I tried beagle-ssh; that
> > > > > > > > > > > isn't working because the template sites file is
> > > > > > > > > > > wrong in swift 0.94 rc4.
> > > > > > > > > > > 
> > > > > > > > > > > I also see that swift.log is still getting
> > > > > > > > > > > produced - I thought we eliminated that. Did it
> > > > > > > > > > > come back due to a problem with that fix?
> > > > > > > > > > > 
> > > > > > > > > > > I'll keep hacking; suggestions welcome.
> > > > > > > > > > > 
> > > > > > > > > > > - Mike
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM
> > > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Hi Mike,
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > Looking more closely at the agenda, I think the
> > > > > > > > > > > > most
> > > > > > > > > > > > interesting/useful talks will be on Tuesday.
> > > > > > > > > > > > Monday
> > > > > > > > > > > > I'll
> > > > > > > > > > > > come
> > > > > > > > > > > > to
> > > > > > > > > > > > Argonne to work on any loose ends and put the
> > > > > > > > > > > > finishing
> > > > > > > > > > > > touches
> > > > > > > > > > > > on
> > > > > > > > > > > > any slides/runs/scripts, then drive to
> > > > > > > > > > > > Indianapolis
> > > > > > > > > > > > on
> > > > > > > > > > > > Monday
> > > > > > > > > > > > afternoon/evening. I have a hotel booked for
> > > > > > > > > > > > Monday
> > > > > > > > > > > > night.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > I'll do some runs using the routes we talked
> > > > > > > > > > > > about.
> > > > > > > > > > > > I'm
> > > > > > > > > > > > pretty
> > > > > > > > > > > > sure
> > > > > > > > > > > > I
> > > > > > > > > > > > have working configurations for everything we
> > > > > > > > > > > > talked
> > > > > > > > > > > > about,
> > > > > > > > > > > > so
> > > > > > > > > > > > I
> > > > > > > > > > > > think it's really just a matter of plugging in
> > > > > > > > > > > > the
> > > > > > > > > > > > apps.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > David
> > > > > > > > > > > > 
> > > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM
> > > > > > > > > > > > Subject: runs for OSG talk
> > > > > > > > > > > > 
> > > > > > > > > > > > Hi David,
> > > > > > > > > > > > 
> > > > > > > > > > > > I just wanted to let you know that I'm looking into the
> > > > > > > > > > > > run options now. I'm hoping to try a few... Will see how
> > > > > > > > > > > > much help I need. Have you decided on a driving time and
> > > > > > > > > > > > made hotel arrangements?
> > > > > > > > > > > > 
> > > > > > > > > > > > I would feel free to stay for whatever portion of the OSG
> > > > > > > > > > > > meeting you feel is of value. The only thing I ask is that
> > > > > > > > > > > > for Wed and Thu you stay available online for user-support
> > > > > > > > > > > > or other assistance needs that come up here. And that you
> > > > > > > > > > > > engage with people that can help us develop the Swift user
> > > > > > > > > > > > community and reliable OSG usage. Rob, Marco, Lincoln, and
> > > > > > > > > > > > Suchandra would be good to hang out with and they can
> > > > > > > > > > > > introduce you to good contacts.
> > > > > > > > > > > > 
> > > > > > > > > > > > Of course we will cover your expenses via a UChicago
> > > > > > > > > > > > travel expense report.
> > > > > > > > > > > > 
> > > > > > > > > > > > We'll be starting a project with a tiny bit of additional
> > > > > > > > > > > > ExTENCI funds to make Swift do smarter data management on
> > > > > > > > > > > > OSG sites (and in general) so anything you learn about OSG
> > > > > > > > > > > > storage elements/services/tools will be valuable for that
> > > > > > > > > > > > (srmcp, lcgcp, etc).
> > > > > > > > > > > > 
> > > > > > > > > > > > Between now and your talk, let's just focus on the talk,
> > > > > > > > > > > > OK? I'm hoping we have slides frozen by Monday.
> > > > > > > > > > > > 
> > > > > > > > > > > > While I fiddle, if you could do catsn or other
> > > > > > > > > > > > hello-world-like tests to cover the "routes" we discussed,
> > > > > > > > > > > > that would pave the way for plugging in the real app
> > > > > > > > > > > > examples.
> > > > > > > > > > > > 
> > > > > > > > > > > > Sound good? Let me know of any concerns (other than the
> > > > > > > > > > > > fact that this is a tad rushed ;)
> > > > > > > > > > > > 
> > > > > > > > > > > > Thanks and regards,
> > > > > > > > > > > > 
> > > > > > > > > > > > - Mike
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > --
> > > > > > > > > > > > Michael Wilde
> > > > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > > > Argonne National Laboratory
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel