[Swift-devel] Cant get auto-coasters to run from midway to beagle

Mihael Hategan hategan at mcs.anl.gov
Sun Mar 10 01:36:26 CST 2013


Please try now. I made some changes:

1. The service is now started with "-l" so that things in your .profile
(such as module load sun-java) are picked up. However, this also means
that you should unset the X509_* environment variables, or the sshcl
proxy forwarding will not work properly.

2. I fixed a bug that caused an extra connection to the coaster service.
Normally the service connects back to the client and both use that
connection. However, due to some changes in the way credentials were set
for jobs, and the fact that connections were looked up based on both
hostname and credential, the coaster client would ignore the existing
connection and create another one. The initial one would then time out at
some point, causing the service to crash.
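
To make the lookup problem in item 2 concrete, here is a minimal Python
sketch. It is not the coaster code: the class, the function names and the
credential strings are all hypothetical, and keying on hostname alone is
only one possible way to avoid the duplicate, not necessarily what the
actual fix does. It only shows how a cache keyed on both hostname and
credential can miss an existing connection once the credential attached
to a job differs.

```python
# Hypothetical illustration only; none of these names come from the
# coaster client or service code.

class ConnectionCache:
    """Caches connections under a caller-supplied key."""

    def __init__(self):
        self._connections = {}
        self.opened = 0  # number of new connections created so far

    def get_or_create(self, key):
        # Reuse a connection only if the exact key is already present.
        if key not in self._connections:
            self.opened += 1
            self._connections[key] = f"connection-{self.opened}"
        return self._connections[key]


def key_by_host_and_credential(host, credential):
    # Lookup style described above: hostname AND credential must match.
    return (host, credential)


def key_by_host(host, credential):
    # One possible alternative (an assumption): ignore the credential.
    return host


def simulate(make_key):
    cache = ConnectionCache()
    # The service connects back; the connection is registered under the
    # credential in effect at that moment.
    first = cache.get_or_create(make_key("beagle", "cred-at-startup"))
    # A later job carries a credential that was set differently.
    second = cache.get_or_create(make_key("beagle", "cred-set-for-job"))
    return cache.opened, first == second


if __name__ == "__main__":
    print(simulate(key_by_host_and_credential))  # (2, False)
    print(simulate(key_by_host))                 # (1, True)
```

With the two-part key the second lookup misses and a second connection
is opened, leaving the first one idle until it times out.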

Mihael

On Sat, 2013-03-09 at 17:49 -0600, Michael Wilde wrote:
> An update on this provider staging related issue: after reducing the file size from 17MB to 600KB, it runs well.
> 
> So it seems like some kind of flow control or buffer management problem, possibly?
> 
> May need to take that problem offline - would be a perfect test case for Yadu to develop a new stress test for.
> 
> - Mike
> 
> 
> ----- Forwarded Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "David Kelly" <davidk at ci.uchicago.edu>
> Sent: Saturday, March 9, 2013 5:21:49 PM
> Subject: Re: runs for OSG talk
> 
> OK, much better: with 600K files (a 5x5 reduction, or 25X smaller) it works well, and fast (from midway to beagle!)
> 
> Swift 0.94RC4 swift-r6284 cog-r3607 (cog modified locally)
> 
> RunID: 20130309-2319-5zq0jrfg
> Progress:  time: Sat, 09 Mar 2013 23:19:45 +0000
> Progress:  time: Sat, 09 Mar 2013 23:19:56 +0000  Selecting site:269  Submitting:47  Submitted:1
> Progress:  time: Sat, 09 Mar 2013 23:20:05 +0000  Selecting site:269  Stage in:1  Submitted:47
> Progress:  time: Sat, 09 Mar 2013 23:20:09 +0000  Selecting site:269  Stage in:47  Active:1
> Progress:  time: Sat, 09 Mar 2013 23:20:10 +0000  Selecting site:269  Stage in:46  Active:1  Stage out:1
> Progress:  time: Sat, 09 Mar 2013 23:20:11 +0000  Selecting site:250  Stage in:19  Active:28  Stage out:1  Finished successfully:19
> Progress:  time: Sat, 09 Mar 2013 23:20:12 +0000  Selecting site:229  Stage in:18  Submitting:21  Active:1  Stage out:7  Finished successfully:41
> Progress:  time: Sat, 09 Mar 2013 23:20:13 +0000  Selecting site:220  Stage in:41  Submitting:1  Active:5  Stage out:1  Finished successfully:49
> Progress:  time: Sat, 09 Mar 2013 23:20:14 +0000  Selecting site:220  Stage in:38  Active:1  Stage out:9  Finished successfully:49
> Progress:  time: Sat, 09 Mar 2013 23:20:15 +0000  Selecting site:212  Stage in:30  Submitting:8  Stage out:9  Finished successfully:58
> Progress:  time: Sat, 09 Mar 2013 23:20:16 +0000  Selecting site:203  Stage in:38  Submitting:8  Submitted:1  Finished successfully:67
> Progress:  time: Sat, 09 Mar 2013 23:20:18 +0000  Selecting site:202  Stage in:19  Stage out:28  Finished successfully:68
> Progress:  time: Sat, 09 Mar 2013 23:20:19 +0000  Selecting site:172  Stage in:33  Submitting:2  Submitted:6  Active:5  Stage out:2  Finished successfully:97
> Progress:  time: Sat, 09 Mar 2013 23:20:20 +0000  Selecting site:170  Stage in:31  Submitting:2  Stage out:14  Finished successfully:100
> Progress:  time: Sat, 09 Mar 2013 23:20:21 +0000  Selecting site:162  Stage in:30  Submitting:10  Stage out:6  Finished successfully:109
> Progress:  time: Sat, 09 Mar 2013 23:20:22 +0000  Selecting site:154  Stage in:39  Submitting:5  Submitted:3  Active:1  Finished successfully:115
> Progress:  time: Sat, 09 Mar 2013 23:20:23 +0000  Selecting site:154  Stage in:21  Active:10  Stage out:16  Finished successfully:116
> Progress:  time: Sat, 09 Mar 2013 23:20:24 +0000  Selecting site:126  Stage in:20  Submitting:25  Submitted:1  Stage out:2  Finished successfully:143
> Progress:  time: Sat, 09 Mar 2013 23:20:25 +0000  Selecting site:124  Stage in:31  Active:2  Stage out:15  Finished successfully:145
> Progress:  time: Sat, 09 Mar 2013 23:20:26 +0000  Selecting site:110  Stage in:30  Submitting:14  Stage out:3  Finished successfully:160
> Progress:  time: Sat, 09 Mar 2013 23:20:27 +0000  Selecting site:106  Stage in:43  Submitting:1  Submitted:1  Active:1  Stage out:2  Finished successfully:163
> Progress:  time: Sat, 09 Mar 2013 23:20:28 +0000  Selecting site:104  Stage in:20  Submitting:2  Active:7  Stage out:19  Finished successfully:165
> Progress:  time: Sat, 09 Mar 2013 23:20:29 +0000  Selecting site:78  Stage in:29  Submitting:16  Submitted:1  Stage out:2  Finished successfully:191
> Progress:  time: Sat, 09 Mar 2013 23:20:31 +0000  Selecting site:76  Stage in:30  Stage out:17  Finished successfully:194
> Progress:  time: Sat, 09 Mar 2013 23:20:32 +0000  Selecting site:58  Stage in:29  Submitting:18  Active:1  Finished successfully:211
> Progress:  time: Sat, 09 Mar 2013 23:20:33 +0000  Selecting site:58  Stage in:33  Active:3  Stage out:12  Finished successfully:211
> Progress:  time: Sat, 09 Mar 2013 23:20:34 +0000  Selecting site:46  Stage in:18  Submitting:11  Submitted:1  Active:2  Stage out:14  Finished successfully:225
> Progress:  time: Sat, 09 Mar 2013 23:20:35 +0000  Selecting site:30  Stage in:29  Active:14  Stage out:3  Finished successfully:241
> Progress:  time: Sat, 09 Mar 2013 23:20:36 +0000  Selecting site:28  Stage in:28  Submitting:2  Stage out:17  Finished successfully:242
> Progress:  time: Sat, 09 Mar 2013 23:20:37 +0000  Selecting site:10  Stage in:30  Submitting:17  Submitted:1  Finished successfully:259
> Progress:  time: Sat, 09 Mar 2013 23:20:38 +0000  Selecting site:10  Stage in:35  Stage out:13  Finished successfully:259
> Progress:  time: Sat, 09 Mar 2013 23:20:39 +0000  Stage in:21  Submitting:6  Submitted:3  Stage out:15  Finished successfully:272
> Progress:  time: Sat, 09 Mar 2013 23:20:40 +0000  Stage in:10  Active:5  Stage out:14  Finished successfully:288
> Final status: Sat, 09 Mar 2013 23:20:41 +0000  Finished successfully:317
> 
> real	0m58.953s
> user	0m32.573s
> sys	0m1.263s
> + mv /home/wilde/.swift/runs/current/run029.1362871183 /home/wilde/.swift/runs/completed
> midway001$ 
> 
> 
> 
> ----- Original Message -----
> > From: "David Kelly" <davidk at ci.uchicago.edu>
> > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > Sent: Saturday, March 9, 2013 5:12:59 PM
> > Subject: Re: runs for OSG talk
> > 
> > 
> > Yep - I had a version where the input files were in a very similar
> > format (PGM, 1 byte per pixel). I'll add that back, but without the
> > small PGM header in the files.
> > 
> > ----- Original Message -----
> > 
> > 
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "David Kelly" <davidk at ci.uchicago.edu>
> > Sent: Saturday, March 9, 2013 5:04:43 PM
> > Subject: Re: runs for OSG talk
> > 
> > I think we need to cut down the size of these files for a demo
> > (although they are great for a stress test).
> > 
> > First, the RGB format by itself uses 3 bytes per pixel when it only
> > needs one (for land use)
> > 
> > Second, we should cut down by a factor of 9 (3x3) or 16 (4x4).
> > 
> > I tried that using simple convert statements, but it always seems to
> > yield a file exactly double what it should be.
> > 
> > More on this later; was hoping to get things working "as is" first.
> > 
> > I assume you could get the perl code to work on one-byte-per-pixel
> > instead of the default 3 for the convert rgb format?
> > 
> > - Mike
> > 
> > ----- Original Message -----
> > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > Sent: Saturday, March 9, 2013 4:36:30 PM
> > > Subject: Re: runs for OSG talk
> > > 
> > > 
> > > That would probably be a good idea for a new script, to show how to
> > > stage apps like that. For now I updated the scripts on lustre..
> > > hopefully that helps.
> > > 
> > > ----- Original Message -----
> > > 
> > > 
> > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > Sent: Saturday, March 9, 2013 4:29:14 PM
> > > Subject: Re: runs for OSG talk
> > > 
> > > > OK, I see that it's trying to run getlanduse.sh from your /lustre
> > > > dir on beagle, which is different from the one I've got checked out.
> > > > It seems to get an error in a stderr redirect??? Let me see what I
> > > > need to do to get the beagle side in sync.
> > > 
> > > Seems like since these are perl scripts, we should make the app()
> > > /bin/sh and send the script as data, perhaps?
> > > 
> > > - Mike
> > > 
> > > ----- Original Message -----
> > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > Sent: Saturday, March 9, 2013 4:19:31 PM
> > > > Subject: Re: runs for OSG talk
> > > > 
> > > > > OK, making progress. Now I dialed down the throttle and node
> > > > > counts to 48 jobs.
> > > > 
> > > > Now I get further, for ./demo and site=4 script=2:
> > > > 
> > > > RunID: 20130309-2214-1oi3rvea
> > > > Progress: time: Sat, 09 Mar 2013 22:14:06 +0000
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:17 +0000 Selecting site:269 Submitting:47 Submitted:1
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:22 +0000 Selecting site:269 Stage in:1 Submitted:47
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:28 +0000 Selecting site:269 Stage in:25 Submitted:23
> > > > > Progress: time: Sat, 09 Mar 2013 22:14:36 +0000 Selecting site:269 Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:15:06 +0000 Selecting site:269 Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:15:36 +0000 Selecting site:269 Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:06 +0000 Selecting site:269 Stage in:48
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:26 +0000 Selecting site:269 Stage in:47 Active:1
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:27 +0000 Selecting site:269 Stage in:36 Active:12
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:29 +0000 Selecting site:269 Stage in:24 Active:24
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:34 +0000 Selecting site:269 Stage in:24 Active:23 Stage out:1
> > > > > Progress: time: Sat, 09 Mar 2013 22:16:35 +0000 Selecting site:269 Stage in:14 Active:33 Stage out:1
> > > > > Execution failed:
> > > > > Exception in getlanduse:
> > > > > Arguments: [home/wilde/osgdemo/modis/svn/data/modis/2002/h08v04.rgb]
> > > > > Host: beagle
> > > > > Directory: modis02-20130309-2214-1oi3rvea/jobs/k/getlanduse-ko5qjd6l
> > > > > 
> > > > > Caused by:
> > > > > Application /lustre/beagle/davidk/modis/bin/getlanduse.sh failed with an exit code of 1
> > > > > getLandUse, modis02.swift, line 20
> > > > > 
> > > > > real 2m31.463s
> > > > > user 1m33.238s
> > > > > sys 0m2.160s
> > > > > + mv /home/wilde/.swift/runs/current/run024.1362867244 /home/wilde/.swift/runs/completed
> > > > midway001$
> > > > 
> > > > 
> > > > ----- Original Message -----
> > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > Sent: Saturday, March 9, 2013 3:55:30 PM
> > > > > Subject: Re: runs for OSG talk
> > > > > 
> > > > > 
> > > > > ok, I'll take a look at that. The run dir I used was
> > > > > /scratch/midway/davidkelly999/modis/run011
> > > > > 
> > > > > 
> > > > > ----- Original Message -----
> > > > > 
> > > > > 
> > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > Sent: Saturday, March 9, 2013 3:52:28 PM
> > > > > Subject: Re: runs for OSG talk
> > > > > 
> > > > > > I just tried this, but it didn't work - same problem.
> > > > > > 
> > > > > > But if it's working for you now, we must be close.
> > > > > 
> > > > > Not yet sure what the diff is...
> > > > > 
> > > > > My run dir is /home/wilde/osgdemo/modis/svn/run021
> > > > > 
> > > > > - Mike
> > > > > 
> > > > > ----- Original Message -----
> > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > Sent: Saturday, March 9, 2013 3:46:13 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > > 
> > > > > > 
> > > > > > > Had to make sure I was using the IP address on eth4
> > > > > > > (128.135.112.71 for midway-login1), not a local address or an
> > > > > > > infiniband address.
> > > > > > 
> > > > > > ----- Original Message -----
> > > > > > 
> > > > > > 
> > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > Sent: Saturday, March 9, 2013 3:43:51 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > > 
> > > > > > 
> > > > > > > I just got it working. I had to adjust for the differences in my
> > > > > > > username on Beagle/Midway, then I had to set GLOBUS_HOSTNAME on
> > > > > > > Midway to the IP address, rather than the full hostname.
> > > > > > 
> > > > > > ----- Original Message -----
> > > > > > 
> > > > > > 
> > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > Sent: Saturday, March 9, 2013 3:40:03 PM
> > > > > > Subject: Re: runs for OSG talk
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > ----- Original Message -----
> > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > Sent: Saturday, March 9, 2013 3:34:58 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > > 
> > > > > > > 
> > > > > > > Is your username the same on beagle and midway?
> > > > > > 
> > > > > > > Yes. And I verified that I can ssh to login4 on beagle from my
> > > > > > > midway session (as indeed the scp's of the proxy files seem to be
> > > > > > > working)
> > > > > > 
> > > > > > - Mike
> > > > > > 
> > > > > > > 
> > > > > > > ----- Original Message -----
> > > > > > > 
> > > > > > > 
> > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > Sent: Saturday, March 9, 2013 3:34:28 PM
> > > > > > > Subject: Re: runs for OSG talk
> > > > > > > 
> > > > > > > OK.
> > > > > > > 
> > > > > > > > Ignore what I said about "problem finding java" - that's code in
> > > > > > > > the very long escaped shell command that gets sent to the remote
> > > > > > > > side. I don't *think* that's the problem.
> > > > > > > > 
> > > > > > > > I also verified that beagle can connect to ports 50001 etc on
> > > > > > > > swift.rcc, and that seems OK.
> > > > > > > > 
> > > > > > > > I exported GLOBUS_HOSTNAME=midway001.rcc.uchicago.edu on the
> > > > > > > > midway side. And the beagle side seems to be connecting there.
> > > > > > > > 
> > > > > > > > I'm a bit confused about the timestamps I see for the proxy
> > > > > > > > expiration time, but am not yet suspicious of that (although it
> > > > > > > > seems less than 5 hours past GMT... not sure.)
> > > > > > > 
> > > > > > > - Mike
> > > > > > > 
> > > > > > > ----- Original Message -----
> > > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > Sent: Saturday, March 9, 2013 3:26:32 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > 
> > > > > > > > 
> > > > > > > > I'm seeing the same error now.. looking into it
> > > > > > > > 
> > > > > > > > ----- Original Message -----
> > > > > > > > 
> > > > > > > > 
> > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > Sent: Saturday, March 9, 2013 3:21:30 PM
> > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > 
> > > > > > > > > Looking deeper I see that the logs show problems with finding
> > > > > > > > > Java, I assume on beagle, and also service ending (presumably
> > > > > > > > > coaster service on midway host).
> > > > > > > > > 
> > > > > > > > > I'll dig into these two.
> > > > > > > > > 
> > > > > > > > > I see that it scp's the proxies to beagle which I think answers
> > > > > > > > > my question about security.
> > > > > > > > 
> > > > > > > > - Mike
> > > > > > > > 
> > > > > > > > ----- Original Message -----
> > > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > Sent: Saturday, March 9, 2013 3:15:01 PM
> > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > 
> > > > > > > > > OK. Any thoughts about beagle?
> > > > > > > > > 
> > > > > > > > > > I've been experimenting but still can't get it to work, same
> > > > > > > > > > error (can't connect to bootstrap port)
> > > > > > > > > > 
> > > > > > > > > > When you tried ssh-cl to beagle with automatic coasters, what
> > > > > > > > > > configuration (sites env etc) did you use?
> > > > > > > > > > 
> > > > > > > > > > I verified that beagle can connect back to the midway hosts
> > > > > > > > > > and ports.
> > > > > > > > > 
> > > > > > > > > Do we need to specify security or create a proxy etc?
> > > > > > > > > 
> > > > > > > > > Thanks,
> > > > > > > > > 
> > > > > > > > > - Mike
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > ----- Original Message -----
> > > > > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > Sent: Saturday, March 9, 2013 3:08:58 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > One way you can override/customize the default templates is
> > > > > > > > > > > to create them in $HOME/.swift/sites (I'm not sure if that's
> > > > > > > > > > > what you mean by a local sites dir or not). But you are right
> > > > > > > > > > > about Midway - I have noticed that when using modis it will
> > > > > > > > > > > sometimes get stuck when it goes to a queue that is busy.
> > > > > > > > > > > Ideally swift replication would be able to help better handle
> > > > > > > > > > > that, but I haven't had much luck with that yet. Another way
> > > > > > > > > > > around this may be to add this to the template:
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > <profile namespace="globus" key="slurm.exclusive">false</profile>
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > The swift.log issue was never fixed. It went to swift-devel
> > > > > > > > > > > for discussion but was never fixed. I think it is relatively
> > > > > > > > > > > simple though.. probably worth fixing before release.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > Sent: Saturday, March 9, 2013 1:38:47 PM
> > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > 
> > > > > > > > > > > OK, sounds good re the trip plan. Feel free to stay Tue night
> > > > > > > > > > > to avoid a 4hr drive after a long day.
> > > > > > > > > > > 
> > > > > > > > > > > I'm trying the modis demo.
> > > > > > > > > > > 
> > > > > > > > > > > I tried to create a local sites/ dir so I can modify the sites
> > > > > > > > > > > templates; that's not working for me either yet.
> > > > > > > > > > > 
> > > > > > > > > > > For midway, we need to force to westmere or sandyb (but not
> > > > > > > > > > > both) and ensure 1-node jobs, because either queue can get
> > > > > > > > > > > filled and not yield an idle node for a long time. Maybe we
> > > > > > > > > > > need to fiddle jobsPerNode to get at least 1 core when the
> > > > > > > > > > > system is busy and *pretend* that it's a node.
> > > > > > > > > > > 
> > > > > > > > > > > So to get a response I tried beagle-ssh; that isn't working
> > > > > > > > > > > because the template sites file is wrong in swift 0.94 rc4.
> > > > > > > > > > > 
> > > > > > > > > > > I also see that swift.log is still getting produced - I
> > > > > > > > > > > thought we eliminated that. Did it come back due to a problem
> > > > > > > > > > > with that fix?
> > > > > > > > > > > 
> > > > > > > > > > > I'll keep hacking; suggestions welcome.
> > > > > > > > > > 
> > > > > > > > > > - Mike
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > From: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > > Sent: Saturday, March 9, 2013 12:20:00 PM
> > > > > > > > > > > Subject: Re: runs for OSG talk
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > Hi Mike,
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > > Looking more closely at the agenda, I think the most
> > > > > > > > > > > > interesting/useful talks will be on Tuesday. Monday I'll
> > > > > > > > > > > > come to Argonne to work on any loose ends and put the
> > > > > > > > > > > > finishing touches on any slides/runs/scripts, then drive to
> > > > > > > > > > > > Indianapolis on Monday afternoon/evening. I have a hotel
> > > > > > > > > > > > booked for Monday night.
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > I'll do some runs using the routes we talked about. I'm
> > > > > > > > > > > > pretty sure I have working configurations for everything we
> > > > > > > > > > > > talked about, so I think it's really just a matter of
> > > > > > > > > > > > plugging in the apps.
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > David
> > > > > > > > > > > 
> > > > > > > > > > > ----- Original Message -----
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > > > > > > > > > > To: "David Kelly" <davidk at ci.uchicago.edu>
> > > > > > > > > > > Sent: Saturday, March 9, 2013 11:03:15 AM
> > > > > > > > > > > Subject: runs for OSG talk
> > > > > > > > > > > 
> > > > > > > > > > > Hi David,
> > > > > > > > > > > 
> > > > > > > > > > > > I just wanted to let you know that I'm looking into the run
> > > > > > > > > > > > options now. I'm hoping to try a few... Will see how much
> > > > > > > > > > > > help I need. Have you decided on a driving time and made
> > > > > > > > > > > > hotel arrangements?
> > > > > > > > > > > > 
> > > > > > > > > > > > I would feel free to stay for whatever portion of the OSG
> > > > > > > > > > > > meeting you feel is of value. The only thing I ask is that
> > > > > > > > > > > > for Wed and Thu you stay available online for user-support
> > > > > > > > > > > > or other assistance needs that come up here. And that you
> > > > > > > > > > > > engage with people that can help us develop the Swift user
> > > > > > > > > > > > community and reliable OSG usage. Rob, Marco, Lincoln, and
> > > > > > > > > > > > Suchandra would be good to hang out with and they can
> > > > > > > > > > > > introduce you to good contacts.
> > > > > > > > > > > > 
> > > > > > > > > > > > Of course we will cover your expenses via a UChicago travel
> > > > > > > > > > > > expense report.
> > > > > > > > > > > > 
> > > > > > > > > > > > We'll be starting a project with a tiny bit of additional
> > > > > > > > > > > > ExTENCI funds to make Swift do smarter data management on
> > > > > > > > > > > > OSG sites (and in general) so anything you learn about OSG
> > > > > > > > > > > > storage elements/services/tools will be valuable for that
> > > > > > > > > > > > (srmcp, lcgcp, etc).
> > > > > > > > > > > > 
> > > > > > > > > > > > Between now and your talk, let's just focus on the talk, OK?
> > > > > > > > > > > > I'm hoping we have slides frozen by Monday.
> > > > > > > > > > > > 
> > > > > > > > > > > > While I fiddle, if you could do catsn or other
> > > > > > > > > > > > hello-world-like tests to cover the "routes" we discussed,
> > > > > > > > > > > > that would pave the way for plugging in the real app
> > > > > > > > > > > > examples.
> > > > > > > > > > > > 
> > > > > > > > > > > > Sound good? Let me know of any concerns (other than the fact
> > > > > > > > > > > > that this is a tad rushed ;)
> > > > > > > > > > > 
> > > > > > > > > > > Thanks and regards,
> > > > > > > > > > > 
> > > > > > > > > > > - Mike
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > --
> > > > > > > > > > > Michael Wilde
> > > > > > > > > > > Computation Institute, University of Chicago
> > > > > > > > > > > Mathematics and Computer Science Division
> > > > > > > > > > > Argonne National Laboratory
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > 
> > > > > 
> > > > 
> > > 
> > > 
> > 
> > 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel




