[Swift-devel] Configuring Swift to access MosaStore

Michael Wilde wilde at mcs.anl.gov
Mon Mar 5 08:17:42 CST 2012


was: Re: [Swift-devel] coasters-hosts.pl script

Jon, can you create a demo script that shows how to configure a Swift run to use MosaStore? The following approach may work:

- Assume MosaStore will be mounted as /mosa to all workers

- Simulate this with a localhost run, using /tmp/mosa, then do the same with *1* worker and N jobs per node (e.g. 4 on BG/P, 8 on PADS, 2 on Beagle).

- Set CDM direct mode for all paths starting with [/tmp]/mosa. You might need to work through some of the issues with CDM direct, where accesses need to match both /tmp/mosa and file:///tmp/mosa (I *think*).

- Map some temporary output-to-input files to /tmp/mosa; create a multi-level "catsncats"-like workflow to exercise it; the recent ParameterSweep example, perhaps extended to do N levels of fan-in/fan-out and pass-N might be a good test.

- See if you can get _concurrent placed on /tmp/mosa.
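
As a starting point for the CDM step above, here is a minimal sketch in Python that generates a CDM policy file whose DIRECT rule matches a path given either as /tmp/mosa/... or as file:///tmp/mosa/.... The helper names (cdm_direct_pattern, cdm_policy) are hypothetical, and the "rule <regex> <POLICY> <directory>" layout is an assumption based on the CDM section of the Swift user guide; whether CDM needs the file:// alternative at all is exactly the open question noted above, so please verify against your Swift version before relying on it.

```python
import re


def cdm_direct_pattern(mount: str) -> str:
    """Regex for any path under `mount`, written as a plain path or a file:// URL."""
    return r"(file://)?" + re.escape(mount) + r"/.*"


def cdm_policy(mount: str) -> str:
    """Text of a CDM policy file: DIRECT under `mount`, DEFAULT for everything else."""
    # Assumed rule layout: rule <regex> <POLICY> [<directory>]
    lines = [
        f"rule {cdm_direct_pattern(mount)} DIRECT {mount}",
        "rule .* DEFAULT",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Use /tmp/mosa for the localhost simulation, /mosa on the real workers.
    print(cdm_policy("/tmp/mosa"), end="")
```

The output could be saved to a file (say fs.data, a hypothetical name) and passed to Swift through its CDM file option. Changing the argument from /tmp/mosa to /mosa is the only difference between the localhost simulation and the real mount.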

I think some of these tests would be a great test case for Swift/Turbine as well.

You can do this in stages; the simple test of mapping CDM-direct files to /tmp/mosa should give Emalayan an initial test case to run once Mosa is ready on the BG/P.

- Mike


----- Original Message -----
> From: "Matei Ripeanu" <matei.ripeanu at gmail.com>
> To: mosastore at googlegroups.com, "Jonathan Monette" <jonmon at mcs.anl.gov>, "Justin M Wozniak" <wozniak at mcs.anl.gov>
> Cc: swift-devel at ci.uchicago.edu, emalayan at ece.ubc.ca
> Sent: Friday, March 2, 2012 6:29:17 PM
> Subject: Re: [Swift-devel] coasters-hosts.pl script
> Indeed this is good news! Thank you.
> 
> 
> 
> Our next task, I think, will be to figure out how to configure Swift
> so that the headnode (where Swift runs) will not require any access to
> intermediate storage (MosaStore). Only the worker nodes will have
> access to intermediate storage. This is to go around the one way
> headnode-worker node connectivity issue.
> 
> 
> 
> Any guidance on how to get this configuration would be much
> appreciated.
> 
> 
> 
> Thank you again,
> 
> 
> 
> -Matei
> 
> 
> 
> 
> 
> From: mosastore at googlegroups.com [mailto:mosastore at googlegroups.com]
> On Behalf Of Emalayan Vairavanathan
> Sent: March-02-12 2:32 PM
> To: Jonathan Monette; Justin M Wozniak
> Cc: swift-devel at ci.uchicago.edu Devel; emalayan at ece.ubc.ca ;
> MosaStore
> Subject: Re: [Swift-devel] coasters-hosts.pl script
> 
> 
> 
> 
> 
> Thank you Jon and Justin.
> 
> 
> 
> 
> 
> This is great news. I will get back to you if I have questions.
> 
> 
> 
> 
> 
> Regards
> 
> 
> Emalayan
> 
> 
> 
> 
> 
> 
> 
> 
> 
> From: Jonathan Monette < jonmon at mcs.anl.gov >
> To: Justin M Wozniak < wozniak at mcs.anl.gov >
> Cc: " swift-devel at ci.uchicago.edu Devel " <
> swift-devel at ci.uchicago.edu >; emalayan at ece.ubc.ca
> Sent: Friday, 2 March 2012 2:21 PM
> Subject: Re: [Swift-devel] coasters-hosts.pl script
> 
> 
> Emalayan,
> We believe we have fixed the issue. You can copy the new
> coasters-hosts.pl script from
> ~jonmon/surveyor/worker-init-test/coasters-hosts.pl
> 
> This script reads the worker logs located in the logs directory. The
> steps to run are as follows:
> start-coaster-service
> <wait for workers to start>
> ./coasters-hosts.pl logs/worker-*.log > worker-hosts.txt
> 
> You MUST clean out the worker logs before you start a new
> coaster service to make sure the script searches the right worker log
> files. This may not be ideal at the moment but this will help get you
> started. If you have any other questions feel free to ask. We will
> need to update the mosaswift site with the new information; we will do
> this soon.
> 
> On Mar 2, 2012, at 11:26 AM, Jonathan Monette wrote:
> 
> > Can we match this line: 2012/03/02 17:16:04.712 INFO - Running on
> > node 172.18.1.83 from the worker log,
> > instead of this line: 2012-03-02 17:21:25,214+0000 DEBUG Cpu worker
> > started: block=2012.0302.171344.704 host=172.18.1.83 id=0 from the
> > cps log?
> >
> > They both provide the same ip addresses. And the worker log always
> > has that ip address before the cps log does.
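
To illustrate the matching proposed here, the following is a minimal Python sketch (not the actual coasters-hosts.pl, whose contents are not shown in this thread) that pulls worker IP addresses from either log format quoted above: the "Running on node <ip>" line in the worker log or the "Cpu worker started: ... host=<ip>" line in the cps log. The function name extract_hosts and the one-address-per-line output are assumptions made for the example.

```python
import fileinput
import re

# Patterns follow the two sample log lines quoted in this thread.
IP = r"(\d{1,3}(?:\.\d{1,3}){3})"
WORKER_LINE = re.compile(r"Running on\s+node\s+" + IP)
CPS_LINE = re.compile(r"Cpu worker started:.*\bhost=" + IP)


def extract_hosts(lines):
    """Return unique worker IPs, in first-seen order, from either log format."""
    hosts = []
    for line in lines:
        match = WORKER_LINE.search(line) or CPS_LINE.search(line)
        if match and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts


if __name__ == "__main__":
    # Reads the files named on the command line (stdin if none are given).
    for host in extract_hosts(fileinput.input()):
        print(host)
```

Saved under a hypothetical name such as hosts_sketch.py, it could be run as "python hosts_sketch.py logs/worker-*.log > worker-hosts.txt", mirroring the coasters-hosts.pl invocation given earlier in the thread. Accepting both formats means the same sketch works on the worker logs and on the cps log.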
> >
> > On Mar 2, 2012, at 11:15 AM, Jonathan Monette wrote:
> >
> >> That fix still did not work. I had moved it to the same spot. It is
> >> still waiting for the worker-init.pl script to finish before the ip
> >> addresses are printed to the cps log. Those ip addresses are what
> >> is needed by the coasters-hosts.pl script to finish. If I create an
> >> empty file for the coasters-hosts.pl script to read, then the work
> >> continues and the ip addresses show up in the cps log.
> >>
> >> Why is log4j waiting to add those lines to the cps log after the
> >> worker-init.pl script is finished?
> >>
> >> On Mar 2, 2012, at 11:05 AM, Jonathan Monette wrote:
> >>
> >>> Thanks, in my copy I thought I had moved the reconnect to before
> >>> the init-cmd and it still wasn't working. I will test with your
> >>> change. I just verified that it was indeed waiting for the
> >>> worker-init.pl script to finish. I created an empty file for the
> >>> script to read and it finished connecting and the ip addresses I
> >>> needed were added to the cps log. I will also be testing your fix.
> >>>
> >>> On Mar 2, 2012, at 11:01 AM, Justin M Wozniak wrote:
> >>>
> >>>>
> >>>> Yes- I must have tested this with a different log file. I just
> >>>> checked in and installed in ~wozniak/Public a fix for this that
> >>>> launches WORKER_INIT_CMD after the reconnect(). I am a little
> >>>> worried about time outs but it works so far. I will continue
> >>>> testing...
> >>>> Justin
> >>>>
> >>>> On Thu, 1 Mar 2012, Jonathan Monette wrote:
> >>>>
> >>>>> Justin,
> >>>>> So I have been trying to help Emalayan get the host list file
> >>>>> for the worker-init.pl script. It seems the cps log file is not
> >>>>> providing the ip addresses for the coasters-hosts.pl script. I
> >>>>> thought this was maybe because we did not have the correct log4j
> >>>>> setting set but we have the Coaster service Cpu set to DEBUG. So
> >>>>> for some reason the workers are not connecting to the service.
> >>>>> When I comment out the export WORKER_ENVIRONMENT="…" line in the
> >>>>> coaster-service.conf file I see the workers connect and the cps
> >>>>> log file shows their ip addresses. However when setting this
> >>>>> line it seems they are not connecting.
> >>>>>
> >>>>> Emalayan thought there might be some sort of circular dependency
> >>>>> going with the host-list file and the worker. The worker
> >>>>> requires the host-list file so that it can run the
> >>>>> worker-init.pl script and then connect but the host-list file
> >>>>> cannot be generated because the workers cannot connect. I
> >>>>> noticed in your swift-test directory the cps files did have the
> >>>>> ip addresses set and coasters-hosts.pl found the ip addresses
> >>>>> and reported them. Did you try that test with setting the
> >>>>> WORKER_ENVIRONMENT variable in the coaster-service.conf file?
> >>>>> Any idea what may be happening? The job is running when looking
> >>>>> under cqstat.
> >>>>>
> >>>>> A side note: At the mosaswift site, your example talks about
> >>>>> running the coasters-hosts.pl on the cps log but the example you
> >>>>> provide runs it on logs/coasters.log. This may need to be
> >>>>> changed. Also, you should provide the log4j setting that is required
> >>>>> to generate the Cpu line with the worker ip address just to
> >>>>> clarify that this line should be set for this script to work.
> >>>>>
> >>>>> For reference, this line:
> >>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=DEBUG
> >>>>
> >>>> --
> >>>> Justin M Wozniak
> >>>
> >>> _______________________________________________
> >>> Swift-devel mailing list
> >>> Swift-devel at ci.uchicago.edu
> >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >>
> >
> 
> 
> 
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



