[Swift-devel] Configuring Swift to access MosaStore

Jonathan Monette jonmon at mcs.anl.gov
Mon Mar 5 17:07:08 CST 2012


If you could provide the setup you were using, that would be great. I can fill in anything missing and do my tests to verify.

On Mar 5, 2012, at 13:34, Emalayan Vairavanathan <svemalayan at yahoo.com> wrote:

> Thank you Jon. 
> 
> Yesterday I successfully ran Mosa (on our cluster) in cdm-direct mode with the help of the swift-user manual and the scripts available in /cog/modules/swift/tests/cdm/absolute.
> 
> It would be useful if you can develop a simple test case. I can double check with my test case.
> 
> Thank you
> Emalayan
> 
> From: Jonathan Monette <jonmon at mcs.anl.gov>
> To: Michael Wilde <wilde at mcs.anl.gov> 
> Cc: "emalayan at ece.ubc.ca" <emalayan at ece.ubc.ca>; "matei at ece.ubc.ca" <matei at ece.ubc.ca>; "swift-devel at ci.uchicago.edu" <swift-devel at ci.uchicago.edu>; "mosastore at googlegroups.com" <mosastore at googlegroups.com>; Jonathan Monette <jon.monette at gmail.com> 
> Sent: Monday, 5 March 2012 7:14 AM
> Subject: Re: [Swift-devel] Configuring Swift to access MosaStore
> 
> Yea. I will get demo scripts together for the mosa tests. 
> 
> On Mar 5, 2012, at 8:17, Michael Wilde <wilde at mcs.anl.gov> wrote:
> 
> > was: Re: [Swift-devel] coasters-hosts.pl script
> > 
> > Jon, can you create a demo script that shows how to configure a Swift run to use MosaStore? The following approach may work:
> > 
> > - Assume MosaStore will be mounted as /mosa to all workers
> > 
> > - Simulate this with a localhost run, using /tmp/mosa, then do same with *1* worker, N jobs per node (eg 4 on BG/P, 8 on PADS, 2 on Beagle).
> > 
> > - Set CDM direct mode for all paths starting with [/tmp]/mosa. You might need to work through some of the issues with CDM direct where accesses need to match both /tmp/mosa and file:///tmp/mosa (I *think*)
> > 
> > - Map some temporary output-to-input files to /tmp/mosa; create a multi-level "catsncats"-like workflow to exercise it; the recent ParameterSweep example, perhaps extended to do N levels of fan-in/fan-out and pass-N might be a good test.
> > 
> > - see if you can get _concurrent to get placed on /tmp/mosa
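
The point above about CDM direct accesses needing to match both /tmp/mosa and file:///tmp/mosa can be illustrated with a small sketch. This is not Swift's CDM code. The function name is_mosa_path and its default prefix are hypothetical, and the sketch only shows one way a single pattern could accept both spellings of the same location.

```python
import re


def is_mosa_path(path, prefix="/tmp/mosa"):
    """Return True if path lies under prefix.

    Accepts either a plain absolute path or a file:// URL with an
    empty host part (file:///tmp/mosa/...). Hypothetical helper, not
    part of Swift.
    """
    # Optional "file://" scheme, then the prefix, then either the end of
    # the string or a "/" so that "/tmp/mosaic" is not accepted.
    pattern = re.compile(r"^(?:file://)?" + re.escape(prefix) + r"(?:/|$)")
    return pattern.match(path) is not None
```

For the BG/P case the same check could be called with prefix="/mosa", assuming MosaStore is mounted there on all workers.
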
> > 
> > I think some of these tests would be a great test case for Swift/Turbine as well.
> > 
> > You can do this in stages; the simple test of mapping CDM-direct files to /tmp/mosa should give Emalayan an initial test case to run once Mosa is ready on the BG/P.
> > 
> > - Mike
> > 
> > 
> > ----- Original Message -----
> >> From: "Matei Ripeanu" <matei.ripeanu at gmail.com>
> >> To: mosastore at googlegroups.com, "Jonathan Monette" <jonmon at mcs.anl.gov>, "Justin M Wozniak" <wozniak at mcs.anl.gov>
> >> Cc: swift-devel at ci.uchicago.edu, emalayan at ece.ubc.ca
> >> Sent: Friday, March 2, 2012 6:29:17 PM
> >> Subject: Re: [Swift-devel] coasters-hosts.pl script
> >> Indeed this is good news! Thank you.
> >> 
> >> Our next task, I think, will be to figure out how to configure Swift
> >> so that the headnode (where Swift runs) will not require any access to
> >> intermediate storage (MosaStore). Only the worker nodes will have
> >> access to intermediate storage. This is to go around the one way
> >> headnode-worker node connectivity issue.
> >> 
> >> Any guidance on how to get this configuration would be much
> >> appreciated.
> >> 
> >> Thank you again,
> >> 
> >> -Matei
> >> 
> >> From: mosastore at googlegroups.com [mailto:mosastore at googlegroups.com]
> >> On Behalf Of Emalayan Vairavanathan
> >> Sent: March-02-12 2:32 PM
> >> To: Jonathan Monette; Justin M Wozniak
> >> Cc: swift-devel at ci.uchicago.edu Devel; emalayan at ece.ubc.ca ;
> >> MosaStore
> >> Subject: Re: [Swift-devel] coasters-hosts.pl script
> >> 
> >> Thank you Jon and Justin.
> >> 
> >> This is great news. I will get back to you if I have questions.
> >> 
> >> Regards
> >> Emalayan
> >> 
> >> From: Jonathan Monette < jonmon at mcs.anl.gov >
> >> To: Justin M Wozniak < wozniak at mcs.anl.gov >
> >> Cc: " swift-devel at ci.uchicago.edu Devel " <
> >> swift-devel at ci.uchicago.edu >; emalayan at ece.ubc.ca
> >> Sent: Friday, 2 March 2012 2:21 PM
> >> Subject: Re: [Swift-devel] coasters-hosts.pl script
> >> 
> >> 
> >> Emalayan,
> >> We believe we have fixed the issue. You can copy the new
> >> coasters-hosts.pl script from
> >> ~jonmon/surveyor/worker-init-test/coasters-hosts.pl
> >> 
> >> This script reads the worker logs located in the logs directory. The
> >> steps to run are as follows:
> >> start-coaster-service
> >> <wait for workers to start>
> >> ./coasters-hosts.pl logs/worker-*.log > worker-hosts.txt
> >> 
> >> You MUST clean out the worker logs before you start a new
> >> coaster service to make sure the script searches the right worker log
> >> files. This may not be ideal at the moment but this will help get you
> >> started. If you have any other questions feel free to ask. We will
> >> need to update the mosaswift site with the new information; we will do
> >> this soon.
> >> 
> >> On Mar 2, 2012, at 11:26 AM, Jonathan Monette wrote:
> >> 
> >>> Can we match this line: 2012/03/02 17:16:04.712 INFO - Running on
> >>> node 172.18.1.83 from the worker log,
> >>> instead of this line: 2012-03-02 17:21:25,214+0000 DEBUG Cpu worker
> >>> started: block=2012.0302.171344.704 host=172.18.1.83 id=0 from the
> >>> cps log?
> >>> 
> >>> They both provide the same ip addresses. And the worker log always
> >>> has that ip address before the cps log does.
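
As a rough illustration of matching either of the two log lines quoted above, here is a minimal Python sketch. It is not the coasters-hosts.pl script. The names WORKER_RE, CPS_RE and extract_hosts are hypothetical, and the patterns assume the log lines look exactly like the two samples in this message.

```python
import re

# Dotted-quad IPv4 address, e.g. 172.18.1.83 (no range checking).
IP = r"(\d{1,3}(?:\.\d{1,3}){3})"

# Worker log form: "... INFO - Running on node 172.18.1.83"
WORKER_RE = re.compile(r"Running on\s+node\s+" + IP)

# cps log form:
# "... DEBUG Cpu worker started: block=... host=172.18.1.83 id=0"
CPS_RE = re.compile(r"Cpu worker\s+started:.*?\bhost=" + IP)


def extract_hosts(lines):
    """Return the unique worker IPs found in lines, in first-seen order."""
    hosts = []
    for line in lines:
        match = WORKER_RE.search(line) or CPS_RE.search(line)
        if match and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts
```

Feeding it the lines of logs/worker-*.log and printing one address per line would give a host list in the spirit of worker-hosts.txt, assuming one "Running on node" line per worker.
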
> >>> 
> >>> On Mar 2, 2012, at 11:15 AM, Jonathan Monette wrote:
> >>> 
> >>>> That fix still did not work. I had moved it to the same spot. It is
> >>>> still waiting for the worker-init.pl script to finish before the ip
> >>>> addresses are printed to the cps log. Those ip addresses are what
> >>>> is needed by the coasters-hosts.pl script to finish. If I create an
> >>>> empty file for the coasters-hosts.pl script to read, then the work
> >>>> continues and the ip addresses show up in the cps log.
> >>>> 
> >>>> Why is log4j waiting to add those lines to the cps log after the
> >>>> worker-init.pl script is finished?
> >>>> 
> >>>> On Mar 2, 2012, at 11:05 AM, Jonathan Monette wrote:
> >>>> 
> >>>>> Thanks, in my copy I thought I had moved the reconnect to before
> >>>>> the init-cmd and it still wasn't working. I will test with your
> >>>>> change. I just verified that it was indeed waiting for the
> >>>>> worker-init.pl script to finish. I created an empty file for the
> >>>>> script to read and it finished connecting and the ip addresses I
> >>>>> needed were added to the cps log. I will also be testing your fix.
> >>>>> 
> >>>>> On Mar 2, 2012, at 11:01 AM, Justin M Wozniak wrote:
> >>>>> 
> >>>>>> 
> >>>>>> Yes- I must have tested this with a different log file. I just
> >>>>>> checked in and installed in ~wozniak/Public a fix for this that
> >>>>>> launches WORKER_INIT_CMD after the reconnect(). I am a little
> >>>>>> worried about time outs but it works so far. I will continue
> >>>>>> testing...
> >>>>>> Justin
> >>>>>> 
> >>>>>> On Thu, 1 Mar 2012, Jonathan Monette wrote:
> >>>>>> 
> >>>>>>> Justin,
> >>>>>>> So I have been trying to help Emalayan get the host list file
> >>>>>>> for the worker-init.pl script. It seems the cps log file is not
> >>>>>>> providing the ip addresses for the coasters-hosts.pl script. I
> >>>>>>> thought this was maybe because we did not have the correct log4j
> >>>>>>> setting set but we have the Coaster service Cpu set to DEBUG. So
> >>>>>>> for some reason the workers are not connecting to the service.
> >>>>>>> When I comment out the export WORKER_ENVIRONMENT="…" line in the
> >>>>>>> coaster-service.conf file I see the workers connect and the cps
> >>>>>>> log file shows their ip addresses. However when setting this
> >>>>>>> line it seems they are not connecting.
> >>>>>>> 
> >>>>>>> Emalayan thought there might be some sort of circular dependency
> >>>>>>> going on between the host-list file and the worker. The worker
> >>>>>>> requires the host-list file so that it can run the
> >>>>>>> worker-init.pl script and then connect but the host-list file
> >>>>>>> cannot be generated because the workers cannot connect. I
> >>>>>>> noticed in your swift-test directory the cps files did have the
> >>>>>>> ip addresses set and coasters-hosts.pl found the ip addresses
> >>>>>>> and reported them. Did you try that test with setting the
> >>>>>>> WORKER_ENVIRONMENT variable in the coaster-service.conf file?
> >>>>>>> Any idea what may be happening? The job is running when looking
> >>>>>>> under cqstat.
> >>>>>>> 
> >>>>>>> A side note: At the mosaswift site, your example talks about
> >>>>>>> running the coasters-hosts.pl on the cps log but the example you
> >>>>>>> provide runs it on logs/coasters.log. This may need to be
> >>>>>>> changed. Also, it should provide the log4j setting that is required
> >>>>>>> to generate the Cpu line with the worker ip address just to
> >>>>>>> clarify that this line should be set for this script to work.
> >>>>>>> 
> >>>>>>> For reference, this line:
> >>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=DEBUG
> >>>>>> 
> >>>>>> --
> >>>>>> Justin M Wozniak
> >>>>> 
> >>>>> _______________________________________________
> >>>>> Swift-devel mailing list
> >>>>> Swift-devel at ci.uchicago.edu
> >>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >>>> 
> >>> 
> >> 
> > 
> > -- 
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> > 