[Swift-devel] Configuring Swift to access MosaStore

Jonathan Monette jonmon at mcs.anl.gov
Mon Mar 5 09:14:03 CST 2012


Yeah. I will put together demo scripts for the Mosa tests. 

On Mar 5, 2012, at 8:17, Michael Wilde <wilde at mcs.anl.gov> wrote:

> was: Re: [Swift-devel] coasters-hosts.pl script
> 
> Jon, can you create a demo script that shows how to configure a Swift run to use MosaStore? The following approach may work:
> 
> - Assume MosaStore will be mounted as /mosa on all workers.
> 
> - Simulate this with a localhost run using /tmp/mosa, then do the same with *1* worker and N jobs per node (e.g., 4 on BG/P, 8 on PADS, 2 on Beagle).
> 
> - Set CDM direct mode for all paths starting with [/tmp]/mosa. You might need to work through some of the issues with CDM direct, where accesses need to match both /tmp/mosa and file:///tmp/mosa (I *think*).
> 
> - Map some temporary files (outputs that are passed as inputs to later stages) to /tmp/mosa, and create a multi-level "catsncats"-like workflow to exercise them. The recent ParameterSweep example, perhaps extended to do N levels of fan-in/fan-out and pass-N, might be a good test.
> 
> - See if you can get _concurrent placed on /tmp/mosa.
> 
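The /tmp/mosa versus file:///tmp/mosa matching issue in the CDM-direct step comes down to a path test that treats both spellings as the same location. This is a minimal Python sketch of that matching logic only, assuming both forms should count as Mosa paths; it is not Swift's CDM code or policy-file syntax, and `MOSA_PREFIXES`, `normalize` and `is_mosa_direct` are hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical prefixes: /tmp/mosa is the localhost stand-in from the thread,
# /mosa the assumed mount point on the workers.
MOSA_PREFIXES = ("/tmp/mosa", "/mosa")


def normalize(path_or_url: str) -> str:
    """Reduce 'file:///tmp/mosa/x' and '/tmp/mosa/x' to the same plain path."""
    parsed = urlparse(path_or_url)
    if parsed.scheme == "file":
        return parsed.path
    return path_or_url


def is_mosa_direct(path_or_url: str) -> bool:
    """True if the normalized path is a Mosa prefix or lies underneath one."""
    path = normalize(path_or_url)
    return any(path == p or path.startswith(p + "/") for p in MOSA_PREFIXES)


for candidate in ["/tmp/mosa/out1.txt", "file:///tmp/mosa/out1.txt",
                  "/tmp/mosaic/out1.txt", "file:///home/user/in.txt"]:
    print(candidate, is_mosa_direct(candidate))
```

The separator check matters: a plain `startswith("/tmp/mosa")` would also accept /tmp/mosaic.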
> I think some of these tests would be a great test case for Swift/Turbine as well.
> 
> You can do this in stages; the simple test of mapping CDM-direct files to /tmp/mosa should give Emalayan an initial test case to run once Mosa is ready on the BG/P.
> 
> - Mike
> 
> 
> ----- Original Message -----
>> From: "Matei Ripeanu" <matei.ripeanu at gmail.com>
>> To: mosastore at googlegroups.com, "Jonathan Monette" <jonmon at mcs.anl.gov>, "Justin M Wozniak" <wozniak at mcs.anl.gov>
>> Cc: swift-devel at ci.uchicago.edu, emalayan at ece.ubc.ca
>> Sent: Friday, March 2, 2012 6:29:17 PM
>> Subject: Re: [Swift-devel] coasters-hosts.pl script
>> Indeed this is good news! Thank you.
>> 
>> Our next task, I think, will be to figure out how to configure Swift
>> so that the headnode (where Swift runs) will not require any access to
>> intermediate storage (MosaStore). Only the worker nodes will have
>> access to intermediate storage. This is to work around the one-way
>> headnode-to-worker-node connectivity issue.
>> 
>> Any guidance on how to get this configuration would be much
>> appreciated.
>> 
>> Thank you again,
>> 
>> -Matei
>> 
>> From: mosastore at googlegroups.com [mailto:mosastore at googlegroups.com]
>> On Behalf Of Emalayan Vairavanathan
>> Sent: March-02-12 2:32 PM
>> To: Jonathan Monette; Justin M Wozniak
>> Cc: swift-devel at ci.uchicago.edu Devel; emalayan at ece.ubc.ca;
>> MosaStore
>> Subject: Re: [Swift-devel] coasters-hosts.pl script
>> 
>> Thank you Jon and Justin.
>> 
>> This is great news. I will get back to you if I have questions.
>> 
>> Regards
>> Emalayan
>> 
>> From: Jonathan Monette < jonmon at mcs.anl.gov >
>> To: Justin M Wozniak < wozniak at mcs.anl.gov >
>> Cc: " swift-devel at ci.uchicago.edu Devel " <
>> swift-devel at ci.uchicago.edu >; emalayan at ece.ubc.ca
>> Sent: Friday, 2 March 2012 2:21 PM
>> Subject: Re: [Swift-devel] coasters-hosts.pl script
>> 
>> 
>> Emalayan,
>> We believe we have fixed the issue. You can copy the new
>> coasters-hosts.pl script from
>> ~jonmon/surveyor/worker-init-test/coasters-hosts.pl
>> 
>> This script reads the worker logs located in the logs directory. The
>> steps to run are as follows:
>> start-coaster-service
>> <wait for workers to start>
>> ./coasters-hosts.pl logs/worker-*.log > worker-hosts.txt
>> 
>> You MUST clean out the worker logs before you start a new coaster
>> service to make sure the script searches the right worker log
>> files. This may not be ideal at the moment, but it will help get you
>> started. If you have any other questions, feel free to ask. We will
>> need to update the mosaswift site with the new information; we will
>> do this soon.
>> 
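The clean-then-start procedure described above could be scripted. A minimal Python sketch, assuming the workers write `logs/worker-*.log` as in the coasters-hosts.pl command line; `clean_worker_logs` and `wait_for_worker_logs` are hypothetical helpers, and start-coaster-service is still run by hand between the two calls.

```python
import glob
import os
import time
from typing import List


def clean_worker_logs(log_dir: str = "logs") -> int:
    """Delete stale worker-*.log files before a new coaster service starts.

    Returns the number of files removed.
    """
    stale = glob.glob(os.path.join(log_dir, "worker-*.log"))
    for path in stale:
        os.remove(path)
    return len(stale)


def wait_for_worker_logs(log_dir: str, expected: int,
                         timeout_s: float = 300.0,
                         poll_s: float = 5.0) -> List[str]:
    """Poll until at least `expected` worker logs exist or the timeout expires.

    Returns the sorted log paths found, which may be fewer than `expected`
    if the timeout was reached.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        found = sorted(glob.glob(os.path.join(log_dir, "worker-*.log")))
        if len(found) >= expected or time.monotonic() >= deadline:
            return found
        time.sleep(poll_s)
```

Typical use: call `clean_worker_logs("logs")`, run start-coaster-service, call `wait_for_worker_logs("logs", expected=N)`, then run `./coasters-hosts.pl logs/worker-*.log > worker-hosts.txt`. A log file existing does not guarantee that its node line has been written yet, so this only approximates "wait for workers to start".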
>> On Mar 2, 2012, at 11:26 AM, Jonathan Monette wrote:
>> 
>>> Can we match this line: 2012/03/02 17:16:04.712 INFO - Running on
>>> node 172.18.1.83 from the worker log,
>>> instead of this line: 2012-03-02 17:21:25,214+0000 DEBUG Cpu worker
>>> started: block=2012.0302.171344.704 host=172.18.1.83 id=0 from the
>>> cps log?
>>> 
>>> They both provide the same IP addresses, and the worker log always
>>> has that IP address before the cps log does.
>>> 
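Both log lines quoted above carry the same IP, so either could be matched. A minimal Python sketch of extracting unique worker IPs from each format; the regular expressions are modeled only on the two sample lines in this message, the real coasters-hosts.pl may match differently, and `WORKER_RE`, `CPS_RE` and `extract_hosts` are hypothetical names.

```python
import re
from typing import Iterable, List

# Worker log sample: "... INFO - Running on node 172.18.1.83"
WORKER_RE = re.compile(r"Running on node (\d{1,3}(?:\.\d{1,3}){3})")
# cps log sample: "... DEBUG Cpu worker started: block=... host=172.18.1.83 id=0"
CPS_RE = re.compile(r"Cpu worker started:.*\bhost=(\d{1,3}(?:\.\d{1,3}){3})")


def extract_hosts(lines: Iterable[str], pattern: re.Pattern) -> List[str]:
    """Return the unique IPs captured by `pattern`, in first-seen order."""
    hosts: List[str] = []
    for line in lines:
        match = pattern.search(line)
        if match and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts


worker_log = ["2012/03/02 17:16:04.712 INFO - Running on node 172.18.1.83"]
cps_log = ["2012-03-02 17:21:25,214+0000 DEBUG Cpu worker started: "
           "block=2012.0302.171344.704 host=172.18.1.83 id=0"]
print(extract_hosts(worker_log, WORKER_RE))
print(extract_hosts(cps_log, CPS_RE))
```

Matching the worker log has the advantage noted above: the line appears there before the worker ever shows up in the cps log.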
>>> On Mar 2, 2012, at 11:15 AM, Jonathan Monette wrote:
>>> 
>>>> That fix still did not work; I had already moved it to the same
>>>> spot. It is still waiting for the worker-init.pl script to finish
>>>> before the IP addresses are printed to the cps log, and those IP
>>>> addresses are what the coasters-hosts.pl script needs in order to
>>>> finish. If I create an empty file for the coasters-hosts.pl script
>>>> to read, then the work continues and the IP addresses show up in the
>>>> cps log.
>>>> 
>>>> Why is log4j waiting until the worker-init.pl script has finished
>>>> before adding those lines to the cps log?
>>>> 
>>>> On Mar 2, 2012, at 11:05 AM, Jonathan Monette wrote:
>>>> 
>>>>> Thanks. In my copy I thought I had moved the reconnect to before
>>>>> the init-cmd, and it still wasn't working. I will test with your
>>>>> change. I just verified that it was indeed waiting for the
>>>>> worker-init.pl script to finish: I created an empty file for the
>>>>> script to read, and it then finished connecting and the IP addresses
>>>>> I needed were added to the cps log.
>>>>> 
>>>>> On Mar 2, 2012, at 11:01 AM, Justin M Wozniak wrote:
>>>>> 
>>>>>> 
>>>>>> Yes, I must have tested this with a different log file. I just
>>>>>> checked in and installed in ~wozniak/Public a fix for this that
>>>>>> launches WORKER_INIT_CMD after the reconnect(). I am a little
>>>>>> worried about timeouts, but it works so far. I will continue
>>>>>> testing...
>>>>>> Justin
>>>>>> 
>>>>>> On Thu, 1 Mar 2012, Jonathan Monette wrote:
>>>>>> 
>>>>>>> Justin,
>>>>>>> I have been trying to help Emalayan get the host-list file
>>>>>>> for the worker-init.pl script. It seems the cps log file is not
>>>>>>> providing the IP addresses for the coasters-hosts.pl script. I
>>>>>>> thought this might be because we did not have the correct log4j
>>>>>>> setting, but we do have the Coaster service Cpu logger set to DEBUG.
>>>>>>> So for some reason the workers are not connecting to the service.
>>>>>>> When I comment out the export WORKER_ENVIRONMENT="…" line in the
>>>>>>> coaster-service.conf file, I see the workers connect and the cps
>>>>>>> log file shows their IP addresses. However, when this line is set,
>>>>>>> they do not seem to connect.
>>>>>>> 
>>>>>>> Emalayan thought there might be some sort of circular dependency
>>>>>>> between the host-list file and the worker: the worker
>>>>>>> requires the host-list file so that it can run the
>>>>>>> worker-init.pl script and then connect, but the host-list file
>>>>>>> cannot be generated because the workers cannot connect. I
>>>>>>> noticed that in your swift-test directory the cps files did have the
>>>>>>> IP addresses set, and coasters-hosts.pl found the IP addresses
>>>>>>> and reported them. Did you run that test with the
>>>>>>> WORKER_ENVIRONMENT variable set in the coaster-service.conf file?
>>>>>>> Any idea what may be happening? The job shows as running under
>>>>>>> cqstat.
>>>>>>> 
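The circular dependency described here, and the reordering in Justin's fix (launching WORKER_INIT_CMD after reconnect()), can be modeled as a dependency graph: a start-up order exists only if the graph has no cycle. A toy Python sketch using the standard-library `graphlib` module (Python 3.9+); the step names and edges are a simplified, hypothetical reading of this thread, not the coaster worker's actual code.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Optional, Set

# Each step maps to the set of steps it must wait for (hypothetical labels).
ORIGINAL: Dict[str, Set[str]] = {
    "worker connects to service": {"worker-init.pl finishes"},
    "IP appears in cps log": {"worker connects to service"},
    "coasters-hosts.pl writes host list": {"IP appears in cps log"},
    "worker-init.pl finishes": {"coasters-hosts.pl writes host list"},
}

# Reordered start-up: the worker connects first, so connecting no longer
# waits for the init script. dict() copies the mapping; ORIGINAL is unchanged.
FIXED: Dict[str, Set[str]] = dict(ORIGINAL)
FIXED["worker connects to service"] = set()


def startup_order(deps: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one valid start-up order, or None if the steps form a cycle."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


print(startup_order(ORIGINAL))  # None: each step waits on another forever
print(startup_order(FIXED))     # connect, cps log, host list, then init
```

The empty host-list file workaround corresponds to removing the edge from "worker-init.pl finishes" to "coasters-hosts.pl writes host list", which also breaks the cycle but leaves the init script with no hosts.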
>>>>>>> A side note: on the mosaswift site, your example talks about
>>>>>>> running coasters-hosts.pl on the cps log, but the example you
>>>>>>> provide runs it on logs/coasters.log. This may need to be
>>>>>>> changed. The site should also give the log4j setting that is
>>>>>>> required to generate the Cpu line with the worker IP address,
>>>>>>> to make clear that this setting must be enabled for the script to work.
>>>>>>> 
>>>>>>> For reference, this line:
>>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=DEBUG
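Because coasters-hosts.pl depends on the Cpu DEBUG line, a pre-flight check of the log4j configuration may save a confusing run. A minimal Python sketch, assuming a log4j 1.x-style properties file (`logger=LEVEL[, appender...]`) and that you know which file the coaster service reads (the thread does not say). It only looks for the explicit logger line quoted above, so a level inherited from a parent logger is not detected; `parse_properties` and `cpu_debug_enabled` are hypothetical helpers.

```python
from typing import Dict

CPU_LOGGER = ("log4j.logger.org.globus.cog.abstraction.coaster"
              ".service.job.manager.Cpu")

# Levels at least as verbose as DEBUG, so the Cpu line would be emitted.
VERBOSE_ENOUGH = {"DEBUG", "TRACE", "ALL"}


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal key=value parser: skips blanks and comments, no continuations."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def cpu_debug_enabled(text: str) -> bool:
    """True if the Cpu logger is explicitly set to DEBUG or a more verbose level."""
    value = parse_properties(text).get(CPU_LOGGER, "")
    level = value.split(",")[0].strip().upper()  # drop any appender list
    return level in VERBOSE_ENOUGH
```

For example, `cpu_debug_enabled(open(path).read())` on the relevant properties file should be true before running coasters-hosts.pl against the cps log.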
>>>>>> 
>>>>>> --
>>>>>> Justin M Wozniak
>>>>> 
>>>>> _______________________________________________
>>>>> Swift-devel mailing list
>>>>> Swift-devel at ci.uchicago.edu
>>>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>>>> 
>>> 
>> 
>> --
>> You received this message because you are subscribed to the Google
>> Groups "MosaStore" group.
>> To post to this group, send email to mosastore at googlegroups.com .
>> To unsubscribe from this group, send email to
>> mosastore+unsubscribe at googlegroups.com .
>> For more options, visit this group at
>> http://groups.google.com/group/mosastore?hl=en .
> 
> -- 
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 


