[Swift-devel] coasters-hosts.pl script

Matei Ripeanu matei.ripeanu at gmail.com
Fri Mar 2 18:29:17 CST 2012


Indeed this is good news!  Thank you.

 

Our next task, I think, will be to figure out how to configure Swift so that
the headnode (where Swift runs) does not require any access to intermediate
storage (MosaStore). Only the worker nodes will have access to intermediate
storage.  This is to work around the one-way connectivity issue between the
headnode and the worker nodes.

 

Any guidance on how to get this configuration would be much appreciated.

 

Thank you again, 

 

-Matei

 

From: mosastore at googlegroups.com [mailto:mosastore at googlegroups.com] On
Behalf Of Emalayan Vairavanathan
Sent: March-02-12 2:32 PM
To: Jonathan Monette; Justin M Wozniak
Cc: swift-devel at ci.uchicago.edu Devel; emalayan at ece.ubc.ca; MosaStore
Subject: Re: [Swift-devel] coasters-hosts.pl script

 

Thank you Jon and Justin. 

 

This is great news. I will get back to you if I have questions.

 

Regards

Emalayan

 

  _____  

From: Jonathan Monette <jonmon at mcs.anl.gov>
To: Justin M Wozniak <wozniak at mcs.anl.gov> 
Cc: "swift-devel at ci.uchicago.edu Devel" <swift-devel at ci.uchicago.edu>;
emalayan at ece.ubc.ca
Sent: Friday, 2 March 2012 2:21 PM
Subject: Re: [Swift-devel] coasters-hosts.pl script


Emalayan,
  We believe we have fixed the issue.  You can copy the new
coasters-hosts.pl script from
~jonmon/surveyor/worker-init-test/coasters-hosts.pl

This script reads the worker logs located in the logs directory.  The steps
to run are as follows:
start-coaster-service
<wait for workers to start>
./coasters-hosts.pl logs/worker-*.log > worker-hosts.txt
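
To illustrate the kind of extraction the last step performs, here is a minimal Python sketch. It is not the coasters-hosts.pl script itself. It assumes the worker log line format quoted elsewhere in this thread ("Running on node <ip>"), and the function names, the de-duplication, and the first-seen ordering are illustrative assumptions only.

```python
import re

# Assumed line format, taken from the worker log line quoted in this thread:
#   2012/03/02 17:16:04.712 INFO  - Running on node 172.18.1.83
NODE_LINE = re.compile(r"Running on node\s+(\d{1,3}(?:\.\d{1,3}){3})")


def extract_hosts(lines):
    """Return node IP addresses in first-seen order, without duplicates."""
    hosts = []
    for line in lines:
        match = NODE_LINE.search(line)
        if match and match.group(1) not in hosts:
            hosts.append(match.group(1))
    return hosts


def hosts_from_files(paths):
    """Collect hosts from several worker log files (hypothetical helper)."""
    hosts = []
    for path in paths:
        with open(path) as log:
            for host in extract_hosts(log):
                if host not in hosts:
                    hosts.append(host)
    return hosts
```

Calling hosts_from_files(glob.glob("logs/worker-*.log")) and printing one address per line would produce a file comparable in spirit to worker-hosts.txt; the exact output format the real script uses is not stated here.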

You MUST clean out the worker logs before you start a new coaster
service to make sure the script searches the right worker log files.  This
may not be ideal at the moment but it will help get you started.  If you
have any other questions feel free to ask.  We will need to update the
mosaswift site with the new information; we will do this soon.
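
The log cleanup could be scripted so that it is not forgotten between runs. The following is only a sketch: the logs directory and the worker-*.log pattern come from the command above, while the function name and its defaults are hypothetical.

```python
import glob
import os


def clear_worker_logs(log_dir="logs", pattern="worker-*.log"):
    """Delete worker logs left from a previous run; return the removed paths."""
    removed = []
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        os.remove(path)
        removed.append(path)
    return removed
```

Running clear_worker_logs() just before start-coaster-service would leave other files in the logs directory untouched, since only names matching the worker pattern are removed.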

On Mar 2, 2012, at 11:26 AM, Jonathan Monette wrote:

> Can we match this line from the worker log:
>   2012/03/02 17:16:04.712 INFO  - Running on node 172.18.1.83
> instead of this line from the cps log?
>   2012-03-02 17:21:25,214+0000 DEBUG Cpu worker started: block=2012.0302.171344.704 host=172.18.1.83 id=0
> 
> They both provide the same ip addresses, and the worker log always has
> that ip address before the cps log does.
> 
> On Mar 2, 2012, at 11:15 AM, Jonathan Monette wrote:
> 
>> That fix still did not work.  I had moved it to the same spot.  It is
>> still waiting for the worker-init.pl script to finish before the ip
>> addresses are printed to the cps log.  Those ip addresses are what the
>> coasters-hosts.pl script needs in order to finish.  If I create an empty
>> file for the coasters-hosts.pl script to read, then the work continues and
>> the ip addresses show up in the cps log.
>> 
>> Why is log4j waiting until after the worker-init.pl script is finished to
>> add those lines to the cps log?
>> 
>> On Mar 2, 2012, at 11:05 AM, Jonathan Monette wrote:
>> 
>>> Thanks, in my copy I thought I had moved the reconnect to before the
>>> init-cmd and it still wasn't working.  I will test with your change.  I
>>> just verified that it was indeed waiting for the worker-init.pl script to
>>> finish.  I created an empty file for the script to read and it finished
>>> connecting and the ip addresses I needed were added to the cps log.  I
>>> will also be testing your fix.
>>> 
>>> On Mar 2, 2012, at 11:01 AM, Justin M Wozniak wrote:
>>> 
>>>> 
>>>> Yes, I must have tested this with a different log file.  I just checked
>>>> in and installed in ~wozniak/Public a fix for this that launches
>>>> WORKER_INIT_CMD after the reconnect().  I am a little worried about
>>>> timeouts but it works so far.  I will continue testing...
>>>>     Justin
>>>> 
>>>> On Thu, 1 Mar 2012, Jonathan Monette wrote:
>>>> 
>>>>> Justin,
>>>>> So I have been trying to help Emalayan get the host list file for the
>>>>> worker-init.pl script.  It seems the cps log file is not providing the
>>>>> ip addresses for the coasters-hosts.pl script.  I thought this was maybe
>>>>> because we did not have the correct log4j setting, but we do have the
>>>>> Coaster service Cpu logger set to DEBUG.  So for some reason the workers
>>>>> are not connecting to the service.  When I comment out the export
>>>>> WORKER_ENVIRONMENT="." line in the coaster-service.conf file I see the
>>>>> workers connect and the cps log file shows their ip addresses.  However,
>>>>> when this line is set, it seems they are not connecting.
>>>>> 
>>>>> Emalayan thought there might be some sort of circular dependency going
>>>>> on with the host-list file and the worker.  The worker requires the
>>>>> host-list file so that it can run the worker-init.pl script and then
>>>>> connect, but the host-list file cannot be generated because the workers
>>>>> cannot connect.  I noticed in your swift-test directory the cps files
>>>>> did have the ip addresses set and coasters-hosts.pl found the ip
>>>>> addresses and reported them.  Did you try that test with the
>>>>> WORKER_ENVIRONMENT variable set in the coaster-service.conf file?  Any
>>>>> idea what may be happening?  The job shows as running under cqstat.
>>>>> 
>>>>> A side note: At the mosaswift site, your example talks about running
>>>>> coasters-hosts.pl on the cps log but the example you provide runs it on
>>>>> logs/coasters.log.  This may need to be changed.  Also, we should
>>>>> provide the log4j setting that is required to generate the Cpu line with
>>>>> the worker ip address, just to clarify that this line must be set for
>>>>> the script to work.
>>>>> 
>>>>> For reference, this line:
>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=DEBUG
>>>> 
>>>> -- 
>>>> Justin M Wozniak
>>> 
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel



