[Swift-devel] coasters-hosts.pl script

Jonathan Monette jonmon at mcs.anl.gov
Fri Mar 2 11:47:36 CST 2012


I think this approach was chosen to get something working for Emalayan quickly so he could start developing. I do not think this was going to be the final approach. A more stable approach was to replace it later.

On Mar 2, 2012, at 11:27 AM, Michael Wilde <wilde at mcs.anl.gov> wrote:

> This all seems a bit brittle. I think what we did in Falkon was to use the Zoid init script that runs on the IOP to add the worker IPs:
> 
> http://wiki.mcs.anl.gov/zeptoos/index.php/ZOID#User_script
> 
> This script can find the subnet of the workers, and the worker IPs on that subnet are fixed.
> 
> You still have the issue of waiting for all the IPs to report back. Each worker could create a file in a directory. But you'd be less at the mercy of worker.pl scripts and log4j to get the IP info you need, perhaps?
> 
> - Mike
> 
> ----- Original Message -----
>> From: "Jonathan Monette" <jonmon at mcs.anl.gov>
>> To: "Justin M Wozniak" <wozniak at mcs.anl.gov>
>> Cc: "swift-devel at ci.uchicago.edu Devel" <swift-devel at ci.uchicago.edu>, emalayan at ece.ubc.ca
>> Sent: Friday, March 2, 2012 11:15:03 AM
>> Subject: Re: [Swift-devel] coasters-hosts.pl script
>> That fix still did not work. I had moved it to the same spot. It is
>> still waiting for the worker-init.pl script to finish before the IP
>> addresses are printed to the cps log. Those IP addresses are what is
>> needed by the coasters-hosts.pl script to finish. If I create an empty
>> file for the coasters-hosts.pl script to read, then the work continues
>> and the IP addresses show up in the cps log.
>> 
>> Why is log4j waiting until the worker-init.pl script has finished
>> before adding those lines to the cps log?
>> 
>> On Mar 2, 2012, at 11:05 AM, Jonathan Monette wrote:
>> 
>>> Thanks, in my copy I thought I had moved the reconnect to before the
>>> init-cmd and it still wasn't working. I will test with your change.
>>> I just verified that it was indeed waiting for the worker-init.pl
>>> script to finish. I created an empty file for the script to read and
>>> it finished connecting and the IP addresses I needed were added to
>>> the cps log. I will also be testing your fix.
>>> 
>>> On Mar 2, 2012, at 11:01 AM, Justin M Wozniak wrote:
>>> 
>>>> 
>>>> Yes, I must have tested this with a different log file. I just
>>>> checked in and installed in ~wozniak/Public a fix for this that
>>>> launches WORKER_INIT_CMD after the reconnect(). I am a little
>>>> worried about timeouts but it works so far. I will continue
>>>> testing...
>>>>    Justin
>>>> 
>>>> On Thu, 1 Mar 2012, Jonathan Monette wrote:
>>>> 
>>>>> Justin,
>>>>> So I have been trying to help Emalayan get the host list file for
>>>>> the worker-init.pl script. It seems the cps log file is not
>>>>> providing the IP addresses for the coasters-hosts.pl script. I
>>>>> thought this was maybe because we did not have the correct log4j
>>>>> setting, but we have the Coaster service Cpu logger set to DEBUG.
>>>>> So for some reason the workers are not connecting to the service.
>>>>> When I comment out the export WORKER_ENVIRONMENT="…" line in the
>>>>> coaster-service.conf file I see the workers connect and the cps
>>>>> log file shows their IP addresses. However, when this line is set
>>>>> it seems they are not connecting.
>>>>> 
>>>>> Emalayan thought there might be some sort of circular dependency
>>>>> between the host-list file and the worker. The worker requires
>>>>> the host-list file so that it can run the worker-init.pl script
>>>>> and then connect, but the host-list file cannot be generated
>>>>> because the workers cannot connect. I noticed in your swift-test
>>>>> directory the cps files did have the IP addresses set and
>>>>> coasters-hosts.pl found the IP addresses and reported them. Did
>>>>> you try that test with the WORKER_ENVIRONMENT variable set in
>>>>> the coaster-service.conf file? Any idea what may be happening?
>>>>> cqstat shows the job as running.
>>>>> 
>>>>> A side note: At the mosaswift site, your example talks about
>>>>> running coasters-hosts.pl on the cps log but the example you
>>>>> provide runs it on logs/coasters.log. This may need to be changed.
>>>>> Also, it should provide the log4j setting that is required to
>>>>> generate the Cpu line with the worker IP address, just to clarify
>>>>> that this line should be set for this script to work.
>>>>> 
>>>>> For reference, this line:
>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.Cpu=DEBUG
>>>> 
>>>> --
>>>> Justin M Wozniak
>>> 
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>> 
> 
> -- 
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 
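
For illustration only, here is a rough sketch of the "each worker makes a file in a directory" idea Mike describes above, including the wait for all IPs to report back. It is written in Python rather than Perl, and everything in it is an assumption made for the example: the function names report_ip and collect_worker_ips, the one-file-per-worker layout, and the timeout handling are hypothetical and are not part of Swift, the coaster service, or ZOID.

```python
import os
import time


def report_ip(report_dir, worker_id, ip):
    """Worker side (hypothetical): publish this worker's IP as one file.

    The file is written under a temporary name and then renamed, so a
    reader never sees a half-written file.
    """
    tmp_path = os.path.join(report_dir, f"{worker_id}.tmp")
    with open(tmp_path, "w") as f:
        f.write(ip + "\n")
    os.replace(tmp_path, os.path.join(report_dir, str(worker_id)))


def collect_worker_ips(report_dir, expected, timeout=60.0, poll=0.5):
    """Service side (hypothetical): wait until `expected` distinct IPs appear.

    Returns the sorted list of IPs, or raises TimeoutError if some workers
    have not reported when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        ips = set()
        for name in os.listdir(report_dir):
            if name.endswith(".tmp"):
                continue  # still being written by a worker
            with open(os.path.join(report_dir, name)) as f:
                ip = f.read().strip()
            if ip:
                ips.add(ip)
        if len(ips) >= expected:
            return sorted(ips)
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {len(ips)} of {expected} workers reported an IP"
            )
        time.sleep(poll)
```

With something along these lines, the host list would come from the files the workers drop into the directory instead of from lines that log4j writes to the cps log, and a missing worker shows up as an explicit timeout rather than an indefinite wait.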