[Swift-devel] hprof profiling of coaster services

Michael Wilde wilde at mcs.anl.gov
Fri Jun 26 09:29:04 CDT 2009


I will try, and cc the list. It's not clear that John knows more than he 
reported - most likely he saw processes owned by Zhao and Allan at the 
top of "top", and concluded that they were running application tests on 
the login hosts. But worth a try.

I think in the meantime it's worthwhile for Allan to get reliable data on 
coaster performance. That seems useful for our own needs and for 
eventual publication. If we determine that the overhead should indeed be 
small, I suspect we can coordinate with the sysadmins and start running 
again, watching closely to make sure we do no harm.

- Mike


On 6/26/09 9:23 AM, Mihael Hategan wrote:
> I think we should ask the sysadmins for clarification on what the
> problem was (their email leaves room for interpretation) and talk to them
> and see what we can do to solve it.
> 
> On Fri, 2009-06-26 at 07:55 -0500, Michael Wilde wrote:
>> Yes, sorry, I meant to paste it at the bottom of the previous message 
>> but forgot. The sysadmin message is at the bottom of the thread below.
>>
>> -------- Original Message --------
>> Subject: Re: Ranger @ TACC - Jobs Running On Head Node creating heavy load
>> Date: Wed, 17 Jun 2009 15:26:37 -0500
>> From: Allan Espinosa <aespinosa at cs.uchicago.edu>
>> To: Zhao Zhang <zhaozhang at uchicago.edu>
>> CC: wilde at mcs.anl.gov
>> References: <1245269425.13629.23.camel at lockman-d630.tacc.utexas.edu>	 
>> <4A394FBD.5040301 at uchicago.edu>	 
>> <50b07b4b0906171322k26976392s4a99144749c437e7 at mail.gmail.com>
>>
>> Zhao,
>>
>> your coaster services and GRAM callback daemons are eating 2 cores.
>> You should kill these too as you abort your swift run.
>>
>> -Allan
>>
>> 2009/6/17 Allan Espinosa <aespinosa at cs.uchicago.edu>:
>>  > I am guessing that these are the coaster services running on the GRAM
>>  > head node (gateway.ranger points to login3.ranger).
>>  >
>>  >
>>  > I made the run last night.  I am currently running stuff on teraport.
>>  >
>>  > 2009/6/17 Zhao Zhang <zhaozhang at uchicago.edu>:
>>  >> Hi, Mike
>>  >>
>>  >> That is me and Allan. I am running the remaining part of the AMPL
>>  >> workflow, 800 jobs left. What shall I do now?
>>  >>
>>  >> zhao
>>  >>
>>  >> John Lockman wrote:
>>  >>>
>>  >>> Dr. Wilde,
>>  >>>
>>  >>>
>>  >>> Two users on your project, [zzhang & tg802895] are running jobs on the
>>  >>> Ranger head node [login3] which is slowing the system down dramatically
>>  >>> for other users.
>>  >>> Can these jobs be run on the compute nodes and not the head node?
>>  >>>
>>  >>>
>>  >>> Thanks,
>>  >>>
>>
>>
>> On 6/26/09 7:52 AM, Ben Clifford wrote:
>>>> In other words, the Ranger sysadmin did not say "your coaster process is 
>>>> consuming CPU", he just said your jobs are causing the login3 host to be 
>>>> slow for other users.
>>> Do you have what the Ranger sysadmin actually said, in its entirety?
>>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 


