[Swift-devel] block allocation

Thu May 28 12:21:38 CDT 2009

Hi, Mihael

Mihael Hategan wrote:
> You should check whether blocks are properly de-allocated when all the
> work is done.
>   
I made a clean run of 100 scip jobs. Here is a description of the procedure:
1. Workflow started on submit host.
    Queue Status
    login3% showq | grep zzhang
    745500    data       zzhang        Waiting 16     00:50:00  Thu May 28 11:59:50
    745501    data       zzhang        Waiting 16     00:50:00  Thu May 28 11:59:52
    745502    data       zzhang        Waiting 16     00:40:00  Thu May 28 11:59:56

2. Workflow running on Ranger
    Queue Status
    login3% showq | grep zzhang
    745500    data       zzhang        Running 16     00:49:53  Thu May 28 12:00:57
    745501    data       zzhang        Running 16     00:49:53  Thu May 28 12:00:57
    745502    data       zzhang        Running 16     00:39:53  Thu May 28 12:00:57
    745504    data       zzhang        Running 16     00:39:53  Thu May 28 12:00:57

    login3% showq | grep zzhang
    745500    data       zzhang        Running 16     00:49:36  Thu May 28 12:00:57
    745501    data       zzhang        Running 16     00:49:36  Thu May 28 12:00:57
    745502    data       zzhang        Running 16     00:39:36  Thu May 28 12:00:57
    745504    data       zzhang        Running 16     00:39:36  Thu May 28 12:00:57

    login3% showq | grep zzhang
    745500    data       zzhang        Running 16     00:49:30  Thu May 28 12:00:57
    745501    data       zzhang        Running 16     00:49:30  Thu May 28 12:00:57
    745502    data       zzhang        Running 16     00:39:30  Thu May 28 12:00:57
    745504    data       zzhang        Running 16     00:39:30  Thu May 28 12:00:57

3. Workflow finished
    Queue Status
    login3% showq | grep zzhang
    745500    data       zzhang        Running 16     00:48:06  Thu May 28 12:00:57
    745501    data       zzhang        Running 16     00:48:06  Thu May 28 12:00:57
    745502    data       zzhang        Running 16     00:38:06  Thu May 28 12:00:57
    745504    data       zzhang        Running 16     00:38:06  Thu May 28 12:00:57
    745511    data       zzhang        Waiting 16     00:10:00  Thu May 28 12:02:39

As you can see, one more job, with a walltime of 10 minutes, has appeared 
in the queue. I checked the last 20 lines of coaster.log, but it doesn't 
say much about this:
2009-05-28 12:02:38,246-0500 INFO  AbstractKarajanChannel GSSCChannel-https://128.135.125.17:50004(1) REQ: Handler(SHUTDOWNSERVICE)
2009-05-28 12:02:38,249-0500 INFO  CoasterService Shutdown sequence completed
2009-05-28 12:02:39,212-0500 INFO  Cpu 0528-591140-000000:1 pullLater
2009-05-28 12:02:39,213-0500 INFO  Cpu 0528-591140-000001:0 pull
2009-05-28 12:02:40,223-0500 INFO  Cpu 0528-591140-000001:0 pullLater
2009-05-28 12:02:40,223-0500 INFO  Cpu 0528-591140-000000:0 pull
2009-05-28 12:02:41,232-0500 INFO  Cpu 0528-591140-000000:0 pullLater
2009-05-28 12:02:41,239-0500 INFO  Cpu 0528-591140-000003:1 pull
2009-05-28 12:02:42,252-0500 INFO  Cpu 0528-591140-000003:1 pullLater
2009-05-28 12:02:42,253-0500 INFO  Cpu 0528-591140-000003:0 pull
2009-05-28 12:02:43,262-0500 INFO  Cpu 0528-591140-000003:0 pullLater
2009-05-28 12:02:43,263-0500 INFO  Cpu 0528-591140-000002:0 pull
2009-05-28 12:02:44,273-0500 INFO  Cpu 0528-591140-000002:0 pullLater
2009-05-28 12:02:44,273-0500 INFO  Cpu 0528-591140-000001:1 pull
2009-05-28 12:02:45,283-0500 INFO  Cpu 0528-591140-000001:1 pullLater
2009-05-28 12:02:45,283-0500 INFO  Cpu 0528-591140-000002:1 pull
2009-05-28 12:02:46,292-0500 INFO  Cpu 0528-591140-000002:1 pullLater
2009-05-28 12:02:46,293-0500 INFO  Cpu 0528-591140-000000:1 pull
2009-05-28 12:02:46,992-0500 INFO  CoasterService Idle time: 0
2009-05-28 12:02:47,713 117968031 Exit code: 0
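
To list what the service logged after the shutdown line, a small script 
along these lines could help. It is only a sketch: the function name 
events_after_shutdown and the regular expression are illustrative 
assumptions based on the log lines quoted above, not part of Swift or the 
coaster code, and it expects each log entry on a single unwrapped line.

```python
import re

# Assumed entry layout, taken from the excerpt above:
#   <date> <time>,<ms><tz offset> <LEVEL>  <Source> <message>
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})[-+]\d{4}\s+"
    r"(?P<level>[A-Z]+)\s+(?P<source>\S+)\s+(?P<message>.*)$"
)


def events_after_shutdown(lines):
    """Return (source, message) pairs logged after the shutdown entry."""
    seen_shutdown = False
    later = []
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue  # skip lines that do not look like log entries
        if seen_shutdown:
            later.append((m["source"], m["message"]))
        elif (m["source"] == "CoasterService"
              and m["message"].startswith("Shutdown sequence")):
            seen_shutdown = True
    return later


# A few entries copied from the excerpt above, one per line.
sample_log = """\
2009-05-28 12:02:38,249-0500 INFO  CoasterService Shutdown sequence completed
2009-05-28 12:02:39,212-0500 INFO  Cpu 0528-591140-000000:1 pullLater
2009-05-28 12:02:39,213-0500 INFO  Cpu 0528-591140-000001:0 pull
2009-05-28 12:02:46,992-0500 INFO  CoasterService Idle time: 0
"""

for source, message in events_after_shutdown(sample_log.splitlines()):
    print(source, message)
```

Run against the full coaster.log, this would show whether any block 
submission is logged after the shutdown sequence completes.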

Within ~5 minutes the workflow finished, and the status of the first 4 
allocations changed to "Unsched":
login3% showq | grep zzhang
745524    data       zzhang        Unsched 16     00:50:00  Thu May 28 12:11:31
745525    data       zzhang        Unsched 16     00:50:00  Thu May 28 12:11:33
745526    data       zzhang        Unsched 16     00:50:00  Thu May 28 12:11:38
745527    data       zzhang        Unsched 16     00:50:00  Thu May 28 12:11:42

Within ~2 minutes, those allocations were released.
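
The manual check above (run showq, look at the state column, repeat) could 
be scripted roughly as follows. This is a sketch under assumptions: the 
function names parse_showq and count_states are made up for illustration, 
and the column meanings (job id, name, user, state, cores, time) are 
guessed from the listings in this message rather than taken from the 
showq documentation. Each job is expected on a single unwrapped line.

```python
import re
from collections import Counter

# Assumed column layout, inferred from the listings in this message:
#   JOBID  NAME  USER  STATE  CORES  HH:MM:SS  <date...>
LINE_RE = re.compile(
    r"^\s*(?P<job_id>\d+)\s+(?P<name>\S+)\s+(?P<user>\S+)\s+"
    r"(?P<state>\S+)\s+(?P<cores>\d+)\s+(?P<walltime>\d+:\d{2}:\d{2})"
)


def parse_showq(text):
    """Turn showq-style text into a list of dicts, one per job line."""
    jobs = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            jobs.append(m.groupdict())
    return jobs


def count_states(jobs):
    """Count jobs per state (e.g. Running, Waiting, Unsched)."""
    return Counter(job["state"] for job in jobs)


# The "workflow finished" listing from step 3, one job per line.
sample = """\
745500    data       zzhang        Running 16     00:48:06  Thu May 28 12:00:57
745501    data       zzhang        Running 16     00:48:06  Thu May 28 12:00:57
745502    data       zzhang        Running 16     00:38:06  Thu May 28 12:00:57
745504    data       zzhang        Running 16     00:38:06  Thu May 28 12:00:57
745511    data       zzhang        Waiting 16     00:10:00  Thu May 28 12:02:39
"""

states = count_states(parse_showq(sample))
print(dict(states))
```

An empty result from parse_showq on a later listing would indicate that 
all blocks have left the queue.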

zhao
> On Wed, 2009-05-27 at 18:10 -0500, Zhao Zhang wrote:
>   
>> Sorry, Mihael, I didn't get your last question.
>>
>> zhao
>>
>> Mihael Hategan wrote:
>>     
>>> On Wed, 2009-05-27 at 17:33 -0500, Zhao Zhang wrote:
>>>   
>>>       
>>>> Mihael Hategan wrote:
>>>>     
>>>>         
>>>>> On Wed, 2009-05-27 at 16:49 -0500, Zhao Zhang wrote:
>>>>>> Mihael Hategan wrote:
>>>>>>> On Wed, 2009-05-27 at 16:31 -0500, Zhao Zhang wrote:
>>>>>>>> Hi, Mihael
>>>>>>>>
>>>>>>>> I did a clean run of 100 scip jobs on ranger. (scip is a new application 
>>>>>>>> from MCS).
>>>>>>> And?
>>>>>> I need your help to make sure coaster is working as expected.
>>>>> Did the workflow finish successfully? I'll assume "yes", since you
>>>>> didn't mention any errors, but it would be useful to state such things
>>>>> from the start.
>>>>>
>>>> Yes, it was successful.
>>>>     
>>>>         
>>> That's not a bad sign.
>>>
>>>   
>>>       
>>>>> As far as it working as expected, I have more confidence in the workings
>>>>> of the algorithm itself than in the ancillary stuff. In other words, if
>>>>> you don't see errors/restarted jobs or stack traces in the logs, it's
>>>>> likely that things are fine.
>>>>>
>>>>>>>> The log is at /home/zzhang/scip/ranger-logs/coasters.log
>>>>>>> [hategan@communicado ~]$ less /home/zzhang/scip/ranger-logs/coasters.log
>>>>>>> /home/zzhang/scip/ranger-logs/coasters.log: Permission denied
>>>>>> try now.
>>>>> [hategan@communicado ~]$ date
>>>>> Wed May 27 17:26:51 CDT 2009
>>>>> [hategan@communicado ~]$ less /home/zzhang/scip/ranger-logs/coasters.log
>>>>> /home/zzhang/scip/ranger-logs/coasters.log: Permission denied
>>>>>
>>>> sorry, try again.
>>>>     
>>>>         
>>> Looks ok. There are some things that should probably be adjusted there,
>>> but it looks reasonable. Were the queued/running jobs removed from the
>>> queue when the workflow finished?
>>>