[Swift-devel] block allocation
Zhao Zhang
zhaozhang at uchicago.edu
Thu May 28 12:21:38 CDT 2009
Hi, Mihael
Mihael Hategan wrote:
> You should check whether blocks are properly de-allocated when all the
> work is done.
>
I made a clean run of 100 scip jobs. Here is a description of the procedure:
1. Workflow started on submit host.
Queue Status
login3% showq | grep zzhang
745500 data zzhang Waiting 16 00:50:00 Thu May 28 11:59:50
745501 data zzhang Waiting 16 00:50:00 Thu May 28 11:59:52
745502 data zzhang Waiting 16 00:40:00 Thu May 28 11:59:56
2. Workflow running on Ranger
Queue Status
login3% showq | grep zzhang
745500 data zzhang Running 16 00:49:53 Thu May 28 12:00:57
745501 data zzhang Running 16 00:49:53 Thu May 28 12:00:57
745502 data zzhang Running 16 00:39:53 Thu May 28 12:00:57
745504 data zzhang Running 16 00:39:53 Thu May 28 12:00:57
login3% showq | grep zzhang
745500 data zzhang Running 16 00:49:36 Thu May 28 12:00:57
745501 data zzhang Running 16 00:49:36 Thu May 28 12:00:57
745502 data zzhang Running 16 00:39:36 Thu May 28 12:00:57
745504 data zzhang Running 16 00:39:36 Thu May 28 12:00:57
login3% showq | grep zzhang
745500 data zzhang Running 16 00:49:30 Thu May 28 12:00:57
745501 data zzhang Running 16 00:49:30 Thu May 28 12:00:57
745502 data zzhang Running 16 00:39:30 Thu May 28 12:00:57
745504 data zzhang Running 16 00:39:30 Thu May 28 12:00:57
3. Workflow finished
Queue Status
login3% showq | grep zzhang
745500 data zzhang Running 16 00:48:06 Thu May 28 12:00:57
745501 data zzhang Running 16 00:48:06 Thu May 28 12:00:57
745502 data zzhang Running 16 00:38:06 Thu May 28 12:00:57
745504 data zzhang Running 16 00:38:06 Thu May 28 12:00:57
745511 data zzhang Waiting 16 00:10:00 Thu May 28 12:02:39
As you can see, one more job, with a wall time of 10 minutes, shows up
in the queue. I checked the last 20 lines of coaster.log, but it doesn't
say much about this:
2009-05-28 12:02:38,246-0500 INFO AbstractKarajanChannel GSSCChannel-https://128.135.125.17:50004(1) REQ: Handler(SHUTDOWNSERVICE)
2009-05-28 12:02:38,249-0500 INFO CoasterService Shutdown sequence completed
2009-05-28 12:02:39,212-0500 INFO Cpu 0528-591140-000000:1 pullLater
2009-05-28 12:02:39,213-0500 INFO Cpu 0528-591140-000001:0 pull
2009-05-28 12:02:40,223-0500 INFO Cpu 0528-591140-000001:0 pullLater
2009-05-28 12:02:40,223-0500 INFO Cpu 0528-591140-000000:0 pull
2009-05-28 12:02:41,232-0500 INFO Cpu 0528-591140-000000:0 pullLater
2009-05-28 12:02:41,239-0500 INFO Cpu 0528-591140-000003:1 pull
2009-05-28 12:02:42,252-0500 INFO Cpu 0528-591140-000003:1 pullLater
2009-05-28 12:02:42,253-0500 INFO Cpu 0528-591140-000003:0 pull
2009-05-28 12:02:43,262-0500 INFO Cpu 0528-591140-000003:0 pullLater
2009-05-28 12:02:43,263-0500 INFO Cpu 0528-591140-000002:0 pull
2009-05-28 12:02:44,273-0500 INFO Cpu 0528-591140-000002:0 pullLater
2009-05-28 12:02:44,273-0500 INFO Cpu 0528-591140-000001:1 pull
2009-05-28 12:02:45,283-0500 INFO Cpu 0528-591140-000001:1 pullLater
2009-05-28 12:02:45,283-0500 INFO Cpu 0528-591140-000002:1 pull
2009-05-28 12:02:46,292-0500 INFO Cpu 0528-591140-000002:1 pullLater
2009-05-28 12:02:46,293-0500 INFO Cpu 0528-591140-000000:1 pull
2009-05-28 12:02:46,992-0500 INFO CoasterService Idle time: 0
2009-05-28 12:02:47,713 117968031 Exit code: 0
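To make it easier to see what the service still logs after the shutdown line, here is a minimal Python sketch that lists the timestamped entries following "Shutdown sequence completed". It assumes one log entry per line (not wrapped as in this mail) and the timestamp format shown above; `parse_ts` and `events_after_shutdown` are hypothetical helper names, not part of the coaster code.

```python
from datetime import datetime


def parse_ts(line):
    """Parse a leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp; return None if absent."""
    try:
        return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None


def events_after_shutdown(log_lines, marker="Shutdown sequence completed"):
    """Return the timestamped lines that appear after the first shutdown marker."""
    seen = False
    later = []
    for line in log_lines:
        if seen and parse_ts(line) is not None:
            later.append(line)
        elif marker in line:
            seen = True
    return later


# Sample entries copied from the log excerpt above (unwrapped).
log = [
    "2009-05-28 12:02:38,246-0500 INFO AbstractKarajanChannel "
    "GSSCChannel-https://128.135.125.17:50004(1) REQ: Handler(SHUTDOWNSERVICE)",
    "2009-05-28 12:02:38,249-0500 INFO CoasterService Shutdown sequence completed",
    "2009-05-28 12:02:39,212-0500 INFO Cpu 0528-591140-000000:1 pullLater",
    "2009-05-28 12:02:39,213-0500 INFO Cpu 0528-591140-000001:0 pull",
]

for entry in events_after_shutdown(log):
    print(entry)
```

On the real file, the same function could be fed `open("coaster.log")` directly, since it only iterates over lines.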
Within ~5 minutes after the workflow finished, the status of the first 4
allocations changed to "Unsched":
login3% showq | grep zzhang
745524 data zzhang Unsched 16 00:50:00 Thu May 28 12:11:31
745525 data zzhang Unsched 16 00:50:00 Thu May 28 12:11:33
745526 data zzhang Unsched 16 00:50:00 Thu May 28 12:11:38
745527 data zzhang Unsched 16 00:50:00 Thu May 28 12:11:42
Within ~2 minutes, those allocations were released.
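The queue snapshots above can also be tallied by state, which makes it quicker to check whether any allocations are left once the workflow is done. This is only a sketch: it assumes the fourth whitespace-separated field of each `showq` line is the job state, as in the output pasted in this mail, and `tally_states` is a hypothetical helper name.

```python
from collections import Counter


def tally_states(showq_lines):
    """Count jobs per state, assuming the 4th whitespace-separated field is the state."""
    counts = Counter()
    for line in showq_lines:
        fields = line.split()
        # Skip blank lines and wrapped fragments such as "28 11:59:50".
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        counts[fields[3]] += 1
    return counts


# Snapshot taken right after the workflow finished (step 3 above).
sample = [
    "745500 data zzhang Running 16 00:48:06 Thu May 28 12:00:57",
    "745501 data zzhang Running 16 00:48:06 Thu May 28 12:00:57",
    "745502 data zzhang Running 16 00:38:06 Thu May 28 12:00:57",
    "745504 data zzhang Running 16 00:38:06 Thu May 28 12:00:57",
    "745511 data zzhang Waiting 16 00:10:00 Thu May 28 12:02:39",
]

print(dict(tally_states(sample)))  # {'Running': 4, 'Waiting': 1}
```

An empty result from a later snapshot would mean no jobs for the user are left in the queue.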
zhao
> On Wed, 2009-05-27 at 18:10 -0500, Zhao Zhang wrote:
>
>> Sorry, Mihael, I didn't get your last question.
>>
>> zhao
>>
>> Mihael Hategan wrote:
>>
>>> On Wed, 2009-05-27 at 17:33 -0500, Zhao Zhang wrote:
>>>
>>>
>>>> Mihael Hategan wrote:
>>>>
>>>>
>>>>> On Wed, 2009-05-27 at 16:49 -0500, Zhao Zhang wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Mihael Hategan wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Wed, 2009-05-27 at 16:31 -0500, Zhao Zhang wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Hi, Mihael
>>>>>>>>
>>>>>>>> I did a clean run of 100 scip jobs on ranger. (scip is a new application
>>>>>>>> from MCS).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> And?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> I need your help to make sure coaster is working as expected.
>>>>>>
>>>>>>
>>>>>>
>>>>> Did the workflow finish successfully? I'll assume "yes", since you
>>>>> didn't mention any errors, but it would be useful to state such things
>>>>> from the start.
>>>>>
>>>>>
>>>>>
>>>>>
>>>> Yes, it was successful.
>>>>
>>>>
>>> That's not a bad sign.
>>>
>>>
>>>
>>>>> As far as it working as expected, I have more confidence in the workings
>>>>> of the algorithm itself than in the ancillary stuff. In other words, if
>>>>> you don't see errors/restarted jobs or stack traces in the logs, it's
>>>>> likely that things are fine.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> The log is at /home/zzhang/scip/ranger-logs/coasters.log
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> [hategan at communicado ~]$ less /home/zzhang/scip/ranger-logs/coasters.log
>>>>>>> /home/zzhang/scip/ranger-logs/coasters.log: Permission denied
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> try now.
>>>>>>
>>>>>>
>>>>>>
>>>>> [hategan at communicado ~]$ date
>>>>> Wed May 27 17:26:51 CDT 2009
>>>>> [hategan at communicado ~]$ less /home/zzhang/scip/ranger-logs/coasters.log
>>>>> /home/zzhang/scip/ranger-logs/coasters.log: Permission denied
>>>>>
>>>>>
>>>>>
>>>>>
>>>> sorry, try again.
>>>>
>>>>
>>> Looks ok. There are some things that should probably be adjusted there,
>>> but it looks reasonable. Were the queued/running jobs removed from the
>>> queue when the workflow finished?
>>>
>>>
>>>
>>>
>
>
>