[codes-ross-users] Simulation ends with events unprocessed

Mubarak, Misbah mmubarak at anl.gov
Tue Feb 28 16:57:31 CST 2017


Hi Jian,

Thanks for reporting this issue. I have a few questions. Are you using the latest CODES master branch? If so, which dragonfly model are you using? In addition to the conventional Dally-style dragonfly that we have been using, there is an Aries-based dragonfly model (dragonfly-custom.C).

I see that you are getting leftover messages in terminals and routers. Are you running the simulation long enough for all messages to reach their destinations? Alternatively, have you made any changes to the dragonfly routing or topology setup? Have you tested the workload with other network models, and does it work there?

Looking at the issue, I don’t think this is a ROSS-related problem; it looks like either a model-related or a workload-related problem.

Thanks,
Misbah
From: <codes-ross-users-bounces at lists.mcs.anl.gov> on behalf of Jian Peng <jpeng10 at hawk.iit.edu>
Date: Tuesday, February 28, 2017 at 5:36 PM
To: "codes-ross-users at lists.mcs.anl.gov" <codes-ross-users at lists.mcs.anl.gov>
Subject: Re: [codes-ross-users] Simulation ends with events unprocessed

Hi,

    After some debugging, I found that the cause of this issue lies in "tw_event_send()". I added some debugging code, shown in "tw_event_send.png", and got the debugging output shown in "debug1.png". Please notice the highlighted value of the pointer "kp". I then found the same "kp" value in an earlier context, shown in "debug2.png". Since "last_time" is greater than "recv_ts", "tw_eventq_push()" is called rather than "tw_pq_enqueue()". As I'm running a sequential simulation, "tw_scheduler_sequential()" in tw-sched.c should be used, and it reads the event queue with "tw_pq_dequeue()", as shown in "tw_scheduler_sequential.png".

    So here's my question. The events passed to "tw_eventq_push()" do not seem to be retrievable by "tw_pq_dequeue()". Is it a bug either to map multiple LPs to one KP, or to use only "tw_pq_dequeue()" in the sequential scheduler? I haven't looked deeply into either part; this is just the question that came to mind.
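
    To make the suspected behavior concrete, here is a minimal toy sketch in C. It is not ROSS code: the names toy_event, toy_sim, toy_send and toy_run are hypothetical stand-ins, and the routing rule simply mirrors the comparison of "last_time" and "recv_ts" described above. It only illustrates how an event placed on a side list stays unprocessed if the scheduler loop drains nothing but the priority queue.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_EVENTS 16

/* Hypothetical stand-ins for illustration only; these are not ROSS types. */
typedef struct {
    double recv_ts;                  /* receive timestamp of the event */
} toy_event;

typedef struct {
    toy_event pending[MAX_EVENTS];   /* events the scheduler loop will see */
    int n_pending;
    toy_event side[MAX_EVENTS];      /* events parked on a separate list */
    int n_side;
    double last_time;                /* timestamp of the last processed event */
} toy_sim;

/* Route a new event: one whose timestamp lies behind last_time goes to
 * the side list instead of the pending queue (assumed rule, see lead-in). */
static void toy_send(toy_sim *s, double recv_ts)
{
    toy_event e = { recv_ts };
    if (s->last_time <= recv_ts) {
        assert(s->n_pending < MAX_EVENTS);
        s->pending[s->n_pending++] = e;
    } else {
        assert(s->n_side < MAX_EVENTS);
        s->side[s->n_side++] = e;
    }
}

/* Drain ONLY the pending queue in timestamp order and return the number
 * of events processed. The side list is never consulted. */
static int toy_run(toy_sim *s)
{
    int processed = 0;
    while (s->n_pending > 0) {
        int min = 0;
        for (int i = 1; i < s->n_pending; i++) {
            if (s->pending[i].recv_ts < s->pending[min].recv_ts)
                min = i;
        }
        s->last_time = s->pending[min].recv_ts;
        /* Remove the processed event by overwriting it with the last one. */
        s->pending[min] = s->pending[--s->n_pending];
        processed++;
    }
    return processed;
}

/* Print how many events remain on each list when the run ends. */
static void toy_report(const toy_sim *s)
{
    printf("pending=%d side=%d last_time=%.1f\n",
           s->n_pending, s->n_side, s->last_time);
}
```

    In this sketch, sending events at 1.0 and 5.0, running, and then sending an event at 3.0 leaves that last event on the side list: a second toy_run() processes nothing, so the run ends with one event unprocessed.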

    Thanks!



On Mon, Feb 27, 2017 at 1:20 PM, Jian Peng <jpeng10 at hawk.iit.edu> wrote:
Hi,

    I'm running some experiments on Dragonfly with CODES 5.0.2.  Sometimes they end with unsent messages remaining in terminals and routers. I think the reason is that the events carrying these messages are never handled; my debugging information confirms this.

    So, why does ROSS end when there are still events unprocessed? I'm running a sequential simulation. Thanks.

[Inline image 1]



-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 62165 bytes
Desc: image.png
URL: <http://lists.mcs.anl.gov/pipermail/codes-ross-users/attachments/20170228/fc61f213/attachment-0001.png>