[codes-ross-users] Simulation ends with events unprocessed

Jian Peng jpeng10 at hawk.iit.edu
Mon Mar 6 15:14:48 CST 2017


Hi,

   The cause turned out to be a large offset value passed to
tw_event_new(). With that fixed, everything works well. Thanks for your
replies :)
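
For anyone who hits the same symptom, here is a minimal sketch of one way a
large offset could leave events unprocessed. The mechanism is an assumption
and is not confirmed in this thread: the offset places the event's receive
timestamp past the simulation end time, so a sequential loop that stops at
the end time never reaches it. All names below (toy_event, toy_event_new,
toy_count_unprocessed) are hypothetical stand-ins and are not ROSS APIs.

```c
#include <stddef.h>

/* Hypothetical toy model; none of these names come from ROSS. */
typedef struct {
    double recv_ts; /* time at which the event should be processed */
} toy_event;

/* Create an event 'offset' time units after 'now', loosely mirroring
 * the idea of passing an offset when creating an event. */
toy_event toy_event_new(double now, double offset)
{
    toy_event e;
    e.recv_ts = now + offset;
    return e;
}

/* Count the events that a loop stopping at 'end_time' would leave
 * unprocessed, i.e. those with recv_ts >= end_time (assumed cutoff). */
int toy_count_unprocessed(const toy_event *ev, size_t n, double end_time)
{
    int left = 0;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].recv_ts >= end_time)
            left++;
    }
    return left;
}
```

With an end time of 1000.0, events created at time 10.0 with offsets 5.0 and
500.0 fall inside the run, while one with offset 1.0e6 does not. If this is
what is happening, checking the offsets against the configured end time is a
quick first diagnostic.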

On Tue, Feb 28, 2017 at 3:57 PM, Mubarak, Misbah <mmubarak at anl.gov> wrote:

> Hi Jian,
>
> Thanks for reporting this issue. I have a few questions: are you using
> the latest CODES master branch? If so, which dragonfly model are you using?
> There is an Aries-based dragonfly model (dragonfly-custom.C) in addition to
> the conventional Dally-style dragonfly that we have been using.
>
> I see that you are getting leftover messages in terminals and routers.
> Are you running the simulation long enough for all messages to have enough
> time to reach their destinations? Alternatively, have you made any changes to
> the dragonfly routing or topology setup? Have you tested the workload with
> other network models, and does it work fine there?
>
> Looking at the issue, I don’t think this is a ROSS-related problem; it
> looks like either a model- or workload-related problem.
>
> Thanks,
> Misbah
> From: <codes-ross-users-bounces at lists.mcs.anl.gov> on behalf of Jian Peng
> <jpeng10 at hawk.iit.edu>
> Date: Tuesday, February 28, 2017 at 5:36 PM
> To: "codes-ross-users at lists.mcs.anl.gov" <codes-ross-users at lists.mcs.anl.gov>
> Subject: Re: [codes-ross-users] Simulation ends with events unprocessed
>
> Hi,
>
>     After some debugging, I found that the cause of this issue was
> "tw_event_send()". I added some debugging code, shown in
> "tw_event_send.png", and got the debugging output in "debug1.png". Please
> note the highlighted value of the pointer "kp". I then found the same "kp"
> value earlier in the output, shown in "debug2.png". Since "last_time" is
> greater than "recv_ts", "tw_eventq_push()" rather than "tw_pq_enqueue()" is
> called. As I'm running a sequential simulation, "tw_scheduler_sequential()"
> in tw-sched.c should be used. It uses "tw_pq_dequeue()" to read the event
> queue, as shown in "tw_scheduler_sequential.png".
>
>     So here's the question. The events passed to "tw_eventq_push()" do not
> seem to be retrievable by "tw_pq_dequeue()". So I'm wondering whether either
> mapping multiple LPs to one KP or using only "tw_pq_dequeue()" in the
> sequential scheduler is a bug. I haven't looked much into either part;
> this is just the question I have in mind.
>
>     Thanks!
>
>
>
> On Mon, Feb 27, 2017 at 1:20 PM, Jian Peng <jpeng10 at hawk.iit.edu> wrote:
>
>> Hi,
>>
>>     I'm running some experiments on Dragonfly with CODES 5.0.2.
>> Sometimes they end with unsent messages remaining in terminals and
>> routers. I think the reason is that the events carrying these messages are
>> never handled. I looked through my debugging information and confirmed it.
>>
>>     So, why does ROSS end while there are still unprocessed events? I'm
>> running a sequential simulation. Thanks.
>>
>> [image: Inline image 1]
>>
>>
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 62165 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/codes-ross-users/attachments/20170306/9debb261/attachment-0001.png>

