[codes-ross-users] CODES LP Design Question
Phil Carns
carns at mcs.anl.gov
Thu May 14 13:51:13 CDT 2015
What I described in this thread turned out to be a bad idea.
The problem with handing off a function pointer is that the target LP
(thing2 in the example below) might not be on the same process as the
caller (thing1), so in the general case it isn't safe to let it hang
onto a function pointer to issue the completion event.
The lsm/local-storage-model API has this same issue (a higher-level
model wants to hand control off to a disk model and then just proceed
with its own work once the disk access is complete). LSM solves this
safely from a technical point of view by providing functions for the
caller to allocate an appropriate event, and letting the caller specify
its own event struct that will be handed back on completion.
This is safe regardless of how your LPs are organized, but it still bugs
me a little because a) it looks burly in the code and b) the target LP
doesn't have a way to pass results back in the completion event.
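
For readers who want to see the shape of that pattern, here is a rough
sketch under stated assumptions. It is not the actual LSM or ROSS API:
every name in it (toy_event, toy_send, disk_request, disk_handle,
thing1_event, run_demo) is made up for illustration, and a plain array
stands in for the simulator's event queue. The point is only that an LP
id and a block of bytes cross the LP boundary, never a function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RETURN_BYTES_MAX 64
#define QUEUE_MAX 16

/* Toy event standing in for a simulator message (hypothetical layout). */
typedef struct {
    int dest_lp;        /* LP that will handle this event */
    int is_completion;  /* 0 = request to the disk LP, 1 = completion */
    int return_lp;      /* LP to notify when the request is done */
    size_t return_size; /* bytes used in return_event */
    unsigned char return_event[RETURN_BYTES_MAX]; /* caller's event, opaque */
} toy_event;

/* Toy FIFO in place of the simulator's event queue. */
static toy_event queue[QUEUE_MAX];
static size_t q_head, q_tail;

static void toy_send(const toy_event *ev)
{
    assert(q_tail < QUEUE_MAX);
    queue[q_tail++] = *ev;
}

/* Disk-model API: the caller hands over a copy of its own completion
 * event. Only plain data (an LP id and bytes) is stored in the request. */
static void disk_request(int disk_lp, int caller_lp,
                         const void *ret_event, size_t ret_size)
{
    toy_event ev = {0};
    assert(ret_size <= RETURN_BYTES_MAX);
    ev.dest_lp = disk_lp;
    ev.return_lp = caller_lp;
    ev.return_size = ret_size;
    memcpy(ev.return_event, ret_event, ret_size);
    toy_send(&ev);
}

/* Disk LP handler: pretend the I/O happened, then send the caller's
 * bytes back untouched. It cannot interpret or fill them in. */
static void disk_handle(const toy_event *req)
{
    toy_event done = *req;
    done.dest_lp = req->return_lp;
    done.is_completion = 1;
    toy_send(&done);
}

/* The caller's private event layout; the disk model never sees this type. */
typedef struct {
    int op_id;
    int next_step;
} thing1_event;

static thing1_event last_completion; /* what thing1 got back */
static int completions_seen;

/* thing1 handler: unpack its own event and carry on from there. */
static void thing1_handle(const toy_event *ev)
{
    assert(ev->is_completion);
    assert(ev->return_size == sizeof(last_completion));
    memcpy(&last_completion, ev->return_event, sizeof(last_completion));
    completions_seen++;
}

/* Drive one request from thing1 (LP 1) to the disk (LP 2) and back.
 * Returns the number of completion events thing1 received. */
int run_demo(void)
{
    enum { THING1_LP = 1, DISK_LP = 2 };
    thing1_event resume = { .op_id = 7, .next_step = 2 };

    q_head = q_tail = 0;
    completions_seen = 0;
    disk_request(DISK_LP, THING1_LP, &resume, sizeof(resume));

    while (q_head < q_tail) {
        toy_event ev = queue[q_head++];
        if (ev.dest_lp == DISK_LP)
            disk_handle(&ev);
        else
            thing1_handle(&ev);
    }
    return completions_seen;
}
```

Calling run_demo() pushes one request through the toy queue and leaves
thing1's own event in last_completion. Both complaints above show up
even at this size: the caller has to build and size its return event by
hand, and disk_handle() has nowhere to put a result because the return
event is opaque to it.
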
Anyway, I'll putter around with this and share if I come up with
something cleaner, but I wanted to at least respond to this thread as a
warning in the meantime :)
thanks,
-Phil
On 04/08/2015 10:44 AM, Carns, Philip H. wrote:
> I've never tried to modularize steps like this in a single LP, but it is
> a little similar to some cross-LP scenarios I've run into before. Here
> comes a long-winded example :)
>
> Imagine you have two LPs (that run on the same simulated node), called
> thing1 and thing2. You want the thing1 procedure to invoke thing2, and
> then for thing1 to continue once thing2 is done.
>
> If I were to do this now I think I would set it up like this. Assume
> that thing1 and thing2 each have their own .c and .h files.
>
> thing2.h has this prototype: void start_thing2(lp, gid, completion_fn_ptr);
>
> This function will construct and send an event to the thing2 LP
> (identified by the gid argument) to kick off whatever it is going to
> do. This function will be executed within a thing1 LP event handler,
> but the definition of the function is in thing2.c. The purpose of this
> arrangement is to let thing1 send an event to thing2 without having
> access to thing2's event structure definition. As long as you keep the
> API stable then thing2.c can do whatever it needs to do in terms of
> refactoring its event structure without perturbing thing1.
>
> The lp argument is the lp pointer for the *calling* LP, and is probably
> needed to allocate a new event.
>
> The completion_fn_ptr is a function pointer to a function defined by the
> caller (thing1 in this case) that sends an event back to the original
> LP. This is the mechanism for thing2 to tell thing1 it is done so that
> thing1 can continue operation. Since it is an opaque function pointer,
> thing2 remains oblivious to the event structure definition of thing1 (or
> even what kind of LP thing1 is). You can make the completion_fn_ptr
> signature anything that you like, so if thing2 has some result to report
> back, then it can be passed as an argument to that function and stuffed
> into the event that heads back to thing1.
>
> Of course you can also add whatever arbitrary arguments you want to
> start_thing2() to pass input parameters it needs to start whatever
> thing2 is doing. Those would get stuffed into the event destined for
> thing2.
>
> Unfortunately I haven't actually done this so I don't have an example.
> My thinking on how to organize this has evolved over time and I haven't
> gotten around to trying this particular permutation :) I've done
> something very similar except for the function pointer; I have a few
> examples that have an explicitly defined function that jumps back from thing2
> to thing1. That works fine, but it is bad news if you want more than
> one type of LP to be able to call thing2.
>
> A related theme here is that I'm a fan (possibly in the minority on
> this) of splitting up different complex procedures that happen on the
> same simulated node into separate LPs. As long as you have them share
> node resources (NIC, storage, etc.) then the net result is the same in
> terms of the simulation outcome. The positive is that you don't have
> event handler explosion (big switch statements to handle different
> cases). The negative is that you have to jump between these LPs by
> sending events with small timestamps. Once you start doing that, then
> parallel conservative execution is practically out of the question; you
> have to do serial execution or parallel optimistic execution to handle
> those extra small events in a reasonable way.
>
> This might be overkill for your scenario, but I thought I would throw it
> out there. I wanted to collect my thoughts on this anyway. In general
> I don't think there is any particular way around this without juggling
> callbacks and such; really the best thing you can hope for right now is
> to minimize entanglement between different LP types in your code.
>
> -Phil
>
> On 04/08/2015 06:47 AM, Ross, Robert B. wrote:
>> I think the way to think about this is that the event payload carries the state associated with the operation and gets passed around between LPs as needed to simulate the steps. If you do this right, there shouldn't be local LP state associated with the operation -- everything is in the event payload that is needed.
>>
>> There might be state at LPs associated with the LP itself, such as what objects exist or what names are in the namespace. Or not, if you are just assuming things exist, for instance.
>>
>> John, Phil, and/or Misbah may need to correct me...of course.
>>
>> -- Rob
>>
>>> On Apr 7, 2015, at 4:09 PM, Jenkins, Jonathan P. <jenkins at mcs.anl.gov> wrote:
>>>
>>> Hi Joe,
>>>
>>> So if I understand right, the context is you want to simulate some POSIX
>>> operations of interest, such as mkdir, and that the client is capable of
>>> performing the stat (the lookup) using purely local information (hence the
>>> self-event)?
>>>
>>> In general, it's useful to think of events in these systems as RPC calls,
>>> where the event structure you are setting up contains the parameters, and
>>> there are corresponding return parameters that the calling LP expects to
>>> receive. In this frame of thought, both RPC calls and returns issue a
>>> discrete event. In your example, you could have your "mkdir" RPC event
>>> handler perform an additional "lookup/stat" RPC to check the path's
>>> existence, and upon receiving the "return" event, either make the
>>> directory or fail.
>>>
>>> Unfortunately, event-driven programming more-or-less necessitates
>>> callback-heavy code, which can be quite awkward in some contexts. We've
>>> talked in the past about ways to at least standardize the way we do this
>>> in the context of CODES, but nothing has materialized as of yet.
>>>
>>> Hope that helps. Something tells me that the assumption about clients
>>> being capable of performing metadata operations without outside
>>> interaction with e.g. a storage server is not quite right, though I could
>>> be misunderstanding.
>>>
>>> Thanks,
>>> John
>>>
>>>> On 4/7/15, 3:43 PM, "Joe Scott" <tscott2 at g.clemson.edu> wrote:
>>>>
>>>> Hello All,
>>>>
>>>> I am having a hard time wrapping my head around the programming
>>>> paradigm here, and I wonder if you might offer some guidance on a
>>>> better way to use CODES.
>>>>
>>>> So I am trying to process these higher level tasks (POSIX tasks like
>>>> mkdir) by launching the subevents as separate processes.
>>>>
>>>> The specific case that is tying me in knots is a user issuing a mkdir.
>>>> It launches the mkdir event handler, which needs to perform a lookup on
>>>> the path of the mkdir.
>>>>
>>>> So I need to send an event from this client LP to itself to perform the
>>>> lookup. But I also need the lookup, upon completion, to relaunch the
>>>> mkdir task.
>>>>
>>>> Talking it over with some of my lab mates, they seem to think I am
>>>> either overthinking it or trying to use the wrong tool for the job.
>>>>
>>>> Is this a use case you guys are familiar with? Can you shed some light on
>>>> this situation?
>>>>
>>>> I feel like there should be a way to do this without getting into
>>>> callback/completion function hell.
>>>>
>>>> Thanks,
>>>> Joe Scott
>>>> Clemson University
>>>> _______________________________________________
>>>> codes-ross-users mailing list
>>>> codes-ross-users at lists.mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/codes-ross-users