<div class="gmail_quote">On Mon, Sep 19, 2011 at 17:31, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Of course we would not have a string lookup but it would be the object->function(...) type of virtual function we use; I don't see how any compiler would ever be able to inline this; it seems to me that this might trash the instruction cache or perhaps not, if it doesn't trash the instruction cache then maybe it would not be anymore expensive then the if () test? But it does have to push the arguments on the stack before the function call and clean them off afterwards (though presumable cleaning them off is basically free) or will it be smart enough to use registers for the arguments (but then would it back up the previous values in the registers which takes time, or would it ..... or would it ..... . Hmm so maybe just using the two classes/virtual function approach is fine and I shouldn't do the uglier if () stuff??? My gut feeling (which is likely wrong) is that replacing an array access with a function call is always a bad idea :-(.</blockquote>

</div><br><div>It has to put the arguments in registers to make a function call. A static function call (address known at compile time) has essentially no overhead (like 1 cycle) after the arguments are positioned. Sometimes the compiler can avoid positioning the registers specially for the function call (e.g. it just uses those registers for earlier computation). If that bit of code is under register pressure, it will have to save and restore some registers. If you inline the function, there will probably be some registers used anyway, so it can be ambiguous, especially depending on the size of the function (e.g. if one side of the "if" is large).</div>

<div><br></div><div>A static function tail-calling a function pointer with the same arguments compiles to mov, mov, jmp. I.e.</div><div><br></div><div>int foo(obj,a,b,c) {</div><div>  return obj->ops->foo(obj,a,b,c);</div>

<div>}</div><div><br></div><div>The mov is cheap as long as the object is in cache (usually you're going to do something with it anyway), but the indirect function call takes around 6 cycles due to a partial pipeline stall. You see a similar cost for a "bad" branch of an if statement. (The pipeline usually predicts that backward conditional jumps will be taken, but forward jumps will not. The compiler has to make a decision when laying out the code, so there will always be a bad branch. Virtual calls are symmetric and if you take the same branch many times, all modern CPUs will predict it correctly regardless of which side is taken, so the cost of the indirect call will be mostly hidden.)</div>

<div><br></div><div>Different CPU generations will have different performance if you have a bunch of heterogeneous objects and it doesn't always get better. Penryn would get very good performance with a bunch of objects where type was index mod 3, but Sandy Bridge is about 5 times worse. This huge variation mostly goes away if the objects are mostly homogeneous. In the PetscTable case, it is just one object  and the indirect call is always the same.</div>

<div><br></div><div>Note that if you make the indirect call, there is no advantage to putting the implementations in a header (the compiler wasn't going to inline them anyway).</div>