<div class="gmail_quote">On Sat, Sep 22, 2012 at 2:54 PM, Matthew Knepley <span dir="ltr"><<a href="mailto:knepley@gmail.com" target="_blank">knepley@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_quote"><div>Changed to <span style="line-height:19px;text-align:justify;color:rgb(51,51,51);font-size:14px">mid = ((unsigned int)low + (unsigned int)high)) >> 1;, </span>which is in the comment above.</div>
</div></blockquote><div><br></div><div>Well, you wrote</div><div><br></div><div> PetscInt imid = ((unsigned PetscInt) imin + (unsigned PetscInt) imax) >> 1;</div><div><br></div><div>which doesn't compile because the unsigned keyword does not apply to typedefs.</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="gmail_quote"><div class="im">
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>Your implementation also does unnecessary arithmetic, but due to possible overflow, the compiler may not be able to remove it. Why not use the tight structure I suggested?</div>
</blockquote></div></div><br><a href="http://en.wikipedia.org/wiki/Binary_search" target="_blank">http://en.wikipedia.org/wiki/Binary_search</a></blockquote></div><br><div>The paragon of tight code... I prefer to avoid mixing C and Fortran conventions,</div>
<div><br></div><div><div><div><font face="courier new, monospace">PetscErrorCode PetscFindInt(PetscInt key,PetscInt n,const PetscInt ii[],PetscInt *loc)</font></div></div></div><div><font face="courier new, monospace">{</font></div>
<div><font face="courier new, monospace"> PetscInt lo = 0,hi = n;</font></div><div><font face="courier new, monospace"> /* guards */</font></div><div><div><font face="courier new, monospace"> while (hi - lo > 1) {</font></div>
<div><font face="courier new, monospace"> PetscInt mid = lo + (hi - lo) / 2;</font></div><div><font face="courier new, monospace"> if (key < ii[mid]) hi = mid;</font></div><div><font face="courier new, monospace"> else lo = mid;</font></div>
<div><font face="courier new, monospace"> }</font></div><div><font face="courier new, monospace"> *loc = key == ii[lo] ? lo : -1;</font></div></div><div><br></div><div><br></div><div><font face="courier new, monospace">Under gcc -O3 code is</font></div>
<div><font face="courier new, monospace"><div>0000000000000000 <PetscFindInt_Jed> xor eax,eax</div><div>0000000000000002 <PetscFindInt_Jed+0x2> jmp 000000000000001d <PetscFindInt_Jed+0x1d></div><div>
0000000000000004 <PetscFindInt_Jed+0x4> nop DWORD PTR [rax+0x0]</div><div>0000000000000008 <PetscFindInt_Jed+0x8> sar r8d,1</div><div>000000000000000b <PetscFindInt_Jed+0xb> add r8d,eax</div><div>
000000000000000e <PetscFindInt_Jed+0xe> movsxd r9,r8d</div><div>0000000000000011 <PetscFindInt_Jed+0x11> cmp DWORD PTR [rdx+r9*4],edi</div><div>0000000000000015 <PetscFindInt_Jed+0x15> cmovle eax,r8d</div>
<div>0000000000000019 <PetscFindInt_Jed+0x19> cmovg esi,r8d</div><div>000000000000001d <PetscFindInt_Jed+0x1d> mov r8d,esi</div><div>0000000000000020 <PetscFindInt_Jed+0x20> sub r8d,eax</div><div>
0000000000000023 <PetscFindInt_Jed+0x23> cmp r8d,0x1</div><div>0000000000000027 <PetscFindInt_Jed+0x27> jg 0000000000000008 <PetscFindInt_Jed+0x8></div><div>0000000000000029 <PetscFindInt_Jed+0x29> movsxd rsi,eax</div>
<div>000000000000002c <PetscFindInt_Jed+0x2c> cmp DWORD PTR [rdx+rsi*4],edi</div></font></div><div><font face="courier new, monospace"><div>000000000000002f <PetscFindInt_Jed+0x2f> mov edx,0xffffffff</div>
<div>0000000000000034 <PetscFindInt_Jed+0x34> cmovne eax,edx</div><div>0000000000000037 <PetscFindInt_Jed+0x37> mov DWORD PTR [rcx],eax</div><div>0000000000000039 <PetscFindInt_Jed+0x39> xor eax,eax</div>
<div>000000000000003b <PetscFindInt_Jed+0x3b> ret </div><div>000000000000003c <PetscFindInt_Jed+0x3c> nop DWORD PTR [rax+0x0]</div><div><br></div><div>When I strip guards out of your version, I get the following which is twice as big and has two conditional jumps per inner loop (at least the conditional forward jump is predicted as not taken). We can benchmark, but I'll be shocked if my version is not faster.</div>
<div><br></div><div>0000000000000040 <PetscFindInt_Matt> sub esi,0x1</div><div>0000000000000043 <PetscFindInt_Matt+0x3> xor eax,eax</div><div>0000000000000045 <PetscFindInt_Matt+0x5> jmp 0000000000000054 <PetscFindInt_Matt+0x14></div>
<div>0000000000000047 <PetscFindInt_Matt+0x7> nop WORD PTR [rax+rax*1+0x0]</div><div>0000000000000050 <PetscFindInt_Matt+0x10> lea eax,[r10+0x1]</div><div>0000000000000054 <PetscFindInt_Matt+0x14> cmp esi,eax</div>
<div>0000000000000056 <PetscFindInt_Matt+0x16> jle 000000000000008d <PetscFindInt_Matt+0x4d></div><div>0000000000000058 <PetscFindInt_Matt+0x18> lea r8d,[rax+rsi*1]</div><div>000000000000005c <PetscFindInt_Matt+0x1c> shr r8d,1</div>
<div>000000000000005f <PetscFindInt_Matt+0x1f> mov r9d,r8d</div><div>0000000000000062 <PetscFindInt_Matt+0x22> mov r10d,r8d</div><div>0000000000000065 <PetscFindInt_Matt+0x25> cmp edi,DWORD PTR [rdx+r9*4]</div>
<div>0000000000000069 <PetscFindInt_Matt+0x29> jg 0000000000000050 <PetscFindInt_Matt+0x10></div><div>000000000000006b <PetscFindInt_Matt+0x2b> cmp eax,r8d</div><div>000000000000006e <PetscFindInt_Matt+0x2e> jge 000000000000008a <PetscFindInt_Matt+0x4a></div>
<div>0000000000000070 <PetscFindInt_Matt+0x30> lea r9d,[r8+rax*1]</div><div>0000000000000074 <PetscFindInt_Matt+0x34> shr r9d,1</div><div>0000000000000077 <PetscFindInt_Matt+0x37> mov esi,r9d</div>
<div>000000000000007a <PetscFindInt_Matt+0x3a> mov r10d,r9d</div><div>000000000000007d <PetscFindInt_Matt+0x3d> cmp DWORD PTR [rdx+rsi*4],edi</div><div>0000000000000080 <PetscFindInt_Matt+0x40> jl 00000000000000a0 <PetscFindInt_Matt+0x60></div>
<div>0000000000000082 <PetscFindInt_Matt+0x42> mov r8d,r9d</div><div>0000000000000085 <PetscFindInt_Matt+0x45> cmp eax,r8d</div><div>0000000000000088 <PetscFindInt_Matt+0x48> jl 0000000000000070 <PetscFindInt_Matt+0x30></div>
<div>000000000000008a <PetscFindInt_Matt+0x4a> mov esi,r8d</div><div>000000000000008d <PetscFindInt_Matt+0x4d> cmp eax,esi</div><div>000000000000008f <PetscFindInt_Matt+0x4f> je 00000000000000a8 <PetscFindInt_Matt+0x68></div>
<div>0000000000000091 <PetscFindInt_Matt+0x51> mov eax,0xffffffff</div><div>0000000000000096 <PetscFindInt_Matt+0x56> mov DWORD PTR [rcx],eax</div><div>0000000000000098 <PetscFindInt_Matt+0x58> xor eax,eax</div>
<div>000000000000009a <PetscFindInt_Matt+0x5a> ret </div><div>000000000000009b <PetscFindInt_Matt+0x5b> nop DWORD PTR [rax+rax*1+0x0]</div><div>00000000000000a0 <PetscFindInt_Matt+0x60> mov esi,r8d</div>
<div>00000000000000a3 <PetscFindInt_Matt+0x63> jmp 0000000000000050 <PetscFindInt_Matt+0x10></div><div>00000000000000a5 <PetscFindInt_Matt+0x65> nop DWORD PTR [rax]</div><div>00000000000000a8 <PetscFindInt_Matt+0x68> movsxd rsi,eax</div>
<div>00000000000000ab <PetscFindInt_Matt+0x6b> cmp DWORD PTR [rdx+rsi*4],edi</div><div>00000000000000ae <PetscFindInt_Matt+0x6e> mov edx,0xffffffff</div><div>00000000000000b3 <PetscFindInt_Matt+0x73> cmovne eax,edx</div>
<div>00000000000000b6 <PetscFindInt_Matt+0x76> mov DWORD PTR [rcx],eax</div><div>00000000000000b8 <PetscFindInt_Matt+0x78> xor eax,eax</div><div>00000000000000ba <PetscFindInt_Matt+0x7a> ret</div></font></div>