Micro-optimizations in .NET (x86/x64)
Part 2
This is the second post in the "Micro-optimizations in .NET" series. If you missed the first one, check out Conversion from Boolean to Integer.
Example 2: Integer Comparison (Branchless Design)
As discussed previously, conditional branches are costly for modern processors when the branch predictor fails. Branch misprediction occurs frequently unless the data is homogeneous (e.g., all values are identical). The core idea of optimizing such algorithms is to "straighten" the execution flow by removing conditional jumps.
Let's examine a standard integer comparison function. It returns -1 if a < b, 1 if a > b, and 0 if a = b.
public static int CompareIntegers(int a, int b)
{
if (a < b) return -1;
if (a > b) return 1;
return 0;
}
In the worst-case scenario (when operands are equal), the CPU must evaluate two conditional jumps. The assembly listing confirms this:
; Examples.CompareIntegers(Int32, Int32)
L0000: cmp ecx, edx ; Compare a and b
L0002: jge short L000a ; Jump if a >= b
L0004: mov eax, 0xffffffff ; Result = -1
L0009: ret
L000a: cmp ecx, edx ; Compare again
L000c: jle short L0014 ; Jump if a <= b
L000e: mov eax, 1 ; Result = 1
L0013: ret
L0014: xor eax, eax ; Result = 0
L0016: ret
The Optimized Branchless Approach
We can eliminate branches by recognizing that the result is essentially the difference between two boolean states. Instead of if statements, we use the setg (set if greater) and setl (set if less) instructions, which don't trigger a jump.
public static int CompareIntegersOptimized(int a, int b)
{
bool isGreater = a > b;
bool isLess = a < b;
// Using the trick from Part 1
return ConvertBooleanToIntOptimized(isGreater) -
ConvertBooleanToIntOptimized(isLess);
}
The generated machine code is now linear:
; Examples.CompareIntegersOptimized(Int32, Int32)
L0000: xor eax, eax ; Clear EAX
L0002: cmp ecx, edx ; Compare a and b
L0004: setg al ; AL = (a > b) ? 1 : 0
L0007: cmp ecx, edx ; Compare again
L0009: setl dl ; DL = (a < b) ? 1 : 0
L000c: movzx edx, dl ; Zero-extend DL to EDX
L000f: sub eax, edx ; EAX = isGreater - isLess
L0011: ret
Performance Benchmarks
The impact of branch prediction is clearly visible when comparing random data vs. constant data.
1. Random Data (Unpredictable):
The branchless version is ~3x faster because it avoids the massive penalty of mispredictions.
| Method | Mean | Error | StdDev |
| CompareIntegers | 3.855 ms | 0.0654 ms | 0.0580 ms |
| CompareIntegersOptimized | 1.203 ms | 0.0342 ms | 0.0970 ms |
2. Constant Data (Predictable Zeroes):
Interestingly, the branchless version is ~20% slower here. When the CPU can perfectly predict the branch, the standard if version wins because it executes only 6 instructions versus the 8 instructions in our optimized version.
| Method | Mean | Error | StdDev |
| CompareIntegers | 885.9 μs | 15.62 μs | 13.84 μs |
| CompareIntegersOptimized | 1,070.7 μs | 14.58 μs | 12.93 μs |
Generic Implementation
Thanks to Static Abstract Members in Interfaces, we can make this logic generic for any type that supports comparison operators:
public static int CompareGeneric<T>(T left, T right)
where T : IComparisonOperators<T, T, bool>
{
return ConvertBooleanToIntOptimized(left > right) -
ConvertBooleanToIntOptimized(left < right);
}
Conclusion
Branchless code is a powerful tool for processing random or "noisy" data. However, as shown, it's not a silver bullet. Always profile your specific data distribution before committing to such micro-optimizations.




