How to optimize for the Pentium
family of microprocessors

Copyright © 1996, 2000 by Agner Fog. Last modified 2000-03-31.

Contents

  1. Introduction
  2. Literature
  3. Calling assembly functions from high level language
  4. Debugging and verifying
  5. Memory model
  6. Alignment
  7. Cache
  8. First time versus repeated execution
  9. Address generation interlock (PPlain and PMMX)
  10. Pairing integer instructions (PPlain and PMMX)
    1. Perfect pairing
    2. Imperfect pairing
  11. Splitting complex instructions into simpler ones (PPlain and PMMX)
  12. Prefixes (PPlain and PMMX)
  13. Overview of PPro, PII and PIII pipeline
  14. Instruction decoding (PPro, PII and PIII)
  15. Instruction fetch (PPro, PII and PIII)
  16. Register renaming (PPro, PII and PIII)
    1. Eliminating dependencies
    2. Register read stalls
  17. Out of order execution (PPro, PII and PIII)
  18. Retirement (PPro, PII and PIII)
  19. Partial stalls (PPro, PII and PIII)
    1. Partial register stalls
    2. Partial flags stalls
    3. Flags stalls after shifts and rotates
    4. Partial memory stalls
  20. Dependency chains (PPro, PII and PIII)
  21. Searching for bottlenecks (PPro, PII and PIII)
  22. Jumps and branches (all processors)
    1. Branch prediction in PPlain
    2. Branch prediction in PMMX, PPro, PII and PIII
    3. Avoiding jumps (all processors)
    4. Avoiding conditional jumps by using flags (all processors)
    5. Replacing conditional jumps by conditional moves (PPro, PII and PIII)
  23. Reducing code size (all processors)
  24. Scheduling floating point code (PPlain and PMMX)
  25. Loop optimization (all processors)
    1. Loops in PPlain and PMMX
    2. Loops in PPro, PII and PIII
  26. Problematic instructions
    1. XCHG (all processors)
    2. Rotates through carry (all processors)
    3. String instructions (all processors)
    4. Bit test (all processors)
    5. Integer multiplication (all processors)
    6. WAIT instruction (all processors)
    7. FCOM + FSTSW AX (all processors)
    8. FPREM (all processors)
    9. FRNDINT (all processors)
    10. FSCALE and exponential function (all processors)
    11. FPTAN (all processors)
    12. FSQRT (PIII)
    13. MOV [MEM], ACCUM (PPlain and PMMX)
    14. TEST instruction (PPlain and PMMX)
    15. Bit scan (PPlain and PMMX)
    16. FLDCW (PPro, PII and PIII)
  27. Special topics
    1. LEA instruction (all processors)
    2. Division (all processors)
    3. Freeing floating point registers (all processors)
    4. Transitions between floating point and MMX instructions (PMMX, PII and PIII)
    5. Converting from floating point to integer (all processors)
    6. Using integer instructions to do floating point operations (all processors)
    7. Using floating point instructions to do integer operations (PPlain and PMMX)
    8. Moving blocks of data (all processors)
    9. Self-modifying code (all processors)
    10. Detecting processor type (all processors)
  28. List of instruction timings for PPlain and PMMX
    1. Integer instructions
    2. Floating point instructions
    3. MMX instructions (PMMX)
  29. List of instruction timings and micro-op breakdown for PPro, PII and PIII
    1. Integer instructions
    2. Floating point instructions
    3. MMX instructions (PII and PIII)
    4. XMM instructions (PIII)
  30. Testing speed
  31. Comparison of the different microprocessors