Machine-Specific Peephole Optimizers#

In addition to instruction patterns the md file may contain definitions of machine-specific peephole optimizations.

The combiner does not notice certain peephole optimizations when the data flow in the program does not suggest that it should try them. For example, sometimes two consecutive insns related in purpose can be combined even though the second one does not appear to use a register computed in the first one. A machine-specific peephole optimizer can detect such opportunities.

There are two forms of peephole definitions that may be used. The original define_peephole is run at assembly output time to match insns and substitute assembly text. Use of define_peephole is deprecated.

A newer define_peephole2 matches insns and substitutes new insns. The peephole2 pass is run after register allocation but before scheduling, which may result in much better code for targets that do scheduling.

RTL to Text Peephole Optimizers#

A definition looks like this:

(define_peephole
  [insn-pattern-1
   insn-pattern-2
   ...]
  "condition"
  "template"
  "optional-insn-attributes")

The last string operand may be omitted if you are not using any machine-specific information in this machine description. If present, it must obey the same rules as in a define_insn.

In this skeleton, insn-pattern-1 and so on are patterns to match consecutive insns. The optimization applies to a sequence of insns when insn-pattern-1 matches the first one, insn-pattern-2 matches the next, and so on.

Each of the insns matched by a peephole must also match a define_insn. Peepholes are checked only at the last stage just before code generation, and only optionally. Therefore, any insn which would match a peephole but no define_insn will cause a crash in code generation in an unoptimized compilation, or at various optimization stages.

The operands of the insns are matched with match_operands, match_operator, and match_dup, as usual. What is not usual is that the operand numbers apply to all the insn patterns in the definition. So, you can check for identical operands in two insns by using match_operand in one insn and match_dup in the other.

The operand constraints used in match_operand patterns do not have any direct effect on the applicability of the peephole, but they will be validated afterward, so make sure your constraints are general enough to apply whenever the peephole matches. If the peephole matches but the constraints are not satisfied, the compiler will crash.

It is safe to omit constraints in all the operands of the peephole; or you can write constraints which serve as a double-check on the criteria previously tested.

Once a sequence of insns matches the patterns, the condition is checked. This is a C expression which makes the final decision whether to perform the optimization (we do so if the expression is nonzero). If condition is omitted (in other words, the string is empty) then the optimization is applied to every sequence of insns that matches the patterns.

The defined peephole optimizations are applied after register allocation is complete. Therefore, the peephole definition can check which operands have ended up in which kinds of registers, just by looking at the operands.

The way to refer to the operands in condition is to write operands[i] for operand number i (as matched by (match_operand i ...)). Use the variable insn to refer to the last of the insns being matched; use prev_active_insn to find the preceding insns.

When optimizing computations with intermediate results, you can use condition to match only when the intermediate results are not used elsewhere. Use the C expression dead_or_set_p (insn, op), where insn is the insn in which you expect the value to be used for the last time (from the value of insn, together with use of prev_nonnote_insn), and op is the intermediate value (from operands[i]).

Applying the optimization means replacing the sequence of insns with one new insn. The template controls ultimate output of assembler code for this combined insn. It works exactly like the template of a define_insn. Operand numbers in this template are the same ones used in matching the original sequence of insns.

The result of a defined peephole optimizer does not need to match any of the insn patterns in the machine description; it does not even have an opportunity to match them. The peephole optimizer definition itself serves as the insn pattern to control how the insn is output.

Defined peephole optimizers are run as assembler code is being output, so the insns they produce are never combined or rearranged in any way.

Here is an example, taken from the 68000 machine description:

(define_peephole
  [(set (reg:SI 15) (plus:SI (reg:SI 15) (const_int 4)))
   (set (match_operand:DF 0 "register_operand" "=f")
        (match_operand:DF 1 "register_operand" "ad"))]
  "FP_REG_P (operands[0]) && ! FP_REG_P (operands[1])"
{
  rtx xoperands[2];
  xoperands[1] = gen_rtx_REG (SImode, REGNO (operands[1]) + 1);
#ifdef MOTOROLA
  output_asm_insn ("move.l %1,(sp)", xoperands);
  output_asm_insn ("move.l %1,-(sp)", operands);
  return "fmove.d (sp)+,%0";
#else
  output_asm_insn ("movel %1,sp@", xoperands);
  output_asm_insn ("movel %1,sp@-", operands);
  return "fmoved sp@+,%0";
#endif
})

The effect of this optimization is to change

jbsr _foobar
addql #4,sp
movel d1,sp@-
movel d0,sp@-
fmoved sp@+,fp0

into

jbsr _foobar
movel d1,sp@
movel d0,sp@-
fmoved sp@+,fp0

If a peephole matches a sequence including one or more jump insns, you must take account of the flags such as CC_REVERSED which specify that the condition codes are represented in an unusual manner. The compiler automatically alters any ordinary conditional jumps which occur in such situations, but the compiler cannot alter jumps which have been replaced by peephole optimizations. So it is up to you to alter the assembler code that the peephole produces. Supply C code to write the assembler output, and in this C code check the condition code status flags and change the assembler code as appropriate. insn-pattern-1 and so on look almost like the second operand of define_insn. There is one important difference: the second operand of define_insn consists of one or more RTX’s enclosed in square brackets. Usually, there is only one: then the same action can be written as an element of a define_peephole. But when there are multiple actions in a define_insn, they are implicitly enclosed in a parallel. Then you must explicitly write the parallel, and the square brackets within it, in the define_peephole. Thus, if an insn pattern looks like this,

(define_insn "divmodsi4"
  [(set (match_operand:SI 0 "general_operand" "=d")
        (div:SI (match_operand:SI 1 "general_operand" "0")
                (match_operand:SI 2 "general_operand" "dmsK")))
   (set (match_operand:SI 3 "general_operand" "=d")
        (mod:SI (match_dup 1) (match_dup 2)))]
  "TARGET_68020"
  "divsl%.l %2,%3:%0")

then the way to mention this insn in a peephole is as follows:

(define_peephole
  [...
   (parallel
    [(set (match_operand:SI 0 "general_operand" "=d")
          (div:SI (match_operand:SI 1 "general_operand" "0")
                  (match_operand:SI 2 "general_operand" "dmsK")))
     (set (match_operand:SI 3 "general_operand" "=d")
          (mod:SI (match_dup 1) (match_dup 2)))])
   ...]
  ...)

RTL to RTL Peephole Optimizers#

The define_peephole2 definition tells the compiler how to substitute one sequence of instructions for another sequence, what additional scratch registers may be needed and what their lifetimes must be.

(define_peephole2
  [insn-pattern-1
   insn-pattern-2
   ...]
  "condition"
  [new-insn-pattern-1
   new-insn-pattern-2
   ...]
  "preparation-statements")

The definition is almost identical to define_split (see Defining How to Split Instructions) except that the pattern to match is not a single instruction, but a sequence of instructions.

It is possible to request additional scratch registers for use in the output template. If appropriate registers are not free, the pattern will simply not match.

Scratch registers are requested with a match_scratch pattern at the top level of the input pattern. The allocated register (initially) will be dead at the point requested within the original sequence. If the scratch is used at more than a single point, a match_dup pattern at the top level of the input pattern marks the last position in the input sequence at which the register must be available.

Here is an example from the IA-32 machine description:

(define_peephole2
  [(match_scratch:SI 2 "r")
   (parallel [(set (match_operand:SI 0 "register_operand" "")
                   (match_operator:SI 3 "arith_or_logical_operator"
                     [(match_dup 0)
                      (match_operand:SI 1 "memory_operand" "")]))
              (clobber (reg:CC 17))])]
  "! optimize_size && ! TARGET_READ_MODIFY"
  [(set (match_dup 2) (match_dup 1))
   (parallel [(set (match_dup 0)
                   (match_op_dup 3 [(match_dup 0) (match_dup 2)]))
              (clobber (reg:CC 17))])]
  "")

This pattern tries to split a load from its use in the hopes that we’ll be able to schedule around the memory load latency. It allocates a single SImode register of class GENERAL_REGS ("r") that needs to be live only at the point just before the arithmetic.

A real example requiring extended scratch lifetimes is harder to come by, so here’s a silly made-up example:

(define_peephole2
  [(match_scratch:SI 4 "r")
   (set (match_operand:SI 0 "" "") (match_operand:SI 1 "" ""))
   (set (match_operand:SI 2 "" "") (match_dup 1))
   (match_dup 4)
   (set (match_operand:SI 3 "" "") (match_dup 1))]
  "/* determine 1 does not overlap 0 and 2 */"
  [(set (match_dup 4) (match_dup 1))
   (set (match_dup 0) (match_dup 4))
   (set (match_dup 2) (match_dup 4))
   (set (match_dup 3) (match_dup 4))]
  "")

There are two special macros defined for use in the preparation statements: DONE and FAIL. Use them with a following semicolon, as a statement.

DONE#

Use the DONE macro to end RTL generation for the peephole. The only RTL insns generated as replacement for the matched input insn will be those already emitted by explicit calls to emit_insn within the preparation statements; the replacement pattern is not used.

FAIL#

Make the define_peephole2 fail on this occasion. When a define_peephole2 fails, it means that the replacement was not truly available for the particular inputs it was given. In that case, GCC may still apply a later define_peephole2 that also matches the given insn pattern. (Note that this is different from define_split, where FAIL prevents the input insn from being split at all.)

If the preparation falls through (invokes neither DONE nor FAIL), then the define_peephole2 uses the replacement template.

If we had not added the (match_dup 4) in the middle of the input sequence, it might have been the case that the register we chose at the beginning of the sequence is killed by the first or second set.