Using summary information in IPA passes#

Programs are represented internally as a callgraph (a multi-graph where nodes are functions and edges are call sites) and a varpool (a list of static and external variables in the program).

The inter-procedural optimization is organized as a sequence of individual passes, which operate on the callgraph and the varpool. To make the implementation of WHOPR possible, every inter-procedural optimization pass is split into several stages that are executed at different times during WHOPR compilation (a skeleton of such a pass is sketched after this list):

  • LGEN time

    • Generate summary (generate_summary in struct ipa_opt_pass_d). In this stage, every function body and variable initializer is examined and the relevant information is stored into a pass-specific data structure.

    • Write summary (write_summary in struct ipa_opt_pass_d). This stage writes all the pass-specific information generated by generate_summary. Summaries go into their own LTO_section_* sections that have to be declared in lto-streamer.h: enum lto_section_type. A new section is created by calling create_output_block and data can be written using the lto_output_* routines.

  • WPA time

    • Read summary (read_summary in struct ipa_opt_pass_d). This stage reads all the pass-specific information in exactly the same order that it was written by write_summary.

    • Execute (execute in struct opt_pass). This performs inter-procedural propagation. This must be done without actual access to the individual function bodies or variable initializers. Typically, this results in a transitive closure operation over the summary information of all the nodes in the callgraph.

    • Write optimization summary (write_optimization_summary in struct ipa_opt_pass_d). This writes the result of the inter-procedural propagation into the object file. This can use the same data structures and helper routines used in write_summary.

  • LTRANS time

    • Read optimization summary (read_optimization_summary in struct ipa_opt_pass_d). The counterpart to write_optimization_summary. This reads the interprocedural optimization decisions in exactly the same format emitted by write_optimization_summary.

    • Transform (function_transform and variable_transform in struct ipa_opt_pass_d). The actual function bodies and variable initializers are updated based on the information passed down from the Execute stage.
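
As a rough, hypothetical sketch of how these hooks fit together, the skeleton below wires the stages into an ipa_opt_pass_d-derived pass in the style of existing passes such as ipa-prop.cc. The names pass_ipa_example, pass_data_ipa_example, LTO_section_ipa_example and all example_* helpers are invented for illustration; see tree-pass.h, lto-streamer.h and data-streamer.h for the authoritative interfaces.

```cpp
/* Hypothetical WHOPR-aware IPA pass skeleton.  pass_data_ipa_example,
   LTO_section_ipa_example and every example_* helper are made up.  */

/* LGEN: examine every function body and variable initializer and record
   the relevant facts in a pass-local data structure.  */
static void
example_generate_summary (void)
{
  cgraph_node *node;
  FOR_EACH_DEFINED_FUNCTION (node)
    {
      /* Analyze the body of NODE and stash the result.  */
    }
}

/* LGEN: stream the gathered data into the pass's own LTO section
   (which would have to be added to enum lto_section_type).  */
static void
example_write_summary (void)
{
  struct output_block *ob = create_output_block (LTO_section_ipa_example);
  streamer_write_uhwi (ob, 0);	/* e.g. number of records to follow  */
  /* ... one streamer_write_* call per piece of summary data ...  */
  produce_asm (ob, NULL);
  destroy_output_block (ob);
}

/* WPA: read the summaries back, in exactly the order they were written.  */
static void
example_read_summary (void)
{
}

/* WPA: interprocedural propagation over the callgraph using only the
   summaries -- no function bodies are available here.  */
static unsigned int
example_execute (void)
{
  return 0;
}

/* WPA: stream the propagation results for LTRANS; the same streaming
   helpers as in example_write_summary can be reused.  */
static void
example_write_optimization_summary (void)
{
}

/* LTRANS: read those decisions back in the same order.  */
static void
example_read_optimization_summary (void)
{
}

/* LTRANS: apply the streamed decisions to one function body.  */
static unsigned int
example_function_transform (cgraph_node *node)
{
  return 0;
}

class pass_ipa_example : public ipa_opt_pass_d
{
public:
  pass_ipa_example (gcc::context *ctxt)
    : ipa_opt_pass_d (pass_data_ipa_example, ctxt,
		      example_generate_summary,	/* generate_summary */
		      example_write_summary,	/* write_summary */
		      example_read_summary,	/* read_summary */
		      example_write_optimization_summary,
		      example_read_optimization_summary,
		      NULL,			/* stmt_fixup */
		      0,	/* function_transform_todo_flags_start */
		      example_function_transform, /* function_transform */
		      NULL)			/* variable_transform */
  {}

  unsigned int execute (function *) final override
  {
    return example_execute ();
  }
};
```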

The implementation of the inter-procedural passes is shared between LTO, WHOPR and classic non-LTO compilation.

  • During the traditional file-by-file mode every pass executes its own Generate summary, Execute, and Transform stages within the single execution context of the compiler.

  • In LTO compilation mode, every pass executes its Generate summary and Write summary stages at compile time, while the Read summary, Execute, and Transform stages are executed at link time.

  • In WHOPR mode all stages are used.

To simplify development, the GCC pass manager differentiates between normal inter-procedural passes (see Regular IPA passes), small inter-procedural passes (see Small IPA passes) and late inter-procedural passes (see Late IPA passes). A small or late IPA pass (SIMPLE_IPA_PASS) does everything at once and thus cannot be executed during WPA in WHOPR mode. It defines only the Execute stage and during this stage it accesses and modifies the function bodies. Such passes are useful for optimization at LGEN or LTRANS time and are used, for example, to implement early optimization before writing object files. The simple inter-procedural passes can also be used for easier prototyping and development of a new inter-procedural pass.
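
For comparison, a simple IPA pass only provides the Execute stage. A minimal sketch, with the class and pass_data names invented for illustration, could look like this:

```cpp
/* Hypothetical simple IPA pass: no summaries and no WPA stage; it walks
   and modifies function bodies directly, so it can only run at LGEN or
   LTRANS time.  */
class pass_ipa_example_simple : public simple_ipa_opt_pass
{
public:
  pass_ipa_example_simple (gcc::context *ctxt)
    : simple_ipa_opt_pass (pass_data_ipa_example_simple, ctxt)
  {}

  unsigned int execute (function *) final override
  {
    cgraph_node *node;
    FOR_EACH_DEFINED_FUNCTION (node)
      {
	/* Bodies are available here; analyze and modify them in place.  */
      }
    return 0;
  }
};
```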

Virtual clones#

One of the main challenges of introducing the WHOPR compilation mode was addressing the interactions between optimization passes. In LTO compilation mode, the passes are executed in sequence, and each pass consists of analysis (or Generate summary), propagation (or Execute) and Transform stages. Once the work of one pass is finished, the next pass sees the updated program representation and can execute. This makes the individual passes dependent on each other.

In WHOPR mode all passes first execute their Generate summary stage. Then summary writing marks the end of the LGEN stage. At WPA time, the summaries are read back into memory and all passes run the Execute stage. Optimization summaries are streamed and sent to LTRANS, where all the passes execute the Transform stage.

Most optimization passes split naturally into analysis, propagation and transformation stages, but some do not. The main problem arises when one pass performs changes and a following pass is then confronted, at its Transform stage, with a callgraph that differs from the one it saw during its Generate summary or Execute stage. This means that the passes are required to communicate their decisions with each other.

To facilitate this communication, the GCC callgraph infrastructure implements virtual clones, a method of representing the changes performed by the optimization passes in the callgraph without needing to update function bodies.

A virtual clone in the callgraph is a function that has no associated body, just a description of how to create its body based on a different function (which itself may be a virtual clone).

The description of function modifications includes adjustments to the function’s signature (which allows, for example, removing or adding function arguments), substitutions to perform on the function body, and, for inlined functions, a pointer to the function that it will be inlined into.
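
Conceptually, this description can be pictured as a small record attached to the callgraph node. The struct below is purely illustrative and is not GCC's actual representation (which lives in cgraph.h and the IPA parameter-manipulation machinery):

```cpp
/* Illustrative only -- not GCC's real clone representation.  */
struct virtual_clone_info
{
  /* The function this clone is derived from (possibly itself a clone).  */
  cgraph_node *original;

  /* Signature adjustments: which parameters are removed or added.  */
  /* ... description of the new signature ...  */

  /* Substitutions to apply to the body when the clone is materialized,
     e.g. "replace parameter 1 with the constant 42".  */
  /* ... replacement map ...  */

  /* For inline clones: the function this body will be integrated into.  */
  cgraph_node *inlined_to;
};
```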

It is also possible to redirect any edge of the callgraph from a function to its virtual clone. Doing so implies updating the call site to match the new function signature.

Most of the transformations performed by inter-procedural optimizations can be represented via virtual clones. For instance, a constant propagation pass can produce a virtual clone of the function which replaces one of its arguments by a constant. The inliner can represent its decisions by producing a clone of a function whose body will be later integrated into a given function.

Using virtual clones, the program can be easily updated during the Execute stage, solving most of the pass-interaction problems that would otherwise occur during Transform.

Virtual clones are later materialized in the LTRANS stage and turned into real functions. Passes executed after a virtual clone was introduced also perform their Transform stage on the new function, so for a pass there is no significant difference between operating on a real function and on a virtual clone introduced before its Execute stage.

Optimization passes therefore work on virtual clones introduced before their Execute stage as if they were real functions. The only difference is that clones are not visible during the Generate summary stage.

To keep function summaries updated, the callgraph interface allows an optimizer to register callbacks that are invoked whenever a new clone is introduced, whenever an actual function or variable is generated, and whenever a function or variable is removed. These hooks are registered in the Generate summary stage and allow the pass to keep its information intact until the Execute stage. The same hooks can also be registered during the Execute stage to keep the optimization summaries updated for the Transform stage.
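
A hedged sketch of such a registration is shown below. It assumes the symtab->add_cgraph_*_hook registration interface from cgraph.h (the exact entry points have varied between GCC releases); the callbacks and holder variables are hypothetical.

```cpp
/* Hypothetical callbacks keeping a pass-local summary map in sync with
   callgraph changes.  */

static void
example_node_removal (cgraph_node *node, void *data)
{
  /* Drop the summary entry recorded for NODE.  */
}

static void
example_node_duplication (cgraph_node *src, cgraph_node *dst, void *data)
{
  /* Copy (and possibly adjust) the summary of SRC to the new clone DST.  */
}

static cgraph_node_hook_list *removal_hook_holder;
static cgraph_2node_hook_list *duplication_hook_holder;

static void
example_register_hooks (void)
{
  removal_hook_holder
    = symtab->add_cgraph_removal_hook (example_node_removal, NULL);
  duplication_hook_holder
    = symtab->add_cgraph_duplication_hook (example_node_duplication, NULL);
}
```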

IPA references#

GCC represents IPA references in the callgraph. For a function or variable A, the IPA reference is a list of all locations where the address of A is taken and, when A is a variable, a list of all direct stores and reads to/from A. References represent an oriented multi-graph on the union of nodes of the callgraph and the varpool. See ipa-reference.cc: ipa_reference_write_optimization_summary and ipa-reference.cc: ipa_reference_read_optimization_summary for details.
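
As an illustration, a pass could classify the recorded uses of a variable roughly as follows; this sketch assumes the symtab_node::iterate_referring interface and the IPA_REF_* use kinds from ipa-ref.h, and the function itself is hypothetical:

```cpp
/* Sketch: classify every recorded use of the variable VNODE.  */
static void
example_scan_variable_refs (varpool_node *vnode)
{
  ipa_ref *ref;
  for (unsigned i = 0; vnode->iterate_referring (i, ref); i++)
    switch (ref->use)
      {
      case IPA_REF_ADDR:
	/* The address of the variable is taken by ref->referring.  */
	break;
      case IPA_REF_LOAD:
	/* Direct read from the variable.  */
	break;
      case IPA_REF_STORE:
	/* Direct write to the variable.  */
	break;
      default:
	break;
      }
}
```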

Jump functions#

Suppose that an optimization pass sees a function A and it knows the values of (some of) its arguments. The jump function describes the value of a parameter of a given function call in function A based on this knowledge.

Jump functions are used by several optimizations, such as the inter-procedural constant propagation pass and the devirtualization pass. The inliner also uses jump functions to perform inlining of callbacks.
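
As a plain-source illustration (not GCC internals), consider the hypothetical functions below; the comments describe the jump functions that an inter-procedural pass could record for the call to callee:

```cpp
int callee (int x, int y, int z);

int caller (int a)
{
  /* For this call site, the pass could record three jump functions:
       - parameter x: a pass-through of caller's parameter a,
       - parameter y: the constant 7,
       - parameter z: an arithmetic function of a (a + 1).
     If caller itself is ever called with a known value of a, the values
     of x, y and z inside callee become known as well.  */
  return callee (a, 7, a + 1);
}
```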