Overview#
The central data structure used by the internal representation is the
tree
. These nodes, while all of the C type tree
, are of
many varieties. A tree
is a pointer type, but the object to
which it points may be of a variety of types. From this point forward,
we will refer to trees in ordinary type, rather than in this
font
, except when talking about the actual C type tree
.
You can tell what kind of node a particular tree is by using the
TREE_CODE
macro. Many, many macros take trees as input and
return trees as output. However, most macros require a certain kind of
tree node as input. In other words, there is a type-system for trees,
but it is not reflected in the C type-system.
For safety, it is useful to configure GCC with --enable-checking
.
Although this results in a significant performance penalty (since all
tree types are checked at run-time), and is therefore inappropriate in a
release version, it is extremely helpful during the development process.
Many macros behave as predicates. Many, although not all, of these
predicates end in _P
. Do not rely on the result type of these
macros being of any particular type. You may, however, rely on the fact
that the type can be compared to 0
, so that statements like
if (TEST_P (t) && !TEST_P (y))
x = 1;
and
int i = (TEST_P (t) != 0);
are legal. Macros that return int
values now may be changed to
return tree
values, or other pointers in the future. Even those
that continue to return int
may return multiple nonzero codes
where previously they returned only zero and one. Therefore, you should
not write code like
if (TEST_P (t) == 1)
as this code is not guaranteed to work correctly in the future.
You should not take the address of values returned by the macros or functions described here. In particular, no guarantee is given that the values are lvalues.
In general, the names of macros are all in uppercase, while the names of functions are entirely in lowercase. There are rare exceptions to this rule. You should assume that any macro or function whose name is made up entirely of uppercase letters may evaluate its arguments more than once. You may assume that a macro or function whose name is made up entirely of lowercase letters will evaluate its arguments only once.
The error_mark_node
is a special tree. Its tree code is
ERROR_MARK
, but since there is only ever one node with that code,
the usual practice is to compare the tree against
error_mark_node
. (This test is just a test for pointer
equality.) If an error has occurred during front-end processing the
flag errorcount
will be set. If the front end has encountered
code it cannot handle, it will issue a message to the user and set
sorrycount
. When these flags are set, any macro or function
which normally returns a tree of a particular kind may instead return
the error_mark_node
. Thus, if you intend to do any processing of
erroneous code, you must be prepared to deal with the
error_mark_node
.
Occasionally, a particular tree slot (like an operand to an expression, or a particular field in a declaration) will be referred to as ‘reserved for the back end’. These slots are used to store RTL when the tree is converted to RTL for use by the GCC back end. However, if that process is not taking place (e.g., if the front end is being hooked up to an intelligent editor), then those slots may be used by the back end presently in use.
If you encounter situations that do not match this documentation, such as tree nodes of types not mentioned here, or macros documented to return entities of a particular kind that instead return entities of some different kind, you have found a bug, either in the front end or in the documentation. Please report these bugs as you would any other bug.
Trees#
All GENERIC trees have two fields in common. First, TREE_CHAIN
is a pointer that can be used as a singly-linked list to other trees.
The other is TREE_TYPE
. Many trees store the type of an
expression or declaration in this field.
These are some other functions for handling trees:
tree_size
Return the number of bytes a tree takes.
build0
,build1
,build2
,build3
,build4
,build5
,build6
These functions build a tree and supply values to put in each parameter. The basic signature is
code, type, [operands]
.code
is theTREE_CODE
, andtype
is a tree representing theTREE_TYPE
. These are followed by the operands, each of which is also a tree.
Identifiers#
An IDENTIFIER_NODE
represents a slightly more general concept
than the standard C or C++ concept of identifier. In particular, an
IDENTIFIER_NODE
may contain a $
, or other extraordinary
characters.
There are never two distinct IDENTIFIER_NODE
s representing the
same identifier. Therefore, you may use pointer equality to compare
IDENTIFIER_NODE
s, rather than using a routine like
strcmp
. Use get_identifier
to obtain the unique
IDENTIFIER_NODE
for a supplied string.
You can use the following macros to access identifiers:
- IDENTIFIER_POINTER#
The string represented by the identifier, represented as a
char*
. This string is alwaysNUL
-terminated, and contains no embeddedNUL
characters.
- IDENTIFIER_LENGTH#
The length of the string returned by
IDENTIFIER_POINTER
, not including the trailingNUL
. This value ofIDENTIFIER_LENGTH (x)
is always the same asstrlen (IDENTIFIER_POINTER (x))
.
- IDENTIFIER_OPNAME_P#
This predicate holds if the identifier represents the name of an overloaded operator. In this case, you should not depend on the contents of either the
IDENTIFIER_POINTER
or theIDENTIFIER_LENGTH
.
- IDENTIFIER_TYPENAME_P#
This predicate holds if the identifier represents the name of a user-defined conversion operator. In this case, the
TREE_TYPE
of theIDENTIFIER_NODE
holds the type to which the conversion operator converts.
Containers#
Two common container data structures can be represented directly with
tree nodes. A TREE_LIST
is a singly linked list containing two
trees per node. These are the TREE_PURPOSE
and TREE_VALUE
of each node. (Often, the TREE_PURPOSE
contains some kind of
tag, or additional information, while the TREE_VALUE
contains the
majority of the payload. In other cases, the TREE_PURPOSE
is
simply NULL_TREE
, while in still others both the
TREE_PURPOSE
and TREE_VALUE
are of equal stature.) Given
one TREE_LIST
node, the next node is found by following the
TREE_CHAIN
. If the TREE_CHAIN
is NULL_TREE
, then
you have reached the end of the list.
A TREE_VEC
is a simple vector. The TREE_VEC_LENGTH
is an
integer (not a tree) giving the number of nodes in the vector. The
nodes themselves are accessed using the TREE_VEC_ELT
macro, which
takes two arguments. The first is the TREE_VEC
in question; the
second is an integer indicating which element in the vector is desired.
The elements are indexed from zero.