Unbounded by reference

This section documents a GNU Modula-2 compiler switch which optimises the way unbounded arrays are passed to procedures. GNU Modula-2 implements an unbounded array with an internal structure struct {dataType *address, unsigned int high}. So given the Modula-2 procedure declaration:

PROCEDURE foo (VAR a: ARRAY OF dataType) ;
BEGIN
   IF a[2]= (* etc *)
END foo ;

it is translated into GCC trees, which can be represented in their C form thus:

void foo (struct {dataType *address, unsigned int high} a)
{
   if (a.address[2] == /* etc */
}

Whereas if the procedure foo was declared as:

PROCEDURE foo (a: ARRAY OF dataType) ;
BEGIN
   IF a[2]= (* etc *)
END foo ;

then it is translated into the following GCC trees, which can be represented in their C form thus:

void foo (struct {dataType *address, unsigned int high} a)
{
   dataType *copyContents = (dataType *)alloca ((a.high+1) * sizeof (dataType));
   memcpy(copyContents, a.address, (a.high+1) * sizeof (dataType));
   a.address = copyContents;

   if (a.address[2] == /* etc */
}

This implementation works, but it makes a copy of each non-VAR unbounded array when a procedure is entered. If the unbounded array is not changed during procedure foo then the copy is wasted work, and the cost grows with the size of the array. In effect Modula-2 lacks the REF keyword of Ada. Consequently the programmer may be tempted to sacrifice semantic clarity for greater efficiency by declaring the parameter using the VAR keyword in place of REF.
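The difference between the two calling conventions can be illustrated with a small C sketch. This is not the code GNU Modula-2 generates: the descriptor type unbounded_int, both function names and the use of malloc in place of alloca are assumptions made so that the example compiles as portable C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the internal unbounded array descriptor;
   high is the largest valid index, so there are high+1 elements.  */
typedef struct {
  int *address;
  unsigned int high;
} unbounded_int;

/* VAR style: the callee reads the caller's storage directly.  */
long sum_by_reference (unbounded_int a)
{
  long total = 0;

  for (unsigned int i = 0; i <= a.high; i++)
    total += a.address[i];
  return total;
}

/* Non-VAR style: the callee takes a private copy on entry, so the
   write to element 0 below is invisible to the caller.  malloc is
   used here (an assumption of this sketch) instead of alloca.  */
long sum_with_first_cleared (unbounded_int a)
{
  size_t bytes = ((size_t) a.high + 1) * sizeof (int);
  int *copy = malloc (bytes);
  long total = 0;

  if (copy == NULL)
    abort ();
  memcpy (copy, a.address, bytes);
  a.address = copy;          /* redirect the descriptor to the copy */
  a.address[0] = 0;          /* modifies the copy only */
  for (unsigned int i = 0; i <= a.high; i++)
    total += a.address[i];
  free (copy);
  return total;
}
```

Calling sum_with_first_cleared on an array leaves the caller's data untouched, at the price of one allocation and one memcpy per call. sum_by_reference pays neither cost.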

The -funbounded-by-reference switch instructs the compiler to check whether the programmer modifies the content of any unbounded array. If the content is modified then a copy will be made upon entry into the procedure. Conversely, if the content is only read and never modified then this non-VAR unbounded array is a candidate for being passed by reference. It is only a candidate, as it is still possible that passing this parameter by reference could alter the meaning of the source code. For example consider the following case:

PROCEDURE StrConCat (VAR a: ARRAY OF CHAR; b, c: ARRAY OF CHAR) ;
BEGIN
   (* code which performs string a := b + c *)
END StrConCat ;

PROCEDURE foo ;
VAR
   a: ARRAY [0..3] OF CHAR ;
BEGIN
   a := 'q' ;
   StrConCat(a, a, a)
END foo ;

In the code above we see that the same variable, a, is being passed three times to StrConCat. Clearly, even though parameters b and c are never modified, it would be incorrect to implement them as pass by reference. Therefore the compiler checks whether any non-VAR parameter is type compatible with any VAR parameter and, if so, it generates runtime procedure entry checks to determine whether the contents of parameter b or c overlap the contents of a. If an overlap is detected then a copy is made and the address in the unbounded structure is modified.
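The hazard in StrConCat(a, a, a) can be reproduced with a C sketch. The descriptor type unbounded_char and both function names are hypothetical, and the concatenation loop is only one plausible way of writing a := b + c. It is not taken from any GNU Modula-2 library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical descriptor mirroring the unbounded array structure.  */
typedef struct {
  char *address;
  unsigned int high;
} unbounded_char;

/* a := b + c, reading b and c in place, as if they had been passed
   by reference.  Strings end at the first nul or at index high.  */
void concat_in_place (unbounded_char a, unbounded_char b, unbounded_char c)
{
  unsigned int i = 0;

  for (unsigned int j = 0;
       i <= a.high && j <= b.high && b.address[j] != '\0'; j++)
    a.address[i++] = b.address[j];
  for (unsigned int j = 0;
       i <= a.high && j <= c.high && c.address[j] != '\0'; j++)
    a.address[i++] = c.address[j];
  if (i <= a.high)
    a.address[i] = '\0';
}

/* The same operation with by-value semantics: b and c are copied on
   entry, so writes to a cannot disturb them.  */
void concat_with_copies (unbounded_char a, unbounded_char b, unbounded_char c)
{
  char *b_copy = malloc ((size_t) b.high + 1);
  char *c_copy = malloc ((size_t) c.high + 1);

  if (b_copy == NULL || c_copy == NULL)
    abort ();
  memcpy (b_copy, b.address, (size_t) b.high + 1);
  memcpy (c_copy, c.address, (size_t) c.high + 1);
  b.address = b_copy;
  c.address = c_copy;
  concat_in_place (a, b, c);
  free (b_copy);
  free (c_copy);
}
```

When all three descriptors refer to the same four character array holding 'q', concat_with_copies produces "qq". concat_in_place keeps reading characters it has just written through a and fills the whole array with 'q'.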

The compiler will check the address range of each candidate against the address range of any VAR parameter, provided they are type compatible. For example consider:

PROCEDURE foo (a: ARRAY OF BYTE; VAR f: REAL) ;
BEGIN
   f := 3.14 ;
   IF a[0]=BYTE(0)
   THEN
      (* etc *)
   END
END foo ;

PROCEDURE bar ;
VAR
   r: REAL ;
BEGIN
   r := 2.0 ;
   foo(r, r)
END bar ;

Here we see that although parameter a is a candidate for passing by reference, it would be incorrect to use this transformation. Thus the compiler detects that parameters a and f are type compatible and will produce runtime checking code to test whether the address ranges of their respective contents intersect.
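An address range intersection test of the kind described above might look like the following C sketch. The function name ranges_overlap and its signature are assumptions for illustration. The check the compiler actually emits is not shown in this document.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when the byte ranges [p, p + p_bytes) and
   [q, q + q_bytes) share at least one byte, otherwise 0.  */
int ranges_overlap (const void *p, size_t p_bytes,
                    const void *q, size_t q_bytes)
{
  uintptr_t p_lo = (uintptr_t) p;
  uintptr_t p_hi = p_lo + p_bytes;
  uintptr_t q_lo = (uintptr_t) q;
  uintptr_t q_hi = q_lo + q_bytes;

  /* Two half-open intervals intersect when each one starts before
     the other one ends.  */
  return p_lo < q_hi && q_lo < p_hi;
}
```

In the call foo(r, r) both arguments describe the storage of r, so such a test would report an overlap and the copy would have to be made on entry.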

This section describes the linking-related options. Three linking strategies are available: dynamic scaffold, static scaffold and user defined. The dynamic scaffold is enabled by default, and each module registers itself with the runtime M2RTS via a constructor. The static scaffold mechanism invokes each module's _init and _finish functions in turn via a sequence of calls from within main. Lastly, the user defined strategy can be implemented by turning off the dynamic and static options via -fno-scaffold-dynamic and -fno-scaffold-static.
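The contrast between the dynamic and static strategies can be pictured with a C sketch. Everything in it is hypothetical: the registry, the function names and the use of the GCC constructor attribute are assumptions chosen to illustrate the idea, not the symbols or mechanism that gm2 really emits. The _finish calls are omitted for brevity.

```c
#include <stddef.h>

#define MAX_MODULES 8

/* Hypothetical registry standing in for the runtime's module list.  */
const char *registered[MAX_MODULES];
size_t n_registered;

/* Hypothetical log of the order in which module bodies were run.  */
const char *init_log[MAX_MODULES];
size_t n_init;

static void register_module (const char *name)
{
  if (n_registered < MAX_MODULES)
    registered[n_registered++] = name;
}

static void log_init (const char *name)
{
  if (n_init < MAX_MODULES)
    init_log[n_init++] = name;
}

/* Stand-ins for the per-module initialisation functions.  */
static void a_init (void) { log_init ("a"); }
static void b_init (void) { log_init ("b"); }

/* Dynamic style: each module announces itself before main starts.
   The constructor attribute is a GCC and Clang extension.  */
__attribute__ ((constructor)) static void a_ctor (void)
{
  register_module ("a");
}

__attribute__ ((constructor)) static void b_ctor (void)
{
  register_module ("b");
}

/* Static style: a fixed sequence of calls, as would be written into
   main, with the order decided when the scaffold is generated.  */
void static_scaffold (void)
{
  a_init ();
  b_init ();
}
```

In the dynamic style the list of modules is only known at run time, once every constructor has run. In the static style the list is fixed in the generated call sequence.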

In the simple test below:

$ gm2 hello.mod

the driver adds the options -fscaffold-dynamic and -fgen-module-list=-, which generate a list of application modules and also create the main function with calls to M2RTS. It can be useful to add the option -fsources, which displays the source files as they are parsed and summarizes whether each source file is required for compilation or linking.

If you wish to split the above command line into separate compile and link steps then you could use:

$ gm2 -c -fscaffold-main hello.mod
$ gm2 hello.o

The -fscaffold-main option instructs the compiler to generate the main function and scaffold. You can set the environment variable GCC_M2LINK_RTFLAG to trace the construction and destruction of the application. The values for GCC_M2LINK_RTFLAG are shown in the table below:

value   | meaning
=================
all     | turn on all flags below
module  | trace modules as they register themselves
pre     | generate module list prior to dependency resolution
dep     | trace module dependency resolution
post    | generate module list after dependency resolution
force   | generate a module list after dependency and forced
        | ordering is complete

The values can be combined using a comma-separated list.
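As an illustration of how a comma-separated value such as module,dep could be decoded, here is a C sketch. The enum names, the function parse_trace_flags and the choice to ignore unknown names are all assumptions. This is not the parser used by the GNU Modula-2 runtime.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bit assignments, one per value in the table above.  */
enum {
  TRACE_MODULE = 1 << 0,
  TRACE_PRE    = 1 << 1,
  TRACE_DEP    = 1 << 2,
  TRACE_POST   = 1 << 3,
  TRACE_FORCE  = 1 << 4,
  TRACE_ALL    = TRACE_MODULE | TRACE_PRE | TRACE_DEP
                 | TRACE_POST | TRACE_FORCE
};

/* Turns a string such as "module,dep" into a bitmask.  A NULL or
   empty string gives 0 and unknown names are silently ignored.  */
unsigned int parse_trace_flags (const char *value)
{
  static const struct { const char *name; unsigned int bit; } table[] = {
    { "all", TRACE_ALL },   { "module", TRACE_MODULE },
    { "pre", TRACE_PRE },   { "dep", TRACE_DEP },
    { "post", TRACE_POST }, { "force", TRACE_FORCE }
  };
  unsigned int mask = 0;

  while (value != NULL && *value != '\0')
    {
      size_t len = strcspn (value, ",");   /* length of this item */

      for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strlen (table[i].name) == len
            && strncmp (value, table[i].name, len) == 0)
          mask |= table[i].bit;
      value += len;
      if (*value == ',')
        value++;                           /* skip the separator */
    }
  return mask;
}
```

A program could feed it the result of getenv ("GCC_M2LINK_RTFLAG"), which is NULL when the variable is unset.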

One of the advantages of the dynamic scaffold is that the driver behaves in a similar way to the other front end drivers. For example consider a small project consisting of four pairs of definition and implementation modules (a.def, a.mod, b.def, b.mod, c.def, c.mod, d.def, d.mod) and a program module program.mod.

To compile and link this project we could run:

$ gm2 -g -c a.mod
$ gm2 -g -c b.mod
$ gm2 -g -c c.mod
$ gm2 -g -c d.mod
$ gm2 -g program.mod a.o b.o c.o d.o

The module initialization sequence is defined by the ISO standard to follow the import graph traversal. The initialization order is the order in which the corresponding separate modules finish the processing of their import lists.
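One way to read this rule is as a depth-first traversal of the import graph in which a module is emitted once all of its imports have been processed. The C sketch below follows that reading. The import relationships (program imports a and b, a imports c, b imports c and d) are invented for the example, and the function names are hypothetical.

```c
#define N_MODULES 5
#define MAX_IMPORTS 3
#define NO_IMPORT (-1)

/* Module indices: 0 = program, 1 = a, 2 = b, 3 = c, 4 = d.  */
const char *const module_name[N_MODULES] = { "program", "a", "b", "c", "d" };

/* Hypothetical import lists, each terminated by NO_IMPORT.  */
static const int imports[N_MODULES][MAX_IMPORTS] = {
  { 1, 2, NO_IMPORT },                   /* program imports a, b */
  { 3, NO_IMPORT, NO_IMPORT },           /* a imports c */
  { 3, 4, NO_IMPORT },                   /* b imports c, d */
  { NO_IMPORT, NO_IMPORT, NO_IMPORT },   /* c imports nothing */
  { NO_IMPORT, NO_IMPORT, NO_IMPORT }    /* d imports nothing */
};

/* Visits module m, then appends it to order once every module in its
   import list has been handled.  */
static void visit (int m, int visited[], int order[], int *count)
{
  if (visited[m])
    return;
  visited[m] = 1;
  for (int i = 0; i < MAX_IMPORTS && imports[m][i] != NO_IMPORT; i++)
    visit (imports[m][i], visited, order, count);
  order[(*count)++] = m;
}

/* Fills order with module indices in initialisation order starting
   from root and returns how many modules were reached.  */
int init_order (int root, int order[N_MODULES])
{
  int visited[N_MODULES] = { 0 };
  int count = 0;

  visit (root, visited, order, &count);
  return count;
}
```

With these invented imports and program as the root, the resulting order is c, a, d, b, program.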

However, if required, you can override this order with the -fruntime-modules= option. For example, -fruntime-modules=a,b,c,d forces the initialization sequence to a, b, c and d.