The improvements in execution speed that are obtained depend upon the style and
content of the source program, for example, whether one- or multi-dimensioned
arrays are used, whether nested loops appear, and so on.
As optimisation can involve source code rearrangement and a change in the way
that registers and store locations are used, it is possible that numerical
results produced by an optimised program may differ in some way from those
produced by the unoptimised version of the same program. This effect may be
more noticeable with iterative algorithms and is due to the fact that a more
accurate value can be held in a coprocessor floating point register than can be
held in the corresponding store location.
Some programs may actually execute more slowly when optimised, due to
non-executed loops that cannot be detected by the compiler: for example, a loop
controlled by DO I = 1, N where N is zero or negative at run time. In this
case, code that is moved out of the loop will be executed once, rather than not
at all as would happen if this optimisation had not been made.
When the compiler option /OPTIMISE is used, the compiler performs code
optimisation based on rearranging the order of execution of statements which
constitute a program unit (see below). If /OPTIMISE is not used, the following
optimisations are typical of those performed by default.
Expressions involving several constants are evaluated as far as possible at
compile time, the compiler taking a statement containing:

      = I + 3 + 7

and producing code which is the same as for the statement:

      = I + 10
This might not appear to be of much use at first
glance, since you might not think that you would write expressions with
multiple constants in this way. However, consider an expression containing the
term 2*PI, where PI is a parameter: the 2*PI part would be evaluated at
compile time. In addition, a number of situations arise in the implicit
arithmetic for which the compiler plants code (chiefly array subscript
calculation), where this technique results in a considerable reduction in the
amount of arithmetic done at run time.
Related to this is the conversion of the type of constants where appropriate: a
constant whose type differs from that of its context is converted once at
compile time, and thus the need for a type conversion at run time is
eliminated.
Common subexpressions within a statement are evaluated only once. For example,
in the statement:

      A(I,J+K) = A(I,J+K) + 3

the code necessary to calculate the offset represented by (I,J+K) is only
performed once.
Values already held in registers are reused where possible. Consider the
statements:

      K = I + J
      L = K * I

For the second statement, the compiler recognises that it has the value of K in
a register, so it does not need to load K from memory. Note, however, that it
will probably need to reference the value of I from memory, since the
calculation of I + J will have resulted in the loss of the value of I from a
register.
Even if there were some statements interspersed between the statements above,
this optimisation could still take place, so long as the register in question
was not used for another purpose in the interim, none of the interim statements
were GOTOs, and none of the executable statements were labelled (a good reason
to dispense with unused labels in your code).
The compiler tries to avoid using registers which
might contain something useful in a subsequent calculation.
A related technique is used for the coprocessor
floating point registers, although due to the limited size of the hardware
register stack, it is not possible to leave a value in a register just in case
it might be useful. Instead, if a recently calculated floating point value
proves to be useful for a subsequent calculation, the instruction which places
the result in the corresponding memory location is converted from "store
and pop" to "store and don't pop". The value is then available
somewhere in the register stack for the subsequent calculation.
Note that this floating point register tracking is
not performed when the /DEBUG compiler option is used or implied.
Full use is made of the instruction set. For example, an integer addition or
subtraction of 1 is performed by the appropriate increment or decrement
instruction. Also, some optimisations can be used to perform certain arithmetic
operations a little quicker. For example, evaluation of I*5, where I is of
integer type, can be performed with a short shift-and-add instruction sequence,
which is faster than the corresponding integer multiply. Note, however, that
this optimisation is not done in CHECK mode, since any overflow would go
undetected.
When the /OPTIMISE option is used, the optimisations performed include the
following.
1. Loop invariant motion. This means that any calculations which are constant
with respect to the loop variable may be moved, so that they are performed once
and for all on entry to the loop. This is the source of the degradation in
performance mentioned earlier, for the case where the loop is not executed at
all. However, in most cases, particularly when the loop is executed a large
number of times, considerable savings result.
2. Loop induction weakening. This means that, instead of using multiples of the
loop index, a constant is added to a pseudo variable each time round the loop.
For example, consider the following loop:

      DO I = 1, N
        A(1,I) = 0
      END DO

The offset into the array A will be a constant multiple of the loop variable I;
the constant is related to the size of the first dimension of the array A.
Induction weakening replaces the multiplication by this constant at each
iteration of the loop with a faster addition of the constant.
3. Elimination of common subexpressions across statements. This is often a
consequence of the optimisations in (1) and (2): expressions which are taken
out of the loop, either as loop invariant or as candidates for induction
weakening, can themselves be sub-parts of larger expressions.
4. Register locking. In some loops, particularly useful quantities can be
"locked" into registers. "Locking" means that, for the duration of a loop, the
value of a program variable, or perhaps a derived quantity such as an offset
into an array, is kept in a register and is not stored into its associated
store location (if indeed it has one) until exit from the loop.
Obviously, this requires that exit from the loop cannot be by means of a GOTO
from within it, and that no subroutine or function is called from within the
loop, as these statements could destroy any value held in the register.
Also, there is some trade-off involved in tying up a register in this way, so
generally locking will only occur for relatively short loops.
Optimisation of the loop in the example given in (2) above involves induction
weakening and locking the array offset in a register.
5. Some additional optimisations based on the 80486 and Pentium instruction
sets. In some cases integer instructions are used instead of floating point
instructions. This often results in different behaviour where the operands are
invalid (for example, where they should cause an overflow), but it is assumed
that, if optimisation is being employed, problems such as this have already
been eliminated.
6. Many cases of a "dot-product" construction are spotted and replaced with
faster code, for example:

      DO I = 1, N
        SUM = SUM + A(I)*B(I)
      END DO
7. Many cases of redundant combinations of instructions are eliminated: for
example, jumps to the next line, or loads from a register to itself, which are
sometimes generated as a result of register locking (see 4 above).
The above list is not exhaustive, and new optimisations will be added during the
course of compiler development.