FTN95 now includes an in-built profiling tool called TimingAnalysis.
TimingAnalysis accurately measures CPU time spent in each subroutine of a
Fortran program. It allows program developers to discover the most CPU
intensive areas of a program so that these can be improved. Frequently, it will
be found that over 90% of the program CPU time is in less than 10% of the code.
TimingAnalysis produces a report giving statistics on CPU time, number of
calls to and from routines, and the number of page faults per routine. CPU time
statistics are collected on both an exclusive (only the single routine) basis
and an inclusive basis (the routine and everything called below it). In
addition to the reports, a file suitable for importing into a spread sheet is
To use the TimingAnalysis all that is required is to compile with the
/TIMING option. This can be done within Visual Studio .NET by selecting
the Project Properties and navigating to Compiler
Options->Debugging->Plant timing information and set this
option to Yes. If compiling from the command line then the /TIMING option
should be used.
TimingAnalysis creates output files containing summary information for the
program executed. If using TimingAnalysis from Visual Studio .NET then
these files will be placed in the project output directory and will have
extensions of .TMR and .TMO. The .TMR file contains summary analysis of
the executed program and the .TMO file contains a (by default) tab seperated
file ready for importing into a spreadsheet program such as Microsoft Excel.
From the command line only .TMR files are produced by default. To
specify a .TMO file from the command line an environment variable called
TIMINGOPTS should be created and set as follows:
set TIMINGOPTS=/TMO /DLM tab
where /TMO indicates that a .TMO file should be produced and DLM specifes the
delimeter character to be used in the .TMO file. The default is tab and
can be any character other than '-' or '/'.
How Timing Analysis Works
TimingAnalysis uses the RDTSC instruction to access the Pentium cycle
counter. This counter gives the number of cycles that have passed since
the computer started up. This allows direct measurement of the time short
segments of code take to execute. The times labeled as CPU time in
TimingAnalysis reports are actually elapsed time spent in a routine, to a very
high resolution. This time will include time waiting for I/O to the disk
(including paging) or waiting at a prompt to the screen. With this in mind, the
reports can be very useful.
The /TIMING compile option causes the compiler to produce code to plant code
to do a call to the timing routine when a routine is called. TimingAnalysis
stores the CPU timer and begins the work of collecting information. Any
overhead introduced by using TimingAnalysis is removed when the reports are
produced. The work of collecting and saving the timing information can take
significant time and can double the run time of a program which does a lot of
calls. This time spent is never counted in that of the timed program's
subroutines and does not need to be compensated for.
After the subroutine entry information is stored, TimingAnalysis again stores
the CPU timer and returns to the called subroutine. When TimingAnalysis is
called at return time, it performs the subroutine exit measurements similar to
the way it did at routine entry and then returns to the caller of the
subroutine just timed. When the program reaches termination TimingAnlaysis
produces reports and the program exits.
Understanding TimingAnalysis Reports
The timing analysis report is produced twice, once sorted on
exclusive CPU secand again sorted on
inclusive CPU sec.The second part
of the report continues for one page or until the inclusive Cpu% drops
below 4.0%.The report header
indicates the sort key used for the report.There
are two groups of information on the page, the exclusive information which
applies only the routine itself and the inclusive information which applies to
the routine and everything it calls and below. The following decribes the
columns in the .TMR file produced by TimingAnalysis.
Called - This is the number of times this routine is
called by other routines. Calls - This is the number of calls made to other routines directly by
this routine. Page Flts - This is the number of times a page had to be brought in
from disk during execution of this routine. P/F% - The percentage of page faults encountered in this routine. Cpu sec -This is the total CPU
time in seconds spent in this routine. Avg CPU -This is the total CPU
time divided by the number of calls to this routine. This number is scaled to allow good significance to be displayed in a
small space. It is followed by an indicator of its units which can be
s for seconds, m for
milliseconds, or u for microseconds. Cpu% -This is the percent of
the CPU time spent in this routine. CumCpu% -This is a running sum
of the column to its left and provides an easy way to determine how many
routines comprise what percent of the total CPU time.
Called - This is the number of calls to
this routine and all calls made by routines called by this routine. For
the main program, this is the total of all calls made during the program
execution. Calls - This is the total of the
calls made by this routine and the routines called by this routine. It
is equal to inclusive Called minus exclusive Called and was included on the
report mainly for symmetry. Cpu sec - This is the total CPU
time spent in this routine and all routines it calls. This
is in effect viewing this routine as the start of a subsystem of the whole
program. Cpu% - This is the percentage of
the CPU time that this routine and all called below it comprise.
When the /TMO option is given in the TIMINGOPTS environment
variable or when the timing option is used within Visual Studio .NET, a file
suitable for importing to a spread sheet or other utility is produced. This
file has the same name as the timing report file, with the .TMO suffix. If the
/DLM option is given a delimiter character can be given which can be: "comma",
"tab", or any punctuation character. This character will then be used as a
field separator. The .TMO file contains the basic data present in the timing
report. The first line contains names of the quantities. The remainder of the
file contains one line per called routine with that routine's data.
Using TimingAnalysis Reports
There are many ways to use TimingAnalysis reports to improve a
program's performance. First of
all, look at how many times each routine was called. Do
these coincide with your understanding of the program? Could
any be called in a loop they should not be in? When
a routine is called many times but has a very short average execution time,
there may be much time wasted in the call alone. It
may be a good idea to find a way to perform the function in-line or compute a
group of results in advance and simply access it where the call was.
The paging information can be very useful as well. In
routines with high paging, look for multi-dimensional arrays which are accessed
primarily with the right-most subscripts varying fastest.
The CPU time is the most useful statistic for improving program
performance. The TimingAnalysis
report is arranged so that you can easily determine a set of routines
which comprise any particular percent of the whole run time. These
are the routines which should be studied for possible improvement.
It is important to note what the time assigned to a routine
includes. All routines which are
compiled with the /TIMING option have their time assigned to them. Any
routines not compiled with this option will have their time assigned to the
routine which called them. All
FORTRAN library routines such as SIN, ATAN, LOG, etc. have their CPU time
included in the routines which call them. FORTRAN
I/O statements can use a lot of CPU time and this is also assigned to the
routines containing the I/O statements. The
CPU time used to call another timed routine and transfer arguments is assigned
to the calling program.
The CPU time used in the return
from the called program is assigned to the called program.
The maximum call depth within TimingAnalysis is 2048.
(a routine calling itself), a Fortran extension, will cause the inclusive
statistics to be incorrect for the recursively called routines.
Alternate subroutine returns are not supported and will lead to incorrect