PIPS High-Level Software Interface Pipsmake configuration
A printable version of this document is available at http://www.cri.ensmp.fr/pips/pipsmake-rc.htdoc/pipsmake-rc.pdf and an HTML version at http://www.cri.ensmp.fr/pips/pipsmake-rc.htdoc.
Chapter 1
Introduction
This paper describes high-level objects and functions that are potentially user-visible in a PIPS [?] interactive environment. It defines the internal software interface between a user interface and program analyses and transformations. It is not a user guide, but it can serve as a reference guide, the most reliable one short of the source code, because PIPS user interfaces are closely mapped onto this document: some of their features are automatically derived from it.
Objects can be viewed and functions activated through one of the existing PIPS user interfaces: tpips, the tty-style interface, which is currently recommended; pips [?], the old batch interface, improved by many shell scripts; and wpips and epips, the X Window System interfaces. The epips interface is an extension of wpips which uses Emacs to display more information in a more convenient way. Unfortunately, these window-based interfaces no longer work and have been replaced by gpips. It is also possible to use PIPS through a Python API, pyps.
From a theoretical point of view, the object types and functions available in PIPS define a heterogeneous algebra with constructors (e.g. parser), extractors (e.g. prettyprinter) and operators (e.g. loop unrolling). Very few combinations of functions make sense, but many functions and object types are available. This abundance is confusing for casual and experienced users alike, and it was deemed necessary to assist them by providing default computation rules and automatic consistency management similar to make. The rule interpreter is called pipsmake and is described in [?]. Its key concepts are the phases, which correspond to PIPS functions made user-visible, for instance a parser; the resources, which correspond to objects used or defined by the phases, for instance a source file or an AST (parsed code); and the virtual rules, which define the set of input resources used by a phase and the set of output resources defined by the phase. Since PIPS is an interprocedural tool, some real input resources are not known until execution. Some variables such as CALLERS or CALLEES can be used in virtual rules. They are expanded at execution to obtain an effective rule with the precise resources needed.
For debugging purposes and for advanced users, the precise choice and tuning of an algorithm can be made using properties. Default properties are installed with PIPS but they can be redefined, partly or entirely, by a properties.rc file located in the current directory. Properties can also be redefined from the user interfaces, for example with the command setproperty when the tpips interface is used.
As far as their static structures are concerned, most object types are described in more detail in PIPS Internal Representation of Fortran and C code. A dynamic view is given here. In which order should functions be applied? Which objects do they produce and, conversely, which function produces a given object? How does PIPS cope with bottom-up and top-down interprocedurality?
Resources produced by several rules, and their associated rules, must be given alias names when they should be explicitly computed or activated by an interactive interface. This is otherwise not relevant. The alias names are used to automatically generate header files and/or test files used by PIPS interfaces.
No more than one resource should be produced per line of a rule because different files are automatically extracted from this one. Another caveat is that all resources whose names are suffixed with _file are considered printable or displayable, and the others are considered binary data, even though they may be ASCII strings.
This LaTeX file is used by several procedures to derive some pieces of C code and ASCII files. The useful information is located in the PipsMake areas, a very simple literate programming environment. For instance, alias information is used to automatically generate menus for window-based interfaces such as wpips or gpips. Object (a.k.a. resource) types and functions are renamed using the alias declaration. The name space of aliases is global: all aliases must have different names. Function declarations are used to build phases.h, a mapping table between function names and pointers to C functions. Object suffixes are used to derive a header file, resources.h, with all resource names. Parts of this file are also extracted to generate on-line information for wpips and automatic completion for tpips.
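The suffix convention for resource names can be illustrated with a minimal Python sketch. The function name is_displayable is hypothetical and only restates the rule given above; it is not part of PIPS.

```python
def is_displayable(resource_name: str) -> bool:
    """Return True when the name carries the `_file` suffix.

    Per the convention described in the text, such resources are treated
    as printable or displayable; all others are treated as binary data.
    """
    return resource_name.endswith("_file")


# Resource names taken from this document.
for name in ("printed_file", "callgraph_file", "parsed_code", "summary_effects"):
    kind = "displayable" if is_displayable(name) else "binary"
    print(f"{name}: {kind}")
```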
The behavior of PIPS can be slightly tuned by using properties. Most properties are linked to a particular phase, for instance to prettyprint, but some are linked to PIPS infrastructure and are presented in Chapter 2.
1.1 Informal syntax
To understand and to be able to write new rules for pipsmake, a few things need to be known.
1.1.1 Example
The rule:
< PROGRAM.entities
< MODULE.code
< CALLEES.summary_effects
Properties are also declared in this file. For instance
1.1.2 Pipsmake variables
The following variables are defined to handle interprocedurality:
- PROGRAM: the whole application currently analyzed;
- MODULE: the current MODULE (a procedure or function);
- ALL: all the MODULEs of the current PROGRAM, functions and compilation units;
- ALLFUNC: all the MODULEs of the current PROGRAM that are functions;
- CALLEES: all the MODULEs called in the given MODULE;
- CALLERS: all the MODULEs that call the given MODULE.
These variables are used in the rule definitions and instantiated before pipsmake infers which resources are pre-requisites for a rule.
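The instantiation of these variables can be pictured with the following Python sketch. It is only an illustration under stated assumptions: the call graph, the program name myprog and the helper names expand_owner and expand_rule are hypothetical, and the real pipsmake implementation is written in C and may proceed differently.

```python
from typing import Dict, List, Tuple

# Hypothetical call graph: module name -> modules it calls.
CALL_GRAPH: Dict[str, List[str]] = {
    "MAIN": ["INIT", "COMPUTE"],
    "INIT": [],
    "COMPUTE": ["INIT"],
}


def expand_owner(owner: str, module: str, program: str = "myprog") -> List[str]:
    """Replace a pipsmake variable by the concrete names it stands for."""
    modules = sorted(CALL_GRAPH)
    if owner == "PROGRAM":
        return [program]
    if owner == "MODULE":
        return [module]
    if owner in ("ALL", "ALLFUNC"):
        # This toy graph has no compilation units, so both sets coincide.
        return modules
    if owner == "CALLEES":
        return list(CALL_GRAPH[module])
    if owner == "CALLERS":
        return [m for m in modules if module in CALL_GRAPH[m]]
    raise ValueError(f"unknown pipsmake variable: {owner}")


def expand_rule(requirements: List[str], module: str) -> List[Tuple[str, str]]:
    """Turn virtual requirements (OWNER.resource) into effective ones."""
    effective: List[Tuple[str, str]] = []
    for requirement in requirements:
        owner, resource = requirement.split(".", 1)
        for name in expand_owner(owner, module):
            effective.append((name, resource))
    return effective


# Input side of the example rule shown in Section 1.1.1.
VIRTUAL_INPUTS = ["PROGRAM.entities", "MODULE.code", "CALLEES.summary_effects"]
print(expand_rule(VIRTUAL_INPUTS, "MAIN"))
```

For module MAIN, the CALLEES requirement expands into one summary_effects resource per callee, which is why the effective rule cannot be known before the call graph is available.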
1.2 Properties
This paper also defines and describes global variables used to modify or fine tune PIPS behavior. Since global variables are useful for some purposes, but always dangerous, PIPS programmers are required to avoid them or to declare them explicitly as properties. Properties have an ASCII name and can have boolean, integer or string values.
Casual users should not use them. Some properties are modified for them by the user interface and/or the high-level functions. Some property combinations may be meaningless. More experienced users can set their values, using their names and a user interface.
Experienced users can also modify properties by placing a file called properties.rc in their local directory. Of course, they cannot declare new properties, since these would not be recognized by PIPS. The local property file is read after the default property file, $PIPS_ROOT/etc/properties.rc. Some user-specified property values may be ignored because they are modified by a PIPS function before they have a chance to take effect. Unfortunately, this report does not indicate explicitly how useful each property is.
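The layering of the two property files can be sketched as follows. This is a hedged illustration, not the PIPS reader: it assumes that each line has the form NAME VALUE, as in the property declarations of this document, with boolean (TRUE or FALSE), integer or double-quoted string values. The function names are hypothetical.

```python
from typing import Dict, Union

Value = Union[bool, int, str]


def parse_properties(text: str) -> Dict[str, Value]:
    """Parse lines of the assumed form `NAME VALUE`."""
    properties: Dict[str, Value] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, raw = line.partition(" ")
        raw = raw.strip()
        if raw in ("TRUE", "FALSE"):
            properties[name] = raw == "TRUE"
        elif len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            properties[name] = raw[1:-1]
        else:
            properties[name] = int(raw)
    return properties


def effective_properties(default_text: str, local_text: str) -> Dict[str, Value]:
    """Read the defaults first, then let the local file override them."""
    defaults = parse_properties(default_text)
    merged = dict(defaults)
    for name, value in parse_properties(local_text).items():
        if name not in defaults:
            # New properties cannot be declared in the local file.
            raise KeyError(f"unknown property: {name}")
        merged[name] = value
    return merged


DEFAULTS = 'ONE_TRIP_DO FALSE\nPIPSMAKE_CHECKPOINTS 0\nSYSTEM_NULL "<null system>"\n'
LOCAL = "PIPSMAKE_CHECKPOINTS 50\n"
print(effective_properties(DEFAULTS, LOCAL))
```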
The default property file can be used to generate a custom version of properties.rc. It is derived automatically from this documentation, Documentation/pipsmake-rc.tex.
PIPS behavior can also be altered by shell environment variables. Their generic name is XXXX_DEBUG_LEVEL, where XXXX is a library, a phase or an interface name (of course, there are exceptions). Theoretically, these environment variables are also declared as properties, but this is generally forgotten by programmers. A debug level of 0 is equivalent to no tracing. The amount of tracing increases with the debug level. The maximum useful value is 9.
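A reader for such a variable could look like the following Python sketch. The function name debug_level is hypothetical, and clamping to the range 0..9 and falling back to 0 on a malformed value are assumptions made for the illustration, not documented PIPS behavior.

```python
import os
from typing import Mapping, Optional


def debug_level(name: str, environ: Optional[Mapping[str, str]] = None) -> int:
    """Return the debug level requested through NAME_DEBUG_LEVEL.

    An unset or malformed variable yields 0 (no tracing). Values are
    clamped to 0..9 because 9 is stated to be the maximum useful value.
    """
    env = os.environ if environ is None else environ
    raw = env.get(f"{name}_DEBUG_LEVEL", "0")
    try:
        level = int(raw)
    except ValueError:
        return 0
    return max(0, min(level, 9))


print(debug_level("PIPSDBM", {"PIPSDBM_DEBUG_LEVEL": "5"}))
```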
Another Shell environment variable, NEWGEN_MAX_TABULATED_ELEMENTS, is useful to analyze large programs. Its default value is 12,000 but it is not uncommon to have to set it up to 200,000.
Properties are listed below on a source library basis. Properties used in more than one library or used by PIPS infrastructure are presented first. Section 2.3 contains information about properties related to infrastructure, external and user interface libraries. Properties for analyses are grouped in Chapter 6. Properties for parallelization and distribution phases and for program transformations are listed in Chapters 7 and 8. User output produced by the different kinds of prettyprinters is presented in Chapter 9. Chapter 10 is dedicated to properties of the libraries added by CEA to implement Feautrier’s method.
1.3 Outline
Rule and object declarations are grouped in chapters: input files (Chapter 3), syntax analysis and abstract syntax tree (Chapter 4), analyses (Chapter 6), parallelizations (Chapter 7), program transformations (Chapter 8) and prettyprinters of output files (Chapter 9). Chapter 10 describes several analyses defined by Paul Feautrier. Chapter 11 contains a set of menu declarations for the window-based interfaces.
Virtually every PIPS programmer contributed some lines to this report. Inconsistencies are likely. Please report them to the PIPS team!
Contents
1.1 Informal syntax
1.1.1 Example
1.1.2 Pipsmake variables
1.2 Properties
1.3 Outline
2 Global Options
2.1 Fortran Loops
2.2 Logging
2.3 PIPS Infrastructure
2.3.1 Newgen
2.3.2 C3 Linear Library
2.3.3 PipsMake
2.3.4 PipsDBM
2.3.5 Top Level Control
2.3.6 Tpips Command Line Interface
2.3.7 Warning Control
2.3.8 Option for C Code Generation
3 Input Files
3.1 User File
3.2 Preprocessing and Splitting
3.2.1 Fortran case of preprocessing and splitting
3.2.1.1 Fortran Syntactic Verification
3.2.1.2 Fortran file preprocessing
3.2.1.3 Fortran Split
3.2.1.4 Fortran Syntactic Preprocessing
3.2.2 C Preprocessing and Splitting
3.2.2.1 C Syntactic Verification
3.2.3 Source File Hierarchy
3.3 Source File
3.4 Regeneration of User Source Files
4 Abstract Syntax Tree
4.1 Entities
4.2 Parsed Code and Callees
4.2.1 Fortran
4.2.1.1 Fortran restrictions
4.2.1.2 Some additional remarks
4.2.1.3 Some unfriendly features
4.2.1.4 Declaration of the standard parser
4.2.2 Declaration of HPFC parser
4.2.3 Declaration of the C parsers
4.3 Controlized Code (Hierarchical Control Flow Graph)
5 Pedagogical phases
5.1 Using XML backend
5.2 Prepending a comment
5.3 Prepending a call
5.4 Add a pragma to a module
6 Analyses
6.1 Call Graph
6.2 Memory Effects
6.2.1 Proper Memory Effects
6.2.2 Filtered Proper Memory Effects
6.2.3 Cumulated Memory Effects
6.2.4 Summary Data Flow Information (SDFI)
6.2.5 IN and OUT Effects
6.2.6 Proper and Cumulated References
6.2.7 Effect Properties
6.3 Reductions
6.3.1 Reduction Propagation
6.3.2 Reduction Detection
6.4 Chains (Use-Def Chains)
6.4.1 Menu for Use-Def Chains
6.4.2 Standard Use-Def Chains (a.k.a. Atomic Chains)
6.4.3 READ/WRITE Region-Based Chains
6.4.4 IN/OUT Region-Based Chains
6.4.5 Chain Properties
6.4.5.1 Add use-use Chains
6.4.5.2 Remove Some Chains
6.5 Dependence Graph (DG)
6.5.1 Menu for Dependence Tests
6.5.2 Fast Dependence Test
6.5.3 Full Dependence Test
6.5.4 Semantics Dependence Test
6.5.5 Dependence Test with Convex Array Regions
6.5.6 Dependence Properties (Ricedg)
6.5.6.1 Dependence Test Selection
6.5.6.2 Statistics
6.5.6.3 Algorithmic Dependences
6.5.6.4 Printout
6.5.6.5 Optimization
6.6 Flinter
6.7 Loop statistics
6.8 Semantics Analysis
6.8.1 Transformers
6.8.1.1 Menu for Transformers
6.8.1.2 Fast Intraprocedural Transformers
6.8.1.3 Full Intraprocedural Transformers
6.8.1.4 Fast Interprocedural Transformers
6.8.1.5 Full Interprocedural Transformers
6.8.1.6 Full Interprocedural Transformers
6.8.2 Summary Transformer
6.8.3 Initial Precondition
6.8.4 Intraprocedural Summary Precondition
6.8.5 Preconditions
6.8.5.1 Menu for Preconditions
6.8.5.2 Intra-Procedural Preconditions
6.8.5.3 Fast Inter-Procedural Preconditions
6.8.5.4 Full Inter-Procedural Preconditions
6.8.6 Interprocedural Summary Precondition
6.8.7 Total Preconditions
6.8.7.0.1 Status:
6.8.7.1 Menu for Total Preconditions
6.8.7.2 Intra-Procedural Total Preconditions
6.8.7.3 Inter-Procedural Total Preconditions
6.8.8 Summary Total Precondition
6.8.9 Summary Total Postcondition
6.8.10 Final Postcondition
6.8.11 Semantic Analysis Properties
6.8.11.1 Value types
6.8.11.2 Array declarations and accesses
6.8.11.3 Flow Sensitivity
6.8.11.4 Context for statement and expression transformers
6.8.11.5 Interprocedural Semantics Analysis
6.8.11.6 Fix Point Operators
6.8.11.7 Normalization level
6.8.11.8 Prettyprint
6.8.11.9 Debugging
6.9 Continuation conditions
6.10 Complexities
6.10.1 Menu for Complexities
6.10.2 Uniform Complexities
6.10.3 Summary Complexity
6.10.4 Floating Point Complexities
6.10.5 Complexity properties
6.10.5.1 Debugging
6.10.5.2 Fine Tuning
6.10.5.3 Target Machine and Compiler Selection
6.10.5.4 Evaluation Strategy
6.11 Convex Array Regions
6.11.1 Menu for Convex Array Regions
6.11.2 MAY READ/WRITE Convex Array Regions
6.11.3 MUST READ/WRITE Convex Array Regions
6.11.4 Summary READ/WRITE Convex Array Regions
6.11.5 IN Convex Array Regions
6.11.6 IN Summary Convex Array Regions
6.11.7 OUT Summary Convex Array Regions
6.11.8 OUT Convex Array Regions
6.11.9 Properties for Convex Array Regions
6.12 Alias Analysis
6.12.1 Dynamic Aliases
6.12.2 Intraprocedural Summary Points to Analysis
6.12.3 Points to Analysis
6.12.4 Pointer Values Analyses
6.12.5 Properties for pointer analyses
6.12.6 Menu for Alias Views
6.13 Complementary Sections
6.13.1 READ/WRITE Complementary Sections
6.13.2 Summary READ/WRITE Complementary Sections
7 Parallelization and Distribution
7.1 Code Parallelization
7.1.1 Parallelization properties
7.1.1.1 Properties controlling Rice parallelization
7.1.2 Menu for Parallelization Algorithm Selection
7.1.3 Allen & Kennedy’s Parallelization Algorithm
7.1.4 Def-Use Based Parallelization Algorithm
7.1.5 Parallelization and Vectorization for Cray Multiprocessors
7.1.6 Coarse Grain Parallelization
7.1.7 Global Loop Nest Parallelization
7.1.8 Coerce Parallel Code into Sequential Code
7.1.9 Detect Computation Intensive Loops
7.1.10 Limit Parallelism in Parallel Loop Nests
7.2 SIMDizer for SIMD Multimedia Instruction Set
7.2.1 SIMD properties
7.2.1.1 Auto-Unroll
7.2.1.2 Memory Organisation
7.2.1.3 Pattern file
7.2.2 Scalopes project
7.2.2.1 Bufferization
7.2.2.2 SCMP generation
7.3 Code Distribution
7.3.1 Shared-Memory Emulation
7.3.2 HPF Compiler
7.3.2.1 HPFC Filter
7.3.2.2 HPFC Initialization
7.3.2.3 HPF Directive removal
7.3.2.4 HPFC actual compilation
7.3.2.5 HPFC completion
7.3.2.6 HPFC install
7.3.2.7 HPFC High Performance Fortran Compiler properties
7.3.3 STEP: MPI code generation from OpenMP programs
7.3.3.1 STEP Directives
7.3.3.2 STEP Analysis
7.3.3.3 STEP code generation
7.3.4 PHRASE: high-level language transformation for partial evaluation in reconfigurable logic
7.3.4.1 Phrase Distributor Initialisation
7.3.4.2 Phrase Distributor
7.3.4.3 Phrase Distributor Control Code
7.3.5 Safescale
7.3.5.1 Distribution init
7.3.5.2 Statement Externalization
7.3.6 CoMap: Code Generation for Accelerators with DMA
7.3.6.1 Phrase Remove Dependences
7.3.6.2 Phrase comEngine Distributor
7.3.6.3 PHRASE ComEngine properties
7.3.7 Parallelization for Terapix architecture
7.3.7.1 Isolate Statement
7.3.7.2 Delay Communications
7.3.7.3 Hardware Constraints Solver
7.3.7.4 kernelize
7.3.7.5 Generating communications
7.3.8 Code distribution on GPU
7.3.9 Task generation for SCALOPES project
8 Program Transformations
8.1 Loop Transformations
8.1.1 Introduction
8.1.2 Loop range Normalization
8.1.3 Loop Distribution
8.1.4 Statement Insertion
8.1.5 Loop Expansion
8.1.6 Loop Fusion
8.1.7 Index Set Splitting
8.1.8 Loop Unrolling
8.1.8.1 Regular Loop Unroll
8.1.8.2 Full Loop Unroll
8.1.9 Loop Fusion
8.1.10 Strip-mining
8.1.11 Loop Interchange
8.1.12 Hyperplane Method
8.1.13 Loop Nest Tiling
8.1.14 Symbolic Tiling
8.1.15 Loop Normalize
8.1.16 Guard Elimination and Loop Transformations
8.1.17 Tiling for sequences of loop nests
8.2 Redundancy Elimination
8.2.1 Loop Invariant Code Motion
8.2.2 Partial Redundancy Elimination
8.3 Control-Flow Optimizations
8.3.1 Dead Code Elimination
8.3.1.1 Dead Code Elimination properties
8.3.2 Dead Code Elimination (a.k.a. Use-Def Elimination)
8.3.3 Control Restructurers
8.3.3.1 Unspaghettify
8.3.3.2 Restructure Control
8.3.3.3 For-loop recovering
8.3.3.4 For-loop to do-loop transformation
8.3.3.5 For-loop to while-loop transformation
8.3.3.6 Do-while to while-loop transformation
8.3.3.7 Spaghettify
8.3.3.8 Full Spaghettify
8.3.4 Control Structure Normalisation (STF)
8.3.5 Trivial Test Elimination
8.3.6 Finite State Machine Generation
8.3.6.1 FSM Generation
8.3.6.2 Full FSM Generation
8.3.6.3 FSM Split State
8.3.6.4 FSM Merge States
8.3.6.5 FSM properties
8.3.7 Control Counters
8.4 Expression Transformations
8.4.1 Atomizers
8.4.1.1 General Atomizer
8.4.1.2 Limited Atomizer
8.4.1.3 Atomizer properties
8.4.2 Partial Evaluation
8.4.3 Reduction Detection
8.4.4 Reduction Replacement
8.4.5 Forward Substitution
8.4.6 Expression Substitution
8.4.7 Rename operator
8.4.8 Array to pointer conversion
8.4.9 Expression Optimizations
8.4.9.1 Expression optimization using algebraic properties
8.4.9.2 Common subexpression elimination
8.5 Hardware Accelerator
8.5.1 FREIA Software
8.5.2 FREIA SPoC
8.5.3 FREIA Terapix
8.5.4 FREIA OpenCL
8.6 Function Level Transformations
8.6.1 Inlining
8.6.2 Unfolding
8.6.3 Outlining
8.6.4 Cloning
8.7 Declaration Transformations
8.7.1 Declarations cleaning
8.7.2 Array Resizing
8.7.2.1 Top Down Array Resizing
8.7.2.2 Bottom Up Array Resizing
8.7.2.3 Array Resizing Statistic
8.7.2.4 Array Resizing properties
8.7.3 Scalarization
8.7.4 Induction Variable Substitution
8.7.5 Strength Reduction
8.7.6 Flatten Code
8.7.7 Split Update Operator
8.7.8 Split Initializations (C code)
8.7.9 Set Return Type
8.7.10 Cast at Call Sites
8.8 Array Bound Checking
8.8.1 Elimination of Redundant Tests: Bottom-Up Approach
8.8.2 Insertion of Unavoidable Tests
8.8.3 Interprocedural Array Bound Checking
8.8.4 Array Bound Checking Instrumentation
8.9 Alias Verification
8.9.1 Alias Propagation
8.9.2 Alias Checking
8.10 Used Before Set
8.11 Miscellaneous transformations
8.11.1 Type Checker
8.11.2 Scalar and Array Privatization
8.11.2.1 Scalar Privatization
8.11.2.2 Array Privatization
8.11.3 Scalar and Array Expansion
8.11.3.1 Scalar Expansion
8.11.3.2 Array Expansion
8.11.4 Freeze variables
8.11.5 Manual Editing
8.11.6 Transformation Test
8.12 Extensions Transformations
8.12.1 OpenMP Pragma
9 Output Files (Prettyprinted Files)
9.1 Parsed Printed Files (User View)
9.1.1 Menu for User Views
9.1.2 Standard User View
9.1.3 User View with Transformers
9.1.4 User View with Preconditions
9.1.5 User View with Total Preconditions
9.1.6 User View with Continuation Conditions
9.1.7 User View with Convex Array Regions
9.1.8 User View with Invariant Convex Array Regions
9.1.9 User View with IN Convex Array Regions
9.1.10 User View with OUT Convex Array Regions
9.1.11 User View with Complexities
9.1.12 User View with Proper Effects
9.1.13 User View with Cumulated Effects
9.1.14 User View with IN Effects
9.1.15 User View with OUT Effects
9.2 Printed File (Sequential Views)
9.2.1 Html output
9.2.2 Menu for Sequential Views
9.2.3 Standard Sequential View
9.2.4 Sequential View with Transformers
9.2.5 Sequential View with Initial Preconditions
9.2.6 Sequential View with Complexities
9.2.7 Sequential View with Preconditions
9.2.8 Sequential View with Total Preconditions
9.2.9 Sequential View with Continuation Conditions
9.2.10 Sequential View with Convex Array Regions
9.2.10.1 Sequential View with Plain Pointer Regions
9.2.10.2 Sequential View with Proper Pointer Regions
9.2.10.3 Sequential View with Invariant Pointer Regions
9.2.10.4 Sequential View with Plain Convex Array Regions
9.2.10.5 Sequential View with Proper Convex Array Regions
9.2.10.6 Sequential View with Invariant Convex Array Regions
9.2.10.7 Sequential View with IN Convex Array Regions
9.2.10.8 Sequential View with OUT Convex Array Regions
9.2.10.9 Sequential View with Privatized Convex Array Regions
9.2.11 Sequential View with Complementary Sections
9.2.12 Sequential View with Proper Effects
9.2.13 Sequential View with Cumulated Effects
9.2.14 Sequential View with IN Effects
9.2.15 Sequential View with OUT Effects
9.2.16 Sequential View with Proper Reductions
9.2.17 Sequential View with Cumulated Reductions
9.2.18 Sequential View with Static Control Information
9.2.19 Sequential View with Points-To Information
9.2.20 Sequential View with Simple Pointer Values
9.2.21 Prettyprint properties
9.2.21.1 Language
9.2.21.2 Layout
9.2.21.3 Target Language Selection
9.2.21.3.1 Parallel output style
9.2.21.3.2 Default sequential output style
9.2.21.4 Display Analysis Results
9.2.21.5 Display Internals for Debugging
9.2.21.5.1 Warning:
9.2.21.6 Declarations
9.2.21.7 FORESYS Interface
9.2.21.8 HPFC Prettyprinter
9.2.21.9 Interface to Emacs
9.3 Printed Files with the Intraprocedural Control Graph
9.3.1 Menu for Graph Views
9.3.2 Standard Graph View
9.3.3 Graph View with Transformers
9.3.4 Graph View with Complexities
9.3.5 Graph View with Preconditions
9.3.6 Graph View with Preconditions
9.3.7 Graph View with Regions
9.3.8 Graph View with IN Regions
9.3.9 Graph View with OUT Regions
9.3.10 Graph View with Proper Effects
9.3.11 Graph View with Cumulated Effects
9.3.12 ICFG properties
9.3.13 Graph properties
9.3.13.1 Interface to Graphics Prettyprinters
9.4 Parallel Printed Files
9.4.1 Menu for Parallel View
9.4.2 Fortran 77 Parallel View
9.4.3 HPF Directives Parallel View
9.4.4 OpenMP Directives Parallel View
9.4.5 Fortran 90 Parallel View
9.4.6 Cray Fortran Parallel View
9.5 Call Graph Files
9.5.1 Menu for Call Graphs
9.5.2 Standard Call Graphs
9.5.3 Call Graphs with Complexities
9.5.4 Call Graphs with Preconditions
9.5.5 Call Graphs with Total Preconditions
9.5.6 Call Graphs with Transformers
9.5.7 Call Graphs with Proper Effects
9.5.8 Call Graphs with Cumulated Effects
9.5.9 Call Graphs with Regions
9.5.10 Call Graphs with IN Regions
9.5.11 Call Graphs with OUT Regions
9.6 DrawGraph Interprocedural Control Flow Graph Files (DVICFG)
9.6.1 Menu for DVICFG’s
9.6.2 Minimal ICFG with graphical filtered Proper Effects
9.7 Interprocedural Control Flow Graph Files (ICFG)
9.7.1 Menu for ICFG’s
9.7.2 Minimal ICFG
9.7.3 Minimal ICFG with Complexities
9.7.4 Minimal ICFG with Preconditions
9.7.5 Minimal ICFG with Preconditions
9.7.6 Minimal ICFG with Transformers
9.7.7 Minimal ICFG with Proper Effects
9.7.8 Minimal ICFG with filtered Proper Effects
9.7.9 Minimal ICFG with Cumulated Effects
9.7.10 Minimal ICFG with Regions
9.7.11 Minimal ICFG with IN Regions
9.7.12 Minimal ICFG with OUT Regions
9.7.13 ICFG with Loops
9.7.14 ICFG with Loops and Complexities
9.7.15 ICFG with Loops and Preconditions
9.7.16 ICFG with Loops and Total Preconditions
9.7.17 ICFG with Loops and Transformers
9.7.18 ICFG with Loops and Proper Effects
9.7.19 ICFG with Loops and Cumulated Effects
9.7.20 ICFG with Loops and Regions
9.7.21 ICFG with Loops and IN Regions
9.7.22 ICFG with Loops and OUT Regions
9.7.23 ICFG with Control
9.7.24 ICFG with Control and Complexities
9.7.25 ICFG with Control and Preconditions
9.7.26 ICFG with Control and Total Preconditions
9.7.27 ICFG with Control and Transformers
9.7.28 ICFG with Control and Proper Effects
9.7.29 ICFG with Control and Cumulated Effects
9.7.30 ICFG with Control and Regions
9.7.31 ICFG with Control and IN Regions
9.7.32 ICFG with Control and OUT Regions
9.8 Dependence Graph File
9.8.1 Menu For Dependence Graph Views
9.8.2 Effective Dependence Graph View
9.8.3 Loop-Carried Dependence Graph View
9.8.4 Whole Dependence Graph View
9.8.5 Filtered Dependence Graph View
9.8.6 Filtered Dependence daVinci Graph View
9.8.7 Filtered Dependence Graph View
9.8.8 Chains Graph View
9.8.9 Chains Graph Graphviz Dot View
9.8.10 Dependence Graph Graphviz Dot View
9.8.11 Properties for Dot output
9.9 Fortran to C prettyprinter
9.9.1 Properties for Fortran to C prettyprinter
9.10 Prettyprinters Smalltalk
9.11 Prettyprinter for the Polyhedral Compiler Collection (PoCC)
9.12 Prettyprinter for CLAIRE
10 Feautrier Methods (a.k.a. Polyhedral Method)
10.1 Static Control Detection
10.2 Scheduling
10.3 Code Generation for Affine Schedule
10.4 Prettyprinters for CM Fortran
11 User Interface Menu Layouts
11.1 View menu
11.2 Transformation menu
12 Conclusion
13 Known Problems
Chapter 2
Global Options
Options are called properties in PIPS. Most of them are related to a specific phase, for instance the dependence graph computation. They are declared next to the corresponding phase declaration. But some are related to one library or even to several libraries and they are declared in this chapter.
Skip this chapter on first reading. Also skip this chapter on second reading because you are unlikely to need these properties until you develop in PIPS.
2.1 Fortran Loops
Are DO loop bodies executed at least once (Fortran 66 style), or not (Fortran 77)?
ONE_TRIP_DO FALSE
is useful for use/def and semantics analysis but is not used for region analyses. This dangerous property should be set to FALSE. It is not consistently checked by PIPS phases, because nobody seems to use this obsolete Fortran feature anymore.
2.2 Logging
With
LOG_TIMINGS FALSE
it is possible to display the real, CPU and system times spent directly in each phase, as well as the times spent reading/writing data structures from/to the PIPS database. The total time used to complete a pipsmake request is broken down into global times, a set of phase times, which accumulate the times spent in each phase, and a set of IO times, also accumulated through phases.
Note that the IO times are included in the phase times.
With
LOG_MEMORY_USAGE FALSE
it is possible to log the amount of memory used by each phase and by each request. This is mainly useful to check if a computation can be performed on a given machine. This memory log can also be used to track memory leaks. Valgrind may be more useful to track memory leaks.
2.3 PIPS Infrastructure
PIPS infrastructure is based on a few external libraries, Newgen and Linear, and on three key PIPS libraries:
- pipsdbm which manages resources such as code produced by PIPS and ensures persistence,
- pipsmake which ensures consistency within a workspace with respect to the producer-consumer rules declared in this file,
- and top-level which defines a common API for all PIPS user interfaces, whether human or API.
2.3.1 Newgen
Newgen offers some debugging support to check object consistency (gen_consistent_p and gen_defined_p) and to perform dynamic type checking. See the Newgen documentation [?][?].
2.3.2 C3 Linear Library
This library is external and offers an independent debugging system.
The following properties specify how null systems,
SYSTEM_NULL "<null system>"
undefined systems,
SYSTEM_UNDEFINED "<undefined system>"
and non-feasible systems
SYSTEM_NOT_FEASIBLE "{0==-1}"
are prettyprinted by PIPS.
2.3.3 PipsMake
With
CHECK_RESOURCE_USAGE FALSE
it is possible to log and report differences between the set of resources actually read and written by the procedures called by pipsmake and the set of resources declared as read or written in the pipsmake.rc file.
ACTIVATE_DEL_DERIVED_RES TRUE
controls the rule activation process, which may delete from the database all the resources derived from the newly activated rule to make sure that inconsistent resources cannot be used by accident.
PIPSMAKE_CHECKPOINTS 0
controls how often resources should be saved and freed. 0 means never, and a positive value n means every n applications of a rule. This feature was added so that long automatic tpips scripts that may core dump can be restarted later from a state close to the one reached before the crash. As a side effect, it frees memory and keeps memory consumption as moderate as possible, as opposed to usual tpips runs, which keep all memory allocated. Note that resources should not be saved too often, because saving may take a long time, especially when entities are involved in big workspaces. The frequency may be adapted within a script, from rarely at the beginning to more often later on.
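The checkpoint policy described for PIPSMAKE_CHECKPOINTS amounts to a simple counter test, sketched below in Python. The function name should_checkpoint is hypothetical, and the sketch only mirrors the stated semantics (0 means never, n means every n rule applications); it does not reproduce the pipsmake code.

```python
def should_checkpoint(rules_applied: int, checkpoints: int) -> bool:
    """Decide whether resources should be saved and freed now.

    `checkpoints` plays the role of the PIPSMAKE_CHECKPOINTS value and
    `rules_applied` counts the rule applications performed so far.
    """
    if checkpoints <= 0:
        return False  # 0 means never
    return rules_applied > 0 and rules_applied % checkpoints == 0


# With a value of 3, a save happens after the 3rd, 6th and 9th rule.
print([n for n in range(1, 11) if should_checkpoint(n, 3)])
```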
2.3.4 PipsDBM
The shell environment variable PIPSDBM_DEBUG_LEVEL can be set to ? to check object consistency when objects are stored in the database, and to ? to check object consistency when they are stored or retrieved (in case an intermediate phase has unintentionally corrupted some data structure).
You can control what is done when a workspace is closed and resources are saved. The
PIPSDBM_RESOURCES_TO_DELETE "obsolete"
property can be set to "obsolete" or to "all".
Note that it is not managed from pipsdbm but from pipsmake which knows what is obsolete or not.
2.3.5 Top Level Control
The top-level library is built on top of the pipsmake and pipsdbm libraries to factorize functions useful to build a PIPS user interface or API.
Property
USER_LOG_P TRUE
controls the logging of the session in the database of the current workspace. This log can be processed by the PIPS utility logfile2tpips to automatically generate a tpips script which can be used to replay the current PIPS session, workspace by workspace, regardless of the PIPS user interface used.
Property
ABORT_ON_USER_ERROR FALSE
specifies how user errors impact execution once the error message is printed on stderr: return and go ahead, usually when PIPS is used interactively, or core dump for debugging purposes and for script executions, especially non-regression tests.
Property
MAXIMUM_USER_ERROR 2
specifies the number of user errors allowed before the program aborts abruptly.
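The interplay of ABORT_ON_USER_ERROR and MAXIMUM_USER_ERROR can be pictured with the Python sketch below. The class names are hypothetical, an exception stands in for the abort or core dump, and aborting on the first error beyond the allowed number is an assumption about the exact threshold.

```python
import sys


class UserErrorAbort(RuntimeError):
    """Stands in for the abort (or core dump) of the real tool."""


class UserErrorPolicy:
    def __init__(self, abort_on_user_error: bool = False,
                 maximum_user_error: int = 2) -> None:
        self.abort_on_user_error = abort_on_user_error
        self.maximum_user_error = maximum_user_error
        self.count = 0

    def report(self, message: str) -> None:
        """Print the message on stderr, then return or abort."""
        print(f"user error: {message}", file=sys.stderr)
        self.count += 1
        if self.abort_on_user_error:
            raise UserErrorAbort("ABORT_ON_USER_ERROR is set")
        if self.count > self.maximum_user_error:
            # Assumption: the error following the last allowed one aborts.
            raise UserErrorAbort("too many user errors")
```

With the default values shown above, two user errors are reported and execution goes on; under the stated assumption, a third one stops the session.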
Property
ACTIVE_PHASES "PRINT_SOURCE PRINT_CODE PRINT_PARALLELIZED77_CODE PRINT_CALL_GRAPH PRINT_ICFG TRANSFORMERS_INTER_FULL INTERPROCEDURAL_SUMMARY_PRECONDITION PRECONDITIONS_INTER_FULL ATOMIC_CHAINS RICE_SEMANTICS_DEPENDENCE_GRAPH MAY_REGIONS"
specifies which pipsmake phases should be used when several phases can be used to produce the same resource. This property is used when a workspace is created. A workspace is the database maintained by PIPS to contain all resources defined for a whole application or for the whole set of files used to create it.
Resources that create ambiguities for pipsmake are at least:
- parsed_printed_file
- printed_file
- callgraph_file
- icfg_file
- parsed_code, because several parsers are available
- transformers
- summary_precondition
- preconditions
- regions
- chains
- dg
This list must be updated according to new rules and new resources declared in this file. Note that no default parser is usually specified in this property, because it is selected automatically according to the source file suffixes when possible.
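The role of ACTIVE_PHASES in resolving such ambiguities can be sketched as follows. The producer table is hypothetical: it pairs phase names quoted in this chapter with the resources they plausibly produce, and is not extracted from pipsmake.rc. The function name select_phase is likewise an assumption.

```python
from typing import Dict, List

# Hypothetical table: resource -> phases able to produce it.
PRODUCERS: Dict[str, List[str]] = {
    "transformers": ["TRANSFORMERS_INTRA_FAST", "TRANSFORMERS_INTER_FULL"],
    "preconditions": ["PRECONDITIONS_INTRA", "PRECONDITIONS_INTER_FULL"],
    "dg": ["RICE_FAST_DEPENDENCE_GRAPH", "RICE_SEMANTICS_DEPENDENCE_GRAPH"],
}


def select_phase(resource: str, active_phases: str) -> str:
    """Pick the single active phase that produces `resource`."""
    active = set(active_phases.split())
    candidates = [p for p in PRODUCERS[resource] if p in active]
    if len(candidates) != 1:
        raise ValueError(
            f"{resource}: expected one active producer, found {candidates}"
        )
    return candidates[0]


ACTIVE = ("TRANSFORMERS_INTER_FULL PRECONDITIONS_INTER_FULL "
          "RICE_SEMANTICS_DEPENDENCE_GRAPH")
print(select_phase("dg", ACTIVE))
```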
Until October 2009, the active phases were:
PRINT_CALL_GRAPH PRINT_ICFG TRANSFORMERS_INTRA_FAST
INTRAPROCEDURAL_SUMMARY_PRECONDITION
PRECONDITIONS_INTRA ATOMIC_CHAINS
RICE_FAST_DEPENDENCE_GRAPH MAY_REGIONS"
They are still used for the old non-regression tests.
2.3.6 Tpips Command Line Interface
tpips is one of PIPS user interfaces.
TPIPS_IS_A_SHELL FALSE
controls whether tpips should behave as an extended shell and consider any input command that is not a tpips command a Shell command.
The following property is automatically set to TRUE when pyps is running.
PYPS FALSE
2.3.7 Warning Control
User warnings may be turned off. Definitely, this is not the default option! Most warnings must be read to understand surprising results. This property is used by library misc.
NO_USER_WARNING FALSE
By default, PIPS reports errors generated by system call stat which is used in library pipsdbm to check the time a resource has been written and hence its temporal consistency.
WARNING_ON_STAT_ERROR TRUE
2.3.8 Option for C Code Generation
The syntactic constraints of C89 on declarations have been relaxed in C99, where declarations may be interspersed among executable statements. This property is used to request C89-compatible code generation.
C89_CODE_GENERATION FALSE
The default is therefore to generate C99 code. This default may be changed, because C99 code generated by PIPS is likely to be unparsable by PIPS.
There is no guarantee that each code generation phase complies with this property. It is up to each developer to decide whether this global property is to be used in their own phase.
Chapter 3
Input Files
3.1 User File
An input program is a set of user Fortran 77 or C source files and a name, called a workspace. The files are looked for in the current directory, then by using the colon-separated PIPS_SRCPATH variable for other directories where they might be found. The first occurrence of the file name in the ordered directories is chosen, which is consistent with PATH and MANPATH behaviour.
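The lookup order described above can be sketched as follows. This is only an illustrative model written in Python, not PIPS code: the function name find_source_file and its parameters are hypothetical, and the search path is passed explicitly as a string instead of being read from PIPS_SRCPATH.

```python
import os


def find_source_file(name, srcpath, cwd="."):
    """Return the first existing path for `name`, or None if it is not found.

    The current directory is tried first, then each directory of the
    colon-separated `srcpath`, in order (the same policy as PATH lookup).
    """
    directories = [cwd] + [d for d in srcpath.split(":") if d]
    for directory in directories:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With this policy, a file present both in the current directory and in a directory of the search path is always taken from the current directory.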
The source files are split by PIPS at the program initialization phase to produce one PIPS-private source file for each procedure, subroutine or function, and for each block data. A function like fsplit is used and the new files are stored in the workspace, which simply is a UNIX sub-directory of the current directory. These new files have names suffixed by .f.orig.
Since PIPS performs interprocedural analyses, it expects to find a source code file for each procedure or function called. Missing modules can be replaced by stubs, which can be made more or less precise with respect to their effects on formal parameters and global variables. A stub may be empty. Empty stubs can be automatically generated if the code is properly typed (see Section 3.3).
The user source files should not be edited by the user once PIPS has been started, because these edits are not going to be taken into account unless a new workspace is created. But their preprocessed copies, the PIPS source files, can safely be edited while running PIPS. The automatic consistency mechanism makes sure that any information displayed to the user is consistent with the current state of the source files in the workspace. These source files have names terminated by the standard suffix, .f.
New user source files should be automatically and completely re-built when the program is no longer under PIPS control, i.e. when the workspace is closed. An executable application can easily be regenerated after code transformations using the tpips interface and requesting the PRINTED_FILE resources for all modules, including compilation units in C:
display PRINTED_FILE[%ALL]
Note that compilation units can be left out with:
display PRINTED_FILE[%ALLFUNC]
In both cases with C source code, the order of modules may be unsuitable for direct recompilation, and compilation units should be included anyway. Both issues are handled by explicitly requesting the code regeneration as described in § 3.4.
Note that PIPS expects proper ANSI Fortran 77 code. Its parser was not designed to locate syntax errors. It is highly recommended to check source files with a standard Fortran compiler (see Section 3.2) before submitting them to PIPS.
3.2 Preprocessing and Splitting
3.2.1 Fortran case of preprocessing and splitting
The Fortran files specified as input to PIPS by the user are preprocessed in various ways.
3.2.1.1 Fortran Syntactic Verification
If the PIPS_CHECK_FORTRAN shell environment variable is defined to false or no or 0, the syntax of the source files is not checked by compiling them with a Fortran compiler. If the PIPS_CHECK_FORTRAN shell environment variable is defined to true or yes or 1, the syntax of each file is checked by compiling it with a Fortran 77 compiler. If the PIPS_CHECK_FORTRAN shell environment variable is not defined, the check is performed according to CHECK_FORTRAN_SYNTAX_BEFORE_RUNNING_PIPS 3.2.1.1.
The Fortran compiler is defined by the PIPS_FLINT environment variable. If it is undefined, the default compiler is f77 -c -ansi.
In case of failure, a warning is displayed. Note that if the program cannot be compiled properly with a Fortran compiler, it is likely that many problems will be encountered within PIPS.
The next property also triggers this preliminary syntactic verification.
CHECK_FORTRAN_SYNTAX_BEFORE_RUNNING_PIPS TRUE
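The precedence between the environment variable and the property can be summarized by the small model below. It is a sketch in Python, not the PIPS implementation: the function name should_check_syntax is hypothetical, and treating an unrecognized value of the variable like an undefined variable is an assumption, since the text above does not cover that case.

```python
def should_check_syntax(env_value, property_value):
    """Decide whether the preliminary syntactic verification is run.

    env_value: content of PIPS_CHECK_FORTRAN, or None when it is not defined.
    property_value: boolean value of CHECK_FORTRAN_SYNTAX_BEFORE_RUNNING_PIPS.
    """
    if env_value in ("true", "yes", "1"):
        return True
    if env_value in ("false", "no", "0"):
        return False
    # Variable not defined: the property decides.
    # Assumption: any other value is handled like an undefined variable.
    return property_value
```

A caller would typically pass os.environ.get("PIPS_CHECK_FORTRAN") as the first argument. The same decision applies to PIPS_CHECK_C and CHECK_C_SYNTAX_BEFORE_RUNNING_PIPS for C files.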
PIPS requires source code for all leaves in its visible call graph. By default, a user error is raised by Function initializer if a user request cannot be satisfied because some source code is missing. It is also possible to generate some synthetic code (also known as stubs) and to update the current module list, but this is not a very satisfying option because all interprocedural analysis results are going to be wrong. The user should retrieve the generated .f files in the workspace, under the Tmp directory, and add some assignments (def) and uses to mimic the action of the real code, so that the stubs behave well enough for the analyses or transformations to be applied to the whole program. The user-modified synthetic files should then be saved and used to generate a new workspace.
If PREPROCESSOR_MISSING_FILE_HANDLING 3.2.1.1 is set to "query", a script can optionally be set to handle the interactive request using PREPROCESSOR_MISSING_FILE_GENERATOR 3.2.1.1. This script is passed the function name and prints the filename on standard output. When empty, it uses an internal one.
Valid settings: error or generate or query.
PREPROCESSOR_MISSING_FILE_HANDLING "error"
PREPROCESSOR_MISSING_FILE_GENERATOR ""
The generated stub can have various default effects, say to prevent over-optimistic parallelization.
STUB_MEMORY_BARRIER FALSE
STUB_IO_BARRIER FALSE
3.2.1.2 Fortran file preprocessing
If the file suffix is .F then the file is preprocessed. By default PIPS uses gfortran -E for Fortran files. This preprocessor can be changed by setting the PIPS_FPP environment variable.
Moreover the default preprocessing options are -P -D__PIPS__ -D__HPFC__ and they can be extended (not replaced...) with the PIPS_FPP_FLAGS environment variable.
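The way the environment variables combine with the defaults can be illustrated by the following Python sketch. It is not PIPS code: the function name fortran_preprocessor_command is hypothetical, and the relative order of the default options, the extra options and the file name on the command line is an assumption.

```python
import shlex

DEFAULT_FPP = "gfortran -E"
DEFAULT_FPP_FLAGS = "-P -D__PIPS__ -D__HPFC__"


def fortran_preprocessor_command(source, env):
    """Build the argument list used to preprocess a .F file.

    env is a mapping standing for the process environment.
    PIPS_FPP replaces the preprocessor; PIPS_FPP_FLAGS only extends
    the default options, it never replaces them.
    """
    command = shlex.split(env.get("PIPS_FPP", DEFAULT_FPP))
    command += shlex.split(DEFAULT_FPP_FLAGS)
    command += shlex.split(env.get("PIPS_FPP_FLAGS", ""))
    return command + [source]
```

For instance, setting PIPS_FPP_FLAGS to -DN=10 keeps -P -D__PIPS__ -D__HPFC__ on the command line and appends -DN=10 after them.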
3.2.1.3 Fortran Split
The file is then split into one file per module using a PIPS specialized version of fsplit. This preprocessing also handles
- Hollerith constants by converting them to the quoted syntax;
- unnamed modules by adding MAIN000 or PROGRAM MAIN000 or DATA000 or BLOCK DATA DATA000 according to needs.
The output of this phase is a set of .f_initial files in per-module subdirectories. They constitute the resource INITIAL_FILE.
3.2.1.4 Fortran Syntactic Preprocessing
A second step of preprocessing is performed to produce SOURCE_FILE files with standard Fortran suffix .f from the .f_initial files. The two preprocessing steps are shown in Figure 3.1.
Each module source file is then processed by top-level to handle Fortran include and to comment out IMPLICIT NONE which are not managed by PIPS. Also this phase performs some transformations of complex constants to help the PIPS parser. Files referenced in Fortran include statements are looked for from the directory where the Fortran file is. The Shell variable PIPS_CPP_FLAGS is not used to locate these include files.
3.2.2 C Preprocessing and Splitting
The C preprocessor is applied before the splitting. By default PIPS uses cpp -C for C files. This preprocessor can be changed by setting the PIPS_CPP environment variable.
Moreover the -D__PIPS__ -D__HPFC__ -U__GNUC__ preprocessing options are used and can be extended (not replaced) with the PIPS_CPP_FLAGS environment variable.
This PIPS_CPP_FLAGS variable can also be used to locate the include files. Directories to search are specified with the -Ifile option, as usual for the C preprocessor.
3.2.2.1 C Syntactic Verification
If the PIPS_CHECK_C shell environment variable is defined to false or no or 0, the syntax of the source files is not checked by compiling it with a C compiler. If the PIPS_CHECK_C shell environment variable is defined to true or yes or 1, the syntax of the file is checked by compiling it with a C compiler. If the PIPS_CHECK_C shell environment variable is not defined, the check is performed according to CHECK_C_SYNTAX_BEFORE_RUNNING_PIPS 3.2.2.1.
The environment variable PIPS_CC is used to define the C compiler available. If it is undefined, the compiler chosen is gcc -c.
In case of failure, a warning is displayed.
If the environment variable PIPS_CPP_FLAGS is defined, it should contain the options -Wall and -Werror for the check to be effective.
The next property also triggers this preliminary syntactic verification.
CHECK_C_SYNTAX_BEFORE_RUNNING_PIPS TRUE
Although its default value is FALSE, it is much safer to set it to true when dealing with new source files. PIPS is not designed to process non-standard source code. Bugs in source files are not well explained or localized. They can result in weird behaviors and unexpected core dumps. Before complaining about PIPS, it is highly recommended to set this property to TRUE.
Note: the C and Fortran syntactic verifications could be controlled by a unique property.
3.2.3 Source File Hierarchy
The source files may be placed in different directories and have the same name, which makes resource management more difficult. The default option is to assume that no file name conflicts occur. This is the historical option and it leads to much simpler module names.
PREPROCESSOR_FILE_NAME_CONFLICT_HANDLING FALSE
3.3 Source File
A source_file contains the code of exactly one module. Source files are created from user source files at program initialization by fsplit or a similar function if fsplit is not available (see Section 3.2). A source file may be updated by the user, but not by PIPS. Program transformations are performed on the internal representation (see Chapter 4) and visible in the prettyprinted output (see Chapter 9).
Source code splitting and preprocessing, e.g. cpp, are performed by the function create_workspace() from the top-level library, in collaboration with db_create_workspace() from library pipsdbm which creates the workspace directory. The user source files have names suffixed by .f or .F if cpp must be applied. They are split into original user_files with suffix .f.orig. These so-called original user files are in fact copies stored in the workspace. The syntactic PIPS preprocessor is applied to generate what is known as a source_file by PIPS. This process is fully automated and not visible from PIPS user interfaces. However, the cpp preprocessor actions can be controlled using the Shell environment variable PIPS_CPP_FLAGS.
Function initializer is only called when the source code is not found. If the user code is properly typed, it is possible to force initializer to generate empty stubs by setting properties PREPROCESSOR_MISSING_FILE_HANDLING 3.2.1.1 and, to avoid inconsistency, PARSER_TYPE_CHECK_CALL_SITES 4.2.1.4. But remember that many Fortran codes use subroutines with variable numbers of arguments and with polymorphic types. A varargs-like mechanism can be obtained in Fortran by using, or not, the second argument according to the value of the first one. Polymorphism can be useful to design an IO package or generic array subroutine, e.g. a subroutine setting an array to zero or a subroutine to copy an array into another one.
The current default option is to generate a user error if some source code is missing. This decision was made for two reasons:
- too many warnings about typing are generated as soon as polymorphism is used;
- analysis results and code transformations are potentially wrong because no memory effects are synthesized.
Sometimes, a function happens to be defined (and not only declared) inside a header file with the inline keyword. In that case PIPS can consider it as a regular module or just ignore it, as its presence may be system-dependent. Property IGNORE_FUNCTION_IN_HEADER 3.3 controls this behavior and must be set before workspace creation.
IGNORE_FUNCTION_IN_HEADER TRUE
Modules can be flagged as “stubs”, i.e. functions provided to PIPS but which should not be inlined or modified. Property PREPROCESSOR_INITIALIZER_FLAG_AS_STUB 3.3 controls whether the initializer should declare new files as stubs.
< PROGRAM.stubs
PREPROCESSOR_INITIALIZER_FLAG_AS_STUB TRUE
> MODULE.initial_file
Note: the generation of the resource user_file here above is mainly intended to introduce the resource concept here. More thought is needed to have the concept of user files managed by pipsmake.
MUST appear after initializer:
< MODULE.initial_file
< MODULE.user_file
In C, the initializer can generate directly a c_source_file and its compilation unit.
> COMPILATION_UNIT.c_source_file
3.4 Regeneration of User Source Files
The unsplit 3.4 phase regenerates user files from the available printed_file resources. The various modules that were initially stored in a single file are appended together in a file with the same name. Note that only fsplit is reversed, not the preprocessing through cpp. Also, the include file preprocessing is not reversed.
alias unsplit ’User files Regeneration’
unsplit > PROGRAM.user_file
< ALL.user_file
< ALL.printed_file
Chapter 4
Abstract Syntax Tree
The abstract syntax tree, a.k.a. intermediate representation, a.k.a. internal representation, is presented in [?] and in PIPS Internal Representation of Fortran and C code.
4.1 Entities
Program entities are stored in the PIPS unique symbol table, called entities. Fortran entities, like intrinsics and operators, are created by bootstrap at program initialization. The symbol table is updated with user local and global variables when modules are parsed or linked together. This side effect is not disclosed to pipsmake.
The entity data structure is described in PIPS Internal Representation of Fortran and C code.
The declaration of new intrinsics is not easy because it was assumed that their number was fixed and limited by the Fortran standard. In fact, Fortran extensions define new ones. To add a new intrinsic, C code in bootstrap/bootstrap.c and in effects-generic/intrinsics.c must be added to declare its name, type and Read/Write memory effects.
Information about entities generated by the parsers is printed out conditionally to property PARSER_DUMP_SYMBOL_TABLE 4.2.1.4, which is set to false by default. Unless you are debugging the parser, do not set this property to TRUE but display the symbol table file instead. See Section 4.2.1.4 for Fortran and Section 4.2.3 for C.
4.2 Parsed Code and Callees
Each module source code is parsed to produce an internal representation called parsed_code and a list of called module names, callees.
4.2.1 Fortran
Source code is assumed to be fully Fortran-77 compliant. On the first encountered error, the parser may be able to emit a useful message or the non-analyzed part of the source code is printed out.
PIPS input language is standard Fortran 77 with a few extensions and some restrictions. The input character set includes underscore, _, and variable names may have any length, i.e. they are not restricted to 6 characters.
4.2.1.1 Fortran restrictions
- ENTRY statements are not recognized and a user error is generated. Very few cases of this obsolete feature were encountered in the codes initially used to benchmark PIPS. ENTRY statements have to be replaced manually by SUBROUTINE or FUNCTION and appropriate commons. If the parser bumps into a call to an ENTRY point, it may wrongly diagnose a missing source code for this entry, or even generate a useless but pipsmake satisfying stub if the corresponding property has been set (see Section 3.3).
- Multiple returns are not in PIPS Fortran.
- ASSIGN and assigned GOTO are not in PIPS Fortran.
- Computed GOTOs are not in PIPS Fortran. They are automatically replaced by an IF...ELSEIF...ENDIF construct in the parser.
- Functional formal parameters are not accepted. This is deeply exploited in pipsmake.
- Integer PARAMETERs must be initialized with integer constant expressions because conversion functions are not implemented.
- DO loop headers should have no label. Add a CONTINUE just before the loop when it happens. This can be performed automatically if the property PARSER_SIMPLIFY_LABELLED_LOOPS 4.2.1.4 is set to TRUE. This restriction is imposed by the parallelization phases, not by the parser.
- Complex constants, e.g. (0.,1.), are not directly recognized by the parser. They must be replaced by a call to intrinsic CMPLX. The PIPS preprocessing replaces them by a call to COMPLX_.
- Function formulae are not recognized by the parser. An undeclared array and/or an unsupported macro is diagnosed. They may be substituted in an unsafe way by the preprocessor if the property PARSER_EXPAND_STATEMENT_FUNCTIONS 4.2.1.4 is set. If the substitution is considered possibly unsafe, a warning is displayed.
These parser restrictions were based on funding constraints. They are mostly alleviated by the preprocessing phase. PerfectClub and SPEC-CFP95 benchmarks are handled without manual editing, except for ENTRY statements, which are obsoleted by the current Fortran standard.
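The replacement of computed GOTOs mentioned in the list above can be pictured with the following generator. It is a Python sketch that emits Fortran text, under the assumption that each label is guarded by a test on the position of the label in the list; the function name computed_goto_to_if and the exact layout of the generated lines are hypothetical and need not match what the PIPS parser builds internally.

```python
def computed_goto_to_if(labels, selector):
    """Rewrite GO TO (l1, l2, ...), selector as an IF...ELSEIF...ENDIF block.

    Returns the generated Fortran lines. No ELSE branch is generated:
    when the selector is out of range, execution continues with the
    next statement, as it does for a computed GO TO.
    """
    if not labels:
        raise ValueError("a computed GO TO needs at least one label")
    lines = []
    for position, label in enumerate(labels, start=1):
        keyword = "IF" if position == 1 else "ELSEIF"
        lines.append("      {} ({}.EQ.{}) THEN".format(keyword, selector, position))
        lines.append("         GO TO {}".format(label))
    lines.append("      ENDIF")
    return lines
```

Calling computed_goto_to_if([10, 20], "I") yields one IF branch testing I.EQ.1 and one ELSEIF branch testing I.EQ.2, each containing a plain GO TO.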
4.2.1.2 Some additional remarks
- The PIPS preprocessing stage included in fsplit() is going to name unnamed modules MAIN000 and unnamed blockdata DATA000 to be consistent with the generated file name.
- Hollerith constants are converted to a more readable quoted form, and then output as such by the prettyprinter.
4.2.1.3 Some unfriendly features
- Source code is read in columns 1-72 only. Lines ending in columns 73 and beyond usually generate incomprehensible errors. A warning is generated for lines ending after column 72.
- Comments are carried by the following statement. Comments carried by RETURN, ENDDO, GOTO or CONTINUE statements are not always preserved because the internal representation transforms these statements or because the parallelization phase regenerates some of them. However, they are more likely to be hidden by the prettyprinter. There is a large range of prettyprinter properties to obtain a less filtered view of the code.
- Formats and character constants are not properly handled. Multi-line formats and constants are not always reprinted in a Fortran correct form.
- Declarations are exploited on-the-fly. Thus type and dimension information must be available before common declaration. If not, wrong common offsets are computed at first and fixed later in Function EndOfProcedure. Also, formal arguments are implicitly declared using the default implicit rule. If it is necessary to declare them, these new declarations should occur before an IMPLICIT declaration. Users are surprised by the type redefinition errors displayed.
4.2.1.4 Declaration of the standard parser
> MODULE.callees
< PROGRAM.entities
< MODULE.source_file
For parser debugging purposes, it is possible to print a summary of the symbol table, when enabling this property:
PARSER_DUMP_SYMBOL_TABLE FALSE
This should be avoided and the resource symbol_table_file be displayed instead.
The prettyprint of the symbol table for a Fortran module is generated with:
< PROGRAM.entities
< MODULE.parsed_code
Input Format
Some subtle errors occur because the PIPS parser uses a fixed format. Columns 73 to 80 are ignored, but the parser may emit a warning if some characters are encountered in this comment field.
PARSER_WARN_FOR_COLUMNS_73_80 TRUE
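The fixed-format rule can be modelled as below, which may help to locate offending lines before submitting a file to PIPS. This is a Python sketch, not the parser code: the function name split_fixed_form_line is hypothetical, and tabulations are assumed to have been expanded beforehand.

```python
def split_fixed_form_line(line):
    """Split a fixed-format source line at column 72.

    Returns the text seen by the parser (columns 1 to 72) and a flag
    telling whether non-blank characters were found beyond column 72,
    i.e. whether a warning would be worth emitting.
    """
    text = line.rstrip("\n")
    statement_field = text[:72]
    ignored_field = text[72:]
    return statement_field, ignored_field.strip() != ""
```

A statement that silently loses its last characters this way is a typical source of the subtle errors mentioned above.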
ANSI extension
PIPS has been initially developed to parse correct Fortran compliant programs only. Real applications use lots of ANSI extensions… and they are not always correct! To make sure that PIPS output is correct, the input code should be checked against ANSI extensions using property
CHECK_FORTRAN_SYNTAX_BEFORE_RUNNING_PIPS
(see Section 3.2) and the property below should be set to false.
PARSER_ACCEPT_ANSI_EXTENSIONS TRUE
Currently, this property is not used often enough in the PIPS parser, which lets many mistakes through... as expected by real users!
Array range extension
PIPS has been developed to parse correct Fortran-77 compliant programs only. Array ranges are used to improve readability. They can be generated by PIPS prettyprinter. They are not parsed as correct input by default.
PARSER_ACCEPT_ARRAY_RANGE_EXTENSION FALSE
Type Checking
Each argument list at calls to a function or a subroutine is compared to the functional type of the callee. Turn this off if you need to support variable numbers of arguments or if you use overloading and do not want to hear about it. For instance, an IO routine can be used to write an array of integers or an array of reals or an array of complex if the length parameter is appropriate.
Since the functional typing is shaky, let’s turn it off by default!
PARSER_TYPE_CHECK_CALL_SITES FALSE
Loop Header with Label
The PIPS implementation of Allen&Kennedy algorithm cannot cope with labeled DO loops because the loop, and hence its label, may be replicated if the loop is distributed. The parser can generate an extra CONTINUE statement to carry the label and produce a label-free loop. This is not the standard option because PIPS is designed to output code as close as possible to the user source code.
PARSER_SIMPLIFY_LABELLED_LOOPS FALSE
Most PIPS analyses work better if do loop bounds are affine. It is sometimes possible to improve results for non-affine bounds by assigning the bound to an integer variable and by using this variable as bound. This is implemented for Fortran, but not for C.
PARSER_LINEARIZE_LOOP_BOUNDS FALSE
Entry
The entry construct can be seen as an early attempt at object-oriented programming. The same object can be processed by several functions. The object is declared as a standard subroutine or function and entry points are placed in the executable code. The entry points have different sets of formal parameters, they may share some common pieces of code, they share the declared variables, especially the static ones.
The entry mechanism is dangerous because of the flow of control between entries. It is now obsolete and is not analyzed directly by PIPS. Instead each entry may be converted into a first class function or subroutine and static variables are gathered in a specific common. This is the default option. If the substitution is not acceptable, the property may be turned off and entries result in a parser error.
PARSER_SUBSTITUTE_ENTRIES TRUE
Alternate Return
Alternate returns are put among the obsolete Fortran features by the Fortran 90 standard. It is possible (1) to refuse them (option ”NO”), or (2) to ignore them and to replace alternate returns by STOP (option ”STOP”), or (3) to substitute them by a semantically equivalent code based on return code values (option ”RC” or option ”HRC”). Option (2) is useful if the alternate returns are used to propagate error conditions. Option (3) is useful to understand the impact of the alternate returns on the control flow graph and to maintain the code semantics. Option ”RC” uses an additional parameter while option ”HRC” uses a set of PIPS run-time functions to hide the set and get of the return code which make declaration regeneration less useful. By default, the first option is selected and alternate returns are refused.
To produce an executable code, the declarations must be regenerated: see property PRETTYPRINT_ALL_DECLARATIONS 9.2.21.6 in Section 9.2.21.6. This is not necessary with option ”HRC”. Fewer new declarations are needed if variable PARSER_RETURN_CODE_VARIABLE 4.2.1.4 is implicitly integer because its first letter is in the I-N range.
With option (2), the code can still be executed if alternate returns are used only for errors and if no errors occur. It can also be analyzed to understand what the normal behavior is. For instance, OUT regions are more likely to be exact when exceptions and errors are ignored.
Formal and actual label variables are replaced by string variables to preserve the parameter order and as much source information as possible. See PARSER_FORMAL_LABEL_SUBSTITUTE_PREFIX 4.2.1.4 which is used to generate new variable names.
PARSER_SUBSTITUTE_ALTERNATE_RETURNS "NO"
PARSER_RETURN_CODE_VARIABLE "I_PIPS_RETURN_CODE_"
PARSER_FORMAL_LABEL_SUBSTITUTE_PREFIX "FORMAL_RETURN_LABEL_"
The internal representation can be hidden and the alternate returns can be prettyprinted at the call sites and modules declaration by turning on the following property:
PRETTYPRINT_REGENERATE_ALTERNATE_RETURNS FALSE
Using a mixed C / Fortran RI is troublesome for comments handling: sometimes the comment guard is stored in the comment, sometimes not. Sometimes it is on purpose, sometimes it is not. When the following property is set to true, PIPS does its best to prettyprint comments correctly.
PRETTYPRINT_CHECK_COMMENTS TRUE
If all modules have been processed by PIPS, it is possible not to regenerate alternate returns and to use a code close to the internal representation. If they are regenerated in the call sites and module declaration, they are nevertheless not used by the code generated by PIPS which is consistent with the internal representation.
Here is a possible implementation of the two PIPS run-time subroutines required by the hidden return code (”HRC”) option:
      subroutine SET_I_PIPS_RETURN_CODE_(irc)
      common /PIPS_RETURN_CODE_COMMON/irc_shared
      irc_shared = irc
      end
      subroutine GET_I_PIPS_RETURN_CODE_(irc)
      common /PIPS_RETURN_CODE_COMMON/irc_shared
      irc = irc_shared
      end
Note that the subroutine names depend on the PARSER_RETURN_CODE_VARIABLE 4.2.1.4 property. They are generated by prefixing it with SET_ and GET_. Their implementation is free. The common name used should not conflict with application common names. The ENTRY mechanism is not used because it would be desugared by PIPS anyway.
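The naming rule above, together with the remark made earlier about implicit typing of PARSER_RETURN_CODE_VARIABLE, can be checked with a few lines of Python. Both function names are hypothetical and the code is only a model of the conventions, not part of PIPS.

```python
def return_code_subroutine_names(return_code_variable="I_PIPS_RETURN_CODE_"):
    """Names of the two run-time subroutines required by option HRC."""
    return ("SET_" + return_code_variable, "GET_" + return_code_variable)


def is_implicitly_integer(name):
    """Default Fortran implicit rule: an initial letter in I-N means INTEGER."""
    return "I" <= name[:1].upper() <= "N"
```

With the default property value, the two subroutine names are the ones used in the sample implementation above, and the return code variable needs no explicit type declaration because it starts with the letter I.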
Assigned GO TO
By default, assigned GO TO and ASSIGN statements are not accepted. These constructs are obsolete and will not be part of future Fortran standards.
However, it is possible to replace them automatically in a way similar to computed GO TO. Each ASSIGN statement is replaced by a standard integer assignment. The label is converted to its numerical value. When an assigned GO TO with its optional list of labels is encountered, it is transformed into a sequence of logical IF statements with appropriate tests and GO TO’s. According to Fortran 77 Standard, Section 11.3, Page 11-2, the control variable must be set to one of the labels in the optional list. Hence a STOP statement is generated to interrupt the execution when this is not the case, but note that compilers such as SUN f77 and g77 do not check this condition at run-time (it is undecidable statically).
PARSER_SUBSTITUTE_ASSIGNED_GOTO FALSE
Assigned GO TOs without the optional list of labels are not processed. In other words, PIPS makes the optional list mandatory for substitution. It usually is quite easy to add manually the list of potential targets.
Also, ASSIGN statements cannot be used to define a FORMAT label. If the desugaring option is selected, an illegal program is produced by PIPS parser.
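The substitution described in this section can be sketched with a small generator of Fortran text. It is written in Python for illustration only: the function names desugar_assign and desugar_assigned_goto are hypothetical, and the exact shape of the statements produced by the PIPS parser may differ from the lines generated here.

```python
def desugar_assign(label, variable):
    """ASSIGN label TO variable becomes a plain integer assignment."""
    return "      {} = {}".format(variable, label)


def desugar_assigned_goto(variable, labels):
    """GO TO variable, (l1, l2, ...) becomes logical IFs followed by STOP.

    The list of labels is mandatory, as stated above. The final STOP is
    reached only when the variable matches none of the listed labels.
    """
    if not labels:
        raise ValueError("the list of labels is required for the substitution")
    lines = ["      IF ({0}.EQ.{1}) GO TO {1}".format(variable, label)
             for label in labels]
    lines.append("      STOP")
    return lines
```

For example, ASSIGN 10 TO K followed by GO TO K, (10, 20) would be rendered as K = 10, then two logical IF statements testing K against 10 and 20, then STOP.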
Statement Function
This property controls the processing of Fortran statement functions by text substitution in the parser. No other processing is available and the parser stops with an error message when a statement function declaration is encountered.
The default used to be not to perform this unchecked replacement, which might change the semantics of the program because type coercion is not enforced and actual parameters are not assigned to intermediate variables. However most statement functions do not require these extra-steps and it is legal to perform the textual substitution. For user convenience, the default option is textual substitution.
Note that the parser does not have enough information to check the validity of the transformation, but a warning is issued if legality is doubtful. If strange results are obtained when executing codes transformed with PIPS, this property should be set to false.
A better method would be to represent them somehow as local functions in the internal representation, but the implications for pipsmake and other issues are clearly not all foreseen… (Fabien Coelho).
PARSER_EXPAND_STATEMENT_FUNCTIONS TRUE
4.2.2 Declaration of HPFC parser
This parser takes a different Fortran file but applies the same processing as the previous parser. The Fortran file is the result of the preprocessing by the hpfc_filter 7.3.2.1 phase of the original file in order to extract the directives and switch them to a Fortran 77 parsable form. As another side-effect, this parser hides some callees from pipsmake. These callees are temporary functions used to encode HPF directives. Their call sites are removed from the code before requesting full analyses to PIPS. This parser is triggered automatically by the hpfc_close 7.3.2.5 phase when requested. It should never be selected or activated by hand.
> MODULE.callees
< PROGRAM.entities
< MODULE.hpfc_filtered_file
4.2.3 Declaration of the C parsers
A C file is seen in PIPS as a compilation unit, which contains all the object declarations that are global to this file, and as many modules (functions or procedures) as are defined in this file.
Thus the compilation unit contains the file-global macros, the include statements, the local and global variable definitions, the type definitions, and the function declarations if any found in the C file.
When the PIPS workspace is created by the PIPS preprocessor, each C file is preprocessed using for instance gcc -E and broken into a new compilation unit, which contains only the file-global variable declarations, the function declarations and the type definitions, and one C file for each C function defined in the initial C file.
The new compilation units must be parsed before the new files, containing each one exactly one function definition, can be parsed. The new compilation units are named like the initial file names but with a bang extension.
For example, considering a C file foo.c with 2 function definitions:
typedef float data_t;
data_t matrix[N][N];
extern int errno;
int calc(data_t a[N][N]) {
[...]
}
int main(int argc, char *argv[]) {
[..]
}
After preprocessing, it leads to a file foo.cpp_processed.c that is then split into a new foo!.cpp_processed.c compilation unit containing
typedef float data_t;
data_t matrix[N][N];
extern int errno;
int calc(data_t a[N][N]);
int main(int argc, char *argv[]);
and 2 module files containing the definitions of the 2 functions, a calc.c and a main.c.
Note that it is possible to have an empty compilation unit and no module file if the original file does not contain any meaningful C information (such as an empty file containing only blank characters and so on).
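The file names used in the foo.c example can be derived mechanically, as in the Python sketch below. The function name split_c_file_names is hypothetical, only the names given in the example are reproduced, and the placement of the files inside the workspace is not modelled.

```python
def split_c_file_names(user_file, function_names):
    """File names produced when a C user file is preprocessed and split.

    The compilation unit is named like the preprocessed file, with a
    bang appended to the base name; each function gets its own file.
    """
    stem = user_file[:-2] if user_file.endswith(".c") else user_file
    return {
        "preprocessed": stem + ".cpp_processed.c",
        "compilation_unit": stem + "!.cpp_processed.c",
        "modules": [name + ".c" for name in function_names],
    }
```

A file that defines no function yields an empty list of module files, which corresponds to the case of an empty compilation unit mentioned above.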
< COMPILATION_UNIT.c_source_file
The resource COMPILATION_UNIT.declarations produced by compilation_unit_parser is a special resource used to force the parsing of the new compilation unit before the parsing of its associated functions. It is in fact a hash table containing the file-global C keywords and typedef names defined in each compilation unit.
In fact phase compilation_unit_parser also produces parsed_code and callees resources for the compilation unit. This is done to work around the fact that rule c_parser was invoked on compilation units by later phases, in particular for the computation of initial preconditions, breaking the declarations of function prototypes. These two resources are not declared here because pipsmake gets confused between the different rules to compute parsed code: there is no simple way to distinguish between compilation units and modules at some times and to handle them similarly at other times.
> MODULE.callees
< PROGRAM.entities
< MODULE.c_source_file
< COMPILATION_UNIT.declarations
If you want to parse some C code using tpips, it is possible to select the C parser with
PRETTYPRINT_STATEMENT_NUMBER FALSE
PRETTYPRINT_BLOCK_IF_ONLY TRUE
A prettyprint of the symbol table for a C module can be generated with
< PROGRAM.entities
< MODULE.parsed_code
The EXTENDED_VARIABLE_INFORMATION 4.2.3 property can be used to extend the information available for variables. By default the entity name, the offset and the size are printed. Using this property the type and the user name, which may be different from the internal name, are also displayed.
EXTENDED_VARIABLE_INFORMATION FALSE
The C_PARSER_RETURN_SUBSTITUTION 4.2.3 property can be used to handle properly multiple returns within one function. The current default value is false, which preserves best the source aspect but modifies the control flow because the calls to return are assumed to flow in sequence. If the property is set to true, C return statements are replaced, when necessary, either by a simple goto for void functions, or by an assignment of the returned value to a special variable and a goto. A unique return statement is placed at the syntactic end of the function. For functions with no return statement or with a unique return statement placed at the end of their bodies, this property is useless.
C_PARSER_RETURN_SUBSTITUTION FALSE
The C99 for-loop with a declaration such as for(int i = a;...;...) can be represented in the RI with a naive representation such as:
This is done when the C_PARSER_GENERATE_NAIVE_C99_FOR_LOOP_DECLARATION 4.2.3 property is set to TRUE
C_PARSER_GENERATE_NAIVE_C99_FOR_LOOP_DECLARATION FALSE
Otherwise, other representations can be generated. For example, with some declaration splitting, we can generate a more representative version:
if the C_PARSER_GENERATE_COMPACT_C99_FOR_LOOP_DECLARATION ?? property is set to FALSE.
C_PARSER_GENERATE_COMPACT_C99_FOR_LOOP_DECLARATION FALSE
Otherwise, we can generate a more compact representation (newer, and liable to choke some parts of PIPS7 ...) like:
This representation is not yet implemented.
4.3 Controlized Code (Hierarchical Control Flow Graph)
PIPS analyses and transformations take advantage of a hierarchical control flow graph (HCFG), which preserves structured parts of code as such, and uses a control flow graph only when no syntactic representation is available (see [?]). The encoding of the relationship between structured and unstructured parts of code is explained elsewhere, mainly in the PIPS Internal Representation of Fortran and C code8 .
The controlizer 4.3 is the historical controlizer phase that removes GOTO statements in the parsed code and generates a similar representation with small CFGs. It was developed for Fortran 77 code.
The Fortran controlizer phase was too hacked and undocumented to be improved and debugged for C99 code so a new version has been developed, documented and is designed to be simpler and easier to understand. But, for comparison, the Fortran controlizer phase can still be used.
< PROGRAM.entities
< MODULE.parsed_code
For debugging and validation purposes, by setting at most one of the PIPS_USE_OLD_CONTROLIZER or PIPS_USE_NEW_CONTROLIZER environment variables, you can force the use of the specific version of the controlizer you want to use. This overrides the setting made by activate.
Note that the controlizer choice impacts the HCFG when Fortran entries are used. If you do not know what Fortran entries are, it is deprecated stuff anyway... ☺
The new_controlizer 4.3 removes GOTO statements in the parsed code and generates a similar representation with small CFGs. It is designed to work according to C and C99 standards. Sequences of sequence and variable declarations are handled properly. However, the prettyprinter is tuned for code generated by controlizer 4.3, which does not always minimize the number of goto statements regenerated.
The hierarchical control flow graph built by the controlizer 4.3 is pretty crude. The partial control flow graphs, called unstructured statements, are derived from syntactic constructs. The control scope of an unstructured is the smallest enclosing structured construct, whether a loop, a test or a sequence. Thus some statements, which might be seen as part of structured code, end up as nodes of an unstructured.
Note that sequences of statements are identified as such by controlizer 4.3. Each of them appears as a unique node.
Also, useless CONTINUE statements may be added as provisional landing pads and not removed. The exit node should never have successors but this may happen after some PIPS function calls. The exit node, as well as several other nodes, also may be unreachable. After clean up, there should be no unreachable node or the only unreachable node should be the exit node. Function unspaghettify 8.3.3.1 (see Section 8.3.3.1) is applied by default to clean up and to reduce the control flow graphs after controlizer 4.3.
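The clean-up conditions stated above (no successor for the exit node, and no unreachable node except possibly the exit node) can be sketched on a toy control flow graph. This is only an illustration under stated assumptions: the graph is a plain dictionary of successor lists, and the names reachable and is_clean are hypothetical helpers, not PIPS functions.

```python
from collections import deque

def reachable(successors, entry):
    """Return the set of nodes reachable from entry in a control flow graph."""
    seen = {entry}
    todo = deque([entry])
    while todo:
        node = todo.popleft()
        for succ in successors.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                todo.append(succ)
    return seen

def is_clean(successors, entry, exit_node):
    """Check the two clean-up conditions (hypothetical helper, not a PIPS API)."""
    nodes = set(successors) | {s for succs in successors.values() for s in succs}
    nodes |= {entry, exit_node}
    unreachable = nodes - reachable(successors, entry)
    exit_has_no_successor = not successors.get(exit_node)
    # Either nothing is unreachable, or only the exit node is.
    return exit_has_no_successor and unreachable <= {exit_node}

# A graph with an infinite loop: the exit node is the only unreachable node.
looping = {"entry": ["body"], "body": ["body"], "exit": []}
# A graph with a leftover landing pad that nothing jumps to.
dirty = {"entry": ["exit"], "pad": ["exit"], "exit": []}
```

Here is_clean(looping, "entry", "exit") holds, whereas the leftover node "pad" makes the second graph fail the check.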
The GOTO statements are transformed in arcs but also in CONTINUE statements to preserve as many user comments as possible.
The top statement of a module returned by the controlizer 4.3 always used to contain an unstructured instruction with only one node. Several phases in PIPS assumed that this was always the case, although other program transformations may well return any kind of top statement, most likely a block. This is no longer true. The top statement of a module may contain any kind of instruction.
Here is declared the C and C99 controlizer:
< PROGRAM.entities
< MODULE.parsed_code
Control restructuring eliminates empty sequences, except when they are the empty true or false branch of a structured IF. This semantic property of PIPS Internal Representation of Fortran and C code9 is enforced by libraries effects, regions, hpfc, effects-generic.
WARN_ABOUT_EMPTY_SEQUENCES FALSE
By unsetting this property unspaghettify 8.3.3.1 is not applied implicitly in the controlizer phase.
UNSPAGHETTIFY_IN_CONTROLIZER TRUE
The next property is used to convert C for loops into C while loops. The purpose is to speed up the re-use of Fortran analyses and transformations for C code. This property is set to false by default and should ultimately disappear. But for new user convenience, it is set to TRUE by activate_language() when the language is C.
FOR_TO_WHILE_LOOP_IN_CONTROLIZER FALSE
The next property is used to convert C for loops into C do loops when syntactically possible. The conversion is not safe because the effect of the loop body on the loop index is not checked. The purpose is to speed up the re-use of Fortran analyses and transformations for C code. This property is set to false by default and should disappear soon. But for new user convenience, it is set to TRUE by activate_language() when the language is C.
FOR_TO_DO_LOOP_IN_CONTROLIZER FALSE
This can also be explicitly applied by calling the phase described in § 8.3.3.4.
FORMAT Restructuring
To enable deeper code transformations, FORMATs can be gathered at the very beginning of the code or at the very end according to the following options in the unspaghettify or control restructuring phase.
GATHER_FORMATS_AT_BEGINNING FALSE
GATHER_FORMATS_AT_END FALSE
Clean Up Sequences
Set this property to display statistics about cleaning up sequences and removing useless CONTINUE or empty statements.
CLEAN_UP_SEQUENCES_DISPLAY_STATISTICS FALSE
There is a trade-off between keeping the comments associated with labels and gotos and the cleaning that can be done on the control graph.
By default, do not fuse empty control nodes that have labels or comments:
FUSE_CONTROL_NODES_WITH_COMMENTS_OR_LABEL FALSE
Chapter 5
Pedagogical phases
Although these phases should be spread elsewhere in this manual, we have gathered here some pedagogical phases that are useful for getting started with PIPS.
5.1 Using XML backend
A phase that displays, in debug mode, statements matching an XPath expression on the internal representation:
simple_xpath_test > MODULE.code
< PROGRAM.entities
< MODULE.code
5.2 Prepending a comment
Prepends a comment to the first statement of a module. Useful to apply post-processing after PIPS.
prepend_comment > MODULE.code
< PROGRAM.entities
< MODULE.code
The comment to add is selected by this property:
PREPEND_COMMENT "/*␣This␣comment␣is␣added␣by␣PREPEND_COMMENT␣phase␣*/"
5.3 Prepending a call
This phase inserts a call to function MY_TRACK just before the first statement of a module. Useful as a pedagogical example to explore the internal representation and Newgen. Not to be used for any practical purpose as it is buggy. Debugging it is a pedagogical exercise.
prepend_call > MODULE.code
> MODULE.callees
< PROGRAM.entities
< MODULE.code
The called function could be defined by this property:
PREPEND_CALL "MY_TRACK"
but it is not.
5.4 Add a pragma to a module
This phase prepends or appends a pragma to a module.
add_pragma > MODULE.code
< PROGRAM.entities
< MODULE.code
The pragma name can be defined by this property:
PRAGMA_NAME "MY_PRAGMA"
The pragma can be appended or prepended thanks to this property:
PRAGMA_PREPEND TRUE
Remove labels that are not useful
< PROGRAM.entities
< MODULE.code
Loop labels can be kept thanks to this property:
REMOVE_USELESS_LABEL_KEEP_LOOP_LABEL FALSE
Chapter 6
Analyses
Analyses encompass the computations of call graphs, the memory effects, reductions, use-def chains, dependence graphs, interprocedural checks (flinter), semantics information (transformers and preconditions), continuations, complexities, convex array regions, dynamic aliases and complementary regions.
6.1 Call Graph
All lists of callees are needed to build the global lists of callers for each module. The callers and callees lists are used by pipsmake to control top-down and bottom-up analyses. The call graph is assumed to be a DAG, i.e. no recursive cycle exists, but it is not necessarily connected.
The height of a module can be used to schedule bottom-up analyses. It is zero if the module has no callees. Else, it is the maximal height of the callees plus one.
The depth of a module can be used to schedule top-down analyses. It is zero if the module has no callers. Else, it is the maximal depth of the callers plus one.
> ALL.height
> ALL.depth
< ALL.callees
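The definitions of callers, height and depth given above can be sketched as follows. This is a minimal illustration, assuming the call graph is given as a dictionary of callees lists and is a DAG; the function names callers_from_callees and levels are hypothetical and do not correspond to PIPS functions.

```python
from functools import lru_cache

def callers_from_callees(callees):
    """Invert the per-module callees lists into per-module callers lists."""
    callers = {module: [] for module in callees}
    for module, called in callees.items():
        for callee in called:
            callers.setdefault(callee, []).append(module)
    return callers

def levels(neighbours):
    """Longest-path level of each module: 0 without neighbours, else 1 + max.

    With neighbours = callees this is the height; with callers, the depth.
    Assumes the call graph is a DAG, as stated in the text.
    """
    @lru_cache(maxsize=None)
    def level(module):
        return 1 + max((level(n) for n in neighbours[module]), default=-1)
    return {module: level(module) for module in neighbours}

# A small, not necessarily connected, call graph (module names are made up).
callees = {
    "main": ["init", "solve"],
    "solve": ["kernel"],
    "init": [],
    "kernel": [],
    "unused": [],
}
callers = callers_from_callees(callees)
height = levels(callees)   # schedules bottom-up analyses
depth = levels(callers)    # schedules top-down analyses
```

A bottom-up analysis can then process modules by increasing height, and a top-down analysis by increasing depth.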
The following pass generates a uDrawGraph1 version of the callgraph. It is quite partial since it should rely on a hypothetical resource listing all callees, direct and indirect.
alias graph_of_calls ’For current module’
alias full_graph_of_calls ’For all modules’
graph_of_calls > MODULE.dvcg_file
< ALL.callees
full_graph_of_calls > PROGRAM.dvcg_file
< ALL.callees
6.2 Memory Effects
The data structures used to represent memory effects and their computation are described in [?]. Another description is available on line, in PIPS Internal Representation of Fortran and C code2 Technical Report.
Note that the standard name in the Dragon book is likely to be Gen and Kill sets in the standard data flow analysis framework, but PIPS uses the more general concept of effect developed by P. Jouvelot and D. Gifford [?] and its analyses are mostly based on the abstract syntax tree (AST) rather than the control flow graph (CFG).
6.2.1 Proper Memory Effects
The proper memory effects of a statement basically are a list of variables that may be read or written by the statement. They are used to build use-def chains (see [?] or a later edition) and then the dependence graph.
Proper means that the effects of a compound statement do not include the effects of lower level statements. For instance, the body of a loop, true and false branches of a test statement, control nodes in an unstructured statement ... are ignored to compute the proper effects of a loop, a test or an unstructured.
Two families of effects are computed: pointer_effects are effects in which intermediary access paths may refer to different memory locations at different program points; regular effects are constant path effects, which means that their intermediary access paths all refer to unique memory locations. The same distinction holds for convex array regions (see section 6.11).
proper_effects_with_points_to and proper_effects_with_pointer_values are alternatives to compute constant path proper effects using points-to (see subsection 6.12.3) or pointer values analyses (see subsection 6.12.4). This is still at an experimental stage.
Summary effects (see Section 6.2.4) of a called module are used to compute the proper effects at the corresponding call sites. They are translated from the callee’s scope into the caller’s scope. The translation is based on the actual-to-formal binding. If too many actual arguments are defined, a user warning is issued but the processing goes on because a simple semantics is available: ignore useless actual arguments. If too few actual arguments are provided, a user error is issued because the effects of the call are not defined.
Variables private to loops are handled like regular variables.
See proper_effects 6.2.1
See proper_effects 6.2.1
< PROGRAM.entities
< MODULE.code
< CALLEES.summary_pointer_effects
< PROGRAM.entities
< MODULE.code
< CALLEES.summary_effects
< PROGRAM.entities
< MODULE.code
< MODULE.points_to_list
< CALLEES.summary_effects
< PROGRAM.entities
< MODULE.code
< MODULE.pointer_values
< CALLEES.summary_effects
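The translation of callee summary effects at a call site, based on the actual-to-formal binding described above, can be sketched as follows. This is a simplified illustration and not the PIPS implementation: effects are assumed to be (action, variable) pairs, actual arguments are assumed to be plain variable names, and translate_summary_effects is a hypothetical name.

```python
import warnings

def translate_summary_effects(summary_effects, formals, actuals):
    """Translate callee effects on formal parameters to the caller's scope.

    summary_effects: list of (action, variable) pairs, action is "read" or "write".
    formals, actuals: parameter names in the callee and argument names at the call.
    Effects on non-formal variables (e.g. globals) are kept unchanged.
    """
    if len(actuals) < len(formals):
        # Too few actual arguments: the effects of the call are not defined.
        raise ValueError("too few actual arguments")
    if len(actuals) > len(formals):
        # Too many actual arguments: warn and ignore the useless ones.
        warnings.warn("too many actual arguments, extra ones are ignored")
    binding = dict(zip(formals, actuals))  # zip drops the extra actuals
    return [(action, binding.get(var, var)) for action, var in summary_effects]

# Callee foo(x) reads and writes its formal x and writes a global y.
foo_summary = [("read", "x"), ("write", "x"), ("write", "y")]
call_effects = translate_summary_effects(foo_summary, ["x"], ["a"])
```

The asymmetry between the two error cases follows the text: extra actual arguments can safely be ignored, whereas a missing one leaves a formal parameter unbound.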
6.2.2 Filtered Proper Memory Effects
To be continued...by whom?
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
< CALLEES.summary_effects
6.2.3 Cumulated Memory Effects
Cumulated effects of statements are lists of read or written variables, just like the proper effects (see Section 6.2.1). Cumulated means that the effects of a compound statement, do loop, test or unstructured, include the effects of the lower level statements such as a loop body or a test branch.
< PROGRAM.entities
< MODULE.code
< MODULE.proper_pointer_effects
< PROGRAM.entities
< MODULE.code
< MODULE.proper_pointer_effects
< MODULE.points_to_list
< PROGRAM.entities
< MODULE.code
< MODULE.proper_pointer_effects
< MODULE.points_to_list
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
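The difference between proper and cumulated effects can be sketched on a toy statement tree. This is only an illustration under simplifying assumptions: the classes Assign, Loop and Block are hypothetical stand-ins for the PIPS internal representation, effects are (action, variable) pairs, and arrays are treated as whole variables.

```python
from dataclasses import dataclass, field

@dataclass
class Assign:
    target: str
    sources: tuple = ()

@dataclass
class Loop:
    index: str
    bound: str
    body: object

@dataclass
class Block:
    statements: list = field(default_factory=list)

def proper_effects(stmt):
    """Effects of the statement itself, ignoring lower level statements."""
    if isinstance(stmt, Assign):
        return {("write", stmt.target)} | {("read", s) for s in stmt.sources}
    if isinstance(stmt, Loop):
        # Only the loop header: the index is written, the bound is read.
        return {("write", stmt.index), ("read", stmt.bound)}
    return set()  # a block has no effect of its own

def cumulated_effects(stmt):
    """Proper effects plus the effects of all lower level statements."""
    effects = set(proper_effects(stmt))
    if isinstance(stmt, Loop):
        effects |= cumulated_effects(stmt.body)
    elif isinstance(stmt, Block):
        for sub in stmt.statements:
            effects |= cumulated_effects(sub)
    return effects

# do i = 1, n ; t = a ; b = t ; enddo
loop = Loop("i", "n", Block([Assign("t", ("a",)), Assign("b", ("t",))]))
```

For this loop, the proper effects only mention i and n, while the cumulated effects also include the reads and writes on a, b and t performed by the body.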
6.2.4 Summary Data Flow Information (SDFI)
Summary data flow information is the simplest interprocedural information needed to take procedures into account in a parallelizer. It was introduced in Parafrase (see [?]) under this name, but should be called summary memory effects in PIPS context.
The summary_effects 6.2.4 of a module are the cumulated memory effects of its top level statement (see Section 6.2.6), but effects on local dynamic variables are ignored (because they cannot be observed by the callers3 ) and subscript expressions of remaining effects are eliminated.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_pointer_effects
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
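The two steps that turn the cumulated effects of a top-level statement into summary effects (dropping effects on local dynamic variables and eliminating subscript expressions) can be sketched as follows. The representation of effects as (action, variable, subscripts) triples and the helper name summarize are assumptions made for this illustration only.

```python
def summarize(cumulated, local_dynamic):
    """Summarize the cumulated effects of a module's top-level statement.

    cumulated: set of (action, variable, subscripts) triples.
    local_dynamic: names of local dynamic (stack-allocated) variables.
    """
    return {
        (action, variable)                 # subscript expressions are eliminated
        for action, variable, _subscripts in cumulated
        if variable not in local_dynamic   # cannot be observed by the callers
    }

top_level = {
    ("read", "a", ("i",)),
    ("write", "a", ("i+1",)),
    ("write", "i", ()),   # local dynamic variable
    ("write", "y", ()),   # static variable: its value survives the call
}
summary = summarize(top_level, local_dynamic={"i"})
```

The static variable y is kept because its value persists from one call to the next, whereas the local dynamic variable i disappears from the summary.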
6.2.5 IN and OUT Effects
IN and OUT memory effects of a statement s are memory locations whose input values are used by statement s or whose output values are used by statement s continuation. Variables allocated in the statement are not part of the IN or OUT effects. Variables defined before they are used are not part of the IN effects. OUT effects require an interprocedural analysis4 .
> MODULE.cumulated_in_effects
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< CALLEES.in_summary_effects
in_summary_effects > MODULE.in_summary_effects
< PROGRAM.entities
< MODULE.code
< MODULE.in_effects
out_summary_effects > MODULE.out_summary_effects
< PROGRAM.entities
< MODULE.code
< CALLERS.out_effects
out_effects > MODULE.out_effects
< PROGRAM.entities
< MODULE.code
< MODULE.out_summary_effects
< MODULE.cumulated_in_effects
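The definitions of IN and OUT effects given at the beginning of this section can be sketched on a straight-line sequence of statements. This is an intraprocedural toy model under strong assumptions: each statement is a pair of read and written variable sets, every write is certain, and the continuation is known explicitly, whereas PIPS needs an interprocedural analysis for OUT effects. The names in_effects and out_effects are hypothetical helpers.

```python
def in_effects(statements):
    """IN effects of a straight-line sequence.

    statements: list of (reads, writes) pairs of variable-name sets, in
    execution order. A variable is imported if it is read before any write.
    """
    defined = set()
    imported = set()
    for reads, writes in statements:
        imported |= set(reads) - defined
        defined |= set(writes)
    return imported

def out_effects(statements, continuation):
    """Variables written by the sequence whose values the continuation uses."""
    written = set().union(*(writes for _reads, writes in statements))
    return written & in_effects(continuation)

# t = a + b ; c = t * 2 ; a = c
sequence = [({"a", "b"}, {"t"}), ({"t"}, {"c"}), ({"c"}, {"a"})]
# z = a + t
continuation = [({"a", "t"}, {"z"})]
```

In this example t and c are read by the sequence but defined first, so only a and b are imported; c is written but never used afterwards, so it is not exported.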
6.2.6 Proper and Cumulated References
The concept of proper references is not yet clearly defined. The original idea is to keep track of the actual objects of newgen domain reference used in the program representation of the current statement, while recording whether they correspond to a read or a write of the corresponding memory locations. Proper references are represented as effects.
For C programs, where memory accesses are not necessarily represented by objects of newgen domain reference, the semantics of this analysis is unclear.
Cumulated references gather proper references over the program code, without taking into account the modification of memory stores by the program execution.
FC: I should implement real summary references?
< PROGRAM.entities
< MODULE.code
< CALLEES.summary_effects
cumulated_references > MODULE.cumulated_references
< PROGRAM.entities
< MODULE.code
< MODULE.proper_references
6.2.7 Effect Properties
Filter this variable in phase filter_proper_effects 6.2.2.
EFFECTS_FILTER_ON_VARIABLE ""
When set to TRUE, EFFECTS_POINTER_MODIFICATION_CHECKING 6.2.7 enables pointer modification checking during the computation of cumulated effects and/or RW convex array regions. Since this is still at experimentation level, its default value is FALSE. This property should disappear when pointer modification analyses are more mature.
EFFECTS_POINTER_MODIFICATION_CHECKING FALSE
The default (and correct) behaviour for the computation of effects is to transform dereferencing paths into constant paths. When property CONSTANT_PATH_EFFECTS 6.2.7 is set to FALSE, the latter transformation is skipped. Effects are then equivalent to pointer_effects. This property is available for backward compatibility and experimental purposes. It must be borne in mind that analyses and transformations using the resulting effects may yield incorrect results. This property also affects the computation of convex array regions.
CONSTANT_PATH_EFFECTS TRUE
Since CONSTANT_PATH_EFFECTS 6.2.7 may be set to FALSE erroneously, some tests are included in conflicts testing to avoid generating wrong code. However, these tests are costly, and can be turned off by setting TRUST_CONSTANT_PATH_EFFECTS_IN_CONFLICTS 6.2.7 to TRUE. This must be used with care and only when there is no aliasing.
TRUST_CONSTANT_PATH_EFFECTS_IN_CONFLICTS FALSE
Property USER_EFFECTS_ON_STD_FILES 6.2.7 is used to control the way the user uses stdout, stdin and stderr. The default case (FALSE) means that the user does not modify these global variables. When set to TRUE, they are considered as user variables, and dereferencing them through calls to stdio functions leads to less precise effects.
USER_EFFECTS_ON_STD_FILES FALSE
Property MEMORY_EFFECTS_ONLY 6.2.7 is used to restrict the action kind of an effect action to store. In other words, variable declarations and type declarations are not considered to alter the execution state when this property is set to TRUE. This is fine for Fortran code because variables cannot be declared among executable statements and because new types cannot be declared. But this leads to wrong results for C code when loop distribution or use-def elimination is performed.
Currently, PIPS does not have the capability to store default values depending on the source code language. The default value is TRUE to avoid disturbing too many phases of PIPS at the same time while environment and type declaration effects are introduced.
MEMORY_EFFECTS_ONLY TRUE
Some programs do measure execution times. All code placed between measurement points must not be moved out, as can happen when loops are distributed or, more generally, instructions are rescheduled. Since loops using time effects are not parallel, a clock variable is always updated when a time-related function is called. This is sufficient to avoid most problems, but not all of them because time effects of all other executed statements are kept implicit, i.e. the real time clock is not updated: and loops can still be distributed. If time measurements are key, this property must be turned on. By default, it is turned off.
TIME_EFFECTS_USED FALSE
6.3 Reductions
The proper reductions are computed from a code.
< PROGRAM.entities
< MODULE.code
< MODULE.proper_references
< CALLEES.summary_effects
< CALLEES.summary_reductions
The cumulated reductions propagate the reductions in the code, upwards.
< PROGRAM.entities
< MODULE.code
< MODULE.proper_references
< MODULE.cumulated_effects
< MODULE.proper_reductions
This pass summarizes the reduction candidates found in a module for export to its callers. The summary effects should be used to restrict attention to variables of interest in the translation?
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_reductions
< MODULE.summary_effects
Some possible (simple) transformations could be added to the code to mark reductions in loops, for later use in the parallelization.
The following is NOT implemented. Anyway, should the cumulated_reductions be simply used by the prettyprinter instead?
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_reductions
6.3.1 Reduction Propagation
tries to transform
into
< PROGRAM.entities
< MODULE.code
< MODULE.proper_reductions
< MODULE.dg
6.3.2 Reduction Detection
tries to transform
which hides a reduction on b into
when possible
< PROGRAM.entities
< MODULE.code
< MODULE.dg
6.4 Chains (Use-Def Chains)
Use-def and def-use chains are a standard data structure in optimizing compilers [?]. These chains are used as a first approximation of the dependence graph. Chains based on convex array regions (see Section 6.11) are more effective for interprocedural parallelization.
If chains based on convex array regions have been selected, the simplest dependence test must be used because regions carry more information than any kind of preconditions. Preconditions and loop bound information already are included in the region predicate.
6.4.1 Menu for Use-Def Chains
alias atomic_chains ’Standard’
alias region_chains ’Regions’
alias in_out_regions_chains ’In-Out Regions’
6.4.2 Standard Use-Def Chains (a.k.a. Atomic Chains)
The algorithm used to compute use-def chains is original because it is based on PIPS hierarchical control flow graph and not on a unique control flow graph.
This algorithm generates nonexistent dependencies on loop indices. These dependence arcs appear between DO loop headers and implicit DO loops in IO statements, or between one DO loop header and unrelated DO loop bound expressions using that index variable. It is easy to spot the problem because loop indices are not privatized. A prettyprint option,
PRETTYPRINT_ALL_PRIVATE_VARIABLES 9.2.21.5.1
must be set to true to see if the loop index is privatized or not. The problem disappears when some loop indices are renamed.
The problem is due to the internal representation of DO loops: PIPS has no way to distinguish between initialization effects and increment effects. They have to be merged as proper loop effects. To reduce the problem, proper effects of DO loops do not include the index read effect due to the loop incrementation.
Artificial arcs are added to... (Pierre Jouvelot, help!).
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
6.4.3 READ/WRITE Region-Based Chains
Such chains are required for effective interprocedural parallelization. The dependence graph is annotated with proper regions, to avoid inaccuracy due to summarization at simple statement level (see Section 6.11).
Region-based chains are only compatible with the Rice Fast Dependence Graph option (see Section 6.5.1) which has been extended to deal with them5 . Other dependence tests do not use region descriptors (their convex system), because they cannot improve the Rice Fast Dependence test based on regions.
< PROGRAM.entities
< MODULE.code
< MODULE.proper_regions
6.4.4 IN/OUT Region-Based Chains
Beware: this option is for experimental use only; resulting parallel code may not be equivalent to input code (see the explanations below).
When in_out_regions_chains 6.4.4 is selected, IN and OUT regions (see Sections 6.11.5 and 6.11.8) are used at call sites instead of READ and WRITE regions. For all other statements, usual READ and WRITE regions are used.
As a consequence, arrays and scalars which could be declared as local in callees, but are exposed to callers because they are statically allocated or are formal parameters, are ignored, increasing the opportunities to detect parallel loops. But as the program transformation which consists in privatizing variables in modules is not yet implemented in PIPS, the code resulting from the parallelization with in_out_regions_chains 6.4.4 may not be equivalent to the original sequential code. The privatization here is non-standard: for instance, variables declared in commons or static should be stack allocated to avoid conflicts.
As for region-based chains (see Section 6.4.3), the simplest dependence test should be selected for best results.
< PROGRAM.entities
< MODULE.code
< MODULE.proper_regions
< MODULE.in_regions
< MODULE.out_regions
The following loop in Subroutine inout cannot be parallelized legally because Subroutine foo uses a static variable, y. However, PIPS will display this loop as (potentially) parallel if the in_out option is selected for use-def chain computation. Remember that IN/OUT regions require MUST regions to obtain interesting results (see Section 6.11.5).
subroutine inout(a,n)
real a(n)
do i = 1, n
call foo(a(i))
enddo
end
subroutine foo(x)
save y
y = x
x = x + y
end
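Why this loop is not legally parallel can be sketched by simulating two iterations that share the static variable y. This is only an illustrative model: the function run and its schedule format are hypothetical, each call to foo is split into its two assignments, and an interleaved schedule stands for one possible parallel execution.

```python
def run(a, schedule, private_y=False):
    """Execute the two steps of foo for each iteration in the given order.

    schedule: list of (iteration, step) pairs; step 0 is 'y = x' and
    step 1 is 'x = x + y', with x bound to a[iteration].
    By default y is shared by all calls, like the SAVE variable of foo;
    with private_y=True each iteration gets its own copy.
    """
    a = list(a)
    y = {}
    for i, step in schedule:
        key = i if private_y else "shared"
        if step == 0:
            y[key] = a[i]
        else:
            a[i] = a[i] + y[key]
    return a

a = [1.0, 10.0]
sequential = [(0, 0), (0, 1), (1, 0), (1, 1)]
interleaved = [(0, 0), (1, 0), (0, 1), (1, 1)]  # one possible parallel execution

expected = run(a, sequential)                   # every element is doubled
broken = run(a, interleaved)                    # a[0] picks up the other call's y
fixed = run(a, interleaved, private_y=True)     # same result as sequential
```

With a shared y the interleaved schedule changes the result, which is the conflict hidden by the IN/OUT regions. Once y is privatized the two schedules agree, which is the transformation the text says is not yet implemented in PIPS.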
6.4.5 Chain Properties
6.4.5.1 Add use-use Chains
It is possible to put use-use dependence arcs in the dependence graph. This is useful for estimation of cache memory traffic and of communication for distributed memory machines (e.g. you can parallelize only communication-free loops). Beware of use-use dependence on scalar variables. You might expect scalars to be broadcast and/or replicated on each processor but they are not handled that way by the parallelization process unless you manage to have them declared private with respect to all enclosing loops.
This feature is not supported by PIPS user interfaces. Results may be hard to interpret. It is useful to print the dependence graph.
KEEP_READ_READ_DEPENDENCE FALSE
6.4.5.2 Remove Some Chains
It is possible to mask effects on local variables in loop bodies. This is dangerous with the current version of Allen & Kennedy which assumes that all the edges are present, the ones on private variables being partially ignored but for loop distribution. In other words, this property should always be set to false.
CHAINS_MASK_EFFECTS FALSE
It also is possible to keep only true data-flow (Def – Use) dependences in the dependence graph. This was an attempt at mimicking the effect of direct dependence analysis and at avoiding privatization. However, direct dependence analysis is not implemented in the standard tests and spurious def-use dependence arcs are taken into account.
CHAINS_DATAFLOW_DEPENDENCE_ONLY FALSE
These last two properties are not consistent with PIPS current development (1995/96). It is assumed that all dependence arcs are present in the dependence graph. Phases using the latter should be able to filter out irrelevant arcs, e.g. pertaining to privatized variables.
6.5 Dependence Graph (DG)
The dependence graph is used primarily by the parallelization algorithms. A dependence graph is a refinement of use-def chains (Section 6.4). It is location-based and not value-based.
There are several ways to compute a dependence graph. Some of them are fast (Banerjee’s one for instance) but provide poor results, others might be slower (Rémi Triolet’s one for instance) but produce better results.
Three different dependence tests are available, all based on Fourier-Motzkin elimination improved with a heuristic for the integer domain. The fast version uses subscript expressions only (unless convex array regions were used to compute use-def chains, in which case regions are used instead). The full version uses subscript expressions and loop bounds. The semantics version uses subscript expressions and preconditions (see 6.8).
Note that, for interprocedural parallelization, precise array regions only are used by the fast dependence test if the proper kind of use-def chains has been previously selected (see Section 6.4.3).
There are several kinds of dependence graphs. Most of them share the same overall data structure: a graph with labels on arcs and vertices. Usually, the main differences are in the labels that decorate arcs; for instance, Kennedy’s algorithm requires dependence levels (which loop actually creates the dependence) while algorithms originated from CSRD prefer DDVs (relations between loop indices when the dependence occurs). Dependence cones introduced in [?, ?, ?, ?] are even more precise [?].
The computations of dependence level and dependence cone [?] are both implemented in PIPS. DDV’s are not computed. Currently, only dependence levels are exploited by parallelization algorithms.
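The relation between dependence levels and dependence direction vectors (DDVs) can be sketched for the simple case where a constant iteration distance vector is known. The conventions below are the usual textbook ones and are assumptions here, not a description of the PIPS data structures: the distance is the sink iteration minus the source iteration, loops are numbered from 1 starting with the outermost one, and a loop-independent dependence is reported as None.

```python
def direction_vector(distance):
    """Map each component of an iteration distance vector to '<', '=' or '>'."""
    return tuple("<" if d > 0 else "=" if d == 0 else ">" for d in distance)

def dependence_level(distance):
    """1-based depth of the outermost loop carrying the dependence.

    Returns None for a loop-independent dependence (all components zero).
    """
    for depth, d in enumerate(distance, start=1):
        if d != 0:
            return depth
    return None

# a(i,j) = a(i,j-1)   : distance (0, 1), carried by the inner loop
# a(i,j) = a(i-1,j+1) : distance (1, -1), carried by the outer loop
inner = (0, 1)
outer = (1, -1)
```

The level keeps only the position of the first non-zero component, which is why it is less precise than a direction vector, itself less precise than a dependence cone.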
The dependence graph can be printed with or without filters (see Section 9.8). The standard dependence graph includes all arcs taken into account by the parallelization process (Allen & Kennedy [?]), except those that are due to scalar private variables and that impact the distribution process only. The loop carried dependence graph does not include intra-iteration dependences and is a good basis for iteration scheduling. The whole graph includes all arcs, but input dependence arcs.
It is possible to gather some statistics about dependences by turning on property RICEDG_PROVIDE_STATISTICS 6.5.6.2 (more details in the properties). A Shell script from PIPS utilities, print-dg-statistics, can be used in combination to extract the most relevant information for a whole program.
During the parallelization phases, it is possible to ignore arcs related to states of the libc, such as the heap memory management, because thread-safe libraries do perform the updates within critical sections. But these arcs are part of the use-def chains and of the dependence graph. If they were removed instead of being ignored, use-def elimination would remove all free statements.
The main contributors for the design and development of dependence analysis are Rémi Triolet, François Irigoin and Yi-qing Yang [?]. The code was improved by Corinne Ancourt and Béatrice Creusillet.
6.5.1 Menu for Dependence Tests
alias rice_fast_dependence_graph ’Preconditions Ignored’
alias rice_full_dependence_graph ’Loop Bounds Used’
alias rice_semantics_dependence_graph ’Preconditions Used’
alias rice_regions_dependence_graph ’Regions Used’
6.5.2 Fast Dependence Test
Use subscript expressions only, unless convex array regions were used to compute use-def chains, in which case regions are used instead. rice_regions_dependence_graph is a synonym for this rule, but emits a warning if region_chains is not selected.
< PROGRAM.entities
< MODULE.code
< MODULE.chains
< MODULE.cumulated_effects
6.5.3 Full Dependence Test
Use subscript expressions and loop bounds.
< PROGRAM.entities
< MODULE.code
< MODULE.chains
< MODULE.cumulated_effects
6.5.4 Semantics Dependence Test
Uses subscript expressions and preconditions (see 6.8).
< PROGRAM.entities
< MODULE.code
< MODULE.chains
< MODULE.preconditions
< MODULE.cumulated_effects
6.5.5 Dependence Test with Convex Array Regions
Synonym for rice_fast_dependence_graph, except that it emits a warning when region_chains is not selected.
< PROGRAM.entities
< MODULE.code
< MODULE.chains
< MODULE.cumulated_effects
6.5.6 Dependence Properties (Ricedg)
6.5.6.1 Dependence Test Selection
This property seems to be now obsolete. The dependence test choice is now controlled directly and only by rules in pipsmake. The procedures called by these rules may use this property. Anyway, it is useless to set it manually.
DEPENDENCE_TEST "full"
6.5.6.2 Statistics
Provide the following counts during the dependence test. There are three parts: numbers of dependencies and independences (fields 1-10), dimensions of referenced arrays and dependence natures (fields 11-25) and the same information for constant dependencies (fields 26-40), decomposition of the dependence test in elementary steps (fields 41-49), use and complexity of Fourier-Motzkin’s pair-wise elimination (fields 50, 51 and 52-68).
1. array reference pairs, i.e. number of tests performed (used to be the number of use-def, def-use and def-def pairs on arrays);
2. number of independences found (on array reference pairs);
Note: field 1 minus field 2 is the number of array dependences.
3. number of loop-independent dependences between references in the same statement (not useful for program transformation and parallelization if statements are preserved); it should be subtracted from field 2 to compare results with other parallelizers;
4. number of constant dependences;
5. number of exact dependences;
Note: field 5 must be greater than or equal to field 4.
6. number of inexact dependences due only to the elimination of equations;
7. number of inexact dependences due only to the F-M elimination;
8. number of inexact dependences due to both the elimination of equations and the F-M elimination;
Note: the sum of fields 5 to 8 and field 2 equals field 1.
9. number of dependences among scalar variables;
10. number of dependences among loop index variables;
11-40. dependence types detail table with the dimensions [5][3] and constant dependence detail table with the dimensions [5][3]; the first index is the array dimension (from 0 to 4, no larger array has ever been found); the second index is the dependence nature (1: d-u, 2: u-d, 3: d-d); both arrays are flattened according to the C rule as 5 sequences of 3 natures;
Note: the sum of fields 11 to 25 should be equal to the sum of field 9 and 2 minus field 1.
Note: the fields 26 to 40 must be less than or equal to the corresponding fields 11 to 25.
41. number of independences found by the constant test;
42. number of independences found by the GCD test;
43. number of independences found by the normalize test;
44. number of independences found by the lexico-positive test for constant Di variables;
45. number of independences found during the projection on Di variables by the elimination of equations;
46. number of independences found during the projection on Di variables by Fourier-Motzkin’s elimination;
47. number of independences found during the feasibility test of the Di sub-system by the elimination of equations;
48. number of independences found during the feasibility test of the Di sub-system by Fourier-Motzkin’s elimination;
49. number of independences found by the lexico-positive test for the Di sub-system;
Note: the sum of fields 41 to 49 equals field 2.
50. total number of Fourier-Motzkin’s pair-wise eliminations used;
51. number of Fourier-Motzkin’s pair-wise eliminations in which the system size does not increase after the elimination;
52-68. complexity counter table of dimension [17]. The complexity of one projection by F-M is the product of the number of positive inequalities and the number of negative inequalities that contain the eliminated variable. This is a histogram of the products. Products which are less than or equal to 4 imply that the total number of inequalities does not increase. So if no larger product exists, fields 50 and 51 must be equal.
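The remark on products of at most 4 can be checked with a few lines of arithmetic. The Python sketch below is only an illustration and is not PIPS code; the function name fm_elimination_cost is hypothetical. It assumes that one elimination removes the inequalities containing the variable and adds one new inequality per (positive, negative) pair.

```python
def fm_elimination_cost(n_positive, n_negative):
    """Cost of eliminating one variable by Fourier-Motzkin.

    n_positive / n_negative: numbers of inequalities in which the eliminated
    variable has a positive / negative coefficient.
    Returns (product, size_change).
    """
    product = n_positive * n_negative                   # inequalities created
    size_change = product - (n_positive + n_negative)   # net growth of the system
    return product, size_change


# Every split with a product of at most 4 leaves the system no larger.
small = [(p, n) for p in range(1, 5) for n in range(1, 5) if p * n <= 4]
assert all(fm_elimination_cost(p, n)[1] <= 0 for p, n in small)

print(fm_elimination_cost(2, 2))   # (4, 0)
print(fm_elimination_cost(3, 3))   # (9, 3)
```

With 3 positive and 3 negative inequalities the product is 9 and the system grows by 3 inequalities, which is the kind of elimination counted in the upper part of the histogram.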
The results are stored in the current workspace in MODULE.resulttestfast, MODULE.resulttestfull, or MODULE.resulttestseman according to the test selected.
RICEDG_PROVIDE_STATISTICS FALSE
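The consistency notes given with the field list can be checked mechanically. The following Python sketch is an illustration under stated assumptions, not part of PIPS: it assumes the 68 counters have already been read into a list f in which f[k] holds field k (f[0] is unused), and the function name check_statistics is hypothetical. The format of the result files is not described here, so no parsing is attempted.

```python
def check_statistics(f):
    """Return the list of violated consistency notes.

    f: sequence of 69 integers; f[0] is unused so that f[k] is field k.
    """
    problems = []
    if f[5] < f[4]:
        problems.append("field 5 is smaller than field 4")
    if sum(f[5:9]) + f[2] != f[1]:
        problems.append("fields 5 to 8 plus field 2 differ from field 1")
    if any(c > a for a, c in zip(f[11:26], f[26:41])):
        problems.append("a field in 26-40 exceeds its counterpart in 11-25")
    if sum(f[41:50]) != f[2]:
        problems.append("fields 41 to 49 do not sum to field 2")
    return problems


# Hypothetical counters: 10 tests, 4 independences, 6 exact dependences.
f = [0] * 69
f[1], f[2], f[4], f[5], f[41] = 10, 4, 2, 6, 4
print(check_statistics(f))   # []
```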
Provide the statistics above and count all array reference pairs, including those involved in call statements.
RICEDG_STATISTICS_ALL_ARRAYS FALSE
6.5.6.3 Algorithmic Dependences
Only take into account true flow dependences (def-use) during the computation of the strongly connected components (SCC). Note that this is different from the CHAINS_DATAFLOW_DEPENDENCE_ONLY option, which does not compute the whole graph. Warning: this option potentially yields incorrect parallel code.
RICE_DATAFLOW_DEPENDENCE_ONLY FALSE
6.5.6.4 Printout
Here are the properties used to control the printing of dependence graphs in a file called module_name.dg. These properties should not be used explicitly because they are set implicitly by the different print-out procedures available in pipsmake.rc. However, not all combinations are available from pipsmake.rc.
PRINT_DEPENDENCE_GRAPH FALSE
To print the dependence graph without the dependences on privatized variables:
PRINT_DEPENDENCE_GRAPH_WITHOUT_PRIVATIZED_DEPS FALSE
To print the dependence graph without the non-loop-carried dependences:
PRINT_DEPENDENCE_GRAPH_WITHOUT_NOLOOPCARRIED_DEPS FALSE
To print the dependence graph with the dependence cones:
PRINT_DEPENDENCE_GRAPH_WITH_DEPENDENCE_CONES FALSE
To print the dependence graph in a computer friendly format defined by Deborah Whitfield (SRU):
PRINT_DEPENDENCE_GRAPH_USING_SRU_FORMAT FALSE
6.5.6.5 Optimization
The default option is to compute the dependence graph only for loops which can be parallelized using the Allen & Kennedy algorithm. However, it is possible to compute the dependences in all cases, even for loops containing tests, gotos, etc., by setting this option to TRUE.
Of course, this information is not used by the parallelization phase, which is restricted to loops meeting the A&K conditions. Note also that the hierarchical control flow graph is not exploited by the parallelization phase either.
COMPUTE_ALL_DEPENDENCES FALSE
6.6 Flinter
Function flinter 6.6 performs some intra- and interprocedural checks on formal/actual argument pairs, use of COMMONs, etc. It was developed by Laurent Aniort and Fabien Coelho. Ronan Keryell added the uninitialized variable checking.
flinter > MODULE.flinted_file
< PROGRAM.entities
< MODULE.code
< CALLEES.code
< MODULE.proper_effects
< MODULE.chains
In the past, flinter 6.6 used to require MODULE.summary_effects to check the parameter passing modes and to make sure that no module would attempt an assignment on an expression. However, this kind of bug is detected by the effect analysis… which was required by flinter.
Resource CALLEES.code is not explicitly required but it produces the global symbols which function flinter 6.6 needs to check parameter lists.
6.7 Loop statistics
Compute statistics about the loops of a module: the number of perfectly and imperfectly nested loops and their depths, as well as the number of loop nests which can be handled by our algorithm.
< PROGRAM.entities
< MODULE.code
6.8 Semantics Analysis
PIPS semantics analysis targets mostly integer scalar variables. It is a two-pass process, with a bottom-up pass computing transformers 6.8.1, and a top-down pass propagating preconditions 6.8.5. Transformers and preconditions are especially powerful cases of return and jump functions [?]. They abstract relations between program states with polyhedra and encompass most standard interprocedural constant propagations as well as most interval analyses. It is a powerful relational symbolic analysis.
Unlike [?], their computations are based on the PIPS Hierarchical Control Flow Graph and on syntactic constructs instead of a standard flow graph. The best presentation of this part of PIPS is in [?].
A similar analysis is available in Parafrase-2 []. It handles polynomial equations between scalar integer variables. SUIF [] also performs some kind of semantics analysis.
The semantics analysis part of PIPS was designed and developed by François Irigoin.
6.8.1 Transformers
RK: The following is hard to read without any example for someone that knows nothing about PIPS... FI: do you want to have everything in this documentation?
A transformer is an approximate relation between the symbolic initial values of scalar variables and their values after the execution of a statement, simple or compound (see [?] and [?]). In abstract interpretation terminology, a transformer is an abstract command linking the input abstract state of a statement and its output abstract state.
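To make the definition concrete, here is a minimal Python sketch which is not PIPS code. It represents a transformer as a predicate over a pair of states (values before, values after) and builds the transformer of a sequence by composition. The names states, t_incr, t_double and compose, as well as the small value domain, are hypothetical. The relations used here are exact, whereas PIPS transformers are polyhedral approximations.

```python
from itertools import product

DOMAIN = range(-10, 11)   # small value range, large enough for the demo


def states():
    """Enumerate every state {i, j} over the demo domain."""
    for i, j in product(DOMAIN, DOMAIN):
        yield {"i": i, "j": j}


def t_incr(pre, post):
    """Relation for `i = i + 1`: new i is old i plus one, j is unchanged."""
    return post["i"] == pre["i"] + 1 and post["j"] == pre["j"]


def t_double(pre, post):
    """Relation for `j = 2 * i`: new j is twice i, i is unchanged."""
    return post["j"] == 2 * pre["i"] and post["i"] == pre["i"]


def compose(t1, t2):
    """Relation for the sequence `s1; s2`, via an intermediate state."""
    def t(pre, post):
        return any(t1(pre, mid) and t2(mid, post) for mid in states())
    return t


t_seq = compose(t_incr, t_double)
print(t_seq({"i": 1, "j": 0}, {"i": 2, "j": 4}))   # True
print(t_seq({"i": 1, "j": 0}, {"i": 2, "j": 5}))   # False
```

The transformer of the compound statement relates the initial values to the final ones only: the intermediate state is quantified away, which is what a projection does on a polyhedron.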
By default, only integer scalar variables are analyzed, but properties can be set to handle boolean, string, floating point and complex scalar variables:
SEMANTICS_ANALYZE_SCALAR_INTEGER_VARIABLES 6.8.11.1
SEMANTICS_ANALYZE_SCALAR_BOOLEAN_VARIABLES 6.8.11.1
SEMANTICS_ANALYZE_SCALAR_STRING_VARIABLES 6.8.11.1
SEMANTICS_ANALYZE_SCALAR_FLOAT_VARIABLES 6.8.11.1
SEMANTICS_ANALYZE_SCALAR_COMPLEX_VARIABLES 6.8.11.1
Transformers can be computed intraprocedurally by looking at each function independently or they can be computed interprocedurally starting with the leaves of the call tree.
Intraprocedural algorithms use cumulated_effects 6.2.3 to handle procedure calls correctly. In some respects, they are interprocedural since call statements are accepted. Interprocedural algorithms use the summary_transformer 6.8.2 of the called procedures.
Fast algorithms use a very primitive non-iterative transitive closure algorithm (two possible versions: flow sensitive or flow insensitive). Full algorithms use a transitive closure algorithm based on vector subspaces (i.e. à la Karr [?]) or one based on discrete derivatives [?, ?]. The iterative fix point algorithm for transformers (i.e. Halbwachs/Cousot [?]) is implemented but not used because the results of the transitive closure algorithms are obtained faster and have been sufficient so far. Property SEMANTICS_FIX_POINT_OPERATOR 6.8.11.6 is set to select the transitive closure algorithm used.
Additional information, such as array declarations and array references, can be used to improve transformers. See the property documentation for:
SEMANTICS_TRUST_ARRAY_DECLARATIONS 6.8.11.2
SEMANTICS_TRUST_ARRAY_REFERENCES 6.8.11.2
Within one procedure, the transformers can be computed in forward mode, using precondition information gathered along the way. Transformers can also be recomputed once the preconditions are available. In both cases, more precise transformers are obtained because the statement can be better modeled using precondition information. For instance, a non-linear expression can turn out to be linear because the values of some variables are numerically known and can be used to simplify the initial expression. See properties:
SEMANTICS_RECOMPUTE_EXPRESSION_TRANSFORMERS 6.8.11.4
SEMANTICS_COMPUTE_TRANSFORMERS_IN_CONTEXT 6.8.11.4
SEMANTICS_RECOMPUTE_FIX_POINTS_WITH_PRECONDITIONS 6.8.11.6
and phase refine_transformers 6.8.1.6.
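The effect of precondition information on a non-linear expression can be sketched as follows. This Python fragment is an illustration only and does not reflect the PIPS implementation; the function name product_as_affine and the representation of an affine form as a (coefficients, constant) pair are assumptions made for the example.

```python
def product_as_affine(x, y, known):
    """Try to express the product x*y as an affine form.

    known maps variable names to the constant values given by a precondition.
    Returns (coefficients, constant), or None if the product stays non-linear.
    """
    if x in known and y in known:
        return {}, known[x] * known[y]
    if x in known:
        return {y: known[x]}, 0
    if y in known:
        return {x: known[y]}, 0
    return None   # no constant operand is known


print(product_as_affine("n", "m", {}))          # None
print(product_as_affine("n", "m", {"n": 4}))    # ({'m': 4}, 0)
```

Without a precondition, the assignment k = n * m cannot be described by an affine equation. With the precondition n == 4, it becomes k == 4 * m.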
Unstructured control flow graphs can lead to very long transformer computations, whose results are usually not interesting. Their sizes are limited by two properties:
SEMANTICS_MAX_CFG_SIZE2 6.8.11.3
SEMANTICS_MAX_CFG_SIZE1 6.8.11.3
discussed below.
Default values were set in the early nineties to obtain results fast enough for live demonstrations. They have not been changed, in order to preserve the non-regression tests. However, since 2005, processors have been fast enough to use the most precise options in all cases.
A transformer map contains a transformer for each statement of a module. It is a mapping from statements to transformers (type statement_mapping, which is not a NewGen file). Transformer maps are stored on and retrieved from disk by pipsdbm.
6.8.1.1 Menu for Transformers
alias transformers_intra_fast 'Quick Intra-Procedural Computation'
alias transformers_inter_fast 'Quick Inter-Procedural Computation'
alias transformers_intra_full 'Full Intra-Procedural Computation'
alias transformers_inter_full 'Full Inter-Procedural Computation'
alias refine_transformers 'Refine Transformers'
6.8.1.2 Fast Intraprocedural Transformers
Build the fast intraprocedural transformers.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.summary_effects
< MODULE.proper_effects
6.8.1.3 Full Intraprocedural Transformers
Build the improved intraprocedural transformers.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.summary_effects
< MODULE.proper_effects
6.8.1.4 Fast Interprocedural Transformers
Build the fast interprocedural transformers.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.summary_effects
< CALLEES.summary_transformer
< MODULE.proper_effects
< PROGRAM.program_precondition
6.8.1.5 Full Interprocedural Transformers
Build the improved interprocedural transformers. This should be used as the default option.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.summary_effects
< CALLEES.summary_transformer
< MODULE.proper_effects
< PROGRAM.program_precondition
6.8.1.6 Refine Full Interprocedural Transformers
Rebuild the interprocedural transformers using interprocedural preconditions. Intraprocedural preconditions are also used to refine all transformers.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.summary_effects
< CALLEES.summary_transformer
< MODULE.proper_effects
< MODULE.transformers
< MODULE.preconditions
< MODULE.summary_precondition
< PROGRAM.program_precondition
6.8.2 Summary Transformer
A summary transformer is an interprocedural version of the module statement transformer, obtained by eliminating dynamic local, a.k.a. stack allocated, variables. The filtering is based on the module summary effects. Note: each module has a UNIQUE top-level statement.
A summary_transformer 6.8.2 is of NewGen type transformer.
< PROGRAM.entities
< MODULE.transformers
< MODULE.summary_effects
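The elimination of stack-allocated variables amounts to a projection. The Python sketch below illustrates it on a finite relation; it is not PIPS code, which projects polyhedra, and the function name summarize and the sample module body are hypothetical.

```python
def summarize(relation, visible):
    """Keep only caller-visible variables in every (pre, post) pair."""
    def keep(state):
        return tuple(sorted((v, x) for v, x in state.items() if v in visible))
    return {(keep(pre), keep(post)) for pre, post in relation}


# Body `tmp = g; g = tmp + 1` with a global g and a stack-allocated tmp.
body = [({"g": g, "tmp": t}, {"g": g + 1, "tmp": g})
        for g in range(3) for t in range(3)]
summary = summarize(body, visible={"g"})
print(sorted(summary))
```

The nine pairs of the statement-level relation collapse into three pairs which only state that g is incremented by one, which is all a caller can observe.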
6.8.3 Initial Precondition
All DATA initializations contribute to the global initial state of the program. The contribution of each module is computed independently. Note that statically initialized variables behave as static variables and are preserved between calls according to the Fortran standard. The module initial states are abstracted by an initial precondition based on integer scalar variables only.
Note: To be extended to handle C code. To be extended to handle properly unknown modules.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.summary_effects
All initial preconditions, including the initial precondition for the main, are combined to define the program precondition which is an abstraction of the program initial state.
< PROGRAM.entities
< ALL.initial_precondition
The program precondition can only be used for the initial state of the main procedure. Although it appears below for all interprocedural analyses and is always computed, it is only used when a main procedure is available.
6.8.4 Intraprocedural Summary Precondition
A summary precondition is of type "transformer", but the argument list must be empty as it is a simple predicate on the initial state. So in fact it is a state predicate.
The intraprocedural summary precondition uses DATA statements for the main module and is the TRUE constant for all other modules.
< PROGRAM.entities
< MODULE.initial_precondition
Interprocedural summary preconditions can be requested instead. They are not described in the same section in order to introduce the summary precondition resource at the right place in pipsmake.rc.
No menu is declared to select either intra- or interprocedural summary preconditions.
6.8.5 Preconditions
A precondition for a statement s in a module m is a predicate true for every state reachable from the initial state of m, in which s is executed. A precondition is of NewGen type "transformer" (see PIPS Internal Representation of Fortran and C code) and preconditions is of type statement_mapping.
Option preconditions_intra 6.8.5.2 associates a precondition to each statement, assuming that no information is available at the module entry point.
Inter-procedural preconditions may be computed with intra-procedural transformers but the benefit is not clear. Intra-procedural preconditions may be computed with inter-procedural transformers. This is faster than a full interprocedural analysis because there is no need for a top-down propagation of summary preconditions. This is compatible with code transformations like partial_eval 8.4.2, simplify_control 8.3.1 and dead_code_elimination 8.3.2.
Since these two options for transformer and precondition computations are independent, transformers_inter_full 6.8.1.5 and preconditions_inter_full 6.8.5.4 must both be (independently) selected to obtain the best possible results. These two options are recommended.
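The relation between transformers and preconditions can be sketched on finite sets of states. In the Python fragment below, which is an illustration and not PIPS code, a statement is modeled by a function returning the set of possible successor values of a single variable i; the function name propagate is hypothetical.

```python
def propagate(initial_states, statements):
    """Return one precondition per statement, plus the final postcondition.

    initial_states: set of possible values of i at the module entry.
    statements: list of functions mapping a value of i to a set of new values.
    """
    preconditions = []
    current = set(initial_states)
    for step in statements:
        preconditions.append(current)
        current = {post for pre in current for post in step(pre)}
    return preconditions, current


# Module body: `i = i + 1; i = 2 * i`, entered with i in {0, 1}.
pre, post = propagate({0, 1}, [lambda i: {i + 1}, lambda i: {2 * i}])
print(pre)    # [{0, 1}, {1, 2}]
print(post)   # {2, 4}
```

Each precondition is the image of the previous one by the transformer of the preceding statement, which is why the transformers must be available before the top-down pass starts.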
6.8.5.1 Menu for Preconditions
alias preconditions_intra 'Intra-Procedural Analysis'
alias preconditions_inter_fast 'Quick Inter-Procedural Analysis'
alias preconditions_inter_full 'Full Inter-Procedural Analysis'
alias preconditions_intra_fast 'Fast intra-Procedural Analysis'
6.8.5.2 Intra-Procedural Preconditions
Only build the preconditions in a module without any interprocedural propagation. The fast version uses a fast but crude approximation of preconditions for unstructured code.
< PROGRAM.entities
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.summary_effects
< MODULE.summary_transformer
< MODULE.summary_precondition
< MODULE.code
< PROGRAM.entities
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.summary_effects
< MODULE.summary_transformer
< MODULE.summary_precondition
< MODULE.code
6.8.5.3 Fast Inter-Procedural Preconditions
Option preconditions_inter_fast 6.8.5.3 uses the module’s own precondition, derived from its callers, as initial state value and propagates it downwards in the module statement.
The fast versions use no fix-point operations for loops.
preconditions_inter_fast > MODULE.preconditions
< PROGRAM.entities
< PROGRAM.program_precondition
< MODULE.code
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.summary_precondition
< MODULE.summary_effects
< CALLEES.summary_effects
< MODULE.summary_transformer
6.8.5.4 Full Inter-Procedural Preconditions
Option preconditions_inter_full 6.8.5.4 uses the module’s own precondition, derived from its callers, as initial state value and propagates it downwards in the module statement.
The full versions use fix-point operations for loops.
< PROGRAM.entities
< PROGRAM.program_precondition
< MODULE.code
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.summary_precondition
< MODULE.summary_effects
< CALLEES.summary_transformer
< MODULE.summary_transformer
6.8.6 Interprocedural Summary Precondition
By default, summary preconditions are computed intraprocedurally. The interprocedural option must be explicitly activated.
An interprocedural summary precondition for a module is derived from all its call sites. Of course, preconditions must be known for all its callers’ statements. The summary precondition is the convex hull of all call site preconditions, translated into a proper environment which is not necessarily the module’s frame. Because of invisible global and static variables and aliasing, it is difficult for a caller to know which variables might be used by the callee to represent a given memory location. To avoid the problem, the current summary precondition is always translated into the caller’s frame. So each module must first translate its summary precondition, when receiving it from the resource manager (pipsdbm), before using it.
Note: the previous algorithm was based on an on-the-fly reduction by convex hull. Each time a call site was encountered while computing a module’s preconditions, the callee’s summary precondition was updated. This old scheme was more efficient but not compatible with program transformations because it was impossible to know when the summary preconditions of the modules had to be reset to the infeasible (a.k.a. empty) precondition.
An infeasible precondition means that the module is never called although a main is present in the workspace. If no main module is available, a TRUE precondition is generated. Note that, in both cases, the impact of static initializations propagated by link edition is taken into account although this is prohibited by the Fortran Standard which requires a BLOCKDATA construct for such initializations. In other words, a module which is never called has an impact on the program execution and its declarations should not be destroyed.
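The reduction over call sites can be illustrated with a single integer variable, for which the convex hull of two preconditions is simply the smallest enclosing interval. The Python sketch below is not PIPS code, which works on polyhedra over several variables; the names hull and summary_precondition are hypothetical, and None is used here to stand for the infeasible precondition.

```python
from functools import reduce


def hull(a, b):
    """Convex hull of two intervals; None is the infeasible precondition."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def summary_precondition(call_site_intervals):
    """Combine the preconditions of all call sites of a module."""
    return reduce(hull, call_site_intervals, None)


# The module is called with 1 <= n <= 3 at one site and 10 <= n <= 12 at another.
print(summary_precondition([(1, 3), (10, 12)]))   # (1, 12)
print(summary_precondition([]))                   # None
```

A module without any call site keeps the infeasible precondition, and the hull of the two call sites also admits the values 4 to 9, which shows the loss of precision inherent to convexity.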
< PROGRAM.entities
< PROGRAM.program_precondition
< CALLERS.preconditions
< MODULE.callers
The following rule is obsolete. It is context sensitive and its results depend on the history of commands performed on the workspace.
< PROGRAM.entities
< CALLERS.preconditions
< MODULE.callers
6.8.7 Total Preconditions
Total preconditions are interesting to optimize the nominal behavior of a terminating application. It is assumed that the application ends in the main procedure. All other exits, aborts or stops, explicit or implicit such as buffer overflows, divisions by zero and null pointer dereferences, are considered exceptions. This also applies at the module level: modules nominally return, and other control flows are considered exceptions. Non-terminating modules have an empty total precondition. The standard preconditions can be refined by intersecting them with the total preconditions to get information about the nominal behavior. Similar sources of increased accuracy are the array declarations and the array references, which can be exploited directly with properties described in section 6.8.11.2. These two properties should be set to true whenever possible.
Hence, a total precondition for a statement s in a module m is a predicate true for every state from which the final state of m, in which s is executed, is reached. It is an over-approximation of the theoretical total precondition. So, if the predicate is false, the final control state cannot be reached. A total precondition is of NewGen type "transformer" (see PIPS Internal Representation of Fortran and C code) and total_preconditions is of type statement_mapping.
The relationship with continuations (see Section 6.9) is not clear. Total preconditions should be more general but no must version exists.
Option total_preconditions_intra 6.8.7.2 associates a precondition to each statement, assuming that no information is available at the module return point.
Inter-procedural total preconditions may be computed with intra-procedural transformers but the benefit is not clear. Intra-procedural total preconditions may be computed with inter-procedural transformers. This is faster than a full interprocedural analysis because there is no need for a top-down propagation of summary total postconditions.
Since these two options for transformer and total precondition computations are independent, transformers_inter_full 6.8.1.5 and total_preconditions_inter 6.8.7.3 must both be (independently) selected to obtain the best possible results.
Status: This is a set of experimental passes. The intraprocedural part is implemented. The interprocedural part is not implemented yet, waiting for an expressed practical interest. Neither C for loops nor repeat loops are supported.
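The backward nature of total preconditions, and their use to refine standard preconditions, can be sketched on finite sets of states. The Python fragment below is an illustration only, not the PIPS implementation; the names backward and guarded_decrement are hypothetical, and a statement is modeled by a function returning the set of successor values of a variable i, an aborting execution having no successor.

```python
def backward(step, post_states, candidates):
    """States among candidates from which step reaches a state in post_states."""
    return {s for s in candidates if step(s) & post_states}


def guarded_decrement(i):
    """Model of `if (i <= 0) abort(); i = i - 1;`."""
    return set() if i <= 0 else {i - 1}


candidates = set(range(-3, 4))
any_final_state = set(range(-10, 11))        # postcondition "true"
total_pre = backward(guarded_decrement, any_final_state, candidates)
print(sorted(total_pre))                     # [1, 2, 3]

# A standard precondition with no information on i, refined by intersection.
precondition = set(range(-3, 4))
print(sorted(precondition & total_pre))      # [1, 2, 3]
```

The refined precondition only describes the nominal executions, i.e. those which reach the end of the module.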
6.8.7.1 Menu for Total Preconditions
alias total_preconditions_intra 'Total Intra-Procedural Analysis'
alias total_preconditions_inter 'Total Inter-Procedural Analysis'
6.8.7.2 Intra-Procedural Total Preconditions
Only build the total preconditions in a module without any interprocedural propagation. No specific condition must be met when reaching a RETURN statement.
< PROGRAM.entities
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.preconditions
< MODULE.summary_effects
< MODULE.summary_transformer
< MODULE.code
6.8.7.3 Inter-Procedural Total Preconditions
Option total_preconditions_inter 6.8.7.3 uses the module’s own total postcondition, derived from its callers, as final state value and propagates it backwards in the module statement. This total module postcondition must be true when the RETURN statement is reached.
total_preconditions_inter > MODULE.total_preconditions
< PROGRAM.entities
< PROGRAM.program_postcondition
< MODULE.code
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.preconditions
< MODULE.summary_total_postcondition
< MODULE.summary_effects
< CALLEES.summary_effects
< MODULE.summary_transformer
The program postcondition is only used for the main module.
6.8.8 Summary Total Precondition
The summary total precondition of a module is the total precondition of its statement limited to information observable by callers, just like a summary transformer (see Section 6.8.2).
A summary total precondition is of type "transformer".
< PROGRAM.entities
< CALLERS.total_preconditions
6.8.9 Summary Total Postcondition
A final postcondition for a module is derived from all its call sites. Of course, total postconditions must be known for all its callers’ statements. The summary total postcondition is the convex hull of all call site total postconditions, translated into a proper environment which is not necessarily the module’s frame. Because of invisible global and static variables and aliasing, it is difficult for a caller to know which variables might be used by the callee to represent a given memory location. To avoid the problem, the current summary total postcondition is always translated into the caller’s frame. So each module must first translate its summary total postcondition, when receiving it from the resource manager (pipsdbm), before using it.
A summary total postcondition is of type "transformer".
< PROGRAM.entities
< CALLERS.total_preconditions
< MODULE.callers
6.8.10 Final Postcondition
The program postcondition cannot be derived from the source code. It should be defined explicitly by the user. By default, the predicate is always true. But you might want some variables to have specific values, e.g. KMAX==1, signs, e.g. KMAX>1, or relationships, e.g. KMAX>JMAX.
6.8.11 Semantic Analysis Properties
6.8.11.1 Value types
By default, the semantic analysis is restricted to scalar integer variables as they are key variables to understand scientific code behavior. However it is possible to analyze scalar variables with other data types. Fortran LOGICAL variables are represented as 0/1 integers. Character string constants and floating point constants are represented as undefined values.
The analysis is thus limited to constant propagation for character strings and floating point values whereas integer and boolean variables are processed with a relational analysis.
Character string constants of fixed maximal length could be translated into integers but the benefit is not yet assessed because they are not much used in the benchmark and commercial applications we have studied. The risk is to increase significantly the number of overflows encountered during the analysis.
SEMANTICS_ANALYZE_SCALAR_INTEGER_VARIABLES TRUE
SEMANTICS_ANALYZE_SCALAR_BOOLEAN_VARIABLES FALSE
SEMANTICS_ANALYZE_SCALAR_STRING_VARIABLES FALSE
SEMANTICS_ANALYZE_SCALAR_FLOAT_VARIABLES FALSE
SEMANTICS_ANALYZE_SCALAR_COMPLEX_VARIABLES FALSE
6.8.11.2 Array declarations and accesses
For every module, array declarations are assumed to be correct with respect to the standard: the upper bound must be greater than or equal to the lower bound. When implicit, the lower bound is one. The star upper bound is neglected.
This property is turned off by default because it might slow down PIPS quite a lot without adding any useful information because loop bounds are usually different from array bounds.
SEMANTICS_TRUST_ARRAY_DECLARATIONS FALSE
For every module, array references are assumed to be correct with respect to the declarations: the subscript expressions must have values lower than or equal to the upper bound and greater than or equal to the lower bound.
This property is turned off by default because it might slow down PIPS quite a lot without adding any useful information.
SEMANTICS_TRUST_ARRAY_REFERENCES FALSE
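The constraints which these two properties make available can be listed for a small example. The Python sketch below only builds the constraints as strings; it is not PIPS code, and the function names, the representation of bounds as (lower, upper) pairs and the use of "*" for a star upper bound are assumptions made for the illustration.

```python
def declaration_constraints(bounds):
    """Constraints implied by a declaration: lower <= upper per dimension.

    bounds: list of (lower, upper) pairs; a "*" upper bound is ignored.
    """
    return [f"{lo} <= {up}" for lo, up in bounds if up != "*"]


def reference_constraints(bounds, subscripts):
    """Constraints implied by a reference: lower <= subscript <= upper."""
    constraints = []
    for (lo, up), sub in zip(bounds, subscripts):
        constraints.append(f"{lo} <= {sub}")
        if up != "*":
            constraints.append(f"{sub} <= {up}")
    return constraints


# Declaration A(N, 0:*) and reference A(I, J); the implicit lower bound is 1.
bounds = [(1, "N"), (0, "*")]
print(declaration_constraints(bounds))             # ['1 <= N']
print(reference_constraints(bounds, ["I", "J"]))   # ['1 <= I', 'I <= N', '0 <= J']
```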
6.8.11.3 Flow Sensitivity
Perform “meet” operations for semantics analysis. This property is managed by pipsmake which often sets it to TRUE. See comments in pipsmake documentation to turn off convex hull operations for a module or more if they last too long.
SEMANTICS_FLOW_SENSITIVE FALSE
Complex control flow graphs may require excessive computation resources. This may happen when analyzing a parser for instance.
SEMANTICS_ANALYZE_UNSTRUCTURED TRUE
To reduce execution time, this property is complemented with a heuristic which turns off the analysis of very complex unstructured code.
If the control flow graph counts more than SEMANTICS_MAX_CFG_SIZE2 6.8.11.3 vertices, use effects only.
SEMANTICS_MAX_CFG_SIZE2 20
If the control flow graph counts more than SEMANTICS_MAX_CFG_SIZE1 6.8.11.3 but less than SEMANTICS_MAX_CFG_SIZE2 6.8.11.3 vertices, perform the convex hull of its elementary transformers and take the fixpoint of it. Note that SEMANTICS_MAX_CFG_SIZE2 6.8.11.3 is assumed to be greater than or equal to SEMANTICS_MAX_CFG_SIZE1 6.8.11.3.
SEMANTICS_MAX_CFG_SIZE1 20
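The role of the two thresholds can be summarized as a three-way choice. The Python sketch below is one possible reading of the description above and is not taken from the PIPS sources: the function name, the strategy labels and the behavior for small graphs, called "regular_analysis" here, are assumptions.

```python
def unstructured_strategy(n_vertices, size1=20, size2=20):
    """Select how an unstructured control flow graph is analyzed.

    size1 and size2 stand for SEMANTICS_MAX_CFG_SIZE1 and
    SEMANTICS_MAX_CFG_SIZE2, with size1 <= size2.
    """
    if n_vertices > size2:
        return "effects_only"
    if n_vertices > size1:
        return "hull_then_fix_point"
    return "regular_analysis"


print(unstructured_strategy(12))                       # regular_analysis
print(unstructured_strategy(30))                       # effects_only
print(unstructured_strategy(30, size1=20, size2=40))   # hull_then_fix_point
```

With the default values, both thresholds are equal, so the intermediate strategy is never selected.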
6.8.11.4 Context for statement and expression transformers
Without preconditions, transformers can be precise only for affine expressions. Approximate transformers can sometimes be derived for other expressions, involving for instance products of variables or divisions.
However, a precondition of an expression can be used to refine the approximation. For instance, some non-linear expressions can become affine because some of the variables have constant values, and some non-linear expressions can be better approximated because the variables signs or ranges are known.
To be backward compatible and to be conservative for PIPS execution time, the default value is false.
Not implemented yet.
SEMANTICS_RECOMPUTE_EXPRESSION_TRANSFORMERS FALSE
Intraprocedural preconditions can be computed at the same time as transformers and used to improve the accuracy of expression and statement transformers. Non-linear expressions can sometimes have linear approximations over the subset of all possible stores defined by a precondition. In the same way, the number of convex hulls can be reduced if a test branch is never used or if a loop is always entered.
SEMANTICS_COMPUTE_TRANSFORMERS_IN_CONTEXT FALSE
The default value is false for backward compatibility and for speed.
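The gain obtained when a test branch is never used can be shown with intervals. The Python sketch below is an illustration, not PIPS code; the function name branch_hull and its parameters are hypothetical. It computes the range of i after the test `if (n > 0) i = then_value; else i = else_value;`, with or without the precondition on n.

```python
def branch_hull(n_range, then_value, else_value, use_precondition):
    """Interval of i after the test, as the hull of the feasible branches."""
    lo, hi = n_range
    values = []
    if not use_precondition or hi > 0:     # the true branch may be taken
        values.append(then_value)
    if not use_precondition or lo <= 0:    # the false branch may be taken
        values.append(else_value)
    return min(values), max(values)


# Precondition: 5 <= n <= 10, so the false branch is never executed.
print(branch_hull((5, 10), 1, -1, use_precondition=False))   # (-1, 1)
print(branch_hull((5, 10), 1, -1, use_precondition=True))    # (1, 1)
```

Out of context, the convex hull of both branches also admits i == 0. In context, no hull is needed and the exact value is preserved.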
6.8.11.5 Interprocedural Semantics Analysis
To be refined later. Basically, use the callees’ transformers instead of the callees’ effects when computing transformers bottom-up in the call graph. When going top-down with preconditions, should we care about unique call sites and/or perform a meet operation on call site preconditions?
SEMANTICS_INTERPROCEDURAL FALSE
This property is used internally and is not user selectable.
6.8.11.6 Fix Point Operators
CPU time and memory space are cheap enough to compute loop fix points for transformers. This property implies SEMANTICS_FLOW_SENSITIVE 6.8.11.3 and is not user-selectable.
SEMANTICS_FIX_POINT FALSE
The default fix point operator, called transfer, is good for induction variables but it is not good for all kinds of code. The default fix point operator is based on the transition function associated to a loop body. A computation of eigenvectors for eigenvalue 1 is used to detect loop invariants. This fails when no transition function but only a transition relation is available. Only equations can be found.
The second fix point operator, called pattern, is based on a pattern matching of elementary equations and inequalities of the loop body transformer. Obvious invariants are detected. This fix point operator is not better than the previous one for induction variables but it can detect invariant equations and inequalities.
A third fix point operator, called derivative, is based on finite differences. It was developed to handle DO loops desugared into WHILE loops as well as standard DO loops. The loop body transformer on variable values is projected onto their finite differences. Invariants, both equations and inequalities, are deduced directly from the constraints on the differences and after integration. This third fix point operator should be able to find at least as many invariants as the two previous ones, but at least some inequalities are missed because of the technique used. For instance, constraints on a flip-flop variable can be missed. Unlike the Cousot-Halbwachs fix point (see below), it does not use Chernikova steps and it should not slow down analyses.
This property is user selectable and its default value is derivative. The default value is the only one which is now seriously maintained.
SEMANTICS_FIX_POINT_OPERATOR "derivative"
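The idea behind the derivative operator can be illustrated on the simplest case, a loop body with constant increments. The Python sketch below is not the PIPS algorithm, which works on polyhedral constraints over the differences; the function names are hypothetical. If x grows by d_x and y by d_y at each iteration, integrating the differences gives the invariant d_y * (x - x0) == d_x * (y - y0), whatever the number of iterations.

```python
def difference_invariant(d_x, d_y):
    """Invariant deduced from constant per-iteration increments d_x and d_y."""
    def holds(x0, y0, x, y):
        return d_y * (x - x0) == d_x * (y - y0)
    return holds


def run_loop(x0, y0, d_x, d_y, n):
    """Execute n iterations of the body `x = x + d_x; y = y + d_y`."""
    x, y = x0, y0
    for _ in range(n):
        x, y = x + d_x, y + d_y
    return x, y


# Loop body `i = i + 2; j = j + 1`, started with i == 3 and j == 7.
holds = difference_invariant(2, 1)
print(all(holds(3, 7, *run_loop(3, 7, 2, 1, n)) for n in range(6)))   # True
```

Here the invariant reads i - 3 == 2 * (j - 7), an equation linking the two induction variables without any iteration counter.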
The next property is experimental and its default value is 1. It is used to unroll while loops virtually, i.e. at the semantics equation level, to cope with periodic behaviors such as flip-flops. It is effective only for standard while loops and the only possible value other than 1 is 2.
SEMANTICS_K_FIX_POINT 1
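The benefit of a virtual unrolling by 2 can be seen on a flip-flop variable. The Python sketch below is only an illustration with hypothetical function names and is not PIPS code: the loop body x = 1 - x changes x at every iteration, but the body composed with itself is the identity, so x is invariant when iterations are taken two at a time.

```python
def flip(x):
    """Loop body of a flip-flop: x = 1 - x."""
    return 1 - x


def unrolled(body, k):
    """Body obtained by composing k copies of the original body."""
    def composed(x):
        for _ in range(k):
            x = body(x)
        return x
    return composed


values = range(-5, 6)
print(any(flip(x) != x for x in values))                # True
print(all(unrolled(flip, 2)(x) == x for x in values))   # True
```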
The next property SEMANTICS_PATTERN_MATCHING_FIX_POINT has been removed and replaced by option pattern of the previous property.
This property was defined to select one of Cousot-Halbwachs’s heuristics and to compute fix points with inequalities and equalities for loops. These heuristics could be used to compute fix points for transformers and/or preconditions. This option implies SEMANTICS_FIX_POINT 6.8.11.6 and SEMANTICS_FLOW_SENSITIVE 6.8.11.3. It has not been implemented yet in PIPS because its accuracy has not yet been required, and it is now badly named because there is no direct link between inequality and Halbwachs. Its default value is false and it is not user selectable.
SEMANTICS_INEQUALITY_INVARIANT FALSE
Because of convexity, some fix points may be improved by using some of the information carried by the preconditions. Hence, it may be profitable to recompute loop fix point transformer when preconditions are being computed.
The default value is false because this option slows down PIPS and does not seem to add much useful information in general.
SEMANTICS_RECOMPUTE_FIX_POINTS_WITH_PRECONDITIONS FALSE
The next property is used to refine the computation of preconditions inside nested loops. The loop body is reanalyzed to get one transformer for each control path and the identity transformer is left aside because it is useless to compute the loop body precondition. This development is experimental and turned off by default.
SEMANTICS_USE_TRANSFORMER_LISTS FALSE
The next property is only useful if the previous one is set to true. Instead of computing the fix point of the convex hull of the transformer list, it computes the convex hull of the derivative constraints. Since it is a new feature, it is set to false by default, but it should become the default option because it should always be more accurate, at least indirectly because the systems are smaller. The number of overflows is reduced, as well as the execution time. In practice, these improvements have not been measured. This development is experimental and turned off by default.
SEMANTICS_USE_DERIVATIVE_LIST FALSE
The next property is only useful if Property SEMANTICS_USE_TRANSFORMER_LISTS 6.8.11.6 is set to true. Instead of computing the precondition derived from the transitive closure of a transformer list, semantics also computes the preconditions associated with different projections of the transformer list and uses the intersection of these preconditions as loop precondition. Although it is a new feature, it is set to true by default for the validation’s sake. See test case Semantics/maisonneuve09.c: it improves the accuracy, but not as much as SEMANTICS_USE_DERIVATIVE_LIST 6.8.11.6. This development is experimental and turned off by default.
SEMANTICS_USE_LIST_PROJECTION TRUE
The string Property SEMANTICS_LIST_FIX_POINT_OPERATOR 6.8.11.6 is used to select a particular heuristic to compute an approximation of the transitive closure of a list of transformers. It is only useful if Property SEMANTICS_USE_TRANSFORMER_LISTS 6.8.11.6 is selected. The current default value is “depth_two”. An experimental value is “max_depth”.
SEMANTICS_LIST_FIX_POINT_OPERATOR "depth_two"
Preconditions can (used to) preserve initial values of the formal parameters. This is not often useful in C because programmers usually avoid modifying scalar parameters, especially integer ones. However, old values create problems in region computation because preconditions seem to be used instead of transformer ranges. Filtering out the initial value does reduce the precision of the precondition analysis, but this does not impact the transformer analysis. Since the advantage is really limited to preconditions and for the region’s sake, the default value is set to true. Turn it to false if you have a doubt about the preconditions really available.
The loop index is usually dead on loop exit. So keeping information about its value is useless... most of the time. However, it is preserved by default.
SEMANTICS_KEEP_DO_LOOP_EXIT_CONDITION TRUE
SEMANTICS_FILTER_INITIAL_VALUES TRUE
6.8.11.7 Normalization level
Normalizing transformer and preconditions systems is a delicate issue which is not mathematically defined, and as such is highly empirical. It’s a tradeoff between eliminating redundant information, keeping an internal storage not too far from the prettyprinting for non-regression testing, exposing useful information for subsequent analyses,... all this at a reasonable cost.
Several levels of normalization are possible. These levels do not correspond to graduations on a normalization scale, but are different normalization heuristics. A level of 4 includes a preliminary lexicographic sort of constraints, which is very user friendly, but currently implies string manipulations which are quite costly. It has been recently chosen to perform this normalization only before storing transformers and preconditions to the database (SEMANTICS_NORMALIZATION_LEVEL_BEFORE_STORAGE with a default value of 4). However, this can still have a serious impact on performance. With any other value, the normalization level is equal to 2.
SEMANTICS_NORMALIZATION_LEVEL_BEFORE_STORAGE 4
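The benefit of a lexicographic sort for non-regression testing can be sketched as follows. This is a simplified illustration under the assumption that constraints are plain strings, which is not how PIPS stores linear systems: two systems that hold the same constraints in a different order, possibly with duplicates, print identically once sorted.

```python
# Hypothetical sketch: constraints are plain strings here, whereas PIPS
# uses its own internal data structures for linear systems.

def normalize(constraints):
    """Return constraints without duplicates, in lexicographic order."""
    return sorted(set(constraints))

first = ["J<=N", "I==2*K", "0<=I", "J<=N"]
second = ["0<=I", "J<=N", "I==2*K"]
print(normalize(first))
```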
6.8.11.8 Prettyprint
Preconditions reflect by default all knowledge gathered about the current state (i.e. store). However, it is possible to restrict the information to variables actually read or written, directly or indirectly, by the statement following the precondition.
SEMANTICS_FILTERED_PRECONDITIONS FALSE
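A rough picture of this filtering, written as a Python sketch under simplifying assumptions: a precondition is modeled as a list of constraints tagged with the variables they mention, and a constraint is kept when it mentions at least one variable referenced by the statement. The data layout and the function name are hypothetical, and the actual analysis works on polyhedra and effects rather than on strings.

```python
# Hypothetical sketch: a precondition is a list of (constraint_text, variables)
# pairs; PIPS itself works on polyhedra and effects, not on strings.

def filter_precondition(precondition, referenced):
    """Keep the constraints that mention at least one referenced variable."""
    return [text for text, variables in precondition if variables & referenced]

precondition = [
    ("I==J+1", {"I", "J"}),
    ("N==100", {"N"}),
    ("0<=K", {"K"}),
]
# Variables read or written by the hypothetical statement "J = I + N".
print(filter_precondition(precondition, {"I", "J", "N"}))
```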
6.8.11.9 Debugging
Output semantics results on stdout
SEMANTICS_STDOUT FALSE
Debug level for semantics used to be controlled by a property. A Shell variable, SEMANTICS_DEBUG_LEVEL, is used instead.
6.9 Continuation conditions
Continuation conditions are attached to each statement. They represent the conditions under which the program will not stop in this statement. Under- and over-approximations of these conditions are computed.
> MODULE.may_continuation
> MODULE.must_summary_continuation
> MODULE.may_summary_continuation
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.transformers
< CALLEES.must_summary_continuation
< CALLEES.may_summary_continuation
6.10 Complexities
Complexities are symbolic approximations of the execution times of statements. They are computed interprocedurally and based on polynomial approximations of execution times. Non-polynomial execution times are represented by unknown variables which are not free with respect to the program variables. Thus non-polynomial expressions are equivalent to polynomial expressions over a larger set of variables.
Probabilities for tests should also result in unknown variables (still to be implemented). See [?].
A summary_complexity is the approximation of a module execution time. It is translated and used at call sites.
Complexity estimation could be refined (i.e. the number of unknown variables reduced) by using transformers to combine elementary complexities using local states, rather than preconditions to combine elementary complexities relatively to the module initial state. The same options exist for region computation. The initial version [?] used the initial state for combinations. The new version [?] delays evaluation of variable values as long as possible but does not really use local states.
The first version of the complexity estimator was designed and developed by Pierre Berthomier. It was restricted to intra-procedural analysis. This first version was enlarged and validated on real code for SPARC-2 machines by Lei Zhou [?]. Since then, it has been modified slightly by François Irigoin. For simple programs, complexity estimations are strongly correlated with execution times. The estimations can be used to see if program transformations are beneficial.
Known bugs: tests and while loops are not correctly handled because a fixed probability of 0.5 is systematically assumed.
6.10.1 Menu for Complexities
alias uniform_complexities ’Uniform’
alias fp_complexities ’FLOPs’
alias any_complexities ’Any’
6.10.2 Uniform Complexities
Complexity estimation is based on a set of basic operations and fixed execution times for these basic operations. The choice of the set is critical but fixed. Experiments by Lei Zhou showed that it should be enlarged. However, the basic times, which also are critical, are tabulated. New sets of tables can easily be developed for new processors.
Uniform complexity tables contain a unit execution time for all basic operations. They nevertheless give interesting estimations for SPARC SS-10, especially for -O2/-O3 optimized code.
< PROGRAM.entities
< MODULE.code MODULE.preconditions
< CALLEES.summary_complexity
6.10.3 Summary Complexity
Local variables are eliminated from the complexity associated to the top statement of a module in order to obtain the module’s summary complexity.
< PROGRAM.entities
< MODULE.code MODULE.complexities
6.10.4 Floating Point Complexities
Tables for floating point complexity estimation are set to 0 for non-floating point operations, and to 1 for all floating point operations, including intrinsics like SIN.
< PROGRAM.entities
< MODULE.code MODULE.preconditions
< CALLEES.summary_complexity
This enables the default specification within the properties to be considered.
< PROGRAM.entities
< MODULE.code MODULE.preconditions
< CALLEES.summary_complexity
6.10.5 Complexity properties
The following properties control the static estimation of dynamic code execution time.
6.10.5.1 Debugging
Trace the walk across a module’s internal representation:
COMPLEXITY_TRACE_CALLS FALSE
Trace all intermediate complexities:
COMPLEXITY_INTERMEDIATES FALSE
Print the complete cost table at the beginning of the execution:
COMPLEXITY_PRINT_COST_TABLE FALSE
The cost table(s) contain machine and compiler dependent information about basic execution times, e.g. time for a load or a store.
6.10.5.2 Fine Tuning
It is possible to specify a list of variables which must remain literally in the complexity formula, although their numerical values are known (this is OK) or although they have multiple unknown and unrelated values during any execution (this leads to an incorrect result).
Formal parameters and imported global variables are left unevaluated.
They have relatively high priority (FI: I do not understand this comment by Lei).
This list should be empty by default (but is not for unknown historical reasons):
COMPLEXITY_PARAMETERS "IMAX LOOP"
Controls the printing of accuracy statistics:
- 0: do not prettyprint any statistics with complexities (to give the user a false sense of accuracy and/or to avoid cluttering his/her display); this is the default value;
- 1: prettyprint statistics only for loop/block/test/unstr. statements and not for basic statements, since they should not cause accuracy problems;
- 2: prettyprint statistics for all statements
COMPLEXITY_PRINT_STATISTICS 0
6.10.5.3 Target Machine and Compiler Selection
This property is used to select a set of basic execution times. These times depend on the target machine, the compiler and the compilation options used. It is shown in [?] that fixed basic times can be used to obtain accurate execution times, if enough basic times are considered, and if the target machine has a simple RISC processor. For instance, it is not possible to use only one time for a register load. It is necessary to take into account the nature of the variable, i.e. formal parameter, dynamic variable, global variable, and the nature of the access, e.g. the dimension of an accessed array. The cache can be ignored and replaced by an average hit ratio.
Different sets of elementary cost tables are available:
- all_1: each basic operation cost is 1;
- fp_1: only floating point operations are taken into account and have cost unit 1; all other operations have a null cost.
In the future, we might add a sparc-2 table...
The different elementary table names are defined in complexity-local.h. They presently are operation, memory, index, transcend and trigo.
The different tables required are to be found in $PIPS_LIBDIR/complexity/xyz, where xyz is specified by this property:
COMPLEXITY_COST_TABLE "all_1"
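The difference between the all_1 and fp_1 tables can be sketched in Python on a toy loop. The operation names, the table encoding and the helper functions below are assumptions made for illustration; they do not reflect the file format found under $PIPS_LIBDIR/complexity.

```python
# Hypothetical sketch of uniform ("all_1") and floating point ("fp_1") costs.
# Operation names and the table layout are assumptions, not the PIPS file format.

FLOATING_POINT_OPS = {"fadd", "fmul", "sin"}

def cost(operation, table):
    """Cost of one basic operation under the named elementary cost table."""
    if table == "all_1":
        return 1
    if table == "fp_1":
        return 1 if operation in FLOATING_POINT_OPS else 0
    raise ValueError("unknown cost table: " + table)

def loop_cost(body_ops, trip_count, table):
    """Cost of a loop whose body executes body_ops trip_count times."""
    return trip_count * sum(cost(op, table) for op in body_ops)

# Body of a hypothetical loop: a(i) = b(i) * c + sin(d(i)), plus index update.
body_ops = ["load", "load", "fmul", "load", "sin", "fadd", "store", "iadd"]
print(loop_cost(body_ops, 10, "all_1"), loop_cost(body_ops, 10, "fp_1"))
```

With these assumed operations, the uniform table counts every basic operation while the floating point table counts only the multiplication, the addition and the intrinsic call.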
6.10.5.4 Evaluation Strategy
For the moment, we have designed two ways to solve the complexity combination problem. Since symbolic complexity formulae use program variables it is necessary to specify in which store they are evaluated. If two complexity formulae are computed relatively to two different stores, they cannot be directly added.
The first approach, which is implemented, uses the module initial store as universal store for all formulae (but possibly for the complexity of elementary statements). In some way, symbolic variables are evaluated as early as possible, as soon as it is known that they won’t make it in the module summary complexity.
This first method is easy to implement when the preconditions are available but it has at least two drawbacks:
- if a variable is used in different places with the same unknown value, each occurrence will be replaced by a different unknown value symbol (the infamous UU_xx symbols in formulae).
- since variables are replaced by numerical values as early as possible, the user is shown a numerical execution time instead of a symbolic formula which would likely be more useful (see property COMPLEXITY_PARAMETERS 6.10.5.2). This is especially true with interprocedural constant propagation.
The second approach, which is not implemented, delays variable evaluation as late as possible. Complexities are computed and given relatively to the stores used by each statement. Two elementary complexities are combined together using the earliest store. The two stores are related by a transformer (see Section 6.8.11). Such an approach is used to compute MUST regions as precisely as possible (see Section 6.11.9).
A simplified version of the late evaluation was implemented. The initial store of the procedure is the only reference store used as with the early evaluation, but variables are not evaluated right away. They only are evaluated when it is necessary to do so. This is not an ideal solution, but it is easy to implement and reduces considerably the number of unknown values which have to be put in the formulae to have correct results.
COMPLEXITY_EARLY_EVALUATION FALSE
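The first drawback of early evaluation can be sketched in Python. The sketch assumes that complexities are linear forms stored as dictionaries and that unknown values are named UU_1, UU_2, and so on; the representation and the function names are hypothetical. Two loops over the same unknown bound N receive two unrelated unknown symbols under early evaluation, whereas keeping N symbolic lets the two terms merge.

```python
# Hypothetical sketch: complexities are linear forms {symbol: coefficient},
# with the key "1" holding the constant term. This is not PIPS code.
from itertools import count

def add(f, g):
    """Add two linear forms."""
    result = dict(f)
    for symbol, coefficient in g.items():
        result[symbol] = result.get(symbol, 0) + coefficient
    return result

def evaluate_early(form, known, fresh):
    """Replace each variable at once: by its value if known, else by a new UU_xx."""
    result = {}
    for symbol, coefficient in form.items():
        if symbol == "1":
            result = add(result, {"1": coefficient})
        elif symbol in known:
            result = add(result, {"1": coefficient * known[symbol]})
        else:
            result = add(result, {"UU_%d" % next(fresh): coefficient})
    return result

# Two loops over the same unknown bound N, each with a body of cost 3.
loop1, loop2 = {"N": 3}, {"N": 3}

fresh = count(1)
early = add(evaluate_early(loop1, {}, fresh), evaluate_early(loop2, {}, fresh))
late = add(loop1, loop2)  # variables kept symbolic until the end
print(early, late)
```

When the value of N is known, the same early evaluation yields a purely numerical cost, which matches the second drawback listed above.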
6.11 Convex Array Regions
Convex array regions are functions mapping a memory store onto a convex set of array elements. They are used to represent the memory effects of modules or statements. Hence, they are expressed with respect to the initial store of the module or to the store immediately preceding the execution of the statement they are associated with. The latter is now standard in PIPS. Comprehensive information about convex array regions and their associated algorithms is available in Creusillet’s PhD Dissertation [?].
Apart from the array name and its dimension descriptors (or ϕ variables), an array region contains three additional pieces of information:
- The type of the region: READ (R) or WRITE (W) to represent the effects of statements and procedures; IN and OUT to represent the flow of array elements.
- The approximation of the region: EXACT when the region exactly represents the requested set of array elements, or MAY or MUST if it is an over- or under-approximation (MUST ⊆ EXACT ⊆ MAY).
Unfortunately, for historical reasons, MUST is still used in the implementation instead of EXACT, and actual MUST regions are not computed. Moreover, the must_regions option in fact computes exact and may regions.
MAY regions are flow-insensitive regions, whereas MUST regions are flow sensitive. Any array element touched by any execution of a statement is in the MAY region of this statement. Any array element in the MUST region of a statement is accessed by any execution of this statement.
- a convex polyhedron containing equalities and inequalities: they link the ϕ variables that represent the array dimensions, to the values of the program integer scalar variables.
For instance, the convex array region:
<A(ϕ1,ϕ2)-W-EXACT-{ϕ1==I, ϕ1==ϕ2}>
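As a reading aid, this region can be interpreted as a function from a store to a set of written elements of A: those whose two subscripts are both equal to the current value of I. The Python sketch below enumerates that set; the store encoding and the array bound are assumptions introduced for the example.

```python
# Hypothetical sketch: the region <A(phi1,phi2)-W-EXACT-{phi1==I, phi1==phi2}>
# is read as a function from a store (here just the value of I) to a set
# of array elements. Array bounds are an assumption of this example.

def region(store, bound):
    """Elements (phi1, phi2) of A, within 1..bound, satisfying the constraints."""
    i = store["I"]
    return {
        (phi1, phi2)
        for phi1 in range(1, bound + 1)
        for phi2 in range(1, bound + 1)
        if phi1 == i and phi1 == phi2
    }

print(region({"I": 3}, 5))
```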
Internally, convex array regions are of type effect and as such can be used to build use-def chains (see Section 6.4.3). Regions chains are built using proper regions which are particular READ and WRITE regions. For simple statements (assignments, calls to intrinsic functions), summarization is avoided to preserve accuracy. At this inner level of the program control flow graph, the extra amount of memory necessary to store regions without computing their convex hull should not be too high compared to the expected gain for dependence analysis. For tests and loops, proper regions contain the regions associated to the condition or the range. And for external calls, proper regions are the summary regions of the callee translated into the caller’s name space, to which are merely appended the regions of the expressions passed as argument (no summarization for this step).
Together with READ/WRITE regions and IN regions are computed their invariant versions for loop bodies (MODULE.inv_regions and MODULE.inv_in_regions). For a given loop body, they are equal to the corresponding regions in which all variables that may be modified by the loop body (except the current loop index) are eliminated from the descriptors (convex polyhedron). For other statements, they are equal to the empty list of regions.
In the following trivial example,
for(i=0; i<N; i++)
{
// regions for loop body:
// <a[phi1]-W-EXACT-{PHI1==K,K==I}>
// invariant regions for loop body:
// <a[phi1]-W-EXACT-{PHI1==I}>
k = k+1;
a[k] = k;
}
notice that the variable k which is modified in the loop body, and which appears in the loop body region polyhedron, does not appear anymore in the invariant region polyhedron.
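The elimination step behind this example can be sketched in Python for the restricted case of equalities between two variables. This is an assumption-laden simplification: PIPS eliminates variables from general convex polyhedra, and the function below is hypothetical. Eliminating K from {PHI1==K, K==I} leaves a single equality between PHI1 and I.

```python
# Hypothetical sketch restricted to equalities between two variables;
# PIPS eliminates variables from general convex polyhedra.

def eliminate(equalities, variable):
    """Project a conjunction of 'x == y' constraints onto the other variables."""
    kept = [(x, y) for x, y in equalities if variable not in (x, y)]
    # Variables equal to the eliminated one are equal to each other.
    linked = sorted({x if y == variable else y
                     for x, y in equalities if variable in (x, y)} - {variable})
    kept += [(linked[0], other) for other in linked[1:]]
    return kept

body_region = [("PHI1", "K"), ("K", "I")]
print(eliminate(body_region, "K"))
```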
MAY READ and WRITE region analysis was first designed by Rémi Triolet [?] and then revisited by François Irigoin [?]. Alexis Platonoff [?] implemented the first version of region analysis in PIPS. These regions were computed with respect to the initial stores of the modules. François Irigoin and, mainly, Béatrice Creusillet [?, ?, ?], added new functionalities to this first version as well as functions to compute MUST regions, and IN and OUT regions.
Array regions for C programs are currently under development.
6.11.1 Menu for Convex Array Regions
alias may_regions ’MAY regions’
alias must_regions ’EXACT or MAY regions’
6.11.2 MAY READ/WRITE Convex Array Regions
This function computes the MAY pointer regions in a module.
> MODULE.pointer_regions
> MODULE.inv_pointer_regions
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.preconditions
< CALLEES.summary_pointer_regions
This function computes the MAY regions in a module.
> MODULE.regions
> MODULE.inv_regions
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.preconditions
< CALLEES.summary_regions
6.11.3 MUST READ/WRITE Convex Array Regions
This function computes the MUST regions in a module.
> MODULE.pointer_regions
> MODULE.inv_pointer_regions
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.preconditions
< CALLEES.summary_pointer_regions
This function computes the MUST pointer regions in a module using simple points_to information to disambiguate dereferencing paths.
> MODULE.pointer_regions
> MODULE.inv_pointer_regions
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.preconditions
< MODULE.points_to_list
< CALLEES.summary_pointer_regions
This function computes the MUST regions in a module.
> MODULE.regions
> MODULE.inv_regions
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.preconditions
< CALLEES.summary_regions
This function computes the MUST regions in a module.
> MODULE.regions
> MODULE.inv_regions
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.preconditions
< CALLEES.summary_regions
6.11.4 Summary READ/WRITE Convex Array Regions
Module summary regions provide an approximation of the effects the execution of a module has on its callers’ variables as well as on global and static variables of its callees.
< PROGRAM.entities
< MODULE.code
< MODULE.pointer_regions
< PROGRAM.entities
< MODULE.code
< MODULE.regions
6.11.5 IN Convex Array Regions
IN convex array regions are flow sensitive regions. They are read regions not covered (i.e. not previously written) by assignments in the local hierarchical control-flow graph. There is no way with the current pipsmake-rc and pipsmake to express the fact that IN (and OUT) regions must be calculated using must_regions 6.11.3 (a new kind of resources, must_regions 6.11.3, should be added). The user must be knowledgeable enough to select must_regions 6.11.3 first.
> MODULE.cumulated_in_regions
> MODULE.inv_in_regions
< PROGRAM.entities
< MODULE.code
< MODULE.summary_effects
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.preconditions
< MODULE.regions
< MODULE.inv_regions
< CALLEES.in_summary_regions
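The notion of read regions not covered by previous writes can be sketched on a straight-line sequence of statements. The sketch below is a simplification under stated assumptions: regions are plain Python sets of array indices, each statement is a (read set, write set) pair, and the write sets are exact. Subtracting a write set is only sound when it is not an over-approximation, which is why IN regions depend on must_regions.

```python
# Hypothetical sketch: regions are plain sets of array indices and the
# code is a straight-line sequence of (read_set, write_set) statements.

def in_region(statements):
    """Elements read before being written in the sequence."""
    needed = set()
    # Walk backwards: IN(S1; rest) = READ(S1) | (IN(rest) - WRITE(S1)).
    for read, write in reversed(statements):
        needed = read | (needed - write)
    return needed

sequence = [
    ({1, 2}, {3}),      # reads a[1], a[2]; writes a[3]
    ({3, 4}, {1}),      # reads a[3] (just written) and a[4]; writes a[1]
    ({1, 5}, set()),    # reads a[1] (just written) and a[5]
]
print(sorted(in_region(sequence)))
```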
6.11.6 IN Summary Convex Array Regions
This pass computes the IN convex array regions of a module. They contain the array elements and scalars whose values impact the output of the module.
< PROGRAM.entities
< MODULE.code
< MODULE.transformers
< MODULE.preconditions
< MODULE.in_regions
6.11.7 OUT Summary Convex Array Regions
This pass computes the OUT convex array regions of a module. They contain the array elements and scalars whose values impact the continuation of the module.
See Section 6.11.8.
< PROGRAM.entities
< CALLERS.out_regions
6.11.8 OUT Convex Array Regions
OUT convex array regions are also flow sensitive regions. They are downward exposed written regions which are also used (i.e. imported) in the continuation of the program. They are also called exported regions. Unlike READ, WRITE and IN regions, they are propagated downward in the call graph and in the hierarchical control flow graphs of the subroutines.
< PROGRAM.entities
< MODULE.code
< MODULE.transformers
< MODULE.preconditions
< MODULE.regions
< MODULE.inv_regions
< MODULE.summary_effects
< MODULE.cumulated_effects
< MODULE.cumulated_in_regions
< MODULE.inv_in_regions
< MODULE.out_summary_regions
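The downward propagation of OUT regions can be sketched on a straight-line sequence. As a simplifying assumption, regions are plain Python sets of array indices, statements are (read set, write set) pairs with exact write sets, and the OUT region of the whole sequence is given by the context. The function names are hypothetical. The OUT region of a statement is the part of its write set that is still needed by the following statements or by the continuation of the sequence.

```python
# Hypothetical sketch: regions are plain sets of array indices and the
# code is a straight-line sequence of (read_set, write_set) statements.

def needed_after(rest, out_of_sequence):
    """Elements whose value on entry to `rest` is used by `rest` or later."""
    needed = set(out_of_sequence)
    for read, write in reversed(rest):
        needed = read | (needed - write)
    return needed

def out_regions(statements, out_of_sequence):
    """OUT region of each statement, given the OUT region of the sequence."""
    return [
        write & needed_after(statements[k + 1:], out_of_sequence)
        for k, (_, write) in enumerate(statements)
    ]

sequence = [
    (set(), {1, 2, 3}),   # writes a[1..3]
    ({1}, {2}),           # reads a[1]; overwrites a[2]
    ({2}, set()),         # reads a[2]
]
print(out_regions(sequence, {3}))
```

In this example a[2] is not exported by the first statement because it is overwritten before being read.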
6.11.9 Properties for Convex Array Regions
If MUST_REGIONS is true, convex array regions are computed using the algorithm described in report E/181/CRI, called the T-1 algorithm. It provides more accurate regions and preserves MUST approximations more often. As it is more costly, its default value is FALSE. EXACT_REGIONS is true for the moment for backward compatibility only.
EXACT_REGIONS TRUE
MUST_REGIONS FALSE
The default option is to compute regions without taking into account array bounds. The next property can be turned to TRUE to systematically add them in the region descriptors. Both options have their advantages and drawbacks.
REGIONS_WITH_ARRAY_BOUNDS FALSE
Property MEMORY_IN_OUT_REGIONS_ONLY 6.11.9’s default value is set to TRUE to avoid computing IN and OUT regions on non-memory effects, even if MEMORY_EFFECTS_ONLY 6.2.7 is set to FALSE.
MEMORY_IN_OUT_REGIONS_ONLY TRUE
The current implementation of effects, simple effects as well as convex array regions, relies on a generic engine which is independent of the effect descriptor representation. The current representation for array regions, parameterized integer convex polyhedra, allows various patterns and provides the ability to exploit context information at a reasonable expense. However, some very common patterns such as nine-point stencils used in seismic computations or red-black patterns cannot be represented. It has been a long-lasting temptation to try other representations [?].
A Complementary sections (see Section 6.13) implementation was formerly begun as a set of new phases by Manjunathaiah Muniyappa, but is not maintained anymore.
And Nga Nguyen more recently created two properties to switch between regions and disjunctions of regions (she has already prepared basic operators). For the moment, they are always FALSE.
DISJUNCT_REGIONS FALSE
DISJUNCT_IN_OUT_REGIONS FALSE
Statistics may be obtained about the computation of convex array regions. When the next property (REGIONS_OP_STATISTICS) is set to TRUE, statistics are provided about operators on regions (union, intersection, projection,…). The property after it turns on the collection of statistics about the interprocedural translation.
REGIONS_OP_STATISTICS FALSE
REGIONS_TRANSLATION_STATISTICS FALSE
6.12 Alias Analysis
6.12.1 Dynamic Aliases
Dynamic aliases are pairs (formal parameter, actual parameter) of convex array regions generated at call sites. An “IN alias pair” is generated for each IN region of a called module and an “OUT alias pair” for each OUT region. For EXACT regions, the transitive, symmetric and reflexive closure of the dynamic alias relation results in the creation of equivalence classes of regions (for MAY regions, the closure is different and does not result in an equivalence relation, but nonetheless allows us to define alias classes). A set of alias classes is generated for a module, based on the IN and OUT alias pairs of all the modules below it in the callgraph. The alias classes for the whole workspace are those of the module which is at the root of the callgraph, if the callgraph has a unique root. As an intermediate phase between the creation of the IN and OUT alias pairs and the creation of the alias classes, “alias lists” are created for each module. An alias list for a module is the transitive closure of the alias pairs (IN or OUT) for a particular path through the callgraph subtree rooted in this module.
< PROGRAM.entities
< MODULE.callers
< MODULE.in_summary_regions
< CALLERS.code
< CALLERS.cumulated_effects
< CALLERS.preconditions
out_alias_pairs > MODULE.out_alias_pairs
< PROGRAM.entities
< MODULE.callers
< MODULE.out_summary_regions
< CALLERS.code
< CALLERS.cumulated_effects
< CALLERS.preconditions
alias_lists > MODULE.alias_lists
< PROGRAM.entities
< MODULE.in_alias_pairs
< MODULE.out_alias_pairs
< CALLEES.alias_lists
alias_classes > MODULE.alias_classes
< PROGRAM.entities
< MODULE.alias_lists
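For EXACT regions, building alias classes amounts to computing the equivalence classes generated by the alias pairs. The Python sketch below shows one way to do this with a union-find structure. It assumes that regions are identified by plain names and it ignores both the region descriptors and the MAY case; it is not the algorithm used by the alias_classes phase.

```python
# Hypothetical sketch: alias pairs are (formal, actual) names and alias
# classes are their equivalence classes, as for EXACT regions.

def alias_classes(pairs):
    """Group names into classes closed under reflexivity, symmetry, transitivity."""
    parent = {}

    def find(name):
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for formal, actual in pairs:
        parent[find(formal)] = find(actual)

    classes = {}
    for name in parent:
        classes.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in classes.values())

pairs = [("callee1:X", "main:A"), ("callee2:Y", "main:A"), ("callee2:Z", "main:B")]
print(alias_classes(pairs))
```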
6.12.2 Intraprocedural Summary Points to Analysis
This phase generates synthetic points-to relations for formal parameters. It creates synthetic sinks, i.e. stubs, for formal parameters and provides an initial set of points-to to the points_to_analysis ??.
Currently, it assumes that no sharing exists between the formal parameters and within the data structures pointed to by the formal parameters. Two properties should control this behavior, ALIASING_ACROSS_FORMAL_PARAMETERS 6.12.5 and ALIASING_ACROSS_TYPES 6.12.5. The first one supersedes the property ALIASING_INSIDE_DATA_STRUCTURE 6.12.5.
intraprocedural_summary_points_to_analysis > MODULE.summary_points_to_list
< PROGRAM.entities
< MODULE.code
6.12.3 Points to Analysis
This function is being implemented by Amira Mensi. The points_to_analysis ?? is implemented in order to compute points-to relations, based on Emami’s algorithm. Emami’s algorithm is a top-down analysis which computes the points-to relations by applying specific rules to each assignment pattern identified. This phase requires another resource which is intraprocedural_summary_points_to_analysis ??.
points_to_analysis > MODULE.points_to_list
< PROGRAM.entities
< MODULE.code
< MODULE.summary_points_to_list
The pointer effects are useful, but they are recomputed for each expression and subexpression by the points-to analysis.
6.12.4 Pointer Values Analyses
Pointer values analysis is another kind of pointer analysis which tries to gather pointer values in terms of both other pointer values and memory addresses. This phase is under development.
simple_pointer_values > MODULE.simple_pointer_values
< PROGRAM.entities
< MODULE.code
6.12.5 Properties for pointer analyses
The following properties are defined to ensure the safe use of points_to_analysis ??.
The property ALIASING_ACROSS_TYPES 6.12.5 specifies that two pointers of different effective types can be aliased. The default and safe value is TRUE; when it is turned to FALSE two pointers of different types are never aliased.
ALIASING_ACROSS_TYPES TRUE
The property ALIASING_ACROSS_FORMAL_PARAMETERS 6.12.5 is used to handle the aliasing between formal parameters and global variables of pointer type. When it is set to TRUE, two formal parameters or a formal one and a global pointer or two global pointers can be aliased. If it is turned to FALSE, such pointers are assumed to be unaliased for intraprocedural analysis and generally for root modules (i.e. modules without callers). The default value is FALSE. It is the only value currently implemented.
ALIASING_ACROSS_FORMAL_PARAMETERS FALSE
The next property specifies that one data structure can recursively contain two pointers pointing to the same location. If it is turned to FALSE, it is assumed that two different not included memory access paths cannot point to the same memory locations. The safe value is TRUE, but parallelization is hindered. Often, the user can guarantee that data structures do not exhibit any sharing. Optimistically, FALSE is the default value.
ALIASING_INSIDE_DATA_STRUCTURE FALSE
Property ALIASING_ACROSS_IO_STREAMS 6.12.5 can be set to FALSE to specify that two I/O streams (two variables declared as FILE *) cannot be aliased, nor can the locations to which they point. The safe and default value is TRUE.
ALIASING_ACROSS_IO_STREAMS TRUE
The following string property defines the lattice of maximal elements to use when precise information is lost. Three values are possible: ”unique”, ”function” and ”area”. The first value is the default value. A unique identifier is defined to represent any set of unknown locations. The second value defines a separate identifier for each function and compilation unit. Note that compilation units require more explanation about this definition and about the conflict detection scheme. The third value, ”area”, requires a separate identifier for each area of each function or compilation unit. These abstract location lattice values are further refined if the property ALIASING_ACROSS_TYPES 6.12.5 is set to FALSE. The abstract location API hides all these local maximal values from its callers. Note that the dereferencing of any such top abstract location returns the very top of all abstract locations.
The property ABSTRACT_HEAP_LOCATIONS 6.12.5 specifies the modeling of the heap. The possible values are ”unique”, ”insensitive”, ”flow-sensitive” and ”context-sensitive”. Each value defines a strictly refined analysis with respect to analyses defined by previous values [This may not be a good idea, since flow and context sensitivity are orthogonal].
The default value, ”unique”, implies that the heap is a unique array. It is enough to parallelize simple loops containing pointer-based references such as ”p[i]”.
In the ”insensitive” case and all other cases, one array is allocated in each function to model the heap.
In the ”flow-sensitive” case, the statement numbers of the malloc() call sites are used to subscript this array, as well as all indices of the surrounding loops [Two improvements in one property...].
In the ”context-sensitive” case, the interprocedural translations of memory access paths based on the abstract heap are prefixed by the same information regarding the call site: function containing the call site, statement number of the call site and indices of surrounding loops.
Note that the naming of options is not fully compatible with the usual notations in pointer analyses. Note also that the insensitive case is redundant with the context-sensitive case: in the latter case, a unique heap associated to malloc() would carry exactly the same amount of information [flow and context sensitivity are orthogonal].
Finally, note that abstract heap arrays are distinguished according to their types if the property ALIASING_ACROSS_TYPES 6.12.5 is set to FALSE [impact on abstract heap location API]. Else, the heap array is of type unknown. If a heap abstract location is dereferenced without any points-to information nor heap aliasing information, the safe result is the top abstract location.
ABSTRACT_HEAP_LOCATIONS "unique"
The property POINTS_TO_UNINITIALIZED_POINTER_DEREFERENCING 6.12.5 specifies how the dereferencing of an uninitialized pointer is handled. The safe value is TRUE: the analysis stops with a pips_user_error. However, the default value is FALSE: a pips_user_warning is raised and the analysis continues. The reason is dead code: there is no need to stop the analysis if the dereferencing is never executed.
POINTS_TO_UNINITIALIZED_POINTER_DEREFERENCING FALSE
The property POINTS_TO_STRICT_POINTER_TYPES 6.12.5 is used to handle pointer arithmetic. According to the C standard (section 6.5.6, item 8), the following C code:
p = &i;
p++;
is correct and p points to the same area, expressed by the points-to analysis as i[*]. The default value is FALSE, meaning that p points to an array element. When it is set to TRUE, typing becomes strict, meaning that p points to an integer and the behavior is undefined. The analysis then stops with a pips_user_error (illegal pointer arithmetic).
POINTS_TO_STRICT_POINTER_TYPES FALSE
The integer property POINTS_TO_K_LIMITING 6.12.5 specifies the maximum authorized length of an access path. Beyond this value the access path is changed into an ANYWHERE. This property is necessary while computing points-to relations inside loops and its default value is 10.
POINTS_TO_K_LIMITING 10
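As a rough illustration of k-limiting, the sketch below extends an access path step by step and collapses it to ANYWHERE once it grows past the limit. The list-based path representation and the names extend_path and walk are assumptions made for this example; they are not the PIPS data structures.

```python
ANYWHERE = "ANYWHERE"  # stands for the abstract location that may be anything


def extend_path(path, step, k=10):
    """Append one step to a hypothetical access path, collapsing past length k."""
    if path == [ANYWHERE]:
        return path  # once collapsed, the path stays collapsed
    longer = path + [step]
    return longer if len(longer) <= k else [ANYWHERE]


def walk(n_steps, k=10):
    """Model a loop that follows p->next n_steps times."""
    path = ["p"]
    for _ in range(n_steps):
        path = extend_path(path, "next", k)
    return path
```

With the default limit of 10, walk(9) still returns a precise path of ten elements, whereas walk(10) and any longer traversal return [ANYWHERE], which is what lets a fixed point be reached inside loops.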
6.12.6 Menu for Alias Views
alias print_in_alias_pairs ’In Alias Pairs’
alias print_out_alias_pairs ’Out Alias Pairs’
alias print_alias_lists ’Alias Lists’
alias print_alias_classes ’Alias Classes’
Display the dynamic alias pairs (formal region, actual region) for the IN regions of the module.
< PROGRAM.entities
< MODULE.cumulated_effects
< MODULE.in_alias_pairs
Display the dynamic alias pairs (formal region, actual region) for the OUT regions of the module.
< PROGRAM.entities
< MODULE.cumulated_effects
< MODULE.out_alias_pairs
Display the transitive closure of the dynamic aliases for the module.
< PROGRAM.entities
< MODULE.cumulated_effects
< MODULE.alias_lists
Display the dynamic alias equivalence classes for this module and those below it in the callgraph.
< PROGRAM.entities
< MODULE.cumulated_effects
< MODULE.alias_classes
6.13 Complementary Sections
Complementary sections are a new representation of array regions added to PIPS by Manjunathaiah Muniyappa. This analysis is no longer maintained.
6.13.1 READ/WRITE Complementary Sections
This function computes the complementary sections in a module.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.preconditions
< CALLEES.summary_compsec
6.13.2 Summary READ/WRITE Complementary Sections
< PROGRAM.entities
< MODULE.code
< MODULE.compsec
Chapter 7
Parallelization and Distribution
7.1 Code Parallelization
PIPS basic parallelization function, rice_all_dependence 7.1.3, produces a new version of the Module code with DOALL loops exhibited using Allen & Kennedy’s algorithm. The DOALL syntactic construct is non-standard but easy to understand and common in textbooks such as [?]. As a parallel prettyprinter option, it is possible to use Fortran 90 array syntax (see Section 9.4). For C, the loops can be output as for loops decorated with OpenMP pragmas.
Remember that Allen & Kennedy’s algorithm can only be applied on loops with simple bodies, i.e. sequences of assignments, because it performs loop distribution and loop regeneration without taking control dependencies into account. If the loop body contains tests and branches, the coarse grain parallelization algorithm should be used (see 7.1.6).
Loop index variables are privatized whenever possible, using a simple algorithm. Dependence arcs related to the index variable and stemming from the loop body must end up inside the loop body. Else, the loop index is not privatized because its final value is likely to be needed after the loop end and because no copy-out scheme is supported.
A better privatization algorithm for all scalar variables may be used as a preliminary code transformation. An array privatizer is also available (see Section 8.11.2). A non-standard PRIVATE declaration is used to specify which variables should be allocated on stack for each loop iteration. An HPF or OpenMP format can also be selected.
Objects of type parallelized_code differ from objects of type code for historic reasons, to simplify the user interface and because most algorithms cannot be applied on DOALL loops. This used to be true for precondition computation, dependence testing and so on. Right now it is possible neither to re-analyze parallel code nor to re-parse it (although it would be interesting to compute the complexity of a parallel code), but this should evolve. See § 7.1.8.
7.1.1 Parallelization properties
A few properties control the parallelization behaviour.
7.1.1.1 Properties controlling Rice parallelization
TRUE to make all possible parallel loops, FALSE to generate real (vector, innermost parallel?) code:
GENERATE_NESTED_PARALLEL_LOOPS TRUE
Show statistics on the number of loops parallelized by pips:
PARALLELIZATION_STATISTICS FALSE
To select whether parallelization and loop distribution are done again for already parallel loops:
PARALLELIZE_AGAIN_PARALLEL_CODE FALSE
The motivation is that we may want to parallelize with a coarse grain method first, and finish with a fine grain method here to try to parallelize what has not been parallelized. When applying à la Rice parallelization to some (still) sequential code, we may not want loop distribution on already parallel code, to preserve cache resources, etc.
Thread-safe libraries are protected by critical sections. Their functions can be called safely from different execution threads. For instance, a loop whose body contains calls to malloc can be parallelized. The underlying state changes do not hinder parallelization, at least if the code is not sensitive to pointer values.
PARALLELIZATION_IGNORE_THREAD_SAFE_VARIABLES FALSE
Since this property is used to mask arcs in the dependence graph, it must be exploited by each parallelization phase independently. It is not used to derive a simplified version of the use-def chains or of the dependence graph, to avoid wrong results with use-def elimination, which is based on the same graph.
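The masking idea can be pictured with a small sketch: the parallelizer filters the arcs it looks at, while the shared graph is left untouched for other phases such as use-def elimination. The tuple encoding of arcs and the variable name malloc_state are purely illustrative assumptions, not the PIPS dependence graph representation.

```python
def mask_thread_safe_arcs(arcs, thread_safe_vars):
    """Return a filtered view; the original list of arcs is not modified."""
    return [arc for arc in arcs if arc[2] not in thread_safe_vars]


# Each arc: (source statement, sink statement, variable carrying the conflict)
dependence_arcs = [
    ("S1", "S1", "malloc_state"),  # hidden state updated by a thread-safe call
    ("S1", "S2", "a"),             # ordinary data dependence
]

visible_arcs = mask_thread_safe_arcs(dependence_arcs, {"malloc_state"})
```

Here visible_arcs only contains the arc on a, so the loop-carried arc caused by the allocator state no longer prevents parallelization, while dependence_arcs still holds both arcs.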
7.1.2 Menu for Parallelization Algorithm Selection
Entries in the menu for the resource parallelized_code and for the different parallelization algorithms which may be activated or selected. Note that the nest parallelization algorithm is not debugged.
alias rice_all_dependence ’All Dependences’
alias rice_data_dependence ’True Dependences Only’
alias rice_cray ’CRAY Microtasking’
alias nest_parallelization ’Loop Nest Parallelization’
alias coarse_grain_parallelization ’Coarse Grain Parallelization’
alias internalize_parallel_code ’Consider a parallel code as a sequential one’
7.1.3 Allen & Kennedy’s Parallelization Algorithm
Use Allen & Kennedy’s algorithm and consider all dependences.
< PROGRAM.entities
< MODULE.code MODULE.dg
7.1.4 Def-Use Based Parallelization Algorithm
Several other parallelization functions for shared-memory target machines are available. Function rice_data_dependence 7.1.4 only takes into account data flow dependences, a.k.a true dependences. It is of limited interest because transitive dependences are computed. It is not equivalent at all to performing array and scalar expansion based on direct dependence computation (Brandes, Feautrier, Pugh). It is not safe when privatization is performed before parallelization.
This phase is named after the historical classification of data dependencies in output dependence, anti-dependence and true or data dependence. It should not be used for standard parallelization, but only for experimental parallelization by knowledgeable users, aware that the output code may be illegal.
< PROGRAM.entities
< MODULE.code MODULE.dg
7.1.5 Parallelization and Vectorization for Cray Multiprocessors
Function rice_cray 7.1.5 targets Cray vector multiprocessors. It selects one outermost parallel loop to use multiple processors and one innermost loop for the vector units. It uses Cray microtasking directives. Note that a prettyprinter option must also be selected independently (see Section 9.4).
< PROGRAM.entities
< MODULE.code MODULE.dg
7.1.6 Coarse Grain Parallelization
Function coarse_grain_parallelization 7.1.6 implements a loop parallelization algorithm based on convex array regions. It considers only one loop at a time, its body being abstracted by its invariant read and write regions. No loop distribution is performed, but any kind of loop body is acceptable whereas Allen & Kennedy algorithm only copes with very simple loop bodies.
For nasty reasons related to effects, which are stored as mappings from statement addresses to effects, this pass changes the code in place instead of producing a parallelized_code resource. This is not a big deal, since we often want to modify the code again, and internalize_parallel_code 7.1.8 would have to be applied right afterwards if this behavior were changed.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.preconditions
< MODULE.inv_regions
Function coarse_grain_parallelization_with_reduction 7.1.6 extends the standard coarse_grain_parallelization 7.1.6 by using reduction detection information.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
< MODULE.cumulated_reductions
< MODULE.proper_reductions
< MODULE.inv_regions
7.1.7 Global Loop Nest Parallelization
Function nest_parallelization 7.1.7 is an attempt at combining loop transformations and parallelization for perfectly nested loops. Different parameters are computed like loop ranges and contiguous directions for references. Loops with small ranges are fully unrolled. Loops with large ranges are strip-mined to obtain vector and parallel loops. Loops with medium ranges simply are parallelized. Loops with unknown range also are simply parallelized.
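The range-based decision described above can be summarized by the following sketch. The thresholds small and large and the strategy labels are hypothetical values chosen for the example; the source does not give the limits actually used by nest_parallelization.

```python
def choose_strategy(trip_count, small=4, large=256):
    """Pick a per-loop strategy from its range; None means the range is unknown.

    The thresholds are assumptions for illustration only.
    """
    if trip_count is None:
        return "parallelize"   # unknown range: simply parallelize
    if trip_count <= small:
        return "full_unroll"   # small range: unroll completely
    if trip_count >= large:
        return "strip_mine"    # large range: vector + parallel loops
    return "parallelize"       # medium range: simply parallelize
```

For instance, choose_strategy(3) selects full unrolling and choose_strategy(1024) selects strip-mining, while a medium or unknown range falls back to plain parallelization.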
For each loop direction, the amount of spatial and temporal localities is estimated. The loop with maximal locality is chosen as innermost loop.
This algorithm is still in the development stage. It can be tried to check that loops are interchanged when locality can be improved. An alternative for static control sections is to use the interface with PoCC (see Section ??).
< PROGRAM.entities
< MODULE.code MODULE.dg
7.1.8 Coerce Parallel Code into Sequential Code
To simplify the user interface and to display a parallelized program with one click, parallelized programs in PIPS are stored as parallelized_code instead of standard code. As a consequence, parallelized programs cannot be further analyzed and transformed because sequential code and parallelized code do not have the same resource type. Most pipsmake rules apply to code but not to parallelized code. Unfortunately, improving the parallelized code with some other transformations such as dead-code elimination is also useful. Thus this pseudo-transformation is added to coerce a parallel code into a classical (sequential) one. Parallelization is made an internal code transformation in PIPS with this rule.
Although this is not the effective process, parallel loops are tagged as parallel and loop local variables may be added in a code resource because of a previous privatization phase.
If you display the “generated” code, it may not be displayed as a parallel one if the PRETTYPRINT_SEQUENTIAL_STYLE 9.2.21.3.2 is set to a parallel output style (such as omp). Anyway, the information is available in code.
Note that this transformation may not be usable with some special parallelizations in PIPS such as WP65 or HPFC, which generate other resource types that may be quite different.
< MODULE.parallelized_code
7.1.9 Detect Computation Intensive Loops
Generate a pragma on each loop that seems to be computation intensive according to a simple cost model.
The computation intensity is derived from the complexity and the memory footprint.
It assumes the following cost model: a loop is marked with the pragma COMPUTATION_INTENSITY_PRAGMA 7.1.9 if the communication costs are lower than the execution cost as given by uniform_complexities 6.10.2.
< MODULE.code
< MODULE.regions
< MODULE.complexities
This corresponds to the transfer startup overhead. The time unit is the same as in complexities.
COMPUTATION_INTENSITY_STARTUP_OVERHEAD 10
This corresponds to the memory bandwidth, in bytes per time unit.
COMPUTATION_INTENSITY_BANDWIDTH 100
This is the processor frequency, in operations per time unit.
COMPUTATION_INTENSITY_FREQUENCY 1000
This is the generated pragma.
COMPUTATION_INTENSITY_PRAGMA "pips intensive loop"
These values have limited meaning by themselves; only their ratios matter. Setting COMPUTATION_INTENSITY_FREQUENCY 7.1.9 and COMPUTATION_INTENSITY_BANDWIDTH 7.1.9 to the same order of magnitude clearly limits the number of generated pragmas…
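The exact formula is not spelled out in this section. The sketch below is one plausible reading, given as an assumption: the communication cost is the startup overhead plus the memory footprint divided by the bandwidth, and the execution cost is the operation count divided by the frequency. The function name and its parameters are hypothetical; the default values mirror the property defaults above.

```python
def is_computation_intensive(operations, footprint_bytes,
                             startup_overhead=10, bandwidth=100, frequency=1000):
    """Assumed cost model, not the PIPS source: compare transfer and compute time."""
    communication_cost = startup_overhead + footprint_bytes / bandwidth
    execution_cost = operations / frequency
    return communication_cost < execution_cost


# 1000 bytes moved for 100000 operations: 20 time units against 100
heavy_loop = is_computation_intensive(operations=100_000, footprint_bytes=1_000)

# 1000 bytes moved for 1000 operations: 20 time units against 1
light_loop = is_computation_intensive(operations=1_000, footprint_bytes=1_000)
```

Under these assumptions only heavy_loop would receive the pragma, and scaling all three properties by the same factor leaves the decision unchanged, which is why only their ratios are meaningful.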
7.1.10 Limit Parallelism in Parallel Loop Nests
This phase restricts the parallelism of parallel do-loop nests by limiting the number of top-level parallel do-loops to a given limit. Any excess innermost parallel loops are replaced by sequential loops. This is useful to keep enough coarse-grain parallelism while respecting some hardware or optimization constraints. For example on GPUs, CUDA has a 2D limitation on grids of thread blocks and OpenCL is limited to 3D. Of course, since the phase works on parallel loop nests, it might be useful to apply a parallelizing phase such as internalize_parallel_code (see § 7.1.8) or coarse grain parallelization before applying limit_nested_parallelism.
< MODULE.code
PIPS relies on the property NESTED_PARALLELISM_THRESHOLD 7.1.10 to determine the desired level of nested parallelism.
NESTED_PARALLELISM_THRESHOLD 0
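A minimal sketch of the intended effect, assuming a loop nest is modeled as a list of flags from the outermost to the innermost loop. The function name and this encoding are assumptions for the example, and the meaning of the default threshold 0 is not modeled here.

```python
def limit_parallel_depth(nest, threshold):
    """nest: booleans, outermost loop first, True meaning a parallel loop.

    Keep the first `threshold` parallel loops and turn the remaining
    parallel loops into sequential ones.
    """
    kept = 0
    limited = []
    for is_parallel in nest:
        if is_parallel and kept < threshold:
            kept += 1
            limited.append(True)
        else:
            limited.append(False)
    return limited
```

With a threshold of 2, as for a CUDA 2D grid, a triply nested parallel loop [True, True, True] becomes [True, True, False]: the two outer loops stay parallel and the innermost one is made sequential.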
7.2 SIMDizer for SIMD Multimedia Instruction Set
The SAC project aims at generating efficient code for processors with SIMD instruction set extensions such as VMX, SSE4, etc. This kind of parallelism is also referred to as Superword Level Parallelism (SLP). For more information, see https://info.enstb.org/projets/sac.
Some phases use ACCEL_LOAD 7.2 and ACCEL_STORE 7.2 to generate DMA calls and ACCEL_WORK 7.2.
ACCEL_LOAD "SIMD_LOAD"
ACCEL_STORE "SIMD_STORE"
ACCEL_WORK "SIMD_"
Here is yet another atomizer, based on new_atomizer (see Section 8.4.1.2), used to reduce complex statements to three-address code close to assembly code. It differs only slightly from new_atomizer: it does not break down simple expressions, that is, expressions that are the sum of a reference and a constant such as i+1. This is needed to generate code that could potentially be efficient, whereas the original atomizer would most of the time generate inefficient code.
simd_atomizer > MODULE.code
< PROGRAM.entities
< MODULE.code
Use the SIMD_ATOMIZER_ATOMIZE_REFERENCE 7.2 property to make the SIMD Atomizer go wild: unlike other atomizers, it will break the content of a reference. SIMD_ATOMIZER_ATOMIZE_LHS 7.2 can be used to tell the atomizer to atomize both lhs and rhs.
SIMD_ATOMIZER_ATOMIZE_REFERENCE FALSE
SIMD_ATOMIZER_ATOMIZE_LHS FALSE
The SIMD_OVERRIDE_CONSTANT_TYPE_INFERENCE 7.2 property is used by the sac library to know if it must override C constant type inference. In C, an integer constant always has the minimum size needed to hold its value, starting from an int. In sac we may want to have it converted to a smaller size, in situations like char b;/*...*/;char a = 2 + b;. Otherwise the result of 2+b is considered as an int. If SIMD_OVERRIDE_CONSTANT_TYPE_INFERENCE 7.2 is set to TRUE, the result of 2+b will be a char.
SIMD_OVERRIDE_CONSTANT_TYPE_INFERENCE FALSE
Tries to unroll the code to make the simdizing process more efficient. It thus tries to compute the optimal unroll factor, so that as many instructions as possible can be packed together. Sensitive to SIMDIZER_AUTO_UNROLL_MINIMIZE_UNROLL 7.2.1.1 and SIMDIZER_AUTO_UNROLL_SIMPLE_CALCULATION 7.2.1.1.
simdizer_auto_unroll > MODULE.code
< PROGRAM.simd_treematch
< PROGRAM.simd_operator_mappings
< PROGRAM.entities
< MODULE.code
Similar to simdizer_auto_unroll 7.2 but at the loop level.
Sensitive to LOOP_LABEL 8.1.1.
< PROGRAM.entities
< MODULE.code
Tries to tile the code to make the simdizing process more efficient.
Sensitive to LOOP_LABEL 8.1.1 to select the loop nest to tile.
< PROGRAM.entities
< MODULE.cumulated_effects
< MODULE.code
This phase tries to pre-process reductions, so that they can be vectorized efficiently by the simdizer 7.2 phase. When multiple reduction statements operating on the same variable with the same operation are detected inside a loop body, each “instance” of the reduction is renamed, and some code is added before and after the loop to initialize the new variables and compute the final result.
simd_remove_reductions > MODULE.code
> MODULE.callees
! MODULE.simdizer_init
< PROGRAM.entities
< MODULE.cumulated_reductions
< MODULE.code
< MODULE.dg
SIMD_REMOVE_REDUCTIONS_PREFIX "RED"
SIMD_REMOVE_REDUCTIONS_PRELUDE ""
SIMD_REMOVE_REDUCTIONS_POSTLUDE ""
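The effect of this renaming can be sketched on a loop body that accumulates twice into the same variable. The names red0 and red1 echo the RED prefix, but the exact names generated by PIPS and the shape of the generated prelude and postlude are assumptions of this example.

```python
def sum_original(a):
    """Two reduction statements on the same variable in one loop body."""
    s = 0
    for i in range(0, len(a), 2):
        s += a[i]
        s += a[i + 1]
    return s


def sum_renamed(a):
    """Same loop after renaming each reduction instance."""
    s = 0
    red0 = 0  # added before the loop: initialize the new variables
    red1 = 0
    for i in range(0, len(a), 2):
        red0 += a[i]      # the two statements no longer touch the same variable,
        red1 += a[i + 1]  # so they can be packed into one vector operation
    s = s + red0 + red1   # added after the loop: compute the final result
    return s
```

Both functions return the same value for any input of even length, for example 36 for the integers 1 to 8, but only the second form exposes independent statements to the simdizer.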
Remove useless load store calls (and more)
redundant_load_store_elimination > MODULE.code
> MODULE.callees
< PROGRAM.entities
< MODULE.code
< MODULE.out_regions
< MODULE.chains
If REDUNDANT_LOAD_STORE_ELIMINATION_CONSERVATIVE 7.2 is set to FALSE, redundant_load_store_elimination 7.2 will remove any statement not involved in the computation of out regions; otherwise it will not remove statements that modify parameter references.
REDUNDANT_LOAD_STORE_ELIMINATION_CONSERVATIVE TRUE
...
deatomizer > MODULE.code
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
< MODULE.dg
This phase is the first phase of the if-conversion algorithm. The complete if-conversion algorithm is performed by applying the following three phases: if_conversion_init 7.2, if_conversion 7.2 and if_conversion_compact 7.2.
Use IF_CONVERSION_INIT_THRESHOLD 7.2 to control whether if-conversion will occur or not: beyond this number of calls, no conversion is done.
IF_CONVERSION_INIT_THRESHOLD 40
if_conversion_init > MODULE.code
< PROGRAM.entities
< MODULE.code
< MODULE.summary_complexity
This phase is the second phase of the if-conversion algorithm. The complete if-conversion algorithm is performed by applying the following three phases: if_conversion_init 7.2, if_conversion 7.2 and if_conversion_compact 7.2.
IF_CONVERSION_PHI "__C-conditional__"
if_conversion > MODULE.code
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
This phase is the third phase of the if-conversion algorithm. The complete if-conversion algorithm is performed by applying the following three phases: if_conversion_init 7.2, if_conversion 7.2 and if_conversion_compact 7.2.
if_conversion_compact > MODULE.code
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
< MODULE.dg
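For readers unfamiliar with if-conversion, the general technique can be sketched as follows: a control dependence is replaced by a data dependence on a condition, using a conditional selection operator. This is a generic illustration, not the code produced by the three PIPS phases; the select helper merely plays the role that the operator named by IF_CONVERSION_PHI is assumed to play.

```python
def select(cond, if_true, if_false):
    """Stand-in for a conditional operator: both operands are already evaluated."""
    return if_true if cond else if_false


def clip_branching(values, limit):
    """Original form: the assignment depends on a test."""
    out = []
    for v in values:
        if v > limit:
            r = limit
        else:
            r = v
        out.append(r)
    return out


def clip_if_converted(values, limit):
    """If-converted form: a straight-line body with no branch."""
    out = []
    for v in values:
        c = v > limit
        r = select(c, limit, v)
        out.append(r)
    return out
```

Both versions compute the same result, but the second loop body is a plain sequence of statements, which is the form the simdizer needs in order to pack similar statements together.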
< PROGRAM.entities
< MODULE.code
This phase tries to minimize dependencies in the code by renaming scalars when legal.
< PROGRAM.entities
< MODULE.dg
< MODULE.proper_effects
This function initializes a treematch used by simdizer 7.2 for SIMD-oriented pattern matching.
< PROGRAM.entities
< MODULE.code
Function simdizer 7.2 is an attempt at generating SIMD code for SIMD multimedia instruction sets such as MMX, SSE2, VIS, etc. This transformation performs the core vectorization, transforming sequences of similar statements into vector operations.
simdizer > MODULE.code
> MODULE.callees
> PROGRAM.entities
! MODULE.simdizer_init
< PROGRAM.simd_treematch
< PROGRAM.simd_operator_mappings
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
< MODULE.cumulated_effects
< MODULE.dg
When set to TRUE, the following property tells the simdizer to try to pad arrays when it seems to be profitable.
SIMDIZER_ALLOW_PADDING FALSE
Skip generation of load and stores, using generic functions instead.
SIMDIZER_GENERATE_DATA_TRANSFERS TRUE
This phase is to be called after simdization of the assignment operator. It performs type substitution from char/short arrays to int arrays, using the packing from the simdization phase. For example, four consecutive loads from a char array could be a single load from an int array. This proves to be useful for C to VHDL compilers such as c2h.
simd_memory_packing > MODULE.code
< PROGRAM.entities
< MODULE.code
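The packing idea can be checked on raw bytes: reading four consecutive 8-bit elements is equivalent to reading one 32-bit word and extracting its bytes. The sketch assumes a little-endian layout and uses hypothetical helper names; it only illustrates the equivalence, not the code generated by simd_memory_packing.

```python
import struct

buffer = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])


def load_chars(buf, offset):
    """Four separate 8-bit loads."""
    return [buf[offset + i] for i in range(4)]


def load_packed(buf, offset):
    """One 32-bit load covering the same four bytes (little-endian assumed)."""
    (word,) = struct.unpack_from("<I", buf, offset)
    return word


def unpack_word(word):
    """Recover the four 8-bit elements from the 32-bit word."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]
```

For any offset, unpack_word(load_packed(buffer, offset)) yields the same values as load_chars(buffer, offset), with a single memory access instead of four.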
7.2.1 SIMD properties
This property is used to set the target register size, expressed in bits, for places where this is needed (for instance, auto-unroll with simple algorithm).
SAC_SIMD_REGISTER_WIDTH 64
7.2.1.1 Auto-Unroll
This property is used to control how the auto unroll phase computes the unroll factor. By default, the minimum unroll factor is used. It is computed by using the minimum of the optimal factor for each statement. If the property is set to FALSE, then the maximum unroll factor is used instead.
SIMDIZER_AUTO_UNROLL_MINIMIZE_UNROLL TRUE
This property controls how the “optimal” unroll factor is computed. Two algorithms can be used. By default, a simple algorithm is used, which simply compares the actual size of the variables used to the size of the registers to find out the best unroll factor. If the property is set to FALSE, a more complex algorithm is used, which takes into account the actual SIMD instructions.
SIMDIZER_AUTO_UNROLL_SIMPLE_CALCULATION TRUE
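As an illustration of the simple algorithm and of the minimize/maximize choice, the sketch below derives a per-statement factor from the register width and the element size, then combines the factors. The function names and the integer-division formula are assumptions consistent with the description above, not the PIPS implementation; the default width matches SAC_SIMD_REGISTER_WIDTH.

```python
def optimal_factor(element_bits, register_bits=64):
    """Number of elements of this size that fit in one SIMD register."""
    return max(1, register_bits // element_bits)


def unroll_factor(statement_element_bits, register_bits=64, minimize=True):
    """Combine the per-statement factors with min (default) or max."""
    factors = [optimal_factor(bits, register_bits)
               for bits in statement_element_bits]
    return min(factors) if minimize else max(factors)
```

For a loop body with one statement on 8-bit data and one on 32-bit data, the per-statement factors are 8 and 2 with a 64-bit register: the default setting gives an unroll factor of 2, whereas setting the minimize flag to False gives 8.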
7.2.1.2 Memory Organisation
This property is used by the sac library to know which elements of multi-dimensional array are consecutive in memory. Let us consider the three following references a(i,j,k), a(i,j,k+1) and a(i+1,j,k). Then, if SIMD_FORTRAN_MEM_ORGANISATION 7.2.1.2 is set to TRUE, it means that a(i,j,k) and a(i+1,j,k) are consecutive in memory but a(i,j,k) and a(i,j,k+1) are not. However, if SIMD_FORTRAN_MEM_ORGANISATION 7.2.1.2 is set to FALSE, a(i,j,k) and a(i,j,k+1) are consecutive in memory but a(i,j,k) and a(i+1,j,k) are not.
SIMD_FORTRAN_MEM_ORGANISATION TRUE
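The two layouts can be compared by computing linear offsets. The helper below is an illustrative sketch with 0-based indices and a hypothetical name; with fortran_order set to True the first subscript varies fastest (column-major), otherwise the last one does (row-major, as in C).

```python
def linear_offset(index, shape, fortran_order=True):
    """Offset, in elements, of a multi-dimensional reference (0-based indices)."""
    dims = list(zip(index, shape))
    if not fortran_order:
        dims.reverse()  # list the fastest-varying dimension first
    offset, stride = 0, 1
    for i, extent in dims:
        offset += i * stride
        stride *= extent
    return offset


shape = (4, 5, 6)
base = (1, 2, 3)  # plays the role of a(i,j,k)
```

In Fortran order, the offset of (2, 2, 3) exceeds that of base by 1, so a(i,j,k) and a(i+1,j,k) are consecutive. In C order, it is (1, 2, 4), that is a(i,j,k+1), which lies one element after base.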
7.2.1.3 Pattern file
This property is used by the sac library to know the path of the pattern definition file. If the file is not found, the execution fails.
SIMD_PATTERN_FILE "patterns.def"
7.2.2 Scalopes project
This pass outlines code parts based on pragmas. It can outline blocks or loops marked with a #pragma scmp task flag. It is based on the outline pass.
> MODULE.callees
> PROGRAM.entities
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
7.2.2.1 Bufferization
The goal of Bufferization is to generate dataflow communication through buffers between modules. The communication is done by special function calls generated by kernel_load_store 7.3.7.4. To keep flows consistent outside the module, scalopify ?? surrounds variable calls with a special function too. A C file with stubs is needed.
Note that you must also set KERNEL_LOAD_STORE_DEALLOCATE_FUNCTION 7.3.7.4 to "" in order to have it generate relevant code.
The goal of this pass is to keep consistent flows outside the tasks.
> MODULE.callees
> PROGRAM.entities
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
7.2.2.2 SCMP generation
The goal of the following phase is to generate SCMP tasks from ’normal’ modules. The tasks are linked and scheduled using the SCMP HAL. Sesamify takes a module as input and analyzes all its callees (for example the ’main’ after GPU-IFY or SCALOPRAGMA application). Each analyzed module is transformed into a SCMP task if its name begins with P4A_scmp_task. To generate the final files for SCMP, the pass output needs to be transformed with a special Python parser.
> MODULE.callees
> PROGRAM.entities
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
7.3 Code Distribution
Different automatic code distribution techniques are implemented in PIPS for distributed-memory machines. The first one is based on the emulation of a shared memory. The second one is based on HPF. A third one targets architectures with hardware coprocessors. Another one, currently developed at IT Sud Paris, generates MPI code from OpenMP code.
7.3.1 Shared-Memory Emulation
WP65 [?, ?, ?] produces a new version of a module transformed to be executed on a distributed memory machine. Each module is transformed into two modules. One module, wp65_compute_file, performs the computations, while the other one, wp65_bank_file, emulates a shared memory.
This rule does not have data structure outputs, as the two new programs generated have computed names. This does not fit the pipsmake framework too well, but is OK as long as nobody wishes to apply PIPS on the generated code, e.g. to propagate constants or eliminate dead code.
Note that use-use dependencies are used to allocate temporary arrays in local memory (i.e. in the software cache).
This compilation scheme was designed by Corinne Ancourt and François Irigoin. It uses theoretical results in [?]. Its input is a very small subset of Fortran (e.g. procedure calls are not supported). It was implemented by the designers, with help from Lei Zhou.
alias wp65_bank_file ’Bank Distributed View’
wp65 > MODULE.wp65_compute_file
> MODULE.wp65_bank_file
! MODULE.privatize_module
< PROGRAM.entities
< MODULE.code
< MODULE.dg
< MODULE.cumulated_effects
< MODULE.chains
< MODULE.proper_effects
Name of the file for the target model:
WP65_MODEL_FILE "model.rc"
7.3.2 HPF Compiler
The HPF compiler is a project by itself, developed by Fabien Coelho in the PIPS framework.
A whole set of rules is used by the PIPS HPF compiler, HPFC. By the way, the whole compiler is just a big hack according to Fabien Coelho.
7.3.2.1 HPFC Filter
The first rule applies a shell script to put HPF directives in an f77-parsable form. A sed-based shell script is used. The hpfc_parser 4.2.2 must be called to analyze the right file. This is triggered automatically by the bang selection in the hpfc_close 7.3.2.5 phase.
< MODULE.source_file
7.3.2.2 HPFC Initialization
The second HPFC rule is used to initialize the hpfc status and other data structures global to the compiler. The HPF compiler status is bootstrapped. The compiler status stores (or should store) all relevant information about the HPF part of the program (data distribution, IO functions and so on).
> PROGRAM.hpfc_status
< PROGRAM.entities
7.3.2.3 HPF Directive removal
This phase removes the directives (some special calls) from the code. The remappings (implicit or explicit) are also managed at this level, through copies between differently shaped arrays.
To manage calls with distributed arguments, I need to apply the directive extraction bottom-up, so that the callers will know about the callees through the hpfc_status. In order to do that, I first thought of an intermediate resource, but there was an obscure problem with my fake calls. Thus the ordering of the static then dynamic directive analyses is enforced at the bang sequence request level in the hpfc_close 7.3.2.5 phase.
The hpfc_static_directives 7.3.2.3 phase analyzes static mapping directives for the specified module. The hpfc_dynamic_directives 7.3.2.3 phase manages realigns and function calls with prescriptive argument mappings. In order to do so it needs its callees’ required mappings, hence the need to analyze static directives beforehand. The code is cleaned from the hpfc_filter 7.3.2.1 artifacts after this phase, and all the proper information about the HPF stuff included in the routines is stored in hpfc_status.
> PROGRAM.hpfc_status
< PROGRAM.entities
< PROGRAM.hpfc_status
< MODULE.code
hpfc_dynamic_directives > MODULE.code
> PROGRAM.hpfc_status
< PROGRAM.entities
< PROGRAM.hpfc_status
< MODULE.code
< MODULE.proper_effects
7.3.2.4 HPFC actual compilation
This rule launches the actual compilation. Four files are generated:
- the host code that mainly deals with I/Os,
- the SPMD node code,
- and some initialization stuff for the runtime (2 files).
Between this phase and the previous one, many PIPS standard analyses are performed, especially the regions and preconditions. Then this phase will perform the actual translation of the program into a host and SPMD node code.
> MODULE.hpfc_node
> MODULE.hpfc_parameters
> MODULE.hpfc_rtinit
> PROGRAM.hpfc_status
< PROGRAM.entities
< PROGRAM.hpfc_status
< MODULE.regions
< MODULE.summary_regions
< MODULE.preconditions
< MODULE.code
< MODULE.cumulated_references
< CALLEES.hpfc_host
7.3.2.5 HPFC completion
This rule deals with the compiler closing. It must deal with commons. The hpfc parser selection is put here.
! SELECT.hpfc_parser
! SELECT.must_regions
! ALL.hpfc_static_directives
! ALL.hpfc_dynamic_directives
< PROGRAM.entities
< PROGRAM.hpfc_status
< MAIN.hpfc_host
7.3.2.6 HPFC install
This rule performs the installation of HPFC generated files in a separate directory. This rule is added to make hpfc usable from wpips and epips. I got problems with the make and run rules, because it was trying to recompute everything from scratch. To be investigated later on.
< PROGRAM.hpfc_commons
hpfc_make
hpfc_run
7.3.2.7 HPFC High Performance Fortran Compiler properties
Debugging levels considered by HPFC: HPFC_{,DIRECTIVES,IO,REMAPPING}_DEBUG_LEVEL.
These booleans control whether some computations are directly generated in the output code, or computed through calls to dedicated runtime functions. The default is the direct expansion.
HPFC_EXPAND_COMPUTE_LOCAL_INDEX TRUE
HPFC_EXPAND_COMPUTE_COMPUTER TRUE
HPFC_EXPAND_COMPUTE_OWNER TRUE
HPFC_EXPAND_CMPLID TRUE
HPFC_NO_WARNING FALSE
Hacks control…
HPFC_FILTER_CALLEES FALSE
GLOBAL_EFFECTS_TRANSLATION TRUE
These booleans control the I/O generation.
HPFC_SYNCHRONIZE_IO FALSE
HPFC_IGNORE_MAY_IN_IO FALSE
Whether to use lazy or non-lazy communications
HPFC_LAZY_MESSAGES TRUE
Whether to ignore FCD (Fabien Coelho Directives…) or not. These directives are used to instrument the code for testing purposes.
HPFC_IGNORE_FCD_SYNCHRO FALSE
HPFC_IGNORE_FCD_TIME FALSE
HPFC_IGNORE_FCD_SET FALSE
Whether to measure and display the compilation times for remappings, and whether to generate outward redundant code for remappings. Also whether to generate code that keeps track dynamically of live mappings. Also whether not to send data to a twin (a processor that holds the very same data for a given array).
HPFC_TIME_REMAPPINGS FALSE
HPFC_REDUNDANT_SYSTEMS_FOR_REMAPS FALSE
HPFC_OPTIMIZE_REMAPPINGS TRUE
HPFC_DYNAMIC_LIVENESS TRUE
HPFC_GUARDED_TWINS TRUE
Whether to use the local buffer management. 1 MB of buffer is allocated.
HPFC_BUFFER_SIZE 1000000
HPFC_USE_BUFFERS TRUE
Whether to use in and out convex array regions for input/output compiling
HPFC_IGNORE_IN_OUT_REGIONS TRUE
Whether to extract more equalities from a system, if possible.
HPFC_EXTRACT_EQUALITIES TRUE
Whether to try to extract the underlying lattice when generating code for systems with equalities.
HPFC_EXTRACT_LATTICE TRUE
7.3.3 STEP: MPI code generation from OpenMP programs
7.3.3.1 STEP Directives
The step_parser 7.3.3.1 phase identifies the OpenMP constructs. The directive semantics are stored in the MODULE.step_directives resource.
> MODULE.code
< MODULE.code
7.3.3.2 STEP Analysis
The step_analyse_init 7.3.3.2 phase initializes the PROGRAM.step_comm resource.
The step_analyse 7.3.3.2 phase triggers the convex array regions analyses to compute SEND and RECV regions leading to MPI messages and checks whether a given SEND region corresponding to a directive construct is consumed by a RECV region corresponding to a directive construct. In this case, communications can be optimized.
> MODULE.step_send_regions
> MODULE.step_recv_regions
< PROGRAM.entities
< PROGRAM.step_comm
< MODULE.step_directives
< MODULE.code
< MODULE.preconditions
< MODULE.transformers
< MODULE.cumulated_effects
< MODULE.regions
< MODULE.in_regions
< MODULE.out_regions
< MODULE.chains
< CALLEES.code
< CALLEES.step_send_regions
< CALLEES.step_recv_regions
7.3.3.3 STEP code generation
Based on the OpenMP constructs and analyses, new modules are generated to translate the original code with OpenMP directives. The default code transformation for OpenMP constructs is driven by the STEP_DEFAULT_TRANSFORMATION 7.3.3.3 property. The allowed values are:
- "HYBRID" : for OpenMP and MPI parallel code
- "MPI" : for MPI parallel code
- "OMP" : for OpenMP parallel code
STEP_DEFAULT_TRANSFORMATION "HYBRID"
The step_compile 7.3.3.3 phase generates source code for OpenMP constructs depending on the desired transformation. Each OpenMP construct can have a specific transformation defined by STEP clauses (without specific clauses, the STEP_DEFAULT_TRANSFORMATION 7.3.3.3 is used). The allowed STEP clauses are:
- "!$step hybrid" : for OpenMP and MPI parallel code
- "!$step no_mpi" : for OpenMP parallel code
- "!$step mpi" : for MPI parallel code
- "!$step ignore" : for sequential code
< PROGRAM.entities
< PROGRAM.step_comm
< MODULE.step_directives
< MODULE.code
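The interplay between the per-construct STEP clauses and the default property can be summarized by the sketch below. The function, the dictionary and the SEQUENTIAL label are hypothetical names introduced for illustration: the source only says that the ignore clause yields sequential code.

```python
CLAUSE_TO_TRANSFORMATION = {
    "hybrid": "HYBRID",      # OpenMP and MPI parallel code
    "no_mpi": "OMP",         # OpenMP parallel code
    "mpi": "MPI",            # MPI parallel code
    "ignore": "SEQUENTIAL",  # hypothetical label for sequential code
}


def select_transformation(step_clause=None, default="HYBRID"):
    """A STEP clause, when present, overrides STEP_DEFAULT_TRANSFORMATION."""
    if step_clause is None:
        return default
    return CLAUSE_TO_TRANSFORMATION[step_clause]
```

A construct without any STEP clause is thus compiled according to the default, while a construct carrying the no_mpi clause is compiled as OpenMP code whatever the default is.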
The step_install 7.3.3.3 phase copies the generated source files to the directory specified by the STEP_INSTALL_PATH 7.3.3.3 property.
< ALL.step_file
STEP_INSTALL_PATH ""
7.3.4 PHRASE: high-level language transformation for partial evaluation in reconfigurable logic
The PHRASE project is an attempt to automatically (or semi-automatically) transform high-level language programs into code with partial execution on some accelerators such as reconfigurable logic (such as FPGAs) or data-paths.
These phases split the code into portions delimited by PHRASE pragmas (written by the programmer) and a control program managing them. Those portions of code are intended, after transformations, to be executed in reconfigurable logic. In the PHRASE project, the reconfigurable logic is synthesized with the Madeo tool, which takes SmallTalk code as input. This is why we have a SmallTalk pretty-printer (see section 9.10).
7.3.4.1 Phrase Distributor Initialisation
This phase is a preparation phase for the Phrase Distributor phrase_distributor 7.3.4.2: the portions of code to externalize are identified and isolated here. Comments are modified by this phase.
phrase_distributor_init > MODULE.code
< PROGRAM.entities
< MODULE.code
This phase is automatically called by the following phrase_distributor 7.3.4.2.
7.3.4.2 Phrase Distributor
The job of distribution is done here. This phase must be applied after the initialization (Phrase Distributor Initialisation phrase_distributor_init 7.3.4.1), so the initialization is automatically applied first.
phrase_distributor > MODULE.code
> MODULE.callees
! MODULE.phrase_distributor_init
< PROGRAM.entities
< MODULE.code
< MODULE.in_regions
< MODULE.out_regions
< MODULE.dg
7.3.4.3 Phrase Distributor Control Code
This phase adds control code for PHRASE distribution. All calls to externalized code portions are transformed into START and WAIT calls. Parameter communications (send and receive) are also handled here.
phrase_distributor_control_code > MODULE.code
< PROGRAM.entities
< MODULE.code
< MODULE.in_regions
< MODULE.out_regions
< MODULE.dg
7.3.5 Safescale
The Safescale project is an attempt to automatically (or semi-automatically) transform sequential code written in C language for the Kaapi runtime.
7.3.5.1 Distribution init
This phase analyzes a given module in order to find the blocks of code delimited by specific pragmas.
safescale_distributor_init > MODULE.code
< PROGRAM.entities
< MODULE.code
7.3.5.2 Statement Externalization
This phase is intended for the externalization of a block of code.
safescale_distributor > MODULE.code
> MODULE.callees
! MODULE.safescale_distributor_init
< PROGRAM.entities
< MODULE.code
< MODULE.regions
< MODULE.in_regions
< MODULE.out_regions
7.3.6 CoMap: Code Generation for Accelerators with DMA
7.3.6.1 Phrase Remove Dependences
phrase_remove_dependences > MODULE.code
> MODULE.callees
! MODULE.phrase_distributor_init
< PROGRAM.entities
< MODULE.code
< MODULE.in_regions
< MODULE.out_regions
< MODULE.dg
7.3.6.2 Phrase comEngine Distributor
This phase should be applied after the initialization (Phrase Distributor Initialisation or phrase_distributor_init 7.3.4.1). The job of comEngine distribution is done here.
phrase_comEngine_distributor > MODULE.code
> MODULE.callees
! MODULE.phrase_distributor_init
< PROGRAM.entities
< MODULE.code
< MODULE.in_regions
< MODULE.out_regions
< MODULE.dg
< MODULE.summary_complexity
7.3.6.3 PHRASE ComEngine properties
This property is set to TRUE if we want to synthesize only one process on the HRE.
COMENGINE_CONTROL_IN_HRE TRUE
This property holds the fifo size of the ComEngine.
COMENGINE_SIZE_OF_FIFO 128
7.3.7 Parallelization for Terapix architecture
7.3.7.1 Isolate Statement
Isolate the statement given in ISOLATE_STATEMENT_LABEL 7.3.7.1 in a separate memory. Data transfers are generated using the same DMA as KERNEL_LOAD_STORE ??.
The algorithm is based on read and write regions (no in / out yet) to compute the data that must be copied and allocated. Rectangular hulls of regions are used to match the allocator and data transfer prototypes. If an analysis fails, definition regions are used instead. If a sizeof is involved, EVAL_SIZEOF 8.4.2 must be set to true.
> MODULE.callees
< MODULE.code
< MODULE.regions
< PROGRAM.entities
ISOLATE_STATEMENT_LABEL ""
As a side effect of the isolate_statement pass, some new variables are declared in the function. A prefix can be used for the names of those variables using the property ISOLATE_STATEMENT_VAR_PREFIX. It is also possible to insert a suffix using the property ISOLATE_STATEMENT_VAR_SUFFIX. The suffix is inserted between the original variable name and the instance number of the copy.
ISOLATE_STATEMENT_VAR_PREFIX ""
ISOLATE_STATEMENT_VAR_SUFFIX ""
By default we cannot isolate a statement with some complex effects on the non local memory. But if we know we can (for example ), we can override this behaviour by setting the following property:
ISOLATE_STATEMENT_EVEN_NON_LOCAL FALSE
7.3.7.2 Delay Communications
Optimize the load/store DMA by delaying the stores and performing the loads as soon as possible. Interprocedural version.
It uses ACCEL_LOAD 7.2 and ACCEL_STORE 7.2 to distinguish loads and stores from other calls.
The communication elimination makes the assumption that a load/store pair can always be removed.
> MODULE.callees
! CALLEES.delay_communications_inter
! MODULE.delay_load_communications_inter
! MODULE.delay_store_communications_inter
< PROGRAM.entities
< MODULE.code
< MODULE.regions
< MODULE.dg
> MODULE.callees
> CALLERS.code
> CALLERS.callees
< PROGRAM.entities
< MODULE.code
< CALLERS.code
< MODULE.proper_effects
< MODULE.cumulated_effects
< MODULE.dg
> MODULE.callees
> CALLERS.code
> CALLERS.callees
< PROGRAM.entities
< MODULE.code
< CALLERS.code
< MODULE.proper_effects
< MODULE.cumulated_effects
< MODULE.dg
Optimize the load/store DMA by delaying the stores and performing the loads as soon as possible. Intraprocedural version.
It uses ACCEL_LOAD 7.2 and ACCEL_STORE 7.2 to distinguish loads and stores from other calls.
The communication elimination makes the assumption that a load/store pair can always be removed.
> MODULE.callees
! MODULE.delay_load_communications_intra
! MODULE.delay_store_communications_intra
< PROGRAM.entities
< MODULE.code
< MODULE.regions
< MODULE.dg
> MODULE.callees
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
< MODULE.cumulated_effects
< MODULE.dg
> MODULE.callees
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
< MODULE.cumulated_effects
< MODULE.dg
7.3.7.3 Hardware Constraints Solver
If SOLVE_HARDWARE_CONSTRAINTS_TYPE 7.3.7.3 is set to VOLUME, then given a loop label, a maximum memory footprint and an unknown entity, this pass tries to find the best value for SOLVE_HARDWARE_CONSTRAINTS_UNKNOWN 7.3.7.3 to make the memory footprint of SOLVE_HARDWARE_CONSTRAINTS_LABEL 7.3.7.3 reach but not exceed SOLVE_HARDWARE_CONSTRAINTS_LIMIT 7.3.7.3. If it is set to NB_PROC, it tries to find the best value for SOLVE_HARDWARE_CONSTRAINTS_UNKNOWN 7.3.7.3 to make the maximum range of the first dimension of all regions accessed by SOLVE_HARDWARE_CONSTRAINTS_LABEL 7.3.7.3 equal to SOLVE_HARDWARE_CONSTRAINTS_LIMIT 7.3.7.3.
< MODULE.code
< MODULE.regions
< PROGRAM.entities
SOLVE_HARDWARE_CONSTRAINTS_LABEL ""
SOLVE_HARDWARE_CONSTRAINTS_LIMIT 0
SOLVE_HARDWARE_CONSTRAINTS_UNKNOWN ""
SOLVE_HARDWARE_CONSTRAINTS_TYPE ""
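The VOLUME mode described above can be pictured with a small numeric model. The sketch below is not how PIPS solves the problem (the solver's internals are not described here); it swaps in a plain binary search and assumes the footprint is a non-decreasing function of the unknown. The names largest_value_within_limit and stencil_footprint, and the stencil itself, are hypothetical.

```python
def largest_value_within_limit(footprint, limit, upper=1 << 20):
    """Return the largest n in [0, upper] such that footprint(n) <= limit.

    Assumption: footprint is non-decreasing in n, so binary search applies.
    """
    if footprint(0) > limit:
        raise ValueError("even n = 0 exceeds the limit")
    low, high = 0, upper
    while low < high:
        mid = (low + high + 1) // 2      # round up so the search always progresses
        if footprint(mid) <= limit:
            low = mid                    # mid fits: keep it as a candidate
        else:
            high = mid - 1               # mid is too large
    return low


def stencil_footprint(n):
    """Hypothetical kernel: reads an (n + 2) x 512 tile and writes an n x 512 tile."""
    return (n + 2) * 512 + n * 512
```

With a limit of 8192 elements, the search returns the value of n for which the footprint reaches the limit without exceeding it.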
7.3.7.4 kernelize
Bootstraps the kernel resource.
Add a kernel to the list of kernels known to PIPS.
< PROGRAM.kernels
Generate unoptimized load / store information for each call to the module.
> CALLERS.callees
> PROGRAM.kernels
< PROGRAM.kernels
< CALLERS.code
< CALLERS.regions
< CALLERS.preconditions
The legacy kernel_load_store 7.3.7.4 approach is limited because it generates the DMA around a call, and the isolate_statement 7.3.7.1 engine does not perform well in an interprocedural setting.
The following properties are used to specify the names of runtime functions. Since they are used in Par4All, their default names begin with P4A_. To get an idea of their prototypes, have a look at the Par4All accelerator runtime or at validation/AcceleratorUtils/include/par4all.c.
Enable/disable the scalar handling by kernel load store.
KERNEL_LOAD_STORE_SCALAR FALSE
The ISOLATE_STATEMENT_EVEN_NON_LOCAL 7.3.7.1 property can be used to force the generation even with non local memory access. But beware it would not solve all the issues...
The following properties can be used to customize the allocate/load/store functions:
KERNEL_LOAD_STORE_ALLOCATE_FUNCTION "P4A_accel_malloc"
KERNEL_LOAD_STORE_DEALLOCATE_FUNCTION "P4A_accel_free"
The following properties are used to name the DMA functions to use for scalars:
KERNEL_LOAD_STORE_LOAD_FUNCTION "P4A_copy_to_accel"
KERNEL_LOAD_STORE_STORE_FUNCTION "P4A_copy_from_accel"
and for 1-dimension arrays:
KERNEL_LOAD_STORE_LOAD_FUNCTION_1D "P4A_copy_to_accel_1d"
KERNEL_LOAD_STORE_STORE_FUNCTION_1D "P4A_copy_from_accel_1d"
and in 2 dimensions:
KERNEL_LOAD_STORE_LOAD_FUNCTION_2D "P4A_copy_to_accel_2d"
KERNEL_LOAD_STORE_STORE_FUNCTION_2D "P4A_copy_from_accel_2d"
and in 3 dimensions:
KERNEL_LOAD_STORE_LOAD_FUNCTION_3D "P4A_copy_to_accel_3d"
KERNEL_LOAD_STORE_STORE_FUNCTION_3D "P4A_copy_from_accel_3d"
and in 4 dimensions:
KERNEL_LOAD_STORE_LOAD_FUNCTION_4D "P4A_copy_to_accel_4d"
KERNEL_LOAD_STORE_STORE_FUNCTION_4D "P4A_copy_from_accel_4d"
and in 5 dimensions:
KERNEL_LOAD_STORE_LOAD_FUNCTION_5D "P4A_copy_to_accel_5d"
KERNEL_LOAD_STORE_STORE_FUNCTION_5D "P4A_copy_from_accel_5d"
and in 6 dimensions:
KERNEL_LOAD_STORE_LOAD_FUNCTION_6D "P4A_copy_to_accel_6d"
KERNEL_LOAD_STORE_STORE_FUNCTION_6D "P4A_copy_from_accel_6d"
As a side effect of the kernel load store pass, some new variables are declared in the function. A prefix can be used for the names of those variables using the property KERNEL_LOAD_STORE_VAR_PREFIX. It is also possible to insert a suffix using the property KERNEL_LOAD_STORE_VAR_SUFFIX. The suffix is inserted between the original variable name and the instance number of the copy.
KERNEL_LOAD_STORE_VAR_PREFIX "P4A_var_"
KERNEL_LOAD_STORE_VAR_SUFFIX ""
Split a parallel loop with a local index into three parts: a host side part, a kernel part and an intermediate part. The intermediate part simulates the parallel code to the kernel from the host
> MODULE.callees
> PROGRAM.kernels
! MODULE.privatize_module
! MODULE.coarse_grain_parallelization
< PROGRAM.entities
< MODULE.code
< PROGRAM.kernels
KERNELIZE_NBNODES 128
KERNELIZE_KERNEL_NAME ""
KERNELIZE_HOST_CALL_NAME ""
OUTLINE_LOOP_STATEMENT FALSE
Gather all constants from a module and put them in a single array. Relevant for Terapix code generation, and maybe for other accelerators as well
< PROGRAM.entities
< MODULE.code
< MODULE.regions
You may want to group constants only for a particular statement, in that case use GROUP_CONSTANTS_STATEMENT_LABEL 7.3.7.4
GROUP_CONSTANTS_STATEMENT_LABEL ""
The way variables are grouped is controlled by GROUP_CONSTANTS_LAYOUT 7.3.7.4; the only relevant value as of now is "terapix".
GROUP_CONSTANTS_LAYOUT ""
The name of the variable holding constants can be set using GROUP_CONSTANTS_HOLDER 7.3.7.4.
GROUP_CONSTANTS_HOLDER "caillou"
You may want to skip loop bounds from the grouping
GROUP_CONSTANTS_SKIP_LOOP_RANGE FALSE
You may want to skip literals too.
GROUP_CONSTANTS_LITERAL TRUE
Perform various checks on a Terapix microcode to make sure it can be synthesized. GROUP_CONSTANTS_HOLDER 7.3.7.4 is used to differentiate mask and image.
> CALLERS.code
> COMPILATION_UNIT.code
< PROGRAM.entities
< MODULE.code
< MODULE.callers
< MODULE.cumulated_effects
This pass is meaningless for any other target :(.
< PROGRAM.entities
< MODULE.code
Converts the divide operator into a multiply operator using the formula a/cste = a * (1/cste) ≃ a * (128/cste)/128.
< PROGRAM.entities
< MODULE.code
TERAPIX_REMOVE_DIVIDE_ACCURACY 4
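The formula above replaces a division by a constant with a multiplication by a precomputed scaled reciprocal followed by a division by a power of two. A minimal sketch in integer arithmetic follows; the function name and the scale parameter are hypothetical, and how TERAPIX_REMOVE_DIVIDE_ACCURACY maps to the scale is not stated here, so the value 128 is simply taken from the formula.

```python
def divide_by_constant(a, cste, scale=128):
    """Approximate a / cste as a * (scale / cste) / scale, using integers only."""
    reciprocal = round(scale / cste)   # computed once, when the constant is known
    return (a * reciprocal) // scale   # a multiply, then a division by a power of two


# Exact when cste divides scale, approximate otherwise.
exact = divide_by_constant(100, 4)     # reciprocal is 32
approx = divide_by_constant(1000, 3)   # reciprocal is rounded to 43
```

The second call shows the price of the approximation: the rounded reciprocal introduces an error that grows with a.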
7.3.7.5 Generating communications
This phase computes the mapping of data on the accelerators. It records the set of data that have to be copied on the GPU before each statement in the module, and the set of data that have to be copied back from the GPU after the execution of each statement.
Then according to this information, the copy-in and copy-out transfers are generated using same properties as KERNEL_LOAD_STORE ??:
kernel_data_mapping > MODULE.kernel_copy_in
> MODULE.kernel_copy_out
< PROGRAM.entities
< PROGRAM.kernels
< MODULE.code
< MODULE.summary_effects
< MODULE.cumulated_effects
< MODULE.transformers
< MODULE.preconditions
< MODULE.regions
< MODULE.in_regions
< MODULE.out_regions
< MODULE.callees
< MODULE.callers
< CALLEES.kernel_copy_out
< CALLEES.kernel_copy_in
This phase wraps arguments at call sites with an access function. The wrapper name is controlled with WRAP_KERNEL_ARGUMENT_FUNCTION_NAME 7.3.7.5. Currently the purpose of this is to insert a call to a runtime function that resolves addresses in accelerator memory corresponding to addresses in host memory.
WRAP_KERNEL_ARGUMENT_FUNCTION_NAME "P4A_runtime_host_ptr_to_accel_ptr"
wrap_kernel_argument > CALLERS.code
> CALLERS.callees
< PROGRAM.entities
< CALLERS.code
< CALLERS.callees
< MODULE.callers
7.3.8 Code distribution on GPU
This phase generates GPU kernels from perfect parallel loop nests. The GPU_IFY_ANNOTATE_LOOP_NESTS 7.3.8 property automatically triggers the annotation of the loop nest (see GPU_LOOP_NEST_ANNOTATE ??).
GPU_IFY_ANNOTATE_LOOP_NESTS FALSE
gpu_ify > MODULE.code
> MODULE.callees
> PROGRAM.entities
! MODULE.privatize_module
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
For example from
for(i = 1; i <= 499; i += 1)
   for(j = 1; j <= 499; j += 1)
      save[i][j] = 0.25*(space[i-1][j]+space[i+1][j]+space[i][j-1]+space[i][j+1]);
it generates something like
[...]
void p4a_kernel_launcher_0(float_t save[501][501], float_t space[501][501])
{
   int i;
   int j;
   for(i = 1; i <= 499; i += 1)
      for(j = 1; j <= 499; j += 1)
         p4a_kernel_wrapper_0(save, space, i, j);
}
void p4a_kernel_wrapper_0(float_t save[501][501], float_t space[501][501], int i, int j)
{
   i = P4A_vp_0(i);
   j = P4A_vp_1(j);
   p4a_kernel_0(save, space, i, j);
}
void p4a_kernel_0(float_t save[501][501], float_t space[501][501], int i, int j)
{
   save[i][j] = 0.25*(space[i-1][j]+space[i+1][j]+space[i][j-1]+space[i][j+1]);
}
The launcher, wrapper and kernel prefix names to be used during the generation:
GPU_LAUNCHER_PREFIX "p4a_launcher"
GPU_WRAPPER_PREFIX "p4a_wrapper"
GPU_KERNEL_PREFIX "p4a_kernel"
This boolean property controls whether the outliner uses the original function name as a suffix instead of only a numerical suffix.
GPU_OUTLINE_SUFFIX_WITH_OWNER_NAME TRUE
For Fortran output you may need to have these prefix names in uppercase.
Each level of outlining can be enabled or disabled according to the following properties:
GPU_USE_LAUNCHER TRUE
GPU_USE_WRAPPER TRUE
GPU_USE_KERNEL TRUE
Each generated function can go in its own source file according to the following properties:
GPU_USE_KERNEL_INDEPENDENT_COMPILATION_UNIT FALSE
GPU_USE_LAUNCHER_INDEPENDENT_COMPILATION_UNIT FALSE
GPU_USE_WRAPPER_INDEPENDENT_COMPILATION_UNIT FALSE
By default they are set to FALSE, which suits languages like CUDA that allow kernel and host code to be mixed in the same file; this is not the case for OpenCL.
When the original code is in Fortran it might be useful to wrap the kernel launcher in an independent C file. The GPU_USE_FORTRAN_WRAPPER 7.3.8 property can be used for that purpose. The name of the function wrapper can be configured using the property GPU_FORTRAN_WRAPPER_PREFIX 7.3.8. As stated before, it is safe to use uppercase prefix names.
GPU_USE_FORTRAN_WRAPPER FALSE
GPU_FORTRAN_WRAPPER_PREFIX "P4A_FORTRAN_WRAPPER"
The phase generates a wrapper function to get the iteration coordinates from intrinsic functions instead of the initial loop indices. Using this kind of wrapper is the normal behaviour, but when simulating accelerator code it is useful not to use a wrapper.
The names of the intrinsic functions used to get the i-th coordinate in the iteration space are defined by this GNU à la printf format:
GPU_COORDINATE_INTRINSICS_FORMAT "P4A_vp_%d"
where %d is used to get the dimension number. Here vp stands for virtual processor dimension and is a reminiscence from PompC and HyperC...
Please, do not use this feature for buffer-overflow attack...
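As an illustration of how such a printf-style format expands, the sketch below builds the index assignments that a wrapper would contain for a two-dimensional nest. The helper name wrapper_assignments is hypothetical and the generated text is only meant to resemble the wrapper shown earlier in this section; PIPS itself produces the code, not a script like this one.

```python
GPU_COORDINATE_INTRINSICS_FORMAT = "P4A_vp_%d"   # default value of the property


def wrapper_assignments(indices, fmt=GPU_COORDINATE_INTRINSICS_FORMAT):
    """Return one C assignment per loop index, using dimension numbers 0, 1, ..."""
    return ["%s = %s(%s);" % (name, fmt % dim, name)
            for dim, name in enumerate(indices)]


lines = wrapper_assignments(["i", "j"])
```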
Annotates loop nests with comments and guards for further generation of CUDA calls.
gpu_loop_nest_annotate > MODULE.code
< PROGRAM.entities
< MODULE.code
To annotate only outer parallel loop nests, set the following variable to true:
GPU_LOOP_NEST_ANNOTATE_PARALLEL TRUE
Parallelize annotated loop nests based on the sentinel comments.
< PROGRAM.entities
< MODULE.code
This phase promotes sequential code into GPU kernels to avoid memory transfers.
> PROGRAM.entities
< PROGRAM.entities
< MODULE.code
7.3.9 Task generation for SCALOPES project
The goal of the following phase is to generate several tasks from one sequential program. Each task is generated as an independent main program. Then the tasks are linked and scheduled using the SCMP HAL.
> MODULE.callees
> PROGRAM.entities
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
> MODULE.callees
> PROGRAM.entities
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
Phase sesam_buffers_processing is to be run after isolate_statement has been applied to all tasks statements. It then produces a header file to be included by the future sesam application individual tasks. This header file describes how kernel and server tasks use the sesam buffers.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
The next two properties are used by phase sesam_buffers_processing to detect kernel tasks statements in the input module and to generate server tasks names in the output header file.
SCALOPES_KERNEL_TASK_PREFIX "P4A_sesam_task_"
SCALOPES_SERVER_TASK_PREFIX "P4A_sesam_server_"
Chapter 8
Program Transformations
A program transformation is a special phase which takes a code as input, modifies it, possibly using results from several different analyses, and puts back this modified code as result.
A rule describing a program transformation will never be chosen automatically by pipsmake to generate some code since every transformation rule contains a cycle for the MODULE.code resource. Since the first rule producing code, described in this file, is controlizer 4.3 and since it is the only non-cyclic rule, the internal representation always is initialized with it.
As program transformations produce nothing else, pipsmake cannot guess when to apply these rules automatically. This is exactly what the user wants most of the time: program transformations are under explicit control by the user. Transformations are applied when the user pushes one of wpips transformation buttons or when (s)he enters an apply command when running tpips1 , or by executing a Perform Shell script. See the introduction for pointers to the user interfaces.
Unfortunately, it is sometimes nice to be able to chain several transformations without any user interaction. No general macro mechanism is available in pipsmake, but it is possible to impose some program transformations with the ’!’ command.
User inputs are not well-integrated although a user_query rule and a string resource could easily be added. User interactions with a phase are performed directly without notifying pipsmake, to be more flexible and to allow dialogues between a transformation and the user.
8.1 Loop Transformations
8.1.1 Introduction
Most loop transformations require the user to give a valid loop label to locate the loop to be transformed. This is done interactively or by setting the following property to the valid label:
LOOP_LABEL ""
Put a label on unlabelled loops for further interactive processing. Unless FLAG_LOOPS_DO_LOOPS_ONLY 8.1.1 is set to false, only do loops are considered.
> MODULE.loops
< PROGRAM.entities
< MODULE.code
FLAG_LOOPS_DO_LOOPS_ONLY TRUE
Display the labels of all the loops of a module.
< MODULE.loops
8.1.2 Loop range Normalization
Use intermediate variables as loop upper and lower bounds when they are not affine.
< PROGRAM.entities
< MODULE.code
8.1.3 Loop Distribution
Function distributer 8.1.3 is a restricted version of the parallelization function rice* (see Section 7.1.3).
Distribute all the loops of the module.
Allen & Kennedy’s algorithm [?] is used in both cases. The only difference is that distributer 8.1.3 does not produce DOALL loops, but just distributes loops as much as possible.
distributer > MODULE.code
< PROGRAM.entities
< MODULE.code
< MODULE.dg
Partial distribution distributes the statements of a loop nest, except that the isolated statements, which have no dependences at the common level l, are gathered in the same l-th loop.
PARTIAL_DISTRIBUTION FALSE
8.1.4 Statement Insertion
Check if the statement flagged by STATEMENT_INSERTION_PRAGMA 8.1.4 can be safely inserted in the current control flow. This pass should be reserved for internal use: another pass should create and insert a flagged statement and then call this one to verify the validity of the insertion.
< PROGRAM.entities
< ALL.code
> ALL.code
< MODULE.regions
< MODULE.out_regions
STATEMENT_INSERTION_PRAGMA "pips inserted statement to check"
STATEMENT_INSERTION_SUCCESS_PRAGMA "pips inserted statement"
STATEMENT_INSERTION_FAILURE_PRAGMA "pips inserted statement to remove"
8.1.5 Loop Expansion
Prepare the loop expansion by creating a new statement (that may be invalid) for further processing by statement_insertion 8.1.4. Use STATEMENT_INSERTION_PRAGMA 8.1.4 to identify the created statement. Otherwise LOOP_LABEL 8.1.1 and LOOP_EXPANSION_SIZE 8.1.5 have the same meaning as in loop_expansion 8.1.5
< PROGRAM.entities
< MODULE.code
Extends the range of a loop given by LOOP_LABEL 8.1.1 to fit a size given by LOOP_EXPANSION_SIZE 8.1.5. An offset can be set if LOOP_EXPANSION_CENTER 8.1.5 is set to TRUE. The new loop is guarded to prevent illegal iterations; further transformations can elaborate on this.
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
LOOP_EXPANSION_SIZE ""
LOOP_EXPANSION_CENTER FALSE
Extends the dimension of all declared arrays so that no access is illegal.
< PROGRAM.entities
< MODULE.code
< MODULE.regions
8.1.6 Loop Fusion
This pass fuses as many loops as possible in a greedy manner. The loops must appear in a sequence and have exactly the same loop bounds and, if possible, the same loop indices. The pass always tries first to fuse loops whose bodies are linked by a dependence. This policy is expected to maximize the possibilities for further optimizations.
Property LOOP_FUSION_GREEDY 8.1.6 controls whether the pass also tries to fuse as many loops as possible even without any reuse. This is done in a second pass.
Property LOOP_FUSION_MAXIMIZE_PARALLELISM 8.1.6 controls whether loop fusion has to preserve parallelism while fusing. If this property is true, a parallel loop is never fused with a sequential loop.
Property LOOP_FUSION_KEEP_PERFECT_PARALLEL_LOOP_NESTS 8.1.6 prevents the loss of parallelism that occurs when the outer loops of loop nests are fused but the inner loops cannot be.
Property LOOP_FUSION_MAX_FUSED_PER_LOOP 8.1.6 limits the number of fusions per loop. A negative value means that no limit is enforced.
The fusion legality is checked in the standard way by comparing the dependence graphs obtained before and after fusion.
This pass is still in the experimental stage. It may have side effects on the source code when the fusion is attempted but not performed in case loop indices are different.
loop_fusion > MODULE.code
< PROGRAM.entities
< MODULE.code
< MODULE.proper_effects
< MODULE.dg
LOOP_FUSION_MAXIMIZE_PARALLELISM TRUE
LOOP_FUSION_GREEDY FALSE
LOOP_FUSION_KEEP_PERFECT_PARALLEL_LOOP_NESTS TRUE
LOOP_FUSION_MAX_FUSED_PER_LOOP -1
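To see why a legality check is needed, consider two loops with the same bounds where the second reads what the first writes. The sketch below is a hand-written model, not output of the loop_fusion pass; the function run and its shift parameter are hypothetical. With shift = 0 the fused loop reads a value written in the same iteration and fusion preserves the result; with shift = 1 the fused loop reads an element that has not been written yet, so the dependence is reversed and fusion is illegal.

```python
def run(n, shift, fuse):
    """Execute the two loops either separately or fused and return array b."""
    a = [0] * (n + 1)
    b = [0] * n
    if fuse:
        for i in range(n):           # fused loop: both statements in one body
            a[i] = 2 * i + 1
            b[i] = a[i + shift]
    else:
        for i in range(n):           # first loop: producer
            a[i] = 2 * i + 1
        for i in range(n):           # second loop: consumer
            b[i] = a[i + shift]
    return b


legal = run(5, 0, True) == run(5, 0, False)
illegal = run(5, 1, True) != run(5, 1, False)
```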
8.1.7 Index Set Splitting
Index Set Splitting [?] splits the loop referenced by property LOOP_LABEL 8.1.1 into two loops. The first loop ends at an iteration designated by property INDEX_SET_SPLITTING_BOUND 8.1.7 and the second starts thereafter. It currently only works for do loops. This transformation is always legal. Index set splitting in combination with loop unrolling could be used to perform loop peeling.
index_set_splitting > MODULE.code
> PROGRAM.entities
< PROGRAM.entities
< MODULE.code
Index Set Splitting requires the following globals to be set :
- LOOP_LABEL 8.1.1 is the loop label
- INDEX_SET_SPLITTING_BOUND 8.1.7 is the splitting bound
INDEX_SET_SPLITTING_BOUND ""
Additionally, INDEX_SET_SPLITTING_SPLIT_BEFORE_BOUND 8.1.7 can be used to specify whether the loop is split before or after the bound given in INDEX_SET_SPLITTING_BOUND 8.1.7.
INDEX_SET_SPLITTING_SPLIT_BEFORE_BOUND FALSE
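The effect of the two properties can be modelled on inclusive iteration ranges. This is only a sketch: the names split_index_set and iterations are hypothetical, and two points are assumptions rather than documented behaviour, namely that with the default FALSE the bound iteration stays in the first loop, and that an out-of-range bound is clamped so that one of the two loops is empty.

```python
def split_index_set(lower, upper, bound, split_before_bound=False):
    """Return two inclusive (first, last) ranges that together cover lower..upper."""
    last_of_first = bound - 1 if split_before_bound else bound
    # Clamp so that a bound outside lower..upper leaves one loop empty.
    last_of_first = max(lower - 1, min(upper, last_of_first))
    return (lower, last_of_first), (last_of_first + 1, upper)


def iterations(index_range):
    """List the iterations of an inclusive range; empty if first > last."""
    first, last = index_range
    return list(range(first, last + 1))


first_loop, second_loop = split_index_set(1, 10, 4)
```

Whatever the bound, concatenating the iterations of the two loops gives back the original iteration sequence, which is why the transformation is always legal.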
8.1.8 Loop Unrolling
8.1.8.1 Regular Loop Unroll
Unroll requests a loop label and an unrolling factor from the user. Then it unrolls the specified loop as specified. The transformation is very general, and it is interesting to run partial_eval 8.4.2, simplify_control 8.3.1 and dead_code_elimination 8.3.2 after this transformation. When the number of iterations cannot be proven to be a multiple of the unrolling factor, the extra iterations can be executed first or last (see LOOP_UNROLL_WITH_PROLOGUE 8.1.8.1).
Labels in the body are deleted. To unroll nested loops, start with the innermost loop.
This transformation is always legal.
unroll > MODULE.code
< PROGRAM.entities
< MODULE.code
Use LOOP_LABEL 8.1.1 and UNROLL_RATE 8.1.8.1 if you do not want to unroll interactively. You can also set LOOP_UNROLL_MERGE 8.1.8.1 to use the same declarations among all the unrolled statements (only meaningful in C).
UNROLL_RATE 0
LOOP_UNROLL_MERGE FALSE
The unrolling rate does not always divide exactly the number of iterations. So an extra loop must be added to execute the remaining iterations. This extra loop can be executed with the first iterations (prologue option) or the last iterations (epilogue option). Property LOOP_UNROLL_WITH_PROLOGUE 8.1.8.1 can be set to FALSE to use the epilogue when possible. The current implementation of the unrolling with prologue is general, while the implementation of the unrolling with epilogue is restricted to loops with a statically known increment of one. The epilogue option may reduce misalignments.
LOOP_UNROLL_WITH_PROLOGUE TRUE
Another option might be to require unrolling of the prologue or epilogue loop when possible.
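The difference between the prologue and epilogue options can be sketched on a loop with an increment of one, unrolled four times. This is a hand-written model of the resulting control structure, not code generated by the unroll pass; the function name unroll_by_4 and the body callback are hypothetical.

```python
def unroll_by_4(lower, upper, body, with_prologue=True):
    """Run body(i) for i in lower..upper (inclusive), with the main loop unrolled 4 times."""
    count = max(upper - lower + 1, 0)
    extra = count % 4                        # iterations that do not fill a group of 4
    main_start = lower + extra if with_prologue else lower
    main_stop = main_start + (count - extra) # exclusive end of the unrolled loop
    if with_prologue:
        for i in range(lower, main_start):   # extra iterations executed first
            body(i)
    for i in range(main_start, main_stop, 4):
        body(i)
        body(i + 1)
        body(i + 2)
        body(i + 3)
    if not with_prologue:
        for i in range(main_stop, upper + 1):  # extra iterations executed last
            body(i)


trace = []
unroll_by_4(1, 10, trace.append)
```

Both options execute the same iterations in the same order; they differ only in where the unrolled loop starts, which is what may affect alignment.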
8.1.8.2 Full Loop Unroll
A loop can also be fully unrolled if the range is numerically known. “Partial Eval” may be usefully applied first.
This is only useful for small loop ranges.
Unrolling can be interactively applied and the user is requested a loop label:
full_unroll > MODULE.code
< PROGRAM.entities
< MODULE.code
Or directives can be inserted as comments for loops to be unrolled with:
full_unroll_pragma > MODULE.code
< PROGRAM.entities
< MODULE.code
Full loop unrolling is applied one loop at a time by default. The user must specify the loop label. This default feature can be turned off and all loops with constant loop bounds and constant increment are fully unrolled.
Use LOOP_LABEL 8.1.1 to pass the desired label if you do not want to give it interactively
Property FULL_LOOP_UNROLL_EXCEPTIONS 8.1.8.2 is used to forbid loop unrolling when specific user functions are called in the loop body. The function names are separated by SPACEs. The default value is the empty set, i.e. the empty string.
FULL_LOOP_UNROLL_EXCEPTIONS ""
8.1.9 Loop Fusion
This pass unconditionally applies a loop fusion between the loop designated by the property LOOP_LABEL 8.1.1 and the following loop. They must have the same loop index and the same iteration set. No legality check is performed.
< PROGRAM.entities
< MODULE.code
8.1.10 Strip-mining
Strip-mine requests a loop label and either a chunk size or a chunk number. Then it strip-mines the specified loop, if it is found. Note that the DO/ENDDO construct is not compatible with such local program transformations.
strip_mine > MODULE.code
< PROGRAM.entities
< MODULE.code
Behavior of strip mining can be controlled by the following properties:
- LOOP_LABEL 8.1.1 selects the loop to strip mine
- STRIP_MINE_KIND 8.1.10 can be set to 0 (fixed-size chunks) or 1 (fixed number of chunks). Negative value is used for interactive prompt.
- STRIP_MINE_FACTOR 8.1.10 controls the size of the chunk or the number of chunk depending on STRIP_MINE_KIND 8.1.10. Negative value is used for interactive prompt.
STRIP_MINE_KIND -1
STRIP_MINE_FACTOR -1
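The two kinds of strip-mining can be sketched on an inclusive range with an increment of one. The names strip_mine and run_strip_mined are hypothetical, and the rounding used for kind 1 (a chunk size equal to the iteration count divided by the factor, rounded up, which yields at most that many chunks) is an assumption; the exact bounds produced by PIPS may differ.

```python
def strip_mine(lower, upper, kind, factor):
    """Return the inclusive (first, last) chunks of the range lower..upper."""
    count = upper - lower + 1
    if count <= 0:
        return []
    if factor <= 0:
        raise ValueError("factor must be positive in non-interactive use")
    if kind == 0:
        size = factor                    # fixed-size chunks
    elif kind == 1:
        size = -(-count // factor)       # fixed number of chunks: ceiling division
    else:
        raise ValueError("kind must be 0 or 1")
    return [(first, min(first + size - 1, upper))
            for first in range(lower, upper + 1, size)]


def run_strip_mined(lower, upper, kind, factor, body):
    """Execute body(i) through the two-level loop nest produced by strip-mining."""
    for first, last in strip_mine(lower, upper, kind, factor):  # loop over chunks
        for i in range(first, last + 1):                        # loop inside a chunk
            body(i)
```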
8.1.11 Loop Interchange
loop_interchange 8.1.11 requests a loop label and exchanges the outer-most loop with this label and the inner-most one in the same loop nest, if such a loop nest exists.
Presently, legality is not checked.
loop_interchange > MODULE.code
< PROGRAM.entities
< MODULE.code
Property LOOP_LABEL 8.1.1 can be set to a loop label instead of using the default interactive method.
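Because legality is not checked, it is up to the user to make sure that the interchange preserves the dependences. The sketch below is a hand-written model, not output of loop_interchange; the function sweep is hypothetical. The statement a[i][j] = a[i-1][j+1] + 1 carries a dependence that the interchange reverses, so the two iteration orders give different results.

```python
def sweep(n, interchange):
    """Run a[i][j] = a[i-1][j+1] + 1 over 1..n x 1..n in the chosen loop order."""
    a = [[0] * (n + 2) for _ in range(n + 2)]
    if interchange:
        order = [(i, j) for j in range(1, n + 1) for i in range(1, n + 1)]
    else:
        order = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]
    for i, j in order:
        a[i][j] = a[i - 1][j + 1] + 1
    return a


original = sweep(2, interchange=False)
swapped = sweep(2, interchange=True)
```

In the original order, a[2][1] is computed after a[1][2]; after the interchange it is computed before, and reads a stale value.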
8.1.12 Hyperplane Method
loop_hyperplane 8.1.12 requests a loop label and a hyperplane direction vector and applies the hyperplane method to the loop nest starting with this loop label, if such a loop nest exists.
Presently, legality is not checked.
loop_hyperplane > MODULE.code
< PROGRAM.entities
< MODULE.code
8.1.13 Loop Nest Tiling
loop_tiling 8.1.13 requests from the user a numerical loop label and a numerical partitioning matrix and applies the tiling method to the loop nest starting with this loop label, if such a loop nest exists.
The partitioning matrix must be of dimension n × n where n is the loop nest depth. The default origin for the tiling is 0, but lower loop bounds are used to adjust it and decrease the control overhead. For instance, if each loop is of the usual kind, DO I = 1, N, the tiling origin is point (1, 1,...). The code generation is performed according to the PPoPP’91 paper but redundancy elimination may result in different loop bounds.
Presently, legality is not checked. There is no decision procedure to select automatically an optimal partitioning matrix. Since the matrix must be numerically known, it is not possible to generate a block distribution unless all loop bounds are numerically known. It is assumed that the loop nest is fully parallel.
Jingling Xue published an advanced code generation algorithm for tiling in Parallel Processing Letters (http://cs.une.edu.au/~xue/pub.html).
loop_tiling > MODULE.code
< PROGRAM.entities
< MODULE.code
This transformation prompts the user for a partitioning matrix. Alternatively, this matrix can be provided through the LOOP_TILING_MATRIX 8.1.13 property. The format of the matrix is a00 a01 a02,a10 a11 a12,a20 a21 a22
LOOP_TILING_MATRIX ""
Likewise, one can use the LOOP_LABEL 8.1.1 property to specify the targeted loop.
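The matrix format above, with rows separated by commas and coefficients by spaces, can be read as follows. The sketch then enumerates a two-dimensional nest tile by tile for the simple case of a diagonal matrix, i.e. rectangular tiles whose origin follows the lower loop bounds. The names parse_tiling_matrix and tiled_iterations are hypothetical, and the general, non-diagonal code generation performed by PIPS is not modelled here.

```python
def parse_tiling_matrix(text):
    """Parse 'a00 a01,a10 a11' into a square list of integer rows."""
    rows = [[int(x) for x in row.split()] for row in text.split(",")]
    n = len(rows)
    if any(len(row) != n for row in rows):
        raise ValueError("the partitioning matrix must be n x n")
    return rows


def tiled_iterations(lower, upper, matrix):
    """Enumerate a 2-D nest tile by tile; sketch limited to diagonal matrices."""
    (ti, off1), (off2, tj) = matrix
    if off1 or off2:
        raise ValueError("this sketch only handles rectangular tiles")
    (li, lj), (ui, uj) = lower, upper
    order = []
    for it in range(li, ui + 1, ti):              # loops over tiles
        for jt in range(lj, uj + 1, tj):
            for i in range(it, min(it + ti - 1, ui) + 1):      # loops inside a tile
                for j in range(jt, min(jt + tj - 1, uj) + 1):
                    order.append((i, j))
    return order


matrix = parse_tiling_matrix("2 0,0 3")
order = tiled_iterations((1, 1), (4, 4), matrix)
```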
8.1.14 Symbolic Tiling
Tiles a loop nest using a partitioning vector that can contain symbolic values. The tiling only works for parallelepiped tiles. Use LOOP_LABEL 8.1.1 to specify the loop to tile. Use SYMBOLIC_TILING_VECTOR 8.1.14 as a comma-separated list to specify tile sizes. Use SYMBOLIC_TILING_FORCE 8.1.14 to bypass condition checks. Consider using loop_nest_unswitching 7.2 if the generated max disturbs further analyses.
! MODULE.coarse_grain_parallelization
< PROGRAM.entities
< MODULE.code
< MODULE.cumulated_effects
SYMBOLIC_TILING_VECTOR ""
SYMBOLIC_TILING_FORCE FALSE
8.1.15 Loop Normalize
The loop normalization consists in transforming all the loops of a given module into a normal form. In this normal form, the lower bound and the increment are equal to one (1).
Property LOOP_NORMALIZE_PARALLEL_LOOPS_ONLY 8.1.15 controls whether only parallel loops or all loops are normalized.
If we note the initial DO loop as:
DO I = lower, upper, incre
   ...
ENDDO
the transformation gives the following code:
DO NLC = 0, (upper - lower + incre)/incre - 1, 1
   I = incre*NLC + lower
   ...
ENDDO
I = incre * MAX((upper - lower + incre)/incre, 0) + lower</span></td></tr></table>
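The equivalence between the two loops above can be checked with a small model. The sketch below simulates the Fortran DO loop and its normalized form, and returns both the values taken by I in the body and its value after the loop. The function names are hypothetical, and fortran_div is a helper introduced here to mimic Fortran integer division, which truncates toward zero.

```python
def fortran_div(a, b):
    """Integer division truncating toward zero, as in Fortran."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def original_loop(lower, upper, incre):
    """Model of DO I = lower, upper, incre: returns (values of I, final I)."""
    visited = []
    i = lower
    while (i <= upper) if incre > 0 else (i >= upper):
        visited.append(i)
        i += incre
    return visited, i


def normalized_loop(lower, upper, incre):
    """Model of the normalized loop on NLC with the two assignments to I."""
    trip = fortran_div(upper - lower + incre, incre)
    visited = []
    for nlc in range(0, trip):            # DO NLC = 0, trip - 1, 1
        i = incre * nlc + lower           # first assignment, in the loop body
        visited.append(i)
    i = incre * max(trip, 0) + lower      # second assignment, after the loop
    return visited, i
```

The final assignment matters only if I is read after the loop, which is the case addressed by the property that allows skipping it.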
<p class="indent">
<p class="indent"> The normalization is done only if the initial increment is a constant number. The
normalization produces two assignment statements on the initial loop index. The first
one (at the beginning of the loop body) assigns it its value as a function of the new
index and the second one (after the end of the loop) assigns it its final
value.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> loop_normalize</span><span class="cmtt-10"> ’Loop</span><span class="cmtt-10"> Normalize’</span>
<br /><span class="cmtt-10">loop_normalize</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="indent">   If the increment is 1, the loop is considered already normalized. To have
1-increment loops normalized too, set the following property to <span class="cmtt-10">TRUE</span>:
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-233002r1"></a></span><span class="cmtt-9">LOOP_NORMALIZE_ONE_INCREMENT</span><span class="cmtt-9"> </span><span class="cmtt-9">FALSE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-233003r2"></a></span></pre>
<p class="noindent">This is useful, for example, to obtain iteration spaces that begin at 0 for GPUs.
<p class="indent">   Loop normalization was defined in the days when only Fortran was available, so
having loops start at 1, like the default lower bound of arrays, makes sense in
Fortran.
<p class="indent">   This can now be generalized to C, where starting at 0 is more natural, or to any
other value, which can be chosen with the following property:
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-233004r1"></a></span><span class="cmtt-9">LOOP_NORMALIZE_LOWER_BOUND</span><span class="cmtt-9"> </span><span class="cmtt-9">1</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-233005r2"></a></span></pre>
<p class="indent"> If you are sure the final assignment is useless, you can skip it with the following
property.
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-233006r1"></a></span><span class="cmtt-9">LOOP_NORMALIZE_SKIP_INDEX_SIDE_EFFECT</span><span class="cmtt-9"> </span><span class="cmtt-9">FALSE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-233007r2"></a></span></pre>
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-233008r1"></a></span><span class="cmtt-9"> </span><span class="cmtt-9">LOOP_NORMALIZE_PARALLEL_LOOPS_ONLY</span><span class="cmtt-9"> </span><span class="cmtt-9">FALSE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-233009r2"></a></span></pre>
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.1.16 </span> <a id="x1-2340008.1.16"></a>Guard Elimination and Loop Transformations</h4>
<p class="noindent">Youcef <span class="cmcsc-10">B<span class="small-caps">o</span><span class="small-caps">u</span><span class="small-caps">c</span><span class="small-caps">h</span><span class="small-caps">e</span><span class="small-caps">b</span><span class="small-caps">a</span><span class="small-caps">b</span><span class="small-caps">a</span></span>’s implementation of unimodular loop transformations…
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">guard_elimination</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.1.17 </span> <a id="x1-2350008.1.17"></a>Tiling for sequences of loop nests</h4>
<p class="noindent">Youcef <span class="cmcsc-10">B<span class="small-caps">o</span><span class="small-caps">u</span><span class="small-caps">c</span><span class="small-caps">h</span><span class="small-caps">e</span><span class="small-caps">b</span><span class="small-caps">a</span><span class="small-caps">b</span><span class="small-caps">a</span></span>’s implementation of tiling for sequences of loop nests
…
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> tiling_sequence</span><span class="cmtt-10"> ’Tiling</span><span class="cmtt-10"> sequence</span><span class="cmtt-10"> of</span><span class="cmtt-10"> loop</span><span class="cmtt-10"> nests’</span>
</div>
</div>
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">tiling_sequence</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">
<h3 class="sectionHead"><span class="titlemark">8.2 </span> <a id="x1-2360008.2"></a>Redundancy Elimination</h3>
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.2.1 </span> <a id="x1-2370008.2.1"></a>Loop Invariant Code Motion</h4>
<a id="dx1-237001"></a>
<p class="noindent">This is a test implementation of loop-invariant code motion. This phase hoists
loop-invariant code out of loops.
<p class="indent">   A side effect of this transformation is that the code is also parallelized, with some
loop distribution. If you do not want this side effect, see Section <a href="#x1-2730008.4.9">8.4.9</a>, which
does a pretty nice job too.
<p class="indent"> The original algorithm used is described in Chapters 12, 13 and 14 of Julien
Zory’s PhD dissertation <span class="cite">[<span class="cmbx-10">?</span>]</span>.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">invariant_code_motion</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.proper_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span><span class="cmtt-10"> MODULE.dg</span>
</div>
</div>
<p class="noindent">Note: this pass deals with loop invariant code motion while the <span class="obeylines-h"><span class="verb"><span class="cmtt-10">icm</span></span></span> pass deals with
expressions.
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.2.2 </span> <a id="x1-2380008.2.2"></a>Partial Redundancy Elimination</h4>
<p class="noindent">In essence, a <span class="cmti-10">partial redundancy </span><span class="cite">[<span class="cmbx-10">?</span>]</span> is a computation that is done more than once on
some path through a flowgraph. We implement here a partial redundancy elimination
transformation for logical expressions such as bound checks, using information
given by precondition analyses.
<p class="indent"> This transformation is implemented by Thi Viet Nga <span class="cmcsc-10">N<span class="small-caps">g</span><span class="small-caps">u</span><span class="small-caps">y</span><span class="small-caps">e</span><span class="small-caps">n</span></span>.
<p class="noindent">See also the transformation in <span class="cmsy-10">§</span> sec:comm-subexpr-elim-1, the partial evaluation,
and so on.
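<p class="indent">   The idea of removing a bound check that the preconditions already prove can be
illustrated as follows. This is only a sketch under strong assumptions: the function
name and the data layout are hypothetical, and the precondition is simplified to one
known range per variable, which is much weaker than what a real precondition analysis
provides.

```python
def redundant_checks(precondition, checks):
    """Return the bound checks that the precondition proves always true.

    precondition: dict mapping a variable name to its known (min, max) range.
    checks: list of (variable, low, high) tuples meaning low <= variable <= high.
    """
    proven = []
    for var, low, high in checks:
        if var not in precondition:
            continue  # nothing is known about this variable: keep the check
        known_min, known_max = precondition[var]
        # The check is redundant when the known range lies inside the bounds.
        if low <= known_min and known_max <= high:
            proven.append((var, low, high))
    return proven
```

<p class="indent">   A check returned by this function always evaluates to true at that program point, so
it can be removed without changing the behavior of the program.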
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> partial_redundancy_elimination</span><span class="cmtt-10"> ’Partial</span><span class="cmtt-10"> Redundancy</span><span class="cmtt-10"> Elimination’</span>
<br />
<br /><span class="cmtt-10">partial_redundancy_elimination</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.preconditions</span>
</div>
</div>
<p class="noindent">
<h3 class="sectionHead"><span class="titlemark">8.3 </span> <a id="x1-2390008.3"></a>Control-Flow Optimizations</h3>
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.3.1 </span> <a id="x1-2400008.3.1"></a>Dead Code Elimination</h4>
<a id="dx1-240001"></a>
<a id="dx1-240002"></a>
<a id="dx1-240003"></a>
<a id="dx1-240004"></a>
<a id="dx1-240005"></a>
<a id="dx1-240006"></a>
<p class="noindent">Alias for <span class="cmtt-10">simplify_control</span> <a href="#x1-2400008.3.1">8.3.1</a>.
<p class="noindent">Function <span class="cmtt-10">simplify_control</span> <a href="#x1-2400008.3.1">8.3.1</a> is used to delete non-executed code, such as
empty loop nests or zero-trip loops, for example after strip-mining or partial
evaluation.
<p class="indent"> Preconditions are used to find always true conditions in tests and to eliminate
such tests. In some cases, tests cannot be eliminated, but test conditions can be
simplified. One-trip loops are replaced by an index initialization and the loop body.
Zero-trip loops are replaced by an index initialization. Effects in bound computations
are preserved.
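<p class="indent">   The treatment of zero-trip and one-trip loops described above can be sketched as
follows. This is an illustration only: the function names are hypothetical, statements are
plain strings, and the bounds are assumed to be integer constants, whereas PIPS relies on
preconditions to reason about symbolic bounds.

```python
def trip_count(lower, upper, incre):
    """Number of iterations of DO I = lower, upper, incre (truncating division)."""
    span = upper - lower + incre
    q = abs(span) // abs(incre)
    if (span >= 0) != (incre >= 0):
        q = -q
    return max(q, 0)


def simplify_loop(index, lower, upper, incre, body):
    """Return the statements that replace the loop, as a list of strings."""
    n = trip_count(lower, upper, incre)
    init = f"{index} = {lower}"
    if n == 0:
        return [init]                 # zero-trip loop: index initialization only
    if n == 1:
        return [init] + list(body)    # one-trip loop: initialization and body
    # Other loops are kept unchanged.
    return [f"DO {index} = {lower}, {upper}, {incre}"] + list(body) + ["ENDDO"]
```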
<p class="indent">   A lot of dead code can simply be eliminated by testing the feasibility of its precondition.
A very simple and fast test may be used if the preconditions are normalized when
they are computed, but this slows down the precondition computation. Alternatively,
non-normalized preconditions are stored in the database and an accurate but
slow feasibility test must be used. Currently, the first option is used for
assignments, calls, IOs and IF statements, but a stronger feasibility test is used for
loops.
<p class="indent"> FORMAT statements are suppressed because they behave like a NOP command.
They should be gathered at the beginning or at the end of the module using
property <span class="cmtt-10">GATHER_FORMATS_AT_BEGINNING</span> <a href="#x1-530004.3">4.3</a> or <span class="cmtt-10">GATHER_FORMATS_AT_END</span> <a href="#x1-530004.3">4.3</a>.
The property must be set before the control flow graph of the module is
computed.
<p class="indent"> The cumulated effects are used in debug mode to display information.
<p class="indent"> The <span class="cmtt-10">simplify_control</span> <a href="#x1-2400008.3.1">8.3.1</a> phase also performs some <span class="cmti-10">If Simplifications </span>and
<span class="cmti-10">Loop Simplifications</span> <span class="cite">[<span class="cmbx-10">?</span>]</span>.
<p class="indent"> This function was designed and implemented by Ronan K<span class="cmcsc-10"><span class="small-caps">e</span><span class="small-caps">r</span><span class="small-caps">y</span><span class="small-caps">e</span><span class="small-caps">l</span><span class="small-caps">l</span></span>.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> simplify_control</span><span class="cmtt-10"> ’Simplify</span><span class="cmtt-10"> Control’</span>
<br /><span class="cmtt-10">simplify_control</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.callees</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.proper_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.cumulated_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.summary_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.preconditions</span>
</div>
</div>
<p class="noindent">Pass <span class="cmtt-10">suppress_dead_code</span> is the same as <span class="cmtt-10">simplify_control</span> <a href="#x1-2400008.3.1">8.3.1</a>. It is used under this obsolete
name in some validation scripts. The name has been preserved for backward
compatibility.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">suppress_dead_code</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.callees</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.proper_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.cumulated_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.summary_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.preconditions</span>
</div>
</div>
<p class="noindent">This pass is very similar to <span class="cmtt-10">simplify_control</span> <a href="#x1-2400008.3.1">8.3.1</a>, but it does not require the
preconditions. Only local information is used. It can be useful to clean up input code
with constant tests, e.g. <span class="obeylines-h"><span class="verb"><span class="cmtt-10">3>4</span></span></span>, and constant loop bounds. It can also be used after
<span class="cmtt-10">partial_eval</span> <a href="#x1-2660008.4.2">8.4.2</a> to avoid recomputing the preconditions yet another time.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> simplify_control_directly</span><span class="cmtt-10"> ’Simplify</span><span class="cmtt-10"> Control</span><span class="cmtt-10"> Directly’</span>
<br /><span class="cmtt-10">simplify_control_directly</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.callees</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.proper_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.cumulated_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.summary_effects</span>
</div>
</div>
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.1.1 </span> <a id="x1-2410008.3.1.1"></a>Dead Code Elimination properties</h5>
<a id="dx1-241001"></a>
<a id="dx1-241002"></a>
<p class="noindent">It is sometimes useful to display statistics on what has been found useless and
removed in a function. This property controls the statistics display:
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-241003r1"></a></span><span class="cmtt-9">DEAD_CODE_DISPLAY_STATISTICS</span><span class="cmtt-9"> </span><span class="cmtt-9">TRUE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-241004r2"></a></span></pre>
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.3.2 </span> <a id="x1-2420008.3.2"></a>Dead Code Elimination (a.k.a. Use-Def Elimination)</h4>
<a id="dx1-242001"></a>
<a id="dx1-242002"></a>
<p class="noindent">Function <span class="cmtt-10">dead_code_elimination</span> <a href="#x1-2420008.3.2">8.3.2</a> deletes statements whose def references are
all dead, i.e. are not used by later executions of statements. It was developed by
Ronan K<span class="cmcsc-10"><span class="small-caps">e</span><span class="small-caps">r</span><span class="small-caps">y</span><span class="small-caps">e</span><span class="small-caps">l</span><span class="small-caps">l</span></span>. The algorithm computes the set of live statements without a fix-point computation.
An initial set of live statements is extended with new statements reached through use-def
chains, control dependences and....
<p class="indent"> The initial set of live statements contains IO statements, RETURN,
STOP,...
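<p class="indent">   The marking of live statements can be pictured as a single traversal from the initial
set. The sketch below is an assumption-laden illustration, not the PIPS algorithm: the
names are hypothetical, statements are identified by integers, only use-def chains are
followed, and control dependences are ignored.

```python
from collections import deque


def live_statements(kinds, use_def):
    """Return the set of live statement identifiers.

    kinds: dict mapping a statement id to its kind, e.g. "assign", "io", "return".
    use_def: dict mapping a statement id to the ids of the statements that
             define the values it uses.
    """
    # Initial set: statements that must be kept whatever their uses.
    roots = [s for s, kind in kinds.items() if kind in {"io", "return", "stop"}]
    live = set(roots)
    work = deque(roots)
    while work:
        stmt = work.popleft()
        # Every statement defining a value used by a live statement is live.
        for definer in use_def.get(stmt, ()):
            if definer not in live:
                live.add(definer)
                work.append(definer)
    return live
```

<p class="indent">   Statements that are not in the returned set only define values that no live
statement uses, so they are candidates for deletion.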
<p class="indent">   Note that use-def chains are computed intraprocedurally, not interprocedurally.
Hence some statements may be preserved because they update a formal parameter
although this formal parameter is no longer used by the callers.
<p class="indent"> The dependence graph may be used instead of the use-def chains, but Ronan
<span class="cmcsc-10">K<span class="small-caps">e</span><span class="small-caps">r</span><span class="small-caps">y</span><span class="small-caps">e</span><span class="small-caps">l</span><span class="small-caps">l</span></span>, designer and implementer of the initial Fortran version, did not produce
convincing evidence of the benefit... The drawback is the additional CPU time
required.
<p class="indent"> This pass was extended to C by Mehdi <span class="cmcsc-10">A<span class="small-caps">m</span><span class="small-caps">i</span><span class="small-caps">n</span><span class="small-caps">i</span> </span>in 2009-2010, but it is
not yet stabilized. For C code, this pass requires that effects be computed
with property <span class="obeylines-h"><span class="verb"><span class="cmtt-10">MEMORY_EFFECTS_ONLY</span></span></span> set to <span class="obeylines-h"><span class="verb"><span class="cmtt-10">FALSE</span></span></span>, because the
DG must include arcs for declarations, which are now separate
statements.
<p class="indent">   <span class="cmtt-10">CLEAN_DECLARATIONS</span> <span class="cmbx-10">??</span> is automatically applied at the end, which is why cumulated
effects are needed.
<p class="indent"> Comments from Nga <span class="cmcsc-10">N<span class="small-caps">g</span><span class="small-caps">u</span><span class="small-caps">y</span><span class="small-caps">e</span><span class="small-caps">n</span></span>: According to <span class="cite">[<span class="cmbx-10">?</span>]</span> p. 595, and <span class="cite">[<span class="cmbx-10">?</span>]</span> p. 592, a variable
is <span class="cmti-10">dead </span>if it is not used on any path from the location in the code where it is defined
to the exit point of the routine in question; an instruction is <span class="cmti-10">dead </span>if it
computes only values that are not used on any executable path leading from the
instruction. The transformation that identifies and removes such dead code is
called dead code elimination. So in fact, the <span class="cmti-10">Use-def elimination </span>pass in
PIPS is a <span class="cmti-10">Dead code elimination </span>pass and the <span class="cmti-10">Suppress dead code </span>pass (see
Section <a href="#x1-2400008.3.1">8.3.1</a>) does not have a standard name. It could be called an <span class="cmti-10">If and loop simplification</span>
pass.
<p class="indent">   The following properties force some functions to be kept by the
algorithm: <span class="cmtt-10">DEAD_CODE_ELIMINATION_KEEP_FUNCTIONS</span> <a href="#x1-2420008.3.2">8.3.2</a> expects a space-separated
list of function names, while <span class="cmtt-10">DEAD_CODE_ELIMINATION_KEEP_FUNCTIONS_PREFIX</span> <a href="#x1-2420008.3.2">8.3.2</a> expects
a space-separated list of function name prefixes.
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-242003r1"></a></span><span class="cmtt-9">DEAD_CODE_ELIMINATION_KEEP_FUNCTIONS</span><span class="cmtt-9"> </span><span class="listings-nested"><span class="cmtt-9">"</span><span class="cmtt-9">"</span></span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-242004r2"></a></span></pre>
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-242005r1"></a></span><span class="cmtt-9">DEAD_CODE_ELIMINATION_KEEP_FUNCTIONS_PREFIX</span><span class="cmtt-9"> </span><span class="listings-nested"><span class="cmtt-9">"</span><span class="cmtt-9">"</span></span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-242006r2"></a></span></pre>
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> dead_code_elimination</span><span class="cmtt-10"> ’Dead</span><span class="cmtt-10"> Code</span><span class="cmtt-10"> Elimination’</span>
<br /><span class="cmtt-10">dead_code_elimination</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.callees</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.proper_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.cumulated_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.chains</span>
</div>
</div>
<p class="noindent">For backward compatibility, the previous pass name is preserved.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> use_def_elimination</span><span class="cmtt-10"> ’Use-Def</span><span class="cmtt-10"> elimination’</span>
<br /><span class="cmtt-10">use_def_elimination</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.proper_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.cumulated_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.chains</span>
</div>
</div>
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.3.3 </span> <a id="x1-2430008.3.3"></a>Control Restructurers</h4>
<a id="dx1-243001"></a>
<a id="dx1-243002"></a>
<p class="noindent">Two control restructurers are available: <span class="cmtt-10">unspaghettify</span> <a href="#x1-2440008.3.3.1">8.3.3.1</a>, which is used by default in
conjunction with <span class="cmtt-10">controlizer</span> <a href="#x1-520004.3">4.3</a>, and <span class="cmtt-10">restructure_control</span> <a href="#x1-2450008.3.3.2">8.3.3.2</a>, which must be explicitly
applied<span class="footnote-mark"><a href="https://pips4u.org/doc/pipsmake-rc.htdoc/pipsmake-rc41.html#fn2x9"><sup class="textsuperscript">2</sup></a></span><a id="x1-243003f2"></a>.
<h5 class="subsubsectionHead"><span class="titlemark">8.3.3.1 </span> <a id="x1-2440008.3.3.1"></a>Unspaghettify</h5>
<a id="dx1-244001"></a>
<a id="dx1-244002"></a>
<a id="dx1-244003"></a>
<a id="dx1-244004"></a>
<a id="dx1-244005"></a>
<p class="noindent">The <span class="cmti-10">unspaghettifier </span>is a heuristic that cleans up and simplifies the control graphs of a
module. It is useful because the controlizer (see Section <a href="#x1-520004.3">4.3</a>) and some transformation
phases can generate <span class="cmti-10">spaghetti </span>code containing a lot of useless unstructured code,
which can confuse other parts of <span class="cmtt-10">PIPS</span>. Dead code elimination, for example,
uses <span class="cmtt-10">unspaghettify</span> <a href="#x1-2440008.3.3.1">8.3.3.1</a>.
<p class="indent"> This control restructuring transformation can be automatically applied in the
<span class="cmtt-10">controlizer</span> <a href="#x1-520004.3">4.3</a> phase (see Section <a href="#x1-520004.3">4.3</a>) if the <span class="cmtt-10">UNSPAGHETTIFY_IN_CONTROLIZER</span> <a href="#x1-520004.3">4.3</a>
property is true.
<p class="indent"> To add flexibility, the behavior of <span class="cmtt-10">unspaghettify</span> <a href="#x1-2440008.3.3.1">8.3.3.1</a> is controlled
by the properties <span class="cmtt-10">UNSPAGHETTIFY_TEST_RESTRUCTURING</span> <a href="#x1-2440008.3.3.1">8.3.3.1</a> and
<span class="cmtt-10">UNSPAGHETTIFY_RECURSIVE_DECOMPOSITION</span> <a href="#x1-2440008.3.3.1">8.3.3.1</a>. They allow more of the restructuring
performed by <span class="cmtt-10">restructure_control</span> <a href="#x1-2450008.3.3.2">8.3.3.2</a> to be added, for example, in the <span class="cmtt-10">controlizer</span> <a href="#x1-520004.3">4.3</a>.
<p class="indent"> This function was designed and implemented by Ronan <span class="cmcsc-10">K<span class="small-caps">e</span><span class="small-caps">r</span><span class="small-caps">y</span><span class="small-caps">e</span><span class="small-caps">l</span><span class="small-caps">l</span></span>.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> unspaghettify</span><span class="cmtt-10"> ’Unspaghettify</span><span class="cmtt-10"> the</span><span class="cmtt-10"> Control</span><span class="cmtt-10"> Graph’</span>
</div>
</div>
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">unspaghettify</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">The following property displays statistics about <span class="cmtt-10">unspaghettify</span> <a href="#x1-2440008.3.3.1">8.3.3.1</a> and control graph
restructuring (<span class="cmtt-10">restructure_control</span> <a href="#x1-2450008.3.3.2">8.3.3.2</a>):
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-244006r1"></a></span><span class="cmtt-9">UNSPAGHETTIFY_DISPLAY_STATISTICS</span><span class="cmtt-9"> </span><span class="cmtt-9">TRUE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-244007r2"></a></span></pre>
<p class="indent"> The following option enables the use of IF/THEN/ELSE restructuring when
applying unspaghettify:
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-244008r1"></a></span><span class="cmtt-9">UNSPAGHETTIFY_TEST_RESTRUCTURING</span><span class="cmtt-9"> </span><span class="cmtt-9">FALSE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-244009r2"></a></span></pre>
<p class="noindent">It is assumed to be true for <span class="cmtt-10">restructure_control</span> <a href="#x1-2450008.3.3.2">8.3.3.2</a>. It recursively implements
TEST restructuring (replacing IF/THEN/ELSE written with GOTOs by structured
IF/THEN/ELSE without any GOTOs when possible) by applying pattern matching
methods.
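<p class="indent"> As an illustration of what TEST restructuring aims at, the sketch below shows the same
computation written first as an IF/THEN/ELSE with GOTOs and then as a structured IF/THEN/ELSE.
It is hand-written C, not PIPS output: the function names, the labels and the
<span class="cmtt-10">check_test_restructuring</span> helper are assumptions made for this example, and the patterns
actually recognized by PIPS may differ.

```c
#include <assert.h>

/* Unstructured form: an IF/THEN/ELSE expressed with GOTOs. */
int abs_diff_goto(int a, int b)
{
    int r;
    if (a >= b) goto then_branch;
    r = b - a;              /* ELSE part */
    goto join;
then_branch:
    r = a - b;              /* THEN part */
join:
    return r;
}

/* Structured form: the same test without any GOTO. */
int abs_diff_structured(int a, int b)
{
    int r;
    if (a >= b)
        r = a - b;
    else
        r = b - a;
    return r;
}

/* Hypothetical helper: both forms agree on a small range of inputs. */
void check_test_restructuring(void)
{
    for (int a = -3; a <= 3; a++)
        for (int b = -3; b <= 3; b++)
            assert(abs_diff_goto(a, b) == abs_diff_structured(a, b));
}
```

<p class="indent"> Calling <span class="cmtt-10">check_test_restructuring()</span> from a test driver checks that the two forms compute
the same result, which is the property a restructuring transformation has to preserve.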
<p class="indent"> The following option enables the use of control graph hierarchisation when
applying unspaghettify:
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-244010r1"></a></span><span class="cmtt-9">UNSPAGHETTIFY_RECURSIVE_DECOMPOSITION</span><span class="cmtt-9"> </span><span class="cmtt-9">FALSE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-244011r2"></a></span></pre>
<p class="noindent">It is assumed to be true for <span class="cmtt-10">restructure_control</span> <a href="#x1-2450008.3.3.2">8.3.3.2</a>. It implements a recursive
decomposition of the control flow graph by an interval graph partitioning
method.
<p class="indent"> The restructurer can recover some while loops if this property is set:
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-244012r1"></a></span><span class="cmtt-9">UNSPAGHETTIFY_WHILE_RECOVER</span><span class="cmtt-9"> </span><span class="cmtt-9">FALSE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-244013r2"></a></span></pre>
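<p class="indent"> To make the idea of while-loop recovery concrete, here is a minimal hand-written sketch of a
cycle built from a test and GOTOs, next to the while-loop it is equivalent to. Which control
graph patterns PIPS actually recovers is not specified here; the function names, labels and the
<span class="cmtt-10">check_while_recovery</span> helper are assumptions made for this example.

```c
#include <assert.h>

/* Unstructured form: a loop made of a test and two GOTOs. */
int sum_below_goto(int n)
{
    int i = 0, s = 0;
loop_head:
    if (i >= n) goto loop_exit;
    s += i;
    i++;
    goto loop_head;
loop_exit:
    return s;
}

/* Structured form: the same cycle written as a while-loop. */
int sum_below_while(int n)
{
    int i = 0, s = 0;
    while (i < n) {
        s += i;
        i++;
    }
    return s;
}

/* Hypothetical helper: both forms agree, including for empty loops. */
void check_while_recovery(void)
{
    for (int n = -2; n <= 10; n++)
        assert(sum_below_goto(n) == sum_below_while(n));
}
```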
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.3.2 </span> <a id="x1-2450008.3.3.2"></a>Restructure Control</h5>
<p class="noindent"><span class="cmtt-10">restructure_control</span> <a href="#x1-2450008.3.3.2">8.3.3.2</a> is a more complete restructuring phase that is useful
to improve the accuracy of various <span class="cmtt-10">PIPS </span>phases.
<p class="indent"> It is implemented by calling <span class="cmtt-10">unspaghettify</span> <a href="#x1-2440008.3.3.1">8.3.3.1</a> (<span class="cmsy-10">§</span> <a href="#x1-2440008.3.3.1">8.3.3.1</a>)
with the properties <span class="cmtt-10">UNSPAGHETTIFY_TEST_RESTRUCTURING</span> <a href="#x1-2440008.3.3.1">8.3.3.1</a> and
<span class="cmtt-10">UNSPAGHETTIFY_RECURSIVE_DECOMPOSITION</span> <a href="#x1-2440008.3.3.1">8.3.3.1</a> set to <span class="cmtt-10">TRUE</span>.
<p class="indent"> Other restructuring methods are available in <span class="cmtt-10">PIPS </span>with the TOOLPACK’s
restructurer (see Section <a href="#x1-2520008.3.4">8.3.4</a>).
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> restructure_control</span><span class="cmtt-10"> ’Restructure</span><span class="cmtt-10"> the</span><span class="cmtt-10"> Control</span><span class="cmtt-10"> Graph’</span>
<br />
<br /><span class="cmtt-10">restructure_control</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.3.3 </span> <a id="x1-2460008.3.3.3"></a>For-loop recovering</h5>
<p class="noindent">This control-flow transformation tries to recover for-loops from while-loops. It is useful to
run it after other transformations.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> recover_for_loop</span><span class="cmtt-10"> ’Recover</span><span class="cmtt-10"> for-loops</span><span class="cmtt-10"> from</span><span class="cmtt-10"> while-loops’</span>
<br />
<br /><span class="cmtt-10">recover_for_loop</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.transformers</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.summary_transformer</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.proper_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.cumulated_effects</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.summary_effects</span>
</div>
</div>
<p class="noindent">This phase cannot be called from inside the control restructurer since it needs many
higher-level analyses. This is why it is a separate phase.
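<p class="indent"> The sketch below illustrates the kind of rewriting meant by for-loop recovery: a while-loop
whose index is initialized before the loop and incremented at the end of the body is equivalent to
a for-loop. It is hand-written C with hypothetical names, not PIPS output. The rewriting is only
valid because the body modifies neither the index nor the bound, which is presumably the kind of
fact the effect and transformer resources listed above help to establish.

```c
#include <assert.h>

/* While-loop form: initialization, test and increment are scattered. */
long sum_array_while(const int *a, int n)
{
    long s = 0;
    int i = 0;
    while (i < n) {
        s += a[i];
        i++;
    }
    return s;
}

/* Recovered for-loop form: the same three parts gathered in the header. */
long sum_array_for(const int *a, int n)
{
    long s = 0;
    int i;
    for (i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical helper: both forms agree on every prefix of a sample array. */
void check_for_recovery(void)
{
    const int data[] = {4, -1, 7, 10};
    for (int n = 0; n <= 4; n++)
        assert(sum_array_while(data, n) == sum_array_for(data, n));
}
```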
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.3.4 </span> <a id="x1-2470008.3.3.4"></a>For-loop to do-loop transformation</h5>
<p class="noindent">Since some PIPS transformations and analyses are more precise for Fortran code,
this transformation tries to transform C-like for-loops into Fortran-like
do-loops.
<p class="noindent">Don’t worry about the C-code output: the prettyprinter outputs do-loops as for-loops if
the C output is selected. The do-loop construct is interesting since the iteration set is
computed at the loop entry (for example, it is not sensitive to modifications of the index
from inside the loop), and this simplifies abstract interpretation a
lot.
<p class="indent"> For example, this transformation transforms a
<div class="lstlisting" id="listing-15"><span class="label"><a id="x1-247001r1"></a></span><span class="cmbx-10">for</span> (i = lb; i <span class="cmmi-10"><</span> ub; i += stride) <br /><span class="label"><a id="x1-247002r2"></a></span>  body;
</div>
<p class="indent"> into a
<div class="lstlisting" id="listing-16"><span class="label"><a id="x1-247003r1"></a></span><span class="cmbx-10">do</span> i = lb, ub <span class="cmsy-10">-</span> 1, stride <br /><span class="label"><a id="x1-247004r2"></a></span>  body <br /><span class="label"><a id="x1-247005r3"></a></span><span class="cmbx-10">end</span> <span class="cmbx-10">do</span>
</div>
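<p class="indent"> The equivalence between the two loops above can be checked on their iteration counts. The
sketch below assumes a strictly positive stride and a body that modifies neither the index nor
the bounds. It computes the trip count of the do-loop once, at loop entry, and compares it with
the number of iterations of the for-loop. The function names are hypothetical and this is not
PIPS code.

```c
#include <assert.h>

/* Trip count of "do i = lb, ub - 1, stride", fixed at loop entry.
   Assumption of this sketch: stride > 0. */
long do_trip_count(long lb, long ub, long stride)
{
    long last = ub - 1;
    assert(stride > 0);
    if (last < lb)
        return 0;
    return (last - lb) / stride + 1;
}

/* Number of iterations of "for (i = lb; i < ub; i += stride)",
   obtained by actually running the loop with an empty body. */
long for_trip_count(long lb, long ub, long stride)
{
    long n = 0;
    assert(stride > 0);
    for (long i = lb; i < ub; i += stride)
        n++;
    return n;
}

/* Hypothetical helper: both counts agree on a small grid of bounds. */
void check_do_loop_equivalence(void)
{
    for (long lb = -3; lb <= 3; lb++)
        for (long ub = -3; ub <= 8; ub++)
            for (long stride = 1; stride <= 4; stride++)
                assert(do_trip_count(lb, ub, stride)
                       == for_trip_count(lb, ub, stride));
}
```

<p class="indent"> If the body of the for-loop modified the index, the second function would no longer describe
its behavior, which is why such loops cannot be turned into do-loops this simply.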
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> for_loop_to_do_loop</span><span class="cmtt-10"> ’For-loop</span><span class="cmtt-10"> to</span><span class="cmtt-10"> do-loop</span><span class="cmtt-10"> transformation’</span>
<br />
<br /><span class="cmtt-10">for_loop_to_do_loop</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.3.5 </span> <a id="x1-2480008.3.3.5"></a>For-loop to while-loop transformation</h5>
<p class="noindent">Since some PIPS transformations and analyses may not be implemented for C
for-loops but may be implemented for while-loops, it is useful to have this for-loop
to while-loop desugaring transformation.
<p class="noindent">This transformation transforms a
<div class="lstlisting" id="listing-17"><span class="label"><a id="x1-248001r1"></a></span><span class="cmbx-10">for</span> (init; cond; update) <br /><span class="label"><a id="x1-248002r2"></a></span>  body;
</div>
<p class="indent"> into a
<div class="lstlisting" id="listing-18"><span class="label"><a id="x1-248003r1"></a></span><span class="cmsy-10">{</span> <br /><span class="label"><a id="x1-248004r2"></a></span>  init; <br /><span class="label"><a id="x1-248005r3"></a></span>  <span class="cmbx-10">while</span>(cond) <span class="cmsy-10">{</span> <br /><span class="label"><a id="x1-248006r4"></a></span>    body; <br /><span class="label"><a id="x1-248007r5"></a></span>    update; <br /><span class="label"><a id="x1-248008r6"></a></span>  <span class="cmsy-10">}</span> <br /><span class="label"><a id="x1-248009r7"></a></span><span class="cmsy-10">}</span>
</div>
<p class="indent"> Since analyses are more precise on do-loops, you should apply a
<span class="cmtt-10">for_loop_to_do_loop</span> <a href="#x1-2470008.3.3.4">8.3.3.4</a> transformation first, and only afterwards apply this
<span class="cmtt-10">for_loop_to_while_loop</span> <a href="#x1-2480008.3.3.5">8.3.3.5</a> transformation, which will transform the remaining
for-loops into while-loops.
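<p class="indent"> As a concrete instance of this desugaring, the sketch below uses a for-loop that walks a
character string, a loop that has no obvious do-loop equivalent, and rewrites it by hand following
the init/cond/body/update scheme shown above. The function names are hypothetical. The body
contains no <span class="cmtt-10">continue</span> statement: with one, this naive rewriting would skip the update, so a
real implementation needs extra care that is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Original for-loop: for (init; cond; update) body; */
size_t length_for(const char *s)
{
    size_t len = 0;
    const char *p;
    for (p = s; *p != '\0'; p++)
        len++;
    return len;
}

/* Desugared form: { init; while (cond) { body; update; } } */
size_t length_while(const char *s)
{
    size_t len = 0;
    const char *p;
    {
        p = s;                  /* init */
        while (*p != '\0') {    /* cond */
            len++;              /* body */
            p++;                /* update */
        }
    }
    return len;
}

/* Hypothetical helper: both forms agree on a few sample strings. */
void check_for_to_while(void)
{
    const char *samples[] = {"", "a", "pips", "for-loop"};
    for (int k = 0; k < 4; k++)
        assert(length_for(samples[k]) == length_while(samples[k]));
}
```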
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> for_loop_to_while_loop</span><span class="cmtt-10"> ’For-loop</span><span class="cmtt-10"> to</span><span class="cmtt-10"> while-loop</span><span class="cmtt-10"> transformation’</span>
<br />
<br /><span class="cmtt-10">for_loop_to_while_loop</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.3.6 </span> <a id="x1-2490008.3.3.6"></a>Do-while to while-loop transformation</h5>
<p class="noindent">Some transformations only work on while loops, thus it is useful to have this
transformation that transforms a
<div class="lstlisting" id="listing-19"><span class="label"><a id="x1-249001r1"></a></span><span class="cmbx-10">do</span> <span class="cmsy-10">{</span> <br /><span class="label"><a id="x1-249002r2"></a></span>  body; <br /><span class="label"><a id="x1-249003r3"></a></span><span class="cmsy-10">}</span> <span class="cmbx-10">while</span> (cond);
</div>
<p class="indent"> into a
<div class="lstlisting" id="listing-20"><span class="label"><a id="x1-249004r1"></a></span><span class="cmsy-10">{</span> <br /><span class="label"><a id="x1-249005r2"></a></span>  body; <br /><span class="label"><a id="x1-249006r3"></a></span><span class="cmsy-10">}</span> <br /><span class="label"><a id="x1-249007r4"></a></span><span class="cmbx-10">while</span> (cond) <span class="cmsy-10">{</span> <br /><span class="label"><a id="x1-249008r5"></a></span>  body; <br /><span class="label"><a id="x1-249009r6"></a></span><span class="cmsy-10">}</span>
</div>
<p class="indent"> This transformation is useful, for example, before recovering for-loops from while-loops
(see <span class="cmsy-10">§</span> <a href="#x1-2460008.3.3.3">8.3.3.3</a>).
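<p class="indent"> The scheme above duplicates the body once in front of the loop. The hand-written sketch
below applies it to a small digit-counting function to show that the result is unchanged, in
particular when the loop body must run exactly once. The function names are hypothetical and
this is not PIPS output.

```c
#include <assert.h>

/* Original form: do { body; } while (cond); */
int digits_dowhile(unsigned n)
{
    int d = 0;
    do {
        d++;
        n /= 10;
    } while (n != 0);
    return d;
}

/* Transformed form: { body; } while (cond) { body; } */
int digits_while(unsigned n)
{
    int d = 0;
    {
        d++;
        n /= 10;
    }
    while (n != 0) {
        d++;
        n /= 10;
    }
    return d;
}

/* Hypothetical helper: both forms agree on a range of values. */
void check_dowhile_to_while(void)
{
    for (unsigned n = 0; n <= 1200; n++)
        assert(digits_dowhile(n) == digits_while(n));
}
```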
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> dowhile_to_while</span><span class="cmtt-10"> ’Do-while</span><span class="cmtt-10"> to</span><span class="cmtt-10"> while-loop</span><span class="cmtt-10"> transformation’</span>
<br />
<br /><span class="cmtt-10">dowhile_to_while</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.3.7 </span> <a id="x1-2500008.3.3.7"></a>Spaghettify</h5>
<a id="dx1-250001"></a>
<a id="dx1-250002"></a>
<a id="dx1-250003"></a>
<a id="dx1-250004"></a>
<a id="dx1-250005"></a>
<p class="noindent"><span class="cmtt-10">spaghettify</span> <a href="#x1-2500008.3.3.7">8.3.3.7</a> is used in the context of the PHRASE project while creating
“Finite State Machine”-like code portions in order to synthesize them in
reconfigurable units.
<p class="indent"> This phase transforms structured code portions (e.g. loops) into unstructured
statements.
<p class="indent"> <span class="cmtt-10">spaghettify</span> <a href="#x1-2500008.3.3.7">8.3.3.7</a> transforms the module into unstructured code with
hierarchical unstructured portions of code corresponding to the old control flow
structures.
<p class="indent"> To add flexibility, the behavior of <span class="cmtt-10">spaghettify</span> <a href="#x1-2500008.3.3.7">8.3.3.7</a> is controlled by the
properties
<ul class="itemize1">
<li class="itemize">DESTRUCTURE_TESTS
</li>
<li class="itemize">DESTRUCTURE_LOOPS
</li>
<li class="itemize">DESTRUCTURE_WHILELOOPS
</li>
<li class="itemize">DESTRUCTURE_FORLOOPS</li></ul>
<p class="noindent">to allow more or less destruction power.
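<p class="indent"> To give an idea of what destructuring means, the sketch below shows a structured for-loop
containing a test, and a hand-written unstructured equivalent in which both constructs have been
replaced by labels and GOTOs. This only illustrates the principle: the function names and labels
are hypothetical, and the unstructured code actually built by PIPS may be organized differently.

```c
#include <assert.h>

/* Structured form: a for-loop containing a test. */
int count_even_structured(const int *a, int n)
{
    int c = 0;
    for (int i = 0; i < n; i++) {
        if (a[i] % 2 == 0)
            c++;
    }
    return c;
}

/* Unstructured form: the loop and the test are expressed with GOTOs. */
int count_even_unstructured(const int *a, int n)
{
    int c = 0;
    int i = 0;
loop_test:
    if (!(i < n)) goto loop_exit;
    if (!(a[i] % 2 == 0)) goto loop_step;
    c++;
loop_step:
    i++;
    goto loop_test;
loop_exit:
    return c;
}

/* Hypothetical helper: both forms agree on every prefix of a sample array. */
void check_destructuring(void)
{
    const int data[] = {2, 3, -4, 7, 0, 9};
    for (int n = 0; n <= 6; n++)
        assert(count_even_structured(data, n)
               == count_even_unstructured(data, n));
}
```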
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> spaghettify</span><span class="cmtt-10"> ’Spaghettify</span><span class="cmtt-10"> the</span><span class="cmtt-10"> Control</span><span class="cmtt-10"> Graph’</span>
</div>
</div>
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">spaghettify</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">These properties allow fine-tuning of the <span class="cmtt-10">spaghettify</span> <a href="#x1-2500008.3.3.7">8.3.3.7</a> phase:
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-250006r1"></a></span><span class="cmtt-9">DESTRUCTURE_TESTS</span><span class="cmtt-9"> </span><span class="cmtt-9">TRUE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-250007r2"></a></span></pre>
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-250008r1"></a></span><span class="cmtt-9">DESTRUCTURE_LOOPS</span><span class="cmtt-9"> </span><span class="cmtt-9">TRUE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-250009r2"></a></span></pre>
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-250010r1"></a></span><span class="cmtt-9">DESTRUCTURE_WHILELOOPS</span><span class="cmtt-9"> </span><span class="cmtt-9">TRUE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-250011r2"></a></span></pre>
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-250012r1"></a></span><span class="cmtt-9">DESTRUCTURE_FORLOOPS</span><span class="cmtt-9"> </span><span class="cmtt-9">TRUE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-250013r2"></a></span></pre>
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.3.8 </span> <a id="x1-2510008.3.3.8"></a>Full Spaghettify</h5>
<p class="noindent"><span class="cmtt-10">spaghettify</span> <a href="#x1-2500008.3.3.7">8.3.3.7</a> is used in the context of the PHRASE project while creating “Finite
State Machine”-like code portions in order to synthesize them in reconfigurable
units.
<p class="indent"> This phase transforms the whole module into a unique flat unstructured statement.
<p class="indent"> Whereas <span class="cmtt-10">spaghettify</span> <a href="#x1-2500008.3.3.7">8.3.3.7</a> transforms the module into unstructured code
with hierarchical unstructured portions of code corresponding to the old structures,
<span class="cmtt-10">full_spaghettify</span> <a href="#x1-2510008.3.3.8">8.3.3.8</a> transforms the code into a sequence statement with a
beginning statement, a unique and flattened unstructured (all the unstructured and
sequences are flattened), and a final statement.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> full_spaghettify</span><span class="cmtt-10"> ’Spaghettify</span><span class="cmtt-10"> the</span><span class="cmtt-10"> Control</span><span class="cmtt-10"> Graph</span><span class="cmtt-10"> for</span><span class="cmtt-10"> the</span><span class="cmtt-10"> entire</span><span class="cmtt-10"> module’</span>
</div>
</div>
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">full_spaghettify</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.3.4 </span> <a id="x1-2520008.3.4"></a>Control Structure Normalisation (STF)</h4>
<a id="dx1-252001"></a>
<a id="dx1-252002"></a>
<p class="noindent">Transformation <span class="cmtt-10">stf</span> <a href="#x1-2520008.3.4">8.3.4</a> is a C interface to a Shell script used to restructure a
Fortran program using ISTST (via the combined tool fragment ISTLY =
ISTLX/ISTYP and then ISTST) from TOOLPACK <span class="cite">[<span class="cmbx-10">?</span>, <span class="cmbx-10">?</span>]</span>.
<p class="indent"> Be careful: since TOOLPACK is written in Fortran, you need the Fortran runtime
libraries to run STF if it has not been statically compiled.
<p class="indent"> Known bug/feature: <span class="cmtt-10">stf</span> <a href="#x1-2520008.3.4">8.3.4</a> does not change resource <span class="cmtt-10">code </span>like other
transformations, but the <span class="cmtt-10">source </span>file. Transformations applied before <span class="cmtt-10">stf</span> <a href="#x1-2520008.3.4">8.3.4</a> are
lost. This should be changed in the near future.
<p class="indent"> This transformation is now assumed redundant with respect to the native PIPS
control restructurers that deal with other languages too.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> stf</span><span class="cmtt-10"> ’Restructure</span><span class="cmtt-10"> with</span><span class="cmtt-10"> STF’</span>
<br /><span class="cmtt-10">stf</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.source_file</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.source_file</span>
</div>
</div>
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.3.5 </span> <a id="x1-2530008.3.5"></a>Trivial Test Elimination</h4>
<a id="dx1-253001"></a>
<p class="noindent">Function <span class="cmtt-10">suppress_trivial_test</span> <a href="#x1-2530008.3.5">8.3.5</a> deletes the TRUE branch of a trivial
test instruction. After applying <span class="cmtt-10">suppress_trivial_test</span> <a href="#x1-2530008.3.5">8.3.5</a>, the condition of the new
test instruction is the condition corresponding to the FALSE branch of the initial
test.
<p class="indent"> This function was designed and implemented by Trinh Quoc <span class="cmcsc-10">A<span class="small-caps">n</span><span class="small-caps">h</span></span>.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> suppress_trivial_test</span><span class="cmtt-10"> ’Trivial</span><span class="cmtt-10"> Test</span><span class="cmtt-10"> Elimination’</span>
<br /><span class="cmtt-10">suppress_trivial_test</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.3.6 </span> <a id="x1-2540008.3.6"></a>Finite State Machine Generation</h4>
<a id="dx1-254001"></a>
<a id="dx1-254002"></a>
<p class="noindent">These phases are used for the PHRASE project.
<p class="indent"> NB: The PHRASE project is an attempt to automatically (or semi-automatically)
transform high-level language for partial evaluation in reconfigurable logic (such as
FPGAs or DataPaths).
<p class="indent"> This library provides phases that build and modify ”Finite State
Machine”-like code portions which will later be synthesized in reconfigurable units.
This was implemented by Sylvain <span class="cmcsc-10">G<span class="small-caps">u</span><span class="small-caps">é</span><span class="small-caps">r</span><span class="small-caps">i</span><span class="small-caps">n</span></span>.
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.6.1 </span> <a id="x1-2550008.3.6.1"></a>FSM Generation</h5>
<p class="noindent">This phase tries to generate a finite state machine from arbitrary code by applying
rules that number the branches of the syntax tree and use this number as the state variable of the
finite state machine.
<p class="indent"> This phase recursively transforms each UNSTRUCTURED statement into a
WHILE-LOOP statement controlled by a state variable, whose different values are
associated with the different statements.
<p class="indent"> To add flexibility, the behavior of <span class="cmtt-10">fsm_generation</span> <a href="#x1-2550008.3.6.1">8.3.6.1</a> is controlled by the
property <span class="cmtt-10">FSMIZE_WITH_GLOBAL_VARIABLE</span> <a href="#x1-2590008.3.6.5">8.3.6.5</a>, which determines whether the
same global variable (global to the current module) must be used for every FSMized
statement.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> fsm_generation</span><span class="cmtt-10"> ’FSM</span><span class="cmtt-10"> Generation’</span>
<br />
<br /><span class="cmtt-10">fsm_generation</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">To generate a hierarchical finite state machine, apply first <span class="cmtt-10">spaghettify</span> <a href="#x1-2500008.3.3.7">8.3.3.7</a>
(<span class="cmsy-10">§</span> <a href="#x1-2500008.3.3.7">8.3.3.7</a>) and then <span class="cmtt-10">fsm_generation</span> <a href="#x1-2550008.3.6.1">8.3.6.1</a>.
<p class="indent"> To generate a flat finite state machine, apply first <span class="cmtt-10">full_spaghettify</span> <a href="#x1-2510008.3.3.8">8.3.3.8</a>
(<span class="cmsy-10">§</span> <a href="#x1-2510008.3.3.8">8.3.3.8</a>) and then <span class="cmtt-10">fsm_generation</span> <a href="#x1-2550008.3.6.1">8.3.6.1</a> or use the aggregate phase
<span class="cmtt-10">full_fsm_generation</span> <a href="#x1-2560008.3.6.2">8.3.6.2</a>.
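<p class="indent"> As a minimal illustration of the WHILE-LOOP form described above, here is a hand-written Python sketch. It is not PIPS output: the function names, the state numbering and the choice of 0 as exit state are assumptions made for this example. A GCD loop is rewritten so that a state variable selects the next statement to execute.

```python
def gcd_structured(a, b):
    """Reference version with ordinary structured control flow."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_fsm(a, b):
    """Same computation driven by a state variable (hypothetical numbering)."""
    state = 1                # number of the next statement to execute
    while state != 0:        # state 0 is used here as the exit state
        if state == 1:       # state 1: evaluate the loop condition
            state = 2 if b != 0 else 0
        elif state == 2:     # state 2: execute the loop body
            a, b = b, a % b
            state = 1
    return a
```

<p class="indent"> Both functions return the same value for the same arguments; only the shape of the control flow differs.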
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.6.2 </span> <a id="x1-2560008.3.6.2"></a>Full FSM Generation</h5>
<p class="noindent">This phase tries to generate a flat finite state machine from arbitrary code by
applying rules that number the branches of the syntax tree and use this number as the state variable
of the finite state machine.
<p class="indent"> This phase transforms the whole module into FSM-like code, that is, a WHILE-LOOP
statement controlled by a state variable, whose different values are associated with the
different statements.
<p class="indent"> In fact, this phase does nothing itself but relies on pipsmake to apply the
two phases <span class="cmtt-10">full_spaghettify</span> <a href="#x1-2510008.3.3.8">8.3.3.8</a> and <span class="cmtt-10">fsm_generation</span> <a href="#x1-2550008.3.6.1">8.3.6.1</a> (<span class="cmsy-10">§</span> <a href="#x1-2550008.3.6.1">8.3.6.1</a>) in succession.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> full_fsm_generation</span><span class="cmtt-10"> ’Full</span><span class="cmtt-10"> FSM</span><span class="cmtt-10"> Generation’</span>
<br />
<br /><span class="cmtt-10">full_fsm_generation</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> !</span><span class="cmtt-10"> MODULE.full_spaghettify</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> !</span><span class="cmtt-10"> MODULE.fsm_generation</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.6.3 </span> <a id="x1-2570008.3.6.3"></a>FSM Split State</h5>
<p class="noindent">This phase is not yet implemented and does nothing right now.
<p class="indent"> This phase is meant to take a state of an FSM-like statement and split it into n new
states where the portion of code to execute is smaller.
<p class="indent"> NB: Phase <span class="cmtt-10">full_spaghettify</span> <a href="#x1-2510008.3.3.8">8.3.3.8</a> must have been applied first.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> fsm_split_state</span><span class="cmtt-10"> ’FSM</span><span class="cmtt-10"> split</span><span class="cmtt-10"> state’</span>
<br />
<br /><span class="cmtt-10">fsm_split_state</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.6.4 </span> <a id="x1-2580008.3.6.4"></a>FSM Merge States</h5>
<p class="noindent">This phase is not yet implemented and does nothing right now.
<p class="indent"> This phase is meant to take two or more states of an FSM-like statement and merge them
into a new state where the portion of code to execute is bigger.
<p class="indent"> NB: Phase <span class="cmtt-10">full_spaghettify</span> <a href="#x1-2510008.3.3.8">8.3.3.8</a> must have been applied first.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> fsm_merge_states</span><span class="cmtt-10"> ’FSM</span><span class="cmtt-10"> merge</span><span class="cmtt-10"> states’</span>
<br />
<br /><span class="cmtt-10">fsm_merge_states</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
<p class="noindent">
<h5 class="subsubsectionHead"><span class="titlemark">8.3.6.5 </span> <a id="x1-2590008.3.6.5"></a>FSM properties</h5>
<a id="dx1-259001"></a>
<a id="dx1-259002"></a>
<p class="noindent">Controls whether the same global variable (global to the current module) must be
used for every FSMized statement.
<pre class="listings"><span class="cmtt-9"> </span><br /><span class="label"><a id="x1-259003r1"></a></span><span class="cmtt-9">FSMIZE_WITH_GLOBAL_VARIABLE</span><span class="cmtt-9"> </span><span class="cmtt-9">FALSE</span>
<span class="cmtt-9"> </span><br /><span class="label"><a id="x1-259004r2"></a></span></pre>
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.3.7 </span> <a id="x1-2600008.3.7"></a>Control Counters</h4>
<a id="dx1-260001"></a>
<p class="noindent">This code instrumentation adds local integer counters to tests and loops in order to count
how many times each path is taken. This transformation may help some semantic
analyses.
<div class="alltt">
<p class="noindent"><div class="obeylines-v">
<span class="cmtt-10">alias</span><span class="cmtt-10"> add_control_counters</span><span class="cmtt-10"> ’Control</span><span class="cmtt-10"> counters’</span>
<br />
<br /><span class="cmtt-10">add_control_counters</span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> ></span><span class="cmtt-10"> MODULE.code</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> PROGRAM.entities</span>
<br /><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> </span><span class="cmtt-10"> <</span><span class="cmtt-10"> MODULE.code</span>
</div>
</div>
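<p class="indent"> The kind of instrumentation performed by control counters can be sketched by hand in Python. This is only an illustration of the idea: the counter names and the dictionary used to hold them are assumptions made for this example, not what the phase actually generates.

```python
def count_even_instrumented(values):
    """Count even values, with one counter per control path (hypothetical names)."""
    counters = {"loop_body": 0, "if_true": 0, "if_false": 0}
    evens = 0
    for v in values:
        counters["loop_body"] += 1       # one more iteration of the loop
        if v % 2 == 0:
            counters["if_true"] += 1     # TRUE branch taken
            evens += 1
        else:
            counters["if_false"] += 1    # FALSE branch taken
    return evens, counters
```

<p class="indent"> The counters satisfy simple relations, for instance the number of iterations is the sum of the two branch counters, which is the kind of information a later analysis might exploit.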
<p class="noindent">
<h3 class="sectionHead"><span class="titlemark">8.4 </span> <a id="x1-2610008.4"></a>Expression Transformations</h3>
<p class="noindent">
<h4 class="subsectionHead"><span class="titlemark">8.4.1 </span> <a id="x1-2620008.4.1"></a>Atomizers</h4>
<a id="dx1-262001"></a>
<a id="dx1-262002"></a>
<p class="noindent">Atomizer produces, or should produce, three-address like instructions, in Fortran. An
atomic instruction is an instruction that contains no more than three variables, such
as <span class="cmtt-10">A = B op C</span>. The result is a program in a low-level Fortran on which you are able
to use all the other passes of <span class="cmtt-10">PIPS</span>.
<p class="indent"> Atomizers are used to simplify the statements encountered by automatic
distribution phases. For instance, indirect addressing like <span class="cmtt-10">A(B(I)) = ...</span> is replaced
by <span class="cmtt-10">T=B(I);A(T) = ...</span>.
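The idea of three-address instructions can be illustrated with a small Python sketch. It is not the PIPS atomizer: the tuple encoding of expressions, the function names and the temporary names T1, T2, ... are assumptions made for this example.

```python
import itertools


def atomize(expr, code, fresh):
    """Return a name holding expr, appending three-address instructions to code.

    An expression is either a variable name (str) or a tuple (op, left, right).
    """
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    l = atomize(left, code, fresh)
    r = atomize(right, code, fresh)
    tmp = f"T{next(fresh)}"              # hypothetical temporary name
    code.append(f"{tmp} = {l} {op} {r}")
    return tmp


def atomize_assignment(target, expr):
    """Atomize 'target = expr', where expr is a compound (op, left, right) tuple."""
    code = []
    fresh = itertools.count(1)
    op, left, right = expr
    l = atomize(left, code, fresh)
    r = atomize(right, code, fresh)
    code.append(f"{target} = {l} {op} {r}")
    return code
```

For instance, atomize_assignment("A", ("+", ("*", "B", "C"), "D")) yields the two instructions T1 = B * C and A = T1 + D, each with at most three variables.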
8.4.1.1 General Atomizer
alias atomizer ’Atomizer’
atomizer > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
        < MODULE.dg
8.4.1.2 Limited Atomizer
This pass atomizes subscripts so that they can be converted into references for more accurate analysis.
simplify_subscripts > MODULE.code
        < PROGRAM.entities
        < MODULE.code
This pass performs a conversion from complex to real. Property SIMPLIFY_COMPLEX_USE_ARRAY_OF_STRUCTS 8.4.1.2 controls the new layout.
simplify_complex > MODULE.code
        < PROGRAM.entities
        < MODULE.code
This pass splits structures into separate variables when possible, that is, it removes the structure variable and replaces all its fields by distinct variables.
split_structures > MODULE.code
        < PROGRAM.entities
        < MODULE.code
Here is a new version of the atomizer using a small atomizer from the HPF compiler (see Section 7.3.2).
alias new_atomizer ’New Atomizer’
new_atomizer > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
An atomizer is also used by WP65 (see Section 7.3.1).
8.4.1.3 Atomizer properties
This transformation only atomizes indirect references of array access functions.
By default, simple array accesses such as X(I+2) are atomized, although this is not necessary to generate assembly code:
The purpose of the default option is to maximise common subexpression elimination.
Once a code has been atomized, you can use this transformation to generate two-address code only. It can be useful for assembly code generation.
generate_two_addresses_code > MODULE.code
        < MODULE.code
        < MODULE.cumulated_effects
        < PROGRAM.entities
8.4.2 Partial Evaluation
Function partial_eval 8.4.2 produces code where numerical constant expressions or subexpressions are replaced by their value. Using the preconditions, some variables are evaluated to an integer constant, and replaced wherever possible. They are not replaced in user function calls because Fortran uses a call-by-reference mechanism and because they might be updated by the function. For the same conservative reason, they are not replaced in intrinsic calls.
Note that symbolic constants were initially left unevaluated because they are already constant. However, users found this unfriendly because the principle of least surprise was not enforced: symbolic constants were sometimes replaced in the middle of an expression but not when the whole expression was a reference to a symbolic constant. Symbolic integer constants are now systematically replaced by their values.
Transformations simplify_control 8.3.1 and dead_code_elimination 8.3.2 should be performed after partial evaluation. It is sometimes important to run more than one partial evaluation in a row, because the first partial evaluation may linearize some initially non-linear expressions. Perfect Club benchmark ocean is a case in point.
Comments from Nga Nguyen: According to [?] and [?], the name of this optimization should be Constant-Expression Evaluation or Constant Folding for integer values. This transformation also produces error messages at compile time indicating potential errors such as division by zero.
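The following Python sketch illustrates constant folding driven by known variable values, in the spirit of the description above. It works on Python source through the standard ast module and is not the PIPS implementation: the known dictionary stands in for the preconditions, and the class and function names are assumptions made for this example. It requires Python 3.9 or later for ast.unparse.

```python
import ast
import operator

# Integer operators folded by this sketch (a deliberately small set).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.FloorDiv: operator.floordiv}


def _is_int(node):
    return isinstance(node, ast.Constant) and type(node.value) is int


class Folder(ast.NodeTransformer):
    def __init__(self, known):
        self.known = known        # variable name -> known integer value
        self.warnings = []

    def visit_Name(self, node):
        # Replace a variable that is read and whose value is known.
        if isinstance(node.ctx, ast.Load) and node.id in self.known:
            return ast.copy_location(ast.Constant(self.known[node.id]), node)
        return node

    def visit_Call(self, node):
        # Conservative choice: nothing is replaced inside calls.
        return node

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        fn = _OPS.get(type(node.op))
        if fn and _is_int(node.left) and _is_int(node.right):
            try:
                value = fn(node.left.value, node.right.value)
                return ast.copy_location(ast.Constant(value), node)
            except ZeroDivisionError:
                self.warnings.append("division by zero in " + ast.unparse(node))
        return node


def partial_eval(source, known):
    """Return the folded source text and the list of warnings."""
    folder = Folder(known)
    tree = ast.fix_missing_locations(folder.visit(ast.parse(source)))
    return ast.unparse(tree), folder.warnings
```

For instance, with n known to be 4, the statement y = (2 * 3 + n) * x is rewritten as y = 10 * x, while the argument of f(n) is left untouched.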
alias partial_eval ’Partial Eval’
partial_eval > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.proper_effects
        < MODULE.cumulated_effects
        < MODULE.preconditions
PIPS default behavior in various places is to evaluate symbolic constants. While meaningful, this approach is not source-to-source compliant, so one can set property EVAL_SYMBOLIC_CONSTANT 8.4.2 to FALSE to prevent some of those evaluations.
One can also set PARTIAL_EVAL_ALWAYS_SIMPLIFY 8.4.2 to TRUE in order to force distribution, even when it does not seem profitable.
Likewise, one can set the following property to TRUE in order to use hard-coded values for the sizes of types:
This function was implemented initially by Bruno Baron.
8.4.3 Reduction Detection
Phase Reductions detects generalized instructions and replaces them by calls to a run-time library supporting parallel reductions. It was developed by Pierre Jouvelot in CommonLISP, as a prototype, to show that NewGen data structures were language-neutral. Thus it by-passes some of the pipsmake/dbm facilities.
This phase is now obsolete, although reduction detection is critical for code restructuring and optimization. A new reduction detection phase was implemented by Fabien Coelho (see § 6.3), but it does not include a code transformation. Its result could be prettyprinted in an HPF style (FC: implementation?).
old_reductions > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
8.4.4 Reduction Replacement
replace_reduction_with_atomic 8.4.4 replaces all reductions in loops that are marked as parallel with reduction by coarse_grain_parallelization_with_reduction 7.1.6.
flag_parallel_reduced_loops_with_atomic 8.4.4 flags as parallel all loops that were detected by coarse_grain_parallelization_with_reduction 7.1.6.
For both phases, the property ATOMIC_OPERATION_PROFILE 8.4.4 controls the set of atomic operations and operands allowed. At this time only “cuda” is supported.
replace_reduction_with_atomic > MODULE.code
        > MODULE.callees
        < PROGRAM.entities
        < MODULE.code
        < MODULE.reduction_parallel_loops
        < MODULE.cumulated_reductions
flag_parallel_reduced_loops_with_atomic > MODULE.code
        > MODULE.callees
        < PROGRAM.entities
        < MODULE.code
        < MODULE.reduction_parallel_loops
        < MODULE.cumulated_reductions
Flag loops with OpenMP directives, taking reductions into account.
flag_parallel_reduced_loops_with_openmp_directives > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.reduction_parallel_loops
        < MODULE.cumulated_reductions
8.4.5 Forward Substitution
Scalars can be forward substituted. The effect is to undo already performed optimizations such as invariant code motion and common subexpression elimination, or manual atomization. However, we hope to do a better job automatically!
alias forward_substitute ’Forward Substitution’
forward_substitute > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.proper_effects
        < MODULE.dg
        < MODULE.cumulated_effects
One can set FORWARD_SUBSTITUTE_OPTIMISTIC_CLEAN 8.4.5 to TRUE in order to clean (without check) forward-substituted assignments. Use cautiously!
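A minimal Python sketch of forward substitution on a straight-line list of assignments is given below. It is an illustration only, under the assumption that no operand is redefined between a definition and its uses. The tuple encoding, the function name and the clean flag, which loosely mimics the unchecked clean-up mentioned above, are hypothetical.

```python
def forward_substitute(statements, scalars, clean=False):
    """Substitute the definitions of the given scalars into later statements.

    statements: list of (lhs, rhs) pairs, where rhs is a name, a number or a
    tuple (op, left, right).  Assumes operands are never redefined.
    """
    defs = {}
    out = []

    def subst(e):
        if isinstance(e, tuple):
            op, left, right = e
            return (op, subst(left), subst(right))
        return defs.get(e, e)

    for lhs, rhs in statements:
        rhs = subst(rhs)
        if lhs in scalars:
            defs[lhs] = rhs
            if clean:
                continue          # drop the substituted assignment, unchecked
        out.append((lhs, rhs))
    return out
```

With clean left to False the original assignments are kept, so the result is still correct if a substituted scalar is used elsewhere.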
8.4.6 Expression Substitution
This transformation was quickly developed to fulfill the need for a simple pattern matcher in PIPS. The user provides a module name through the EXPRESSION_SUBSTITUTION_PATTERN 8.4.6 property, and all expressions similar to those contained in EXPRESSION_SUBSTITUTION_PATTERN 8.4.6 are replaced by a call to this module. It is a kind of simple outlining transformation, which proves useful during simdization to recognize some idioms. Note that the pattern must contain only a single return instruction!
This phase was developed by Serge Guelton during his PhD.
alias expression_substitution ’Expression Substitution’
expression_substitution > MODULE.code
        > MODULE.callee
        < PROGRAM.entities
        < ALL.code
Set RELAX_FLOAT_ASSOCIATIVITY 8.4.6 to TRUE if you want to consider all floating point operations as really associative:
This property is used to set the one-liner module used during expression substitution. It must be the name of a module already loaded in PIPS and containing only one return instruction (the instruction to be matched).
8.4.7 Rename operator
This transformation replaces all language operators by function calls.
rename_operator > MODULE.code
        < MODULE.code
        < PROGRAM.entities
The function name is deduced from the operator name, the type of the operator arguments and a common prefix. Each function name is built using the pattern [PREFIX][OP NAME][SUFFIX] (e.g. int + int leads to op_addi). The replacement function must have been declared, otherwise a warning is issued and the operator is ignored. OP NAME is given by the following table:
Using the property RENAME_OPERATOR_OPS 8.4.7, it is possible to give a restrictive list of operator names to which operator renaming should be applied. Every operator not in this list is ignored.
Assuming that all arguments of the operator have the same type, SUFFIX is deduced using the following table:
Using the property RENAME_OPERATOR_SUFFIXES 8.4.7, it is possible to give a restrictive list of suffixes to which operator renaming should be applied. Every type not in this list is ignored.
The PREFIX is a common prefix defined by the property RENAME_OPERATOR_PREFIX 8.4.7, which is applied to every operator. It can be used to choose between multiple implementations of the same operator. The default value is op_.
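The naming pattern [PREFIX][OP NAME][SUFFIX] can be sketched in Python as follows. Only the entries implied by the op_addi example (add for +, i for int) come from the text; the function name, its parameters and the assumption that the restrictive lists hold OP NAME and SUFFIX strings are hypothetical.

```python
# Only the entries implied by the op_addi example are listed here.
OP_NAMES = {"+": "add"}
SUFFIXES = {"int": "i"}


def renamed_operator(op, arg_type, prefix="op_", ops=None, suffixes=None):
    """Return the replacement function name, or None if the operator is ignored.

    ops and suffixes model restrictive lists (assumed to hold OP NAME and
    SUFFIX strings); None means no restriction.
    """
    op_name = OP_NAMES.get(op)
    suffix = SUFFIXES.get(arg_type)
    if op_name is None or suffix is None:
        return None
    if ops is not None and op_name not in ops:
        return None
    if suffixes is not None and suffix not in suffixes:
        return None
    return prefix + op_name + suffix
```

Changing the prefix argument is how this sketch models the choice between several implementations of the same operator.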
In PIPS, a C for loop such as for(i=0; i < n; i++) is represented by a Fortran-like range-based Do loop do i = 0,n-1. Thus, the code:
will be rewritten:
If you want it to be rewritten:
you should set the property RENAME_OPERATOR_REWRITE_DO_LOOP_RANGE 8.4.7 to TRUE. This is not the default behaviour, because in most cases you do not want for loops to be rewritten like this.
Some operators (=, +=, …) take a modifiable lvalue. In this case, the expected function signature for a type T is T (T*, T). For instance, the code:
would be rewritten:
8.4.8 Array to pointer conversion
This transformation replaces all arrays in the module by equivalent linearized arrays, possibly using array/pointer equivalence.
linearize_array > MODULE.code
        > COMPILATION_UNIT.code
        > CALLERS.code
        > PROGRAM.entities
        < PROGRAM.entities
        < MODULE.code
        < COMPILATION_UNIT.code
        < CALLERS.code
This transformation replaces all arrays in the module by equivalent linearized arrays. This variant only makes the arrays start their index at one.
linearize_array_fortran > MODULE.code
        > CALLERS.code
        > PROGRAM.entities
        < PROGRAM.entities
        < MODULE.code
        < CALLERS.code
Use LINEARIZE_ARRAY_USE_POINTERS 8.4.8 to control whether arrays are declared as 1D arrays or pointers. Pointers are accessed by dereferencing and arrays by subscripts. This property does not apply to the Fortran case.
Use LINEARIZE_ARRAY_MODIFY_CALL_SITE 8.4.8 to control whether the call site is modified or not.
Use LINEARIZE_ARRAY_CAST_AT_CALL_SITE 8.4.8 to control whether a cast is inserted at call sites. Turning it on breaks further effects analysis, but without the cast, compilation might break or at least generate warnings for type mismatch. This property does not apply to the Fortran case.
Use LINEARIZE_ARRAY_VLA_ONLY 8.4.8 to limit this pass to C99 VLA arrays.
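The index arithmetic behind array linearization can be sketched in Python. This sketch assumes the usual C layout (row-major, zero-based indices) and does not reproduce the code generated by PIPS; the function names are hypothetical.

```python
def linear_index(indices, dims):
    """Row-major offset of a multi-dimensional, zero-based index."""
    offset = 0
    for i, n in zip(indices, dims):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        offset = offset * n + i
    return offset


def linearize(nested):
    """Flatten a 2D list of rows into a 1D list, row after row."""
    return [x for row in nested for x in row]


a = [[10 * i + j for j in range(4)] for i in range(3)]   # models int a[3][4]
flat = linearize(a)                                       # models int a[12]
```

Here a[i][j] and flat[linear_index((i, j), (3, 4))] denote the same element, that is flat[i * 4 + j].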
8.4.9 Expression Optimizations
8.4.9.1 Expression optimization using algebraic properties
This is an experimental section developed by Julien Zory as PhD work [?]. This phase aims at optimizing expression evaluation using algebraic properties such as associativity, commutativity, neutral elements and so forth.
This phase restructures arithmetic expressions in order (1) to decrease the number of operations (e.g. through factorization), (2) to increase the ILP by keeping the corresponding DAG wide enough, (3) to facilitate the detection of composite instructions such as multiply-add, and (4) to provide additional opportunities for (4a) invariant code motion (ICM) and (4b) common subexpression elimination (CSE).
Large arithmetic expressions are first built up via forward substitution when the programmer has already applied ICM and CSE by hand.
The optimal restructuring of expressions depends on the target, defined by a combination of the computer architecture and the compiler. The target is specified by a string property called EOLE_OPTIMIZATION_STRATEGY 8.4.9.1, which can take values such as "P2SC" for IBM Power-2 architecture and XLF 4.3. To activate all sub-transformations such as ICM and CSE, set it to "FULL". See properties for more information about values for this property and about other properties controlling the behavior of this phase.
The current implementation is still shaky and does not handle well expressions of mixed types such as X+1 where 1 is implicitly promoted from integer to real.
Warning: this phase relies on an external (and unavailable) binary. To make it work, you can set EOLE_OPTIMIZATION_STRATEGY 8.4.9.1 to "CSE" or "ICM", or even "ICMCSE" to have both. This will only activate common subexpression elimination or invariant code motion. Since this is a quite common use case, they have also been defined as independent phases. See 8.4.9.2.
alias optimize_expressions ’Optimize Expressions’
optimize_expressions > MODULE.code
        < PROGRAM.entities
        < MODULE.proper_effects
        < MODULE.cumulated_effects
        < MODULE.code
alias instruction_selection ’Select Instructions’
instruction_selection > MODULE.code
        < PROGRAM.entities
        < MODULE.code
EOLE: Evaluation Optimization of Loops and Expressions. This is Julien Zory's work integrated within PIPS [?]. It relies on an external tool named eole. The version and options set can be controlled from the following properties. The status is experimental. See the optimize_expressions 8.4.9.1 pass for more details about the advanced transformations performed.
8.4.9.2 Common subexpression elimination
Two interesting special cases of the phase described in § 8.4.9.1 are described here.
Run common subexpression elimination to factorize out some redundant expressions in the code. One can use COMMON_SUBEXPRESSION_ELIMINATION_SKIP_ADDED_CONSTANT 8.4.9.2 to skip expressions of the form a+2 and COMMON_SUBEXPRESSION_ELIMINATION_SKIP_LHS 8.4.9.2 to prevent elimination of the left hand side of assignments. The heuristic used for common subexpression elimination is described in Chapter 15 of Julien Zory’s PhD dissertation [?].
alias common_subexpression_elimination ’Common Subexpression Elimination’
common_subexpression_elimination > MODULE.code
        < PROGRAM.entities
        < MODULE.proper_effects
        < MODULE.cumulated_effects
        < MODULE.code
alias icm ’Invariant Code Motion’
icm > MODULE.code
        < PROGRAM.entities
        < MODULE.proper_effects
        < MODULE.cumulated_effects
        < MODULE.code
Note: icm deals with expressions while invariant_code_motion deals with loop invariant code.
The following property is used in sac to limit the subexpressions: when set to true, only subexpressions without ”+constant” terms are eligible.
Performs invariant code motion over subexpressions.
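Common subexpression elimination, as discussed in § 8.4.9.2, can be illustrated by a small Python sketch over a list of assignments. It is not the heuristic used by PIPS: the tuple encoding, the temporary names t0, t1, ... and the skip_added_constant flag, which loosely mimics skipping expressions of the form a+2, are assumptions made for this example. Operands are assumed not to be redefined between statements.

```python
from collections import Counter


def subexprs(e):
    """Yield every compound subexpression (op, left, right) of e."""
    if isinstance(e, tuple):
        _, left, right = e
        yield from subexprs(left)
        yield from subexprs(right)
        yield e


def is_added_constant(e):
    op, left, right = e
    return op == "+" and (isinstance(left, int) or isinstance(right, int))


def cse(statements, skip_added_constant=False):
    """Hoist subexpressions occurring more than once into temporaries."""
    counts = Counter(s for _, rhs in statements for s in subexprs(rhs))
    temps = {}
    out = []

    def rewrite(e):
        if not isinstance(e, tuple):
            return e
        op, left, right = e
        body = (op, rewrite(left), rewrite(right))
        eligible = not (skip_added_constant and is_added_constant(e))
        if counts[e] > 1 and eligible:
            if e not in temps:
                temps[e] = f"t{len(temps)}"     # hypothetical temporary name
                out.append((temps[e], body))
            return temps[e]
        return body

    for lhs, rhs in statements:
        new_rhs = rewrite(rhs)
        out.append((lhs, new_rhs))
    return out
```

Applied to x = (a + b) * c and y = (a + b) - d, the sketch introduces t0 = a + b and rewrites both statements to use t0.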
8.5 Hardware Accelerator
Generate code from a FREIA application possibly targeting a hardware accelerator, such as SPoC, Terapix, or GPGPU. I’m unsure about the right granularity (now it is at the function level) and the resource which is produced (should it be an accelerated file?). The current choice does not make it easy to mix different accelerators.
8.5.1 FREIA Software
Generate code for a software FREIA implementation, by applying various optimizations at the library API level, but without generating accelerated functions.
freia_aipo_compiler > MODULE.code
        > MODULE.callees
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
The following properties are generic to all FREIA accelerator targets.
Whether to label arcs in the DAG dot output with the image name, whether to label nodes with the statement number, and whether to filter out unused scalar nodes.
Whether to compile lone operations, i.e. operations which do not belong to a sequence.
Whether to normalize some operations:
Whether to simplify the DAG using algebraic properties.
Whether to remove dead image operations in the DAG. Should always be beneficial.
Whether to remove duplicate operations in the DAG, including algebraic optimizations with commutators. This should always be beneficial for Terapix, but the benefit may vary for SPoC.
Whether to remove useless image copies from the expression DAG.
Whether to move image copies within an expression DAG outside as external copies, if possible.
Whether to merge identical arguments, especially kernels, when calling an accelerated function:
Whether to attempt to reuse initial images if possible, instead of keeping possibly newly introduced temporary images.
Whether to allow shuffling image pointers. This is not allowed by default because it may lead to wrong code, as the compiler currently ignores the information and mixes up images.
Whether to assume that casts are simple image copies. Default is to keep a cast as cast, which is not accelerated.
Whether to cleanup freia returned status, as the code is assumed correct when compiled.
Assume this pixel size in bits:
If set to a non-zero value, assume this image size when generating code. If zero, try generic code. In particular, the height is useful to compute a better imagelet size when generating code for the Terapix hardware accelerator.
8.5.2 FREIA SPoC
FREIA compiler for the SPoC target. Consider applying freia_unroll_while beforehand.
freia_spoc_compiler > MODULE.code
        > MODULE.callees
        > MODULE.spoc_file
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
freia_unroll_while > MODULE.code
        < PROGRAM.entities
        < MODULE.code
Default depth of the target SPoC accelerator:
8.5.3 FREIA Terapix
FREIA compiler for the Terapix target.
freia_terapix_compiler > MODULE.code
        > MODULE.callees
        > MODULE.terapix_file
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
Number of processing elements (PE) for the Terapix accelerator:
Default size of memory, in pixels, for the Terapix accelerator (RAMPE is the RAM of a PE):
Terapix DMA bandwidth, i.e. how many Terapix cycles are needed to transfer an imagelet row (the size of which is necessarily the number of PEs):
Terapix 2D global RAM (GRAM) width and height:
Whether and how to further cut the DAG for Terapix. Expected values, in order of compilation cost, are: none, compute, enumerate.
Whether input and output memory transfers overlap with each other, i.e. whether the DMA is full duplex.
Note that it is already assumed that computations overlap with communications. This adds the information that host-accelerator loads and stores run in parallel. This has two impacts: the apparent communication time is reduced thanks to the overlapping, which is good, but the imagelet memory cannot be reused for inputs because it is still live while being stored, which is bad for memory pressure.
Use this maximum size (height) of an imagelet if set to non-zero. It may be useful to set this value to the image height (if known) so that the compiler generates code for smaller imagelets and the runtime is not surprised. This is mostly a debugging aid to impose an imagelet size.
8.5.4 FREIA OpenCL
FREIA compiler for the OpenCL target, which may run on both multi-core processors and GPUs.
freia_opencl_compiler > MODULE.code
        > MODULE.callees
        > MODULE.opencl_file
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
Whether we should attempt to generate merged OpenCL image operations. If not, the result will be similar to simple AIPO compilation: no actual helper functions will be generated.
Whether merged OpenCL image operations should include reductions as well. Added to help debugging the code.
Whether to generate OpenCL code for one operation on its own. It is not interesting to do so because it is just equivalent to the already existing AIPO implementation, but it can be useful for debugging.
8.6 Function Level Transformations
8.6.1 Inlining
Inlining is a well-known technique. Basically, it replaces a function call by the function body. The current implementation does not work if the function has static declarations, accesses global variables, etc. Actually, it seems to work for pure, non-recursive functions and not to work for any other kind of call. Property INLINING_CALLERS 8.6.1 can be set to define the list of functions where the call sites have to be inlined. By default, all call sites of the inlined function are inlined.
Only for C because of the pipsmake output declaration!
inlining > CALLERS.c_source_file
        > PROGRAM.entities
        > MODULE.callers
        ! MODULE.split_initializations
        < PROGRAM.entities
        < CALLERS.code
        < CALLERS.printed_file
        < MODULE.code
        < MODULE.cumulated_effects
        * ALL.restructure_control
        * ALL.remove_useless_label
Use the following property to control how generated variables are initialized:
Use the following property to control whether inlining should ignore stubs:
Set the following property to TRUE to add comments on inlined statements to keep track of their origin.
Same as inlining, but always simulate by-copy argument passing.
Only for C because of the pipsmake output declaration!
inlining_simple > CALLERS.c_source_file
        > PROGRAM.entities
        > MODULE.callers
        ! MODULE.split_initializations
        < PROGRAM.entities
        < CALLERS.code
        < CALLERS.printed_file
        < MODULE.code
        < MODULE.callers
        * ALL.restructure_control
        * ALL.remove_useless_label
Regenerate the ri from the ri...
Only for C because of the pipsmake output declaration!
recompile_module > MODULE.c_source_file
        < MODULE.code
The default behavior of inlining is to inline the given module in all call sites. Use the INLINING_CALLERS 8.6.1 property to filter the call sites: only the given module names will be considered.
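To illustrate what inlining does to a call site, here is a minimal Python sketch. PIPS itself works on C and Fortran code; the functions clamp, caller_before and caller_after are hypothetical and only model the transformation. The callee is pure and non-recursive, which is the case reported above as working, and the inlined version binds local copies of the arguments to mimic by-copy argument passing.

```python
# Hypothetical pure, non-recursive callee: the case inlining handles well.
def clamp(x, lo, hi):
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


def caller_before(values):
    # Original caller: one call site to clamp() per element.
    return [clamp(v, 0, 255) for v in values]


def caller_after(values):
    # Same caller after manual inlining: the call is replaced by the body.
    result = []
    for v in values:
        # By-copy argument passing: the formals become local copies.
        x, lo, hi = v, 0, 255
        if x < lo:
            r = lo
        elif x > hi:
            r = hi
        else:
            r = x
        result.append(r)
    return result
```

Both versions are expected to return the same list for any input, which is the property the real pass has to preserve.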
8.6.2 Unfolding
Unfolding is a complementary transformation of inlining 8.6.1. While inlining inlines all call sites to a given module in other modules, unfolding recursively inlines all call sites in a given module, thus unfolding the content of the module. An unfolded source code does not contain any call anymore. If you run it recursively, you should set INLINING_USE_INITIALIZATION_LIST 8.6.1 to false.
Only for C because of the output declaration in the pipsmake rule!
unfolding > MODULE.c_source_file
        > MODULE.callees
        > PROGRAM.entities
        ! CALLERS.split_initializations
        < PROGRAM.entities
        < MODULE.code
        < MODULE.printed_file
        < MODULE.cumulated_effects
        < CALLEES.code
        * ALL.restructure_control
        * ALL.remove_useless_label
Same as unfolding, but cumulated effects are not used, and the resulting code always simulates by-copy argument passing.
Only for C because of the output declaration in the pipsmake rule!
unfolding_simple > MODULE.c_source_file
        > MODULE.callees
        > PROGRAM.entities
        ! CALLERS.split_initializations
        < PROGRAM.entities
        < MODULE.code
        < MODULE.printed_file
        < CALLEES.code
        * ALL.restructure_control
        * ALL.remove_useless_label
Use UNFOLDING_CALLEES 8.6.2 to specify which modules you want to inline in the unfolded module. The unfolding will be performed as long as one of the modules in UNFOLDING_CALLEES 8.6.2 is called. More than one module can be specified; they are separated by blank spaces.
The default behavior of the unfolding 8.6.2 pass is to recursively inline all callees from the current module or from the argument modules of the pass, as long as a callee remains. You can use UNFOLDING_FILTER 8.6.2 to inline all call sites to a module not present in the space-separated module list defined by the string property:
By default this list is empty and hence all call sites are inlined.
8.6.3 Outlining
This documentation is a work in progress, as well as the documented topic.
Outlining is the opposite transformation of inlining 8.6.1. It replaces some statements in an existing module by a call site to a new function whose execution is equivalent to the execution of the replaced statements. The body of the new function is similar to the piece of code replaced in the existing module. The user is prompted for various pieces of information in order to perform the outlining:
The statements are a subset of a sequence. They are counted in the sequence.
OUTLINE_WRITTEN_SCALAR_BY_REFERENCE 8.6.3 controls whether written scalars are passed by reference or not. This property might lead to incorrect code!
outline > MODULE.code
        ! MODULE.privatize_module
        > PROGRAM.entities
        < PROGRAM.entities
        < MODULE.cumulated_effects
        < MODULE.regions
        < MODULE.code
The property OUTLINE_SMART_REFERENCE_COMPUTATION 8.6.3 is used to limit the number of entities passed by reference. With it, a[0][0] is passed as an a[n][m] entity; without it, it is passed as an int or int* depending on the cumulated read/write memory effects of the outlined statements.
If you need to pass the upper bound expression of a particular loop as a parameter, which is used in Ter@pix code generation (see Section ???), set OUTLINE_LOOP_BOUND_AS_PARAMETER 8.6.3 to the loop label.
The property OUTLINE_MODULE_NAME 8.6.3 is used to specify the new module name. The user is prompted if it is set to its default value, the empty string. But first the pass scans the code for any statement flagged with the pragma defined by the string property OUTLINE_PRAGMA 8.6.3.
If set, the string property OUTLINE_LABEL 8.6.3 is used to choose the statement to outline.
The boolean property OUTLINE_ALLOW_GLOBALS 8.6.3 controls whether global variables whose initial values are not used are passed as parameters or not. It is suggested to address this issue with a previous privatization pass.
Finally, the boolean property OUTLINE_INDEPENDENT_COMPILATION_UNIT 8.6.3 can be set to true to outline the new module into a newly created compilation unit. It is named after OUTLINE_MODULE_NAME 8.6.3. All necessary types, global variables and functions are declared in this new compilation unit.
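The effect of outlining can be pictured with a small Python sketch. It is only a model under stated assumptions: module_before, module_after and outlined_kernel are hypothetical names, and since Python has no by-reference scalars, the written scalar is returned by the new function instead, standing in for what OUTLINE_WRITTEN_SCALAR_BY_REFERENCE would control in C.

```python
def module_before(a, n):
    total = 0
    # --- statements selected for outlining ---
    for i in range(n):
        a[i] = 2 * a[i]
        total += a[i]
    # --- end of selection ---
    return total


def outlined_kernel(a, n, total):
    # New module: its body is the selected statements.
    # The array is shared with the caller; the written scalar is returned,
    # which stands in for passing a written scalar by reference.
    for i in range(n):
        a[i] = 2 * a[i]
        total += a[i]
    return total


def module_after(a, n):
    total = 0
    # The selected statements are replaced by a call site to the new module.
    total = outlined_kernel(a, n, total)
    return total
```

The parameters of the new function are exactly the variables read or written by the selected statements, which is why the real pass needs effects and regions as inputs.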
8.6.4 Cloning
Procedures can be cloned to obtain several specialized versions. The call sites must be updated to refer to the desired version.
User-assisted cloning. See examples in the clone validation suite.
alias clone ’Manual Clone’
clone > CALLERS.code
        > CALLERS.callees
        < MODULE.code
        < MODULE.callers
        < MODULE.user_file
        < CALLERS.callees
        < CALLERS.code
alias clone_substitute ’Manual Clone Substitution’
clone_substitute > CALLERS.code
        > CALLERS.callees
        < MODULE.code
        < MODULE.callers
        < MODULE.user_file
        < CALLERS.callees
        < CALLERS.code
Cloning of a subroutine according to an integer scalar argument. The argument is specified through the integer property TRANSFORMATION_CLONE_ON_ARGUMENT 8.6.4. If it is set to 0, a user request is performed.
alias clone_on_argument ’Clone On Argument’
clone_on_argument > CALLERS.code
        > CALLERS.callees
        > MODULE.callers
        < MODULE.code
        < MODULE.callers
        < MODULE.user_file
        < CALLERS.callees
        < CALLERS.preconditions
        < CALLERS.code
Non-user-assisted version of cloning: it just performs the cloning without any substitution. Use the CLONE_NAME 8.6.4 property if you want a particular clone name. It is up to another phase to perform the substitution.
alias clone_only ’Simple Clone’
clone_only < MODULE.code
        < MODULE.user_file
There are two cloning properties.
Cloning on an argument. If 0, a user request is performed.
The clone name can be given using the CLONE_NAME property. Otherwise, a new one is generated.
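As a rough illustration of cloning on an integer scalar argument, the following Python sketch shows a module, a clone specialized for one value of that argument, and a call site updated to use the clone. All names are hypothetical, and the clone body has been simplified by hand for the value 2; the text above does not say which later passes perform such simplification.

```python
def scale(values, mode):
    # Hypothetical module whose behavior depends on an integer scalar argument.
    if mode == 0:
        return list(values)
    return [mode * v for v in values]


def scale_clone_mode_2(values):
    # Clone specialized for mode == 2: the test on mode has disappeared.
    return [2 * v for v in values]


def caller_before(values):
    # Original call site: the argument value 2 is known at the call.
    return scale(values, 2)


def caller_after(values):
    # Call site updated to refer to the specialized clone.
    return scale_clone_mode_2(values)
```

Knowing the argument value at the call site is what makes the substitution legal, which presumably explains why the clone_on_argument rule reads the callers' preconditions.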
8.7 Declaration Transformations
8.7.1 Declarations cleaning
Clean the declarations of unused variables, commons and so on. It is also a code transformation, since the process updates not only the module entities but also the declaration statements, some useless writes...
alias clean_declarations ’Clean Declarations’
clean_declarations > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
In C, dynamic variables which are allocated and freed but otherwise never used can be removed. This phase removes the calls to the dynamic allocation functions (malloc and free or user-defined equivalents), and removes their declarations.
Clean unused local dynamic variables by removing malloc/free calls.
clean_unused_dynamic_variables > MODULE.code
        < PROGRAM.entities
        < MODULE.code
It may be a regular expression instead of a function name?
Detecting and forcing variables in the register storage class can help subsequent analyses, as they cannot be referenced by pointers.
force_register_declarations > PROGRAM.entities
        > MODULE.code
        < PROGRAM.entities
        < MODULE.code
Whether to allow arrays to be qualified as registers:
Whether to allow pointers to be qualified as registers:
Whether to allow formal parameters to be qualified as registers:
8.7.2 Array Resizing
One problem of Fortran code is unnormalized array bound declarations. In many programs, the programmer puts an asterisk (assumed-size array declarator), or even 1, for the upper bound of the last dimension of an array declaration. This feature affects code quality and prevents other analyses such as array bound checking and alias analysis. We developed in PIPS two new methods to automatically find the proper upper bound for unnormalized and assumed-size array declarations, a process we call array resizing. Both approaches have advantages and drawbacks, and a combination of them may be needed. To have 100% resized arrays, we also implemented a code instrumentation task, in the top-down approach. Different options to compute new declarations for different kinds of arrays are described in properties-rc.tex. You can combine the two approaches to get better results by using these options.
How to use these approaches: after generating new declarations in the logfile, you have to use the script $PIPS_ROOT/Src/Script/misc/array_resizing_instrumentation.pl to replace the unnormalized declarations and add new assignments in the source code.
8.7.2.1 Top Down Array Resizing
The method uses the relationship between actual and formal arguments from parameter-passing rules. New array declarations in the called procedure are computed with respect to the declarations in the calling procedures. It is faster than the bottom-up approach because convex array regions are not needed. This phase was implemented by Thi Viet Nga Nguyen (see [?]).
alias array_resizing_top_down ’Top Down Array Resizing’
array_resizing_top_down > MODULE.new_declarations
        > PROGRAM.entities
        < PROGRAM.entities
        < CALLERS.code
        < CALLERS.new_declarations
        < CALLERS.preconditions
8.7.2.2 Bottom Up Array Resizing
The approach is based on a convex array region analysis that gives information about the set of array elements accessed during the execution of the code. The READ and WRITE regions of each array in each module are merged, and a new value for the upper bound of the last dimension is computed; it then replaces the 1 or *. This function was first implemented by Trinh Quoc Anh, and improved by Corinne Ancourt and Thi Viet Nga Nguyen (see [?]).
alias array_resizing_bottom_up ’Bottom Up Array Resizing’
array_resizing_bottom_up > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.preconditions
        < MODULE.regions
8.7.2.3 Array Resizing Statistic
We provide here a tool to compute the number of pointer-type A(,1) and assumed-size A(,*) array declarators as well as other information.
alias array_resizing_statistic ’Array Resizing Statistic’
array_resizing_statistic > MODULE.code
        < PROGRAM.entities
        < MODULE.code
8.7.2.4 Array Resizing properties
This phase was first designed to automatically infer new array declarations for assumed-size (A(*)) and one (A(1), also called ugly assumed-size) array declarators. But it can also be used for all kinds of arrays: local or formal array arguments, unnormalized or all kinds of declarations. There are two different approaches that can be combined to get better results.
Top-down Array Resizing
There are three different options:
So the combination of the three above options gives a number from 0 to 7 (binary representation: 000, 001, ..., 111). You must pay attention to the order of the options. For example, if you want to use information from the MAIN program to compute new declarations for both assumed-size and one array declarations, the option is 4 (100). The default option is 0 (000).
Bottom-up Array Resizing
There are also three different options:
So the combination of the three above options gives a number from 0 to 7 (binary representation: 000, 001, ..., 111). You must pay attention to the order of the options. Some options exclude others, such as the option to compute new declarations for instrumented arrays (I_PIPS_MODULE_ARRAY). The default option is 0 (000).
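The way three on/off options combine into a number from 0 to 7 can be sketched as follows. This is only an illustration: the helper names are hypothetical, and it assumes that the first option in the list is the most significant bit, which is consistent with the example above where a single option yields 4 (100).

```python
def encode_options(first, second, third):
    """Combine three on/off options into a number from 0 to 7.

    Assumption: the first option is the most significant bit,
    so the order of the options matters.
    """
    return (int(first) << 2) | (int(second) << 1) | int(third)


def decode_options(value):
    """Inverse of encode_options: return the three flags as booleans."""
    if not 0 <= value <= 7:
        raise ValueError("option value must be between 0 and 7")
    return bool(value & 4), bool(value & 2), bool(value & 1)
```

With this convention, encode_options(True, False, False) gives 4 and the default, with no option set, gives 0.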
8.7.3 Scalarization
Scalarization is the process of replacing array references with references to scalar copies of the array elements wherever appropriate. Expected benefits include lower memory footprint and access time, because registers can be used instead of temporary stack variables or memory accesses, and hence shorter execution times, as long as the register pressure is not so high that spill code is generated. Scalarizing a given array reference is subject to two successive criteria, a Legality criterion and a Profitability criterion:
The legality test is based on convex array regions, and not on the dependence graph as in Carr’s algorithm. Currently, loop-carried dependence arcs prevent scalarization by PIPS, although some can be processed by Carr’s algorithm. See non-regression tests Transformations/scalarization30 to 36.
This transformation is useful 1) to improve the readability of the source code with constructs similar to let x be ..., 2) to improve the modeling of the execution, time or energy, by using source instructions closer to the machine instructions, 3) to perform an optimization at the source level because the code generator does not include a powerful (partially) redundant load elimination, and 4) to be a useful pass in a fully source-to-source compiler. This transformation is also useful to reduce the expansion caused by the different atomizer passes.
The new scalar variables use the default prefix ___scalar__ and are thus easily identified, but a new prefix can be user-defined with Property SCALARIZATION_PREFIX 8.7.3. If SCALARIZATION_PREFIX 8.7.3 is the empty string, the names of the scalarized variables are used. If needed according to the IN and OUT convex array regions, the new variables are initialized, e.g. __scalar0__ = A[i], and/or copied back into the initial array, e.g. A[i] = __scalar0__.
Scalarization is currently applicable to both Fortran and C code, but the code generation differs slightly because local declarations are possible in C. However, this may destroy perfect loop nests required by other passes. Property SCALARIZATION_PRESERVE_PERFECT_LOOP_NEST 8.7.3 can be used to preserve C perfect loop nests.
Pass scalarization 8.7.3 uses the read and written convex array regions to decide if the scalarization is possible, and the IN and OUT regions to decide if it is useful to initialize the scalar copy or to restore the value of the array element.
alias scalarization ’Scalarization’
scalarization > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.regions
        < MODULE.in_regions
        < MODULE.out_regions
Since OUT regions are computed interprocedurally, strange code may result when a small or not so small function without any real output is scalarized. The problem can be fixed by adding a PRINT or a printf to force an OUT region, or by setting Property SCALARIZATION_FORCE_OUT 8.7.3 to true.
Also, it may be useful to use Property SEMANTICS_TRUST_ARRAY_DECLARATIONS 6.8.11.2 and/or SEMANTICS_TRUST_ARRAY_REFERENCES 6.8.11.2 to make sure that as many loops as possible are entered. If not, loop bounds may act like guards, which prevents PIPS from hoisting a reference out of a loop, although this is puzzling for programmers because they expect all loops to be entered at least once...
As explained above, Property SCALARIZATION_PREFIX 8.7.3 is used to select the names of the new scalar variables. If it is set to the empty string, "", the scalarized variable name is used as prefix, which improves readability.
Also, as explained above, Property SCALARIZATION_PRESERVE_PERFECT_LOOP_NEST 8.7.3 is used to control the way new scalar variables are declared and initialized in C. The current default value is FALSE because the locality of declarations is better for automatic loop parallelization, but this could be changed as a privatization pass should do as well while preserving perfect loop nests. Note that initializations may have to be placed in such a way that a perfect loop nest is nevertheless destroyed, even with this property set to true.
Property SCALARIZATION_FORCE_OUT 8.7.3 forces the generation of a copy-out statement even when it is useless according to the OUT region, for instance when dealing with a library function.
Numerical property SCALARIZATION_THRESHOLD 8.7.3 is used to decide profitability: the estimated complexity of the scalarized code must be less than the initial complexity. Its minimal value is 2. It can be set to around 5 to forbid scalarization in sequences with no loop benefit.
Property SCALARIZATION_STRICT_MEMORY_ACCESSES 8.7.3 is used to control the insertion of new memory accesses. A memory access may be moved out of a loop, which is the best case for performance, but it is then no longer control dependent on the loop bounds. A new access is thus added if the compiler cannot prove that the loop is always entered. However, the burden of the proof may be too much, if only because nothing can be proven when the loop is sometimes entered and sometimes not. Also, the memory access may always be perfectly legal. The default value is FALSE, going for performance before safety. This property is pretty hard to understand, as well as its impact. See sequence06, 07 and 08 in the validation suite of the scalarization pass.
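The following Python sketch models what scalarization does to a simple loop nest. It is an illustration only, not PIPS output: the function names and the variable scalar0 (which mirrors the __scalar0__ naming used above) are hypothetical, and Python lists stand for the C or Fortran arrays.

```python
def row_sums_before(a, b, n, m):
    # Original loop nest: a[i] is read and written at every inner iteration.
    for i in range(n):
        for j in range(m):
            a[i] = a[i] + b[i][j]


def row_sums_after(a, b, n, m):
    # Scalarized version: a[i] is replaced by a scalar copy in the inner loop.
    for i in range(n):
        scalar0 = a[i]      # copy-in: the old value of a[i] is used (IN region)
        for j in range(m):
            scalar0 = scalar0 + b[i][j]
        a[i] = scalar0      # copy-out: a[i] is used afterwards (OUT region)
```

Note that when m is 0 the scalarized version still reads and writes a[i], whereas the original never touches it: the accesses have been moved out of the inner loop and no longer depend on its bounds. This appears to be the kind of added memory access that SCALARIZATION_STRICT_MEMORY_ACCESSES is concerned with.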
This pass was designed by François Irigoin and implemented by Laurent Daverio and François Irigoin.
Similar to scalarization 8.7.3, but with a different criterion: if an array is only accessed with numerical constant subscript expressions, it is replaced by a set of scalars and all its references are replaced by references to the corresponding scalars.
constant_array_scalarization > MODULE.code
        < PROGRAM.entities
        < MODULE.code
This pass may be useful to make C source code more palatable to C to VHDL converters that prefer dealing with scalars rather than arrays. It is also useful to propagate constants contained in an array, since the semantics passes only analyze scalar variables. This pass was designed and implemented by Serge Guelton.
Pass scalarization 8.7.3 may be costly because it relies on array region analyses and uses them to check its legality criteria. The next phase alleviates these drawbacks. This pass relies solely on proper and cumulated effects, and as such may fail to scalarize some accesses. However, it is expected to give good results in current cases, especially after loop_fusion 8.1.6. It basically uses the same algorithm as scalar privatization, but operates on the dependence graph rather than on the chains graph for more precision about array dependences. Several legality criteria are then tested to ensure the safety of the transformation. In particular, it is checked that candidate references are not accessed through hidden references (for instance in calls), and that only one kind of reference is scalarized in the loop.
alias quick_scalarization ’Quick Scalarization’
quick_scalarization > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
        < MODULE.proper_effects
        < MODULE.dg
8.7.4 Induction Variable Substitution
Induction substitution is the process of replacing scalar variables by a linear expression of the loop indices.
alias induction_substitution ’Induction variable substitution’
induction_substitution > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.transformers
        < MODULE.preconditions
        < MODULE.cumulated_effects
This pass was designed and implemented by Mehdi Amini.
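A small Python model of induction variable substitution is given below. The names fill_before, fill_after and k0 are hypothetical and the code only illustrates the idea: the scalar k, incremented by a constant at each iteration, is replaced by a linear expression of the loop index i.

```python
def fill_before(n):
    a = [0] * n
    k = 5
    for i in range(n):
        k = k + 3            # k is an induction variable
        a[i] = k
    return a, k


def fill_after(n):
    a = [0] * n
    k0 = 5                   # value of k before the loop
    for i in range(n):
        a[i] = k0 + 3 * (i + 1)   # k replaced by a linear expression of i
    k = k0 + 3 * n           # final value of k, restored after the loop
    return a, k
```

After the substitution, the loop body no longer carries a dependence on k from one iteration to the next, which is what makes the loop easier to analyze and parallelize.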
8.7.5 Strength Reduction
Reduce the complexity of expression computation by generating induction variables when possible.
strength_reduction > MODULE.code
        > PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
        < MODULE.transformers
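As an illustration of strength reduction, here is a minimal Python sketch; it is not the example from the original documentation, and the names offsets_before, offsets_after and t are hypothetical. A multiplication by the loop index is replaced by a new induction variable updated with an addition.

```python
def offsets_before(n, stride):
    out = []
    for i in range(n):
        out.append(i * stride)   # one multiplication per iteration
    return out


def offsets_after(n, stride):
    out = []
    t = 0                        # new induction variable, equal to i * stride
    for i in range(n):
        out.append(t)
        t += stride              # the multiplication became an addition
    return out
```

Both functions are expected to return the same list; only the cost of each iteration changes.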
8.7.6 Flatten Code
The goal of this program transformation is to enlarge basic blocks as much as possible to increase the opportunities for optimization. This transformation has been developed in PIPS for heterogeneous computing and is combined with inlining to increase the size of the code executed by an external accelerator while reducing the externalization overhead. Other transformations, such as partial evaluation and dead code elimination (including use-def elimination), can be applied to streamline the resulting code further.
The transformation flatten_code 8.7.6 first moves declarations up in the abstract syntax tree, secondly removes useless braces, and thirdly fully unrolls loops when their iteration counts are known and the FLATTEN_CODE_UNROLL property is true. Unrolling can also be controlled using Property FULL_LOOP_UNROLL_EXCEPTIONS 8.1.8.2.
Inlining(s), which must be performed explicitly by the user with tpips or another PIPS interface, can be used first to create lots of opportunities. The basic block size increase is first due to brace removals made possible when declarations have been moved up, and then to loop unrollings. Finally, partial evaluation, dead code elimination and use-def based elimination can also straighten out the code and enlarge basic blocks by removing useless tests or assignments.
The code externalization and adaptation for a given hardware accelerator is performed by another phase, see for instance Section 8.5.
Initially developed in August 2009 by Laurent Daverio, with help from Fabien Coelho and François Irigoin.
alias flatten_code ’Flatten Code’
flatten_code > MODULE.code
        < PROGRAM.entities
        < MODULE.code
If the following property is set, loop unrolling is also applied to loops with static bounds.
8.7.7 Split Update Operator
Split C update operators such as a += b, a *= b, a >>= b, etc. into their expanded form such as a = a + b. Note that if the destination has side effects, the expanded form is not equivalent in the general case, since the destination is evaluated twice.
split_update_operator > MODULE.code
        < PROGRAM.entities
        < MODULE.code
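The caveat about side effects in the destination can be demonstrated with a short Python sketch. Python is used here only as a model (C evaluation-order rules differ in their details), and make_counter, update_form and expanded_form are hypothetical names. The index expression has a side effect, so evaluating the destination once or twice changes the result.

```python
def make_counter():
    state = {"calls": 0}

    def next_index():
        # Side effect: every call returns the next index.
        i = state["calls"]
        state["calls"] += 1
        return i

    return next_index, state


def update_form(a):
    next_index, state = make_counter()
    a[next_index()] += 10                    # destination evaluated once
    return state["calls"]


def expanded_form(a):
    next_index, state = make_counter()
    a[next_index()] = a[next_index()] + 10   # destination evaluated twice
    return state["calls"]
```

Starting from [1, 2, 3], the update form modifies element 0 in place, while the expanded form reads element 0 and writes element 1, so the two versions are not equivalent.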
8.7.8 Split Initializations (C code)
The purpose of this transformation is to separate the initialization part from the declaration part in C code in order to make static code analyses simpler. This transformation recurses through all variable declarations, and creates a new statement each time an initial value is specified in the declaration, if the initial value can be assigned. The declarations are modified by eliminating the initial value, and a new assignment statement with the initial value is added to the source code. This transformation can be used, for instance, to improve reduction detection (see TRAC ticket 181). Note that C array and structure initializations, which use braces, cannot be converted into assignments. In such cases, the initial declaration is left untouched.
alias split_initializations ’Split Initializations’
split_initializations > MODULE.code
        < PROGRAM.entities
        < MODULE.code
This transformation uses the C89_CODE_GENERATION property to generate either C89 or C99 code.
8.7.9 Set Return Type
The purpose of this transformation is to change the return type of a function. The new type will be a typedef whose name is controlled by SET_RETURN_TYPE_AS_TYPEDEF_NEW_TYPE 8.7.9. The corresponding typedef must exist in the symbol table. This transformation loops over the symbols in the symbol table and, for each of them which is a typedef, compares the local name to the property SET_RETURN_TYPE_AS_TYPEDEF_NEW_TYPE 8.7.9. This approach is unsafe because there can be different typedefs with the same name in different compilation units, resulting in different entries in the symbol table for the same local name. The return type can also be inconsistent with the return statement, thus it is not safe to run this pass on a non-void function. However, this pass was created for a special need in par4all and, given the restrictions described above, it does the job.
alias set_return_type_as_typedef ’Set return type as typedef’
set_return_type_as_typedef > MODULE.code
        < PROGRAM.entities
        < MODULE.code
8.7.10 Cast at Call Sites
The purpose of this transformation is to force parameters at call sites to be cast as expected according to the prototype of the functions.
alias cast_at_call_sites ’Call parameters at call sites’
cast_at_call_sites > CALLERS.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.callers
        < CALLERS.code
8.8 Array Bound Checking
Array bound checking refers to determining whether all array references are within their declared range in all of their uses in a program. These array bound checks may be analysed intraprocedurally or interprocedurally, depending on the need for accuracy.
There are two versions of intraprocedural array bound checking: array bound check bottom up and array bound check top down. The first approach relies on checking every array access and on the elimination of redundant tests by advanced dead code elimination based on preconditions. The second approach is based on exact convex array regions. They are used to prove that all accesses in a compound statement are correct.
These two dynamic analyses are implemented for Fortran. They are described in Nga Nguyen’s PhD (see [?]). They may work for C code, but this has not been validated.
8.8.1 Elimination of Redundant Tests: Bottom-Up Approach
This transformation takes as input the current module and adds array range checks (lower and upper bound checks) to every statement that has one or more array accesses. The output is the module with those added tests. In order to reduce the number of tests, a test is not generated if it is trivial or already exists for the same statement. As the Fortran language permits an assumed-size array declarator with an unbounded upper bound for the last dimension, no range check is generated in this case either. Associated with each test is a bound violation error message and, in case of a real access violation, a STOP statement is put before the current statement.
This phase should always be followed by the partial_redundancy_elimination 8.2.2 for logical expressions in order to reduce the number of bound checks.
alias array_bound_check_bottom_up ’Elimination of Redundant Tests’
array_bound_check_bottom_up > MODULE.code
        < PROGRAM.entities
        < MODULE.code
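The instrumentation performed by the bottom-up approach can be modelled by the following Python sketch. It is an assumption-laden illustration rather than PIPS output: BoundViolation stands in for the STOP statement, check is a hypothetical helper, and the array bounds are taken from the Python list length instead of a Fortran declaration.

```python
class BoundViolation(Exception):
    """Stands in for the STOP statement of the instrumented code."""


def check(index, lower, upper, name):
    # Range check inserted before a statement containing an array access.
    if index < lower or index > upper:
        raise BoundViolation(
            f"bound violation: {name}({index}) outside [{lower}, {upper}]")


def shift_left(a, n):
    """Shift the first n elements of a; a is declared with bounds 0..len(a)-1."""
    upper = len(a) - 1
    for i in range(n - 1):
        check(i, 0, upper, "a")        # check for the written reference a[i]
        check(i + 1, 0, upper, "a")    # check for the read reference a[i+1]
        a[i] = a[i + 1]
    return a
```

Here the check on a[i] is implied by the check on a[i+1] together with the loop lower bound, which gives an idea of the redundant tests that a later elimination phase is expected to remove.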
8.8.2 Insertion of Unavoidable Tests
This second implementation is based on the array region analysis phase, which benefits from some interesting proven properties:
If neither of these two properties is satisfied, we consider the approximation of the region. In the case of a MUST region, if the exact bound checks can be generated, they are inserted before the block of code. If not, as in the case of a MAY region, we continue to go down to the children nodes in the control flow graph. The main advantage of this algorithm is that it detects sure bound violations, or tells that there is certainly no bound violation, as soon as possible, thanks to the context given by preconditions and the top-down analyses.
alias array_bound_check_top_down ’Insertion of Unavoidable Tests’
array_bound_check_top_down > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.regions
8.8.3 Interprocedural Array Bound Checking
This phase checks for out-of-bound errors when passing arrays or array elements as arguments in procedure calls. It ensures that there is no bound violation in any array access in the callee procedure, with respect to the array declarations in the caller procedure.
alias array_bound_check_interprocedural ’Interprocedural Array Bound Checking’
array_bound_check_interprocedural > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.preconditions
8.8.4 Array Bound Checking InstrumentationWe provide here a tool to calculate the number of dynamic bound checks from both initial and PIPS generated code. These transformations are implemented by Thi Viet Nga Nguyen (see [?]).
alias array_bound_check_instrumentation ’Array Bound Checking Instrumentation’
array_bound_check_instrumentation > MODULE.code < PROGRAM.entities < MODULE.code Array bounds checking refers to determining whether all array reference are within their declared range in all of its uses in a program. Here are array bounds checking options for code instrumentation, in order to compute the number of bound checks added. We can use only one property for these two case, but the meaning is not clear. To be changed ?
In practice, bound violations often occur with arrays in a common block. The standard is violated, but programmers consider these violations harmless because the allocated size of the common block is not exceeded. The following property deals with this kind of bad programming practice: if the array is a common variable, it checks whether the reference goes beyond the size of the common block.
The following property tells the verification phases (array bound checking, alias checking or uninitialized variable checking) whether to instrument the code with a STOP or a PRINT message. Logically, if a standard violation is detected, the program should stop immediately. Furthermore, the STOP message gives the partial redundancy elimination phase more information to remove redundant tests that occur after this STOP. However, for debugging purposes, one may need to display all possible violations, such as out-of-bound or used-before-set errors, without stopping the program. In this case, a PRINT message is chosen. By default, we use the STOP message.
8.9 Alias Verification
8.9.1 Alias Propagation
Aliasing occurs when two or more variables refer to the same storage location at the same program point. Alias analysis is critical for performing most optimizations correctly because we must know for certain that we have taken into account all the ways a location, or the value of a variable, may (or must) be used or changed. Compile-time alias information is also important for program verification, debugging and understanding.
In Fortran 77, parameters are passed by address in such a way that, as long as the actual argument is associated with a named storage location, the called subprogram can change the value of the actual argument by assigning a value to the corresponding formal parameter. So new aliases can be created between formal parameters if the same actual argument is passed to two or more formal parameters, or between formal parameters and global parameters if an actual argument is an object in common storage which is also visible in the called subprogram or other subprograms in the call chain below it.
Both intraprocedural and interprocedural alias determinations are important for program analysis. Intraprocedural aliases occur due to pointers in languages like LISP, C, C++ or Fortran 90, the union construct in C or EQUIVALENCE in Fortran. Interprocedural aliases are generally created by parameter passing and by access to global variables, which propagates intraprocedural aliases across procedures and introduces new aliases.
The basic idea for computing interprocedural aliases is to follow all the possible chains of argument-parameter and nonlocal variable-parameter bindings at all call sites. We introduce a technique for naming memory locations which guarantees the correctness and enhances the precision of data-flow analysis. The technique associates sections and offsets of actual parameters with formal parameters along a given call path. Precise alias information is computed for both scalar and array variables.
The analysis is called alias propagation. It is implemented by Thi Viet Nga Nguyen (see [?]).
alias_propagation       > MODULE.alias_associations
        < PROGRAM.entities
        < MODULE.code
        < CALLERS.alias_associations
        < CALLERS.code
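The two ways a Fortran 77 call site creates aliases can be illustrated on a single call. The Python sketch below is a deliberately simplified model: it handles one call site, compares actual arguments by name only, and ignores the sections, offsets and call paths that the real analysis tracks. The function name and the data representation are hypothetical.

```python
from itertools import combinations


def aliases_at_call(formals, actuals, commons_visible_in_callee):
    """Return the alias pairs created by binding actuals to formals at one call site."""
    bindings = list(zip(formals, actuals))
    pairs = set()
    # Same actual argument passed to two or more formal parameters.
    for (f1, a1), (f2, a2) in combinations(bindings, 2):
        if a1 == a2:
            pairs.add((f1, f2))
    # Actual argument living in common storage that the callee also sees.
    for formal, actual in bindings:
        if actual in commons_visible_in_callee:
            pairs.add((formal, actual))
    return pairs


# CALL FOO(X, X, G) with SUBROUTINE FOO(A, B, C); G is in a COMMON visible in FOO
print(sorted(aliases_at_call(["A", "B", "C"], ["X", "X", "G"], {"G"})))
```

Here A and B are aliased because both are bound to X, and C is aliased with the common variable G.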
8.9.2 Alias Checking
With the call-by-reference mechanism in Fortran 77, new aliases can be created between formal parameters if the same actual argument is passed to two or more formal parameters, or between formal parameters and global parameters if an actual argument is an object in common storage which is also visible in the called subprogram or other subprograms in the call chain below it. Restrictions on association of entities in Fortran 77 (Section 15.9.3.6 [?]) say that neither aliased formal parameters nor the variable in the common block may become defined during execution of the called subprogram or the other subprograms in the call chain.
This phase uses information from the alias_propagation analysis (see Section 8.9.1) and computes definition information for the variables of a program, in order to verify statically whether the program violates the standard restriction on aliases. If this information is not known at compile time, we instrument the code with tests that check for the violation dynamically during program execution. This verification is implemented by Thi Viet Nga Nguyen (see [?]).
alias alias_check 'Alias Check'
alias_check     > MODULE.code
        < PROGRAM.entities
        < MODULE.alias_associations
        < MODULE.cumulated_effects
        < ALL.code
This is a property to control whether the alias propagation and alias checking phases use information from the MAIN program or not. If the property is on and the current module is never called by the main program, no alias propagation or alias checking is performed for this module. However, we can do nothing with modules that have no callers at all, because this is a top-down approach.
8.10 Used Before Set
This analysis checks whether the program uses a variable or an array element which has not been assigned a value. In this case, anything may happen: the program may appear to run normally, or may crash, or may behave unpredictably. We use IN regions, which give the set of variables read but not previously written. Depending on the nature of the variable (local, formal or global), we have different cases. In principle, it works as follows: if we have a MUST IN region at the module statement, the corresponding variable must be used before being defined, and a STOP is inserted. Otherwise, we insert an initialization function and go down, inserting a verification function before each MUST IN region in each sub-statement.
This is a top-down analysis that processes a procedure before all its callees. Information given by callers is used to decide whether the formal parameters of the current module have to be checked. In addition, we produce information in the resource MODULE.ubs to tell whether the formal parameters of the called procedures have to be checked. This verification is implemented by Thi Viet Nga Nguyen (see [?]).
alias used_before_set 'Used Before Set'
used_before_set > MODULE.ubs
        < PROGRAM.entities
        < MODULE.code
        < MODULE.in_regions
        < CALLERS.ubs
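The notion of "read but not previously written" that underlies IN regions can be shown on a straight-line sequence of scalar assignments. The sketch below is only an analogy in Python: PIPS IN regions describe sets of array elements with MUST/MAY approximations and handle control flow, none of which is modelled here. The function name and the (reads, writes) encoding of statements are assumptions.

```python
def read_before_written(statements):
    """Return, in order of first use, the variables read before any write.

    Each statement is a (reads, writes) pair of variable name lists.
    """
    written = set()
    exposed = []
    for reads, writes in statements:
        for name in reads:
            if name not in written and name not in exposed:
                exposed.append(name)   # value comes from outside the sequence
        written.update(writes)
    return exposed


# X = 1 ; Y = X + Z ; Z = Y
body = [
    ([], ["X"]),
    (["X", "Z"], ["Y"]),
    (["Y"], ["Z"]),
]
print(read_before_written(body))
```

If Z is a local variable with no initial value, its use in the second statement is a used-before-set error of the kind this phase reports.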
8.11 Miscellaneous transformations
The following warning paragraphs should not be located here, but the whole introduction has to be updated to take into account the merger with properties-rc.tex, the new content (the transformation section has been exploded) and the new passes such as gpips. No time right now. FI.
All PIPS transformations assume that the initial code is legal according to the language standard. In other words, its semantics is well defined. Otherwise, it is impossible to maintain a constant semantics through program transformations. So uninitialized variables, for instance, can lead to codes that seem wrong, because they are likely to give different outputs than the initial code. But this does not matter, as the initial code output is undefined and could well be the new output.
Also, remember that dead code does not impact the semantics in an observable way. Hence dead code can be transformed in apparently weird ways. For instance, all loops that are part of a dead code section can be found parallel, although they are obviously sequential, because all the references will carry an unfeasible predicate. In fact, reference A(I), placed in a dead code section, does not reference the memory and does not have to be part of the dependence graph.
Dead code can crop up in many modules when a whole application linked with a library is analyzed. All unused library modules are dead for PIPS.
On the other hand, missing source modules synthesized by PIPS may also lead to weird results because they are integrated in the application with empty definitions. Their call sites have no impact on the application semantics.
8.11.1 Type Checker
Typecheck code according to the Fortran standard plus double complex. Typechecking is performed interprocedurally for user-defined functions. Insert type conversions where implicitly needed. Use typed intrinsics instead of generic ones. Precompute constant conversions if appropriate (e.g. 16 to 16.0E0). Add comments about type errors detected in the code. Report back how much was done.
type_checker    > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < CALLEES.code
Here are the type checker options. Whether to deal with double complex or to refuse them. Whether to add a summary of errors, conversions and simplifications as a comment to the routine. Whether to always show complex constructors.
8.11.2 Scalar and Array Privatization
Variable privatization consists in discovering variables whose values are local to a particular scope, usually a loop iteration. Three different privatization functions are available. The quick privatization is restricted to loop indices and is included in the dependence graph computation (see Section 6.5). The scalar privatization should be applied before any serious parallelization attempt. The array privatization is much more expensive and is still mainly experimental.
You should keep in mind that, although they modify the internal representation of the code, scalar and array privatizations are only latent program transformations, and no actual local variable declaration is generated. This is the responsibility of code generation phases, which may use this information differently depending on their target.
8.11.2.1 Scalar Privatization
Privatizer detects variables that are local to a loop nest and marks these variables as private. A variable is private to a loop if the values assigned to this variable inside the loop cannot reach a statement outside the loop body. Note that illegal code, for instance code with uninitialized variables, can lead to surprising privatizations, which are still correct since the initial semantics is unspecified.
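The privatization criterion can be approximated on a straight-line loop body as follows: a variable is a candidate if it is written in the body, never read in an iteration before being written in that same iteration, and not needed after the loop. This Python sketch is an assumption-laden simplification, not the PIPS algorithm, which works on effects and use-def chains. The function name, the (reads, writes) encoding and the treatment of arrays as plain names are all hypothetical.

```python
def private_scalars(body, live_after_loop):
    """Return the variables of a straight-line loop body that can be privatized.

    body is a list of (reads, writes) pairs, one per statement.
    """
    written = set()
    exposed = set()                # read before any write in the same iteration
    for reads, writes in body:
        exposed |= set(reads) - written
        written |= set(writes)
    return sorted(written - exposed - set(live_after_loop))


# DO I = 1, N:  T = A(I) + 1 ; B(I) = T * S ; S = S + T
loop_body = [
    (["A", "I"], ["T"]),
    (["T", "S"], ["B"]),
    (["S", "T"], ["S"]),
]
print(private_scalars(loop_body, live_after_loop={"B", "S"}))
```

Only T qualifies: S carries a value from one iteration to the next, and B is used after the loop.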
alias privatize_module 'Privatize Scalars'
privatize_module        > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.proper_effects
        < MODULE.cumulated_effects
        < MODULE.chains
Use information from privatize_module to change C variable declarations. For instance becomes
localize_declaration    > MODULE.code
        ! MODULE.privatize_module
        < PROGRAM.entities
        < MODULE.code
8.11.2.2 Array Privatization
Array privatization aims at privatizing whole arrays (array_privatizer) or sets of array elements (array_section_privatizer) instead of scalar variables only. The algorithm, developed by Béatrice Creusillet [?], is very different from the algorithm used for privatizing scalar variables only, and relies on IN and OUT region analyses. Of course, it also privatizes scalar variables, but the algorithm is much more expensive and should be used only when necessary.
Array section privatization is still experimental and should be used with great care. In particular, it is not compatible with the next steps of the parallelization process, i.e. dependence tests and code generation, because it does not modify the code, but solely produces a new region resource.
Another transformation, which can also be called a privatization, consists in declaring as local to a procedure or function the variables which are used only locally. This happens quite frequently in old Fortran codes where variables are declared as SAVEd to avoid allocations at each invocation of the routine. However, this prevents parallelization of the loop surrounding the calls. The function which performs this transformation is called declarations_privatizer.
alias array_privatizer 'Privatize Scalars & Arrays'
alias array_section_privatizer 'Scalar and Array Section Privatization'
alias declarations_privatizer 'Declaration Privatization'
array_privatizer        > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
        < MODULE.summary_effects
        < MODULE.transformers
        < MODULE.preconditions
        < MODULE.regions
        < MODULE.in_regions
        < MODULE.out_regions
array_section_privatizer        > MODULE.code
        > MODULE.privatized_regions
        > MODULE.copy_out_regions
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
        < MODULE.summary_effects
        < MODULE.transformers
        < MODULE.preconditions
        < MODULE.regions
        < MODULE.in_regions
        < MODULE.out_regions
declarations_privatizer > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.cumulated_effects
        < MODULE.summary_effects
        < MODULE.regions
        < MODULE.in_regions
        < MODULE.out_regions
Several privatizability criteria can be applied for array section privatization, and it is not yet clear which one should be used. The default case is to remove potential false dependences between iterations. The first option, when set to false, removes this constraint. It is useful for single assignment programs, to discover what section is really local to each iteration. When the second option is set to false, copy-out is not allowed, which means that only array regions that are not further reused in the program continuation can be privatized.
8.11.3 Scalar and Array Expansion
Variable expansion consists in adding new dimensions to a variable so as to parallelize surrounding loops. There is no known advantage of expansion over privatization, but expansion is used when parallel loops must be distributed, for instance to generate SIMD code. It is assumed that the variables to be expanded are the private variables, so this phase is only useful if a privatization has been performed earlier.
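The effect of expansion, and why it matters when loops are distributed, can be seen on a small example. The Python sketch below is purely illustrative (the function names are hypothetical and PIPS operates on Fortran or C code, not Python): a scalar t that is private to each iteration is given one extra dimension, after which the loop can be split into two loops that each have independent iterations.

```python
def original(a):
    """Loop with a scalar temporary that is private to each iteration."""
    b = [0] * len(a)
    for i in range(len(a)):
        t = a[i] + 1
        b[i] = t * 2
    return b


def expanded_and_distributed(a):
    """Same computation after expanding t into an array and distributing the loop."""
    n = len(a)
    t = [0] * n                # scalar t expanded with one dimension
    b = [0] * n
    for i in range(n):         # first distributed loop: iterations are independent
        t[i] = a[i] + 1
    for i in range(n):         # second distributed loop: iterations are independent
        b[i] = t[i] * 2
    return b


print(original([1, 2, 3]), expanded_and_distributed([1, 2, 3]))
```

Without expansion, distributing the loop would leave every iteration of the second loop reading the single last value of t.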
8.11.3.1 Scalar Expansion
Loop private scalar variables are expanded.
alias variable_expansion 'Expand Scalar'
variable_expansion      > MODULE.code
        ! MODULE.privatize_module
        < PROGRAM.entities
        < MODULE.code
Uses LOOP_LABEL (see Section 8.1.1) to select a particular loop, then finds all reductions in this loop and performs variable expansion on all reduction variables.
reduction_variable_expansion    > MODULE.code
        < PROGRAM.entities
        < MODULE.cumulated_reductions
        < MODULE.code
A variant of atomization that splits expressions but keeps as many reductions as possible. E.g.: r+=a+b becomes r+=a; r+=b;
reduction_atomization   > MODULE.code
        < PROGRAM.entities
        < MODULE.cumulated_reductions
        < MODULE.code
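The r+=a+b example can be reproduced with a small sketch. It uses Python's own ast module (ast.unparse requires Python 3.9 or later) as a stand-in for the PIPS internal representation, handles only additive reductions, and relies on hypothetical function names. It illustrates the idea, not the reduction_atomization phase itself. Note that reassociating a sum this way can change floating-point rounding.

```python
import ast


def split_terms(expr):
    """Flatten a tree of additions into the list of its terms."""
    if isinstance(expr, ast.BinOp) and isinstance(expr.op, ast.Add):
        return split_terms(expr.left) + split_terms(expr.right)
    return [expr]


def atomize_reduction(statement):
    """Split 'r += a + b' into one reduction statement per term."""
    node = ast.parse(statement).body[0]
    if not (isinstance(node, ast.AugAssign) and isinstance(node.op, ast.Add)):
        return [statement]                 # not an additive reduction: leave as is
    target = ast.unparse(node.target)
    return [f"{target} += {ast.unparse(term)}" for term in split_terms(node.value)]


print(atomize_reduction("r += a + b"))
print(atomize_reduction("r += a + b * c"))
```

Every resulting statement is still a reduction on r, which is what distinguishes this variant from a plain atomization that would introduce a temporary for a+b.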
8.11.3.2 Array Expansion
Not implemented yet.
8.11.4 Freeze variables
Function freeze_variables produces code where variables interactively specified by the user are transformed into constants. This is useful when the functionality of a code must be reduced. For instance, a code designed for N dimensions could be reduced to a 3-D code by setting N to 3. This is not obvious when N changes within the code. CA? More information? The variable names are requested from the PIPS user? This is useful to specialize a code according to specific input data.
alias freeze_variables 'Freeze Variables'
freeze_variables        > MODULE.code
        < PROGRAM.entities
        < MODULE.code
        < MODULE.proper_effects
        < MODULE.cumulated_effects
8.11.5 Manual Editing
The window interfaces let the user edit the source files, because it is very useful to demonstrate PIPS. As with stf (see Section 8.3.4), editing is not integrated like other program transformations, and previously applied transformations are lost. Consistency is however always preserved. A general edit facility fully integrated in pipsmake is planned for the (not so) near future. Not so near because user demand for this feature is low. Since tpips can invoke any shell command, it is also possible to touch and edit source files.
8.11.6 Transformation Test
This is a plug to quickly implement a program transformation requested by a user. Currently, it is a full loop distribution suggested by Alain Darte to compare different implementations, namely Nestor and PIPS.
alias transformation_test 'Transformation Test'
transformation_test     > MODULE.code
        < PROGRAM.entities
        < MODULE.code
8.12 Extensions Transformations
8.12.1 OpenMP Pragma
The following transformation reads the sequential code and generates OpenMP pragmas as extensions to statements. The pragmas produced are based on the information previously computed by different phases and already stored in the PIPS internal representation of the sequential code. It might be interesting to apply the phase internalize_parallel_code (see § 7.1.8) before ompify_code in order to maximize the amount of parallel information available.
ompify_code     > MODULE.code
        < MODULE.cumulated_reductions
        < MODULE.code
As defined in the ri, the pragma can be of different types. The following property can be set to str or expr. If the property is set to str, pragmas are generated as strings; otherwise, pragmas are generated as expressions.
The PIPS phase OMP_LOOP_PARALLEL_THRESHOLD_SET adds the OpenMP if clause to all the OpenMP pragmas. At execution, the number of iterations of the loop is evaluated dynamically and compared to the defined threshold. The loop is parallelized only if the threshold is reached.
omp_loop_parallel_threshold_set > MODULE.code
        < MODULE.code
The OMP_LOOP_PARALLEL_THRESHOLD_VALUE property is used as a parameter by the PIPS phase OMP_LOOP_PARALLEL_THRESHOLD_SET. The number of iterations of the parallel loop is compared to that value in an omp if clause. The OpenMP run time decides dynamically to parallelize the loop if the number of iterations is above this threshold.
The OMP_IF_CLAUSE_RECURSIVE property is used as a parameter by the PIPS phase OMP_LOOP_PARALLEL_THRESHOLD_SET. If set to TRUE, the number of iterations of the inner loops is also used to test whether the threshold is reached. Otherwise, only the number of iterations of the processed loop is used.
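The shape of the generated if clause can be pictured with the following Python sketch, which builds the text of a pragma from symbolic iteration counts. Several points are assumptions made for the example and are not confirmed by this document: the exact pragma text PIPS emits, the use of >= for "threshold reached", and the use of a product to combine the iteration counts of inner loops in recursive mode. The function name is hypothetical.

```python
def omp_pragma_with_threshold(trip_counts, threshold, recursive=False):
    """Build an OpenMP pragma whose if clause guards parallel execution.

    trip_counts lists symbolic iteration counts, outermost loop first.
    """
    counts = trip_counts if recursive else trip_counts[:1]
    workload = " * ".join(f"({count})" for count in counts)   # assumed combination rule
    return f"#pragma omp parallel for if({workload} >= {threshold})"


print(omp_pragma_with_threshold(["n", "m"], 100))
print(omp_pragma_with_threshold(["n", "m"], 100, recursive=True))
```

The if clause itself is standard OpenMP: when its expression evaluates to false at run time, the loop is executed sequentially by the encountering thread.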
The compiler tends to produce many parallel loops, which is generally not optimal for performance. The following transformation merges nested omp pragmas into a single omp pragma.
omp_merge_pragma        > MODULE.code
        < MODULE.code
PIPS merges the omp pragmas on the inner or outer loop depending on the property OMP_MERGE_POLICY. This string property can be set to either outer or inner.
The OMP_MERGE_PRAGMA phase with the inner mode can be used after the phase limit_nested_parallelism (see § 7.1.10). Such a combination lets you finely choose the loop depth you really want to parallelize with OpenMP. The merging of the if clauses of the omp pragmas follows its own rule. This clause can be ignored without changing the output of the program; it only changes the program performance. Three policies are therefore offered to manage the merging of if clauses. The if clauses can simply be ignored, or they can be merged all together using the boolean operation or or and. When ignored, the if clause can later be regenerated using the appropriate PIPS phase: OMP_LOOP_PARALLEL_THRESHOLD_SET. To summarize, remember that the property can be set to ignore, or, or and.
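The three policies for merging if clauses can be summarized by the sketch below. It is written in Python for illustration only: the function name is hypothetical, conditions are handled as C expression strings, and returning None to mean "no if clause on the merged pragma" is a convention chosen for the example.

```python
def merge_if_clauses(conditions, policy):
    """Combine the if conditions of nested pragmas according to the policy.

    policy is one of "ignore", "or", "and"; returns a C expression or None.
    """
    if policy not in ("ignore", "or", "and"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "ignore" or not conditions:
        return None                        # the merged pragma carries no if clause
    operator = " || " if policy == "or" else " && "
    return operator.join(f"({condition})" for condition in conditions)


nested = ["n >= 100", "m >= 100"]
print(merge_if_clauses(nested, "ignore"))
print(merge_if_clauses(nested, "or"))
print(merge_if_clauses(nested, "and"))
```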
Chapter 9