At the lowest level, AVM1 code is compiled into a buffer of AVM1 bytecode. This compiled bytecode has no particular structure: it is only meant to be read one raw action at a time, starting at a given position.
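As a sketch of this low-level view, the following Rust function reads one raw action at a given offset. It follows the action record layout from the SWF specification: a one-byte action code, and for codes of 0x80 and above, a little-endian 16-bit payload length followed by the payload. The `RawAction` type and `read_action` function are hypothetical names chosen for illustration, not Open Flash's API.

```rust
/// A raw AVM1 action as stored in the bytecode buffer (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct RawAction<'a> {
    /// Action code: the first byte of the record.
    pub code: u8,
    /// Payload bytes; always empty for codes below 0x80.
    pub data: &'a [u8],
}

/// Reads the single action starting at `pos`.
/// Returns the action and the offset right after it, or `None` if the
/// buffer ends before the record is complete.
pub fn read_action(bytes: &[u8], pos: usize) -> Option<(RawAction<'_>, usize)> {
    let code = *bytes.get(pos)?;
    if code < 0x80 {
        // Short form: the record is only the action code.
        return Some((RawAction { code, data: &[] }, pos + 1));
    }
    // Long form: a little-endian u16 payload length follows the code.
    let len_bytes = bytes.get(pos + 1..pos + 3)?;
    let len = u16::from_le_bytes([len_bytes[0], len_bytes[1]]) as usize;
    let start = pos + 3;
    let data = bytes.get(start..start + len)?;
    Some((RawAction { code, data }, start + len))
}
```

Nothing in the buffer says where the next meaningful action is; the caller only gets "the bytes right after this one", which is exactly the limitation discussed next.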
To perform static analysis or compile bytecode, it is useful to have a higher-level representation for groups of actions. Adobe's SWF specification uses a sequence of action records ("ACTIONRECORD [zero or more]"). Open Flash uses a more advanced representation called a Control Flow Graph (CFG).
The sequence-of-actions view used by Adobe's specification is deeply flawed. It does not abstract away the low-level byte encoding, yet it is unable to describe some of the behavior of the interpreter. Adobe's interpreter only has a local view of the bytecode: it runs one action at a time. The following features are supported by Adobe's player but not by the sequence-of-actions representation:
- try, catch, finally and with bodies

The core invalid assumption is that each action is followed by another action. Some actions (such as Return or Throw) end control flow, so they don't have to be followed by another action. Other actions (If, Jump) can jump to arbitrary positions that wouldn't be found by only scanning the bytecode sequentially.
The control flow graph is a classic structure used by compilers and analyzers to represent the runtime semantics of code. It consists of blocks containing code, linked by edges representing the possible transitions between blocks. For example, an If action is represented with two edges: one pointing to the block to run in the true case and the other for the false case.
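A minimal sketch of such a graph, using simplified hypothetical types (`Block`, `Flow`, `BlockId`) rather than Open Flash's actual definitions. The program `if (a) { trace("yes"); } else { trace("no"); }` becomes four blocks, and the If action is not stored among the block's actions but as its two outgoing edges.

```rust
/// Index of a block in the CFG (hypothetical representation).
type BlockId = usize;

/// How control leaves a block.
#[derive(Debug)]
enum Flow {
    /// Continue unconditionally with another block.
    Simple { next: BlockId },
    /// An If action: one edge for `true`, one for `false`.
    If { if_true: BlockId, if_false: BlockId },
    /// No outgoing edge (e.g. after Return or Throw).
    End,
}

/// A block: linear actions (simplified to strings) plus its exit flow.
#[derive(Debug)]
struct Block {
    actions: Vec<&'static str>,
    flow: Flow,
}

/// Returns the targets of the outgoing edges of `block`.
fn edges(block: &Block) -> Vec<BlockId> {
    match block.flow {
        Flow::Simple { next } => vec![next],
        Flow::If { if_true, if_false } => vec![if_true, if_false],
        Flow::End => Vec::new(),
    }
}

/// CFG for `if (a) { trace("yes"); } else { trace("no"); }`.
/// Block 0 is the entry point.
fn example_cfg() -> Vec<Block> {
    vec![
        Block { actions: vec!["Push \"a\"", "GetVariable"], flow: Flow::If { if_true: 1, if_false: 2 } },
        Block { actions: vec!["Push \"yes\"", "Trace"], flow: Flow::Simple { next: 3 } },
        Block { actions: vec!["Push \"no\"", "Trace"], flow: Flow::Simple { next: 3 } },
        Block { actions: vec![], flow: Flow::End },
    ]
}
```

Keeping the exit behavior in a separate `flow` field means a block's actions are always straight-line code, while every transition is explicit.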
TODO: Add example with visualization
Note that AVM1 blocks can be nested. This is used for function definitions, With actions and Try actions. Strictly speaking, Open Flash uses a "hypergraph".
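The nesting can be pictured as a recursive data structure, where some blocks or actions own a whole CFG of their own. The sketch below uses hypothetical types (not Open Flash's definitions) and models only function definitions and With; a Try would be modeled the same way, with one nested CFG per try, catch and finally body. The `depth` helper shows that walking such a structure requires recursion rather than a flat traversal.

```rust
/// A CFG: a list of blocks, the first one being the entry point.
struct Cfg {
    blocks: Vec<Block>,
}

/// A block: linear actions plus how control leaves it.
struct Block {
    actions: Vec<Action>,
    flow: Flow,
}

/// Actions, simplified for this sketch.
enum Action {
    /// Any action without a nested body, described by name.
    Simple(&'static str),
    /// A function definition owns its body as a separate, nested CFG.
    DefineFunction { name: &'static str, body: Cfg },
}

/// How control leaves a block.
enum Flow {
    /// No outgoing edge (e.g. Return).
    End,
    /// Continue with the block at this index.
    Next(usize),
    /// In this sketch: run the nested `body` CFG with the popped object
    /// added to the scope chain, then continue with the block at `next`.
    With { body: Cfg, next: usize },
}

/// Nesting depth: 1 for a CFG without nested CFGs.
fn depth(cfg: &Cfg) -> usize {
    let mut deepest_child: usize = 0;
    for block in &cfg.blocks {
        // Function bodies nested inside actions.
        for action in &block.actions {
            if let Action::DefineFunction { body, .. } = action {
                deepest_child = deepest_child.max(depth(body));
            }
        }
        // With bodies nested inside the block's flow.
        if let Flow::With { body, .. } = &block.flow {
            deepest_child = deepest_child.max(depth(body));
        }
    }
    1 + deepest_child
}
```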
A block represents a linear portion of bytecode. Unlike Adobe's single sequence of actions, a CFG can contain multiple blocks. There are no restrictions on how blocks relate to each other: their bytecode can be disjoint or overlap.
Each CFG block has two parts:
- …
- … With block, etc. The actual behavior depends on the type of the block.

Open Flash defines the following types of blocks:
- … try block.
- … with block.

The role of the AVM1 parser is to build the control flow graph from the AVM1 bytecode.
The main challenge of the parser is to figure out where the blocks start.
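One classic way to find block starts (often called "leaders" in compiler textbooks) is to follow control flow from the entry point instead of scanning the buffer sequentially: the entry offset, every branch target, and the fall-through offset after an If each start a block. The sketch below applies that technique to raw bytes; it is an illustration under simplifying assumptions, not Open Flash's parser. It ignores nested bodies (functions, With, Try), treats the end of the buffer as the end of the program, and uses the action codes and record layout from the SWF specification.

```rust
use std::collections::{BTreeSet, VecDeque};

// Action codes from the SWF specification.
const ACTION_END: u8 = 0x00;
const ACTION_THROW: u8 = 0x2A;
const ACTION_RETURN: u8 = 0x3E;
const ACTION_JUMP: u8 = 0x99;
const ACTION_IF: u8 = 0x9D;

/// Collects the offsets where blocks start, following control flow from
/// offset 0 so that unreachable bytes are never decoded.
fn find_block_starts(bytes: &[u8]) -> BTreeSet<usize> {
    let mut starts: BTreeSet<usize> = BTreeSet::from([0]);
    let mut visited: BTreeSet<usize> = BTreeSet::new();
    let mut queue = VecDeque::from([0usize]);

    while let Some(pos) = queue.pop_front() {
        // Stop at the end of the buffer and at already explored offsets.
        if pos >= bytes.len() || !visited.insert(pos) {
            continue;
        }
        let code = bytes[pos];
        // Long actions (code >= 0x80) carry a little-endian u16 payload length.
        let (data, next) = if code >= 0x80 {
            match bytes.get(pos + 1..pos + 3) {
                Some(h) => {
                    let len = u16::from_le_bytes([h[0], h[1]]) as usize;
                    match bytes.get(pos + 3..pos + 3 + len) {
                        Some(d) => (d, pos + 3 + len),
                        None => continue, // truncated payload
                    }
                }
                None => continue, // truncated header
            }
        } else {
            (&bytes[pos..pos], pos + 1)
        };

        match code {
            // These actions end control flow: no successor.
            ACTION_END | ACTION_RETURN | ACTION_THROW => {}
            ACTION_JUMP | ACTION_IF => {
                if data.len() < 2 {
                    continue; // malformed branch: stop exploring this path
                }
                // Signed 16-bit branch offset, relative to the next action.
                let target = next as i64 + i64::from(i16::from_le_bytes([data[0], data[1]]));
                if target >= 0 && (target as usize) < bytes.len() {
                    starts.insert(target as usize);
                    queue.push_back(target as usize);
                }
                if code == ACTION_IF && next < bytes.len() {
                    // The false edge falls through and starts a new block.
                    starts.insert(next);
                    queue.push_back(next);
                }
            }
            // Any other action simply falls through.
            _ => queue.push_back(next),
        }
    }
    starts
}
```

For example, with the bytes `99 02 00 01 00 FF 3E` (a Jump over one byte, then Return), this returns the starts `{0, 6}` without ever decoding the byte `FF` at offset 5; a sequential scan would try to read it as the header of a long action.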