AVM1 Reference

AVM1 representation

At the lowest level, AVM1 is compiled into a buffer of AVM1 bytecode. This compiled bytecode has no particular structure: it is only meant to be read one raw action at a time, starting from a given position.
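To make "one raw action at a given position" concrete, here is a minimal Python sketch of a single-action decoder. It assumes the SWF action record layout: a one-byte action code, where codes of 0x80 and above are followed by a little-endian 16-bit payload length and that many payload bytes. The names `RawAction` and `read_action` are hypothetical and not part of Open Flash.

```python
from typing import NamedTuple


class RawAction(NamedTuple):
    code: int       # action code (first byte of the record)
    payload: bytes  # action-specific data; empty for single-byte actions
    end: int        # offset of the first byte after this record


def read_action(buf: bytes, pos: int) -> RawAction:
    """Decode the one action record that starts at offset `pos` (hypothetical helper)."""
    if not 0 <= pos < len(buf):
        raise ValueError(f"no action at offset {pos}")
    code = buf[pos]
    if code < 0x80:
        # Codes below 0x80 have no length field: the record is one byte long.
        return RawAction(code, b"", pos + 1)
    # Codes 0x80 and above carry a little-endian u16 length, then the payload.
    if pos + 3 > len(buf):
        raise ValueError(f"truncated action header at offset {pos}")
    length = int.from_bytes(buf[pos + 1:pos + 3], "little")
    end = pos + 3 + length
    if end > len(buf):
        raise ValueError(f"truncated action payload at offset {pos}")
    return RawAction(code, buf[pos + 3:end], end)


# Jump (0x99) with a 2-byte offset payload of 0, followed by Stop (0x07).
buf = bytes([0x99, 0x02, 0x00, 0x00, 0x00, 0x07])
print(read_action(buf, 0))  # RawAction(code=153, payload=b'\x00\x00', end=5)
print(read_action(buf, 5))  # RawAction(code=7, payload=b'', end=6)
```

The decoder only looks at bytes from `pos` onward; it has no way of knowing whether `pos` is an offset the interpreter can actually reach.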

To perform static analysis or compile bytecode, it is useful to have a higher-level representation for groups of actions. Adobe's SWF specification uses a sequence of action records ("ACTIONRECORD [zero or more]"). Open Flash uses a more advanced representation called a Control Flow Graph (CFG).

Limitations of the sequence of actions

The sequence-of-actions view used by Adobe's specification is deeply flawed. It does not abstract away the low-level byte encoding, yet it is still unable to describe some of the behavior of the interpreter. Adobe's interpreter only has a local view of the bytecode: it runs one action at a time. The following features are supported by Adobe's player but cannot be represented as a sequence of actions:

The core invalid assumption is that each action is followed by another action. Some actions (such as Return or Throw) need not be followed by another action because they end control flow. Other actions (If, Jump) can jump to arbitrary positions that would not be found by scanning the bytecode sequentially.
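The sketch below contrasts the two views on a tiny hand-written buffer. `linear_scan` assumes every action is followed by another one, while `follow_flow` only visits offsets reachable through fall-through and unconditional Jump actions, stopping at End, Return or Throw. If is deliberately left out to keep it short, and both function names are hypothetical. The action codes (Jump 0x99, Return 0x3E, Throw 0x2A, End 0x00, Stop 0x07) are taken from the SWF specification.

```python
from __future__ import annotations

from typing import NamedTuple

JUMP, RETURN, THROW, END = 0x99, 0x3E, 0x2A, 0x00


class RawAction(NamedTuple):
    code: int
    payload: bytes
    end: int  # offset right after the record


def read_action(buf: bytes, pos: int) -> RawAction:
    """Minimal single-action decoder (codes >= 0x80 carry a u16 LE length)."""
    code = buf[pos]
    if code < 0x80:
        return RawAction(code, b"", pos + 1)
    if pos + 3 > len(buf):
        raise ValueError(f"truncated action header at offset {pos}")
    end = pos + 3 + int.from_bytes(buf[pos + 1:pos + 3], "little")
    if end > len(buf):
        raise ValueError(f"truncated action payload at offset {pos}")
    return RawAction(code, buf[pos + 3:end], end)


def linear_scan(buf: bytes) -> list[int]:
    """Sequence-of-actions view: decode every action back to back."""
    offsets, pos = [], 0
    while pos < len(buf):
        offsets.append(pos)
        pos = read_action(buf, pos).end
    return offsets


def follow_flow(buf: bytes, pos: int = 0) -> list[int]:
    """Visit only offsets reachable by fall-through and Jump (stops on loops)."""
    offsets: list[int] = []
    while 0 <= pos < len(buf) and pos not in offsets:
        offsets.append(pos)
        action = read_action(buf, pos)
        if action.code in (END, RETURN, THROW):
            break  # control flow ends: the following bytes need not be an action
        if action.code == JUMP:
            # BranchOffset: signed i16, relative to the end of the Jump action.
            pos = action.end + int.from_bytes(action.payload, "little", signed=True)
        else:
            pos = action.end
    return offsets


# Jump +1 skips the stray 0x96 byte, then Stop (0x07) and End (0x00).
buf = bytes([0x99, 0x02, 0x00, 0x01, 0x00, 0x96, 0x07, 0x00])
print(follow_flow(buf))  # [0, 6, 7]
try:
    linear_scan(buf)
except ValueError as err:
    print("linear scan failed:", err)  # truncated action payload at offset 5
```

The stray byte is never executed, yet the linear scan has to decode it and misreads the Stop and End bytes as its length field.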

Control Flow Graph

The control flow graph is a classic structure used by compilers and analyzers to represent the runtime semantics of code. It consists of blocks containing code; the blocks are linked by edges representing the possible transitions between them. For example, an If action is represented with two edges: one pointing to the block to run in the true case and the other to the block for the false case.
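As a minimal sketch of this idea, the Python types below model a CFG where each block holds straight-line actions plus a single outgoing flow: an unconditional Jump, an If with a true and a false successor, or a Return with no successor. These types, and the plain strings standing in for actions, are hypothetical illustrations and not Open Flash's actual data model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Jump:
    target: str  # label of the only successor


@dataclass
class If:
    true_target: str   # block run when the condition is true (the branch)
    false_target: str  # block run when it is false (the fall-through)


@dataclass
class Return:
    pass  # ends control flow: no successor


Flow = Union[Jump, If, Return]


@dataclass
class Block:
    actions: list[str]  # straight-line actions, no control flow inside
    flow: Flow          # how control leaves the block


@dataclass
class Cfg:
    entry: str
    blocks: dict[str, Block] = field(default_factory=dict)

    def successors(self, label: str) -> list[str]:
        """Edges leaving `label`, in (true, false) order for an If."""
        flow = self.blocks[label].flow
        if isinstance(flow, Jump):
            return [flow.target]
        if isinstance(flow, If):
            return [flow.true_target, flow.false_target]
        return []


# A hand-built graph shaped like: if (x) { trace("yes") } else { trace("no") }
cfg = Cfg("start", {
    "start": Block(["Push 'x'", "GetVariable"], If("then", "else")),
    "then": Block(["Push 'yes'", "Trace"], Jump("end")),
    "else": Block(["Push 'no'", "Trace"], Jump("end")),
    "end": Block([], Return()),
})
print(cfg.successors("start"))  # ['then', 'else']
```

Keeping control flow out of `actions` and in a single `flow` field means every edge of the graph can be read from one place per block.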

TODO: Add example with visualization

Note that AVM1 blocks can be nested: this is used for function definitions, With actions and Try actions. Strictly speaking, Open Flash therefore uses a "hypergraph".
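One way to picture this nesting, sketched in Python under the assumption that an action may own whole sub-graphs: a function definition owns its body, With owns the code run with the target object on the scope chain, and Try owns its try, catch and finally bodies. The class names and `count_blocks` are hypothetical illustrations, not Open Flash types.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Simple:
    name: str  # an ordinary action such as "Push" or "Trace"


@dataclass
class DefineFunction:
    name: str
    body: Cfg  # the function body is a whole nested graph


@dataclass
class With:
    body: Cfg  # code run with the target object added to the scope chain


@dataclass
class Try:
    try_body: Cfg
    catch_body: Optional[Cfg] = None
    finally_body: Optional[Cfg] = None


Action = Union[Simple, DefineFunction, With, Try]


@dataclass
class Block:
    actions: list[Action]


@dataclass
class Cfg:
    blocks: list[Block] = field(default_factory=list)


def nested_graphs(action: Action) -> list[Cfg]:
    """Sub-graphs owned by a single action (empty for ordinary actions)."""
    if isinstance(action, (DefineFunction, With)):
        return [action.body]
    if isinstance(action, Try):
        parts = (action.try_body, action.catch_body, action.finally_body)
        return [g for g in parts if g is not None]
    return []


def count_blocks(cfg: Cfg) -> int:
    """Count blocks in `cfg`, recursing into nested graphs."""
    total = 0
    for block in cfg.blocks:
        total += 1
        for action in block.actions:
            total += sum(count_blocks(g) for g in nested_graphs(action))
    return total


inner = Cfg([Block([Simple("Push"), Simple("Return")])])
outer = Cfg([Block([DefineFunction("f", inner), Simple("Stop")]),
             Block([Simple("End")])])
print(count_blocks(outer))  # 3
```

A single node of the outer graph can thus carry entire graphs, which is why a plain graph is not quite enough to describe the structure.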

A block represents a linear portion of bytecode. Unlike Adobe's single sequence of actions, a CFG can contain multiple blocks. There are no restrictions on how blocks relate to each other: their bytecode can be disjoint or can overlap.
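The hand-crafted Python example below shows how two blocks can overlap. An If action at offset 0 (its condition is assumed to already be on the stack) either falls through to a Push at offset 5 or branches to offset 9, which lies inside the Push payload. Decoded from offset 9, the same bytes read as Play, Stop, End. The buffer, `decode_block` and the other names are hypothetical; the action codes (If 0x9D, Push 0x96, Play 0x06, Stop 0x07, End 0x00) and the Push integer type tag 7 follow the SWF specification.

```python
from __future__ import annotations

from typing import NamedTuple

END = 0x00


class RawAction(NamedTuple):
    code: int
    payload: bytes
    end: int  # offset right after the record


def read_action(buf: bytes, pos: int) -> RawAction:
    """Minimal single-action decoder (codes >= 0x80 carry a u16 LE length)."""
    code = buf[pos]
    if code < 0x80:
        return RawAction(code, b"", pos + 1)
    end = pos + 3 + int.from_bytes(buf[pos + 1:pos + 3], "little")
    if end > len(buf):
        raise ValueError(f"truncated action at offset {pos}")
    return RawAction(code, buf[pos + 3:end], end)


def decode_block(buf: bytes, start: int) -> tuple[list[tuple[int, int]], tuple[int, int]]:
    """Decode from `start` up to and including End; return actions and byte range."""
    actions, pos = [], start
    while True:
        action = read_action(buf, pos)
        actions.append((pos, action.code))
        pos = action.end
        if action.code == END:
            return actions, (start, pos)


buf = bytes([
    0x9D, 0x02, 0x00, 0x04, 0x00,  # 0: If, branch target = 5 + 4 = 9
    0x96, 0x05, 0x00,              # 5: Push with a 5-byte payload...
    0x07, 0x06, 0x07, 0x00, 0x00,  # 8: ...integer (type 7) 0x00000706
    0x00,                          # 13: End
])
fall_actions, fall_range = decode_block(buf, 5)
branch_actions, branch_range = decode_block(buf, 9)
overlap = (max(fall_range[0], branch_range[0]), min(fall_range[1], branch_range[1]))
print(fall_actions, fall_range)      # [(5, 150), (13, 0)] (5, 14)
print(branch_actions, branch_range)  # [(9, 6), (10, 7), (11, 0)] (9, 12)
print("shared bytes:", overlap)      # shared bytes: (9, 12)
```

Bytes 9 to 11 belong to both blocks: as integer data in the fall-through block and as three actions in the branch block.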

Each CFG block has two parts:

Open Flash defines the following types of blocks:

AVM1 parser

The role of the AVM1 parser is to build the control flow graph from the AVM1 bytecode.

The main challenge of the parser is to figure out where the blocks start.
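A minimal Python sketch of that first step, assuming a simple worklist approach (not necessarily the one Open Flash uses): starting from the entry offset, decode straight-line code and record a block start at every Jump or If target and after every If (its false case falls through). It ignores nested DefineFunction, With and Try bodies and only collects start offsets; splitting the bytecode into blocks and building edges would come next. The function names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from typing import NamedTuple

END, THROW, RETURN, JUMP, IF = 0x00, 0x2A, 0x3E, 0x99, 0x9D


class RawAction(NamedTuple):
    code: int
    payload: bytes
    end: int  # offset right after the record


def read_action(buf: bytes, pos: int) -> RawAction:
    """Minimal single-action decoder (codes >= 0x80 carry a u16 LE length)."""
    code = buf[pos]
    if code < 0x80:
        return RawAction(code, b"", pos + 1)
    end = pos + 3 + int.from_bytes(buf[pos + 1:pos + 3], "little")
    if end > len(buf):
        raise ValueError(f"truncated action at offset {pos}")
    return RawAction(code, buf[pos + 3:end], end)


def find_block_starts(buf: bytes, entry: int = 0) -> list[int]:
    """Worklist search for the offsets where a block must begin."""
    starts = {entry}
    seen: set[int] = set()  # offsets already decoded as an action
    work = deque([entry])

    def add_start(offset: int) -> None:
        if offset not in starts:
            starts.add(offset)
            work.append(offset)

    while work:
        pos = work.popleft()
        # Walk straight-line code until control flow leaves it.
        while 0 <= pos < len(buf) and pos not in seen:
            seen.add(pos)
            action = read_action(buf, pos)
            if action.code in (JUMP, IF):
                # BranchOffset: signed i16 relative to the end of the action.
                add_start(action.end + int.from_bytes(action.payload[:2], "little", signed=True))
                if action.code == IF:
                    add_start(action.end)  # false case falls through to a new block
                break
            if action.code in (END, THROW, RETURN):
                break
            pos = action.end
    return sorted(starts)


# If (target 7) / Play, Return / Stop, End
print(find_block_starts(bytes([0x9D, 0x02, 0x00, 0x02, 0x00, 0x06, 0x3E, 0x07, 0x00])))  # [0, 5, 7]
# Play, Stop, then Jump -6 back to offset 1: offset 1 must start a block
print(find_block_starts(bytes([0x06, 0x07, 0x99, 0x02, 0x00, 0xFA, 0xFF])))  # [0, 1]
```

The second example shows why block starts cannot be found in a single forward pass: offset 1 is only known to start a block once the backward Jump at offset 2 has been decoded.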