Ast Nodes & Visitors

This page explain in more details how the AST nodes from hexrays works in Bip (Node representation in Bip), including the description of all possible nodes of the AST (AST Node types) and some information about the visitors in Bip. It also contains the API for two top level classes (AbstractCItem and HxCType) used internally.

Node representation in Bip

AST nodes of a decompiled hexrays C function are represented in two different ways in Bip: using CNode or HxCItem. In IDA all nodes are represented by the class citem_t with subclasses for different types of node, the Bip equivalent is the HxCItem. However there is some limitations to what it is provided by this object, the principal one is the fact that a HxCItem do not provide a link to the function from which they are derived. This limitation can be quite problematic at time and so the CNode abstraction was created for keeping a single link to the parent function and to the parent node.

In most cases, there is really few differences to use one or another of those representation but as CNode objects possess more functionnality there is few reasons not to use them.

Both HxCItem and CNode classes are abstract and posess multiple subclasses which represents the actual nodes of the AST depending on their types. Both those classes have a lot in common and inherit from the same abstract class AbstractCItem. For determining the correct node to implement it is necessary to check its type from the citem_t object, this type internal to IDA are represented in Bip by HxCType. For more information about which nodes represent what C element see AST Node types.

As subclasses of HxCItem and CNode would have a lot in common and for avoiding code duplication the subclasses for the CNode are dynamically generated. This should have no impact for the user, but when looking at the code of Bip it is normal to not find anywhere the subclasses of CNode. All nodes which inherit from HxCItem start with the HxC prefix while all class which inherit from CNode start with the CNode prefix.

For more information see CNode generation and internals.

AST Node types

AST nodes are split in two main categories: statements (Stmt) and expressions (Expr). Those two categories have basically the same meaning than in C: statements mainly include execution flow control (if, for, block, goto, return, …) while expressions are basically everything else from assignment, arithmetic operations to functions calls and cast.

Note

All the class given there after are the one for the CNode implementation but it works exactly the same for the HxCItem implementation.

The AST is a tree representation of the decompiled code, each node of this tree will be either a statements (object which inherit from CNodeStmt) or an expression (object which inherit from CNodeExpr). A statement node can have children which are expression or statements, while an expression node may only have children which are expression. The root_node() of a function should always be a CNodeStmtBlock which itself should contain the different statements used in the rest of the function. For a concrete example of an AST see Example of AST.

Bip provides a class hierarchy for “classifying” the different kind of nodes. This class hierarchy is composed of numerous abstract classes which represent different “types” of nodes. At the top of the hierarchy is the CNode class, followed just bello by the CNodeStmt and CNodeExpr which represent the statement and the expressions. Those are follow by other abastract classes which are detailed in abstract node types, one of the interest of those abstract class is to be able to use isinstance when developing a visitor or recuperating nodes of a particular types for treating several nodes the same way (see Common code patterns). The leaf of the class hierarchy are the concrete class which have a particular meaning in C, those node types are detailed in concrete node types.

Abstract node types

This table describes the abstract node type which represent the class hierarchy of the node.

Class name Parent class name Type Description Childrens
CNodeStmt CNode Stmt Base class for all concrete and abstract statement nodes. Depends, childrens can be statement or expressions.
CNodeStmtFinal CNodeStmt Stmt All statement class which do not have other statements as childrens but have values. Depends, childrens can not be statement.
CNodeStmtLoop CNodeStmt Stmt All statement representing a loop (for, while, dowhile). Depends, at least one expression CNodeStmtLoop.cond() for the loop condition and a statement CNodeStmtLoop.st_body() for the content of the loop.
CNodeExpr CNode Expr Base class for all concrete and abstract expression nodes. Depends, childrens can not be statement.
CNodeExprFinal CNodeExpr Expr Expression which do not have any other node as children, but have values. Those nodes are always leaf of the AST. No childrens, CNodeExprFinal.value() for getting the content of the expression.
CNodeExprDoubleOperation CNodeExpr Expr Expression which posess two operands, this includes assignment, most logical and mathematical operations and things like comma. Two expressions: CNodeExprDoubleOperation.first_op() and CNodeExprDoubleOperation.second_op().
CNodeExprAssignment CNodeExprDoubleOperation Expr Expression which represent an assignment, this include simple assignment but also assignment with operations (such as +=). Two expressions: CNodeExprAssignment.dst() (equivalent to CNodeExprDoubleOperation.first_op()) and CNodeExprAssignment.src() (equivalent to CNodeExprDoubleOperation.second_op()).
CNodeExprUnaryOperation CNodeExpr Expr Expression for unary operation such as ++, negation, pointer, cast and so on. One expression child CNodeExprUnaryOperation.operand().
CNodeExprMemAccess CNodeExpr Expr Expression which represent a memory access in an array or structure, this do not include simple pointers. One or two children: CNodeExprMemAccess.obj() as an expression and CNodeExprMemAccess.off() which can be an expression or an integer.

Those classes are represented on the schematic of the hexrays architecture:

../_images/bip_hexrays_cnode.png

Concrete node types

This tables describe the concrete nodes with actual meaning in C. For more clarity this table has been separated between the statements and the expressions.

Statements concrete node types

Class name Parent class name Description Childrens
CNodeStmtEmpty CNodeStmt Empty statement, should not be present in final AST but this happens. None.
CNodeStmtExpr CNodeStmtFinal Statement containing an expression. CNodeStmtExpr.value() contains an expression.
CNodeStmtGoto CNodeStmtFinal Statement representing a goto in C, it contains a label number. No statement or expression, CNodeStmtGoto.value() return the label number.
CNodeStmtAsm CNodeStmtFinal Statement containing an asm value (__asm), correspond to inline ASM in C. No statements or expression, CNodeStmtAsm.value() return the list of Instr contain in the statement.
CNodeStmtReturn CNodeStmtFinal Statement representing a return. CNodeStmtReturn.value() contain the expression the function return.
CNodeStmtIf CNodeStmt Statement representing a if. CNodeStmtIf.cond() is an expression representing the condition; CNodeStmtIf.st_then() is a statement representing the block taken if the condition is true; CNodeStmtIf.st_else() is a statement representing the block taken if the condition is false, if there is no else this will be None (test with CNodeStmtIf.has_else()): if (cond) { st_then } else { st_else };.
CNodeStmtFor CNodeStmtLoop Statement representing a for. Three expressions for the CNodeStmtFor.init(), CNodeStmtFor.cond(), CNodeStmtFor.step() (for (init; cond; step)) and one statement for the content of the loop: CNodeStmtFor.st_body()
CNodeStmtWhile CNodeStmtLoop Statement representing a while. One expression for the CNodeStmtWhile.cond() and one statement for the CNodeStmtWhile.st_body() (while (cond) { st_body };).
CNodeStmtDoWhile CNodeStmtLoop Statement representing a do ... while loop. One expression for the CNodeStmtWhile.cond() and one statement for the CNodeStmtWhile.st_body() (do { st_body } while (cond);).
CNodeStmtSwitch CNodeStmt Statement representing a switch statement. One expression for the value tested (CNodeStmtSwitch.expr()) and one statement per cases in the switch. The number of statements is variable.
CNodeStmtContinue CNodeStmt Statement representing a continue. None.
CNodeStmtBreak CNodeStmt Statement representing a break. None.
CNodeStmtBlock CNodeStmt Statement representing a C block statement. Contains a list of statements CNodeStmtBlock.elts() representing all the statements included in this block. This is used as the root node for all functions.

Expressions concrete node types

Class name Parent class name Description Childrens
CNodeExprEmpty CNodeExpr Empty node, should never be used but happens sometimes. None.
CNodeExprNum CNodeExprFinal An immediate number. None.
CNodeExprFNum CNodeExprFinal A floating point number. None.
CNodeExprStr CNodeExprFinal A string in the AST (constant str as integer, str referenced by their address use CNodeExprObj). None.
CNodeExprObj CNodeExprFinal An object representing by its address, this include: globals, functions, strings, … None.
CNodeExprVar CNodeExprFinal An object representing a local variable (HxLvar). None.
CNodeExprHelper CNodeExprFinal An “helper” function: not real functions but created by hexrays. In particular use for intrinsic. None.
CNodeExprInsn CNodeExprFinal An expression which contains a statements. This should never happend and is not implemented. None.
CNodeExprType CNodeExprFinal An expression which contains a type. This is used for example with the CNodeExprSizeof node. None.
CNodeExprTernary CNodeExpr A C ternary operation. Three expressions: cond(), expr1(), expr2(). C representation is cond ? expr1 : expr2.
CNodeExprCall CNodeExpr A call to a function, this include call to function pointer. One expression representing the caller (caller()) and several expressions corresponding to the arguments (args() as a list).
CNodeExprAsg CNodeExprAssignment C assignment operation: dst = src Two children expressions: dst() and src()
CNodeExprAsgbor CNodeExprAssignment C assignment operation: dst |= src Two children expressions: dst() and src()
CNodeExprAsgxor CNodeExprAssignment C assignment operation: dst ^= src Two children expressions: dst() and src()
CNodeExprAsgband CNodeExprAssignment C assignment operation: dst &= src Two children expressions: dst() and src()
CNodeExprAsgadd CNodeExprAssignment C assignment operation: dst += src Two children expressions: dst() and src()
CNodeExprAsgsub CNodeExprAssignment C assignment operation: dst -= src Two children expressions: dst() and src()
CNodeExprAsgmul CNodeExprAssignment C assignment operation: dst *= src Two children expressions: dst() and src()
CNodeExprAsgsshr CNodeExprAssignment C assignment operation: dst >>= src (signed) Two children expressions: dst() and src()
CNodeExprAsgushr CNodeExprAssignment C assignment operation: dst >>= src (unsigned) Two children expressions: dst() and src()
CNodeExprAsgshl CNodeExprAssignment C assignment operation: dst <<= src Two children expressions: dst() and src()
CNodeExprAsgsdiv CNodeExprAssignment C assignment operation: dst /= src (signed) Two children expressions: dst() and src()
CNodeExprAsgudiv CNodeExprAssignment C assignment operation: dst /= src (unsigned) Two children expressions: dst() and src()
CNodeExprAsgsmod CNodeExprAssignment C assignment operation: dst %= src (signed) Two children expressions: dst() and src()
CNodeExprAsgumod CNodeExprAssignment C assignment operation: dst %= src (unsigned) Two children expressions: dst() and src()
CNodeExprComma CNodeExprDoubleOperation C comma operator with deux expressions: first_op, second_op. Often used in conditions. Two children expressions: first_op() and second_op()
CNodeExprLor CNodeExprDoubleOperation C operation for first_op || second_op Two children expressions: first_op() and second_op()
CNodeExprLand CNodeExprDoubleOperation C operation for first_op && second_op Two children expressions: first_op() and second_op()
CNodeExprBor CNodeExprDoubleOperation C operation for first_op | second_op Two children expressions: first_op() and second_op()
CNodeExprXor CNodeExprDoubleOperation C operation for first_op ^ second_op Two children expressions: first_op() and second_op()
CNodeExprBand CNodeExprDoubleOperation C operation for first_op & second_op Two children expressions: first_op() and second_op()
CNodeExprEq CNodeExprDoubleOperation C operation for first_op == second_op (int or float) Two children expressions: first_op() and second_op()
CNodeExprNe CNodeExprDoubleOperation C operation for first_op != second_op (int or float) Two children expressions: first_op() and second_op()
CNodeExprSge CNodeExprDoubleOperation C operation for first_op >= second_op (signed or float) Two children expressions: first_op() and second_op()
CNodeExprUge CNodeExprDoubleOperation C operation for first_op >= second_op (unsigned) Two children expressions: first_op() and second_op()
CNodeExprSle CNodeExprDoubleOperation C operation for first_op <= second_op (signed or float) Two children expressions: first_op() and second_op()
CNodeExprUle CNodeExprDoubleOperation C operation for first_op <= second_op (unsigned) Two children expressions: first_op() and second_op()
CNodeExprSgt CNodeExprDoubleOperation C operation for first_op > second_op (signed or float) Two children expressions: first_op() and second_op()
CNodeExprUgt CNodeExprDoubleOperation C operation for first_op > second_op (unsigned) Two children expressions: first_op() and second_op()
CNodeExprSlt CNodeExprDoubleOperation C operation for first_op < second_op (signed or float) Two children expressions: first_op() and second_op()
CNodeExprUlt CNodeExprDoubleOperation C operation for first_op < second_op (unsigned) Two children expressions: first_op() and second_op()
CNodeExprSshr CNodeExprDoubleOperation C operation for first_op >> second_op (signed) Two children expressions: first_op() and second_op()
CNodeExprUshr CNodeExprDoubleOperation C operation for first_op >> second_op (unsigned) Two children expressions: first_op() and second_op()
CNodeExprShl CNodeExprDoubleOperation C operation for first_op << second_op Two children expressions: first_op() and second_op()
CNodeExprAdd CNodeExprDoubleOperation C operation for first_op + second_op Two children expressions: first_op() and second_op()
CNodeExprSub CNodeExprDoubleOperation C operation for first_op - second_op Two children expressions: first_op() and second_op()
CNodeExprMul CNodeExprDoubleOperation C operation for first_op * second_op Two children expressions: first_op() and second_op()
CNodeExprSdiv CNodeExprDoubleOperation C operation for first_op / second_op (signed) Two children expressions: first_op() and second_op()
CNodeExprUdiv CNodeExprDoubleOperation C operation for first_op / second_op (unsigned) Two children expressions: first_op() and second_op()
CNodeExprSmod CNodeExprDoubleOperation C operation for first_op % second_op (signed) Two children expressions: first_op() and second_op()
CNodeExprUmod CNodeExprDoubleOperation C operation for first_op % second_op (unsigned) Two children expressions: first_op() and second_op()
CNodeExprFadd CNodeExprDoubleOperation C operation for first_op + second_op (float) Two children expressions: first_op() and second_op()
CNodeExprFsub CNodeExprDoubleOperation C operation for first_op - second_op (float) Two children expressions: first_op() and second_op()
CNodeExprFmul CNodeExprDoubleOperation C operation for first_op * second_op (float) Two children expressions: first_op() and second_op()
CNodeExprFdiv CNodeExprDoubleOperation C operation for first_op / second_op (float) Two children expressions: first_op() and second_op()
CNodeExprPtr CNodeExprUnaryOperation Dereferencing a pointer *operand One child expression: operand()
CNodeExprFneg CNodeExprUnaryOperation C operation for -operand (float) One child expression: operand()
CNodeExprNeg CNodeExprUnaryOperation C operation for -operand One child expression: operand()
CNodeExprCast CNodeExprUnaryOperation Casting of an expression (type)operand One child expression: operand()
CNodeExprLnot CNodeExprUnaryOperation C operation for !operand One child expression: operand()
CNodeExprBnot CNodeExprUnaryOperation C operation for ~operand One child expression: operand()
CNodeExprRef CNodeExprUnaryOperation Get the reference of an expression &operand One child expression: operand()
CNodeExprPostinc CNodeExprUnaryOperation C operation for operand++ One child expression: operand()
CNodeExprPostdec CNodeExprUnaryOperation C operation for operand-- One child expression: operand()
CNodeExprPreinc CNodeExprUnaryOperation C operation for ++operand One child expression: operand()
CNodeExprPredec CNodeExprUnaryOperation C operation for --operand One child expression: operand()
CNodeExprSizeof CNodeExprUnaryOperation Sizeof of an expression sizeof(operand) One child expression: operand()
CNodeExprIdx CNodeExprMemAccess Access to an index into an array obj[off] Two children expressions: obj() and off().
CNodeExprMemref CNodeExprMemAccess Access to an element of a structure which is not a pointer obj.off One child expression: obj(). off() is a integer value.
CNodeExprMemptr CNodeExprMemAccess Access to an element of a structure which is a pointer obj->off One child expression: obj(). off() is a integer value.

Visitors in Bip

Two types of visitors are implemented in Bip. One use the ctree_visitor_t provided by IDA and allow to iterate on subclasses of HxCItem. The second one (and the one usually advise to use) is implemented directly in Bip and iterate on sublcasses of CNode. As a general rule, it should be expected to get improvement only on the Bip implementation of the visitors.

The Bip implementation of the visitor use a Deep-First Search (DFS) algorithm with preorder-traversal (the current node is visited before the children), when a statement is visited all the children expression will be visited before the children statements. The functions actually implementing this visitor are documented in Internal Hexrays Visitor API. The main methods allowing to use this visitor are visit_cnode() and visit_cnode_filterlist().

The IDA implementation can be access through the methods starting with hx_visit_ methods of HxCFunc. See Internal Hexrays Visitor API for more information on those.

AbstractCItem API

class bip.hexrays.AbstractCItem(citem)

Abstract class for common element between HxCItem and CNode.

This class provide access to common implementation: address of the element, equality operator, testing for the label and testing for expression or statement.

This class also define the is_handling_type() method used for determining the correct object to create using the TYPE_HANDLE attribute. Finally, it also define the _create_child() abstract method used for being able to create child nodes.

TYPE_HANDLE = -1

Class attribute indicating which type of item this class handles, this is used for determining if this is the good object to instantiate. All abstract class should have a value of -1 for this object, non-abstract class should have a value corresponding to the HxCType they handle.

__init__(citem)

Constructor for the abstract class HxCItem . This should never be used directly.

Parameters:citem – a citem_t object, in practice this should always be a cexpr_t or a cinsn_t object.
_citem = None

The citem_t object from ida, this is conserved at this level for providing a few functionnality compatible between different item types (such as HxCExpr and HxCStmt) .

ea

Property which return the address corresponding to this item.

Returns:An integer corresponding to the address of the item. This may be idc.BADADDR if the item as no equivalent address.
is_expr

Property which return true if this item is a C Expression (HxCExpr, cexpr_t).

is_statement

Property which return true if this item is a C Statement (HxCStmt, cinsn_t).

_ctype

Property which return the HxCType (ctype_t) of this object.

Return int:One of the HxCType constant value.
__str__()

Convert a citem to a string.

This is surcharge both by HxCStmt and HxCExpr.

has_label

Property which return True if the node has a label number.

label_num

Property which return the label number of the node. If this node has no label -1 is return. has_label() allows to check if the node has a label.

__eq__(other)

Compare to AST node. This is base on the compare implemented by hexrays and can return true for two different object including for comparing object which inherit from HxCItem and from CNode.

This seems to not work if the function has been recompiled.

Return NotImplemented if the element to compare does not
inherit from AbstractCItem
__ne__(other)

x.__ne__(y) <==> x!=y

_create_child(obj)

Abstract method which allow to create child element for this object with the correct class. This should be implemented by child classes and will raise a NotImplementedError exception if not surcharge.

classmethod is_handling_type(typ)

Class method which return True if the function handle the type passed as argument.

Parameters:typ – One of the HxCType value.
__weakref__

list of weak references to the object (if defined)

HxCtype Enum

class bip.hexrays.HxCType

Enum and static methods for manipulating the C type defined by HexRays. This is a wrapper on top of the ctype_t enum: cot_* are for the expresion (cexpr_t in ida, HxCExpr in bip ) and cit_* are for the statement (cinsn_t in ida, HxCStmt in bip). This also include some static function which are wrapper which manipulate those types.

Todo

static function for manipulating the enum ?

Comment on the enum are from hexrays.hpp .

COT_COMMA = 1

x, y

COT_ASG = 2

x = y

COT_ASGBOR = 3

x |= y

COT_ASGXOR = 4

x ^= y

COT_ASGBAND = 5

x &= y

COT_ASGADD = 6

x += y

COT_ASGSUB = 7

x -= y

COT_ASGMUL = 8

x *= y

COT_ASGSSHR = 9

x >>= y signed

COT_ASGUSHR = 10

x >>= y unsigned

COT_ASGSHL = 11

x <<= y

COT_ASGSDIV = 12

x /= y signed

COT_ASGUDIV = 13

x /= y unsigned

COT_ASGSMOD = 14

x %= y signed

COT_ASGUMOD = 15

x %= y unsigned

COT_TERN = 16

x ? y : z

COT_LOR = 17

x || y

COT_LAND = 18

x && y

COT_BOR = 19

x | y

COT_XOR = 20

x ^ y

COT_BAND = 21

x & y

COT_EQ = 22

x == y int or fpu (see EXFL_FPOP)

COT_NE = 23

x != y int or fpu (see EXFL_FPOP)

COT_SGE = 24

x >= y signed or fpu (see EXFL_FPOP)

COT_UGE = 25

x >= y unsigned

COT_SLE = 26

x <= y signed or fpu (see EXFL_FPOP)

COT_ULE = 27

x <= y unsigned

COT_SGT = 28

x >  y signed or fpu (see EXFL_FPOP)

COT_UGT = 29

x >  y unsigned

COT_SLT = 30

x <  y signed or fpu (see EXFL_FPOP)

COT_ULT = 31

x <  y unsigned

COT_SSHR = 32

x >> y signed

COT_USHR = 33

x >> y unsigned

COT_SHL = 34

x << y

COT_ADD = 35

x + y

COT_SUB = 36

x - y

COT_MUL = 37

x * y

COT_SDIV = 38

x / y signed

COT_UDIV = 39

x / y unsigned

COT_SMOD = 40

x % y signed

COT_UMOD = 41

x % y unsigned

COT_FADD = 42

x + y fp

COT_FSUB = 43

x - y fp

COT_FMUL = 44

x * y fp

COT_FDIV = 45

x / y fp

COT_FNEG = 46

-x fp

COT_NEG = 47

-x

COT_CAST = 48

(type)x

COT_LNOT = 49

!x

COT_BNOT = 50

~x

COT_PTR = 51

*x, access size in ‘ptrsize’

COT_REF = 52

&x

COT_POSTINC = 53

x++

COT_POSTDEC = 54

x--

COT_PREINC = 55

++x

COT_PREDEC = 56

--x

COT_CALL = 57

x(...)

COT_IDX = 58

x[y]

COT_MEMREF = 59

x.m

COT_MEMPTR = 60

x->m, access size in ‘ptrsize’

COT_NUM = 61

n

COT_FNUM = 62

fpc

COT_STR = 63

string constant

COT_OBJ = 64

obj_ea

COT_VAR = 65

v

COT_INSN = 66

instruction in expression, internal representation only

COT_SIZEOF = 67

sizeof(x)

COT_HELPER = 68

arbitrary name

COT_TYPE = 69

arbitrary type

COT_LAST = 69

All before this are cexpr_t after are cinsn_t

CIT_EMPTY = 70

instruction types start here

CIT_BLOCK = 71

block-statement: { … }

CIT_EXPR = 72

expression-statement: expr;

CIT_IF = 73

if-statement

CIT_FOR = 74

for-statement

CIT_WHILE = 75

while-statement

CIT_DO = 76

do-statement

CIT_SWITCH = 77

switch-statement

CIT_BREAK = 78

break-statement

CIT_CONTINUE = 79

continue-statement

CIT_RETURN = 80

return-statement

CIT_GOTO = 81

goto-statement

CIT_ASM = 82

asm-statement