Bip Project architecture

This part describes the main architecture of Bip. It is a good read for understanding the global design of Bip; to get started, however, reading the Overview is probably simpler.

The Module architecture part describes how the different modules and classes interface with each other, the Common code patterns part explains how Bip is intended to be used, and the Interfacing with IDA & limitations part explains how the interface with IDA is made and the problems linked to that design.

Module architecture

Bip is divided into three main modules: bip.base, bip.hexrays and bip.gui.

Base

The module bip.base is in charge of all the basic interfaces with IDA for manipulating and modifying the IDB. The following diagram shows the main classes and the links between them:

../_images/bip_base2.png

Those classes represent the main elements which can be accessed and manipulated using Bip. Several building blocks exist inside this module:

  • The elements (BipBaseElt, BipRefElt and BipElt) are abstract classes which provide common interfaces for their child classes. GetElt() and GetEltByName() directly return an object of the correct child class from an ID or a name. This provides an easy way to get the correct object from just an address or an ID, and is used in particular for xrefs. For more information see Elements.
  • The xrefs link all child objects of BipRefElt (BipInstr, BipData, BipStruct, BStructMember) and BipFunction. Xrefs are represented by BipXref objects, but in most cases methods are provided for getting the correct object or address directly, without going through those objects. Xrefs can be of several types depending on whether the link comes from data or from code; they also include the control flow links.
  • Structures are represented by two different but linked classes: BipStruct and BStructMember. A BipStruct keeps references to its members and a member keeps a reference to its struct. Both classes can use xrefs through the API they inherit from BipRefElt.
  • Types are represented by the abstract class BipType and several child classes whose names start with the prefix BType. Types are used on numerous occasions and impact both the analysis and the behavior of IDA. For more information on how they work in Bip see Type.
  • Instructions and data inherit from BipElt and so have access to the xref API provided by BipRefElt, but also to numerous APIs linked to having an address and potentially data (“bytes”). BipData objects are also linked to a BipType, which directly impacts the behavior of some methods. BipInstr objects hold references to their BipFunction and BipBlock when they exist, and also allow manipulating BipOperand objects.
  • The functions (BipFunction) are a critical link in the API: they have BipXref, and give access to their basic blocks (BipBlock) and their BipInstr. They also provide methods for accessing their callers and callees. Finally, they make the link between the bip.base module and the bip.hexrays module.
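
The dispatch role of GetElt() described in the first bullet can be pictured with a toy model. The sketch below is not Bip code and does not talk to IDA: ToyElt, ToyInstr, ToyData, FAKE_DB and get_elt() are hypothetical stand-ins, used only to show how one factory function can return an object of the right child class from an address.

```python
# Toy model of a GetElt()-style factory. Nothing here is Bip or IDA API:
# ToyElt, ToyInstr, ToyData, FAKE_DB and get_elt() are hypothetical names.

# Stand-in for the IDB: address -> kind of item stored there.
FAKE_DB = {0x1000: "code", 0x2000: "data"}


class ToyElt(object):
    """Common parent: anything that has an address."""

    def __init__(self, ea):
        self.ea = ea

    @classmethod
    def accepts(cls, ea):
        # Child classes say whether they can represent this address.
        return False


class ToyInstr(ToyElt):
    @classmethod
    def accepts(cls, ea):
        return FAKE_DB.get(ea) == "code"


class ToyData(ToyElt):
    @classmethod
    def accepts(cls, ea):
        return FAKE_DB.get(ea) == "data"


def get_elt(ea):
    """Return an object of the first child class that accepts ``ea``."""
    for cls in (ToyInstr, ToyData):
        if cls.accepts(ea):
            return cls(ea)
    raise ValueError("no element at 0x{:X}".format(ea))


elt = get_elt(0x1000)
print(type(elt).__name__)  # prints: ToyInstr
```

The caller never names the concrete class; it asks for "whatever is at this address" and then checks the result with isinstance when the distinction matters.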

Hexrays

The module bip.hexrays contains the interfaces for manipulating the hexrays decompiler from IDA. This module will not provide anything if a hexrays decompiler for the current architecture is not available. The following diagram represents the architecture of this module:

../_images/bip_hexrays_cnode.png

The central part of the bip.hexrays module is HxCFunc, which represents a C function as decompiled by HexRays. A HxCFunc gives access to the local variables (lvar) of the function, represented by HxLvar objects, which have a name and a type and may or may not be arguments of the function. The second interesting part of HxCFunc is that it gives access to the AST created by HexRays; this AST represents a subset of C, and it is possible to use visitors for inspecting the nodes of which it is composed.

Warning

When asking the hexrays decompiler to decompile a function again (F5), a new object representing the decompiled function is created. A HxCFunc obtained before that will not use the correct object, which can lead to weird behavior.

CNode is an abstract class (all classes of the CNode hierarchy are abstract except the leaves of the inheritance tree) which represents a node of the AST. Two main types of nodes exist: CNodeStmt, which represents a C statement (if, for, while, block, goto, continue, return, …), and CNodeExpr, which represents a C expression (arithmetic and logic operations, function calls, casts, memory accesses, …). As an AST is a tree, most nodes have children: a CNodeStmt can have CNodeExpr or CNodeStmt children, while a CNodeExpr can only have other CNodeExpr as children. To help manipulate those objects, some intermediate abstract classes are defined, such as CNodeExprFinal, which represents all expressions without children.

For more information about the usage and implementation of hexrays see Hexrays interface (bip.hexrays).

Note

CNode and HxCItem

A Bip user is expected to use CNode for manipulating AST nodes, but in practice two different implementations of the hexrays AST nodes exist in Bip: CNode and HxCItem. Those two implementations are in fact exactly the same, with the only difference that CNode objects have a link to their HxCFunc and to their parent CNode object in the AST (with the exception of the root node, which does not have a parent).

This difference in implementation makes it easier to traverse the AST and to link efficiently with other components. The simplest example is the link between a CNodeExprVar object and the corresponding HxLvar object, which is not possible using the HxCExprVar object (this may have changed since IDA 7.3 with the access to the microcode API in IDAPython).

To avoid code duplication, all the CNode classes are automatically generated from their equivalent HxCItem classes, with the exception of CNode (equivalent to HxCItem), CNodeExpr (HxCExpr) and CNodeStmt (HxCStmt). Every change in the HxCItem classes also changes the behavior of the equivalent CNode classes. The methods unique to the CNode classes are located in the cnode.py file and use the @addCNodeMethod decorator.

For more information about the internal implementation of CNode see CNode generation and internals.
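
As a rough illustration of the idea of generating one class family from another, here is a toy sketch in plain Python. It is not Bip's implementation, which is described in CNode generation and internals and may work differently: ItemBase, ItemNum, NodeBase, build_node_class() and add_node_method() are hypothetical names chosen for this example only.

```python
# Toy model: derive "node" classes from "item" classes instead of writing
# them twice. All names are hypothetical; this is not how Bip is written.


class ItemBase(object):
    """Plays the role of the plain item family (no owner, no parent)."""

    def __init__(self, raw):
        self.raw = raw

    def describe(self):
        return "{}({})".format(type(self).__name__, self.raw)


class ItemNum(ItemBase):
    @property
    def value(self):
        return int(self.raw)


class NodeBase(ItemBase):
    """Hand-written root of the node family: adds owner and parent links."""

    def __init__(self, raw, func, parent=None):
        super(NodeBase, self).__init__(raw)
        self.func = func
        self.parent = parent


def build_node_class(item_cls):
    """Create the node twin of ``item_cls``; NodeBase goes first in the MRO."""
    name = item_cls.__name__.replace("Item", "Node", 1)
    return type(name, (NodeBase, item_cls), {})


def add_node_method(node_cls, name):
    """Decorator attaching a function as a method of a generated class."""
    def decorator(func):
        setattr(node_cls, name, func)
        return func
    return decorator


NodeNum = build_node_class(ItemNum)


@add_node_method(NodeNum, "depth")
def _node_depth(self):
    # Only meaningful for nodes: items have no parent link to follow.
    depth, cur = 0, self.parent
    while cur is not None:
        depth, cur = depth + 1, cur.parent
    return depth


root = NodeNum("1", func="f")
child = NodeNum("42", func="f", parent=root)
print(child.describe(), child.value, child.depth())  # prints: NodeNum(42) 42 1
```

Because NodeNum inherits from ItemNum, any change to ItemNum.value is immediately visible on NodeNum, which mirrors the property stated above for the HxCItem and CNode classes.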

Example of AST

As an illustration of the AST view, let's look at a really simple function with the following decompiled code:

// this is a simple example written by hand for explaining how this works,
//   hexrays may optimize this equivalent code differently.
int f(int a1, int *a2) {
    int v1;

    v1 = a1 + 2; // (1)
    if (a2) // (2)
        return v1 + 3; // (3)
    return v1; // (4)
}

Note

All the classes given hereafter are the ones from the CNode implementation, but it works exactly the same with the HxCItem implementation.

Note

Depending on the IDA version, the tree structure may change; however, the idea of how the AST works should stay the same.

A HxCFunc root_node() should always be a CNodeStmtBlock. This node contains a list of other statements representing the content of the function.

The first statement (1) contained in the root_node() will be a CNodeStmtExpr, which is just a statement containing an expression. This expression will be a CNodeExprAsg, which represents an assignment; it contains two operands and inherits from the CNodeExprDoubleOperation abstract class. The first operand (first_op()) is the left part of the assignment: the local variable v1, represented by a CNodeExprVar with a value of 2. This value corresponds to the index in the local variable array, in which the arguments of the function are also counted (so a1 is at index 0). It is possible to get the corresponding HxLvar object directly using the lvar() property. The right part of the assignment is a CNodeExprAdd node which itself has two children: a CNodeExprVar, again, for a1 and a CNodeExprNum for the number 2.

The second statement (2) contained in the root_node() will be a CNodeStmtIf, which contains two children: an expression for the condition and a statement for the content of the if. If an else were present, there would be a third child (a statement again). The expression for the condition will once again be a CNodeExprVar.

The content of the if (3) will be a CNodeStmtBlock with only one statement in it: a CNodeStmtReturn representing the return, itself containing a CNodeExprAdd, once again, for v1 + 3.

The last statement (4) will be a CNodeStmtReturn containing a CNodeExprVar.

As can be seen in this simple example, this is a pretty straightforward AST structure; however, using it well requires a good understanding of the node types. For a complete list of the different types of CNode see AST Node types.

Gui

Finally, the bip.gui module is the smallest module; it contains the interfaces for the user interface and the plugins. Its architecture is represented by this diagram:

../_images/bip_gui.png

The most important part of this module for a user is the BipPlugin system. Bip defines its own plugin system, separate from the one of IDA: each plugin should inherit from the class BipPlugin (directly or indirectly) and will be loaded by the BipPluginManager. Each Bip plugin should be a singleton and can be retrieved using the BipPluginManager, which is itself a singleton and a real IDA plugin (retrieved using get_plugin_manager()).

Activities are objects made for interfacing with different parts of IDA, and in particular for being used as decorators on methods of a BipPlugin. BipActivity is an abstract callable class which expects a handler and a way to register with the IDA interface. The simplest example of an activity is BipAction, which allows defining menu entries or shortcuts (hot-keys) in IDA. As a general rule, activities are made to be used as decorators, in a similar way to the property decorator of Python.

Note

BipActivityContainer

The BipActivityContainer is a particular activity which contains several activities and does not perform any action by itself. It is made for allowing decorators to be chained on the same method.

For more information about writing plugins and their internals see Plugins.
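
The decorator-based design described above can be sketched in a few lines of plain Python. This is only a toy model under stated assumptions: ToyActivity, toy_shortcut(), ToyManager, HelloPlugin and REGISTRY are hypothetical names, REGISTRY stands in for the IDA interface, and the real BipPlugin, BipAction and BipPluginManager classes are richer than this.

```python
# Toy model of decorator-based activities and a one-instance-per-class
# plugin manager. All names are hypothetical; this is not the Bip API.

REGISTRY = {}  # stand-in for the IDA interface: shortcut -> callable


class ToyActivity(object):
    """Callable object wrapping a handler, usable as a method decorator."""

    def __init__(self, handler, shortcuts):
        self.handler = handler
        self.shortcuts = list(shortcuts)

    def __call__(self, plugin, *args, **kwargs):
        return self.handler(plugin, *args, **kwargs)

    def register(self, plugin):
        # Bind the handler to the plugin instance for every shortcut.
        for key in self.shortcuts:
            REGISTRY[key] = lambda: self.handler(plugin)


def toy_shortcut(key):
    """Decorator factory; stacking it on one method merges the shortcuts."""
    def decorator(obj):
        if isinstance(obj, ToyActivity):  # chained: reuse the same activity
            obj.shortcuts.append(key)
            return obj
        return ToyActivity(obj, [key])
    return decorator


class ToyManager(object):
    """Keeps exactly one instance per plugin class."""

    def __init__(self):
        self._plugins = {}

    def load(self, plugin_cls):
        if plugin_cls not in self._plugins:
            plugin = plugin_cls()
            for attr in vars(plugin_cls).values():
                if isinstance(attr, ToyActivity):
                    attr.register(plugin)
            self._plugins[plugin_cls] = plugin
        return self._plugins[plugin_cls]


class HelloPlugin(object):
    def __init__(self):
        self.calls = 0

    @toy_shortcut("Ctrl-H")
    @toy_shortcut("Ctrl-5")
    def hello(self):
        self.calls += 1
        return "hello #{}".format(self.calls)


manager = ToyManager()
plugin = manager.load(HelloPlugin)
first = REGISTRY["Ctrl-H"]()
second = REGISTRY["Ctrl-5"]()
print(first, second)  # prints: hello #1 hello #2
```

Both shortcuts end up calling the same method on the same plugin instance, which is the behavior a singleton plugin with chained decorators is meant to give.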

Common code patterns

Bip class identification

Bip provides an abstraction on top of several objects in IDA, and several different classes in Bip can be used for representing the same IDA object (e.g. CNode, BipType, …). Each class provides different functionality depending on the attribute(s) of the underlying IDA object; this avoids trying to use features which are not set or invalid in the IDA object and clarifies the usage of those objects.

In most cases Bip provides one static function or one static method for getting an object of the correct class (e.g. GetElt(), GetEltByName(), from_citem(), …). Most parent classes of the objects provide ways to test which kind of object will be produced. However, the intended way to check the object type is to use the isinstance function with the class being tested.

Here are a few examples of the intended usage. In this first example, the first instruction of a function is retrieved using GetEltByName(); in this case we know it is an instruction (BipInstr), but the function can return other subclasses of BipElt. We then look at the BipElt objects which reference this address: some are BipInstr and some are BipData, and to tell which is which we use isinstance.

>>> from bip import *
>>> elt = GetEltByName("RtlQueryProcessLockInformation")
>>> elt # first instruction of RtlQueryProcessLockInformation
BipInstr: 0x1800D2FF0 (mov     rax, rsp)
>>> elt.is_code # this are common property to BipElt which are used to get the correct object
True
>>> elt.is_data
False
>>> for e in elt.xEltTo: # here we get the Elt xref, elements can be BipInstr or BipData
...   if isinstance(e, BipInstr): # in case of instr we want to print the mnemonic
...     print("Found instr at 0x{:X} which ref function, with mnemonic: {}".format(e.ea, e.mnem))
...   elif isinstance(e, BipData): # for BipData there is no mnemonic available and so we just want the address
...     print("Found data ref at 0x{:x}".format(e.ea))
...   else:
...     print("Something else ?? {}".format(e))
Found instr at 0x1800C12A2 which ref function, with mnemonic: call
Found data ref at 0x1801136d7
Found data ref at 0x1801434a8
Found data ref at 0x18016c7fc

This next example shows how to check for types. All types in Bip inherit from BipType; the from_tinfo() method returns the correct Bip object from the tinfo_t object used by IDA (which is used for all the different types). In most cases there is no need to go through this method: Bip objects which are typed should have a type property giving their type, and the methods from_c() and get_at() should allow getting the correct value easily. However, when scripting it is often interesting to look at the type of an object; more information about types and the different classes which represent them can be found in the Type documentation. Here is a small example of how to look at types: we start with a BTypeStruct and look at its members, and if a member is a pointer (BTypePtr) we look at the pointed subtype.

>>> from bip import *
>>> tst = BipType.from_c("struct {char a; int b; void *c; __int64 d; char *e; void *(*f)(int i);}")
>>> tst.members_info
{'a': <bip.base.biptype.BTypeInt object at 0x0000029B22C24160>, 'c': <bip.base.biptype.BTypePtr object at 0x0000029B22C24128>, 'b': <bip.base.biptype.BTypeInt object at 0x0000029B22C24390>, 'e': <bip.base.biptype.BTypePtr object at 0x0000029B22C244A8>, 'd': <bip.base.biptype.BTypeInt object at 0x0000029B22C24438>, 'f': <bip.base.biptype.BTypePtr object at 0x0000029B22C24048>}
>>> for i in range(tst.nb_members):
...     if isinstance(tst.get_member_type(i), BTypePtr):
...        print("We have a ptr for member {}! Type pointed is: {}".format(tst.get_member_name(i), tst.get_member_type(i).pointed.str))
...     else:
...        print("Not a pointer for member {}, type is: {}".format(tst.get_member_name(i), tst.get_member_type(i).str))
Not a pointer for member a, type is: char
Not a pointer for member b, type is: int
We have a ptr for member c! Type pointed is: void
Not a pointer for member d, type is: __int64
We have a ptr for member e! Type pointed is: char
We have a ptr for member f! Type pointed is: void *__stdcall(int i)

This is also the case when using visitors in hexrays. The Bip visitors return objects which inherit from the CNode class. As in the other examples, the easiest way to determine the type of a node is to use isinstance. An example of this can be found in the overview, in the part CNode / Visitors, in the visit_call functions. It is also shown indirectly through the method visit_cnode_filterlist(), which takes a list of classes as argument: under the hood, this function visits all nodes and calls the callback only for those which are instances of one of the classes passed in the second argument.

It is worth noting that in most cases the underlying object or identifier used by IDA is kept as a reference in one of the private attributes of the object.

Interfacing with IDA & limitations

This part of the documentation describes some limitations of Bip and some problems which can occur because of the interface with IDA. Basic usage of the API for retrieving information should not trigger many of those problems. One notable exception is modifying the database using the GUI and then reusing objects which were kept from before the modification; this, of course, includes the undo feature of IDA. More “advanced” users and developers should be careful about those.

In a lot of cases IDA provides APIs which allow retrieving the necessary information. However, in some cases the IDAPython API does not offer such useful wrappers, and getting the full benefit of the available API requires keeping references to the underlying objects. Those objects are available in Python through the SWIG interface (part of IDAPython) built by IDA on top of their C++ API. As a general rule, people from hexrays encourage avoiding keeping references to those objects, but as said earlier there is not always a choice. Because of this, several problems exist.

The first is a really simple problem but a hard one to solve: while a Bip object is being kept, the underlying IDB can be changed (using the API or the GUI), which can make that Bip object invalid. A simple example is to get a BipInstr object using GetElt() and then to undefine this element using the GUI. If GetElt() is called again, a BipData object is returned, which is the expected object. However, if the previous object is used, it can lead to unexpected behavior because this address is not an instruction anymore; for example, the property mnem will return an empty string.

To avoid this problem as much as possible, Bip tries not to keep references to the IDA objects or to memoize information, but often this is not possible, and it has an overall cost in performance. As a general rule, after modifying the IDB the user should fetch the objects again instead of reusing them. Sadly, there is currently no solution implemented in Bip for this problem. In theory one could be implemented using the event API of hexrays, but this may create several other complex problems, if it is possible at all.
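
The stale-object scenario can be reproduced without IDA using a toy model. In the sketch below the dictionary IDB, the classes InstrView and DataView and the function fetch() are hypothetical stand-ins for the database, BipInstr, BipData and GetElt(); it only illustrates why fetching the object again after a modification is the safe pattern.

```python
# Toy model of the stale-object problem. IDB, InstrView, DataView and
# fetch() are hypothetical names; nothing here is Bip or IDA API.

IDB = {0x1000: {"kind": "code", "mnem": "mov"}}  # stand-in for the database


class View(object):
    def __init__(self, ea):
        self.ea = ea


class InstrView(View):
    @property
    def mnem(self):
        # The value is read from the database on each access, but the
        # class of this object was decided when it was created.
        return IDB[self.ea].get("mnem", "")


class DataView(View):
    pass


def fetch(ea):
    """Pick the class from the current state of the database."""
    return InstrView(ea) if IDB[ea]["kind"] == "code" else DataView(ea)


old = fetch(0x1000)
before = old.mnem

# Simulate the user undefining the instruction in the GUI.
IDB[0x1000] = {"kind": "data"}

after = old.mnem      # the old object still claims to be an instruction
fresh = fetch(0x1000)  # fetching again gives the correct class

print(repr(before), repr(after), type(fresh).__name__)
# prints: 'mov' '' DataView
```

The old object does not fail loudly; it silently returns an empty mnemonic, which is why this kind of bug is easy to miss in scripts that mix modification and inspection.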

Another common problem of using the IDA objects is that wrappers are only wrappers. What this basically means is that we have to handle the management of our objects in Python as they are in C++, with SWIG in the middle. This includes the example in the blogpost but also several others, such as the ones linked to the Type API (look for the warning). As a rule, having memory management problems when using the standard (not private) Bip API is not considered normal and can be reported as a bug. However, enforcing this can force some particular choices during development.

Finally, the undo feature provided in IDA may invalidate any or all of the internal objects used by IDA (“The simplest approach is to assume that the database has completely changed and to re-read information from the database to the memory.” and “Plugins in general should not cache and reuse pointers to kernel objects (like func_t, segment_t). These pointers may change between plugins invocations.” from Undo: IDA can do it). As there is nothing that can really be done at that point to support this, and trying to update all possible objects whenever an undo or redo occurs is simply not acceptable, it is advised to simply disable this feature.