Bip Project architecture¶
This part describes the main architecture of Bip. It is a good read for understanding the global design of Bip; however, for getting started, the Overview is probably simpler.
The Module architecture part describes how the different modules and classes are interfaced together, the common code patterns part explains how Bip was designed to be used, and finally the interfacing with IDA & limitations part explains how the interface with IDA is made and the problems linked to that design.
Module architecture¶
Bip is decomposed into three main modules: bip.base, bip.hexrays and bip.gui.
Base¶
The module bip.base is in charge of all the basic interfaces with IDA for manipulating and modifying the IDB. The following schematic represents the main classes and the links between them:
Those classes represent the main elements which can be accessed and manipulated using Bip. Several building blocks exist inside this module:
- The elements (BipBaseElt, BipRefElt and BipElt) are abstract classes which provide common interfaces for their child classes. GetElt() and GetEltByName() directly return an object of the correct child class from an ID or a name. This provides an easy way to get the correct object from just an address or an ID, which is used in particular for xrefs. For more information see Elements.
- The xrefs link all child objects of BipRefElt (BipInstr, BipData, BipStruct, BStructMember) and BipFunction. Xrefs are represented by BipXref objects, but in most cases methods are provided for getting the correct object or address directly, without going through those objects. Xrefs can be of several types depending on whether the link is due to data or code; they also include the control flow links.
- The structures are represented by two different but linked classes: BipStruct and BStructMember. A BipStruct keeps references to its members and a member keeps a reference to its struct. Both classes can use xrefs through the API they inherit from BipRefElt.
- The types are represented by the abstract class BipType and several child classes whose names start with the prefix BType. Types are used in numerous places and impact both the analysis and the behavior of IDA. For more information on how they work in Bip see Type.
- The instructions and data inherit from BipElt and so have access to the xref API provided by BipRefElt, but also to numerous APIs linked to having an address and potentially data ("bytes"). BipData objects are also linked to a BipType, which directly impacts the behavior of some methods. BipInstr objects hold references to their BipFunction and BipBlock when those exist, and also allow manipulating BipOperand objects.
- The functions (BipFunction) are a critical link in the API: they have BipXref, and give access to their basic blocks (BipBlock) and their BipInstr. They also provide methods for accessing their callers and callees. Finally, they make the link between the bip.base module and the bip.hexrays module.
Hexrays¶
The module bip.hexrays contains the interfaces for manipulating the hexrays decompiler from IDA. This module will not provide anything if a hexrays decompiler for the current architecture is not available. The following schematic represents the architecture of this module:
The central part of the bip.hexrays module is the HxCFunc, which is used for representing a C function as decompiled by HexRays. An HxCFunc gives access to the local variables (lvar) of the function, represented by HxLvar objects, which have a name, a type and may or may not be arguments of the function. The second interesting part about HxCFunc is that it gives access to the AST created by HexRays. This AST represents a subset of C, and visitors can be used for inspecting the nodes it is composed of.
Warning
When asking the hexrays decompiler to decompile a function again (F5), a new object representing the decompiled function is created. An HxCFunc obtained before that will not use the correct object, which can lead to weird behavior.
CNode is an abstract class (all CNode classes are abstract except the leaves of the inheritance tree) which represents a node of the AST. Two main types of nodes exist: CNodeStmt, which represents a C statement (if, for, while, block, goto, continue, return, …), and CNodeExpr, which represents a C expression (arithmetic and logic operations, function calls, casts, memory accesses, …). As an AST is a tree, most nodes have children: a CNodeStmt can have CNodeExpr or CNodeStmt children, while a CNodeExpr can only have other CNodeExpr as children. To help manipulate those objects, some intermediate abstract classes are defined, such as CNodeExprFinal, which represents all expressions without children.
For more information about the usage and implementation of hexrays see Hexrays interface (bip.hexrays).
Note
CNode and HxCItem
A Bip user is expected to use CNode for manipulating AST nodes, but in practice two different implementations of the hexrays AST nodes exist in Bip: CNode and HxCItem. Those two implementations are in fact exactly the same, the only difference being that CNode objects have a link to their HxCFunc and to their parent CNode object in the AST (with the exception of the root node, which does not have a parent).
This difference in implementation makes it easier to traverse the AST and to link efficiently with other components. The simplest example is the possibility to link a CNodeExprVar object to the corresponding HxLvar object, which is not possible using the HxCExprVar object (this may have changed since IDA 7.3 with the access to the microcode API in IDAPython).
To avoid code duplication, all the CNode classes are automatically generated from their equivalent HxCItem classes, with the exception of CNode (equivalent to HxCItem), CNodeExpr (HxCExpr) and CNodeStmt (HxCStmt). Every change in the HxCItem classes also changes the behavior of the equivalent CNode classes. The methods unique to the CNode classes are present in the cnode.py file and use the @addCNodeMethod decorator.
For more information about the internal implementation of CNode
see CNode generation and internals.
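The generation scheme described in this note can be sketched in plain Python. This is only an illustration under assumptions, not Bip's actual implementation: the Toy* classes, build_node_classes() and add_node_method() are hypothetical names invented for this sketch, and nothing here depends on IDA.

```python
# Toy model of generating "node" classes from "item" classes.
# None of these names exist in Bip; they only mirror the idea described above.

class ToyItem(object):
    """Stand-in for the base class of the first implementation."""
    def __init__(self, value):
        self.value = value

    def describe(self):
        return "{}({})".format(type(self).__name__, self.value)


class ToyItemNum(ToyItem):
    """Stand-in for a leaf class of the first implementation."""


class ToyItemVar(ToyItem):
    """Another leaf class of the first implementation."""


class ToyNode(ToyItem):
    """Hand-written base of the second implementation: adds a parent link."""
    def __init__(self, value, parent=None):
        super(ToyNode, self).__init__(value)
        self.parent = parent


def build_node_classes(item_classes):
    """Create one ToyNode subclass per item class, reusing its methods."""
    generated = {}
    for item_cls in item_classes:
        name = item_cls.__name__.replace("ToyItem", "ToyNode")
        # ToyNode comes first in the bases so that its __init__ is used.
        generated[name] = type(name, (ToyNode, item_cls), {})
    return generated


NODE_CLASSES = build_node_classes([ToyItemNum, ToyItemVar])


def add_node_method(class_name):
    """Decorator attaching a function as a method of one generated class."""
    def decorator(func):
        setattr(NODE_CLASSES[class_name], func.__name__, func)
        return func
    return decorator


@add_node_method("ToyNodeVar")
def has_parent(self):
    # Method that only exists on the generated ToyNodeVar class.
    return self.parent is not None
```

In this sketch any method added to ToyItemNum is automatically available on the generated ToyNodeNum, while has_parent() only exists on the node side.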
Example of AST¶
As an illustration of the AST view, let's look at a really simple function with the following decompiled code:
// this is a simple example written by hand for explaining how this works,
// hexrays may optimize this equivalent code differently.
int f(int a1, int *a2) {
    int v1;
    v1 = a1 + 2; // (1)
    if (a2) // (2)
        return v1 + 3; // (3)
    return v1; // (4)
}
Note
All the classes given hereafter are the ones from the CNode implementation, but it works exactly the same for the HxCItem implementation.
Note
Depending on the IDA version, the tree structure may change; however, the idea of how the AST works should stay the same.
The root_node() of an HxCFunc should always be a CNodeStmtBlock. This node contains a list of other statements representing the content of the function.
The first statement (1) contained in the root_node() will be a CNodeStmtExpr, which is just a statement containing an expression. This expression will be a CNodeExprAsg, which represents an assignment; this expression contains two operands and inherits from the CNodeExprDoubleOperation abstract class. The first operand (first_op()) will be the left part of the assignment: a local variable (v1), which is represented by a CNodeExprVar with a value of 2. This value corresponds to the index in the local variable array, in which the arguments of the function are also counted (so a1 will be at index 0). It is possible to directly get the corresponding HxLvar object using the lvar() property. The right part of the assignment will be a CNodeExprAdd node, which itself has 2 children: a CNodeExprVar again for a1 and a CNodeExprNum for the number 2.
The second statement (2) contained in the root_node() will be a CNodeStmtIf, which contains 2 children: an expression for the condition and a statement for the content of the if. If an else was present, there would be a third child (a statement again). The expression for the condition will once again be a CNodeExprVar.
The content of the if (3) will be a CNodeStmtBlock with only one statement in it: a CNodeStmtReturn representing the return, itself containing a CNodeExprAdd once again for the v1 + 3.
The last statement (4) will be a CNodeStmtReturn statement containing a CNodeExprVar.
As can be seen in this simple example, the AST structure is pretty straightforward; however, using it well requires a good understanding of the node types. For a complete list of the different types of CNode see AST Node types.
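To make the shape of this tree concrete, here is a small stand-alone model of it in plain Python. It is a sketch only: the Toy* classes and the dump() helper are hypothetical and merely mimic the node kinds named above; they are not the Bip classes and do not need IDA. The variable indexes follow the convention described earlier (a1 is 0, a2 is 1, v1 is 2).

```python
# Toy AST for:  v1 = a1 + 2; if (a2) return v1 + 3; return v1;
# The Toy* classes are hypothetical and only mimic the shape described above.

class ToyNode(object):
    def __init__(self, *children):
        self.children = list(children)

    def label(self):
        return type(self).__name__


class ToyStmtBlock(ToyNode): pass
class ToyStmtExpr(ToyNode): pass
class ToyStmtIf(ToyNode): pass
class ToyStmtReturn(ToyNode): pass
class ToyExprAsg(ToyNode): pass
class ToyExprAdd(ToyNode): pass


class ToyExprVar(ToyNode):
    def __init__(self, index):
        super().__init__()
        self.index = index  # index in the local variable array

    def label(self):
        return "ToyExprVar(idx={})".format(self.index)


class ToyExprNum(ToyNode):
    def __init__(self, value):
        super().__init__()
        self.value = value

    def label(self):
        return "ToyExprNum({})".format(self.value)


tree = ToyStmtBlock(
    # (1) v1 = a1 + 2
    ToyStmtExpr(ToyExprAsg(ToyExprVar(2), ToyExprAdd(ToyExprVar(0), ToyExprNum(2)))),
    # (2) if (a2) { (3) return v1 + 3; }
    ToyStmtIf(
        ToyExprVar(1),
        ToyStmtBlock(ToyStmtReturn(ToyExprAdd(ToyExprVar(2), ToyExprNum(3)))),
    ),
    # (4) return v1
    ToyStmtReturn(ToyExprVar(2)),
)


def dump(node, depth=0):
    """Return one indented line per node, in depth-first order."""
    lines = ["  " * depth + node.label()]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(dump(tree)))
```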
Gui¶
Finally, the bip.gui module is the smallest module; it contains the interfaces for the user interface and the plugins. Its architecture is represented by this schematic:
The most important part of this module for a user is the BipPlugin system. Bip defines its own plugin system, which is separate from the one of IDA. Each plugin should inherit from the class BipPlugin (directly or indirectly) and will be loaded by the BipPluginManager. Each Bip plugin should be a singleton and can be retrieved using the BipPluginManager, which is itself a singleton and a real IDA plugin (retrieved using get_plugin_manager()).
Activities are objects made for interfacing with different parts of IDA, and in particular for being used as decorators of the methods of a BipPlugin. BipActivity is an abstract class which is callable and expects a handler and a way to register with the IDA interface. The simplest example of an activity is the BipAction, which allows defining menu entries or shortcuts (hot-keys) in IDA. As a general rule, activities are made to be used as decorators, in a similar way to the property decorator of Python.
Note
BipActivityContainer
The BipActivityContainer is a particular activity which contains several activities and does not perform any action by itself. It is made for allowing decorators to be chained on the same method.
For more information about writing plugins and their internals see Plugins.
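The decorator chaining described above can be illustrated with a small stand-alone sketch. It is written under assumptions and is not the Bip implementation: ToyActivity, ToyActivityContainer, ToyPlugin and the registry list are hypothetical names, and "registering" only appends a description to a list instead of talking to IDA.

```python
# Toy model of "activities" used as chained decorators on plugin methods.
# All Toy* names are hypothetical; real registration with IDA is replaced by a list.

class ToyActivityContainer(object):
    """Holds the handler and every activity stacked on it; does nothing itself."""
    def __init__(self, handler):
        self.handler = handler
        self.activities = []

    def add(self, activity):
        activity.handler = self.handler
        self.activities.append(activity)

    def register_all(self, registry):
        for activity in self.activities:
            activity.register(registry)


class ToyActivity(object):
    """A callable expecting a handler, plus a way to register itself."""
    def __init__(self, description):
        self.description = description
        self.handler = None

    def __call__(self, handler):
        # Used as a decorator: reuse the container if another activity
        # already wrapped this method, otherwise create a new one.
        if isinstance(handler, ToyActivityContainer):
            container = handler
        else:
            container = ToyActivityContainer(handler)
        container.add(self)
        return container

    def register(self, registry):
        registry.append(self.description)


class ToyPlugin(object):
    @ToyActivity("shortcut Ctrl-H")
    @ToyActivity("menu entry Edit/Plugins/hello")
    def hello(self):
        return "hello"

    def load(self):
        """Register every activity found on the class, as a loader would."""
        registry = []
        for attr in vars(type(self)).values():
            if isinstance(attr, ToyActivityContainer):
                attr.register_all(registry)
        return registry
```

Because decorators are applied from the bottom up, the menu entry creates the container and the shortcut is then added to it, so both activities end up attached to the same handler.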
Common code patterns¶
Bip class identification¶
Bip provides an abstraction on top of several objects in IDA. Several different classes in Bip can be used for representing the same kind of IDA object (e.g. CNode, BipType, …). Each class provides different functionalities depending on the attribute(s) of the underlying IDA object. This avoids trying to use features which are not set or are invalid in the IDA object, and clarifies the usage of those objects.
In most cases Bip provides one static function or one static method which returns an object of the correct class (e.g. GetElt(), GetEltByName(), from_citem(), …).
Most parent classes of the objects provide ways to test which kind of object
will be produced. However, the intended way to check for the object type is
to use the isinstance
function with the object type being tested.
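The pattern can be summarized by the following stand-alone sketch: a factory picks the most specific class, and the caller then branches with isinstance. Everything in it is an assumption made for illustration (TOY_DATABASE, the Toy* classes, toy_get_elt() and summarize() are hypothetical); it does not use the Bip API and runs without IDA.

```python
# Toy model of a factory returning the most specific wrapper class.
# TOY_DATABASE and every Toy* name are hypothetical stand-ins.

TOY_DATABASE = {
    0x1000: ("code", "mov rax, rsp"),
    0x2000: ("data", b"\x2a\x00"),
}


class ToyElt(object):
    """Common parent: only knows about the address."""
    def __init__(self, ea):
        self.ea = ea

    @property
    def is_code(self):
        return TOY_DATABASE[self.ea][0] == "code"


class ToyInstr(ToyElt):
    @property
    def mnem(self):
        # Only meaningful for instructions, so it lives in this subclass.
        return TOY_DATABASE[self.ea][1].split()[0]


class ToyData(ToyElt):
    @property
    def raw(self):
        return TOY_DATABASE[self.ea][1]


def toy_get_elt(ea):
    """Return an object of the class matching what is stored at ea."""
    kind = TOY_DATABASE[ea][0]
    cls = ToyInstr if kind == "code" else ToyData
    return cls(ea)


def summarize(elt):
    """Branch on the concrete class, as recommended above."""
    if isinstance(elt, ToyInstr):
        return "instr at 0x{:X}: {}".format(elt.ea, elt.mnem)
    elif isinstance(elt, ToyData):
        return "data at 0x{:X}: {} bytes".format(elt.ea, len(elt.raw))
    return "unknown element at 0x{:X}".format(elt.ea)
```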
Here are a few examples of the intended usage. In this first example, the first instruction of a function is retrieved using GetEltByName(). In this case we know it is an instruction (BipInstr), but the function can return other subclasses of BipElt. We then look at the BipElt objects which reference this address: some are BipInstr and some are BipData. To know which is which, we use isinstance.
>>> from bip import *
>>> elt = GetEltByName("RtlQueryProcessLockInformation")
>>> elt # first instruction of RtlQueryProcessLockInformation
BipInstr: 0x1800D2FF0 (mov rax, rsp)
>>> elt.is_code # these properties are common to all BipElt and are used to get the correct object
True
>>> elt.is_data
False
>>> for e in elt.xEltTo: # here we get the Elt xref, elements can be BipInstr or BipData
...     if isinstance(e, BipInstr): # in case of instr we want to print the mnemonic
...         print("Found instr at 0x{:X} which ref function, with mnemonic: {}".format(e.ea, e.mnem))
...     elif isinstance(e, BipData): # for BipData there is no mnemonic available and so we just want the address
...         print("Found data ref at 0x{:x}".format(e.ea))
...     else:
...         print("Something else ?? {}".format(e))
Found instr at 0x1800C12A2 which ref function, with mnemonic: call
Found data ref at 0x1801136d7
Found data ref at 0x1801434a8
Found data ref at 0x18016c7fc
This next example shows how to check for types. All types in Bip inherit from BipType. The from_tinfo() method returns the correct Bip object from the tinfo_t object used by IDA (which is used for all the different types). In most cases there is no need to go through this method: Bip objects which are typed should have a type property which gives their type, and the methods from_c() and get_at() should allow getting the correct value easily. However, when scripting it is often interesting to look at the type of an object; more information about types and the different classes which represent them can be found in the Type documentation. Here is a small example of how to look at types: we start with a BTypeStruct and look at its members, and if a member is a pointer (BTypePtr) we look at the pointed subtype.
>>> from bip import *
>>> tst = BipType.from_c("struct {char a; int b; void *c; __int64 d; char *e; void *(*f)(int i);}")
>>> tst.members_info
{'a': <bip.base.biptype.BTypeInt object at 0x0000029B22C24160>, 'c': <bip.base.biptype.BTypePtr object at 0x0000029B22C24128>, 'b': <bip.base.biptype.BTypeInt object at 0x0000029B22C24390>, 'e': <bip.base.biptype.BTypePtr object at 0x0000029B22C244A8>, 'd': <bip.base.biptype.BTypeInt object at 0x0000029B22C24438>, 'f': <bip.base.biptype.BTypePtr object at 0x0000029B22C24048>}
>>> for i in range(tst.nb_members):
...     if isinstance(tst.get_member_type(i), BTypePtr):
...         print("We have a ptr for member {}! Type pointed is: {}".format(tst.get_member_name(i), tst.get_member_type(i).pointed.str))
...     else:
...         print("Not a pointer for member {}, type is: {}".format(tst.get_member_name(i), tst.get_member_type(i).str))
Not a pointer for member a, type is: char
Not a pointer for member b, type is: int
We have a ptr for member c! Type pointed is: void
Not a pointer for member d, type is: __int64
We have a ptr for member e! Type pointed is: char
We have a ptr for member f! Type pointed is: void *__stdcall(int i)
This is also the case when using visitors in hexrays. The Bip visitors return objects which inherit from the CNode class. As in the other examples, the easiest way to determine the type of a node is to use isinstance. An example of this can be found in the overview, in the part CNode / Visitors, in the visit_call functions. It is also shown indirectly through the usage of the method visit_cnode_filterlist(), which takes a list of classes as argument: under the hood, this function visits all nodes and calls the callback only for the ones which are instances of one of the classes passed in the second argument.
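The filtering behavior described above boils down to an isinstance check inside a tree traversal. The following sketch shows that idea on a toy tree; it is not the implementation of visit_cnode_filterlist(), and toy_visit_filterlist() as well as the Toy* classes are hypothetical names chosen for this illustration.

```python
# Toy version of "visit every node, call back only for some classes".
# toy_visit_filterlist and the Toy* classes are hypothetical.

class ToyNode(object):
    def __init__(self, name, *children):
        self.name = name
        self.children = children


class ToyStmtReturn(ToyNode): pass
class ToyExprCall(ToyNode): pass
class ToyExprNum(ToyNode): pass


def toy_visit_filterlist(root, callback, classes):
    """Depth-first walk; callback is called only for instances of classes."""
    wanted = tuple(classes)  # isinstance accepts a tuple of classes
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, wanted):
            callback(node)
        # Push children in reverse so they are visited left to right.
        stack.extend(reversed(node.children))


# Models: return g(1, h());
tree = ToyStmtReturn(
    "return",
    ToyExprCall("g", ToyExprNum("1"), ToyExprCall("h")),
)

found = []
toy_visit_filterlist(tree, lambda node: found.append(node.name), [ToyExprCall])
```

After this runs, found contains the names of the two call nodes in visit order, and the number and return nodes have been skipped.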
It is worth noting that in most cases a reference to the underlying object or identifier used by IDA is kept in one of the private attributes of the Bip object.
Interfacing with IDA & limitations¶
This part of the documentation describes some limitations of Bip, and some problems which can occur because of the interface with IDA. Basic usage of the API for retrieving information should not create many of those problems. One notable exception is modifying the database using the GUI and then reusing objects which were kept from before the modification; this, of course, includes the undo feature of IDA. Developers and users with more "advanced" usage should be careful about those problems.
In a lot of cases IDA provides APIs which allow retrieving the necessary information. However, in some cases the IDAPython API does not offer such useful wrappers, and getting the full benefit of the available API requires getting references to the underlying objects. Those objects are available in Python through the SWIG interface (part of IDAPython) built by IDA on top of their C++ API. As a general rule, people from hexrays encourage avoiding keeping references to those objects, but as said earlier there is not always a choice. Because of this, several problems exist.
The first is a really simple problem but a hard one to solve: while a Bip object is kept, the underlying IDB can be changed (using the API or the GUI). This can make the current Bip object invalid. A simple example of this is to get a BipInstr object using GetElt() and then to undefine this element using the GUI. If GetElt() is called again, a BipData object will be returned, which is the expected object. However, if the previous object is used, it can lead to unexpected behavior because this address is not an instruction anymore; for example, the property mnem will return an empty string.
To avoid this problem as much as possible, Bip tries to avoid keeping references to the IDA objects or memoizing information, but often this is not possible, and it has an overall cost in performance. As a general rule, after modifying the IDB the user should be careful to fetch the objects again instead of reusing them. Sadly, there is currently no solution implemented in Bip for solving this problem. In theory it could be implemented using the event API of hexrays, but this may create several other complex problems, if it is even possible.
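The stale-object scenario and the "fetch again" advice can be reproduced with a toy model. The sketch below makes several assumptions: TOY_IDB stands in for the database, toy_get_elt() for a factory such as GetElt(), and the Toy* classes for the Bip wrappers; none of these names exist in Bip and the code runs without IDA.

```python
# Toy illustration of a wrapper object becoming stale after the database changes.
# TOY_IDB, toy_get_elt and the Toy* classes are hypothetical stand-ins.

TOY_IDB = {0x1000: {"kind": "code", "mnem": "mov"}}


class ToyElt(object):
    def __init__(self, ea):
        self.ea = ea


class ToyInstr(ToyElt):
    @property
    def mnem(self):
        entry = TOY_IDB[self.ea]
        # The database is read live: if the address is no longer code,
        # there is no mnemonic to return.
        return entry["mnem"] if entry["kind"] == "code" else ""


class ToyData(ToyElt):
    pass


def toy_get_elt(ea):
    """Pick the wrapper class from the current state of the database."""
    return ToyInstr(ea) if TOY_IDB[ea]["kind"] == "code" else ToyData(ea)


old = toy_get_elt(0x1000)            # a ToyInstr, valid at this point
mnem_before = old.mnem               # "mov"

TOY_IDB[0x1000] = {"kind": "data"}   # simulates undefining the element in the GUI

mnem_after = old.mnem                # stale object: class is wrong, result is ""
fresh = toy_get_elt(0x1000)          # fetching again gives the correct class
```

The old object still claims to be an instruction even though the database disagrees, which is why fetching the object again after a modification is the safer habit.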
Another common problem of using the IDA objects is that wrappers are only wrappers. What this basically means is that we have to handle the management of our objects in Python as they are in C++, with SWIG in the middle. This includes the example in the blogpost but also several others, such as the ones linked to the Type API (look for the warning). As a rule, having problems with memory management when using the standard (not private) Bip API is not considered normal and can be reported as a bug. However, this can force the development to make some particular choices to enforce it.
Finally, the undo feature provided in IDA may invalidate any or all of the internal objects used by IDA (“The simplest approach is to assume that the database has completely changed and to re-read information from the database to the memory.” and “Plugins in general should not cache and reuse pointers to kernel objects (like func_t, segment_t). These pointers may change between plugins invocations.” from Undo: IDA can do it). As there is nothing really possible to do at that point for supporting such a thing, and trying to update all possible objects on every undo or redo is simply not acceptable, it is advised to simply disable this feature.