Malware analysis of functions
When you analyze malware in, for example, x64dbg or IDA Pro, it’s important to understand how functions are called, which arguments are passed to them and how to recognize the local variables within a function.
Further down in this post are my notes from the SANS FOR610: Reverse-Engineering Malware: Malware Analysis Tools and Techniques course and The IDA Pro Book.
Basic concepts of low-level analysis of functions
First some core concepts.
What is a function?
A function is a set of executable statements grouped into a single unit. A function typically performs a specific task like writing data to a file or starting a network connection.
What are the building blocks for a function?
A function has three basic components:
- Input, the part that deals with the information passed to the function;
- Body, the core statements that perform the task;
- Return, the value that is returned by the function when all tasks are completed.
When a program calls a function, it jumps to the function’s memory location, executes the function’s statements and then returns to the instruction that follows the call.
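To make the jump-and-return concrete, here is a toy Python model (not an emulator) of the x86 CALL and RET instructions: CALL pushes the address of the next instruction and jumps, RET pops that address back into EIP. The CPU class, the addresses and the 5-byte instruction size (the length of a near relative CALL) are assumptions for illustration.

```python
# Toy model of CALL and RET on 32-bit x86 (illustrative only, not an emulator).
# All addresses are hypothetical.

class CPU:
    def __init__(self, stack_top=0x1000):
        self.eip = 0
        self.esp = stack_top
        self.mem = {}

    def push(self, value):
        self.esp -= 4              # the x86 stack grows toward lower addresses
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def call(self, target, instr_size=5):
        # CALL pushes the address of the *next* instruction, then jumps.
        self.push(self.eip + instr_size)
        self.eip = target

    def ret(self):
        # RET pops the saved return address back into EIP.
        self.eip = self.pop()


cpu = CPU()
cpu.eip = 0x401000                 # hypothetical address of a CALL instruction
cpu.call(0x402000)                 # jump into the function
print(hex(cpu.eip))                # 0x402000: executing the function body
cpu.ret()                          # function finished
print(hex(cpu.eip))                # 0x401005: the instruction after the CALL
```

The return address is simply a value on the stack, which is why it is part of the stack frame described next.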
What are stack frames?
Stack frames are blocks of memory allocated within a program’s run-time stack, each dedicated to one specific invocation of a function. In other words, a stack frame holds the information passed to the function (the parameters or arguments) and the local variables the function uses to perform its tasks. It also contains the return address: the address to which the function should return after finishing its tasks.
A side effect of stack frames is that they allow recursion. Each call to a function is given its own stack frame, “isolated” from the frames of earlier calls.
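The following toy Python sketch models a call stack as a list of frames to show why recursion works: each invocation appends its own frame, and deeper calls never overwrite it. The factorial function and the frame contents are hypothetical; a real frame also holds the return address and saved registers.

```python
# Toy model: each invocation gets its own frame on a simulated call stack.
# The frame layout (a dict holding only "n") is a simplification.

def factorial(n, call_stack, trace):
    frame = {"n": n}                      # this invocation's private storage
    call_stack.append(frame)              # "allocate" the frame on entry
    trace.append(len(call_stack))         # record the current stack depth
    result = 1 if n <= 1 else n * factorial(n - 1, call_stack, trace)
    # The deeper calls used their own frames, so ours still holds our n.
    assert call_stack[-1] is frame and frame["n"] == n
    call_stack.pop()                      # release the frame on return
    return result


stack, depths = [], []
print(factorial(5, stack, depths))        # 120
print(max(depths))                        # 5 frames live at the deepest point
print(stack)                              # []: every frame was released
```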
Prologue and epilogue
The instructions at the start of a function that set up its stack frame (saving registers, establishing a frame pointer and allocating space for local variables) are called the prologue of the function. Accordingly, the instructions that clean up that space and restore the registers before returning are called the epilogue. A prologue happens at the start of a function, whereas the epilogue happens at the end of a function.
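A common form of this in 32-bit x86 code is the EBP-based frame: push ebp / mov ebp, esp / sub esp, N as the prologue and mov esp, ebp / pop ebp / ret as the epilogue. Not every compiler emits exactly this sequence (optimized code may omit the frame pointer). The Python model below is a sketch with hypothetical addresses and values; it shows why arguments appear at positive offsets from EBP and local variables at negative ones, which is how you can recognize them in a disassembler.

```python
# Toy 32-bit stack showing the classic EBP-based prologue and epilogue.
# Addresses and values are hypothetical; this is a model, not an emulator.

class Frame32:
    def __init__(self, top=0x1000):
        self.esp = top
        self.ebp = 0xDEAD          # hypothetical caller's frame pointer
        self.mem = {}

    def push(self, v):
        self.esp -= 4
        self.mem[self.esp] = v

    def pop(self):
        v = self.mem[self.esp]
        self.esp += 4
        return v


s = Frame32()
s.push(20)                         # caller pushes arg2 (right to left)
s.push(10)                         # caller pushes arg1
s.push(0x401005)                   # CALL pushes the return address

# --- prologue ---
s.push(s.ebp)                      # push ebp     : save caller's frame pointer
s.ebp = s.esp                      # mov ebp, esp : establish this frame
s.esp -= 8                         # sub esp, 8   : room for two 4-byte locals

# Inside the body, EBP gives fixed offsets:
#   [ebp+4] = return address, [ebp+8] = arg1, [ebp+12] = arg2
#   [ebp-4], [ebp-8] = local variables
s.mem[s.ebp - 4] = s.mem[s.ebp + 8] + s.mem[s.ebp + 12]   # local = arg1 + arg2
eax = s.mem[s.ebp - 4]             # return value goes in EAX

# --- epilogue ---
s.esp = s.ebp                      # mov esp, ebp : discard the locals
s.ebp = s.pop()                    # pop ebp      : restore caller's frame pointer
ret_addr = s.pop()                 # ret          : pop return address into EIP

print(eax, hex(ret_addr), hex(s.ebp))   # 30 0x401005 0xdead
```

After RET the two arguments are still on the stack; who removes them is decided by the calling convention.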
Calling conventions
It would have been too simple to have one common, shared method to call functions, including passing data in and out of functions. That’s why they invented calling conventions. A calling convention dictates:
- Where a caller should place the parameters (arguments) required by a function, either on the stack or in registers;
- Who is responsible for removing them from the stack or restoring the registers.
Oh, and to make things worse, the implementation of the convention may vary by compiler.
cdecl or C Calling Convention
Cdecl is used by most C compilers for the x86 architecture.
- Parameters to a function are pushed onto the stack from right to left;
- The caller removes the parameters from the stack;
- Because the caller removes the parameters, functions can have a variable number of parameters;
- The return value is placed in EAX.
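A toy Python model of a cdecl call, assuming a 32-bit stack with 4-byte slots: the caller pushes the arguments right to left, CALL pushes a return address, the callee returns with a plain RET, and the caller removes the arguments with the equivalent of add esp, 4*n. sum_variadic is a hypothetical printf-style function whose first argument says how many more follow.

```python
# Toy model of cdecl on 32-bit x86. Names and values are hypothetical.

class Stack32:
    def __init__(self, top=0x1000):
        self.esp = top
        self.mem = {}

    def push(self, v):
        self.esp -= 4
        self.mem[self.esp] = v

    def read(self, offset):
        return self.mem[self.esp + offset]      # like [esp+offset]


def sum_variadic(st):
    # [esp] is the return address; the first argument (count) is at [esp+4].
    count = st.read(4)
    eax = sum(st.read(8 + 4 * i) for i in range(count))
    st.esp += 4                  # plain RET: pop only the return address
    return eax                   # return value in EAX


def cdecl_call(st, func, *args):
    for a in reversed(args):     # push right to left
        st.push(a)
    st.push(0xCAFE)              # CALL pushes a (hypothetical) return address
    eax = func(st)
    st.esp += 4 * len(args)      # caller cleanup: add esp, 4*len(args)
    return eax


st = Stack32()
print(cdecl_call(st, sum_variadic, 3, 1, 2, 3))   # 6
print(cdecl_call(st, sum_variadic, 2, 10, 20))    # 30
print(st.esp == 0x1000)                           # True: stack balanced
```

Only the caller knows how many arguments it pushed, so the same callee works with different argument counts.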
stdcall or Standard Calling Convention
The label ‘standard’ is something of a misnomer: it is the name Microsoft gave to its own convention (marked with the __stdcall modifier), which is similar to cdecl.
- Parameters to a function are pushed onto the stack from right to left;
- The called function (callee) is responsible for removing the parameters from the stack;
- Because the callee removes the parameters, it must know how many there are, so functions always take a fixed number of parameters;
- The return value is placed in EAX.
Because the caller does not need stack cleanup code after every function call (the callee cleans up once, typically with a RET n instruction), this can result in less code.
Microsoft uses the stdcall convention for all fixed-argument functions exported from its shared library (DLL) files.
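The same toy Python model for stdcall: here the callee removes its own arguments, the way RET 8 pops the return address plus 8 bytes of arguments. add2_stdcall and the addresses are hypothetical.

```python
# Toy model of stdcall on 32-bit x86. Names and values are hypothetical.

class Stack32:
    def __init__(self, top=0x1000):
        self.esp = top
        self.mem = {}

    def push(self, v):
        self.esp -= 4
        self.mem[self.esp] = v

    def read(self, offset):
        return self.mem[self.esp + offset]      # like [esp+offset]


def add2_stdcall(st):
    # Callee with exactly two parameters at [esp+4] and [esp+8].
    a, b = st.read(4), st.read(8)
    st.esp += 4 + 8              # RET 8: pop return address plus 8 bytes of args
    return a + b                 # return value in EAX


def stdcall_call(st, func, *args):
    for a in reversed(args):     # push right to left
        st.push(a)
    st.push(0xCAFE)              # CALL pushes a (hypothetical) return address
    return func(st)              # no "add esp, ..." needed at the call site


st = Stack32()
print(stdcall_call(st, add2_stdcall, 5, 7))   # 12
print(st.esp == 0x1000)                       # True: the callee balanced the stack
```

The 8 is baked into the callee, which is why a stdcall function cannot take a variable number of parameters.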
Fastcall
Fastcall is a variation of stdcall that passes up to two parameters in registers instead of on the stack.
- The first two parameters are placed in the registers ECX and EDX;
- Any remaining parameters are pushed onto the stack from right to left;
- The called function (callee) is responsible for removing the parameters from the stack;
- Because the callee removes the parameters, functions always take a fixed number of parameters;
- The return value is placed in EAX.
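In the same toy Python model, a fastcall call splits the arguments: the first two go into ECX and EDX (modeled as a dict), the rest go on the stack, and the callee removes only the stack-passed bytes. sum4_fastcall is a hypothetical four-parameter function.

```python
# Toy model of Microsoft-style fastcall on 32-bit x86. Names are hypothetical.

class Stack32:
    def __init__(self, top=0x1000):
        self.esp = top
        self.mem = {}

    def push(self, v):
        self.esp -= 4
        self.mem[self.esp] = v

    def read(self, offset):
        return self.mem[self.esp + offset]      # like [esp+offset]


def sum4_fastcall(st, regs):
    a, b = regs["ecx"], regs["edx"]   # first two parameters from registers
    c, d = st.read(4), st.read(8)     # remaining parameters from the stack
    st.esp += 4 + 8                   # RET 8: only the 8 stack-passed bytes
    return a + b + c + d              # return value in EAX


def fastcall_call(st, regs, func, *args):
    reg_args, stack_args = args[:2], args[2:]
    for name, value in zip(("ecx", "edx"), reg_args):
        regs[name] = value            # first two parameters in ECX, EDX
    for a in reversed(stack_args):    # the rest right to left on the stack
        st.push(a)
    st.push(0xCAFE)                   # CALL pushes a (hypothetical) return address
    return func(st, regs)             # callee cleans up the stack


st, regs = Stack32(), {}
print(fastcall_call(st, regs, sum4_fastcall, 1, 2, 3, 4))   # 10
print(st.esp == 0x1000)                                     # True
```

In a disassembler this shows up as ECX and EDX being read near the start of a function before anything has been written to them.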
Thiscall or C++ Calling Convention
In C++, an object can refer to itself via the “this” pointer. The address of the object used to invoke a member function must be supplied by the caller and is therefore passed as an implicit parameter. Different compilers implement this differently, because the C++ standard does not specify how “this” must be passed.
- For Microsoft compilers, “this” is passed in the ECX register;
- For Microsoft compilers, the called function (callee) cleans up the stack, as with stdcall;
- GNU compilers behave as if cdecl is used: “this” is an implicit first parameter and, because parameters are pushed from right to left, it is pushed last and ends up on top of the stack;
- This also means that with GNU compilers the caller is responsible for cleaning up the stack.
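Both variants side by side in the same toy Python model, using a hypothetical deposit method on an account object. A Python dict stands in for the object; on real hardware the caller would pass its address.

```python
# Toy model of thiscall as described above: MSVC vs GNU on 32-bit x86.
# The deposit method, the account dict and the addresses are hypothetical.

class Stack32:
    def __init__(self, top=0x1000):
        self.esp = top
        self.mem = {}

    def push(self, v):
        self.esp -= 4
        self.mem[self.esp] = v

    def read(self, offset):
        return self.mem[self.esp + offset]      # like [esp+offset]


def deposit_msvc(st, regs):
    this = regs["ecx"]                 # MSVC: "this" arrives in ECX
    amount = st.read(4)                # explicit parameter on the stack
    this["balance"] += amount
    st.esp += 4 + 4                    # callee cleans up: RET 4
    return this["balance"]


def call_msvc(st, regs, obj, amount):
    regs["ecx"] = obj
    st.push(amount)
    st.push(0xCAFE)                    # return address
    return deposit_msvc(st, regs)      # no cleanup after the call


def deposit_gnu(st):
    this = st.read(4)                  # implicit first parameter, top of the args
    amount = st.read(8)
    this["balance"] += amount
    st.esp += 4                        # plain RET
    return this["balance"]


def call_gnu(st, obj, amount):
    st.push(amount)                    # explicit arguments right to left...
    st.push(obj)                       # ...then "this", pushed last
    st.push(0xCAFE)                    # return address
    eax = deposit_gnu(st)
    st.esp += 8                        # caller cleanup: add esp, 8
    return eax


acct = {"balance": 100}
st, regs = Stack32(), {}
print(call_msvc(st, regs, acct, 50))   # 150
print(call_gnu(st, acct, 25))          # 175
print(st.esp == 0x1000)                # True
```

When analyzing MSVC-compiled C++, a value loaded into ECX right before a CALL is therefore a strong hint that the target is a member function.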