IL2CPP Reverse Engineering Part 2: Structural Overview & Finding the Metadata

In part 1 we learned what IL2CPP is and how to set up a build environment, and we compared the C#, IL, C++ and disassembly of a simple function.

In this article, you will learn:

  • an overview of the key files in an IL2CPP application from a reverse-engineering perspective
  • how an IL2CPP application loads the metadata we are interested in
  • how to find the application binary’s metadata by hand in a disassembler (x64 and ARM)
  • beginner-level disassembly navigation and tidying in IDA
  • how to interpret C++ function calls in assembly language

Prerequisites:

  • Basic knowledge of high-level programming
  • Basic knowledge of disassembly (the article uses IDA but Ghidra works equally well)
  • Basic knowledge of what IL2CPP is – I recommend that you read part 1 first if you’re new to IL2CPP

01 The Executable

IL2CPP applications are built from two key components. The first is the application code itself.

On Windows, the main executable of an IL2CPP application is essentially just a stub that loads UnityPlayer.dll and calls UnityMain. For an IL2CPP game, UnityPlayer then selects Unity’s IL2CPP initialization path and loads the main application binary. This binary is usually called GameAssembly.dll and sits in the application’s root folder, but it can be renamed or placed elsewhere.

On Android, the application binary is libil2cpp.so, and on iOS everything is generally wrapped up into a single executable. Other platforms use different layouts, but all of the binaries can be analyzed in the same way, so the target platform doesn’t matter too much.
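Since the default binary names are predictable, a quick first pass over an unpacked application can simply search for them. The sketch below is a hypothetical helper (find_il2cpp_binaries is my own name, not part of any tool) that only knows the two file names mentioned above; a renamed binary, or the iOS executable, will not be found, so an empty result proves nothing.

```python
import os

# Default binary names from the text. Publishers can rename or move the binary,
# and on iOS it is the main executable itself, so "no hits" proves nothing.
KNOWN_BINARY_NAMES = {"GameAssembly.dll", "libil2cpp.so"}


def find_il2cpp_binaries(root, names=KNOWN_BINARY_NAMES):
    """Return sorted paths under `root` whose file name matches a known IL2CPP binary name."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename in names:
                hits.append(os.path.join(dirpath, filename))
    return sorted(hits)
```

An APK is a ZIP archive, so extract it first; the native libraries, including libil2cpp.so, typically live under lib/ in a per-ABI subfolder.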

The application binary (which I’ll just call “the binary” from here on) is produced by running the application’s regular Mono DLLs (e.g. Assembly-CSharp.dll and its dependencies, exactly as they would be shipped without IL2CPP) through the IL2CPP transpiler and then compiling the resulting C++ code with the platform’s native compiler. Since it contains the actual application code, it is the main target for reverse engineering.

Besides the application code itself, the binary also contains a vast sea of binary-specific metadata, such as a list of pointers to every C#-equivalent function, data about every type referenced by method code, and so on. Most binaries also expose the IL2CPP API – a large group of exported functions that let you query and modify data in the application at runtime – which is useful for dynamic analysis with a debugger. These functions appear in the export table, and their names begin with the prefix il2cpp_.
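Both PE export tables and ELF dynamic string tables store symbol names as null-terminated ASCII strings, so a crude strings-style scan is enough to check whether a binary exposes the IL2CPP API at all. This is a heuristic sketch, not an export-table parser: find_il2cpp_api_names is a hypothetical helper, and it will also report any other null-terminated string that happens to start with il2cpp_.

```python
import re

# Matches "il2cpp_" plus identifier characters, but only when the run ends in a
# NUL byte, as symbol names in PE/ELF string tables do. This is a heuristic:
# it does not parse the export table and may pick up unrelated strings.
IL2CPP_NAME = re.compile(rb"il2cpp_[A-Za-z0-9_]+(?=\x00)")


def find_il2cpp_api_names(data: bytes) -> list:
    """Return the sorted, de-duplicated il2cpp_* names found in a binary image."""
    return sorted({m.group().decode("ascii") for m in IL2CPP_NAME.finditer(data)})
```

To use it, read the whole binary (GameAssembly.dll, libil2cpp.so and so on) in binary mode and pass the bytes in. For an authoritative list, look at the export table in your disassembler instead.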

02 The Metadata

The other main file of interest for analysts is global-metadata.dat (“the metadata”). This file is a platform-independent data file created by IL2CPP containing all of the .NET metadata for the application. This includes definitions (including symbols) for all of the types, methods, properties, fields and so on for the application. Many of the structures within are similar to those used by the actual .NET runtime, but tweaked for IL2CPP. Serge Lidin provides a thorough treatise of the metadata in the excellent book Expert .NET 2.0 IL Assembler.

The metadata file is always little-endian with a 32-bit data width, and its tables are linked via indices rather than pointers. In principle, therefore, if you are compiling the same application for multiple platforms, you only need one copy of global-metadata.dat plus a separate binary for each platform. In practice, builds often include platform-specific code for Windows, Android and so on, so the metadata can differ from platform to platform.
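Because tables refer to each other by index rather than by pointer, resolving a reference is just arithmetic on a table’s base offset. The minimal sketch below assumes two things for illustration: a table made of fixed-size little-endian records (record i starts at table_offset + i * record_size), and a string blob of null-terminated UTF-8 strings addressed by byte offset. The helper names read_record and read_cstring are hypothetical, and the real record layouts depend on the metadata version.

```python
import struct


def read_record(data: bytes, table_offset: int, table_size: int,
                index: int, fields: str) -> tuple:
    """Return record `index` from a table of fixed-size little-endian records.

    `fields` is a struct format without a byte-order prefix (e.g. "iiI");
    "<" is prepended so values are little-endian with no alignment padding.
    """
    fmt = "<" + fields
    size = struct.calcsize(fmt)
    if index < 0 or (index + 1) * size > table_size:
        raise IndexError(f"record {index} lies outside the table")
    return struct.unpack_from(fmt, data, table_offset + index * size)


def read_cstring(data: bytes, blob_offset: int, index: int) -> str:
    """Read a null-terminated UTF-8 string starting `index` bytes into a string blob."""
    start = blob_offset + index
    end = data.index(b"\x00", start)  # raises ValueError if no terminator follows
    return data[start:end].decode("utf-8")
```

In this model, the index stored in one record is all you need to reach the related record or name. No pointer relocation is involved, which is why the same file can serve 32-bit and 64-bit binaries alike.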

The metadata file format is very simple. It always starts with the signature 0xFAB11BAF (little-endian) followed by 4 bytes containing the metadata version number. This is followed by a long list of offset/length pairs for the various tables of information, directly followed by the tables themselves. Which tables are actually present depends on the version number, and there will also be corresponding changes in the binary for different versions.
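The header layout described above can be parsed without knowing the version-specific meaning of each pair. The sketch below checks the signature, reads the version, and then collects offset/length pairs. Since the header’s length varies by version, it relies on a heuristic I am assuming here: because the tables directly follow the header, the header must end at the lowest table offset seen so far. parse_metadata_header is a hypothetical name, and the pairs are returned raw, not mapped to named tables.

```python
import struct

METADATA_SIGNATURE = 0xFAB11BAF  # appears on disk as the bytes AF 1B B1 FA


def parse_metadata_header(data: bytes):
    """Return (version, [(offset, length), ...]) from a global-metadata.dat image.

    Heuristic: the tables follow the header directly, so pairs are read until
    the read position reaches the lowest plausible table offset seen so far.
    """
    if len(data) < 8:
        raise ValueError("too short to be a metadata file")
    signature, version = struct.unpack_from("<II", data, 0)
    if signature != METADATA_SIGNATURE:
        raise ValueError(f"unexpected signature 0x{signature:08X}")

    pairs = []
    pos = 8
    header_end = len(data)  # shrinks as plausible table offsets are seen
    while pos + 8 <= header_end:
        offset, length = struct.unpack_from("<II", data, pos)
        pairs.append((offset, length))
        pos += 8
        # Only an offset at or beyond the pairs read so far can mark the header's end.
        if pos <= offset < header_end:
            header_end = offset
    return version, pairs
```

The returned version number is what tells you which table each pair describes. Map the pairs to names using the header definition for that specific version rather than assuming a fixed order.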

One may wonder why an overzealous publisher would want to ship their product with everything required to reconstruct all of the types and method prototypes in plain sight. Ultimately, this data is required because of .NET’s heavy reliance on reflection (loosely comparable to runtime type information, or RTTI, in languages such as C++) and attribute-based programming, so it cannot easily be elided. As with Unity apps built with the Mono scripting backend, some developers use canned obfuscation software such as the popular BeeByte to replace non-public symbol names with arbitrary ones. These tools are a useful roadblock against the casual attacker, but for anyone used to determining the meaning of code from the code itself rather than its symbols, such obfuscators have limited effect.

On its own, the metadata file can be used to reconstruct the entire structure of the application as it was written in C#, with more or less everything except the actual source code of the methods themselves. However, this gives us no insight into the structure of the actual binary we’re analyzing. For that, we need to combine the metadata file with the specific binary we’re looking at, which first requires finding the location of the binary’s own metadata structures. This step is crucial for successful reverse engineering, and it is our goal for today.