More techniques for finding binary metadata

There is a plethora of other techniques you can use to find the binary metadata besides searching the init function table. Here is a summary of them, starting with the easiest:

  • Check the export table – some binaries define g_CodeRegistration and g_MetadataRegistration (sometimes with a leading underscore) as symbols. If this is the case, you can navigate straight to them from the export table.
  • Signature search – you can take the raw hex values that make up the Il2CppCodeGenRegistration function, exclude address-specific values, and then search for them in another binary. This can be done by hand or by using FLIRT signatures in IDA. A tutorial for this can be found in a forum post at unknowncheats.me.
  • Brute-force attack – you can search the data sections for values which correlate with those in global-metadata.dat, then step backwards to find the start of each structure. The way this is done depends on the version of IL2CPP and can get a bit fiddly; check out the source code for ImageScan.cs in Il2CppInspector if you’re interested.
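The signature search described above can be sketched in a few lines of Python. This is a minimal illustration rather than how IDA or Il2CppInspector implement it: the helper names `parse_signature` and `find_signature`, the `??` wildcard notation and the sample bytes are all assumptions made for the example. Address-specific bytes are replaced with wildcards so that the same pattern can match in a different binary.

```python
from typing import List, Optional


def parse_signature(sig: str) -> List[Optional[int]]:
    """Convert a string such as '48 8D 0D ?? ?? ?? ??' into byte values.

    Wildcard tokens ('?' or '??') become None and match any byte.
    """
    return [None if tok in ("?", "??") else int(tok, 16) for tok in sig.split()]


def find_signature(data: bytes, sig: str) -> List[int]:
    """Return every offset in data where the (hypothetical) signature matches."""
    pattern = parse_signature(sig)
    hits = []
    # Naive scan: try the pattern at every possible starting offset
    for offset in range(len(data) - len(pattern) + 1):
        if all(p is None or data[offset + i] == p for i, p in enumerate(pattern)):
            hits.append(offset)
    return hits


# Made-up bytes for illustration: two 'lea rcx, [rip+disp32]; ret' sequences
# whose 4-byte displacements differ, as they would between two binaries
sample = bytes.fromhex("90 48 8D 0D 11 22 33 44 C3 48 8D 0D AA BB CC DD C3")
matches = find_signature(sample, "48 8D 0D ?? ?? ?? ?? C3")
```

A naive scan like this is slow on large binaries, but it is enough to check whether a hand-made signature is specific enough: ideally it should produce exactly one hit.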

>>Metadata obfuscation

Automated tooling for IL2CPP binaries, such as Il2CppInspector, relies heavily on the ability to parse global-metadata.dat and find Il2CppCodeRegistration and Il2CppMetadataRegistration in the application. Therefore, these structures and the breadcrumb trails that lead to them are the prime targets for obfuscation by developers.

Typical forms of obfuscation include:

  • Stripping the export table
  • Encrypting the IL2CPP API export symbols
  • Packing or encrypting the binary
  • Encrypting global-metadata.dat
  • Embedding global-metadata.dat in the binary itself
  • Re-arranging the order of fields of structures in global-metadata.dat and/or the binary metadata
  • Obfuscating the control flow in the assembly code which accesses the binary metadata
  • Applying a .NET symbol obfuscator to the C# code in the event an attacker is able to extract the metadata

The most common form of encryption is the classical single-byte XOR. Resolving the key is trivial: inspect an area of the file in a hex editor that would normally contain mostly zeroes. Since XORing zero with the key yields the key itself, the byte value that repeats throughout that area is the key. Strings encrypted with single-byte XOR are similarly decrypted by looking at the final byte of the string, which in a null-terminated string should also always be zero – therefore the final byte is the XOR key.
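Both key-recovery tricks are easy to express in code. The following is a minimal sketch, assuming the whole blob is encrypted with one single-byte key and that the plaintext is dominated by zero bytes; the function names and the sample data are invented for illustration and are not taken from Il2CppInspector.

```python
from collections import Counter


def guess_xor_key(blob: bytes) -> int:
    """Guess a single-byte XOR key as the most frequent byte in blob.

    Assumes the plaintext consists mostly of zeroes, so that the most
    common encrypted byte is 0 XOR key, which is the key itself.
    """
    if not blob:
        raise ValueError("cannot guess a key from empty data")
    return Counter(blob).most_common(1)[0][0]


def xor_bytes(blob: bytes, key: int) -> bytes:
    """Apply single-byte XOR; the same call both encrypts and decrypts."""
    return bytes(b ^ key for b in blob)


def decrypt_c_string(encrypted: bytes) -> bytes:
    """Decrypt a XOR-encrypted null-terminated string (terminator included).

    The terminator was zero before encryption, so the final encrypted byte
    is the key. The decrypted string is returned without its terminator.
    """
    key = encrypted[-1]
    return xor_bytes(encrypted[:-1], key)


# Made-up example: zero padding around some data, encrypted with key 0x5A
plaintext = b"\x00" * 64 + b"global-metadata" + b"\x00" * 64
encrypted = xor_bytes(plaintext, 0x5A)
key = guess_xor_key(encrypted)
recovered = xor_bytes(encrypted, key)
```

The frequency-based guess fails if the region you sample is not mostly zeroes, so in practice it is worth confirming the result by checking that the decrypted data contains something recognizable.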

At the time of writing (December 2020), most current obfuscation is trivially defeated by manual analysis, although writing automated tooling to handle it is substantially more difficult. Il2CppInspector can currently resolve stripped exports, encrypted IL2CPP API exports, packed PE files, XOR encryption of ELF binaries, XOR string encryption in global-metadata.dat, rearranged fields in binaries and inlined functions for x86, x64, ARMv7 and ARMv8 automatically. It will search for binary metadata using symbol tables, signatures, code disassembly and brute-force attack. Plugins can be created to add missing functionality.

>>It’s a jungle out there

Now that you know how to find the metadata, what does it mean and what can we do with it? We’ll talk about that in part 3, where we will begin to pick apart the labyrinthine web of metadata now at our disposal and find out how it all connects together.