Starting with a clean slate
As you work through the text below, I recommend you create an empty Unity project targeting IL2CPP and build it as described earlier,
so that you can look at each referenced function in the source code as you follow along. Don’t be afraid to open up the files and explore!
- global-metadata.dat is usually located at <appname>_Data/il2cpp_data/Metadata/global-metadata.dat regardless of target platform – you can examine it easily with a hex editor like HxD
- The source code for libil2cpp can be found at C:\Program Files\Unity\Hub\Editor\20xx.x.x\Editor\Data\il2cpp\libil2cpp if you have installed Unity via Unity Hub in the default location on Windows
- il2cpp.exe – which is the transpiler itself – can be found in the build folder located above the previous folder. No source code is provided for it, but it is not obfuscated and is easily browsed with your favourite .NET decompiler
- The actual C++ generated by il2cpp.exe can be found in the il2cppOutput folder of your project’s build output
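Before opening global-metadata.dat in a hex editor, it can help to confirm that the file is not encrypted or obfuscated. An unmodified metadata file begins with the 32-bit magic value 0xFAB11BAF (bytes AF 1B B1 FA on disk), followed by a 32-bit format version. The following is a minimal sketch of that check; the names MetadataHeaderInfo, readLE32 and inspectMetadataHeader are illustrative and do not come from libil2cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of inspecting the first 8 bytes of a candidate metadata file.
// (Hypothetical helper type, not part of libil2cpp.)
struct MetadataHeaderInfo {
    bool valid;       // true if the magic value matched
    int32_t version;  // metadata format version; only meaningful if valid
};

// Magic value at offset 0 of an unmodified global-metadata.dat.
// Stored little-endian, so a hex editor shows AF 1B B1 FA.
static const uint32_t kMetadataMagic = 0xFAB11BAF;

// Reads a little-endian 32-bit value from a byte buffer.
static uint32_t readLE32(const std::vector<uint8_t>& buf, std::size_t off) {
    return  static_cast<uint32_t>(buf[off])
         | (static_cast<uint32_t>(buf[off + 1]) << 8)
         | (static_cast<uint32_t>(buf[off + 2]) << 16)
         | (static_cast<uint32_t>(buf[off + 3]) << 24);
}

// Checks the magic value and, if it matches, extracts the version field.
MetadataHeaderInfo inspectMetadataHeader(const std::vector<uint8_t>& buf) {
    MetadataHeaderInfo info = { false, 0 };
    if (buf.size() < 8) return info;                      // too short to hold a header
    if (readLE32(buf, 0) != kMetadataMagic) return info;  // likely encrypted or not metadata
    info.valid = true;
    info.version = static_cast<int32_t>(readLE32(buf, 4));
    return info;
}
```

If the check fails on a file taken from a real game, that is a strong hint that the metadata has been tampered with and will need extra work before it can be analyzed.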
Tip: When you build the project, tick Copy PDB Files and Development Build.
This will generate symbol files for all of the functions in the binary.
IDA will automatically load these files, making it much easier to navigate the disassembly.
How metadata is loaded
The key parts of the startup sequence from a reverse-engineering standpoint are shown in the figure below.
The sequence is convoluted but not particularly difficult to trace.
IL2CPP startup sequence for loading metadata
IL2CPP generates two files in the root of the C++ output called Il2CppCodeRegistration.cpp and Il2CppMetadataRegistration.c.
These files define the two key top-level binary metadata tables we are looking for.
These tables contain pointers to all of the other binary metadata tables,
and allow us to correlate the contents of the metadata file to concrete function addresses and used type references in the binary.
When a DLL (or .so) file loads, it may execute one or more startup functions before returning control to the caller.
Il2CppCodeRegistration.cpp defines just such a startup function, which looks something like this:
    void s_Il2CppCodegenRegistration()
    {
        il2cpp_codegen_register(&g_CodeRegistration, &g_MetadataRegistration, &s_Il2CppCodeGenOptions);
    }
When the binary loads, a pointer to this function is passed to
il2cpp::utils::RegisterRuntimeInitializeAndCleanup::RegisterRuntimeInitializeAndCleanup()
(snappy name, I know), which stores it in a function table for later use.
Once control is returned to the UnityPlayer engine, it calls the API export il2cpp_init, which eventually leads to a call to
il2cpp::utils::RegisterRuntimeInitializeAndCleanup::ExecuteInitializations().
This function calls every function stored in the previously mentioned function table, thereby calling s_Il2CppCodegenRegistration() in the process.
Notice that this hooking mechanism also enables third-party developers to perform dependency injection if they require their own initialization – or decryption – code.
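The register-now, execute-later pattern described above can be sketched in a few lines. This is a simplified illustration, not the libil2cpp implementation: InitRegistry, fakeCodegenRegistration and s_registrationVariable are hypothetical names standing in for RegisterRuntimeInitializeAndCleanup, s_Il2CppCodegenRegistration() and the static object that registers it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for il2cpp::utils::RegisterRuntimeInitializeAndCleanup.
class InitRegistry {
public:
    typedef void (*Callback)();

    // Constructing an instance only records the callback; nothing runs yet.
    explicit InitRegistry(Callback init) { table().push_back(init); }

    // Invoked later by the runtime (the step reached via il2cpp_init in the text).
    static void ExecuteInitializations() {
        for (std::size_t i = 0; i < table().size(); ++i) {
            table()[i]();
        }
    }

private:
    // A function-local static guarantees the table exists before first use,
    // regardless of the order in which static objects are constructed.
    static std::vector<Callback>& table() {
        static std::vector<Callback> callbacks;
        return callbacks;
    }
};

static bool g_registered = false;

// Plays the role of s_Il2CppCodegenRegistration().
static void fakeCodegenRegistration() { g_registered = true; }

// A static object: its constructor runs when the binary is loaded,
// which is how the callback ends up in the table before main() starts.
static InitRegistry s_registrationVariable(&fakeCodegenRegistration);
```

Calling InitRegistry::ExecuteInitializations() at a later point runs every stored callback. Any additional static InitRegistry object linked into the binary would be picked up in the same way, which is what makes the mechanism attractive for custom initialization code.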
Via a long-winded sequence of nested function calls, s_Il2CppCodegenRegistration()
eventually calls il2cpp::vm::MetadataCache::Register() which actually stores the pointers to
Il2CppCodeRegistration and Il2CppMetadataRegistration and performs some pre-processing.
Once this dance is completed, control returns and il2cpp::vm::MetadataCache::Initialize() is called.
This function is responsible for calling the loader that fetches global-metadata.dat.
However, the file is not loaded into memory all at once; rather, it is memory-mapped for demand paging via mmap.
This has a couple of consequences. First, it means you can’t just dump the memory of a running application
to retrieve its entire metadata file should it be obfuscated or encrypted without some trickery. Second,
it means that file accesses to the metadata may appear at seemingly nonsensical code locations if you are looking at a stack trace.
Here is a stack trace using ProcMon from when the metadata file is first memory-mapped:
Here is one from later on:
In the second screenshot, reading a string from an array causes the Windows kernel to demand-page the metadata file to find the string,
since it is actually in the file on disk. I’ll talk more about ProcMon’s role in IL2CPP reverse engineering in a later article.