Finding the metadata loader: Simplest case
If the metadata file is called global-metadata.dat and this string is not encrypted in the binary,
we can cruise through on easy mode. Simply search for the filename string, search for cross-references to the string address,
and the instruction you find will usually be in il2cpp::vm::MetadataCache::Initialize.
In IDA, you can do this as follows:
- Press Shift+F12 to generate a list of all the strings in the file
- Press Ctrl+F and type global-metadata.dat. There will likely be one match
- Double-click on the match
- Click on the label and press X to generate a list of cross-references
- Press Enter to follow the first and likely only cross-reference (or double-click on the desired reference if there is more than one)
- Press F5. You will now be in il2cpp::vm::MetadataCache::Initialize
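Outside IDA, the string search in the first steps can be approximated with a few lines of Python. This is only a minimal sketch: the function name find_plaintext_string and the sample buffer are made up for illustration, and it reports file offsets only, so following the cross-references still requires a disassembler.

```python
def find_plaintext_string(data: bytes, needle: str = "global-metadata.dat") -> list:
    """Return every offset at which `needle` appears as a NUL-terminated ASCII string."""
    pattern = needle.encode("ascii") + b"\x00"
    offsets = []
    start = 0
    while True:
        index = data.find(pattern, start)
        if index == -1:
            return offsets
        offsets.append(index)
        start = index + 1


# Synthetic stand-in for the bytes of a real binary (hypothetical data).
sample = b"\x90\x90Metadata\x00global-metadata.dat\x00\xcc\xcc"
print(find_plaintext_string(sample))
```

With a real file you would pass the result of open(path, "rb").read() instead of the sample buffer. An empty list means the filename is not stored as plain ASCII in the binary.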
Of course, we cannot rely on this search if the developers have encrypted the strings in the binary to defeat it,
if a different filename is used, or if the metadata is embedded in the binary or in another file. In those cases, we will proceed to trace the code path.
Finding the metadata loader: Tracing an unobfuscated code path
The next simplest case involves no obfuscation of the actual code path to the loader (although the loader itself may be obfuscated).
After having navigated to il2cpp_init in IDA and invoked the decompiler, here is an example trace which is typical of most IL2CPP binaries
(this example is from an unobfuscated version of Fall Guys):
__int64 __fastcall il2cpp_init(__int64 a1)
{
  __int64 v1; // rbx

  v1 = a1;
  setlocale(0, Locale);
  return (unsigned __int8)sub_18025B340(v1);
}
The call to sub_18025B340 is almost certainly il2cpp::vm::Runtime::Init – we rename it and click through:
__int64 __fastcall il2cpp::vm::Runtime::Init(__int64 a1)
{
  v1 = a1;
  v66 = &unk_182B7C088;
  sub_180225A60(&unk_182B7C088);
  v2 = dword_182B7C338++;
  if ( v2 > 0 )
  {
    v3 = 1;
    goto LABEL_125;
  }
  sub_180225480();
  sub_180225460();
  v4 = (__int64 *)operator new(0x10ui64);
  v64 = v4;
  if ( v4 )
    v5 = (void *)sub_18028BD70(v4, 0x80000i64);
  else
    v5 = 0i64;
  qword_182B7C298 = v5;
  v6 = (__int64 *)operator new(0x10ui64);
  v64 = v6;
  if ( v6 )
    v7 = (void *)sub_18028BD30(v6);
  else
    v7 = 0i64;
  qword_182B7C2A0 = v7;
  v8 = (__int64 *)operator new(0x10ui64);
  v64 = v8;
  if ( v8 )
    v9 = (void *)sub_18028BD30(v8);
  else
    v9 = 0i64;
  qword_182B7C2A8 = v9;
  qword_182B7C398 = (__int64)"4.0";
  sub_1802253C0();
  sub_180225310();
  sub_18028C380();
  if ( !sub_18025CEF0() )
  {
    --dword_182B7C338;
    v3 = 0;
    goto LABEL_125;
  }
  sub_180293880();
  sub_180227740((__int64)sub_180274290);
  v10 = (__int64 *)operator new(0x18ui64);
  v64 = v10;
This is just the start of the function and we have to wade through a bunch of junk, but we can use waypoints to help us.
Notice the assignment of the string "4.0" to qword_182B7C398 – this is the .NET Framework version which was assigned in the source code above:
s_FrameworkVersion = framework_version_for(runtime_version);
os::Image::Initialize();
os::Thread::Init();
il2cpp::utils::RegisterRuntimeInitializeAndCleanup::ExecuteInitializations();
if (!MetadataCache::Initialize())
    return false;
In both the source code and the decompilation, this assignment is followed by three function calls
and then by the call to MetadataCache::Initialize inside the if statement.
Since the pattern is identical in both, we can surmise that sub_18025CEF0 is probably the target function,
so we rename it and click through again:
char il2cpp::vm::MetadataCache::Initialize()
{
  v0 = sub_180261550("global-metadata.dat");
  *&xmmword_182B7C2D8 = v0;
  if ( v0 )
  {
    *(&xmmword_182B7C2D8 + 1) = v0;
    qword_182B7B948 = j_j__calloc_base(*(qword_182B7C2C0 + 48), 8i64);
    qword_182B7B950 = j_j__calloc_base(*(*(&xmmword_182B7C2D8 + 1) + 164i64) / 0x5Cui64, 8i64);
    qword_182B7B958 = j_j__calloc_base(*(*(&xmmword_182B7C2D8 + 1) + 52i64) >> 5, 8i64);
    qword_182B7B968 = j_j__calloc_base(*(qword_182B7C2C0 + 64), 8i64);
    dword_182B7B970 = *(*(&xmmword_182B7C2D8 + 1) + 172i64) / 0x28ui64;
    qword_182B7B978 = j_j__calloc_base(dword_182B7B970, 80i64);
    dword_182B7B980 = *(*(&xmmword_182B7C2D8 + 1) + 180i64) / 0x44ui64;
    qword_182B7B988 = j_j__calloc_base(dword_182B7B980, 96i64);
    v1 = *(&xmmword_182B7C2D8 + 1);
Tip: I’ve disabled casts in the code snippet above for readability.
You can do this in IDA by pressing \ (backslash) in the decompiler window.
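The waypoint trick used to locate MetadataCache::Initialize inside Runtime::Init can also be expressed as a small script over pseudocode that has been copied out of the decompiler as text. This is a rough sketch, not an IDA API: the function name find_call_after_waypoint is hypothetical, and the regular expression assumes the `if ( !sub_X() )` spacing seen in the decompilation shown earlier.

```python
import re


def find_call_after_waypoint(pseudocode: str, waypoint: str = '"4.0"'):
    """Return the first function tested as `if ( !name() )` after the waypoint line, or None."""
    guard = re.compile(r"if \( !(\w+)\(\) \)")
    seen_waypoint = False
    for line in pseudocode.splitlines():
        if not seen_waypoint:
            # Skip everything up to and including the line containing the waypoint.
            seen_waypoint = waypoint in line
            continue
        match = guard.search(line)
        if match:
            return match.group(1)
    return None


# Lines taken from the Runtime::Init decompilation earlier in this article.
excerpt = '''qword_182B7C398 = (__int64)"4.0";
sub_1802253C0();
sub_180225310();
sub_18028C380();
if ( !sub_18025CEF0() )
{'''
print(find_call_after_waypoint(excerpt))
```

The result is only a candidate: you still need to open the function and confirm that it looks like the metadata initialization code.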
At the very start of il2cpp::vm::MetadataCache::Initialize we can clearly see that sub_180261550 corresponds to il2cpp::vm::MetadataLoader::LoadMetadataFromFile.
Furthermore, the resulting pointer v0 is stored in xmmword_182B7C2D8 – this is the static global storing the pointer to the memory-mapped metadata file
(this may also be a dword or a qword depending on the architecture of the file you’re reverse engineering).
This last point is very important because all accesses to the metadata by the application will occur via this pointer,
so if there is any just-in-time deobfuscation or decryption to be performed,
we will be able to find it by searching for references to this pointer.
(Just-in-time means that the deobfuscation is performed just before the data is used, rather than when the file is loaded;
this has the advantage of not leaving deobfuscated data lying around in memory, at the expense of slightly reduced performance.)
Generally, however, the applications I’ve encountered perform the decryption before any accesses,
either at the start of il2cpp::vm::MetadataCache::Initialize or in il2cpp::vm::MetadataLoader::LoadMetadataFromFile.
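To make the difference between the two strategies concrete, here is a toy Python model. Everything in it is an assumption made for illustration: the single-byte XOR "encryption", the key and the class names do not come from any real game or from IL2CPP itself.

```python
KEY = 0x5A  # hypothetical single-byte XOR key, for illustration only


def xor_bytes(data: bytes, key: int = KEY) -> bytes:
    """Apply the toy XOR transform (the same operation encrypts and decrypts)."""
    return bytes(b ^ key for b in data)


class LoadTimeMetadata:
    """Decrypts the whole blob once when it is loaded; the plaintext stays in memory."""

    def __init__(self, encrypted: bytes):
        self.plain = xor_bytes(encrypted)

    def read(self, offset: int, length: int) -> bytes:
        return self.plain[offset:offset + length]


class JustInTimeMetadata:
    """Keeps the blob encrypted and decrypts only the requested slice on every access."""

    def __init__(self, encrypted: bytes):
        self.encrypted = encrypted

    def read(self, offset: int, length: int) -> bytes:
        return xor_bytes(self.encrypted[offset:offset + length])


blob = xor_bytes(b"global-metadata payload")  # pretend this came from disk
eager = LoadTimeMetadata(blob)
jit = JustInTimeMetadata(blob)
print(eager.read(0, 6), jit.read(0, 6))
```

Both readers return the same bytes, but in the just-in-time model the decryption lives inside the accessor. That is why following the references to the global metadata pointer leads to the decryption code in that case.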