Reverse Engineering Adventures: Honkai Impact 3rd (Part 3)
This is a continuation of the Reverse Engineering Adventures: Honkai Impact 3rd mini-series – read part 1 and part 2 first!
So far, we have decrypted global-metadata.dat, and identified and resolved the data obfuscation of Il2CppGlobalMetadataHeader and the four obfuscated metadata tables. We have observed that the string data is still out of our reach, and we need it in order to load Honkai Impact into Il2CppInspector. Today we’ll find out how to access this information and create a final working plugin that will enable us to fully deobfuscate the game and analyze it.
>>Two balls of string
IL2CPP metadata includes two distinct kinds of string data:
- .NET symbol identifiers – these are identifiers used in the source code such as class, method, field and property names. This information is included in the metadata to enable reflection, which is a major design pattern in .NET applications. IL2CPP and this article refer to these as “strings”.
- Fixed application strings – these are strings used by the application itself, such as error messages, network hostnames, logging output and any other static string values used by the source code which don’t change over the lifetime of the application. IL2CPP and this article refer to these as “string literals”.
Strings are stored in a single table in global-metadata.dat, located at the offset named stringOffset in the header. The strings are null-terminated and indexed by their byte offset from the start of the string table. Each string immediately follows the previous with no alignment padding, i.e. the first character of string n can be found at the byte following the null terminator of string n-1.
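The lookup described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not IL2CPP's actual implementation: the sample buffer, the `demo_` function names and the two constants are all made up for the example, and the fake 4-byte header merely stands in for the real one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up stand-in for a metadata file: a 4-byte fake header followed by a
   string table holding two null-terminated strings back to back. */
static const uint8_t demo_metadata[] = "HDR!" "Assembly-CSharp.dll\0UnityEngine.dll";

/* Hypothetical values that the real header would supply. */
enum { DEMO_STRING_OFFSET = 4, DEMO_STRING_TABLE_SIZE = 36 };

/* Returns the null-terminated string that starts `index` bytes into the
   string table, or NULL if the index lies outside the table. */
const char *demo_get_string(const uint8_t *metadata, uint32_t string_offset,
                            uint32_t table_size, uint32_t index)
{
    if (index >= table_size)
        return NULL;
    return (const char *)(metadata + string_offset + index);
}

/* Index of the string that follows the one at `index`: skip its characters
   and its null terminator. There is no alignment padding to account for. */
uint32_t demo_next_string_index(const uint8_t *metadata, uint32_t string_offset,
                                uint32_t index)
{
    const char *s = (const char *)(metadata + string_offset + index);
    return index + (uint32_t)strlen(s) + 1;
}
```

In this sample, index 0 yields Assembly-CSharp.dll and the next valid index is 20, which yields UnityEngine.dll.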
String literals are managed by two tables in global-metadata.dat. The first is located at the offset named stringLiteralDataOffset in the header. This is a pure blob of string data. The strings are not null-terminated, and they are numbered from zero like the elements of a one-dimensional array, i.e. 0, 1, 2… Each string immediately follows the previous with no alignment padding. The second table, located at the offset named stringLiteralOffset in the header, consists of two 32-bit integers per entry. Each entry corresponds to a single string and specifies the string’s offset from the start of stringLiteralDataOffset and the string’s length. To find a string literal of index n, you look up entry n from this table, then read length bytes starting at offset from the data table.
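As a rough sketch of the two-table lookup, the C below resolves literal n by reading its entry and then slicing the data blob. The sample tables and every `demo_`/`Demo` name are invented for illustration, and the order of the two 32-bit fields inside an entry is an assumption rather than something established above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One index-table entry: two 32-bit integers. Field order is an assumption. */
typedef struct {
    uint32_t length; /* length of the literal in bytes */
    uint32_t offset; /* byte offset from the start of the data blob */
} DemoStringLiteral;

/* Made-up data blob: two literals stored back to back, no terminators between them. */
static const uint8_t demo_literal_data[] = "Hello, worldNetwork error";
static const DemoStringLiteral demo_literals[] = { { 12, 0 }, { 13, 12 } };

/* Looks up literal `index`; returns a pointer to its first byte and stores
   its length in *out_length, or returns NULL if the index is out of range. */
const uint8_t *demo_get_string_literal(const DemoStringLiteral *entries,
                                       uint32_t entry_count, const uint8_t *data,
                                       uint32_t index, uint32_t *out_length)
{
    if (index >= entry_count)
        return NULL;
    *out_length = entries[index].length;
    return data + entries[index].offset;
}

/* Convenience check against the sample tables: is literal `index` equal to `expected`? */
int demo_literal_equals(uint32_t index, const char *expected)
{
    uint32_t length = 0;
    const uint8_t *p = demo_get_string_literal(demo_literals, 2, demo_literal_data,
                                               index, &length);
    return p != NULL && strlen(expected) == length && memcmp(p, expected, length) == 0;
}
```

Because the literals are not null-terminated, the caller must always carry the length alongside the pointer, which is why the lookup returns both.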
IL2CPP provides two functions, const char *il2cpp::vm::MetadataCache::GetStringFromIndex(int index) and Il2CppString *il2cpp::vm::MetadataCache::GetStringLiteralFromIndex(int index), to retrieve strings and string literals respectively (Il2CppString * is a type with a small header indicating the string length, followed by each character encoded as UTF-16).
In order to find out how Honkai Impact retrieves its string data, we need to find and investigate these two functions in the binary. One of several ways to do this is to scan the IL2CPP source code for all of the call sites of these two functions, then trace a path up each call site’s call stack until we reach a function that we’ve already identified in Honkai Impact’s disassembly. Starting from that point, we can work our way back down the same call stack, this time in the disassembly, until we reach the desired function.
>>Better to be lucky than good
As it happens, we already stumbled across GetStringFromIndex by accident when we decompiled the loop that fills in the image names:
imageIndex = 0;
if ( s_ImagesCount > 0 )
imageIndex_1 = 0i64;
while ( 1 )
pString = pGetStringFromIndex(v9, *v10);
*(&local_ImagesTable->name + imageIndex_1) = pString;
Here is the equivalent code from the IL2CPP source code:
for (int32_t imageIndex = 0; imageIndex < s_ImagesCount; imageIndex++)
{
    const Il2CppImageDefinition* imageDefinition = imagesDefinitions + imageIndex;
    Il2CppImage* image = s_ImagesTable + imageIndex;
    image->name = GetStringFromIndex(imageDefinition->nameIndex);
}
You may recall from the end of part 2 that the call was replaced with a call to a function in a different DLL.
What are v9 and v10?
v9 = s_GlobalMetadata;
v10 = (s_GlobalMetadata + s_GlobalMetadataHeader->imagesOffset);
v9 is simple: it points to the start of global-metadata.dat in memory. v10 initially points to the start of the images table, which at first seems a little strange. At the end of the loop, v10 is incremented by 8:
v10 += 8;
The image table consists of a list of
Il2CppImageDefinition. Let’s look at this type:
typedef struct Il2CppImageDefinition
(all of the types ending in Index here are uint32_ts)
Now the situation becomes clearer. The first field in an Il2CppImageDefinition is a string index (offset), and there are 8 items in each image definition. v10 is an unsigned int *, so as a result of C pointer arithmetic magic, adding 8 to it advances the pointer by 8 elements of 4 bytes each (32 bytes), not 8 bytes! Therefore, v10 + 8 points to the start of the next image definition, and therefore the next image definition’s string index. From this, we can understand that the loop iterates over all of the image definitions and fetches a string corresponding to each, which for an image definition is the name of the image, e.g. UnityEngine.dll and so on.
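The pointer arithmetic is easy to check with a small C sketch. The struct below is a hypothetical stand-in that only assumes what is stated above (eight 32-bit items with the name's string index first); the placeholder field, the sample values and the `demo_` function names are not taken from the IL2CPP headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an image definition: eight 32-bit items, with
   the name's string index first. The other seven are placeholders. */
typedef struct {
    uint32_t nameIndex;
    uint32_t otherFields[7];
} DemoImageDefinition;

/* Made-up image table with three entries. */
static const DemoImageDefinition demo_images[3] = {
    { 0, { 0 } }, { 20, { 0 } }, { 36, { 0 } }
};

/* Walks the table the way the decompiled loop does: treat it as a flat
   array of 32-bit values and step 8 elements per image definition. */
uint32_t demo_name_index_at(const uint32_t *table, int image)
{
    const uint32_t *p = table;
    for (int i = 0; i < image; i++)
        p += 8; /* 8 elements * 4 bytes = 32 bytes = one image definition */
    return *p;
}

/* Byte distance covered by adding 8 to a pointer to a 32-bit value. */
size_t demo_stride_bytes(void)
{
    static const uint32_t words[16] = { 0 };
    const uint32_t *p = words;
    return (size_t)((const uint8_t *)(p + 8) - (const uint8_t *)p);
}
```

The stride is 32 bytes, which matches the size of the eight-item struct, so each `p += 8` lands exactly on the next definition's name index.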
So, the external call to
pGetStringFromIndex passes in two arguments: a pointer to
global-metadata.dat and the desired string index, and returns a pointer to the fetched string.
How do we find which DLL the function is in, and its address within that DLL? Normally, we would fire up a debugger and set a breakpoint at the call site, then step forward one instruction to find out where we land. Unfortunately, Honkai Impact is protected by VMProtect and is full of debugger traps which will cause any attempt to attach a debugger to crash the process. We’re going to have to get creative.
>>What’s in a (thread) name?
We navigate to
pGetStringFromIndex in IDA and bring up the list of cross-references to find out where it is set:
We find just one location, in the IL2CPP API
il2cpp_thread_get_name. What? What does fetching a thread name have to do with setting an import address? And why is it setting it from an MMX/SSE register?
Smelling some hijinks, we disassemble this function:
.text:00007FFF41EA3740 il2cpp_thread_get_name proc near
.text:00007FFF41EA3740 sub rsp, 48h
.text:00007FFF41EA3744 cmp dword ptr [rdx], 5F5E0EBh
.text:00007FFF41EA374A jnz short loc_7FFF41EA3782
.text:00007FFF41EA374C movups xmm1, xmmword ptr [rcx]
.text:00007FFF41EA374F movsd xmm0, qword ptr [rcx+10h]
.text:00007FFF41EA3754 movsd [rsp+48h+var_18], xmm0
.text:00007FFF41EA375A mov rax, [rsp+48h+var_18]
.text:00007FFF41EA375F movq cs:unityplayer_DecryptMetadata, xmm1
.text:00007FFF41EA3767 psrldq xmm1, 8
.text:00007FFF41EA376C mov cs:qword_7FFF43D74F90, rax
.text:00007FFF41EA3773 xor eax, eax
.text:00007FFF41EA3775 movq cs:pGetStringFromIndex, xmm1
.text:00007FFF41EA377D add rsp, 48h
Experienced analysts looking at this code right now probably just burst out laughing. If you didn’t, don’t worry: this is really obscure and the following explanation should put a smile on your face.
Let’s decompile this function:
__int64 __fastcall il2cpp_thread_get_name(__m128i *a1, _DWORD *a2)
__m128i v2; // xmm1
__int64 result; // rax
__int64 (__fastcall *v4)(_QWORD, _QWORD, _QWORD); // [rsp+30h] [rbp-18h]
if ( *a2 != 99999979 )
v2 = *a1;
v4 = a1[1].m128i_i64[0];
unityplayer_DecryptMetadata = *a1;
qword_7FFF43D74F90 = v4;
result = 0i64;
pGetStringFromIndex = *&_mm_srli_si128(v2, 8);
The second argument – a2 – is not used at all except to check whether its dereferenced value is the very suspicious 99999979 (0x5F5E0EB in the disassembly). If it’s not, the function returns via sub_7FFF41EA8890, which simply sets *a2 to zero and returns zero. Why on earth do we need this? We don’t: this serves no purpose except to refuse to run the function unless *a2 is set to the tested value; it’s like a primitive form of authentication. The stench of shenanigans is rising.
The rest of the code uses something called SSE intrinsics, which are generally used for SIMD floating point operations to accelerate functions like video decoding and other multimedia applications. The data type __m128i as defined by Intel is a 16-byte integer. Its member m128i_i64 (a field of the type rather than an intrinsic) exposes the value as a two-item array such that index 0 contains the lower 64 bits (8 bytes) of the value, and index 1 contains the upper 64 bits.
The assignment to v4 tells us that a1 is a two-item array of __m128i via the access to a1[1]. The two statements which access *a1 can be considered equivalent to accessing a1[0] (these are semantically equivalent in C).
Three function imports are stored via this function – one to
unityplayer.DecryptMetadata, one to
qword_7FFF43D74F90 and one to
pGetStringFromIndex. We resolved
unityplayer.DecryptMetadata in part 1 of this series, so it’s reasonable to assume that the call to
il2cpp_thread_get_name is being made by
UnityPlayer.dll and that the other two imports are from the same DLL. We will investigate this more later.
The assignment of unityplayer.DecryptMetadata causes an implicit cast of an __m128i to a QWORD. The latter is 64 bits wide, so the top 64 bits of a1[0] get discarded.
This leaves the slightly more tricky final assignment. The SSE intrinsic _mm_srli_si128 (which I freely admit I had to look up in the Intel Intrinsics Guide – 90% of hacking is research!) shifts the first operand right by the number of bytes (not bits) specified in the second operand. The code calls _mm_srli_si128(v2, 8), so we are essentially taking the top 64 bits of v2 and discarding the bottom 64 bits (by shifting v2 right 64 bits, the top 64 bits get filled with zeroes). The resulting value is then assigned to the import pGetStringFromIndex.
Since this can be a little difficult to understand, a diagram might help:
Figure 1. Memory layout of a1
Ultimately, this function takes 24 bytes, interprets them as three 64-bit address pointers and stores them for later use.
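To make the net effect concrete, here is a portable C sketch of what the function appears to do, written without real SSE types so that it compiles anywhere. The `DemoM128` struct is a stand-in for __m128i, the sample addresses are invented, and the sketch takes the magic value directly rather than through a pointer; none of the `demo_` names come from the game.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC 99999979u /* 0x5F5E0EB, the value compared against *a2 */

typedef struct { uint64_t lo, hi; } DemoM128; /* portable stand-in for __m128i */

typedef struct {
    int valid;                      /* 1 if the magic matched */
    uint64_t decrypt_metadata;      /* bytes 0-7 of the input */
    uint64_t get_string_from_index; /* bytes 8-15 of the input */
    uint64_t third_import;          /* bytes 16-23 (qword_7FFF43D74F90) */
} DemoImports;

/* Made-up 24-byte input: three fake 64-bit addresses. */
static const uint64_t demo_block[3] = {
    0x1111111111111111ULL, 0x2222222222222222ULL, 0x3333333333333333ULL
};

/* Emulates _mm_srli_si128(v, 8): shift right by 8 bytes, zero-fill the top. */
static DemoM128 demo_srli_8_bytes(DemoM128 v)
{
    DemoM128 r = { v.hi, 0 };
    return r;
}

/* Unpacks three 64-bit values from 24 bytes, but only if the magic matches. */
DemoImports demo_store_imports(const void *block, uint32_t magic)
{
    DemoImports out = { 0, 0, 0, 0 };
    DemoM128 v;
    if (magic != DEMO_MAGIC)
        return out;                                       /* early-return path */
    memcpy(&v, block, 16);                                /* movups xmm1, [rcx] */
    memcpy(&out.third_import, (const uint8_t *)block + 16, 8); /* [rcx+10h] */
    out.decrypt_metadata = v.lo;                          /* movq keeps the low half */
    out.get_string_from_index = demo_srli_8_bytes(v).lo;  /* psrldq 8, then movq */
    out.valid = 1;
    return out;
}
```

With the sample input, the three stored values come out as the first, second and third 64-bit words of the block, which is all the SSE gymnastics amount to.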
This is some really sneaky stuff and I found it pretty amusing. This is probably a hand-coded assembly function, and it is quite strange because it’s clearly not obfuscated enough to have any meaningful effect, but it is obfuscated in an obscure enough way to raise a smile, knowing that whoever wrote this assembly was having a good time.
There is one final cherry on top to this bizarre excursion:
il2cpp_thread_get_name was removed in the Unity version immediately before the one used in Honkai Impact. In other words, this is not a real IL2CPP API export. It’s a decoy export designed to conceal where the import addresses are set. Hilarious!