Threading the needle
We start by navigating to the top of the user mode call stack, 0x7FFF4E2C3E85 in UnityPlayer.dll:
.text:00007FFF4E2C3E6C mov [rbp+0D30h+anonymous_28], rax
.text:00007FFF4E2C3E73 mov rax, [rbp+0D30h+anonymous_69]
.text:00007FFF4E2C3E77 mov rcx, [rbp+0D30h+anonymous_30]
.text:00007FFF4E2C3E7E mov rdx, [rbp+0D30h+anonymous_28]
.text:00007FFF4E2C3E85 movups xmm0, xmmword ptr [rax+rcx]
.text:00007FFF4E2C3E89 movups xmmword ptr [rdx], xmm0
.text:00007FFF4E2C3E8C mov rsi, [rbp+0D30h+anonymous_23]
.text:00007FFF4E2C3E90 sub rsp, 20h
.text:00007FFF4E2C3E94 mov r8d, 0B00h ; Size
Note that the instruction pointer (EIP for x86, RIP for x64) is incremented before it’s pushed onto the stack, so the actual instruction that triggers the call to ReadFile is the previous one, at 0x7FFF4E2C3E7E. It’s just a mov, so it is likely triggering the read call by attempting to read from an uninitialized location in the demand paged memory range. Not very interesting. We scroll up and down in this function and discover it is both huge and obfuscated using a technique called control flow obfuscation.
Here is the control flow graph (CFG) for this function:
Essentially this is a form of multi-level control flow flattening. In a nutshell, the function is a giant finite state machine (FSM) controlled by an arbitrarily-introduced state variable. The function loops repeatedly in its entirety, performing a very small action on each loop iteration based on the state variable, then updating the state variable. The actions are buried within many layers of if and switch statements, making it very difficult to reverse engineer by static analysis. As an analyst, I could not possibly be less excited about this diagram.
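To make the idea concrete, here is a minimal sketch of what flattening does to a trivial function. Both functions and all state constants are hypothetical and written purely for illustration; a real obfuscator produces far more states and nests several dispatchers inside each other.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Original, readable version: sum the elements of an array.
std::uint32_t sum_plain(const std::uint32_t* data, std::size_t count) {
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += data[i];
    return total;
}

// Flattened version: the same logic split into tiny blocks, each selected
// by an artificial state variable inside a single dispatcher loop.
// The state values are arbitrary, as they would be in obfuscated code.
std::uint32_t sum_flattened(const std::uint32_t* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    std::uint32_t total = 0;
    std::size_t i = 0;
    std::uint32_t state = 0x3A71;              // entry state

    for (;;) {
        switch (state) {
        case 0x3A71:                           // initialise
            total = 0;
            i = 0;
            state = 0x91C4;
            break;
        case 0x91C4:                           // loop condition
            state = (i < count) ? 0x05EE : 0x7B20;
            break;
        case 0x05EE:                           // loop body
            total += data[i];
            state = 0xC0DE;
            break;
        case 0xC0DE:                           // loop increment
            ++i;
            state = 0x91C4;
            break;
        case 0x7B20:                           // exit
            return total;
        default:                               // unreachable in this sketch
            return 0;
        }
    }
}
```

Every pass through the loop does one small thing and then picks the next state, so the original loop structure no longer appears anywhere in the control flow graph.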
At this juncture I should note that the object of static analysis is not to determine what every line in a program does. Disassemblies often consist of millions of lines of code, and trying to weave your way through figuring out what every instruction means is a slow, laborious way to accomplish nothing. Instead, we try to judge the overall purpose of functions at a slightly higher level and only delve down into the instruction level for small snippets of code that hold the greatest relevance.
One way to do this is to look at the inputs and outputs of a function rather than its actual code. Consider a 100,000-line obfuscated function which takes two integers as its input and returns one integer. If feeding in 1 and 2 produces an output of 3 every time, and feeding in 11 and 22 produces an output of 33 every time, it’s fairly safe to assume that, at least in general, the function sums its two inputs and returns the total. There is no need to reverse engineer the function’s code unless it produces something that deviates from our thesis.
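The black-box approach described above can be sketched as a small harness. Everything here is an assumption made for illustration: mystery is a stand-in for the obfuscated function (it adds using XOR and carry propagation so that its purpose is not obvious from the code), and matches_hypothesis is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using BinaryFn = std::function<std::uint32_t(std::uint32_t, std::uint32_t)>;
using Samples = std::vector<std::pair<std::uint32_t, std::uint32_t>>;

// Stand-in for the obfuscated function. We pretend we cannot read it.
std::uint32_t mystery(std::uint32_t a, std::uint32_t b) {
    while (b != 0) {
        std::uint32_t carry = (a & b) << 1;
        a ^= b;
        b = carry;
    }
    return a;
}

// Returns true if fn agrees with hypothesis on every sample input.
bool matches_hypothesis(const BinaryFn& fn, const BinaryFn& hypothesis,
                        const Samples& samples) {
    for (const auto& s : samples) {
        if (fn(s.first, s.second) != hypothesis(s.first, s.second))
            return false;
    }
    return true;
}
```

Calling matches_hypothesis(mystery, add, samples) with a lambda that adds its inputs succeeds for the samples 1, 2 and 11, 22, while a multiplication hypothesis fails on the first sample. Agreement on a handful of samples is evidence rather than proof, which is why a deviating output is the signal to look at the code itself.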
.text:00007FFF41EE075A xor edx, edx
.text:00007FFF41EE075C call sub_7FFF41EDD140
.text:00007FFF41EE0761 jmp short loc_7FFF41EE076F
.text:00007FFF41EE0763 mov edx, r12d ; _QWORD
.text:00007FFF41EE0766 call cs:qword_7FFF43D74F80
.text:00007FFF41EE076C mov rsi, rax
.text:00007FFF41EE076C ; } // starts at 7FFF41EE06F0
Looking again at the instruction prior to the one pointed to by the stack, this time we find an actual call, to qword_7FFF43D74F80, which is an uninitialized static value set at runtime. We know for sure this calls DoSomethingWithMetadata in UnityPlayer.dll, so we rename this address to pDoSomethingWithMetadata (the p is short for pointer), navigate to the top of the function and invoke the decompiler. The decompiled function is a couple of hundred lines long but the call to the obfuscated function is visible and looks like this:
a6 = 0;
v29 = sub_7FFF41EB3860(&a1, 3, 1, 1u, 0, &a6);
v26 = v29;
if ( !a6 )
{
  v27 = sub_7FFF41EB36A0(v29, &a6);
  v28 = v27.LowPart;
  if ( !a6 )
  {
    v25 = (const void *)sub_7FFF41EDCFE0(v26, 0i64, 0);
    sub_7FFF41EB3170(v26, &a6);
    if ( a6 )
      sub_7FFF41EDD140(v25);
    else
      v0 = pDoSomethingWithMetadata(v25, v28);
  }
}
Immediately we have learned something useful. Our mystery function takes two arguments and returns one. In addition, we know the first argument is a pointer because v25 has been cast to const void *. The return value is stored in v0 and not referenced again until this function ends, whereupon it is passed back to the caller as the return value.
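What we know so far can be written down as a function pointer type. The sketch below is an assumption-laden model, not recovered code: the parameter and return types are guesses based on the call site (a pointer, a 32-bit value taken from LowPart, and a pointer-sized result), and fake_target and resolve_at_runtime are hypothetical stand-ins for code we have not seen.

```cpp
#include <cassert>
#include <cstdint>

// Signature inferred from the call site. The exact types are assumptions.
using MetadataFn = void* (*)(const void* data, std::uint32_t size);

// Counterpart of qword_7FFF43D74F80: empty in the file on disk and
// filled in by some initialisation code at runtime.
MetadataFn pDoSomethingWithMetadata = nullptr;

// Hypothetical stand-in for the real target in UnityPlayer.dll.
void* fake_target(const void* data, std::uint32_t size) {
    (void)size;
    return const_cast<void*>(data);
}

// Models whatever code stores the real address in the global.
void resolve_at_runtime() {
    pDoSomethingWithMetadata = &fake_target;
}

// Models the call site. A call through a global pointer like this is what
// shows up as an indirect call such as `call cs:qword_...` in the listing.
void* call_site(const void* mapped, std::uint32_t size) {
    assert(pDoSomethingWithMetadata != nullptr);
    return pDoSomethingWithMetadata(mapped, size);
}
```

Because the pointer is only populated at runtime, static analysis of the calling module alone cannot tell you where the call goes, which is why the stack trace was needed to identify the target.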
We might be able to determine what v0 is by moving down in the stack once more, but first we want to try to determine the input arguments. Generally we do this by clicking on the functions around the call to see if we can establish some context – particularly if they use the same arguments or return values subsequently passed as arguments to the function of interest. It doesn’t matter much how you approach this, but remember we just want to get an overview of what’s happening without perfectly understanding every function. I start arbitrarily with the prior function call to sub_7FFF41EDD140, whose only argument is the same as the first argument to the mystery function:
void __fastcall sub_7FFF41EDD140(LPCVOID a1)
{
  LPCVOID lpBaseAddress; // rbx
  void *v2; // rcx
  _QWORD *v3; // rax

  if ( a1 )
  {
    lpBaseAddress = a1;
    sub_7FFF41EE16C0(&unk_7FFF43D7DF50);
    UnmapViewOfFile(lpBaseAddress);
    v2 = qword_7FFF43D7DF58;
    v3 = (_QWORD *)*((_QWORD *)qword_7FFF43D7DF58 + 1);
    if ( *((_BYTE *)v3 + 25) )
      goto LABEL_15;
The full function is 36 lines but all we need is line 11: this function unmaps a file from memory. By way of illustration, lines 1, 9 and 11 are the only lines I looked at and the only lines of consequence. It doesn’t matter what the rest is – it’s likely to just be error handling and other cleanup. The input argument a1 is passed to UnmapViewOfFile and that is this function’s primary purpose. In this case, IDA helps us by automatically naming the Win32 API call for us, as well as renaming v1 to lpBaseAddress – the name of the argument to UnmapViewOfFile in Microsoft’s documentation.
Experienced analysts won’t need to look this up, but if you’re not familiar with an API call, it is especially useful to refer to the official documentation. Let’s see what Microsoft says lpBaseAddress is:
A pointer to the base address of the mapped view of a file that is to be unmapped. This value must be identical to the value returned by a previous call to the MapViewOfFile or MapViewOfFileEx function.
Since this argument is the same as the first argument to the mystery function, we now know that it is a pointer to demand paged memory. The call is on the other side of the if branch to the unmap function, so a6 in the first decompilation above is likely an error flag. We rename the function, v25 and a6, and change the type of a6 to bool (we don’t bother renaming anything in the unmap function; we have already learned what we needed from it and won’t be revisiting it):
*&error = 0;
v25 = sub_7FFF41EB3860(&v35, 3, 1i64);
v26 = v25;
if ( !*&error )
{
  v27 = sub_7FFF41EB36A0(v25, &error);
  if ( !*&error )
  {
    hFile = sub_7FFF41EDCFE0(v26, 0i64, 0i64);
    sub_7FFF41EB3170(v26, &error);
    if ( *&error )
      unmapFile(hFile);
    else
      v0 = pDoSomethingWithMetadata(hFile, v27);
  }
}
Before we go any further, do we have any thoughts on what the second argument – now v27 – might be? Unlike in .NET, arrays in C and C++ (including blocks of bytes) do not have a convenient Length property and are actually just raw pointers to memory locations. If you want to know the size of the array, you need to pass it as a separate argument, and that is an extremely common design pattern in C++. v27 is assigned by sub_7FFF41EB36A0 so let’s examine that function:
LARGE_INTEGER __fastcall sub_7FFF41EB36A0(void *a1, DWORD *a2)
{
  DWORD *v2; // rbx
  LARGE_INTEGER result; // rax
  LARGE_INTEGER FileSize; // [rsp+38h] [rbp+10h]

  v2 = a2;
  *a2 = 0;
  if ( GetFileSizeEx(a1, &FileSize) )
  {
    result = FileSize;
  }
  else
  {
    *v2 = GetLastError();
    result.QuadPart = 0i64;
  }
  return result;
}
Very straightforward: a1 is a file handle and the function gets its size with GetFileSizeEx, returning any error code in a2.
Our theory that the second argument is the size of the data behind the pointer is confirmed.
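The pointer-plus-size convention that this confirms looks like the following in practice. This is a generic sketch; checksum is a hypothetical function chosen only because it has to walk the whole buffer and therefore cannot work without being told the length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// A raw pointer carries no length, so the size travels as a second
// argument. Without it the callee cannot know where the buffer ends.
std::uint32_t checksum(const void* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    const auto* bytes = static_cast<const std::uint8_t*>(data);
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += bytes[i];
    return sum;
}
```

When you see a call with a pointer followed by an integer, as in pDoSomethingWithMetadata(v25, v28), a buffer and its length is usually the first hypothesis worth testing.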
You can continue to flesh this out a bit if you like, depending on how much detail you need. Here is what I ended up with:
*&error = 0;
hFile_1 = fileOpen(&metadataPathname, 3, 1, 1u, 0, &error);
if ( !*&error )
{
  v27 = getFileSize(hFile_1, &error);
  metadataSize = v27.LowPart;
  if ( !*&error )
  {
    hFile = mapFile(hFile_1, 0i64, 0);
    closeFile(hFile_1, &error);
    if ( *&error )
      unmapFile(hFile);
    else
      v0 = pDoSomethingWithMetadata(hFile, metadataSize);
  }
}
It should be pretty clear by this point that this code checks that global-metadata.dat exists, gets its file size, maps it into memory, and – if there were no errors – calls our mystery function with a pointer to the start of the file in paged memory and its length.
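One detail worth noticing is that the length passed on is v27.LowPart, not the full 64-bit value. On Windows, LARGE_INTEGER exposes a 64-bit QuadPart alongside 32-bit LowPart and HighPart halves. The sketch below is a portable model of that split using plain integers rather than the real Windows type; the helper names are hypothetical and the high half is kept unsigned for simplicity.

```cpp
#include <cassert>
#include <cstdint>

// Low 32 bits of a 64-bit size: the part LARGE_INTEGER calls LowPart.
std::uint32_t low_part(std::int64_t quad) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(quad) & 0xFFFFFFFFu);
}

// High 32 bits of a 64-bit size: the part LARGE_INTEGER calls HighPart.
std::uint32_t high_part(std::int64_t quad) {
    return static_cast<std::uint32_t>(static_cast<std::uint64_t>(quad) >> 32);
}

// True when the low half alone describes the whole size, i.e. the file
// is smaller than 4 GiB.
bool fits_in_low_part(std::int64_t quad) {
    assert(quad >= 0);  // file sizes are never negative
    return high_part(quad) == 0;
}
```

Taking only the low half is harmless for a metadata file of ordinary size, but it does tell us the callee most likely takes a 32-bit length.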
What is the result in v0, and what happens to it when the function we’re analyzing returns to the caller? Obviously the current line of thinking is that the DoSomethingWithMetadata function decrypts the metadata file, and the return value is a pointer to the decrypted data, or perhaps the number of bytes decrypted or a result or error code.
Let’s step back for a moment. In another IL2CPP article I presented this diagram illustrating the initialization process of IL2CPP as it pertains to loading the metadata:
The relevant part here is that there is a call chain that proceeds il2cpp_init() -> il2cpp::vm::Runtime::Init() -> il2cpp::vm::MetadataCache::Initialize(). There is actually one more function call before global-metadata.dat is accessed, which you can see from the source code of libil2cpp/vm/MetadataCache.cpp:
void MetadataCache::Initialize()
{
    s_GlobalMetadata = vm::MetadataLoader::LoadMetadataFile("global-metadata.dat");
    s_GlobalMetadataHeader = (const Il2CppGlobalMetadataHeader*)s_GlobalMetadata;
    IL2CPP_ASSERT(s_GlobalMetadataHeader->sanity == 0xFAB11BAF);
The function vm::MetadataLoader::LoadMetadataFile is defined in libil2cpp/vm/MetadataLoader.cpp and looks like this:
void* MetadataLoader::LoadMetadataFile(const char* fileName)
{
    std::string resourcesDirectory = utils::PathUtils::Combine(utils::Runtime::GetDataDir(), utils::StringView<char>("Metadata"));
    std::string resourceFilePath = utils::PathUtils::Combine(resourcesDirectory, utils::StringView<char>(fileName, strlen(fileName)));

    int error = 0;
    FileHandle* handle = File::Open(resourceFilePath, kFileModeOpen, kFileAccessRead, kFileShareRead, kFileOptionsNone, &error);
    if (error != 0)
        return NULL;

    void* fileBuffer = utils::MemoryMappedFile::Map(handle);

    File::Close(handle, &error);
    if (error != 0)
    {
        utils::MemoryMappedFile::Unmap(fileBuffer);
        fileBuffer = NULL;
        return NULL;
    }

    return fileBuffer;
}
This more or less resembles the decompiled code we just analyzed, except it would seem an else clause has been added to the final if to make that sneaky call into UnityPlayer.dll! Note that the return value of the original version of LoadMetadataFile is a pointer to the start of the mapped global-metadata.dat.
Since our decompiled version of LoadMetadataFile returns the value returned by DoSomethingWithMetadata, it is almost a certainty that DoSomethingWithMetadata decrypts the metadata and returns a pointer to it, since the caller (il2cpp::vm::MetadataCache::Initialize()) will expect unencrypted data unless it has been modified too.
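The assertion in MetadataCache::Initialize() suggests a quick way to tell plain metadata from encrypted metadata: check whether the buffer starts with 0xFAB11BAF. The sketch below assumes that sanity is the first 4-byte field of Il2CppGlobalMetadataHeader, which the excerpt above does not show, and looks_like_plain_metadata is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Magic value checked by IL2CPP_ASSERT in MetadataCache::Initialize().
constexpr std::uint32_t kMetadataSanity = 0xFAB11BAF;

// Returns true if the buffer begins with the expected magic value.
// Assumption: the sanity field sits at offset 0 of the header.
// memcpy reads the value in host byte order, as the cast in the source does.
bool looks_like_plain_metadata(const void* data, std::size_t size) {
    if (data == nullptr || size < sizeof(std::uint32_t))
        return false;
    std::uint32_t sanity = 0;
    std::memcpy(&sanity, data, sizeof(sanity));
    return sanity == kMetadataSanity;
}
```

Run against the raw file on disk and against the buffer returned by the suspected decryption function, a check like this would support or undermine the decryption theory without reading a single line of the obfuscated code.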
We don’t normally have the source code to parts of applications we’re reverse engineering so we’re quite lucky that IL2CPP is open source, but let’s imagine we don’t have that luxury. At this point I want to pull in the UnityPlayer.dll of our blank project, which we haven’t looked at yet. All the symbols are available so we can easily navigate to il2cpp::vm::MetadataLoader::LoadMetadataFile, scroll down and compare:
error = 0;
v27 = il2cpp::os::File::Open(&path, 3, 1, 1, 0, &error);
v28 = v27;
if ( !error )
{
  v29 = il2cpp::os::MemoryMappedFile::Map(v27, 0i64, 0i64);
  il2cpp::os::File::Close(v28, &error);
  if ( !error )
    goto LABEL_45;
  il2cpp::os::MemoryMappedFile::Unmap(v29, 0i64);
}
(If we didn’t have the symbols, we could just run ProcMon against the project and follow the stack trace as before.)
It would indeed seem that the developers who obfuscated Honkai Impact added an extra call to fetch the file size,
and an else branch to call the decryption function if the file was mapped successfully.
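Putting the pieces together, the modified loader can be modelled as below. This is a reconstruction under stated assumptions, not the game's code: FakeOs is a hypothetical test double standing in for the renamed helpers (fileOpen, getFileSize, mapFile, closeFile, unmapFile), the DecryptFn signature is the one guessed from the call site, and returning a null pointer on failure is assumed from the stock LoadMetadataFile.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical test double: each field is the outcome of one OS helper.
struct FakeOs {
    int openError = 0;            // error reported by fileOpen
    int sizeError = 0;            // error reported by getFileSize
    int closeError = 0;           // error reported by closeFile
    std::uint32_t fileSize = 0;   // low 32 bits of the file size
    unsigned char* view = nullptr;// address returned by mapFile
    bool unmapped = false;        // set when unmapFile would be called
};

// Assumed signature of the function reached through pDoSomethingWithMetadata.
using DecryptFn = void* (*)(const void* data, std::uint32_t size);

void* load_metadata(FakeOs& os, DecryptFn decrypt) {
    void* result = nullptr;
    int error = os.openError;                  // fileOpen(..., &error)
    if (!error) {
        error = os.sizeError;                  // added: getFileSize(handle, &error)
        std::uint32_t size = os.fileSize;
        if (!error) {
            void* view = os.view;              // mapFile(handle, 0, 0)
            error = os.closeError;             // closeFile(handle, &error)
            if (error)
                os.unmapped = true;            // unmapFile(view)
            else
                result = decrypt(view, size);  // added: the new else branch
        }
    }
    return result;
}
```

The two lines marked as added are the only differences from the stock loader in this model: the size query, and the else branch that hands the mapped view to the extra function instead of returning it directly.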
Tip: Mastering IDA keyboard shortcuts can dramatically improve your productivity.
Here are the shortcuts I used for the session above:
Jump to virtual address: G, type the address, label or function name, Enter
Jump to start of current function: CTRL+P, Enter
Rename symbol: N, type the symbol name, Enter
Decompile current function: F5
Change variable type: place cursor on the variable, Y, input the type declaration, Enter
Navigate in visited function history: forward and back buttons on mouse
View cross-references to function: place cursor on the function name, X