Threading the needle

We start by navigating to the top of the user mode call stack, 0x7FFF4E2C3E85 in UnityPlayer.dll:

.text:00007FFF4E2C3E6C                 mov     [rbp+0D30h+anonymous_28], rax
.text:00007FFF4E2C3E73                 mov     rax, [rbp+0D30h+anonymous_69]
.text:00007FFF4E2C3E77                 mov     rcx, [rbp+0D30h+anonymous_30]
.text:00007FFF4E2C3E7E                 mov     rdx, [rbp+0D30h+anonymous_28]
.text:00007FFF4E2C3E85                 movups  xmm0, xmmword ptr [rax+rcx]
.text:00007FFF4E2C3E89                 movups  xmmword ptr [rdx], xmm0
.text:00007FFF4E2C3E8C                 mov     rsi, [rbp+0D30h+anonymous_23]
.text:00007FFF4E2C3E90                 sub     rsp, 20h
.text:00007FFF4E2C3E94                 mov     r8d, 0B00h      ; Size

Note that the instruction pointer (EIP for x86, RIP for x64) is incremented before it’s pushed onto the stack, so the actual instruction that triggers the call to ReadFile is the previous one, at 0x7FFF4E2C3E7E. It’s just a mov, so it is likely triggering the read call by attempting to read from an uninitialized location in the demand paged memory range. Not very interesting. We scroll up and down in this function and discover it is both huge and obfuscated using a technique called control flow obfuscation. Here is the control flow graph (CFG) for this function:

[Figure: control flow graph (CFG) of the obfuscated function]

Essentially this is a form of multi-level control flow flattening. In a nutshell, the function is a giant finite state machine (FSM) controlled by an arbitrarily-introduced state variable. The function loops repeatedly in its entirety, performing a very small action on each loop iteration based on the state variable, then updating the state variable. The actions are buried within many layers of if and switch statements, making it very difficult to reverse engineer by static analysis. As an analyst, I could not possibly be less excited about this diagram.
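To make the idea concrete, here is a minimal sketch of what flattening does to a trivial function. The names sumPlain and sumFlattened and the state numbering are invented for illustration and have nothing to do with the code in UnityPlayer.dll. Real obfuscators generate far more states, nest the dispatch several levels deep and usually compute the next state with opaque arithmetic.

```cpp
// Original logic: add up the integers 1..n.
int sumPlain(int n)
{
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += i;
    return total;
}

// The same logic after a hand-made, single-level control flow flattening.
// Every basic block becomes a case of one big switch, and an artificial
// state variable decides which block runs on the next loop iteration.
int sumFlattened(int n)
{
    int total = 0;
    int i = 0;
    int state = 3;                 // arbitrary entry state

    for (;;)
    {
        switch (state)
        {
        case 3:                    // loop initialization
            i = 1;
            state = 7;
            break;
        case 7:                    // loop condition
            state = (i <= n) ? 1 : 9;
            break;
        case 1:                    // loop body
            total += i;
            state = 4;
            break;
        case 4:                    // loop increment
            ++i;
            state = 7;
            break;
        case 9:                    // function exit
            return total;
        default:                   // unreachable in this sketch
            return -1;
        }
    }
}
```

Both functions return the same value for every n, but in the second one the loop structure is gone: every block hangs off a single dispatcher, and the order of execution can only be recovered by tracking the state variable.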

At this juncture I should note that the object of static analysis is not to determine what every line in a program does. Disassemblies often consist of millions of lines of code, and trying to weave your way through figuring out what every instruction means is a slow, laborious way to accomplish nothing. Instead, we try to judge the overall purpose of functions at a slightly higher level and only delve down into the instruction level for small snippets of code that hold the greatest relevance.

One way to do this is to look at the inputs and outputs of a function rather than its actual code. Consider a 100,000-line obfuscated function which takes two integers as its input and returns one integer. If feeding in 1 and 2 produces an output of 3 every time, and feeding in 11 and 22 produces an output of 33 every time, it’s fairly safe to assume that, at least in general, the function sums its two inputs and returns the total. There is no need to reverse engineer the function’s code unless it produces something that deviates from our thesis.
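As a sketch of this approach, the snippet below treats a function as a black box and checks a hypothesis about it against a handful of input/output samples. Everything in it is hypothetical: mysteryFunction stands in for the obfuscated code (in practice you would call into the target binary or drive it from a debugger), and hypothesis and matchesHypothesis are helpers invented for this example.

```cpp
#include <utility>
#include <vector>

// Stand-in for a large obfuscated function we cannot easily read.
// (Hypothetical: in a real session this would live in the target binary.)
int mysteryFunction(int a, int b)
{
    int result = a;
    for (int i = 0; i < b; ++i)    // deliberately roundabout
        ++result;
    return result;
}

// Our working hypothesis about what the function does.
int hypothesis(int a, int b)
{
    return a + b;
}

// Returns true if the black box agrees with the hypothesis on every sample.
bool matchesHypothesis(const std::vector<std::pair<int, int>>& samples)
{
    for (const auto& s : samples)
    {
        if (mysteryFunction(s.first, s.second) != hypothesis(s.first, s.second))
            return false;          // deviation: time to look at the code
    }
    return true;
}
```

With the samples (1, 2) and (11, 22) the hypothesis holds. This particular stand-in deviates for a negative second argument, which is exactly the kind of result that would send us back to the code, and is why the conclusion is only ever "at least in general".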

So we move one frame down the call stack, to the return address 0x7FFF41EE076C:

.text:00007FFF41EE075A                 xor     edx, edx
.text:00007FFF41EE075C                 call    sub_7FFF41EDD140
.text:00007FFF41EE0761                 jmp     short loc_7FFF41EE076F
.text:00007FFF41EE0763                 mov     edx, r12d       ; _QWORD
.text:00007FFF41EE0766                 call    cs:qword_7FFF43D74F80
.text:00007FFF41EE076C                 mov     rsi, rax
.text:00007FFF41EE076C ; } // starts at 7FFF41EE06F0

Looking again at the instruction prior to the one pointed to by the stack, this time we find an actual call, to qword_7FFF43D74F80, which is an uninitialized static value set at runtime. We know for sure this calls DoSomethingWithMetadata in UnityPlayer.dll, so we rename this address to pDoSomethingWithMetadata (the p is short for pointer), navigate to the top of the function and invoke the decompiler. The decompiled function is a couple of hundred lines long but the call to the obfuscated function is visible and looks like this:

a6 = 0;
v29 = sub_7FFF41EB3860(&a1, 3, 1, 1u, 0, &a6);
v26 = v29;
if ( !a6 )
{
  v27 = sub_7FFF41EB36A0(v29, &a6);
  v28 = v27.LowPart;
  if ( !a6 )
  {
    v25 = (const void *)sub_7FFF41EDCFE0(v26, 0i64, 0);
    sub_7FFF41EB3170(v26, &a6);
    if ( a6 )
      sub_7FFF41EDD140(v25);
    else
      v0 = pDoSomethingWithMetadata(v25, v28);
  }
}

Immediately we have learned something useful. Our mystery function takes two arguments and returns one. In addition, we know the first argument is a pointer because v25 has been cast to const void *. The return value is stored in v0 and not referenced again until this function ends, whereupon it is passed back to the caller as the return value.
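Those observations already pin down the rough shape of the pointer stored at qword_7FFF43D74F80. As a sketch, a call through a global function pointer that is filled in at runtime looks like this in source form. The MetadataFn type, the passThrough target and resolvePointers are invented for this example, and the pointer-sized return type is an assumption, since we have not yet established what v0 holds.

```cpp
#include <cstddef>

// Assumed signature for this sketch: (pointer, size) -> pointer-sized value.
using MetadataFn = void* (*)(const void* data, std::size_t size);

// Zero-initialized static; a disassembler shows such a slot as qword_XXXXXXXX.
static MetadataFn pDoSomethingWithMetadata = nullptr;

// Hypothetical target that the pointer ends up referring to.
void* passThrough(const void* data, std::size_t /*size*/)
{
    return const_cast<void*>(data);
}

// Hypothetical initialization step that runs some time before the first call.
void resolvePointers()
{
    pDoSomethingWithMetadata = &passThrough;
}

// The call site: an indirect call through the global, guarded here for safety.
void* callThroughPointer(const void* data, std::size_t size)
{
    if (pDoSomethingWithMetadata == nullptr)
        return nullptr;
    return pDoSomethingWithMetadata(data, size);
}
```

Because the slot is only populated while the program runs, static analysis of the call site alone cannot tell you where it leads. The stack trace is what told us the destination here.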

We might be able to determine what v0 is by moving down in the stack once more, but first we want to try to determine the input arguments. Generally we do this by clicking on the functions around the call to see if we can establish some context – particularly if they use the same arguments or return values subsequently passed as arguments to the function of interest. It doesn’t matter much how you approach this, but remember we just want to get an overview of what’s happening without perfectly understanding every function. I start arbitrarily with the prior function call to sub_7FFF41EDD140, whose only argument is the same as the first argument to the mystery function:

void __fastcall sub_7FFF41EDD140(LPCVOID a1)
{
  LPCVOID lpBaseAddress; // rbx
  void *v2; // rcx
  _QWORD *v3; // rax
 
  if ( a1 )
  {
    lpBaseAddress = a1;
    sub_7FFF41EE16C0(&unk_7FFF43D7DF50);
    UnmapViewOfFile(lpBaseAddress);
    v2 = qword_7FFF43D7DF58;
    v3 = (_QWORD *)*((_QWORD *)qword_7FFF43D7DF58 + 1);
    if ( *((_BYTE *)v3 + 25) )
      goto LABEL_15;

The full function is 36 lines but all we need is line 11: this function unmaps a file from memory. By way of illustration, lines 1, 9 and 11 are the only lines I looked at and the only lines of consequence. It doesn’t matter what the rest is – it’s likely to just be error handling and other cleanup. The input argument a1 is passed to UnmapViewOfFile and that is this function’s primary purpose. In this case, IDA helps us by automatically naming the Win32 API call for us, as well as renaming v1 to lpBaseAddress – the name of the argument to UnmapViewOfFile in Microsoft’s documentation.

Experienced analysts won’t need to look this up, but if you’re not familiar with an API call, it is especially useful to refer to the official documentation. Let’s see what Microsoft says lpBaseAddress is:

A pointer to the base address of the mapped view of a file that is to be unmapped. This value must be identical to the value returned by a previous call to the MapViewOfFile or MapViewOfFileEx function.

Since this argument is the same as the first argument to the mystery function, we now know that it is a pointer to demand paged memory. The call to the mystery function sits on the opposite branch of the if statement from the unmap function, so a6 in the first decompilation above is likely an error flag. We don’t bother renaming anything inside the unmap function, since we have already learned what we needed from it and won’t be revisiting it. In the caller, we rename the function, v25 and a6, and set the type of a6 to bool:

*&error = 0;
v25 = sub_7FFF41EB3860(&v35, 3, 1i64);
v26 = v25;
if ( !*&error )
{
  v27 = sub_7FFF41EB36A0(v25, &error);
  if ( !*&error )
  {
    hFile = sub_7FFF41EDCFE0(v26, 0i64, 0i64);
    sub_7FFF41EB3170(v26, &error);
    if ( *&error )
      unmapFile(hFile);
    else
      v0 = pDoSomethingWithMetadata(hFile, v27);
  }
}

Before we go any further, do we have any thoughts on what the second argument – now v27 – might be? Unlike in .NET, arrays in C and C++ (including blocks of bytes) do not have a convenient Length property and are actually just raw pointers to memory locations. If you want to know the size of the array, you need to pass it as a separate argument, and that is an extremely common design pattern in C++. v27 is assigned by sub_7FFF41EB36A0 so let’s examine that function:

LARGE_INTEGER __fastcall sub_7FFF41EB36A0(void *a1, DWORD *a2)
{
  DWORD *v2; // rbx
  LARGE_INTEGER result; // rax
  LARGE_INTEGER FileSize; // [rsp+38h] [rbp+10h]
 
  v2 = a2;
  *a2 = 0;
  if ( GetFileSizeEx(a1, &FileSize) )
  {
    result = FileSize;
  }
  else
  {
    *v2 = GetLastError();
    result.QuadPart = 0i64;
  }
  return result;
}

Very straightforward: a1 is a file handle and the function gets its size with GetFileSizeEx, returning any error code in a2. Our theory is confirmed.
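The two conventions at work here, a buffer passed as a pointer plus a separate length, and an error code returned through an out-parameter, look like this in plain C++. The function names sumBytes and getLength and the error value 1 are made up for illustration and are not taken from the game or from IL2CPP.

```cpp
#include <cstddef>
#include <cstdint>

// Sums the bytes of a buffer. A raw pointer carries no length information,
// so the caller has to supply the size as a second argument.
std::uint32_t sumBytes(const void* data, std::size_t size)
{
    const std::uint8_t* bytes = static_cast<const std::uint8_t*>(data);
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < size; ++i)
        total += bytes[i];
    return total;
}

// Returns the length of a NUL-terminated string and reports failure through
// an out-parameter, mirroring the shape of the decompiled size getter:
// *error is cleared first and only set if something goes wrong.
std::size_t getLength(const char* text, int* error)
{
    *error = 0;
    if (text == nullptr)
    {
        *error = 1;                // hypothetical error code
        return 0;
    }
    std::size_t length = 0;
    while (text[length] != '\0')
        ++length;
    return length;
}
```

Spotting either convention in a decompilation is a strong hint about the roles of the arguments: an integer that travels alongside a pointer is very often its length, and a pointer that is zeroed on entry and tested after every call is very often an error slot.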

You can continue to flesh this out a bit if you like, depending on how much detail you need. Here is what I ended up with:

*&error = 0;
hFile_1 = fileOpen(&metadataPathname, 3, 1, 1u, 0, &error);
if ( !*&error )
{
  v27 = getFileSize(hFile_1, &error);
  metadataSize = v27.LowPart;
  if ( !*&error )
  {
    hFile = mapFile(hFile_1, 0i64, 0);
    closeFile(hFile_1, &error);
    if ( *&error )
      unmapFile(hFile);
    else
      v0 = pDoSomethingWithMetadata(hFile, metadataSize);
  }
}

It should be pretty clear by this point that this code checks that global-metadata.dat exists, gets its file size, maps it into memory, and – if there were no errors – calls our mystery function with a pointer to the start of the file in paged memory and its length.

What is the result in v0, and what happens to it when the function we’re analyzing returns to the caller? Obviously the current line of thinking is that the DoSomethingWithMetadata function decrypts the metadata file, and the return value is a pointer to the decrypted data, or perhaps the number of bytes decrypted or a result or error code.

Let’s step back for a moment. In another IL2CPP article I presented this diagram illustrating the initialization process of IL2CPP as it pertains to loading the metadata:

[Figure: IL2CPP initialization process as it pertains to loading the metadata]

The relevant part here is that there is a call chain that proceeds il2cpp_init() -> il2cpp::vm::Runtime::Init() -> il2cpp::vm::MetadataCache::Initialize(). There is actually one more function call before global-metadata.dat is accessed, which you can see from the source code of libil2cpp/vm/MetadataCache.cpp:

void MetadataCache::Initialize()
{
    s_GlobalMetadata = vm::MetadataLoader::LoadMetadataFile("global-metadata.dat");
    s_GlobalMetadataHeader = (const Il2CppGlobalMetadataHeader*)s_GlobalMetadata;
    IL2CPP_ASSERT(s_GlobalMetadataHeader->sanity == 0xFAB11BAF);

The function vm::MetadataLoader::LoadMetadataFile is defined in libil2cpp/vm/MetadataLoader.cpp and looks like this:

void* MetadataLoader::LoadMetadataFile(const char* fileName)
{
    std::string resourcesDirectory = utils::PathUtils::Combine(utils::Runtime::GetDataDir(), utils::StringView<char>("Metadata"));
 
    std::string resourceFilePath = utils::PathUtils::Combine(resourcesDirectory, utils::StringView<char>(fileName, strlen(fileName)));
 
    int error = 0;
    FileHandle* handle = File::Open(resourceFilePath, kFileModeOpen, kFileAccessRead, kFileShareRead, kFileOptionsNone, &error);
    if (error != 0)
        return NULL;
 
    void* fileBuffer = utils::MemoryMappedFile::Map(handle);
 
    File::Close(handle, &error);
    if (error != 0)
    {
        utils::MemoryMappedFile::Unmap(fileBuffer);
        fileBuffer = NULL;
        return NULL;
    }
 
    return fileBuffer;
}

This more or less resembles the decompiled code we just analyzed, except it would seem an else clause has been added to the final if to make that sneaky call into UnityPlayer.dll! Note that the return value of the original version of LoadMetadataFile is a pointer to the start of the mapped global-metadata.dat. Since our decompiled version of LoadMetadataFile returns the value returned by DoSomethingWithMetadata, it is almost a certainty that DoSomethingWithMetadata decrypts the metadata and returns a pointer to it, since the caller (il2cpp::vm::MetadataCache::Initialize()) will expect unencrypted data unless it has been modified too.

We don’t normally have the source code to parts of applications we’re reverse engineering so we’re quite lucky that IL2CPP is open source, but let’s imagine we don’t have that luxury. At this point I want to pull in the UnityPlayer.dll of our blank project, which we haven’t looked at yet. All the symbols are available so we can easily navigate to il2cpp::vm::MetadataLoader::LoadMetadataFile, scroll down and compare:

error = 0;
v27 = il2cpp::os::File::Open(&path, 3, 1, 1, 0, &error);
v28 = v27;
if ( !error )
{
  v29 = il2cpp::os::MemoryMappedFile::Map(v27, 0i64, 0i64);
  il2cpp::os::File::Close(v28, &error);
  if ( !error )
    goto LABEL_45;
  il2cpp::os::MemoryMappedFile::Unmap(v29, 0i64);
}

(If we didn’t have the symbols, we could just run ProcMon against the project and follow the stack trace as before.)

It would indeed seem that the developers who obfuscated Honkai Impact added an extra call to fetch the file size, and an else branch to call the decryption function if the file was mapped successfully.
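Putting the pieces together, the modified loader appears to have roughly the following shape. This is a reconstruction for illustration only. The helper names mirror the ones chosen during the analysis, but their bodies are trivial stubs: an in-memory FakeFile stands in for the Win32 file and mapping calls, and decryptMetadata is a placeholder for pDoSomethingWithMetadata, whose internals we have not examined. Returning a null pointer on failure is an assumption carried over from the stock IL2CPP source.

```cpp
#include <cstddef>
#include <string>

// Stubs so the sketch is self-contained (all hypothetical).
struct FakeFile
{
    std::string contents;          // pretend file data
    bool failOnClose = false;      // lets us exercise the error branch
};

FakeFile* fileOpen(FakeFile* file, int* error)
{
    *error = (file == nullptr) ? 1 : 0;
    return file;
}

std::size_t getFileSize(FakeFile* file, int* error)
{
    *error = 0;
    return file->contents.size();
}

void* mapFile(FakeFile* file)  { return &file->contents[0]; }
void  unmapFile(void*)         {}
void  closeFile(FakeFile* file, int* error) { *error = file->failOnClose ? 1 : 0; }

// Placeholder for pDoSomethingWithMetadata: here it just hands the buffer back.
void* decryptMetadata(void* data, std::size_t /*size*/) { return data; }

// The control flow recovered from the decompilation.
void* loadMetadataFile(FakeFile* file)
{
    void* result = nullptr;        // assumed: null on failure
    int error = 0;

    FakeFile* handle = fileOpen(file, &error);
    if (!error)
    {
        std::size_t metadataSize = getFileSize(handle, &error);  // added call
        if (!error)
        {
            void* mapped = mapFile(handle);
            closeFile(handle, &error);
            if (error)
                unmapFile(mapped);
            else
                result = decryptMetadata(mapped, metadataSize);  // added branch
        }
    }
    return result;
}
```

Compared with the stock LoadMetadataFile, the only differences are the two lines marked as added: the size query, and the else branch that passes the mapped view and its length to the extra function instead of returning the view directly.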