Take me literally, not seriously

We can now repeat this process methodically to find GetStringLiteralFromIndex, but let’s not do unnecessary work. Only the string literals are M.I.A. now, and we have one imported function that we don’t know anything about.

Returning to the game binary, we search for references to this import, and once again find only one call to it. If the function really is GetStringLiteralFromIndex, this would make good sense because the IL2CPP source code only calls it in one place: il2cpp::vm::MetadataCache::InitializeMethodMetadataRange. We pull up the source code for this and decompile the function which calls the import to compare.

Snippet of the source code:

void MetadataCache::IntializeMethodMetadataRange(uint32_t start, uint32_t count, const utils::dynamic_array<Il2CppMetadataUsage>& expectedUsages)
{
   for (uint32_t i = 0; i < count; i++)
    {
        uint32_t offset = start + i;
        IL2CPP_ASSERT(s_GlobalMetadataHeader->metadataUsagePairsCount >= 0 && offset <= static_cast<uint32_t>(s_GlobalMetadataHeader->metadataUsagePairsCount));
        const Il2CppMetadataUsagePair* metadataUsagePairs = MetadataOffset<const Il2CppMetadataUsagePair*>(s_GlobalMetadata, s_GlobalMetadataHeader->metadataUsagePairsOffset, offset);
        // ...
        switch (usage)
        {
            case kIl2CppMetadataUsageFieldInfo:
                *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = GetFieldInfoFromIndex(decodedIndex);
                break;
            case kIl2CppMetadataUsageStringLiteral:
                *s_Il2CppMetadataRegistration->metadataUsages[destinationIndex] = GetStringLiteralFromIndex(decodedIndex);
                break;
        // ...

(kIl2CppMetadataUsageFieldInfo is defined as 4 and kIl2CppMetadataUsageStringLiteral is defined as 5)

Snippet of the decompilation:

Il2CppGlobalMetadataHeader *__fastcall sub_7FFF41E81E30(unsigned int a1)
{
  v22 = -2i64;
  result = s_GlobalMetadataHeader;
  v2 = a1;
  v3 = s_GlobalMetadata + s_GlobalMetadataHeader->metadataUsageListsOffset;
  // ...
    {
      v9 = v6 + v4;
      v10 = s_GlobalMetadata + s_GlobalMetadataHeader->metadataUsagePairsOffset;
      v11 = *&v10[8 * v9];
      // ...
      switch ( v13 )
      {
        case 4:
          v18 = (s_GlobalMetadata + 8 * v14 + s_GlobalMetadataHeader->fieldRefsOffset);
          v19 = *(sub_7FFF41E81200(*v18) + 104) + 40i64 * v18[1];
          v8 = qword_7FFF43D74A38;
          result = *(qword_7FFF43D74A38 + 120);
          **(&result->unknown00 + v11) = v19;
          break;
        case 5:
          v20 = *(8i64 * v14 + qword_7FFF43D74800);
          if ( !v20 )
          {
            if ( dword_7FFF43D74AE8 > *(*v7 + 4i64) )
            {
              Init_thread_header(&dword_7FFF43D74AE8, v13, v2, v8, v22);
              if ( dword_7FFF43D74AE8 == -1 )
              {
                sub_7FFF41EE15A0(&unk_7FFF43D74AE0);
                atexit(sub_7FFF420969A0);
                Init_thread_footer(&dword_7FFF43D74AE8);
              }
            }
            v24 = &unk_7FFF43D74AE0;
            sub_7FFF41EE16C0(&unk_7FFF43D74AE0);
            v23 = 0;
            v21 = qword_7FFF43D74F90(s_GlobalMetadata, v14, &v23);
            v20 = il2cpp_string_new_len_0(v21, v23);
            *(8i64 * v14 + qword_7FFF43D74800) = v20;
            sub_7FFF41EE16F0(&unk_7FFF43D74AE0);
            v8 = qword_7FFF43D74A38;
          }
          result = *(*(v8 + 120) + 8 * v11);
          *&result->unknown00 = v20;
          break;

That looks pretty good! GetFieldInfoFromIndex and GetStringLiteralFromIndex (the original one) have been inlined by the compiler here – probably because they are only ever called once, from this function – so that is why the code in the case blocks is not a direct match, but you can examine the IL2CPP source code to see they are similar.

The code in the highlighted line calls the import, passing in the metadata file pointer, the string index (v14) and an address (v23). You may recall that the original function returns an Il2CppString*, and the API il2cpp_string_new_len on the following line takes a string and a length and creates an Il2CppString *. We can deduce from this that v23 contains the string length, and – since it is initialized to zero then passed by reference to the import – that the import sets the length. Renaming and retyping the function and variables leads to this much more readable code:

LODWORD(stringLength) = 0;
string = unityplayer_GetStringLiteralFromIndex(s_GlobalMetadata, index, &stringLength);
il2cppString = il2cpp_string_new_len_0(string, stringLength);
s_StringLiteralTable[index] = il2cppString;

Once again, we can augment our C# program to handle this as follows to fetch every string literal:

[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
private delegate IntPtr GetStringLiteralFromIndex(byte[] bytes, uint index, ref int length);
 
// ...
 
var pGetStringLiteralFromIndex = (GetStringLiteralFromIndex)
                Marshal.GetDelegateForFunctionPointer(ModuleBase + 0x2CFA0, typeof(GetStringLiteralFromIndex));
 
var stringLiterals = new List<string>();
var length = 0;
 
for (uint index = 0; index < stringLiteralCount; index++) {
    var decryptedBytesUnmanaged = pGetStringLiteralFromIndex(decrypedMetadata, index, ref length);
    var str = new byte[length];
    Marshal.Copy(decryptedBytesUnmanaged, str, 0, length);
 
    stringLiterals.Add(Encoding.UTF8.GetString(str));
}

Once again, we subtract the address of the function from the module base to get the pointer offset 0x2CFA0. Notice that we use a ref parameter this time to enable us to receive the returned string length from unmanaged code. Since the returned string is not null-terminated, we have to use Marshal.Copy to copy the correct number of bytes from unmanaged memory into an array, then use Encoding.UTF8.GetString to convert the byte array to a .NET string object.

How do we find the maximum string literal index? There are a couple of ways, and I’ll discuss a version that will work with any IL2CPP binary below, but first let’s return to the metadata file that we examined in part 1. You may recall there were some blocks of data that were unaccounted for. It turns out that the encrypted block between 0x158-0x146480 contains the actual string literal data (stringLiteralDataOffset is 0x158). The offset/length table normally found at stringLiteralOffset starts immediately after this – at 0x146480. The beginning looks like this:

alt

This can be interpreted as follows: the first string literal starts at offset 0x0 and is 0x21 bytes long. The second string literal starts at 0x21 and is 0x1 byte long, and so on. The offsets are in sorted order so we can find the end of the table simply by searching for an inversion (an entry where the value is lower than the previous value):

alt

The next table begins at 0x1A7238. Since each entry is 8 bytes, we can trivially calculate the maximum index:

(0x1A7238 - 0x146480) / 8 - 1 == 0xC1B6

(the minus one term takes into account that the table is 0-indexed, so the index range is 0x00xC1B6 for a total of 0xC1B7 string literals)