Surveying the battlefield

As usual, I start by just loading the game into Il2CppInspector to see what happens:

The supplied metadata file is not valid.

This error means the global-metadata.dat file doesn’t have the expected form. Specifically it starts with the magic bytes (signature) AF 1B B1 FA followed by a 32-bit integer containing the IL2CPP version number (at the time of writing, a value from 0F-1B). This is followed up by a long list of offset/length pairs demarcating the various metadata tables in the file – learn more in this article about IL2CPP’s load process.

Here is an example of the start of global-metadata.dat from an empty project:

alt

Note that an “empty” Unity project still includes a pile of DLLs like mscorlib.dll, UnityEngine.dll and so on so it’s not really empty at all. The header ends at offset 0x110 and this is location is also the start of the first table.

global-metadata.dat for Honkai Impact 4.3 (PC version):

alt

Ouch, this doesn’t look very appetizing. At a casual glance it just looks encrypted or compressed, but there are actually some nuggets of data in here.

Bytes 0x00-0x3F don’t make any obvious sense, and neither do the bytes from 0x158 onwards, but at least some of the data from 0x40-0x157 seems to mean something. We can surmise this both from the fact there is a smattering of zeroes (low-entropy data), and that it at least vaguely resembles the metadata header from the empty project. The areas around 0x60-0x6F, 0xD8-0xDF, 0xF0-0xF7, 0x100-0x10F and 0x140-0x14F seem garbled, but the rest does seem like a set of file offsets and lengths.

You essentially have to determine this by carefully reading all the hex values by eye. Values are stored little-endian, meaning that the first byte of a value is the least significant byte (LSB) (bits 0-7), and the final byte is the most significant (MSB) (bits 24-31 in the case of 32-bit values). Given that the file is 0x0353C7DC bytes long, we can try to verify that these suspected offset/length pairs do actually make sense. The final pointer at offset 0x150 is to offset 0x033AFECC, with a length specified at offset 0x154 as 0x00188908 bytes. This means the block pointed to ends at 0x035387D4, which is indeed inside the bounds of the file.

Let’s continue our investigation by scrolling down the file to see if the whole thing is encrypted or if there is anything else in plaintext. There is a large block of garbled data starting around offset 0x158, and then around 0x14647C-0x146480, it ends and we start to see normal metadata tables again:

alt

Scrolling further, the rest of the file appears to contain normal data, except for one curious repeating pattern:

alt
alt

Every so often, there is a block of 0x40 garbled bytes in the middle of other data. After skipping around the file some more, we determine this happens like clockwork every 0x353C0 bytes.

We can determine from the offset/length lists in the header that these are not separate data structures, but embedded within valid lists. Therefore we can assume we’re looking at encryption. We can rule out trivial schemes like single-byte XOR because the encrypted blocks are high entropy (the distribution of values in the blocks is statistically even; see entropic security), so we are probably looking at strong encryption or a one-time pad (OTP) – the latter could potentially be a XOR blob (a block of random bytes to be XORed with the encrypted data to decrypt it).

Is there an OTP key hiding in the file somewhere? Looking at the second screenshot above, we might surmise (looking at the right hand three bytes on each of the four encrypted lines) that a XOR blob would contain sequential values 1E AE BE, 51 6D AD, 58 7A 03 and so on. We search for other occurrences of these in the file but come up blank.

The encryption may not be a XOR blob, or the XOR blob may be stored in the binary or an asset file, or the XOR blob may be obfuscated. On this occasion we come up empty-handed, but it’s important to exclude obvious potentially easy paths before we get our hands dirty analyzing assembly code, as it could save us a lot of time. We’re out of luck in this case though.

How far back in the file does this periodic block encryption go? The first encrypted offset we found is 0x174A40 (first of the two screenshots above), the block gap is 0x353C0 bytes. These two are exactly divisible with no remainder, therefore it’s plausible to imagine the first encrypted block starts at 0x0 – ie. the very first byte in the file. This also lines up with our earlier observation that bytes 0x00-0x3F are probably encrypted.

Let’s finish our analysis of the metadata by assessing the file’s coverage. In a normal global-metadata.dat, every byte is accounted for: that is to say, every single byte in the file is part of a header or table – there is no extraneous data. We do this by taking all of the offset/length pairs in the header and merging them together to map out all of the used regions in the file, then seeing if there is anything left over.

Why do we do this? Well, because hiding data in files is extremely common. In PE files (Windows exes and dlls), a highly common technique is to set the image size in the header to a value smaller than the true length of the file, and then add additional hidden data at the end. This data could be secret code, decryption keys or anything else.

In this case, we are aware that some of the offsets and lengths may be encrypted, but we work with what we’ve got anyway:

1C7AA8 + 1BE4E4 = 385F8C
385F8C + 4E4A8 = 3D4434
3D4434 + 382CD8 = 75710C
75710C + 9040 = 76014C
 
(16 bytes of unknown data)
 
76014C + 10C0 = 76120C
76120C + 2398 = 7635A4
7635A4 + 3F7E50 = B5B3F4
B5B3F4 + 6F50 = B62344
B62344 + CEA58 = C30D9C
C30D9C + 25044 = C55DE0
C55DE0 + 994 = C56774
C56774 + A0B60 = CF72D4
CF72D4 + 56BB8 = D4DE8C
D4DE8C + 99E8 = D57874
D57874 + 7490 = D5ED04
D5ED04 + B84 = D5F888
 
(8 bytes of unknown data)
 
15C0D8C + 3B4AA0 = 197582C
197582C + 74C = 1975F78
 
(8 bytes of unknown data)
 
1975F78 + 5C5238 = 1F3B1B0
 
(16 bytes of unknown data)
 
1F3B1B0 + 13F8 = 1F3C5A8
1F3C5A8 + 1DA4 = 1F3E34C
1F3E34C + 139200 = 207754C
207754C + 11B9A00 = 3230F4C
3230F4C + 15294 = 32461E0
32461E0 + 169CEC = 33AFECC
 
(16 bytes of unknown data)
 
33AFECC + 188908 = 35387D4

This is a breakdown of the data from 0x40-0x158.

The bytes at 0x158-0x1C7AA8, 0xD5F888-0x15C0D8C and 0x35387D4-0x3538CDC (the end of the file) are unaccounted for. We navigate to each of these offsets to see if there is anything of interest.

0x158-0x146480 contain probably encrypted data as mentioned earlier. 0x146480-0x1A7238 appear to contain a single table (we know this because it consists of a long sequence of what appears to be offsets and lengths, in ascending order). 0x1A7238-0x1AC558 contain another similar table, and so on. These look like normal metadata tables. 0xD5F888-0x15C0D8C contains the .NET symbol table (we know this because the data in this block is just human-readable strings). The most interesting block is probably the end of the file – a pointer to itself (the offset at 0x35387D4 contains the value 0x35387D4), four zeroes and then precisely 0x4000 of high entropy data – this may be encrypted data, or a decryption blob.

I haven’t included screenshots of everything here, but if your eyes are glazing over at all of these numbers right now, that’s perfectly okay: the best way to follow all of this is simply to open the metadata file into a hex editor and explore these file offsets for yourself. There is no special magic in how I determined these table boundaries: it is all determined by eye, by looking carefully and methodically for obvious patterns in the data to indicate groups of related data together in one place, and sudden changes in the data to indicate the boundaries between different kinds of data.

Let us now take a breath, step back and summarize what we’ve learned so far:

  • There are 0x40-byte blocks of unknown encryption every 0x353C0 bytes, starting most likely from the beginning of the file
  • There are some unknown pieces of data in the file header
  • A normal metadata header for this version of IL2CPP is 0x110 bytes. The header here appears to be 0x158 bytes long. The total amount of unknown data in the header is 0x40 bytes. This leaves a question mark over another 8 bytes.
  • There are three blocks of data that are unaccounted for. One contains various metadata tables and may be accounted for when we decrypt the first 0x40 bytes of the header. The second contains the string table. The third contains unknown data with a precise size of 0x4000 bytes.

Whether or not this information will actually be useful down the line is another question. As it turns out, some of it is and some of it isn’t. The key takeaway here is to just take a little bit of time to perform a superficial analysis of the data by eye and see what patterns can be spotted. Often, this insight is enough to determine a strategy to decrypt a file on its own, but in this case we’re going to need to step up our game.