We came, we predicted, we scripted and decrypted

(you can skip this section if you don’t care about Genshin Impact; the details are not relevant to the main exercise in the article)

In our particular case study, 16 bytes per block are still not fully decrypted. Looking through the data by eye, we can easily see that this can be resolved with a 16-byte XOR per block. All that remains is to derive the XOR term by manual inspection. Here are a few blocks (in Genshin Impact 1.1 they occur every 0x26700 bytes in the PC version and every 0x24B80 bytes in the Android version):

12 00 00 00 9C 27 02 00 0E 00 00 00 AA 27 02 00
A5 2F 42 30 D5 23 B2 9C 94 2A C0 BA B5 98 A7 68
0C 00 00 00 C7 27 02 00 0D 00 00 00 D4 27 02 00
...
03 00 00 00 B2 65 09 00 12 00 00 00 C4 65 09 00
BB 2F 42 30 BD 61 B9 9C 81 2A C0 BA F8 DA AC 68
06 00 00 00 FC 65 09 00 1A 00 00 00 16 66 09 00
...
00 00 00 00 FF FF FF FF 02 00 91 00 FF FF 00 00
C3 8D 00 06 B9 00 00 00 6B 33 00 00 96 1C 00 00
02 23 1B 00 55 C7 01 00 FF FF FF FF CB F4 00 00
FF FF FF FF 22 7E 00 00 D0 33 01 00 1F 00 00 00
AD 2F 42 30 98 FB 4F 63 9C 2A 56 BA F1 40 A5 68
C4 8D 00 06 B9 00 00 00 6B 33 00 00 F8 4F 00 00
02 23 1B 00 56 C7 01 00 FF FF FF FF CC F4 00 00
FF FF FF FF 7D 56 00 00 D1 33 01 00 60 00 00 00
00 00 00 00 FF FF FF FF 02 00 96 00 FF FF 00 00
...
03 00 00 00 4A F2 00 00 03 00 00 00 4D F2 00 00
03 00 00 00 50 F2 00 00 03 00 00 00 53 F2 00 00
AE 2F 42 30 31 F6 B0 9C 9E 2A C0 BA 57 4D A5 68
03 00 00 00 5C F2 00 00 03 00 00 00 5F F2 00 00
03 00 00 00 62 F2 00 00 03 00 00 00 65 F2 00 00

I chose these more or less at random. The XOR term can be derived from the surrounding context. Where the preceding or following group of bytes shows an obvious pattern, you can XOR an encrypted byte with the plaintext byte that the pattern predicts to derive the corresponding byte of the term:

  • Block 1: ?? 2F 42 30 ?? (23^27) (B2^02) 9C ?? 2A C0 BA ?? (98^27) (A7^02) 68
  • Block 2: no further information
  • Block 3: no further information
  • Block 4: (AE^03) .. .. .. (((3^5) << 4) + ?) .. .. .. (9E^03) .. .. .. (((5^5) << 4) + ?) .. .. ..

XOR term so far:

AD 2F 42 30 6? 04 B0 9C 9D 2A C0 BA 0? BF A5 68
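The per-byte derivation above can be sketched in Python as follows. This is a minimal illustration, assuming each block is a 16-byte string and that `None` marks a byte whose plaintext cannot be guessed from context. The helper names are hypothetical. The sample bytes are blocks 1 and 4 from the dump, and the guesses are the ones listed in the bullets.

```python
def derive_partial_term(encrypted, expected):
    """XOR each encrypted byte with the plaintext byte predicted by context.

    `expected` holds None where context gives no guess; the returned term
    holds None at the same positions.
    """
    if len(encrypted) != len(expected):
        raise ValueError("block and guess must have the same length")
    return [None if p is None else c ^ p for c, p in zip(encrypted, expected)]


def merge_terms(*partials):
    """Combine partial terms, refusing to merge guesses that disagree."""
    merged = [None] * len(partials[0])
    for partial in partials:
        for i, value in enumerate(partial):
            if value is None:
                continue
            if merged[i] is not None and merged[i] != value:
                raise ValueError(f"conflicting guesses for byte {i}")
            merged[i] = value
    return merged


def upper_nibble(encrypted_byte, expected_upper):
    """High nibble of a term byte when only the plaintext's high nibble is known."""
    return (encrypted_byte >> 4) ^ expected_upper


def show(term):
    # Render a partial term as hex, with ?? for unknown bytes.
    return " ".join("??" if b is None else f"{b:02X}" for b in term)


# Blocks 1 and 4 from the dump, with plaintext guessed from their neighbours.
block1 = bytes.fromhex("A5 2F 42 30 D5 23 B2 9C 94 2A C0 BA B5 98 A7 68")
guess1 = [None, 0x00, 0x00, 0x00, None, 0x27, 0x02, 0x00] * 2
block4 = bytes.fromhex("AE 2F 42 30 31 F6 B0 9C 9E 2A C0 BA 57 4D A5 68")
guess4 = [0x03, None, None, None, None, None, None, None] * 2

term = merge_terms(derive_partial_term(block1, guess1),
                   derive_partial_term(block4, guess4))
print(show(term))  # AD 2F 42 30 ?? 04 B0 9C 9D 2A C0 BA ?? BF A5 68
print(upper_nibble(block4[4], 0x5), upper_nibble(block4[12], 0x5))  # 6 0
```

The `upper_nibble` helper mirrors the Block 4 bullet: only the high nibble of the expected plaintext (0x5?) is trusted there, so only the 6? and 0? nibbles of the term come out.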

To find the remaining two nibbles, we can save a bit of time by searching the file for the parts of the term we have already found, in the hope of turning up an obvious solution. We try AD 2F 42:

00 00 00 01 00 00 00 02 00 00 00 01 00 00 00 02
00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00
AD 2F 42 30 67 04 B0 9C 9D 2A C0 BA 0E BF A5 68
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

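The non-zero numbers in the surrounding data only occur every 4th byte, and this row's bytes at exactly those positions (30, 9C, BA, 68) are identical to the term we already have. The block therefore almost certainly encrypts sixteen zero bytes, which means it contains the XOR term itself. We can safely conclude that the missing bytes are 67 and 0E, for a final term of: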

AD 2F 42 30 67 04 B0 9C 9D 2A C0 BA 0E BF A5 68
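The search for the known prefix can be sketched in Python, assuming the partially decrypted file is already in memory as `bytes`. The helper names and the synthetic buffer are hypothetical stand-ins, not data from the real file. The `matches_known` check encodes the reasoning above: if a row agrees with every term byte derived so far, it most likely encrypts zeros and is the term itself.

```python
def find_all(data, needle):
    """Yield every offset at which `needle` occurs in `data`."""
    start = 0
    while (offset := data.find(needle, start)) != -1:
        yield offset
        start = offset + 1


def rows_around(data, offset, before=2, after=2, width=16):
    """Hex rows surrounding a hit, aligned to the hit itself."""
    rows = []
    for r in range(-before, after + 1):
        begin = offset + r * width
        if 0 <= begin < len(data):
            rows.append(data[begin:begin + width].hex(" ").upper())
    return rows


def matches_known(block, partial):
    """True if every already-derived term byte equals the block byte."""
    return len(block) == len(partial) and all(
        p is None or b == p for b, p in zip(block, partial))


partial = [0xAD, 0x2F, 0x42, 0x30, None, 0x04, 0xB0, 0x9C,
           0x9D, 0x2A, 0xC0, 0xBA, None, 0xBF, 0xA5, 0x68]

# Synthetic stand-in for the file: small values every 4th byte, then a
# block that encrypts sixteen zero bytes, then zero padding.
context = bytes([0x00, 0x00, 0x00, 0x01] * 8)
hidden = bytes.fromhex("AD 2F 42 30 67 04 B0 9C 9D 2A C0 BA 0E BF A5 68")
data = context + hidden + bytes(32)

for hit in find_all(data, bytes.fromhex("AD 2F 42")):
    block = data[hit:hit + 16]
    if matches_known(block, partial):
        print(hex(hit), "candidate term:", block.hex(" ").upper())
        print("\n".join(rows_around(data, hit, 1, 1)))
```

On real data there may be several hits; printing the neighbouring rows lets you judge each one by eye, as in the dump above.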

And that’s it! By calling the decryption function, then XORing the first 16 bytes of each encrypted block with this value, we can now fully decrypt the metadata file.
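The final fix-up step can be sketched in Python, assuming `data` already holds the output of the game's own decryption function. The text gives the spacing between blocks but not where the first block starts, so `first_block` is a hypothetical parameter you would have to determine by inspection. Applying the term to blocks 1 and 4 from the dump yields rows that continue their neighbours' patterns, which is a useful sanity check.

```python
PC_STRIDE = 0x26700       # block spacing, Genshin Impact 1.1 (PC), from the text
ANDROID_STRIDE = 0x24B80  # block spacing, Genshin Impact 1.1 (Android), from the text
TERM = bytes.fromhex("AD 2F 42 30 67 04 B0 9C 9D 2A C0 BA 0E BF A5 68")


def fix_blocks(data, term, first_block, stride):
    """XOR `term` into the bytes at first_block, first_block + stride, ...

    A block whose bytes would run past the end of `data` is left alone.
    """
    out = bytearray(data)
    for offset in range(first_block, len(out) - len(term) + 1, stride):
        for i, k in enumerate(term):
            out[offset + i] ^= k
    return bytes(out)


# Check the term against two sample blocks from the dump (stride 16 simply
# treats each 16-byte sample as a single block at offset 0).
block1 = bytes.fromhex("A5 2F 42 30 D5 23 B2 9C 94 2A C0 BA B5 98 A7 68")
block4 = bytes.fromhex("AE 2F 42 30 31 F6 B0 9C 9E 2A C0 BA 57 4D A5 68")
print(fix_blocks(block1, TERM, 0, 16).hex(" ").upper())
# 08 00 00 00 B2 27 02 00 09 00 00 00 BB 27 02 00
print(fix_blocks(block4, TERM, 0, 16).hex(" ").upper())
# 03 00 00 00 56 F2 00 00 03 00 00 00 59 F2 00 00
```

On the real file the call would look like `fix_blocks(data, TERM, first_block, PC_STRIDE)` (or `ANDROID_STRIDE`). Because XOR is its own inverse, running it twice restores the input, which makes it easy to experiment with candidate offsets.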

Might as well JMP

The amount of code we reverse engineered in this exercise was zero. We merely used prior knowledge from a previous, less obfuscated product and a visual survey of the input data. With relative ease, this completely defeats all of the extra security measures employed.

Setting up the initial test harness is a bit cumbersome, but once in place it can be re-used for any function in any application with near-zero effort. All you need to do is change the function delegate, input arguments and output filter in the C# client code and generate a new list of addresses to scan.
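The general shape of such a harness can be sketched in Python. This is not a port of the author's C# client. It only shows how the three pluggable pieces map onto code: the `invoke` callable (function delegate), `args` (input arguments) and `accept` (output filter). The fake image and every name here are hypothetical, and the native call plus the crash isolation a real harness needs are abstracted behind `invoke`.

```python
from typing import Any, Callable, Iterable, Iterator, Sequence, Tuple


def scan(addresses: Iterable[int],
         invoke: Callable[[int, Sequence[Any]], Any],
         args: Sequence[Any],
         accept: Callable[[Any], bool]) -> Iterator[Tuple[int, Any]]:
    """Try every candidate address and yield those whose output passes `accept`.

    Any exception raised by a candidate is treated as "not the function
    we are looking for".
    """
    for address in addresses:
        try:
            result = invoke(address, args)
        except Exception:
            continue
        if accept(result):
            yield address, result


# Hypothetical stand-in for a loaded image: addresses mapped to Python
# functions. A real harness calls native code instead and must survive
# hard crashes, for example by running attempts in a separate process.
def reverse(data: bytes) -> bytes:
    return data[::-1]

def crash(data: bytes) -> bytes:
    raise RuntimeError("simulated access violation")

def xor_5a(data: bytes) -> bytes:
    return bytes(b ^ 0x5A for b in data)

fake_image = {0x1000: reverse, 0x2000: crash, 0x3000: xor_5a}

def invoke(address: int, args: Sequence[Any]) -> Any:
    return fake_image[address](*args)

known_plaintext = b"known plaintext"
ciphertext = xor_5a(known_plaintext)
hits = list(scan(fake_image, invoke, [ciphertext], lambda out: out == known_plaintext))
print([hex(address) for address, _ in hits])  # ['0x3000']
```

Retargeting the sketch to another function only means swapping `invoke`, `args` and `accept` and supplying a different address list, which is the re-use property described above.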

Since the function we found in this particular DLL is a hidden export, the developers are going to have a hard time preventing this kind of attack. While significantly hardening the application against both static and dynamic analysis via assembly-level obfuscation and anti-debug traps, they completely neglected to consider that none of it matters if you can trivially find the entry point. It’s like building a huge layer cake with tunnel vision, and not realizing someone can just stick their finger straight into the lowest layer and lick off the chocolate.

While the developers could change the function signature, this would be a temporary workaround at best. The least bad solution here is probably to move the decryption functions into the game binary so they are no longer imported from a DLL, and to inline them into another function so there is no obvious entry point. This technique can be further improved by adding an argument to the enclosing function as a behaviour (code path) selector – perhaps merging a large collection of other functions together – so that the intent becomes very unclear. Of course, this will significantly impact the readability and maintainability of the original source code, especially if the merged functions require different argument types.
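A toy Python sketch of the behaviour-selector idea, assuming a single-byte XOR as a stand-in for the real decryption routine. The function name, selector values and the other code paths are all hypothetical. The decryption path only runs when the caller supplies one specific, meaningless-looking selector, so a scan that only varies the data argument would mostly exercise unrelated code.

```python
def merged_routine(selector: int, buf: bytes, param: int) -> bytes:
    """Several unrelated operations merged behind one opaque selector."""
    # Selector values are deliberately arbitrary so they reveal nothing.
    if selector == 0x5C1:
        # Unrelated path: rotate the buffer left by `param` bytes.
        n = param % len(buf) if buf else 0
        return buf[n:] + buf[:n]
    if selector == 0x1F3:
        # "Decryption" path: single-byte XOR standing in for the real routine.
        return bytes(b ^ (param & 0xFF) for b in buf)
    if selector == 0x2E7:
        # Unrelated path: add `param` to every byte, modulo 256.
        return bytes((b + param) & 0xFF for b in buf)
    # Any other selector returns the input untouched.
    return buf


encrypted = merged_routine(0x1F3, b"metadata", 0x42)
print(merged_routine(0x1F3, encrypted, 0x42))  # b'metadata'
print(merged_routine(0x5C1, b"abcdef", 2))     # b'cdefab'
```

Even in this toy, the maintainability cost mentioned above is visible: every caller has to know which selector to pass and what `param` means on that path.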

Another option could be to split the logic of the single target function into multiple functions to increase the search complexity. An attacker would then have to know about and find multiple functions, and call them in the correct order. The time needed to locate them would only increase linearly with the number of additional functions; however, deriving the correct arguments for each could increase the complexity exponentially if the split is designed well. If the attacker does not know the function arguments, the overall search space is a function of the image size, the number of functions to find and the range of possible inputs to each.
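A rough cost model makes the difference concrete. The symbols are assumptions, not from the original: N candidate entry points (growing with the image size), k functions that must all be found, and R candidate argument sets per function. If each function can be located on its own, but the arguments can only be confirmed by the output of the whole chain, then:

```latex
\begin{aligned}
T_{\text{locate}}(k) &\approx kN, \\
T_{\text{args}}(k)   &\approx R^{k}, \\
T(k)                 &\approx kN + R^{k}.
\end{aligned}
```

With an illustrative R = 2^{16}, the argument search grows from 2^{16} for one function to (2^{16})^4 = 2^{64} for four. If the functions cannot even be located independently, the worst case under this model approaches (NR)^k.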

As obfuscation developers find ever-more intricate mechanisms to deploy, so too must analysts learn to think outside the box and pursue attack vectors the obfuscation authors may not have thought of. The attack I outlined today is certainly not new and I take absolutely no credit for it; however, I haven’t seen much coverage of it on the internet with practical working examples, so I hope you found it insightful! Obfuscation techniques will continue to evolve inexorably, as will the tooling available to reverse engineer them. This cat and mouse game has been in progress since long before the rise of the modern internet. The cats are always on the prowl – and whenever they try something new, this mouse will be here, waiting for them.