Reverse Engineering Adventures: Brute-force function search, or how to crack Genshin Impact with PowerShell

Today, I thought we’d have a bit of fun and look at an unorthodox way to find a function in a compiled application when you know its inputs and can make an assumption about its possible outputs – in this example, a decryption function in a game. We’re going to crack it with PowerShell.

Well, sort of… far be it from me to troll my dear readers with a clickbait title, but there is an element of truth in this. At the very least, to perform this attack your life will definitely be easier if you have some kind of scripting language on hand.

The technique I’m about to describe can be applied to any application which you know contains a specific function with particular arguments and return type, but don’t know where it is located in the binary file.

>>Problem description

You want to find a function in an executable image. You know the function arguments and return type, but the assembly code is extremely obfuscated and difficult to disassemble, trace or decompile. What can we do?

What if… we just call every address in the image until one of them spits out the expected result?

This can be considered a kind of brute-force attack where the key space is the possible address range of code in the image. It might also be thought of as a kind of reverse fuzzing, where we deliberately pass invalid arguments to every function in an application except for one – where the arguments have been pre-chosen to be valid – to distinguish it from the rest.
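To get a feel for the size of this key space, here is a minimal sketch that enumerates candidate addresses for a code region. The start offset, the region size and the optional stride are made-up illustrative values, not figures taken from the game, and the function name is hypothetical.

```python
def candidate_addresses(code_start, code_size, step=1):
    """Return every candidate address in [code_start, code_start + code_size).

    step=1 tries every byte. A larger step is only safe if you are willing
    to assume that functions start on aligned addresses (an assumption,
    not something guaranteed by the image).
    """
    if code_size < 0 or step < 1:
        raise ValueError("code_size must be >= 0 and step must be >= 1")
    return range(code_start, code_start + code_size, step)


# Hypothetical 16 MiB code region starting at offset 0x1000.
every_byte = candidate_addresses(0x1000, 0x1000000)
aligned_16 = candidate_addresses(0x1000, 0x1000000, step=16)
print(len(every_byte), len(aligned_16))
```

Using a `range` keeps the key space lazy, so nothing is allocated until the search loop actually walks it.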

Your first thought might be to simply write a piece of code which loads the image and calls every address in a for loop with the desired arguments, then checks the result. Unfortunately, this won’t work. Most of the time you will be calling into the middle of a function, or, on architectures such as x86 which use variable-length instructions, into the middle of an instruction, where the bytes may decode to an invalid opcode.

Many of the calls you make to valid instructions will lead to stack corruption, as the end of the function will pop values off the stack that were never pushed on at the start, since those instructions were skipped. The majority of the rest of the calls will result in the function using invalid or uninitialized register or memory data, which will lead to undefined behaviour and quite probably crash the calling process.

The way we can deal with this is by using two processes: one which calls a specified address in the image – the “client process”, and a second process which contains a for loop that repeatedly invokes the first process for each address – the “host process”. If the client process crashes, the host process simply moves to the next iteration of the loop and re-invokes it.
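The host side of this arrangement can be sketched in a few lines. This is an illustration under stated assumptions rather than the tool used in this article: `scan_addresses`, `build_command` and `demo_command` are hypothetical names, the convention that the client exits with code 0 only when it gets the expected result is my own, and the timeout is an extra safeguard for clients that hang instead of crashing.

```python
import subprocess
import sys


def scan_addresses(addresses, build_command, timeout_s=5.0):
    """Host loop: start one client process per candidate address.

    build_command(address) must return the argv list for the client.
    Returns the addresses whose client exited with code 0.
    """
    hits = []
    for address in addresses:
        try:
            result = subprocess.run(
                build_command(address),
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # The client hung (for example in an infinite loop); it has
            # been killed by subprocess.run, so move on to the next address.
            continue
        if result.returncode == 0:
            hits.append(address)
        # Any other exit code is treated as a crash or a wrong result.
    return hits


def demo_command(address):
    # Stand-in client for illustration only: it "crashes" (exit code 1)
    # for every address except 0x1230. A real client would load the image
    # and call the address instead.
    script = "import sys; sys.exit(0 if int(sys.argv[1], 16) == 0x1230 else 1)"
    return [sys.executable, "-c", script, hex(address)]


print([hex(a) for a in scan_addresses(range(0x1200, 0x1260, 0x10), demo_command)])
```

Because every call happens in a fresh process, a crash costs one iteration and nothing else; the price is the overhead of starting a process per address.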

>>About today’s case study

(you can skip this section if you don’t care about Genshin Impact; the details are not relevant to the main exercise in the article)

miHoYo’s open world action RPG Genshin Impact – or Gacha Impact as I like to call it – has seen a surge of popularity since it was launched in September 2020. It protects itself against automated reverse engineering by IL2CPP tools like Il2CppInspector in more or less the same way as Honkai Impact, but with a substantially higher level of security:

  • process memory can no longer be dumped by tools like ProcDump (in Genshin Impact 1.1)
  • assembly instruction-level obfuscation is used to prevent the code from being decompiled in both the main exe and UnityPlayer.dll
  • UnityPlayer.dll now has anti-debugging countermeasures

They’re learning! There are also a couple of minor kinks:

  • some additional initialization is required prior to the metadata decryption step. Without this, one extra XOR step will be required after decrypting global-metadata.dat to fully recover the plaintext. The initialization code is heavily obfuscated but easily mitigated by determining the required XOR transformation from the partially decrypted file (see end of article for more details)
  • the decoy API il2cpp_thread_get_name in the main game DLL, which receives pointers to the decryption functions from UnityPlayer.dll, has been renamed to il2cpp_init_security (another decoy API name), and the “authentication check” has been removed

I have produced an extensive four-part series on how to reverse engineer Honkai Impact for IL2CPP tooling, so refer to those articles for more details – virtually all of it is equally applicable to Genshin Impact and you can use exactly the same techniques to reverse engineer this game.

I’ll be working with Genshin Impact 1.1 today but the technique should be equally applicable to any version. Full disclosure, I did not reverse engineer it in the traditional way first, so there was no cheating involved!

>>The function

In this example, we shall look for a function which has the following signature:

uint8_t *Decrypt(uint8_t *encryptedData, uint32_t length);

The function receives a blob of encrypted data and the length of the data as its input arguments, and returns a pointer to a decrypted blob of the same length. We have the encrypted data stored in a file (which in this case is called global-metadata.dat) and therefore also know its length. We also make assumptions about the content of the decrypted data. We have to know something – anything – about the output so that we can validate any results we get to check if we have found the correct function.
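As a rough sketch of what calling an arbitrary address with this signature might look like, the snippet below uses Python’s ctypes. It assumes the default C calling convention, and `call_candidate` and the stub are hypothetical names of my own. The stub simply hands its input back so that the example can run without the game; it is not a real decryption routine.

```python
import ctypes

# Mirrors: uint8_t *Decrypt(uint8_t *encryptedData, uint32_t length);
# Pointers are declared as c_void_p so they arrive as plain integers.
DecryptProto = ctypes.CFUNCTYPE(ctypes.c_void_p, ctypes.c_void_p, ctypes.c_uint32)


def call_candidate(address, encrypted):
    """Call `address` as if it were Decrypt and copy out len(encrypted) bytes.

    Calling a wrong address can take the whole process down, so this
    belongs in a disposable process.
    """
    buffer = (ctypes.c_uint8 * len(encrypted)).from_buffer_copy(encrypted)
    candidate = DecryptProto(address)
    result = candidate(ctypes.addressof(buffer), len(encrypted))
    if not result:  # NULL pointer returned
        return None
    # Copy the bytes out while the input buffer is still alive.
    return ctypes.string_at(result, len(encrypted))


@DecryptProto
def _stub_decrypt(data_ptr, length):
    # Illustration only: a stand-in "Decrypt" that returns its input unchanged.
    return data_ptr


stub_address = ctypes.cast(_stub_decrypt, ctypes.c_void_p).value
print(call_candidate(stub_address, b"\x1e\x15\x01\xfb"))
```

The bytes are copied out immediately because we know nothing about who owns the returned memory or how long it stays valid.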

In this case, the input file contains repeating blocks of 0x40 bytes that are encrypted, surrounded by unencrypted areas. For example:

002408C0  8F 5F 00 00 04 00 00 00 95 5F 00 00 05 00 00 00
002408D0  8F 5F 00 00 04 00 00 00 95 5F 00 00 05 00 00 00
002408E0  8F 5F 00 00 04 00 00 00 95 5F 00 00 05 00 00 00
002408F0  8F 5F 00 00 04 00 00 00 95 5F 00 00 05 00 00 00
00240900  1E 15 01 FB 76 FE F7 35 86 B4 B2 58 A1 58 46 04
00240910  30 37 2D 6A B0 65 87 77 A2 AA 8C D6 CD 33 EE 1F
00240920  99 EB F9 B9 8E 2E 7B 98 52 77 62 FA D2 8B 73 C3
00240930  75 33 91 28 42 4D 4E 2E 49 23 C6 91 58 AE F6 F8
00240940  8F 5F 00 00 04 00 00 00 95 5F 00 00 05 00 00 00
00240950  8F 5F 00 00 04 00 00 00 95 5F 00 00 05 00 00 00
00240960  8F 5F 00 00 04 00 00 00 95 5F 00 00 05 00 00 00
00240970  8F 5F 00 00 04 00 00 00 95 5F 00 00 05 00 00 00

In this portion of the file, part of a table is encrypted. We can reasonably assume that the data at offsets like 0x240908 is 95 5F 00 00 05 00 00 00, or at the very least that the last byte of this sequence is zero if nothing else.
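Both the strong and the weak assumption translate into simple checks on a candidate output buffer. The function names below are hypothetical; the byte values are copied from the hex dump above, and the example applies the checks to a single 16-byte row, so the offset 8 stands in for 0x240908 in the full file.

```python
# The 8 bytes we expect to find at offsets like 0x240908 after decryption.
EXPECTED = bytes.fromhex("95 5F 00 00 05 00 00 00")


def matches_expected(decrypted, offset, expected=EXPECTED):
    """Strong check: the full expected sequence sits at `offset`."""
    return decrypted[offset:offset + len(expected)] == expected


def last_byte_is_zero(decrypted, offset, length=len(EXPECTED)):
    """Weak check: only the final byte of the sequence is zero."""
    end = offset + length
    return len(decrypted) >= end and decrypted[end - 1] == 0


# One unencrypted row and the first encrypted row from the dump.
plain_row = bytes.fromhex("8F 5F 00 00 04 00 00 00 95 5F 00 00 05 00 00 00")
encrypted_row = bytes.fromhex("1E 15 01 FB 76 FE F7 35 86 B4 B2 58 A1 58 46 04")

print(matches_expected(plain_row, 8), matches_expected(encrypted_row, 8))
print(last_byte_is_zero(plain_row, 8), last_byte_is_zero(encrypted_row, 8))
```

The weak check will produce false positives on its own, since roughly one random byte in 256 is zero, so it is best treated as a cheap first filter before the strong check.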

It doesn’t matter if you don’t have access to this insight: if the entire file is encrypted, but you know the decrypted result contains some string, some sequence of bytes, starts or ends with a particular byte etc., you can just search the output to see if it matches your criteria after each function call. This is called a known-plaintext attack.
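When all you know is a fragment of the plaintext, the check becomes a search over the whole output. A minimal sketch, with a hypothetical function name and made-up sample bytes:

```python
def output_matches(output, contains=None, starts_with=None, ends_with=None):
    """Return True if `output` satisfies every criterion that was supplied.

    With no criteria at all the answer is False, so that a forgotten
    argument can never turn every candidate into a "hit".
    """
    checks = []
    if contains is not None:
        checks.append(contains in output)
    if starts_with is not None:
        checks.append(output.startswith(starts_with))
    if ends_with is not None:
        checks.append(output.endswith(ends_with))
    return bool(checks) and all(checks)


# Made-up candidate output, for illustration only.
sample = b"\x00\x01some known string\xff"
print(output_matches(sample, contains=b"known string"))
print(output_matches(sample, starts_with=b"\x00", ends_with=b"\x00"))
```

The more specific the criteria, the fewer accidental matches you will have to weed out by hand afterwards.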

For this particular example, we don’t know for sure the function exists, but we are assuming it does based on the fact that previous, less obfuscated versions of the game have a function with the same behaviour, derived from static analysis of those versions. The encryption algorithm may be different in the image being analyzed, but that doesn’t matter as long as the function signature is the same: the goal is to decrypt the data, not to discern the algorithm itself.