Are we nearly there yet?

We now know how to access all of the required metadata and can begin coding an Il2CppInspector plugin (wiki documentation) to process all of this. If we had to modify the main tool’s source code it would be quite fiddly, but we have provided some APIs that enable you to make arbitrary changes to the load pipeline.

Along the way, we’re going to have to tidy up some loose ends. The full source code for the miHoYo loader plugin can be found here and it’s well-commented so I’m not going to go over every line of code here, but let’s cover the key points so you have a flavour of what’s possible.

Loader plugins are created by implementing events that are broadcast to all plugins at various stages in the load pipeline. First, we deal with loading and unloading UnityPlayer.dll when the user triggers a new load task:

// This executes when the client begins to load a new IL2CPP application
public void LoadPipelineStarting(PluginLoadPipelineStartingEventInfo info) {
    // Try to load UnityPlayer.dll
    hModule = LoadLibrary(unityPath.Value);
 
    if (hModule == IntPtr.Zero)
        throw new FileLoadException("Could not load UnityPlayer DLL", unityPath.Value);
 
    // Get the base address of the loaded DLL in memory
    ModuleBase = Process.GetCurrentProcess().Modules.Cast<ProcessModule>().First(m => m.ModuleName == Path.GetFileName(unityPath.Value)).BaseAddress;
}
 
// This executes when the client finishes loading an IL2CPP application
public void LoadPipelineEnding(List<Il2CppInspector.Il2CppInspector> packages, PluginLoadPipelineEndingEventInfo info) {
 
    // Release memory lock on UnityPlayer.dll
    FreeLibrary(hModule);
}

This code is hopefully fairly self-explanatory.

The initial decryption of global-metadata.dat described in part 1 is processed by PreProcessMetadata, using exactly the same code shown in that part. We store a copy of the decrypted metadata byte array in metadataBlob so we can pass it to GetStringFromIndex and GetStringFromLiteralIndex later.

More interesting is how we deal with the table data obfuscation described in part 2. For this, we engage the API IFileFormatStream.AddObjectMapping:

stream.AddObjectMapping(typeof(Il2CppInspector.Il2CppGlobalMetadataHeader), typeof(Il2CppGlobalMetadataHeader));
stream.AddObjectMapping(typeof(Il2CppInspector.Il2CppTypeDefinition), typeof(Il2CppTypeDefinition));
stream.AddObjectMapping(typeof(Il2CppInspector.Il2CppMethodDefinition), typeof(Il2CppMethodDefinition));
stream.AddObjectMapping(typeof(Il2CppInspector.Il2CppFieldDefinition), typeof(Il2CppFieldDefinition));
stream.AddObjectMapping(typeof(Il2CppInspector.Il2CppPropertyDefinition), typeof(Il2CppPropertyDefinition));

This instructs Il2CppInspector to replace all reads of the standard IL2CPP objects specified in the 1st set of arguments with customized versions in the 2nd set of arguments. When Il2CppInspector encounters one of these objects, it reads the customized version, generates the original version and copies any fields with matching names over. Field names that don’t match in either object are ignored. This allows you to reorder fields and skip unknown data.

We also provide some attributes to help out. Let’s look at how we deal with the header:

public class Il2CppGlobalMetadataHeader
{
    [SkipWhenReading]
    public uint signature = Il2CppConstants.MetadataSignature;
 
    [SkipWhenReading]
    public int version = 24;
 
    [ArrayLength(FixedSize = 0x28)]
    public byte[] unk;
 
    public int genericContainersOffset; // Il2CppGenericContainer
    public int genericContainersCount;
    public int nestedTypesOffset; // TypeDefinitionIndex
    public int nestedTypesCount;
    // ...

Recall that the magic bytes and IL2CPP version number are absent even from the decrypted metadata. By using the [SkipWhenReading] attribute, we can tell Il2CppInspector not to read a value from the file, but to still copy the field to the deobfuscated object. We use this here to set the correct magic bytes and IL2CPP version statically without destructively editing the file stream.

If you have a large block of data you want to ignore, you can use the [ArrayLength] attribute to tell Il2CppInspector to read a specified number of bytes (you can also use an argument to tell it to read the number of bytes specified by another field in the struct). If the field name doesn’t match a field in the original object, the read data will simply be discarded as in the case with unk above.

For the encrypted/junk data in the header we don’t use, we just interpose dummy variables:

// ...
public int windowsRuntimeTypeNamesOffset; // Il2CppWindowsRuntimeTypeNamePair
public int windowsRuntimeTypeNamesSize;
public int exportedTypeDefinitionsOffset; // TypeDefinitionIndex
public int exportedTypeDefinitionsCount;
 
public int unk5;
public int unk6;
 
public int parametersOffset; // Il2CppParameterDefinition
public int parametersCount;
 
public int genericParameterConstraintsOffset; // TypeIndex
public int genericParameterConstraintsCount;
 
public int unk7;
public int unk8;
 
public int metadataUsagePairsOffset; // Il2CppMetadataUsagePair
public int metadataUsagePairsCount;
// ...

When it comes to fetching .NET string identifiers, we implement the GetStrings event. There is a slight kink here, because we need to determine every string index used by the binary. Recall that they are actually offsets into the start of the table rather than sequential values.

String indices are stored in a variety of fields across the metadata, most but not all of which are called nameIndex. We use Linq to iterate over every image, assembly, event, field, method, parameter, property, type and generic parameter in the metadata and build a list of every index as follows:

var stringIndexes =
                    metadata.Images.Select(x => x.nameIndex)
            .Concat(metadata.Assemblies.Select(x => x.aname.nameIndex))
            .Concat(metadata.Assemblies.Select(x => x.aname.cultureIndex))
            .Concat(metadata.Assemblies.Select(x => x.aname.hashValueIndex))
            .Concat(metadata.Assemblies.Select(x => x.aname.publicKeyIndex))
            .Concat(metadata.Events.Select(x => x.nameIndex))
            .Concat(metadata.Fields.Select(x => x.nameIndex))
            .Concat(metadata.Methods.Select(x => x.nameIndex))
            .Concat(metadata.Params.Select(x => x.nameIndex))
            .Concat(metadata.Properties.Select(x => x.nameIndex))
            .Concat(metadata.Types.Select(x => x.nameIndex))
            .Concat(metadata.Types.Select(x => x.namespaceIndex))
            .Concat(metadata.GenericParameters.Select(x => x.nameIndex))
            .OrderBy(x => x)
            .Distinct()
            .ToList();

We can then call GetStringFromIndex repeatedly, iterating over stringIndexes to fetch every string.

For fetching string literals, we have another problem – determining the maximum index. Although we were able to find this by looking at global-metadata.dat by eye, the table offset and length will surely change with every version so we’d like a more general solution.

We can look at the metadata usages table (see the Metadata Usages section of this article for an explanation of this table) to determine the maximum string literal index, however we cannot do this until Il2CppInspector has finished analyzing the binary.

We start by creating a dummy implementation of GetStringLiterals:

public void GetStringLiterals(Metadata metadata, PluginGetStringLiteralsEventInfo data) {
    // We need to prevent Il2CppInspector from attempting to read string literals from the metadata file until we can calculate how many there are
    data.FullyProcessed = true;
}

We then implement PostProcessPackage – which executes after both the metadata and binary have been analyzed and the relationships between the data in the two files have been correlated with each other – and scan the metadata usages table to find the maximum string literal index used:

public void PostProcessPackage(Il2CppInspector.Il2CppInspector package, PluginPostProcessPackageEventInfo data) {
    var stringLiteralCount = package.MetadataUsages.Where(u => u.Type == MetadataUsageType.StringLiteral).Max(u => u.SourceIndex) + 1;

Now we can just implement a for loop calling GetStringLiteralFromIndex for each index.

>>Et voilà!

alt

…and that’s it! At last, the application loads into Il2CppInspector and we can enjoy the fruits of our labour. We may now finally sleep – after we reflect on what we’ve learned.

>>Takeaways for budding analysts

Don’t be intimidated. The obfuscation of an application may seem overwhelming at first, but anything that is obfuscated can be deobfuscated with enough effort. Spend time gathering a superficial overview of the moving parts before delving into any one problem area, and break the problem up into manageable blocks.

Don’t do more work than necessary. While we could certainly sit and reverse engineer every function in the application, there is really no need to do this. Focus on key functions, don’t try to understand every line of code, work at the decompiler rather than assembly level where it makes sense to do so. Observe that we don’t need to understand deobfuscation functions to be able to call them: just get the application to do the hard work for us. Observe that we can often discern what a function does merely by looking at its input and output parameters.

Strong pattern recognition skills are essential. So many of the techniques we used in this series were essentially lo-fi attacks where we just used our eyes to find clustered data, related code and areas of high and low entropy. Humans are much better at finding patterns in seemingly random data than computers: leverage on this and hone your pattern-finding skills! You can determine vast amounts of information about how an application works just by looking at its data structures.

Make educated guesses based on logical thought processes. During this exercise, we made many assumptions about the purpose of code and data based purely on context or on known common design patterns. Most of the time, our assumptions were correct. The ability to make educated guesses is a skill that takes time to develop; the more code you reverse engineer, the more you will see the same design patterns over and over again, and gradually this will become easier and easier.

Do your research. A large percentage of reverse engineering is in external research. Without knowledge of IL2CPP or access to the freely available public source code, reverse engineering this application would have been much more difficult. Use any and all available information. Refer to the Microsoft API documentation. Refer to the Intel and ARM instruction set references. Google how to perform tasks in IDA or other tools. Don’t re-invent the wheel – there is probably a tool out there that does what you need already! Learn about common encryption and obfuscation techniques so that you’re aware of them and know their strengths and weaknesses. Don’t be afraid to ask questions or make mistakes – that is a normal part of the learning process, and there is always more to learn.

Don’t blog while you’re reverse engineering. It reduces your productivity ten-fold 😎

>>Thoughts on Honkai Impact

I expect miHoYo and others creating obfuscation schemes to read this article – this section is for you.

The obfuscation in Honkai Impact is interesting. It demonstrates an awareness of the IL2CPP reverse engineering tools available and explicitly targets them using several layers of protection. In addition to what I’ve covered in this series, the game is also obfuscated with VMProtect and – as a last line of defence if all else fails – Beebyte has been applied to the .NET identifiers. This is quite the layer cake.

Unfortunately, the layers are disjoint. The three primary decryption functions are obfuscated via control flow flattening, but there is nothing to stop an attacker from just loading the DLL in isolation and calling them without having to care how they work. No assembly-level obfuscation is applied to defeat a decompiler, which makes the code quite easy to compare to an arbitrary IL2CPP project. Anti-debugging is applied when BH3.exe loads bh3base.dll, but doesn’t prevent UnityPlayer.dll from being debugged on its own.

Tricks like the il2cpp_thread_get_name decoy are a cute Easter egg, but such security by obscurity doesn’t really add anything to the reverse engineering complexity.

I examined versions 3.8 to 4.3 of the game. This series covered version 4.3, but the obfuscation and encryption algorithms are identical from version to version. This is a mistake. Additionally, hackers will likely take the path of least resistance which is in fact likely not the PC version but the mobile versions. I’m just a sucker for punishment, but you cannot apply products like VMProtect or heavy obfuscation that impacts performance to an Android app. Versions for all platforms need to be bolstered, and this is likely to be a pain point. Additionally, the Android metadata can be fed into the PC UnityPlayer.dll without issues. A tiny tweak to the encryption algorithm between the two platform builds could have mitigated this.

Overall, the obfuscation and protection feels bolted on as an afterthought, and that results in the various loopholes we exploited as we navigated around a hodgepodge of disparate forms of obfuscation. Obviously it requires significant expenditure of resources to design a product with security in mind from the outset and I for one am glad companies don’t waste too much time on this. Ultimately, obfuscation is only a delay tactic, so whether it matters is a really question of what you want to accomplish by applying it. For a paid game that typically generates most of its sales revenue in the first few weeks, slapping Denuvo on it to mitigate that for a few months may be a viable solution.

Honkai Impact is free-to-play, so we suppose that the obfuscation is to prevent cheating. In my opinion, relying on the client to enforce anti-cheat is a mistake. You can never trust the behaviour of a client or the data it sends to a server. It is better to deal with cheaters via backend design: always make sure the game server is the single source of truth, analyze incoming network data for suspicious patterns like super-human aiming ability and flag those accounts for review, create server-side honeypots that matchmake cheaters with each other and so on. Can client-side anti-cheat be a useful tool as part of a vertical solution? I believe the effect is extremely limited. Riot Games trotted out Vanguard with Valorant last year which employs a kernel mode driver reminiscent of the notorious StarForce DRM, yet cheating in Valorant is rampant. If your product is considered a high value target, you can expect it to be reverse engineered.

For companies that really want to apply client-side obfuscation despite it being essentially pointless, in-house obfuscation designs are contraindicated unless the developers have prior expertise in the field. While miHoYo was smart to target IL2CPP tools, and certainly achieved more than anyone else by far, the result was still lackluster because of the weak linkage between the different elements of obfuscation. Experienced obfuscation authors will not make this mistake. Nobody can battle-harden your application like determined hackers. The talent is out there: leverage on their expertise to produce hardened applications and have analysts try to find exploits before release.

One really interesting phenomenon is that, once the cat is out of the bag, you can’t put it back in again. We’ll see this when I examine how to break Genshin Impact with Powershell: the protection was improved from Honkai Impact, but it doesn’t matter because so much knowledge was gained by reverse engineering Honkai Impact that we could mostly just skip over all of it. Once your protection is broken, iterating on it incrementally doesn’t really help. You pretty much have to start again with an entirely new scheme.

A+ for effort, though.