IL2CPP Internals: A Tour of Generated Code

This is the second blog post in the IL2CPP Internals series. In this post, we will investigate the C++ code generated by il2cpp.exe. Along the way, we will see how managed types are represented in native code, take a look at runtime checks used to support the .NET virtual machine, see how loops are generated and more!

We will get into some very version-specific code that is certainly going to change in later versions of Unity. Still, the concepts will remain the same.

Example project

I’ll use the latest version of Unity available, 5.0.1p1, for this example. As in the first post in this series, I’ll start with an empty project and add one script file. This time, it has the following contents:

using UnityEngine;
public class HelloWorld : MonoBehaviour {
private class Important {
public static int ClassIdentifier = 42;
public int InstanceIdentifier;
}

void Start () {
Debug.Log("Hello, IL2CPP!");

Debug.LogFormat("Static field: {0}", Important.ClassIdentifier);

var importantData = new [] {
new Important { InstanceIdentifier = 0 },
new Important { InstanceIdentifier = 1 } };

Debug.LogFormat("First value: {0}", importantData[0].InstanceIdentifier);
Debug.LogFormat("Second value: {0}", importantData[1].InstanceIdentifier);
try {
throw new InvalidOperationException("Don't panic");
}
catch (InvalidOperationException e) {
Debug.Log(e.Message);
}

for (var i = 0; i < 3; ++i) {
Debug.LogFormat("Loop iteration: {0}", i);
}
}
}

I’ll build this project for WebGL, running the Unity editor on Windows. I’ve selected the Development Player option in the Build Settings, so that we can get relatively nice names in the generated C++ code. I’ve also set the Enable Exceptions option in the WebGL Player Settings to Full.

Overview of the generated code

After the WebGL build is complete, the generated C++ code is available in the Temp\StagingArea\Data\il2cppOutput directory in my project directory. Once the editor is closed, this directory will be deleted. As long as the editor is open though, this directory will remain unchanged, so we can inspect it.

The il2cpp.exe utility generated a number of files, even for this small project. I see 4625 header files and 89 C++ source code files. To get a handle on all of this code, I like to use a text editor which works with Exuberant CTags. CTags will usually generate a tags file quickly for this code, which makes it easier to navigate.

Initially, you can see that many of the generated C++ files are not from the simple script code, but instead are the converted version of the code in the standard libraries, like mscorlib.dll. As mentioned in the first post in this series, the IL2CPP scripting backend uses the same standard library code as the Mono scripting backend. Note that we convert the code in mscorlib.dll and other standard library assemblies each time il2cpp.exe runs. This might seem unnecessary, since that code does not change.

However, the IL2CPP scripting backend always uses byte code stripping to decrease the executable size. So even small changes in the script code can cause many different parts of the standard library code to be used or not, depending on the situation. Therefore, we need to convert the mscorlib.dll assembly each time. We are researching better ways to do incremental builds, but we don’t have any good solutions yet.