Here is some toothpaste, put it back in the tube
In part 1 I listed all of the discovered table offsets, so let’s pick one at random to explore this technique. I’ll take the table beginning at 0x15C0D8C, which starts like this:
and ends like this:
First we need to make some observations. Each table entry is likely to be 16 (0x10) bytes; we can see this because the layout seems to repeat every 16 bytes. If you struggle to see this, try to imagine in your mind’s eye that each entry is four 4-byte integers named a, b, c and d. Then, remembering that the data is stored little-endian, you can see that table[0].b == 0x08000001, table[1].b == 0x08000002 and so on, while table[0].d, table[1].d, table[2].d and so on are all 0x0B. That is not to say the data is actually 4-byte integers (b here might be an 8-byte number, for example), but we’re not trying to understand the data format per se.
The point is that each “b” seems to contain a sequential value (related data), and each “d” is often 0x0B (again related data), which gives us confidence that 16 bytes is the size of each entry. Also, the actual table size is divisible by 16, and obviously the total size must be exactly divisible by the size of one entry.
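The stride test can also be done mechanically. What follows is a minimal C sketch, assuming the candidate table has already been read from global-metadata.dat into a byte buffer; read_u32_le and count_sequential are hypothetical helper names, not functions from Unity or any existing tool. It reads the 4-byte field at a chosen offset inside each entry as a little-endian integer (field b sits at offset 4) and counts how often it increases by exactly one from one entry to the next.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian uint32 without assuming alignment or host byte order. */
uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: treat the table as len / stride entries and count how
 * many neighbouring entries have the 32-bit field at field_off exactly one
 * greater than in the previous entry (like field b above).
 * Returns 0 if len is not a whole number of stride-byte entries. */
size_t count_sequential(const uint8_t *table, size_t len,
                        size_t stride, size_t field_off)
{
    if (stride == 0 || field_off + 4 > stride || len % stride != 0)
        return 0;
    size_t entries = len / stride;
    size_t hits = 0;
    for (size_t i = 1; i < entries; i++) {
        uint32_t prev = read_u32_le(table + (i - 1) * stride + field_off);
        uint32_t cur  = read_u32_le(table + i * stride + field_off);
        if (cur == prev + 1u)
            hits++;
    }
    return hits;
}
```

Trying a few strides and keeping the one with the highest count is a cheap way to confirm what your eyes are telling you. On real data the sequence may not run unbroken through the whole table, so a high count, not a perfect one, is the signal to look for.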
Let’s consult Il2CppGlobalMetadataHeader and see which referenced structs are 16 bytes long.
Unity Technologies very kindly commented the header struct with the names of all the structs used in each table, as you saw above. All of these structs are in the same file, il2cpp-metadata.h, and virtually every item in each struct is either an int32_t or typedef’d to one. Given that int32_t is four bytes long, we just need to pick out any struct that contains four fields. Here is what we find:
typedef struct Il2CppFieldDefinition
{
    StringIndex nameIndex;
    TypeIndex typeIndex;
    CustomAttributeIndex customAttributeIndex;
    uint32_t token;
} Il2CppFieldDefinition;

typedef struct Il2CppParameterDefinition
{
    StringIndex nameIndex;
    uint32_t token;
    CustomAttributeIndex customAttributeIndex;
    TypeIndex typeIndex;
} Il2CppParameterDefinition;
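The size argument can be checked by the compiler. This sketch assumes, per the paragraph above, that StringIndex, TypeIndex and CustomAttributeIndex are typedefs of int32_t; if your version of il2cpp-metadata.h differs, adjust the typedefs. It uses C11 static_assert to confirm that both candidate structs are exactly 16 bytes, i.e. four 4-byte fields with no padding.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Assumption: as described above, these index types are typedefs of int32_t. */
typedef int32_t StringIndex;
typedef int32_t TypeIndex;
typedef int32_t CustomAttributeIndex;

typedef struct Il2CppFieldDefinition
{
    StringIndex nameIndex;
    TypeIndex typeIndex;
    CustomAttributeIndex customAttributeIndex;
    uint32_t token;
} Il2CppFieldDefinition;

typedef struct Il2CppParameterDefinition
{
    StringIndex nameIndex;
    uint32_t token;
    CustomAttributeIndex customAttributeIndex;
    TypeIndex typeIndex;
} Il2CppParameterDefinition;

/* Four 4-byte fields and no padding: exactly one 16-byte table entry each. */
static_assert(sizeof(Il2CppFieldDefinition) == 16, "field entry should be 16 bytes");
static_assert(sizeof(Il2CppParameterDefinition) == 16, "parameter entry should be 16 bytes");
```

Note that offsetof(Il2CppParameterDefinition, token) is 4, i.e. the same bytes as field b in the table above, whereas in Il2CppFieldDefinition the token sits at offset 12.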
Some knowledge of .NET IL metadata goes a long way here, because the token field is a dead giveaway: every IL item (assembly, type, property, method etc.) is given a metadata token at compile time that uniquely identifies it within its scope. The bottom 24 bits are an ID and the top 8 bits identify the token type, which you can see in this abbreviated definition of CorTokenType from the .NET Metadata Unmanaged API Reference:
typedef enum CorTokenType {
    mdtModule = 0x00000000,
    mdtTypeRef = 0x01000000,
    mdtTypeDef = 0x02000000,
    mdtFieldDef = 0x04000000,
    mdtMethodDef = 0x06000000,
    mdtParamDef = 0x08000000,
    ...
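Splitting a token into its two parts takes a pair of bit masks. A small sketch follows; the enum simply mirrors the values listed above, and token_type and token_rid are illustrative names rather than part of the .NET API.

```c
#include <stdint.h>

/* Token types copied from the abbreviated CorTokenType listing above. */
enum {
    mdtModule    = 0x00000000,
    mdtTypeRef   = 0x01000000,
    mdtTypeDef   = 0x02000000,
    mdtFieldDef  = 0x04000000,
    mdtMethodDef = 0x06000000,
    mdtParamDef  = 0x08000000
};

/* Top 8 bits of a metadata token: which kind of item it refers to. */
uint32_t token_type(uint32_t token)
{
    return token & 0xFF000000u;
}

/* Bottom 24 bits: the ID of the item within that kind. */
uint32_t token_rid(uint32_t token)
{
    return token & 0x00FFFFFFu;
}
```

For example, token_type(0x08000001) is mdtParamDef and token_rid(0x08000001) is 1, which matches table[0].b.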
Hold the phone: parameter tokens are always 0x08xxxxxx. The second item in each table entry above also starts with 0x08 when read as a little-endian integer! This perfectly fits the layout of Il2CppParameterDefinition, where token is the second item, so we’ve identified this table as the parameter definition table!
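The same observation can be phrased as a test over the whole candidate table rather than a few entries. This is a hedged sketch assuming 16-byte entries with the token in bytes 4-7 (the Il2CppParameterDefinition layout); param_token_ratio is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define MDT_PARAM_DEF 0x08000000u  /* mdtParamDef from CorTokenType */
#define ENTRY_SIZE    16u          /* sizeof(Il2CppParameterDefinition) */

/* Little-endian uint32 read, independent of host byte order. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fraction of 16-byte entries whose second field (bytes 4-7, where
 * Il2CppParameterDefinition.token lives) has the mdtParamDef token type.
 * A value close to 1.0 supports identifying the table as parameter definitions. */
double param_token_ratio(const uint8_t *table, size_t len)
{
    size_t entries = len / ENTRY_SIZE;
    size_t hits = 0;
    if (entries == 0)
        return 0.0;
    for (size_t i = 0; i < entries; i++) {
        uint32_t token = read_u32_le(table + i * ENTRY_SIZE + 4);
        if ((token & 0xFF000000u) == MDT_PARAM_DEF)
            hits++;
    }
    return (double)hits / (double)entries;
}
```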
Its file offset would normally be stored in Il2CppGlobalMetadataHeader.parametersOffset, which is 0x58 bytes into Il2CppGlobalMetadataHeader, but here the actual offset to this table (0x15C0D8C) is found at offset 0xE0 from the start of global-metadata.dat.
Since every item in Il2CppGlobalMetadataHeader is a 4-byte integer, we can deduce that for Honkai Impact, parametersOffset should be the (0xE0 / 4 + 1)th = 57th item in the header (the header is 0x158 bytes long, which is 86 entries), so we can fill it in in our new struct.
I’ve also added in parametersCount here, which is quickly verified by taking the length specified in the header after the table offset (0x3B4AA0 in this case), adding it to the offset and verifying that the table does in fact end at that location.
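The parametersCount check is a single addition, plus a divisibility test that also guards against picking up the wrong header field. A minimal sketch, with hypothetical function names:

```c
#include <stdint.h>

/* First byte past the end of a table, computed in 64 bits so that a
 * garbage offset/size pair cannot wrap around. */
uint64_t table_end(uint32_t offset, uint32_t size)
{
    return (uint64_t)offset + size;
}

/* Plausibility check for a candidate (offset, size) pair from the header:
 * the size must be a whole number of entries and the table must fit in the file. */
int table_fits(uint32_t offset, uint32_t size, uint32_t entry_size, uint64_t file_size)
{
    return entry_size != 0 && size % entry_size == 0 &&
           table_end(offset, size) <= file_size;
}
```

For the parameter table this gives 0x15C0D8C + 0x3B4AA0 = 0x197582C, which is where the last 16-byte entry should finish.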
Excellent! Two down, 84 to go! Now just repeat this for all of the other tables and you’ve reconstructed the entire header.
Cheer up, it could be worse: you could be playing RAID: Shadow Legends.
Tip: If you find yourself in a situation where you have to perform this kind of analysis, note that each table you resolve gives you information that simplifies solving the remaining tables.
In the example above, we now have a ton of valid indexes for the string table (nameIndex) and the type definition table (typeIndex), plus we know the number of parameters, 0x3B4AA0 / 0x10 = 0x3B4AA (so the maximum valid parameter index is 0x3B4A9), and all of the valid parameter token values.
When we come across other tables that might reference these values, we can cross-check them to confirm our hypotheses.
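The cross-check can be made mechanical. Below is a minimal sketch, assuming you have already extracted a column of 32-bit values (a suspected index field) from a table you have not yet identified; check_index_column and IndexStats are hypothetical names, and treating -1 as a "no index" marker is an assumption you should verify against your own data. With the parameter table resolved, count would be 0x3B4AA.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t in_range;     /* 0 <= v < count: a valid index */
    size_t no_index;     /* -1: assumed "no index" marker */
    size_t out_of_range; /* anything else: evidence against the hypothesis */
} IndexStats;

/* Classify a column of 32-bit values pulled from an unidentified table,
 * treating them as candidate indexes into a table with count entries. */
IndexStats check_index_column(const int32_t *values, size_t n, int32_t count)
{
    IndexStats s = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        if (values[i] == -1)
            s.no_index++;
        else if (values[i] >= 0 && values[i] < count)
            s.in_range++;
        else
            s.out_of_range++;
    }
    return s;
}
```

A column where out_of_range stays at or near zero across the whole table is a good hint that it really does index the table you compared it against.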
Note: This technique has some important caveats.
There may be tables with similar-looking data and structs which cannot be easily differentiated.
Handle these by coming back to them later when you have resolved more tables, then examine some of the
values individually to see if you can cross-reference them with other tables in a way that makes sense.
Another very important caveat is that this technique also assumes the tables have not been obfuscated.
We already know that the header struct fields in this application have been reordered, and such reordering, known as data structure layout randomization, is a common form of data obfuscation. We even saw it in our
exploration of League of Legends Wild Rift! If the table structs themselves have been treated this way, you won’t be able to reconstruct
the tables by correlating the metadata file contents with the IL2CPP source code.