Hex editing UPK files

From Nexus Mods Wiki
Revision as of 02:30, 16 April 2013 by AnUser

Introduction

This tutorial attempts to cover every aspect of hex editing upk files for XCom:Enemy Unknown. It assumes you are already familiar with the DefaultGameCore.ini file options and know which changes can be achieved just by editing the ini file and which ones require hex editing. If you know that, even if you don't know what hex editing is, you're in the right place.

Content

  • Section 2 Programs & Tools offers a brief description of the tools used for hex editing upk files, along with a download link and installation tips.
  • Section 3 Hex editing I: changing single values collects several procedures for making a simple edit to a upk file, organized as a tutorial. It is aimed at beginners, but it may contain useful tips for advanced modders.
  • Section 4 Hex editing II: re-writing functions and the sections that follow are dedicated to advanced hex editing. They no longer take the form of a tutorial and instead aim to be as detailed and precise as possible.



Hex Editing
Hex editing is what we need to do when we want to change something in XCom:Enemy Unknown that we can't achieve by merely editing the DefaultGameCore.ini file (DGC.ini from now on). It means accessing the .upk files (Unreal Package files), where the game stores its compiled classes and functions, and editing them with at least two programs. Unlike DGC.ini, which is a table of readable values, the upk files store code as UnrealScript bytecode, which is similar to machine language (although at a glance the two don't look that different). So instead of editing readable text, we edit the upk files through their hex representation.

Hexadecimal format
Hexadecimal (hex for short) is a numbering system where a single digit can take 16 values, counting from 0 to 9 and then from A to F, so a single digit can express a value from 0 (0) to 15 (F). As with decimal counting, when we want to express a value greater than a single digit allows, we use more digits: the number 16 is written as 10 in hex, 17 as 11, and so on. Two hex digits make up one byte (00 to FF, or 0 to 255). See Hexadecimal on Wikipedia for more info.
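If you'd rather not reach for the calculator every time, a couple of lines of Python do the same conversion. This is only a sketch: `to_hex_byte` and `from_hex` are hypothetical helper names, and `to_hex_byte` only handles values that fit in one byte (0 to 255), which is what the short constants in this tutorial use.

```python
def to_hex_byte(value: int) -> str:
    """Format an integer from 0 to 255 as a two-digit uppercase hex byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value does not fit in a single byte")
    return format(value, "02X")


def from_hex(text: str) -> int:
    """Parse a hex string such as '10' or 'FF' back into a decimal integer."""
    return int(text, 16)


print(to_hex_byte(15))  # prints 0F
print(to_hex_byte(16))  # prints 10
print(to_hex_byte(17))  # prints 11
print(from_hex("10"))   # prints 16
```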

Why hex editing
Due to technical limitations of XCom modding, it is currently not possible to write changes in the game's native language (UnrealScript) and recompile them, because we can't actually change the size of the files. All we can do is change existing bytes of information.

Programs & Tools

UE Explorer

Description: An Unreal Engine decompiler. For more information on decompilers in general, see this Decompiler Wikipedia article. This program lets you see the code almost as its creators wrote it, and it provides key information you will need to change upk files, such as the hex representation of each function.
Installation: Install it anywhere you want.

UPK Decompressor

Description: We need it to de-compress upk files so we can edit them.
Installation: Install it anywhere you want. You'll have to move the upk files you want to decompress to its installation folder, but only once.

XSHAPE

Description: We need it to update XComGame.exe so it runs with modified upk files.
Installation: Extract the files to your game root folder. XSHAPE will update the executable's checksum value for any modified upk files. You'll need to run XSHAPE.bat after making any change to upk files. Failure to do so will cause the game to crash immediately upon launch.

HxD HEX Editor

Description: A general purpose hex editor. This is the program used to actually change the upk files.
Installation: Install it anywhere you want.

Notepad++

Description: Light-weight text editor with good search functionality and other cool stuff that is pretty useful when hex editing.
Installation: Install it anywhere you want.

UPK Extractor

Description: This program lets you extract the files from uncompressed upk files.
Installation: Install it anywhere you want. You'll have to move the uncompressed upk files to its installation folder in order to extract the files, but only once.

WinMerge

Description: WinMerge compares two text files and highlights their differences.
Installation: Install it anywhere you want.

Hex editing I: changing single values

When first attempting to hex edit a upk file, it is strongly recommended to start by replacing a single value with another value of the same type (i.e. replacing one number with another). This chapter tries to cover every step necessary to achieve this; more advanced changes to the upk files are discussed in later chapters.

A word of advice

Hex editing your upk files is an easy way to screw up your game, so here are a few things you should know before jumping into it:

  • Making an incorrect change can cause the game to crash on start-up, and it won't show any error message that could tell what is wrong.
  • Making an incorrect change can also cause the game to crash when the changed code is executed. So...
  • Make back-up files before any change.
  • Document every change made so if one change is proven wrong it can be easily reverted.

Failing to do this, and continuing to blindly change hexes without direction, will eventually bring the game to a point of no return, where the crash-causing error can no longer be located and the only fix is to re-install the game, losing all the changes made so far. There's no need to get that far; just stay organized and save yourself a few headaches.


Preparing the ground

Uncompress upk files
Most mods, if not all, edit these two upk files: XComGame.upk and XComStrategyGame.upk. So the first recommendation is to decompress both of them using UPK Decompressor, then make a back-up of both uncompressed files and store them in a safe place (a backups folder is always useful).
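Backups are easy to forget, so it can help to script them. The sketch below assumes you keep your working copies of the uncompressed files in the current folder and want timestamped copies in a `backups` subfolder; `backup` is a hypothetical helper, and the file names are just the two packages mentioned above.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup(upk_path: str, backup_dir: str = "backups") -> Path:
    """Copy a upk file into backup_dir, appending a timestamp to its name."""
    src = Path(upk_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, dest)  # copy2 also keeps the original modification time
    return dest


if __name__ == "__main__":
    # Hypothetical location: run this from the folder holding your uncompressed copies.
    for name in ("XComGame.upk", "XComStrategyGame.upk"):
        if Path(name).exists():
            print("backed up to", backup(name))
```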

Export upk files
Exporting allows you to search through all of the upk's content using Notepad++. It is an incredibly powerful tool, so it is worth using. To export the upk files, first open them in UE Explorer; then, for each file, go to the Tools menu >> Exporting >> Export Classes. (Note: there is a second "Tools" menu below the standard Windows interface.)
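Notepad++'s Find in Files does this search interactively; the same kind of search can also be scripted. This sketch assumes the exported classes end up as `.uc` text files somewhere under one folder (the `Exported` folder name is hypothetical, so point `root` and `pattern` at wherever your export actually went); `search_exported` is a hypothetical helper.

```python
from pathlib import Path


def search_exported(root: str, needle: str, pattern: str = "*.uc"):
    """Yield (file, line number, line) for every line containing needle, ignoring case."""
    needle_lower = needle.lower()
    for path in sorted(Path(root).rglob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle_lower in line.lower():
                    yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    root = Path("Exported")  # hypothetical export folder
    if root.is_dir():
        for path, lineno, line in search_exported(str(root), "GetItemCost"):
            print(f"{path}:{lineno}: {line}")
```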

Browsing upk files with UE Explorer

Browsing files
When attempting to make a change to the game we need to know which function we want to edit. More specifically, we need to know the package or file (most likely XComGame.upk or XComStrategyGame.upk), and then the class that contains the function we are after. Just knowing the file won't help us much unless we are given a unique hex string (see below). Once we know what we want to change, open the file with UE Explorer and select the "Objects" tab on the left (see picture below, step 1); we'll see an unsorted list of the classes within that file. Type anything in the search field below it to filter the results (2); typing something and then erasing it sorts the list alphabetically. Once you locate the desired class, click on the "+" icon to expand the class and then its functions (3). Clicking on any function displays its code in the right-hand pane (4).

However, code can't be changed in UE Explorer: it is a decompiler, not a compiler. We need to get the function's hex representation and edit the upk file with the hex editor. Not only do we need to know the hex representation of the numbers we want to change (which could be deduced without looking at the code), but we need to get them "in context". This way, when we actually edit the upk file, we can be certain that what we are changing is what we want to change and nothing else.

Getting function hex code
To get the function in hex, right-click on the function name in UE Explorer (4) and select "View Buffer" (5), not "Table Buffer". This opens a new window with the hex for the whole function. We'll cover later how to make use of it.

[Figure: UE Explorer, showing steps (1) to (5) described above]

Manipulating hex code

Once we've located the function we want to change and have its hex code, we'll want that hex code somewhere we can edit it. While in "View Buffer" in UE Explorer, click Edit, then Dump Bytes. This copies the hex buffer to the clipboard. Then open Notepad++ and paste it into a text document. Each byte is separated by "-", so to make it more readable, select the whole text (Ctrl+Home, then Shift+Ctrl+End) and do a search and replace (Ctrl+H opens the search-and-replace window): in "Find what" type a hyphen, in "Replace with" type a blank space, and select "Replace All". It may also be useful to break the code into lines of 16 bytes to match UE Explorer's buffer view, but beware that searches may not work as expected across the line breaks.
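The same clean-up can be done with a small script, which is handy when you dump many functions. A minimal sketch: `clean_dump` and `to_search_string` are hypothetical helpers that turn the hyphen-separated text from Dump Bytes into space-separated bytes, either in 16-byte lines for reading or on a single line for searching.

```python
def to_search_string(dump: str) -> str:
    """Collapse an 'AA-BB-CC' dump into one line of space-separated bytes for searching."""
    return " ".join(dump.replace("-", " ").split())


def clean_dump(dump: str, per_line: int = 16) -> str:
    """Same clean-up, but broken into lines of per_line bytes, like UE Explorer's buffer view."""
    tokens = dump.replace("-", " ").split()
    lines = [" ".join(tokens[i:i + per_line]) for i in range(0, len(tokens), per_line)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_search_string("90-2C-03-92-2C-04-2C-05-16-16"))
    # prints: 90 2C 03 92 2C 04 2C 05 16 16
```

Keeping both forms side by side avoids the line-break problem: read the 16-byte version, search with the single-line one.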

Locating a hex value within a function

Now comes the fun part. There are three possible approaches to finding a specific value inside a function's hex code:

  1. Breaking down the entire function so we know exactly what is what. This isn't recommended if our aim is just to change a single value, and it requires a deeper knowledge of hex editing, but it's definitely possible.
  2. Direct search: we need to know the hex representation of the value we're looking for. If it's a number, we can use the Windows calculator in programmer mode to get it in hex, and then search for that number preceded by "2C" or "24". So if we are looking for the number 17, we'll search the hex for "2C 11" or "24 11" (if these numbers don't make sense to you, check again how hexadecimal works). If we are lucky, we'll find only one occurrence, or so few that, judging from the order in which they appear and how far each is from the beginning, we'll be able to tell which one to change. But beware when searching for such short hex strings: there may be other matches that mean something different in another context; for example, a "2C 11" preceded by "07" or "06" may mean something completely different. When the search returns many results and it's not easy to tell which one is correct, we'll have to use the next method:
  3. Looking at the View Buffer table:
    [Figure: UE Explorer Hex Viewer]
    That's right! Sometimes it's the only way to get a proper hex string. Remember, we open it by right-clicking on the function name and selecting "View Buffer". There we have two ways to get extra info about the bytes:
    1. Hovering the mouse over the hex elements: the first three lines can be ignored for now, as they represent the function header (covered in later chapters). Only underlined hex values display a pop-up hint when the mouse hovers over them. Every few bytes, a label gives the decompiled description of the element under the mouse. This tells you where each element starts, and from that you can work out how many bytes each element takes. You'll find that some values take more space than others (covered later), but for now all that matters is locating the value "in its context".
    2. Selecting a byte: still in UE Explorer's View Buffer, clicking on a hex element makes the columns on the left show what it could mean depending on the data type it represents, which depends on context, as we will learn.
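For the direct search, listing every candidate with its position helps judge which match is the right one. The sketch below (hypothetical `find_constant` helper) scans a cleaned-up dump for "2C xx" or "24 xx" pairs. The offsets it reports are byte positions within the pasted dump, not UE Explorer's virtual positions, and, as warned above, some hits may just be bytes that happen to look like a constant.

```python
def find_constant(hex_code: str, value: int) -> list:
    """Return (byte offset, matched text) for every '2C xx' or '24 xx' pair where xx == value."""
    data = bytes.fromhex(hex_code)  # bytes.fromhex ignores the spaces between bytes
    hits = []
    for i in range(len(data) - 1):
        if data[i] in (0x2C, 0x24) and data[i + 1] == value:
            hits.append((i, f"{data[i]:02X} {data[i + 1]:02X}"))
    return hits


if __name__ == "__main__":
    # Example dump: the 3 * (4 + 5) expression used later in this article.
    print(find_constant("90 2C 03 92 2C 04 2C 05 16 16", 4))
```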

At this point there aren't many more useful tips to give. In the beginning it may be convenient to trace the entire function as a way to get used to how things look in hex. You'll see that it's not the same text translated character by character into a numeric code; it follows its own rules. Still, it shouldn't be too difficult to follow the function in hex in the View Buffer window while also having the decompiled function open in Notepad++ or in UE Explorer's main view. Once we are certain we've found the value we want to change, copy a longer hex string to search for in Notepad++, or break the code in Notepad++ into lines of 16 bytes so we can just count lines.

Whatever method we use, our final aim is a hex string long enough to be unique within the whole upk file. A good way to achieve this is to take a single string running from the beginning of the function (header included) up to the value we want to change, so that when we search for it in the hex editor, the last highlighted byte is the one to replace.

Editing upk files

At this point it is assumed we already have a unique hex string that lets us locate the value we want to change inside the upk file with no chance of hitting the wrong bytes. To edit the upk file, open it in a hex editor such as HxD and search (Ctrl+F) for that unique hex string. Just remember to select "Hex string" instead of "Text". If everything has been done as described here, we should now have found the value we were after. As a precaution, press the right arrow so the cursor moves just past the found code, and perform the same search again. A message saying it couldn't be found confirms that the hex string is indeed unique (this check isn't necessary if the hex string includes the function header, since headers are unique, but a little extra caution doesn't hurt). Getting back to the value to change is then just a key combo (Ctrl+Home, then Ctrl+F, then Enter).
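The uniqueness check can also be done outside the hex editor. This is a sketch that assumes the upk file fits comfortably in memory, which should be the case for these packages; `find_all` and `check_unique` are hypothetical helpers, and they count overlapping matches, which is the stricter choice.

```python
def find_all(data: bytes, pattern: bytes) -> list:
    """Return every offset where pattern occurs in data, including overlapping matches."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets


def check_unique(upk_path: str, hex_string: str) -> int:
    """Return the only offset of hex_string in the file; raise if it is missing or repeated."""
    with open(upk_path, "rb") as fh:
        data = fh.read()
    offsets = find_all(data, bytes.fromhex(hex_string))
    if len(offsets) != 1:
        raise ValueError(f"expected exactly one match, found {len(offsets)}")
    return offsets[0]
```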

Remember to make a backup before changing the file!

To actually edit a byte, highlight it (by selecting it with the mouse or by searching for it) and simply type over it, or press Ctrl+B to paste over it whatever was copied to the clipboard (useful for replacing long strings), or right-click on the selection and choose "Fill selection", which opens a window where you can enter the replacement code. You'll have to close UE Explorer before the hex editor can save the file.
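For edits you repeat often, the same find-and-overwrite can be scripted. This sketch (hypothetical `patch_file` helper) only writes when the search string occurs exactly once and the replacement has exactly the same length, since the file size must not change. Make the backup first, and remember you still need to run XSHAPE.bat afterwards.

```python
def patch_file(upk_path: str, search_hex: str, replace_hex: str) -> int:
    """Overwrite the single occurrence of search_hex with replace_hex; return its offset."""
    search = bytes.fromhex(search_hex)
    replace = bytes.fromhex(replace_hex)
    if len(search) != len(replace):
        raise ValueError("replacement must be exactly as long as the search string")
    with open(upk_path, "rb") as fh:
        data = fh.read()
    offset = data.find(search)
    # Refuse to patch if the string is missing or appears a second time.
    if offset == -1 or data.find(search, offset + 1) != -1:
        raise ValueError("search string is missing or not unique")
    with open(upk_path, "r+b") as fh:  # r+b overwrites in place without truncating
        fh.seek(offset)
        fh.write(replace)
    return offset
```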

Check the changes

Time now to check the changes we've made back in UE Explorer. It should be easy to tell whether the change had the desired effect. If everything is OK, the last step is running XSHAPE.bat, which recomputes the checksum for the upk and writes the new value into the exe file; this lets the game run the newly modified upks. If the game doesn't crash on start-up you may be 50% sure your edit will work :p Seriously though, when we're sure of the changes we've made and UE Explorer shows the code modified as expected, it's almost certain the edit will work, unless we're changing more than a single value and getting into re-writing pieces of functions. That's coming next.

Hex editing II: re-writing functions

If you've succeeded at changing several hex values, are confident doing so, and want to go one step further and change bits of functions or rewrite them entirely, here you'll find the information you need.

Please bear in mind this section is still in an early stage of development, and in time it may grow in content and accuracy.

UnrealScript

UnrealScript is the Unreal Engine's scripting language. It is a high-level, object-oriented, strongly-typed, event-driven programming language very similar to Java and C++. It uses single class inheritance, it does not have object wrappers for primitive types, and it supports operator overloading but not method overloading, except for optional parameters.

Programming basic concepts

Please refer to these Wikipedia articles to get started:

Data Types

  1. Primitive types
    1. Int
    2. Byte
    3. Bool
    4. Float
    5. String
    6. Name
    7. Enum
  2. Reference types
    1. Object
    2. Actor
    3. Interface
    4. Class
    5. Delegate
    6. Pointer
  3. Composite types
    1. Struct
    2. Static array
    3. Dynamic array
    4. Multi-dimensional arrays
    5. Map

Refer to this document for a description of each of these elements.

Hex values

Each "element" in a function's code (variables, operators, literals and other semantic or syntactic elements, such as statement-ending tokens) has a hex representation. Some of them, as we'll learn, are represented by a single byte, while others need additional bytes after them. Each element type, though, always takes the same number of bytes (it always has the same length), and the additional bytes always follow the first one, so it is the first byte of an element that identifies it. The remaining bytes, if any, represent either some sort of index (most notably for functions and variables) or an offset or absolute position in the code, based on the code's Virtual Size (covered later).

Here's a list of hex values.

According to this data, some elements have several possible hex values. So far no issues have been found with using any of them.
Most, if not all, of the tokens that take several bytes are explained in detail in this document; in any case, here's a reference for the hex size of the most common elements. Remember that you can check a token's size yourself in UE Explorer by hovering the mouse over the hex code in the View Buffer window, as we saw before.

Tokens that take more than one byte:
Tokens whose remaining bytes represent an index:

LocalVariable = 0x00 (+4 bytes)
InstanceVariable = 0x01 (+?? bytes)
DefaultVariable = 0x02 (+?? bytes)
etc

Tokens whose remaining bytes represent an offset or position in the function's virtual size:

Switch = 0x05 (+?? bytes)
Jump = 0x06 (+2 bytes)
JumpIfNot = 0x07 (+2 bytes)
Case = 0x0A (+2 bytes)
etc

Sentences & Operators

Not every element in a function necessarily appears in the hex code exactly as written, nor in the same order or position as in the decompiled function. Sentences (statements), however, do come one after another, in order.

Prefix'd operators

Let's now look at how the different elements are arranged in a sentence, starting with a simple sum operator as an example. To sum 4 and 5 we'd normally write 4 + 5, using what in mathematical terms is called "infix" notation, since the operator (+) sits in the middle. Once compiled, however, the game treats operators differently and places them at the beginning of the expression, using what is called "prefix" notation, so the same operation is represented as + 4 5. It is important to understand this because the elements in the hex code appear in this order. If you like, think of it as the game treating operators like functions that take exactly 2 parameters: in hex there isn't a "sum operation" but a "sum function" that sums the next two elements.
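To see the reordering concretely, here is a small Python sketch that turns an expression written as nested `(operator, left, right)` tuples into prefix order. The tuple representation and the `to_prefix` name are assumptions for illustration only; the game never sees anything like this, only the bytes.

```python
def to_prefix(expr) -> list:
    """Flatten an int or a nested (op, left, right) tuple into prefix order: operator first."""
    if isinstance(expr, int):
        return [expr]
    op, left, right = expr
    return [op] + to_prefix(left) + to_prefix(right)


print(to_prefix(("+", 4, 5)))              # prints ['+', 4, 5]
print(to_prefix(("*", 3, ("+", 4, 5))))    # prints ['*', 3, '+', 4, 5]
```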

Multiple operators

The fact that (most) operators operate on two elements or values doesn't mean we cannot sum three numbers; we just need two sum operators to do so. In a sum, order doesn't matter, as you well know, but in other cases order is vitally important. If you're already familiar with programming, the trick of treating operators like two-parameter functions makes it easy to tell in which order elements and operators must be written under this "prefix" rule. Otherwise, it may help to recall that every mathematical operator needs two values, and one or both of those values can in turn be the result of other operations. When the value of an element used in an operation depends on the result of another operation, you can think of the operations as stacking LIFO (Last In, First Out) in the order they appear, each new one on top of the pending ones. Once the operation on top of the stack has its two elements, it executes; it is then removed from the stack, and the returned value is passed to the next pending operation, which is now on top of our imaginary stack, and so on until all operations are resolved. We'll expand on this a little later, but first we need to talk about another very common element before we can see some real examples.
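The imaginary stack described above can be written out directly. This sketch (hypothetical `eval_prefix` helper, handling only `+`, `-` and `*` on plain numbers) pushes each operator as a pending operation, hands each value to the operation on top, and resolves operations as soon as they have both values.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def eval_prefix(tokens):
    """Evaluate a prefix token list using a LIFO stack of pending operations."""
    pending = []  # each entry: [operator symbol, operands collected so far]
    result = None
    for tok in tokens:
        if tok in OPS:
            pending.append([tok, []])  # a new operation goes on top of the stack
            continue
        value = tok
        # Give the value to the operation on top; resolve every operation that becomes complete.
        while pending:
            pending[-1][1].append(value)
            op, args = pending[-1]
            if len(args) < 2:
                break
            pending.pop()
            value = OPS[op](args[0], args[1])  # its result feeds the next pending operation
        else:
            result = value  # stack emptied: this is the final result
    if pending:
        raise ValueError("expression ended with unfinished operations")
    return result


print(eval_prefix(["*", 3, "+", 4, 5]))  # prints 27
```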

EndFunctionParms

The EndFunctionParms token (0x16) indicates that all the required arguments have been passed to a function or operator, so it can now execute and return whatever result it produces. This holds both for class functions and for operators such as mathematical or logical operators.

A rule of thumb to know whether an element requires an EndFunctionParms token is to use the trick of treating every element as a function: if it takes 2 "parameters", it needs an EndFunctionParms token after them; if it only takes 1 "parameter", it doesn't use one. For this exercise, it may help to think of the 0x2C IntConstByte token as a function that takes 1 parameter and returns an integer value; since it only takes 1 parameter, it doesn't use the 0x16 EndFunctionParms token, unlike a sum. The same is true for the 0x07 JumpIfNot token (we'll see this later), etc. There is one exception to this rule, though: the Rand (0xA7) native function token, which takes 1 parameter and still requires the EndFunctionParms token 0x16. The reliability of this method still needs confirmation.

Hex code for an expression with multiple operators

Time now to see some hex code. Let's consider our last example, 4 + 5, which in "prefix" form becomes + 4 5. Checking the hex values list, we see that we can express numbers with the ByteConst (0x24) or IntConstByte (0x2C) tokens (it isn't truly that easy to guess just by looking at the list, but experience has shown those are the tokens to use). We can also see that the sum operator is expressed as 0x92. So the operation in hex looks like: 92 2C 04 2C 05 16.
92 -----> Sum operator. The next two elements (not bytes!) are the numbers to sum
2C -----> Indicates that the next byte is an integer number
04 -----> Number 4
2C 05 -> Number 5
16 -----> End of sum, so the result is calculated

Now let's see how we'd write the following expression in hex: 3 * (4 + 5). First we should identify the operations that take place and the elements involved in each one. Here there is a sum involving two elements, the numbers 4 and 5, and a multiplication involving two elements, the number 3 and the result of the sum. Knowing this, knowing that operations stack LIFO, and knowing that the hex code for multiplication is 0x90, we have all we need to write it in hex. But first let's see how we'd write it in decimal numbers following this "prefix" syntax. It could be written this way:
* 3 + 4 5
Here we first tell the game we want to perform a multiplication, so the game expects the next two elements to be the values to multiply. We give it a 3 as the first element of the multiplication, but where the game expects the second value we give it a sum operation instead. The game understands that the second value of the multiplication will have to wait until the new operation (the sum) is finished, so it patiently waits. Next, as usual when a sum operation starts, the game expects the next two elements to be the numbers to sum. This time we give it two numbers (no more chained operations), so the game performs the sum, and when it gets the result (9) it passes it to the pending operation. The multiplication finally gets the second value it was waiting for and can now complete, multiplying 3 by 9.

We could also write it as * + 4 5 3, which corresponds to (4 + 5) * 3. It is exactly the same operation, though quite a bit less clear in hex: we've just swapped the order of the elements in the multiplication, so the first element is the sum of 4 and 5 and the second element is the number 3.

Now let's take the previous, clearer version and write it in hex. It would be:

Infix: 3 * (4 + 5)
Prefix: * 3 + 4 5
Hex: 90 2C 03 92 2C 04 2C 05 16 16
90 -----> Multiplication token
2C 03 -> Number 3
92 -----> Sum token
2C 04 -> Number 4
2C 05 -> Number 5
16 -----> End sum (4 + 5)
16 -----> End multiplication 3 * 9
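Going the other way, the rules seen so far (operator token first, then its two elements, then 0x16; numbers as 0x2C plus one byte) are enough to generate this hex mechanically. This is a sketch assuming only the two operators used in these examples (0x92 sum, 0x90 multiplication) and values from 0 to 255; `encode` is a hypothetical helper, not part of any modding tool.

```python
INT_CONST_BYTE = 0x2C
END_FUNCTION_PARMS = 0x16
OPERATOR_TOKENS = {"+": 0x92, "*": 0x90}


def encode(expr) -> str:
    """Encode an int or nested (op, left, right) tuple as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in _encode(expr))


def _encode(expr) -> list:
    if isinstance(expr, int):
        if not 0 <= expr <= 0xFF:
            raise ValueError("IntConstByte only holds values from 0 to 255")
        return [INT_CONST_BYTE, expr]
    op, left, right = expr
    # Prefix order: operator token, first element, second element, then EndFunctionParms.
    return [OPERATOR_TOKENS[op]] + _encode(left) + _encode(right) + [END_FUNCTION_PARMS]


print(encode(("+", 4, 5)))               # prints 92 2C 04 2C 05 16
print(encode(("*", 3, ("+", 4, 5))))     # prints 90 2C 03 92 2C 04 2C 05 16 16
```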

Here is the other example again:
Infix: (4 + 5) * 3
Prefix: * + 4 5 3
Hex: 90 92 2C 04 2C 05 16 2C 03 16
90 -----> Multiplication token
92 -----> Sum token
2C 04 -> Number 4
2C 05 -> Number 5
16 -----> End sum (4 + 5)
2C 03 -> Number 3
16 -----> End multiplication 9 * 3
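To double-check a hand-written expression, a toy decoder can walk the bytes in the same order. This is a sketch for the tiny subset of tokens covered here (0x24/0x2C constants, 0x90/0x92 operators, 0x16), not a general UnrealScript bytecode parser; `decode` is a hypothetical name.

```python
def decode(hex_code: str) -> int:
    """Parse and evaluate a hex expression built from 24/2C constants, 90/92 operators and 16."""
    data = bytes.fromhex(hex_code)
    value, pos = _parse(data, 0)
    if pos != len(data):
        raise ValueError(f"unparsed bytes left at offset {pos}")
    return value


def _parse(data: bytes, pos: int):
    token = data[pos]
    if token in (0x2C, 0x24):  # IntConstByte / ByteConst: one value byte follows
        return data[pos + 1], pos + 2
    if token in (0x90, 0x92):  # multiplication / sum: two elements, then EndFunctionParms
        left, pos = _parse(data, pos + 1)
        right, pos = _parse(data, pos)
        if data[pos] != 0x16:
            raise ValueError(f"expected EndFunctionParms (16) at offset {pos}")
        result = left * right if token == 0x90 else left + right
        return result, pos + 1
    raise ValueError(f"unknown token {token:02X} at offset {pos}")


print(decode("90 2C 03 92 2C 04 2C 05 16 16"))  # prints 27
print(decode("90 92 2C 04 2C 05 16 2C 03 16"))  # prints 27
```

A missing or misplaced 16 is reported with its offset, which mirrors the kind of mistake that makes the real game crash.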

Control Structures

Jump / Goto

Conditional statements

If statement / JumpIfNot
Else statement
Switch case statement

For each loop

Data Structures

Arrays

Index accessing
Dynamic arrays

Struct

Objects

Object variables (member token)