Before I start this article, I need to define a couple of terms:

  1. Byte – This is a data type which can store 256 discrete variations. Typically its said to have a minimum value of 0 and a maximum value of 255 (thus 256 variations including the 0). All files can be considered to just be a series of bytes.
  2. Hex or Hexadecimal – Unlike a decimal or base 10 number which only allows 10 different variations per digit and must include a another digit to include numbers which exceed that range (e.g. 8, 9, 10, 11 / 98, 99, 100, 101) a hexadecimal number stores 16 variations. After 9, the first six letters of the alphabet are used (e.g. 8, 9, A, B, C, D, E, F, 10 , 11 / FE, FF, 100, 101) .

All files are divided into two categories – plain text and binary.

Plain text files as the name suggests just contains text. They cannot contain images, sound, video or any form of text styling unless they mark it up. Examples of these files include .txt, .ini, .csv, .html, .php. These files can be opened in your system’s default plain text editor such as Notepad on Windows and will load and display fine.

Plain text files load and display fine in plain text editors

Binary files however can store a much wider range of data. Your camera photos, mp3s and videos are all binary files. Rather than limit the data storage to just text, binary files can make use of a larger range of encoding which means that we can’t view these files properly in a plain text editor. This is shown in the image below, in which I’ve tried to load a bitmap image into Notepad.

By contrast, loading a binary file in a plain text editor is not a good idea

So to view and edit binary files, we clearly need a different tool. If we know the file format then we could load the file into the relevant editor, loading images into Photoshop or Paint for example. But this is no good to us if we don’t know the file format, if there isn’t a relevant editor yet or if we want to examine the internal data structure.

Hex Editors can display and edit binary data in a very helpful and effective way. They are not limited to text characters and can be used to display and edit the full range of variations in each byte. Unlike text editors which display binary data badly and don’t support changing the value of non text data, hex editors are not hindered by these problems.

XVI32 is a freeware hex editor for Windows

My personal favourite free hex editor is XVI32 which can be downloaded for free on Windows. It’s quite a lightweight and functional hex editor and while there are many hex editors out there offering a greater range of features, I like the simplicity and straightforwardness of XVI32.

In the above screen shot we can see three columns. The left and thin column is the line number displayed in hex. The number shown represents the index or offset of the first entry on that line, for example B means 11 as a decimal and if you count across the boxes in either of the other two columns you’ll see that they are also 11 boxes across. These line groupings do not exist in the actual file, this is merely just how its displayed in the editor, a bit like word wrapping text.

The middle column displays the hexadecimal view of the file, while the right column shows a ASCII or text view of the file. Each box in the hex and text views represents a byte in the file.

So why is hex useful? Why not represent the values of each byte as a decimal?

Hexadecimal numbers have the useful property that with two digits they can represent 256 discrete variations, just like a byte. So rather than use 3 digits to represent the value of each byte, we can use two digits to their full range. The minimum hex value for a byte is 0 (or 00) and the maximum hex value for a byte is FF.

To convert between decimal and hexadecimal you can use the built in calculator on Windows (or use a site like this). Depending on your version of Windows you may need to change to either scientific or programmer mode before the hex and decimal options are available. To convert a number, type it into one mode then select the other mode. For example:

In the next post, I’ll explore how to use a hex editor and look at some common data types. Before reading that post however, it would be useful to try opening a few different file types in a hex editor just to get a feel for it. It would also be very helpful to try converting a few different numbers between hex and dec.

This is the first in a series of mini articles on reverse engineering binary files, particularly those in computer games but the majority of what I will discuss has applications in all areas of computer software. While I may touch on reverse engineering executable code, memory hacking and plain text formats, the majority of the focus is on actual binary files.

Since 2009 I’ve been reverse engineering and modifying the classic Westwood RTS Dune 2000. Despite being released in 1998, the modding community was held back by a limited number of tools. For example while it was possible to edit and make new multiplayer maps for the game, it was not possible to make campaign maps and missions. A couple of guys had been exploring the mission files and had made small, but significant, discoveries. This was when I joined the modding scene and since then I’ve reverse engineered a whole range of different files in the game and released a bunch of tools based on my research.

When I started, I had no reverse engineering experience. I’d never used a hex editor and had no idea about the binary representation of data. If you’re in the same boat, then don’t worry, in this series of articles I’m going to talk about what I’ve learned in terms of both conceptual theory and also practical techniques that I employ. This series of articles is particularly aimed at computer programmers with no reverse engineering experience and who would like to get started.

As with programming, the initiative is all on you. You will apply the theory and techniques to a different set of problems and formats than I did. As such you will need to extend and modify them to fit your needs and make a few leaps on your own to successfully reverse engineer a complex format. But if you start off small and work your way up, once you understand the basic theory you’ll find reverse engineering is actually very straight forward.