Writing "hello world" in x86 machine code for Linux with Elfhex

Wednesday 10 June 2020

In a previous post, I described my Elfhex project, a simple machine code “assembler” which takes source files containing machine code, represented using hex bytes, and produces executable ELF binaries. While it may seem that it would be quite tedious to produce a working program using this tool, simple executables are not too hard to achieve. In this post, we will use Elfhex to write and “assemble” a simple “hello world” program in x86 machine code, targeting the Linux operating system.

To start, we can attempt to create the simplest binary that runs properly. This program only needs to exit. However, if we just had a blank program, then the bytes “after” the end of the file (when it is loaded into memory) would be executed, probably leading to a segfault. Therefore, we need at least the instructions necessary to terminate the process.

First, in our Elfhex source file, we need the program declaration. This indicates what machine type (architecture) and endianness should be placed in the ELF header, and what the default alignment should be for our segments. In this case, we want to target x86 (machine type 3), with little endianness (represented with <; big endianness would be >) and an alignment of 4096 (to avoid pages overlapping). Thus, our file (which we can save with a “.eh” extension, e.g. “program.eh”) will start with:

program 3 < 4096

Now, we can write our desired x86 machine code into the source file. In Linux, the most basic way to interact with the operating system from our program is by using system calls. This is much simpler than trying to use the higher-level libraries available on Linux, as we would need to link them first, requiring many more bytes. To invoke a system call on 32-bit x86, we can issue a software interrupt with 0x80 as the interrupt number. The opcode to perform an interrupt is 0xcd, so our machine code for this will be cd 80.

When performing a Linux system call, the arguments to the call are placed in registers. The syscall number always goes in EAX, while the remaining registers hold the arguments to the specific syscall, if any. The exit syscall, which we want to invoke, has the number 1, and its only argument is the exit status of the program. Let's say we want to return 0. Therefore, we need to place 1 in EAX, and 0 in EBX. An easy way to do this is to zero each register with a self-xor (e.g., xor eax, eax), and then follow up with an inc for EAX (using mov would be simple, but takes more bytes, and so the xor pattern is very common). For this, we can use opcode 0x33 for xor, and 0x40 for inc.

For its operands, the 0x33 opcode uses the so-called modR/M byte (or bytes), which represents two operands: a register, and either a register or memory address (various operations can also be performed on the address). The opcode used determines the operand order (e.g., for 0x33, the first and destination operand is the register, and the second operand is the memory—if we use 0x31, then this would be swapped). This means that unlike many other architectures, instructions in x86 can be variable length, and can also operate directly on memory locations (without the need to load everything into registers first). The modR/M byte is split up into three parts: the top (left-most) two bits (the mod) represent the type of the second (M) operand, the middle three bits (R) represent the register operand, and the remaining three represent the other operand (M), its meaning determined by the top two bits. If more information is needed (e.g., if a memory location is specified for M), then it follows in subsequent bytes.

In our case, we just want registers for both operands. The mod will therefore be 0b11, which indicates that the “M” operand will be a register. Since we want EAX for both operands, which is represented as 0b000, we set the middle and bottom three bits both to 0b000. Therefore, our overall modR/M byte is 0b11000000, and no additional bytes are needed. This would be 0xc0 in hex, but Elfhex allows the expression of binary literals, and so we can write this byte as =11000000b (the final character representing the base). This needs to be repeated for EBX, which is 0b011 (the register order is EAX, ECX, EDX, EBX, etc.). Therefore, for our second xor invocation, the argument will be =11011011b.
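The bit packing described above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name modrm and the REG table are made up for this post and are not part of Elfhex.

```python
# Register numbers as used in the modR/M byte (first four only).
REG = {"eax": 0b000, "ecx": 0b001, "edx": 0b010, "ebx": 0b011}

def modrm(mod, reg, rm):
    """Pack the mod (2 bits), R (3 bits) and M (3 bits) fields into one byte."""
    return (mod << 6) | (reg << 3) | rm

# mod = 0b11 means the M operand is a register rather than a memory address.
xor_eax_eax = modrm(0b11, REG["eax"], REG["eax"])
xor_ebx_ebx = modrm(0b11, REG["ebx"], REG["ebx"])

print(f"{xor_eax_eax:08b} {xor_eax_eax:#04x}")  # 11000000 0xc0
print(f"{xor_ebx_ebx:08b} {xor_ebx_ebx:#04x}")  # 11011011 0xdb
```

The two printed bit patterns are the same ones written in the Elfhex source as binary literals.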

0x40, our chosen inc opcode, uses a different argument scheme—the bottom three bits of the opcode itself indicate the register it is supposed to operate on. Since we want to increment EAX, the final opcode is 0x40, but if we wanted to increment EBX, for example, then it would be 0x43 (considering the order above).
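As a small sketch of this register-in-opcode scheme (the function name inc_opcode is hypothetical, chosen just for illustration), the opcode can be computed by combining the base value with the register number:

```python
# Register numbers in the order given above.
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}

def inc_opcode(reg):
    """One-byte inc: base opcode 0x40 with the register in the bottom three bits."""
    return 0x40 | REG[reg]

print(hex(inc_opcode("eax")))  # 0x40
print(hex(inc_opcode("ebx")))  # 0x43
```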

Thus, we have our complete program: first, we will set the values of EAX and EBX, then we will invoke the system call interrupt.

33 =11000000b 40 33 =11011011b cd 80
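For reference, here is the same sequence written out one instruction at a time in Python, with the usual assembly mnemonics alongside. The mnemonics are annotations only; nothing here is Elfhex syntax.

```python
# The exit sequence, one instruction per entry (mnemonics are for reference only).
exit_sequence = [
    (bytes([0x33, 0b11000000]), "xor eax, eax"),  # EAX = 0
    (bytes([0x40]),             "inc eax"),       # EAX = 1 (the exit syscall)
    (bytes([0x33, 0b11011011]), "xor ebx, ebx"),  # EBX = 0 (the exit status)
    (bytes([0xcd, 0x80]),       "int 0x80"),      # invoke the system call
]

blob = b"".join(encoded for encoded, _ in exit_sequence)
print(" ".join(f"{byte:02x}" for byte in blob))  # 33 c0 40 33 db cd 80
print(len(blob))                                 # 7
```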

In order for us to place this code in our program, however, we need to create a segment. On Linux, programs are loaded into memory in these “segments”, and executable ELF files must contain a “segment table” (called the program header table in the ELF specification) which indicates the location and size of each segment in the file, and where they should be loaded into memory. Elfhex can decide on the placement of segments in memory by itself (though there are some flags to control aspects of it), and so we just need to group our bytes into segments. Therefore, our complete program will look like:

program 3 < 4096

segment text(flags: rx) {
    [_start]
    33 =11000000b
    40
    33 =11011011b
    cd 80
}

This tells the assembler to put all our bytes in a segment with both “read” and “execute” permissions. We have called it “text” here, but the name is not transferred to the output binary, and is only used when referencing labels in the segment (sections, which are a subdivision of segments and are defined in their own table in the ELF metadata, can have names, but sections are not necessary in an executable ELF file, and so Elfhex omits them altogether). We have also used the [_start] label to tell the assembler what to use for the “entry point” in the ELF header. This memory location is what is placed into the EIP (instruction pointer) register when the program is loaded, so the label should be placed immediately before the first instruction.

Our program should now be complete! After writing it to a file (conventionally, as mentioned, with the .eh extension), we can assemble it with Elfhex. The process should look something like this (don’t forget to make the output file executable):

$ elfhex program.eh program
Assembled. Total size: 91 bytes.
$ ./program; echo $?
0

Success! We have created a working program using Elfhex ($? contains the return code of the previous command). Inspecting the hex dump of this file, we can see the bytes from the source file in the output, along with the ELF header. As can be seen, most of the file is taken up by the header, with the actual program only right at the end.

$ xxd program
00000000: 7f45 4c46 0101 0100 0000 0000 0000 0000 .ELF............
00000010: 0200 0300 0100 0000 5480 0408 3400 0000 ........T...4...
00000020: 0000 0000 0000 0000 3400 2000 0100 0000 ........4. .....
00000030: 0000 0000 0100 0000 0000 0000 0080 0408 ................
00000040: 0080 0408 5b00 0000 5b00 0000 0500 0000 ....[...[.......
00000050: 0010 0000 33c0 4033 dbcd 80 ....3.@3...
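To read the individual header fields out of a dump like this, Python's struct module is enough. The sketch below hardcodes the 91 bytes shown by xxd and unpacks them using the 32-bit ELF header and program header layouts from the ELF specification; the variable names follow the specification's field names, not anything in Elfhex.

```python
import struct

# The 91 bytes from the xxd output above.
data = bytes.fromhex(
    "7f454c46010101000000000000000000"
    "02000300010000005480040834000000"
    "00000000000000003400200001000000"
    "00000000010000000000000000800408"
    "008004085b0000005b00000005000000"
    "0010000033c04033dbcd80"
)

# ELF header: 16-byte ident, then little-endian halfwords and words (52 bytes).
(ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
 e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
 ) = struct.unpack_from("<16sHHIIIIIHHHHHH", data, 0)

# Program header ("segment table" entry): eight little-endian words (32 bytes).
(p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align
 ) = struct.unpack_from("<8I", data, e_phoff)

# The entry point is a memory address; convert it back to a file offset.
code = data[e_entry - p_vaddr + p_offset:]

print(f"machine={e_machine} entry={e_entry:#x} segments={e_phnum}")
print(f"vaddr={p_vaddr:#x} filesz={p_filesz} flags={p_flags} align={p_align}")
print(code.hex())
```

The machine type (3) and alignment (4096) match the program declaration, the flags value 5 is “read” plus “execute”, and the bytes at the entry point are exactly the seven bytes from the source file.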

As an exercise, you can try making it return different values.

We have almost made it to the creation of our “hello world” program! Now we just need to print out the actual string. To do this we can leverage the write system call (0x04). This takes the file descriptor, string pointer, and length as arguments. In this case, we want to write to standard output (file descriptor 1). First, we need to place the string in memory. Since the string is just bytes as well, we can use the string literal syntax:

[hello] "hello, world" 0a

We have to write the newline as a literal byte, since the string syntax cannot span multiple lines. We also need to place a label that points to the start of the string. References to this will resolve to the string’s location in memory.

Now we just need to invoke the system call. To do this, we need to move 4 into EAX, 1 into EBX, <<hello>> (an absolute reference to the [hello] label) into ECX, and 13 (the length of “hello, world\n”) into EDX. To change it up, let's just use mov for this, even though it is more costly in terms of space. In this case, we want to move 32-bit “immediates”, or literal values, into the registers, so we can use 0xb8 and its successors. As with 0x40, the bottom three bits of the opcode indicate the destination register. Therefore, to move all our desired values, we can use:

b8 =4d4 bb =1d4 b9 <<hello>> ba =13d4

In this case, =1d4 and the other literals will be replaced with the byte representation of the numbers, padded to four bytes. Since we stated our program was little-endian in the program declaration, =1d4 would be replaced with 01 00 00 00. Absolute references are always four bytes wide, so we don’t need to worry about padding there. After this, we can invoke the system call interrupt with another cd 80.
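The resulting bytes for the three numeric moves can be reproduced in Python as a rough check. The helper mov_imm32 is a hypothetical name for this sketch; it combines the 0xb8 base opcode with the register number and appends the value as four little-endian bytes, which is what the =Nd4 literals expand to here.

```python
import struct

# Register numbers in the same order as before.
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3}

def mov_imm32(reg, value):
    """mov reg, imm32: opcode 0xb8 + register, then a 4-byte little-endian value."""
    return bytes([0xb8 | REG[reg]]) + struct.pack("<I", value)

message = b"hello, world\n"

print(mov_imm32("eax", 4).hex())             # b804000000
print(mov_imm32("ebx", 1).hex())             # bb01000000
print(mov_imm32("edx", len(message)).hex())  # ba0d000000
```

The move into ECX is left out because its operand is the address of the [hello] label, which is only known once the assembler has laid out the segment.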

Putting this all together, we can create our complete program:

program 3 < 4096

segment text(flags: rx) {
    [_start]
    
    # print "hello, world\n"
    b8 =4d4
    bb =1d4
    b9 <<hello>>
    ba =13d4
    cd 80
    
    # exit
    33 =11000000b
    40
    33 =11011011b
    cd 80
    
    # data
    [hello] "hello, world" 0a
}

In this case, we have just placed the string in the same memory segment as our executable code. Compilers often place strings in their own segment, but memory is just memory, so we don’t need to worry about it too much at the moment.

Now let’s assemble it using Elfhex and run it:

$ elfhex program.eh program
Assembled. Total size: 126 bytes.
$ ./program
hello, world

Success! We have written a “hello world” program for our target architecture using Elfhex. Examine the contents of the output, and you should be able to identify the bytes from our source, including our “hello, world” string, at the end of the binary.
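The reported size can also be accounted for by hand. The sketch below rebuilds the code bytes in Python and adds up the pieces. It assumes the layout seen in the first hex dump (a 52-byte ELF header plus one 32-byte segment table entry, loaded together with the code at 0x08048000); the address computed for [hello] is an inference from that layout, not something Elfhex reported.

```python
import struct

HEADER_SIZE = 52 + 32   # ELF header + one segment table entry
BASE = 0x08048000       # assumed load address, taken from the earlier dump

def mov_imm32(reg_index, value):
    """mov reg, imm32: opcode 0xb8 + register, then a 4-byte little-endian value."""
    return bytes([0xb8 + reg_index]) + struct.pack("<I", value)

message = b"hello, world\n"
exit_code = bytes([0x33, 0xc0, 0x40, 0x33, 0xdb, 0xcd, 0x80])

# The code length is known up front: four 5-byte movs, cd 80, then the exit code.
code_len = 4 * 5 + 2 + len(exit_code)
hello_addr = BASE + HEADER_SIZE + code_len  # assumption: the header is part of the segment

code = (mov_imm32(0, 4) + mov_imm32(3, 1) + mov_imm32(1, hello_addr)
        + mov_imm32(2, len(message)) + b"\xcd\x80" + exit_code)

total = HEADER_SIZE + len(code) + len(message)
print(len(code), len(message), total)  # 29 13 126
```

That is 84 bytes of headers, 29 bytes of code, and 13 bytes of string data, matching the 126 bytes reported by Elfhex.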

More examples of Elfhex programs for x86 Linux can be found in the samples directory of the Elfhex GitHub repository.