We all know that computers are pretty cool, but we often underestimate just how extraordinary they truly are. Only a few decades ago, many of the tasks we perform on computers today would have been deemed impossible feats of magic.
Today, we have the ability to access the entirety of human knowledge within seconds, communicate instantaneously with people across the world, post Rust memes on Twitter (and Bluesky), control spacecraft traversing the solar system, and accomplish countless other incredible things thanks to the computers we've created.
But have you ever paused to consider the intricate inner workings of these machines? Have you ever wondered, "How do computers really work 🤔"? Perhaps you're already familiar with computers; perhaps you're even a software developer or programmer. Do you actually understand the technology that underpins the software you write?
While you don't need to comprehend every facet of computer architecture to exploit a computer's potential and tackle complex problems, understanding the underlying principles of computing is an enjoyable and enlightening experience that ultimately enhances your ability to use computers effectively and write better software.
Endless possibilities await you as you deepen your understanding of the fundamental aspects of computing! With this knowledge, you'll have the power to bring even the most complex programs to life, and who knows, you might be inspired to create your very own operating system, programming language, or Twitter clone. The only limit is your imagination, and excitement is guaranteed!
While I strive to impart as much knowledge as possible, I understand that I cannot teach you everything you need to know. You must find your own way in this vast world of computers, discovering every bit and every byte along the way. My goal is to inspire you and show you that it is not as hard as it seems. Here's a poem to inspire you:
"invent yourself and then reinvent yourself, don't swim in the same slough. invent yourself and then reinvent yourself and stay out of the clutches of mediocrity.
invent yourself and then reinvent yourself, change your tone and shape so often that they can never categorize you.
reinvigorate yourself and accept what is but only on the terms that you have invented and reinvented.
be self-taught.
and reinvent your life because you must; it is your life and its history and the present belong only to you."
– Charles Bukowski, The Pleasures of the Damned
This tutorial is targeted at programmers with zero prior knowledge of machine code and low-level programming concepts. There may be over-simplifications, and certain explanations may not be the best way to present the concepts discussed. I try my best to be as accurate as possible, while maintaining simplicity. I am not an expert, just a software hobbyist.
Let us begin by asking the question: "How do computers work?"
Yes!! Programmers write software, but they do so by using other software: operating systems, programming language compilers, interpreters, code editors, IDEs, and web browsers, all of which are software made by other programmers.
I know, right? They pop up everywhere. Say you have a computer with no software at all. No operating system, web browser, programming language, absolutely nothing but hardware. How would you get the computer to do something like show text on the screen or play a tone from the speaker? In other words, what is the most hello-worldy software that a computer can execute?
Yes!! machine code!
Sorry to interrupt, but I'm quite enthusiastic about this stuff. Modern computers are electronic devices, and as such they operate only by channeling electricity around their electronic components. Let us do a quick history lesson.
In the early days, machines were built primarily to simplify and streamline daily tasks. Mundane and repetitive tasks are a part of everyday life, and people sought a way to automate these tasks so they could focus on more significant endeavors. However, these machines were inflexible and rigid, making them unsuitable for complex tasks that require decision-making abilities.
The notion of a "programmable" machine emerged as a solution to this problem. One of the most significant contributions to this idea was made by Alan Turing, a British mathematician who lived in the early 20th century. Turing introduced the idea of a "Turing machine," a theoretical device capable of performing any calculation that a human mathematician could accomplish.
Good question. Imagine a machine with a long strip of paper that has a series of symbols written on it. The machine has a "read-write head" that can move back and forth along the strip of paper, reading the symbols and writing new ones.
The machine also has a set of rules that tell it what to do based on the symbol it's currently reading.
For example, let's say that the machine is reading a symbol that says "add." The machine would then look at the next few symbols on the strip of paper and perform an addition operation based on those symbols. Then it would write the result back onto the strip of paper and move the read-write head to the next symbol.
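If it helps to see that mechanism in code, here is a tiny sketch in Rust (the language we'll use later in this tutorial) of a made-up rule table: a machine whose only job is to add 1 to the binary number written on its tape. The rules and tape layout are my own invention, purely for illustration.
fn main() {
    // The tape holds the binary number 1011 (decimal 11).
    let mut tape = vec![1u8, 0, 1, 1];
    // The read-write head starts over the rightmost symbol.
    let mut head = tape.len() - 1;

    // The rule table: while reading a 1, write a 0 and move left (a carry);
    // on reading a 0, write a 1 and halt.
    loop {
        if tape[head] == 1 {
            tape[head] = 0;
            if head == 0 {
                tape.insert(0, 1); // the carry ran off the tape, so grow it
                break;
            }
            head -= 1;
        } else {
            tape[head] = 1;
            break;
        }
    }

    println!("{:?}", tape); // [1, 1, 0, 0] -> binary 1100 (decimal 12)
}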
It is. This idea paved the way for the development of early computers, which were essentially just large, mechanical Turing machines. These machines could be programmed to perform a wide variety of tasks, from basic arithmetic to more complex operations like sorting and searching.
The idea of a Turing machine was revolutionary at its inception, but as such machines were scaled up, they became more difficult to build, maintain, and operate. At the time, machines were mostly constructed as mechanical devices. There were no screens or keyboards, just hard metallic parts and vacuum tubes with switches and knobs. Such machines were only used for scientific and military purposes, as they were very expensive to build and required a lot of expertise to operate.
Soon enough, scientists began working on electronic Turing machines that could harness the power of electricity to perform the same tasks done by their mechanical counterparts.
At this point, the word "computer" was already being used to refer to these machines.
Because they were mostly used for mathematical "computations".
With the advent of electronic computers, a new technology emerged that would revolutionize the field of computing: transistors. They replaced the bulky vacuum tubes and made electronic computers smaller, more reliable, and cheaper to build.
Transistors are tiny electronic switches that can be turned on or off to represent the binary digits 0 and 1. Now imagine you have a small stream of water that you can control with a series of gates. When a gate is closed, the water can't flow through, and when it's open, the water can flow freely. This is similar to how transistors are used in computers.
Instead of a stream of water, we have a flow of electricity that can be controlled by the gates. When the transistor gate is closed, no electricity can pass through, and when it's open, electricity can flow freely. These gates are very tiny and can be turned on or off millions of times per second, allowing the computer to perform complex operations quickly.
Just like how you can direct the flow of water in different directions by opening and closing different gates, the computer can direct the flow of electricity through different transistors to perform different operations. By combining these simple operations, the computer can perform more complex tasks.
In the same way that gates in a water channel can be combined to create different water flow patterns, electronic circuits made up of transistors can be combined to create logic gates that perform simple operations like addition and subtraction.
There are a few different types of logic gates, but two of the most common are the AND gate and the OR gate. The AND gate only produces a "1" output when both of its inputs are "1", and produces a "0" output in all other cases. The OR gate produces a "1" output if either of its inputs are "1", and produces a "0" output only when both inputs are "0".
By combining these simple logic gates, we can create more complex operations. For example, we can use an XOR (exclusive OR) gate to perform addition. An XOR gate produces a "1" output only when one of its inputs is "1" and the other input is "0". Strictly speaking, an XOR gate gives us each column's sum bit; handling a carry also requires an AND gate, but for columns that never produce a carry, XOR gates alone can perform addition.
For example, let's say we want to add the binary numbers 101 and 010 together. We can represent these numbers using transistors and use XOR gates to add them together, like this:
 1   0   1
XOR XOR XOR
 0   1   0
-----------
 1   1   1
Of course, this is a simplified example and real computers use much more complex circuits to perform addition and other operations, but the basic idea is the same.
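If you'd like to see this in code, here is a small sketch in Rust (the language we'll use later in this tutorial) that models the gates as plain boolean functions and adds 101 and 010 column by column. The function names are just illustrative; real circuits are wired from transistors, not function calls.
fn and(a: bool, b: bool) -> bool { a && b }
fn or(a: bool, b: bool) -> bool { a || b }
fn xor(a: bool, b: bool) -> bool { a != b }

fn main() {
    // The truth tables described above:
    assert!(and(true, true) && !and(true, false));
    assert!(or(true, false) && !or(false, false));

    // Add 101 and 010 column by column, like the diagram above.
    // XOR produces each column's sum bit; AND would produce the carry,
    // which happens to be 0 in every column of this example.
    let a = [true, false, true];  // 1 0 1
    let b = [false, true, false]; // 0 1 0
    for i in 0..3 {
        assert!(!and(a[i], b[i])); // no carries here
        print!("{}", xor(a[i], b[i]) as u8);
    }
    println!(); // prints 111
}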
That is absolutely right. Once you have addition, you can add multiple times to get multiplication. Now I would like to go into more detail about how an electronic computer works by building one with you, but implementing a real computer may be too complex.
Brilliant idea Dino! Brilliant! Let us call it COM1 as in "Computer 1" since we may build even more complex imaginary computers in the future.
COM1 is a simple imaginary computer. It is a Turing machine that has a tape with an infinite number of slots for instructions, and a read/write HEAD.
[i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, ...]
  ^
HEAD
Each slot can contain a single instruction. The HEAD will then read each instruction starting from i0, and execute them one after the other. The three dots "..." indicate that there are more slots, but they have been omitted for brevity. The instructions inside these slots can be basic tasks like addition and subtraction.
That's a good observation. This means we need a place to store numbers, a place where we can access values at any time during execution. Let's call it a memory unit.
So how would this memory unit (MU) be structured? A good way to have it is to place another tape elsewhere, containing a series of numbers that can be accessed by the first tape.
Yeah right. Let us call it the Central Unit or CU.
So we have a CU that looks like this:
[i0, i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, ...]
  ^
HEAD
And an MU that looks like this:
[44, 23, 34, 27, ...]
The MU does not need a head as it is simply a bucket for storing raw numbers. Each number has an address that we can use to locate it. We can simply use the index as the address, so the above number 44 has address 0, the number 23 has address 1, the number 34 has address 2 and the number 27 has address 3, and so on.
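In code terms (a throwaway Rust sketch; we'll set Rust up properly later), the MU is just an array, and an address is just an index:
fn main() {
    // The MU as an array: a number's address is simply its index.
    let mu = [44, 23, 34, 27];
    println!("{}", mu[0]); // 44, the number at address 0
    println!("{}", mu[3]); // 27, the number at address 3
}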
Every possible instruction will be represented by a number because the computer can only work with numbers. Therefore, each instruction is already implemented into the hardware of the computer, and we can tell the computer what instruction to execute by giving it the corresponding number to that instruction. Here are a few example instructions:
1 -> Add the number at MU address 0 to the number in MU address 1 and save the result in MU address 2.
2 -> Add the number at MU address 3 to the number in MU address 2 and save the result in MU address 1.
Saying to the computer "Add the number at MU address 0 to the number in..." is impossible because it does not understand language. But it does understand numbers so we can use a unique number to represent every possible instruction.
If we combine everything we have described above, we can now write our first machine code for COM1.
I will give you the code in the Central Unit (CU), and the numbers in the memory unit (MU), then see if you can work out the final values that will be in the MU when all the instructions have been executed.
CU -> [1, 2, 2]
MU -> [2, 4, 1, 1]
The HEAD starts at the first instruction which is 1. This corresponds to the instruction "Add the number at MU address 0 to the number in MU address 1 and save the result in MU address 2". When this instruction is executed, the resulting state of the CU would be:
[1, 2, 2]
    ^
   HEAD
And the state of the MU would be:
[2, 4, 6, 1]
Notice that the number in MU address 2 has changed from 1 to 6. This is because the instruction dictates that we overwrite its value and save only the result of the computation. Also note that when addressing the memory, we start counting from 0 instead of 1.
Now we step forward to the next instruction which has the value 2. The instruction 2 corresponds to "Add the number at MU address 3 to the number in MU address 2 and save the result in MU address 1". So the state of the CU after execution would be:
[1, 2, 2]
       ^
      HEAD
And the MU would be:
[2, 7, 6, 1]
Finally, we step forward to the last instruction which has the value 2. The instruction 2 corresponds to "Add the number at MU address 3 to the number in MU address 2 and save the result in MU address 1". So the state of the CU after execution would be:
[1, 2, 2]
         ^
        HEAD
And the MU would be:
[2, 7, 6, 1]
Notice that no value changed in the MU? This is because we only modify the destination address and leave the values being added together unchanged. In this case, the result stored in the destination address 1 was 7, and the value it previously contained was also 7, so no change is visible.
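If you want to check that trace mechanically, here is a tiny Rust sketch of this Harvard-style COM1 that walks the CU tape and applies our two instructions to the MU (the instruction meanings are exactly the two we defined above):
fn main() {
    let cu = [1, 2, 2];
    let mut mu = [2, 4, 1, 1];

    for instruction in cu {
        match instruction {
            // 1 -> MU[2] = MU[0] + MU[1]
            1 => mu[2] = mu[0] + mu[1],
            // 2 -> MU[1] = MU[3] + MU[2]
            2 => mu[1] = mu[3] + mu[2],
            _ => panic!("unknown instruction"),
        }
        println!("MU => {:?}", mu);
    }
    // Prints [2, 4, 6, 1], then [2, 7, 6, 1], then [2, 7, 6, 1] again.
}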
Yeah! But did you notice anything strange in our implementation of COM1?
Not that. We previously agreed that computers only know the binary states of 1 and 0, but our instructions were represented as ordinary decimal numbers, and so were the numbers stored in the MU.
I only used decimal numbers for the convenience of explaining them. In an ideal imaginary computer, the instructions we encoded as [1, 2, 2] would look something like [0b1, 0b10, 0b10] when written in binary, and the MU we wrote as [2, 4, 1, 1] would be [0b10, 0b100, 0b1, 0b1]. The brackets are not important except for showing that the numbers are in a sequence, and the 0b prefix is just to differentiate binary numbers from decimal numbers.
This system of separating the instructions being executed from the data being manipulated (like we did with the CU and MU) is known as the Harvard architecture. However, modern computers use a different architecture:
Von Neumann architecture is a theoretical model for a digital computer that describes how a computer should be organized and how it should function. The architecture is named after John von Neumann, a prominent mathematician and computer scientist.
The Von Neumann architecture has five main components:
Memory Unit: This component stores both data and instructions used by the central processing unit (CPU) to execute programs. The memory is divided into addressable units, each with a unique identifier or address.
Central Processing Unit (CPU): This component performs arithmetic and logical operations on data stored in memory. The CPU reads instructions from memory, decodes them, and executes them.
Input/Output (I/O) Devices: These devices, such as keyboards, mice, and displays, allow data to be fed into the computer or output from it.
Control Unit: This component manages the operation of the CPU by retrieving instructions from memory and directing the flow of data between the CPU, memory, and I/O devices.
Arithmetic and Logic Unit (ALU): This component performs arithmetic operations (addition, subtraction, multiplication, and division) and logical operations (AND, OR, NOT) on data stored in memory.
Our first implementation of COM1 uses a Harvard architecture because the instructions are stored in the Central Unit (CU) and the data is stored separately in the Memory Unit (MU).
Let us now modify our imaginary computer to use a Von Neumann architecture instead.
COM1 will no longer store instructions separately from the numbers being manipulated. Instead, we will store both instructions and numbers on the same tape. The Central Unit (CU) will be responsible for finding the next instruction and executing it.
We will no longer call it the Central Unit but the Processor from now on. Also, the tape containing both the data and instructions will be called the Memory. Since we have combined these two tapes, we will no longer use a HEAD to specify the current instruction but use something called the Program Counter or PC.
Therefore, the memory would look like this:
[op1, op2, op3, op4, op5, op6, op7, ...]
  ^
 PC
op means "opcode". We know that instructions are represented as numbers, and data is also represented with numbers, so we give them the general name opcode.
The flow of execution guides the computer on what should be executed as instructions and what should be used as data. As we run some examples, it will be clear how this works in practice.
The following is an example of values in the memory of COM1:
[1, 4, 5, 2, 0, 12, 3]
 ^
 PC
Any one of these opcodes could potentially be executed by the processor as an instruction, or used as input for an instruction. For example, the first opcode above is 1. The opcode 1 could mean "add the next two numbers together and store the result in the next position, then advance the PC by 3". If we write this in pseudocode, it would look like this:
PC + 1 = (PC + 1) + (PC + 2)
PC = PC + 3
Breaking this down, it would mean:
PC + 1 represents the next memory address after the program counter (i.e. the address of the number 4 from our above example). We will be assigning the result of the computation to this address.
(PC + 1) represents the value at the next memory address after the program counter (i.e. the number 4).
(PC + 2) represents the value at the address two positions after the program counter (i.e. the number 5).
PC = PC + 3 means after we add the two values and save the result, we will then move the PC forward by three addresses. When this instruction has been executed, the content of the memory will then look like this:
[1, 9, 5, 2, 0, 12, 3]
          ^
          PC
As you have seen in this new implementation, the program counter (PC) starts at memory address 0 and advances as instructions are executed. The next instruction to be executed depends on where the PC lands after the last instruction is done. In the above example, the number 1 was the first instruction, the numbers 4 and 5 were its input, and the result (9) was stored in the position of the number 4.
Because the program counter was moved forward by the last instruction, the processor does not accidentally use the second and third opcodes as instructions, and instead executes the fourth opcode 2 as the next instruction.
Let us define one more instruction:
Opcode 2 -> GOTO (PC + 1)
This translates to: the opcode 2 means the program counter should go to the memory address represented by the number in the next address. Writing it in pseudocode would be:
PC = (PC + 1)
Incidentally, the program counter in our previous example is currently pointing at the number 2. What do you think the memory would look like after this instruction is executed?
That is correct. The next number after the instruction 2 is 0. Therefore, the program counter goes to the address 0 which is the first instruction.
Yes!! You can see how simple instructions can be combined to make even more complex computations. If we allow the above program to continue running, the value of the address 1 will keep increasing forever.
Now COM1 looks a lot more like a real computer than an imaginary computer.
This is how modern computers work. They have a central processing unit that executes code from the memory, and use the surrounding data as input. They are more complex than this and they can have thousands of different kinds of instructions, but the principle is the same.
It is that simple, but there is one part we're getting wrong. It is misleading to use decimal numbers when writing machine code, but the machine code we've been writing so far does exactly that. Ideally, the above example machine code should be written in binary numbers to look more like what the computer sees:
[1, 100, 101, 10, 0, 1100, 11]
Understandable. It is inconvenient to write code in binary numbers, especially when we're simply representing the code on paper. Using decimal numbers to represent machine code is not so bad, but there is another number system that sits just right between the binary numbers that the machine understands, and the decimal numbers we find convenient. They are called hexadecimal numbers.
We hate binary. It's too clunky. Thankfully, hexadecimal numbers are much easier to work with than binary. Hexadecimal numbers are base 16, which means there are 16 symbols used to represent a number instead of the 2 symbols used in binary (0 and 1) or the 10 symbols used in decimal (0-9).
The symbols used in hexadecimal are the numbers 0-9 and the letters A-F. So instead of counting from 0 to 9 before moving to the next digit like in decimal, we count from 0 to F before moving to the next digit.
For example, the number 15 in decimal is represented as F in hexadecimal. The number 255 in decimal is represented as FF in hexadecimal.
Well, there's a lot to it. It is much easier to convert between hexadecimal and binary than between decimal and binary, and hex numbers are well suited for representing bytes of data, which we handle very often in low-level programming. If you want to know more about the advantages of hexadecimal numbers, I have written an article that goes into more detail about it.
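A quick way to see the relationship between the three bases is to let Rust (which we'll set up shortly) print the same value in each:
fn main() {
    let n: u8 = 12;
    // The same value written in decimal, binary, and hexadecimal.
    println!("{} = 0b{:b} = 0x{:x}", n, n, n); // 12 = 0b1100 = 0xc

    // Every hex digit maps to exactly four binary digits, which is why
    // converting between hex and binary is so mechanical.
    println!("0x{:x} = 0b{:b}", 0xffu8, 0xffu8); // 0xff = 0b11111111
}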
Now when we convert the example machine code we wrote earlier from decimal numbers to hexadecimal numbers:
Before:
[1, 4, 5, 2, 0, 12, 3]
 ^
 PC
After:
[0x1, 0x4, 0x5, 0x2, 0x0, 0xc, 0x3]
  ^
  PC
The 0x before each opcode is just a prefix we use when writing hexadecimal numbers to avoid confusing them for decimal numbers.
We have theoretically designed a simple imaginary computer, but having it only on paper is not so fun. How about we make a COM1 virtual machine?
A virtual machine or VM is basically just software that tries to mimic the operation of hardware. By making a virtual machine, we essentially build a computer from the ground up using software, without having to work directly with physical hardware. It also gives us the flexibility of inspecting the operation of the computer and probing every instruction it executes.
We have developed COM1 enough to simulate it in software, so hopefully by implementing it in the Rust programming language, it will help us understand how COM1 works in practice.
If you know anything about me, then you should know that I love Rust. So why not Rust? Frankly, you don't need to know Rust to follow along, heck you don't even need to know any programming language. The goal is to illustrate how COM1 works in practice. I will keep it as simple as possible, and explain every part of the code. It'll be fun I promise.
To follow along, you need to install Rust. In this tutorial, I assume you are using a Unix-like operating system such as Linux or macOS. If you are on Windows, you can translate the Unix commands to their Windows equivalents, or just use WSL.
Create a new Rust project:
cargo new com-1
Then move into the created directory:
cd com-1
Now if you run cargo run in the terminal, you should see "Hello, world!" printed on your screen.
The plan for this virtual machine is to read a text file containing a sequence of hexadecimal numbers, then store these numbers in an array that will represent the memory of the virtual machine. Then it reads each number from the memory and performs some actions depending on what opcode the number represents.
To get started, we need to read the contents of the text file. Edit the main.rs file in the src directory and add the following:
fn main() {
    let source = std::fs::read_to_string("./code.txt").unwrap();
}
We used the read_to_string() function from the fs module in the standard library to read a file code.txt into a string. Whatever the file contains will simply be copied into a string variable called source.
You can now create a code.txt file in the root directory of the Rust project:
touch code.txt
Edit the text file and add the following:
[0x1, 0x4, 0x5, 0x2, 0x0, 0xc, 0x3]
Now when the program runs, the content of the text file will be stored in the source variable. This will be the starting value of the virtual machine's memory.
It will be difficult to work with the string as it is, so we need to deserialize it into an array of numbers. To do this, we will use the Ron library.
serde_json does not allow you to have hexadecimal numbers, so Ron is more suitable for this project.
To add Ron as a dependency, modify the Cargo.toml file and add the following:
[package]
name = "com-1"
version = "0.1.0"
edition = "2021"

[dependencies]
ron = "0.8.0" # new
Now we can use Ron to deserialize the source:
fn main() {
    let source = std::fs::read_to_string("./code.txt").unwrap();
    let mut memory: Vec<u8> = ron::from_str(&source).unwrap();
}
We declared a mutable variable called memory with the type Vec<u8>. This type is basically an array of numbers. Then we assign the value returned from the function ron::from_str() to it. This function takes the source variable as an argument and returns a Vec<u8>.
Now we have the virtual machine's memory. We declared it as mutable because we will be changing its value during execution. For sanity, we can print the memory to see its content:
fn main() {
    let source = std::fs::read_to_string("./code.txt").unwrap();
    let mut memory: Vec<u8> = ron::from_str(&source).unwrap();

    println!("STARTING MEMORY => {:?}", memory);
}
By running cargo run, you should see the following text printed in the terminal:
STARTING MEMORY => [1, 4, 5, 2, 0, 12, 3]
As you can see, the starting value of the memory is exactly what we wrote in the code.txt file, but look! The hexadecimal numbers have been converted to decimal numbers. This happened while Ron was deserializing the string in source into an array.
Now we have to define a few variables to be used later:
// after println
let clock_speed = 1000;
let memory_length = memory.len();
let mut pc = 0;
Every computer processor has a clock speed, and so does our virtual machine. The clock speed determines the number of instructions that a processor can execute per second. In other words, it indicates how quickly the processor can perform calculations and process data.
The clock speed is typically measured in gigahertz (GHz), with modern processors operating at speeds ranging from a few hundred megahertz to several gigahertz. The higher the clock speed, the faster the processor can perform computations and process data.
In this case, the clock_speed variable we defined is in milliseconds. To simulate a clock speed, we sleep for this many milliseconds before executing the next instruction.
We defined memory_length so that we know how many opcodes are in the memory for later use.
Finally, we define the program counter (PC). The pc variable holds... the program counter of our virtual machine. We defined it as mutable because its value will be modified during execution.
Now, the fun part:
// after pc
while pc < memory_length {
    std::thread::sleep(std::time::Duration::from_millis(clock_speed));
    execute(&mut pc, &mut memory);
}
We write a while loop that repeatedly calls the execute function, which we will define later, but before every iteration, it sleeps for a few milliseconds depending on the value of clock_speed. We previously defined the value of clock_speed as 1000 milliseconds, which is equal to 1 second.
Putting it all together, our main function should look like this:
fn main() {
    let source = std::fs::read_to_string("./code.txt").unwrap();
    let mut memory: Vec<u8> = ron::from_str(&source).unwrap();

    println!("STARTING MEMORY => {:?}", memory);

    let clock_speed = 1000;
    let memory_length = memory.len();
    let mut pc = 0;

    while pc < memory_length {
        std::thread::sleep(std::time::Duration::from_millis(clock_speed));
        execute(&mut pc, &mut memory);
    }
}
Now we write a new function called execute:
// after main function
fn execute(pc: &mut usize, memory: &mut Vec<u8>) {
    let opcode = memory[*pc];

    match opcode {
        // NOP
        0 => {
            *pc += 1;
        }
        // ADD
        1 => {
            memory[*pc + 1] = memory[*pc + 1] + memory[*pc + 2];
            *pc += 3;
        }
        // GOTO
        2 => {
            *pc = memory[*pc + 1] as usize;
        }
        _ => {
            panic!("Invalid opcode {}", opcode);
        }
    }

    println!("Processor executed opcode {} - MEMORY => {:?} - PC => {}", opcode, memory, pc);
}
There you have it. A virtual machine that works just like our imaginary COM1 computer.
Before the execute function was called, we defined some values: the memory and the pc. When we call the execute function, we pass these two values as parameters.
Inside the execute function, we first index into the memory and store the value that the pc is pointing at in the opcode variable. We then use a match statement to check if the opcode matches any of the defined instructions. When it finds a match, it executes the instruction and moves the pc by an amount that depends on the instruction.
I have defined three instructions: 0, 1, and 2. If an opcode does not match any of these instructions, it will match the underscore _ case, which will cause the program to crash and print an error showing the opcode.
The 0 instruction is a NOP instruction, which means "no operation". It does nothing except increment the pc by 1.
The 1 instruction is our "add the next two numbers together and store the result in the next position, then advance the PC by 3" instruction.
The 2 instruction is our GOTO instruction.
When an instruction is done executing, we print the opcode that was just executed, the value of the memory, and the value of the pc. By printing these values, we see what changes the instruction has made. For example, if the instruction was a GOTO instruction, we should see the value of the pc change.
Putting it all together, our COM1 virtual machine's code should look like this:
fn main() {
    let source = std::fs::read_to_string("./code.txt").unwrap();
    let mut memory: Vec<u8> = ron::from_str(&source).unwrap();

    println!("STARTING MEMORY => {:?}", memory);

    let clock_speed = 1000;
    let memory_length = memory.len();
    let mut pc = 0;

    while pc < memory_length {
        std::thread::sleep(std::time::Duration::from_millis(clock_speed));
        execute(&mut pc, &mut memory);
    }
}

fn execute(pc: &mut usize, memory: &mut Vec<u8>) {
    let opcode = memory[*pc];

    match opcode {
        // NOP
        0 => {
            *pc += 1;
        }
        // ADD
        1 => {
            memory[*pc + 1] = memory[*pc + 1] + memory[*pc + 2];
            *pc += 3;
        }
        // GOTO
        2 => {
            *pc = memory[*pc + 1] as usize;
        }
        _ => {
            panic!("Invalid opcode {}", opcode);
        }
    }

    println!("Processor executed opcode {} - MEMORY => {:?} - PC => {}", opcode, memory, pc);
}
Now you can run it with:
cargo run
If everything works fine, we should see values printed on the terminal every second. The virtual machine executes an instruction, waits for 1 second, then executes the next instruction, printing the values of the executing opcode, memory, and pc. You should see something like this:
STARTING MEMORY => [1, 4, 5, 2, 0, 12, 3]
Processor executed opcode 1 - MEMORY => [1, 9, 5, 2, 0, 12, 3] - PC => 3
Processor executed opcode 2 - MEMORY => [1, 9, 5, 2, 0, 12, 3] - PC => 0
Processor executed opcode 1 - MEMORY => [1, 14, 5, 2, 0, 12, 3] - PC => 3
Processor executed opcode 2 - MEMORY => [1, 14, 5, 2, 0, 12, 3] - PC => 0
Processor executed opcode 1 - MEMORY => [1, 19, 5, 2, 0, 12, 3] - PC => 3
Processor executed opcode 2 - MEMORY => [1, 19, 5, 2, 0, 12, 3] - PC => 0
Processor executed opcode 1 - MEMORY => [1, 24, 5, 2, 0, 12, 3] - PC => 3
Processor executed opcode 2 - MEMORY => [1, 24, 5, 2, 0, 12, 3] - PC => 0
Processor executed opcode 1 - MEMORY => [1, 29, 5, 2, 0, 12, 3] - PC => 3
Processor executed opcode 2 - MEMORY => [1, 29, 5, 2, 0, 12, 3] - PC => 0
Processor executed opcode 1 - MEMORY => [1, 34, 5, 2, 0, 12, 3] - PC => 3
Processor executed opcode 2 - MEMORY => [1, 34, 5, 2, 0, 12, 3] - PC => 0
Processor executed opcode 1 - MEMORY => [1, 39, 5, 2, 0, 12, 3] - PC => 3
Processor executed opcode 2 - MEMORY => [1, 39, 5, 2, 0, 12, 3] - PC => 0
Processor executed opcode 1 - MEMORY => [1, 44, 5, 2, 0, 12, 3] - PC => 3
Processor executed opcode 2 - MEMORY => [1, 44, 5, 2, 0, 12, 3] - PC => 0
Yes, it is. Recall that the machine code currently being executed by the virtual machine is written in the code.txt file:
[0x1, 0x4, 0x5, 0x2, 0x0, 0xc, 0x3]
If we modify these values, the behavior of the virtual machine should change. Can you program it to do something different?
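For reference, here is one possible code.txt that produces the run below; it chains two ADD instructions and a GOTO:
[0x1, 0x4, 0x5, 0x1, 0x0, 0xc, 0x2, 0x3]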
STARTING MEMORY => [1, 4, 5, 1, 0, 12, 2, 3]
Processor executed opcode 1 - MEMORY => [1, 9, 5, 1, 0, 12, 2, 3] - PC => 3
Processor executed opcode 1 - MEMORY => [1, 9, 5, 1, 12, 12, 2, 3] - PC => 6
Processor executed opcode 2 - MEMORY => [1, 9, 5, 1, 12, 12, 2, 3] - PC => 3
Processor executed opcode 1 - MEMORY => [1, 9, 5, 1, 24, 12, 2, 3] - PC => 6
Processor executed opcode 2 - MEMORY => [1, 9, 5, 1, 24, 12, 2, 3] - PC => 3
Processor executed opcode 1 - MEMORY => [1, 9, 5, 1, 36, 12, 2, 3] - PC => 6
Processor executed opcode 2 - MEMORY => [1, 9, 5, 1, 36, 12, 2, 3] - PC => 3
Processor executed opcode 1 - MEMORY => [1, 9, 5, 1, 48, 12, 2, 3] - PC => 6
Processor executed opcode 2 - MEMORY => [1, 9, 5, 1, 48, 12, 2, 3] - PC => 3
That's a good one! Can you explain what the computer is doing?
That is correct. The effect of that machine code is that the value at memory address 0x4 keeps increasing forever. Well, not quite forever, because if we leave it running for a while, the program crashes as soon as an addition overflows 255. This is a limitation of our virtual machine: we defined opcodes to be 8 bits large, 255 is the largest value 8 bits can hold, and Rust checks for integer overflow in debug builds.
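As an aside, if we wanted COM1 to wrap around to 0 instead of crashing, Rust's wrapping arithmetic would do it. A throwaway sketch of the behavior:
fn main() {
    let a: u8 = 252;
    let b: u8 = 12;
    // a + b would panic in a debug build; wrapping_add wraps past 255 instead.
    println!("{}", a.wrapping_add(b)); // prints 8
}
Swapping the + in our ADD instruction for .wrapping_add(...) would give the virtual machine the same wrap-around behavior, but we'll keep the crash since it matches what we defined.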
Yes, we can. Let us add a subtract instruction and a write instruction:
fn execute(pc: &mut usize, memory: &mut Vec<u8>) {
    let opcode = memory[*pc];

    match opcode {
        // NOP
        0 => {
            *pc += 1;
        }
        // ADD
        1 => {
            memory[*pc + 1] = memory[*pc + 1] + memory[*pc + 2];
            *pc += 3;
        }
        // GOTO
        2 => {
            *pc = memory[*pc + 1] as usize;
        }
        // SUBTRACT
        3 => {
            memory[*pc + 1] = memory[*pc + 1] - memory[*pc + 2];
            *pc += 3;
        }
        // WRITE
        4 => {
            let target = memory[*pc + 1] as usize;
            let value = memory[*pc + 2];
            memory[target] = value;
            *pc += 3;
        }
        _ => {
            panic!("Invalid opcode {}", opcode);
        }
    }

    println!(
        "Processor executed opcode {} - MEMORY => {:?} - PC => {}",
        opcode, memory, pc
    );
}
The SUBTRACT instruction is quite similar to the ADD instruction. It uses a 3 opcode, and we simply replace the + with a -.
The WRITE instruction is for directly writing a value to an address in memory. It uses a 4 opcode: the value at PC + 2 is written to the memory address given by the value at PC + 1. In pseudocode, where the doubled parentheses mean "the value at the address stored at PC + 1", it looks like this:
((PC + 1)) = (PC + 2)
PC = PC + 3
To test this, we can write a machine code that uses both the subtract and the write instructions:
[0x3, 0x9, 0x1, 0x4, 0x2, 0xF]
When you write the above machine code in code.txt and run it, you should see the following output:
STARTING MEMORY => [3, 9, 1, 4, 2, 15]
Processor executed opcode 3 - MEMORY => [3, 8, 1, 4, 2, 15] - PC => 3
Processor executed opcode 4 - MEMORY => [3, 8, 15, 4, 2, 15] - PC => 6
First, we see that the 3 opcode is executed, which is a subtract instruction. It subtracts 1 from 9 and writes the result in memory address 1. Therefore memory address 1 which had a value of 9 at the start of the program now has a value of 8. Then the PC is incremented to a value of 3.
The opcode under the PC is now 4. This corresponds to the write instruction which writes the value 15 to the memory address 2. We can see that the value in that address is now 15, and the PC has been incremented to a value of 6. The program now exits as there are no more instructions to execute.
I used whatever numbers I could think of since it is just an imaginary computer. However, if you are programming a real computer, the instructions have been predetermined by the manufacturer of the processor. The set of instructions a processor uses and understands is known as the Instruction set of the processor.
An Instruction Set Architecture (ISA) is a term used to describe the set of instructions and features that a particular processor supports. An ISA is essentially the interface between the hardware and the software that runs on the processor. It defines how a processor executes a set of instructions, how data is stored and manipulated, and how input and output operations are performed.
There are many different ISAs in use today, ranging from simple embedded processors to complex server-class CPUs.
Each ISA has its own set of advantages and disadvantages. Some ISAs are designed to be simple and efficient, with a small set of instructions that can be executed quickly. These ISAs are often used in embedded systems and other applications where low power consumption and small size are important factors. Other ISAs are designed to be more complex and feature-rich, with a larger set of instructions that can be used to implement more advanced algorithms and data structures. These ISAs are often used in high-performance computing and other applications where speed and functionality are more important than power consumption.
One of the advantages of an ISA is that it provides a standardized interface between hardware and software. By providing a set of instructions that abstract away the details of how the hardware actually works, an ISA allows software developers to focus on the higher-level algorithms and data structures that are central to their application.
Think of the instruction set of a processor as a table with all of the processor's opcodes, and what each one of those opcodes means to the processor.
Some of the most common ISAs are:
x86: The x86 ISA is perhaps the most well-known and widely used ISA in the consumer computing industry. If you're reading this tutorial on a laptop or desktop PC, it probably has a processor with an x86 ISA.
ARM: The ARM ISA is another widely used ISA in the consumer computing industry, especially in mobile devices such as smartphones and tablets. It is known for its low power consumption and efficiency, which makes it well-suited for portable devices. If you are reading this tutorial on a mobile phone or tablet, it very likely has a processor with an ARM instruction set.
PowerPC: The PowerPC ISA is used primarily in IBM servers and high-performance computing applications. It is known for its high performance and reliability, and is used in many applications that require high computational power. Examples of popular consumer computers that use the PowerPC ISA include older Apple Macintosh computers, such as the Power Mac G5 and the iMac G5.
MIPS: The MIPS ISA is used in a variety of embedded systems and other specialized applications, such as routers and modems. It is known for its simplicity and low cost, which makes it well-suited for applications that require a small, low-power processor. Examples of popular devices that use the MIPS ISA include the PlayStation 2 gaming console and various Cisco networking devices.
RISC-V: The RISC-V ISA is an open-source ISA that is gaining popularity in the computing industry. It is designed to be simple and modular, which makes it well-suited for a wide range of applications, from small embedded devices to high-performance computing. The RISC-V ISA is still relatively new, but its open-source nature and potential for customization make it an exciting development in the world of computing.
Notice that Instruction Set Architecture, Architecture, and Instruction Set are sometimes used interchangeably.
For now, we will suspend our work on the COM1 imaginary computer and focus on the x86 architecture that real computers use, then subsequently we will explore more architectures such as RISC-V and ARM.
The x86 architecture is one of the most widely used for computer processors, with a history that dates back to the early days of personal computing. It has a complex and diverse instruction set, with hundreds (or thousands) of opcodes that enable a wide range of operations, from basic arithmetic to advanced system-level tasks.
The x86 instruction set is infamously known to be overly complicated with a f*** ton of instructions. The complexity of x86 is further compounded by the requirement to maintain compatibility with earlier versions of the architecture.
While preserving legacy code is certainly advantageous, it can also create a considerable burden for programmers who work with the architecture. Achieving a delicate balance to ensure compatibility with older software and hardware systems can be challenging.
For this reason (and many others), directly using the opcodes of a processor to write machine code is time-consuming and impractical. This is not a problem specific to x86, but it is more pronounced there. If only there was a way for us to write machine code without actually writing opcodes in hexadecimal numbers 🤔.
What if we could simplify the process of writing machine code such that we could easily understand what a code is doing just by looking at it?
We can achieve this by creating a special program that takes in a series of words like ADD, SUBTRACT, and WRITE as input, and generates machine code that corresponds to these words. We refer to these short words as mnemonics.
If you recall the COM1 virtual machine we built earlier, the opcode for the ADD instruction was 1. Therefore, we can write a program that takes in a file containing mnemonics instead of opcodes, and generates a new file containing the corresponding opcodes of those mnemonics. If it encounters an ADD mnemonic, it outputs a 1 opcode.
We can then define a set of standard mnemonics that correspond to all the opcodes of an instruction set. This set of mnemonics and the rules for how they are converted to machine code is known as an "Assembly language".
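To make that concrete, here is a sketch of such a program for our COM1 instruction set. The mnemonic names (NOP, ADD, GOTO, SUB, WRITE) are ones I've invented for illustration, and operands are written as plain numbers:
fn assemble(source: &str) -> Vec<u8> {
    source
        .split_whitespace()
        .map(|token| match token {
            "NOP" => 0,
            "ADD" => 1,
            "GOTO" => 2,
            "SUB" => 3,
            "WRITE" => 4,
            // Anything else is treated as a literal number (an operand).
            number => number.parse().expect("not a mnemonic or a number"),
        })
        .collect()
}

fn main() {
    // The same program we wrote earlier as [1, 4, 5, 2, 0, 12, 3],
    // now written with mnemonics instead of raw opcodes.
    let machine_code = assemble("ADD 4 5 GOTO 0 12 3");
    println!("{:?}", machine_code); // [1, 4, 5, 2, 0, 12, 3]
}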
That is correct. There is an x86 assembly language, there is an ARM assembly language, and so on. The program that converts the mnemonics of an assembly language into machine code is known as an assembler.
Yes! The first assembler may be written manually with the opcodes of a processor, and then using that first assembler, more advanced assemblers can be built.
In software development, the process of using a primitive, older tool to make a better tool is known as bootstrapping. Just like you can use a not-so-sharp tool to make a sharper tool, and then use that sharper tool to make an even sharper one, you can use raw machine code to make a limited assembler, and then use that assembler to make a more advanced assembler.
Luckily for us, other programmers have done the dirty work of defining assembly languages and making assemblers for them, so we only need to have an assembler to write machine code for a particular processor architecture.
Yes, you can, but only if they both have the same architecture. If you write x86 assembly code, you cannot run the resulting machine code on a computer with an ARM architecture.
As much as I would love to continue this tutorial, it is getting quite lengthy. I tell you what, how about we make this a series? If you would like me to make this a low-level programming series, tweet at me or message me on my discord server. I would love to hear your feedback and answer your questions. Thanks for reading, and see you in the next one.
Also, consider donating or becoming a sponsor to help me continue making open-source software and writing cool programming content.