development-programming-assembly.html
* created: 2025-05-20T16:08
* modified: 2025-07-31T22:29
title
Assembly
description
Assembly represents machine-level instructions in a human-readable form. It is commonly used as a foundation for higher-level languages. Writing assembly requires knowledge of the processor's instruction set.
Assembly
An assembly language acts as a middleman between higher-level languages like Rust or C and machine code. The higher-level language is first compiled into assembly, which an assembler then translates into binary.
Assemblers and their respective assembly languages are hardware-specific: you need to write different assembly for different CPU architectures because they use different instruction sets.
Practical example
The following code represents a program that adds two numbers together:
LDA 100
ADA 101
STA 102
LDA 100
: LoaD Accumulator 100 => loads the value stored at memory address 100 into the accumulator.
ADA 101
: ADd Accumulator 101 => adds the value stored at memory address 101 to the value in the accumulator.
STA 102
: STore Accumulator 102 => writes the resulting value from the accumulator into the memory address 102.
During execution, the control unit drives every instruction through the same cycle: fetch, decode, fetch operands, execute, and write back. This cycle is repeated for each individual instruction.
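The three-instruction program above can be traced with a small simulator. This is only a minimal sketch in Python: the function name `run`, the dictionary used as memory, and the starting values 2 and 5 are assumptions made for illustration, and the loop collapses the cycle into fetch/decode, operand access, and write back.

```python
# Minimal accumulator machine; memory layout and starting values are assumptions.
def run(program, memory):
    """Execute a list of (mnemonic, address) pairs and return the final memory."""
    accumulator = 0
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        mnemonic, address = program[pc]  # fetch and decode
        pc += 1
        if mnemonic == "LDA":    # load memory[address] into the accumulator
            accumulator = memory[address]
        elif mnemonic == "ADA":  # fetch the operand and add it to the accumulator
            accumulator += memory[address]
        elif mnemonic == "STA":  # write the accumulator back to memory
            memory[address] = accumulator
        else:
            raise ValueError(f"unknown instruction: {mnemonic}")
    return memory


program = [("LDA", 100), ("ADA", 101), ("STA", 102)]
memory = {100: 2, 101: 5, 102: 0}  # example values, chosen for illustration
result = run(program, memory)
print(result[102])  # 7
```

After the run, address 102 holds the sum of the values at addresses 100 and 101, while those two addresses stay unchanged.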
Decoding Machine Code
To represent machine code we need to implement some sort of encoding. A simple way of encoding could look like this:
Instruction Table
Reserve | Opcode | Num | Operand | Assembler |
000000 | 001 | 1 | 000010 | LOAD #2 |
000000 | 010 | 0 | 001101 | STORE 13 |
000000 | 001 | 1 | 000101 | LOAD #5 |
000000 | 010 | 0 | 001110 | STORE 14 |
000000 | 001 | 0 | 001101 | LOAD 13 |
000000 | 011 | 0 | 001110 | ADD 14 |
000000 | 010 | 0 | 001111 | STORE 15 |
000000 | 111 | 0 | 000000 | HALT |
Opcode Encoding
Code | Instruction |
000 | NOOP |
001 | LOAD |
010 | STORE |
011 | ADD |
100 | SUB |
101 | EQUAL |
110 | JUMP |
111 | HALT |
Number Bit
Num | Meaning |
0 | Operand is a memory address |
1 | Operand is an immediate value (#) |
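The encoding described by the tables above can be sketched as a tiny assembler and disassembler. The function names `encode` and `decode` are hypothetical, and treating NOOP and HALT as operand-less is an assumption based on how HALT appears in the instruction table. The bit layout follows the table: 6 reserved bits, 3 opcode bits, 1 number bit, 6 operand bits.

```python
# Opcode values taken from the opcode table.
OPCODES = {"NOOP": 0b000, "LOAD": 0b001, "STORE": 0b010, "ADD": 0b011,
           "SUB": 0b100, "EQUAL": 0b101, "JUMP": 0b110, "HALT": 0b111}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(line):
    """Translate one assembler line such as 'LOAD #2' into a 16-bit word."""
    parts = line.split()
    opcode = OPCODES[parts[0]]
    num, operand = 0, 0
    if len(parts) > 1:
        text = parts[1]
        if text.startswith("#"):  # '#' marks an immediate value
            num, text = 1, text[1:]
        operand = int(text)
    if not 0 <= operand < 64:
        raise ValueError("operand must fit into 6 bits")
    # The 6 reserved bits stay zero; opcode sits in bits 9-7, num in bit 6.
    return (opcode << 7) | (num << 6) | operand


def decode(word):
    """Translate a 16-bit word back into an assembler line."""
    name = MNEMONICS[(word >> 7) & 0b111]
    num = (word >> 6) & 0b1
    operand = word & 0b111111
    if name in ("NOOP", "HALT"):  # assumption: these take no operand
        return name
    return f"{name} {'#' if num else ''}{operand}"


word = encode("LOAD #2")
print(format(word, "016b"))  # 0000000011000010
print(decode(word))          # LOAD #2
```

The printed bit string matches the first row of the instruction table when its four fields are written back to back.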
Addressing commands
With byte-wise addressing, every byte gets its own address, so each 16-bit command occupies two bytes. Opcode, number bit, and operand together use only 10 bits; the 6 reserved bits act as padding that fills the command up to two full bytes. As a result, a new command starts at every second address.
In practice, the start addresses of the commands look like this:
- 0000
- 0010
- 0100
- 0110
- ...
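The address pattern in the list above follows from multiplying the command index by the command size. A minimal sketch, assuming two bytes per command and 4-bit addresses as shown in the list (the names `COMMAND_SIZE_BYTES` and `command_address` are chosen for illustration):

```python
COMMAND_SIZE_BYTES = 2  # each command occupies two bytes


def command_address(index):
    """Return the byte address at which the command with this index starts."""
    return index * COMMAND_SIZE_BYTES


for index in range(4):
    print(format(command_address(index), "04b"))
# 0000
# 0010
# 0100
# 0110
```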
Assembling for RISC and CISC
To illustrate how converting assembly works, we first need to understand the differences between both architectures.
- RISC: Uses simpler, fixed-length instructions; faster execution per instruction, but may need more instructions.
- CISC: Uses complex, variable-length instructions; can do more in a single instruction, but may be slower per instruction.
These differences have direct consequences for the assembler. It can encode RISC instructions into fixed-length (often 32-bit) binary instructions with a very predictable format. For CISC, the assembler has to encode more complex instructions, so the resulting binary contains instructions of varying length, often around 3 to 6 bytes depending on the specifics (x86 instructions, for example, can be anywhere from 1 to 15 bytes long).
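The practical effect of fixed versus variable instruction lengths shows up when a byte stream has to be split back into instructions. The sketch below is a toy illustration only: the length-prefix format used in `split_variable` is made up for this example and is not a real CISC encoding. Real variable-length encodings derive the instruction length from the opcode and addressing-mode bytes instead.

```python
def split_fixed(code, width=4):
    """Split a byte string into instructions of a fixed width (in bytes)."""
    if len(code) % width != 0:
        raise ValueError("code length is not a multiple of the instruction width")
    return [code[i:i + width] for i in range(0, len(code), width)]


def split_variable(code):
    """Split a byte string in which each instruction starts with its total length.

    Hypothetical toy format: the first byte of every instruction states how
    many bytes the whole instruction occupies, including that first byte.
    """
    instructions = []
    i = 0
    while i < len(code):
        length = code[i]
        if length == 0 or i + length > len(code):
            raise ValueError("malformed instruction stream")
        instructions.append(code[i:i + length])
        i += length
    return instructions


fixed = split_fixed(bytes(range(8)))
variable = split_variable(bytes([3, 0xAA, 0xBB, 2, 0xCC, 4, 1, 2, 3]))
print([len(ins) for ins in fixed])     # [4, 4]
print([len(ins) for ins in variable])  # [3, 2, 4]
```

With fixed-length instructions, the start of the n-th instruction can be computed directly. With variable lengths, every preceding instruction has to be inspected first.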