development-programming-assembly.html

* created: 2025-05-20T16:08
* modified: 2025-07-31T22:29

title

Assembly

description

Assembly represents machine-level instructions in a human-readable form. It is commonly used as a foundation for higher-level languages. Writing assembly requires knowledge of the processor's instruction set.

Assembly

An assembly language acts as a middleman between higher-level languages like Rust or C and machine code. The higher-level language is first compiled into assembly, which an assembler then translates into binary.

Assemblers and their respective assembly languages are hardware-specific: different CPU architectures use different instruction sets, so you need to write different assembly for each of them.

Practical example

The following code represents a program that adds two numbers together:

LDA 100
ADA 101
STA 102

LDA 100: LoaD Accumulator 100 => loads the value stored at memory address 100 into the accumulator.

ADA 101: ADd Accumulator 101 => adds the value stored at memory address 101 to the value in the accumulator.

STA 102: STore Accumulator 102 => writes the resulting value from the accumulator into the memory address 102.

During execution, the control unit drives every instruction through the same cycle: fetch, decode, fetch operands, execute, and write back. The cycle runs once for each individual instruction.
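The three-instruction program and the cycle described above can be sketched as a tiny simulator. This is a minimal illustration, not a model of a real CPU: the function name `run`, the dictionary used as memory, and the example values 2 and 3 are assumptions made for the sketch.

```python
def run(program, memory):
    """Execute (mnemonic, address) pairs on a machine with a single accumulator."""
    accumulator = 0
    program_counter = 0
    while program_counter < len(program):
        # Fetch: read the next instruction and advance the program counter.
        instruction = program[program_counter]
        program_counter += 1
        # Decode: split the instruction into operation and operand address.
        mnemonic, address = instruction
        # Fetch operands: LDA and ADA read a value from memory.
        operand = memory[address] if mnemonic in ("LDA", "ADA") else None
        # Execute and write back.
        if mnemonic == "LDA":
            accumulator = operand
        elif mnemonic == "ADA":
            accumulator += operand
        elif mnemonic == "STA":
            memory[address] = accumulator
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
    return memory


# Example values at addresses 100 and 101 are chosen only for illustration.
memory = {100: 2, 101: 3, 102: 0}
program = [("LDA", 100), ("ADA", 101), ("STA", 102)]
run(program, memory)
print(memory[102])  # prints 5
```

Each loop iteration corresponds to one pass through the cycle, so the three instructions take three passes.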

Decoding Machine Code

To represent machine code we need to implement some sort of encoding. A simple way of encoding could look like this:

Instruction Table

Opcode	Num	Operand	Assembler
001	1	000010	LOAD #2
010	0	001101	STORE 13
001	1	000101	LOAD #5
010	0	001110	STORE 14
001	0	001101	LOAD 13
011	0	001110	ADD 14
010	0	001111	STORE 15
111	0	000000	HALT

Opcode Encoding

Code	Instruction
000	NOOP
001	LOAD
010	STORE
011	ADD
100	SUB
101	EQUAL
110	JUMP
111	HALT

Number Bit

Num	Meaning
0	Operand is memory address
1	Operand is an immediate value (#)
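To make the encoding concrete, the following sketch decodes the 10-bit words from the instruction table (3-bit opcode, 1-bit number flag, 6-bit operand) and runs them. The helper names `decode`, `disassemble`, and `run`, the 64-cell memory, and the single accumulator are assumptions for illustration. Only the instructions that appear in the table are executed; SUB, EQUAL, and JUMP are left out because their exact behaviour is not specified here.

```python
OPCODES = {0b000: "NOOP", 0b001: "LOAD", 0b010: "STORE", 0b011: "ADD",
           0b100: "SUB", 0b101: "EQUAL", 0b110: "JUMP", 0b111: "HALT"}

# The program from the instruction table, written as opcode / num / operand.
PROGRAM = ["001 1 000010", "010 0 001101", "001 1 000101", "010 0 001110",
           "001 0 001101", "011 0 001110", "010 0 001111", "111 0 000000"]


def decode(word):
    """Split a 10-bit word into (mnemonic, is_immediate, operand)."""
    opcode = (word >> 7) & 0b111           # top 3 bits
    is_immediate = bool((word >> 6) & 1)   # number bit
    operand = word & 0b111111              # low 6 bits
    return OPCODES[opcode], is_immediate, operand


def disassemble(word):
    """Render a word in the notation of the 'Assembler' column."""
    mnemonic, is_immediate, operand = decode(word)
    if mnemonic in ("NOOP", "HALT"):
        return mnemonic
    return f"{mnemonic} {'#' if is_immediate else ''}{operand}"


def run(words):
    """Run a jump-free program and return the memory afterwards."""
    memory = [0] * 64   # 6-bit operands can address 64 cells
    accumulator = 0
    for word in words:
        mnemonic, is_immediate, operand = decode(word)
        value = operand if is_immediate else memory[operand]
        if mnemonic == "LOAD":
            accumulator = value
        elif mnemonic == "ADD":
            accumulator += value
        elif mnemonic == "STORE":
            memory[operand] = accumulator
        elif mnemonic == "HALT":
            break
        elif mnemonic != "NOOP":
            raise NotImplementedError(mnemonic)
    return memory


words = [int(line.replace(" ", ""), 2) for line in PROGRAM]
print([disassemble(w) for w in words])
print(run(words)[15])  # prints 7, the sum of 2 and 5
```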

Addressing commands

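With byte-wise addressing, every byte gets its own address. A command does not fit into a single byte, so each command occupies two bytes (16 bits). The commands in the instruction table are 10 bits long (3-bit opcode, 1-bit number flag, 6-bit operand), so 6 bits of padding are added after every command, and a new command starts at every second byte address.

In practice, the start addresses of consecutive commands (in binary) look like this: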

0000
0010
0100
0110
...
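The padding and the resulting start addresses can be sketched in a few lines. The sketch assumes the 10-bit command layout of the instruction table and big-endian byte order. The names `pad_command` and `command_address` are hypothetical helpers, and `command_bits` is a parameter so that a different command length can be tried.

```python
SLOT_BITS = 16  # two bytes are reserved per command


def pad_command(word, command_bits=10):
    """Append zero padding after the command so it fills a 16-bit slot."""
    padding_bits = SLOT_BITS - command_bits
    return (word << padding_bits).to_bytes(SLOT_BITS // 8, "big")


def command_address(index):
    """Byte address at which the command with the given index starts."""
    return index * (SLOT_BITS // 8)


# LOAD #2 from the instruction table: 001 1 000010
print(pad_command(0b0011000010).hex())  # prints 3080

for index in range(4):
    print(format(command_address(index), "04b"))  # 0000, 0010, 0100, 0110
```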

Assembling for RISC and CISC

To illustrate how assembly is converted into machine code, we first need to understand the differences between the two architecture families.

RISC: Uses simpler, fixed-length instructions; faster execution per instruction, but may need more instructions.
CISC: Uses complex, variable-length instructions; can do more in a single instruction, but may be slower per instruction.

These differences have direct consequences for the assembler. RISC instructions can be encoded as fixed-length binary words (often 32 bits) in a very predictable format. For CISC, the assembler has to encode more complex instructions, and the resulting binary instructions vary in length depending on the operation and its operands; on x86, for example, a single instruction can be anywhere from 1 to 15 bytes long.
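The contrast between fixed and variable instruction lengths can be shown with two real instruction sets. RISC-V (RISC) and x86 (CISC) are chosen here only as examples, and the function names are hypothetical. The sketch hand-encodes a few well-known instructions and is not a general assembler.

```python
import struct


def encode_riscv_addi(rd, rs1, imm):
    """Encode RISC-V `addi rd, rs1, imm` (I-type format) as 4 bytes."""
    word = ((imm & 0xFFF) << 20   # 12-bit immediate
            | rs1 << 15           # source register
            | 0b000 << 12         # funct3 for addi
            | rd << 7             # destination register
            | 0b0010011)          # opcode OP-IMM
    return struct.pack("<I", word)


def encode_x86_mov_eax(imm):
    """Encode x86 `mov eax, imm32`: opcode B8 plus a 4-byte immediate."""
    return bytes([0xB8]) + struct.pack("<I", imm)


X86_NOP = bytes([0x90])  # `nop` is a single byte
X86_RET = bytes([0xC3])  # near `ret` is a single byte

riscv = [encode_riscv_addi(1, 0, 5), encode_riscv_addi(2, 1, -1)]
x86 = [X86_NOP, encode_x86_mov_eax(1), X86_RET]

print([len(i) for i in riscv])  # prints [4, 4]
print([len(i) for i in x86])    # prints [1, 5, 1]
```

Because every RISC-V word here has the same size, the start of instruction n can be computed directly, while the x86 byte stream has to be decoded from the beginning to find instruction boundaries.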