development-programming-assembly.html


* created: 2025-05-20T16:08
* modified: 2025-07-31T22:29

title

Assembly

description

Assembly represents machine-level instructions in a human-readable form. It is commonly used as a foundation for higher-level languages. Writing assembly requires knowledge of the processor's instruction set.

Assembly

An assembly language acts as a middleman between higher-level languages like Rust or C and machine code. The compiler first translates the higher-level language into assembly, which an assembler then translates into binary.

Assemblers and their respective assembly languages are hardware-specific: you need to write different assembly for different CPU architectures because they use different instruction sets.

Practical example

The following code represents a program that adds two numbers together:

LDA 100
ADA 101
STA 102

LDA 100: LoaD Accumulator 100 => loads the value stored at memory address 100 into the accumulator.

ADA 101: ADd Accumulator 101 => adds the value stored at memory address 101 to the value in the accumulator.

STA 102: STore Accumulator 102 => writes the resulting value from the accumulator to memory address 102.

During execution, the control unit drives a cycle of fetch, decode, operand fetch, execute and write back. This cycle repeats for every individual instruction.
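The cycle described above can be sketched as a small interpreter for the three-instruction program. This is only an illustration: the mnemonics come from the example, while the `run` function, the dictionary used as memory and the initial values at addresses 100 and 101 are assumptions made for the sketch.

```python
# Toy accumulator machine for the LDA / ADA / STA example.
# The memory contents (2 and 5) are made-up sample values.

def run(program, memory):
    """Execute a list of 'MNEMONIC address' lines and return the memory."""
    accumulator = 0
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        line = program[pc]                  # fetch
        pc += 1
        mnemonic, operand = line.split()    # decode
        address = int(operand)
        if mnemonic == "LDA":               # fetch operand, then execute
            accumulator = memory[address]
        elif mnemonic == "ADA":
            accumulator += memory[address]
        elif mnemonic == "STA":             # write back to memory
            memory[address] = accumulator
        else:
            raise ValueError(f"unknown mnemonic: {mnemonic}")
    return memory


memory = {100: 2, 101: 5, 102: 0}
run(["LDA 100", "ADA 101", "STA 102"], memory)
print(memory[102])  # 7
```

A real control unit performs these stages in hardware; the loop only mirrors their order.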

Decoding Machine Code

To represent machine code we need to implement some sort of encoding. A simple way of encoding could look like this:

Instruction Table

Reserved  Opcode  Num  Operand  Assembler
000000    001     1    000010   LOAD #2
000000    010     0    001101   STORE 13
000000    001     1    000101   LOAD #5
000000    010     0    001110   STORE 14
000000    001     0    001101   LOAD 13
000000    011     0    001110   ADD 14
000000    010     0    001111   STORE 15
000000    111     0    000000   HALT

Opcode Encoding

Code Instruction
000 NOOP
001 LOAD
010 STORE
011 ADD
100 SUB
101 EQUAL
110 JUMP
111 HALT

Number Bit

Num Meaning
0 Operand is memory address
1 Operand is an immediate value (#)
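The three tables can be combined into a small encoder and decoder. The following is a minimal sketch, assuming the bit layout shown in the instruction table (6 reserved bits, 3-bit opcode, 1 number bit, 6-bit operand). The function names `encode` and `decode` are hypothetical, and treating NOOP and HALT as operand-less is an assumption.

```python
OPCODES = {"NOOP": 0b000, "LOAD": 0b001, "STORE": 0b010, "ADD": 0b011,
           "SUB": 0b100, "EQUAL": 0b101, "JUMP": 0b110, "HALT": 0b111}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(line):
    """Assemble one line such as 'LOAD #2' into a 16-bit word."""
    parts = line.split()
    opcode = OPCODES[parts[0]]
    num, operand = 0, 0
    if len(parts) > 1:
        text = parts[1]
        if text.startswith("#"):        # '#' marks an immediate value
            num, text = 1, text[1:]
        operand = int(text)
        if not 0 <= operand < 64:
            raise ValueError("operand must fit into 6 bits")
    # The 6 reserved bits stay zero, so they need no explicit handling.
    return (opcode << 7) | (num << 6) | operand


def decode(word):
    """Turn a 16-bit word back into its assembler text."""
    opcode = (word >> 7) & 0b111
    num = (word >> 6) & 0b1
    operand = word & 0b111111
    name = MNEMONICS[opcode]
    if name in ("NOOP", "HALT"):        # assumed to take no operand
        return name
    return f"{name} {'#' if num else ''}{operand}"


print(format(encode("LOAD #2"), "016b"))  # 0000000011000010
print(decode(0b0000000100001101))         # STORE 13
```

Both printed values match the first two rows of the instruction table.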

Addressing commands

With byte-wise addressing, every byte gets its own address, so each command takes up two bytes (16 bits). The bits a command does not need are filled with padding (the reserved bits in the instruction table), which means that a new command starts at every second address.

In practice, the start addresses of consecutive commands would look like this:

  1. 0000
  2. 0010
  3. 0100
  4. 0110
  5. ...
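To make the two-byte stepping concrete, the sketch below places the eight commands from the instruction table into byte-addressed memory and runs them. It rests on some assumptions: the commands are stored big-endian, data lives in a separate memory (otherwise the stores to addresses 13 to 15 would overwrite the program itself), and only LOAD, STORE, ADD and HALT are implemented. The names `load_program` and `run` are hypothetical.

```python
# The eight 16-bit commands from the instruction table, as bit strings.
PROGRAM = [
    "0000000011000010",  # LOAD #2
    "0000000100001101",  # STORE 13
    "0000000011000101",  # LOAD #5
    "0000000100001110",  # STORE 14
    "0000000010001101",  # LOAD 13
    "0000000110001110",  # ADD 14
    "0000000100001111",  # STORE 15
    "0000001110000000",  # HALT
]


def load_program(bit_strings):
    """Place each command into byte-addressed memory, two bytes per command."""
    code = bytearray()
    for bits in bit_strings:
        code += int(bits, 2).to_bytes(2, "big")
    return code


def run(code):
    """Execute the program; return the data memory and the fetch addresses."""
    data = [0] * 64      # separate data memory (assumption), 6-bit addresses
    accumulator = 0
    pc = 0
    fetched_at = []
    while True:
        fetched_at.append(pc)
        word = int.from_bytes(code[pc:pc + 2], "big")  # fetch two bytes
        pc += 2          # the next command starts two addresses later
        opcode = (word >> 7) & 0b111
        immediate = (word >> 6) & 0b1
        operand = word & 0b111111
        if opcode == 0b111:      # HALT
            break
        elif opcode == 0b001:    # LOAD
            accumulator = operand if immediate else data[operand]
        elif opcode == 0b010:    # STORE
            data[operand] = accumulator
        elif opcode == 0b011:    # ADD
            accumulator += operand if immediate else data[operand]
        else:
            raise NotImplementedError(f"opcode {opcode:03b} is not part of this sketch")
    return data, fetched_at


data, fetched_at = run(load_program(PROGRAM))
print(data[15])                                   # 7
print([format(a, "04b") for a in fetched_at[:4]])  # ['0000', '0010', '0100', '0110']
```

The second line of output reproduces the start addresses listed above.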

Assembling for RISC and CISC

To illustrate how assembly is converted into machine code, we first need to understand the differences between the two architectures.

Out of these differences, the following properties emerge. The assembler can encode RISC instructions as fixed-length (often 32-bit) binary instructions in a very predictable format. For CISC, the assembler has to encode more complex instructions, so the resulting binary contains variable-length instructions, for example 3 to 6 bytes long depending on the specifics (x86 instructions can range from 1 to 15 bytes).
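The practical consequence of fixed versus variable instruction length can be sketched by splitting a byte stream into instructions. Both formats below are toy formats invented for illustration: `split_fixed` assumes 4-byte instructions, and `split_variable` assumes that the first byte of every instruction states its total length. Real CISC instruction sets typically derive the length from the opcode and its prefixes rather than from a length byte.

```python
def split_fixed(stream, width=4):
    """Split a byte stream into fixed-width instructions (RISC-style)."""
    if len(stream) % width != 0:
        raise ValueError("stream length is not a multiple of the instruction width")
    # Every instruction boundary is known in advance: 0, width, 2 * width, ...
    return [stream[i:i + width] for i in range(0, len(stream), width)]


def split_variable(stream):
    """Split a stream whose instructions start with a length byte (toy CISC-style format)."""
    instructions = []
    i = 0
    while i < len(stream):
        length = stream[i]  # the boundary is only known after reading the instruction
        if length == 0 or i + length > len(stream):
            raise ValueError("malformed instruction stream")
        instructions.append(stream[i:i + length])
        i += length
    return instructions


fixed = split_fixed(bytes(range(8)))
variable = split_variable(bytes([3, 0xAA, 0xBB, 6, 1, 2, 3, 4, 5, 4, 9, 9, 9]))
print([len(ins) for ins in fixed])     # [4, 4]
print([len(ins) for ins in variable])  # [3, 6, 4]
```

With a fixed width, any instruction can be located by multiplying its index by the width. With variable lengths, the stream has to be walked from the start.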