Table of Contents

very rough notes for my own reference.

Branch Prediction

  • Branch prediction guesses the outcome of a branch before it is resolved, so the pipeline can keep fetching instead of stalling.
  • Static prediction: always guess the same way (e.g. predict taken, or predict backward branches taken).
  • Dynamic prediction
  • Saturating counter (small per-branch state machine, e.g. 2-bit)
  • Neural (latest!)

  • Speculative execution: run the predicted path, and discard the work on a misprediction.
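A minimal sketch of the saturating-counter idea, assuming the classic 2-bit scheme (the class name and the branch pattern are made up):

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken."""

    def __init__(self):
        self.state = 0  # start strongly not-taken

    def predict(self):
        return self.state >= 2  # True = predict taken

    def update(self, taken):
        # saturate at 0 and 3, so one odd outcome doesn't flip the prediction
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)
```

For a loop branch (taken, taken, taken, not-taken, repeated), the counter warms up and then mispredicts only the single not-taken iteration at each loop exit.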

Semaphore

  • A semaphore is a variable that can be accessed only through two standard atomic operations: wait and signal.
  • wait: If the semaphore is positive, decrement it. Otherwise, block.
  • signal: Increment the semaphore. If there are blocked processes, wake one of them up.
  • One of the simplest ways to solve the critical-section problem.
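A sketch of wait/signal built on a condition variable (the class and names are illustrative; a real OS semaphore lives in the kernel, and Python already ships `threading.Semaphore`):

```python
import threading

class Semaphore:
    def __init__(self, value=1):
        self.value = value
        self.cond = threading.Condition()

    def wait(self):
        with self.cond:
            while self.value == 0:   # block while the semaphore is not positive
                self.cond.wait()
            self.value -= 1

    def signal(self):
        with self.cond:
            self.value += 1
            self.cond.notify()       # wake one blocked waiter, if any
```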

Memory

  • DRAM: Slow, cheap, high capacity.
  • SRAM: Fast, expensive, low capacity.

  • How to make slow memory appear fast? Caching!
  • How to make small memory appear large? Virtual memory!

Cache

Idea: Use SRAM as a cache for DRAM.

  • Temporal locality: If a memory location is accessed, it is likely to be accessed again soon.
  • Spatial locality: If a memory location is accessed, nearby memory locations are likely to be accessed soon.

Some terminology:

  • Cache hit: The data is in the cache.
  • Cache miss: The data is not in the cache.
  • Hit rate: The fraction of memory accesses that are hits.
  • Hit time: Time to access the cache.
  • Miss rate: 1 - hit rate.
  • Miss penalty: Time to fetch data from DRAM + hit time.
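These quantities combine into the average memory access time (AMAT). A worked example with made-up latencies, using the definition of miss penalty above (DRAM fetch time + hit time):

```python
hit_time = 1                          # cycles to probe the cache (assumed)
dram_time = 100                       # cycles to fetch a block from DRAM (assumed)
miss_rate = 0.05
miss_penalty = dram_time + hit_time   # per the definition above

# hits pay hit_time; misses pay the full penalty
amat = (1 - miss_rate) * hit_time + miss_rate * miss_penalty
print(amat)  # 6.0 cycles
```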

Cache Organization

  • Cache line / block: The smallest unit of data that can be stored in the cache.
  • A block can store multiple words.
  • Address, let’s say 0x19ac80c2, can be split into
    • Address of a word: [tag][index][offset]
    • Address of a block: [tag][offset]
    • Tag: 0x19ac8
    • Index: 0x0c
    • Offset: 0x2
  • A cache will store: [tag][valid bit][dirty bit][data (maybe 4 words)]
    • Valid bit: Whether the cache line is valid.
    • Dirty bit: Whether the cache line has been modified (used by write-back caches).
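The 0x19ac80c2 split above corresponds to a 4-bit offset and an 8-bit index (field widths assumed from how the hex digits divide); a sketch:

```python
def split_address(addr, offset_bits=4, index_bits=8):
    """Split a 32-bit address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

tag, index, offset = split_address(0x19AC80C2)
# tag = 0x19AC8, index = 0x0C, offset = 0x2
```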

Cache Write Policy

On a write hit:

  • Write-through: Write to both cache and memory.
  • Write-back: Write to cache only, and write to memory only when the cache line is evicted.

On a write miss:

  • Write-allocate: Fetch the block from memory and write to cache.
  • Write-around: Write to memory only, and do not fetch the block.

Write-through problems

  • Slow, because we have to write to memory.
  • Solution: Write buffer. Write to buffer, and write to memory later.

Write-back problems

  • Wasteful to write to memory if the cache line is evicted without being modified.
  • Solution: Dirty bit. Only write to memory if the dirty bit is set.

Memory Abstraction

Say programs A and B both use address 0x100 to store their variable i. If addresses were physical, A and B would collide on the same memory location. That cannot work, so we need an abstraction.

  • We can use Base + Limit registers.
  • Base: The starting address of the program.
  • Limit: The size of the program.
  • Real Address = Base + Virtual Address, and assert(Real Address < Base + Limit)
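A sketch of the base + limit check (names are made up; in hardware the MMU does this on every access):

```python
def translate(virtual_addr, base, limit):
    """Relocate a virtual address; trap if it falls outside the program."""
    # assert(real < base + limit) is equivalent to virtual_addr < limit
    if virtual_addr >= limit:
        raise MemoryError("address beyond limit")
    return base + virtual_addr

# translate(0x10, base=0x1000, limit=0x100) -> 0x1010
```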

Partitioning

  • Fixed partitioning: Divide memory into fixed-size partitions.
    • Con: Partitions need to be large enough to hold the largest process.
    • Con: Internal fragmentation (a smaller process wastes the rest of its partition).
  • Dynamic partitioning: Divide memory into variable-size partitions.
    • Free space is known as holes.
    • External fragmentation.

Allocation Algorithms (Assume Memory is Contiguous)

  • First-fit: Allocate the first hole that is big enough.
  • Best-fit: Allocate the smallest hole that is big enough.
  • Worst-fit: Allocate the largest hole.
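The three policies above, sketched over a list of (start, length) holes (the representation is made up for illustration):

```python
def first_fit(holes, size):
    """holes: list of (start, length). Return index of chosen hole, or None."""
    for i, (start, length) in enumerate(holes):
        if length >= size:
            return i
    return None

def best_fit(holes, size):
    fits = [i for i, (_, length) in enumerate(holes) if length >= size]
    if not fits:
        return None
    return min(fits, key=lambda i: holes[i][1])  # smallest hole that fits

def worst_fit(holes, size):
    fits = [i for i, (_, length) in enumerate(holes) if length >= size]
    if not fits:
        return None
    return max(fits, key=lambda i: holes[i][1])  # largest hole
```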

Buddy System

Kind of like a segment tree: repeatedly split blocks in half (all sizes powers of two); on free, merge a block with its "buddy" when both are free.
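A toy buddy allocator, assuming block sizes are powers of two and a free list per order (all names made up; real allocators track much more state):

```python
class BuddyAllocator:
    """Free lists per order; split on alloc, merge with the buddy on free."""

    def __init__(self, total_order):
        self.total_order = total_order
        self.free = {k: set() for k in range(total_order + 1)}
        self.free[total_order].add(0)   # one big block covering everything

    def alloc(self, order):
        """Allocate a block of 2**order units; return its address or None."""
        k = order
        while k <= self.total_order and not self.free[k]:
            k += 1                       # find the smallest block big enough
        if k > self.total_order:
            return None
        addr = self.free[k].pop()
        while k > order:                 # split down to the requested size
            k -= 1
            self.free[k].add(addr + (1 << k))  # right half goes on a free list
        return addr

    def free_block(self, addr, order):
        while order < self.total_order:
            buddy = addr ^ (1 << order)  # buddy address differs in one bit
            if buddy not in self.free[order]:
                break
            self.free[order].remove(buddy)
            addr = min(addr, buddy)      # merged block starts at the lower half
            order += 1
        self.free[order].add(addr)
```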

Disjoint Memory Schemes

Paging Scheme

  • Divide logical memory (the process's address space) into fixed-size pages (keep the size a power of 2).
  • Divide physical memory into fixed-size frames (same size as a page).
  • Map pages to frames via a lookup table (the page table).
  • Physical Address = Frame Number * Page Size + Offset
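A sketch of that translation with a toy page table (the dict and the 4 KiB page size are assumptions for illustration):

```python
PAGE_SIZE = 4096                     # assumed 4 KiB pages
page_table = {0: 5, 1: 2, 2: 7}      # toy page table: page number -> frame number

def translate(vaddr):
    page, offset = divmod(vaddr, PAGE_SIZE)
    frame = page_table[page]         # a missing entry would be a page fault
    return frame * PAGE_SIZE + offset

# translate(0x1004): page 1 -> frame 2, so 2 * 4096 + 4 = 0x2004
```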

Observations

  • External fragmentation is eliminated.
  • Internal fragmentation is minimal (at most one page for each process).

Implementation

  • Pure software, issue: requires two memory accesses for every memory reference (one for the page table entry, one for the data).
  • Hardware Support! Translation Look-aside Buffer (TLB): A cache for page table entries.

Protection

  • Paging can include:
    • Access rights: Read, write, execute.

Scheduling Algorithms

Round-robin

  • Each process gets a fixed amount of time to run.
  • If the process finishes before its time quantum expires, the scheduler simply moves on to the next process.
  • If it does not finish in time, it is preempted and put at the back of the queue.
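A small round-robin simulation (the dict-of-burst-times representation is made up; it returns the order in which processes complete):

```python
from collections import deque

def round_robin(bursts, quantum):
    """bursts: {pid: remaining CPU time}. Returns the completion order."""
    queue = deque(bursts)
    remaining = dict(bursts)
    order = []
    while queue:
        pid = queue.popleft()
        run = min(quantum, remaining[pid])
        remaining[pid] -= run
        if remaining[pid] == 0:
            order.append(pid)        # finished within its quantum
        else:
            queue.append(pid)        # preempted: back of the queue
    return order
```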

Priority

  • Preemptive: Higher priority processes are run first.
  • Non-preemptive: Once a process is running, it will run until it finishes.

  • Shortcomings: Starvation.
    • Solution 1: Aging. Increase the priority of a process as it waits.
    • Solution 2: Minimum time quantum. Ensure low-priority processes get to run for a minimum amount of time.

Multi-Level Feedback Queue (MLFQ)

  • Multiple queues with different priorities.
  • A new job enters the highest-priority queue.
  • If a process uses up its full time quantum, demote it one level.
  • If a process blocks (e.g. for I/O) before its quantum expires, keep it at the same level.

Lottery

  • Each process gets a number of tickets.
  • A random ticket is drawn, and the process with the winning ticket gets to run.
  • The more tickets a process has, the more likely it is to win.
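A sketch of one lottery draw (the ticket dict is illustrative):

```python
import random

def lottery_pick(tickets, rng=random):
    """tickets: {pid: ticket count}. Draw one winning ticket uniformly."""
    total = sum(tickets.values())
    draw = rng.randrange(total)          # the winning ticket number
    for pid, count in tickets.items():
        if draw < count:
            return pid
        draw -= count                    # skip past this process's tickets
```

Over many draws, a process holding 3 of 4 tickets should win roughly 75% of the time.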

Threads

  • A thread is a sequence of instructions that can be executed independently.
  • A process can have multiple threads.
  • Threads share the same address space.