# MMU - Memory Management Unit

## Overview

The MMU (Memory Management Unit) is a module responsible for virtual memory management, address translation, and L1 cache control.
It implements the page-based virtual memory system defined by RISC-V Sv32 (32-bit supervisor virtual memory).

The module hierarchy is as follows:

```
mmu (mmu/mmu.v)
├── dtlb (mmu/dtlb.v)                      # Data TLB
├── itlb (mmu/itlb.v)                      # Instruction TLB
├── l1_dcache (cache/l1_dcache.v)          # L1 data cache
├── l1_icache (cache/l1_icache.v)          # L1 instruction cache
└── ptw (mmu/ptw.v)                        # Page table walker
```

## Key Features

- **Address Translation**
- **Address Protection**
- **L1 Cache Control**
- **Lower Memory Access**
- **Store-Conditional (SC) Instruction Support**


## MMU (mmu/mmu.v)
The MMU processes instruction requests and data read/write requests in parallel.
It is controlled by the following three state machines:
- Instruction memory access
- Data memory access
- Lower memory access

The general flow of each state machine is as follows.

Instruction (I_CTRL):
1. Receive an instruction fetch request from the CPU.
2. Translate the virtual address to a physical address in the ITLB. On a miss, the PTW fetches the PTE from lower memory. Check the permission information.
3. If a page fault or access fault exception is detected, return the exception information to the CPU.
4. Check for a hit in the L1 I-Cache. On a miss, fetch the line from lower memory.
5. If an access fault exception is detected, return the exception information to the CPU.
6. Return the instruction to the CPU.

Data (D_CTRL):
1. Receive a load/store request from the CPU.
2. Translate the virtual address to a physical address in the DTLB. On a miss, the PTW fetches the PTE from lower memory. Check the permission information.
3. If a page fault or access fault exception is detected, return the exception information to the CPU.
4. For Store-Conditional instructions, compare the address with the reservation address. Execute the store only if the condition is met; otherwise return the failure to the CPU.
5. For peripheral accesses, send the access request to lower memory and wait for the response.
6. For all other accesses, check for a hit in the L1 D-Cache. On a miss, load the line from lower memory, then execute the load/store.
7. Return the data to the CPU.
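
Step 4 of the data flow can be illustrated with a small behavioral model. This is only a sketch in Python, not the Verilog in `mmu/mmu.v`: the class and method names are hypothetical, and it assumes a single reservation that every SC clears, which is what the RISC-V specification requires but is not confirmed here for this implementation.

```python
class ReservationSet:
    """Behavioral model of one LR/SC reservation (hypothetical names)."""

    def __init__(self):
        self.valid = False
        self.addr = 0

    def load_reserved(self, paddr):
        # LR registers a reservation on the accessed physical address.
        self.valid = True
        self.addr = paddr

    def store_conditional(self, paddr, memory, value):
        # SC performs the store only if a reservation is held for this address.
        success = self.valid and self.addr == paddr
        if success:
            memory[paddr] = value
        # Assumption: the reservation is dropped by every SC, pass or fail.
        self.valid = False
        # RISC-V convention: the result is 0 on success and nonzero on failure.
        return 0 if success else 1


memory = {}
reservation = ReservationSet()
reservation.load_reserved(0x1000)
first = reservation.store_conditional(0x1000, memory, 42)   # reservation held
second = reservation.store_conditional(0x1000, memory, 7)   # reservation gone
```

In this model the first SC stores the value and the second one fails without touching memory, because the reservation was consumed.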

Lower Memory (M_CTRL):
1. Receive PTE fetch requests from the PTW and miss requests from the L1 I-Cache and D-Cache. If several requests are pending, serve them in priority order.
2. Before accessing lower memory, check whether the target address is present in the L1 D-Cache and whether its Dirty bit is set. If the line is dirty, write it back to lower memory first.
3. Access lower memory and obtain the data.
4. Notify the requester of completion, including exception information if an exception occurred.
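
The arbitration and dirty-line check in steps 1 and 2 might look roughly like the following Python sketch. The priority order `("ptw", "dcache", "icache")` is purely an assumption for illustration, since the text above does not state the actual order, and the function names and dictionary-based memory are hypothetical.

```python
# Hypothetical priority order; the real order is defined in mmu/mmu.v.
PRIORITY = ("ptw", "dcache", "icache")


def pick_request(pending):
    """Return the highest-priority source in `pending`, or None if it is empty."""
    for source in PRIORITY:
        if source in pending:
            return source
    return None


def read_lower_memory(addr, dirty_lines, memory):
    """Write back a dirty L1 D-Cache copy of `addr`, then read lower memory.

    `dirty_lines` maps address -> data for lines that are dirty in the D-Cache.
    """
    if addr in dirty_lines:
        # The dirty line is written back first, so the read sees current data.
        memory[addr] = dirty_lines.pop(addr)
    return memory.get(addr, 0)
```

Writing the dirty line back before the read is what keeps a PTE fetch or an instruction fetch from observing stale data in lower memory.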

## TLB (mmu/itlb.v, mmu/dtlb.v)
The TLB (Translation Lookaside Buffer) is a cache that accelerates translation from virtual addresses to physical addresses.
There are two TLBs: the ITLB (Instruction TLB) and the DTLB (Data TLB).

TLBs generally use set-associative or fully associative organizations, because the locality of accessed pages is low and conflict misses are likely.
This design instead uses a direct-mapped organization, prioritizing low latency and high operating frequency.
To compensate for the lower hit rate, we recommend configuring more entries than usual.
Implementing the TLB data and tags in BRAM provides many entries while limiting the increase in hardware resources.

On a miss, the PTW (Page Table Walker) walks the page table, obtains the PTE (Page Table Entry), and writes it to the TLB.
Each TLB entry has the following layout:
```
TLB ENTRY:
┌─────────┬──────────────────────┬──────────────┐
│ Lvl (1) │ Tag (32-INDEX_WIDTH) │ PTE (32)     │
└─────────┴──────────────────────┴──────────────┘
```
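
A direct-mapped lookup over entries of this shape can be sketched in Python as below. The sketch makes several assumptions that are not taken from the source: `INDEX_WIDTH = 6`, indexing by the low bits of the 20-bit virtual page number (so its tag is narrower than the `32-INDEX_WIDTH` field shown above), and `Lvl = 1` meaning a 4 MiB megapage found at the first level. The class name is hypothetical.

```python
INDEX_WIDTH = 6  # hypothetical; the TLB has 2**INDEX_WIDTH entries


class DirectMappedTLB:
    def __init__(self):
        # Each entry is None (invalid) or a tuple (lvl, tag, pte).
        self.entries = [None] * (1 << INDEX_WIDTH)

    @staticmethod
    def _split(vaddr):
        vpn = (vaddr >> 12) & 0xFFFFF  # 20-bit Sv32 virtual page number
        return vpn & ((1 << INDEX_WIDTH) - 1), vpn >> INDEX_WIDTH

    def refill(self, vaddr, lvl, pte):
        # Called after a page table walk; overwrites the entry at this index.
        index, tag = self._split(vaddr)
        self.entries[index] = (lvl, tag, pte)

    def lookup(self, vaddr):
        """Return the physical address on a hit, or None on a miss."""
        index, tag = self._split(vaddr)
        entry = self.entries[index]
        if entry is None or entry[1] != tag:
            return None
        lvl, _, pte = entry
        ppn = pte >> 10  # 22-bit PPN held in PTE bits 31:10
        if lvl == 1:
            # Megapage: PPN[1] supplies the upper bits, VA bits 21:0 pass through.
            return ((ppn >> 10) << 22) | (vaddr & 0x3FFFFF)
        # 4 KiB page: the full PPN is used, VA bits 11:0 pass through.
        return (ppn << 12) | (vaddr & 0xFFF)
```

Because only one entry exists per index, two pages whose page numbers share the same low `INDEX_WIDTH` bits evict each other, which is the conflict-miss cost that a larger entry count is meant to offset.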

## PTW (mmu/ptw.v)
The PTW (Page Table Walker) is a hardware module that walks the page table on a TLB miss and obtains the PTE (Page Table Entry).
Since Sv32 uses a two-level page table, the PTW performs up to two memory accesses per translation.
Lower memory accesses are performed through the MMU's lower memory state machine.
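
The two-level walk follows the Sv32 translation algorithm from the RISC-V privileged specification, which can be sketched in Python as follows. This is a behavioral illustration rather than the logic in `mmu/ptw.v`: `sv32_walk`, `read_pte`, and `PageFault` are hypothetical names, and the privilege, A/D bit, and access fault checks are left out.

```python
PTE_V, PTE_R, PTE_W, PTE_X = 1 << 0, 1 << 1, 1 << 2, 1 << 3
PAGE_SHIFT = 12
PTE_SIZE = 4


class PageFault(Exception):
    pass


def sv32_walk(read_pte, root_ppn, vaddr):
    """Walk an Sv32 page table and return (level, pte) for the leaf entry.

    `read_pte(paddr)` stands in for one lower-memory access.
    """
    vpn = [(vaddr >> 12) & 0x3FF, (vaddr >> 22) & 0x3FF]  # VPN[0], VPN[1]
    table = root_ppn << PAGE_SHIFT
    for level in (1, 0):
        pte = read_pte(table + vpn[level] * PTE_SIZE)
        # Invalid entry, or the reserved encoding W=1 with R=0.
        if not pte & PTE_V or (not pte & PTE_R and pte & PTE_W):
            raise PageFault(hex(vaddr))
        if pte & (PTE_R | PTE_X):  # leaf PTE
            if level == 1 and (pte >> 10) & 0x3FF:
                raise PageFault(hex(vaddr))  # misaligned megapage
            return level, pte
        table = (pte >> 10) << PAGE_SHIFT  # pointer to the next-level table
    raise PageFault(hex(vaddr))  # a pointer PTE at the last level is invalid
```

A leaf found at level 1 is a 4 MiB megapage and costs one access; a 4 KiB page needs both levels and therefore two accesses.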

## L1 Cache (cache/l1_icache.v, cache/l1_dcache.v)
The MMU includes an L1 instruction cache and L1 data cache.
The common parts of the L1 cache are explained first, followed by the differences between each cache.

The L1 cache adopts the PIPT (Physically Indexed, Physically Tagged) method, which uses the physical address for both the cache index and the tag.
Using physical addresses avoids the synonym (alias) problems that arise when virtual addresses are used.
This makes it straightforward to scale the cache to a larger capacity.
It uses direct-mapped organization with a block size of 16 Bytes.
These methods increase hit rate while maintaining high operating frequency and reducing latency.

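In the current implementation, to shorten the critical path, the L1 cache takes the index from the lower 12 bits of the virtual address and the tag from the 22 bits of the physical address obtained from the TLB or PTW.
The lower 12 bits are the page offset, which is identical in the virtual and physical address, so the cache still behaves as PIPT while the index is available before translation completes.
Therefore, making the cache smaller than 8 KiB requires modifying the Verilog HDL code.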
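
The address split can be sketched as below. The sketch assumes the index is taken from bits 11:4 (the page offset minus the 4-bit block offset) and the tag is the 22-bit physical page number of a 34-bit Sv32 physical address; the exact index width used by the RTL depends on the configured cache size and is not reproduced here. `split_address` is a hypothetical helper name.

```python
BLOCK_BITS = 4         # 16-byte blocks
PAGE_OFFSET_BITS = 12  # 4 KiB base pages


def split_address(vaddr, paddr):
    """Return (tag, index, offset) for one L1 cache access."""
    offset = vaddr & ((1 << BLOCK_BITS) - 1)
    # The index comes from the virtual address, so it is known before the TLB answers.
    index = (vaddr & ((1 << PAGE_OFFSET_BITS) - 1)) >> BLOCK_BITS
    # The tag comes from the translated physical address.
    tag = paddr >> PAGE_OFFSET_BITS
    return tag, index, offset


# The page offset is untranslated, so indexing with the physical address
# would select the same line.
va, pa = 0x12345678, 0x80000678
same_line = split_address(va, pa)[1] == split_address(pa, pa)[1]
```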

### L1 Instruction Cache
The L1 instruction cache is updated only when a line is fetched from lower memory on a cache miss.
To maintain consistency, when a store instruction writes to an address that is held in the L1 instruction cache, that entry is invalidated.

```
Meta RAM:
┌──────────┬─────────────────┐
│ valid(1) │ tag(TAG_WIDTH)  │
└──────────┴─────────────────┘

Data RAM:
┌─────────────────────────────┐
│ data (128)                  │
└─────────────────────────────┘
```

### L1 Data Cache
The L1 data cache adopts write-back and write-allocate policies.
When a store instruction hits the cache, it updates the data in the cache and sets the Dirty bit.
On a cache miss, it loads the line from lower memory into the cache and then executes the load or store.

```
Meta RAM:
┌──────────┬───────────┬─────────────────┐
│ valid(1) │ dirty(1)  │ tag(TAG_WIDTH)  │
└──────────┴───────────┴─────────────────┘

Data RAM:
┌─────────────────────────────┐
│ data (128)                  │
└─────────────────────────────┘
```
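
The write-back and write-allocate behavior can be modeled at line granularity as in the Python sketch below. It is a simplified illustration, not the implementation in `cache/l1_dcache.v`: each 16-byte line is represented by a single value, lower memory is a dictionary keyed by line address, and the class name is hypothetical.

```python
LINE_BYTES = 16


class WriteBackCache:
    """Direct-mapped, write-back, write-allocate cache model."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory               # dict: line address -> line data
        self.lines = [None] * num_lines    # each line: [tag, dirty, data]

    def _locate(self, paddr):
        line_addr = paddr // LINE_BYTES
        return line_addr % self.num_lines, line_addr // self.num_lines

    def _allocate(self, index, tag):
        old = self.lines[index]
        if old is not None and old[1]:
            # Dirty victim: write it back before the line is replaced.
            self.memory[old[0] * self.num_lines + index] = old[2]
        line_addr = tag * self.num_lines + index
        self.lines[index] = [tag, False, self.memory.get(line_addr, 0)]

    def _line(self, paddr):
        index, tag = self._locate(paddr)
        line = self.lines[index]
        if line is None or line[0] != tag:
            # Miss: allocate on both loads and stores (write-allocate).
            self._allocate(index, tag)
            line = self.lines[index]
        return line

    def load(self, paddr):
        return self._line(paddr)[2]

    def store(self, paddr, data):
        line = self._line(paddr)
        line[2] = data
        line[1] = True  # dirty: lower memory is stale until write-back
```

A store therefore does not reach lower memory until a conflicting access evicts the dirty line.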

## MMU State Machines

The operation of each state is as follows:

### Instruction Fetch Control (istate_q)

| State    | Operation                       |
|----------|---------------------------------|
| `I_IDLE` | Wait for instruction fetch request |
| `I_TLB`  | ITLB reference                  |
| `I_CACHE`| L1 I-Cache access               |
| `I_PTW`  | Page table walk                 |
| `I_FETCH`| Fetch from lower memory         |
| `I_RET`  | Return response to CPU          |
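
One way to read this table together with the instruction flow described earlier is the next-state function sketched below in Python. The transitions are inferred from the prose, not taken from `mmu/mmu.v`, so both the function and its condition flags (`request`, `tlb_hit`, `cache_hit`, `done`, `fault`) are hypothetical.

```python
def next_istate(state, *, request=False, tlb_hit=False, cache_hit=False,
                done=False, fault=False):
    """Return the next instruction-fetch state (inferred transitions)."""
    if state == "I_IDLE":
        return "I_TLB" if request else "I_IDLE"
    if state == "I_TLB":
        if fault:
            return "I_RET"             # report the exception to the CPU
        return "I_CACHE" if tlb_hit else "I_PTW"
    if state == "I_PTW":
        if fault:
            return "I_RET"
        return "I_CACHE" if done else "I_PTW"
    if state == "I_CACHE":
        return "I_RET" if cache_hit else "I_FETCH"
    if state == "I_FETCH":
        return "I_RET" if (done or fault) else "I_FETCH"
    if state == "I_RET":
        return "I_IDLE"
    raise ValueError(f"unknown state: {state}")
```

Under these assumptions a fetch that misses both the ITLB and the I-Cache visits every state once before returning to `I_IDLE`.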

### Data Load/Store Control (dstate_q)

| State      | Operation                       |
|------------|---------------------------------|
| `D_IDLE`   | Wait for load/store request     |
| `D_TLB`    | DTLB reference                  |
| `D_CACHE`  | L1 D-Cache access               |
| `D_PTW`    | Page table walk                 |
| `D_RET`    | Return response to CPU          |
| `D_CHECK`  | LR/SC condition check           |
| `D_STORE`  | Execute store                   |
| `D_ALLOCATE`| Load from lower memory         |
| `D_PERIPH` | Peripheral access               |

### Lower Memory Control (mstate_q)

| State    | Operation                   |
|----------|-----------------------------|
| `M_IDLE` | Wait for read/write request |
| `M_PTW`  | PTW's PTE fetch             |
| `M_DCHECK`| Dirty line check           |
| `M_WRITE`| Execute write-back          |
| `M_READ` | Read from lower memory      |