L2 Cache Architecture
Overview
The L2 cache is a unified cache placed between the MMU and the AXI interconnect. It is 4-way set-associative and uses the PLRU (Pseudo Least Recently Used) replacement algorithm.
l2_cache (cache/l2_cache.v) # L2 unified cache (no submodules)
Key Features
Configuration: 4-way set associative
Data Width: 128-bit
Replacement Algorithm: Pseudo LRU (PLRU)
Block Size: 16 bytes
Write Policy: Write-back, Write-allocate
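As a rough illustration of what the 16-byte block size implies for addressing, the sketch below splits an address into tag, index, and offset. Only the block size comes from the feature list; `ADDR_WIDTH`, `NUM_SETS`, and the helper name `split_address` are assumptions chosen for the example and are not taken from `l2_cache.v`.

```python
BLOCK_BYTES = 16   # from the feature list
ADDR_WIDTH = 32    # assumption for illustration only
NUM_SETS = 256     # assumption for illustration only

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1          # 4 bits select a byte in the block
INDEX_BITS = NUM_SETS.bit_length() - 1              # 8 bits select a set
TAG_WIDTH = ADDR_WIDTH - INDEX_BITS - OFFSET_BITS   # remaining upper bits


def split_address(addr):
    """Split a byte address into (tag, index, offset)."""
    offset = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

With these assumed parameters, `split_address(0x12345678)` yields tag `0x12345`, index `0x67`, and offset `0x8`. A different number of sets only changes how the upper bits are divided between tag and index.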
The general flow is as follows:
Receive a read/write request from the MMU
For a peripheral access, transition to the corresponding peripheral state and handle the request there
Read the cache (2 cycles)
Compare the tag of each way and determine hit or miss
On a hit, select the data, perform the read or write, and return the response to the MMU
On a miss, choose the way to replace using PLRU and check its dirty bit
If the dirty bit is set, write the data of the selected way back to main memory
Load the new data from main memory into that way and perform the read or write
Return the response to the MMU
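The flow above can be sketched as a small behavioral model. This is not the RTL: it assumes the tree-based 3-bit variant of PLRU (the text only says "PLRU"), works at byte granularity, ignores peripheral accesses and timing, and uses hypothetical names (`TreePLRU4`, `L2Model`, `num_sets`).

```python
BLOCK_BYTES = 16   # from the text
NUM_WAYS = 4       # from the text


class TreePLRU4:
    """3-bit tree PLRU for one 4-way set (assumed variant)."""

    def __init__(self):
        # bits[0]: root, bits[1]: ways 0/1, bits[2]: ways 2/3.
        # Each bit points at the side that should be evicted next.
        self.bits = [0, 0, 0]

    def victim(self):
        if self.bits[0] == 0:
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3

    def touch(self, way):
        # Point every bit on the path away from the way just used.
        if way < 2:
            self.bits[0] = 1
            self.bits[1] = 1 - way
        else:
            self.bits[0] = 0
            self.bits[2] = 3 - way


class L2Model:
    def __init__(self, num_sets=4):
        self.num_sets = num_sets   # hypothetical size
        self.memory = {}           # block number -> 16 bytes (stands in for main memory)
        self.sets = [[{"valid": False, "dirty": False, "tag": 0,
                       "data": bytearray(BLOCK_BYTES)}
                      for _ in range(NUM_WAYS)] for _ in range(num_sets)]
        self.plru = [TreePLRU4() for _ in range(num_sets)]
        self.writebacks = 0

    def access(self, addr, store_byte=None):
        """Return (hit, byte at addr); store_byte is written first if given."""
        block, offset = divmod(addr, BLOCK_BYTES)
        tag, index = divmod(block, self.num_sets)
        ways = self.sets[index]
        # Compare the tag of each way.
        way = next((w for w, l in enumerate(ways)
                    if l["valid"] and l["tag"] == tag), None)
        hit = way is not None
        if not hit:
            # Miss: pick a victim with PLRU and write it back if dirty.
            way = self.plru[index].victim()
            line = ways[way]
            if line["valid"] and line["dirty"]:
                old_block = line["tag"] * self.num_sets + index
                self.memory[old_block] = bytes(line["data"])
                self.writebacks += 1
            # Write-allocate: fetch the block even when the request is a store.
            line["data"] = bytearray(self.memory.get(block, bytes(BLOCK_BYTES)))
            line.update(valid=True, dirty=False, tag=tag)
        line = ways[way]
        self.plru[index].touch(way)
        if store_byte is not None:
            line["data"][offset] = store_byte
            line["dirty"] = True   # write-back: memory is updated only on eviction
        return hit, line["data"][offset]
```

With `num_sets=1`, storing to address 0 and then touching blocks at addresses 16, 32, 48, and 64 evicts the dirty block, which increments `writebacks` once; a later read of address 0 misses and reloads the stored byte from the model's memory.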
L2 Cache (cache/l2_cache.v)
The L2 cache is a large-capacity unified cache and the last-level cache, so it needs a high hit rate to limit accesses to DRAM, which has a long access latency. It is 4-way set-associative with PLRU (Pseudo Least Recently Used) replacement, a 16-byte block size, and a write-back, write-allocate write policy.
To maintain a high operating frequency, the BRAM is read in register mode. This gives a total latency of 4 cycles: 2 cycles for the read, 1 cycle for tag comparison, and 1 cycle for data selection.
Meta RAM:
┌──────────┬──────────┬────────────────┐
│ valid(1) │ dirty(1) │ tag(TAG_WIDTH) │
└──────────┴──────────┴────────────────┘
Data RAM:
┌────────────┐
│ data (128) │
└────────────┘
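To make the Meta RAM layout concrete, here is a minimal sketch that packs and unpacks one entry. It assumes the diagram is drawn most-significant field first (so `valid` is the top bit) and picks `TAG_WIDTH = 20` purely for illustration; the source leaves `TAG_WIDTH` as a parameter, and the helper names are hypothetical.

```python
TAG_WIDTH = 20  # assumption; a parameter in the source


def pack_meta(valid, dirty, tag):
    """Pack {valid, dirty, tag} into one Meta RAM word (valid is the MSB)."""
    if not 0 <= tag < (1 << TAG_WIDTH):
        raise ValueError("tag does not fit in TAG_WIDTH bits")
    return (int(valid) << (TAG_WIDTH + 1)) | (int(dirty) << TAG_WIDTH) | tag


def unpack_meta(word):
    """Return (valid, dirty, tag) from a Meta RAM word."""
    tag = word & ((1 << TAG_WIDTH) - 1)
    dirty = bool((word >> TAG_WIDTH) & 1)
    valid = bool((word >> (TAG_WIDTH + 1)) & 1)
    return valid, dirty, tag
```

Under these assumptions each entry is `TAG_WIDTH + 2` bits wide, and a valid, clean line with tag `0x12345` packs to `0x212345`.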
State Machine
The operation of each state is as follows:
| State | Operation |
|---|---|
| | Wait for read/write request |
| | L2 cache read (1st cycle) |
| | L2 cache read (2nd cycle) |
| | Tag comparison |
| | Hit/miss determination and selection |
| | Write to cache line (store instruction) |
| | Write back Dirty line to main memory |
| | Read from main memory |
| | Wait for peripheral write response |
| | Wait for peripheral read response |
| | Return response to MMU |
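One plausible reading of the state table, combined with the general flow, is sketched below as a next-state function. The state names and the signal names (`request`, `is_peripheral`, `is_write`, `hit`, `victim_dirty`, `bus_done`) are hypothetical, since the table does not give them, and the transitions are inferred from the operation descriptions rather than taken from the RTL.

```python
from enum import Enum, auto


class State(Enum):          # hypothetical names, one per table row
    IDLE = auto()
    READ1 = auto()
    READ2 = auto()
    COMPARE = auto()
    SELECT = auto()
    WRITE = auto()
    WRITEBACK = auto()
    FILL = auto()
    PERIPH_WRITE_WAIT = auto()
    PERIPH_READ_WAIT = auto()
    RESPOND = auto()


def next_state(state, *, request=False, is_peripheral=False, is_write=False,
               hit=False, victim_dirty=False, bus_done=False):
    S = State
    if state is S.IDLE:
        if not request:
            return S.IDLE
        if is_peripheral:
            return S.PERIPH_WRITE_WAIT if is_write else S.PERIPH_READ_WAIT
        return S.READ1
    if state is S.READ1:
        return S.READ2
    if state is S.READ2:
        return S.COMPARE
    if state is S.COMPARE:
        return S.SELECT
    if state is S.SELECT:
        if hit:
            return S.WRITE if is_write else S.RESPOND
        return S.WRITEBACK if victim_dirty else S.FILL
    if state is S.WRITE:
        return S.RESPOND
    if state is S.WRITEBACK:
        return S.FILL if bus_done else S.WRITEBACK
    if state is S.FILL:
        if not bus_done:
            return S.FILL
        return S.WRITE if is_write else S.RESPOND
    if state in (S.PERIPH_WRITE_WAIT, S.PERIPH_READ_WAIT):
        return S.RESPOND if bus_done else state
    return S.IDLE           # RESPOND returns to IDLE


def trace(max_steps=32, **signals):
    """Follow transitions from IDLE until the FSM returns to IDLE."""
    path = [State.IDLE]
    state = next_state(State.IDLE, **signals)
    while state is not State.IDLE and len(path) < max_steps:
        path.append(state)
        state = next_state(state, **signals)
    return [s.name for s in path]
```

In this sketch a read hit passes through four states between the idle state and the response (two read cycles, tag comparison, selection), which matches the 4-cycle latency described earlier. `trace(request=True, hit=True)` lists that path, and `trace(request=True, is_write=True, victim_dirty=True, bus_done=True)` shows the write-back, fill, and write path for a store miss on a dirty victim.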