L2 Cache Architecture

Overview

The L2 cache is a unified cache that sits between the MMU and the AXI interconnect. It is 4-way set associative and uses the PLRU (Pseudo Least Recently Used) replacement algorithm.

l2_cache (cache/l2_cache.v)                # L2 unified cache (no submodules)

Key Features

  • Configuration: 4-way set associative

  • Data Width: 128-bit

  • Replacement Algorithm: Pseudo LRU (PLRU)

  • Block Size: 16 bytes

  • Write Policy: Write-back, Write-allocate

The general flow is as follows:

  1. Receive read/write request from MMU

  2. For a peripheral access, transition to the corresponding state (PERIPH_READ or PERIPH_WRITE) and handle the request there

  3. Read cache (2 cycles)

  4. Compare tags of each Way and determine hit/miss

  5. If hit, select the data, execute the read or write, and return the response to the MMU

  6. If miss, determine Way to replace with PLRU and check Dirty bit

  7. If Dirty bit is set, write back data of selected Way to main memory

  8. Load new data from main memory to that Way and execute read/write appropriately

  9. Return response to MMU
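
The cacheable part of this flow (steps 3 to 9) can be sketched as a small behavioral model. This is only an illustration, not the logic in l2_cache.v: NUM_SETS, the Line and L2Model classes, the event log, and the choose_victim hook are all hypothetical names introduced here, and victim selection is left as a parameter instead of modelling PLRU. Peripheral accesses are not modelled.

```python
from dataclasses import dataclass, field

BLOCK_BYTES = 16   # from the document: 16-byte blocks
NUM_WAYS = 4       # from the document: 4-way set associative
NUM_SETS = 4       # assumption: tiny set count, for illustration only


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    data: bytearray = field(default_factory=lambda: bytearray(BLOCK_BYTES))


class L2Model:
    """Behavioral model of steps 3-9; peripheral accesses are not modelled."""

    def __init__(self, memory, choose_victim):
        self.memory = memory                # bytearray standing in for main memory
        self.choose_victim = choose_victim  # hypothetical hook: set index -> way
        self.sets = [[Line() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]
        self.log = []                       # "hit" / "write_back" / "allocate" events

    def _split(self, addr):
        offset = addr % BLOCK_BYTES
        index = (addr // BLOCK_BYTES) % NUM_SETS
        tag = addr // (BLOCK_BYTES * NUM_SETS)
        return tag, index, offset

    def _lookup(self, addr):
        tag, index, offset = self._split(addr)
        ways = self.sets[index]
        # Steps 3-4: read every way of the set and compare tags.
        for line in ways:
            if line.valid and line.tag == tag:
                self.log.append("hit")
                return line, offset
        # Step 6: miss, so pick a victim way and check its dirty bit.
        line = ways[self.choose_victim(index)]
        if line.valid and line.dirty:
            # Step 7: write the old block back to main memory.
            base = (line.tag * NUM_SETS + index) * BLOCK_BYTES
            self.memory[base:base + BLOCK_BYTES] = line.data
            self.log.append("write_back")
        # Step 8: load the new block (write-allocate: stores fetch it too).
        base = addr - offset
        line.data = bytearray(self.memory[base:base + BLOCK_BYTES])
        line.valid, line.dirty, line.tag = True, False, tag
        self.log.append("allocate")
        return line, offset

    def read(self, addr):
        line, offset = self._lookup(addr)
        return line.data[offset]

    def write(self, addr, value):
        line, offset = self._lookup(addr)
        line.data[offset] = value
        line.dirty = True   # write-back: memory is only updated on eviction
```

With a hook that always evicts way 0, writing to address 0x10 and then reading 0x50 (same set, different tag under these assumed sizes) forces the dirty block out, and only at that point does main memory see the stored value.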

L2 Cache (cache/l2_cache.v)

The L2 cache is a large-capacity unified cache. As the last-level cache, it needs a high hit rate to limit accesses to DRAM, which has a long access latency. It is 4-way set associative and uses the PLRU (Pseudo Least Recently Used) replacement algorithm. The block size is 16 bytes, and the write policy is write-back with write-allocate.
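
For reference, one common way to realize PLRU over four ways is a binary tree of three bits per set. The sketch below assumes that tree-based variant; the document does not say which PLRU scheme l2_cache.v uses, and the class name TreePLRU4 and the bit encoding are illustrative choices only.

```python
class TreePLRU4:
    """Tree-based pseudo-LRU state for one 4-way set (3 bits)."""

    def __init__(self):
        # bits[0]: root, 0 -> next victim is in ways 0-1, 1 -> in ways 2-3
        # bits[1]: chooses between way 0 (0) and way 1 (1)
        # bits[2]: chooses between way 2 (0) and way 3 (1)
        self.bits = [0, 0, 0]

    def victim(self):
        """Follow the bits from the root down to the way to replace."""
        if self.bits[0] == 0:
            return 0 + self.bits[1]
        return 2 + self.bits[2]

    def touch(self, way):
        """Mark `way` as recently used: point each bit on its path away from it."""
        pair, pos = divmod(way, 2)
        self.bits[0] = 1 - pair        # root now points at the other pair
        self.bits[1 + pair] = 1 - pos  # leaf bit now points at the sibling way
```

Starting from the reset state and repeatedly replacing the current victim fills the ways in the order 0, 2, 1, 3. The appeal of this scheme is its cost: 3 bits per set, whereas exact LRU over four ways has to distinguish 24 orderings and so needs at least 5 bits.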

To maintain a high operating frequency, the BRAM is read in register mode. As a result, a cache lookup takes 4 cycles in total: 2 cycles for the read, 1 cycle for tag comparison, and 1 cycle for data selection.

Meta RAM:
┌──────────┬───────────┬─────────────────┐
│ valid(1) │ dirty(1)  │ tag(TAG_WIDTH)  │
└──────────┴───────────┴─────────────────┘

Data RAM:
┌─────────────────────────────┐
│ data (128)                  │
└─────────────────────────────┘
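
To make the two layouts concrete, here is a minimal sketch of packing a Meta RAM entry and splitting an address. It assumes the diagram lists fields from most significant to least significant bit, a 32-bit address, and 256 sets; ADDR_WIDTH, INDEX_WIDTH, the resulting TAG_WIDTH value, and all function names are assumptions for illustration. Only the 4-bit block offset follows directly from the 16-byte (128-bit) block size.

```python
ADDR_WIDTH = 32     # assumption: 32-bit physical address
OFFSET_WIDTH = 4    # 16-byte block -> 4 offset bits
INDEX_WIDTH = 8     # assumption: 256 sets
TAG_WIDTH = ADDR_WIDTH - INDEX_WIDTH - OFFSET_WIDTH  # 20 under these assumptions


def split_address(addr):
    """Return (tag, index, offset) for a byte address."""
    offset = addr & ((1 << OFFSET_WIDTH) - 1)
    index = (addr >> OFFSET_WIDTH) & ((1 << INDEX_WIDTH) - 1)
    tag = addr >> (OFFSET_WIDTH + INDEX_WIDTH)
    return tag, index, offset


def pack_meta(valid, dirty, tag):
    """Pack {valid, dirty, tag} into one Meta RAM word, valid in the top bit."""
    if not 0 <= tag < (1 << TAG_WIDTH):
        raise ValueError("tag does not fit in TAG_WIDTH bits")
    return (int(valid) << (TAG_WIDTH + 1)) | (int(dirty) << TAG_WIDTH) | tag


def unpack_meta(entry):
    """Inverse of pack_meta: return (valid, dirty, tag)."""
    tag = entry & ((1 << TAG_WIDTH) - 1)
    dirty = bool((entry >> TAG_WIDTH) & 1)
    valid = bool((entry >> (TAG_WIDTH + 1)) & 1)
    return valid, dirty, tag
```

Each way then stores one (TAG_WIDTH + 2)-bit Meta RAM word and one 128-bit Data RAM word per set, both addressed by the index field.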

State Machine

The operation of each state is as follows:

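┌──────────────┬─────────────────────────────────────────┐
│ State        │ Operation                               │
├──────────────┼─────────────────────────────────────────┤
│ IDLE         │ Wait for read/write request             │
│ LATENCY      │ L2 cache read (1st cycle)               │
│ CHECK_LOCK   │ L2 cache read (2nd cycle)               │
│ COMPARE_TAG  │ Tag comparison                          │
│ CHECK_VALID  │ Hit/miss determination and selection    │
│ WRITE_CACHE  │ Write to cache line (store instruction) │
│ WRITE_BACK   │ Write back Dirty line to main memory    │
│ ALLOCATE     │ Read from main memory                   │
│ PERIPH_WRITE │ Wait for peripheral write response      │
│ PERIPH_READ  │ Wait for peripheral read response       │
│ RET          │ Return response to MMU                  │
└──────────────┴─────────────────────────────────────────┘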
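
The document names the states but not the transitions between them. The sketch below is one plausible next-state function inferred from the general flow. Every transition, every input flag (req, write, periph, hit, victim_dirty, done), and the trace helper are assumptions, not taken from l2_cache.v.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LATENCY = auto()
    CHECK_LOCK = auto()
    COMPARE_TAG = auto()
    CHECK_VALID = auto()
    WRITE_CACHE = auto()
    WRITE_BACK = auto()
    ALLOCATE = auto()
    PERIPH_WRITE = auto()
    PERIPH_READ = auto()
    RET = auto()


def next_state(s, *, req=False, write=False, periph=False,
               hit=False, victim_dirty=False, done=False):
    """Assumed transitions; `done` stands for a memory/peripheral response."""
    if s is State.IDLE:
        if not req:
            return State.IDLE
        if periph:
            return State.PERIPH_WRITE if write else State.PERIPH_READ
        return State.LATENCY
    if s is State.LATENCY:
        return State.CHECK_LOCK
    if s is State.CHECK_LOCK:
        return State.COMPARE_TAG
    if s is State.COMPARE_TAG:
        return State.CHECK_VALID
    if s is State.CHECK_VALID:
        if hit:
            return State.WRITE_CACHE if write else State.RET
        return State.WRITE_BACK if victim_dirty else State.ALLOCATE
    if s is State.WRITE_BACK:
        return State.ALLOCATE if done else State.WRITE_BACK
    if s is State.ALLOCATE:
        if not done:
            return State.ALLOCATE
        return State.WRITE_CACHE if write else State.RET
    if s in (State.PERIPH_WRITE, State.PERIPH_READ):
        return State.RET if done else s
    if s is State.WRITE_CACHE:
        return State.RET
    return State.IDLE  # RET: response returned, wait for the next request


def trace(**inputs):
    """Names of the states visited from an accepted request back to IDLE,
    assuming memory and peripherals respond immediately."""
    s, path = next_state(State.IDLE, req=True, **inputs), []
    while s is not State.IDLE:
        path.append(s.name)
        s = next_state(s, done=True, **inputs)
    return path
```

Under these assumptions, trace(hit=True) passes through LATENCY, CHECK_LOCK, COMPARE_TAG, and CHECK_VALID before RET, which matches the 4-cycle lookup latency described earlier, while a store that misses on a dirty victim additionally passes through WRITE_BACK, ALLOCATE, and WRITE_CACHE.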