### DRAM REFRESH MANAGEMENT

Mahdi Nazm Bojnordi

**Assistant Professor** 

School of Computing

University of Utah



### Overview

- Upcoming deadline
  - Mar. 15<sup>th</sup>: homework assignment is due (11:59PM).

- This lecture
  - DRAM address mapping
  - DRAM refresh basics
  - Smart refresh
  - Elastic refresh
  - Avoiding or pausing refreshes

# DRAM Address Mapping

Where to store cache lines in main memory?





# **DRAM Address Mapping**

- How to compute bank ID?



# Cache Line Interleaving





Spatial locality is not well preserved!

# Page Interleaving



#### Address format
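The two interleaving schemes differ only in which address bits select the bank. The sketch below illustrates this under assumed field widths (64-byte cache lines, 128 lines per row, 8 banks); these widths and the function names are illustrative and do not come from the lecture.

```python
# Illustrative field widths (assumptions, not values from the lecture).
LINE_BITS = 6    # 64-byte cache line
COL_BITS = 7     # 128 cache lines per DRAM row
BANK_BITS = 3    # 8 banks
BANK_MASK = (1 << BANK_BITS) - 1


def bank_cache_line_interleaved(addr: int) -> int:
    """Bank bits sit directly above the cache-line offset."""
    return (addr >> LINE_BITS) & BANK_MASK


def bank_page_interleaved(addr: int) -> int:
    """Bank bits sit above the column bits, so a whole row maps to one bank."""
    return (addr >> (LINE_BITS + COL_BITS)) & BANK_MASK


# Four consecutive cache lines starting at address 0.
lines = [i * 64 for i in range(4)]
print([bank_cache_line_interleaved(a) for a in lines])  # [0, 1, 2, 3]
print([bank_page_interleaved(a) for a in lines])        # [0, 0, 0, 0]
```

With cache line interleaving, consecutive lines spread across banks; with page interleaving, they stay in the same row of the same bank, which preserves spatial locality in the row buffer.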



# Cache Line Mapping

- Bank index is a subset of set index



### Row Buffer Conflict

- Problem: interleaving load and writeback streams that have the same access pattern to the banks may result in row buffer misses



# Key Issues

- To exploit spatial locality, use maximal interleaving granularity (or row-buffer size)
- To reduce row buffer conflicts, use only those bits in cache set index for "bank bits"



## Permutation-based Interleaving

- New bank index
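One common way to form a permuted bank index is to XOR the conventional bank bits with an equal number of low-order cache tag bits. The sketch below assumes that formulation; the field widths, `TAG_SHIFT`, and the function names are hypothetical and chosen only for illustration.

```python
# All field widths below are illustrative assumptions, not values from the lecture.
LINE_BITS = 6     # 64-byte cache line
COL_BITS = 7      # 128 cache lines per DRAM row
BANK_BITS = 3     # 8 banks
TAG_SHIFT = 19    # assumed start of the cache tag (offset + set index = 19 bits)

BANK_SHIFT = LINE_BITS + COL_BITS
BANK_MASK = (1 << BANK_BITS) - 1


def conventional_bank(addr: int) -> int:
    """Page-interleaved bank index: the bits just above the column bits."""
    return (addr >> BANK_SHIFT) & BANK_MASK


def permuted_bank(addr: int) -> int:
    """XOR the conventional bank index with the low-order tag bits."""
    tag_low = (addr >> TAG_SHIFT) & BANK_MASK
    return conventional_bank(addr) ^ tag_low


# Two addresses with the same cache set index but different tags.
a = 0x0
b = 1 << TAG_SHIFT
print(conventional_bank(a), conventional_bank(b))  # 0 0
print(permuted_bank(a), permuted_bank(b))          # 0 1
```

Addresses that conflict in the cache (same set, different tag) land in the same bank under the conventional mapping but in different banks after the XOR, while addresses within one row still share a bank.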




### DRAM Refresh

- DRAM cells lose charge over time
- Periodic refresh operations are required to avoid data loss
- Two main strategies for refreshing DRAM cells
  - Burst refresh: refresh all of the cells each time
    - Simple control mechanism (e.g., LPDDRx)
  - Distributed refresh: only one group of cells is refreshed each time
    - Avoid blocking memory for a long time







### Refresh Basics

- tRET: the retention time of leaky DRAM cells (64ms)
  - All cells must be refreshed within tRET to avoid data loss
- tREFI: the refresh interval, i.e., the gap between two refresh commands issued by the memory controller
  - The MC sends 8192 auto-refresh commands per tRET, each refreshing one bin
    - tREFI = tRET/8192 = 7.8us
- tRFC: the time to finish refreshing a bin (refresh completion)
- What is the bin size?
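The timing relations above can be checked with a few lines of arithmetic. In this sketch the bin size is taken to be the number of rows each refresh command must cover, i.e. the row count divided by 8192; the 65536-row bank and the 350 ns tRFC are assumed example inputs, and the function names are hypothetical.

```python
def trefi_us(t_ret_ms: float = 64.0, n_ref: int = 8192) -> float:
    """Refresh interval in microseconds: tREFI = tRET / number of REF commands."""
    return t_ret_ms * 1000.0 / n_ref


def rows_per_bin(rows_per_bank: int, n_ref: int = 8192) -> int:
    """Rows each REF command must cover so every row is refreshed within tRET."""
    return -(-rows_per_bank // n_ref)  # ceiling division


def refresh_busy_fraction(trfc_ns: float, t_refi_us: float) -> float:
    """Fraction of time the DRAM is busy refreshing: tRFC / tREFI."""
    return trfc_ns / (t_refi_us * 1000.0)


print(trefi_us())                               # 7.8125
print(rows_per_bin(65536))                      # 8 (assumed 65536 rows per bank)
print(refresh_busy_fraction(350.0, trefi_us())) # 0.0448 (assumed tRFC = 350 ns)
```

Because the number of refresh commands per tRET is fixed, a larger row count means a larger bin per command, which lengthens tRFC.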

### Refresh Basics

- tRFC increases with chip capacity

#### Impact of chip density on refresh completion time



# Controlling Refresh Operations

- CAS before RAS (CBR)
  - DRAM memory keeps track of the addresses using an internal counter
- RAS only refresh (ROR)
  - Row address is specified by the controller; similar to a pair of activate and precharge
- Auto-refresh vs. self refresh
  - Every 7.8us a REF command is sent to DRAM (tRAS+tRP)
  - LPDDR turns off IO for saving power while refreshing multiple rows

# Refresh Granularity

- All-bank vs. per-bank refresh



(a) All-bank refresh ( $REF_{ab}$ ) frequency and granularity.



(b) Per-bank refresh ( $REF_{pb}$ ) frequency and granularity.

# Optimizing DRAM Refresh

- Observation: a row may be accessed shortly before it is due to be refreshed; since the access itself restores the row's charge, the scheduled refresh is then redundant



### **Smart Refresh**

- Idea: avoid refreshing recently accessed rows



**Figure 5: Smart Refresh Control Schematic** 
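The idea can be sketched with one small countdown counter per row: an access resets the counter, and a row is refreshed only when its counter has run down to zero. This is a simplified model under assumed parameters (2-bit counters, one tick per quarter of the retention time); the class and method names are hypothetical and the real control logic in the figure may differ.

```python
class SmartRefreshSketch:
    """Per-row countdown counters; a simplified model, not the actual hardware."""

    def __init__(self, num_rows: int, counter_bits: int = 2):
        self.max_count = (1 << counter_bits) - 1
        self.counters = [self.max_count] * num_rows
        self.refreshes_issued = 0

    def access(self, row: int) -> None:
        """A read or write activates the row, which restores its charge."""
        self.counters[row] = self.max_count

    def tick(self) -> list:
        """Advance one counter period; return the rows refreshed in it."""
        refreshed = []
        for row, count in enumerate(self.counters):
            if count == 0:
                refreshed.append(row)          # counter expired: refresh now
                self.counters[row] = self.max_count
            else:
                self.counters[row] = count - 1
        self.refreshes_issued += len(refreshed)
        return refreshed


ctrl = SmartRefreshSketch(num_rows=4)
history = []
for _ in range(8):
    ctrl.access(0)                 # row 0 is accessed in every period
    history.append(ctrl.tick())
print(history)
print(ctrl.refreshes_issued)       # 6: rows 1-3 refreshed twice, row 0 never
```

With 2-bit counters and a tick every tRET/4, a row goes at most four ticks between an access and its next refresh, so the retention bound still holds while frequently accessed rows skip their refreshes.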

# Diverse Impacts of Refresh



Worst Case Refresh Hit DRAM Read



| DRAM capacity | tRFC  | Bandwidth overhead (95°C, per rank) | Latency overhead (95°C) |
|---------------|-------|-------------------------------------|-------------------------|
| 512Mb         | 90ns  | 2.7%                                | 1.4ns                   |
| 1Gb           | 110ns | 3.3%                                | 2.1ns                   |
| 2Gb           | 160ns | 5.0%                                | 4.9ns                   |
| 4Gb           | 300ns | 7.7%                                | 11.5ns                  |
| 8Gb           | 350ns | 9.0%                                | 15.7ns                  |

### Elastic Refresh

- Send refreshes during periods of inactivity
- Non-uniform request distribution
- Refresh overhead just has to fit in free cycles
- Initially not aggressive, converges with delay until empty (DUE) as refresh backlog grows
- Latency sensitive workloads are often lower bandwidth
- Decrease the probability of reads conflicting with refreshes

### Elastic Refresh

- Introduce refresh backlog dependent idle threshold
- With a low backlog, there is no urgency to send a refresh command
- With a bursty request stream, the probability of a future request decreases with time
- As backlog grows, decrease this delay threshold
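A backlog-dependent idle threshold can be sketched as a simple function: constant while the backlog is low, then shrinking toward zero as the backlog approaches its limit. The limit of 8 postponed refreshes, the base delay of 100 cycles, the low-water mark, and the linear shape are all assumptions made for this illustration.

```python
def idle_threshold(backlog: int, base_delay: int = 100,
                   low_mark: int = 2, max_backlog: int = 8) -> int:
    """Idle cycles to wait before issuing a pending refresh (assumed shape)."""
    if backlog <= low_mark:
        return base_delay                     # low backlog: wait patiently
    if backlog >= max_backlog:
        return 0                              # limit reached: do not wait
    remaining = max_backlog - backlog
    return base_delay * remaining // (max_backlog - low_mark)


def should_refresh(backlog: int, idle_cycles: int) -> bool:
    """Issue a refresh once the rank has been idle for long enough."""
    return backlog > 0 and idle_cycles >= idle_threshold(backlog)


print([idle_threshold(b) for b in range(9)])
# [100, 100, 100, 83, 66, 50, 33, 16, 0]
print(should_refresh(backlog=1, idle_cycles=50))  # False
print(should_refresh(backlog=5, idle_cycles=50))  # True
```

A threshold of zero corresponds to issuing the refresh as soon as the rank is idle; a real controller would additionally force the refresh at the backlog limit even if requests are pending.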



**Key: to reduce REF and READ conflicts** 

### DRAM Refresh vs. Error Rate



If software is able to tolerate errors, we can lower DRAM refresh rates to achieve considerable power savings

### Flikker

- Divide each memory bank into a high-refresh part and a low-refresh part
- Size of high-refresh portion can be configured at runtime
- Small modification of the Partial Array Self-Refresh (PASR) mode



# Refresh Pausing



Pausing at an arbitrary point can cause data loss

Pausing a refresh reduces the wait time for reads

### Performance Results

