AMD PRODUCTS
DIVERSE POWER DELIVERY CHALLENGES

Laptop
- 10W to 45W TDP
- 3-4 high current (up to 40A) rails
- 5-10 low current rails
- Diverse IP

Server
- 65W to 165W TDP
- 3 high current rails (up to 120A)
- 2-3 low current rails

Graphics
- 10W to 300W TDP
- 2 high current rails (up to 200A)
- 1-2 low current rails
Minimize AC transients

Accurate DC voltage

Large size / weight of voltage regulator components
  - Exacerbated by high currents

Regulator costs with the profusion of rails
  - Exacerbated by increasingly diverse integrated IP and high core counts

FOR ALL PRODUCTS: AC POWER SUPPLY NOISE IS AN ISSUE
INDUCED THROUGH HIGH CURRENT TRANSIENTS

CPUs, GPUs and APUs all operate at low voltages with high current – this makes it a challenge for power supplies and packages to deliver a quality voltage.

- In this example the first droop has rung out prior to the trough of the second droop.
- However, workloads can exist in which the droops overlap, leading to an even larger drop in voltage.

Perfect power supply decoupling is costly, so trade-offs have to be made.

The situation gets worse each generation:
- Finer power supply partitioning
- More aggressive stuttering behavior
- Reduced on-die decoupling capacitance with scaling
- Improved fine grain clock gating

AMD’S CLOCK STRETCHER
ADAPT CLOCK FREQUENCY IN RESPONSE TO VOLTAGE DROOPS

- Detect the voltage is dropping
- Immediately slow down the clock so speed paths don’t fail due to lower voltage
- Prevents rare events from increasing voltage margin
  - The largest droops typically occur < 1% of the time
- Doesn’t compensate for 100% of droop or eliminate need for good power delivery, but greatly mitigates the impact
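
The behavior listed above can be sketched as a toy model in Python. This is only an illustration, not AMD's implementation: the function name `stretch_clock`, the 0.96 V threshold, the 3.0 GHz nominal frequency, the 10% stretch and the Vdd trace are all hypothetical values chosen for the example.

```python
def stretch_clock(vdd_samples, v_threshold, f_nominal_ghz, stretch_fraction):
    """Return the clock frequency chosen for each Vdd sample.

    Toy model: at or above the threshold the clock runs at the nominal
    frequency; below it, the clock is slowed by stretch_fraction.
    """
    stretched = f_nominal_ghz * (1.0 - stretch_fraction)
    return [f_nominal_ghz if v >= v_threshold else stretched
            for v in vdd_samples]


# Hypothetical Vdd trace (volts) with one droop event in the middle.
vdd_trace = [1.00, 0.99, 0.93, 0.91, 0.95, 0.99, 1.00]
freqs = stretch_clock(vdd_trace, v_threshold=0.96,
                      f_nominal_ghz=3.0, stretch_fraction=0.10)

# Count how many samples ran at the reduced frequency.
stretched_samples = sum(f < 3.0 for f in freqs)
print(freqs)
print(f"stretched for {stretched_samples} of {len(freqs)} samples")
```

Because the clock only slows during the droop, the nominal frequency does not have to be margined down for an event that occurs rarely.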

Figure: clock stretcher signal path (Droop Detector → DFS → PLL, sensing VDDCORE) and timing diagram. Timing labels: Vdd above threshold; Vdd drops below threshold; response time < 1 ns; operating frequency temporarily decreases; traditional margined frequency.

Figure: CZ Clock Stretcher Power Savings. Percent power savings (axis 0% to 25%) versus normalized clock frequency (axis 1.15 to 1.7), with separate series for CPU power savings and GPU power savings.
DEALING WITH DC VARIATIONS:
BOOT TIME POWER SUPPLY CALIBRATION

- Run voltage analysis code on the tester when the part is tested and binned
  - Log the voltage as seen by the integrated power supply monitors

- On boot-up in whatever system and board the GPU is deployed, run that same code and observe the voltage
  - Dial the regulator to deliver the same AC voltage as observed on the tester for speed binning
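
A minimal sketch of the boot-time step, assuming a simple fixed-step search. Everything here is hypothetical: `calibrate_regulator`, the `FakeBoard` stand-in, the 5 mV step, the 3 mV tolerance and the 22 mV board drop are illustration values, not a real regulator or monitor interface.

```python
def calibrate_regulator(read_monitor_mv, set_regulator_mv, target_mv,
                        start_mv, step_mv=5, tolerance_mv=3, max_steps=50):
    """Step the regulator setpoint until the on-die monitor reads the
    voltage that was logged on the tester (target_mv)."""
    setpoint = start_mv
    for _ in range(max_steps):
        set_regulator_mv(setpoint)
        error = target_mv - read_monitor_mv()
        if abs(error) <= tolerance_mv:
            return setpoint
        # Raise the setpoint if the die sees too little voltage, else lower it.
        setpoint += step_mv if error > 0 else -step_mv
    raise RuntimeError("calibration did not converge")


class FakeBoard:
    """Stand-in for a real board: the die sees the setpoint minus a
    fixed, board-dependent drop."""

    def __init__(self, drop_mv):
        self.drop_mv = drop_mv
        self.setpoint_mv = 0

    def set_regulator_mv(self, mv):
        self.setpoint_mv = mv

    def read_monitor_mv(self):
        return self.setpoint_mv - self.drop_mv


board = FakeBoard(drop_mv=22)
final_setpoint = calibrate_regulator(board.read_monitor_mv,
                                     board.set_regulator_mv,
                                     target_mv=1000, start_mv=1000)
print(final_setpoint, board.read_monitor_mv())
```

The point of the loop is that the setpoint ends up board-specific while the voltage seen at the die matches what was observed during binning.
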
Laptop purchaser’s #1 concern: Battery life

Market consistently moving to thinner, lighter form factors

3 – 25W units: growth of 400% forecast
35W+ units: reduction of over ~70% forecast

Source: Mercury, PDGB, AMD LRP
THE RACE TO BE THIN

- Standard core – 800um thick
- Thin core – 400um thick
- Ultra-thin core – 200um thick or less
- Coreless – No core (build-up only)
FOR INTEGRATED REGULATION, WE NEED DISCRETE INDUCTORS

- But they are big – at odds with the Z-height reduction
  - Silicon integration a possibility

- Package area is much more expensive than board area → moving power conversion onto the package is expensive

- Silicon area is even more expensive than package area – it is difficult to manage the cost of moving power FETs onto the die
THE MANY RAILS REQUIRED IN LAPTOPS PRESENT CHALLENGES AND OPTIONS

For TDP 35 W Carrizo part

<table>
<thead>
<tr>
<th>Die Supply Name</th>
<th>Nominal voltage (V)</th>
<th>TDC (A)</th>
</tr>
</thead>
<tbody>
<tr>
<td>VDDCR_CPU</td>
<td>Variable 0.7 – 1.5</td>
<td>40</td>
</tr>
<tr>
<td>VDDCR_GFX</td>
<td>Variable 0.7 – 1.2</td>
<td>30</td>
</tr>
<tr>
<td>VDDCR_NB</td>
<td>Variable 0.7 – 1.2</td>
<td>12</td>
</tr>
<tr>
<td>VDDCR_FCH</td>
<td>0.8</td>
<td>0.2</td>
</tr>
<tr>
<td>VDDIO_MEM_S3</td>
<td>1.5</td>
<td>3</td>
</tr>
<tr>
<td></td>
<td>1.35</td>
<td>2.9</td>
</tr>
<tr>
<td></td>
<td>1.25</td>
<td>2.8</td>
</tr>
<tr>
<td>VDDP</td>
<td>1.05</td>
<td>7</td>
</tr>
<tr>
<td></td>
<td>0.95</td>
<td>7</td>
</tr>
<tr>
<td>VDDP_GFX</td>
<td>1.05</td>
<td>1.5</td>
</tr>
<tr>
<td></td>
<td>0.95</td>
<td>1.5</td>
</tr>
<tr>
<td>VDDP_S5</td>
<td>1.05</td>
<td>0.8</td>
</tr>
<tr>
<td></td>
<td>0.95</td>
<td>0.8</td>
</tr>
<tr>
<td>VDD_18</td>
<td>1.8</td>
<td>1.5</td>
</tr>
<tr>
<td>VDD_18_S5</td>
<td>1.8</td>
<td>0.5</td>
</tr>
<tr>
<td>VDD_33</td>
<td>3.3</td>
<td>0.2</td>
</tr>
<tr>
<td>VDD_33_S5</td>
<td>3.3</td>
<td>0.2</td>
</tr>
<tr>
<td>VDD_AUDIO</td>
<td>1.8</td>
<td>0.2</td>
</tr>
<tr>
<td></td>
<td>1.5</td>
<td>0.2</td>
</tr>
<tr>
<td>VDDBT_RTC_G</td>
<td>1.8</td>
<td>4 uA</td>
</tr>
</tbody>
</table>

The high current rails could be combined.

The low current rails can be discrete or aggregated into an on-board Power Management IC (PMIC).

Figure: GFX LDO power losses vs. discrete rail. % system power loss plotted against CPU power allocation % (GFX = 1 - CPU%).
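
The trade-off behind combining the CPU and GFX rails can be sketched with the ideal-LDO relation (input current equals output current, so the loss is the GFX current times the voltage dropped). This is a simplified illustration, not the model behind the chart: the function name `ldo_loss_fraction` and the 1.0 V / 0.8 V operating points are assumptions, and the conversion losses of a discrete rail are ignored.

```python
def ldo_loss_fraction(cpu_share, v_in, v_gfx):
    """Fraction of total load power burned in an ideal LDO that feeds
    GFX at v_gfx from a shared rail at v_in.

    cpu_share is the CPU's share of total load power (GFX = 1 - CPU).
    """
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be between 0 and 1")
    if v_gfx > v_in:
        raise ValueError("an LDO cannot boost: v_gfx must not exceed v_in")
    gfx_share = 1.0 - cpu_share
    # Loss = I_gfx * (v_in - v_gfx) = P_gfx * (v_in / v_gfx - 1).
    return gfx_share * (v_in / v_gfx - 1.0)


# Hypothetical operating point: shared rail at 1.0 V, GFX needs 0.8 V.
for cpu_share in (0.25, 0.50, 0.75, 1.00):
    loss = ldo_loss_fraction(cpu_share, v_in=1.0, v_gfx=0.8)
    print(f"CPU share {cpu_share:.0%}: LDO loss {loss:.1%} of load power")
```

The loss shrinks as more of the power budget goes to the CPU, which is why the comparison is plotted against the CPU power allocation.
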
The trend is to low power so massive currents are not a motivator for integration.

Constant pressure on costs
- Can’t increase silicon area or package area unless there are other substantial competitive advantages.

Z-height reductions to support thin form factors are a very high priority
- Discrete inductors not viable.
- Silicon-integrated a possibility within cost constraints.

LDOs can be used judiciously to reduce rail counts and costs.
The critical things for server are:

- High efficiency – customers base purchase decisions on total cost of ownership (TCO), which has a strong power component
- Manage thermals – cooling is expensive
- Support high currents – both process and performance trends are pushing currents much higher
- Cost is still important – pricing pressure from the cloud and competition
PROCESSOR ELECTRICAL SPECS
A COUPLE OF DEFINITIONS

**TDC: Thermal Design Current**
*a.k.a. Continuous Current*
*a.k.a. Max Continuous Current*
*a.k.a. Thermal Current*

The amount of current the voltage regulator must supply continuously while remaining thermally viable.

**EDC: Electrical Design Current**
*a.k.a. Peak Current*
*a.k.a. Max Current*
*a.k.a. Max Instantaneous Current*
*a.k.a. Imax*

The peak current the voltage regulator must be able to supply, without regard to thermals.

The EDC/TDC ratio is a function of maximum activity versus “typical” activity. It is steadily increasing as IPC, core counts and boost frequencies increase.
EDC: Electrical Design Current

As we become more effective at reducing average power while adding wider vector instructions, increasing IPC, and running higher frequencies for short-term boost, the ratio of EDC to TDC is increasing. It used to be 1.3X to 1.5X and is now well beyond that.

- EDC is on track to become a first order limiter.
- Mitigations:
  - Reduce operating frequency (bad)
  - Increase phase and inductor counts for VR (expensive)
  - Power management approaches
  - Use IVR to reduce VR currents
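
To see why a growing EDC/TDC ratio makes the "increase phase and inductor counts" mitigation expensive, here is a rough sizing sketch. It assumes, as a simplification, that the VR phase count is set only by peak current; the 100 A TDC, the 40 A per-phase peak rating and the name `vr_phase_count` are hypothetical.

```python
import math


def vr_phase_count(edc_amps, per_phase_peak_amps):
    """Smallest number of VR phases whose combined peak rating covers EDC."""
    return math.ceil(edc_amps / per_phase_peak_amps)


tdc_amps = 100.0            # hypothetical continuous current
per_phase_peak_amps = 40.0  # hypothetical per-phase peak rating

phases_by_ratio = {}
for ratio in (1.3, 1.5, 2.0):
    edc_amps = tdc_amps * ratio
    phases_by_ratio[ratio] = vr_phase_count(edc_amps, per_phase_peak_amps)
    print(f"EDC/TDC {ratio}: EDC {edc_amps:.0f} A, "
          f"{phases_by_ratio[ratio]} phases")
```

With the continuous current unchanged, the higher ratio alone adds a phase (and its inductor) in this example.
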
Fully Integrated Voltage Regulators

- FIVR = Fully Integrated Voltage Regulator
- Power is delivered to the CPU from a single motherboard VR (MBVR) at an elevated voltage (1.8V), reducing platform power loss
- Power FETs, control circuits and decoupling are on die; inductors are on the package
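
The platform loss reduction from delivering power at an elevated voltage follows from resistive loss scaling with the square of the current. As a worked example, for a fixed power $P$ delivered through a path resistance $R$ at voltage $V$ (the 1.0 V comparison point is an assumption chosen for illustration, not a figure from the source):

$$
I = \frac{P}{V}, \qquad P_{\text{loss}} = I^2 R = \frac{P^2 R}{V^2}
$$

$$
\frac{P_{\text{loss}}(1.8\,\text{V})}{P_{\text{loss}}(1.0\,\text{V})} = \left(\frac{1.0}{1.8}\right)^2 \approx 0.31
$$

Under that assumption, the same board path dissipates roughly a third of the power it would at 1.0 V.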

High currents make package inductors viable
- Lower cost and area solution

Source: Intel
IVR CAN BE A GOOD FIT FOR SERVER

- Thermal impact *may* be offset by power savings, but is workload dependent.
- Impact of power density not accounted for here.

**FIVR Value Proposition: Power**

- Platform Power Savings
- Per Domain Voltage
- Per core P-states
- Uncore Frequency Scaling
- Power Savings With FIVR

Source: Intel
BUT THERMALS ARE VERY CHALLENGING

- Server and HPC systems are typically thermally limited
- Fans are a large fraction of power and cost
- Increasing power density is expensive

Example system power breakdown:

- CPU: 28%
- Memory: 27%
- PCIe: 3%
- HDD: 13%
- SAS card: 1%
- AOM: 1%
- VRM: 4%
- ALOM: 4%
- Fans: 18%
- Misc: 4%
- CPU VR: 2%
- DDR VR: 2%

Figure: server thermal layout (200 mm scale) with labels: fan at inlet, high-fin density heatsink, VRM region, exhaust, processor package.
THERMAL SCENARIOS FOR VOLTAGE REGULATION

Scenario 1: traditional VRM design
- Per package TDP: 165W
- VR Power loss: 24W (~87% efficiency)
- Fan Speed: 30 CFM

Scenario 2: VRM+IVR
- Per package TDP: 165W
- IVR Power loss: 24W (~87% efficiency)
- Fan Speed: 30 CFM
- All cores running at full speed (max P-state)
- Extra heat uniformly distributed

All else equal (i.e. reliability limits, heatsink, fans), the additional 5°C for the processor must result in lower power and hence lower performance.
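
The arithmetic behind the two scenarios can be checked in a few lines. The 165 W and 24 W figures come from the scenarios above; treating the package heat budget as fixed at 165 W in the last step is an illustrative assumption, not a result from the source.

```python
def vr_efficiency(load_w, loss_w):
    """Conversion efficiency: power delivered over power drawn."""
    return load_w / (load_w + loss_w)


tdp_w = 165.0
vr_loss_w = 24.0

efficiency = vr_efficiency(tdp_w, vr_loss_w)

# Scenario 1: the VR loss is dissipated on the board, outside the package.
package_heat_vrm_w = tdp_w
# Scenario 2: the IVR loss is dissipated inside the package as extra heat.
package_heat_ivr_w = tdp_w + vr_loss_w

# Assumption: if the package could only shed 165 W in total, the power
# left for the cores after IVR losses would be:
load_at_fixed_heat_w = tdp_w * efficiency

print(f"efficiency: {efficiency:.1%}")
print(f"package heat, VRM only: {package_heat_vrm_w:.0f} W")
print(f"package heat, VRM+IVR: {package_heat_ivr_w:.0f} W")
print(f"core power at a fixed 165 W heat budget: {load_at_fixed_heat_w:.0f} W")
```

The extra 24 W inside the package is about 15% more heat for the same heatsink and airflow.
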
PER-CORE VOLTAGE A KEY IVR SERVER BENEFIT

- Variation mitigation
  - 28 mV on average, about 3% of the voltage
  - For a drop this small, an LDO is more efficient than an IVR

- Per-core P-states
  - For IVR to be a win over LDO, average drop-out must be >10%
  - Workload dependent and not necessarily the common case
    - System level
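
The LDO versus IVR comparison above can be sketched with the ideal-LDO relation, where efficiency equals one minus the fractional drop-out. The 90% IVR efficiency is an assumption, picked because it matches the >10% break-even stated above; the function names are illustrative.

```python
def ldo_efficiency(dropout_fraction):
    """Ideal LDO: efficiency = V_out / V_in = 1 - fractional drop-out."""
    return 1.0 - dropout_fraction


def better_regulator(avg_dropout_fraction, ivr_efficiency=0.90):
    """Pick the more efficient option for a given average drop-out.

    ivr_efficiency is an assumed, fixed conversion efficiency.
    """
    if ldo_efficiency(avg_dropout_fraction) < ivr_efficiency:
        return "IVR"
    return "LDO"


# Variation mitigation: about 3% average drop, so the LDO wins.
print(better_regulator(0.03))
# Deep per-core P-states: a hypothetical 15% average drop favors the IVR.
print(better_regulator(0.15))
```

Whether the average drop-out actually exceeds the break-even depends on how often cores run at different P-states, which is the workload dependence noted above.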

---

- HVM test flow determines 8 unique curves per die.
- Dynamic fusing assigns each core to a V/F curve which is closest to the ideal curve for that core.

**Per-Core P-States**

Source: Intel
SERVER PRODUCT POWER INTEGRATION

CONCLUSIONS

- Thermal limitations are at the forefront and not easily dealt with
- Currents are increasing rapidly with each generation, with max current (EDC) becoming a first order limiter
- Integrated voltage regulation solves some problems but creates new ones
  - Per-core voltage, EDC mitigation, platform BOM reduction
  - LDOs versus IVR is a tradeoff that needs to be re-examined each generation
  - IVR conversion efficiency is the most important parameter
The final product is the add-in board (AiB), not the GPU itself
- Power conversion and delivery are the responsibility of the component vendor (i.e. AMD), not the OEM/ODM customer

TDC and $di/dt$ are very high, and show no signs of diminishing

Prices are generally fixed by market segment so not much cost flexibility
POWER DELIVERY CASE STUDY WITH FURY X

VR'S CAN MOVE CLOSER TO THE LOAD WITH HBM

PCB area occupied by ASIC + Memory (Radeon™ R9 290X)

PCB area occupied by ASIC with HBM
Clock stretching cuts droop roughly in half for a traditional memory board.

The HBM board (called Fiji here) has almost the same voltage without clock stretching.
**POWER CONVERSION INTEGRATION**

**TARGET:** IMPROVE AC AND DC VOLTAGE IMPACTS, REDUCE COSTS

- High frequency switchers can definitely improve AC transients
  - Question is how big the gain is after clock stretching is accounted for

- For cost reduction, we’ve grown the expensive package and silicon while reducing commodity VRD components
  - Benefit depends on a tight, miniaturized IVR design
  - Increased power density likely to impact cooling solution cost

---

**Employs package discrete passive components (L, C)**

- Lower platform cost
- Reduced package complexity
- Improved end-to-end efficiency
- Smallest form factor

Adaptive techniques like clock stretchers and boot time calibration are mitigating AC and DC power delivery losses that would otherwise motivate integration.

Some AC droop impact remains and can be targeted by integration.

BOM cost reduction is another motivator, but the cost equation is challenging, requiring minimal impact to package size and silicon area.

- Reductions in capacitors from better AC response and simplified board design can potentially offset the added cost.
<table>
<thead>
<tr>
<th rowspan="2">IVR benefit / challenge</th>
<th colspan="3">Product Line Sensitivity</th>
<th>Integration Approach Impact</th>
</tr>
<tr>
<th>Laptop</th>
<th>Server</th>
<th>Graphics</th>
<th>IVR</th>
</tr>
</thead>
<tbody>
<tr>
<td>Thermals</td>
<td>Mid</td>
<td>High</td>
<td>Mid</td>
<td>High</td>
</tr>
<tr>
<td>Cost reduction</td>
<td>High</td>
<td>Mid</td>
<td>High</td>
<td>High</td>
</tr>
<tr>
<td>EDC mitigation</td>
<td>Mid</td>
<td>High</td>
<td>Mid</td>
<td>High</td>
</tr>
<tr>
<td>AC+DC loss mitigation</td>
<td>High</td>
<td>High</td>
<td>High</td>
<td>High</td>
</tr>
<tr>
<td>Per-IP/per-core voltage</td>
<td>High</td>
<td>High</td>
<td>Low</td>
<td>High</td>
</tr>
<tr>
<td>Form factor reduction</td>
<td>High</td>
<td>Low</td>
<td>Low</td>
<td>Low</td>
</tr>
</tbody>
</table>

### Summary

**Laptop**
- Thin form factors reduce Z-height, putting severe limitations on inductor sizes
- PMICs are well suited to sweeping up the small rails
- Silicon-integrated LDOs can be used to good effect

**Server**
- Increasing power density impacts thermals
- Per-core voltage is a key benefit
- LDOs can accomplish much of this
- EDC mitigation another benefit
- Decision between LDO and IVR is very product specific

**Graphics**
- BOM cost reduction a key opportunity
- Requires a very area efficient implementation to succeed
- AC transient reduction valuable as well