

#### Beyond the Bridge: Contention-Based Covert and Side Channel Attacks on Multi-GPU Interconnect

<u>Yicheng Zhang</u><sup>1</sup>, Ravan Nazaraliyev<sup>1</sup>, Sankha Baran Dutta<sup>2</sup>, Nael Abu-Ghazaleh<sup>1</sup>, Andres Marquez<sup>2</sup>, Kevin Barker<sup>2</sup>

yzhan846@ucr.edu





<sup>1</sup>University of California, Riverside

<sup>2</sup>Pacific Northwest National Laboratory

#### Multi-GPU Systems

• Multi-GPU Systems: Widely used across various fields.

#### Large Language Model

ChatGPT



Google's Gemini



#### Computer Graphics

Computer-generated imagery (CGI)



Computer Graphic Arts



#### Autonomous Vehicle

Tesla




#### Data Centers

Google Cloud Platform


#### NVIDIA

MARCH 7, 2024



## Outline

- <u>Background: Multi-GPU interconnect.</u>
- Threat model and leakage vectors.
- Cross-GPU covert channel attacks.
- Cross-GPU side channel attacks.
- Mitigation.







### Background: Multi-GPU interconnect

- NVLink: High-speed, high-bandwidth interconnect by NVIDIA.
  - **Direct Links**: Supports CPU-to-GPU and GPU-to-GPU connections.
  - **Bidirectional**: Each link has two sublinks, one for each direction.
- PCIe: A serial expansion bus standard for connecting a computer to one or more peripheral devices.



https://www.nvidia.com/en-us/design-visualization/nvlink-bridges/








#### Known Side-channel Attacks on GPUs

- Previous GPU attacks focused on a single GPU.
  - This required the <u>co-location</u> of the victim and the spy on the same GPU.
- "Spy in the GPU-box" [ISCA'23] demonstrated a Prime+Probe attack on a remote GPU's L2 cache.
  - However, it did not explore the interconnect between GPUs.




[ISCA'23] Dutta, Sankha Baran, et al. "Spy in the GPU-box: Covert and side channel attacks on multi-GPU systems." Proceedings of the 50th Annual International Symposium on Computer Architecture. 2023.

## Threat model

• No need for co-location: the spy and the victim can run on different GPUs.



#### Leakage Vectors: Contention-based

• Contention on a shared NVLink can increase data transfer time.






## Leakage Vectors: Contention-based

- Contention measurement on NVLink.
  - Contention direction influence.
  - Contention size influence.
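A minimal sketch of the contention measurement, assuming the receiver can wrap one blocking NVLink transfer (for example, a peer-to-peer copy) in a Python callable. The names `transfer`, `median_latency_ns`, and `slowdown` are hypothetical; the sketch only times a callable and compares a baseline run against a run with competing traffic on the same link.

```python
import statistics
import time
from typing import Callable


def median_latency_ns(transfer: Callable[[], None], trials: int = 50,
                      clock: Callable[[], int] = time.perf_counter_ns) -> float:
    """Time `transfer` `trials` times and return the median latency in ns.

    `transfer` is a hypothetical callable that performs one NVLink copy and
    blocks until it completes; `clock` is injectable so the logic can be tested.
    """
    samples = []
    for _ in range(trials):
        start = clock()
        transfer()
        samples.append(clock() - start)
    return statistics.median(samples)


def slowdown(baseline_ns: float, contended_ns: float) -> float:
    """Relative increase in transfer time caused by contention (0.0 means none)."""
    return contended_ns / baseline_ns - 1.0
```

Measuring once on an idle link and once while another process drives traffic over it gives the slowdown; repeating with the competing traffic in the same or the opposite direction, or with different transfer sizes, would expose the direction and size influences listed above.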



#### Leakage Vectors: Leaky Counter-based

• Prior work exploits GPU performance counters as sources of side-channel leakage.

#### NVLink-related Performance Counters.

Figure: examples of prior GPU side-channel work, including "DNN Model Architecture Fingerprinting Attack on CPU-GPU Edge Devices" (Patwari et al.), "Rendered Insecure" (Naghibijouybari et al.), "Leaky DNN" (GPU context-switching side channel), "DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints" (Hu et al., ASPLOS'20), and "Hot Pixels: Frequency, Power, and Temperature Attacks on GPUs and Arm SoCs" (Taneja et al.).

## Leakage Vectors: Leaky Counter-based

• NVLink counters.

| Category         | Counter Name                                                                                                     |
|------------------|------------------------------------------------------------------------------------------------------------------|
| Throughput       | nvlink_receive/transmit_throughput                                                                               |
| User             | nvlink_user_data_received/transmitted, nvlink_user_write_data_transmitted, nvlink_user_response_data_received    |
| Total            | nvlink_total_data_received/transmitted, nvlink_total_response_data_received, nvlink_total_write_data_transmitted |
| Atomic operation | nvlink_total/user_nratom_data_transmitted, nvlink_total/user_ratom_data_transmitted                              |

• Observation 1: The NVLink receive/transmit attributes reveal NVLink data transaction direction.
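Observation 1 can be illustrated with a small sketch. It assumes the spy already has, for each sampling interval, the bytes counted by a receive counter and a transmit counter (for example, per-interval values of `nvlink_total_data_received` and `nvlink_total_data_transmitted`); how those values are read from the profiler is not shown, and `label_intervals` is a hypothetical helper.

```python
from typing import List, Tuple


def label_intervals(samples: List[Tuple[int, int]], min_bytes: int = 1) -> List[str]:
    """Label each sampling interval by the NVLink direction that carried traffic.

    Each sample is (bytes received, bytes transmitted) during one interval.
    """
    labels = []
    for rx, tx in samples:
        if rx >= min_bytes and tx >= min_bytes:
            labels.append("both")
        elif rx >= min_bytes:
            labels.append("receive")
        elif tx >= min_bytes:
            labels.append("transmit")
        else:
            labels.append("idle")
    return labels
```

Raising `min_bytes` above zero would let the labeling ignore small background traffic; the right value is an empirical choice.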



• User vs. Total counters.

• Observation 2: When NVLink is shared, NVLink total counters reveal all NVLink data transaction patterns.










## Cross-GPU Covert Channel Attacks

- Sender and receiver share NVLink interconnect.
  - To signal bit '1':
    - Sender transfers data via NVLink to force congestion.
  - To signal bit '0':
    - Sender idles for a pre-defined duration.
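The sender's on-off encoding can be sketched as follows, assuming a hypothetical `saturate_link` routine that keeps issuing NVLink transfers for a given duration and an `idle` routine such as `time.sleep`. The sketch also assumes sender and receiver already agree on the slot length and on when the first slot starts; synchronization is not handled here.

```python
from typing import Callable, List


def to_bits(message: str) -> List[int]:
    """UTF-8 encode `message` and expand each byte into 8 bits, MSB first."""
    return [(byte >> shift) & 1
            for byte in message.encode("utf-8")
            for shift in range(7, -1, -1)]


def send(bits: List[int], slot_s: float,
         saturate_link: Callable[[float], None],
         idle: Callable[[float], None]) -> None:
    """Transmit one bit per time slot of `slot_s` seconds.

    Bit '1': the hypothetical `saturate_link` creates contention for the slot.
    Bit '0': `idle` waits for the slot without touching NVLink.
    """
    for bit in bits:
        if bit:
            saturate_link(slot_s)
        else:
            idle(slot_s)
```

For example, `send(to_bits("Hello, NVLink!"), slot_s, saturate_link, time.sleep)` would transmit the covert message one bit per slot.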



#### Cross-GPU Covert Channel Attacks

• Covert message ("Hello, NVLink!").
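On the receiving side, a plausible decoder times its own NVLink transfer in each slot and treats a slow slot as '1'. The sketch below assumes the per-slot latencies are already collected and picks the threshold as the midpoint between the median idle and median contended latency from a calibration run; that threshold rule and all function names are illustrative assumptions.

```python
import statistics
from typing import List, Sequence


def calibrate_threshold(idle_ns: Sequence[float], busy_ns: Sequence[float]) -> float:
    """Midpoint between the median latency without and with contention."""
    return (statistics.median(idle_ns) + statistics.median(busy_ns)) / 2


def decode_bits(slot_latencies_ns: Sequence[float], threshold_ns: float) -> List[int]:
    """A slot whose measured transfer latency exceeds the threshold decodes as '1'."""
    return [1 if lat > threshold_ns else 0 for lat in slot_latencies_ns]


def from_bits(bits: Sequence[int]) -> str:
    """Pack bits (MSB first) into bytes and decode them as UTF-8."""
    data = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8", errors="replace")
```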



#### Cross-GPU Covert Channel Attacks

• Bandwidth and error rate.
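Bandwidth and error rate follow from their usual definitions: bits transmitted divided by elapsed time, and the fraction of received bits that differ from the sent bits. A minimal sketch, assuming the received bits are already aligned one-to-one with the sent bits (`channel_metrics` is a hypothetical name):

```python
from typing import Sequence, Tuple


def channel_metrics(sent: Sequence[int], received: Sequence[int],
                    elapsed_s: float) -> Tuple[float, float]:
    """Return (bandwidth in bits/s, bit error rate) for one transmission.

    Assumes `received` is already aligned with `sent`, one bit per slot.
    """
    if len(sent) != len(received):
        raise ValueError("sent and received must have the same length")
    errors = sum(s != r for s, r in zip(sent, received))
    return len(sent) / elapsed_s, errors / len(sent)
```

With one bit per slot, bandwidth is the inverse of the slot length, so shortening slots raises bandwidth but would be expected to raise the error rate as contention becomes harder to distinguish from noise.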










## Cross-GPU Side Channel Attacks

- Attack 1: Application Fingerprinting.
- Victim: runs her application on a multi-GPU system.
  - 8 HPC applications + 10 DNN models.
- **Spy:** runs in the background, continuously monitoring NVLink leakage vectors.

#### NVLink Leakage Trace

- "nvlink\_total\_data\_received".
  - Total data bytes received through NVLinks.

Figure: NVLink leakage traces of "rf" (from the OpenMM benchmarks) and "ResNet-50".

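A trace like the ones above can be collected by polling the counter at a fixed interval and recording how many bytes were added since the previous read. This is a sketch under the assumption that a hypothetical `read_counter` callable returns the current cumulative value of a counter such as `nvlink_total_data_received`; the actual profiling interface is not shown.

```python
import time
from typing import Callable, List


def collect_trace(read_counter: Callable[[], int], samples: int, interval_s: float,
                  sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Poll a cumulative byte counter and return the bytes added per interval.

    `read_counter` is hypothetical; `sleep` is injectable for testing.
    """
    trace = []
    previous = read_counter()
    for _ in range(samples):
        sleep(interval_s)
        current = read_counter()
        trace.append(current - previous)
        previous = current
    return trace
```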
## Cross-GPU Side Channel Attacks

- Attack 1: Application Fingerprinting.
- Evaluation on 18 applications.
  - Feature engineering.
  - Classification.

| Classifier | DGX F1 | DGX Prec | DGX Rec | GCP F1 | GCP Prec | GCP Rec |
|------------|--------|----------|---------|--------|----------|---------|
| KNN        | 25.96  | 31.45    | 26.11   | 55.77  | 55.97    | 58.89   |
| XGBoost    | 90.87  | 91.45    | 91.11   | 97.78  | 98.06    | 97.78   |
| LightGBM   | 92.22  | 93.12    | 92.22   | 96.10  | 96.93    | 96.11   |
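The slides do not list the engineered features, and the strongest results come from XGBoost and LightGBM. As a dependency-free sketch, the block below computes an illustrative feature vector from one trace (these four features are an assumption, not the paper's feature set) and classifies it with k-nearest neighbours, the KNN baseline in the table.

```python
import math
import statistics
from collections import Counter
from typing import List, Sequence, Tuple


def features(trace: Sequence[float]) -> List[float]:
    """Illustrative summary features of one NVLink leakage trace."""
    active = sum(1 for x in trace if x > 0)
    return [
        statistics.fmean(trace),   # average bytes per interval
        statistics.pstdev(trace),  # burstiness
        float(max(trace)),         # peak interval
        active / len(trace),       # fraction of intervals with traffic
    ]


def knn_predict(train: List[Tuple[List[float], str]], query: List[float], k: int = 3) -> str:
    """Majority label among the k training vectors closest to `query` (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Because these features live on very different scales, a practical pipeline would normalize them before computing distances; gradient-boosted trees such as XGBoost and LightGBM are insensitive to that scaling, which is one reason they tend to suit this kind of tabular feature set.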

## Cross-GPU Side Channel Attacks

- Attack 2: Fingerprinting 3D graphics character rendering.
- Victim: renders her 3D graphics character on a multi-GPU system.
  - 50 fully rigged 3D characters from the Blender Studio open movies.
- **Spy:** runs in the background, continuously monitoring NVLink leakage vectors.

## NVLink Leakage Trace

- "nvlink\_total\_data\_received".
  - Total data bytes received through NVLinks.



"NVLink leakage traces of 5 consecutive frames"



**Character 2:** 

"Oti"







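The rendering traces show one burst of NVLink traffic per frame. Assuming frames are separated by idle gaps in the trace (an assumption, as is the `min_gap` parameter), a spy could split a trace into per-frame bursts before extracting features, as in this hypothetical helper:

```python
from typing import List, Sequence


def split_bursts(trace: Sequence[float], min_gap: int = 2) -> List[List[float]]:
    """Split a trace into bursts separated by at least `min_gap` idle samples.

    Shorter idle runs stay inside the current burst; leading and trailing
    idle samples are dropped.
    """
    bursts: List[List[float]] = []
    current: List[float] = []
    idle: List[float] = []  # idle samples seen since the last active one
    for x in trace:
        if x > 0:
            if current and len(idle) >= min_gap:
                bursts.append(current)  # long gap: previous burst is finished
                current = []
            elif current:
                current.extend(idle)    # short gap: keep it inside the burst
            current.append(x)
            idle = []
        else:
            idle.append(x)
    if current:
        bursts.append(current)
    return bursts
```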

## Cross-GPU Side Channel Attacks

- Attack 2: Fingerprinting 3D graphics character rendering.
- Evaluation on 50 characters.
  - Feature engineering.
  - Classification.

| Classifier | F1    | Prec  | Rec   |
|------------|-------|-------|-------|
| KNN      | 59.74 | 62.71 | 62.50 |
| XGBoost  | 90.11 | 93.10 | 90.50 |
| LightGBM | 91.56 | 94.11 | 92.00 |
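The tables report F1, precision, and recall, but the slides do not state how they are averaged across classes. One common choice is macro averaging, sketched below (reported in %, like the tables). Under macro averaging, F1 is the mean of per-class F1 scores, so it can fall below both averaged precision and averaged recall, as in several rows above.

```python
from typing import Dict, Sequence


def macro_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Macro-averaged precision, recall, and F1 (in %) over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        totals["precision"] += precision
        totals["recall"] += recall
        totals["f1"] += f1
    return {name: 100 * value / len(labels) for name, value in totals.items()}
```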








## Mitigation

- Restricting access to high-resolution clock instructions.
- Detecting abnormal NVLink monitoring and/or contention.
- Managing access to leaky counters.
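To make "limiting the precision or rate" of counter access concrete, the sketch below wraps a counter read so that values are rounded down to a multiple of `quantum` and reads closer together than `min_interval_s` return the cached value. The class and its parameters are hypothetical; as the conclusion notes, this style of limiting alone is not an effective mitigation, so the sketch only illustrates the mechanism being evaluated.

```python
import time
from typing import Callable, Optional


class GuardedCounter:
    """Illustrative wrapper that coarsens a counter and rate-limits its reads."""

    def __init__(self, read: Callable[[], int], quantum: int, min_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._read = read
        self._quantum = quantum
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_time: Optional[float] = None
        self._last_value = 0

    def read(self) -> int:
        now = self._clock()
        if self._last_time is not None and now - self._last_time < self._min_interval_s:
            return self._last_value  # too soon: serve the cached value
        # Round down to a multiple of `quantum` to reduce precision.
        self._last_value = (self._read() // self._quantum) * self._quantum
        self._last_time = now
        return self._last_value
```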



#### Conclusion

- First covert and side channels on multi-GPU interconnects.
  - Exploiting contention and leaky counters.
- Cross-GPU covert channel attack.
- Two end-to-end cross-GPU side channel attacks.
- Mitigation based on limiting the precision or rate is not effective.
- Future work:
  - Finer-grained side channel attacks; better profiling systems for interconnects.

# Thank you! Any questions?

Yicheng Zhang

yzhan846@ucr.edu

https://yichez.site