Paper ID: 2409.03992

Confidential Computing on NVIDIA H100 GPU: A Performance Benchmark Study

Jianwei Zhu, Hang Yin, Shunfan Zhou

This report evaluates the performance impact of enabling Trusted Execution Environments (TEE) on NVIDIA H100 GPUs for large language model (LLM) inference tasks. We benchmark the overhead introduced by TEE mode across various models and token lengths, focusing on the bottleneck caused by CPU-GPU data transfers via PCIe. Our results show that while there is minimal computational overhead within the GPU, the overall performance penalty is primarily due to data transfer. For most typical LLM queries, the overhead remains below 5%, with larger models and longer sequences experiencing near-zero overhead.
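
The overhead percentages quoted above can be read as a relative loss against a TEE-off baseline. The following is a minimal sketch of that arithmetic, assuming throughput in tokens per second is the metric being compared; the function name `tee_overhead_pct` and the input numbers are hypothetical placeholders for illustration and are not taken from the report.

```python
def tee_overhead_pct(baseline_tps: float, tee_tps: float) -> float:
    """Relative throughput loss with TEE mode on, as a percentage of the
    TEE-off baseline. Both arguments are tokens per second (an assumed metric)."""
    if baseline_tps <= 0:
        raise ValueError("baseline throughput must be positive")
    return 100.0 * (baseline_tps - tee_tps) / baseline_tps


# Placeholder measurements for illustration only; NOT results from the report.
overhead = tee_overhead_pct(baseline_tps=100.0, tee_tps=96.0)
print(f"TEE overhead: {overhead:.1f}%")
```

With a latency metric instead of throughput, the sign convention would flip (TEE-on latency minus baseline, divided by baseline), so the metric should be stated alongside any overhead figure.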

Submitted: Sep 6, 2024

Topics

Single GPU
Benchmark Study
Inference Task
Multi GPU
Test Environment
