Question 1

Why do AI chips need TB/s of memory bandwidth?

Accepted Answer

Large language models have billions of parameters, and every forward pass must read all of them from memory. A model with 70 billion parameters at 16-bit precision (2 bytes per parameter) requires about 140 GB of reads per pass. At roughly 3 TB/s, an H100-class GPU can sustain at most about 21 such passes per second (3,000 GB/s ÷ 140 GB), so memory bandwidth sets an upper bound on tokens per second.
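
The arithmetic in this answer can be checked with a minimal Python sketch. The function name is hypothetical, and the inputs (70 billion parameters, 2 bytes per parameter, 3 TB/s) are simply the figures quoted above. It assumes every weight is read exactly once per pass and ignores KV-cache traffic, batching, and multi-GPU sharding, so the result is an upper bound rather than a measured throughput.

```python
def max_passes_per_second(params: float, bytes_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on forward passes per second when all weights are read once per pass."""
    model_bytes = params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


if __name__ == "__main__":
    # Figures from the answer above: 70e9 parameters, 16-bit weights, 3 TB/s.
    passes = max_passes_per_second(70e9, 2, 3e12)
    print(f"{passes:.1f} passes/s")  # prints "21.4 passes/s"
```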

Question 2

Why is memory bandwidth the main bottleneck for large language model inference?

Accepted Answer

During LLM inference, generating each token requires reading all model weights from memory. For a 70-billion-parameter model at 16-bit precision, that is 140 GB per forward pass. Sustaining 30 tokens per second would therefore require 4.2 TB/s of memory reads, which exceeds the roughly 3.35 TB/s of a single H100's HBM3. This is why LLM inference is described as "memory-bound": the GPU's compute cores sit idle waiting for data. Quantising weights to 8-bit or 4-bit cuts the bandwidth demand to a half or a quarter, which raises the achievable tokens per second in proportion.
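
As a rough illustration of how precision changes the bandwidth demand, the sketch below recomputes the figure above for 16-bit, 8-bit, and 4-bit weights. The function name is hypothetical, and the model assumes one full read of the weights per generated token with no batching or caching effects.

```python
def required_bandwidth_tb_s(params: float, bits_per_param: int,
                            tokens_per_s: float) -> float:
    """Memory read rate in TB/s needed to stream all weights once per token."""
    bytes_per_token = params * bits_per_param / 8
    return bytes_per_token * tokens_per_s / 1e12


if __name__ == "__main__":
    # 70e9 parameters at 30 tokens per second, as in the answer above.
    for bits in (16, 8, 4):
        demand = required_bandwidth_tb_s(70e9, bits, 30)
        print(f"{bits}-bit: {demand:.2f} TB/s")
```

Under these assumptions the demand falls from 4.20 TB/s at 16-bit to 2.10 TB/s at 8-bit and 1.05 TB/s at 4-bit.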

Question 3

What is the fastest memory bandwidth ever achieved in a commercial chip?

Accepted Answer

The NVIDIA B200 GPU with HBM3e achieves approximately 8 TB/s of memory bandwidth as of 2025. Each successive part has raised bandwidth by roughly 1.4x to 1.7x: from 2 TB/s (A100) to 3.35 TB/s (H100) to 4.8 TB/s (H200) to 8 TB/s (B200). If that trajectory continues, 16+ TB/s is plausible within a few years.
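
The step-to-step growth implied by the figures quoted above can be computed directly. This is a small sketch; the dictionary and function names are hypothetical, and the bandwidth values are the approximate ones from this answer rather than exact datasheet numbers.

```python
# Approximate memory bandwidth in TB/s, as quoted in the answer above.
BANDWIDTH_TB_S = {"A100": 2.0, "H100": 3.35, "H200": 4.8, "B200": 8.0}


def generation_ratios(bandwidth: dict[str, float]) -> dict[str, float]:
    """Ratio of each entry's bandwidth to the previous entry's, in insertion order."""
    names = list(bandwidth)
    return {
        f"{prev}->{curr}": bandwidth[curr] / bandwidth[prev]
        for prev, curr in zip(names, names[1:])
    }


if __name__ == "__main__":
    for step, ratio in generation_ratios(BANDWIDTH_TB_S).items():
        print(f"{step}: {ratio:.2f}x")
```

With these inputs the ratios come out at about 1.68x, 1.43x, and 1.67x.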

Question 4

How long would it take to transfer a petabyte at 1 TB/s?

Accepted Answer

About 16.7 minutes. A petabyte is 1,000 terabytes, so at 1 TB/s the transfer takes 1,000 seconds. For context, the Library of Congress is often estimated to hold roughly 10–20 petabytes of data; transferring all of it at 1 TB/s would take about 3–6 hours.
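
The same division can be written as a short Python sketch. The constant and function names are hypothetical, decimal (SI) units are assumed, and the rate is treated as constant with no protocol overhead.

```python
TB = 10**12  # bytes in a terabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)


def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time in seconds to move size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s


if __name__ == "__main__":
    one_pb = transfer_seconds(1 * PB, 1 * TB)
    print(f"1 PB: {one_pb:.0f} s = {one_pb / 60:.1f} min")  # 1000 s = 16.7 min
    for size_pb in (10, 20):
        hours = transfer_seconds(size_pb * PB, 1 * TB) / 3600
        print(f"{size_pb} PB: {hours:.1f} h")  # 2.8 h and 5.6 h
```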

Question 5

Is there anything beyond TB/s?

Accepted Answer

Yes: petabytes per second (PB/s). Experimental optical interconnects and photonic computing architectures are pushing toward PB/s-class bandwidth. Some supercomputer storage systems already aggregate into the PB/s range when all nodes operate simultaneously. PB/s-class bandwidth is the next frontier for AI training clusters.

Terabyte per second (TBps)	Gibibyte per second (GiB/s)
0.001	0.931322574615478515625
0.01	9.31322574615478515625
0.1	93.1322574615478515625
1	931.322574615478515625
3.35	3,119.93062496185302734375
10	9,313.22574615478515625
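
The table values follow from 1 TB = 10^12 bytes and 1 GiB = 2^30 bytes, so 1 TB/s = 10^12 / 2^30 GiB/s. A minimal sketch of that conversion is shown below. The function name is hypothetical, and `decimal.Decimal` is used so that the long fractional digits are reproduced exactly rather than rounded by binary floating point.

```python
from decimal import Decimal

BYTES_PER_TB = Decimal(10) ** 12   # decimal terabyte
BYTES_PER_GIB = Decimal(2) ** 30   # binary gibibyte


def tbps_to_gibps(tb_per_s: str) -> Decimal:
    """Convert a rate in TB/s (given as a decimal string) to GiB/s."""
    return Decimal(tb_per_s) * BYTES_PER_TB / BYTES_PER_GIB


if __name__ == "__main__":
    for value in ("0.001", "0.01", "0.1", "1", "3.35", "10"):
        print(value, tbps_to_gibps(value))
```

Passing the input as a string avoids introducing float rounding before the conversion; the results fit within `Decimal`'s default 28-digit precision for the values in the table.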

Terabyte per second to Gibibyte per second

Quick Reference Table (Terabyte per second to Gibibyte per second)

About Terabyte per second (TBps)

About Gibibyte per second (GiB/s)

Gibibyte per second – Frequently Asked Questions

Why do GPU specs sometimes use GiB/s instead of GB/s?

How much GiB/s bandwidth does DDR5 RAM provide?

What is the difference between memory bandwidth and storage bandwidth?

Can I measure GiB/s bandwidth on my own system?

At what GiB/s does data transfer become limited by physics?