Tebibyte per second to Terabyte per second
Quick Reference Table (Tebibyte per second to Terabyte per second)
| Tebibyte per second (TiB/s) | Terabyte per second (TBps) |
|---|---|
| 0.001 | 0.001099511627776 |
| 0.01 | 0.01099511627776 |
| 0.1 | 0.1099511627776 |
| 1 | 1.099511627776 |
| 4.8 | 5.2776558133248 |
| 10 | 10.99511627776 |
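The rows above all follow from a single exact ratio, 2^40 / 10^12. A minimal sketch of that conversion, assuming Python's `decimal` module is used to avoid floating-point rounding (the names `TIB_TO_TB` and `tibps_to_tbps` are illustrative, not part of any library):

```python
from decimal import Decimal

# Exact ratio: 1 TiB = 2**40 bytes, 1 TB = 10**12 bytes.
TIB_TO_TB = Decimal(2**40) / Decimal(10**12)  # 1.099511627776


def tibps_to_tbps(value):
    """Convert tebibytes per second to terabytes per second (hypothetical helper)."""
    return Decimal(str(value)) * TIB_TO_TB


# Recompute the rows of the quick reference table.
for v in ("0.001", "0.01", "0.1", "1", "4.8", "10"):
    print(v, "TiB/s =", tibps_to_tbps(v), "TB/s")
```

Passing the input as a string keeps the decimal digits exact; a float such as `4.8` would otherwise carry its binary rounding error into the result.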
About Tebibyte per second (TiB/s)
A tebibyte per second (TiB/s) equals 1,099,511,627,776 bytes per second and represents the bandwidth scale of cutting-edge AI accelerator memory and high-performance computing interconnects. The HBM3e memory on NVIDIA H200 GPUs provides approximately 4.8 TB/s of bandwidth, which is about 4.37 TiB/s. At this scale, the roughly 10% difference (9.95%) between tebibytes (binary) and terabytes (decimal) matters in system design: a buffer sized for 1 TiB/s must handle about 1,099.5 GB/s in decimal bandwidth.
NVIDIA H200 SXM features 4.8 TB/s (about 4.37 TiB/s) of HBM3e memory bandwidth. Top-end AI training clusters aggregate several TiB/s of storage I/O.
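Vendor datasheets usually quote memory bandwidth in decimal TB/s, so it can help to compute the binary equivalent and the size of the gap explicitly. The sketch below does that with plain floats; `tbps_to_tibps` is a hypothetical helper name, and the 4.8 input is the H200 figure treated as decimal TB/s.

```python
TIB = 2**40   # bytes in one tebibyte
TB = 10**12   # bytes in one terabyte


def tbps_to_tibps(tbps):
    """Convert decimal terabytes per second to tebibytes per second."""
    return tbps * TB / TIB


# Relative size of the binary unit compared with the decimal one.
gap_percent = (TIB - TB) / TB * 100

print(f"TiB is {gap_percent:.2f}% larger than TB")          # 9.95%
print(f"1 TiB/s = {TIB / 10**9:.1f} GB/s")                   # 1099.5 GB/s
print(f"4.8 TB/s = {tbps_to_tibps(4.8):.2f} TiB/s")          # 4.37 TiB/s
```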
About Terabyte per second (TBps)
A terabyte per second (TB/s or TBps) equals 8 terabits per second and represents the bandwidth scale of GPU memory systems, high-performance computing interconnects, and the fastest data center storage fabrics. The HBM3 memory stacks on high-end AI accelerators provide 3–4 TB/s of internal bandwidth. InfiniBand NDR connections used in supercomputers reach 400 Gbps per link, with multiple links aggregated to TB/s totals. At 1 TB/s, the entire contents of a 1 PB data store could transfer in about 17 minutes.
The NVIDIA H100 GPU features 3.35 TB/s of HBM3 memory bandwidth. Top-tier supercomputers like Frontier aggregate over 75 TB/s of storage I/O bandwidth.
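The byte-to-bit relationship and the link aggregation mentioned above can be checked with a few lines. This is a rough sketch that assumes ideal links with no protocol or encoding overhead; the function names are illustrative only.

```python
import math


def gbps_to_gigabytes_per_s(gbps):
    """Gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


def links_needed(target_tb_per_s, link_gbps=400):
    """Number of links of a given line rate needed to reach a target in TB/s.

    Assumes perfect aggregation, which real fabrics do not achieve.
    """
    target_gbps = target_tb_per_s * 1000 * 8  # TB/s -> GB/s -> Gbps
    return math.ceil(target_gbps / link_gbps)


print(gbps_to_gigabytes_per_s(400))  # one 400 Gbps link carries 50 GB/s
print(links_needed(1))               # 20 links for 1 TB/s
```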
Tebibyte per second – Frequently Asked Questions
How do chiplet architectures like AMD's MI300X achieve massive bandwidth through packaging?
AMD's MI300X stacks 8 HBM3 memory modules and multiple compute chiplets on a single package using advanced 2.5D packaging with silicon interposers. The short physical distance between compute and memory dies — millimeters instead of centimeters — dramatically reduces signal latency and power per bit. This allows a 5.3 TB/s aggregate bandwidth that would be physically impossible with traditional socketed memory. The trend toward chiplet packaging is how the industry keeps scaling bandwidth despite hitting limits in single-die manufacturing.
How much does the 10% TiB vs TB difference matter for AI training?
Significantly. When provisioning an AI training cluster with hundreds of GPUs, a 10% bandwidth miscalculation cascades through the entire system design — buffer sizes, interconnect capacity, cooling, and power. Getting the units wrong could mean the difference between a training run finishing in 30 days vs 33 days.
What workloads actually need TiB/s of bandwidth?
Training large language models (100B+ parameters), molecular dynamics simulations, weather modeling, and fluid dynamics at scale. These workloads move enormous matrices through memory billions of times. The TiB/s memory bandwidth of modern GPUs is what makes training models like GPT-4 possible in months rather than decades.
How does TiB/s memory bandwidth compare to network bandwidth in AI clusters?
Memory bandwidth dwarfs network bandwidth. Each H100 GPU has 3.35 TB/s (about 3.05 TiB/s) of internal memory bandwidth but connects to the network at only 0.05 TB/s (400 Gbps InfiniBand). This roughly 67:1 ratio is why AI chip designers obsess over keeping computations local to each GPU and minimising network communication.
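The memory-to-network ratio can be recomputed from the two figures once both are in the same unit. A small sketch, assuming the H100's 3.35 figure is in decimal TB/s and the network link is a single 400 Gbps port (variable names are illustrative):

```python
memory_bytes_per_s = 3.35e12          # H100 HBM3, decimal TB/s
network_bits_per_s = 400e9            # one 400 Gbps InfiniBand link

# Convert the link rate from bits to bytes before comparing.
network_bytes_per_s = network_bits_per_s / 8

ratio = memory_bytes_per_s / network_bytes_per_s
print(f"network: {network_bytes_per_s / 1e12:.2f} TB/s")
print(f"memory-to-network ratio: about {ratio:.0f}:1")
```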
Could quantum computers need TiB/s bandwidth?
Not in the same way. Quantum computers process information through qubits that exist in superposition, so they do not shuttle classical data around at TiB/s. However, the classical control systems that manage quantum processors and process measurement results do need high bandwidth — current quantum-classical interfaces operate at modest Gbps rates.
Terabyte per second – Frequently Asked Questions
Why do AI chips need TB/s of memory bandwidth?
Large language models have billions of parameters that must be read from memory for every inference pass. An LLM with 70 billion parameters at 16-bit precision needs 140 GB of data read per forward pass. At 3 TB/s, the H100 can perform roughly 20 inference passes per second — bandwidth directly determines tokens-per-second output.
Why is memory bandwidth the main bottleneck for large language model inference?
During LLM inference, generating each token requires reading all model weights from memory. A 70-billion-parameter model at 16-bit precision means 140 GB read per forward pass. At 30 tokens per second, that is 4.2 TB/s of memory reads, more than the 3.35 TB/s that an H100's HBM3 can deliver. This is why AI inference is "memory-bound": the GPU's compute cores sit idle waiting for data. Quantising weights to 8-bit or 4-bit cuts the bandwidth demand to a half or a quarter, directly increasing tokens per second.
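The memory-bound argument reduces to one division: bandwidth over bytes of weights read per token. The sketch below is an upper bound only, assuming batch size 1, no KV-cache traffic, and that all weights are served at the stated bandwidth (a 140 GB model would in practice be split across several GPUs). Function names are hypothetical.

```python
def weight_bytes(params, bits_per_weight):
    """Total bytes of model weights at a given precision."""
    return params * bits_per_weight / 8


def max_tokens_per_s(bandwidth_bytes_per_s, params, bits_per_weight):
    """Bandwidth-limited ceiling on tokens per second (one full weight read per token)."""
    return bandwidth_bytes_per_s / weight_bytes(params, bits_per_weight)


H100_BANDWIDTH = 3.35e12  # bytes per second, decimal TB/s
PARAMS = 70e9

for bits in (16, 8, 4):
    ceiling = max_tokens_per_s(H100_BANDWIDTH, PARAMS, bits)
    print(f"{bits}-bit weights: at most {ceiling:.1f} tokens/s")
```

Halving the precision halves the bytes read per token, so the ceiling doubles at each step from 16-bit to 8-bit to 4-bit.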
What is the fastest memory bandwidth ever achieved in a commercial chip?
The NVIDIA B200 GPU with HBM3e achieves approximately 8 TB/s of memory bandwidth as of 2025. Each step in the lineup has raised bandwidth by roughly 1.4x to 1.7x: from 2 TB/s (A100) to 3.35 TB/s (H100) to 4.8 TB/s (H200) to 8 TB/s (B200). The trajectory suggests 16+ TB/s within a few years.
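The step-to-step growth can be read directly off the figures quoted above. A short sketch that computes each ratio, using only the four bandwidth values from this answer:

```python
# Memory bandwidth in TB/s, as quoted in the text above.
bandwidth_tbps = {
    "A100": 2.0,
    "H100": 3.35,
    "H200": 4.8,
    "B200": 8.0,
}

names = list(bandwidth_tbps)
growth = {}
for prev, curr in zip(names, names[1:]):
    # Ratio of each part's bandwidth to its predecessor's.
    growth[f"{prev}->{curr}"] = bandwidth_tbps[curr] / bandwidth_tbps[prev]

for step, factor in growth.items():
    print(f"{step}: {factor:.2f}x")
```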
How long would it take to transfer a petabyte at 1 TB/s?
About 16.7 minutes. A petabyte is 1,000 terabytes, so at 1 TB/s, the math is simple division. For context, the Library of Congress contains roughly 10–20 petabytes of data. Transferring it all at 1 TB/s would take about 3–6 hours.
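The division behind these figures can be sketched as follows, assuming decimal units throughout and a sustained, overhead-free transfer rate (`transfer_seconds` is an illustrative name):

```python
PB = 10**15  # bytes in one petabyte
TB = 10**12  # bytes in one terabyte


def transfer_seconds(size_bytes, rate_bytes_per_s):
    """Ideal transfer time: size divided by sustained rate."""
    return size_bytes / rate_bytes_per_s


one_pb = transfer_seconds(1 * PB, 1 * TB)
print(f"1 PB at 1 TB/s: {one_pb / 60:.1f} minutes")  # 16.7 minutes

for size_pb in (10, 20):
    hours = transfer_seconds(size_pb * PB, 1 * TB) / 3600
    print(f"{size_pb} PB at 1 TB/s: {hours:.1f} hours")
```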
Is there anything beyond TB/s?
Yes: petabytes per second (PB/s). Experimental optical interconnects and photonic computing architectures are pushing toward PB/s-class bandwidth. Summed across every node, the memory bandwidth of the largest supercomputers already reaches the PB/s range, although their storage systems remain in the tens of TB/s. It is the next frontier for AI training clusters.