AWS Beats Cloud Rivals to NVIDIA Blackwell with EC2 G7 — 4.6x AI Inference Gain Over G6

On June 19, 2026, Amazon Web Services became the first major cloud provider to offer NVIDIA's latest-generation server GPUs, launching EC2 G7 instances powered by the RTX PRO 4500 Blackwell Server Edition. AWS claims up to 4.6x the AI inference performance of the incumbent G6 family, a gap that could make workload migration economically compelling for customers running inference at scale, depending on pricing AWS has not yet disclosed.

What the Hardware Actually Is

The RTX PRO 4500 Blackwell Server Edition is a single-slot, passively cooled card with 32 GB of GDDR7 ECC memory, 800 GB/s of memory bandwidth, and 51 TFLOPS of FP32 compute. At 165 W, it fits dense rack configurations that a dual-slot, actively cooled card cannot. NVIDIA positions it as the successor to the L4, the workhorse of the G6 generation, with approximately 41% more CUDA cores (10,496 vs. 7,424) and 5th-generation Tensor Cores, which underpin the AI inference gains AWS is claiming.

By embedding the chip in its own instances before Google Cloud or Microsoft Azure, AWS captures the early-adopter window for customers who cannot wait for rival offerings.

Key Facts

4.6x AI inference throughput over G6 instances (AWS claim; workload-specific)
2.1x graphics performance over G6 for rendering and VDI
32 GB GDDR7 per GPU, 1.33x G6 capacity; 2.45x memory bandwidth
700 Gbps EFA networking, 7x more than G6 — critical for multi-node inference serving
Up to 8 GPUs per instance: 256 GB total GPU memory, 192 vCPUs, 768 GiB system RAM
7.6 TB local NVMe SSD, 7 instance sizes from single-GPU to 8-GPU
Available now in US East (Ohio) and US West (Oregon); On-Demand, Savings Plans, and Spot purchasing
AWS did not disclose per-hour pricing at launch
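
Several of the figures above follow from one another, and the arithmetic is easy to check. The sketch below is only a sanity check on the numbers quoted in this article. The implied G6 values are back-calculated from the stated multipliers, not taken from AWS documentation, and all names are illustrative.

```python
# Figures quoted in the article.
G7_CUDA_CORES = 10_496      # RTX PRO 4500 Blackwell Server Edition
L4_CUDA_CORES = 7_424       # L4, the G6-generation GPU
G7_GPU_MEMORY_GB = 32       # per GPU
G7_MAX_GPUS = 8             # largest instance size
G7_EFA_GBPS = 700
EFA_GAIN_OVER_G6 = 7        # "7x more than G6"
MEMORY_GAIN_OVER_G6 = 1.33  # "1.33x G6 capacity"


def percent_more(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1) * 100


cuda_core_gain_pct = percent_more(G7_CUDA_CORES, L4_CUDA_CORES)
total_gpu_memory_gb = G7_GPU_MEMORY_GB * G7_MAX_GPUS

# Back-calculated from the multipliers above (an assumption, not an AWS spec).
implied_g6_efa_gbps = G7_EFA_GBPS / EFA_GAIN_OVER_G6
implied_g6_memory_gb = G7_GPU_MEMORY_GB / MEMORY_GAIN_OVER_G6

print(f"CUDA core gain: {cuda_core_gain_pct:.1f}%")          # 41.4%
print(f"Total GPU memory at 8 GPUs: {total_gpu_memory_gb} GB")  # 256 GB
print(f"Implied G6 EFA bandwidth: {implied_g6_efa_gbps:.0f} Gbps")
print(f"Implied G6 GPU memory: {implied_g6_memory_gb:.1f} GB")
```

The results agree with the "approximately 41%" and "256 GB" figures quoted in this article.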

Why the 700 Gbps Networking Number Matters

The raw GPU specs are expected given the Blackwell architecture. The more consequential figure may be the 700 Gbps Elastic Fabric Adapter (EFA) throughput, a 7x jump over G6. Large-model inference serving often shards weights and context across many GPUs, and once a deployment spans more than one instance, the bottleneck is frequently the transfer of data between nodes rather than raw compute. Because EFA is the network link between instances, sevenfold more bandwidth raises the ceiling on the model sizes a multi-instance G7 deployment can serve efficiently, which can reduce both latency and cost per token.
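
To give a feel for what the bandwidth jump means, here is a rough sketch of idealized transfer time over the instance network. It assumes the link runs at full line rate with no protocol overhead, which real workloads will not achieve. The 4 GB payload is a hypothetical value chosen for illustration, and the G6 figure is back-calculated from the "7x" claim.

```python
def transfer_seconds(payload_gb: float, link_gbps: float) -> float:
    """Idealized time to move `payload_gb` gigabytes over a `link_gbps` link.

    Ignores protocol overhead, latency, and contention.
    """
    payload_gigabits = payload_gb * 8  # bytes to bits
    return payload_gigabits / link_gbps


G7_EFA_GBPS = 700
G6_EFA_GBPS = G7_EFA_GBPS / 7  # implied by the article's "7x jump"

PAYLOAD_GB = 4.0  # hypothetical payload, e.g. cached context moved between nodes

t_g6 = transfer_seconds(PAYLOAD_GB, G6_EFA_GBPS)
t_g7 = transfer_seconds(PAYLOAD_GB, G7_EFA_GBPS)

print(f"G6: {t_g6 * 1000:.0f} ms")  # 320 ms
print(f"G7: {t_g7 * 1000:.0f} ms")  # 46 ms
```

Under these assumptions the transfer time scales inversely with bandwidth, so the same payload moves in one seventh of the time. Whether that translates into lower end-to-end latency depends on how much of a given request is network-bound.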

The instances also support NVIDIA GPUDirect RDMA with EFA for Amazon FSx for Lustre, enabling GPU memory to communicate with distributed storage without routing through the CPU, a useful property for retrieval-augmented inference pipelines.

Industry Context: Blackwell Momentum Is Real

The G7 launch lands three days after MLCommons published MLPerf Training 6.0 results on June 16, in which NVIDIA Blackwell systems swept every benchmark, including a record 8,192-GPU scale-out run. The Blackwell GB300 NVL72 posted up to 60% faster training than the GB200 in the same rack configuration, and NVIDIA was the sole entrant on two new mixture-of-experts tests using DeepSeek-V3 (671 billion parameters) and GPT-OSS-20B. That benchmark validation gives enterprise buyers confidence the Blackwell generation is not merely a paper spec.

AWS's Two-Track Chip Strategy

The G7 launch cannot be read in isolation. One day before, reporting emerged that AWS is in active discussions to sell its own Trainium chips to external data centers — a significant strategic pivot confirmed by Amazon AI chief Peter DeSantis. Andy Jassy's April 2026 shareholder letter valued Amazon's semiconductor business at $50 billion in annualised revenue potential if sold externally, and noted commitments from OpenAI (approximately 2 gigawatts of Trainium capacity) and Anthropic (up to 5 gigawatts).

The juxtaposition is deliberate. AWS wants to be indispensable whether customers choose NVIDIA silicon or Amazon's in-house alternatives. Offering Blackwell first cements the NVIDIA relationship; developing and potentially externalising Trainium creates a credible second source that pressures NVIDIA pricing. Amazon separately confirmed it will deploy more than one million NVIDIA GPUs starting in 2026, a figure that underscores that the AI infrastructure market is large enough for both strategies to coexist.

Who Is Affected

The clearest beneficiaries are workloads that today saturate G6 memory or bandwidth: large multimodal inference, real-time video transcoding at 4K/8K, GPU-accelerated analytics on Amazon EMR and EKS, and virtual desktop infrastructure at enterprise scale. The 9th-generation NVENC engine with 4:2:2 H.264 and HEVC support makes G7 particularly relevant for media companies with broadcast-grade encoding requirements.

For enterprises currently on Reserved G6 Instances, the migration calculus depends on undisclosed G7 pricing. Even if a workload sees the full 4.6x gain, G7 lowers cost per inference only if its per-hour price is less than 4.6x that of the comparable G6 instance, and AWS has not yet provided that number.
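
The break-even logic can be sketched in a few lines. The 4.6x figure is AWS's claim and is workload-specific. The price ratios in the loop are purely hypothetical placeholders, since no G7 pricing has been published, and the function names are illustrative.

```python
PERF_RATIO = 4.6  # AWS's claimed G7-over-G6 inference gain (workload-specific)


def relative_cost_per_inference(price_ratio: float,
                                perf_ratio: float = PERF_RATIO) -> float:
    """G7 cost per inference divided by G6 cost per inference.

    `price_ratio` is the G7 hourly price divided by the G6 hourly price.
    A result below 1.0 means G7 is cheaper per inference.
    """
    return price_ratio / perf_ratio


def migration_pays_off(price_ratio: float,
                       perf_ratio: float = PERF_RATIO) -> bool:
    """True when the price ratio is below the performance ratio."""
    return price_ratio < perf_ratio


# Hypothetical price ratios; AWS has not disclosed G7 pricing.
for hypothetical_price_ratio in (1.5, 3.0, 4.6, 6.0):
    rel = relative_cost_per_inference(hypothetical_price_ratio)
    verdict = "cheaper" if migration_pays_off(hypothetical_price_ratio) else "not cheaper"
    print(f"price ratio {hypothetical_price_ratio:.1f}x -> "
          f"{rel:.2f}x cost per inference ({verdict})")
```

A workload that realizes less than the full 4.6x gain would lower the break-even point accordingly, which can be explored by passing a smaller `perf_ratio`.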

What to Watch

Pricing disclosure and G7 Reserved Instance availability are the near-term catalysts; without a public on-demand rate, the 4.6x performance claim cannot be converted into a cost-per-inference comparison against G6 or against Azure and Google Cloud once they respond with their own Blackwell offerings.

Source: aws_infra, dcd_news, hpcwire, gn_gpu_cluster

Originally published on gentic.news