It’s the second day of AWS re:Invent, the day of the conference when the bulk of the announcements are made. By our count, AWS put out 21 press releases today, bringing the week’s total to 49 (you can browse them on the AWS News Blog).
AWS has long set the bar for hyperscale infrastructure, and that continues to be the case. The company, however, is often a fast follower when it comes to AI. Microsoft, with its OpenAI relationship, was quick to bring GenAI services to Azure. Google, while the number three CSP, is building some of the best in-house silicon for AI training and inference.
While there’s a lot of goodness in those 49 announcements, the ones I’m most interested in relate to infrastructure and AI, along with the new Amazon Nova foundation models.
AI Infrastructure
AI services require AI infrastructure. AWS announced Trainium2, its new in-house designed training accelerator; EC2 instances built around the part; and updates to its storage stack to better support the data needs of the AI lifecycle.
Trainium2
Trainium is AWS’s machine learning accelerator, and this week the company announced the second generation: the cleverly named Trainium2, purpose-built to enhance the training of large-scale AI models, including foundation models and large language models.
The original Trainium was designed before the shockwave that is LLMs, and thus it wasn’t competitive against NVIDIA’s training accelerators. AWS closes that gap with Trainium2, saying that the new part delivers up to four times faster training performance and up to twice the energy efficiency of the first-generation Trainium.
Trainium2 supports up to 96 GB of HBM, more than 3x the capacity of its predecessor. When it comes to LLM and other GenAI training, memory capacity is paramount, and the engineers over at Amazon’s Annapurna Labs nailed it.
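To see why capacity matters so much, here’s a back-of-the-envelope sketch (my illustrative numbers, not AWS’s): training with the Adam optimizer typically keeps weights, gradients, and two optimizer moments in accelerator memory, which adds up fast.

```python
# Back-of-the-envelope training memory estimate (illustrative assumptions,
# not AWS figures): BF16 weights and gradients plus FP32 Adam moments.
def training_memory_gb(params_billion: float, shards: int = 1) -> float:
    bytes_per_param = 2 + 2 + 4 + 4  # weights, grads, Adam m, Adam v
    total_gb = params_billion * bytes_per_param  # 1e9 params * bytes / 1e9 B/GB
    return total_gb / shards  # split evenly across `shards` accelerators

# A 70B-parameter model needs roughly 840 GB of training state, so even
# with 96 GB of HBM per chip it has to shard across at least nine chips.
print(f"{training_memory_gb(70):.0f} GB total")
print(f"{training_memory_gb(70, shards=16):.1f} GB per chip across 16 chips")
```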
AWS said that, beyond traditional EC2 instances, it plans to deploy Trainium2 in EC2 UltraClusters, allowing configurations of up to 100,000 chips. This setup will deliver up to 65 exaFLOPS of aggregate compute power.
If you want a deeper-than-you-want read on Trainium2, check out Dylan Patel’s piece on his SemiAnalysis site. His team does a great job of filling in the blanks.
Amazon EC2 Enhancements
- Trn2 Instances and Trn2 UltraServers: Powered by the new AWS Trainium2 accelerators, the Trn2 instances offer what AWS claims is up to 4x faster performance, 4x more memory bandwidth, and 3x more memory capacity than the prior generation, along with 30-40% better price performance compared to current GPU-based EC2 instances. Trn2 UltraServers tie four Trn2 instances together over a NeuronLink interconnect, presenting 64 Trainium2 chips as a single node for training the largest models.
- P5en Instances: Built around NVIDIA H200 Tensor Core GPUs and custom 4th Gen Intel Xeon processors, the new P5en instances feature up to 3,200 Gbps of Elastic Fabric Adapter v3 (EFAv3) networking. The new type offers up to a 35% reduction in latency compared to previous generations, enhancing performance for distributed training and HPC workloads.
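If you want to kick the tires once Trn2 reaches your region, launching one is a standard EC2 call. Here’s a minimal boto3 sketch, assuming the trn2.48xlarge instance type name from the announcement; the AMI ID is a placeholder (you’d use a Deep Learning AMI with the Neuron SDK for your region).

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Launch a single Trn2 instance. The AMI ID below is a placeholder --
# substitute a Neuron-enabled Deep Learning AMI from your region.
resp = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder
    InstanceType="trn2.48xlarge",     # announced Trainium2 instance size
    MinCount=1,
    MaxCount=1,
)
print(resp["Instances"][0]["InstanceId"])
```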
Amazon S3 Enhancements
Managing data goes hand in hand with managing the AI lifecycle, especially where object storage is concerned. AWS invented S3, the de facto industry standard for object storage, and it continues to evolve the technology. This week AWS added capabilities that will be very welcome:
- Queryable Object Metadata (Preview): Automatically generates and stores metadata for objects in Amazon S3 buckets using fully managed Apache Iceberg tables. This lets users efficiently query object metadata with tools like Amazon Athena, Amazon Redshift, and Apache Spark, enabling rapid data discovery and management at scale (see the query sketch after this list).
- Amazon S3 Tables: Optimized for tabular data, Amazon S3 Tables support the Apache Iceberg format, enabling seamless queries with engines such as Amazon Athena and Amazon EMR. AWS says they bring up to 3x faster query performance and up to 10x more transactions per second compared to self-managed table storage.
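Here’s what querying the new metadata tables might look like through Athena with boto3, as a minimal sketch. The database, table, and column names are my assumptions about the Iceberg schema; check what the S3 metadata configuration actually creates in your account.

```python
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Find the 20 largest recently modified objects. Database/table/column
# names below are assumed placeholders, not confirmed S3 metadata names.
resp = athena.start_query_execution(
    QueryString="""
        SELECT key, last_modified_date, size
        FROM "s3_metadata_db"."my_bucket_metadata"
        WHERE size > 100000000
        ORDER BY last_modified_date DESC
        LIMIT 20
    """,
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
print(resp["QueryExecutionId"])  # poll get_query_execution for results
```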
AI Services & Foundation Models
AWS announced updates across its AI and machine learning services, enhancing capabilities in model efficiency, data governance, and user productivity. AWS also unveiled Amazon Nova, its suite of advanced foundation models designed to enhance GenAI across various modalities.
Amazon Nova
Amazon Nova encompasses two primary categories: understanding models and creative content generation models. The Nova foundation models are integrated into Amazon Bedrock, so invoking one looks like any other Bedrock call (see the sketch after the list below).
- Understanding Models: These models process text, image, or video inputs to generate text outputs.
  - Amazon Nova Micro: A text-only model optimized for low latency and cost, supporting up to 128K tokens. It’s suitable for tasks like text summarization, translation, content classification, interactive chat, and basic mathematical reasoning and coding. Customization is possible through fine-tuning and model distillation.
  - Amazon Nova Lite: A multimodal model handling text, image, and video inputs, capable of processing up to 300K tokens or 30 minutes of video per request. This is for real-time customer interactions, document analysis, and visual question-answering tasks. Fine-tuning and model distillation are supported for optimization.
  - Amazon Nova Pro: A high-performance multimodal model offering a balance of accuracy, speed, and cost. With a 300K token context length, it supports complex workflows requiring API calls and tool integrations.
  - Amazon Nova Premier: The most advanced multimodal model, it tackles complex reasoning tasks and serves as a superior teacher for distilling custom models. It is currently in training, with availability targeted for early 2025.
- Creative Content Generation Models: These models generate images or videos based on text and image inputs.
  - Amazon Nova Canvas: An image generation model capable of producing studio-quality images with precise control over style and content. Features include inpainting, outpainting, and background removal. It performs well on benchmarks such as TIFA and ImageReward.
  - Amazon Nova Reel: A video generation model that creates short videos from text prompts and images, allowing control over visual style and pacing. This one is targeted at marketing, advertising, and entertainment content tasks.
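Here’s a minimal invocation sketch using boto3’s Converse API. The model ID is my assumption based on AWS’s naming pattern; check the Bedrock console for the exact identifier available in your region.

```python
import boto3

# Bedrock runtime client. The Nova model ID below is an assumption
# following AWS's naming convention -- verify it in the Bedrock console.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",  # assumed ID
    messages=[{
        "role": "user",
        "content": [{"text": "Summarize AWS re:Invent in two sentences."}],
    }],
    inferenceConfig={"maxTokens": 256, "temperature": 0.3},
)

print(response["output"]["message"]["content"][0]["text"])
```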
Amazon Bedrock Enhancements
- Model Distillation: AWS introduced a preview of Amazon Bedrock Model Distillation, which automates the creation of smaller, fine-tuned models (student models) based on responses from larger foundation models (teacher models).
- RAG Evaluation and LLM-as-a-Judge: New features in Amazon Bedrock include RAG evaluation and an “LLM-as-a-judge” capability. These tools enable automated assessment of generative AI applications, allowing developers to refine models efficiently by evaluating aspects like correctness and helpfulness.
- APIs for RAG Applications: AWS released APIs designed to enhance RAG applications built on Amazon Bedrock. The new APIs support custom connectors, streaming data ingestion, and reranking models, allowing for more accurate and customized responses in generative AI applications (a retrieval sketch follows this list).
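For a sense of where these additions plug in, here’s a minimal RAG sketch using the existing Bedrock Knowledge Bases RetrieveAndGenerate call via boto3; the knowledge base ID and model ARN are placeholders I made up. The new connector, streaming ingestion, and reranking APIs layer onto this retrieval flow.

```python
import boto3

# Knowledge Bases runtime client. The knowledge base ID and model ARN
# below are placeholders, not real resources.
agent_rt = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_rt.retrieve_and_generate(
    input={"text": "What did AWS announce about Trainium2?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB1234567890",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-pro-v1:0",  # placeholder
        },
    },
)
print(response["output"]["text"])
```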
Amazon Q Updates
- Workflow Automation and Integrations: Amazon Q Business now offers workflow automation capabilities alongside 50 new action integrations.
- Extensions and Integrations: There are new browser extensions and integrations with popular messaging and collaboration tools, allowing users to access Amazon Q Business directly within their preferred applications.
- Visual Element Insights: A new feature enables Amazon Q Business to extract insights from visual elements embedded within documents, such as diagrams and charts, so answers can draw on a document’s full content rather than just its text.
PartyRock Enhancements
- New Capabilities and Free Usage: PartyRock, an Amazon Bedrock playground, has introduced new features and now offers free daily usage without requiring a credit card. Users can explore the app catalog, find relevant applications, and put them to work without building their own.
Quick Take
There’s a lot here to digest, and more announcements may still be coming. We’re just skimming the news in this post; once the event wraps, we’ll take a more detailed look at Amazon’s most impactful moves.
Stay tuned.