Amazon launches AI model evaluation on Bedrock: How it works

By Dwaipayan Roy 01:12 pm Nov 30, 2023

Model evaluation comprises 2 parts

Amazon has unveiled Model Evaluation on Bedrock, a new service that lets businesses test AI models before implementing them. Revealed at the AWS re:Invent conference, this service is currently in its preview stage. AWS VP of Database, Analytics, and Machine Learning, Swami Sivasubramanian, explained that "model selection and evaluation is not just done at the beginning, but is something that's repeated periodically." The goal is to involve more people in assessing AI models.

Components of model evaluation on Bedrock

Model Evaluation on Bedrock consists of two parts: automated evaluation and human evaluation. With automated evaluation, developers can gauge a model's performance on metrics like robustness, accuracy, or toxicity for tasks such as text classification, question answering, summarization, and text generation. The system then creates a report based on these evaluations. For human evaluation, users can work with an AWS human evaluation team or their own team, with AWS offering customized timelines and pricing for those who use its assessment team.
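To make the automated half more concrete, here is a minimal sketch of how accuracy and robustness might be scored for a text classification task and collected into a report. It is an illustration only and does not use or reproduce Bedrock's API or its actual scoring method. The names `toy_model`, `perturb`, and `evaluate`, the four-example dataset, and the definition of robustness as the accuracy lost under a small input perturbation are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data shape: (input text, expected label).
Example = Tuple[str, str]


def accuracy(model: Callable[[str], str], data: List[Example]) -> float:
    """Fraction of examples where the model's label matches the expected one."""
    if not data:
        return 0.0
    correct = sum(1 for text, label in data if model(text) == label)
    return correct / len(data)


def perturb(text: str) -> str:
    """Assumed perturbation: upper-case the input and pad it with spaces.

    The meaning is unchanged, so a robust model should give the same label.
    """
    return "  " + text.upper() + "  "


def evaluate(model: Callable[[str], str], data: List[Example]) -> Dict[str, float]:
    """Build a small report: accuracy before and after perturbation."""
    base = accuracy(model, data)
    perturbed = accuracy(model, [(perturb(text), label) for text, label in data])
    return {
        "accuracy": base,
        "perturbed_accuracy": perturbed,
        "robustness_drop": base - perturbed,
    }


def toy_model(text: str) -> str:
    """Stand-in classifier (not a real model): case-sensitive keyword match."""
    return "positive" if "good" in text else "negative"


# Tiny made-up dataset standing in for a customer's own test data.
dataset: List[Example] = [
    ("good service", "positive"),
    ("really good value", "positive"),
    ("slow and buggy", "negative"),
    ("not worth it", "negative"),
]

report = evaluate(toy_model, dataset)
for metric, value in report.items():
    print(f"{metric}: {value:.2f}")
```

The toy classifier scores perfectly on the clean inputs but loses half its accuracy once the text is upper-cased, which is the kind of gap a robustness metric is meant to surface. A real evaluation would swap in calls to the model under test and a far larger dataset.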

Evaluating models with custom datasets

AWS allows customers to use their own data for model evaluation, in addition to providing test datasets. This helps businesses better understand how the models perform in their specific use cases. AWS VP for Generative AI, Vasi Philomin, stated that gaining a deeper understanding of model performance can guide development more effectively, and help companies determine if models meet responsible AI standards before using them.

Human evaluation detects additional metrics

Human evaluators can assess qualities that automated systems might miss, such as empathy or friendliness. While AWS doesn't require all customers to benchmark models, those still deciding which models to use might benefit from this process. During the preview period, AWS will only charge for the model inference used in evaluations. The aim of benchmarking on Bedrock is not to assess models broadly but to offer businesses a way to measure a model's impact on their own projects.

Titan Image Generator has also been unveiled

Amazon has also rolled out an image generation tool called Titan Image Generator. It is now available in preview for AWS customers on Bedrock, the company's AI development platform. Part of Amazon's Titan line-up of generative AI models, Titan Image Generator can create new images from text descriptions and customize existing pictures. This puts Amazon in competition with rivals like Microsoft, Google, and OpenAI.