Mistral unveils AI-powered tool to tackle online toxicity
The API can be customized for specific applications


Nov 08, 2024
12:24 pm

What's the story

Mistral, an artificial intelligence (AI) start-up, has launched a new application programming interface (API) for content moderation. It is the same API that powers moderation on Le Chat, Mistral's chatbot platform, and the company says it can be customized for specific applications and safety standards. The API is powered by a fine-tuned version of Ministral 8B, a model trained to process text in several languages, including English, French, and German.

Information

It can classify text into nine categories

The Mistral API can classify text into nine different categories. These include sexual content, hate and discrimination, violence and threats, dangerous and criminal content, self-harm, health-related issues, financial matters, legal topics, and personally identifiable information (PII).
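A client consuming the API would typically check which of these nine categories were flagged for a given input. The snippet below is a minimal sketch of that post-processing step; the category identifiers and the response shape are assumptions modeled on common moderation APIs, not a confirmed Mistral schema.

```python
# The nine policy categories described above; the exact identifier strings
# used by Mistral's API are an assumption here.
NINE_CATEGORIES = [
    "sexual", "hate_and_discrimination", "violence_and_threats",
    "dangerous_and_criminal_content", "selfharm", "health",
    "financial", "law", "pii",
]

def flagged_categories(result: dict) -> list[str]:
    """Return the names of categories whose boolean flag is set
    in a single moderation result."""
    categories = result.get("categories", {})
    return [name for name in NINE_CATEGORIES if categories.get(name)]

# Hypothetical response for one input that mentions a phone number,
# so only the PII flag is set.
sample_result = {
    "categories": {name: False for name in NINE_CATEGORIES} | {"pii": True},
    "category_scores": {name: 0.01 for name in NINE_CATEGORIES},
}
print(flagged_categories(sample_result))
```

Returning a list rather than a single label reflects that a text can violate several policies at once (for example, a threatening message that also leaks personal data).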

Application scope

Versatility and industry reception

Mistral's moderation API can be applied to both raw and conversational text. In its blog post, the company notes that "over the past few months, we've seen growing enthusiasm across the industry and research community for new AI-based moderation systems, which can help make moderation more scalable and robust across applications." It adds: "Our content moderation classifier leverages the most relevant policy categories for effective guardrails."
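The distinction between raw and conversational input comes down to how the request is shaped: a bare string versus a list of role-tagged chat turns, so the classifier can judge a reply in the context of the conversation. The sketch below builds both payload shapes locally; the endpoint paths, field names, and model identifier are assumptions for illustration and should be checked against Mistral's documentation before use.

```python
MODEL = "mistral-moderation-latest"  # assumed model identifier

def raw_text_payload(texts: list[str]) -> dict:
    """Payload for moderating standalone strings
    (e.g. a POST to an assumed /v1/moderations endpoint)."""
    return {"model": MODEL, "input": texts}

def conversational_payload(messages: list[dict]) -> dict:
    """Payload for moderating a chat transcript, so the classifier sees
    each reply in context (e.g. an assumed /v1/chat/moderations endpoint)."""
    return {"model": MODEL, "input": [messages]}

# Raw mode: each string is judged on its own.
raw = raw_text_payload(["My phone number is 555-0100."])

# Conversational mode: the assistant's turn is judged alongside
# the user's question that prompted it.
chat = conversational_payload([
    {"role": "user", "content": "How do I reset my password?"},
    {"role": "assistant", "content": "Use the 'Forgot password' link."},
])
```

Context matters for moderation: a sentence that looks benign on its own can be harmful as a reply to a particular question, which is why the conversational shape carries the whole exchange.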

Bias challenges

Addressing biases and technical flaws in AI systems

Despite their potential benefits, AI-powered moderation systems can also be susceptible to the same biases and technical flaws that plague other AI systems. For example, some models trained to detect toxicity may read phrases in African-American Vernacular English (AAVE) as disproportionately "toxic." Social media posts about people with disabilities are frequently flagged as more negative or toxic by commonly used public sentiment and toxicity detection models.

Ongoing development

Commitment to improving its moderation model

Mistral notes that its moderation model is extremely accurate but still a work in progress. The company has not compared the performance of its API with other popular moderation APIs such as Jigsaw's Perspective API and OpenAI's moderation API. "We're working with our customers to build and share scalable, lightweight, and customizable moderation tooling," the company said, promising to continue working with the research community for safety advancements in the wider field.