🌟 Pezzo is open source. Show your support by giving us a star!

Build an AI Moderation System in Under 10 Minutes Using JavaScript

Learn how to leverage OpenAI to quickly build an AI-powered moderation system that automatically detects and filters toxic comments.

Matan Abramovich
October 30, 2023

Inappropriate or abusive content online can be a major headache. As a developer, you may have struggled with building effective content moderation into your applications. Manual moderation simply doesn’t scale. But what if you could quickly implement an AI-powered moderation system to automatically detect and filter out toxic comments?

In this guide, you'll learn how to leverage OpenAI's API to build a simple yet robust moderation system in under 10 minutes. Whether you're working on a social platform, forum, or any user-generated content site, you can easily integrate this into your stack.


Pezzo: Open-Source LLMOps Platform 🚀

Just a quick background about us. Pezzo is the fastest-growing open-source LLMOps platform, and the only one built for full-stack developers, with first-class TypeScript support.

Like our mission? Check Pezzo out and give us a star. We're building for developers, by developers 🌟.

Try Pezzo Cloud 🌩️

Our cloud-based LLMOps platform is now available!

Getting set up

Getting an OpenAI API key

First, you'll need to sign up at OpenAI and obtain an API key. Once obtained, make sure you set it as an environment variable (`OPENAI_API_KEY`) so the client can pick it up automatically.


Setting up the project

Create an empty directory somewhere in your file system. Initialize a new NPM project (`npm init -y`) and install the OpenAI client (`npm i openai`). You should be good to go! For an in-depth guide on how the OpenAI API works, check out this post.

Let's start simple

We're going to start by writing a simple prompt. We'll have a system message that provides guidelines for moderation, and a user message that contains the user's input (imagine this comes from a UI of some sort). Here's a code example:
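A minimal sketch of what this can look like with the official `openai` Node client (v4-style API). The model name and the helper names are placeholders, not the post's original code:

```javascript
// Sketch only: assumes the "openai" npm package is installed
// and OPENAI_API_KEY is set in the environment.

const SYSTEM_MESSAGE = "Is this text inappropriate?";

// Build the chat messages for a given user comment
function buildMessages(comment) {
  return [
    { role: "system", content: SYSTEM_MESSAGE },
    { role: "user", content: comment },
  ];
}

// Send the messages to the model and return its plain-text reply
async function moderate(comment) {
  // Imported lazily so the pure helper above works on its own
  const { default: OpenAI } = await import("openai");
  const openai = new OpenAI(); // reads OPENAI_API_KEY automatically
  const completion = await openai.chat.completions.create({
    model: "gpt-3.5-turbo", // placeholder model name
    messages: buildMessages(comment),
  });
  return completion.choices[0].message.content;
}
```

In a real app you would `await moderate(comment)` whenever a user submits content, and act on the returned verdict.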

AI response:

Let's break this down:

  • The system message is: "is this text inappropriate?"
  • The user message is the comment we want to moderate (the user's input from the UI).
  • The AI's response tells us whether the text is inappropriate.

Better moderation granularity

Simply understanding if the text is inappropriate isn't enough. We want to understand what's inappropriate about it.

We can guide the AI to be more granular and categorize its response, for example into:

  • Toxicity: rude, disrespectful comments
  • Hate Speech: racist, sexist or discriminatory language
  • Threats: violent, harmful statements

(For ethical reasons, this guide will not include examples of actual hate speech or threats - but the concepts can be applied to address these policy violations.)
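One way to express this is to extend the system message with the category list. The exact wording below is illustrative, not the post's original prompt:

```javascript
// Sketch: a system message that asks for a category, not just yes/no.
// Category names mirror the ones discussed above.
const GRANULAR_SYSTEM_MESSAGE = [
  "Is this text inappropriate?",
  "If so, classify it as one of: Toxicity, Hate Speech, Threats.",
].join(" ");

// Build the chat messages for the granular prompt
function buildGranularMessages(comment) {
  return [
    { role: "system", content: GRANULAR_SYSTEM_MESSAGE },
    { role: "user", content: comment },
  ];
}
```

These messages would be passed to the same chat-completion call as before; only the system message changes.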

AI response:

The AI response is now more granular. In a real-world app, this will allow us to take different automatic moderation actions based on the type of violation.
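For instance, each violation type could be mapped to a different action. The action names here are made up for illustration; your app would define its own:

```javascript
// Hypothetical mapping from violation type to a moderation action.
// Both the labels and the action names are illustrative.
function actionFor(label) {
  switch (label) {
    case "toxicity":
      return "hide_comment";
    case "hate_speech":
      return "hide_comment_and_warn_user";
    case "threat":
      return "remove_comment_and_escalate";
    default:
      return "allow"; // no recognized violation
  }
}
```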

Stricter instructions via system prompts

We can achieve stricter and more accurate results by utilizing the system message. In short - LLMs behave the way they are trained. We'll apply some prompt engineering techniques to guide the AI to behave the way we want.

In the example below, we:

  • Assign a role to the AI - Content Moderator
  • State a clear task to be achieved
  • Define a limited set of results and criteria for each
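The three techniques above could be combined into a system prompt along these lines (the wording is a sketch, not the post's exact prompt):

```javascript
// Sketch of a stricter system prompt: assigns a role, states the task,
// and defines a closed set of result labels with criteria for each.
const STRICT_SYSTEM_MESSAGE = `
You are a Content Moderator for an online community.
Your task: decide whether the user's comment violates our content policy.
Respond with exactly one of the following labels:
- "toxicity": rude or disrespectful comments
- "hate_speech": racist, sexist or otherwise discriminatory language
- "threat": violent or harmful statements
- "ok": none of the above
`.trim();

// Build the chat messages for the stricter prompt
function buildStrictMessages(comment) {
  return [
    { role: "system", content: STRICT_SYSTEM_MESSAGE },
    { role: "user", content: comment },
  ];
}
```

Constraining the model to a closed label set makes its output predictable enough to branch on in code.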

AI response:

The AI's accuracy has improved. It is now able to distinguish between specific violation types.

There is a trade-off: more detailed instructions require more tokens upfront, but enable more precise results.

While elaborate prompts cost more tokens, the benefits taper off eventually. The key is optimizing prompts to be just as informative as needed - not as long as possible. We want to give the AI sufficient guidance without diminishing returns on token efficiency.

Additionally, packing too many tokens (or words) into the messages increases the risk of hallucinations (in short, the AI making things up).

Did you know? There is a way to get better results from an AI model that is also cheaper. Let me know in the comments if you want me to write a post about it 👇

Structured JSON responses

The AI returns human-readable text, which is not very useful programmatically. Let's see how we can easily retrieve a JSON response, so that the result is machine-processable. This is useful if you want to render the result in a user interface, or store it in a database.

It's as simple as adding one line to our system prompt! Here it is:

You must respond in JSON, always following this schema: 

  label: string[];

AI response:
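Once the model follows the schema, its reply can be fed straight to `JSON.parse`. A small sketch, with a hypothetical model reply for illustration:

```javascript
// Parse the model's JSON reply into a list of violation labels.
// Falls back to an empty list if the model strays from the schema.
function parseModeration(reply) {
  try {
    const parsed = JSON.parse(reply);
    return Array.isArray(parsed.label) ? parsed.label : [];
  } catch {
    return []; // not valid JSON; treat as unlabeled
  }
}

// Hypothetical reply, for illustration only:
const labels = parseModeration('{ "label": ["toxicity", "threat"] }');
// labels → ["toxicity", "threat"]
```

The defensive fallback matters in practice: even with a strict schema in the prompt, the model can occasionally return malformed output, and your app should handle that case rather than crash.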

AI Pieces Newsletter 🕹️

Subscribe to our newsletter to stay up-to-date with all things AI!