Welcome to PolicyAI! This guide will help you understand how to effectively use Large Language Models (LLMs) within PolicyAI to power your content moderation policies. Using LLMs offers significant flexibility and scalability, but crafting effective policies requires understanding best practices tailored to how these models interpret instructions.

<aside> 💡

Important Disclaimer: These tips and strategies are based on current observations and performance with specific models. They may not generalize directly to new models or contexts and should always be validated through testing before being applied in production. LLM behavior can be nuanced and sometimes unpredictable.

</aside>

https://www.loom.com/share/c1600fab777a4aae853d14f4842efe02

Setting Up Your First Policy in PolicyAI

This section covers the basic steps to get your policy text into PolicyAI.
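
Before pasting your policy in, it can help to think of each category as structured data: a name, a definition the LLM will apply, and optional clarifying examples. The sketch below is purely illustrative; the `Policy` class and `register_policy` function are hypothetical stand-ins, not PolicyAI's actual interface.

```python
# Hypothetical sketch: "Policy" and "register_policy" are illustrative
# stand-ins, not PolicyAI's real API. The shape of the data is the point.
from dataclasses import dataclass, field


@dataclass
class Policy:
    name: str
    definition: str  # the rule text the LLM will apply
    examples: list[str] = field(default_factory=list)  # optional clarifying examples


def register_policy(policy: Policy) -> None:
    """Stand-in for uploading or pasting the policy text into PolicyAI."""
    print(f"Registered policy '{policy.name}' ({len(policy.definition)} characters)")


harassment = Policy(
    name="Harassment",
    definition=(
        "Content violates this policy if it contains targeted insults, threats, "
        "or encouragement of harm directed at an identifiable person or group."
    ),
    examples=["A post naming a coworker and urging others to send them abusive messages."],
)
register_policy(harassment)
```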

Writing Clear and Concise Policy Definitions

The definitions within your policy categories need to be unambiguous for the LLM.

<aside> 💡

Example: When reviewing results for a policy that said, “No hate speech”, we saw reasoning from the LLM for both possible readings of that phrasing: content containing hate speech was sometimes treated as violating and sometimes as compliant, because a bare negative label leaves the rule ambiguous.

Instead, phrase it positively or as a rule the content must not break (e.g., "Content violates this policy if it contains hate speech as defined below...").

</aside>
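
As a concrete illustration, here is the same rule written both ways; the exact wording is ours, not a required template:

```python
# Ambiguous: a bare negative label can be read two ways by the model.
AMBIGUOUS_DEFINITION = "No hate speech"

# Clearer: state explicitly what makes content violate the policy.
CLEAR_DEFINITION = (
    "Content violates this policy if it contains hate speech, defined below as "
    "slurs, dehumanizing language, or calls for harm targeting people on the "
    "basis of a protected characteristic."
)
```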

In areas where your policy is unclear, vague, or doesn't explicitly cover a scenario, you will often observe the LLM giving inconsistent results for similar content. This is a helpful signal that the policy itself needs refinement to be more definitive in that scenario.
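
One lightweight way to surface that signal during review is to re-run the same borderline item several times and measure how often the verdicts agree; persistently low agreement usually points at the policy text rather than the content. A minimal sketch, assuming you wrap your actual PolicyAI or LLM call in a `classify` function you supply:

```python
from collections import Counter
from typing import Callable


def agreement_rate(
    classify: Callable[[str], str],  # wraps your actual PolicyAI / LLM call
    content: str,
    runs: int = 5,
) -> float:
    """Fraction of runs that agree with the most common verdict.

    Values well below 1.0 for similar content suggest the policy text is
    ambiguous in that scenario and needs a more definitive rule.
    """
    verdicts = [classify(content) for _ in range(runs)]
    top_count = Counter(verdicts).most_common(1)[0][1]
    return top_count / runs


if __name__ == "__main__":
    # Dummy classifier for demonstration; replace with a real call in practice.
    import random

    dummy = lambda _content: random.choice(["violates", "does not violate"])
    print(agreement_rate(dummy, "some borderline post", runs=10))
```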

Important: Don’t instruct LLMs to make decisions that require information they cannot possibly have from reviewing a single piece of content in isolation, or that depend on undefined values or subjective terms.

<aside> 💡

Examples of things LLMs struggle to assess or reference without explicit input include the author's intent or identity, an account's history or prior behavior, events outside the content itself, the current date, and thresholds the policy leaves undefined (for example, "recent" or "excessive").

</aside>
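
If your policy genuinely needs such facts, one option is to supply them explicitly alongside the content rather than asking the model to infer them. The sketch below is purely illustrative; the field names and prompt layout are hypothetical, not a PolicyAI format.

```python
# Illustrative only: the point is to pass facts the model cannot infer
# (account age, prior strikes, report date) explicitly in the input.
from dataclasses import dataclass


@dataclass
class ReviewItem:
    content: str
    account_age_days: int   # known to the platform, not inferable from the text
    prior_violations: int   # likewise
    reported_at: str        # ISO date; define terms like "recent" in days


def build_prompt(item: ReviewItem, policy: str) -> str:
    """Assemble a review prompt that states the context explicitly."""
    return (
        f"Policy:\n{policy}\n\n"
        f"Account age (days): {item.account_age_days}\n"
        f"Prior violations: {item.prior_violations}\n"
        f"Reported at: {item.reported_at}\n\n"
        f"Content to review:\n{item.content}"
    )
```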

For more information on policy engineering and structuring policies, see our article here.