Creating Policies and Prompts

Welcome to PolicyAI! This guide will help you understand how to effectively use Large Language Models (LLMs) within PolicyAI to power your content moderation policies. Using LLMs offers significant flexibility and scalability, but crafting effective policies requires understanding best practices tailored to how these models interpret instructions.

<aside> 💡

Important Disclaimer: These tips and strategies are based on current observations and performance with specific models. They may not generalize directly to new models or contexts and should always be validated through testing before being applied in production. LLM behavior can be nuanced and sometimes unpredictable.

</aside>

https://www.loom.com/share/c1600fab777a4aae853d14f4842efe02

Setting Up Your First Policy in PolicyAI

This section covers the basic steps to get your policy text into PolicyAI.

Navigate to the Manage Policies section within the PolicyAI user interface.
Click the "Create New Policy" button. You will be prompted to name your policy and add a description. This is a good place to put the version number, date, etc. The “Preset” tab includes a simple default policy that you can use to get started, if you want.
You will see a large text area where you can paste or type your policy definition.
Your policy should be formatted using headings (e.g., ## Category Name) to clearly delineate each policy category as instructed in the prompt introduction template. Ensure category names follow the naming conventions (no special characters). Anything that has a header format will populate a Moderation Category in the sidebar.
Periodically save your policy draft as you work.

Writing Clear and Concise Policy Definitions

The definitions within your policy categories need to be unambiguous for the LLM.

Policies should be as simple and straightforward as possible. Define terms clearly within or immediately after the category heading.
Look for areas of repetition and try to consolidate definitions. However, avoid excessive consolidation that results in run-on sentences or single sentences packed with too many different examples or concepts, as this can become confusing for the LLM.
Do not phrase policy definitions negatively (e.g., simply saying "No hate speech")

<aside> 💡

Example: When reviewing results for a policy that said, “No hate speech”, we saw reasoning from the LLM for both:

"The content does not contain hate speech, therefore it does violate the policy" and
“The content does not contain hate speech, therefore it does not violate the policy.”

Instead, phrase it positively or as a rule the content must not break (e.g., "Content violates this policy if it contains hate speech as defined below...").

</aside>

In areas where your policy is unclear, vague, or doesn't explicitly cover a scenario, you will often observe the LLM giving inconsistent results for similar content. This is a helpful signal that the policy itself needs refinement to be more definitive in that scenario.

Important: Don’t instruct LLMs to make decisions that require information they cannot possibly have from reviewing a single piece of content in isolation, or from undefined values or subjective words.

<aside> 💡

Examples of things LLMs struggle to assess or reference without explicit input:

Repeated posts or "mass posting" (when reviewing content one piece at a time).
Whether content was "unwanted," "unwelcome," or similar concepts without explicit user feedback data included in the input alongside the content.
References to "illegal drugs" or other illegal acts without contextual information on the relevant geography and local laws (age of consent, solicitation laws, etc., also fall into this category). Similarly, things like “known hate group” can vary across jurisdictions and are constantly changing, so may need additional explicit input.
Subjective definitions, such as “egregious”, "gratuitous," "non-severe," "dangerous," "extreme," "clearly," "suggests”, etc.
New slang, memes, symbols, or methods of bypassing detection that are not yet encoded in the LLM's training data or the policy's explicit examples.
Lack of measurable metrics, e.g. how much of something constitutes "minimal" vs. "excessive"? </aside>

For more information on policy engineering and structuring policies, see our article here.