Developing an effective moderation policy with LLMs is rarely a one-time task. It's an ongoing cycle of creation, testing, analysis, and refinement. Content trends on your platform evolve, platform rules may change, and you'll gain deeper insights into how your LLM-powered policy performs on real data over time.
Recommended Cycle:
1. Create or update the policy definition.
2. Test it against a representative dataset.
3. Analyze the results, paying particular attention to misclassified examples.
4. Refine the policy and repeat.
Data from testing (false positives, false negatives, confusion patterns) and reviews of specific misclassified examples are your primary signals for where policy refinement is needed.
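If you export your test results, a short script can make these signals easier to see. The sketch below is a minimal example in Python, assuming a results CSV with hypothetical `content`, `expected`, and `decision` columns; adjust the names to match your actual export.

```python
import pandas as pd

# Hypothetical export of policy test results; the column names
# ("content", "expected", "decision") are assumptions, not an exact schema.
results = pd.read_csv("policy_test_results.csv")

# Cross-tabulate expected labels against the LLM's decisions to see where
# false positives, false negatives, and confusion patterns cluster.
print(pd.crosstab(results["expected"], results["decision"]))

# Pull out the misclassified rows so you can review the specific examples.
misclassified = results[results["expected"] != results["decision"]]
print(misclassified[["content", "expected", "decision"]])
```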
Because LLMs can sometimes produce different outputs for the exact same input if the policy definition is not perfectly clear or the content is borderline (learn more here), it can be helpful to test the same examples multiple times to see how consistent results are.
Musubi has a unique workflow that lets you turn this variability to your advantage and strengthen your policies.
After you run your policy against a dataset, scroll up to the first “dataset preview & selection” section and choose which examples you want to test further.
From there, under the “preview selected rows” section, you can choose how many times you want to run those examples through the policy. We recommend running each example 5-10 times.

After you click “Run policy”, you will see the LLM’s decision for each run. In this example, the decision was the same each time, but the severity varied.

In other cases, the decision may differ completely from run to run, which is a strong signal that the policy needs to be defined more clearly or needs more examples.
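To quantify this kind of instability across many examples at once, you can download the repeated-run results and summarize them. The sketch below assumes a CSV with hypothetical `example_id`, `decision`, and `severity` columns, not Musubi’s exact export format.

```python
import pandas as pd

# Hypothetical CSV of repeated-run results; the column names below
# ("example_id", "decision", "severity") are assumptions.
runs = pd.read_csv("repeated_runs.csv")

# For each example, count how many distinct decisions and severities
# appeared across its 5-10 runs.
consistency = runs.groupby("example_id").agg(
    total_runs=("decision", "size"),
    distinct_decisions=("decision", "nunique"),
    distinct_severities=("severity", "nunique"),
)

# Examples where the decision itself flips between runs are the strongest
# signal that the policy wording needs tightening or more examples.
unstable = consistency[consistency["distinct_decisions"] > 1]
print(unstable.sort_values("distinct_decisions", ascending=False))
```

Examples that land in `unstable` are good candidates for tighter definitions or additional policy examples.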
The “reason” for the decision (found by scrolling to the right of the table above, or by downloading the CSV) can be especially helpful here, as slight differences in reasoning can sometimes highlight gaps in the policy; see the sketch at the end of this section for one way to review reasons in bulk.

Above, you can see that although each decision was correct, the reasoning isn’t consistent. This may indicate that the policy needs to define more precisely what is, and is not, a credible threat.
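If you prefer to review reasons offline, a short script over the downloaded CSV can group them per example. This is a sketch under the same assumptions as above, with a hypothetical `reason` column holding the model’s explanation for each decision.

```python
import pandas as pd

# Same hypothetical repeated-run CSV as before; "reason" is assumed to be
# the column holding the model's explanation for each decision.
runs = pd.read_csv("repeated_runs.csv")

# Print every distinct reason given for each example, so you can spot
# where explanations diverge even when the decision does not.
for example_id, group in runs.groupby("example_id"):
    reasons = group["reason"].drop_duplicates()
    if len(reasons) > 1:
        print(f"Example {example_id}: {len(reasons)} distinct reasons")
        for reason in reasons:
            print(f"  - {reason}")
```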