It’s up to you to decide how to use the output within your broader moderation system.
Understanding the Output: For each piece of content submitted, PolicyAI provides a classification label (the determined policy category), reasoning, and a severity score indicating how severe the model judges the violation to be.
Severity scores are a useful additional signal for triage, for prioritizing human review and oversight, and for auditing automated decisions.
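As an illustration, handling a parsed result might look like the sketch below. The response shape and field names (`label`, `reasoning`, `severity`) are assumptions for illustration, not the documented schema; consult the API reference for the actual format.

```python
# A minimal sketch of reading a PolicyAI result. The field names
# ("label", "reasoning", "severity") are hypothetical placeholders,
# not the documented schema.
import json

raw = '{"label": "harassment", "reasoning": "Targeted insult directed at another user.", "severity": 0.82}'

result = json.loads(raw)
print(result["label"])      # policy category, e.g. "harassment"
print(result["reasoning"])  # the model's explanation for the label
print(result["severity"])   # assumed 0.0-1.0; higher means more severe
```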
Potential Workflow Integrations:
- Automated Actions: Integrate PolicyAI’s API to trigger actions based on classification and severity score, such as automatically removing the most severe violations; see the routing sketch after this list.
- Human Review Queueing: Use PolicyAI outputs to populate dedicated queues for human moderators, prioritizing content by predicted category and severity (also covered by the routing sketch below).
- Data Analysis and Reporting: Store PolicyAI outputs to analyze trends in violations, understand the volume of different content types, and report on moderation effectiveness; a small aggregation sketch follows the list.
- Feedback Loops: Integrate human review decisions back into your system to update golden sets and inform policy refinements; see the final sketch below.
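For the first two integrations, a minimal routing sketch might look like the following. It reuses the hypothetical result shape from earlier; the thresholds and queue names are assumptions to tune against your own policies.

```python
# A sketch of severity-based routing for automated actions and human
# review queueing. Thresholds and queue names are assumptions.
REMOVE_THRESHOLD = 0.9  # assumed cutoff for automatic removal
REVIEW_THRESHOLD = 0.5  # assumed cutoff for human review

def route(result: dict) -> str:
    severity = result["severity"]
    if severity >= REMOVE_THRESHOLD:
        return "auto_remove"  # automated action for the most severe violations
    if severity >= REVIEW_THRESHOLD:
        # per-category queue so moderators can specialize
        return f"review_queue:{result['label']}"
    return "allow"

print(route({"label": "harassment", "reasoning": "...", "severity": 0.82}))
# -> review_queue:harassment
```

Routing only the most severe content to automated action while sending mid-severity content to per-category queues keeps moderator attention on the cases where human judgment matters most.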
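For data analysis and reporting, a simple aggregation over a stored log of outputs could track violation volume per category. The JSONL log file and its fields here are illustrative assumptions.

```python
# A sketch of trend analysis over stored PolicyAI outputs, assuming each
# result was appended as one JSON object per line. Paths and field names
# are illustrative.
import json
from collections import Counter

def violation_counts(path: str = "policyai_log.jsonl") -> Counter:
    counts: Counter = Counter()
    with open(path) as f:
        for line in f:
            result = json.loads(line)
            counts[result["label"]] += 1  # volume per policy category
    return counts

# Example output: Counter({"spam": 412, "harassment": 88, "hate": 17})
```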
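Finally, for feedback loops, here is one sketch of recording reviewer decisions alongside the model’s label, assuming a plain JSONL file as the golden-set store; all names are illustrative.

```python
# A sketch of feeding human review decisions back into a golden set,
# assuming a JSONL file as the store. Function and field names are
# illustrative.
import json

def record_review(content_id: str, model_label: str, human_label: str,
                  path: str = "golden_set.jsonl") -> None:
    entry = {
        "content_id": content_id,
        "model_label": model_label,
        "human_label": human_label,
        "agreed": model_label == human_label,  # disagreements flag policy gaps
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

record_review("c_123", model_label="harassment", human_label="harassment")
```

Tracking where the model and reviewers disagree is what makes the golden set useful for policy refinement.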