Based on your test results, you'll likely need to make adjustments. Here's how to approach common issues.
Symptoms you may see: poor overall accuracy, inconsistent labeling across repeated runs (variability), systematic bias (too many false positives or false negatives for a particular category), or a mismatch between the label and the stated reason.
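The bias and variability symptoms above can be quantified directly from your golden-set results. A minimal sketch, assuming each result is a dict with hypothetical `expected` and `predicted` keys (your field names may differ):

```python
from collections import Counter

def error_profile(records):
    """Count false positives and false negatives per category.

    A false positive for a category means the model over-applied
    that label; a false negative means it missed that label.
    """
    fp, fn = Counter(), Counter()
    for r in records:
        if r["predicted"] != r["expected"]:
            fp[r["predicted"]] += 1
            fn[r["expected"]] += 1
    return fp, fn

def agreement_rate(runs):
    """Fraction of items labeled identically across repeated runs,
    a rough proxy for labeling variability. `runs` is a list of
    label lists, one per run, in the same item order."""
    stable = sum(1 for labels in zip(*runs) if len(set(labels)) == 1)
    return stable / len(runs[0])
```

A category with a high false-positive count and low false-negative count (or vice versa) points at a systematic bias in its definition; a low agreement rate suggests the prompt leaves too much room for interpretation.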
Look for patterns: Which categories are most frequently confused with each other? Frequent confusion between two categories suggests focusing your attention on their definitions and potentially their relative ordering in the prompt.
Review Misclassified Examples: Read each incorrectly classified example alongside the model's stated reason, and look for a shared cause, such as ambiguous input text or a gap in the relevant category definition.
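The two review steps above can be scripted so the same records feed both a confusion-pair tally and a misclassification listing. A sketch, again assuming hypothetical `expected`, `predicted`, and `reason` fields:

```python
from collections import Counter

def confusion_pairs(records):
    """Tally (expected, predicted) pairs for errors, most frequent
    first, to reveal which categories are confused with each other."""
    pairs = Counter(
        (r["expected"], r["predicted"])
        for r in records
        if r["expected"] != r["predicted"]
    )
    return pairs.most_common()

def misclassified(records):
    """Return the incorrectly classified records for manual review,
    keeping the model's stated reason attached to each one."""
    return [r for r in records if r["predicted"] != r["expected"]]
```

Sorting the pairs by frequency tells you which pair of definitions to revisit first, and reading the attached reasons often reveals whether the label or the reasoning went wrong.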
Check Policy Text & Prompt: Reread the relevant policy definitions, the prompt introduction, and contextual information (Protected Characteristics, etc.) critically. Is any wording potentially confusing or misleading to an LLM?
Evaluate Prompting Nuances: Did you include examples in the prompt that might be too easy or not representative? Are you asking the model to infer information it doesn't have?
Potential Fixes: Refine policy wording, add specific examples (violating and non-violating) to policy definitions, adjust prompt instructions, re-order categories, expand your golden set to include more diverse edge cases, or identify cases that may require human review.
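Two of these fixes, adding examples to policy definitions and re-ordering categories, can be combined in how the prompt is assembled. A minimal sketch with an invented example category; the category name, wording, and function names are illustrative, not part of any real policy:

```python
# Hypothetical refined category definition that embeds one violating
# and one non-violating example directly in the policy text.
CATEGORY_DEFINITION = """\
Category: Insults (hypothetical example category)
Definition: Content that demeans an individual with abusive language.
Violating example: "You are a worthless idiot."
Non-violating example: "This proposal has serious flaws."
(The second targets an idea, not a person.)
"""

def build_prompt(policy_sections, item):
    """Assemble the classification prompt from policy sections in the
    chosen order, followed by the item to label. Re-ordering the
    sections list is all it takes to test a new category ordering."""
    policy = "\n\n".join(policy_sections)
    return f"{policy}\n\nClassify the following text:\n{item}"
```

Keeping definitions as separate strings like this makes it cheap to re-run the golden set after each wording or ordering change and compare the error profiles.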