Evaluation Results

DATE        MODEL   JUDGEMENT      PASS      FAIL      REQUIRES REVIEW  TOTAL
2025-11-13  gpt-4o  AI Judge Only  37 (79%)  10 (21%)  0 (0%)           47