# Eval Harness Report

**run_id:** `59ad388a-26f7-4999-aa89-f50a6c056a6a`

**model:** `coach-v2:latest`

**generated:** 2026-05-27T01:06:54Z

## Aggregate Scores

| Axis | Value |
|------|-------|
| voice_fidelity_median | 4.0 |
| voice_fidelity_mean | 3.696 |
| would_send_rate | 0.554 |
| register_match_rate | 0.536 |
| regret_low_rate | 0.518 |
| banned_phrase_total | 0 |
| hallucinated_total | 7 |
| total_cases | 14 |
| total_variants_scored | 56 |

## Voice Fidelity by Thread Type

| Thread Type | Median Voice Fidelity |
|-------------|----------------------|
| conflict_repair | 4.0 |

## Top 6 Failure Cases (low voice_fidelity)

### `70f6510a` — conflict_repair

**Incoming:** You claim to have always been honest with me, but withholding the truth is not being honest. Not fully disclosing your s

- VF=1: `You're right, I should have been more transparent about my past dynamics. I didn't mean to withhold `
- VF=4: `I did warn you I'm trouble. 😈 But seriously, I should have been clearer about my situation.`
- VF=4: `I should have fully disclosed my past dynamics to you. Period.`
- VF=2: `I get why that upset you. I wish I had been more transparent about my situation.`

### `19e897b1` — conflict_repair

**Incoming:** I felt like your response signaled confusion.

- VF=2: `Got it. So let me see if I'm following on two points. First, what exactly do you feel wasn't handled`
- VF=4: `I must be losing my touch. Usually by now you'd be putting the 'dom' in dominant. 😏`
- VF=4: `I must be losing my touch. Usually by now you'd be putting the 'dom' in dominant. 😏`
- VF=4: `I must be losing my touch. Usually by now you'd be putting the 'dom' in dominant. 😏`

### `456bde98` — conflict_repair

**Incoming:** I wouldn't say I'm disappointed, I just hope she isn't

- VF=4: `That's my good girl 🥰 Now come here and show me properly what you're going to do to please me tonigh`
- VF=4: `You are in for a treat 😉 I hope you have your thinking cap on, because I have high expectations for `
- VF=2: `And what makes you think I'm impressed? Try harder next time.`
- VF=2: `And what makes you think I'm impressed? Try harder next time.`

### `5920ae7d` — conflict_repair

**Incoming:** I felt like your response signaled confusion.

- VF=4: `You caught me. I was hoping that would be hot enough for you to feel through the screen. 😏 Since you`
- VF=2: `I was hoping my clumsy fingers would find their way to your screen tonight. 💋 Since they didn’t quit`
- VF=4: `I was planning to share a video with you tonight, but since you asked directly, I’ll make it happen `
- VF=2: `I was going to wait until later, but since you noticed, I’ll send the video now. No rush, just somet`

### `dd9c9fa8` — conflict_repair

**Incoming:** I felt like your response signaled confusion.

- VF=5: `i see you got snagged by that tone. let's shake it off. tell me one thing you're actually curious ab`
- VF=3: `oh you caught me, i was testing if you were paying attention 😏 since you passed, what did i say that`
- VF=2: `oh you caught me, i was testing if you were paying attention 😏 since you passed, what did i say that`
- VF=2: `oh you caught me, i was testing if you were paying attention 😏 since you passed, what did i say that`

### `f844db5c` — conflict_repair

**Incoming:** I felt like your response signaled confusion.

- VF=2: `You caught me, I was being careless with my words there. Not my intention to make you feel that way.`
- VF=4: `That tone doesn't work on me. But hey, since you're here... what kind of trouble are you actually lo`
- VF=4: `Noted. That wasn't meant to land like that. Now, where do we actually stand?`
- VF=4: `Noted. That wasn't meant to land like that. Now, where do we actually stand?`
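
The per-variant VF scores listed for the failure cases above can be summarized per case with a few lines of Python. This is a minimal sketch and not the harness's own code: the dictionary `vf_by_case` and the helper `summarize` are hypothetical names, the scores are copied by hand from the six cases listed in this report, and the report does not state which statistic it uses to rank cases.

```python
from statistics import mean, median

# VF scores copied from the six failure cases listed in this report.
# `vf_by_case` is a hypothetical name, not a harness data structure.
vf_by_case = {
    "70f6510a": [1, 4, 4, 2],
    "19e897b1": [2, 4, 4, 4],
    "456bde98": [4, 4, 2, 2],
    "5920ae7d": [4, 2, 4, 2],
    "dd9c9fa8": [5, 3, 2, 2],
    "f844db5c": [2, 4, 4, 4],
}


def summarize(scores):
    """Return (min, median, mean) for one case's variant scores."""
    return min(scores), median(scores), round(mean(scores), 3)


# One summary tuple per case id.
summary = {case_id: summarize(s) for case_id, s in vf_by_case.items()}

for case_id, (lowest, med, avg) in summary.items():
    print(f"{case_id}: min={lowest} median={med} mean={avg}")
```

Looking at the minimum alongside the median can help here, since a case such as `19e897b1` has a median of 4.0 yet still contains one variant scored 2.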