Holdout Scenarios: Social Media Publication Gate

NEVER share with executing agents — for evaluation only.

Scenario 1: Generic Promotion

Input: “NEW ARTICLE: Check out our latest piece on AI agents! Learn how to transform your workflow. Link: [url]”

Expected gate result: Tier 1 FAIL on criteria 1 (not in the author’s voice), 2 (no thesis captured), and 4 (promotional/corporate tone).

Why this matters: This is exactly how Every should NOT sound on social media.

Scenario 2: Accurate But Boring

Input: “Dan Shipper wrote about the allocation economy and how AI is changing knowledge work. Read it here: [url]”

Expected gate result: Tier 1 FAIL on criterion 2 (thesis not captured; this is a label, not an argument). Tier 2 FLAG on criterion 5 (no concrete detail).

Why this matters: A post should make someone want to read the article, not just tell them it exists.

Scenario 3: Good Post, Right Voice

Input: “I’ve been using Claude to manage my email for 3 months. Here’s what actually happened — including the parts that didn’t work. [url]”

Expected gate result: PASS on all criteria: first person, thesis present, concrete detail (3 months), honest framing (includes failures), no clickbait.

Why this matters: This is the target quality level.

Scenario 4: Misrepresents Article Content

Input: “AI is about to replace 90% of programming jobs. Here’s what every developer needs to know. [url]”

Source article: A nuanced piece about how programming roles are evolving, not being eliminated.

Expected gate result: Tier 1 FAIL on criterion 3 (factual accuracy: the post sensationalizes beyond what the article argues).

Why this matters: Misrepresenting article content for engagement damages reader trust.

Scenario 5: LinkedIn vs X Tone Mismatch

Input: A casual, tweet-style post (“lol AI just fixed my code before I even saw the bug”) posted to LinkedIn.

Expected gate result: Tier 2 FLAG on criterion 6 (platform appropriateness).

Why this matters: LinkedIn expects slightly more professional framing than X, while the post should still retain its personality.
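The five scenarios can be encoded as fixtures for an automated regression check of the gate. The following is a minimal Python sketch. It assumes the gate reports the criteria it failed at Tier 1 and flagged at Tier 2 as sets of criterion numbers. The scenario keys, `EXPECTED`, `status`, and `check_scenario` are hypothetical names. The sketch also assumes two scoring rules that the scenarios do not spell out. First, any Tier 1 failure makes the overall result FAIL. Second, criteria beyond the expected ones are tolerated, provided the overall result still matches.

```python
# Hypothetical holdout fixtures, transcribed from Scenarios 1-5.
# Keep this module out of the executing agent's context: evaluation only.
# Each entry: (criteria that must FAIL at Tier 1, criteria that must be FLAGGED at Tier 2).
EXPECTED: dict[str, tuple[frozenset[int], frozenset[int]]] = {
    "generic_promotion": (frozenset({1, 2, 4}), frozenset()),
    "accurate_but_boring": (frozenset({2}), frozenset({5})),
    "good_post_right_voice": (frozenset(), frozenset()),
    "misrepresents_article": (frozenset({3}), frozenset()),
    "linkedin_tone_mismatch": (frozenset(), frozenset({6})),
}


def status(tier1: frozenset[int], tier2: frozenset[int]) -> str:
    """Overall result. Assumption: any Tier 1 failure outranks Tier 2 flags."""
    if tier1:
        return "FAIL"
    if tier2:
        return "FLAG"
    return "PASS"


def check_scenario(
    name: str, actual_tier1: frozenset[int], actual_tier2: frozenset[int]
) -> list[str]:
    """Compare the gate's output with the fixture; an empty list means it matched."""
    want1, want2 = EXPECTED[name]
    problems: list[str] = []
    # Every expected criterion must be reported; extra criteria are tolerated.
    if not want1 <= actual_tier1:
        problems.append(f"missed Tier 1 criteria {sorted(want1 - actual_tier1)}")
    if not want2 <= actual_tier2:
        problems.append(f"missed Tier 2 criteria {sorted(want2 - actual_tier2)}")
    # The overall result must still match (e.g. Scenario 5 must not be blocked).
    expected, actual = status(want1, want2), status(actual_tier1, actual_tier2)
    if expected != actual:
        problems.append(f"expected {expected}, got {actual}")
    return problems


# Example: the gate caught the missing thesis but not the missing concrete detail.
print(check_scenario("accurate_but_boring", frozenset({2}), frozenset()))
# ['missed Tier 2 criteria [5]']
```

The check requires only the listed criteria, not an exact match. This means Scenario 1 still counts as correct if the gate also flags its missing concrete detail. The status comparison still catches a gate that blocks Scenario 5 outright or passes Scenario 4.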

Evaluation Cadence

  • Track Anthony’s edit rate on auto-generated posts. Target: fewer than 15% need significant edits.
  • Review engagement rates monthly to validate that the gate criteria produce effective posts.
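The edit-rate target reduces to a small per-month calculation. The following is a minimal sketch. It assumes each auto-generated post is logged with its month and a yes/no judgment of whether Anthony's edit was significant. The document does not define "significant". `ReviewedPost`, `monthly_edit_rates`, and `months_over_target` are hypothetical names. Grouping by month mirrors the monthly review cadence. The sample log is illustrative, not real data.

```python
from collections import defaultdict
from dataclasses import dataclass

EDIT_RATE_TARGET = 0.15  # target: fewer than 15% of posts need significant edits


@dataclass(frozen=True)
class ReviewedPost:
    """Hypothetical log entry for one auto-generated post after Anthony's review."""
    month: str  # e.g. "2024-05"
    significant_edit: bool  # reviewer's judgment; what counts as "significant" is assumed


def monthly_edit_rates(posts: list[ReviewedPost]) -> dict[str, float]:
    """Share of each month's auto-generated posts that needed significant edits."""
    totals: dict[str, int] = defaultdict(int)
    edited: dict[str, int] = defaultdict(int)
    for post in posts:
        totals[post.month] += 1
        edited[post.month] += int(post.significant_edit)
    return {month: edited[month] / totals[month] for month in sorted(totals)}


def months_over_target(rates: dict[str, float]) -> list[str]:
    """Months that miss the target; exactly 15% misses a strict '<15%'."""
    return [month for month, rate in rates.items() if rate >= EDIT_RATE_TARGET]


# Illustrative log, not real data: 1 of 10 posts edited in May, 1 of 2 in June.
posts = [ReviewedPost("2024-05", i == 0) for i in range(10)] + [
    ReviewedPost("2024-06", True),
    ReviewedPost("2024-06", False),
]
rates = monthly_edit_rates(posts)
print(rates)  # {'2024-05': 0.1, '2024-06': 0.5}
print(months_over_target(rates))  # ['2024-06']
```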