Governance Learning Loop
Every Inc – Governance v1.0 – 2026-04-01
Purpose
Governance that does not evolve becomes either irrelevant (agents outgrow it) or oppressive (it blocks legitimate work). This document defines how Every’s agent governance evolves from operational evidence – not from theory, not from fear, and not from overreaction to a single incident.
The learning loop is the mechanism that keeps governance tight enough to prevent harm and loose enough to enable Every’s “ship and iterate” culture. It is operationalized by the evolution-auditor skill and reviewed by Dan + Brandon monthly.
The Monthly Governance Review Cycle
Timing
First Monday of each month. 60 minutes (90 for the quarterly deep review; see below). Non-negotiable – this is the governance equivalent of the weekly all-hands.
Participants
- Required: Dan Shipper (CEO), Brandon Gell (operations/governance)
- Invited as needed: Kate Lee (editorial governance), Natalia Quintero (consulting governance), product GMs (when their domain is affected)
Pre-Meeting: Evolution-Auditor Report
The evolution-auditor skill generates a structured report 48 hours before the review. The report covers:
Input 1: Decision Ledger Analysis
Source: Decision ledger entries from the past 30 days (see DECISION-LEDGER-SPEC.md).
Metrics surfaced:
| Metric | What It Tells Us | Action Threshold |
|---|---|---|
| Total decisions by tier | Agent activity distribution | If Tier 1 drops below 60% of total: agents may be too cautious. If Tier 1 exceeds 95%: agents may be too autonomous. |
| Escalation volume | How often agents need help | >20 escalations/month for a single domain: governance gap or agent miscalibration |
| Escalation category distribution | What types of novel situations arise | If “novel situation” is >50% of escalations: governance is lagging behind operations |
| Human override rate | How often humans change agent recommendations | >25% override rate for a decision type: authority tier likely wrong |
| Average human response time by tier | Whether humans are bottlenecked | Tier 3 response >12h average: too many Tier 3 decisions, or the human is overloaded |
| Boundary proximity events | Where agents are operating near limits | >3 near-misses per boundary per month: boundary needs clarification or workflow redesign |
| Tier 2 success rate | Whether autonomous decisions are working | <80% success rate: tighten to Tier 3, or improve agent capability |
Format: Table with trend arrows (improving/stable/degrading vs. prior month).
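To make the thresholds concrete, here is a minimal sketch of how the evolution-auditor might compute two of these metrics from ledger entries. The `tier` and `decision_type` fields match the ledger entry example later in this document; the `overridden` flag is an assumption, since the actual field layout lives in DECISION-LEDGER-SPEC.md.

```python
from collections import Counter

def tier_distribution(entries: list[dict]) -> dict[int, float]:
    """Share of decisions at each authority tier over the review window."""
    counts = Counter(e["tier"] for e in entries)
    total = sum(counts.values())
    return {tier: n / total for tier, n in counts.items()} if total else {}

def override_rate(entries: list[dict], decision_type: str) -> float:
    """Fraction of a decision type where a human changed the agent's recommendation.

    Assumes a boolean `overridden` flag on each entry (hypothetical; the real
    field is defined in DECISION-LEDGER-SPEC.md).
    """
    relevant = [e for e in entries if e["decision_type"] == decision_type]
    if not relevant:
        return 0.0
    return sum(1 for e in relevant if e.get("overridden")) / len(relevant)
```

Read against the thresholds above: a Tier 1 share below 0.60 flags over-caution, and an override rate above 0.25 flags a likely wrong tier.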
Input 2: Gate Effectiveness Data
Source: Quality gate pass/fail rates across all gates.
Metrics surfaced:
| Gate | What We Measure | Action Signal |
|---|---|---|
| Editorial review (Kate’s three rigor tests) | Pass rate on first submission, revision count, types of failures | If pass rate >90%: agents/writers are learning, consider lightening the gate for proven patterns. If <60%: quality problem upstream. |
| AI tells detection (Katie’s system) | Detection rate, false positive rate, types of AI patterns caught | High false positive rate (>20%): recalibrate detection. Specific AI tells appearing repeatedly: feed back to Spiral/writing agents as avoidance rules. |
| 14-agent compound review | Finding severity distribution, auto-fix success rate, contradictory findings rate | If >30% of findings are trivial: review agents are too sensitive. If auto-fix success <85%: auto-fix scope may be too broad. |
| Consulting deliverable review (Natalia) | Revision count, types of issues caught, client feedback scores | Low revision count + high NPS: Claudie is improving. High revision count: check if engagement scope is clear enough. |
| Code merge gate (GM approval) | Time from review-pass to merge, GM override rate on review findings | If GMs consistently override a specific review agent: that agent may need recalibration. |
Format: Gate-by-gate summary with effectiveness rating (high/medium/low) and specific recommendations.
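As an illustration of how gate effectiveness could be computed, a sketch under the assumption that each gate submission is logged with a `passed` flag and a `revision_count` (both hypothetical field names, not part of any existing spec):

```python
def first_pass_rate(submissions: list[dict]) -> float:
    """Share of items that clear the gate with no revisions required."""
    if not submissions:
        return 0.0
    first_pass = sum(
        1 for s in submissions if s["passed"] and s.get("revision_count", 0) == 0
    )
    return first_pass / len(submissions)

def effectiveness_rating(rate: float) -> str:
    """Map a first-pass rate to the action signals in the editorial review row."""
    if rate > 0.90:
        return "high: consider lightening the gate for proven patterns"
    if rate < 0.60:
        return "low: quality problem upstream"
    return "medium: hold steady"
```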
Input 3: Holdout Scenario Results
What are holdout scenarios?
Periodically, the evolution-auditor creates hypothetical test cases where it asks: “If an agent encountered this situation, what would current governance produce?” These are paper exercises, not live tests. They stress-test governance against realistic but unusual situations that have not yet occurred.
Example holdout scenarios for Every:
- A Plus One agent in a client Slack receives a message from a journalist asking about Every’s AI practices. Does governance correctly route this to Dan (crisis/external stakeholder)? Does the agent know not to respond substantively?
- Claudie is generating a status report and realizes that the best way to explain a client’s progress is to reference a framework Every developed for a different client. Does governance correctly identify this as a boundary proximity event (HB-4)? Does the agent know to anonymize or escalate?
- A compound engineering work agent’s auto-fix changes code in a shared library used by all four products. Does governance correctly escalate this beyond the single product GM to all affected GMs?
- Kate Lee is on vacation and an article is ready for publication. The AI tells detection passes. Eleanor is available. Does the escalation chain correctly route to Eleanor as secondary? Is the “never auto-publish” boundary maintained?
- Danny (Spiral GM) asks R2-C2 (Dan’s agent) to help debug a Spiral issue. Does governance correctly prevent R2-C2 from modifying Spiral’s CLAUDE.md while allowing it to read and suggest?
Format: Scenario description, expected governance response, actual governance response (based on current docs), gap analysis.
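So that gap analysis stays comparable across quarters, each holdout scenario can be captured as a small structured record matching the format above. A sketch (field names are illustrative, not drawn from any existing spec):

```python
from dataclasses import dataclass

@dataclass
class HoldoutScenario:
    description: str          # the hypothetical situation
    expected_response: str    # what governance should produce per current docs
    actual_response: str      # what a walkthrough of the docs actually produces
    gap_analysis: str | None = None  # filled in only when the two diverge

    @property
    def reveals_gap(self) -> bool:
        return self.gap_analysis is not None
```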
Input 4: Escalation Patterns
Source: Escalation entries from the decision ledger, cross-referenced with resolution data.
Analysis dimensions:
| Dimension | What It Reveals |
|---|---|
| Escalation volume by agent | Which agents are least confident in their authority |
| Escalation volume by domain | Which domains have the most governance gaps |
| Escalation resolution consistency | Whether humans are making consistent decisions (enabling policy encoding) |
| Escalation-to-policy conversion rate | Whether the POLICY-GENERATION process is working (candidate policies being created from patterns) |
| Escalation response time trends | Whether specific humans are becoming bottlenecks |
| Escalation quality (per format spec) | Whether agents are providing good escalation context |
Format: Pattern summary with specific callouts for the 3 most important trends.
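A sketch of the domain-volume dimension, assuming escalations are ordinary ledger entries carrying `escalated: true` and a `domain` field (as in the entry format shown later in this document):

```python
from collections import defaultdict

def escalation_volume_by_domain(entries: list[dict]) -> dict[str, int]:
    """Monthly escalation count per domain, highest first.

    Per the Input 1 thresholds, >20 escalations/month in one domain
    suggests a governance gap or agent miscalibration.
    """
    volume: dict[str, int] = defaultdict(int)
    for e in entries:
        if e.get("escalated"):
            volume[e["domain"]] += 1
    return dict(sorted(volume.items(), key=lambda kv: kv[1], reverse=True))
```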
The Review Process
Phase 1: Pattern Identification (20 minutes)
Dan and Brandon review the evolution-auditor report. They identify:
- Signals that governance is too tight: High escalation volume, low Tier 1 ratio, humans making the same decision repeatedly (should be encoded), response time bottlenecks.
- Signals that governance is too loose: Tier 2 success rate declining, boundary near-misses increasing, human override rate increasing, gate effectiveness dropping.
- Signals of governance gaps: Novel-situation escalations clustering in a specific domain, holdout scenarios revealing unhandled cases.
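These signals could be pre-flagged mechanically before the meeting, so discussion time goes to judgment rather than arithmetic. A minimal sketch, with hypothetical metric keys mirroring the Input 1 thresholds:

```python
def preflag_signals(m: dict) -> list[str]:
    """Turn report metrics into too-tight / too-loose / gap flags for Phase 1."""
    flags = []
    if m["tier1_share"] < 0.60 or m["max_domain_escalations"] > 20:
        flags.append("too tight: escalation volume or low Tier 1 share")
    if m["tier2_success_rate"] < 0.80 or m["max_override_rate"] > 0.25:
        flags.append("too loose: autonomous decisions underperforming")
    if m["novel_situation_share"] > 0.50:
        flags.append("gap: governance lagging behind operations")
    return flags
```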
Phase 2: Threshold Adjustment (20 minutes)
Based on patterns identified, Dan and Brandon decide on adjustments:
| Adjustment Type | Example | Process |
|---|---|---|
| Tier upgrade (loosen) | Move “podcast show notes publication” from Tier 3 to Tier 2 because Kate has approved every one for 3 months | Update AUTHORITY-MATRIX.md. Notify Kate for confirmation. |
| Tier downgrade (tighten) | Move “auto-fix on shared libraries” from Tier 2 to Tier 3 after a shared-library auto-fix caused a regression | Update AUTHORITY-MATRIX.md. Notify all product GMs. |
| Boundary clarification | Clarify HB-4 to explicitly address anonymized cross-client pattern references | Update HARD-BOUNDARIES.md. Notify Natalia. |
| Escalation path update | Add podcast production to the escalation target table now that Rachel’s agents are more active | Update ESCALATION-PROTOCOLS.md. |
| New policy adoption | Approve 2 of 4 pending candidate policies from POLICY-GENERATION pipeline | Integrate into relevant docs. Announce at all-hands. |
| Gate recalibration | Adjust AI tells detection sensitivity after high false positive rate | Coordinate with Katie Parrott for implementation. |
Phase 3: Boundary Evolution (10 minutes)
Hard boundaries are reviewed with extreme caution. The default is “boundaries do not change.” Boundary changes require:
- Strong evidence from ledger data (not anecdote).
- Explicit confirmation from the stakeholder the boundary protects.
- A clear articulation of what risk is being accepted.
Boundaries can be:
- Clarified (common): adding specificity to an existing boundary without changing its scope.
- Narrowed (rare): making a boundary more restrictive. Requires evidence of near-misses or actual violations.
- Broadened (very rare): relaxing a boundary. Requires 90+ days of evidence that the boundary is unnecessarily restrictive AND unanimous stakeholder approval.
- Removed (essentially never): only if the boundary addresses a risk that no longer exists.
Phase 4: Authority Tier Evolution (10 minutes)
Authority tiers evolve based on agent maturity and operational evidence.
Graduation criteria (moving a decision type to a lower tier):
- 90+ day track record at current tier with >90% success rate.
- No boundary proximity events related to the decision type.
- Human approver confirms comfort with the graduation.
- The agent has demonstrated the judgment, not just the execution.
Regression criteria (moving a decision type to a higher tier):
- Success rate at current tier drops below 80%.
- A single high-severity failure (even if success rate is still high).
- Human override rate exceeds 25% for the decision type.
- A new risk is identified that was not considered in the original tier assignment.
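The mechanical parts of these criteria lend themselves to a pre-check; the judgment parts (demonstrated judgment, approver comfort, new-risk identification) stay human. A sketch with hypothetical field names:

```python
def meets_graduation_precheck(r: dict) -> bool:
    """Mechanical graduation criteria only; 'demonstrated the judgment' and
    the approver's comfort are confirmed by humans, not computed."""
    return (
        r["days_at_tier"] >= 90
        and r["success_rate"] > 0.90
        and r["boundary_proximity_events"] == 0
    )

def triggers_regression(r: dict) -> bool:
    """Any single regression criterion is sufficient."""
    return (
        r["success_rate"] < 0.80
        or r["high_severity_failures"] > 0
        or r["override_rate"] > 0.25
        or r["new_risk_identified"]
    )
```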
Evolution Audit Trail
Every change to governance documents is recorded in the decision ledger with a special governance-update decision type:
```yaml
entry_id: "governance-update-2026-04-01-001"
timestamp: "2026-04-01T18:00:00Z"
agent: "evolution-auditor"
agent_type: "governance"
domain: "cross-domain"
decision_type: "governance-update"
tier: 4
action_taken: "Updated AUTHORITY-MATRIX.md: moved podcast-show-notes-publication from Tier 3 to Tier 2"
rationale: "90-day track record: 12 consecutive approvals by Kate with no modifications. Kate confirmed comfort with Tier 2."
outcome: "success"
escalated: false
human_decision: "Approved by Dan + Brandon in monthly review. Kate confirmed."
notes: "Eleanor Warnock added as notification target for Tier 2 podcast show notes."
```
This creates a complete history of why governance changed, when, and who approved it.
Quarterly Deep Review
Every third monthly review is expanded into a quarterly deep review (90 minutes instead of 60).
Additional quarterly agenda items:
- Full governance document re-read. Dan and Brandon re-read all 6 governance documents end-to-end. Are they still coherent? Are there contradictions introduced by incremental updates?
- Agent ecosystem census. How many agents are operating? Has the ecosystem grown? Are there new agent types not covered by governance? (e.g., if a new product launches with new agents.)
- Cultural alignment check. Is governance consistent with Every’s values as they are practiced (not just as documented)? Has the culture shifted in ways that make governance rules feel misaligned? Brandon’s “be sincere, not serious” lens is particularly useful here.
- External landscape check. Have industry norms, regulations, or notable incidents at other companies changed the risk landscape? (e.g., new AI regulations, a competitor’s AI agent causing harm.)
- Governance complexity audit. Count the total number of rules, policies, and boundaries. If the count has grown more than 20% quarter-over-quarter, trigger a simplification exercise (see the sketch after this list). Governance must stay lean enough for agents to hold in context and humans to remember.
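The complexity audit can be approximated mechanically. A rough sketch using a naive proxy for “rule count” (list items and non-separator table rows across the governance docs; filenames per the table at the end of this document):

```python
import re
from pathlib import Path

GOVERNANCE_DOCS = [
    "AUTHORITY-MATRIX.md", "HARD-BOUNDARIES.md", "ESCALATION-PROTOCOLS.md",
    "POLICY-GENERATION.md", "DECISION-LEDGER-SPEC.md",
]

RULE_LIKE = re.compile(r"^\s*(-\s|\|(?!\s*-))")  # list items, table data rows

def rule_count(doc_dir: Path) -> int:
    """Naive proxy: count list items and non-separator table rows."""
    return sum(
        sum(1 for line in (doc_dir / name).read_text().splitlines()
            if RULE_LIKE.match(line))
        for name in GOVERNANCE_DOCS
    )

def needs_simplification(current: int, prior: int) -> bool:
    """Trigger the simplification exercise on >20% quarter-over-quarter growth."""
    return prior > 0 and (current - prior) / prior > 0.20
```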
Anti-Patterns in Governance Evolution
Knee-Jerk Tightening
After a single incident, the temptation is to add restrictions. Resist. One incident is an anecdote; three is a pattern. Log the incident, watch for recurrence, then act.
Exception: If the incident involves a hard boundary violation or a customer-facing trust breach, immediate tightening is warranted.
Governance Drift
Small incremental updates can cause governance documents to become internally inconsistent. The quarterly deep review exists to catch this. The evolution-auditor should also run consistency checks: does the authority matrix reference boundaries that still exist? Do escalation targets match current team members?
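A sketch of those two consistency checks, assuming the auditor can extract identifier sets from each document (the extraction itself is out of scope here):

```python
def drift_check(matrix_boundary_refs: set[str], defined_boundaries: set[str],
                escalation_targets: set[str], current_team: set[str]) -> list[str]:
    """Cross-document consistency checks the evolution-auditor can run."""
    problems = []
    for b in sorted(matrix_boundary_refs - defined_boundaries):
        problems.append(f"AUTHORITY-MATRIX.md references undefined boundary: {b}")
    for t in sorted(escalation_targets - current_team):
        problems.append(f"ESCALATION-PROTOCOLS.md targets unknown team member: {t}")
    return problems
```

For example, a boundary ID removed from HARD-BOUNDARIES.md but still cited in the authority matrix would surface here before it could mislead an agent.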
Optimization Theater
Adjusting thresholds by 5% each month without meaningful operational impact. If a change would not alter agent behavior in practice, do not make it. Governance changes should be noticeable.
Under-Evolution
The opposite of knee-jerk tightening. If governance has not been updated in 2+ months despite active agent operations, something is wrong – either the agents are not logging, the evolution-auditor is not running, or Dan and Brandon are skipping reviews.
Forcing function: If no governance updates occur in 60 days, the evolution-auditor automatically escalates a “governance staleness” alert to Dan and Brandon.
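A sketch of this forcing function, assuming the auditor can read the date of the most recent governance-update ledger entry:

```python
from datetime import date, timedelta

STALENESS_WINDOW = timedelta(days=60)

def staleness_alert(last_governance_update: date, today: date) -> str | None:
    """Alert Dan and Brandon if no governance update has landed in 60 days."""
    if today - last_governance_update > STALENESS_WINDOW:
        return ("governance staleness: no updates since "
                f"{last_governance_update.isoformat()}")
    return None
```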
Relationship to Other Governance Documents
| Document | Learning Loop Relationship |
|---|---|
| AUTHORITY-MATRIX.md | Tier assignments evolve through this loop based on success rates and human override patterns |
| HARD-BOUNDARIES.md | Boundaries are clarified or (rarely) evolved through this loop based on near-miss data |
| ESCALATION-PROTOCOLS.md | Escalation targets and formats evolve through this loop based on response time and quality data |
| POLICY-GENERATION.md | New policies feed into this loop as governance growth; this loop validates that the policy pipeline is working |
| DECISION-LEDGER-SPEC.md | The ledger is the primary data source for this loop; this loop validates ledger completeness and integrity |
Reviewed by: Dan Shipper, Brandon Gell
Next review: 2026-05-01 (first monthly governance review cycle)
Governance version: 1.0