Society-of-Thought Review¶
Society-of-thought is a review mode that uses specialist personas — each with a unique cognitive strategy and behavioral rules — to review code from genuinely different perspectives. Instead of running the same review prompt through multiple models, a panel of specialists independently analyzes the code, then a synthesis step merges their findings into a single prioritized report.
The society-of-thought engine is implemented as the paw-sot utility skill, which paw-planning-docs-review (planning workflow), paw-final-review (implementation workflow), and paw-review-workflow (review workflow) can delegate to. This shared engine handles specialist discovery, selection, execution, and synthesis — while each calling workflow handles its own configuration source and post-synthesis flow.
Three Integration Points¶
| Workflow | Calling Skill | Config Source | Post-Synthesis Flow |
|---|---|---|---|
| Planning | `paw-planning-docs-review` | `WorkflowContext.md` | Apply-to-spec/apply-to-plan routing |
| Implementation | `paw-final-review` | `WorkflowContext.md` | Apply/skip/discuss resolution |
| Review | `paw-review-workflow` | `ReviewContext.md` | Feedback → critic → GitHub comment pipeline |
In the planning workflow, SoT is one of three Planning Review modes (alongside single-model and multi-model). Specialists review design documents (Spec.md, ImplementationPlan.md, CodeResearch.md) using the artifacts review type, which frames analysis around design decisions and feasibility rather than code. Findings route to paw-spec or paw-planning based on affected artifact.
In the implementation workflow, SoT is one of three Final Review modes (alongside single-model and multi-model). After synthesis, findings go through an interactive resolution phase where changes are applied directly.
In the review workflow, SoT replaces the Evaluation Stage's impact analysis and gap identification with unified specialist evaluation. Findings from REVIEW-SYNTHESIS.md flow into the existing output pipeline for comment generation and GitHub posting.
Why Perspective Diversity Matters¶
Traditional code review (whether human or AI) tends to look at code through one lens. A single reviewer might catch logic bugs but miss security implications. Multi-model review helps by adding model-architecture diversity, but every model still answers the same questions.
Society-of-thought adds perspective diversity: a security specialist traces data flows through trust boundaries, a performance specialist estimates computational costs, and an assumptions specialist questions whether the code should exist at all. Research shows that inspectors using different perspectives find non-overlapping defects — each perspective surfaces issues the others miss.
Perspective Overlays¶
Perspective diversity goes even further with perspective overlays — evaluative lenses that change the framing under which a specialist reviews without changing who the specialist is. A security specialist running under a premortem perspective ("it's 6 months after launch and the system was breached") will surface different risks than the same specialist reviewing under standard present-tense framing.
PAW ships with three built-in perspectives:
| Perspective | Lens Type | Focus |
|---|---|---|
| Premortem | Temporal | 6-month post-launch failure analysis — "what went wrong?" |
| Retrospective | Temporal | 6-month operational review — "what's painful to maintain?" |
| Red Team | Adversarial | Exploitation analysis — "how would an attacker abuse this?" |
Perspectives are configured at three tiers:
- Auto (default for final review): The engine analyzes your diff and selects relevant perspectives — temporal for operational changes, adversarial for security-sensitive code
- Guided: Specify perspectives by name (e.g., `premortem`, `red-team`)
- None (default for PR review): No perspective overlays — standard present-tense review only
The perspective_cap setting (default: 2) limits how many perspectives each specialist runs, controlling cost. With 3 specialists and 2 perspectives each, you get 6 specialist-perspective runs instead of 3.
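As a sketch of how this fits together, a configuration fragment pairing automatic perspective selection with the default cap might look like the following (field names are taken from the configuration tables in this page; the exact on-disk syntax of `WorkflowContext.md` may differ):

```
Final Review Perspectives: auto
Final Review Perspective Cap: 2
```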
Each finding in the review output includes a **Perspective** attribution showing which lens surfaced it, and the REVIEW-SYNTHESIS.md includes a Perspective Diversity section summarizing which perspectives were applied and why.
Configuration¶
Planning Workflow (Planning Docs Review)¶
Society-of-thought for planning review is configured during workflow initialization (paw-init). The key fields in WorkflowContext.md:
| Field | Values | Default |
|---|---|---|
| Planning Review Mode | `society-of-thought` | `multi-model` |
| Planning Review Specialists | `all`, comma-separated names, or `adaptive:<N>` | `all` |
| Planning Review Interaction Mode | `parallel` or `debate` | `parallel` |
| Planning Review Interactive | `true`, `false`, or `smart` | `smart` |
| Planning Review Specialist Models | `none`, model pool, pinned pairs, or mixed | `none` |
| Planning Review Perspectives | `none`, `auto`, or comma-separated names | `auto` |
| Planning Review Perspective Cap | positive integer | `2` |
When Planning Review Mode is `society-of-thought`, `paw-planning-docs-review` invokes `paw-sot` with `type: artifacts` (not `diff`), framing specialists for design and planning document analysis. Findings route to `paw-spec` or `paw-planning` based on the affected artifact. When `planning_review_models` is set, it is ignored in SoT mode — use `planning_review_specialist_models` for model diversity.
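For illustration, a `WorkflowContext.md` fragment enabling SoT planning review might look like this (field names follow the table above; the `adaptive:4` value is an arbitrary example, and the exact file syntax may differ):

```
Planning Review Mode: society-of-thought
Planning Review Specialists: adaptive:4
Planning Review Interaction Mode: parallel
Planning Review Perspectives: auto
```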
Implementation Workflow (Final Review)¶
Society-of-thought is configured during workflow initialization (paw-init). The key fields in WorkflowContext.md:
| Field | Values | Default |
|---|---|---|
| Final Review Mode | `society-of-thought` | `multi-model` |
| Final Review Specialists | `all`, comma-separated names, or `adaptive:<N>` | `all` |
| Final Review Interaction Mode | `parallel` or `debate` | `parallel` |
| Final Review Interactive | `true`, `false`, or `smart` | `smart` |
| Final Review Specialist Models | `none`, model pool, pinned pairs, or mixed | `none` |
| Final Review Perspectives | `none`, `auto`, or comma-separated names | `auto` |
| Final Review Perspective Cap | positive integer | `2` |
Review Workflow (PR Review)¶
Society-of-thought for PR review is configured in ReviewContext.md (populated from invocation parameters):
| Field | Values | Default |
|---|---|---|
| Review Mode | `society-of-thought` | `single-model` |
| Review Specialists | `all`, comma-separated names, or `adaptive:<N>` | `all` |
| Review Interaction Mode | `parallel` or `debate` | `parallel` |
| Review Interactive | `true`, `false`, or `smart` | `false` |
| Review Specialist Models | `none`, model pool, pinned pairs, or mixed | `none` |
| Review Perspectives | `none`, `auto`, or comma-separated names | `none` |
| Review Perspective Cap | positive integer | `2` |
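For illustration, a `ReviewContext.md` fragment enabling SoT for a PR review might look like the following (field names follow the table above; exact file syntax may differ):

```
Review Mode: society-of-thought
Review Specialists: all
Review Interaction Mode: parallel
Review Perspectives: red-team
```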
When Review Mode is society-of-thought, the Evaluation Stage invokes paw-sot instead of running paw-review-impact and paw-review-gap separately. Findings from REVIEW-SYNTHESIS.md are then mapped into the output pipeline with severity mapping: must-fix → Must, should-fix → Should, consider → Could.
CLI Only
Society-of-thought is CLI-only for v1. In VS Code, configuring society-of-thought falls back to multi-model mode with a notification.
Built-in Specialists¶
PAW ships with 9 built-in specialists, each using a distinct cognitive strategy:
| Specialist | Cognitive Strategy | Focus |
|---|---|---|
| Security | Threat modeling via attack-tree decomposition | Trust boundaries, data flows, blast radius |
| Performance | Quantitative estimation and bottleneck analysis | Computational costs, scaling behavior |
| Assumptions | Socratic questioning | Hidden assumptions, unstated requirements |
| Edge Cases | Boundary enumeration | Off-by-one, empty inputs, race conditions |
| Maintainability | Narrative walkthrough | Future engineer experience, cognitive load |
| Architecture | Pattern recognition | System-level fit, coupling, extensibility |
| Testing | Coverage gap analysis | Untested paths, assertion quality |
| Correctness | Specification-implementation correspondence | Logic errors, wrong operators, default paths |
| Release Manager | Release-impact analysis / deployment-path tracing | CI/CD changes, packaging, migration safety, rollback |
Every specialist includes anti-sycophancy rules — structural constraints that require each specialist to either identify a substantive concern or explain what they analyzed and why they found no issues. This prevents the "looks good to me" problem common in AI reviews.
Interaction Modes¶
Parallel Mode (Default)¶
All specialists review the diff independently and in parallel, then a synthesis agent merges their findings into REVIEW-SYNTHESIS.md. This is the fastest option and works well for most reviews.
Debate Mode¶
Specialists run in sequential rounds. After each round, a synthesis agent summarizes findings and poses targeted questions back to specific specialists. The debate terminates when no new substantive findings emerge (or after 3 rounds).
Debate mode uses hub-and-spoke mediation: specialists see only the synthesis summary between rounds, never each other's raw findings. This preserves perspective independence while allowing productive disagreement.
When to Use Debate
Debate mode costs more tokens but produces more thorough reviews. Use it for critical changes, security-sensitive code, or architectural decisions where you want specialists to challenge each other's findings.
Token Cost Estimates
Debate mode token usage scales with specialist count and rounds. With 9 specialists and up to 3 rounds plus per-thread continuation (30-call budget), a full debate can consume 50+ subagent calls. For large diffs, consider using adaptive:<N> with a smaller N to control cost, or use parallel mode for routine reviews.
Specialist Selection Modes¶
All (Default)¶
All discovered specialists participate. With only the built-in roster, this means 9 specialists.
Fixed List¶
Specify exactly which specialists to use:
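For example, using the final-review field name (the specialist names below are assumed to match the built-in roster's filenames):

```
Final Review Specialists: security, performance, edge-cases
```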
Adaptive Selection¶
Let the agent analyze your diff and select the most relevant specialists:
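For example (the value `4` is an arbitrary cap, not a recommended setting):

```
Final Review Specialists: adaptive:4
```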
The agent examines the change content, matches it against specialist domains, and selects up to N specialists. The selection rationale is documented in REVIEW-SYNTHESIS.md.
Custom Specialists¶
You can create custom specialists at three levels, with most-specific-wins precedence:
| Level | Location | Scope |
|---|---|---|
| Project | `.paw/personas/<name>.md` | Shared with team via Git |
| User | `~/.paw/personas/<name>.md` | Personal, all projects |
| Built-in | Bundled with PAW | Default roster |
A project-level specialist with the same filename as a built-in specialist overrides the built-in version.
Trust Model
Custom specialist files are loaded as agent instructions with full tool access. Only use persona files from trusted sources. Project-level specialists (.paw/personas/) are committed to the repository and should be reviewed like any other code contribution.
Creating a Custom Specialist¶
A specialist file is a markdown document that defines a persona. The filename (without .md) becomes the specialist's name. Here's a scaffold:
```markdown
---
context: implementation
shared_rules_included: false
---

# Compliance Specialist

## Identity & Narrative Backstory

You are an expert in regulatory compliance with deep experience
in [your domain]. Describe formative experiences that shape how
you approach code review — incidents, lessons learned, and the
instincts they developed.

The backstory should be written in second person ("You are...")
and establish genuine expertise through specific, realistic
scenarios rather than generic credentials.

## Cognitive Strategy

**[Name your strategy and describe it.]**

Describe the structured analytical process this specialist
follows when examining a diff. This should be genuinely
different from other specialists — not just a different topic,
but a different WAY of analyzing code.

Example strategies:

- Threat modeling (security)
- Quantitative estimation (performance)
- Socratic questioning (assumptions)
- Boundary enumeration (edge cases)

## Behavioral Rules

- Specific rules that guide this specialist's review behavior
- Each rule should be actionable and distinct
- Include what the specialist looks for and what it refuses
  to accept

## Shared Rules

See `_shared-rules.md` for Anti-Sycophancy Rules and
Confidence Scoring.

## Demand Rationale

Before evaluating code, describe what context this specialist
needs to see in order to do its job. What should it flag if
that context is missing?

## Shared Output Format

See `_shared-rules.md` for Required Output Format (Toulmin
structure). Use `**Category**: <name>` where `<name>` is this
specialist's category.

## Example Review Comments

Include 2-3 example findings that demonstrate this specialist's
cognitive strategy in action. Use the Toulmin format from the
shared rules (Finding → Grounds → Warrant → Rebuttal Conditions
→ Suggested Verification).
```
Tips for Effective Specialists¶
- Distinct cognitive strategies matter more than distinct topics. A "database specialist" that reviews code the same way as a general reviewer adds little. A specialist that estimates query plans and calculates I/O costs adds real value.
- Narrative backstories improve consistency. Research shows that detailed persona narratives help models maintain character better than bullet-point role descriptions.
- Anti-sycophancy is structural, not stylistic. The shared rules enforce that every specialist must produce substantive analysis. Don't override these in custom specialists.
- Use `shared_rules_included` frontmatter. Set to `true` only if your custom specialist includes its own anti-sycophancy rules, confidence scoring, and Toulmin output format. When `false` (default), shared rules are automatically injected.
- Include example findings. 2-3 examples in the Toulmin format (Grounds → Warrant → Rebuttal → Verification) anchor the specialist's behavior more effectively than additional instructions.
Context Filtering¶
Specialists can declare a domain context in their frontmatter, allowing callers to filter to only relevant specialists for their review type. This enables domain-specific reviews where implementation specialists (security, performance, etc.) don't participate when reviewing non-code content like business plans or documentation.
Declaring context in a specialist:
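A minimal sketch of a persona file's frontmatter declaring a non-default context (the `compliance` value is illustrative):

```yaml
---
context: compliance
---
```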
Each specialist declares a single context value — comma-separated or array values are not supported.
Filtering by context: Callers specify context: <domain> in the review context. Only specialists with matching context (case-insensitive) participate.
Default behaviors:
- Specialists without a `context` field default to `implementation`
- For `diff` and `artifacts` review types without explicit context, filtering defaults to `implementation`
- For `freeform` review type without explicit context, no filtering occurs (all specialists participate)
Zero-match handling: If no specialists match the requested context:
- In interactive or smart mode, SoT warns you and asks how to proceed
- In non-interactive mode, SoT warns and falls back to all specialists
This means typos in context values are recoverable — you'll always get a clear warning rather than a silent failure.
All 9 built-in specialists have context: implementation. To create specialists for other domains (compliance, business analysis, etc.), add the appropriate context field to your custom specialist's frontmatter.
Tip
For freeform reviews of non-code content, always specify context to ensure only relevant specialists participate. Without an explicit context, all specialists are included regardless of domain.
Custom Perspectives¶
You can create custom perspective overlays at three levels, with the same most-specific-wins precedence as specialists:
| Level | Location | Scope |
|---|---|---|
| Project | `.paw/perspectives/<name>.md` | Shared with team via Git |
| User | `~/.paw/perspectives/<name>.md` | Personal, all projects |
| Built-in | Bundled with PAW | Default roster (premortem, retrospective, red-team) |
A project-level perspective with the same filename as a built-in perspective overrides the built-in version.
Creating a Custom Perspective¶
A perspective file defines an evaluative lens. The filename (without .md) becomes the perspective's name. Required sections:
```markdown
# Compliance Perspective

## Lens Type

custom

## Parameters

- **Temporal Frame**: N/A
- **Scenario**: Evaluating code against regulatory requirements

## Overlay Template

You are reviewing this code for regulatory compliance violations. As the
{specialist}, identify requirements gaps, audit trail deficiencies, and
data handling violations specific to your domain. Focus on what a
compliance auditor would flag during certification review.

## Novelty Constraint

Each compliance concern must reference a specific code pattern, data flow,
or missing control visible in the artifact. Do not raise generic
compliance concerns that apply to any system.
```
Key requirements:
- The `{specialist}` placeholder in the Overlay Template is resolved at runtime with the specialist's name
- Overlay Templates should be 50–100 words — long enough to shift the evaluative frame, short enough not to compete with the specialist's cognitive strategy
- The Novelty Constraint should require evidence-anchoring to prevent unconstrained speculation
The Synthesis Agent¶
The synthesis agent operates as a PR triage lead — a functional role with structural constraints. It can merge, deduplicate, classify conflicts, and flag trade-offs, but it cannot generate new findings. This prevents the synthesis step from introducing hallucinated issues that no specialist actually raised.
Key synthesis behaviors:
- Confidence-weighted aggregation — Findings supported by multiple specialists with high confidence rank higher
- Grounding validation — Findings referencing code not in the diff are flagged as ungrounded and demoted
- Evidence-based adjudication — When specialists disagree, the synthesis examines reasoning traces rather than counting votes
The output is REVIEW-SYNTHESIS.md in .paw/work/<work-id>/reviews/.
Interactive Moderator Mode¶
When Final Review Interactive is true or smart, you can interact with specialists after the review completes:
- Summon a specialist by name for follow-up on a specific area
- Challenge a finding — the specialist must respond with independent evidence
- Request deeper analysis on a particular file or function
With smart mode, interactive sessions activate only when significant findings (must-fix or should-fix) are present. If the review produces only consider items that don't qualify as quick wins, it completes without interruption — though quick-win consider items are still auto-applied.
Model Assignment¶
By default, all specialists use the session's default model. You can add model-architecture diversity in two ways: workflow-level configuration and per-specialist frontmatter.
Workflow-Level Configuration¶
Set Final Review Specialist Models in WorkflowContext.md (configured during paw-init):
Model pool — distribute models round-robin across specialists:
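For example (the model identifiers are placeholders, not real model names; exact file syntax may differ):

```
Final Review Specialist Models: model-a, model-b, model-c
```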
Explicit pinning — assign specific models to specific specialists:
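For example, assuming pins are written as `specialist:model` pairs (model identifiers are placeholders):

```
Final Review Specialist Models: security:model-a, performance:model-b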
Mixed — pin some specialists, pool the rest:
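For example, assuming pinned pairs and bare pool entries can be combined in one list (model identifiers are placeholders):

```
Final Review Specialist Models: security:model-a, model-b, model-c
```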
In mixed mode, pinned specialists get their assigned model. Unpinned specialists are sorted alphabetically and assigned models from the pool list round-robin.
Per-Specialist Frontmatter¶
Custom specialists (in .paw/personas/ or ~/.paw/personas/) can specify a model in their YAML frontmatter:
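A minimal sketch of such frontmatter (the model identifier is a placeholder):

```yaml
---
context: implementation
model: model-a
---
```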
Resolution Precedence¶
When multiple sources specify a model, the most specific wins:
- Specialist frontmatter `model:` field (highest priority)
- WorkflowContext pinning (`specialist:model` pair)
- WorkflowContext pool (round-robin distribution)
- Session default (fallback)
Next Steps¶
- Stage Transitions — How review policies affect the workflow
- Workflow Modes — Configure Full, Minimal, or Custom modes
- Artifacts Reference — All PAW artifacts including REVIEW-SYNTHESIS.md