--- summary: "Reference: provider-specific transcript sanitization and repair rules" read_when: - You are debugging provider request rejections tied to transcript shape - You are changing transcript sanitization or tool-call repair logic - You are investigating tool-call id mismatches across providers title: "Transcript Hygiene" --- # Transcript Hygiene (Provider Fixups) This document describes **provider-specific fixes** applied to transcripts before a run (building model context). These are **in-memory** adjustments used to satisfy strict provider requirements. These hygiene steps do **not** rewrite the stored JSONL transcript on disk; however, a separate session-file repair pass may rewrite malformed JSONL files by dropping invalid lines before the session is loaded. When a repair occurs, the original file is backed up alongside the session file. Scope includes: - Tool call id sanitization - Tool call input validation - Tool result pairing repair - Turn validation / ordering - Thought signature cleanup - Image payload sanitization If you need transcript storage details, see: - [/reference/session-management-compaction](/reference/session-management-compaction) --- ## Where this runs All transcript hygiene is centralized in the embedded runner: - Policy selection: `src/agents/transcript-policy.ts` - Sanitization/repair application: `sanitizeSessionHistory` in `src/agents/pi-embedded-runner/google.ts` The policy uses `provider`, `modelApi`, and `modelId` to decide what to apply. Separate from transcript hygiene, session files are repaired (if needed) before load: - `repairSessionFileIfNeeded` in `src/agents/session-file-repair.ts` - Called from `run/attempt.ts` and `compact.ts` (embedded runner) --- ## Global rule: image sanitization Image payloads are always sanitized to prevent provider-side rejection due to size limits (downscale/recompress oversized base64 images). Implementation: - `sanitizeSessionMessagesImages` in `src/agents/pi-embedded-helpers/images.ts` - `sanitizeContentBlocksImages` in `src/agents/tool-images.ts` --- ## Global rule: malformed tool calls Assistant tool-call blocks that are missing both `input` and `arguments` are dropped before model context is built. This prevents provider rejections from partially persisted tool calls (for example, after a rate limit failure). Implementation: - `sanitizeToolCallInputs` in `src/agents/session-transcript-repair.ts` - Applied in `sanitizeSessionHistory` in `src/agents/pi-embedded-runner/google.ts` --- ## Provider matrix (current behavior) **OpenAI / OpenAI Codex** - Image sanitization only. - On model switch into OpenAI Responses/Codex, drop orphaned reasoning signatures (standalone reasoning items without a following content block). - No tool call id sanitization. - No tool result pairing repair. - No turn validation or reordering. - No synthetic tool results. - No thought signature stripping. **Google (Generative AI / Gemini CLI / Antigravity)** - Tool call id sanitization: strict alphanumeric. - Tool result pairing repair and synthetic tool results. - Turn validation (Gemini-style turn alternation). - Google turn ordering fixup (prepend a tiny user bootstrap if history starts with assistant). - Antigravity Claude: normalize thinking signatures; drop unsigned thinking blocks. **Anthropic / Minimax (Anthropic-compatible)** - Tool result pairing repair and synthetic tool results. - Turn validation (merge consecutive user turns to satisfy strict alternation). **Mistral (including model-id based detection)** - Tool call id sanitization: strict9 (alphanumeric length 9). **OpenRouter Gemini** - Thought signature cleanup: strip non-base64 `thought_signature` values (keep base64). **Everything else** - Image sanitization only. --- ## Historical behavior (pre-2026.1.22) Before the 2026.1.22 release, OpenClaw applied multiple layers of transcript hygiene: - A **transcript-sanitize extension** ran on every context build and could: - Repair tool use/result pairing. - Sanitize tool call ids (including a non-strict mode that preserved `_`/`-`). - The runner also performed provider-specific sanitization, which duplicated work. - Additional mutations occurred outside the provider policy, including: - Stripping `` tags from assistant text before persistence. - Dropping empty assistant error turns. - Trimming assistant content after tool calls. This complexity caused cross-provider regressions (notably `openai-responses` `call_id|fc_id` pairing). The 2026.1.22 cleanup removed the extension, centralized logic in the runner, and made OpenAI **no-touch** beyond image sanitization.