The three-stage JSON repair and validation pipeline — jsonrepair, Zod schema enforcement, and self-correcting retry loop.

JSON Validation

Why LLMs produce broken JSON

LLMs are probabilistic text generators. Even when instructed to output JSON, they occasionally:

  • Wrap the JSON in markdown code fences (```json ... ```)
  • Add explanatory text before or after the JSON
  • Include trailing commas in objects or arrays
  • Omit closing braces or brackets
  • Use single quotes instead of double quotes
  • Truncate the response mid-object

Without a repair layer, any one of these failures makes a strict parser such as JSON.parse throw, which crashes your pipeline.
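As a quick illustration of the failure modes listed above, the sketch below feeds a few malformed strings to the built-in JSON.parse. The sample strings are hypothetical, written for this example rather than captured from a real model, and the `parses` helper is an illustrative name.

```typescript
// Hypothetical examples of malformed LLM output (illustrative only).
const samples: Array<[string, string]> = [
  ["fenced", '```json\n{"name": "Ada"}\n```'],
  ["chatter", 'Here is the JSON you asked for: {"name": "Ada"}'],
  ["trailingComma", '{"skills": ["TypeScript", "Go",]}'],
  ["singleQuotes", "{'name': 'Ada'}"],
  ["truncated", '{"name": "Ada", "work": [{"position": "Engin'],
];

// Returns true when the strict built-in parser accepts the text.
function parses(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

for (const [label, text] of samples) {
  console.log(`${label}: ${parses(text) ? "ok" : "JSON.parse throws"}`);
}
```

Every sample here is rejected by JSON.parse, which is why the raw output is repaired before it is parsed.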

The three-stage pipeline

Stage 1 — Structural repair

Before validation, the raw LLM output passes through a repair pipeline:

  1. Strip markdown fences — removes ```json, ```, and similar wrappers
  2. Strip leading text — removes any text before the first { or [
  3. Strip trailing text — removes any text after the last } or ]
  4. jsonrepair — fixes trailing commas, missing brackets, single quotes, unescaped characters, and 100+ other common patterns
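The four steps above can be sketched roughly as follows. This is a minimal illustration and not the library's actual code: the function names are hypothetical, and the final repair step is injected as a parameter so the sketch has no dependencies. In real use you would likely pass the `jsonrepair` function exported by the jsonrepair package as that parameter.

```typescript
// Hypothetical type for the final repair step (e.g. jsonrepair).
type Repair = (text: string) => string;

// Step 1: remove fence markers such as ```json or ```.
// Caveat: this would also remove fence markers inside string values.
function stripFences(text: string): string {
  return text.replace(/```[a-zA-Z]*/g, "");
}

// Step 2: drop anything before the first { or [.
function stripLeadingText(text: string): string {
  const start = text.search(/[{\[]/);
  return start === -1 ? text : text.slice(start);
}

// Step 3: drop anything after the last } or ].
function stripTrailingText(text: string): string {
  const end = Math.max(text.lastIndexOf("}"), text.lastIndexOf("]"));
  return end === -1 ? text : text.slice(0, end + 1);
}

// Step 4: hand the stripped text to the repair function.
// The default is the identity function, an assumption made for this sketch.
function structuralRepair(raw: string, repair: Repair = (t) => t): string {
  const stripped = stripTrailingText(stripLeadingText(stripFences(raw)));
  return repair(stripped.trim());
}

const raw =
  'Sure! Here is the result:\n```json\n{"basics": {"name": "Ada"}}\n```\nLet me know if you need changes.';
console.log(structuralRepair(raw)); // {"basics": {"name": "Ada"}}
```

Injecting the repair step keeps the stripping logic testable on its own. Note that a truncated response may have no closing brace at all, in which case step 3 leaves the text unchanged and the repair function has to close the structure.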

Stage 2 — Zod validation

The repaired string is parsed and validated against the JSON Resume v1 Zod schema. If it passes, the result is returned immediately.

Stage 3 — Self-correcting retry

If validation fails, the specific Zod errors are formatted into a correction prompt and sent back to the LLM:

CORRECTION REQUIRED (Attempt 1):
Validation failed:
- "work.0.startDate" : Expected string, received number
- "basics.email" : Invalid email

Return ONLY the corrected JSON. Stop immediately after the closing brace.

The LLM is asked to fix exactly the fields that failed. This loop runs up to maxRetries times (default: 3).
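A minimal sketch of this loop is shown below, assuming the LLM call and the validator are supplied by the caller. All names here (`Issue`, `ValidationResult`, `formatCorrectionPrompt`, `extractWithRetry`) are hypothetical and are not the library's API. The `Issue` interface only mirrors the path and message fields of a Zod issue, and appending the previous output to the retry prompt is an assumption of this sketch.

```typescript
// Minimal stand-in for the shape of a Zod issue (path + message).
interface Issue {
  path: Array<string | number>;
  message: string;
}

type ValidationResult<T> =
  | { success: true; data: T }
  | { success: false; issues: Issue[] };

// Build a correction prompt in the format shown above.
function formatCorrectionPrompt(issues: Issue[], attempt: number): string {
  const lines = issues.map((i) => `- "${i.path.join(".")}" : ${i.message}`);
  return [
    `CORRECTION REQUIRED (Attempt ${attempt}):`,
    "Validation failed:",
    ...lines,
    "",
    "Return ONLY the corrected JSON. Stop immediately after the closing brace.",
  ].join("\n");
}

// Hypothetical retry loop: callLlm and validate are injected by the caller.
async function extractWithRetry<T>(
  prompt: string,
  callLlm: (prompt: string) => Promise<string>,
  validate: (raw: string) => ValidationResult<T>,
  maxRetries = 3,
): Promise<T> {
  let raw = await callLlm(prompt);
  let result = validate(raw);
  for (let attempt = 1; !result.success && attempt <= maxRetries; attempt++) {
    const correction = formatCorrectionPrompt(result.issues, attempt);
    // Assumption: the previous output is sent back together with the correction.
    raw = await callLlm(`${prompt}\n\n${raw}\n\n${correction}`);
    result = validate(raw);
  }
  if (!result.success) {
    throw new Error(`Validation still failing after ${maxRetries} retries`);
  }
  return result.data;
}
```

In this sketch `validate` would wrap Stage 1 and Stage 2, returning either the parsed data or the list of issues. Bounding the loop with `maxRetries` keeps cost and latency predictable when a model keeps repeating the same mistake.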

Using the Zod schema directly

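import { JsonResumeSchema } from '@edwinfom/resume-intel'

// Validate your own data without throwing: safeParse returns a result object
const result = JsonResumeSchema.safeParse(myData)

if (!result.success) {
  // Each issue carries the failing path and an error message
  console.error(result.error.issues)
}

// Or use parse, which returns the typed data and throws a ZodError on invalid input
const data = JsonResumeSchema.parse(myData)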

Deduplication

After extraction, array fields are deduplicated by composite key to remove entries that appear multiple times (common in multi-page scanned CVs where headers repeat):

| Field | Deduplication key |
|---|---|
| work | name + position + startDate |
| education | institution + studyType + startDate |
| skills | name |
| languages | language (case-insensitive) |
| projects | name |
| certificates | name + issuer |
| awards | title + awarder |
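Deduplication by composite key can be sketched as below, using two of the keys listed above. The helper names (`dedupeBy`, `compositeKey`) and the sample entries are hypothetical, and keeping the first occurrence of each key is an assumption of this sketch rather than documented behavior.

```typescript
// Generic helper: keep the first entry seen for each key.
function dedupeBy<T>(items: T[], keyOf: (item: T) => string): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const item of items) {
    const key = keyOf(item);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(item);
    }
  }
  return unique;
}

interface WorkEntry { name?: string; position?: string; startDate?: string }
interface LanguageEntry { language?: string }

// Join key parts with a separator that is unlikely to occur in the data.
const compositeKey = (...parts: Array<string | undefined>): string =>
  parts.map((p) => p ?? "").join("\u0000");

const work: WorkEntry[] = [
  { name: "Acme", position: "Engineer", startDate: "2020-01" },
  { name: "Acme", position: "Engineer", startDate: "2020-01" }, // repeated on a later page
  { name: "Acme", position: "Lead Engineer", startDate: "2022-06" },
];
const uniqueWork = dedupeBy(work, (w) => compositeKey(w.name, w.position, w.startDate));

// languages are compared case-insensitively
const languages: LanguageEntry[] = [{ language: "English" }, { language: "english" }];
const uniqueLanguages = dedupeBy(languages, (l) => compositeKey(l.language).toLowerCase());

console.log(uniqueWork.length, uniqueLanguages.length); // 2 1
```

Using a separator character between key parts avoids accidental collisions such as "ab" + "c" matching "a" + "bc".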

Reliability

In production testing across text-native and scanned CVs:

  • Stage 1 alone resolves ~70% of malformed outputs
  • Stages 1 + 2 resolve ~85%
  • Stages 1 + 2 + 3 (1 retry) resolve ~97%
  • Stages 1 + 2 + 3 (3 retries) resolve roughly 99% or more