Parseur
Intelligent document parsing platform that extracts, validates, and auto-corrects structured data from PDFs and images using an LLM-only approach.
Next.js, TypeScript, OpenAI, Claude, Prisma, PostgreSQL
Role: Creator & Developer
Company: Personal
Year: 2025
The Challenge
Extracting structured data from unstructured documents (PDFs, images) is notoriously hard. Traditional OCR pipelines are brittle and need hand-written rules for each document type. I wanted a system that could use LLMs to interpret document layouts it had never seen before, with validation and auto-correction built in.
The Solution
- Multi-LLM pipeline combining OpenAI GPT-4o and Anthropic Claude 3.5 Sonnet
- Intelligent extraction with auto-correction and Zod schema validation
- S3-compatible storage with MinIO for local development
- Background job processing with Inngest for async document pipelines
- Full-stack Next.js 16 app with Prisma ORM and PostgreSQL
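The auto-correction idea in the list above can be sketched as a validate-and-retry loop: ask a model for JSON, validate it, and on failure send the validation errors back so the model can repair its own answer. This is a minimal sketch under stated assumptions, not the project's actual code. The `Invoice` shape, `validateInvoice`, `extractWithCorrection`, and the prompt wording are all hypothetical, and the hand-written validator stands in for a Zod schema so the example has no dependencies.

```typescript
// Hypothetical target shape; the real project defines its schemas with Zod.
interface Invoice {
  invoiceNumber: string;
  total: number;
}

type Validation =
  | { kind: "valid"; data: Invoice }
  | { kind: "invalid"; errors: string[] };

// Hand-written stand-in for schema validation (assumption: Zod plays this role in the app).
function validateInvoice(raw: string): Validation {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return { kind: "invalid", errors: ["response is not valid JSON"] };
  }
  if (typeof value !== "object" || value === null) {
    return { kind: "invalid", errors: ["expected a JSON object"] };
  }
  const { invoiceNumber, total } = value as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof invoiceNumber !== "string" || invoiceNumber.length === 0) {
    errors.push("invoiceNumber must be a non-empty string");
  }
  if (typeof total !== "number" || !Number.isFinite(total)) {
    errors.push("total must be a finite number");
  }
  if (errors.length > 0) return { kind: "invalid", errors };
  return {
    kind: "valid",
    data: { invoiceNumber: invoiceNumber as string, total: total as number },
  };
}

// Any function that sends a prompt to a model and resolves with its raw text reply.
type ModelCall = (prompt: string) => Promise<string>;

async function extractWithCorrection(
  documentText: string,
  callModel: ModelCall,
  maxAttempts = 3,
): Promise<Invoice> {
  let prompt = `Extract invoiceNumber and total as JSON from:\n${documentText}`;
  let lastErrors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const reply = await callModel(prompt);
    const result = validateInvoice(reply);
    if (result.kind === "valid") return result.data;
    lastErrors = result.errors;
    // Feed the validation errors back so the model can repair its previous output.
    prompt =
      `Your previous answer was invalid:\n${reply}\n` +
      `Problems: ${lastErrors.join("; ")}\n` +
      `Return corrected JSON only.\nDocument:\n${documentText}`;
  }
  throw new Error(
    `Extraction failed after ${maxAttempts} attempts: ${lastErrors.join("; ")}`,
  );
}
```

Because `callModel` is injected, the same loop could wrap either provider, and it can be exercised in tests with a stub that returns canned replies. Capping the attempts keeps a persistently malformed response from looping forever.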
Results
- 2 LLM providers
- Auto-correction
- Live at parseur.vercel.app
Learnings
Multi-LLM orchestration taught me that no single model excels at everything. GPT-4o handles visual layout understanding better, while Claude excels at structured reasoning. The key is composing them, not choosing between them.
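One way to picture that composition is a two-stage pipeline in which one model describes the page layout and a second turns that description into structured fields. The sketch below is illustrative only: `LayoutStage`, `StructureStage`, and `composePipeline` are hypothetical names, and each stage is an injected function that would wrap a provider SDK call in a real app.

```typescript
// Stage 1 (assumed to be a vision-capable model): page image in, layout description out.
type LayoutStage = (pageImage: Uint8Array) => Promise<string>;

// Stage 2 (assumed to be a reasoning-focused model): layout description in, fields out.
type StructureStage = (layoutText: string) => Promise<Record<string, unknown>>;

interface PipelineResult {
  layoutText: string; // kept so a bad extraction can be traced back to stage 1
  fields: Record<string, unknown>;
}

// Chains the two stages so callers see a single "image in, fields out" function.
function composePipeline(
  describeLayout: LayoutStage,
  structureFields: StructureStage,
): (pageImage: Uint8Array) => Promise<PipelineResult> {
  return async (pageImage) => {
    const layoutText = await describeLayout(pageImage);
    if (layoutText.trim() === "") {
      // Fail early rather than asking stage 2 to reason about nothing.
      throw new Error("layout stage returned no text");
    }
    const fields = await structureFields(layoutText);
    return { layoutText, fields };
  };
}
```

Keeping the stages as plain functions means either model can be swapped or stubbed without touching the pipeline, and returning the intermediate `layoutText` makes it easier to see which stage caused a wrong result.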
View on GitHub →