Platform — DocuLexis AI

Core Capabilities

A multi-modal AI engine that sees, reads, understands, and reasons

Not rules. Not templates. An AI agent that processes any document like a human expert — at machine scale.

Visual Layout Detection

Identifies headers, footers, tables, stamps, signatures, and handwriting zones — regardless of format or layout complexity.

Multi-Lingual Intelligence

Detects and processes 100+ languages in a single document, including mixed-script content like Arabic headers with English body text.

Reading Order Engine

Determines correct reading sequence across multi-column, nested, and non-linear layouts — no templates needed.

Semantic Extraction

Understands context — knows a "Date" next to a signature differs from a "Date of Birth" in a form. Extracts meaning, not just text.

Agentic Reasoning

Self-corrects, cross-references fields, validates data integrity, and flags anomalies autonomously across pages and documents.

Format Agnostic

PDFs, scans, faxes, photos, handwritten notes, screenshots — any input becomes structured, validated output.
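As a rough illustration of the reading-order idea above, here is a minimal sketch that orders text blocks on a multi-column page. It is not the DocuLexis engine: the `Block` type, the `column_gap` threshold, and the "group by left edge" heuristic are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Block:
    text: str
    x: float  # left edge of the block, in page units
    y: float  # top edge of the block, in page units

def reading_order(blocks, column_gap=50.0):
    """Order blocks column by column (hypothetical heuristic, not the product's)."""
    columns = []
    # Walk blocks from left to right; a jump in the left edge larger than
    # column_gap is treated as the start of a new column.
    for block in sorted(blocks, key=lambda b: b.x):
        if columns and block.x - columns[-1][-1].x <= column_gap:
            columns[-1].append(block)
        else:
            columns.append([block])
    # Read columns left to right, and each column top to bottom.
    return [b for col in columns for b in sorted(col, key=lambda b: b.y)]

page = [
    Block("B2", 300, 100),
    Block("A1", 40, 50),
    Block("A2", 42, 200),
    Block("B1", 305, 40),
]
ordered = [b.text for b in reading_order(page)]
print(ordered)
```

A plain top-to-bottom sort would interleave the two columns here; grouping by column first keeps each column's text contiguous. Nested and non-linear layouts need more than this heuristic.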

Under the Hood

Why we built a pipeline, not a prompt

Recent research suggests that throwing more reasoning tokens at document parsing doesn't improve accuracy, and can actively degrade it. Models that "think harder" hallucinate table cells, split continuous tables into fragments, and fill in blanks with guesses. The problem isn't reasoning. It's architecture.

The single-model trap

What happens when one model does everything

Table hallucination. The model fills in blank cells with inferred values, so a cell left empty as deliberate shorthand becomes fabricated data.

Structural splitting. One continuous table becomes three. The model sees a header row and decides it must be a boundary.

Vision encoder loss. Small, dense, or vertical text vanishes during image encoding — before reasoning even starts. No amount of thinking recovers lost pixels.

Character mutation. More reasoning tokens cause the model to second-guess correct characters — turning "CNY" into "CYN".
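To make the vision-encoder point concrete, here is a small worked calculation of how tall a line of small print is before and after an image is downscaled. The page size, scan resolution, and the 1024 px encoder input are illustrative assumptions, not measurements of any particular model.

```python
def glyph_height_px(point_size: float, dpi: int, scale: float = 1.0) -> float:
    """Approximate text height in pixels, optionally after downscaling."""
    return point_size / 72 * dpi * scale  # 72 points per inch

# Assumed for illustration: a US Letter page scanned at 300 DPI is
# 2550 x 3300 px, and a hypothetical encoder resizes the long side to 1024 px.
PAGE_LONG_SIDE_PX = 3300
ENCODER_LONG_SIDE_PX = 1024
scale = ENCODER_LONG_SIDE_PX / PAGE_LONG_SIDE_PX

native = glyph_height_px(6, 300)             # 6 pt footnote at scan resolution
downscaled = glyph_height_px(6, 300, scale)  # same text after the resize
print(f"native: {native:.1f} px, after resize: {downscaled:.1f} px")
```

Under these assumptions a 6 pt footnote drops from about 25 px to under 8 px tall, which is the kind of loss that later reasoning cannot undo.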

The DocuLexis approach

Specialized components, orchestrated intelligently

Layout first, content second. Table boundaries are established by dedicated layout analysis before any text extraction begins. No guessing.

Native-resolution OCR. Text is captured at full pixel fidelity by a dedicated OCR pass — not by a vision encoder that compresses everything into token embeddings.

Reasoning with intention. Agentic AI validates structure against raw OCR data. Reasoning is reserved for complex decisions — not pixel-level speculation.

Deterministic verification. Pre-extracted text anchors every output. If the OCR read "4-CNY", the output says "4-CNY" — no second-guessing.
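The deterministic-verification idea above can be sketched as a check that every structured value is anchored to a token the OCR pass actually read. This is a minimal sketch, assuming OCR output is available as a list of tokens; the function name `anchor_to_ocr`, the status labels, and the 0.7 similarity cutoff are hypothetical choices for the example, and fuzzy matching with Python's standard `difflib` stands in for whatever matching the product uses.

```python
import difflib

def anchor_to_ocr(value, ocr_tokens, cutoff=0.7):
    """Return (anchored_value, status) for one structured value.

    status is "exact" if the OCR read this value verbatim, "corrected" if a
    close OCR token replaced it, and "unverified" if nothing in the OCR
    output supports it.
    """
    if value in ocr_tokens:
        return value, "exact"
    # Fall back to the most similar OCR token, if one is similar enough.
    close = difflib.get_close_matches(value, ocr_tokens, n=1, cutoff=cutoff)
    if close:
        return close[0], "corrected"
    return value, "unverified"

ocr_tokens = ["Amount", "4-CNY", "Total"]
print(anchor_to_ocr("4-CNY", ocr_tokens))  # value matches the OCR text
print(anchor_to_ocr("4-CYN", ocr_tokens))  # transposed characters
print(anchor_to_ocr("XYZ", ocr_tokens))    # no support in the OCR text
```

The design choice is that the OCR text wins any disagreement: a value the model altered is pulled back to what was read, and a value with no OCR support is flagged instead of silently accepted.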

Four-pass architecture: each component plays to its strengths

Pass 1

Layout Detection

Vision models map document zones — tables, charts, text blocks, signatures, handwriting — establishing structural boundaries before reading a single character.

Pass 2

Native OCR

Dedicated OCR engines read text at full pixel resolution within each zone. Small text, vertical orientation, and dense tables are captured without compression loss.

Pass 3

LLM Structuring

Language models organize pre-extracted text into structured fields, tables, and hierarchies. The LLM structures what's already been read — it doesn't transcribe.

Pass 4

Agentic Validation

Self-correcting agents cross-reference structured output against raw OCR data. Anomalies are flagged, tables are verified, and confidence scores are assigned.
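The four passes above can be sketched as a plain function pipeline. Every function here is a hypothetical stub that returns canned data, so the example shows only the data flow: the structuring pass receives OCR text and never sees pixels, and the validation pass checks structured values against that same OCR text.

```python
def detect_layout(page):
    # Pass 1 (stub): a real system would run a layout model on the page image.
    return [{"id": "z1", "kind": "table", "bbox": (0, 0, 100, 40)}]

def run_ocr(page, zones):
    # Pass 2 (stub): a real system would OCR each zone at native resolution.
    return {"z1": "Amount 4-CNY"}

def structure(ocr_text):
    # Pass 3 (stub): organise text that was already read into named fields.
    fields = {}
    for zone_id, text in ocr_text.items():
        label, _, value = text.partition(" ")
        fields[label.lower()] = {"value": value, "zone": zone_id}
    return fields

def validate(fields, ocr_text):
    # Pass 4: every value must appear verbatim in the OCR text of its zone.
    return {name: f["value"] in ocr_text[f["zone"]] for name, f in fields.items()}

def process(page):
    zones = detect_layout(page)
    ocr_text = run_ocr(page, zones)
    fields = structure(ocr_text)
    checks = validate(fields, ocr_text)
    return {"zones": zones, "fields": fields, "checks": checks}

result = process(b"")  # placeholder input; the stubs ignore it
print(result["fields"], result["checks"])
```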

Why pipeline beats prompt — the numbers

Single model, no reasoning: ~79% quality, ~48s, $0.029

Single model, max reasoning: ~79% quality, ~242s, $0.246

Pipeline + agentic validation: 97%+ quality, <10s, $0.013

Benchmark methodology informed by OmniDocBench evaluation framework. Quality measured across field accuracy, table structure, and reading order fidelity.
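The ratios implied by the figures above can be worked out directly. The snippet below uses only the published numbers; the dictionary keys are made up for the example, the source does not state the unit the cost and time apply to (page or document), and the pipeline's "97%+" and "<10s" are bounds, so ratios involving them are limits, not exact values.

```python
configs = {
    "single_no_reasoning": {"quality": 0.79, "seconds": 48, "cost_usd": 0.029},
    "single_max_reasoning": {"quality": 0.79, "seconds": 242, "cost_usd": 0.246},
    # 0.97 and 10 are the stated bounds ("97%+", "<10s"), not point values.
    "pipeline": {"quality": 0.97, "seconds": 10, "cost_usd": 0.013},
}

def ratio(a, b, key):
    """How many times larger configuration a is than b on the given metric."""
    return configs[a][key] / configs[b][key]

reasoning_time = ratio("single_max_reasoning", "single_no_reasoning", "seconds")
reasoning_cost = ratio("single_max_reasoning", "single_no_reasoning", "cost_usd")
pipeline_saving = ratio("single_no_reasoning", "pipeline", "cost_usd")

print(f"max reasoning vs none: {reasoning_time:.1f}x time, {reasoning_cost:.1f}x cost")
print(f"single model (no reasoning) vs pipeline: {pipeline_saving:.1f}x cost")
```

On these figures, maximum reasoning costs roughly 8.5 times as much and takes about 5 times as long as no reasoning, for the same ~79% quality.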

The Complete Pipeline

Eight stages from raw document to decision

Recent breakthroughs in specialized OCR models have pushed recognition accuracy past 94% on standard benchmarks. But recognition is just one stage. DocuLexis orchestrates the full pipeline — from ingestion and layout analysis through validation, normalization, and delivery — ensuring enterprise-grade reliability that no single model can provide alone.

Receive

Ingest from API, email, SFTP, cloud storage, or drag-and-drop.

Split & Classify

Auto-split bundles and classify by type, language, and urgency.

Detect Zones

Map layout regions — tables, charts, handwriting, stamps, signatures.

Extract

Pull structured fields, line items, entities, and relationships.

Enrich

Tag metadata, categorize transactions, normalize currencies and dates.

Validate

Cross-check fields across pages and documents. Auto-flag anomalies.

Review

Human-in-the-loop for edge cases. Confidence-based routing and audit trails.

Deliver

Push clean JSON/XML to your ERP, CRM, or LOS via API, webhook, or connector.
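The confidence-based routing mentioned in the Review stage can be sketched as a simple threshold check. This is an assumption-laden example: the `route` function, the destination labels, the field layout, and the 0.9 threshold are all hypothetical and not taken from the DocuLexis API.

```python
def route(fields, threshold=0.9):
    """Decide where a document goes next.

    fields maps a field name to a (value, confidence) pair. Any field below
    the threshold sends the document to a human reviewer, together with the
    list of fields that need attention.
    """
    low = sorted(name for name, (_value, conf) in fields.items() if conf < threshold)
    return ("human_review", low) if low else ("auto_deliver", [])

clean_doc = {"invoice_no": ("INV-001", 0.99), "total": ("120.00", 0.97)}
doubtful_doc = {"invoice_no": ("INV-002", 0.98), "total": ("1Z0.00", 0.62)}

print(route(clean_doc))
print(route(doubtful_doc))
```

Returning the names of the low-confidence fields, and not only a yes/no decision, lets a reviewer jump straight to the values in doubt and gives the audit trail something specific to record.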

Ingestion Compatibility

Bring any file. We'll read it.

DocuLexis ingests 30+ file types natively — from scanned faxes and camera photos to complex spreadsheets and slide decks. No pre-conversion required.

Documents & PDFs

Core formats

.PDF Portable Document

.DOCX Word

.DOC Word Legacy

.ODT OpenDocument

.RTF Rich Text

.TXT Plain Text

Images & Scans

20+ formats — including camera captures, faxes, and legacy scans

.JPEG

.JPG

.PNG

.TIFF

.TIF

.BMP

.WEBP

.GIF

.PSD

.JP2

.APNG

.DCX

.DDS

.DIB

.PCX

.PPM

.TGA

.ICNS

.HEIC

.SVG

Spreadsheets

Tabular data — financial statements, ledgers, reports

.XLSX Excel

.XLS Excel Legacy

.CSV Comma-Separated

.TSV Tab-Separated

.ODS OpenDocument Sheet

Presentations

Slide decks — investor materials, pitch decks, reports

.PPTX PowerPoint

.PPT PowerPoint Legacy

.ODP OpenDocument Slides

.KEY Keynote

Email & Archives

Ingest directly from email sources and compressed bundles

.EML Email Message

.MSG Outlook Message

.ZIP Compressed Archive

.HTML Web Page

Automatic format conversion: Non-PDF documents (Word, PowerPoint, spreadsheets) are intelligently converted before extraction. Layout fidelity is preserved — DocuLexis adapts to font substitutions and page reflows automatically.

Password-protected files: Encrypted PDFs require the password to be supplied via the API. DocuLexis will return a clear error if a locked file is detected.

No file size limits on Enterprise: Starter and Growth plans support files up to 50 MB. Enterprise plans handle files of any size via our async job pipeline.
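A caller can catch the most common rejections before uploading. The sketch below is a client-side pre-flight check built from the file types and plan limits listed above; the function name, the lowercase plan keys, and the reading of "50MB" as 50,000,000 bytes are assumptions, and nothing here calls the DocuLexis API.

```python
from pathlib import Path

# Extensions taken from the format lists on this page.
SUPPORTED = set(
    ".pdf .docx .doc .odt .rtf .txt "
    ".jpeg .jpg .png .tiff .tif .bmp .webp .gif .psd .jp2 .apng .dcx .dds "
    ".dib .pcx .ppm .tga .icns .heic .svg "
    ".xlsx .xls .csv .tsv .ods .pptx .ppt .odp .key "
    ".eml .msg .zip .html".split()
)

# Assumption: "50MB" means 50 * 10**6 bytes; None means no limit.
MAX_BYTES_BY_PLAN = {"starter": 50 * 10**6, "growth": 50 * 10**6, "enterprise": None}

def preflight(filename, size_bytes, plan):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    limit = MAX_BYTES_BY_PLAN[plan]
    if limit is not None and size_bytes > limit:
        problems.append(f"file exceeds the {limit} byte limit for the {plan} plan")
    return problems

print(preflight("scan.TIFF", 2_000_000, "starter"))
print(preflight("ledger.pdf", 60 * 10**6, "growth"))
```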

Measurable Impact

What changes when you replace legacy tools

Field-Level Accuracy

Legacy tools: 60–75% with constant tuning

DocuLexis: 97%+ out of the box

Time per Document

Legacy tools: 8–15 min manual review

DocuLexis: <10 seconds, fully automated

New Document Onboarding

Legacy tools: Weeks of template engineering

DocuLexis: Zero templates; works on first pass

Multi-Language Support

Legacy tools: 1–2 languages per pipeline

DocuLexis: 100+ languages, mixed-script in one doc

Handwriting & Signatures

Legacy tools: Unsupported or unreliable

DocuLexis: Physician notes, adjuster marks, seals

Cross-Document Validation

Legacy tools: Manual spot-checks

DocuLexis: Autonomous cross-referencing & flagging

Touchless Processing Rate

Legacy tools: 10–25% straight-through

DocuLexis: 90%+ documents need zero human touch

Deployment Timeline

Legacy tools: Months of integration work

DocuLexis: API-first, live in days

Output Reliability

Legacy tools: Stochastic, with no formatting guarantees

DocuLexis: Deterministic, validated structured output