Extract structured data from PDFs
deepread-ocr · skill · setup · L2
Sheshiyer/skill-clusters
What it does
Extract text and structured JSON from PDFs, with a confidence score for each extracted field.
Best for
Extracting structured data from invoices, forms, and receipts, where 90% automated extraction plus 10% human review beats fully manual entry.
Inputs
- · PDF file
- · JSON schema (optional, for structured extraction)
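As an illustration of the optional schema input, here is a minimal sketch of an invoice schema. It assumes the skill accepts a standard JSON Schema object; the field names (`vendor_name`, `invoice_number`, and so on) are hypothetical and not taken from DeepRead's documentation. The schema is kept flat (no arrays of objects) to stay clear of the nested-array failure mode listed under Failure modes.

```python
import json

# Hypothetical invoice schema for illustration only; field names are
# assumptions, not part of the DeepRead documentation. It is deliberately
# flat: no arrays of objects.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string", "description": "Name of the issuing company"},
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string", "description": "Date as printed on the invoice"},
        "total_amount": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["vendor_name", "invoice_number", "total_amount"],
}

# Serialize the schema so it can be attached to an extraction request.
schema_json = json.dumps(INVOICE_SCHEMA, indent=2)
print(schema_json)
```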
Outputs
- · Clean markdown text
- · Structured fields with confidence scores
- · hil_flag per field (human-in-the-loop review flag)
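The 90/10 workflow depends on routing the uncertain fields to a person. Below is a minimal sketch of that triage step, assuming each extracted field arrives with a value, a confidence between 0 and 1, and a boolean `hil_flag`. The `ExtractedField` shape, the `triage` helper, and the 0.9 threshold are all hypothetical, and the sample values are invented for the demo; the real response format may differ.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    # Hypothetical shape of one extracted field; the real response may differ.
    name: str
    value: object
    confidence: float  # assumed to be in [0, 1]
    hil_flag: bool     # True when the skill asks for human review


def triage(fields, min_confidence=0.9):
    """Split fields into (auto_accepted, needs_review).

    A field is sent to review if the skill set hil_flag, or if its
    confidence is below min_confidence (an assumed local threshold).
    """
    accepted, review = [], []
    for f in fields:
        if f.hil_flag or f.confidence < min_confidence:
            review.append(f)
        else:
            accepted.append(f)
    return accepted, review


# Invented sample values, for illustration only.
fields = [
    ExtractedField("invoice_number", "INV-1042", 0.98, False),
    ExtractedField("total_amount", 1250.00, 0.95, False),
    ExtractedField("signature_name", "J. Sm?th", 0.41, True),
]
accepted, review = triage(fields)
print([f.name for f in accepted])  # ['invoice_number', 'total_amount']
print([f.name for f in review])    # ['signature_name']
```

Checking both signals is a design choice: `hil_flag` reflects the skill's own judgment, while the local threshold lets a caller tighten review for high-stakes fields.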
Requires
- · DeepRead REST API
- · API key
Preconditions
A DeepRead API key; the PDF must be accessible to the skill; the free tier allows 2,000 pages per month.
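A pre-flight check can catch these preconditions before a page is spent. This is a minimal sketch under stated assumptions: the API key lives in an environment variable (the name `DEEPREAD_API_KEY` is hypothetical), the PDF is a local file, and the caller tracks monthly page usage itself. Only the 2,000-page free-tier limit comes from the card.

```python
import os
from pathlib import Path

FREE_TIER_PAGES_PER_MONTH = 2000  # free-tier limit stated in Preconditions


def check_preconditions(pdf_path, pages_in_pdf, pages_used_this_month,
                        api_key_env="DEEPREAD_API_KEY"):
    """Return a list of problems; an empty list means the job can be submitted.

    api_key_env is a hypothetical variable name, and page usage is assumed
    to be tracked locally by the caller rather than queried from the API.
    """
    problems = []
    if not os.environ.get(api_key_env):
        problems.append(f"missing API key in ${api_key_env}")
    if not Path(pdf_path).is_file():
        problems.append(f"PDF not found: {pdf_path}")
    remaining = FREE_TIER_PAGES_PER_MONTH - pages_used_this_month
    if pages_in_pdf > remaining:
        problems.append(
            f"needs {pages_in_pdf} pages but only {remaining} remain this month"
        )
    return problems
```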
Failure modes
- · Handwritten or obscured text is marked hil_flag=true
- · Monthly quota exhausted
- · Complex nested array schemas fail
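Because complex nested array schemas fail, it can help to lint a schema before submitting it. The sketch below flags any array whose `items` are themselves objects or arrays; treating exactly that shape as "complex" is an assumption, since the card does not define the term. The `find_nested_arrays` helper is hypothetical.

```python
def find_nested_arrays(schema, path="$"):
    """Return JSON-style paths of arrays whose items are objects or arrays.

    Treating "array of objects/arrays" as a complex nested array is an
    assumption; the skill's listing only says such schemas fail.
    """
    hits = []
    if not isinstance(schema, dict):
        return hits
    if schema.get("type") == "array":
        items = schema.get("items", {})
        if isinstance(items, dict) and items.get("type") in ("object", "array"):
            hits.append(path)
        # Keep walking into the item schema to find deeper nesting.
        hits.extend(find_nested_arrays(items, path + "[]"))
    for name, sub in schema.get("properties", {}).items():
        hits.extend(find_nested_arrays(sub, f"{path}.{name}"))
    return hits


schema = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {"type": "object", "properties": {"sku": {"type": "string"}}},
        },
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}
print(find_nested_arrays(schema))  # ['$.line_items']
```

When the lint reports a hit, a caller might flatten those fields or send the document to manual entry; whether either workaround suits DeepRead specifically is untested here.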
Trust signals
- · Per-field confidence scores
- · hil_flag indicates uncertainty
- · Multi-pass validation
- · Free tier available