cyberneticlibrary

Optimize database schema and queries

dba · subagent · setup · L10
foutoucour/guitar-exercises
What it does

Ingest multi-format documents into pgvector

Best for

Building semantic search over regulatory documents (Verordnungen, German for ordinances) when the pipeline needs to work with 700-word chunks.
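
A minimal sketch of the 700-word chunking mentioned above. The function name `chunk_words` and the `overlap` parameter are assumptions for illustration; the card only states the chunk size, not how chunks are cut or whether they overlap.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 700, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most `chunk_size` whitespace-separated words.

    `overlap` (hypothetical, not named in the card) repeats that many words
    from the end of one chunk at the start of the next.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Splitting on whitespace keeps the example dependency-free; a real pipeline might prefer sentence or paragraph boundaries so that a legal clause is not cut in half.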

Inputs
  • PDF/XLSX/DOCX/HTML
  • chunking config
Outputs
  • pgvector embeddings
  • chunk store
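
One way the two outputs could be laid out in Postgres, shown as SQL strings inside Python. The table name `chunks`, its columns, the cosine operator class, and `lists = 100` are assumptions for illustration; the card names only pgvector, the embedding model, and an IVFFlat index. The dimension 768 is the output size of paraphrase-multilingual-mpnet-base-v2.

```python
from typing import Sequence

EMBEDDING_DIM = 768  # output size of paraphrase-multilingual-mpnet-base-v2

# Hypothetical schema: one row per chunk, text and embedding side by side.
SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    source    text    NOT NULL,
    chunk_no  integer NOT NULL,
    content   text    NOT NULL,
    embedding vector({EMBEDDING_DIM}) NOT NULL
);

CREATE INDEX IF NOT EXISTS chunks_embedding_ivfflat
    ON chunks USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
"""

# Nearest-neighbour search by cosine distance (the <=> operator in pgvector).
SEARCH_SQL = """
SELECT id, source, chunk_no, content
FROM chunks
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values: Sequence[float]) -> str:
    """Format an embedding in pgvector's text form, e.g. '[0.1,0.2,...]'."""
    if len(values) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(values)}")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"
```

With a driver such as psycopg, `SEARCH_SQL` would be executed with `(to_vector_literal(query_embedding), k)` as parameters. An IVFFlat index is usually built after the table already holds data, since its lists are derived from the rows present at build time.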
Requires
  • pgvector
  • paraphrase-multilingual-mpnet-base-v2
  • PyMuPDF
  • pdfplumber
Preconditions
  • Postgres with the pgvector extension installed
  • Embedding model loaded
  • 3-stage PDF fallback available
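
The 3-stage PDF fallback could be sketched roughly as follows. The stage order (PyMuPDF, then pdfplumber, then OCR), the `min_chars` threshold, and all function names are assumptions; the card lists the three tools but not how they are chained, and it does not name an OCR engine, so the OCR stage is left as a caller-supplied function.

```python
from typing import Callable, List, Sequence, Tuple

Extractor = Callable[[str], str]


def extract_with_pymupdf(path: str) -> str:
    import fitz  # PyMuPDF; imported lazily so the module loads without it

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_with_pdfplumber(path: str) -> str:
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() may yield nothing for image-only pages
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_text(
    path: str,
    stages: Sequence[Tuple[str, Extractor]],
    min_chars: int = 50,  # hypothetical cut-off for "no usable text"
) -> Tuple[str, str]:
    """Try each stage in order; return (stage_name, text) from the first
    stage that yields at least `min_chars` non-whitespace-trimmed characters."""
    errors: List[str] = []
    for name, extractor in stages:
        try:
            text = extractor(path)
        except Exception as exc:  # any failure moves on to the next stage
            errors.append(f"{name}: {exc}")
            continue
        if len(text.strip()) >= min_chars:
            return name, text
        errors.append(f"{name}: only {len(text.strip())} characters extracted")
    raise RuntimeError(f"all extraction stages failed for {path}: " + "; ".join(errors))
```

A call might look like `extract_text("doc.pdf", [("pymupdf", extract_with_pymupdf), ("pdfplumber", extract_with_pdfplumber), ("ocr", my_ocr)])`, where `my_ocr` is whatever OCR wrapper the project uses. Returning the stage name makes it easy to log which documents needed OCR.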
Failure modes
  • OCR fails on scanned pages
  • IVFFlat index becomes corrupted
  • Chunk size set incorrectly
Trust signals
  • 3-stage PDF fallback explicit
  • OCR + pdfplumber + PyMuPDF
  • IVFFlat index preserved