Optimize database schema and queries
dba subagent setup L1 ★0
foutoucour/guitar-exercises
What it does
Ingest multi-format documents into pgvector
Best for
Building semantic search over regulatory documents (Verordnungen, i.e. German-language ordinances) where chunks of about 700 words are required.
Inputs
- PDF/XLSX/DOCX/HTML
- chunking config
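The chunking config is not spelled out in this card, so the following is only a minimal sketch of what a 700-word chunker could look like. The `ChunkConfig` class, its field names, and the optional overlap are assumptions made for illustration and are not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkConfig:
    # Hypothetical config object; field names are illustrative only.
    max_words: int = 700     # chunk size named in the card
    overlap_words: int = 0   # assumed knob, not mentioned in the card


def chunk_words(text: str, config: ChunkConfig = ChunkConfig()) -> list[str]:
    """Split text into chunks of at most max_words whitespace-separated words."""
    if config.max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= config.overlap_words < config.max_words:
        raise ValueError("overlap_words must be in [0, max_words)")

    words = text.split()
    step = config.max_words - config.overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + config.max_words]))
        # Stop once this chunk has reached the end of the text.
        if start + config.max_words >= len(words):
            break
    return chunks
```

Splitting on whitespace keeps the sketch independent of any tokenizer; a real pipeline might count model tokens instead, which would change where chunk boundaries fall.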
Outputs
- pgvector embeddings
- chunk store
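As a rough illustration of the two outputs, the sketch below holds a possible chunk-store schema and IVFFlat index as SQL strings, plus a helper that formats an embedding in pgvector's text input form. The table name `chunks`, its columns, the cosine operator class, and `lists = 100` are all assumptions; only pgvector, IVFFlat, and the embedding model come from the card. The dimension 768 is the output size of paraphrase-multilingual-mpnet-base-v2.

```python
import math

EMBEDDING_DIM = 768  # output size of paraphrase-multilingual-mpnet-base-v2

# Hypothetical chunk store; table and column names are illustrative.
SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    source    text NOT NULL,
    content   text NOT NULL,
    embedding vector({EMBEDDING_DIM}) NOT NULL
);
"""

# Assumed index settings: cosine distance and 100 lists.
INDEX_SQL = """
CREATE INDEX IF NOT EXISTS chunks_embedding_ivfflat
    ON chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
"""


def to_vector_literal(values) -> str:
    """Format a sequence of floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    values = [float(v) for v in values]
    if len(values) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinity")
    return "[" + ",".join(repr(v) for v in values) + "]"
```

The literal can be passed as a query parameter and cast with `::vector` on the SQL side. IVFFlat indexes are usually built after some rows are loaded, since the list centroids are computed from existing data.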
Requires
- pgvector
- paraphrase-multilingual-mpnet-base-v2
- PyMuPDF
- pdfplumber
Preconditions
- Postgres with the pgvector extension
- Embedding model loaded
- 3-stage PDF fallback
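The card names a 3-stage PDF fallback but does not give the order of the stages or the rule for moving on, so the following is a sketch under assumptions: extractors are tried in the order supplied, and a stage counts as failed if it raises or returns fewer than `min_chars` characters. The function names and the `min_chars` threshold are hypothetical. The OCR stage is left to the caller because the card does not name an OCR library.

```python
from typing import Callable, Iterable, Optional

Extractor = Callable[[str], str]


def extract_with_pymupdf(path: str) -> str:
    import fitz  # PyMuPDF, imported lazily so this module loads without it

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_with_pdfplumber(path: str) -> str:
    import pdfplumber  # imported lazily for the same reason

    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_fallback(
    path: str,
    extractors: Iterable[Extractor],
    min_chars: int = 20,  # assumed threshold for "usable" text
) -> Optional[str]:
    """Return the first usable extraction result, or None if every stage fails."""
    for extract in extractors:
        try:
            text = extract(path)
        except Exception:
            continue  # this stage failed; try the next one
        if text and len(text.strip()) >= min_chars:
            return text
    return None
```

A caller might pass `[extract_with_pymupdf, extract_with_pdfplumber, my_ocr_extractor]`, where `my_ocr_extractor` is a placeholder for whatever OCR step the pipeline uses. A `None` result marks the document as unreadable instead of storing an empty chunk.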
Failure modes
- OCR fails on scanned pages
- IVFFlat index becomes corrupted
- Wrong chunk size
Trust signals
- 3-stage PDF fallback explicit
- OCR + pdfplumber + PyMuPDF
- IVFFlat index preserved