Dynamic RAG Data Loader: An efficient data pipeline and document loader for RAG applications
Architecture
Key features
- Convert any local directory into a RAG-friendly vector DB on demand
- Multi-format document ingestion (PPT, HTML, Word, TXT, Excel, PDF)
- Data extraction with document parsers, Pytesseract, OCR, GOTC OCR 2.0, and multimodal LLMs
- Incremental DB updates when source files change
- Cost estimation and tracking
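As an illustration of the incremental-update feature listed above, here is a minimal sketch of one common way to detect which source files need re-ingestion: compare content hashes between two scans of the directory. This is not code from the project. The function names (`file_digest`, `scan`, `diff_manifests`) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large documents are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def scan(root: Path) -> dict[str, str]:
    """Map each file under root (by relative POSIX path) to its content digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old: dict[str, str], new: dict[str, str]):
    """Compare two manifests and return (added, changed, removed) path lists."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in new.keys() & old.keys() if new[k] != old[k])
    return added, changed, removed
```

With a manifest saved from the previous run, only the `added` and `changed` files would need to be parsed and embedded again, and entries for `removed` files could be deleted from the vector DB. How the project itself tracks changes (hashes, modification times, or something else) is not stated here.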
A few code snippets from the project