Full retrieval-augmented generation in one file. Ingest documents, search them, get answers with chunk-level citations. When it doesn't know, it says so.
Standard RAG pipelines pass whatever they retrieve to the LLM, even when it's irrelevant. The model generates a plausible-sounding answer with no connection to your actual documents. Nobody flags it because the response reads fine.
This pipeline checks retrieval confidence before generating. If the best-matching chunk scores below the threshold, it returns "I don't have enough information" instead of guessing. Every answer that does come back includes the specific chunk IDs it drew from.
TF-IDF embedding and cosine similarity search run entirely on your machine. No embedding API calls for the retrieval step. Fast, private, free.
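The scoring is simple enough to sketch. Here is a minimal version of cosine similarity over sparse TF-IDF vectors, assuming vectors are stored as term-to-weight maps (the shipped data layout may differ):

```ts
// Sparse TF-IDF vector: term -> weight.
type Vector = Map<string, number>;

// Cosine similarity between two sparse vectors.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  for (const [term, wa] of a) {
    const wb = b.get(term);
    if (wb !== undefined) dot += wa * wb;
  }
  const norm = (v: Vector) =>
    Math.sqrt([...v.values()].reduce((sum, w) => sum + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}
```

Because the vectors are sparse, each query only touches terms the query and chunk actually share, which is why search stays fast without any external service.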
Every generated answer references the specific chunks it drew from. Trace any claim back to its source document and paragraph.
When the top retrieval score falls below 0.05, the pipeline refuses to answer instead of hallucinating. Configurable threshold.
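In code terms, the gate is a single comparison before any LLM call. A sketch, with search and generateWithContext as hypothetical stand-ins for the pipeline's internals and MIN_SCORE as the configurable threshold:

```ts
type Hit = { chunkId: string; score: number; text: string };

// Hypothetical helpers standing in for the pipeline's internals.
declare function search(question: string): Hit[];
declare function generateWithContext(question: string, hits: Hit[]): string;

const MIN_SCORE = 0.05; // default refusal threshold, configurable

function answerOrRefuse(question: string): string {
  const hits = search(question); // ranked best-first by cosine score
  if (hits.length === 0 || hits[0].score < MIN_SCORE) {
    return "I don't have enough information to answer that.";
  }
  return generateWithContext(question, hits); // answer cites hit chunk IDs
}
```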
LLM-as-judge step that scores whether the generated answer actually sticks to the provided context. Catch drift before your users do.
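The judge step amounts to a second, narrow LLM call. A rough sketch, with callLLM standing in for the Claude client and the prompt wording purely illustrative:

```ts
declare function callLLM(prompt: string): Promise<string>;

// Ask a second model pass whether the answer stays inside the context.
async function judgeFaithfulness(
  context: string,
  answer: string
): Promise<number> {
  const prompt =
    `Context:\n${context}\n\nAnswer:\n${answer}\n\n` +
    `On a scale of 0 to 1, how fully is the answer supported by the ` +
    `context alone? Reply with a single number.`;
  const raw = await callLLM(prompt);
  const score = parseFloat(raw.trim());
  return Number.isNaN(score) ? 0 : score; // unparseable replies count as failing
}
```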
Default: 200 words per chunk, 20-word overlap. Adjust both parameters to match your document structure and query patterns.
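Word-based chunking with overlap is a short loop. A sketch under those defaults (the shipped chunker may also respect sentence boundaries):

```ts
function chunkWords(text: string, size = 200, overlap = 20): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  // Each chunk repeats the last `overlap` words of the previous one.
  const step = Math.max(1, size - overlap);
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // final chunk reached
  }
  return chunks;
}
```

Larger chunks give the LLM more context per hit; more overlap reduces the chance a relevant passage is split across a chunk boundary.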
Call rag.addDocument(text, id) with raw text extracted from PDFs, reports, knowledge base articles, support docs, whatever you have. Each document gets a UUID and metadata tracking.
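A hypothetical usage example; the class name, the optional id argument, and the return types are assumptions for illustration, not the shipped API:

```ts
import * as fs from "node:fs";

// Assumed shape of the shipped API, for illustration only.
declare class RAG {
  addDocument(text: string, id?: string): string; // returns the document's id
  buildIndex(): void;
  query(question: string): Promise<{ answer: string; sources: string[] }>;
}

const rag = new RAG();
// Explicit id for documents you'll want to reference by name...
rag.addDocument(fs.readFileSync("handbook.txt", "utf8"), "handbook");
// ...or omit it and let the pipeline assign a UUID.
rag.addDocument(fs.readFileSync("support-faq.txt", "utf8"));
```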
rag.buildIndex() chunks your documents, computes TF-IDF vectors, and builds the search index. Runs locally. Takes seconds for typical document sets.
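For a sense of what that step entails, here is one way the TF-IDF computation could look internally, using scikit-learn-style smoothed IDF; treat it as a sketch, not the shipped implementation:

```ts
// Build a TF-IDF vector per chunk (sketch of buildIndex internals).
function tfidfVectors(chunks: string[]): Map<string, number>[] {
  const docs = chunks.map((c) => c.toLowerCase().split(/\W+/).filter(Boolean));

  // Document frequency: how many chunks contain each term.
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) ?? 0) + 1);
  }

  return docs.map((doc) => {
    const tf = new Map<string, number>();
    for (const term of doc) tf.set(term, (tf.get(term) ?? 0) + 1);
    const vec = new Map<string, number>();
    for (const [term, count] of tf) {
      // Smoothed IDF keeps weights positive even for ubiquitous terms.
      const idf = Math.log((1 + docs.length) / (1 + (df.get(term) ?? 0))) + 1;
      vec.set(term, (count / doc.length) * idf);
    }
    return vec;
  });
}
```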
rag.query("your question") retrieves relevant chunks, generates an answer through Claude, and returns both the answer and the source references. Low-confidence queries get refused, not faked.
One-time purchase. Full source code.
Tell us about your document set and query patterns. We'll scope the build and get back to you within 24 hours.
Get in Touch