
RAG That Cites Its Sources

Full retrieval-augmented generation in one file. Ingest documents, search them, get answers with chunk-level citations. When it doesn't know, it says so.

The Problem
Most RAG systems hallucinate when retrieval fails.
You ask a question, the retrieval step finds nothing relevant, and the LLM makes up an answer anyway. Your users don't know the difference.

Confident wrong answers

Standard RAG pipelines pass whatever they retrieve to the LLM, even when it's irrelevant. The model generates a plausible-sounding answer with no connection to your actual documents. Nobody flags it because the response reads fine.

Refusal over fabrication

This pipeline checks retrieval confidence before generating. If the best-matching chunk scores below the threshold, it returns "I don't have enough information" instead of guessing. Every answer that does come back includes the specific chunk IDs it drew from.

Architecture
Six stages, one file.
  • Ingest: raw text in
  • Chunk: overlapping windows
  • Embed: TF-IDF vectors
  • Retrieve: cosine similarity
  • Generate: Claude + citations
  • Evaluate: faithfulness score
What You Get
Built for accuracy, not demos.

Local Retrieval

TF-IDF embedding and cosine similarity search run entirely on your machine. No embedding API calls for the retrieval step. Fast, private, free.
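A minimal sketch of the local retrieval math described above: TF-IDF weighting over tokenized documents, then cosine similarity between sparse vectors. The function names and `Map`-based vector shape are illustrative, not the shipped code.

```typescript
type Vector = Map<string, number>;

// Build one TF-IDF vector per tokenized document.
// Terms that appear in every document get idf = 0, so they stop
// contributing to similarity scores.
function tfidfVectors(docs: string[][]): Vector[] {
  const df = new Map<string, number>(); // document frequency per term
  for (const doc of docs) {
    for (const term of new Set(doc)) {
      df.set(term, (df.get(term) ?? 0) + 1);
    }
  }
  const n = docs.length;
  return docs.map((doc) => {
    const tf = new Map<string, number>();
    for (const term of doc) tf.set(term, (tf.get(term) ?? 0) + 1);
    const vec: Vector = new Map();
    for (const [term, count] of tf) {
      const idf = Math.log(n / (df.get(term) ?? 1));
      vec.set(term, (count / doc.length) * idf);
    }
    return vec;
  });
}

// Cosine similarity over sparse vectors; 0 for zero-norm inputs.
function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0, na = 0, nb = 0;
  for (const [term, w] of a) {
    dot += w * (b.get(term) ?? 0);
    na += w * w;
  }
  for (const w of b.values()) nb += w * w;
  return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}
```

Everything here runs in-process on plain data structures, which is what keeps the retrieval step fast, private, and free of API calls.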

Chunk-Level Citations

Every generated answer references the specific chunks it drew from. Trace any claim back to its source document and paragraph.

Low-Confidence Refusal

When the top retrieval score falls below 0.05, the pipeline refuses to answer instead of hallucinating. Configurable threshold.
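The gate itself is a few lines. This sketch uses the 0.05 default from the text; the type names and refusal wording are assumptions.

```typescript
interface Retrieved {
  chunkId: string;
  score: number; // cosine similarity of this chunk vs. the query
}

// Illustrative refusal message, not the shipped string.
const REFUSAL = "I don't have enough information to answer that.";

// Return the retrieved chunks if the best score clears the threshold,
// otherwise the refusal message. `hits` is assumed sorted by score
// descending; the threshold is configurable per the text.
function answerOrRefuse(
  hits: Retrieved[],
  threshold = 0.05,
): Retrieved[] | string {
  const top = hits[0]?.score ?? 0;
  return top < threshold ? REFUSAL : hits;
}
```

The check runs before any generation call, so a failed retrieval never reaches the LLM in the first place.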

Faithfulness Evaluation

LLM-as-judge step that scores whether the generated answer actually sticks to the provided context. Catch drift before your users do.
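The judge step boils down to a second model call with a grading prompt. The prompt wording and 0-to-1 scale below are assumptions sketching the shape of the technique, not the shipped prompt.

```typescript
// Build an LLM-as-judge prompt that asks the model to score how
// faithfully an answer sticks to the retrieved context.
function faithfulnessPrompt(context: string, answer: string): string {
  return [
    "You are grading whether an answer is supported by the context below.",
    "Respond with a single number from 0 (fabricated) to 1 (fully supported).",
    "",
    `Context:\n${context}`,
    "",
    `Answer:\n${answer}`,
  ].join("\n");
}
```

Sending this prompt to the judge model and parsing the numeric reply gives a per-answer faithfulness score you can log and alert on.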

Configurable Chunking

Default: 200 words per chunk, 20-word overlap. Adjust both parameters to match your document structure and query patterns.
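A sketch of overlapping word-window chunking with those defaults. The function name is illustrative; the split on whitespace is a simplification.

```typescript
// Split text into word windows of `size` words, each window starting
// `size - overlap` words after the previous one, so consecutive chunks
// share `overlap` words. Defaults mirror the text: 200 words, 20 overlap.
function chunkWords(text: string, size = 200, overlap = 20): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // final window reached
  }
  return chunks;
}
```

Larger chunks give the LLM more context per citation; more overlap reduces the chance a relevant sentence is split across a chunk boundary.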

How It Works
Add documents, build once, query forever.
1. Feed in your documents

Call rag.addDocument(text, id) with raw text. PDFs, reports, knowledge base articles, support docs, whatever you have. Each gets a UUID and metadata tracking.

2. Build the index

rag.buildIndex() chunks your documents, computes TF-IDF vectors, and builds the search index. Runs locally. Takes seconds for typical document sets.

3. Query with citations

rag.query("your question") retrieves relevant chunks, generates an answer through Claude, and returns both the answer and the source references. Low-confidence queries get refused, not faked.
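The three steps above fit in a few lines. The `RagPipeline` interface and query-result shape here are assumptions inferred from the calls shown in the text, not the shipped types.

```typescript
// Assumed result shape: answer text, the chunk IDs it cites, and a
// refusal flag for low-confidence retrievals.
interface QueryResult {
  answer: string;
  citations: string[];
  refused: boolean;
}

// Assumed pipeline surface matching the calls described above.
interface RagPipeline {
  addDocument(text: string, id?: string): string; // returns document ID
  buildIndex(): void;
  query(question: string): Promise<QueryResult>;
}

// Ingest, index, query: the whole loop.
async function demo(rag: RagPipeline): Promise<QueryResult> {
  rag.addDocument("Refund requests are honored within 30 days.", "policy");
  rag.buildIndex();
  return rag.query("What is the refund window?");
}
```

Rebuild the index after adding documents; queries against a stale index won't see the new chunks.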

Pricing
Knowledge base Q&A that doesn't make things up.

One-time purchase. Full source code.

Solo: $4,000 one-time
  • Full pipeline source code
  • Ingest, chunk, embed, retrieve, generate
  • Citation system
  • Faithfulness evaluator
  • Commercial license (single user)

Stop shipping hallucinated answers.

Tell us about your document set and query patterns. We'll scope the build and get back to you within 24 hours.

Get in Touch