Full retrieval-augmented generation in one file. Ingest documents, search them, get answers with chunk-level citations. When it doesn't know, it says so.
Standard RAG pipelines pass whatever they retrieve to the LLM, even when it's irrelevant. The model generates a plausible-sounding answer with no connection to your actual documents. Nobody flags it because the response reads fine.
This pipeline checks retrieval confidence before generating. If the best-matching chunk scores below the threshold, it returns "I don't have enough information" instead of guessing. Every answer that does come back includes the specific chunk IDs it drew from.
TF-IDF embedding and cosine similarity search run entirely on your machine. No embedding API calls for the retrieval step. Fast, private, free.
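The scoring is simple enough to sketch. Here is a minimal version of cosine similarity over sparse TF-IDF vectors, assuming vectors are stored as term-to-weight maps (the shipped data layout may differ):

```ts
// Sparse TF-IDF vector: term -> weight.
type Vector = Map<string, number>;

// Cosine similarity between two sparse vectors.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  for (const [term, wa] of a) {
    const wb = b.get(term);
    if (wb !== undefined) dot += wa * wb;
  }
  const norm = (v: Vector) =>
    Math.sqrt([...v.values()].reduce((sum, w) => sum + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}
```

Because the vectors are sparse, each query only touches terms the query and chunk actually share, which is why search stays fast without any external service.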
Every generated answer references the specific chunks it drew from. Trace any claim back to its source document and paragraph.
When the top retrieval score falls below 0.05, the pipeline refuses to answer instead of hallucinating. Configurable threshold.
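In code terms, the gate is a single comparison before any LLM call. A sketch, with search and generateWithContext as hypothetical stand-ins for the pipeline's internals and MIN_SCORE as the configurable threshold:

```ts
type Hit = { chunkId: string; score: number; text: string };

// Hypothetical helpers standing in for the pipeline's internals.
declare function search(question: string): Hit[];
declare function generateWithContext(question: string, hits: Hit[]): string;

const MIN_SCORE = 0.05; // default refusal threshold, configurable

function answerOrRefuse(question: string): string {
  const hits = search(question); // ranked best-first by cosine score
  if (hits.length === 0 || hits[0].score < MIN_SCORE) {
    return "I don't have enough information to answer that.";
  }
  return generateWithContext(question, hits); // answer cites hit chunk IDs
}
```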
LLM-as-judge step that scores whether the generated answer actually sticks to the provided context. Catch drift before your users do.
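The judge step amounts to a second, narrow LLM call. A rough sketch, with callLLM standing in for the Claude client and the prompt wording purely illustrative:

```ts
declare function callLLM(prompt: string): Promise<string>;

// Ask a second model pass whether the answer stays inside the context.
async function judgeFaithfulness(
  context: string,
  answer: string
): Promise<number> {
  const prompt =
    `Context:\n${context}\n\nAnswer:\n${answer}\n\n` +
    `On a scale of 0 to 1, how fully is the answer supported by the ` +
    `context alone? Reply with a single number.`;
  const raw = await callLLM(prompt);
  const score = parseFloat(raw.trim());
  return Number.isNaN(score) ? 0 : score; // unparseable replies count as failing
}
```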
Default: 200 words per chunk, 20-word overlap. Adjust both parameters to match your document structure and query patterns.
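Word-based chunking with overlap is a short loop. A sketch under those defaults (the shipped chunker may also respect sentence boundaries):

```ts
function chunkWords(text: string, size = 200, overlap = 20): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  // Each chunk repeats the last `overlap` words of the previous one.
  const step = Math.max(1, size - overlap);
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // final chunk reached
  }
  return chunks;
}
```

Larger chunks give the LLM more context per hit; more overlap reduces the chance a relevant passage is split across a chunk boundary.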
Call rag.addDocument(text, id) with raw text extracted from PDFs, reports, knowledge base articles, support docs, whatever you have. Each document gets a UUID and metadata tracking.
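A hypothetical usage example; the class name, the optional id argument, and the return types are assumptions for illustration, not the shipped API:

```ts
import * as fs from "node:fs";

// Assumed shape of the shipped API, for illustration only.
declare class RAG {
  addDocument(text: string, id?: string): string; // returns the document's id
  buildIndex(): void;
  query(question: string): Promise<{ answer: string; sources: string[] }>;
}

const rag = new RAG();
// Explicit id for documents you'll want to reference by name...
rag.addDocument(fs.readFileSync("handbook.txt", "utf8"), "handbook");
// ...or omit it and let the pipeline assign a UUID.
rag.addDocument(fs.readFileSync("support-faq.txt", "utf8"));
```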
rag.buildIndex() chunks your documents, computes TF-IDF vectors, and builds the search index. Runs locally. Takes seconds for typical document sets.
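For a sense of what that step entails, here is one way the TF-IDF computation could look internally, using scikit-learn-style smoothed IDF; treat it as a sketch, not the shipped implementation:

```ts
// Build a TF-IDF vector per chunk (sketch of buildIndex internals).
function tfidfVectors(chunks: string[]): Map<string, number>[] {
  const docs = chunks.map((c) => c.toLowerCase().split(/\W+/).filter(Boolean));

  // Document frequency: how many chunks contain each term.
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) ?? 0) + 1);
  }

  return docs.map((doc) => {
    const tf = new Map<string, number>();
    for (const term of doc) tf.set(term, (tf.get(term) ?? 0) + 1);
    const vec = new Map<string, number>();
    for (const [term, count] of tf) {
      // Smoothed IDF keeps weights positive even for ubiquitous terms.
      const idf = Math.log((1 + docs.length) / (1 + (df.get(term) ?? 0))) + 1;
      vec.set(term, (count / doc.length) * idf);
    }
    return vec;
  });
}
```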
rag.query("your question") retrieves relevant chunks, generates an answer through Claude, and returns both the answer and the source references. Low-confidence queries get refused, not faked.
One-time purchase. Full source code.
Tell us about your document set and query patterns. We'll scope the build and get back to you within 24 hours.
Get in Touch