v0.2.1 · Now on PyPI

RagBucket

Portable executable .rag artifacts for Python. Package your entire RAG pipeline into a single file. Build once. Load anywhere.

$ uv add ragbucket
The Problem

RAG systems are still fragmented.

ML models travel as single files — .pt .onnx .gguf .h5 — saved, shared, deployed anywhere. RAG pipelines don't. They're a web of infrastructure, scattered config, and provider lock-in.

Before RagBucket: Fragmented Pipeline
Vector DB tied to your infrastructure
Embedding pipeline re-runs every deploy
Chunking config scattered across repos
Provider lock-in at every layer
Non-reproducible retrieval results
Impossible to share or distribute
With RagBucket: Portable .rag Artifact
Everything in one self-contained file
Build once, deploy anywhere
Config baked into the artifact
Swap LLM providers at runtime
Reproducible retrieval from the same artifact
Share via pip, S3, or git
📦 demo.rag
🗄️ vectors.faiss FAISS index
📄 chunks.json document memory
⚙️ manifest.json config + metadata
The Format

Introducing .rag

A compressed, self-contained archive that packages your entire retrieval system. Vectors, chunks, and config all live in one file. At inference time, the only external service you need is an LLM provider, reached with an API key.

3
files inside
4
LLM providers
embedding models
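To make the three-file layout concrete, here is a minimal sketch that assumes a .rag file is an ordinary zip archive readable with Python's standard zipfile module. It builds a stand-in archive in memory rather than reading a real artifact. The manifest keys and the helper name describe_artifact are hypothetical, and the placeholder bytes are not a real FAISS index.

```python
import io
import json
import zipfile

# Build a stand-in archive in memory; a real .rag would come from RagBuilder.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("vectors.faiss", b"\x00" * 16)  # placeholder bytes, not a real index
    archive.writestr("chunks.json", json.dumps(["first chunk", "second chunk"]))
    # Manifest keys are assumptions chosen for illustration.
    archive.writestr("manifest.json", json.dumps({"chunk_size": 512, "top_k": 3}))


def describe_artifact(source):
    """Return the sorted member names and the parsed manifest of a zip-based artifact."""
    with zipfile.ZipFile(source) as archive:
        names = sorted(archive.namelist())
        manifest = json.loads(archive.read("manifest.json"))
    return names, manifest


buffer.seek(0)
names, manifest = describe_artifact(buffer)
print(names)
print(manifest)
```

If the assumption holds, the same helper could be pointed at a path such as demo.rag instead of the in-memory buffer.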
Architecture

How it works

Two completely decoupled phases: build-time artifact generation and runtime retrieval + generation. The artifact is the bridge.

[Figure: RagBucket architecture diagram showing the build pipeline and runtime flow]
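The split between the two phases can be illustrated with a toy sketch. This is not RagBucket's implementation. It swaps the real embedding model for word counts over a fixed vocabulary, stores vectors as JSON instead of a FAISS index, and stops before the LLM call. The function names build_artifact and retrieve are hypothetical. It shows only that the runtime phase needs nothing but the artifact bytes.

```python
import io
import json
import math
import zipfile
from collections import Counter


def embed(text, vocabulary):
    """Toy embedding: word counts over a fixed vocabulary (stand-in for a real model)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def build_artifact(chunks):
    """Build phase: embed every chunk once and pack everything into one archive."""
    vocabulary = sorted({word for chunk in chunks for word in chunk.lower().split()})
    vectors = [embed(chunk, vocabulary) for chunk in chunks]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("vectors.json", json.dumps(vectors))
        archive.writestr("chunks.json", json.dumps(chunks))
        archive.writestr("manifest.json", json.dumps({"vocabulary": vocabulary}))
    return buffer.getvalue()


def retrieve(artifact_bytes, question, top_k=1):
    """Runtime phase: reads only the artifact bytes, never the source documents."""
    with zipfile.ZipFile(io.BytesIO(artifact_bytes)) as archive:
        vectors = json.loads(archive.read("vectors.json"))
        chunks = json.loads(archive.read("chunks.json"))
        vocabulary = json.loads(archive.read("manifest.json"))["vocabulary"]
    query = embed(question, vocabulary)
    # Rank chunks by Euclidean distance between the query and each stored vector.
    ranked = sorted(range(len(chunks)), key=lambda i: math.dist(query, vectors[i]))
    return [chunks[i] for i in ranked[:top_k]]


artifact = build_artifact(["faiss stores the vectors", "pydantic validates the config"])
context = retrieve(artifact, "what validates the config")
print(context)
```

In a full pipeline the retrieved chunks would then be placed into the prompt sent to the chosen LLM provider.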
Quickstart

Two steps.
That's the whole API.

Build your artifact from documents. Load and query it anywhere — with any LLM provider.

build_example.py
from ragbucket import RagBuilder, RagConfig

# Configure the pipeline
config = RagConfig(
    embedding_model="BAAI/bge-small-en-v1.5",
    chunk_size=512,
    chunk_overlap=50,
    top_k=3,
)

# Build the .rag artifact
builder = RagBuilder(config=config)

builder.build(
    doc_path="docs/",       # folder of .txt files
    op_path="demo.rag",    # output artifact
)

# ✓ demo.rag created
# Contains: vectors.faiss + chunks.json + manifest.json
runtime_example.py
from ragbucket import RagRuntime
import os
from dotenv import load_dotenv

load_dotenv()

# Load the artifact — works from any environment
rag = RagRuntime(
    rag_path="demo.rag",
    provider="groq",              # groq | openai | gemini | anthropic
    api_key=os.getenv("GROQ_API_KEY"),
    model="llama-3.1-8b-instant",
    system_prompt="You are a helpful assistant.",
)

# Ask anything
response = rag.ask("What are Anik's AI/ML skills?")
print(response)
config.py
from ragbucket import RagConfig

# All fields are optional — defaults are sensible
config = RagConfig(

    # Any Sentence Transformers compatible model
    embedding_model="BAAI/bge-small-en-v1.5",

    # Chunking
    chunk_size=512,
    chunk_overlap=50,

    # How many chunks to retrieve per query
    top_k=3,
)

# Supported embedding models:
# "BAAI/bge-small-en-v1.5"          fast, great for English
# "BAAI/bge-base-en-v1.5"           balanced quality/speed
# "sentence-transformers/all-MiniLM-L6-v2"
# "sentence-transformers/all-mpnet-base-v2"
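To show how chunk_size and chunk_overlap interact, here is a simplified sliding-window sketch. It assumes both values are measured in characters and ignores the separator-aware logic of a recursive splitter, so it is a plain fixed-window stand-in and not the splitter RagBucket uses. The function name chunk_text is hypothetical.

```python
def chunk_text(text, chunk_size=512, chunk_overlap=50):
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghij" * 100  # 1000 characters
chunks = chunk_text(sample)
print([len(chunk) for chunk in chunks])
```

With the defaults shown in the config, each window starts 462 characters after the previous one, so the last 50 characters of one chunk are repeated at the start of the next.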
Compatibility

Works with the tools
you already use.

Plug in any of the four supported LLM providers and the embedding model of your choice. FAISS is the built-in vector store. RagBucket handles the wiring.

LLM Providers (response generation)
Groq: llama-3.1-8b-instant
OpenAI: gpt-4o-mini
Gemini: gemini-1.5-flash
Anthropic: claude-3-haiku
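One way to keep provider switching to a single argument is a small lookup table, sketched below. The model names are copied from the provider list above and may need to be replaced with each provider's full model identifier. The environment variable names other than GROQ_API_KEY, and the helper runtime_kwargs itself, are assumptions for illustration and not part of RagBucket.

```python
import os

# Model names come from the provider list above; environment variable names
# other than GROQ_API_KEY are assumptions for illustration.
PROVIDER_DEFAULTS = {
    "groq": ("GROQ_API_KEY", "llama-3.1-8b-instant"),
    "openai": ("OPENAI_API_KEY", "gpt-4o-mini"),
    "gemini": ("GEMINI_API_KEY", "gemini-1.5-flash"),
    "anthropic": ("ANTHROPIC_API_KEY", "claude-3-haiku"),
}


def runtime_kwargs(provider, rag_path="demo.rag"):
    """Assemble keyword arguments in the shape the quickstart passes to RagRuntime."""
    if provider not in PROVIDER_DEFAULTS:
        raise ValueError(f"unsupported provider: {provider}")
    env_var, model = PROVIDER_DEFAULTS[provider]
    return {
        "rag_path": rag_path,
        "provider": provider,
        "api_key": os.getenv(env_var),  # None if the variable is not set
        "model": model,
    }


kwargs = runtime_kwargs("openai")
print(kwargs["provider"], kwargs["model"])
```

Assuming the constructor accepts the same keywords as in runtime_example.py, the result could be passed as RagRuntime(**runtime_kwargs("openai")), optionally with a system_prompt.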
Embedding Models (vector generation)
OpenAI: text-embedding-3-small
Cohere: embed-english-v3.0
Gemini: text-embedding-004
Voyage AI: voyage-3
Local: sentence-transformers
Vector Store (semantic indexing)
FAISS (default): Facebook AI Similarity Search · built-in, zero config
Features

Everything you need.
Nothing you don't.

📦
Portable Artifacts
Serialize your entire RAG pipeline into a single reusable .rag file. Build once, run everywhere.
🔍
Built-in Semantic Search
FAISS-powered vector similarity search. No external vector database required.
🔌
Multi-Provider Runtime
Groq, OpenAI, Gemini, Anthropic — all behind one unified interface. Swap with one line.
⚙️
Configurable Pipeline
Customize chunk size, overlap, embedding model, and top-k via RagConfig. Defaults just work.
🏃
Lightweight Runtime
Load and execute .rag artifacts anywhere Python runs. No server, no infrastructure.
🧠
Self-Contained Memory
The artifact IS the retrieval system. Semantic memory travels with your code, not your infra.
Tech Stack

Built on proven tools.

| Component | Technology | Role |
| --- | --- | --- |
| Embeddings | Sentence Transformers | Converts text chunks to semantic vectors |
| Vector Search | FAISS | Nearest-neighbor search by L2 (Euclidean) distance at inference time |
| Chunking | LangChain Splitters | Recursive character-aware text splitting |
| Packaging | zipfile | Compresses the artifact into a portable archive |
| Config | Pydantic | Validated, typed configuration schema |
| Build System | Hatchling | PyPI packaging and distribution |
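The vector search row can be illustrated with a brute-force version of the same computation. This is a pure-Python sketch of exact L2 nearest-neighbor search, not FAISS itself, and the function name l2_top_k is hypothetical. A flat L2 index such as FAISS's IndexFlatL2 performs the same exhaustive comparison but reports squared distances, which produce the same ranking.

```python
import math


def l2_top_k(query, vectors, top_k=3):
    """Return (index, distance) pairs for the top_k vectors closest to query."""
    distances = [(index, math.dist(query, vector)) for index, vector in enumerate(vectors)]
    distances.sort(key=lambda pair: pair[1])  # smallest distance first
    return distances[:top_k]


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 4.0]]
results = l2_top_k([0.0, 0.0], vectors, top_k=3)
print(results)
```

The returned indices map back to entries in chunks.json, which is how the top_k setting determines how many chunks reach the LLM prompt.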