RagBucket

The Problem

RAG systems are still fragmented.

ML models travel as single files — .pt .onnx .gguf .h5 — saved, shared, deployed anywhere. RAG pipelines don't. They're a web of infrastructure, scattered config, and provider lock-in.

Before RagBucket Fragmented Pipeline

✕Vector DB tied to your infrastructure

✕Embedding pipeline re-runs every deploy

✕Chunking config scattered across repos

✕Provider lock-in at every layer

✕Non-reproducible retrieval results

✕Impossible to share or distribute

With RagBucket Portable .rag Artifact

✓Everything in one self-contained file

✓Build once, deploy anywhere

✓Config baked into the artifact

✓Swap LLM providers at runtime

✓100% reproducible retrieval

✓Share via pip, S3, or git

demo.rag

📦 demo.rag

🗄️ vectors.faiss FAISS index

📄 chunks.json document memory

⚙️ manifest.json config + metadata

The Format

Introducing .rag

A compressed, self-contained archive that packages your entire retrieval system. Vectors, chunks, and config — everything in one file. The only external dependency at inference time is an LLM API key.

files inside

LLM providers

∞

embedding models

Quickstart

Two steps.
That's the whole API.

Build your artifact from documents. Load and query it anywhere — with any LLM provider.

build_example.py

from ragbucket import RagBuilder, RagConfig

# Configure the pipeline
config = RagConfig(
    embedding_model="BAAI/bge-small-en-v1.5",
    chunk_size=512,
    chunk_overlap=50,
    top_k=3,
)

# Build the .rag artifact
builder = RagBuilder(config=config)

builder.build(
    doc_path="docs/",       # folder of .txt files
    op_path="demo.rag",    # output artifact
)

# ✓ demo.rag created
# Contains: vectors.faiss + chunks.json + manifest.json

runtime_example.py

from ragbucket import RagRuntime
import os
from dotenv import load_dotenv

load_dotenv()

# Load the artifact — works from any environment
rag = RagRuntime(
    rag_path="demo.rag",
    provider="groq",              # groq | openai | gemini | anthropic
    api_key=os.getenv("GROQ_API_KEY"),
    model="llama-3.1-8b-instant",
    system_prompt="You are a helpful assistant.",
)

# Ask anything
response = rag.ask("What are Anik's AI/ML skills?")
print(response)

config.py

from ragbucket import RagConfig

# All fields are optional — defaults are sensible
config = RagConfig(

    # Any Sentence Transformers compatible model
    embedding_model="BAAI/bge-small-en-v1.5",

    # Chunking
    chunk_size=512,
    chunk_overlap=50,

    # How many chunks to retrieve per query
    top_k=3,
)

# Supported embedding models:
# "BAAI/bge-small-en-v1.5"          fast, great for English
# "BAAI/bge-base-en-v1.5"           balanced quality/speed
# "sentence-transformers/all-MiniLM-L6-v2"
# "sentence-transformers/all-mpnet-base-v2"

Compatibility

Works with the tools
you already use.

Plug in any LLM provider, any embedding model, any vector store. RagBucket handles the wiring.

LLM Providers response generation

Groq

llama-3.1-8b-instant

OpenAI

gpt-4o-mini

Gemini

gemini-1.5-flash

Anthropic

claude-3-haiku

Embedding Models vector generation

OpenAI

text-embedding-3-small

Cohere

embed-english-v3.0

Gemini

text-embedding-004

Voyage AI

voyage-3

Local

sentence-transformers

Vector Store semantic indexing

FAISS

Facebook AI Similarity Search · built-in, zero config

Default

Features

Everything you need.
Nothing you don't.

📦

Portable Artifacts

Serialize your entire RAG pipeline into a single reusable .rag file. Build once, run everywhere.

🔍

Built-in Semantic Search

FAISS-powered vector similarity search. No external vector database required.

🔌

Multi-Provider Runtime

Groq, OpenAI, Gemini, Anthropic — all behind one unified interface. Swap with one line.

⚙️

Configurable Pipeline

Customize chunk size, overlap, embedding model, and top-k via RagConfig. Defaults just work.

🏃

Lightweight Runtime

Load and execute .rag artifacts anywhere Python runs. No server, no infrastructure.

🧠

Self-Contained Memory

The artifact IS the retrieval system. Semantic memory travels with your code, not your infra.

Tech Stack

Built on proven tools.

Component

Technology

Role

Embeddings

Sentence Transformers

Converts text chunks to semantic vectors

Vector Search

FAISS

L2 similarity search at inference time

Chunking

LangChain Splitters

Recursive character-aware text splitting

Packaging

zipfile

Compresses artifact into portable archive

Config

Pydantic

Validated, typed configuration schema

Build System

Hatchling

PyPI packaging and distribution

RAG systems are still fragmented.

Introducing .rag

How it works

Two steps.That's the whole API.

Works with the toolsyou already use.

Everything you need.Nothing you don't.

Built on proven tools.

Two steps.
That's the whole API.

Works with the tools
you already use.

Everything you need.
Nothing you don't.