MindSpring

GitHub: Stackbilt-dev/mindspring · MIT

MindSpring is part of the Stackbilt ecosystem: a Cloudflare-native source intelligence backend for Knowledge Notebooks. It integrates with AEGIS Core through NDJSON thread ingestion, and output from the AEGIS conversation-facts write pipeline can be uploaded through MindSpring’s simple upload path.

MindSpring serves two API generations side by side:

v1 (/api/*): production-ready conversation-archive ingestion, semantic search, and RAG chat over ChatGPT and Claude exports
v2 (/api/v2/workspaces/:workspaceId/notebooks/*): workspace-scoped Knowledge Notebooks with sources, ingestion jobs, scoped retrieval, chat, and persisted artifacts

Architecture

Browser (SPA) → Hono API (Cloudflare Worker)
                    ├── Vectorize (vector storage + semantic search)
                    ├── R2 (raw file storage)
                    ├── Workers AI (embeddings + DeepSeek R1 RAG)
                    ├── Queue (async ingestion pipeline)
                    └── KV (state, auth, conversation text, telemetry)

The API and the static frontend are deployed together as a single Worker. Storage and search are fully Cloudflare-native (Vectorize rather than Qdrant). A streaming JSON parser handles export files of 1GB and larger without buffering them in memory. Hono is the only external runtime dependency.

Quick Start

1. Clone and create resources

git clone https://github.com/Stackbilt-dev/mindspring.git
cd mindspring && npm install

# KV namespace
wrangler kv namespace create MINDSPRING_KV
wrangler kv namespace create MINDSPRING_KV --preview

# R2 bucket
wrangler r2 bucket create mindspring-uploads

# Vectorize index (1024-dim, cosine)
wrangler vectorize create mindspring-conversations --dimensions=1024 --metric=cosine

# Queue + DLQ
wrangler queues create mindspring-ingestion
wrangler queues create mindspring-ingestion-dlq

Paste the two KV namespace IDs (`id` and `preview_id`) into wrangler.toml, then deploy:

wrangler deploy

2. Bootstrap an API key

wrangler kv key put --binding KV "apikey:your-initial-admin-key" \
  '{"name":"bootstrap","scope":"admin","createdAt":"2025-01-01T00:00:00Z","lastUsedAt":null,"revoked":false}' \
  --preview false --remote

Use this admin key to create scoped keys via POST /api/auth/keys.
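A minimal TypeScript sketch of building that key-creation request follows. The JSON body fields `name` and `scope` are assumptions for illustration (check openapi.yaml for the real request schema), and the base URL is a placeholder.

```ts
// Hypothetical helper: builds the POST /api/auth/keys request that mints a scoped key.
// ASSUMPTION: the body fields `name` and `scope` are illustrative, not confirmed.
type Scope = "read" | "ingest" | "admin";

function buildCreateKeyRequest(
  baseUrl: string,
  adminKey: string,
  name: string,
  scope: Scope,
): Request {
  return new Request(new URL("/api/auth/keys", baseUrl), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${adminKey}`, // the bootstrap admin key
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ name, scope }),
  });
}

// Usage: pass the result to fetch() against a deployed Worker.
const req = buildCreateKeyRequest(
  "https://mindspring.example.workers.dev", // placeholder URL
  "your-initial-admin-key",
  "ci-uploader",
  "ingest",
);
```

Giving automation an `ingest` key rather than reusing the admin key keeps key management out of reach if that key leaks.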

Auth Model

API keys have hierarchical scopes; each scope includes all access granted by the one below it:

| Scope | Access |
|---|---|
| `read` | Search, browse, stats, health |
| `ingest` | `read` + upload and trigger ingestion |
| `admin` | `ingest` + key management and telemetry |

Send the key as either `Authorization: Bearer <key>` or `X-API-Key: <key>`.
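The scope hierarchy and the two accepted headers can be sketched in TypeScript as below. `hasScope` and `extractApiKey` are hypothetical names, not MindSpring internals, and checking `Authorization` before `X-API-Key` is a choice of this sketch.

```ts
type Scope = "read" | "ingest" | "admin";

// Higher rank includes every permission of the lower ranks.
const SCOPE_RANK: Record<Scope, number> = { read: 0, ingest: 1, admin: 2 };

// Hypothetical helper: does a key with `granted` scope satisfy `required`?
function hasScope(granted: Scope, required: Scope): boolean {
  return SCOPE_RANK[granted] >= SCOPE_RANK[required];
}

// Hypothetical helper: accept `Authorization: Bearer <key>` or `X-API-Key: <key>`.
// ASSUMPTION: Authorization is checked first when both headers are present.
function extractApiKey(headers: Headers): string | null {
  const auth = headers.get("Authorization");
  if (auth !== null) {
    const match = /^Bearer\s+(\S+)$/i.exec(auth.trim());
    if (match) return match[1];
  }
  const direct = headers.get("X-API-Key");
  return direct !== null && direct.trim() !== "" ? direct.trim() : null;
}
```

A numeric rank makes "includes" a single comparison, so adding a scope later only means inserting it into the rank table.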

API Overview

v2 Knowledge Notebooks

| Method | Path | Description |
|---|---|---|
| `POST` | `/api/v2/workspaces/:wId/notebooks` | Create notebook |
| `GET` | `/api/v2/workspaces/:wId/notebooks` | List notebooks |
| `PATCH` | `/api/v2/workspaces/:wId/notebooks/:nId` | Update metadata/instructions |
| `DELETE` | `/api/v2/workspaces/:wId/notebooks/:nId` | Soft-delete |
| `POST` | `/api/v2/workspaces/:wId/notebooks/:nId/sources` | Register source from uploaded file |
| `POST` | `/api/v2/workspaces/:wId/notebooks/:nId/search` | Notebook-scoped semantic search |
| `POST` | `/api/v2/workspaces/:wId/notebooks/:nId/chat` | Notebook-scoped chat with citations |
| `POST` | `/api/v2/workspaces/:wId/notebooks/:nId/artifacts` | Create artifact (`briefing_doc`, `faq_glossary`, `implementation_plan`, `world_bible`) |
| `GET` | `/api/v2/workspaces/:wId/notebooks/:nId/artifacts/:aId` | Get artifact detail (includes `stale` flag) |
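As a sketch, a client can validate the four artifact types listed above before calling the v2 artifacts route. The body field name `type`, the helper names, and the percent-encoding of IDs are assumptions of this example, not confirmed API details.

```ts
const ARTIFACT_TYPES = [
  "briefing_doc",
  "faq_glossary",
  "implementation_plan",
  "world_bible",
] as const;
type ArtifactType = (typeof ARTIFACT_TYPES)[number];

function isArtifactType(value: string): value is ArtifactType {
  return (ARTIFACT_TYPES as readonly string[]).includes(value);
}

// Hypothetical helper: percent-encode each segment so IDs cannot break the route.
function notebookUrl(baseUrl: string, wId: string, nId: string, ...rest: string[]): URL {
  const segments = ["api", "v2", "workspaces", wId, "notebooks", nId, ...rest];
  return new URL("/" + segments.map((s) => encodeURIComponent(s)).join("/"), baseUrl);
}

// Hypothetical helper. ASSUMPTION: the request body field is named `type`.
function buildArtifactRequest(
  baseUrl: string,
  apiKey: string,
  wId: string,
  nId: string,
  type: string,
): Request {
  if (!isArtifactType(type)) {
    throw new Error(`unknown artifact type: ${type}`);
  }
  return new Request(notebookUrl(baseUrl, wId, nId, "artifacts"), {
    method: "POST",
    headers: { "X-API-Key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ type }),
  });
}
```

Rejecting unknown types client-side gives a clearer error than a round trip, though the server remains the authority on what it accepts.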

Search & Chat (v1)

| Method | Path | Description |
|---|---|---|
| `GET` | `/api/search?q=<query>` | Semantic search. Params: `q`, `limit` (max 100), `threshold` (0–1) |
| `POST` | `/api/chat` | RAG chat. Body: `{"question": "...", "history": [...]}` |
| `GET` | `/api/conversations` | Browse all conversations |
| `GET` | `/api/conversations/:id/similar` | Find similar conversations |
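A small sketch of building a v1 search URL that respects the documented bounds (`limit` at most 100, `threshold` between 0 and 1). Clamping on the client, and the lower bound of 1 for `limit`, are assumptions of this sketch; the server may instead reject out-of-range values.

```ts
interface SearchOptions {
  limit?: number;
  threshold?: number;
}

// Hypothetical helper: builds /api/search with clamped query parameters.
function buildSearchUrl(baseUrl: string, q: string, opts: SearchOptions = {}): URL {
  const url = new URL("/api/search", baseUrl);
  url.searchParams.set("q", q); // URLSearchParams handles encoding
  if (opts.limit !== undefined) {
    // ASSUMPTION: at least 1 result; the documented maximum is 100.
    const limit = Math.min(100, Math.max(1, Math.floor(opts.limit)));
    url.searchParams.set("limit", String(limit));
  }
  if (opts.threshold !== undefined) {
    const threshold = Math.min(1, Math.max(0, opts.threshold));
    url.searchParams.set("threshold", String(threshold));
  }
  return url;
}
```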

Upload & Ingestion

| Method | Path | Description |
|---|---|---|
| `POST` | `/api/uploads/simple` | Direct upload for files under 5MB |
| `POST` | `/api/uploads` | Initiate multipart upload for large files |
| `POST` | `/api/uploads/:id/complete` | Finalize and start ingestion |
| `GET` | `/api/uploads/:id/status` | Poll ingestion progress |
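The choice between the two upload paths, plus the byte ranges a client would send sequentially for multipart, might look like the sketch below. Reading "5MB" and "50MB" as 5 MiB and 50 MiB is an assumption, and `planUpload` is a hypothetical name; check the server's exact thresholds.

```ts
// ASSUMPTION: thresholds interpreted as binary megabytes.
const SIMPLE_UPLOAD_MAX = 5 * 1024 * 1024;
const MULTIPART_CHUNK = 50 * 1024 * 1024;

interface Part {
  partNumber: number; // R2/S3-style part numbers start at 1
  start: number; // inclusive byte offset
  end: number; // exclusive, matching Blob.slice(start, end)
}

type UploadPlan = { kind: "simple" } | { kind: "multipart"; parts: Part[] };

function planUpload(sizeBytes: number): UploadPlan {
  if (sizeBytes < SIMPLE_UPLOAD_MAX) return { kind: "simple" };
  const parts: Part[] = [];
  for (let start = 0, partNumber = 1; start < sizeBytes; start += MULTIPART_CHUNK, partNumber++) {
    parts.push({ partNumber, start, end: Math.min(start + MULTIPART_CHUNK, sizeBytes) });
  }
  return { kind: "multipart", parts };
}
```

Every part except the last has the same size, which matches R2's requirement for uniform part sizes. A browser client can then upload `file.slice(part.start, part.end)` one part at a time and report progress after each.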

The full OpenAPI 3.1 spec is in openapi.yaml.

Supported Formats

| Source | Format |
|---|---|
| ChatGPT | `conversations.json` from Settings > Data Controls > Export |
| Claude | JSON exports with `chat_messages` arrays |
| NDJSON threads | AEGIS `conversation-facts` write pipeline output |

JSON files may use either an array root (`[{...}]`) or an object root (`{"key": {...}}`).
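NDJSON input is one JSON object per line. The sketch below parses it and groups records by thread. The field name `thread_id` is a hypothetical placeholder (the real AEGIS `conversation-facts` schema may differ), as are the helper names.

```ts
type NdjsonRecord = Record<string, unknown>;

// Hypothetical helper: parse NDJSON text, reporting the failing line number.
function parseNdjson(text: string): NdjsonRecord[] {
  const records: NdjsonRecord[] = [];
  text.split(/\r?\n/).forEach((line, i) => {
    if (line.trim() === "") return; // tolerate blank and trailing lines
    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      throw new Error(`line ${i + 1}: invalid JSON`);
    }
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      throw new Error(`line ${i + 1}: expected a JSON object`);
    }
    records.push(value as NdjsonRecord);
  });
  return records;
}

// ASSUMPTION: `thread_id` is an illustrative key name, not a confirmed field.
function groupByThread(records: NdjsonRecord[], key = "thread_id"): Map<string, NdjsonRecord[]> {
  const threads = new Map<string, NdjsonRecord[]>();
  for (const record of records) {
    const id = String(record[key] ?? "unknown");
    const bucket = threads.get(id) ?? [];
    bucket.push(record);
    threads.set(id, bucket);
  }
  return threads;
}
```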

Large File Handling

Files are processed without buffering them whole in Worker memory:

Files under 5MB use the simple upload path; larger files use R2 multipart upload (50MB chunks, uploaded sequentially with progress tracking)
A streaming JSON parser reads the file from R2 chunk by chunk, so peak memory is roughly 2× the largest single conversation object
Progress is checkpointed to KV every 100 conversations; if a run hits the CPU limit, the Queue redelivers the message and ingestion resumes from the last checkpoint
Embeddings are generated with Workers AI (@cf/baai/bge-large-en-v1.5, 1024 dimensions) in sub-batches of 96 texts
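The chunk-by-chunk idea can be illustrated with a simplified splitter that emits each object or array sitting directly inside the root container, so only one conversation's text is held at a time (its buffered text plus the parsed object, which is where a roughly 2× peak comes from). It handles both array and object roots and skips root-level keys and scalars. This is a hypothetical illustration, not MindSpring's actual parser.

```ts
class TopLevelSplitter {
  private depth = 0; // number of currently open containers
  private inString = false;
  private escaped = false;
  private buffer = ""; // text of the value currently being captured
  private readonly decoder = new TextDecoder();
  private readonly onValue: (value: unknown) => void;

  constructor(onValue: (value: unknown) => void) {
    this.onValue = onValue;
  }

  // Decode bytes in streaming mode so multi-byte UTF-8 split across chunks survives.
  pushBytes(bytes: Uint8Array): void {
    this.push(this.decoder.decode(bytes, { stream: true }));
  }

  push(chunk: string): void {
    for (const ch of chunk) {
      const capturing = this.depth >= 2; // inside a child of the root
      if (this.inString) {
        if (capturing) this.buffer += ch;
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
      } else if (ch === '"') {
        this.inString = true; // braces inside strings are ignored from here on
        if (capturing) this.buffer += ch;
      } else if (ch === "{" || ch === "[") {
        this.depth++;
        if (this.depth >= 2) this.buffer += ch; // child of the root starts capture
      } else if (ch === "}" || ch === "]") {
        if (capturing) this.buffer += ch;
        this.depth--;
        if (capturing && this.depth === 1) {
          // A complete child value closed: parse it and free the buffer.
          this.onValue(JSON.parse(this.buffer));
          this.buffer = "";
        }
      } else if (capturing) {
        this.buffer += ch;
      }
    }
  }
}
```

Feeding it fixed-size slices shows that state (string, escape, depth) carries across chunk boundaries.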

RAG Chat

RAG chat runs DeepSeek R1 Distill Qwen 32B (@cf/deepseek-ai/deepseek-r1-distill-qwen-32b) on Workers AI:

The question is embedded and used to retrieve the top 8 conversations from Vectorize
The retrieved conversations are packed into a context window of roughly 4K tokens
DeepSeek R1 reasons across the conversations and synthesizes an answer with citations
Up to 4 turns of prior chat history are included for multi-turn conversations
The model’s chain of thought appears in a collapsible reasoning block that can be expanded or hidden
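The retrieval-to-prompt step above can be sketched as: rank hits, keep the top 8, pack them into a token budget, and keep only the last 4 history turns. The `Hit` and `Turn` shapes, the 4-characters-per-token estimate, the 4000-token reading of "~4K", and truncating (rather than dropping) the hit that overflows the budget are all assumptions of this sketch.

```ts
interface Hit { id: string; score: number; text: string; }
interface Turn { question: string; answer: string; } // ASSUMPTION: one turn = one Q/A pair

const TOP_K = 8;
const CONTEXT_TOKENS = 4000; // ASSUMPTION: "~4K" read as 4000
const MAX_TURNS = 4;
const CHARS_PER_TOKEN = 4; // ASSUMPTION: rough heuristic, not a real tokenizer

const estimateTokens = (s: string): number => Math.ceil(s.length / CHARS_PER_TOKEN);

// Hypothetical helper: greedily pack the highest-scoring hits into the budget.
function packContext(hits: Hit[], budget = CONTEXT_TOKENS): Hit[] {
  const ranked = [...hits].sort((a, b) => b.score - a.score).slice(0, TOP_K);
  const packed: Hit[] = [];
  let used = 0;
  for (const hit of ranked) {
    const cost = estimateTokens(hit.text);
    if (used + cost > budget) {
      // Truncate the first hit that does not fit, then stop.
      const remaining = budget - used;
      if (remaining > 0) packed.push({ ...hit, text: hit.text.slice(0, remaining * CHARS_PER_TOKEN) });
      break;
    }
    packed.push(hit);
    used += cost;
  }
  return packed;
}

// Hypothetical helper: keep only the most recent turns.
function trimHistory(history: Turn[]): Turn[] {
  return history.slice(-MAX_TURNS);
}
```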

Rate Limits

| Endpoint group | Limit |
|---|---|
| Search / browse / conversations | 60 req/min per key |
| Chat (RAG) | 20 req/min per key |
| Upload mutations | 20 req/min per key |

Every response carries the rate-limit headers X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset.
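One way to produce those headers is a per-key, per-group fixed-window counter, sketched below. The fixed 60-second window, the in-memory `Map` (a real Worker would need shared state such as KV or a Durable Object), the group names, and `X-RateLimit-Reset` as Unix epoch seconds are all assumptions, not MindSpring's documented behavior.

```ts
type Group = "search" | "chat" | "upload";
const LIMITS: Record<Group, number> = { search: 60, chat: 20, upload: 20 };
const WINDOW_MS = 60_000;

interface Decision {
  allowed: boolean;
  headers: Record<string, string>;
}

// Hypothetical limiter: counts requests per (group, key) within aligned 60 s windows.
class FixedWindowLimiter {
  private readonly counters = new Map<string, { windowStart: number; count: number }>();

  check(apiKey: string, group: Group, nowMs: number = Date.now()): Decision {
    const limit = LIMITS[group];
    const windowStart = Math.floor(nowMs / WINDOW_MS) * WINDOW_MS;
    const id = `${group}:${apiKey}`;
    let entry = this.counters.get(id);
    if (!entry || entry.windowStart !== windowStart) {
      entry = { windowStart, count: 0 }; // a new window resets the count
      this.counters.set(id, entry);
    }
    const allowed = entry.count < limit;
    if (allowed) entry.count++;
    return {
      allowed,
      headers: {
        "X-RateLimit-Limit": String(limit),
        "X-RateLimit-Remaining": String(limit - entry.count),
        "X-RateLimit-Reset": String((windowStart + WINDOW_MS) / 1000), // epoch seconds
      },
    };
  }
}
```

A fixed window is the simplest scheme but allows short bursts at window boundaries; a sliding window or token bucket smooths that at the cost of more state.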

Frontend

Vanilla HTML/CSS/JS SPA served as static assets from the same Worker. No build step.

Search — semantic search with score bars and staggered card animations
Chat — RAG chat with collapsible reasoning blocks and source citations
Upload — drag-and-drop with multipart chunking and real-time progress
Detail — full conversation view with similar conversation discovery
Settings — API key configuration and system health dashboard

Design system: “Infrastructure Noir” — Midnight Console, Architectural Tan, Visionary Purple, System Green, Cloudflare Cyan. Typography: Syne / DM Sans / JetBrains Mono.