Why Email?
Nate's guide cracked the daily capture habit. You type a thought into Slack, it gets embedded and classified in five seconds, and any AI you use can search it by meaning. That's genuinely transformative — but it captures what you decide to capture.
Your email is different. It's thinking you already did. Every email you write is a decision, a position, a relationship update, or a problem you solved. You wrote it, you sent it, and it immediately became invisible to your AI tools.
When we looked at 30 days of sent mail — project updates, client advice, personal advocacy, community work — it was 153 distinct thoughts that an AI couldn't access until now. Long emails explaining complex situations. Narrative summaries of ongoing projects. Strategic advice to colleagues. All of it, in your voice, about the things that actually matter to you.
Pull-based, not push-based. Nate's setup is push-based: you decide what to capture and push it in. This adds a parallel track that reaches out to where your thinking already lives and brings it in on your schedule.
RAG chunking for long documents. Embedding a 1,500-word email as a single vector produces a blurry average of a dozen topics. The fix: split it into 300-word segments, each embedded separately. When you search, you get the relevant section, not a low-confidence match on the whole thing.
More infrastructure. Google Cloud OAuth, schema migrations, updated Edge Functions. Budget an hour, not ten minutes. Every failure mode is documented below.
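The chunking idea above can be sketched in a few lines. This is a minimal illustration, not the repo's actual chunker; the function name and the 300-word threshold are taken from the description above, everything else is invented:

```typescript
// Sketch: paragraph-first chunking at ~300 words per chunk.
// Note: a single paragraph longer than maxWords still lands as one
// oversized chunk, so a real pipeline also needs a hard-split fallback.
function chunkEmail(text: string, maxWords = 300): string[] {
  const paragraphs = text.split(/\n\s*\n/);
  const chunks: string[] = [];
  let current: string[] = [];
  let count = 0;
  for (const p of paragraphs) {
    const words = p.trim().split(/\s+/).filter(Boolean).length;
    if (count + words > maxWords && current.length > 0) {
      chunks.push(current.join("\n\n"));   // flush the chunk in progress
      current = [];
      count = 0;
    }
    current.push(p.trim());
    count += words;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```

Each chunk gets embedded separately, so a search hit points at the relevant 300-word section rather than a blurry average of the whole email.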
How We Built It
This pipeline was built in a single session with Dr. Brian — an AI agent running in Cursor. We're sharing the story because the hard parts weren't where we expected them.
What we thought would be hard: Gmail OAuth. Getting permission to read someone's email requires Google Cloud Console setup, consent screens, scopes, token refresh. It's genuinely involved.
What was actually hard:
Gmail's line-wrapping breaks quote detection
When you reply to an email, Gmail wraps "On Mon, Mar 2 at 8:56 AM Someone wrote:" across 2–3 lines in plain text. Our first stripper only matched it on one line. One reply showed 703 words when the actual reply was "hello."
Supabase's PostgREST cache doesn't update instantly
We added parent_id and chunk_index columns. The SQL ran fine. The REST API that Edge Functions use to talk to Postgres didn't see the new columns — at all — for hours. Tried four ways to reload the schema cache. None worked reliably.
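The workaround that held up was to stop inserting through the table endpoint entirely and go through an RPC, since a function call is not resolved against the cached column list. A sketch of the shape, assuming a supabase-js-style client (the `insert_thought` function is the one defined in the migration later in this guide; the wrapper name is invented):

```typescript
// Sketch: insert via the insert_thought RPC instead of
// .from("thoughts").insert(...), sidestepping a stale PostgREST
// schema cache. `client` is anything with supabase-js's .rpc() shape.
type RpcClient = {
  rpc: (fn: string, args: Record<string, unknown>) => Promise<{ data: unknown; error: unknown }>;
};

async function insertThought(
  client: RpcClient,
  content: string,
  embedding: number[],
  metadata: Record<string, unknown>,
  parentId: string | null = null,
  chunkIndex: number | null = null,
): Promise<string> {
  const { data, error } = await client.rpc("insert_thought", {
    p_content: content,
    p_embedding: embedding,
    p_metadata: metadata,
    p_parent_id: parentId,
    p_chunk_index: chunkIndex,
  });
  if (error) throw new Error(String(error));
  return data as string; // the new row's uuid
}
```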
A travel booking confirmation produced 23 chunks of CSS
The Gmail API returns HTML. Our HTML-to-text conversion preserved too much structure. One booking confirmation was 8,874 words of boilerplate, chunked into 23 meaningless fragments.
The fix: detect CSS-heavy content (`{...}` blocks) and skip it, plus filter sender/subject patterns for transactional email.

The Gmail label API is AND, not OR
Passing SENT and STARRED together returns messages that have both labels. We needed either. This is the opposite of what most people expect.
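Getting OR semantics means querying one label at a time and unioning the message IDs. A minimal sketch of that pattern (the fetcher is a stand-in for the actual Gmail API call, not a real function in the repo):

```typescript
// Sketch: OR semantics over Gmail labels. A single messages.list call
// ANDs its labelIds together, so we query per label and union the IDs.
async function listWithAnyLabel(
  labels: string[],
  fetchIdsForLabel: (label: string) => Promise<string[]>,
): Promise<string[]> {
  const seen = new Set<string>();
  for (const label of labels) {
    for (const id of await fetchIdsForLabel(label)) seen.add(id);
  }
  return [...seen]; // each message appears once, even if multiply labeled
}
```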
A 1,900-word email wouldn't chunk
Our chunking logic uses paragraph breaks as split points. One email had no double-newline breaks — just a wall of text. Paragraph-first splitting produced a single oversized chunk and stopped.
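The fallback for a wall of text is a hard split that ignores paragraph structure entirely. A sketch of the simplest version, splitting on raw word counts (illustrative only; the real script may split on sentence boundaries instead):

```typescript
// Sketch: hard-split fallback for emails with no paragraph breaks.
// Guarantees bounded chunk sizes even for a single wall of text.
function hardSplit(text: string, maxWords = 300): string[] {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}
```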
Step-by-Step Setup
You'll need Deno installed (`brew install deno` on Mac) and a Google account with Gmail. Budget about an hour.
Credential Tracker
You'll generate one new set of credentials. Add these to your tracker alongside the ones from Nate's guide:
```
GMAIL (Step 2)
Google Cloud Project: ____________
OAuth Client ID: ____________
OAuth Client Secret: ____________
credentials.json saved: yes / no
```
1. Get the code
Clone the repo or pull the latest if you already have it.
```bash
git clone https://github.com/MonkeyRun-com/monkeyrun-open-brain.git
cd monkeyrun-open-brain
# or if you already have it:
git pull
```
2. Create Google Cloud OAuth credentials
Go to console.cloud.google.com → New Project → "Open Brain". Enable the Gmail API. Configure the OAuth consent screen (External, add your Gmail as a test user, add the `gmail.readonly` scope). Create an OAuth Client ID (Desktop app type). Download the JSON and save it as `scripts/credentials.json`.

3. Run the database migration
In the Supabase dashboard → SQL Editor, paste and run:
```sql
-- Add chunking support
ALTER TABLE thoughts
  ADD COLUMN IF NOT EXISTS parent_id uuid REFERENCES thoughts(id) ON DELETE CASCADE,
  ADD COLUMN IF NOT EXISTS chunk_index integer;

CREATE INDEX IF NOT EXISTS thoughts_parent_id
  ON thoughts (parent_id) WHERE parent_id IS NOT NULL;

-- RPC to bypass PostgREST schema cache
CREATE OR REPLACE FUNCTION insert_thought(
  p_content text,
  p_embedding vector(1536),
  p_metadata jsonb,
  p_parent_id uuid DEFAULT NULL,
  p_chunk_index integer DEFAULT NULL
) RETURNS uuid
LANGUAGE plpgsql AS $$
DECLARE
  new_id uuid;
BEGIN
  INSERT INTO thoughts (content, embedding, metadata, parent_id, chunk_index)
  VALUES (p_content, p_embedding, p_metadata, p_parent_id, p_chunk_index)
  RETURNING id INTO new_id;
  RETURN new_id;
END;
$$;
```
4. Deploy the updated Edge Functions
```bash
supabase functions deploy ingest-thought --no-verify-jwt
supabase functions deploy open-brain-mcp --no-verify-jwt
```
5. Set your environment variables
```bash
export INGEST_URL="https://YOUR_PROJECT_REF.supabase.co/functions/v1/ingest-thought"
export INGEST_KEY="your-ingest-key"
```
Add to `~/.zshrc` to make them permanent.

6. First dry run
```bash
deno run --allow-net --allow-read --allow-write --allow-env \
  scripts/pull-gmail.ts --dry-run --window=24h
```
Authorize via the URL it prints. Check the output — are long emails showing `[N chunks]`? Is the content preview clean? When it looks right, drop `--dry-run` to go live.

7. Scale up
```bash
deno run --allow-net --allow-read --allow-write --allow-env \
  scripts/pull-gmail.ts --window=30d --labels=SENT,STARRED
```
Re-running is safe — the sync log tracks ingested IDs and skips duplicates.
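The shape of that idempotence is simple: keep a log of already-ingested Gmail message IDs and filter against it before ingesting. A sketch of the idea (the real script's log format may differ; the function name is invented):

```typescript
// Sketch: idempotent re-runs via a sync log of ingested Gmail IDs.
// Returns only the IDs not yet ingested, and records them in the log.
function filterNew(ids: string[], log: Set<string>): string[] {
  const fresh = ids.filter((id) => !log.has(id));
  for (const id of fresh) log.add(id);
  return fresh;
}
```

Persist the set to disk between runs and a `--window=30d` pull after a `--window=7d` pull only pays for the emails it hasn't seen.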
Label Strategy
| Label | What it contains | Recommendation |
|---|---|---|
| SENT | Everything you sent | Always include |
| STARRED | Emails you explicitly starred | Good addition |
| IMPORTANT | Gmail's auto-importance | More noise than signal |
| INBOX | Everything in inbox | Skip |
| Custom labels | Your own organization | Gold — add any |
Pro tip: Gmail labels become searchable metadata. The script stores all labels on each ingested thought, so you can later ask your AI "show me everything I tagged as Project X."
For Existing Open Brain Users
Database Changes
| Column | Type | Purpose |
|---|---|---|
| parent_id | uuid (nullable FK) | Links chunks to their parent document |
| chunk_index | integer (nullable) | Orders chunks within a parent (0-based) |
These are nullable — your existing thoughts are unaffected. The migration uses IF NOT EXISTS, safe to run on a live database.
Updated Edge Functions
ingest-thought now accepts parent_id, chunk_index, and extra_metadata. It retries OpenRouter failures up to three times, caps embedding input at 8,000 characters, and crashes on startup if INGEST_KEY is unset (rather than silently accepting all requests).
open-brain-mcp search now fetches 3× the requested limit and deduplicates chunks from the same parent — if 3 chunks of a long email match your query, you get one result with a note. New tool: email_sync_status.
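The dedupe step described above can be sketched as: over-fetch, keep the best-scoring chunk per parent, then trim back to the requested limit. Types and names here are illustrative, not the function's actual signature:

```typescript
// Sketch: collapse multiple chunks of the same parent document into one
// search result, keeping the highest-scoring chunk.
type Hit = { id: string; parentId: string | null; score: number };

function dedupeByParent(hits: Hit[], limit: number): Hit[] {
  const best = new Map<string, Hit>();
  for (const h of hits) {
    const key = h.parentId ?? h.id; // standalone thoughts keep their own id
    const prev = best.get(key);
    if (!prev || h.score > prev.score) best.set(key, h);
  }
  return [...best.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Fetching 3× the limit before deduplicating means a long email whose chunks dominate the raw results doesn't crowd everything else out of the final list.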
New Metadata Fields
```json
{
  "type": "observation",
  "topics": ["strategy", "product"],
  "people": ["Alice", "Bob"],
  "sentiment": "positive",
  "source": "gmail",
  "gmail_labels": ["SENT", "IMPORTANT"],
  "gmail_id": "18e4f2a...",
  "gmail_thread_id": "18e4f1..."
}
```
What's Next
| Extension | What it adds | Complexity |
|---|---|---|
| Google Calendar | Meetings, prep context, recurring commitments | Low — same OAuth |
| Meeting transcripts | Fathom, Otter, or Fireflies via webhook | Low — webhook to ingest-thought |
| URL / article ingestion | Drop a link, get the full article chunked | Medium |
| Slack / Discord history | Pull existing threads that predate your setup (push capture is already Nate's) | Medium |
Prompt Injection Risk
When you ingest email content, you're pulling in text written by other people. A crafted email could contain text designed to manipulate your AI when it later retrieves and reasons about that content — "ignore previous instructions," etc. This is prompt injection.
The partial protection: The ingest-thought function sends content to OpenRouter to extract structured JSON metadata — not to reason freely. The structure acts as a summarization barrier. The embedding step is purely mathematical. Neither step is a high-risk surface.
Where the real risk lives: When you ask your AI to "summarize everything I emailed about Project X" — that's when retrieved email content enters the AI's context window alongside your instructions.
Practical mitigations:
- Stick to `SENT` as your primary label — you wrote those emails
- Be more cautious with `STARRED` or `INBOX` (content from untrusted senders)
- The `gmail.readonly` OAuth scope means the script can never send email on your behalf
- If an AI client behaves strangely after a memory search, check what was retrieved
Running This Automatically
Right now the script is manual. Here's where things stand and where they're going.
Option 1: cron (works today)
```bash
# Add to crontab: crontab -e
# Runs every Monday at 8am
0 8 * * 1 cd /path/to/monkeyrun-open-brain && \
  INGEST_URL="..." INGEST_KEY="..." \
  deno run --allow-net --allow-read --allow-write --allow-env \
  scripts/pull-gmail.ts --window=7d --labels=SENT,STARRED
```
Option 2: OpenClaw (the right long-term home)
OpenClaw is built for exactly this — scheduled tasks, local script execution, Telegram notifications when done. If you're already running it, give it this prompt:
"Add a weekly cron job that runs every Monday at 8am. It should cd into my Open Brain project directory and run deno run --allow-net --allow-read --allow-write --allow-env scripts/pull-gmail.ts --window=7d --labels=SENT,STARRED. When it finishes, send me a Telegram message with the summary line from the output."
OpenClaw builds the cron job, hooks it into its scheduler, and you get a weekly brain sync with a Telegram confirmation. The Gmail OAuth token lives in the project directory — no extra credential setup needed.
Option 3: MCP trigger (roadmap)
The cleanest eventual solution: a pull_emails MCP tool that lets you trigger a sync from any AI client just by asking for one. Deferred because it requires moving the OAuth token server-side or building a webhook architecture. On the roadmap as the system matures.
Script Options & Troubleshooting
Script Options
| Flag | Default | Description |
|---|---|---|
--window= | 24h | Time window: 24h, 7d, 30d, 1y, all |
--labels= | SENT | Comma-separated Gmail labels (OR logic) |
--dry-run | off | Preview without ingesting |
--limit= | 50 | Max emails per run |
--list-labels | off | Print all Gmail labels and exit |
Troubleshooting
- Ingest requests rejected with an auth error: your INGEST_KEY doesn't match the one in Supabase. Check with `supabase secrets list` and re-export the correct value.
- Script can't find its credential files: `cd /path/to/monkeyrun-open-brain` first.
- OAuth token expired or revoked: delete `scripts/token.json` and re-run to re-authorize.
- A label returns no messages: run `--list-labels` to see exactly what your account has.