BMO Wiki

Start from an empty desk and end with a live ESP32 companion: public landing, build wiki, private operator controls, Supabase memory, Vercel deploy, firmware pairing, voice, and smoke tests.


8 launch phases · 25 ports mapped · 6 env vars

Zero to live

The complete BMO launch quest.

Follow these phases in order. Each one ends with a proof check, so you always know whether the build is ready for the next door.

NEXT_PUBLIC_SUPABASE_URL · Supabase project URL · public
NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY · Supabase publishable key · public
SUPABASE_SECRET_KEY · Supabase secret key · server only
OPENROUTER_API_KEY · OpenRouter API key · server only
AUTH_SESSION_SECRET · openssl rand -hex 32 · server only
CRON_SECRET · openssl rand -hex 32 (guards the dream-cycle cron) · server only

Local tools

Node.js 20.18+, npm, Git, PlatformIO, Python 3.10+, and ffmpeg if you regenerate voice assets.

Cloud accounts

GitHub for the repo, Vercel for the Next.js deploy, Supabase for persistence, and OpenRouter for STT, LLM, and TTS.

Network

A 2.4 GHz Wi-Fi network. The ESP32-C3 cannot join 5 GHz Wi-Fi, and weak RSSI will make voice feel slow.

Safety habit

Keep one scratch buffer for temporary secrets. Never commit firmware .env, include/secrets.h, API keys, or plaintext fingerprints.

Cloud ingredients

OpenRouter, Supabase, Vercel, firmware.

OpenRouter

Account, API key, and a small credit balance.

OPENROUTER_API_KEY
Proof check: The credits endpoint returns remaining balance.

Supabase

Project URL, publishable key, secret key, schema, and seed.

NEXT_PUBLIC_SUPABASE_URL, NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY, SUPABASE_SECRET_KEY
Proof check: Tables exist, RLS is enabled, and the config seed row exists.

Vercel

Project rooted at dashboard/ with all env vars applied.

Dashboard env vars
Proof check: Production deployment builds and /wiki returns 200.

Firmware

Wi-Fi, deployed origin, and device fingerprint.

firmware/bmo_face_anim/.env
Proof check: The pre-build hook renders include/secrets.h, then upload succeeds.

Firmware .env

The six values that become secrets.h.

The pre-build hook reads this gitignored file and renders include/secrets.h before PlatformIO compiles.

WIFI_SSID · Primary 2.4 GHz Wi-Fi name
WIFI_PASS · Primary Wi-Fi password
WIFI_SSID2 · Optional fallback Wi-Fi name
WIFI_PASS2 · Optional fallback Wi-Fi password
DASHBOARD_URL · Production origin, no trailing slash
FINGERPRINT · Plaintext fingerprint from onboarding or rotation
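
As a rough illustration of what a pre-build step like this might do, the sketch below parses KEY=value lines and renders one string #define per key. The function names, the header layout, and the placeholder values are assumptions made for this example; the hook in the repo may differ.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values


def render_secrets_header(values: dict[str, str]) -> str:
    """Render a C header with one string #define per key (hypothetical layout)."""
    lines = ["#pragma once"]
    for key, value in values.items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'#define {key} "{escaped}"')
    return "\n".join(lines) + "\n"


# Placeholder values only; never commit a real .env.
example_env = """
# firmware/bmo_face_anim/.env
WIFI_SSID=HomeNet
WIFI_PASS="correct horse"
DASHBOARD_URL=https://your-bmo-site.vercel.app/
"""
values = parse_env(example_env)
# DASHBOARD_URL must have no trailing slash, so normalise it here.
values["DASHBOARD_URL"] = values["DASHBOARD_URL"].rstrip("/")
header = render_secrets_header(values)
print(header)
```

Normalising the origin at render time is one way to enforce the "no trailing slash" rule without relying on the person editing the file.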

Ports, GPIO, volts

Wire BMO like this.

This is the current ESP32-C3 firmware pin map. Keep every ground common, keep logic at 3.3V, and avoid GP8/GP9 for microphone data.

Area · Part · Module pin · ESP32-C3 pin · Volt · Note
Power · ESP32-C3 Super Mini · USB-C / 5V · USB power or 5V/VIN · 5V input · Use a real data cable for flashing; keep grounds common.
Power · Shared logic rail · 3V3 · ESP32 3V3 · 3.3V · Display, mic, and touch logic live on 3.3V. Never feed 5V into GPIO.
Display · ST7735 TFT · VCC · 3V3 · 3.3V · Use the same ground as the ESP32.
Display · ST7735 TFT · GND · GND · 0V · Common ground.
Display · ST7735 TFT · LED / BLK · 3V3 · 3.3V · Backlight on. Add control later only if you need dimming.
Display · ST7735 TFT · CS · GP7 · 3.3V logic · Chip select for the TFT.
Display · ST7735 TFT · RST / RESET · GP10 · 3.3V logic · Panel reset.
Display · ST7735 TFT · DC / A0 · GP3 · 3.3V logic · Command/data select.
Display · ST7735 TFT · SDA / MOSI · GP6 · 3.3V logic · SPI data to screen.
Display · ST7735 TFT · SCK / SCL · GP4 · 3.3V logic · SPI clock.
Audio out · MAX98357A amp · VIN / VCC · 3V3 preferred · 3.3V or supported 5V · 3.3V is safest for a single-rail build. If using 5V on amp VIN, keep I2S logic 3.3V and grounds common.
Audio out · MAX98357A amp · GND · GND · 0V · Common ground with ESP32 and speaker amp.
Audio out · MAX98357A amp · BCLK · GP0 · 3.3V logic · Shared I2S bit clock with the mic.
Audio out · MAX98357A amp · LRC / WS · GP1 · 3.3V logic · Shared I2S word-select clock with the mic.
Audio out · MAX98357A amp · DIN · GP2 · 3.3V logic · I2S audio data into the amp.
Audio out · 8 ohm speaker · SPK+ / SPK- · Amp output pads · speaker output · Do not connect the speaker directly to ESP32 GPIO.
Audio in · INMP441 mic · VDD · 3V3 · 3.3V · Do not power the mic from 5V.
Audio in · INMP441 mic · GND · GND · 0V · Common ground.
Audio in · INMP441 mic · SCK / BCLK · GP0 · 3.3V logic · Shared I2S clock from the ESP32.
Audio in · INMP441 mic · WS / LRCL · GP1 · 3.3V logic · Shared I2S word-select from the ESP32.
Audio in · INMP441 mic · SD / DOUT · GP5 · 3.3V logic · Critical: do not use GP8 or GP9. They are boot strapping pins and caused silent captures.
Audio in · INMP441 mic · L/R · GND · 0V · Sets the mic slot. Use the tickle mic self-test if the other slot is louder.
Touch · TTP223 touch sensor · VCC · 3V3 · 3.3V · Keeps touch output ESP32-safe.
Touch · TTP223 touch sensor · GND · GND · 0V · Common ground.
Touch · TTP223 touch sensor · OUT / SIG · GP20 · 3.3V logic · Firmware uses INPUT_PULLDOWN to avoid phantom touches.
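
Before powering a rewired board, it can help to sanity-check a planned pin map against the rules in the pin table. The sketch below is a desk check, not part of the firmware: the signal names and check_pin_map are invented for this example, and only the GPIO numbers come from the table.

```python
# Signal pins from the pin table (GPIO numbers on the ESP32-C3).
# The shared I2S clocks are listed once because the amp and mic share them.
PIN_MAP = {
    "tft_cs": 7, "tft_rst": 10, "tft_dc": 3, "tft_mosi": 6, "tft_sck": 4,
    "i2s_bclk": 0, "i2s_ws": 1, "amp_din": 2, "mic_sd": 5, "touch_out": 20,
}
AVOID_FOR_MIC = {8, 9}  # the strapping pins the wiki warns against for mic data


def check_pin_map(pin_map):
    """Return a list of human-readable wiring problems (empty when clean)."""
    problems = []
    owners = {}
    for name, gpio in pin_map.items():
        if gpio in owners:
            problems.append(f"GP{gpio} is shared by {owners[gpio]} and {name}")
        owners.setdefault(gpio, name)
    if pin_map.get("mic_sd") in AVOID_FOR_MIC:
        problems.append(f"mic_sd is on strapping pin GP{pin_map['mic_sd']}")
    return problems


print(check_pin_map(PIN_MAP))                      # clean map
print(check_pin_map({**PIN_MAP, "mic_sd": 8}))     # mic moved onto GP8
```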

Open the workbench

Start with the repo, tools, and one scratch buffer for values you will paste later.

Clone the BMO-ESP32 repo.
Install Node.js 20.18 or newer for the web app.
Install PlatformIO for the ESP32-C3 firmware.
Install Python 3.10+ and ffmpeg if you plan to regenerate voice clips.
Create or prepare Supabase, Vercel, OpenRouter, and GitHub accounts.

git clone https://github.com/AlleyBo55/BMO-ESP32.git
cd BMO-ESP32/dashboard
npm install

Save point

The dashboard dependencies install, and PlatformIO can see the esp32c3_supermini environment.

Gather BMO parts

Build the physical cast before the cloud brain enters the story.

ESP32-C3 Super Mini board.
ST7735 1.8 inch 160x128 TFT display.
INMP441 I2S microphone.
MAX98357A I2S amp plus 8 ohm 1 W speaker.
TTP223 capacitive touch sensor, USB-C data cable, jumpers, and stable power.

Save point

You can flash a tiny firmware sketch and open the serial monitor without driver problems.

Create the memory room

Supabase stores admin state, config, activity logs, auth attempts, and BMO memory.

Create a Supabase project close to your deploy region.
Run dashboard/supabase/schema.sql in the SQL editor.
Run dashboard/supabase/seed.sql after the schema succeeds.
Copy the project URL, publishable key, and secret key.

# Run these files in Supabase SQL Editor
dashboard/supabase/schema.sql
dashboard/supabase/seed.sql

Save point

The admin, config, activity_log, and auth_attempts tables exist with row-level security enabled.

Light the web portal

Deploy the Next.js app as the public landing, wiki, and private operator console.

Import the GitHub repo into Vercel.
Set the Vercel root directory to dashboard.
Add the six environment variables listed above for Production, Preview, and Development.
Mark every server-only value as sensitive.
Make sure OpenRouter has a credit balance before testing voice.

openssl rand -hex 32
# paste that value into AUTH_SESSION_SECRET, then run it again for CRON_SECRET

Save point

The first Vercel deploy builds green and the production URL loads.

Onboard the operator

Create the first admin and generate the one-time device fingerprint.

Visit the production URL.
Complete the onboarding form with a username and strong password.
Leave fingerprint blank unless you already have a high-entropy value.
Copy the plaintext fingerprint immediately; it is shown once.

Save point

Pair the tiny brain

Give the ESP32-C3 Wi-Fi, the deployed origin, and its rotatable fingerprint.

Copy firmware/bmo_face_anim/.env.example to firmware/bmo_face_anim/.env.
Fill Wi-Fi SSID, Wi-Fi password, optional fallback Wi-Fi, deployed origin, and BMO fingerprint.
Flash the firmware to the ESP32-C3.
Open the serial monitor and watch for Wi-Fi plus brain readiness.

cd firmware/bmo_face_anim
cp .env.example .env
$EDITOR .env
pio run -e esp32c3_supermini -t upload
pio device monitor -e esp32c3_supermini

Save point

Serial output shows Wi-Fi connected and the brain client ready.

Prove the bridge

Confirm the paired device can reach the cloud and random callers cannot.

Call the credits endpoint with the fingerprint header.
Call the same endpoint without the header.
Keep the 200 and 401 results as your first launch proof.

curl -i \
  -H "X-BMO-Fingerprint: <paste-the-fingerprint>" \
  https://your-bmo-site.vercel.app/api/openrouter/credits

curl -i https://your-bmo-site.vercel.app/api/openrouter/credits

Save point

The authenticated request returns 200; the request without the fingerprint returns 401.

Run the voice smoke test

Exercise the whole loop: touch, mic, cloud brain, memory, voice, speaker, activity log.

Hold the touch button and say: tell me a story.
Release the button and watch the face move from listening to thinking to talking.
Confirm audio plays from the speaker.
Check the activity log for input_text, reply_text, status ok, and total timing.

Save point

BMO answers out loud, the mouth moves with the audio, and the activity row records the exchange.

Before you call it live.

Public homepage loads at the production URL.
Wiki is reachable at /wiki and is listed in sitemap.xml.
Private operator controls are not linked from public nav and are not in sitemap.xml.
Supabase anon role cannot read private tables directly.
Server-only env vars are marked sensitive in Vercel.
Firmware .env and include/secrets.h are gitignored and never appear in git status.
Old fingerprint fails after rotation; new fingerprint works after re-flash.

Features

What the project supports.

Each feature has a visible behavior, a hardware or software owner, and one implementation rule that keeps the tiny device believable.

Firmware renderer

Expression engine

The firmware owns the face library: idle, touch, listen, thinking, talking, laughing, bashful, and the wider mood set.

User sees: BMO blinks, glances, smiles, talks, reacts to touch, and never feels like a static image.
Implementation: A 160x128 RGB565 back-buffer is composed at around 30 fps and flushed over hardware SPI.

TTP223 + touch classifier

Touch language

One capacitive pad becomes several gestures instead of a single boring button press.

User sees: Tap gives a small surprise, hold opens listening, long-hold gets shy, and rapid taps can become a laugh moment.
Implementation: The firmware debounces the touch line, classifies press timing, and maps it to mood plus audio behavior.
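
A timing-based classifier of this kind can be sketched in a few lines. The half-second hold threshold comes from the voice pipeline description; the debounce window, the long-hold threshold, and the rapid-tap window are placeholder assumptions, and the function names are invented for illustration. The real firmware is C++ and may classify differently.

```python
DEBOUNCE_MS = 30       # assumed: shorter pulses are treated as noise
HOLD_MS = 500          # "about half a second" starts a hold
LONG_HOLD_MS = 3000    # assumed placeholder for the bashful long-hold


def classify_press(duration_ms: int) -> str:
    """Map how long the pad was held to a gesture name."""
    if duration_ms < DEBOUNCE_MS:
        return "noise"
    if duration_ms < HOLD_MS:
        return "tap"
    if duration_ms < LONG_HOLD_MS:
        return "hold"
    return "long_hold"


def is_rapid_tap_burst(tap_times_ms, count=3, window_ms=1000):
    """True when the last `count` taps all landed within `window_ms`."""
    if len(tap_times_ms) < count:
        return False
    recent = tap_times_ms[-count:]
    return recent[-1] - recent[0] <= window_ms


print(classify_press(120), classify_press(900), classify_press(3500))
print(is_rapid_tap_burst([0, 300, 600]))
```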

INMP441, brain API, MAX98357A

Voice loop

BMO moves from listening to thinking to talking, so voice input has a visible status at every step — and the talking mouth lip-syncs to the reply.

User sees: The face shows a glitchy listening state while recording, a pulsing thinking orb during the round-trip, and a lip-synced mouth while the reply plays.
Implementation: A held touch captures low-rate mono audio (decimated so a fixed buffer holds ~6 s), applies auto-gain, and sends it to the brain route; the streamed PCM16 reply feeds the I2S speaker while a live envelope drives the mouth.

Audio clips + voice service

Tiny voice pack

Short local clips keep instant reactions fast, while streamed replies cover open conversation.

User sees: BMO greets you instantly (and lip-syncs) on a normal touch, and still speaks longer generated replies when the brain route answers.
Implementation: Greetings are generated once in BMO’s real voice, downsampled, and baked as 4-bit ADPCM into firmware; generated replies stream to avoid storing large audio.

Brain service + Supabase memory

Memory core

A gbrain-inspired layer gives BMO short-term conversation continuity plus a durable, self-updating profile of the child.

User sees: Follow-ups make sense (apple → “red” stays on topic), and BMO remembers the child’s name — learned, never hardcoded, newest value wins.
Implementation: Recent turns are replayed as chat history; stable facts are upserted by key and recalled by vector similarity before each reply. Fully degradable.
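
The two memory behaviours described here, upsert by key and recall by vector similarity, can be illustrated with plain Python. This is a toy in-memory sketch with hand-made 2-D vectors; the real system uses Supabase pgvector and model embeddings, and the names upsert_fact and recall are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def upsert_fact(profile, key, value):
    """Store a stable fact by key; a newer value simply replaces the old one."""
    profile[key] = value


def recall(query_vec, memories, k=2):
    """Return the texts of the k memories most similar to the query vector."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m["embedding"]), reverse=True)
    return [m["text"] for m in ranked[:k]]


profile = {}
upsert_fact(profile, "child_name", "Sam")
upsert_fact(profile, "child_name", "Samira")   # newest value wins

memories = [
    {"text": "we talked about apples", "embedding": [1.0, 0.0]},
    {"text": "it rained on Monday", "embedding": [0.0, 1.0]},
    {"text": "apples are red", "embedding": [0.9, 0.1]},
]
print(profile["child_name"], recall([1.0, 0.0], memories))
```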

Brain route + OpenRouter web plugin

Live web search

For questions that depend on current facts, BMO can look things up on the web instead of guessing from stale training data.

User sees: Ask "who is the president now" or "what is the weather today" and BMO answers from fresh results, not an out-of-date memory.
Implementation: When the web_search skill is on, the brain call enables the OpenRouter web plugin and nudges the model to trust live results; citations and URLs are stripped before the reply is spoken.
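
Stripping citations and URLs before speech is mostly a text-cleaning step. The sketch below shows one possible approach with regular expressions; the patterns and the function name strip_for_speech are assumptions for illustration, not the brain route's actual code.

```python
import re

MD_LINK_RE = re.compile(r"\[([^\]]+)\]\((?:https?://[^)]+)\)")  # [label](url)
CITATION_RE = re.compile(r"\[\d+\]")                            # [1], [23]
URL_RE = re.compile(r"https?://\S+")                            # bare URLs


def strip_for_speech(text: str) -> str:
    """Remove link targets, numeric citations, and bare URLs from reply text."""
    text = MD_LINK_RE.sub(r"\1", text)           # keep the link label only
    text = CITATION_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = re.sub(r"\s+([.,!?])", r"\1", text)   # close gaps left before punctuation
    return re.sub(r"\s{2,}", " ", text).strip()


print(strip_for_speech("It is sunny and 21 degrees [1][2]."))
print(strip_for_speech("Rain later, says [the forecast](https://example.com/w)."))
```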

Soul editor + brain route

Editable personality

BMO’s persona, tone, and reply language all come from one editable soul prompt — the single source of truth.

User sees: Change the soul on the dashboard to make BMO talk differently or reply in another language; it takes effect on the next request.
Implementation: The brain route sends the soul verbatim as the system prompt (plus only the live clock). No hardcoded language or style clamp overrides it.

Fingerprint guard

Secure cloud bridge

The device can talk to the cloud brain without making the private operator controls public.

User sees: A paired BMO can call the brain route; an unpaired request gets rejected.
Implementation: Firmware sends an X-BMO-Fingerprint header while the server stores only the hashed value.
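
A hash-only guard of this shape can be sketched as follows. The wiki says only that the server stores a hashed value, so SHA-256, the constant-time comparison, and the function names here are assumptions rather than the route's actual implementation.

```python
import hashlib
import hmac


def hash_fingerprint(plaintext: str) -> str:
    """Hash the plaintext fingerprint (SHA-256 is an assumption)."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()


def is_paired(header_value, stored_hash: str) -> bool:
    """Compare the hashed header against the stored hash in constant time."""
    if not header_value:
        return False
    return hmac.compare_digest(hash_fingerprint(header_value), stored_hash)


def status_for(header_value, stored_hash: str) -> int:
    """200 for a paired device, 401 for anything else."""
    return 200 if is_paired(header_value, stored_hash) else 401


# The server keeps only the hash; the plaintext lives on the device.
stored = hash_fingerprint("example-fingerprint")
print(status_for("example-fingerprint", stored), status_for(None, stored))
```

Because only the hash is stored, rotating the fingerprint means replacing one stored value, after which the old plaintext stops matching.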

Supabase login + Next.js route

Private operator console

The public homepage explains the build, while the admin controls stay intentionally unlisted.

User sees: Visitors see the landing and wiki; the operator still has a private route for tuning the device.
Implementation: The operator route remains behind Supabase login and is excluded from public navigation and sitemap.

BMO-ESP32 repo

Open build kit

The project is meant to be hackable: firmware, voice tools, wiring notes, and the web brain live in the repo.

User sees: A builder can inspect the code, flash the device, replace clips, and iterate on the shell.
Implementation: PlatformIO handles firmware flashing; Next.js, Supabase, and small scripts handle the cloud side.

Components needed

Hardware and software roles.

Pocket Brain

ESP32-C3

Runs timing, Wi-Fi, mood state, button handling, and the route to the brain service.

Need: ESP32-C3 Super Mini board
Wire / route: Owns TFT SPI, I2S audio, touch input, Wi-Fi, and firmware secrets.
Note: Target PlatformIO env: esp32c3_supermini.

Feeling Window

ST7735 TFT display

Shows eyes, mouth shapes, glances, listening, thinking, talking, bashful, and the wider mood set.

Need: 1.8 inch 160x128 RGB565 ST7735 display
Wire / route: CS GP7, RESET GP10, DC GP3, MOSI GP6, SCK GP4; VCC and LED to 3V3.
Note: The face is rendered in a small double-buffered framebuffer.

Listening Sprout

I2S microphone

Makes the listening state real and gives the firmware a voice-input path.

Need: INMP441 I2S microphone
Wire / route: Shares the ESP32-C3 I2S clock (GP0/GP1); data (SD) on GP5.
Note: SD must avoid strapping pins GP8/GP9, or capture reads as silence.

Voice Star

I2S amp + speaker

Plays compact voice output and lets talk animation follow the reply rhythm.

Need: MAX98357A I2S amp plus 8 ohm 1 W speaker
Wire / route: BCLK GP0, LRC GP1, DIN GP2; speaker connects to amp output pads.
Note: Short clips are instant; streamed PCM16 replies come from the brain route.

Kind Button

Touch sensor

Turns tap, hold, and long-hold into distinct personality inputs.

Need: TTP223 capacitive touch sensor
Wire / route: Touch output to GP20, plus VCC and GND.
Note: The top touch pad is the main physical interaction.

Wonder Heart

Memory service

Stores preferences, recent moments, and context for more coherent behavior.

Need: Next.js brain routes with Supabase-backed memory
Wire / route: ESP32 posts to the deployed origin with its fingerprint header.
Note: Inspired by Garry Tan's GBrain-style recall and enrichment.

Stage

Power and wiring

Keeps the tiny cast stable enough that animation and audio do not brown out.

Need: USB-C data cable, 3V3/GND bundles, jumpers, and a reliable 1 A supply
Wire / route: Display, mic, touch, amp, and board share planned power and ground rails.
Note: Build one subsystem at a time so wiring bugs are easy to isolate.

Voice pipeline

How a spoken question becomes a spoken answer.

Hold the touch pad, speak, and let go. Six stages turn your voice into BMO's voice — each one has a visible behavior and one implementation detail that keeps it reliable on tiny hardware.

01 · Capture

Hold to talk

A long press (about half a second) starts a walkie-talkie recording that runs until you let go, with a short grace window so a finger flicker or a pause between words does not cut you off early. Length is bounded near six seconds by the chip memory the secure connection also needs.

Under the hood: INMP441 mic, decimated to ~5.3 kHz mono PCM so a fixed 64 KB buffer holds ~6s, recorded straight into the request buffer with no extra copies.
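
The ~6 s figure follows from the buffer size and sample rate. The arithmetic below assumes 16-bit samples and a 16 kHz source decimated by 3; the wiki states only ~5.3 kHz and 64 KB, so both are assumptions. The decimate helper is a crude averaging sketch, not the firmware's filter.

```python
BUFFER_BYTES = 64 * 1024
BYTES_PER_SAMPLE = 2           # assumed: 16-bit PCM
SAMPLE_RATE_HZ = 16000 / 3     # assumed: 16 kHz decimated by 3, about 5.3 kHz


def capture_seconds(buffer_bytes, sample_rate_hz, bytes_per_sample=BYTES_PER_SAMPLE):
    """How many seconds of mono audio fit in the buffer."""
    return buffer_bytes / (bytes_per_sample * sample_rate_hz)


def decimate(samples, factor):
    """Average each full group of `factor` samples into one output sample."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) // factor for i in range(0, usable, factor)]


print(round(capture_seconds(BUFFER_BYTES, SAMPLE_RATE_HZ), 3))
print(decimate([3, 3, 3, 6, 6, 6, 9], 3))
```

Under these assumptions the buffer holds 32,768 samples, which is about 6.1 seconds of speech.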

02 · Auto-gain

Make it loud enough

The mic records very quietly, so the firmware measures the loudest sample and scales the whole clip up before sending. Quiet trailing words survive instead of getting dropped.

Under the hood: Peak-normalize toward ~67% full-scale, gain capped at 48x. Without it, speech-to-text loses the end of sentences.
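
Peak normalization with a gain cap can be sketched as below, using the 67% target and 48x cap from the line above. The sketch assumes 16-bit samples, and the function name auto_gain is hypothetical. As written it would also attenuate a clip hotter than the target; whether the firmware does that is not stated.

```python
FULL_SCALE = 32767
TARGET_PEAK = int(FULL_SCALE * 0.67)   # about 67% of full scale
MAX_GAIN = 48.0


def auto_gain(samples):
    """Scale a clip so its loudest sample approaches TARGET_PEAK, capped at MAX_GAIN."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples), 1.0          # pure silence: nothing to scale
    gain = min(TARGET_PEAK / peak, MAX_GAIN)
    scaled = [max(-32768, min(32767, round(s * gain))) for s in samples]
    return scaled, gain


quiet, quiet_gain = auto_gain([100, -200, 50])      # very quiet: hits the cap
normal, normal_gain = auto_gain([1000, -2000])      # reaches the target peak
print(quiet, quiet_gain)
print(normal, round(normal_gain, 3))
```

The cap matters for near-silent clips: without it, background noise would be amplified all the way to the target level.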

03 · Send

One authenticated POST

The clip is wrapped as a WAV and posted to the brain route with the device fingerprint header. No account login lives on the device — only the rotatable fingerprint.

Under the hood: multipart/form-data to /api/brain, X-BMO-Fingerprint header; the server stores only the hash.
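
Wrapping raw PCM as a WAV only adds a small header in front of the samples. The sketch below uses Python's standard wave module to show the shape of the payload; the firmware builds its header in C++, and the 5333 Hz rate here is an assumed stand-in for the ~5.3 kHz capture rate.

```python
import io
import struct
import wave


def pcm16_to_wav(samples, sample_rate_hz: int) -> bytes:
    """Wrap signed 16-bit mono samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)              # mono
        wav.setsampwidth(2)              # 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()


wav_bytes = pcm16_to_wav([0, 1000, -1000], 5333)
print(wav_bytes[:4], len(wav_bytes))
```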

04 · Understand

Speech to text to thought

The cloud transcribes the audio, recalls relevant memory and recent turns, and asks the language model for a short in-character reply. If the question needs current facts, BMO can search the web first. If no speech was heard, it answers with a gentle "say that again" instead of guessing.

Under the hood: STT then LLM with the soul prompt, child profile, recent conversation, and semantic recall folded in; OpenRouter web plugin grounds current-fact questions when the web_search skill is on.

05 · Speak

Read the reply verbatim

The reply text is sent to a dedicated text-to-speech model that reads it exactly as written, so the spoken audio can never improvise a different answer than the one shown in the activity log.

Under the hood: Dedicated /audio/speech TTS reads the text verbatim (the chat-audio model is reserved for singing). Reply streamed back as PCM16; web-search citations and URLs are stripped before it is spoken.

06 · Play & lip-sync

Mouth follows the voice

Audio streams to the speaker chunk by chunk while a loudness meter drives the mouth, so BMO looks like it is really speaking instead of playing a sound over a frozen face.

Under the hood: The chunked-transfer reply is de-chunked and the WAV header skipped on-device (the fix for the old "sssk" static), then downsampled from 24 kHz to 16 kHz for I2S; a fast-attack envelope feeds the talking-mouth animation.
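
A fast-attack envelope follower rises quickly when audio gets louder and falls more slowly when it gets quieter, which keeps the mouth from flickering. The sketch below shows the idea on per-chunk loudness values; the attack and release coefficients and the function names are assumptions, not the firmware's tuning.

```python
def chunk_level(samples):
    """Mean absolute amplitude of a 16-bit chunk, normalised to 0..1."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / (len(samples) * 32768)


def follow_envelope(levels, attack=0.8, release=0.2):
    """Smooth a sequence of levels: move fast when rising, slowly when falling."""
    env = 0.0
    out = []
    for level in levels:
        coeff = attack if level > env else release
        env += coeff * (level - env)
        out.append(env)
    return out


# One loud chunk followed by silence: the mouth opens at once, then eases shut.
print(follow_envelope([1.0, 0.0, 0.0]))
```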

Why BMO reads the reply exactly

The voice model is a chat-audio model, not a plain read-aloud engine. Given a persona and a message it tends to answer in character instead of reading the text — so BMO could say something other than the logged reply. The fix sends the reply wrapped as a strict "read this verbatim" script, so the spoken audio always matches the reply text.

Why the mic data pin matters

The microphone data line must avoid the ESP32-C3 boot strapping pins (GP8 / GP9). On a strapping pin the input reads stuck-high, so capture is pure silence and speech-to-text returns nothing. The mic SD pin lives on GP5, a free pin, so audio is captured cleanly.

Core brain (gbrain-inspired)

What BMO adopted from gbrain.

BMO reproduces the load-bearing ideas of Garry Tan's GBrain as real code on its own stack (Supabase pgvector + OpenRouter), not as inert skill files. Each capability below is a working module, with the file that implements it.

capture + brain-first recall

Persistent memory

Every exchange is embedded and written down, then recalled by meaning before BMO answers — so follow-ups stay on topic and the brain grows the more BMO is used.

Module: lib/brain.ts · match_brain_memory

think (synthesis + gap analysis)

Reasoned recall

Beyond fetching memories, BMO can compose a single cited answer and honestly flag what it does not know yet.

Module: lib/brain/synthesize.ts

self-wiring knowledge graph + enrich

Connected memories

People, places, and topics become entities with typed links, so BMO can reach facts that plain similarity search misses.

Module: lib/brain/graph.ts · entities.ts

enrich the entity over time

Child profile

Durable facts about the child (name, favorites, fears) are distilled from conversations and updated in place — learned, never hardcoded, newest value wins.

Module: lib/brain/profile.ts

salience + dedup

Importance & tidy-up

Memories are scored for importance and near-duplicates are found, so what matters is kept and clutter is pruned.

Module: lib/brain/salience.ts

24/7 dream cycle (maintain)

Dream cycle

A scheduled offline pass consolidates, de-duplicates, and re-scores memory so recall quality improves over time with no human in the loop.

Module: lib/brain/consolidate.ts · /api/brain/dream

find_trajectory / timeline

Timeline

A temporal view of memory — what happened, in what order, and how a topic evolved across time.

Module: lib/brain/timeline.ts

hybrid search

Hybrid search

Vector similarity and keyword search are fused with reciprocal-rank fusion for results that beat either signal alone.

Module: lib/brain/search.ts
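
Reciprocal-rank fusion scores each result by summing 1 / (k + rank) over every ranking it appears in. The sketch below is a generic version, not the contents of lib/brain/search.ts; k = 60 is a commonly used constant, and the value the repo uses is not stated.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]     # order from vector similarity
keyword_hits = ["b", "d", "a"]    # order from keyword search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Items that appear in both lists ("a" and "b") outrank items found by only one signal, which is the behaviour the fusion is meant to provide.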

gbrain doctor / skillpack-check

Brain health

Built-in checks (table reachable, memories present, embeddings present, recall working) roll up into a single health score.

Module: lib/brain/doctor.ts

dream-cycle idea, applied to a toy

Random thoughts

When played with, BMO thinks out loud on its own: it recalls what it knows, muses one short line in its own voice, speaks it, and remembers the thought — a self-feeding inner life. gbrain has no named skill for this; it is their dream-cycle idea made spontaneous.

Module: lib/thoughts.ts · /api/brain/idle-thought

Signal flow

From input to personality.

An event arrives

A touch on the TTP223 pad (tap, hold, long-hold) or a microphone capture creates one small input event for the firmware to react to.

Firmware turns it into a mood

The ESP32-C3 debounces and classifies the event, then maps it to a readable mood state so the screen changes immediately — no waiting on the network.

The face explains the wait

Listening, thinking, and talking are separate animated states, so during cloud work the device looks busy and alive instead of frozen.

Local or cloud decides the response

Quick gestures play instant baked clips on-device. A held-touch question goes to the cloud brain — the full voice round-trip is detailed in the Voice section above.

Expression states

The face is the status light.

Touch

A tap becomes a mood.

The touch pad wakes the face first, then lets BMO answer with a small, readable reaction.

tiny eye dart, gasp mouth, then a soft return to idle
TTP223 -> mood trigger

Listen

Listening has its own face.

While the mic records you, the face shows a live, glitchy "tuned-in" state so the device clearly looks like it is hearing you.

X-eyes, an open alert mouth, listening marks, and a subtle screen glitch
I2S mic -> brain route

Think

A pause can feel alive.

During the network round-trip the mouth becomes a pulsing processing orb so the wait reads as computing, not freezing.

focused squint eyes, a breathing orb mouth, and an occasional glitch flicker
ESP32-C3 -> brain API

Talk

The mouth follows the voice.

Real lip-sync drives the mouth from the live audio loudness, with open, lively eyes — BMO looks like it is actually speaking.

mouth opening tracks the streamed voice; soft slow blink
reply audio -> live envelope

Hold

Long hold goes bashful.

The long press becomes a shy expression instead of a loud particle explosion.

lowered eyes, small smile tilt, restrained cheeks
long hold -> shy loop

Build notes

The rules that keep BMO small.

Firmware target

PlatformIO builds the ESP32-C3 Super Mini firmware. The live target is the esp32c3_supermini environment.

Display budget

The face renders into a 160x128 RGB565 buffer, roughly 40 KB, then flushes over hardware SPI.
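
The 40 KB figure is width times height times two bytes per RGB565 pixel. The sketch below works through that arithmetic and shows how an 8-bit RGB colour packs into 16 bits; the 30 fps rate comes from the expression engine description, and the helper name rgb565 is illustrative.

```python
WIDTH, HEIGHT = 160, 128
BYTES_PER_PIXEL = 2   # RGB565: 5 bits red, 6 bits green, 5 bits blue


def rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B into one 16-bit RGB565 value."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


framebuffer_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
spi_bits_per_second = framebuffer_bytes * 8 * 30   # full-frame flush at 30 fps

print(framebuffer_bytes)          # bytes in one frame
print(spi_bits_per_second)        # raw pixel bandwidth, excluding SPI overhead
print(hex(rgb565(255, 0, 0)))
```

At 30 full frames per second the pixel data alone is just under 10 Mbit/s.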

Voice budget

Use tiny baked clips for instant reactions and streamed PCM16 for generated replies, so the ESP32 does not store big audio.

Secrets

Wi-Fi, deployed origin, and the plaintext device fingerprint stay in the gitignored firmware secrets file.

Public/private split

The landing and wiki are public. Operator controls stay unlisted, login-protected, and outside the sitemap.

References

Code and memory inspiration.

The project repo is the source of truth for this build. GBrain is referenced as inspiration for the memory-shaped brain idea.

BMO-ESP32 repo · garrytan/gbrain