BMO Wiki
Start from an empty desk and end with a live ESP32 companion: public landing, build wiki, private operator controls, Supabase memory, Vercel deploy, firmware pairing, voice, and smoke tests.
Zero to live
The complete BMO launch quest.
Follow these phases in order. Each one ends with a proof check, so you always know whether the build is ready for the next door.
| Dashboard env var | Value | Scope |
| --- | --- | --- |
| NEXT_PUBLIC_SUPABASE_URL | Supabase project URL | public |
| NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY | Supabase publishable key | public |
| SUPABASE_SECRET_KEY | Supabase secret key | server only |
| OPENROUTER_API_KEY | OpenRouter API key | server only |
| AUTH_SESSION_SECRET | openssl rand -hex 32 | server only |
| CRON_SECRET | openssl rand -hex 32 (guards the dream-cycle cron) | server only |
Local tools
Node.js 20.18+, npm, Git, PlatformIO, Python 3.10+, and ffmpeg if you regenerate voice assets.
Cloud accounts
GitHub for the repo, Vercel for the Next.js deploy, Supabase for persistence, and OpenRouter for STT, LLM, and TTS.
Network
A 2.4 GHz Wi-Fi network. The ESP32-C3 cannot join 5 GHz Wi-Fi, and a weak signal (low RSSI) will make voice feel slow.
Safety habit
Keep one scratch buffer for temporary secrets. Never commit firmware .env, include/secrets.h, API keys, or plaintext fingerprints.
Cloud ingredients
OpenRouter, Supabase, Vercel, firmware.
| Ingredient | What you need | Where it goes | Proof |
| --- | --- | --- | --- |
| OpenRouter | Account, API key, and a small credit balance. | OPENROUTER_API_KEY | The credits endpoint returns remaining balance. |
| Supabase | Project URL, publishable key, secret key, schema, and seed. | NEXT_PUBLIC_SUPABASE_URL, NEXT_PUBLIC_SUPABASE_PUBLISHABLE_KEY, SUPABASE_SECRET_KEY | Tables exist, RLS is enabled, and the config seed row exists. |
| Vercel | Project rooted at dashboard/ with all env vars applied. | Dashboard env vars | Production deployment builds and /wiki returns 200. |
| Firmware | Wi-Fi, deployed origin, and device fingerprint. | firmware/bmo_face_anim/.env | The pre-build hook renders include/secrets.h, then upload succeeds. |
Firmware .env
The six values that become secrets.h.
The pre-build hook reads this gitignored file and renders include/secrets.h before PlatformIO compiles.
| Key | Value |
| --- | --- |
| WIFI_SSID | Primary 2.4 GHz Wi-Fi name |
| WIFI_PASS | Primary Wi-Fi password |
| WIFI_SSID2 | Optional fallback Wi-Fi name |
| WIFI_PASS2 | Optional fallback Wi-Fi password |
| DASHBOARD_URL | Production origin, no trailing slash |
| FINGERPRINT | Plaintext fingerprint from onboarding or rotation |
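The rendering step can be pictured with a small Python sketch (PlatformIO pre-build hooks are Python scripts). This is not the repo's actual hook: the function names, the generated macro names (assumed here to match the .env keys), and the escaping rules are illustrative assumptions.

```python
# Hypothetical sketch of a .env -> secrets.h renderer; not the project's real hook.
KEYS = ("WIFI_SSID", "WIFI_PASS", "WIFI_SSID2", "WIFI_PASS2", "DASHBOARD_URL", "FINGERPRINT")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def c_string(value: str) -> str:
    """Escape backslashes and double quotes so the value is a valid C string literal."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_secrets_h(env: dict[str, str]) -> str:
    """Render one #define per key; missing optional keys become empty strings."""
    lines = ["#pragma once", "// Generated from .env. Do not commit."]
    for key in KEYS:
        value = env.get(key, "")
        if key == "DASHBOARD_URL":
            value = value.rstrip("/")  # the origin must have no trailing slash
        lines.append(f"#define {key} {c_string(value)}")
    return "\n".join(lines) + "\n"
```

Rendering missing fallback keys as empty strings lets firmware code test for an empty SSID instead of needing conditional compilation, which is one reasonable design among several.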
Ports, GPIO, volts
Wire BMO like this.
This is the current ESP32-C3 firmware pin map. Keep every ground common, keep logic at 3.3V, and avoid GP8/GP9 for microphone data.
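| Module | Module pin | Connect to | Level | Note |
| --- | --- | --- | --- | --- |
| Power | USB-C / 5V | USB power or 5V/VIN | 5V input | Use a real data cable for flashing; keep grounds common. |
| Power | 3V3 | ESP32 3V3 | 3.3V | Display, mic, and touch logic live on 3.3V. Never feed 5V into GPIO. |
| ST7735 TFT | VCC | 3V3 | 3.3V | Use the same ground as the ESP32. |
| ST7735 TFT | GND | GND | 0V | Common ground. |
| ST7735 TFT | LED / BLK | 3V3 | 3.3V | Backlight on. Add control later only if you need dimming. |
| ST7735 TFT | CS | GP7 | 3.3V logic | Chip select for the TFT. |
| ST7735 TFT | RST / RESET | GP10 | 3.3V logic | Panel reset. |
| ST7735 TFT | DC / A0 | GP3 | 3.3V logic | Command/data select. |
| ST7735 TFT | SDA / MOSI | GP6 | 3.3V logic | SPI data to screen. |
| ST7735 TFT | SCK / SCL | GP4 | 3.3V logic | SPI clock. |
| MAX98357A amp | VIN / VCC | 3V3 preferred | 3.3V or supported 5V | 3.3V is safest for a single-rail build. If using 5V on amp VIN, keep I2S logic 3.3V and grounds common. |
| MAX98357A amp | GND | GND | 0V | Common ground with ESP32 and speaker amp. |
| MAX98357A amp | BCLK | GP0 | 3.3V logic | Shared I2S bit clock with the mic. |
| MAX98357A amp | LRC / WS | GP1 | 3.3V logic | Shared I2S word-select clock with the mic. |
| MAX98357A amp | DIN | GP2 | 3.3V logic | I2S audio data into the amp. |
| MAX98357A amp | SPK+ / SPK- | Amp output pads | speaker output | Do not connect the speaker directly to ESP32 GPIO. |
| INMP441 mic | VDD | 3V3 | 3.3V | Do not power the mic from 5V. |
| INMP441 mic | GND | GND | 0V | Common ground. |
| INMP441 mic | SCK / BCLK | GP0 | 3.3V logic | Shared I2S clock from the ESP32. |
| INMP441 mic | WS / LRCL | GP1 | 3.3V logic | Shared I2S word-select from the ESP32. |
| INMP441 mic | SD / DOUT | GP5 | 3.3V logic | Critical: do not use GP8 or GP9. They are boot strapping pins and caused silent captures. |
| INMP441 mic | L/R | GND | 0V | Sets the mic slot. Use the tickle mic self-test if the other slot is louder. |
| TTP223 touch | VCC | 3V3 | 3.3V | Keeps touch output ESP32-safe. |
| TTP223 touch | GND | GND | 0V | Common ground. |
| TTP223 touch | OUT / SIG | GP20 | 3.3V logic | Firmware uses INPUT_PULLDOWN to avoid phantom touches. |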
Open the workbench
Start with the repo, tools, and one scratch buffer for values you will paste later.
- Clone the BMO-ESP32 repo.
- Install Node.js 20.18 or newer for the web app.
- Install PlatformIO for the ESP32-C3 firmware.
- Install Python 3.10+ and ffmpeg if you plan to regenerate voice clips.
- Create or prepare Supabase, Vercel, OpenRouter, and GitHub accounts.
git clone https://github.com/AlleyBo55/BMO-ESP32.git
cd BMO-ESP32/dashboard
npm install
The dashboard dependencies install, and PlatformIO can see the esp32c3_supermini environment.
Gather BMO parts
Build the physical cast before the cloud brain enters the story.
- ESP32-C3 Super Mini board.
- ST7735 1.8 inch 160x128 TFT display.
- INMP441 I2S microphone.
- MAX98357A I2S amp plus 8 ohm 1 W speaker.
- TTP223 capacitive touch sensor, USB-C data cable, jumpers, and stable power.
You can flash a tiny firmware sketch and open the serial monitor without driver problems.
Create the memory room
Supabase stores admin state, config, activity logs, auth attempts, and BMO memory.
- Create a Supabase project close to your deploy region.
- Run dashboard/supabase/schema.sql in the SQL editor.
- Run dashboard/supabase/seed.sql after the schema succeeds.
- Copy the project URL, publishable key, and secret key.
# Run these files in Supabase SQL Editor
dashboard/supabase/schema.sql
dashboard/supabase/seed.sql
The admin, config, activity_log, and auth_attempts tables exist with row-level security enabled.
Light the web portal
Deploy the Next.js app as the public landing, wiki, and private operator console.
- Import the GitHub repo into Vercel.
- Set the Vercel root directory to dashboard.
- Add the dashboard environment variables for Production, Preview, and Development.
- Mark every server-only value as sensitive.
- Make sure OpenRouter has a credit balance before testing voice.
openssl rand -hex 32
# paste that value into AUTH_SESSION_SECRET; run it again for CRON_SECRET
The first Vercel deploy builds green and the production URL loads.
Onboard the operator
Create the first admin and generate the one-time device fingerprint.
- Visit the production URL.
- Complete the onboarding form with a username and strong password.
- Leave fingerprint blank unless you already have a high-entropy value.
- Copy the plaintext fingerprint immediately; it is shown once.
Login works, and your scratch buffer has the production URL plus plaintext fingerprint.
Pair the tiny brain
Give the ESP32-C3 Wi-Fi, the deployed origin, and its rotatable fingerprint.
- Copy firmware/bmo_face_anim/.env.example to firmware/bmo_face_anim/.env.
- Fill Wi-Fi SSID, Wi-Fi password, optional fallback Wi-Fi, deployed origin, and BMO fingerprint.
- Flash the firmware to the ESP32-C3.
- Open the serial monitor and watch for Wi-Fi plus brain readiness.
cd firmware/bmo_face_anim
cp .env.example .env
$EDITOR .env
pio run -e esp32c3_supermini -t upload
pio device monitor -e esp32c3_supermini
Serial output shows Wi-Fi connected and the brain client ready.
Prove the bridge
Confirm the paired device can reach the cloud and random callers cannot.
- Call the credits endpoint with the fingerprint header.
- Call the same endpoint without the header.
- Keep the 200 and 401 results as your first launch proof.
curl -i \
  -H "X-BMO-Fingerprint: <paste-the-fingerprint>" \
  https://your-bmo-site.vercel.app/api/openrouter/credits
curl -i https://your-bmo-site.vercel.app/api/openrouter/credits
The authenticated request returns 200; the request without the fingerprint returns 401.
Run the voice smoke test
Exercise the whole loop: touch, mic, cloud brain, memory, voice, speaker, activity log.
- Hold the touch button and say: tell me a story.
- Release the button and watch the face move from listening to thinking to talking.
- Confirm audio plays from the speaker.
- Check the activity log for input_text, reply_text, status ok, and total timing.
BMO answers out loud, the mouth moves with the audio, and the activity row records the exchange.
Before you call it live.
Features
What the project supports.
Each feature has a visible behavior, a hardware or software owner, and one implementation rule that keeps the tiny device believable.
Expression engine
The firmware owns the face library: idle, touch, listen, thinking, talking, laughing, bashful, and the wider mood set.
- User sees: BMO blinks, glances, smiles, talks, reacts to touch, and never feels like a static image.
- Implementation: A 160x128 RGB565 back-buffer is composed at around 30 fps and flushed over hardware SPI.
Touch language
One capacitive pad becomes several gestures instead of a single boring button press.
- User sees: Tap gives a small surprise, hold opens listening, long-hold gets shy, and rapid taps can become a laugh moment.
- Implementation: The firmware debounces the touch line, classifies press timing, and maps it to mood plus audio behavior.
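The timing classification described above can be sketched as follows. The real firmware is C++ and its thresholds are not documented here; only the roughly half-second hold comes from this page, and the debounce, long-hold, and laugh-burst numbers are placeholder assumptions.

```python
# Hypothetical thresholds, in milliseconds. Only HOLD_MS (~0.5 s) is taken from this page.
DEBOUNCE_MS = 30
HOLD_MS = 500
LONG_HOLD_MS = 3000


def classify_press(duration_ms: int) -> str:
    """Map how long the pad was held to a gesture name."""
    if duration_ms < DEBOUNCE_MS:
        return "noise"      # too short to be a real touch
    if duration_ms < HOLD_MS:
        return "tap"
    if duration_ms < LONG_HOLD_MS:
        return "hold"
    return "long_hold"


def is_laugh_burst(tap_times_ms: list[int], window_ms: int = 1200, min_taps: int = 3) -> bool:
    """True when the most recent min_taps taps all landed within window_ms."""
    if len(tap_times_ms) < min_taps:
        return False
    recent = tap_times_ms[-min_taps:]
    return recent[-1] - recent[0] <= window_ms
```

Keeping the classifier a pure function of duration makes it easy to test off-device before wiring it to the GPIO interrupt or polling loop.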
Voice loop
BMO moves from listening to thinking to talking, so voice input has a visible status at every step — and the talking mouth lip-syncs to the reply.
- User sees: The face shows a glitchy listening state while recording, a pulsing thinking orb during the round-trip, and a lip-synced mouth while the reply plays.
- Implementation: A held touch captures auto-gained, low-rate mono audio (decimated so a fixed buffer holds ~6 s) and sends it to the brain route; the streamed PCM16 reply feeds the I2S speaker while a live envelope drives the mouth.
Tiny voice pack
Short local clips keep instant reactions fast, while streamed replies cover open conversation.
- User sees: BMO greets you instantly (and lip-syncs) on a normal touch, and still speaks longer generated replies when the brain route answers.
- Implementation: Greetings are generated once in BMO’s real voice, downsampled, and baked into the firmware as 4-bit ADPCM; generated replies are streamed to avoid storing large audio.
Memory core
A gbrain-inspired layer gives BMO short-term conversation continuity plus a durable, self-updating profile of the child.
- User sees: Follow-ups make sense (apple → “red” stays on topic), and BMO remembers the child’s name, which is learned, never hardcoded, with the newest value winning.
- Implementation: Recent turns are replayed as chat history; stable facts are upserted by key and recalled by vector similarity before each reply. The layer degrades gracefully if memory is unavailable.
Live web search
For questions that depend on current facts, BMO can look things up on the web instead of guessing from stale training data.
- User sees: Ask "who is the president now" or "what is the weather today" and BMO answers from fresh results, not an out-of-date memory.
- Implementation: When the web_search skill is on, the brain call enables the OpenRouter web plugin and nudges the model to trust live results; citations and URLs are stripped before the reply is spoken.
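Stripping citations and URLs before speech can be approximated with a few regular expressions. This is a sketch of the idea, not the brain route's code (which lives in the TypeScript dashboard); the function name and the exact citation formats handled (numeric brackets and markdown links) are assumptions.

```python
import re

# Assumed citation shapes: "[1]" markers, markdown links, and bare http(s) URLs.
_MD_LINK = re.compile(r"\[([^\]]+)\]\(https?://[^)\s]+\)")
_BRACKET_CITE = re.compile(r"\[\d+\]")
_BARE_URL = re.compile(r"https?://\S+")


def clean_for_speech(text: str) -> str:
    """Remove citations and URLs so the TTS voice never reads them aloud."""
    text = _MD_LINK.sub(r"\1", text)            # keep the link label, drop the URL
    text = _BRACKET_CITE.sub("", text)          # drop numeric citation markers
    text = _BARE_URL.sub("", text)              # drop bare URLs
    text = re.sub(r"\s+([.,!?])", r"\1", text)  # re-attach orphaned punctuation
    return re.sub(r"\s{2,}", " ", text).strip()
```

For example, `clean_for_speech("It is sunny today [1]. See [the forecast](https://example.test/w) or https://example.test/x now.")` returns `"It is sunny today. See the forecast or now."`.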
Editable personality
BMO’s persona, tone, and reply language all come from one editable soul prompt — the single source of truth.
- User sees: Change the soul on the dashboard to make BMO talk differently or reply in another language; it takes effect on the next request.
- Implementation: The brain route sends the soul verbatim as the system prompt (plus only the live clock). No hardcoded language or style clamp overrides it.
Secure cloud bridge
The device can talk to the cloud brain without making the private operator controls public.
- User sees: A paired BMO can call the brain route; an unpaired request gets rejected.
- Implementation: Firmware sends an X-BMO-Fingerprint header while the server stores only the hashed value.
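The store-only-the-hash pattern looks roughly like this. The hash algorithm is not stated on this page, so SHA-256 is an assumption, as are the function names; the 200 and 401 outcomes match the launch proof check.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def new_fingerprint() -> str:
    """Generate a high-entropy plaintext fingerprint (shown to the operator once)."""
    return secrets.token_hex(32)


def hash_fingerprint(fingerprint: str) -> str:
    """Assumption: SHA-256 hex digest. Only this value is stored server-side."""
    return hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()


def is_paired(header_value: Optional[str], stored_hash: str) -> bool:
    """Compare hashes in constant time so timing does not leak how close a guess was."""
    if not header_value:
        return False
    return hmac.compare_digest(hash_fingerprint(header_value), stored_hash)


def status_for(header_value: Optional[str], stored_hash: str) -> int:
    """200 for the paired device, 401 for everyone else."""
    return 200 if is_paired(header_value, stored_hash) else 401
```

Because the server keeps only the digest, a database leak does not reveal a usable header value, and rotating the fingerprint only requires replacing one stored hash.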
Private operator console
The public homepage explains the build, while the admin controls stay intentionally unlisted.
- User sees: Visitors see the landing and wiki; the operator still has a private route for tuning the device.
- Implementation: The operator route remains behind Supabase login and is excluded from public navigation and sitemap.
Open build kit
The project is meant to be hackable: firmware, voice tools, wiring notes, and the web brain live in the repo.
- User sees: A builder can inspect the code, flash the device, replace clips, and iterate on the shell.
- Implementation: PlatformIO handles firmware flashing; Next.js, Supabase, and small scripts handle the cloud side.
Components needed
Hardware and software roles.
ESP32-C3
Runs timing, Wi-Fi, mood state, button handling, and the route to the brain service.
- Need: ESP32-C3 Super Mini board
- Wire / route: Owns TFT SPI, I2S audio, touch input, Wi-Fi, and firmware secrets.
- Note: Target PlatformIO env: esp32c3_supermini.
ST7735 TFT display
Shows eyes, mouth shapes, glances, listening, thinking, talking, bashful, and the wider mood set.
- Need: 1.8 inch 160x128 RGB565 ST7735 display
- Wire / route: CS GP7, RESET GP10, DC GP3, MOSI GP6, SCK GP4; VCC and LED to 3V3.
- Note: The face is rendered in a small double-buffered framebuffer.
I2S microphone
Makes the listening state real and gives the firmware a voice-input path.
- Need: INMP441 I2S microphone
- Wire / route: Shares the ESP32-C3 I2S clock (GP0/GP1); data (SD) on GP5.
- Note: SD must avoid strapping pins GP8/GP9, or capture reads as silence.
I2S amp + speaker
Plays compact voice output and lets talk animation follow the reply rhythm.
- Need: MAX98357A I2S amp plus 8 ohm 1 W speaker
- Wire / route: BCLK GP0, LRC GP1, DIN GP2; speaker connects to amp output pads.
- Note: Short clips are instant; streamed PCM16 replies come from the brain route.
Touch sensor
Turns tap, hold, and long-hold into distinct personality inputs.
- Need: TTP223 capacitive touch sensor
- Wire / route: Touch output to GP20, plus VCC and GND.
- Note: The top touch pad is the main physical interaction.
Memory service
Stores preferences, recent moments, and context for more coherent behavior.
- Need: Next.js brain routes with Supabase-backed memory
- Wire / route: ESP32 posts to the deployed origin with its fingerprint header.
- Note: Inspired by Garry Tan's GBrain-style recall and enrichment.
Power and wiring
Keeps the tiny cast stable enough that animation and audio do not brown out.
- Need: USB-C data cable, 3V3/GND bundles, jumpers, and a reliable 1 A supply
- Wire / route: Display, mic, touch, amp, and board share planned power and ground rails.
- Note: Build one subsystem at a time so wiring bugs are easy to isolate.
Voice pipeline
How a spoken question becomes a spoken answer.
Hold the touch pad, speak, and let go. Six stages turn your voice into BMO's voice — each one has a visible behavior and one implementation detail that keeps it reliable on tiny hardware.
Hold to talk
A long press (about half a second) starts a walkie-talkie recording that runs until you let go, with a short grace window so a finger flicker or a pause between words does not cut you off early. Length is bounded near six seconds by the chip memory the secure connection also needs.
- Under the hood: INMP441 mic, decimated to ~5.3 kHz mono PCM so a fixed 64 KB buffer holds ~6 s, recorded straight into the request buffer with no extra copies.
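The six-second bound follows from simple arithmetic. The sketch below assumes 16-bit (2-byte) samples, which this page does not state explicitly, and ignores the few dozen bytes a WAV header would take.

```python
BUFFER_BYTES = 64 * 1024   # fixed capture buffer from the text above
SAMPLE_RATE_HZ = 5300      # "~5.3 kHz" after decimation
BYTES_PER_SAMPLE = 2       # assumption: 16-bit mono PCM


def capture_seconds(buffer_bytes: int, sample_rate_hz: int, bytes_per_sample: int) -> float:
    """How many seconds of mono audio fit in the buffer."""
    return buffer_bytes / (sample_rate_hz * bytes_per_sample)


# 65536 / (5300 * 2) is about 6.18 seconds at the decimated rate.
decimated = capture_seconds(BUFFER_BYTES, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)

# The same buffer at 16 kHz would hold only about 2.05 seconds,
# which is why the capture is decimated before it is stored.
full_rate = capture_seconds(BUFFER_BYTES, 16_000, BYTES_PER_SAMPLE)
```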
Make it loud enough
The mic records very quietly, so the firmware measures the loudest sample and scales the whole clip up before sending. Quiet trailing words survive instead of getting dropped.
- Under the hood: Peak-normalize toward ~67% full-scale, gain capped at 48x. Without it, speech-to-text loses the end of sentences.
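Peak normalization with a gain cap can be sketched in a few lines. The 67% target and 48x cap come from the text above; the function name, the 16-bit sample range, and the choice never to attenuate (gain floor of 1.0) are assumptions. The firmware itself is C++.

```python
def peak_normalize(samples: list[int], target: float = 0.67, max_gain: float = 48.0) -> list[int]:
    """Scale 16-bit samples so the loudest one lands near target * full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)                 # pure silence: nothing to scale
    wanted = target * 32767 / peak
    gain = max(1.0, min(max_gain, wanted))   # assumption: only boost, never attenuate
    # Clamp to the int16 range in case rounding pushes a sample past full scale.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]
```

The cap matters for near-silent clips: without it, a buffer containing only background hiss would be amplified to the same loudness as speech.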
One authenticated POST
The clip is wrapped as a WAV and posted to the brain route with the device fingerprint header. No account login lives on the device — only the rotatable fingerprint.
- Under the hood: multipart/form-data to /api/brain, X-BMO-Fingerprint header; the server stores only the hash.
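Wrapping raw PCM as a WAV means prepending the standard 44-byte RIFF header. The sketch below shows that layout for 16-bit mono audio; the function names are illustrative, and the multipart form field name is not given on this page, so the upload itself is left out.

```python
import struct


def wav_header(num_samples: int, sample_rate: int, channels: int = 1, bits: int = 16) -> bytes:
    """Build the canonical 44-byte PCM WAV header (little-endian)."""
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    data_bytes = num_samples * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_bytes, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits,  # 1 = PCM
        b"data", data_bytes,
    )


def wrap_pcm16(samples: list[int], sample_rate: int) -> bytes:
    """Header followed by the samples packed as signed 16-bit little-endian."""
    return wav_header(len(samples), sample_rate) + struct.pack(f"<{len(samples)}h", *samples)
```

On the device the same 44 bytes can be written at the front of the request buffer ahead of time, so the recording lands directly after them with no extra copy.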
Speech to text to thought
The cloud transcribes the audio, recalls relevant memory and recent turns, and asks the language model for a short in-character reply. If the question needs current facts, BMO can search the web first. If no speech was heard, it answers with a gentle "say that again" instead of guessing.
- Under the hood: STT then LLM with the soul prompt, child profile, recent conversation, and semantic recall folded in; OpenRouter web plugin grounds current-fact questions when the web_search skill is on.
Read the reply verbatim
The reply text is sent to a dedicated text-to-speech model that reads it exactly as written, so the spoken audio can never improvise a different answer than the one shown in the activity log.
- Under the hood: Dedicated /audio/speech TTS reads the text verbatim (the chat-audio model is reserved for singing). Reply streamed back as PCM16; web-search citations and URLs are stripped before it is spoken.
Mouth follows the voice
Audio streams to the speaker chunk by chunk while a loudness meter drives the mouth, so BMO looks like it is really speaking instead of playing a sound over a frozen face.
- Under the hood: The chunked-transfer reply is de-chunked and the WAV header skipped on-device (the fix for the old "sssk" static), then downsampled from 24 kHz to 16 kHz for I2S; a fast-attack envelope feeds the talking-mouth animation.
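Two of those steps, the 3:2 downsample and the loudness envelope, can be sketched as below. The firmware is C++ and its actual resampler and coefficients are not documented here: linear interpolation without an anti-alias filter, and the attack and release values, are assumptions chosen for clarity.

```python
def downsample_24k_to_16k(samples: list[int]) -> list[int]:
    """Turn every 3 input samples into 2 outputs by linear interpolation.

    Output n sits at input position 1.5 * n, so outputs alternate between an
    exact input sample and the midpoint of two neighbours.
    """
    out: list[int] = []
    for n in range(len(samples) * 2 // 3):
        i, half = divmod(n * 3, 2)           # position in whole and half samples
        if half:
            out.append((samples[i] + samples[i + 1]) // 2)
        else:
            out.append(samples[i])
    return out


def envelope(samples: list[int], attack: float = 0.6, release: float = 0.1) -> list[float]:
    """Fast-attack, slow-release loudness follower in the range 0.0 to 1.0."""
    level = 0.0
    out: list[float] = []
    for s in samples:
        x = abs(s) / 32768.0
        coeff = attack if x > level else release   # rise quickly, fall slowly
        level += coeff * (x - level)
        out.append(level)
    return out
```

The fast attack makes the mouth open on the first loud sample of a syllable, while the slower release keeps it from flapping shut between closely spaced peaks.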
Why BMO reads the reply exactly
The voice model is a chat-audio model, not a plain read-aloud engine. Given a persona and a message it tends to answer in character instead of reading the text — so BMO could say something other than the logged reply. The fix sends the reply wrapped as a strict "read this verbatim" script, so the spoken audio always matches the reply text.
Why the mic data pin matters
The microphone data line must avoid the ESP32-C3 boot strapping pins (GP8 / GP9). On a strapping pin the input reads stuck-high, so capture is pure silence and speech-to-text returns nothing. The mic SD pin lives on GP5, a free pin, so audio is captured cleanly.
Core brain (gbrain-inspired)
What BMO adopted from gbrain.
BMO reproduces the load-bearing ideas of Garry Tan's GBrain as real code on its own stack (Supabase pgvector + OpenRouter), not as inert skill files. Each capability below is a working module, with the file that implements it.
Persistent memory
Every exchange is embedded and written down, then recalled by meaning before BMO answers — so follow-ups stay on topic and the brain grows the more BMO is used.
- Module: lib/brain.ts · match_brain_memory
Reasoned recall
Beyond fetching memories, BMO can compose a single cited answer and honestly flag what it does not know yet.
- Module: lib/brain/synthesize.ts
Connected memories
People, places, and topics become entities with typed links, so BMO can reach facts that plain similarity search misses.
- Module: lib/brain/graph.ts · entities.ts
Child profile
Durable facts about the child (name, favorites, fears) are distilled from conversations and updated in place — learned, never hardcoded, newest value wins.
- Module: lib/brain/profile.ts
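The "updated in place, newest value wins" rule amounts to an upsert keyed by fact name. This is a language-neutral sketch in Python, not the contents of lib/brain/profile.ts; the `Fact` type, its field names, and the timestamp comparison are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    value: str
    learned_at: float   # hypothetical timestamp of the conversation that taught it


def upsert_fact(profile: dict[str, Fact], key: str, value: str, learned_at: float) -> None:
    """Insert a new fact, or overwrite the existing one if this value is newer."""
    current = profile.get(key)
    if current is None or learned_at >= current.learned_at:
        profile[key] = Fact(value, learned_at)
```

Keying by fact name keeps exactly one row per fact, so a corrected name replaces the old one instead of competing with it at recall time.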
Importance & tidy-up
Memories are scored for importance and near-duplicates are found, so what matters is kept and clutter is pruned.
- Module: lib/brain/salience.ts
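One common way to find near-duplicates is to compare embedding vectors by cosine similarity and flag pairs above a threshold. Whether lib/brain/salience.ts works this way is not stated here; the approach, the 0.95 threshold, and the function names are all assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def near_duplicates(embeddings: dict[str, list[float]], threshold: float = 0.95) -> list[tuple[str, str]]:
    """Return every pair of memory ids whose embeddings are at least threshold-similar."""
    ids = sorted(embeddings)
    pairs: list[tuple[str, str]] = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if cosine(embeddings[first], embeddings[second]) >= threshold:
                pairs.append((first, second))
    return pairs
```

The pairwise loop is quadratic, which is acceptable for a small personal memory store; a larger store would push this comparison into the vector index instead.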
Dream cycle
A scheduled offline pass consolidates, de-duplicates, and re-scores memory so recall quality improves over time with no human in the loop.
- Module: lib/brain/consolidate.ts · /api/brain/dream
Timeline
A temporal view of memory — what happened, in what order, and how a topic evolved across time.
- Module: lib/brain/timeline.ts
Hybrid search
Vector similarity and keyword search are fused with reciprocal-rank fusion for results that beat either signal alone.
- Module: lib/brain/search.ts
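Reciprocal-rank fusion scores each result as the sum of 1 / (k + rank) over every ranking it appears in. The sketch below uses the commonly cited constant k = 60; the value used in lib/brain/search.ts is not stated here, and the function name is illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists into one, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]    # hypothetical vector-similarity order
keyword_hits = ["b", "d", "a"]   # hypothetical keyword-search order
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Here `fused` is `["b", "a", "d", "c"]`: items found by both signals rise above items found by only one, and only ranks are compared, so the two scoring scales never need to be calibrated against each other.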
Brain health
Built-in checks (table reachable, memories present, embeddings present, recall working) roll up into a single health score.
- Module: lib/brain/doctor.ts
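Rolling the four checks into one score could be as simple as the percentage that pass. The check names below come from the text above; equal weighting, the 0 to 100 scale, and the function name are assumptions, since the scoring used by lib/brain/doctor.ts is not described here.

```python
def health_score(checks: dict[str, bool]) -> int:
    """Percentage of checks that passed, 0 to 100 (assumes equal weights)."""
    if not checks:
        return 0
    return round(100 * sum(checks.values()) / len(checks))


example = {
    "table_reachable": True,
    "memories_present": True,
    "embeddings_present": True,
    "recall_working": False,
}
score = health_score(example)
```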
Random thoughts
When played with, BMO thinks out loud on its own: it recalls what it knows, muses one short line in its own voice, speaks it, and remembers the thought — a self-feeding inner life. gbrain has no named skill for this; it is their dream-cycle idea made spontaneous.
- Module: lib/thoughts.ts · /api/brain/idle-thought
Signal flow
From input to personality.
An event arrives
A touch on the TTP223 pad (tap, hold, long-hold) or a microphone capture creates one small input event for the firmware to react to.
Firmware turns it into a mood
The ESP32-C3 debounces and classifies the event, then maps it to a readable mood state so the screen changes immediately — no waiting on the network.
The face explains the wait
Listening, thinking, and talking are separate animated states, so during cloud work the device looks busy and alive instead of frozen.
Local or cloud decides the response
Quick gestures play instant baked clips on-device. A held-touch question goes to the cloud brain — the full voice round-trip is detailed in the Voice section above.
Expression states
The face is the status light.
A tap becomes a mood.
The touch pad wakes the face first, then lets BMO answer with a small, readable reaction.
- tiny eye dart, gasp mouth, then a soft return to idle
- TTP223 -> mood trigger
Listening has its own face.
While the mic records you, the face shows a live, glitchy "tuned-in" state so the device clearly looks like it is hearing you.
- X-eyes, an open alert mouth, listening marks, and a subtle screen glitch
- I2S mic -> brain route
A pause can feel alive.
During the network round-trip the mouth becomes a pulsing processing orb so the wait reads as computing, not freezing.
- focused squint eyes, a breathing orb mouth, and an occasional glitch flicker
- ESP32-C3 -> brain API
The mouth follows the voice.
Real lip-sync drives the mouth from the live audio loudness, with open, lively eyes, so BMO looks like it is actually speaking.
- mouth opening tracks the streamed voice; soft slow blink
- reply audio -> live envelope
Long hold goes bashful.
The long press becomes a shy expression instead of a loud particle explosion.
- lowered eyes, small smile tilt, restrained cheeks
- long hold -> shy loop
Build notes
The rules that keep BMO small.
Firmware target
PlatformIO builds the ESP32-C3 Super Mini firmware. The live target is the esp32c3_supermini environment.
Display budget
The face renders into a 160x128 RGB565 buffer, roughly 40 KB, then flushes over hardware SPI.
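The 40 KB figure is width times height times two bytes per RGB565 pixel. The sketch below checks that arithmetic, shows how one 8-bit-per-channel color packs into 16 bits, and estimates the raw SPI throughput a 30 fps full-frame flush needs (protocol overhead ignored); the function names are illustrative.

```python
WIDTH, HEIGHT = 160, 128
BYTES_PER_PIXEL = 2   # RGB565: 5 bits red, 6 bits green, 5 bits blue


def framebuffer_bytes(width: int, height: int) -> int:
    """Size of one full RGB565 frame in bytes."""
    return width * height * BYTES_PER_PIXEL


def rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit channels into one 16-bit RGB565 value."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


def spi_bits_per_second(fps: int) -> int:
    """Minimum raw SPI throughput to push a full frame fps times per second."""
    return framebuffer_bytes(WIDTH, HEIGHT) * 8 * fps


frame = framebuffer_bytes(WIDTH, HEIGHT)   # 40960 bytes, exactly 40 KiB
needed = spi_bits_per_second(30)           # 9830400 bits/s, just under 10 Mbit/s
```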
Voice budget
Use tiny baked clips for instant reactions and streamed PCM16 for generated replies, so the ESP32 does not store big audio.
Secrets
Wi-Fi, deployed origin, and the plaintext device fingerprint stay in the gitignored firmware secrets file.
Public/private split
The landing and wiki are public. Operator controls stay unlisted, login-protected, and outside the sitemap.
References
Code and memory inspiration.
The project repo is the source of truth for this build. GBrain is referenced as inspiration for the memory-shaped brain idea.