April 2026 • allstar whisper machine learning python amateur radio

iNode: Transcribing and Identifying Every Voice on My AllStar Node

My AllStar node 604010 is connected to the N9IAA repeater on 146.685 and carries traffic all day. I have always been curious what I missed when I was not listening. Earlier this year I decided to do something about that and built iNode -- a pipeline that records every transmission, transcribes it with Whisper, identifies who is talking using speaker diarization, decodes MDC1200 PTT IDs, and displays everything in a web UI that looks like a chat window.

This is the writeup of how that came together over about six weeks, what worked, what did not, and where it ended up.

The Idea

The starting point was simple: I wanted a searchable log of what was said on the node. Not just who keyed up and when -- I wanted the actual words, attributed to actual people.

Whisper is OpenAI's open-source speech-to-text model. It runs locally, it is good, and it handles the kind of audio quality you get off a repeater reasonably well. PyAnnote is a speaker diarization library -- it takes an audio file and tells you which parts of it belong to which speaker, using voice embeddings. Put those two together and you get transcribed, speaker-attributed radio traffic.

The architecture I landed on:

AllStar Pi (604010)
  records .gsm audio files via archivedir in rpt.conf
  NFS-exports the audio directory to the OptiPlex

OptiPlex
  inotifywait detects new .gsm files
  ships them to Oracle Cloud ARM via SCP
  Oracle runs Whisper + PyAnnote
  results sync back to SQLite on the OptiPlex

Flask web app on the OptiPlex
  SMS-style chat UI
  speaker labeling
  audio playback
  search

The reason for Oracle: the OptiPlex has an i7 and an ATI GPU that is too old to be useful for ML inference. Whisper and PyAnnote are slow on CPU. Oracle Cloud has a free ARM instance with enough grunt to handle the transcription workload without tying up the OptiPlex.

Getting the Audio Off the Node

The first step was configuring AllStar to archive audio. This is controlled by the archivedir parameter in rpt.conf on the AllStar Pi. Set it to a path, restart Asterisk, and it starts writing .gsm files -- one per transmission, named with a 14-digit timestamp.
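Because each filename carries its own timestamp, the recording time can be recovered from the name alone. A minimal sketch, assuming the 14 digits are laid out as YYYYMMDDHHMMSS (the layout is a guess, and parse_archive_timestamp is a hypothetical helper, not part of iNode):

```python
from datetime import datetime
from pathlib import Path


def parse_archive_timestamp(path):
    """Return the datetime encoded in a recording's filename, or None.

    Assumes the name is exactly 14 digits in YYYYMMDDHHMMSS order.
    """
    stem = Path(path).stem  # filename without directory or extension
    if len(stem) != 14 or not stem.isdigit():
        return None
    try:
        return datetime.strptime(stem, "%Y%m%d%H%M%S")
    except ValueError:
        # 14 digits that do not form a valid date, e.g. month 13
        return None
```

Anything that does not look like a timestamped recording comes back as None, so stray files in the archive directory can be skipped rather than crashing the pipeline.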

The Pi exports that directory over NFS and the OptiPlex mounts it. Simple and reliable. The OptiPlex runs inotifywait on the mount point and fires the pipeline whenever a new file appears.
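The watch-and-ship step can be sketched roughly as follows. This is an illustration rather than the actual iNode watcher: the function names, the choice of the close_write event, and the host and directory in the usage note are all assumptions.

```python
def watch_command(mount_point):
    """Build an inotifywait call that prints one full path per finished file.

    -m keeps watching forever, close_write fires when a writer closes the
    file, and --format '%w%f' prints directory plus filename.
    """
    return ["inotifywait", "-m", "-e", "close_write",
            "--format", "%w%f", mount_point]


def gsm_paths(event_lines):
    """Filter inotifywait output lines down to .gsm recordings."""
    for line in event_lines:
        path = line.strip()
        if path.endswith(".gsm"):
            yield path


def scp_command(local_path, remote_host, remote_dir):
    """Build the scp call that ships one recording to the inference host."""
    return ["scp", "-q", local_path, f"{remote_host}:{remote_dir}/"]
```

In a running watcher these would be tied together with subprocess.Popen(watch_command(...), stdout=subprocess.PIPE, text=True), reading proc.stdout line by line and calling subprocess.run(scp_command(path, "oracle-arm", "/srv/inode/incoming")) for each path that gsm_paths yields.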

One thing that burned time early on: I assumed mixmonitor was the setting that controlled recording, because that name made sense to me. It is not. mixmonitor is an Asterisk dialplan application, completely unrelated to rpt.conf. The right setting is archivedir. Once that was in the right place everything worked.

The Inference Pipeline

The inference script -- inode-infer.py -- runs on Oracle and does a few things in sequence for each audio file:

1. Convert the .gsm file to PCM using sox
2. Run the MDC1200 decoder on the raw PCM to check for a PTT ID
3. Run PyAnnote to get a speaker embedding
4. Run Whisper to transcribe the audio
5. Write everything to SQLite
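The sequence above can be sketched as a small driver function. This is not the real inode-infer.py: the table schema, the function names, and the 8 kHz mono sox output format are assumptions, and the decoder, PyAnnote, and Whisper steps are passed in as plain callables so the ordering is the only thing shown.

```python
import sqlite3


def sox_command(gsm_path, wav_path):
    """Build a sox call that decodes GSM to 16-bit signed PCM WAV.

    The 8 kHz mono output format is an assumption.
    """
    return ["sox", gsm_path, "-r", "8000", "-c", "1",
            "-b", "16", "-e", "signed-integer", wav_path]


def init_db(conn):
    # Hypothetical schema: one row per transmission.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS utterances ("
        " filename TEXT PRIMARY KEY,"
        " mdc_id TEXT, speaker TEXT, transcript TEXT)"
    )


def process_file(conn, gsm_path, convert, decode_mdc, embed, match, transcribe):
    """Run the steps in order; each step is supplied by the caller."""
    wav_path = convert(gsm_path)        # GSM -> PCM
    mdc_id = decode_mdc(wav_path)       # MDC1200 PTT ID, or None
    embedding = embed(wav_path)         # speaker embedding
    transcript = transcribe(wav_path)   # Whisper transcript
    speaker = match(embedding, mdc_id)  # attribution uses both signals
    with conn:                          # commit the row to SQLite
        conn.execute(
            "INSERT OR REPLACE INTO utterances VALUES (?, ?, ?, ?)",
            (gsm_path, mdc_id, speaker, transcript),
        )
    return speaker, transcript
```

Passing the steps in as callables keeps the driver testable with stubs, which matters when the real steps take 20-30 seconds per file.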

Speaker matching works by comparing the PyAnnote embedding for a new transmission against stored embeddings for known speakers using cosine similarity. When the score is high enough it assigns the transmission to that speaker. When it is not, it creates a new SPEAKER_?? entry.
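As an illustration of that matching step, here is a minimal sketch in plain Python. The 0.7 threshold, the function names, and the one-vector-per-speaker dictionary are assumptions for the example; the real embeddings are much longer vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector matches nothing
    return dot / (norm_a * norm_b)


def match_speaker(embedding, known, threshold=0.7):
    """Return (speaker, score) for the best match, or (None, score).

    known maps a speaker label to one stored embedding.
    """
    best_name, best_score = None, -1.0
    for name, reference in known.items():
        score = cosine_similarity(embedding, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_name is not None and best_score >= threshold:
        return best_name, best_score
    return None, best_score
```

A None result is the caller's cue to create a new SPEAKER_?? entry. The threshold is the knob that trades missed matches against wrong ones.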

MDC1200 is a signaling system that Motorola radios use to send a unit ID when you key up. A lot of the traffic on the N9IAA repeater comes from people running MDC-equipped Motorola portables. When the decoder picks up an ID and that ID has been seen before from a known speaker, it takes priority over the voice match. That improves attribution accuracy considerably for those radios.
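The priority rule can be written down in a few lines. A sketch, with a hypothetical attribute function and a plain dictionary standing in for whatever table maps unit IDs to speakers:

```python
def attribute(mdc_id, voice_match, mdc_owners):
    """Pick a speaker, letting a known MDC1200 unit ID override the voice match.

    mdc_id      -- decoded unit ID, or None if no burst was found
    voice_match -- speaker label from embedding matching, or None
    mdc_owners  -- dict of unit ID -> speaker label seen before
    Returns (speaker, source) where source says which signal won.
    """
    if mdc_id is not None and mdc_id in mdc_owners:
        return mdc_owners[mdc_id], "mdc"
    if voice_match is not None:
        return voice_match, "voice"
    return None, "unknown"
```

An ID that has never been tied to a speaker does not override anything; the voice match still decides in that case.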

The Hardware Problem

Getting Oracle set up was an adventure of its own. The free tier ARM instance has enough CPU for inference but only 6GB of RAM, which rules out running an LLM there for summarization. Upgrading to pay-as-you-go was the path to get access to the instance type I needed, and the Oracle account setup process is an obstacle course.

Once it was running, the pipeline was fast enough -- one transmission takes about 20-30 seconds to process end to end on the Oracle ARM. The backlog from the first few days of archiving took a while to chew through but it got there.

Speaker Identification Problems

Getting Whisper working was straightforward. Speaker identification was not.

The recurring problem was that PyAnnote would assign new transmissions to SPEAKER_?? even when they were clearly from someone already in the database. The embeddings were being computed correctly, but the cosine similarity threshold was too strict, so genuine matches scored below it and were treated as new speakers. There were also cases where the PyAnnote pipeline was failing silently and not generating an embedding at all.

When that happens you end up with a database full of orphaned SPEAKER_?? entries that all need to be manually reviewed and merged. I built a few tools for that -- inode-dedup.py to merge duplicates, inode-reembed.py to re-run PyAnnote on utterances that missed their embedding the first time, inode-backlog.sh to push stuck transmissions back through the Oracle pipeline.
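The core of both cleanup jobs is a small amount of SQL. The sketch below is not the code from inode-dedup.py or inode-reembed.py; it assumes a hypothetical utterances table with filename, speaker, and embedding columns.

```python
import sqlite3


def merge_speakers(conn, duplicate, keep):
    """Reassign every utterance from one speaker label to another.

    Returns the number of rows that changed.
    """
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "UPDATE utterances SET speaker = ? WHERE speaker = ?",
            (keep, duplicate),
        )
    return cursor.rowcount


def missing_embeddings(conn):
    """List recordings that never got an embedding and need a re-run."""
    rows = conn.execute(
        "SELECT filename FROM utterances"
        " WHERE embedding IS NULL ORDER BY filename"
    )
    return [row[0] for row in rows]
```

Storing a NULL when the embedding step fails is what makes the second query possible, and it turns a silent failure into something that can be counted.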

The web UI has a labeling interface. When you identify a SPEAKER_?? as a known callsign, it stores that as a confirmed embedding and uses it preferentially for future matches. Building up a set of confirmed embeddings per speaker makes the matching progressively more accurate over time.
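One way to read "preferentially" is a two-pass search: try the confirmed embeddings first and fall back to unconfirmed ones only if nothing confirmed clears the threshold. The sketch below shows that reading. It is an assumption about the behavior, not the actual iNode logic, and the names and the 0.7 threshold are made up for the example.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(embedding, references, threshold=0.7):
    """Return the matching speaker label, or None.

    references is a list of (speaker, vector, confirmed) tuples, and a
    speaker may appear several times. Confirmed vectors are searched first.
    """
    for want_confirmed in (True, False):
        best_name, best_score = None, -1.0
        for speaker, vector, confirmed in references:
            if confirmed != want_confirmed:
                continue
            score = _cosine(embedding, vector)
            if score > best_score:
                best_name, best_score = speaker, score
        if best_name is not None and best_score >= threshold:
            return best_name
    return None
```

Each label applied in the UI adds another confirmed tuple for that speaker, so the first pass has more to work with as labels accumulate.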

It is not perfect. Repeater audio has noise, the levels vary between radios, and some people's voices are just hard to distinguish from each other. But it is good enough to be useful and it keeps improving as more confirmed labels accumulate.

The MDC ML Detour

At one point I went down a side path trying to build a machine learning based MDC1200 decoder. The C-based decoder I was using worked but was not super reliable on noisy audio. The idea was to train a classifier on spectrograms of MDC bursts to get better accuracy.

I generated a labeled dataset from the AllStar audio archive and got a training pipeline running on Oracle, but the results were not good enough to be worth deploying. The C decoder, tuned properly, ended up being more reliable than anything the ML approach produced in a reasonable amount of time. That work is on hold.

The Web UI

The frontend is a Flask app serving a single-page chat interface. Transmissions appear as chat bubbles, newest at the bottom. Each bubble shows the speaker, the transcript, and a timestamp. There is a play button that retrieves the original audio from the archive.

One thing the UI can do that I did not expect to use as much as I do: it can retransmit a recorded audio file back over the air through the AllStar node. That turned out to be useful for testing and for the occasional situation where you want to replay something for someone who missed it.

Hourly Summaries

The thing I use most is the hourly summary. A cron job runs at the top of every hour, queries the last hour of traffic from SQLite, passes the transcripts to Claude via the Claude Code CLI, and sends the result to Telegram. It reads like a radio net summary -- who was on, what was talked about, anything notable.
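The query-and-prompt half of that job might look something like this. The table layout, column names, and prompt wording are assumptions; the summarize function shells out to the Claude Code CLI in its non-interactive print mode (claude -p), and the Telegram send is left out.

```python
import sqlite3
import subprocess
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def last_hour(conn, now):
    """Fetch (heard_at, speaker, transcript) rows from the hour before now."""
    start = (now - timedelta(hours=1)).strftime(TIME_FORMAT)
    end = now.strftime(TIME_FORMAT)
    return conn.execute(
        "SELECT heard_at, speaker, transcript FROM utterances"
        " WHERE heard_at >= ? AND heard_at < ? ORDER BY heard_at",
        (start, end),
    ).fetchall()


def build_prompt(rows):
    """Turn the hour's rows into a single prompt string."""
    header = ("Summarize this hour of repeater traffic: "
              "who was on, what was discussed, anything notable.")
    body = "\n".join(f"[{when}] {who}: {text}" for when, who, text in rows)
    return header + "\n\n" + body


def summarize(prompt, cli="claude"):
    """Run the CLI in print mode and return its stdout.

    check=True makes a missing or failing CLI raise instead of passing quietly.
    """
    done = subprocess.run([cli, "-p", prompt],
                          capture_output=True, text=True, check=True)
    return done.stdout.strip()
```

Timestamps stored as zero-padded text in this format sort correctly as strings, which is why the range comparison works in SQLite.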

This had a failure mode worth documenting. Claude Code auto-updated one day and left the symlink at ~/.local/bin/claude pointing at a deleted version directory. The cron job hit command not found, stderr was going to /dev/null, and the script fell back to just dumping raw transcripts to Telegram instead of a summary. It took a while to figure out what had happened because there was no visible error anywhere. Fixed it by repointing the symlink and adding a separate cron job that refreshes it after every auto-update.
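A cheap guard against this kind of silent breakage is to check the CLI before using it and fail with a visible message. A minimal sketch with hypothetical helper names:

```python
import os
import shutil


def is_dangling_symlink(path):
    """True if path is a symlink whose target no longer exists."""
    # islink inspects the link itself; exists follows it to the target.
    return os.path.islink(path) and not os.path.exists(path)


def require_cli(name="claude"):
    """Return the full path of an executable on PATH, or raise loudly."""
    found = shutil.which(name)
    if found is None:
        raise FileNotFoundError(
            f"{name} not found on PATH; check for a broken symlink"
        )
    return found
```

Calling require_cli at the top of the cron script, with stderr going to a log file instead of /dev/null, would have surfaced the broken symlink on the first failed run.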

Where It Is Now

iNode has been running in production since late April 2026. The database has thousands of utterances. Speaker identification works well for the regulars on the N9IAA net -- the people who are on every day have enough confirmed embeddings that their transmissions are almost always attributed correctly. Visitors and occasional check-ins still end up as SPEAKER_?? until someone labels them.

The code is at github.com/ki9ng/inode-public. It is specific to my setup but the architecture is general enough that someone else could adapt it.

Like everything else on this list, I did not write the code. I described the pipeline, worked through problems as they came up, and Claude wrote the implementation. Six weeks of sessions to get from nothing to a working production system.

73
Bill KI9NG | EN61 | DMR 3200395
