Does my audio get uploaded to a server?

No. The AI model runs in your browser. Your audio is decoded, transcribed, and processed entirely on your device. Apart from loading the page itself, the only network request is the initial model download, and the model is cached after the first use.

How accurate is the transcription?

The tool uses Whisper tiny, which is the smallest and fastest version of the model. It handles clear English speech well. Heavy accents, background noise, or overlapping speakers can reduce accuracy. Because you review the transcript before applying any bleeps, you can catch and correct any misses.

What audio formats are supported?

MP3, WAV, OGG, FLAC, and M4A. Any format your browser can decode will work.

How long can my recording be?

There is no hard limit. The tool processes audio in 30-second chunks, so longer files take proportionally longer. A 5-minute recording processes in about a minute on most devices. At the same rate, a 60-minute recording takes roughly 12 minutes, and it will still work.
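The proportional scaling can be sketched in a few lines. This is only an estimate under stated assumptions: the function names are hypothetical, and the per-chunk speed is derived from the "5 minutes in about a minute" figure rather than measured. Real speed varies by device.

```typescript
// Chunk length stated in the answer above.
const CHUNK_SECONDS = 30;

// Number of 30-second chunks needed to cover a recording.
function chunkCount(durationSeconds: number): number {
  return Math.ceil(durationSeconds / CHUNK_SECONDS);
}

// Rough estimate, assuming processing time scales linearly with chunk count.
// The baseline (5 minutes of audio in about 60 seconds) is an assumption
// taken from the typical figure quoted above, not a benchmark.
function estimateProcessingSeconds(durationSeconds: number): number {
  const secondsPerChunk = 60 / chunkCount(5 * 60);
  return chunkCount(durationSeconds) * secondsPerChunk;
}
```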

Can I use this on my phone?

Yes. The tool runs in any modern browser, including Safari on iOS and Chrome on Android. The model download is the same size as on desktop (about 40MB), so a Wi-Fi connection helps for the first use.

Is the profanity list customizable?

Not directly. The built-in list covers common English profanity. If you need to bleep words that are not on the list, switch to Manual Redact mode and click the words yourself. You can also use Auto Censor mode as a starting point and then click additional words to add them.


Engineering · June 26, 2026 · 6 min read

Bleep and redact audio without uploading it anywhere

An AI model runs in your browser, transcribes your audio, and bleeps the words you pick. No server, no upload, no account. Your recording stays on your device.

You have a recording with someone's name in it. Or a swear word. Or a home address. You need to bleep it out before you can share the file. So you search for an audio censoring tool and find one that asks you to upload the recording.

The whole reason you want to censor the audio is that it contains something sensitive. And the first thing the tool does is copy that audio to a server you have no control over.

This is the default state of audio redaction tools in 2026. You hand over the thing you're trying to protect, and you hope the server deletes it afterward.

We built Orec's audio censor tool to work differently. The AI model runs in your browser. Your audio never leaves your device. There is no server in the loop.

How it works

The tool uses Whisper, an open-source speech recognition model from OpenAI, running directly in the browser through Transformers.js and ONNX Runtime. When you first use the tool, it downloads the model (about 40MB). After that, the model is cached on your device and loads from cache in a few seconds on every future visit.

Once the model is ready, you drop in an audio file. The tool transcribes it locally and shows you every word. Each word has a start time and an end time, down to the hundredth of a second. You pick the words you want to bleep, hit apply, and the tool replaces those exact time ranges with a 1kHz tone. The replacement is sample-precise: each timestamp is converted to an exact position in the audio buffer.

The entire pipeline runs in your browser tab. The audio file stays in memory on your device. The transcription happens on your CPU. The bleep tone is generated mathematically. Nothing touches a network connection.

Two modes for different jobs

The tool has two modes. Auto Censor checks your transcript against a built-in profanity list and highlights matches automatically. Manual Redact starts with a clean transcript and lets you click any word to mark it.
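Here is a minimal sketch of how the two modes could differ. It is an illustration, not the tool's actual code: the `TranscriptWord` shape, the function names, and the two placeholder list entries are all assumptions.

```typescript
// Hypothetical shape for one transcribed word (times in seconds).
interface TranscriptWord {
  text: string;
  start: number;
  end: number;
}

// Placeholder entries standing in for the built-in profanity list.
const BLOCKLIST = new Set<string>(["damn", "hell"]);

// Lowercase and strip surrounding spaces and punctuation so " Damn," matches "damn".
function normalize(text: string): string {
  return text.toLowerCase().replace(/^[^a-z0-9']+|[^a-z0-9']+$/g, "");
}

// Auto Censor: indices of the words that start out highlighted.
function autoSelect(words: TranscriptWord[], blocklist: Set<string>): Set<number> {
  const selected = new Set<number>();
  words.forEach((word, index) => {
    if (blocklist.has(normalize(word.text))) selected.add(index);
  });
  return selected;
}

// Manual Redact: nothing starts highlighted.
function manualSelect(): Set<number> {
  return new Set<number>();
}

// In both modes, clicking a word toggles it in or out of the selection.
function toggle(selected: Set<number>, index: number): Set<number> {
  const next = new Set<number>();
  selected.forEach((i) => next.add(i));
  if (next.has(index)) next.delete(index);
  else next.add(index);
  return next;
}
```

In this sketch the only thing that changes between modes is the starting selection. Everything after that, clicking words and applying bleeps, is shared.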

Auto Censor is for content creators who recorded a podcast or video and need a clean version for platforms that flag profanity, such as YouTube, TikTok, and Instagram. You drop in your recording, the tool finds the swear words, and you bleep them in one click.

Manual Redact is for everything else. A teacher who recorded a classroom session and needs to remove student names before sharing it (FERPA restricts sharing students' personally identifiable information). An HR manager who recorded an interview and needs to strip identifying details. A journalist who needs to protect a source's identity in an audio clip. A lawyer preparing deposition audio for review.

Same engine underneath. The difference is just which words start highlighted.

Word-level timestamps are the key

Most people think of transcription as turning audio into text. A paragraph of words. But Whisper can do something more specific: it can tell you where each word starts and ends in the audio stream.

When Whisper transcribes "and then she said damn it" with word-level timestamps, it returns something like this: "damn" starts at 14.32 seconds and ends at 14.58 seconds. The tool takes those two numbers, calculates the exact samples in the audio buffer, and replaces them with a generated 1kHz sine wave. The bleep has a 5-millisecond fade-in and fade-out so it blends with the surrounding audio instead of clicking.
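The timestamp-to-samples step can be sketched like this. It is a minimal illustration, not the tool's actual implementation: `applyBleep`, `WordSpan`, and the 0.5 gain are assumptions, and the audio is assumed to be a mono `Float32Array` of samples.

```typescript
// Hypothetical word span in seconds, as returned by word-level timestamps.
interface WordSpan {
  start: number;
  end: number;
}

// Overwrite one word's samples, in place, with a faded sine tone.
function applyBleep(
  samples: Float32Array,
  sampleRate: number,
  span: WordSpan,
  toneHz: number = 1000, // 1kHz tone
  fadeMs: number = 5,    // fade-in and fade-out length
  gain: number = 0.5,    // assumed amplitude, not from the tool
): void {
  // Convert the two timestamps to sample indices, clamped to the buffer.
  const first = Math.max(0, Math.round(span.start * sampleRate));
  const last = Math.min(samples.length, Math.round(span.end * sampleRate));
  const length = last - first;
  if (length <= 0) return;

  // Fade length in samples, capped at half the span so very short words still fade both ways.
  const fade = Math.min(Math.round((fadeMs / 1000) * sampleRate), Math.floor(length / 2));

  for (let i = 0; i < length; i++) {
    let envelope = 1;
    if (i < fade) envelope = i / fade;                               // ramp up from silence
    else if (i >= length - fade) envelope = (length - 1 - i) / fade; // ramp down to silence
    samples[first + i] = gain * envelope * Math.sin((2 * Math.PI * toneHz * i) / sampleRate);
  }
}

// Example: bleep "damn" (14.32 s to 14.58 s) in a 16-second buffer at 44.1kHz.
const buffer = new Float32Array(16 * 44100);
applyBleep(buffer, 44100, { start: 14.32, end: 14.58 });
```

Samples outside the span are never touched, which is why the surrounding words stay intact.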

This is why the results sound clean. The bleep covers exactly the word and nothing more. No guessing, no manual scrubbing through a waveform editor, no cutting too early or too late.

The one-time setup

Downloading an AI model sounds like a big deal. 40MB is a real download. But it only happens once.

The first time you use the tool, you see a progress bar that says "Setting things up for you. This is a one-time thing." The model downloads, gets cached by your browser, and stays there. The next time you visit, the model loads from cache in a few seconds.

This is important because it means the tool gets faster after the first use. Your first use takes about a minute to set up. Every later use takes a few seconds. And the model never needs to phone home. Once it is on your device, it works offline.

Browser AI is the right call for privacy tools

There is a pattern forming. Any tool that processes sensitive content should run locally. The server is the vulnerability.

When you upload audio to a server for processing, you are trusting that the server handles your data correctly. You are trusting that it gets deleted after processing. You are trusting that the company behind it will not change its privacy policy six months from now. You are trusting that no breach will expose your files.

Running the model in the browser removes all of those trust requirements. The audio goes from your microphone (or your file) to your browser's memory, gets processed, and the result goes back to you. The server serves the webpage and the model file. After that, it has no role.

This approach has tradeoffs. Browser-based AI is slower than server-side inference. The model is smaller (Whisper tiny, not Whisper large) so the transcription accuracy is lower. Long recordings take more time to process. But for a censoring tool, these tradeoffs are worth it. You do not need a perfect transcript. You need to find the words you want to bleep, and Whisper tiny is accurate enough for that.

When to use this

You recorded a podcast episode and dropped an F-bomb at the 38-minute mark. You do not want to open a DAW, scrub through the waveform, find the word, and manually edit it out. You want to drop in the file, see the word highlighted, and click bleep.

You are a teacher and you recorded a parent-teacher conference. Before you can share the recording with your school administration, you need to remove the student's name from the audio. FERPA compliance is a legal requirement, and every audio tool you have found so far wants you to upload the file to a server.

You are a journalist and a source agreed to an on-record interview but asked you to remove their name from the final cut. You need a tool that will not create a copy of the unredacted audio on someone else's infrastructure.

For all of these, the audio censor tool works. Drop the file, pick the words, bleep them, download the result. Your audio stays on your device the entire time.

If you need to record new audio first, the Orec recorder is one tap away. Same local-first approach. No account, no upload, no limits.

