Audio Tools
Explore the best new Audio tools and products curated by the community.
Curlo is a privacy-first macOS app for searching, previewing, and organizing large sound libraries. Find SFX or music by describing what you want to hear, search for similar sounds, edit metadata & UCS, manage tags, and keep everything fully local on your Mac.
A macOS menu-bar app that turns any conversation into a clean markdown transcript, with a local speech model running entirely on-device. One global shortcut brings up a small bar at the bottom of your screen. It captures your mic and the system audio as separate tracks, labels who said what, and lets you flag key moments mid-call that sit inline at the right timestamp. No bot joins the call, nothing leaves your Mac, no account, no subscription.
Introducing Parrot: Ringg’s speech-to-text model for production-grade voice agents. Capture Hindi-heavy and noisy real-world conversations with low-latency inference, stronger transcript quality, and Hindi validation built for downstream workflows.
Tweaking knobs is a time-honored tradition in sound design. Chatting with AI is revolutionizing industries. JAMtime.ai embraces both, while keeping the human firmly in the driver's seat. Build and tweak your guitar pedal with phrases as simple or technical as you like, from "brighter" to "comb filter into a plate reverb." The AI writes a real DSP graph, not generated audio. Come fall in love with the JAMtime.ai workflow. Then take it to your DAW with free VST/AU plugins for Mac, Windows, and Linux.
Insta360 Mic Pro is a pro wireless mic with a customizable E-Ink display, 3-mic array, AI noise canceling, directional pickup modes, 32-bit float internal recording, timecode sync, 400m range, and multi-cam creator workflows.
Voiser helps creators, teams, and businesses turn text into the most human-like AI voiceovers. With 140+ languages, 1000+ voices, emotional voice styles, custom instructions, and fast generation, you can create realistic voiceovers for videos, ads, training content, podcasts, and global projects in minutes.
Download 👉 https://github.com/sunapp-ai/sun-to-spotify SUN-to-Spotify is a skill that lets you generate AI podcasts and audiobooks, then publish them directly to your Spotify library for streaming or offline listening. Just describe what you want to hear (startup advice, history deep dives, philosophy, news, or custom learning content) and SUN creates a personalized audio experience in minutes. Built for creators, developers, and curious minds exploring the future of AI-native audio.
A TTS model should give you two things: an Oscar-worthy performance and a verifiable signature to prove it's yours. DramaBox is the first to do both. Describe a scene the way you would to an actor, like 'a talk show host gasps in mock shock, bursts into laughter,' and the model interprets it as performance. Every output is watermarked with Resemble Watermarker. Open source and English-only for now. Find it in your Resemble account or on Hugging Face.
Ready-made creative workflows. Upload your input, pick a template, get a finished asset: product shots, mockups, style transfers, character sheets, and more.
For busy professionals who can't remember important details, Chronicle is a personal AI memory system that lets you voice-record facts, ideas, and information and instantly retrieve them with natural language questions. Unlike journaling apps focused on introspection and mood tracking, Chronicle is designed for total recall with minimum friction—capturing what you need to remember, not how you feel.
Pop makes voice notes first-class in everyday messaging: amazing transcripts, a magic editor to summarise or clean up, audio editing by editing the transcript, and more.
Every time I jumped between Spotify, Zoom, and YouTube I had to manually switch audio outputs. It drove me crazy. So I built Sound Warden. It lives in your menu bar and automatically routes each app to the audio device you want. Set it once, forget it forever. ✅ Per-app audio routing ✅ Menu bar — always accessible ✅ Lightweight, no background bloat Built for anyone who uses multiple audio devices daily.
Got multiple phones or tablets lying around? MUSIXQUARE turns them into a synchronized surround sound system. Right in the browser, no install needed. Assign each device a role (Left, Right, Center, Subwoofer) and play the same track perfectly in sync across all of them. Share local files, YouTube videos, or even your system audio with everyone in the room. More Devices, Richer Sound. Free to start! Works on any device with a modern browser. Try it at musixquare.com
The fastest path to a working Voice Agent, built on the most accurate Voice AI on the market. Stream audio in, get audio back. We handle the rest. ~1s latency. Best-in-class accuracy on the stuff that matters (numbers, emails, names). Tool calling that doesn't go silent. Mid-call prompt + voice + tool updates. $4.50/hr flat. No per-token pricing. No concurrency caps. Most devs ship a working agent the same day.
A state-of-the-art voice model built for complex, multi-step workflows with snappy responses and high accuracy.
Grok now offers standalone Speech-to-Text and Text-to-Speech APIs for developers. The new voice stack covers real-time and batch transcription, multispeaker diarization, multichannel audio, text formatting, expressive TTS with speech tags, multilingual support, and simple usage-based pricing.
SFX Stacks is a desktop app for searching large local SFX libraries. Instead of relying only on filenames, metadata, or folder browsing, it lets you describe the sound you need and explore similar sounds, making search faster and improving discovery. Built for sound designers, game audio, and other audio workflows with large local libraries.
Avec is the free AI email app that lets you handle your Gmail inbox in seconds! (1) Smart filtering: Avec surfaces the emails that need your attention first, and learns your preferences over time. (2) Write with your voice: Record a quick voice note and let Avec turn it into a clear email that sounds like you. (3) Clear your inbox: Not every email deserves the same attention. After you’ve handled the important ones, clear the rest with a swipe. Unsubscribe and block spammy senders with one tap.
Google's TTS API with inline audio tags, multi-speaker dialogue, and 70+ language support. For developers building voice agents, dubbing tools, or AI content products via the Gemini API and Vertex AI.
100% private on-device voice models for speech-to-text and meeting transcription on macOS. No cloud APIs, no data leaves your machine without your explicit permission.
YouTube has speed control, captions, auto-translate — but no accent control. Now it does. Free Chrome extension, on-device AI, one toggle.