fmartingr/tellme: Simple macOS speech-to-text application using Whisper

Simple macOS speech-to-text application using Whisper

Find a file

Felipe M. 1db16227b2 Initial commit		2025-09-18 19:56:06 +02:00
.github/workflows	Initial commit	2025-09-18 19:56:06 +02:00
Docs	Initial commit	2025-09-18 19:56:06 +02:00
Resources/Localizations	Initial commit	2025-09-18 19:56:06 +02:00
Scripts	Initial commit	2025-09-18 19:56:06 +02:00
Sources	Initial commit	2025-09-18 19:56:06 +02:00
Tests	Initial commit	2025-09-18 19:56:06 +02:00
.gitignore	Initial commit	2025-09-18 19:56:06 +02:00
.swiftformat	Initial commit	2025-09-18 19:56:06 +02:00
.swiftlint.yml	Initial commit	2025-09-18 19:56:06 +02:00
LICENSE	Initial commit	2025-09-18 19:56:06 +02:00
Package.swift	Initial commit	2025-09-18 19:56:06 +02:00
README.md	Initial commit	2025-09-18 19:56:06 +02:00
TECHSPEC.md	Initial commit	2025-09-18 19:56:06 +02:00
TODO.md	Initial commit	2025-09-18 19:56:06 +02:00

README.md

A macOS menu bar application that provides offline speech-to-text transcription using Whisper-family models and automatically inserts the transcribed text into the currently focused application.

Overview

Menu-Whisper is designed to be a privacy-focused, offline-first speech recognition tool for macOS. It runs entirely locally on Apple Silicon machines, requiring no internet connection during normal operation (only for initial model downloads).

Key Features

100% Offline Operation: Audio and text never leave your device
Apple Silicon Optimized: Built specifically for M1/M2/M3 processors with Metal acceleration
Global Hotkey Support: Default ⌘⇧V (configurable)
Smart Text Insertion: Clipboard paste with typing fallback
Secure Input Detection: Respects password fields and secure contexts
Multiple Models: Support for various Whisper model sizes and variants
Multilingual: Spanish and English interface and recognition

Requirements

macOS: 13.0 (Ventura) or later
Hardware: Apple Silicon (M1, M2, or M3 processor) - Intel Macs are not supported
Xcode: 15.0+ for building from source
Permissions: Microphone, Accessibility, and Input Monitoring access

Build Requirements

Development Environment

macOS 13+ with Xcode 15.0+
Swift 5.9+
Swift Package Manager (included with Xcode)

System Dependencies

AVFoundation framework (audio capture)
Carbon framework (global hotkeys)
AppKit/SwiftUI (UI components)

Third-party Dependencies

whisper.cpp (C/C++ library for speech recognition with Metal support)

Installation

Note: This project is currently in development. Pre-built binaries will be available as signed and notarized .dmg files once complete.

Building from Source

Clone the repository:
```
git clone <repository-url>
cd tellme
```
Open the project in Xcode or use Swift Package Manager:
```
swift build -c release
```
For development, open Package.swift in Xcode.

Architecture

The application is structured with modular components:

App: SwiftUI interface with AppKit bridges
Core/Audio: AVAudioEngine capture and processing
Core/STT: Speech-to-text engines (whisper.cpp, future Core ML)
Core/Models: Model management and downloads
Core/Injection: Text insertion with secure input handling
Core/Permissions: System permission management
Core/Settings: User preferences and configuration

Privacy & Security

No Telemetry: Zero data collection or remote analytics
Local Processing: All audio processing happens on-device
Secure Input Respect: Automatically detects and respects secure input contexts
Permission-Based: Requires explicit user consent for system access

Development Status

This project is currently in active development following a phased approach:

Phase 0: Project scaffolding ⬅️ Current
Phase 1: Hotkey + HUD + Audio capture
Phase 2: STT integration with whisper.cpp
Phase 3: Text insertion system
Phase 4: Preferences and UX polish
Phase 5: Distribution and packaging

See TODO.md for detailed development progress and TECHSPEC.md for complete technical specifications.

License

MIT License - see LICENSE for details.

Contributing

This project follows a structured development approach with clear phases and acceptance criteria. Please refer to the technical specification and TODO list before contributing.