Vokala is a real-time speech translation app for Android. It combines native audio processing with on-device machine learning to break down language barriers.
- Real-time Voice Activity Detection - Advanced WebRTC VAD for accurate speech detection
- Multi-speaker Recognition - Identify and track different speakers in conversations
- Audio Visualization - Real-time waveform and spectrum analysis
- Rust-powered Audio Engine - High-performance native audio processing
- Modern Android UI - Built with Jetpack Compose for a smooth user experience
- Offline-first Architecture - Works without internet connectivity (planned)
Vokala uses a hybrid architecture combining the best of native Android development with high-performance Rust:
- Android Frontend: Kotlin + Jetpack Compose for the UI
- Rust Core: Native audio processing with WebRTC VAD and signal analysis
- ML Pipeline: TensorFlow Lite integration for on-device inference
- JNI Bridge: Seamless communication between Kotlin and Rust
- Rust audio processing with WebRTC VAD
- Real-time audio visualization
- Speaker identification framework
- Audio debug panel
- Whisper model integration for offline ASR
- Language identification
- Mobile performance optimization
- Multi-language support
- Phase 3: Neural Machine Translation
- Phase 4: Voice Cloning & Synthesis
- Phase 5: UI/UX Enhancement
- Phase 6: Performance Optimization & Testing
- Language: Kotlin
- UI Framework: Jetpack Compose
- Architecture: MVVM with ViewModels
- Build System: Gradle with Kotlin DSL
- Min SDK: 26 (Android 8.0)
- Target SDK: 35
- Audio Processing: WebRTC VAD, RustFFT, Spectrum Analysis
- ML Framework: Tract (TensorFlow/ONNX)
- Signal Processing: DASP, Rubato, RealFFT
- Threading: Crossbeam for concurrent processing
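
As a rough illustration of the spectrum analysis step listed above, the sketch below windows one audio frame and computes its magnitude spectrum. It is not taken from lingua_core: it uses a naive O(n²) DFT from the standard library only, as a stand-in for the RustFFT/RealFFT crates, and the function names (`hann_window`, `magnitude_spectrum`, `peak_frequency_hz`) are hypothetical.

```rust
use std::f32::consts::PI;

/// Applies a Hann window in place to reduce spectral leakage.
fn hann_window(frame: &mut [f32]) {
    let n = frame.len();
    if n < 2 {
        return;
    }
    for (i, s) in frame.iter_mut().enumerate() {
        let w = 0.5 - 0.5 * (2.0 * PI * i as f32 / (n as f32 - 1.0)).cos();
        *s *= w;
    }
}

/// Naive DFT (assumption: a real build would use an FFT crate instead).
/// Returns the magnitudes of bins 0..=n/2 for a real-valued frame.
fn magnitude_spectrum(frame: &[f32]) -> Vec<f32> {
    let n = frame.len();
    let mut mags = Vec::with_capacity(n / 2 + 1);
    for k in 0..=(n / 2) {
        let mut re = 0.0f32;
        let mut im = 0.0f32;
        for (t, &x) in frame.iter().enumerate() {
            let angle = -2.0 * PI * (k * t) as f32 / n as f32;
            re += x * angle.cos();
            im += x * angle.sin();
        }
        mags.push((re * re + im * im).sqrt());
    }
    mags
}

/// Converts the index of the strongest bin to a frequency in Hz.
fn peak_frequency_hz(mags: &[f32], sample_rate: f32, frame_len: usize) -> f32 {
    let mut best = 0;
    for (i, &m) in mags.iter().enumerate() {
        if m > mags[best] {
            best = i;
        }
    }
    best as f32 * sample_rate / frame_len as f32
}

fn main() {
    let sample_rate = 16_000.0f32;
    let n = 256usize;
    // Synthetic 1 kHz tone; at 16 kHz and 256 samples it falls exactly on bin 16.
    let mut frame: Vec<f32> = (0..n)
        .map(|i| (2.0 * PI * 1000.0 * i as f32 / sample_rate).sin())
        .collect();
    hann_window(&mut frame);
    let mags = magnitude_spectrum(&frame);
    println!("peak at {} Hz", peak_frequency_hz(&mags, sample_rate, n));
}
```

The magnitudes returned by `magnitude_spectrum` are the kind of per-bin values a spectrum view could draw; the 16 kHz sample rate and 256-sample frame are illustrative choices, not documented project settings.
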
- TensorFlow Lite for on-device ML
- JTransforms for audio processing
- Material 3 Design System
- Kotlin Coroutines for async operations
- Android Studio Arctic Fox or later
- Rust toolchain (for building lingua_core)
- Android NDK 27.2.12479018
- JDK 17
- Clone the repository
    git clone https://github.com/yourusername/vokala.git
    cd vokala
- Install Rust and Android targets
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    rustup target add aarch64-linux-android armv7-linux-androideabi x86_64-linux-android
- Build and run
    ./gradlew assembleDebug
    ./gradlew installDebug
vokala/
├── app/                          # Android application
│   ├── src/main/java/com/vokala/
│   │   ├── MainActivity.kt       # Entry point
│   │   ├── core/                 # Core business logic
│   │   │   ├── models/           # Model management
│   │   │   ├── speaker/          # Speaker identification
│   │   │   └── translation/      # Translation engine
│   │   └── ui/                   # Compose UI components
│   │       ├── main/             # Main screen & viewmodel
│   │       ├── components/       # Reusable UI components
│   │       └── theme/            # App theming
├── lingua_core/                  # Rust audio processing library
│   ├── src/
│   │   ├── lib.rs                # JNI interface & main logic
│   │   └── speaker.rs            # Speaker identification
│   └── Cargo.toml                # Rust dependencies
└── models/                       # ML models directory
The Rust-powered audio engine provides:
- Voice Activity Detection: WebRTC VAD with configurable sensitivity
- Real-time Analysis: FFT-based spectrum analysis and visualization
- Speaker Identification: Embedding-based speaker recognition
- Audio Preprocessing: Noise reduction and signal enhancement
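
To make "configurable sensitivity" concrete, here is a minimal sketch of a frame-level voice activity detector. It is deliberately simpler than the WebRTC VAD that Vokala actually uses: it is a plain energy threshold with a short hangover, written against the standard library only. The `EnergyVad` type, the sensitivity-to-threshold mapping, and the frame size are all assumptions made for illustration.

```rust
/// Hypothetical energy-threshold detector. This is NOT the WebRTC VAD
/// algorithm; it only illustrates how a sensitivity setting can drive
/// per-frame speech decisions.
struct EnergyVad {
    threshold_db: f32,    // frames louder than this count as speech
    hangover_frames: u32, // frames still reported as speech after the level drops
    remaining: u32,       // hangover frames left
}

impl EnergyVad {
    /// `sensitivity` is clamped to 0.0..=1.0; higher values lower the threshold.
    fn new(sensitivity: f32, hangover_frames: u32) -> EnergyVad {
        let s = sensitivity.clamp(0.0, 1.0);
        // Assumed mapping: -20 dBFS (strict) down to -50 dBFS (permissive).
        let threshold_db = -20.0 - 30.0 * s;
        EnergyVad { threshold_db, hangover_frames, remaining: 0 }
    }

    /// Returns true if the frame (samples in -1.0..=1.0) is treated as speech.
    fn is_speech(&mut self, frame: &[f32]) -> bool {
        if rms_dbfs(frame) > self.threshold_db {
            self.remaining = self.hangover_frames;
            return true;
        }
        if self.remaining > 0 {
            // Bridge short pauses so words are not cut off mid-utterance.
            self.remaining -= 1;
            return true;
        }
        false
    }
}

/// RMS level in dB relative to full scale, floored at -120 dB for silence.
fn rms_dbfs(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return f32::NEG_INFINITY;
    }
    let mean_sq = frame.iter().map(|x| x * x).sum::<f32>() / frame.len() as f32;
    10.0 * mean_sq.max(1e-12).log10()
}

fn main() {
    let mut vad = EnergyVad::new(0.5, 2);
    let loud = vec![0.1f32; 320];     // 20 ms at 16 kHz, about -20 dBFS
    let quiet = vec![0.0001f32; 320]; // about -80 dBFS
    let decisions: Vec<bool> = vec![&loud, &quiet, &quiet, &quiet]
        .into_iter()
        .map(|f| vad.is_speech(f))
        .collect();
    println!("{:?}", decisions); // [true, true, true, false]
}
```

The two quiet frames after the loud one are still reported as speech because of the hangover; only the third quiet frame flips the decision.
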
- Permissions: Automatic audio recording permission handling
- Real-time UI: Smooth audio visualization with Compose Canvas
- Background Processing: Efficient audio pipeline with minimal UI blocking
- Model Management: On-device ML model loading and inference
cd lingua_core
cargo build --release --no-default-features

# Rust tests
cd lingua_core
cargo test
# Android tests (run from the repository root)
./gradlew test
./gradlew connectedAndroidTest

- Audio debug panel for testing voice detection
- Real-time audio metrics display
- Speaker identification visualization
- Performance monitoring
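
The speaker identification that these tools visualize is described as embedding-based. One common way to match embeddings, shown here as a sketch and not as the code in speaker.rs, is to compare a new embedding against enrolled speakers by cosine similarity and accept the best match above a threshold. The function names, the 3-dimensional toy embeddings, and the 0.7 threshold are assumptions for illustration only.

```rust
/// Cosine similarity between two embeddings.
/// Returns 0.0 if the lengths differ or either vector is all zeros.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Returns the index of the most similar enrolled speaker whose score is
/// at least `threshold`, or None if the voice matches nobody well enough.
fn identify_speaker(embedding: &[f32], enrolled: &[Vec<f32>], threshold: f32) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, candidate) in enrolled.iter().enumerate() {
        let score = cosine_similarity(embedding, candidate);
        if score >= threshold && best.map_or(true, |(_, s)| score > s) {
            best = Some((i, score));
        }
    }
    best.map(|(i, _)| i)
}

fn main() {
    // Toy enrolled embeddings; real speaker embeddings have far more dimensions.
    let enrolled = vec![vec![1.0f32, 0.0, 0.0], vec![0.0, 1.0, 0.0]];
    println!("{:?}", identify_speaker(&[0.1, 0.9, 0.0], &enrolled, 0.7)); // Some(1)
    println!("{:?}", identify_speaker(&[0.0, 0.0, 1.0], &enrolled, 0.7)); // None
}
```

Returning `None` for a low score leaves room for treating the voice as a new, not yet enrolled speaker.
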
- Minimum: Android 8.0 (API 26)
- Architecture: ARM64, ARMv7, x86_64
- Permissions: Microphone access required
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- WebRTC team for the excellent VAD implementation
- OpenAI for Whisper speech recognition
- The Rust audio processing community
- Android Jetpack Compose team
Note: Vokala is currently in active development. Some features may be experimental or incomplete. Check the progress tracking for the latest updates.