This repository is a fork of the original DMOSpeech2 repository. The original README has been renamed to original-README.md.
- uv - Python package installer and virtual environment manager
- Activate the virtual environment:
  source setup-source-me.sh
- Run the setup script:
  ./scripts/setup.sh
Note: The setup script downloads large model files (~500MB each) from HuggingFace into the ckpts/ directory.
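To confirm that the download finished, a quick check like the one below lists what landed in ckpts/. This is a minimal sketch: the helper name list_checkpoints is hypothetical, and it assumes only that the setup script writes files somewhere under ckpts/, not any particular file names or layout.

```python
from pathlib import Path


def list_checkpoints(ckpt_dir="ckpts"):
    """Return (relative path, size in MB) for every file under ckpt_dir.

    Hypothetical helper: it makes no assumption about file names or
    formats, it only walks the directory the setup script downloads into.
    """
    root = Path(ckpt_dir)
    if not root.is_dir():
        return []
    found = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            size_mb = path.stat().st_size / (1024 * 1024)
            found.append((str(path.relative_to(root)), round(size_mb, 1)))
    return found


def report(ckpt_dir="ckpts"):
    """Print a one-line summary per downloaded file."""
    files = list_checkpoints(ckpt_dir)
    if not files:
        print(f"No files found under {ckpt_dir}/ (has the setup script run?)")
    for name, size_mb in files:
        print(f"{size_mb:8.1f} MB  {name}")
```

Calling report() from the repository root after setup should show files in the size range mentioned in the note; an empty listing suggests the download did not complete.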
Try the original DMOSpeech2 online without any setup:
- HuggingFace Spaces (Original Repository): https://huggingface.co/spaces/yl4579/DMOSpeech2-demo
For quick testing without local setup, use the Google Colab notebook:
- Open DOMSpeech2_gradio_colab_GPU.ipynb in Google Colab
- Run all cells to set up the environment and launch the Gradio interface
- Provides free GPU access for faster inference
Use the local scripts for single-machine development and testing. They bind services to 127.0.0.1 (localhost only) for security.
python scripts/local-fastapi.py
- Access API at: http://127.0.0.1:8000
- API documentation: http://127.0.0.1:8000/docs
python scripts/local-gradio.py
- Access UI at: http://127.0.0.1:7860
./scripts/jupyter-lab-local.sh
- Access Jupyter at: http://127.0.0.1:8888
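To see which of the local services are actually up, a small TCP check can help. The sketch below assumes the default ports listed above (8000, 7860, 8888); the function names are illustrative and not part of this repository.

```python
import socket

# Ports taken from the commands above; adjust if you changed them in the scripts.
LOCAL_SERVICES = {"FastAPI": 8000, "Gradio": 7860, "Jupyter": 8888}


def is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_local_services(host="127.0.0.1"):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in LOCAL_SERVICES.items()}
```

The same check is useful before starting a script: if a port already reports as listening, another process holds it and the service will likely fail to bind.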
Three notebook demos are available:
- src/serveDMO.ipynb - FastAPI demo
  - Run the cell to start the FastAPI server on port 8000
- src/gradio-test.ipynb - Gradio UI demo
  - Run the cell to start the Gradio interface on port 7860
- DOMSpeech2_gradio_colab_GPU.ipynb - Google Colab demo with GPU support
  - Run DMOSpeech2 in Google Colab with free GPU access
  - Includes all necessary setup and the Gradio interface
To access local services from a remote machine, use SSH port forwarding:
# From your remote machine to access local services
ssh -L 7860:localhost:7860 -L 8000:localhost:8000 user@hostname
# Then access in your remote browser:
# - Gradio UI: http://localhost:7860
# - FastAPI docs: http://localhost:8000/docs
This enables microphone access and full UI functionality from remote browsers, since browsers treat localhost as a secure context, while the traffic stays inside the encrypted SSH connection.
python scripts/remote-fastapi.py
- Access from any device on your network: http://YOUR_IP:8000
python scripts/remote-gradio.py
- Access from any device on your network: http://YOUR_IP:7860
./scripts/jupyter-lab-remote.sh
- Access from any device on your network: http://YOUR_IP:8888
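If you are unsure what to substitute for YOUR_IP, the following sketch makes a best-effort guess at the machine's LAN address and prints the matching URLs. The function names are hypothetical, the ports are the defaults listed above, and on machines with several network interfaces the guess may not be the address you want.

```python
import socket


def guess_lan_ip():
    """Best-effort guess at this machine's LAN IPv4 address.

    Connecting a UDP socket sends no packets; it only asks the OS which
    local interface it would route through. 192.0.2.1 is a reserved
    documentation address (RFC 5737), used here purely as a routing target.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(("192.0.2.1", 80))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example an offline machine): fall back to loopback.
        return "127.0.0.1"
    finally:
        sock.close()


def remote_urls(ip=None):
    """Build the remote-access URLs for the default service ports."""
    ip = ip or guess_lan_ip()
    return {
        "FastAPI": f"http://{ip}:8000",
        "Gradio": f"http://{ip}:7860",
        "Jupyter": f"http://{ip}:8888",
    }


if __name__ == "__main__":
    for name, url in remote_urls().items():
        print(f"{name}: {url}")
```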
# Initialize voice with reference audio
curl -X POST "http://127.0.0.1:8000/init_voice" \
-F "audio_file=@reference.wav" \
-F "reference_text=Your reference text here"
# Generate speech from text
curl -X POST "http://127.0.0.1:8000/generate_audio" \
-F "target_text=This is the text I want synthesized." \
--output generated_audio.wav
For network access, replace 127.0.0.1 with your server's IP address.
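The same two calls can be made from Python. The sketch below uses only the standard library and mirrors the curl commands: the endpoint paths and form field names (audio_file, reference_text, target_text) come from the examples above, while the helper names, the audio/wav content type, and the assumption that /generate_audio returns raw WAV bytes in the response body are mine.

```python
import urllib.request
import uuid


def encode_multipart(fields, files=None):
    """Encode form data as multipart/form-data.

    fields: {name: text}; files: {name: (filename, bytes)}.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    for name, (filename, data) in (files or {}).items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: audio/wav\r\n\r\n"  # assumed content type for the reference audio
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_form(url, fields, files=None, timeout=300):
    """POST a multipart form and return the raw response body."""
    body, content_type = encode_multipart(fields, files)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def synthesize(base_url, reference_wav, reference_text, target_text, out_path):
    """Initialize the voice, then generate speech and save it to out_path."""
    with open(reference_wav, "rb") as handle:
        audio = handle.read()
    post_form(
        f"{base_url}/init_voice",
        {"reference_text": reference_text},
        {"audio_file": ("reference.wav", audio)},
    )
    # Assumption: the response body is the generated WAV file itself.
    wav_bytes = post_form(f"{base_url}/generate_audio", {"target_text": target_text})
    with open(out_path, "wb") as handle:
        handle.write(wav_bytes)
    return out_path
```

With the FastAPI server running, a call such as synthesize("http://127.0.0.1:8000", "reference.wav", "Your reference text here", "This is the text I want synthesized.", "generated_audio.wav") corresponds to the two curl commands.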
- ✅ Secure: Services only accessible from the same machine
- ✅ Recommended: For development and testing
- ✅ Safe: No network exposure
- ✅ Secure: Encrypted connection to remote services
- ✅ Flexible: Access remote services as if they were local
- ✅ Best Practice: For remote access to development servers
- ⚠️ Caution Required: Exposes services to local network
- ⚠️ Firewall Needed: Ensure proper network security
- ⚠️ No Authentication: Services have no built-in security
- ⚠️ HTTP Only: No encryption (consider HTTPS for production)
For production use, consider:
- HTTPS/SSL certificates
- Authentication and authorization
- Rate limiting and monitoring
- Reverse proxy (nginx, Apache)
- Network security hardening
- Port already in use: Change port numbers in scripts if conflicts occur
- Permission denied: Ensure scripts are executable (chmod +x scripts/*.sh)
- Module not found: Verify that the virtual environment is activated
- CUDA errors: Check GPU availability and PyTorch installation
- Check the original documentation in original-README.md
- Review error logs for specific issues
- Ensure all prerequisites are installed
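For the "Module not found" and "CUDA errors" cases, a short diagnostic run with the same interpreter you use for the scripts can narrow things down. This is a sketch with hypothetical function names; it relies only on sys.prefix, importlib, and the standard torch.cuda.is_available() and torch.cuda.get_device_name() calls.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True if the interpreter is running inside a virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def cuda_status():
    """Return a short status string describing PyTorch and CUDA availability."""
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    try:
        import torch
    except (ImportError, OSError):
        # Installed but not loadable, for example a broken shared library.
        return "torch-import-failed"
    if not torch.cuda.is_available():
        return "cuda-unavailable"
    return f"cuda-ok: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("PyTorch / CUDA:", cuda_status())
```

A result of torch-missing inside an active virtual environment suggests the setup script did not finish; cuda-unavailable points at the GPU driver or a CPU-only PyTorch build.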
- Original DMOSpeech2 repository: DMOSpeech2
- Additional codebase references: F5-TTS, DMD2, simple_GRPO
This fork aims to provide enhanced ease-of-use and seamless integration of DMOSpeech2 into broader workflows and user interfaces.