Open Sources

Curated repos, tools, and frameworks shaping the developer ecosystem.
Live data from GitHub.

LuxTTS | Open Source Review | Trending Society


LuxTTS

by ysharma3501

4.2k stars · Python

A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.


License: Apache-2.0

Stars: 4,203

Forks: 543

Contributors: 4

Last push: Jun 5, 2026

About this project

LuxTTS

LuxTTS is a lightweight, ZipVoice-based text-to-speech model designed for high-quality voice cloning and realistic speech generation at speeds exceeding 150x realtime.

https://github.com/user-attachments/assets/a3b57152-8d97-43ce-bd99-26dc9a145c29

The main features are:

Voice cloning: SOTA voice cloning on par with models 10x larger.
Clarity: Clear 48 kHz speech generation, unlike most TTS models, which are limited to 24 kHz.
Speed: Reaches 150x realtime on a single GPU and runs faster than realtime on CPUs as well.
Efficiency: Fits within 1 GB of VRAM, so it can run on any local GPU.
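
To make the speed and clarity figures above concrete, here is a small arithmetic sketch. The helper names are hypothetical and not part of LuxTTS; the only inputs taken from the README are the 150x realtime factor and the 48 kHz and 24 kHz sample rates.

```python
def synthesis_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed to generate `audio_seconds` of speech."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


def sample_count(audio_seconds: float, sample_rate_hz: int) -> int:
    """Number of samples in a mono clip of the given length."""
    return round(audio_seconds * sample_rate_hz)


one_minute = 60.0

# At the claimed 150x realtime, one minute of speech takes 60 / 150 seconds.
gpu_seconds = synthesis_seconds(one_minute, 150.0)

# A 48 kHz clip carries twice as many samples as a 24 kHz clip of equal length.
samples_48k = sample_count(one_minute, 48_000)
samples_24k = sample_count(one_minute, 24_000)

print(f"{gpu_seconds:.2f} s to synthesize {one_minute:.0f} s of audio")
print(f"{samples_48k} samples at 48 kHz vs {samples_24k} at 24 kHz")
```

Actual throughput depends on hardware and on the sampling parameters used, so treat the 150x figure as an upper-end claim rather than a guarantee.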

Usage

You can try it locally, in Colab, or on Spaces.

Simple installation:

git clone https://github.com/ysharma3501/LuxTTS.git
cd LuxTTS
pip install -r requirements.txt

Load model:

from zipvoice.luxvoice import LuxTTS

# load model on GPU
lux_tts = LuxTTS('YatharthS/LuxTTS', device='cuda')
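
Since the feature list says the model also runs faster than realtime on CPUs, a script may want to fall back to the CPU when no GPU is present. The sketch below shows one way to pick the device string. `pick_device` is a hypothetical helper, and passing `device='cpu'` to `LuxTTS` is an assumption: the snippet above only shows `device='cuda'`.

```python
def pick_device() -> str:
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # No PyTorch available in this environment, so assume CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Selected device: {device}")

# Assumption: LuxTTS accepts device='cpu' the same way it accepts device='cuda'.
# from zipvoice.luxvoice import LuxTTS
# lux_tts = LuxTTS('YatharthS/LuxTTS', device=device)
```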

Simple inference:

import soundfile as sf
from IPython.display import Audio, display

text = "Hey, what's up? I'm feeling really great if you ask me honestly!"

# change this to your reference file path; WAV and MP3 both work
prompt_audio = 'audio_file.wav'

# encode the reference audio (the first call takes about 10 s while librosa initializes)
encoded_prompt = lux_tts.encode_prompt(prompt_audio, rms=0.01)

# generate speech
final_wav = lux_tts.generate_speech(text, encoded_prompt, num_steps=4)

# save audio at 48 kHz
final_wav = final_wav.numpy().squeeze()
sf.write('output.wav', final_wav, 48000)

# play the result inline (notebook environments only)
display(Audio(final_wav, rate=48000))
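
Because encoding the reference audio is a separate step from generation, the same encoded prompt can be reused for several sentences. The following is a minimal sketch of that pattern. `synthesize_lines` is a hypothetical helper, and it assumes only the `generate_speech(text, encoded_prompt, num_steps=...)` call shown above.

```python
from typing import Any, Iterable, List


def synthesize_lines(
    tts: Any,
    encoded_prompt: Any,
    lines: Iterable[str],
    num_steps: int = 4,
) -> List[Any]:
    """Generate one waveform per non-empty line, reusing a single encoded prompt."""
    outputs = []
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines instead of sending empty text to the model.
            continue
        outputs.append(tts.generate_speech(line, encoded_prompt, num_steps=num_steps))
    return outputs


# Usage, assuming lux_tts and encoded_prompt exist as in the snippet above:
# waves = synthesize_lines(lux_tts, encoded_prompt, ["First sentence.", "Second sentence."])
```

Each returned item is whatever `generate_speech` returns, so the save and playback steps from the snippet above apply to every element.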

Inference with sampling params:

import soundfile as sf
from IPython.display import Audio, display

text = "Hey, what's up? I'm feeling really great if you ask me honestly!"

# change this to your reference file path; WAV and MP3 both work
prompt_audio = 'audio_file.wav'

rms = 0.01  # output loudness; higher is louder (around 0.01 is recommended)
t_shift = 0.9  # sampling param; higher can sound better but worsens WER
num_steps = 4  # sampling param; higher sounds better but takes longer (3-4 is best for efficiency)
speed = 1.0  # sampling param; controls the speed of the audio (lower = slower)
return_smooth = False  # sampling param; may sound smoother but less clean
ref_duration = 5  # lower values can speed up inference; set to 1000 if you hear artifacts

# encode the reference audio (the first call takes about 10 s while librosa initializes)
encoded_prompt = lux_tts.encode_prompt(prompt_audio, duration=ref_duration, rms=rms)

# generate speech
final_wav = lux_tts.generate_speech(text, encoded_prompt, num_steps=num_steps, t_shift=t_shift, speed=speed, return_smooth=return_smooth)

# save audio at 48 kHz
final_wav = final_wav.numpy().squeeze()
sf.write('output.wav', final_wav, 48000)

# play the result inline (notebook environments only)
display(Audio(final_wav, rate=48000))
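
The parameters above split into two groups: `duration` and `rms` go to `encode_prompt`, while the rest go to `generate_speech`. One way to keep them together and still pass each group to the right call is a small config object. `SamplingConfig` is a hypothetical helper, not part of LuxTTS; its defaults simply mirror the values in the snippet above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingConfig:
    rms: float = 0.01            # loudness target used when encoding the prompt
    t_shift: float = 0.9         # higher can sound better but worsens WER
    num_steps: int = 4           # 3-4 is the suggested efficiency range
    speed: float = 1.0           # lower is slower
    return_smooth: bool = False  # smoother but possibly less clean
    ref_duration: int = 5        # seconds of reference audio to use

    def encode_kwargs(self) -> dict:
        """Keyword arguments for encode_prompt."""
        return {"duration": self.ref_duration, "rms": self.rms}

    def generate_kwargs(self) -> dict:
        """Keyword arguments for generate_speech."""
        return {
            "num_steps": self.num_steps,
            "t_shift": self.t_shift,
            "speed": self.speed,
            "return_smooth": self.return_smooth,
        }


config = SamplingConfig(num_steps=3)

# Usage, assuming lux_tts, prompt_audio, and text exist as in the snippet above:
# encoded_prompt = lux_tts.encode_prompt(prompt_audio, **config.encode_kwargs())
# final_wav = lux_tts.generate_speech(text, encoded_prompt, **config.generate_kwargs())
```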

