Inputs
string: Text to generate speech from.
string: Voice to use for generation (F1-F5 for female, M1-M5 for male).
string?: Language code to use for generation.
float32?: Speech speed multiplier.
int32?: Number of diffusion steps (higher = better quality, slower).
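The input list above can be sketched as a small request type with basic checks. This is only an illustration: the listing gives types and descriptions but no parameter names, so the field names (`text`, `voice`, `language`, `speed`, `steps`) are hypothetical, and the non-empty, positive-speed, and at-least-one-step checks are assumptions rather than documented constraints. Only the voice pattern (F1-F5, M1-M5) comes from the listing itself.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Voices named in the listing: F1-F5 (female) and M1-M5 (male).
VOICE_PATTERN = re.compile(r"[FM][1-5]")


@dataclass
class SpeechRequest:
    """Hypothetical container for the five inputs; field names are assumed."""

    text: str                       # string: text to generate speech from
    voice: str                      # string: F1-F5 or M1-M5
    language: Optional[str] = None  # string?: language code
    speed: Optional[float] = None   # float32?: speech speed multiplier
    steps: Optional[int] = None     # int32?: number of diffusion steps

    def validate(self) -> None:
        # Assumption: empty text is not a useful request.
        if not self.text:
            raise ValueError("text must be non-empty")
        if not VOICE_PATTERN.fullmatch(self.voice):
            raise ValueError(f"voice must be F1-F5 or M1-M5, got {self.voice!r}")
        # Assumption: a multiplier only makes sense when positive.
        if self.speed is not None and self.speed <= 0:
            raise ValueError("speed must be positive")
        # Assumption: at least one diffusion step is required.
        if self.steps is not None and self.steps < 1:
            raise ValueError("steps must be at least 1")
```

Optional inputs default to `None` so that a caller can omit them and leave the choice of default to the service.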
Outputs
float32: Linear PCM audio samples with shape (F,), where F is the number of samples, at a sample rate of 22050 Hz.
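As a rough sketch of how such an output might be consumed, the snippet below derives the clip duration from the sample count and encodes the samples as a 16-bit mono WAV using only the Python standard library. It assumes the float samples are normalized to the range [-1.0, 1.0], which the listing does not state. The helper names `duration_seconds` and `float_pcm_to_wav_bytes` are made up for this example.

```python
import io
import struct
import wave

SAMPLE_RATE = 22050  # Hz, as stated for the output


def duration_seconds(samples, sample_rate=SAMPLE_RATE):
    """Clip length in seconds: F samples divided by the sample rate."""
    return len(samples) / sample_rate


def float_pcm_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode a 1-D sequence of float samples as 16-bit mono WAV bytes.

    Assumes samples lie in [-1.0, 1.0]; values outside are clipped.
    """
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))                    # clip to the valid range
        frames += struct.pack("<h", int(round(s * 32767)))   # little-endian int16
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # shape (F,) implies a single channel
        wav.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()
```

For example, an output of 22050 samples corresponds to one second of audio, and the returned bytes can be written directly to a `.wav` file.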