Inputs
string: Text to generate speech from.
string: Generation voice.
string?: Generation language.
float32?: Voice speed multiplier.
Outputs
float32: Linear PCM audio samples with shape (F,) and a sample rate of 24 kHz.
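
The output is a raw sample array, not an audio file, so it usually needs a container before it can be played. The following is a minimal sketch of one way to do that with the Python standard library. It assumes the samples are mono and normalized to [-1, 1], and that F is the number of samples; the page confirms neither. The helper names `float_pcm_to_wav_bytes` and `duration_seconds` are hypothetical, and the sine tone is a stand-in for real model output.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # 24 kHz, as stated in the output description


def float_pcm_to_wav_bytes(samples, sample_rate=SAMPLE_RATE):
    """Encode mono float samples as 16-bit PCM WAV bytes.

    Assumption: samples are normalized to [-1, 1]; values outside
    that range are clipped rather than rescaled.
    """
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # assumed mono, given shape (F,)
        wav.setsampwidth(2)          # 16-bit integer samples
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return buf.getvalue()


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Clip length in seconds for a sample array of length F."""
    return num_samples / sample_rate


# Stand-in for model output: half a second of a 440 Hz tone.
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE // 2)
]
wav_bytes = float_pcm_to_wav_bytes(tone)
```

Writing `wav_bytes` to a file with a `.wav` extension should produce a clip that ordinary players can open. Converting to 16-bit integers is a choice made here for compatibility; the float32 samples could also be kept as they are if the consumer accepts them.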