Frequent dropouts during voice generation

#6
by ziozzang - opened

This issue becomes more apparent when the sentences get longer. In a text of about 100-200 characters, the audio skips at least one or two characters. It also frequently jumps over or omits numbers, especially in the thousands or ten-thousands range.

Although the overall voice quality is clean and consistent, this skipping problem is quite severe. A dropout seems almost guaranteed to occur at least once per generation.

The problem is especially noticeable with Korean text.

Supertone org

Hello! Reading numbers can be challenging for the current model, as it does not use a dedicated text normalizer and the training data volume is not yet sufficient to handle these cases robustly. As a workaround, you can apply your own text normalization for numbers if needed. Also, the model can occasionally exhibit skip/repeat issues. We're aware of these limitations and are working on improvements. We hope to release an improved model as soon as possible.
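As a rough illustration of the number-normalization workaround mentioned above, here is a minimal Python sketch that spells out integers as Sino-Korean words before the text is sent to the model. The function names (`read_group`, `read_number`, `normalize_numbers`) are hypothetical and not part of any Supertone API. The sketch assumes Sino-Korean readings are wanted everywhere, so it does not handle native-Korean counters, decimals, dates, or phone numbers.

```python
import re

# Sino-Korean digit and unit words.
DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SMALL_UNITS = ["", "십", "백", "천"]   # 10^1, 10^2, 10^3 within a 4-digit group
BIG_UNITS = ["", "만", "억", "조"]     # 10^4, 10^8, 10^12

# Integers, optionally written with thousands separators (e.g. 12,345).
NUMBER_RE = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")


def read_group(n: int) -> str:
    """Read a value in 1..9999 as Sino-Korean (e.g. 1234 -> 천이백삼십사)."""
    out = ""
    for power in (3, 2, 1, 0):
        digit = (n // 10 ** power) % 10
        if digit == 0:
            continue
        if digit == 1 and power > 0:
            # A leading "일" is dropped before 십, 백, and 천.
            out += SMALL_UNITS[power]
        else:
            out += DIGITS[digit] + SMALL_UNITS[power]
    return out


def read_number(n: int) -> str:
    """Read a non-negative integer below 10^16 as Sino-Korean."""
    if n == 0:
        return "영"
    if n >= 10 ** 16:
        raise ValueError("number too large for this sketch")
    parts = []
    index = 0
    while n > 0:
        n, group = divmod(n, 10000)
        if group:
            text = read_group(group)
            if index == 1 and group == 1:
                text = ""  # 10000 is usually read "만", not "일만"
            parts.append(text + BIG_UNITS[index])
        index += 1
    return "".join(reversed(parts))


def normalize_numbers(text: str) -> str:
    """Replace each integer in the text with its Sino-Korean reading."""
    def replace(match: re.Match) -> str:
        value = int(match.group(0).replace(",", ""))
        try:
            return read_number(value)
        except ValueError:
            return match.group(0)  # leave very large numbers untouched
    return NUMBER_RE.sub(replace, text)


if __name__ == "__main__":
    print(normalize_numbers("가격은 12,345원입니다."))
```

The output of `normalize_numbers` would then be passed to the TTS model in place of the raw text. Whether this actually reduces skipped numbers depends on the model, so it is worth checking on a few sample sentences.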
