Nonparallel Emotional Speech Conversion

Note: The current model supports emotional voice conversion between two emotions only. For good results, more than 100 training sentences per emotion are recommended. The source and target utterances should be spoken by the same person. There is no strict limit on the length of training and testing utterances; the default range is 0.5-20 seconds.

Please upload the audio files (*.wav, *.mp3, *.m4a) to the following 4 folders. Training the model takes about 24 hours. The converted speech will be generated and uploaded to the "Result" folder within 3 days. Multiple requests are handled on a first-come, first-served basis.
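
Before uploading, it may help to check a local folder against the guidelines in this note. The following is only a minimal sketch, not part of the conversion system: the function names and the example folder name are hypothetical, the thresholds are taken from the note above, and the duration check covers only PCM .wav files because Python's standard library cannot decode .mp3 or .m4a.

```python
import wave
from pathlib import Path

# Values taken from the note above; adjust if the guidelines change.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MIN_SENTENCES = 100                    # "more than 100 sentences per emotion"
MIN_SECONDS, MAX_SECONDS = 0.5, 20.0   # default utterance length range


def wav_duration_seconds(path):
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_emotion_folder(folder):
    """Return a list of warnings for one local folder of utterances.

    The folder layout is an assumption: one folder per emotion and
    split, containing only the audio files to be uploaded.
    """
    folder = Path(folder)
    warnings = []
    audio_files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in ALLOWED_EXTENSIONS
    )
    if len(audio_files) <= MIN_SENTENCES:
        warnings.append(
            f"{folder.name}: only {len(audio_files)} audio files; "
            f"more than {MIN_SENTENCES} are suggested"
        )
    for path in audio_files:
        if path.suffix.lower() != ".wav":
            # Duration of .mp3/.m4a is not checked here; that would
            # need a third-party decoder.
            continue
        try:
            duration = wav_duration_seconds(path)
        except (wave.Error, EOFError):
            warnings.append(f"{path.name}: could not be read as PCM .wav")
            continue
        if not MIN_SECONDS <= duration <= MAX_SECONDS:
            warnings.append(
                f"{path.name}: {duration:.2f} s is outside "
                f"{MIN_SECONDS}-{MAX_SECONDS} s"
            )
    return warnings
```

For example, calling check_emotion_folder("neutral_train") on a local folder (the name is a placeholder) returns a list of warning strings; an empty list means the folder meets the suggested file count and, for .wav files, the default length range.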

If you have any questions, please contact me.