Installation
Download and install the GitHub code repository:
git clone https://github.com/Plachtaa/VITS-fast-fine-tuning.git
Installation documentation | Chinese/Japanese language model website
Currently supported tasks:
- Clone a character's voice from 10 or more short audio clips
- Clone a character's voice from 3 or more minutes of long audio (each audio file may contain only one speaker)
- Clone a character's voice from 3 or more minutes of video (each video may contain only one speaker)
- Clone a character's voice from a bilibili video link (each video may contain only one speaker)
Running inference locally
python VC_inference.py --model_dir ./OUTPUT_MODEL/G_latest.pth --share True
Then open the following address in a browser on the local machine:
http://localhost:7860
to reach the TTS web interface. This address is only reachable from the local machine; to access it from a remote machine, you can use cpolar:
cpolar http 7860
cpolar will then print a public URL through which the interface can be accessed.
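A minimal sketch of the full cpolar workflow, assuming cpolar is already installed and follows its documented token-then-tunnel flow (the token value below is a placeholder):
# one-time: register the account auth token with the local cpolar client (placeholder value)
cpolar authtoken <your-cpolar-authtoken>
# expose the local Gradio port; the public URL is printed to the terminal
cpolar http 7860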
Local training
1. Create a conda environment
conda create -n tts python=3.8
2. Install the environment dependencies
pip install -r requirements.txt
During this step some packages, such as OpenAI's whisper, may be unreachable because of network problems and cannot be installed by pip over the network. In that case, download the package separately and install the local copy with pip.
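For example, a minimal sketch of installing whisper from a locally downloaded copy (the archive and directory names are placeholders for wherever you saved the source):
# Option A: install from a local source archive obtained elsewhere
pip install ./whisper-main.zip
# Option B: install from a local clone of the source tree
pip install ./whisper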
3. Install the GPU version of PyTorch
# CUDA 11.6
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
# CUDA 11.7
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
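A quick check (not part of the original guide) to confirm the GPU build was installed and CUDA is visible:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"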
4. Install the video processing packages
pip install imageio==2.4.1
pip install moviepy
5. Build the preprocessing module
cd monotonic_align
mkdir monotonic_align
python setup.py build_ext --inplace
cd ..
6. Download the auxiliary data package
mkdir pretrained_models
# download data for fine-tuning
wget https://huggingface.co/datasets/Plachta/sampled_audio4ft/resolve/main/sampled_audio4ft_v2.zip
unzip sampled_audio4ft_v2.zip
# create necessary directories
mkdir video_data
mkdir raw_audio
mkdir denoised_audio
mkdir custom_character_voice
mkdir segmented_character_voice
7. Download a pretrained model. The available options are:
CJE: Trilingual (Chinese, Japanese, English)
CJ: Bilingual (Chinese, Japanese)
C: Chinese only
Run the group of commands that matches the model you chose.
# CJE (trilingual)
wget https://huggingface.co/spaces/Plachta/VITS-Umamusume-voice-synthesizer/resolve/main/pretrained_models/D_trilingual.pth -O ./pretrained_models/D_0.pth
wget https://huggingface.co/spaces/Plachta/VITS-Umamusume-voice-synthesizer/resolve/main/pretrained_models/G_trilingual.pth -O ./pretrained_models/G_0.pth
wget https://huggingface.co/spaces/Plachta/VITS-Umamusume-voice-synthesizer/resolve/main/configs/uma_trilingual.json -O ./configs/finetune_speaker.json
# CJ (bilingual)
wget https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai/resolve/main/model/D_0-p.pth -O ./pretrained_models/D_0.pth
wget https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai/resolve/main/model/G_0-p.pth -O ./pretrained_models/G_0.pth
wget https://huggingface.co/spaces/sayashi/vits-uma-genshin-honkai/resolve/main/model/config.json -O ./configs/finetune_speaker.json
# C (Chinese only)
wget https://huggingface.co/datasets/Plachta/sampled_audio4ft/resolve/main/VITS-Chinese/D_0.pth -O ./pretrained_models/D_0.pth
wget https://huggingface.co/datasets/Plachta/sampled_audio4ft/resolve/main/VITS-Chinese/G_0.pth -O ./pretrained_models/G_0.pth
wget https://huggingface.co/datasets/Plachta/sampled_audio4ft/resolve/main/VITS-Chinese/config.json -O ./configs/finetune_speaker.json
8. Put the voice data in the corresponding directories
Short audio: pack the audio clips into a zip file with the following structure:
Your-zip-file.zip
├───Character_name_1
├ ├───xxx.wav
├ ├───...
├ ├───yyy.mp3
├ └───zzz.wav
├───Character_name_2
├ ├───xxx.wav
├ ├───...
├ ├───yyy.mp3
├ └───zzz.wav
├───...
├
└───Character_name_n
    ├───xxx.wav
    ├───...
    ├───yyy.mp3
    └───zzz.wav
Place the zip file under ./custom_character_voice/ and run:
unzip ./custom_character_voice/custom_character_voice.zip -d ./custom_character_voice/
Long audio: name the wav files in the form Diana_234135.wav and place them in ./raw_audio/.
Video: name the video files in the form Taffy_332452.mp4 and place them in ./video_data/.
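For the short-audio case, if the zip still needs to be built from per-character folders, a minimal sketch follows; the paths and folder names are placeholders, not part of the original guide:
# assumes one sub-folder per character, each containing .wav/.mp3 clips
cd /path/to/your/voice/folders
zip -r custom_character_voice.zip Character_name_1 Character_name_2
mv custom_character_voice.zip /path/to/VITS-fast-fine-tuning/custom_character_voice/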
9. Process the audio
python scripts/video2audio.py
python scripts/denoise_audio.py
python scripts/long_audio_transcribe.py --languages {PRETRAINED_MODEL} --whisper_size large
python scripts/short_audio_transcribe.py --languages {PRETRAINED_MODEL} --whisper_size large
python scripts/resample.py
Note: replace {PRETRAINED_MODEL} with C. If your GPU has less than 12 GB of memory, change whisper_size to medium or small, as in the example below.
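For instance, with the Chinese-only pretrained model and a GPU below 12 GB, the two transcription commands from this step become (medium is just one of the suggested sizes):
python scripts/long_audio_transcribe.py --languages C --whisper_size medium
python scripts/short_audio_transcribe.py --languages C --whisper_size medium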
10. Process the text data. If you use the auxiliary data package downloaded earlier, run:
python preprocess_v2.py --add_auxiliary_data True --languages C
If you do not use the auxiliary data package, run:
python preprocess_v2.py --languages {PRETRAINED_MODEL}
11. Start training. Run the following command to begin training:
python finetune_speaker_v2.py -m ./OUTPUT_MODEL --max_epochs {Maximum_epochs} --drop_speaker_embed True
To continue training from a previously trained model, run:
python finetune_speaker_v2.py -m ./OUTPUT_MODEL --max_epochs {Maximum_epochs} --drop_speaker_embed False --cont True
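To watch training progress, the upstream VITS training code writes TensorBoard logs into the model directory; assuming that holds here as well, run this from the project root in another terminal:
# view loss curves and generated audio samples at http://localhost:6006
tensorboard --logdir ./OUTPUT_MODEL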
12. Clean up the voice data
On Linux:
rm -rf ./custom_character_voice/* ./video_data/* ./raw_audio/* ./denoised_audio/* ./segmented_character_voice/* ./separated/* long_character_anno.txt short_character_anno.txt
On Windows:
del /Q /S .\custom_character_voice\* .\video_data\* .\raw_audio\* .\denoised_audio\* .\segmented_character_voice\* .\separated\* long_character_anno.txt short_character_anno.txt