# CMU ARCTIC recipe with a sampling frequency of 24 kHz

This is the recipe for training voice conversion (VC) models on the CMU ARCTIC databases at a sampling frequency of 24 kHz (http://festvox.org/cmu_arctic/cmu_arctic/orig/).

1. Prepare data
   ```sh
   ./prepare.sh
   ```

2. Train Voice Transformer Network (VTN)
   ```sh
   ./train_vtn.sh
   # With s=bdl and t=slt, the source and target speakers are bdl and slt, respectively.
   ```

3. Train VITS-VC
   ```sh
   ./train_vits_vc.sh
   ```

4. Train JETS-VC
   ```sh
   ./train_jets_vc.sh
   ```

5. Infer VTN with teacher forcing for CFS2 and CFS2' (after step 2 is finished)
   ```sh
   ./infer_vtn_use_teacher_forcing.sh
   ```

6. Train CFS2 (after step 5 is finished)
   ```sh
   ./train_cfs2_vc.sh
   ```

7. Train CFS2' (after step 5 is finished)
   ```sh
   ./train_cfs2_vc_prime.sh
   ```

8. Train CFS2'+HiFi-GAN (joint training) (after step 5 is finished)
   ```sh
   ./joint_train_cfs2_vc_prime.sh
   ```

9. Train CFS2'+HiFi-GAN (joint finetuning) (after step 7 is finished)
   ```sh
   ./joint_finetune_cfs2_vc_prime.sh
   ```

10. Infer CFS2 (after step 6 is finished)
    ```sh
    ./infer_cfs2_vc.sh
    ```

11. Infer CFS2'+HiFi-GAN (joint training) (after step 8 is finished)
    ```sh
    ./infer_cfs2_vc_prime_jt.sh
    ```

12. Infer CFS2'+HiFi-GAN (joint finetuning) (after step 9 is finished)
    ```sh
    ./infer_cfs2_vc_prime_ft.sh
    ```

13. Infer VITS-VC (after step 3 is finished)
    ```sh
    ./infer_vits_vc.sh
    ```

14. Infer JETS-VC (after step 4 is finished)
    ```sh
    ./infer_jets_vc.sh
    ```
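The dependency notes in the steps above form a small graph: after data preparation, the VTN/CFS2 chain, the VITS-VC pair (3, 13), and the JETS-VC pair (4, 14) do not depend on one another, and within the CFS2 chain the branches starting at steps 6, 7, and 8 are also independent. The following is a minimal sketch of a hypothetical driver script (`run_all.sh`, not part of this recipe) that runs those independent branches concurrently. It assumes that it is run from this directory, that concurrent jobs do not write to the same output paths, and that the machine has enough GPU memory for several jobs at once. By default it only prints the plan; it executes the scripts only when given `--run`.

```sh
#!/usr/bin/env bash
# run_all.sh (hypothetical): runs the recipe while respecting the
# "after step N is finished" dependencies, with independent branches in parallel.
# Without arguments it only prints the plan; pass --run to execute the scripts.
set -euo pipefail

MODE="${1:-plan}"

# Print a step, and execute it only in --run mode.
run() {
  echo "+ $*"
  if [ "$MODE" = "--run" ]; then
    "$@"
  fi
}

# Wait for every given PID and fail if any job failed.
# (A bare `wait` returns 0 even when a background job has failed.)
wait_all() {
  local status=0 pid
  for pid in "$@"; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Steps 2 and 5-12: VTN, teacher-forced inference, then the CFS2 family.
vtn_branch() {
  run ./train_vtn.sh                        # step 2
  run ./infer_vtn_use_teacher_forcing.sh    # step 5 (after 2)
  local pids=()
  ( run ./train_cfs2_vc.sh                  # step 6 (after 5)
    run ./infer_cfs2_vc.sh ) &              # step 10 (after 6)
  pids+=("$!")
  ( run ./train_cfs2_vc_prime.sh            # step 7 (after 5)
    run ./joint_finetune_cfs2_vc_prime.sh   # step 9 (after 7)
    run ./infer_cfs2_vc_prime_ft.sh ) &     # step 12 (after 9)
  pids+=("$!")
  ( run ./joint_train_cfs2_vc_prime.sh      # step 8 (after 5)
    run ./infer_cfs2_vc_prime_jt.sh ) &     # step 11 (after 8)
  pids+=("$!")
  wait_all "${pids[@]}"
}

main() {
  run ./prepare.sh                          # step 1, before everything else
  local pids=()
  vtn_branch &
  pids+=("$!")
  ( run ./train_vits_vc.sh                  # step 3
    run ./infer_vits_vc.sh ) &              # step 13 (after 3)
  pids+=("$!")
  ( run ./train_jets_vc.sh                  # step 4
    run ./infer_jets_vc.sh ) &              # step 14 (after 4)
  pids+=("$!")
  wait_all "${pids[@]}"
}

main
```

Running `bash run_all.sh` prints the 14 commands (lines from different branches may interleave), and `bash run_all.sh --run` executes them. If you have only one GPU, running the steps one at a time in the numbered order above already satisfies every dependency.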
