# T. Okamoto, T. Toda and H. Kawai,
# “E2E-S2S-VC: End-to-end sequence-to-sequence voice conversion,”
# in Proc. Interspeech, Aug. 2023, pp. 2043-2047.
# https://www.isca-speech.org/archive/interspeech_2023/okamoto23b_interspeech.html

Source code of E2E-S2S-VC models for ESPnet2, including a recipe for the
CMU ARCTIC databases at a sampling frequency of 24 kHz.

bdl (male) -> slt (female) and slt (female) -> bdl (male) conversion can
be performed with the Voice Transformer Network (VTN), Conformer-based
FastSpeech 2 (CFS2), CFS2 with a modified variance adaptor (CFS2'),
VITS-VC, and JETS-VC.

1. Install ESPnet
https://espnet.github.io/espnet/installation.html

2. Copy the egs2 and espnet2 directories into the installed ESPnet directory

3. Run the recipe in espnet/egs2/cmu_arctic/vc1
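Steps 2 and 3 can be sketched in Python as follows. This is a minimal, hypothetical helper and not part of this repository. The function names `copy_into_espnet` and `run_recipe` are illustrative, and the `./run.sh` entry point is an assumption based on the usual ESPnet egs2 recipe layout. `shutil.copytree(..., dirs_exist_ok=True)` requires Python 3.8 or later.

```python
import shutil
import subprocess
from pathlib import Path

# Top-level directories of this repository that are merged into ESPnet (step 2).
SOURCE_DIRS = ("egs2", "espnet2")


def copy_into_espnet(repo_root, espnet_root):
    """Copy egs2/ and espnet2/ from this repository into an ESPnet checkout.

    Files with the same relative path are overwritten; all other files
    already present in the ESPnet tree are left untouched.
    """
    repo_root, espnet_root = Path(repo_root), Path(espnet_root)
    for name in SOURCE_DIRS:
        shutil.copytree(repo_root / name, espnet_root / name, dirs_exist_ok=True)


def run_recipe(espnet_root, dry_run=False):
    """Run the CMU ARCTIC VC recipe (step 3).

    Assumption: the recipe directory provides a run.sh entry point, as
    ESPnet egs2 recipes usually do. Returns the command and working
    directory; with dry_run=True nothing is executed.
    """
    recipe_dir = Path(espnet_root) / "egs2" / "cmu_arctic" / "vc1"
    cmd = ["./run.sh"]
    if not dry_run:
        subprocess.run(cmd, cwd=recipe_dir, check=True)
    return cmd, recipe_dir
```

For example, call `copy_into_espnet("path/to/this/repo", "path/to/espnet")` and then `run_recipe("path/to/espnet")`, with both paths replaced by your own. Merging with `copytree` instead of replacing whole directories keeps the rest of the ESPnet tree intact.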

A recipe for Hi-Fi-CAPTAIN will be released soon.
https://ast-astrec.nict.go.jp/en/release/hi-fi-captain/
