Voice Conversion

Our project is a voice conversion system: a spoken sentence of the user's choice is transformed into the same sentence spoken with an accent. The user will be able to choose a Midwestern, Southern, or British accent. Given the scope of this project, we will record speaker samples before our presentation rather than build a real-time analysis application.

This project draws on a broad range of DSP techniques. A rough outline is as follows:

  • Accent Quantification
    • For each target accent, we collect recordings of multiple speakers reading a text that contains every phoneme in the English language.
    • We use Dynamic Time Warping (DTW) to locate each phoneme in each of our voice samples.
    • We create a canonical MFCC vector for each phoneme of the accent.
  • Calibration
    • The speaker reads a text that contains every phoneme in the English language.
    • Through DTW, we determine which phoneme is being spoken at each point in time.
    • We then identify a characteristic MFCC vector for each phoneme in the speaker's voice.
  • Map Definition
    • We define a map from each phoneme in the speaker's voice to the corresponding phoneme in the chosen accent.
  • Transformation
    • The speaker says an arbitrary sentence.
    • Using the characteristic MFCC vectors obtained during calibration, we identify the phoneme spoken at each point in time.
    • Using our map, we replace each spoken phoneme with its accented counterpart.
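The phoneme-location step used in both Accent Quantification and Calibration can be sketched as follows. This is a minimal sketch under assumptions the outline does not state: each recording has already been converted into a sequence of MFCC frames (librosa's `librosa.feature.mfcc` is one option for that), and a reference reading of the same text has hand-labeled phoneme boundaries. Aligning a new recording to that reference with classic DTW lets every new frame inherit a phoneme label. The names `dtw` and `transfer_labels` are hypothetical, and toy one-dimensional frames stand in for real MFCC vectors.

```python
import math


def dtw(ref, query):
    """Classic DTW between two non-empty sequences of feature frames.

    Returns (total_cost, path), where path is a list of
    (ref_index, query_index) pairs from the start to the end of both sequences.
    """
    n, m = len(ref), len(query)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(ref[i - 1], query[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = [(n - 1, m - 1)]
    i, j = n, m
    while (i, j) != (1, 1):
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


def transfer_labels(ref_labels, path, query_len):
    """Give each query frame the label of the first reference frame aligned to it.

    Assumption: ref_labels come from a hand-labeled reference recording.
    """
    labels = [None] * query_len
    for r, q in path:
        if labels[q] is None:
            labels[q] = ref_labels[r]
    return labels


# Toy 1-D "frames" standing in for real MFCC vectors.
reference = [[0.0], [0.0], [5.0], [5.0]]
reference_labels = ["AA", "AA", "S", "S"]
recording = [[0.0], [0.0], [0.0], [5.0], [5.0], [5.0]]

total, path = dtw(reference, recording)
print(transfer_labels(reference_labels, path, len(recording)))
# ['AA', 'AA', 'AA', 'S', 'S', 'S']
```

Because a DTW path visits every query index at least once, every frame of the new recording receives a label even when the speaker talks faster or slower than the reference.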
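The canonical-vector and Map Definition steps can be sketched as follows. The outline does not say how frames are combined into one vector per phoneme or what form the map takes, so this sketch assumes that the canonical (or characteristic) vector is the mean of the frames DTW assigned to that phoneme, and that the map stores, for each phoneme, the offset from the speaker's vector to the accent's. Accent data from several speakers is pooled simply by concatenating their labeled frames. `canonical_vectors` and `build_map` are hypothetical names.

```python
from collections import defaultdict


def canonical_vectors(frames, labels):
    """Average all MFCC frames that share a phoneme label.

    frames: equal-length feature vectors; labels: one phoneme label per frame.
    Assumption: the canonical vector is the per-phoneme mean.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, lab in zip(frames, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def build_map(speaker_vecs, accent_vecs):
    """Hypothetical map: per-phoneme offset from the speaker's vector to the accent's."""
    return {
        ph: [a - s for s, a in zip(speaker_vecs[ph], accent_vecs[ph])]
        for ph in speaker_vecs
        if ph in accent_vecs
    }


# Calibration data for one speaker (toy 2-D frames, already labeled by DTW).
speaker = canonical_vectors([[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]], ["AA", "AA", "S"])
# Accent data pooled from several speakers by concatenating their labeled frames.
accent = canonical_vectors([[4.0, 3.0], [6.0, 5.0], [9.0, 1.0]], ["AA", "AA", "S"])
print(build_map(speaker, accent))
# {'AA': [3.0, 1.0], 'S': [-1.0, 1.0]}
```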
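The Transformation stage can be sketched as a nearest-neighbor classifier followed by a lookup in the map. This sketch assumes frame-by-frame classification by Euclidean distance to the speaker's characteristic vectors, and a map stored as per-phoneme offsets in MFCC space; the outline fixes neither detail. `classify_frame` and `transform` are hypothetical names. Turning the modified MFCC frames back into audio is not shown: MFCCs discard phase and fine spectral detail, so resynthesis (for example with librosa's `librosa.feature.inverse.mfcc_to_audio`) only approximates a natural-sounding signal.

```python
import math


def classify_frame(frame, speaker_vecs):
    """Return the phoneme whose characteristic vector is nearest (Euclidean) to the frame."""
    return min(speaker_vecs, key=lambda ph: math.dist(frame, speaker_vecs[ph]))


def transform(frames, speaker_vecs, phoneme_map):
    """Label each frame, then shift it by that phoneme's accent offset.

    Assumption: the map is a per-phoneme offset in MFCC space; phonemes
    missing from the map are left unchanged.
    """
    labels, shifted = [], []
    for frame in frames:
        ph = classify_frame(frame, speaker_vecs)
        offset = phoneme_map.get(ph, [0.0] * len(frame))
        shifted.append([x + d for x, d in zip(frame, offset)])
        labels.append(ph)
    return labels, shifted


speaker_vecs = {"AA": [2.0, 3.0], "S": [10.0, 0.0]}
phoneme_map = {"AA": [3.0, 1.0], "S": [-1.0, 1.0]}
labels, shifted = transform([[2.5, 3.0], [9.0, 0.5]], speaker_vecs, phoneme_map)
print(labels, shifted)
# ['AA', 'S'] [[5.5, 4.0], [8.0, 1.5]]
```

Classifying each frame independently is the simplest choice; smoothing the labels over neighboring frames would reduce isolated misclassifications.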