Adds working multiprocessing and preliminary but working documentation based on the PyData theme
Co-authored-by: Konstantin (Tino) Sering <[email protected]>
README.rst: 98 additions & 3 deletions
@@ -7,7 +7,12 @@ Readme
This package supplies the necessary functions to synthesize speech
from a phonemic transcription. Furthermore, it defines helpers to improve the
result if more information, such as the pitch contour, is available. It is
especially useful when working with the
`PAULE <https://github.com/quantling/paule>`__ framework.

Currently the package supports the following languages:

- German
- English


Version 2.0.0 and later
-----------------------
@@ -30,12 +35,102 @@ functions from top to bottom. The functions are supplied by the other files.
Please use the VTL API directly.


Minimal Example
===============

Given a German corpus with the following structure, which is what the
`Mozilla Common Voice project <https://commonvoice.mozilla.org>`__ provides:

.. code:: bash

    corpus/
    ├── validated.tsv                       # a file where the transcripts are stored
    ├── clips/
    │   └── *.mp3                           # audio files (mp3)
    └── files_not_relevant_to_this_project
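For a quick look at the input data, ``validated.tsv`` can be read with pandas
as a tab-separated file. A minimal sketch, not part of this package; the
column names ``client_id``, ``path``, and ``sentence`` are assumed from the
Common Voice format:

.. code:: python

    import pandas as pd

    # validated.tsv is tab-separated; each row links an audio clip
    # to its validated transcript and the contributing speaker.
    transcripts = pd.read_csv("corpus/validated.tsv", sep="\t")
    print(transcripts[["client_id", "path", "sentence"]].head())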
If you run the following command, the package will align the audio files for
you and then create a pandas DataFrame with the synthesized audio and other
information useful for the PAULE model, but only for the first 100 words that
occur 4 times or more. Since multiprocessing is used (``--use_mp``), no mel
spectrograms are generated:

.. code:: bash

    python -m create_vtl_corpus.create_corpus --corpus CORPUS --language de --needs_aligner --use_mp --min_word_count 4 --word_amount 100 --save_df_name SAVE_DF_NAME
The end product should look something like this:

.. code:: bash

    corpus/
    ├── validated.tsv                       # a file where the transcripts are stored
    ├── clips/
    │   ├── *.mp3                           # mp3 files
    │   └── *.lab                           # lab files
    ├── clips_validated/
    │   ├── *.mp3                           # validated mp3 files
    │   └── *.lab                           # validated lab files
    ├── clips_aligned/
    │   └── *.TextGrid                      # aligned TextGrid files
    ├── corpus_as_df.pkl                    # a pandas DataFrame with the information
    └── files_not_relevant_to_this_project
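The DataFrame is saved as a pickle file and can be loaded directly with
pandas. A minimal sketch; the file name follows the tree above (with
``--save_df_name`` it may differ), and the columns are described in the table
below:

.. code:: python

    import pandas as pd

    # Load the corpus DataFrame produced by create_vtl_corpus.
    df = pd.read_pickle("corpus/corpus_as_df.pkl")
    print(df.columns.tolist())
    print(df[["file_name", "label", "sentence"]].head())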
The DataFrame contains the following columns:

.. list-table:: DataFrame labels
   :header-rows: 1

   * - Column Name
     - Description
   * - file_name
     - Name of the clip
   * - label
     - The spoken word as it is in the aligned TextGrid
   * - lexical_word
     - The word as it is in the dictionary
   * - word_position
     - The position of the word in the sentence
   * - sentence
     - The sentence the word is part of
   * - wav_recording
     - The spliced-out audio as a mono audio signal
   * - sr_recording
     - Sampling rate of the recording
   * - sr_synthesized
     - Sampling rate of the synthesized audio
   * - sampa_phones
     - The SAMPA-like phonemes of the word
   * - mfa_phones
     - The phonemes as output by the aligner
   * - phone_durations_lists
     - The duration of each phone in the word, as a list
   * - cp_norm
     - Normalized CP-trajectories
   * - vector
     - Embedding vector of the word, based on FastText embeddings
   * - client_id
     - ID of the client
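Because ``--use_mp`` skips the mel spectrograms, they can be computed
afterwards from the ``wav_recording`` and ``sr_recording`` columns. A sketch
under the assumptions that ``librosa`` is installed (it is not required by
this package) and that ``wav_recording`` holds a float NumPy array:

.. code:: python

    import librosa
    import pandas as pd

    df = pd.read_pickle("corpus/corpus_as_df.pkl")
    row = df.iloc[0]

    # Compute a mel spectrogram from the spliced-out mono recording.
    mel = librosa.feature.melspectrogram(y=row["wav_recording"],
                                         sr=row["sr_recording"])
    print(mel.shape)  # (n_mels, n_frames)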
Copyright
=========

As the VocalTractLabAPI.so and the JD2.speaker are under GPL v3, the rest of
the code here is under GPL as well. If the code is not dependent on VTL
anymore, you can use it under the MIT license.


Citing
======

If you use this code for your research, please cite the following thesis:

Konstantin Sering. Predictive articulatory speech synthesis utilizing lexical
embeddings (PAULE). PhD thesis, Universität Tübingen, 2023.