Commit 3e0828b

Documentation merge (#5)
Adds working multiprocessing and a preliminary but working documentation based on the PyData theme

Co-authored-by: Konstantin (Tino) Sering <[email protected]>
1 parent 7eec9b5 commit 3e0828b

16 files changed, +1273 -486 lines changed

.gitignore

Lines changed: 3 additions & 2 deletions
@@ -2,8 +2,9 @@ __pycache__/
 bin/targetoptimizer
 vtl_corpus1.0/
 docs/_build/
-create_vtl_corpus/resources/
+create_vtl_corpus/resources/*bin
 create_vtl_corpus/manual_tests/
 tests/clips/
 *.wav
-
+*.swp
+poetry.lock

.readthedocs.yml

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ version: 2
 build:
 os: ubuntu-20.04
 tools:
-python: "3.9"
+python: "3.12"
 
 # Build documentation in the docs/ directory with Sphinx
 sphinx:

.zenodo.json

Lines changed: 4 additions & 1 deletion
@@ -1,7 +1,7 @@
 {
 "title": "create_vtl_corpus: Synthesizing a speech corpus with VocalTractLab",
 "license": {
-"id": "MIT"
+"id": "GLP-3"
 },
 "creators": [
 {
@@ -11,6 +11,9 @@
 {
 "name": "Niels Stehwien"
 },
+{
+"name": "Schmidt Valentin"
+},
 {
 "name": "Yingming Gao"
 }

README.rst

Lines changed: 98 additions & 3 deletions
@@ -7,7 +7,12 @@ Readme
 
 This package supplies the necessary functions in order to synthesize speech
 from a phonemic transcription. Furthermore, it defines helpers to improve the
-result if more information as the pitch contour is available.
+result if more information as the pitch contour is available. It is especially useful when working with
+the `PAULE <https://github.com/quantling/paule>`__ framework.
+
+Currently the package supports the following languages:
+- German
+- English
 
 Version 2.0.0 and later
 -----------------------
@@ -30,12 +35,102 @@ functions from top to bottom. The functions are supplied by the other files.
 Please use the VTL api directly.
 
 
+Minimal Example
+===============
+Given a german Corpus with the following structure which is what the `Mozilla Common Voice project <https://commonvoice.mozilla.org>`__ provides:
+
+.. code:: bash
+
+corpus/
+├── validated.tsv # a file where the transcripts are stored
+├── clips/
+│ └── *.mp3 # audio files (mp3)
+└── files_not_relevant_to_this_project
+
+If you run the following command the package will align the audio files for you, and then create a pandas DataFrame with the synthesized audio and other information useful for the PAULE model,
+but only for the first 100 words that occur 4 times or more. Since you use multiprocessing, no melspectrograms are generated.:
+.. code:: bash
+
+python -m create_vtl_corpus.create_corpus --corpus CORPUS --language de --needs_aligner --use_mp --min_word_count 4 --word_amount 100 --save_df_name SAVE_DF_NAME
+
+The end product should look someting like this
+
+.. code:: bash
+
+corpus/
+├── validated.tsv # a file where the transcripts are stored
+├── clips/
+│ ├── *.mp3 # mp3 files
+│ └── *.lab # lab files
+├── clips_validated/
+│ ├── *.mp3 # validated mp3 files
+│ └── *.lab # validated lab files
+├── clips_aligned/
+│ └── *.TextGrid # aligned TextGrid files
+├── corpus_as_df.pkl # a pandas DataFrame with the information
+└── files_not_relevant_to_this_project
+
+The DataFrame contains the following columns
+
+.. list-table:: Dataframe Labels
+:header-rows: 1
+
+* - Column Name
+- Description
+* - file_name
+- Name of the clip
+* - label
+- The spoken word as it is in the aligned textgrid
+* - lexical_word
+- The word as it is in the dictionary
+* - word_position
+- The position of the word in the sentence
+* - sentence
+- The sentence the word is part of
+* - wav_recording
+- Spliced out audio as mono audio signal
+* - sr_recording
+- Sampling rate of the recording
+* - sr_synthesized
+- Sampling rates synthesized
+* - sampa_phones
+- The SAMPA(like) phonemes of the word
+* - mfa_phones
+- The phonemes as outputted by the aligner
+* - phone_durations_lists
+- The duration of each phone in the word as list
+* - cp_norm
+- Normalized CP-trajectories
+* - vector
+- Embedding vector of the word, based on FastText Embeddings
+* - client_id
+- ID of the client
+
+
 Copyright
 =========
-As the VocalTractLabAPI.so and the JD2.speaker is GPL v3 the rest of the code
-here is GPL as well. If the code is not dependent on VTL anymore you can use
+As the VocalTractLabAPI.so and the JD2.speaker is under GPL v3 the rest of the code
+here is GPL under as well. If the code is not dependent on VTL anymore you can use
 it under MIT license.
 
+
+Citing
+=======
+If you use this code for your research, please cite the following thesis:
+
+Konstantin Sering. Predictive articulatory speech synthesis utilizing lexical embeddings (PAULE). PhD thesis, Universität Tübingen, 2023.
+
+.. code:: bibtex
+
+@phdthesis{sering2023paule,
+title={Predictive articulatory speech synthesis utilizing lexical embeddings (PAULE)},
+author={Sering, Konstantin},
+year={2023},
+school={Universität Tübingen}
+}
+
+
+
 Acknowledgments
 ===============
 This research was supported by an ERC advanced Grant (no. 742545), by the
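
Note on the README change: the new "Dataframe Labels" table describes the columns of the pickled DataFrame. The sketch below is not part of the commit. It shows one way a reader might load such a file and check it against that table, assuming the file is an ordinary pandas pickle. The helper names `missing_columns` and `load_corpus` are hypothetical; only the column names come from the README text.

```python
import pandas as pd

# Column names copied from the "Dataframe Labels" table in the README diff.
README_COLUMNS = [
    "file_name", "label", "lexical_word", "word_position", "sentence",
    "wav_recording", "sr_recording", "sr_synthesized", "sampa_phones",
    "mfa_phones", "phone_durations_lists", "cp_norm", "vector", "client_id",
]


def missing_columns(df, expected=README_COLUMNS):
    """Return the expected column names absent from df, in table order."""
    return [name for name in expected if name not in df.columns]


def load_corpus(path):
    """Load a pickled DataFrame (hypothetical helper, not a package API).

    Raises ValueError if any column documented in the README is missing.
    """
    df = pd.read_pickle(path)
    missing = missing_columns(df)
    if missing:
        raise ValueError(f"columns missing from {path}: {missing}")
    return df
```

A call such as `load_corpus("corpus/corpus_as_df.pkl")` would match the directory tree shown in the README, although the actual file name presumably depends on the value passed to `--save_df_name`. As with any pickle, only load files from a source you trust.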
