Commit 98065b5

Update README.rst
small improvements to readability
1 parent 3e0828b commit 98065b5

File tree

1 file changed: +50 -60 lines changed


README.rst

Lines changed: 50 additions & 60 deletions
@@ -14,50 +14,33 @@ Currently the package supports the following languages:
 - German
 - English
 
-Version 2.0.0 and later
------------------------
-From version 2.0.0 we are relying on the new segment-to-gesture API introduced
-in VTL 2.3 and use the JD3.speaker instead of the JD2.speaker.
-
-Old version 1.1.0
------------------
-The original version of this tool is based on the work and on the Matlab code
-on Yingming Gao. This can be viewed by checking out the tag ``1.1.0``.
 
-The overall logic is in ``create_corpus.py`` which executes the appropriate
-functions from top to bottom. The functions are supplied by the other files.
 
-.. note::
+Minimal Example
+===============
+If you run the following command, the package will align the audio files for you, and then create a pandas DataFrame with the synthesized audio and other information useful for the PAULE model,
+but only for the first 100 words that occur 4 times or more. Because multiprocessing is used (``--use_mp``), no melspectrograms are generated:
 
-   In the since VTL version 2.3 which can be downloaded as free software from
-   https://www.vocaltractlab.de/index.php?page=vocaltractlab-download most of
-   the functionality implemented here is available directly from the VTL api.
-   Please use the VTL api directly.
+.. code:: bash
 
+   python -m create_vtl_corpus.create_corpus --corpus CORPUS --language de --needs_aligner --use_mp --min_word_count 4 --word_amount 100 --save_df_name SAVE_DF_NAME
 
-Minimal Example
-===============
-Given a german Corpus with the following structure which is what the `Mozilla Common Voice project <https://commonvoice.mozilla.org>`__ provides:
+This works if there is a German corpus at the path CORPUS with the following structure, which is what the `Mozilla Common Voice project <https://commonvoice.mozilla.org>`__ provides:
 
 .. code:: bash
 
-   corpus/
+   CORPUS/
    ├── validated.tsv # a file where the transcripts are stored
    ├── clips/
    │ └── *.mp3 # audio files (mp3)
    └── files_not_relevant_to_this_project
 
-If you run the following command the package will align the audio files for you, and then create a pandas DataFrame with the synthesized audio and other information useful for the PAULE model,
-but only for the first 100 words that occur 4 times or more. Since you use multiprocessing, no melspectrograms are generated.:
-.. code:: bash
-
-   python -m create_vtl_corpus.create_corpus --corpus CORPUS --language de --needs_aligner --use_mp --min_word_count 4 --word_amount 100 --save_df_name SAVE_DF_NAME
 
 The end product should look something like this
 
 .. code:: bash
 
-   corpus/
+   CORPUS/
    ├── validated.tsv # a file where the transcripts are stored
    ├── clips/
    │ ├── *.mp3 # mp3 files
@@ -72,39 +55,24 @@ The end product should look something like this
 
 The DataFrame contains the following columns
 
-.. list-table:: Dataframe Labels
-   :header-rows: 1
-
-   * - Column Name
-     - Description
-   * - file_name
-     - Name of the clip
-   * - label
-     - The spoken word as it is in the aligned textgrid
-   * - lexical_word
-     - The word as it is in the dictionary
-   * - word_position
-     - The position of the word in the sentence
-   * - sentence
-     - The sentence the word is part of
-   * - wav_recording
-     - Spliced out audio as mono audio signal
-   * - sr_recording
-     - Sampling rate of the recording
-   * - sr_synthesized
-     - Sampling rates synthesized
-   * - sampa_phones
-     - The SAMPA(like) phonemes of the word
-   * - mfa_phones
-     - The phonemes as outputted by the aligner
-   * - phone_durations_lists
-     - The duration of each phone in the word as list
-   * - cp_norm
-     - Normalized CP-trajectories
-   * - vector
-     - Embedding vector of the word, based on FastText Embeddings
-   * - client_id
-     - ID of the client
+======================= ===========================================================
+label                   description
+======================= ===========================================================
+'file_name'             name of the clip
+'label'                 the spoken word as it is in the aligned textgrid
+'lexical_word'          the word as it is in the dictionary
+'word_position'         the position of the word in the sentence
+'sentence'              the sentence the word is part of
+'wav_recording'         spliced out audio as mono audio signal
+'sr_recording'          sampling rate of the recording
+'sr_synthesized'        sampling rate of the synthesized audio
+'sampa_phones'          the sampa(like) phonemes of the word
+'mfa_phones'            the phonemes as outputted by the aligner
+'phone_durations_lists' the duration of each phone in the word as list
+'cp_norm'               normalized cp-trajectories
+'vector'                embedding vector of the word, based on fastText Embeddings
+'client_id'             id of the client
+======================= ===========================================================
 
 
 Copyright
@@ -129,7 +97,29 @@ Konstantin Sering. Predictive articulatory speech synthesis utilizing lexical em
 school={Universität Tübingen}
 }
 
-
+Older Versions
+==============
+
+Version 2.0.0 and later
+-----------------------
+From version 2.0.0 we are relying on the new segment-to-gesture API introduced
+in VTL 2.3 and use the JD3.speaker instead of the JD2.speaker.
+
+Old version 1.1.0
+-----------------
+The original version of this tool is based on the work and on the Matlab code
+of Yingming Gao. This can be viewed by checking out the tag ``1.1.0``.
+
+The overall logic is in ``create_corpus.py`` which executes the appropriate
+functions from top to bottom. The functions are supplied by the other files.
+
+.. note::
+
+   Since VTL version 2.3, which can be downloaded as free software from
+   https://www.vocaltractlab.de/index.php?page=vocaltractlab-download, most of
+   the functionality implemented here is available directly from the VTL API.
+   Please use the VTL API directly.
+
 
 
 Acknowledgments
 ===============
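
As a usage sketch of the new Minimal Example: once the command above has run, the saved DataFrame can be loaded and inspected with pandas. This is a minimal sketch only; it assumes the file named by ``--save_df_name`` is a pandas pickle with a ``.pkl`` extension (an assumption, not stated in the README), and it uses ``librosa`` merely as one possible way to compute the melspectrograms that are skipped when ``--use_mp`` is set.

.. code:: python

   import numpy as np
   import pandas as pd
   import librosa  # only needed for the optional melspectrogram step below

   # Load the corpus DataFrame written by create_corpus
   # (the file name and .pkl extension are assumptions).
   df = pd.read_pickle("SAVE_DF_NAME.pkl")

   # The columns described in the table above.
   print(df.columns.tolist())

   # Inspect one word: its label, the sentence it came from,
   # and the spliced-out mono recording with its sampling rate.
   row = df.iloc[0]
   print(row["label"], "|", row["sentence"])
   print(len(row["wav_recording"]), "samples at", row["sr_recording"], "Hz")

   # Since --use_mp skips melspectrogram creation, they can be computed afterwards,
   # e.g. with librosa (one possible choice, not something this package prescribes).
   wav = np.asarray(row["wav_recording"], dtype=float)
   mel = librosa.feature.melspectrogram(y=wav, sr=row["sr_recording"])
   print(mel.shape)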
