README.rst: 50 additions & 60 deletions
@@ -14,50 +14,33 @@ Currently the package supports the following languages:
 - German
 - English
 
-Version 2.0.0 and later
------------------------
-From version 2.0.0 we are relying on the new segment-to-gesture API introduced
-in VTL 2.3 and use the JD3.speaker instead of the JD2.speaker.
-
-Old version 1.1.0
------------------
-The original version of this tool is based on the work and on the Matlab code
-on Yingming Gao. This can be viewed by checking out the tag ``1.1.0``.
 
-The overall logic is in ``create_corpus.py`` which executes the appropriate
-functions from top to bottom. The functions are supplied by the other files.
 
-.. note::
+Minimal Example
+===============
+If you run the following command the package will align the audio files for you, and then create a pandas DataFrame with the synthesized audio and other information useful for the PAULE model,
+but only for the first 100 words that occur 4 times or more. Since you use multiprocessing, no melspectrograms are generated:
 
-In the since VTL version 2.3 which can be downloaded as free software from
-https://www.vocaltractlab.de/index.php?page=vocaltractlab-download most of
-the functionality implemented here is available directly from the VTL api.
-Please use the VTL api directly.
+.. code:: bash
 
+python -m create_vtl_corpus.create_corpus --corpus CORPUS --language de --needs_aligner --use_mp --min_word_count 4 --word_amount 100 --save_df_name SAVE_DF_NAME
 
-Minimal Example
-===============
-Given a german Corpus with the following structure which is what the `Mozilla Common Voice project <https://commonvoice.mozilla.org>`__ provides:
+This works if we have a German corpus at the path CORPUS with the following structure, which is what the `Mozilla Common Voice project <https://commonvoice.mozilla.org>`__ provides:
 
 .. code:: bash
 
-corpus/
+CORPUS/
 ├── validated.tsv # a file where the transcripts are stored
 ├── clips/
 │ └── *.mp3 # audio files (mp3)
 └── files_not_relevant_to_this_project
 
-If you run the following command the package will align the audio files for you, and then create a pandas DataFrame with the synthesized audio and other information useful for the PAULE model,
-but only for the first 100 words that occur 4 times or more. Since you use multiprocessing, no melspectrograms are generated.:
-.. code:: bash
-
-python -m create_vtl_corpus.create_corpus --corpus CORPUS --language de --needs_aligner --use_mp --min_word_count 4 --word_amount 100 --save_df_name SAVE_DF_NAME
 
 The end product should look something like this
 
 .. code:: bash
 
-corpus/
+CORPUS/
 ├── validated.tsv # a file where the transcripts are stored
 ├── clips/
 │ ├── *.mp3 # mp3 files
@@ -72,39 +55,24 @@ The end product should look something like this
 
 The DataFrame contains the following columns
 
-.. list-table:: Dataframe Labels
-:header-rows: 1
-
-* - Column Name
-- Description
-* - file_name
-- Name of the clip
-* - label
-- The spoken word as it is in the aligned textgrid
-* - lexical_word
-- The word as it is in the dictionary
-* - word_position
-- The position of the word in the sentence
-* - sentence
-- The sentence the word is part of
-* - wav_recording
-- Spliced out audio as mono audio signal
-* - sr_recording
-- Sampling rate of the recording
-* - sr_synthesized
-- Sampling rates synthesized
-* - sampa_phones
-- The SAMPA(like) phonemes of the word
-* - mfa_phones
-- The phonemes as outputted by the aligner
-* - phone_durations_lists
-- The duration of each phone in the word as list
-* - cp_norm
-- Normalized CP-trajectories
-* - vector
-- Embedding vector of the word, based on FastText Embeddings
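
The column list in the removed ``list-table`` can serve as a quick sanity check on a generated DataFrame. The following is a minimal sketch, not part of the package: ``missing_columns`` and ``EXPECTED_COLUMNS`` are hypothetical names introduced here, and the column names are copied from the removed lines above, so they are an assumption for any version whose replacement table is not visible in this diff.

```python
# Column names copied from the "Dataframe Labels" list-table shown in the diff.
# Assumption: the generated DataFrame still uses exactly these names.
EXPECTED_COLUMNS = (
    "file_name",
    "label",
    "lexical_word",
    "word_position",
    "sentence",
    "wav_recording",
    "sr_recording",
    "sr_synthesized",
    "sampa_phones",
    "mfa_phones",
    "phone_durations_lists",
    "cp_norm",
    "vector",
)


def missing_columns(columns):
    """Return the expected column names that are absent from `columns`.

    `columns` may be any iterable of names, for example the `columns`
    attribute of a pandas DataFrame. This helper is hypothetical and is
    not provided by create_vtl_corpus.
    """
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


if __name__ == "__main__":
    # Illustrative input only: a frame that has just three of the columns.
    partial = ["file_name", "label", "sentence"]
    print(missing_columns(partial))
```

With pandas, the same check would be ``missing_columns(df.columns)`` once the saved DataFrame has been loaded; how it is loaded depends on the format written by ``--save_df_name``, which this diff does not state.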