In DLM, language processing is modelled as linear mappings between word-forms and word-meanings. Word-forms and word-meanings can be defined in any way, as long as each word-form and each word-meaning is expressed as a vector (i.e., an array of numbers). The form vectors are stacked into a matrix called the *C* matrix, and the meaning vectors are stacked into another matrix called the *S* matrix. Comprehension is modelled as receiving word-forms (i.e., C) and predicting word-meanings (i.e., S). The matrix that maps C onto S as closely as possible is called the *F* matrix, and it can be estimated either analytically or computationally (see [1]_ for more detail). Multiplying C by F yields the model's prediction of S, which is called the :math:`\hat{S}` matrix. Similarly, production is modelled as receiving word-meanings (i.e., S) and predicting word-forms (i.e., C). The matrix that maps S onto C as closely as possible is called the *G* matrix, and multiplying S by G yields the model's predictions about word-forms, called the :math:`\hat{C}` matrix. How to set up and estimate these matrices is shown below.
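
For concreteness, the analytic route can be written with the least-squares solution, where :math:`^{+}` denotes the Moore-Penrose pseudoinverse. This is offered as a sketch of the general idea; the estimation actually used by *discriminative_lexicon_model* may differ in detail.

.. math::

   \mathbf{F} = \mathbf{C}^{+}\mathbf{S}, \qquad \mathbf{\hat{S}} = \mathbf{C}\mathbf{F}

   \mathbf{G} = \mathbf{S}^{+}\mathbf{C}, \qquad \mathbf{\hat{C}} = \mathbf{S}\mathbf{G}

Here :math:`\mathbf{F}` minimizes the sum of squared differences between :math:`\mathbf{CF}` and :math:`\mathbf{S}`, and :math:`\mathbf{G}` does the same for :math:`\mathbf{SG}` and :math:`\mathbf{C}`. When :math:`\mathbf{C}` has full column rank, :math:`\mathbf{C}^{+} = (\mathbf{C}^{\top}\mathbf{C})^{-1}\mathbf{C}^{\top}`.
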

Short summary
-------------

DLM is a single model of language processing (both comprehension and production) consisting of 4 + 2 components (i.e., matrices): four core matrices plus two matrices holding the model's predictions. They are :math:`\mathbf{C}` (word-forms), :math:`\mathbf{S}` (word-meanings), :math:`\mathbf{F}` (form-meaning associations), :math:`\mathbf{G}` (meaning-form associations), :math:`\mathbf{\hat{C}}` (predicted word-forms), and :math:`\mathbf{\hat{S}}` (predicted word-meanings).

A little bit more detail
------------------------

DLM is a learning-based model of language processing. DLM usually consists of four components (matrices): :math:`\mathbf{C}` (word-forms), :math:`\mathbf{S}` (word-meanings), :math:`\mathbf{F}` (form-meaning associations), and :math:`\mathbf{G}` (meaning-form associations). DLM models comprehension as a mapping from forms to meanings: it estimates :math:`\mathbf{F}` so that the product of :math:`\mathbf{C}` and :math:`\mathbf{F}`, namely :math:`\mathbf{CF}` (i.e., the mapping of forms onto meanings), is as close as possible to :math:`\mathbf{S}`. :math:`\mathbf{CF}` is also called :math:`\mathbf{\hat{S}}`. :math:`\mathbf{\hat{S}}` holds the model's predictions about word-meanings, while :math:`\mathbf{S}` holds the gold-standard ("correct") meanings of these words. Similarly, DLM models speech production as a mapping from meanings to forms: it estimates :math:`\mathbf{G}` so that :math:`\mathbf{SG}` (also called :math:`\mathbf{\hat{C}}`) is as close as possible to :math:`\mathbf{C}` (i.e., the gold-standard form matrix). Conceptually, DLM is a single model containing these six components (i.e., :math:`\mathbf{C}`, :math:`\mathbf{S}`, :math:`\mathbf{F}`, :math:`\mathbf{G}`, :math:`\mathbf{\hat{C}}`, and :math:`\mathbf{\hat{S}}`). To reflect this conceptualization, *discriminative_lexicon_model* provides a class that holds these matrices as its attributes: ``discriminative_lexicon_model.ldl.LDL``.
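
The paragraph above describes estimating :math:`\mathbf{F}` so that :math:`\mathbf{CF}` comes as close as possible to :math:`\mathbf{S}`. Besides a closed-form solution, such a mapping can also be learned incrementally, one word at a time, with the Widrow-Hoff (delta) rule. The NumPy sketch below uses made-up toy matrices and a hypothetical helper ``widrow_hoff``; it illustrates the idea and is not the estimation routine of *discriminative_lexicon_model*.

.. code-block:: python

    import numpy as np

    def widrow_hoff(C, S, eta=0.1, epochs=2000):
        """Learn F (cues x semantic dims) incrementally with the delta rule.

        Hypothetical helper, for illustration only.
        """
        F = np.zeros((C.shape[1], S.shape[1]))
        for _ in range(epochs):
            for c, s in zip(C, S):          # one word (row) at a time
                error = s - c @ F           # target meaning minus predicted meaning
                F += eta * np.outer(c, error)
        return F

    # Toy data: 3 words x 4 form cues, and 3 words x 2 semantic dimensions.
    C = np.array([[1., 1., 0., 0.],
                  [0., 1., 1., 0.],
                  [0., 0., 1., 1.]])
    S = np.array([[1.0, 0.0],
                  [0.5, 0.5],
                  [0.0, 1.0]])

    F_incremental = widrow_hoff(C, S)
    F_closed_form = np.linalg.pinv(C) @ S   # least-squares solution, for comparison
    print(np.allclose(F_incremental, F_closed_form, atol=1e-4))  # True

Because this toy :math:`\mathbf{C}` has fewer words than cues, :math:`\mathbf{CF} = \mathbf{S}` can be solved exactly, and the incremental estimate converges to the same (minimum-norm) solution as the pseudoinverse. With realistic data, the incremental estimate depends on the learning rate ``eta`` and the number of passes.
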

Create a model object
=====================

``discriminative_lexicon_model.ldl.LDL`` creates a model of DLM.


.. code-block:: python

    >>> import discriminative_lexicon_model as dlm
    >>> mdl = dlm.ldl.LDL()
    >>> print(type(mdl))
    <class 'discriminative_lexicon_model.ldl.LDL'>
    >>> mdl.__dict__
    {}


With no argument, ``discriminative_lexicon_model.ldl.LDL`` creates an empty DLM model, which is populated later by calling the methods of the class (see below).

Set up the basis matrices C and S
=================================

In order to estimate association matrices and create predictions based on them, :math:`\mathbf{C}` and :math:`\mathbf{S}` must be set up first.

C-matrix
--------

:math:`\mathbf{C}` is a collection of form-vectors of words. :math:`\mathbf{C}` can be created from a list of words by ``discriminative_lexicon_model.ldl.LDL.gen_cmat``.
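
How form vectors are defined is up to the modeller. A common choice in the DLM literature is letter trigrams with ``#`` marking word boundaries: each column of :math:`\mathbf{C}` is one trigram, and a cell is 1 if the word contains that trigram. The sketch below builds such a matrix with plain Python and NumPy; the helpers ``trigrams`` and ``make_cmat`` are hypothetical, and this is not necessarily what ``gen_cmat`` does by default.

.. code-block:: python

    import numpy as np

    def trigrams(word, boundary="#"):
        """Return the letter trigrams of a word, with boundary markers added."""
        padded = boundary + word + boundary
        return [padded[i:i + 3] for i in range(len(padded) - 2)]

    def make_cmat(words):
        """Build a binary word-by-trigram matrix (hypothetical helper)."""
        cues = sorted({tri for w in words for tri in trigrams(w)})
        index = {cue: j for j, cue in enumerate(cues)}
        C = np.zeros((len(words), len(cues)), dtype=int)
        for i, w in enumerate(words):
            for tri in trigrams(w):
                C[i, index[tri]] = 1
        return C, cues

    words = ["walk", "walked", "walks"]
    C, cues = make_cmat(words)
    print(C.shape)  # (3, 9): 3 words x 9 distinct trigrams

Rows follow the order of ``words`` and columns follow the sorted list ``cues``, so the same trigram always maps to the same column across words.
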
:math:`\mathbf{S}` is a collection of semantic vectors of words. :math:`\mathbf{S}` can be set up by means of ``discriminative_lexicon_model.ldl.LDL.gen_smat``. Its (first) argument must be a ``pandas.core.frame.DataFrame`` with words as its index and semantic dimensions as its columns. Semantic dimensions can be defined either by hand or by an embedding algorithm such as word2vec or fastText. Regardless of how the semantic vectors are constructed, ``discriminative_lexicon_model.ldl.LDL.gen_smat`` sets up :math:`\mathbf{S}` as long as the dataframe given as its first argument follows the right format (i.e., rows = words, columns = semantic dimensions). In the example below, semantic dimensions are set up by hand.
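
As an illustration of the required format only, the sketch below builds a hand-made semantic dataframe with words as the index and semantic dimensions as the columns. The dimension names and values (``WALK``, ``PAST``, ``3SG``) are invented for this example.

.. code-block:: python

    import pandas as pd

    # Rows = words, columns = semantic dimensions (hand-defined, illustrative values).
    smat_df = pd.DataFrame(
        {"WALK": [1.0, 1.0, 1.0],
         "PAST": [0.0, 1.0, 0.0],
         "3SG":  [0.0, 0.0, 1.0]},
        index=["walk", "walked", "walks"],
    )
    print(smat_df)

    # A dataframe in this format is what gen_smat expects as its first argument,
    # e.g. mdl.gen_smat(smat_df) with mdl an LDL instance (not executed here).
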
With :math:`\mathbf{C}` and :math:`\mathbf{S}` established, the comprehension association matrix :math:`\mathbf{F}` can be estimated by ``discriminative_lexicon_model.ldl.LDL.gen_fmat``. It requires no argument, because :math:`\mathbf{C}` and :math:`\mathbf{S}` are already stored as attributes of the class instance and are therefore accessible to the model.
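
``gen_fmat`` takes care of the estimation internally. To make the underlying computation concrete, the NumPy sketch below computes a least-squares :math:`\mathbf{F}` for small made-up matrices with the Moore-Penrose pseudoinverse. Whether ``gen_fmat`` uses exactly this method is an assumption here, not something stated above.

.. code-block:: python

    import numpy as np

    # Toy data: 4 words x 3 form cues, and 4 words x 2 semantic dimensions.
    C = np.array([[1., 1., 0.],
                  [0., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 1.]])
    S = np.array([[1.0, 0.2],
                  [0.1, 0.9],
                  [0.6, 0.5],
                  [0.8, 0.8]])

    # Least-squares estimate of F: minimizes the squared error between C @ F and S.
    F = np.linalg.pinv(C) @ S
    # The same solution via lstsq (C has full column rank here).
    F_lstsq, *_ = np.linalg.lstsq(C, S, rcond=None)

    print(F.shape)                             # (3, 2): cues x semantic dimensions
    print(np.allclose(F, F_lstsq))             # True
    # The residual C @ F - S is orthogonal to the columns of C (normal equations).
    print(np.allclose(C.T @ (C @ F - S), 0))   # True

With more words than cues, as here, :math:`\mathbf{CF}` generally cannot reproduce :math:`\mathbf{S}` exactly; :math:`\mathbf{F}` is the best approximation in the least-squares sense.
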
Similarly, with :math:`\mathbf{C}` and :math:`\mathbf{S}` established, the production association matrix :math:`\mathbf{G}` can be estimated by ``discriminative_lexicon_model.ldl.LDL.gen_gmat``. It does not require any argument either, for the same reason: :math:`\mathbf{C}` and :math:`\mathbf{S}` are already stored as attributes of the class instance and are therefore accessible to the model.
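
Production mirrors comprehension with the roles of :math:`\mathbf{C}` and :math:`\mathbf{S}` swapped. Assuming a least-squares estimate (the text above does not state which method ``gen_gmat`` uses), :math:`\mathbf{G}` can be sketched with NumPy for small made-up matrices as follows.

.. code-block:: python

    import numpy as np

    # Toy data: 3 words x 4 binary form cues, and 3 words x 2 semantic dimensions.
    C = np.array([[1., 1., 0., 0.],
                  [0., 1., 1., 0.],
                  [0., 0., 1., 1.]])
    S = np.array([[1.0, 0.0],
                  [0.5, 0.5],
                  [0.0, 1.0]])

    # G maps meanings onto forms: S @ G should approximate C.
    G = np.linalg.pinv(S) @ C
    print(G.shape)                             # (2, 4): semantic dimensions x cues
    # The residual is orthogonal to the columns of S (normal equations).
    print(np.allclose(S.T @ (S @ G - C), 0))   # True
    print(np.allclose(S @ G, C))               # False: no G reproduces C exactly

Here the second semantic vector is the average of the other two, so every row of :math:`\mathbf{SG}` for that word is forced to be the average of the other two predictions, which the second row of :math:`\mathbf{C}` is not; :math:`\mathbf{G}` is therefore only the best least-squares approximation.
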

Prediction of the form and semantic matrices
============================================

S-hat matrix
------------

The model's predictions about word-meanings based on word-forms (i.e., :math:`\mathbf{\hat{S}}`) can be obtained by ``discriminative_lexicon_model.ldl.LDL.gen_shat``, given that :math:`\mathbf{C}` and :math:`\mathbf{F}` are already set up and stored as attributes of the class instance.
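
Once :math:`\mathbf{F}` is available, :math:`\mathbf{\hat{S}}` is the matrix product :math:`\mathbf{CF}`. A common way to evaluate it in the DLM literature is to check, for each word, whether its predicted semantic vector correlates most strongly with its own gold-standard vector in :math:`\mathbf{S}`. The NumPy sketch below does this for made-up toy matrices; ``comprehension_accuracy`` is a hypothetical helper, not the evaluation code of *discriminative_lexicon_model*, and the least-squares :math:`\mathbf{F}` is an assumption.

.. code-block:: python

    import numpy as np

    C = np.array([[1., 1., 0., 0.],
                  [0., 1., 1., 0.],
                  [0., 0., 1., 1.]])
    S = np.array([[1.0, 0.0, 0.2],
                  [0.3, 0.9, 0.0],
                  [0.0, 0.4, 1.0]])

    F = np.linalg.pinv(C) @ S   # least-squares F (assumed estimation method)
    S_hat = C @ F               # predicted meanings, one row per word

    def comprehension_accuracy(S_hat, S):
        """Share of words whose predicted vector correlates best with their own target."""
        n = S.shape[0]
        # np.corrcoef stacks the rows of S_hat and S; the top-right block
        # holds the Pearson correlation of S_hat[i] with S[j].
        r = np.corrcoef(S_hat, S)[:n, n:]
        return np.mean(r.argmax(axis=1) == np.arange(n))

    print(comprehension_accuracy(S_hat, S))   # 1.0

Because this toy :math:`\mathbf{C}` has fewer words than cues, :math:`\mathbf{CF}` reproduces :math:`\mathbf{S}` exactly and the accuracy is 1.0; with realistic data, :math:`\mathbf{\hat{S}}` only approximates :math:`\mathbf{S}`.
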
.. code-block:: python

C-hat matrix
------------

Similarly, the model's predictions about word-forms based on word-meanings (i.e., :math:`\mathbf{\hat{C}}`) can be obtained with ``discriminative_lexicon_model.ldl.LDL.gen_chat``, given that :math:`\mathbf{S}` and :math:`\mathbf{G}` are already set up and stored as attributes of the class instance.
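
Analogously, :math:`\mathbf{\hat{C}}` is the matrix product :math:`\mathbf{SG}`. Its rows are real-valued, so one simple way to inspect a prediction is to list the form cues that receive the most support. The sketch below uses made-up trigram labels and toy matrices, assumes a least-squares :math:`\mathbf{G}`, and applies an arbitrary threshold of 0.5 for illustration; turning :math:`\mathbf{\hat{C}}` into actual word-forms generally requires more than a threshold (e.g., ordering the cues into a sequence), which this sketch does not attempt.

.. code-block:: python

    import numpy as np

    words = ["walk", "walks"]
    cues = ["#wa", "wal", "alk", "lk#", "lks", "ks#"]
    # Toy form matrix: rows = words, columns = the cues above.
    C = np.array([[1., 1., 1., 1., 0., 0.],
                  [1., 1., 1., 0., 1., 1.]])
    # Toy semantic matrix: columns = WALK, 3SG (illustrative dimensions).
    S = np.array([[1.0, 0.0],
                  [1.0, 1.0]])

    G = np.linalg.pinv(S) @ C   # least-squares G (assumed estimation method)
    C_hat = S @ G               # predicted form vectors, one row per word

    # Cues whose predicted support exceeds the (arbitrary) threshold of 0.5.
    predicted = {
        word: [cue for cue, v in zip(cues, row) if v > 0.5]
        for word, row in zip(words, C_hat)
    }
    print(predicted)

Since this toy :math:`\mathbf{S}` is square and invertible, :math:`\mathbf{\hat{C}}` reproduces :math:`\mathbf{C}` up to rounding, so each word gets back exactly its own trigrams.
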