Commit 03db974 (parent 1af1d79)
Updates docs (quickstart.rst)

docs/source/quickstart.rst: 1 file changed, 72 additions, 85 deletions
Quickstart
==========

*discriminative_lexicon_model* is a Python implementation of the Discriminative Lexicon Model [1]_.

Installation
============

*discriminative_lexicon_model* is available on PyPI.

.. code:: bash

    pip install --user discriminative_lexicon_model


Quick overview of the theory "Discriminative Lexicon Model (DLM)"
=================================================================

In DLM, language processing is modelled as linear mappings between word-forms and word-meanings. Word-forms and word-meanings can be defined in any way, as long as each word form/meaning is expressed in the form of a vector (i.e., an array of numbers). Word-forms are stacked up into a matrix called the *C* matrix, and word-meanings into another matrix called the *S* matrix. The comprehension process can be modelled as receiving word-forms (i.e., C) and predicting word-meanings (i.e., S). A matrix that approximates S as closely as possible based on C can be estimated either analytically or computationally (see [1]_ for more detail); it is called the *F* matrix. With C and F, the approximation (prediction) of S can be derived; it is called the :math:`\hat{S}` matrix. Similarly, the production process can be modelled as receiving word-meanings (i.e., S) and predicting word-forms (i.e., C). The matrix that approximates C based on S is called the *G* matrix. With S and G, the model's predictions about word-forms are obtained as yet another matrix, called the :math:`\hat{C}` matrix. It is shown below how to set up and estimate these matrices.
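These linear mappings can be sketched in a few lines of plain numpy (a toy illustration with made-up matrices, not the package's own code): F is estimated from C and S with the pseudoinverse, then :math:`\hat{S}` is derived, and likewise G and :math:`\hat{C}` in the other direction.

.. code-block:: python

    import numpy as np

    # Toy form matrix C (3 words x 4 cues) and semantic matrix S
    # (3 words x 2 dimensions); values are made up purely for illustration.
    C = np.array([[1., 1., 0., 0.],
                  [1., 0., 1., 0.],
                  [1., 0., 0., 1.]])
    S = np.array([[1., 0.],
                  [0., 1.],
                  [1., 1.]])

    # Comprehension: F maps forms to meanings (least-squares solution).
    F = np.linalg.pinv(C) @ S
    S_hat = C @ F          # predicted meanings

    # Production: G maps meanings to forms.
    G = np.linalg.pinv(S) @ C
    C_hat = S @ G          # predicted forms

In this tiny example the rows of C are linearly independent, so :math:`\hat{S}` reproduces S exactly; with larger, realistic lexica the mappings are only approximate.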

Set up the basis matrices C and S
=================================

C-matrix
--------

The C matrix is a collection of form-vectors of words. You can create a C-matrix from a list of words by using discriminative_lexicon_model.mapping.gen_cmat.

.. code-block:: python

    >>> import discriminative_lexicon_model as dlm
    >>> words = ['walk','walked','walks']
    >>> cmat = dlm.mapping.gen_cmat(words)
    >>> cmat
    <xarray.DataArray (word: 3, cues: 9)>
    array([[1, 1, 1, 1, 0, 0, 0, 0, 0],
           [1, 1, 1, 0, 1, 1, 1, 0, 0],
           [1, 1, 1, 0, 0, 0, 0, 1, 1]])
    Coordinates:
      * word     (word) <U6 'walk' 'walked' 'walks'
      * cues     (cues) <U3 '#wa' 'wal' 'alk' 'lk#' 'lke' 'ked' 'ed#' 'lks' 'ks#'
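The cue names above are overlapping triphones, with ``#`` marking word boundaries. As a purely illustrative sketch (this helper is not part of the package), the cue scheme can be reproduced in a few lines of Python:

.. code-block:: python

    def trigram_cues(word):
        """Return the boundary-marked trigrams of a word,
        e.g. 'walk' -> ['#wa', 'wal', 'alk', 'lk#']."""
        padded = '#' + word + '#'
        return [padded[i:i + 3] for i in range(len(padded) - 2)]

    # The union over the three example words yields the nine cues above.
    all_cues = {c for w in ['walk', 'walked', 'walks'] for c in trigram_cues(w)}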

S-matrix
--------

The S matrix is a collection of semantic vectors of words. For one method, an S-matrix can be set up by defining semantic dimensions by hand. This can be achieved by discriminative_lexicon_model.mapping.gen_smat_from_df.

.. code-block:: python

    >>> import pandas as pd
    >>> sdf = pd.DataFrame({'WALK':[1,1,1], 'Present':[1,0,1], 'Past':[0,1,0], 'ThirdPerson':[0,0,1]}, index=['walk','walked','walks'])
    >>> smat = dlm.mapping.gen_smat_from_df(sdf)
    >>> smat
    <xarray.DataArray (word: 3, semantics: 4)>
    array([[1, 1, 0, 0],
           [1, 0, 1, 0],
           [1, 1, 0, 1]])
    Coordinates:
      * word       (word) <U6 'walk' 'walked' 'walks'
      * semantics  (semantics) object 'WALK' 'Present' 'Past' 'ThirdPerson'

Estimation of the association matrices
======================================

F-matrix
--------

With C and S established, the comprehension association matrix F can be estimated by discriminative_lexicon_model.mapping.gen_fmat.

.. code-block:: python

    >>> fmat = dlm.mapping.gen_fmat(cmat, smat)
    >>> fmat.round(2)
    <xarray.DataArray (cues: 9, semantics: 4)>
    array([[ 0.28,  0.23,  0.05,  0.08],
           [ 0.28,  0.23,  0.05,  0.08],
           [ 0.28,  0.23,  0.05,  0.08],
           [ 0.15,  0.31, -0.15, -0.23],
           [ 0.05, -0.23,  0.28, -0.08],
           [ 0.05, -0.23,  0.28, -0.08],
           [ 0.05, -0.23,  0.28, -0.08],
           [ 0.08,  0.15, -0.08,  0.38],
           [ 0.08,  0.15, -0.08,  0.38]])
    Coordinates:
      * cues       (cues) <U3 '#wa' 'wal' 'alk' 'lk#' 'lke' 'ked' 'ed#' 'lks' 'ks#'
      * semantics  (semantics) object 'WALK' 'Present' 'Past' 'ThirdPerson'

G-matrix
--------

The production association matrix G can be obtained by discriminative_lexicon_model.mapping.gen_gmat.

.. code-block:: python

    >>> gmat = dlm.mapping.gen_gmat(cmat, smat)
    >>> gmat.round(2)
    <xarray.DataArray (semantics: 4, cues: 9)>
    array([[ 0.67,  0.67,  0.67,  0.33,  0.33,  0.33,  0.33, -0.  , -0.  ],
           [ 0.33,  0.33,  0.33,  0.67, -0.33, -0.33, -0.33, -0.  , -0.  ],
           [ 0.33,  0.33,  0.33, -0.33,  0.67,  0.67,  0.67, -0.  , -0.  ],
           [ 0.  ,  0.  ,  0.  , -1.  ,  0.  ,  0.  ,  0.  ,  1.  ,  1.  ]])
    Coordinates:
      * semantics  (semantics) object 'WALK' 'Present' 'Past' 'ThirdPerson'
      * cues       (cues) <U3 '#wa' 'wal' 'alk' 'lk#' 'lke' 'ked' 'ed#' 'lks' 'ks#'

Prediction of the form and semantic matrices
============================================

S-hat matrix
------------

The S-hat matrix (:math:`\mathbf{\hat{S}}`) can be obtained by discriminative_lexicon_model.mapping.gen_shat.

.. code-block:: python

    >>> shat = dlm.mapping.gen_shat(cmat, fmat)
    >>> shat.round(2)
    <xarray.DataArray (word: 3, semantics: 4)>
    array([[ 1.,  1., -0., -0.],
           [ 1., -0.,  1., -0.],
           [ 1.,  1., -0.,  1.]])
    Coordinates:
      * word       (word) <U6 'walk' 'walked' 'walks'
      * semantics  (semantics) object 'WALK' 'Present' 'Past' 'ThirdPerson'

C-hat matrix
------------

The C-hat matrix (:math:`\mathbf{\hat{C}}`) can be obtained with discriminative_lexicon_model.mapping.gen_chat.

.. code-block:: python

    >>> chat = dlm.mapping.gen_chat(smat, gmat)
    >>> chat.round(2)
    <xarray.DataArray (word: 3, cues: 9)>
    array([[ 1.,  1.,  1.,  1., -0., -0., -0., -0., -0.],
           [ 1.,  1.,  1., -0.,  1.,  1.,  1., -0., -0.],
           [ 1.,  1.,  1.,  0.,  0.,  0.,  0.,  1.,  1.]])
    Coordinates:
      * word     (word) <U6 'walk' 'walked' 'walks'
      * cues     (cues) <U3 '#wa' 'wal' 'alk' 'lk#' 'lke' 'ked' 'ed#' 'lks' 'ks#'
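Once the predicted matrices are available, a common sanity check is whether each word's predicted semantic vector is most similar to its own row of S (comprehension accuracy). The sketch below uses plain numpy with made-up toy values; it is an illustration, not a function of the package:

.. code-block:: python

    import numpy as np

    # Toy gold and predicted semantic matrices (rows = words);
    # the numbers are invented for illustration only.
    words = ['walk', 'walked', 'walks']
    S     = np.array([[1., 1., 0., 0.],
                      [1., 0., 1., 0.],
                      [1., 1., 0., 1.]])
    S_hat = np.array([[0.9, 1.1, 0.0, 0.1],
                      [1.0, 0.1, 0.9, 0.0],
                      [1.1, 0.9, 0.1, 1.0]])

    def comprehension_accuracy(S_hat, S):
        """A word counts as correctly comprehended when its predicted vector
        correlates most highly with its own gold semantic vector."""
        correct = 0
        for i, pred in enumerate(S_hat):
            corrs = [np.corrcoef(pred, gold)[0, 1] for gold in S]
            correct += int(np.argmax(corrs) == i)
        return correct / len(S_hat)

    acc = comprehension_accuracy(S_hat, S)

In this toy example every predicted row correlates most strongly with its own semantic vector, so the accuracy is 1.0.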

@@ -300,4 +286,5 @@ The length of a semantic vector can be obtained by pyldl.measures.vector_length.

----

.. [1] Baayen, R. H., Chuang, Y.-Y., Shafaei-Bajestan, E., & Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. *Complexity*, 2019, 1-39.