====================
Incremental learning
====================


---------------------------------------
Incremental learning by a list of words
---------------------------------------
Weight matrices in LDL (i.e., `\mathbf{F}` and `\mathbf{G}`) can also be estimated step by step; this is called *incremental* learning. As a simple example, suppose that the lexicon contains only the two words "a" and "an", and that we encounter them in the order "a", "a", "an", "a". Incremental learning over this sequence can be carried out with discriminative_lexicon_model.mapping.incremental_learning. The first argument of the function is the series of learning events.

.. code-block:: python

   >>> import xarray as xr
   >>> import discriminative_lexicon_model.mapping as pm
   >>> cmat = pm.gen_cmat(['a', 'an'], gram=2)
   >>> print(cmat)

   <xarray.DataArray (word: 2, cues: 4)> Size: 64B
   array([[1, 1, 0, 0],
          [1, 0, 1, 1]])
   Coordinates:
     * word     (word) <U2 16B 'a' 'an'
     * cues     (cues) <U2 32B '#a' 'a#' 'an' 'n#'

   >>> smat = xr.DataArray([[0.9, -0.2, 0.1], [0.1, 0.9, -0.2]], dims=('word', 'semantics'), coords={'word': ['a', 'an'], 'semantics': ['S1', 'S2', 'S3']})
   >>> print(smat)

   <xarray.DataArray (word: 2, semantics: 3)> Size: 48B
   array([[ 0.9, -0.2,  0.1],
          [ 0.1,  0.9, -0.2]])
   Coordinates:
     * word       (word) <U2 16B 'a' 'an'
     * semantics  (semantics) <U2 24B 'S1' 'S2' 'S3'

   >>> fmat = pm.incremental_learning(['a', 'a', 'an', 'a'], cmat, smat)
   >>> print(fmat)

   <xarray.DataArray (cues: 4, semantics: 3)> Size: 96B
   array([[ 0.21402,  0.03544,  0.00478],
          [ 0.22022, -0.05816,  0.02658],
          [-0.0062 ,  0.0936 , -0.0218 ],
          [-0.0062 ,  0.0936 , -0.0218 ]])
   Coordinates:
     * cues       (cues) <U2 32B '#a' 'a#' 'an' 'n#'
     * semantics  (semantics) <U2 24B 'S1' 'S2' 'S3'

Note that the `\mathbf{S}` matrix is set up so that the first dimension "S1" is strongly correlated with "a" while "S2" is strongly correlated with "an". In other words, you can conceptually interpret "S1" as the core meaning of "a" and "S2" as that of "an". In the weight matrix (i.e., `\mathbf{F}`), the first two rows, namely the cues "#a" and "a#", are strongly associated with the first column, namely "S1", while the last two rows, namely the cues "an" and "n#", are strongly associated with the second column, namely "S2". The associations of "an" and "n#" to "S2" are numerically smaller than those of "#a" and "a#" to "S1", because "an" occurs only once while "a" occurs three times in the learning events.
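
Individual associations can be extracted with xarray's .sel method; for instance, the two strongest associations just discussed can be pulled out as follows (the values are taken from the `\mathbf{F}` matrix printed above):

.. code-block:: python

   >>> round(fmat.sel(cues='#a', semantics='S1').item(), 5)
   0.21402
   >>> round(fmat.sel(cues='an', semantics='S2').item(), 5)
   0.0936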

As shown below, after a sufficient number of learning events, the incremental estimates approximate those obtained by *endstate* learning.

.. code-block:: python

   >>> import pandas as pd
   >>> words = pd.Series(['a', 'an']).sample(1000, replace=True, random_state=518).tolist()
   >>> fmat_inc = pm.incremental_learning(words, cmat, smat)
   >>> fmat_end = pm.gen_fmat(cmat=cmat, smat=smat)
   >>> print(fmat_inc)

   <xarray.DataArray (cues: 4, semantics: 3)> Size: 96B
   array([[ 3.80000000e-01,  1.00000000e-01, -5.65948715e-19],
          [ 5.20000000e-01, -3.00000000e-01,  1.00000000e-01],
          [-1.40000000e-01,  4.00000000e-01, -1.00000000e-01],
          [-1.40000000e-01,  4.00000000e-01, -1.00000000e-01]])
   Coordinates:
     * cues       (cues) <U2 32B '#a' 'a#' 'an' 'n#'
     * semantics  (semantics) <U2 24B 'S1' 'S2' 'S3'

   >>> print(fmat_end)
   <xarray.DataArray (cues: 4, semantics: 3)> Size: 96B
   array([[ 3.80000000e-01,  1.00000000e-01, -2.77555756e-17],
          [ 5.20000000e-01, -3.00000000e-01,  1.00000000e-01],
          [-1.40000000e-01,  4.00000000e-01, -1.00000000e-01],
          [-1.40000000e-01,  4.00000000e-01, -1.00000000e-01]])
   Coordinates:
     * cues       (cues) <U2 32B '#a' 'a#' 'an' 'n#'
     * semantics  (semantics) <U2 24B 'S1' 'S2' 'S3'

   >>> print(fmat_inc.round(10).identical(fmat_end.round(10)))
   True
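
The residual difference between "fmat_inc" and "fmat_end" is numerical noise; this can also be verified directly with standard xarray/numpy operations:

.. code-block:: python

   >>> import numpy as np
   >>> print(float(np.abs(fmat_inc - fmat_end).max()) < 1e-10)
   True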
|
Note that the order of learning events matters in incremental learning. Compare the following two examples.

.. code-block:: python

   >>> import numpy as np
   >>> words_a_first = np.repeat(['a', 'an'], [10, 10])
   >>> words_an_first = np.repeat(['an', 'a'], [10, 10])
   >>> fmat_a_first = pm.incremental_learning(words_a_first, cmat, smat)
   >>> fmat_an_first = pm.incremental_learning(words_an_first, cmat, smat)
   >>> print(fmat_a_first)
   <xarray.DataArray (cues: 4, semantics: 3)> Size: 96B
   array([[ 0.30396166,  0.23117687, -0.03460906],
          [ 0.40168162, -0.08926258,  0.04463129],
          [-0.09771995,  0.32043945, -0.07924035],
          [-0.09771995,  0.32043945, -0.07924035]])
   Coordinates:
     * cues       (cues) <U2 32B '#a' 'a#' 'an' 'n#'
     * semantics  (semantics) <U2 24B 'S1' 'S2' 'S3'

   >>> print(fmat_an_first)
   <xarray.DataArray (cues: 4, semantics: 3)> Size: 96B
   array([[ 0.41961651,  0.07215146,  0.0087615 ],
          [ 0.38722476, -0.21937428,  0.073545  ],
          [ 0.03239175,  0.29152574, -0.0647835 ],
          [ 0.03239175,  0.29152574, -0.0647835 ]])
   Coordinates:
     * cues       (cues) <U2 32B '#a' 'a#' 'an' 'n#'
     * semantics  (semantics) <U2 24B 'S1' 'S2' 'S3'

In the first case, where "a" is encountered ten times before "an" is encountered ten times consecutively, the estimated associations are "biased" towards "an". This can be seen, for example, in the cell in the first row and the second column, namely the association strength between "#a" and "S2": its value (0.231) is well above the equilibrium of 0.10 (see "fmat_end" in the example above). Since the "an" events occur more recently, these recent learning events have a bigger effect.
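
For instance, the cell in question can be extracted directly (cf. the output of "fmat_a_first" above):

.. code-block:: python

   >>> round(fmat_a_first.sel(cues='#a', semantics='S2').item(), 3)
   0.231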

In contrast, in the latter case, where "an" is encountered ten times before "a" is encountered ten times, the association from "#a" to "S1" (0.420) is much bigger than that from "#a" to "S2" (0.072). Note that the equilibrium of the association from "#a" to "S1" is 0.38 ("fmat_end" in the example above). Since "a" is encountered many times towards the end of learning, the weights are biased towards "a".
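
The recency effect follows from how the weights are updated: incremental learning applies an error-driven (Widrow-Hoff, or delta-rule) update once per learning event, so the most recent events are the last to correct the weights. The following is a minimal sketch of such an update; with a learning rate of eta = 0.1 it reproduces the `\mathbf{F}` matrix of the first example above, but it is only an illustration, and the actual implementation of incremental_learning may differ in its details.

.. code-block:: python

   import numpy as np

   def incremental_sketch(events, cmat, smat, eta=0.1):
       # Start from an all-zero weight matrix of shape (cues, semantics).
       F = np.zeros((cmat.sizes['cues'], smat.sizes['semantics']))
       for w in events:
           c = cmat.sel(word=w).values    # form vector of the current event
           s = smat.sel(word=w).values    # semantic vector of the current event
           error = s - c @ F              # prediction error for this event
           F += eta * np.outer(c, error)  # delta-rule update
       return F

   # incremental_sketch(['a', 'a', 'an', 'a'], cmat, smat) yields the same
   # values as fmat in the very first example above.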


----------------------------------------------
Incremental learning by a list of word indices
----------------------------------------------
Learning events (i.e., which word is encountered at each step) can also be specified by word indices. This is useful when the `\mathbf{C}` and/or `\mathbf{S}` matrices contain duplicated word labels; duplicated rows arise naturally when word tokens are involved. Consider the following example:

.. code-block:: python

   >>> import xarray as xr
   >>> import discriminative_lexicon_model.mapping as pm
   >>> cmat = pm.gen_cmat(['a', 'an', 'an'], gram=2)
   >>> smat = xr.DataArray([[0.9, -0.2, 0.1], [0.1, 0.9, -0.2], [0.2, 0.8, -0.1]], dims=('word', 'semantics'), coords={'word': ['a', 'an', 'an'], 'semantics': ['S1', 'S2', 'S3']})
   >>> print(cmat)

   <xarray.DataArray (word: 3, cues: 4)> Size: 96B
   array([[1, 1, 0, 0],
          [1, 0, 1, 1],
          [1, 0, 1, 1]])
   Coordinates:
     * word     (word) <U2 24B 'a' 'an' 'an'
     * cues     (cues) <U2 32B '#a' 'a#' 'an' 'n#'

   >>> print(smat)
   <xarray.DataArray (word: 3, semantics: 3)> Size: 72B
   array([[ 0.9, -0.2,  0.1],
          [ 0.1,  0.9, -0.2],
          [ 0.2,  0.8, -0.1]])
   Coordinates:
     * word       (word) <U2 24B 'a' 'an' 'an'
     * semantics  (semantics) <U2 24B 'S1' 'S2' 'S3'

Note that the word type "an" has two rows. Its form vectors are identical (the second and third rows of the `\mathbf{C}` matrix), while its semantic vectors differ slightly (the second and third rows of the `\mathbf{S}` matrix). You can view the different semantic vectors as different meanings of the same word in different contexts. In a case like this, specifying learning events by a list of words as below raises "InvalidIndexError", because the function cannot determine which semantic vector to use for "an".

.. code-block:: python

   >>> fmat = pm.incremental_learning(['a', 'a', 'an', 'a'], cmat, smat)
   >>> # This raises an error.
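
Whether word labels are duplicated can be checked through the underlying pandas index:

.. code-block:: python

   >>> smat.indexes['word'].duplicated()
   array([False, False,  True])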
|
Instead, you need to specify learning events in terms of word indices. For this purpose, discriminative_lexicon_model.mapping.incremental_learning_byind can be used:

.. code-block:: python

   >>> events = [0, 0, 1, 2, 2]  # 'a', 'a', 'an' (2nd row), 'an' (3rd row), 'an' (3rd row)
   >>> fmat = pm.incremental_learning_byind(events, cmat, smat)
   >>> print(fmat)

   <xarray.DataArray (cues: 4, semantics: 3)> Size: 96B
   array([[ 0.165422,  0.151984, -0.012742],
          [ 0.162   , -0.036   ,  0.018   ],
          [ 0.003422,  0.187984, -0.030742],
          [ 0.003422,  0.187984, -0.030742]])
   Coordinates:
     * cues       (cues) <U2 32B '#a' 'a#' 'an' 'n#'
     * semantics  (semantics) <U2 24B 'S1' 'S2' 'S3'

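
For longer event streams, the list of indices can be built programmatically. The sketch below uses a deliberately simple, hypothetical policy that always picks the first row matching each token; any other disambiguation policy (e.g., choosing a context-appropriate row) can be plugged in instead:

.. code-block:: python

   >>> labels = [str(w) for w in cmat.coords['word'].values]  # ['a', 'an', 'an']
   >>> events = [labels.index(w) for w in ['a', 'a', 'an', 'a']]
   >>> events
   [0, 0, 1, 0]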
|