Evaluate comprehension accuracy for training data.
29
+
30
+
!!! note
31
+
In case of homophones/homographs in the dataset, the correct/incorrect values for base and inflections may be misleading! See below for more information.
@warn "accuracy_comprehension: This dataset contains homophones/homographs. Note that some of the results on the correctness of comprehended base/inflections may be misleading. See documentation of this function for more information."
89
+
end
90
+
81
91
if !isnothing(inflections)
82
92
all_features = vcat(base, inflections)
83
-
else
93
+
elseif !isnothing(base)
84
94
all_features = base
95
+
else
96
+
all_features = []
85
97
end
86
98
87
99
for f in all_features
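For reference, the three-way branch introduced in this hunk can be exercised in isolation. The sketch below is a minimal illustration, not the package's code: `collect_features` is a hypothetical helper name, and the column symbols are made up.

```julia
# Hypothetical helper (not part of the package) mirroring the branch above.
function collect_features(base, inflections)
    if !isnothing(inflections)
        # Same call as in the diff; assumes `base` is also supplied here.
        return vcat(base, inflections)
    elseif !isnothing(base)
        return base
    else
        # New in this change: neither argument supplied, so nothing to loop over.
        return []
    end
end

both = collect_features([:Lexeme], [:Person, :Tense])
only_base = collect_features([:Lexeme], nothing)
neither = collect_features(nothing, nothing)
```

With the previous two-way `else`, the last case would have assigned `nothing` to `all_features`, and `for f in all_features` would then raise a `MethodError`, since `nothing` is not iterable.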
@@ -110,7 +122,11 @@ end
110
122
inflections = nothing,
111
123
)
112
124
113
-
Evaluate comprehension accuracy.
125
+
Evaluate comprehension accuracy for validation data.
126
+
127
+
!!! note
128
+
In case of homophones/homographs in the dataset, the correct/incorrect values for base and inflections may be misleading! See below for more information.
129
+
114
130
115
131
# Obligatory Arguments
116
132
- `S_val::Matrix`: the (gold standard) S matrix of the validation data
@warn "accuracy_comprehension: This dataset contains homophones/homographs. Note that some of the results on the correctness of comprehended base/inflections may be misleading. See documentation of this function for more information."
193
+
end
194
+
163
195
corMat = cor(Shat_val, S, dims = 2)
164
196
top_index = [i[2] for i in argmax(corMat, dims = 2)]
165
197
@@ -200,6 +232,9 @@ Assess model accuracy on the basis of the correlations of row vectors of Chat an
200
232
C or Shat and S. Ideally the target words have highest correlations on the diagonal
201
233
of the pertinent correlation matrices.
202
234
235
+
!!! note
236
+
If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vectors of both words, and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and `target_col` is recommended, as this allows homophones/homographs to be taken into account.
237
+
203
238
# Obligatory Arguments
204
239
- `SChat::Union{SparseMatrixCSC, Matrix}`: the Chat or Shat matrix
205
240
- `SC::Union{SparseMatrixCSC, Matrix}`: the C or S matrix
@warn "eval_SC: The C or S matrix contains duplicate vectors (usually because of homophones/homographs). Supplying the dataset and target column is recommended for a realistic evaluation. See the documentation of this function for more information."
257
+
end
258
+
219
259
rSC = cor(
220
260
convert(Matrix{Float64}, SChat),
221
261
convert(Matrix{Float64}, SC),
@@ -241,6 +281,9 @@ of the pertinent correlation matrices.
241
281
The order is important. The first gold standard matrix has to correspond
242
282
to the SChat matrix, such as `eval_SC(Shat_train, S_train, S_val)` or `eval_SC(Shat_val, S_val, S_train)`
243
283
284
+
!!! note
285
+
If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vectors of both words, and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and `target_col` is recommended, as this allows homophones/homographs to be taken into account.
286
+
244
287
# Obligatory Arguments
245
288
- `SChat::Union{SparseMatrixCSC, Matrix}`: the Chat or Shat matrix
246
289
- `SC::Union{SparseMatrixCSC, Matrix}`: the training/validation C or S matrix
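The tie described in the note can be reproduced on toy data. The sketch below assumes made-up matrices (not model output) in which rows 1 and 2 of the gold-standard matrix are identical, as they would be for a homophone pair; it uses the same `cor(..., dims = 2)` and `argmax(..., dims = 2)` calls that appear elsewhere in this diff.

```julia
using Statistics

# Toy gold-standard matrix: rows 1 and 2 are identical (a homophone pair).
SC = [1.0 0.0 2.0 0.0;
      1.0 0.0 2.0 0.0;
      0.0 3.0 0.0 1.0]

# Toy predictions (made-up numbers).
SChat = [0.9 0.1 2.1 0.0;
         1.1 0.0 1.8 0.2;
         0.1 2.9 0.0 1.2]

# Row-by-row correlations: rSC[i, j] relates prediction i to gold row j.
rSC = cor(SChat, SC, dims = 2)

# The second prediction correlates equally well with gold rows 1 and 2.
tied = isapprox(rSC[2, 1], rSC[2, 2])

# Column index of the best match per prediction; for row 2 this may be
# column 1 rather than the diagonal column 2.
best = [i[2] for i in argmax(rSC, dims = 2)]
```

Whether the second row resolves to column 1 or 2 depends on how the tie is broken, which is why an index-based comparison against the diagonal can undercount.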
@@ -395,7 +438,10 @@ end
395
438
Assess model accuracy on the basis of the correlations of row vectors of Chat and
396
439
C or Shat and S. Ideally the target words have highest correlations on the diagonal
397
440
of the pertinent correlation matrices. For large datasets, pass batch_size to
398
-
process evaluation in chucks.
441
+
process evaluation in chunks.
442
+
443
+
!!! note
444
+
If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vectors of both words, and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and `target_col` is recommended, as this allows homophones/homographs to be taken into account.
399
445
400
446
# Obligatory Arguments
401
447
- `SChat`: the Chat or Shat matrix
@@ -423,6 +469,10 @@ function eval_SC(
423
469
verbose = false
424
470
)
425
471
472
+
if size(unique(SC, dims=1), 1) != size(SC, 1)
473
+
@warn "eval_SC: The C or S matrix contains duplicate vectors (usually because of homophones/homographs). Supplying the dataset and target column is recommended for a realistic evaluation. See the documentation of this function for more information."
474
+
end
475
+
426
476
l = size(SChat, 1)
427
477
num_chucks = ceil(Int64, l / batch_size)
428
478
verbose && begin
@@ -435,7 +485,7 @@ function eval_SC(
435
485
436
486
# for first parts
437
487
for j = 1:num_chucks-1
438
-
correct += eval_SC_chucks(
488
+
correct += eval_SC_chunks(
439
489
SChat_d,
440
490
SC_d,
441
491
(j - 1) * batch_size + 1,
@@ -445,7 +495,7 @@ function eval_SC(
445
495
verbose && ProgressMeter.next!(pb)
446
496
end
447
497
# for last part
448
-
correct += eval_SC_chucks(
498
+
correct += eval_SC_chunks(
449
499
SChat_d,
450
500
SC_d,
451
501
(num_chucks - 1) * batch_size + 1,
@@ -462,7 +512,7 @@ end
462
512
Assess model accuracy on the basis of the correlations of row vectors of Chat and
463
513
C or Shat and S. Ideally the target words have highest correlations on the diagonal
464
514
of the pertinent correlation matrices. For large datasets, pass batch_size to
465
-
process evaluation in chucks. Support homophones.
515
+
process evaluation in chunks. Support homophones.
466
516
467
517
# Obligatory Arguments
468
518
- `SChat::AbstractArray`: the Chat or Shat matrix
@@ -504,7 +554,7 @@ function eval_SC(
504
554
505
555
# for first parts
506
556
for j = 1:num_chucks-1
507
-
correct += eval_SC_chucks(
557
+
correct += eval_SC_chunks(
508
558
SChat_d,
509
559
SC_d,
510
560
(j - 1) * batch_size + 1,
@@ -516,7 +566,7 @@ function eval_SC(
516
566
verbose && ProgressMeter.next!(pb)
517
567
end
518
568
# for last part
519
-
correct += eval_SC_chucks(
569
+
correct += eval_SC_chunks(
520
570
SChat_d,
521
571
SC_d,
522
572
(num_chucks - 1) * batch_size + 1,
@@ -529,13 +579,18 @@ function eval_SC(
529
579
round(correct / l, digits=digits)
530
580
end
531
581
532
-
function eval_SC_chucks(SChat, SC, s, e, batch_size)
582
+
function eval_SC_chunks(SChat, SC, s, e, batch_size)
533
583
rSC = cor(SChat[s:e, :], SC, dims = 2)
534
584
v = [(rSC[i[1], i[1]+s-1] == rSC[i]) ? 1 : 0 for i in argmax(rSC, dims = 2)]
535
585
sum(v)
536
586
end
537
587
538
-
function eval_SC_chucks(SChat, SC, s, e, batch_size, data, target_col)
588
+
function eval_SC_chucks(SChat, SC, s, e, batch_size)
589
+
@warn "eval_SC_chucks is deprecated and will be removed in version 0.10 in favour of eval_SC_chunks"
590
+
eval_SC_chunks(SChat, SC, s, e, batch_size)
591
+
end
592
+
593
+
function eval_SC_chunks(SChat, SC, s, e, batch_size, data, target_col)
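To see the renamed helper and its deprecated alias working together, here is a small self-contained sketch. The two helper functions are copied from the hunk above; `count_correct_batched` and the toy matrices are assumptions added for illustration and are not the package's `eval_SC`.

```julia
using Statistics

# Copied from the diff: count rows s:e of SChat whose highest correlation
# has the same value as the correlation with the target on the diagonal.
function eval_SC_chunks(SChat, SC, s, e, batch_size)
    rSC = cor(SChat[s:e, :], SC, dims = 2)
    v = [(rSC[i[1], i[1]+s-1] == rSC[i]) ? 1 : 0 for i in argmax(rSC, dims = 2)]
    sum(v)
end

# Deprecated alias, as added in the diff: warn, then forward to the new name.
function eval_SC_chucks(SChat, SC, s, e, batch_size)
    @warn "eval_SC_chucks is deprecated and will be removed in version 0.10 in favour of eval_SC_chunks"
    eval_SC_chunks(SChat, SC, s, e, batch_size)
end

# Hypothetical driver: walk over the rows in batches and add up the counts.
function count_correct_batched(SChat, SC, batch_size)
    l = size(SChat, 1)
    correct = 0
    for s in 1:batch_size:l
        e = min(s + batch_size - 1, l)  # the last batch may be shorter
        correct += eval_SC_chunks(SChat, SC, s, e, batch_size)
    end
    correct
end

# Toy data: four distinct one-hot rows, predicted perfectly.
SC = [1.0 0.0 0.0 0.0 0.0;
      0.0 1.0 0.0 0.0 0.0;
      0.0 0.0 1.0 0.0 0.0;
      0.0 0.0 0.0 1.0 0.0]
SChat = copy(SC)

n_correct = count_correct_batched(SChat, SC, 3)  # batches 1:3 and 4:4
n_old = eval_SC_chucks(SChat, SC, 1, 4, 3)       # emits the deprecation warning
n_new = eval_SC_chunks(SChat, SC, 1, 4, 3)
```

Note that the helper compares correlation values rather than column indices, so a row whose best match ties exactly with its diagonal entry is still counted as correct.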
Assess model accuracy on the basis of the correlations of row vectors of Chat and
566
636
C or Shat and S. Count it as correct if one of the top k candidates is correct.
567
637
638
+
!!! note
639
+
If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vectors of both words, and it is not guaranteed that the target on the diagonal will be among the k neighbours. In particular, `eval_SC` and `eval_SC_loose` with k=1 are not guaranteed to give the same result. In such cases, supplying the dataset and `target_col` is recommended, as this allows homophones/homographs to be taken into account.
640
+
641
+
568
642
# Obligatory Arguments
569
643
- `SChat::Union{SparseMatrixCSC, Matrix}`: the Chat or Shat matrix
570
644
- `SC::Union{SparseMatrixCSC, Matrix}`: the C or S matrix
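The k=1 caveat in the note can be illustrated with a single hand-written row of correlations. This is a sketch under stated assumptions: `top_k` is a hypothetical helper, the numbers are made up, and `sortperm` breaks ties by position, which may differ from how `eval_SC_loose` selects its candidates.

```julia
# Hypothetical helper: indices of the k largest entries of a correlation row.
top_k(r, k) = sortperm(r, rev = true)[1:k]

# Made-up correlations of one predicted vector with three gold-standard rows.
# Rows 1 and 2 are duplicates (a homophone pair); the target is row 2.
r = [0.9, 0.9, 0.1]
target = 2

hit_k1 = target in top_k(r, 1)  # the tie is resolved in favour of row 1
hit_k2 = target in top_k(r, 2)  # both tied rows fit among two candidates
```

A value-based comparison would accept the k=1 case here, while an index-based top-k lookup rejects it, which is one way the two evaluation functions can disagree.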
@@ -579,6 +653,14 @@ eval_SC_loose(Shat, S, k)
579
653
```
580
654
"""
581
655
function eval_SC_loose(SChat, SC, k; digits=4)
656
+
657
+
if size(unique(SC, dims=1), 1) != size(SC, 1)
658
+
@warn "eval_SC_loose: The C or S matrix contains duplicate vectors (usually because of homophones/homographs). Supplying the dataset and target column is recommended for a realistic evaluation. See the documentation of this function for more information."
659
+
if k == 1
660
+
@warn "eval_SC_loose: You set k=1. Note that if there are duplicate vectors in the S/C matrix, it is not guaranteed that eval_SC_loose with k=1 gives the same result as eval_SC."
661
+
end
662
+
end
663
+
582
664
total = size(SChat, 1)
583
665
correct = 0
584
666
rSC = cor(
@@ -588,8 +670,7 @@ function eval_SC_loose(SChat, SC, k; digits=4)
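The duplicate-row check that guards the new warnings can be tried on its own. The sketch below wraps the expression from the diff in a hypothetical helper, `has_duplicate_rows`, and applies it to two made-up cue matrices.

```julia
# Hypothetical wrapper around the check used in the diff:
# true if at least two rows of M are identical.
has_duplicate_rows(M) = size(unique(M, dims=1), 1) != size(M, 1)

# Rows 1 and 2 are identical, as for two homophones with the same cues.
C_dup = [1 0 1;
         1 0 1;
         0 1 1]

# All rows differ.
C_unique = [1 0 1;
            0 1 0;
            0 1 1]

dup_found = has_duplicate_rows(C_dup)
none_found = has_duplicate_rows(C_unique)
```

`unique(M, dims=1)` builds a deduplicated copy of the matrix, so on very large matrices the check has a cost of its own.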