-        @warn "This dataset contains homophones/homographs. Note that some of the results on the correctness of comprehended base/inflections may be misleading. See documentation of this function for more information."
+        @warn "accuracy_comprehension: This dataset contains homophones/homographs. Note that some of the results on the correctness of comprehended base/inflections may be misleading. See documentation of this function for more information."
     end
 
     if !isnothing(inflections)
@@ -189,7 +189,7 @@ function accuracy_comprehension(
-        @warn "This dataset contains homophones/homographs. Note that some of the results on the correctness of comprehended base/inflections may be misleading. See documentation of this function for more information."
+        @warn "accuracy_comprehension: This dataset contains homophones/homographs. Note that some of the results on the correctness of comprehended base/inflections may be misleading. See documentation of this function for more information."
     end
 
     corMat = cor(Shat_val, S, dims = 2)
@@ -232,6 +232,9 @@ Assess model accuracy on the basis of the correlations of row vectors of Chat an
 C or Shat and S. Ideally the target words have highest correlations on the diagonal
 of the pertinent correlation matrices.
 
+!!! note
+    If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and `target_col` is recommended which enables taking into account homophones/homographs.
+
 # Obligatory Arguments
 - `SChat::Union{SparseMatrixCSC, Matrix}`: the Chat or Shat matrix
 - `SC::Union{SparseMatrixCSC, Matrix}`: the C or S matrix
+        @warn "eval_SC: The C or S matrix contains duplicate vectors (usually because of homophones/homographs). Supplying the dataset and target column is recommended for a realistic evaluation. See the documentation of this function for more information."
+    end
+
     rSC = cor(
         convert(Matrix{Float64}, SChat),
         convert(Matrix{Float64}, SC),
@@ -273,6 +281,9 @@ of the pertinent correlation matrices.
 The order is important. The fist gold standard matrix has to be corresponing
 to the SChat matrix, such as `eval_SC(Shat_train, S_train, S_val)` or `eval_SC(Shat_val, S_val, S_train)`
 
+!!! note
+    If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and target_col is recommended which enables taking into account homophones/homographs.
+
 # Obligatory Arguments
 - `SChat::Union{SparseMatrixCSC, Matrix}`: the Chat or Shat matrix
 - `SC::Union{SparseMatrixCSC, Matrix}`: the training/validation C or S matrix
@@ -427,7 +438,10 @@ end
 Assess model accuracy on the basis of the correlations of row vectors of Chat and
 C or Shat and S. Ideally the target words have highest correlations on the diagonal
 of the pertinent correlation matrices. For large datasets, pass batch_size to
-process evaluation in chucks.
+process evaluation in chunks.
+
+!!! note
+    If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and target_col is recommended which enables taking into account homophones/homographs.
 
 # Obligatory Arguments
 - `SChat`: the Chat or Shat matrix
@@ -455,6 +469,10 @@ function eval_SC(
     verbose = false
 )
 
+    if size(unique(SC, dims=1), 1) != size(SC, 1)
+        @warn "eval_SC: The C or S matrix contains duplicate vectors (usually because of homophones/homographs). Supplying the dataset and target column is recommended for a realistic evaluation. See the documentation of this function for more information."
+    end
+
     l = size(SChat, 1)
     num_chucks = ceil(Int64, l / batch_size)
     verbose && begin
@@ -494,7 +512,7 @@ end
 Assess model accuracy on the basis of the correlations of row vectors of Chat and
 C or Shat and S. Ideally the target words have highest correlations on the diagonal
 of the pertinent correlation matrices. For large datasets, pass batch_size to
-process evaluation in chucks. Support homophones.
+process evaluation in chunks. Support homophones.
 
 # Obligatory Arguments
 - `SChat::AbstractArray`: the Chat or Shat matrix
@@ -617,6 +635,10 @@ end
 Assess model accuracy on the basis of the correlations of row vectors of Chat and
 C or Shat and S. Count it as correct if one of the top k candidates is correct.
 
+!!! note
+    If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and it is not guaranteed that the target on the diagonal will be among the k neighbours. In particular, `eval_SC` and `eval_SC_loose` with k=1 are not guaranteed to give the same result. In such cases, supplying the dataset and `target_col` is recommended which enables taking into account homophones/homographs.
+
+
 # Obligatory Arguments
 - `SChat::Union{SparseMatrixCSC, Matrix}`: the Chat or Shat matrix
 - `SC::Union{SparseMatrixCSC, Matrix}`: the C or S matrix
@@ -631,6 +653,14 @@ eval_SC_loose(Shat, S, k)
 ```
 """
 function eval_SC_loose(SChat, SC, k; digits=4)
+
+    if size(unique(SC, dims=1), 1) != size(SC, 1)
+        @warn "eval_SC_loose: The C or S matrix contains duplicate vectors (usually because of homophones/homographs). Supplying the dataset and target column is recommended for a realistic evaluation. See the documentation of this function for more information."
+        if k == 1
+            @warn "eval_SC_loose: You set k=1. Note that if there are duplicate vectors in the S/C matrix, it is not guaranteed that eval_SC_loose with k=1 gives the same result as eval_SC."
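
The situation the new notes and warnings describe can be reproduced on a toy matrix. The following is only a minimal sketch, not code from this commit: the matrices `S` and `Shat`, and the names `has_duplicates`, `r` and `best`, are made up for illustration. It reuses the duplicate-row check from the added lines and the `cor(..., dims = 2)` call seen in the diff context.

```julia
using Statistics  # provides cor

# Hypothetical toy target matrix: rows 1 and 2 are identical,
# standing in for a pair of homophones/homographs.
S = [1.0 0.0 2.0;
     1.0 0.0 2.0;
     0.0 3.0 1.0]

# Hypothetical predictions; here simply a perfect copy of the targets.
Shat = copy(S)

# Same duplicate-row check as in the lines added by this diff.
has_duplicates = size(unique(S, dims=1), 1) != size(S, 1)

# Correlate every predicted row with every target row.
r = cor(Shat, S, dims = 2)

# For each predicted row, the index of the most correlated target row.
# Rows 1 and 2 tie between targets 1 and 2, so the diagonal entry
# is not guaranteed to be the one that is selected.
best = [argmax(r[i, :]) for i in 1:size(r, 1)]

println("duplicate target rows: ", has_duplicates)
println("best match per row:    ", best)
```

Even with perfect predictions, a diagonal-only comparison can count one of the two tied rows as wrong, which is why the notes recommend supplying the dataset and `target_col`.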