Commit 1779569

Merge branch 'dev'
2 parents b7bc348 + 91e0565 commit 1779569

3 files changed: +64 -2 lines changed

docs/src/index.md

Lines changed: 24 additions & 2 deletions
@@ -10,6 +10,18 @@ JudiLing is now on the Julia package system. You can install JudiLing by the fol
 using Pkg
 Pkg.add("JudiLing")
 ```
+For brave adventurers, install test version of JudiLing by:
+```
+julia> Pkg.add(PackageSpec(url="https://github.com/MegamindHenry/JudiLing.jl.git"))
+```
+if you are on the Julia 1.4 and in Julia 1.5 REPL, we can run:
+```
+julia> Pkg.add(url="https://github.com/MegamindHenry/JudiLing.jl.git")
+```
+Or from the Julia REPL, type `]` to enter the Pkg REPL mode and run
+```
+pkg> add https://github.com/MegamindHenry/JudiLing.jl.git
+```

 ## Running Julia with multiple threads
 JudiLing supports the use of multiple threads. Simply start up Julia in your terminal as follows:
@@ -526,6 +538,16 @@ You can download and try out this script [here](https://osf.io/sa89x/download).

 We implemented a high-level wrapper function that aims to provide quick and preliminary studies on multiple datasets with different parameter settings. For a sophisticated study, we suggest to build a script step by step.

+In general, `test_combo` function will perform the following operations:
+
+- prepare datasets
+- make cue matrix object
+- make semantic matrix
+- learn transfrom mapping F and G
+- perform path-finding algorithms for both `learn_paths` and `build_paths` in training and validation datasets
+- evaluate results
+- save outputs
+
 ### Split mode
 `test_combo` function provides four split mode. `:train_only` give the opportunity to only evaluate the model with training data or partial training data. `data_path` is the path to the CSV file and `data_output_dir` is the directory for store training and validation datasets for future analysis.

@@ -567,7 +589,7 @@ JudiLing.test_combo(
 )
 ```

-`:random_split` will randomly split data into training and validation datasets. In this case, it is high likely that unseen n-grams and features are in the validation datasets. Therefore, `if_combined` should be turned on. `data_path` is the path to the directory containing CSV files and `data_output_dir` is the directory for store training and validation datasets for future analysis.
+`:random_split` will randomly split data into training and validation datasets. In this case, it is high likely that unseen n-grams and features are in the validation datasets. Therefore, you should set `if_combined` to true. `data_path` is the path to the directory containing CSV files and `data_output_dir` is the directory for store training and validation datasets for future analysis.

 ```julia
 JudiLing.test_combo(
@@ -594,7 +616,7 @@ JudiLing.test_combo(
 )
 ```

-`:careful_split` will carefully split data into training and validation datasets where there will be no unseen n-grams and features in the validation datasets. Therefore, `if_combined` shall be truned off. `data_path` is the path to the directory containing CSV files and `data_output_dir` is the directory for store training and validation datasets for future analysis. `n_features_columns` gives names of feature columns and target column.
+`:careful_split` will carefully split data into training and validation datasets where there will be no unseen n-grams and features in the validation datasets. Therefore, you should set `if_combined` to false. `data_path` is the path to the directory containing CSV files and `data_output_dir` is the directory for store training and validation datasets for future analysis. `n_features_columns` gives names of feature columns and target column.

 ```julia
 JudiLing.test_combo(
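
The documentation change above says that `:random_split` is likely to leave unseen n-grams in the validation data, which is why `if_combined` should be true in that mode. A small self-contained sketch can make this concrete. The helper names `char_ngrams` and `unseen_ngrams` are hypothetical and not part of JudiLing; boundary tokens are left out to keep the example short.

```julia
# Hypothetical helpers for illustration only; not part of JudiLing's API.

# Character n-grams of a word, without boundary padding.
function char_ngrams(word::AbstractString, n::Integer)
    chars = collect(word)
    return [join(chars[i:i+n-1]) for i in 1:(length(chars) - n + 1)]
end

# N-grams that occur in the validation words but never in the training words.
function unseen_ngrams(train_words, val_words, n::Integer)
    seen = Set{String}()
    for w in train_words
        union!(seen, char_ngrams(w, n))
    end
    val = Set{String}()
    for w in val_words
        union!(val, char_ngrams(w, n))
    end
    return sort!(collect(setdiff(val, seen)))
end

train_words = ["cat", "cab"]
val_words = ["cap"]
missing_cues = unseen_ngrams(train_words, val_words, 2)
println(missing_cues)
```

In this toy split the validation word contains a bigram that the training words never produce, so a cue matrix built from training data alone would have no column for it.
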

src/find_path.jl

Lines changed: 3 additions & 0 deletions
@@ -65,6 +65,7 @@ word, which n-grams are best supported for a given position in the sequence of n
 - `sep_token::Union{Nothing, String, Char}=nothing`: separator token
 - `keep_sep::Bool=false`:if true, keep separators in cues
 - `target_col::Union{String, :Symbol}=:Words`: the column name for target strings
+- `start_end_token::Union{String, Char}="#"`: start and end token in boundary cues
 - `issparse::Symbol=:auto`: control of whether output of Mt matrix is a dense matrix or a sparse matrix
 - `sparse_ratio::Float64=0.2`: the ratio to decide whether a matrix is sparse
 - `if_pca::Bool=false`: turn on to enable pca mode
@@ -214,6 +215,7 @@ function learn_paths(
 sep_token = nothing,
 keep_sep = false,
 target_col = "Words",
+start_end_token = "#",
 issparse = :auto,
 sparse_ratio = 0.2,
 if_pca = false,
@@ -261,6 +263,7 @@ function learn_paths(
 tokenized = tokenized,
 sep_token = sep_token,
 keep_sep = keep_sep,
+start_end_token = start_end_token
 )

 verbose && println("Calculating Mt...")
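
The new `start_end_token` keyword is documented as the start and end token in boundary cues. As a rough illustration of what that means, here is a minimal sketch that pads a word with the token before cutting it into n-grams. The function `boundary_ngrams` is hypothetical and is not how JudiLing builds its cues internally; only the keyword name, its default `"#"` and its `String`/`Char` types come from the diff.

```julia
# Hypothetical helper for illustration only; not part of JudiLing's API.
# `start_end_token` mirrors the keyword added in this commit and may be a
# String or a Char; `string` handles both.
function boundary_ngrams(word::AbstractString, n::Integer; start_end_token = "#")
    chars = collect(string(start_end_token, word, start_end_token))
    return [join(chars[i:i+n-1]) for i in 1:(length(chars) - n + 1)]
end

println(boundary_ngrams("cat", 3))
println(boundary_ngrams("cat", 2; start_end_token = '$'))
```

With the default token, the first and last n-grams carry the boundary marker, which is what distinguishes a word-initial or word-final cue from the same letters in the middle of a word.
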

src/test_combo.jl

Lines changed: 37 additions & 0 deletions
@@ -122,6 +122,10 @@ function test_combo(test_mode; kwargs...)
 random_seed = get_kwarg(kwargs, :random_seed, required=false)
 if_combined = get_kwarg(kwargs, :if_combined, required=false)

+verbose && println("="^20)
+verbose && println("Preparing datasets...")
+verbose && println("="^20)
+
 # split and load data
 if test_mode == :train_only
 data_path = get_kwarg(kwargs, :data_path, required=true)
@@ -218,6 +222,9 @@ function test_combo(test_mode; kwargs...)

 # make cue matrix/matrices
 # make semantic matrix/matrices
+verbose && println("="^20)
+verbose && println("Making cue matrix object...")
+verbose && println("="^20)

 cue_obj_train, cue_obj_val = make_cue_train_val(
 data_train,
@@ -230,6 +237,14 @@ function test_combo(test_mode; kwargs...)
 start_end_token,
 if_combined,
 verbose)
+for i in 1:10
+display(cue_obj_train.i2f[i])
+display(cue_obj_val.i2f[i])
+end
+
+verbose && println("="^20)
+verbose && println("Making S matrix...")
+verbose && println("="^20)

 n_features = size(cue_obj_train.C, 2)
 S_train, S_val = make_S_train_val(data_train, data_val,
@@ -247,6 +262,10 @@ function test_combo(test_mode; kwargs...)
 S_val = S_train[1:val_sample_size, :]
 end

+verbose && println("="^20)
+verbose && println("Learning transformation mapping F and G...")
+verbose && println("="^20)
+
 learn_mode = get_kwarg(kwargs, :learn_mode, required=false)

 # cholesky params
@@ -311,6 +330,10 @@ function test_combo(test_mode; kwargs...)
 ":wh"))
 end

+verbose && println("="^20)
+verbose && println("Predicting S and C...")
+verbose && println("="^20)
+
 Shat_train = cue_obj_train.C * F_train
 Shat_val = cue_obj_val.C * F_train
 Chat_train = S_train * G_train
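
The context lines above predict semantic vectors as `C * F` and cue vectors as `S * G`. A toy sketch of that relationship follows. It estimates both mappings with the Moore-Penrose pseudoinverse from Julia's `LinearAlgebra` standard library, which is an assumption made for this example and not necessarily the solver JudiLing uses (the surrounding code refers to Cholesky parameters). The matrices are made up for illustration.

```julia
using LinearAlgebra

# Toy cue matrix C (3 words x 2 cues); values are invented for this sketch.
C = [1.0 0.0;
     0.0 1.0;
     1.0 1.0]

# A known mapping used to generate a toy semantic matrix S (3 words x 2 dims).
F_true = [2.0 1.0;
          0.0 3.0]
S = C * F_true

# Least-squares estimates via the pseudoinverse (stand-in for the real solver).
F = pinv(C) * S   # maps cues to semantics
G = pinv(S) * C   # maps semantics to cues

Shat = C * F      # predicted semantic vectors, as in `Shat_train`
Chat = S * G      # predicted cue vectors, as in `Chat_train`

println(isapprox(Shat, S; atol = 1e-8))
println(isapprox(Chat, C; atol = 1e-8))
```

Because the toy data are exactly linear and both matrices have full column rank, the predictions reproduce `S` and `C`. Real data will only be approximated.
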
@@ -350,6 +373,10 @@ function test_combo(test_mode; kwargs...)
 end
 end

+verbose && println("="^20)
+verbose && println("Performing path-finding algorithms...")
+verbose && println("="^20)
+
 max_can = get_kwarg(kwargs, :max_can, required=false)
 threshold_train = get_kwarg(kwargs, :threshold_train, required=false)
 is_tolerant_train = get_kwarg(kwargs, :is_tolerant_train, required=false)
@@ -386,6 +413,7 @@ function test_combo(test_mode; kwargs...)
 sep_token = n_grams_sep_token,
 keep_sep = n_grams_keep_sep,
 target_col = n_grams_target_col,
+start_end_token = start_end_token,
 issparse = issparse,
 sparse_ratio = sparse_ratio,
 verbose = verbose)
@@ -414,6 +442,7 @@ function test_combo(test_mode; kwargs...)
 sep_token = n_grams_sep_token,
 keep_sep = n_grams_keep_sep,
 target_col = n_grams_target_col,
+start_end_token = start_end_token,
 issparse = issparse,
 sparse_ratio = sparse_ratio,
 verbose = verbose)
@@ -464,6 +493,10 @@ function test_combo(test_mode; kwargs...)
 verbose = verbose,
 )

+verbose && println("="^20)
+verbose && println("Evaluating results...")
+verbose && println("="^20)
+
 acc_Chat_train = eval_SC(Chat_train, cue_obj_train.C)
 acc_Shat_train = eval_SC(Shat_train, S_train)
 acc_Shat_train_homo = eval_SC(Shat_train, S_train, data_train, n_grams_target_col)
@@ -496,6 +529,10 @@ function test_combo(test_mode; kwargs...)
 verbose=verbose
 )

+verbose && println("="^20)
+verbose && println("Saving outputs...")
+verbose && println("="^20)
+
 output_dir = get_kwarg(kwargs, :output_dir, required=false)

 mkpath(output_dir)
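
The progress banners added throughout `test_combo` use Julia's short-circuit `&&`, so each `println` runs only when `verbose` is true. The same three-line pattern could be wrapped in a small helper, sketched below. `print_banner` is a hypothetical name and not a JudiLing function; the `io` argument is added here only so the output can be captured.

```julia
# Hypothetical helper for illustration only; not part of JudiLing.
function print_banner(io::IO, msg::AbstractString; verbose::Bool = true, width::Integer = 20)
    verbose || return nothing      # short-circuit: print nothing unless verbose
    println(io, "="^width)         # rule of `width` equals signs
    println(io, msg)
    println(io, "="^width)
    return nothing
end

print_banner(stdout, "Saving outputs...")
```

Calling `print_banner(stdout, msg; verbose = verbose)` would then stand in for the three `verbose && println(...)` lines at each stage.
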
