https://github.com/jklw10/dnns-from-scratch-in-zig/tree/benchmark-submission
Create the folder `data/mnist-fashion/`, extract the datasets into it, and run with `zig build run -Doptimize=ReleaseFast`.
Zig version: 0.14.0-dev.1860+2e2927735 (it might also run with 0.13.0).
The same setup reaches 97.2% on the MNIST digit set (with a different configuration, likely overfit; 98% with 4×100-neuron hidden layers).
What's wacky about it:
- Weights are forcibly normalized (and adjusted): `(grads[i] - avg(grads)) / (max(grads) - min(grads)) * (2 - (2 / inputSize))`
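As a hedged sketch of the formula above (Python rather than the repo's Zig; `forced_normalize` and the sample values are my own illustration, not the author's code):

```python
def forced_normalize(xs, input_size):
    """Center by the mean, scale by the min-max span, then apply the
    (2 - 2/inputSize) adjustment from the formula above."""
    avg = sum(xs) / len(xs)
    span = max(xs) - min(xs)  # assumes the values are not all equal
    scale = 2.0 - 2.0 / input_size
    return [(x - avg) / span * scale for x in xs]

# Illustrative values only; with input_size = 4 the adjusted span is 2 - 2/4 = 1.5.
normed = forced_normalize([1.0, -1.0, 0.5, 0.25], input_size=4)
print(max(normed) - min(normed))  # 1.5
```

The result is zero-mean with a fixed range that shrinks slightly as `inputSize` grows.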
- Gradients are forcibly normalized (and adjusted): `norm(weight) * (1 + (2 / inputSize))`
- Gradients are biased to move each weight towards that weight's EMA: `grads[i] / abs(ema[i]) + abs(grads[i] - ema[i] - weight[i])`
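Read literally, the EMA-bias term combines a division by the EMA's magnitude with an absolute-distance penalty. A hedged one-liner (my own Python rendering, not the repo's code):

```python
def ema_biased_grad(grad, ema, weight):
    # Literal transcription of: grads[i] / abs(ema[i]) + abs(grads[i] - ema[i] - weight[i])
    # (assumes ema != 0; the real code presumably guards this).
    return grad / abs(ema) + abs(grad - ema - weight)

print(ema_biased_grad(0.5, 0.25, 0.125))  # 0.5/0.25 + |0.5 - 0.25 - 0.125| = 2.125
```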
- The forward pass uses `sign(weight) * sqrt(weight * ema)` in place of `weight`.
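In Python, the geometric-mean-style substitution could look like the following (a hedged sketch; `effective_weight` is my name, and clamping a negative `weight * ema` product to zero is my assumption — the linked Zig source has the real edge handling):

```python
import math

def effective_weight(weight, ema):
    # sign(weight) * sqrt(weight * ema); the sqrt only makes sense when the
    # weight and its EMA share a sign, so a negative product is clamped to 0 here.
    prod = max(weight * ema, 0.0)
    return math.copysign(math.sqrt(prod), weight)

print(effective_weight(4.0, 1.0))    # 2.0
print(effective_weight(-4.0, -1.0))  # -2.0
```

The effect is that a weight whose EMA disagrees with its current value is pulled toward the geometric mean of the two magnitudes.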
Some of this is slightly off; please read
https://github.com/jklw10/dnns-from-scratch-in-zig/blob/benchmark-submission/src/layerGrok.zig#L259
for the full context. Hopefully it's human-readable enough.
This score probably isn't the maximum I can reach, just the fastest to test in an afternoon. Should I update this issue, or open a new one if I reach a higher score? (4×100 hidden neurons achieved 89%.)