Benchmark: MLP 4x25 hidden layers, 88% in 100 epochs #186

@jklw10

Description

https://github.com/jklw10/dnns-from-scratch-in-zig/tree/benchmark-submission

Simply create the folder `data/mnist-fashion/` and extract the datasets into it, then run with `zig build run -Doptimize=ReleaseFast`.
Zig version: 0.14.0-dev.1860+2e2927735 (it might also run with 0.13.0).

The same setup is able to reach 97.2% on the MNIST digit set (with a different configuration, likely overfit), and 98% with 4x100-neuron hidden layers.

What's wacky about it (a code sketch of all four tricks follows the list):

- Weights are forcibly normalized (and adjusted): `(grads[i] - avg(grads)) / (max(grads) - min(grads)) * (2 - 2/inputSize)`
- Gradients are forcibly normalized (and adjusted): `norm(weight) * (1 + 2/inputSize)`
- Gradients are biased to move the weight toward that weight's EMA: `grads[i] / abs(ema[i]) + abs(grads[i] - ema[i] - weight[i])`
- The forward pass uses `sign(weight) * sqrt(weight * ema)` in place of the weight.
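
In code, roughly: this is a simplified, self-contained Zig sketch, not the actual code. The function names are illustrative, the parenthesization of the EMA-bias term is one literal reading of the line above, and layerGrok.zig is authoritative for exactly which statistics feed each transform.

```zig
const std = @import("std");

// Range-normalize a slice in place, then rescale it. One reading of the
// norms above: weights use scale = 2 - 2/inputSize, gradients use
// scale = 1 + 2/inputSize.
fn rangeNormalize(xs: []f32, scale: f32) void {
    var mn = xs[0];
    var mx = xs[0];
    var sum: f32 = 0;
    for (xs) |x| {
        mn = @min(mn, x);
        mx = @max(mx, x);
        sum += x;
    }
    const avg = sum / @as(f32, @floatFromInt(xs.len));
    const range = mx - mn;
    if (range == 0) return; // constant slice: nothing to normalize
    for (xs) |*x| x.* = (x.* - avg) / range * scale;
}

// Bias each gradient toward the weight's EMA, reading the formula above
// literally.
fn emaBias(grads: []f32, ema: []const f32, weights: []const f32) void {
    for (grads, ema, weights) |*g, e, w| {
        g.* = g.* / @abs(e) + @abs(g.* - e - w);
    }
}

// Effective weight used by the forward pass. Note @sqrt returns NaN when
// the weight and its EMA disagree in sign; the real code presumably
// guards against that.
fn effectiveWeight(weight: f32, ema: f32) f32 {
    return std.math.sign(weight) * @sqrt(weight * ema);
}

pub fn main() void {
    const n: f32 = 25.0; // inputSize for a 4x25 hidden layer
    var weights = [_]f32{ 0.5, -0.25, 1.0, 0.0 };
    var grads = [_]f32{ 0.1, -0.2, 0.05, 0.3 };
    const ema = [_]f32{ 0.4, -0.1, 0.9, 0.05 };

    rangeNormalize(&weights, 2.0 - 2.0 / n);
    rangeNormalize(&grads, 1.0 + 2.0 / n);
    emaBias(&grads, &ema, &weights);

    std.debug.print("weights: {any}\ngrads: {any}\neffective w[0]: {d}\n", .{
        weights, grads, effectiveWeight(weights[0], ema[0]),
    });
}
```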

Some of this is slightly off; please read
https://github.com/jklw10/dnns-from-scratch-in-zig/blob/benchmark-submission/src/layerGrok.zig#L259
for the full context. Hopefully it's human-readable enough.

This score probably isn't the maximum I can reach; it was just the fastest to test in an afternoon. Should I update here, or open a new issue if I get a higher score? (4x100 hidden neurons achieved 89%.)
