Benchmark: MLP 4x25 hidden layers, 88% in 100 epochs #186

@jklw10

Description

https://github.com/jklw10/dnns-from-scratch-in-zig/tree/benchmark-submission

Simply create the folder `data/mnist-fashion/` and extract the datasets into it, then run with `zig build run -Doptimize=ReleaseFast`.
Zig version: 0.14.0-dev.1860+2e2927735 (it might also run with 0.13.0).

The same setup is able to reach 97.2% on the MNIST digit set (with a different configuration, likely overfit), and 98% with 4x100-neuron hidden layers.

What's wacky about it (a code sketch of all four tricks follows the list):

- Weights are forcibly normalized (and adjusted): `(grads[i] - avg(grads)) / (max(grads) - min(grads)) * (2 - 2/inputSize)`
- Gradients are forcibly normalized (and adjusted): `norm(weight) * (1 + 2/inputSize)`
- Gradients are biased to move the weight toward that weight's EMA: `grads[i] / abs(ema[i]) + abs(grads[i] - ema[i] - weight[i])`
- The forward pass uses `sign(weight) * sqrt(weight * ema)` in place of the weight.
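
In code, roughly: this is a simplified, self-contained Zig sketch, not the actual code. The function names are illustrative, the parenthesization of the EMA-bias term is one literal reading of the line above, and layerGrok.zig is authoritative for exactly which statistics feed each transform.

```zig
const std = @import("std");

// Range-normalize a slice in place, then rescale it. One reading of the
// norms above: weights use scale = 2 - 2/inputSize, gradients use
// scale = 1 + 2/inputSize.
fn rangeNormalize(xs: []f32, scale: f32) void {
    var mn = xs[0];
    var mx = xs[0];
    var sum: f32 = 0;
    for (xs) |x| {
        mn = @min(mn, x);
        mx = @max(mx, x);
        sum += x;
    }
    const avg = sum / @as(f32, @floatFromInt(xs.len));
    const range = mx - mn;
    if (range == 0) return; // constant slice: nothing to normalize
    for (xs) |*x| x.* = (x.* - avg) / range * scale;
}

// Bias each gradient toward the weight's EMA, reading the formula above
// literally.
fn emaBias(grads: []f32, ema: []const f32, weights: []const f32) void {
    for (grads, ema, weights) |*g, e, w| {
        g.* = g.* / @abs(e) + @abs(g.* - e - w);
    }
}

// Effective weight used by the forward pass. Note @sqrt returns NaN when
// the weight and its EMA disagree in sign; the real code presumably
// guards against that.
fn effectiveWeight(weight: f32, ema: f32) f32 {
    return std.math.sign(weight) * @sqrt(weight * ema);
}

pub fn main() void {
    const n: f32 = 25.0; // inputSize for a 4x25 hidden layer
    var weights = [_]f32{ 0.5, -0.25, 1.0, 0.0 };
    var grads = [_]f32{ 0.1, -0.2, 0.05, 0.3 };
    const ema = [_]f32{ 0.4, -0.1, 0.9, 0.05 };

    rangeNormalize(&weights, 2.0 - 2.0 / n);
    rangeNormalize(&grads, 1.0 + 2.0 / n);
    emaBias(&grads, &ema, &weights);

    std.debug.print("weights: {any}\ngrads: {any}\neffective w[0]: {d}\n", .{
        weights, grads, effectiveWeight(weights[0], ema[0]),
    });
}
```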

Some of this is slightly off; please read
https://github.com/jklw10/dnns-from-scratch-in-zig/blob/benchmark-submission/src/layerGrok.zig#L259
for the full context. Hopefully it's human-readable enough.

This score probably isn't the maximum I can reach; it was just the fastest to test in an afternoon. Should I update here, or open a new issue if I get a higher score? (4x100 hidden neurons achieved 89%.)
