Training a base non-recurrent transformer model that uses self-attention, from first principles, for use as a foundation model. The starting point is a generative pre-trained transformer (GPT) in the style of the "Attention Is All You Need" paper (arXiv 1706.03762) from Google; a minimal causal self-attention sketch follows.
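The core building block is masked (causal) scaled dot-product self-attention, so the model can attend only to earlier tokens and needs no recurrence. The sketch below is a minimal PyTorch version under illustrative assumptions (d_model=64, 4 heads, max_len=1024 are placeholder sizes, not values from the paper or this project):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Minimal multi-head causal self-attention block (decoder-only, no recurrence)."""
    def __init__(self, d_model: int, n_heads: int, max_len: int = 1024):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # joint projection to queries, keys, values
        self.proj = nn.Linear(d_model, d_model)      # output projection
        # lower-triangular mask blocks attention to future tokens
        self.register_buffer("mask", torch.tril(torch.ones(max_len, max_len)).bool())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split heads: (B, T, C) -> (B, n_heads, T, d_head)
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        # scaled dot-product attention: softmax(QK^T / sqrt(d_head)) V, with the causal mask
        att = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        att = att.masked_fill(~self.mask[:T, :T], float("-inf"))
        att = F.softmax(att, dim=-1)
        out = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)

# smoke test: batch of 2 sequences, 16 tokens, d_model 64
x = torch.randn(2, 16, 64)
print(CausalSelfAttention(d_model=64, n_heads=4)(x).shape)  # torch.Size([2, 16, 64])
```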
- See agentic serving work in ObrienlabsDev/distrbuted-agentic-ai#1
- https://modal.com/blog/how-much-vram-need-fine-tuning
- The maximum VRAM I have available is 72 GB on a base M3 Ultra (96 GB RAM), or 48 GB on my RTX-A6000 or the base M2 Ultra. Using the formula above, full training in half (FP16) precision needs roughly 16 bytes per parameter, so 72 GB covers full training of a model up to roughly 4B parameters; a 7B model fits for inference or parameter-efficient fine-tuning (see the estimate sketch below).
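A quick back-of-the-envelope check of that rule of thumb. The 16-bytes-per-parameter figure (FP16 weights and gradients plus FP32 Adam-style optimizer state) is the assumption taken from the post linked above, not a measured value:

```python
# rough VRAM estimate for full training: ~16 bytes per parameter
# (assumption from the fine-tuning VRAM post linked above)
BYTES_PER_PARAM_FULL_TRAINING = 16

def training_vram_gb(n_params_billion: float,
                     bytes_per_param: int = BYTES_PER_PARAM_FULL_TRAINING) -> float:
    """Approximate GPU / unified memory needed for full training, in GB."""
    return n_params_billion * 1e9 * bytes_per_param / 1e9

for size in (1, 3, 7):
    print(f"{size}B params -> ~{training_vram_gb(size):.0f} GB for full FP16 training")
# 7B params -> ~112 GB, which is why 72 GB caps full training near ~4B parameters;
# a 7B model needs gradient checkpointing, an 8-bit optimizer, or LoRA-style fine-tuning
```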
```bash
# create and activate an isolated Python environment
python3 -m venv virtenv
source virtenv/bin/activate
# install the PyTorch nightly build (the macOS wheel from the CPU index includes the MPS backend)
pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
# leave the environment when finished
deactivate
```
```python
from importlib.metadata import version
import torch

print("pytorch version:", version("torch"))                 # confirm the nightly build is installed
print("MPS available:", torch.backends.mps.is_available())  # should print True on Apple Silicon
```
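Once the MPS backend is confirmed, a small device-selection helper keeps the training code portable across the M-series Macs and the RTX-A6000. A minimal sketch (the `pick_device` helper is an illustrative name, not part of PyTorch):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA (RTX-A6000), then Apple MPS (M2/M3 Ultra), then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
print("training on:", device)
# move the model and each batch explicitly, e.g. model.to(device), batch.to(device)
```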