A collection of generative models (DDPM, DiT, VAE) trained from scratch, for learning and research.
Implemented a VAE from scratch, inspired by SD-VAE, and trained it on both MNIST and Minecraft images. The model is a convolutional autoencoder with downsampling and upsampling blocks, along with residual and attention layers.
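The sketch below illustrates this block structure in PyTorch. It is a minimal, illustrative example, not the repository's actual code: the channel counts, number of stages, and latent channels are placeholders.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 convs with a residual skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.GroupNorm(8, ch), nn.SiLU(), nn.Conv2d(ch, ch, 3, padding=1),
            nn.GroupNorm(8, ch), nn.SiLU(), nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class AttnBlock(nn.Module):
    """Self-attention over spatial positions, with a residual connection."""
    def __init__(self, ch):
        super().__init__()
        self.norm = nn.GroupNorm(8, ch)
        self.qkv = nn.Conv2d(ch, 3 * ch, 1)
        self.proj = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(self.norm(x)).chunk(3, dim=1)
        q = q.flatten(2).transpose(1, 2)               # (b, hw, c)
        k, v = k.flatten(2), v.flatten(2).transpose(1, 2)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.proj(out)

class TinyVAE(nn.Module):
    """One downsampling stage in the encoder, mirrored in the decoder."""
    def __init__(self, in_ch=3, ch=64, z_ch=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1),
            ResBlock(ch), nn.Conv2d(ch, ch, 3, stride=2, padding=1),  # downsample 2x
            ResBlock(ch), AttnBlock(ch),
            nn.Conv2d(ch, 2 * z_ch, 1),                               # -> mean, logvar
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(z_ch, ch, 1),
            ResBlock(ch), AttnBlock(ch),
            nn.Upsample(scale_factor=2, mode="nearest"),              # upsample 2x
            nn.Conv2d(ch, ch, 3, padding=1), ResBlock(ch),
            nn.Conv2d(ch, in_ch, 3, padding=1),
        )

    def forward(self, x):
        mean, logvar = self.encoder(x).chunk(2, dim=1)
        z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)   # reparameterize
        return self.decoder(z), mean, logvar
```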
Training was performed using a combination of adversarial loss, KL-divergence (KLD) loss, and LPIPS perceptual loss computed with a pretrained VGG16 network. The vae_xl.yaml config creates a 97.5M-parameter VAE model.
The VAE was trained on 256x256 Minecraft images and outputs latents of shape 64x8x8, a 48x compression.
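The 48x figure follows directly from the tensor sizes:

```python
input_vals = 256 * 256 * 3       # RGB image: 196,608 values
latent_vals = 64 * 8 * 8         # latent tensor: 4,096 values
print(input_vals / latent_vals)  # 48.0 -> 48x compression
```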
Implementation of a Diffusion Transformer, inspired by the original DiT paper. The model uses transformer blocks with the diffusion timestep injected through adaptive layer norm (adaLN).
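The block below sketches the general adaLN-Zero pattern from the DiT paper, where an MLP on the timestep embedding regresses per-block shift/scale/gate parameters; it is not this repo's exact module, and the dimensions are placeholders.

```python
import torch
import torch.nn as nn

class AdaLNBlock(nn.Module):
    """One transformer block conditioned on the timestep embedding via adaLN."""
    def __init__(self, dim, n_heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # regress 6 conditioning signals: shift/scale/gate for attn and MLP paths
        self.ada = nn.Sequential(nn.SiLU(), nn.Linear(dim, 6 * dim))
        nn.init.zeros_(self.ada[-1].weight)  # adaLN-Zero: block starts as identity
        nn.init.zeros_(self.ada[-1].bias)

    def forward(self, x, t_emb):
        params = self.ada(t_emb).unsqueeze(1)  # (b, 1, 6*dim), broadcast over tokens
        shift_a, scale_a, gate_a, shift_m, scale_m, gate_m = params.chunk(6, dim=-1)
        h = self.norm1(x) * (1 + scale_a) + shift_a
        x = x + gate_a * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + scale_m) + shift_m
        return x + gate_m * self.mlp(h)
```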
Tested both small (76M) and large (608M) parameter variants on the Minecraft dataset using our pretrained VAE. All models were trained on an NVIDIA A100.
To train on your own dataset, modify the config YAML files and run:
```bash
python -m vae.train_vae --config './vae/configs/vae_xl.yaml'
python -m DiT.train --config './DiT/configs/config_xl.yaml'
```
These commands will train the VAE and DiT models using the specified config files.
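If you prefer to edit configs programmatically, the sketch below loads a config, changes a couple of fields, and writes a new file to pass via `--config`. The field names here are hypothetical; check the actual keys in the YAML files before editing.

```python
import yaml  # pip install pyyaml

with open("vae/configs/vae_xl.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["dataset_path"] = "/path/to/your/images"  # hypothetical key
cfg["batch_size"] = 16                        # hypothetical key

with open("vae/configs/my_config.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```

Then launch training with `--config './vae/configs/my_config.yaml'`.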