A collection of generative models (DDPM, DiT, VAE) trained from scratch, for learning and research.
Implemented a VAE from scratch, inspired by SD-VAE, and trained it on both MNIST and Minecraft images. The model is a convolutional autoencoder with downsampling and upsampling blocks, along with residual and attention layers.
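The sketch below illustrates this block structure in PyTorch. It is a minimal, illustrative example, not the repository's actual code: the channel counts, number of stages, and latent channels are placeholders.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 convs with a residual skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.GroupNorm(8, ch), nn.SiLU(), nn.Conv2d(ch, ch, 3, padding=1),
            nn.GroupNorm(8, ch), nn.SiLU(), nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class AttnBlock(nn.Module):
    """Self-attention over spatial positions, with a residual connection."""
    def __init__(self, ch):
        super().__init__()
        self.norm = nn.GroupNorm(8, ch)
        self.qkv = nn.Conv2d(ch, 3 * ch, 1)
        self.proj = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(self.norm(x)).chunk(3, dim=1)
        q = q.flatten(2).transpose(1, 2)               # (b, hw, c)
        k, v = k.flatten(2), v.flatten(2).transpose(1, 2)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.proj(out)

class TinyVAE(nn.Module):
    """One downsampling stage in the encoder, mirrored in the decoder."""
    def __init__(self, in_ch=3, ch=64, z_ch=4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1),
            ResBlock(ch), nn.Conv2d(ch, ch, 3, stride=2, padding=1),  # downsample 2x
            ResBlock(ch), AttnBlock(ch),
            nn.Conv2d(ch, 2 * z_ch, 1),                               # -> mean, logvar
        )
        self.decoder = nn.Sequential(
            nn.Conv2d(z_ch, ch, 1),
            ResBlock(ch), AttnBlock(ch),
            nn.Upsample(scale_factor=2, mode="nearest"),              # upsample 2x
            nn.Conv2d(ch, ch, 3, padding=1), ResBlock(ch),
            nn.Conv2d(ch, in_ch, 3, padding=1),
        )

    def forward(self, x):
        mean, logvar = self.encoder(x).chunk(2, dim=1)
        z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)   # reparameterize
        return self.decoder(z), mean, logvar
```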
Training was performed using a combination of adversarial loss, KL-divergence (KLD) loss, and LPIPS perceptual loss computed with a pretrained VGG16 network. The vae_xl.yaml config creates a 97.5M-parameter VAE model.
The VAE was trained on 256x256 Minecraft images and outputs latents of shape 64x8x8, a 48x compression.
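The 48x figure follows directly from the tensor sizes:

```python
input_vals = 256 * 256 * 3       # RGB image: 196,608 values
latent_vals = 64 * 8 * 8         # latent tensor: 4,096 values
print(input_vals / latent_vals)  # 48.0 -> 48x compression
```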
Implementation of a Diffusion Transformer, inspired by the original DiT paper. The model uses transformer blocks with the diffusion timestep injected through adaptive layer norm (adaLN).
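The block below sketches the general adaLN-Zero pattern from the DiT paper, where an MLP on the timestep embedding regresses per-block shift/scale/gate parameters; it is not this repo's exact module, and the dimensions are placeholders.

```python
import torch
import torch.nn as nn

class AdaLNBlock(nn.Module):
    """One transformer block conditioned on the timestep embedding via adaLN."""
    def __init__(self, dim, n_heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim, elementwise_affine=False)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim, elementwise_affine=False)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # regress 6 conditioning signals: shift/scale/gate for attn and MLP paths
        self.ada = nn.Sequential(nn.SiLU(), nn.Linear(dim, 6 * dim))
        nn.init.zeros_(self.ada[-1].weight)  # adaLN-Zero: block starts as identity
        nn.init.zeros_(self.ada[-1].bias)

    def forward(self, x, t_emb):
        params = self.ada(t_emb).unsqueeze(1)  # (b, 1, 6*dim), broadcast over tokens
        shift_a, scale_a, gate_a, shift_m, scale_m, gate_m = params.chunk(6, dim=-1)
        h = self.norm1(x) * (1 + scale_a) + shift_a
        x = x + gate_a * self.attn(h, h, h, need_weights=False)[0]
        h = self.norm2(x) * (1 + scale_m) + shift_m
        return x + gate_m * self.mlp(h)
```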
Tested both small (76M) and large (608M) parameter variants on the Minecraft dataset using our pretrained VAE. All models were trained on an NVIDIA A100.
To train on your own dataset, modify the config YAML files and run:
```bash
python -m vae.train_vae --config './vae/configs/vae_xl.yaml'
python -m DiT.train --config './DiT/configs/config_xl.yaml'
```
These commands will train the VAE and DiT models using the specified config files.
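If you prefer to edit configs programmatically, the sketch below loads a config, changes a couple of fields, and writes a new file to pass via `--config`. The field names here are hypothetical; check the actual keys in the YAML files before editing.

```python
import yaml  # pip install pyyaml

with open("vae/configs/vae_xl.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["dataset_path"] = "/path/to/your/images"  # hypothetical key
cfg["batch_size"] = 16                        # hypothetical key

with open("vae/configs/my_config.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```

Then launch training with `--config './vae/configs/my_config.yaml'`.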