Nirmalya Mallick Thakur

Bachelor's in Computer Science at IISER Bhopal

I am passionate about working with generative models such as world models and diffusion models. Given the current surge of research on world models, I am keen to explore their applications across various fields. I also enjoy working on computer graphics and simulation problems.

I have experience building generative models from the ground up. I implemented a variational autoencoder (VAE) from scratch, inspired by SD-VAE, and trained it on frames extracted from Minecraft gameplay. I then trained a 1.2B-parameter Diffusion Transformer (DiT) on the VAE's latent representations to generate high-fidelity Minecraft scenes. I also added text conditioning through cross-attention with classifier-free guidance, using a CLIP encoder, with captions extracted by Qwen2.5-VL [Link]. In addition, I have built a U-Net-based diffusion model following the "Denoising Diffusion Probabilistic Models" (DDPM) paper.

I also created Sim3D, a 3D physics engine for particle and cloth simulation, written entirely from scratch in C++ with OpenGL.

I previously interned at the Speech and Language Lab, NTU, under Prof. Chng Eng Siong, and at the LEAP Lab, IISc Bangalore, working on various audio and speech models.