
Chinmay Savadikar

Ph.D. Student, Department of ECE

North Carolina State University

csavadi [at] ncsu [dot] edu

I am a second-year Ph.D. student at North Carolina State University, advised by Dr. Tianfu Wu. My research interest lies in Continual Learning, with broader interests in efficient and robust learning.

Prior to starting my Ph.D., I worked with the Precision Sustainable Agriculture initiative at NC State to build Computer Vision and Software solutions for problems in agriculture.

Before coming back to academia, I worked as a Machine Learning Engineer at Persistent Systems, focusing on Deep Learning for Medical Imaging and large-scale document recognition. I also spent some time developing internal SDKs for the Data Science team and setting up MLOps frameworks.

Outside of research, I enjoy reading, playing Football (I will not call it Soccer, sorry), listening to Oldies Rock Music, and binge-watching TV shows.

Publications

GIFT: Generative Interpretable Fine-Tuning
Preprint
Chinmay Savadikar, Xi Song, Tianfu Wu
paper web code

GIFT is a method for parameter-efficient fine-tuning of pretrained Transformer models with built-in interpretability. GIFT generates the fine-tuning residual weights \(\Delta\omega\) directly from the pretrained weights, and is shared across all layers of the pretrained Transformer selected for fine-tuning. Simply parameterizing GIFT with two plain linear layers (without bias terms) is surprisingly effective, i.e., \(\hat{\omega}=\omega \cdot (\mathbb{I}+\phi_{d_{in}\times r}\cdot \psi_{r\times d_{in}})\). On image classification tasks, the output of the first linear layer in GIFT plays the role of an \(r\)-way segmentation head without being explicitly trained to do so.
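As a rough illustration of the formula above, here is a minimal PyTorch sketch of the two-linear-layer parameterization (not the official GIFT implementation; the class name, shapes, and initialization are assumptions). A single pair \(\phi, \psi\) is shared across all fine-tuned layers, and the residual is generated from the pretrained weight itself.

```python
import torch
import torch.nn as nn

class GIFTSketch(nn.Module):
    """Illustrative sketch only: two plain linear maps (no bias terms),
    d_in -> r -> d_in, shared across all layers selected for fine-tuning."""

    def __init__(self, d_in: int, r: int):
        super().__init__()
        self.phi = nn.Parameter(torch.randn(d_in, r) * 0.02)  # phi_{d_in x r}
        self.psi = nn.Parameter(torch.zeros(r, d_in))          # psi_{r x d_in}

    def forward(self, omega: torch.Tensor) -> torch.Tensor:
        # omega: a pretrained weight of shape (d_out, d_in).
        # Residual generated directly from the pretrained weight:
        # delta_omega = omega @ phi @ psi, so that
        # omega_hat = omega @ (I + phi @ psi) = omega + delta_omega.
        delta_omega = omega @ self.phi @ self.psi
        return omega + delta_omega

# Usage: one shared GIFT module produces the update for every selected layer.
gift = GIFTSketch(d_in=768, r=16)
pretrained_w = torch.randn(768, 768)   # e.g. a frozen projection weight
finetuned_w = gift(pretrained_w)
```

Initializing \(\psi\) to zero makes \(\hat{\omega}=\omega\) at the start of fine-tuning, a common choice for residual adapters (an assumption here, not necessarily the paper's initialization).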

Transforming Transformers for Resilient Lifelong Learning
Preprint
Chinmay Savadikar, Michelle Dai, Tianfu Wu
paper

We present a method to add lightweight, learnable parameters to Vision Transformers for lifelong learning while leveraging their parameter-heavy but stable components. We show that the final linear projection layer in the multi-head self-attention (MHSA) block can serve as this lightweight module within a Mixture of Experts framework. Most prior methods that address this problem add learnable parameters at every layer, or heuristically choose where to do so; we instead use Neural Architecture Search to determine this automatically. Specifically, we adopt single-path one-shot (SPOS) search and propose a task-similarity-oriented sampling strategy that replaces uniform sampling, achieving better performance and efficiency.
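For intuition only, the sketch below shows the general idea of treating the MHSA output projection as a small Mixture of Experts. It is a hypothetical PyTorch example: the module name, routing rule, and expert count are assumptions, and it does not reproduce the paper's lifelong-learning setup, NAS search space, or sampling strategy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEOutputProjection(nn.Module):
    """Hypothetical sketch: replace the single MHSA output projection with a
    few lightweight expert projections mixed by a learned router."""

    def __init__(self, dim: int, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim), the concatenated multi-head attention output.
        weights = F.softmax(self.router(x.mean(dim=1)), dim=-1)   # (batch, E)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, tokens, dim, E)
        return (outs * weights[:, None, None, :]).sum(dim=-1)     # weighted mixture
```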

Website inspirations: Tejas Gokhale and Gowthami Somepalli.