Generative Parameter Efficient Fine-Tuning
(GIFT)

¹North Carolina State University, ²Independent Researcher

LoRA (a) is a layer-specific fine-tuning paradigm that learns the low-rank weight residuals directly as model parameters. Our GIFT (b) instead induces an explicit and direct mapping between the fine-tuned model and the frozen pretrained model, i.e., it learns the fine-tuned weights directly from the pretrained weights. We show that the fine-tuned weights can be learned as a simple linear transformation of the pretrained weights:
\(\hat{\omega}^l_{d_{out}\times d_{in}}= \omega^l_{d_{out}\times d_{in}} \cdot P_{d_{in}\times d_{in}} =\omega^l_{d_{out}\times d_{in}} \cdot (\mathbb{I}+\Theta_{d_{in}\times d_{in}})\)
where \(\Theta\) is a low-rank matrix
\(\Theta_{d_{in}\times d_{in}} = \phi_{d_{in}\times r} \cdot \psi_{r\times d_{in}}\)
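A minimal PyTorch sketch of this weight-generation step (variable names, shapes, and the zero initialization of one factor, which makes \(\hat{\omega}^l=\omega^l\) at the start, are our illustrative assumptions, not the released implementation):

import torch

def gift_weight(w, phi, psi):
    # w:   frozen pretrained weight of a linear layer, shape (d_out, d_in)
    # phi: learnable factor, shape (d_in, r)
    # psi: learnable factor, shape (r, d_in)
    # Returns w_hat = w @ (I + phi @ psi) = w + (w @ phi) @ psi.
    return w + (w @ phi) @ psi

d_out, d_in, r = 768, 768, 16
w = torch.randn(d_out, d_in)                    # frozen pretrained weight
phi = torch.randn(d_in, r, requires_grad=True)  # learnable
psi = torch.zeros(r, d_in, requires_grad=True)  # zero-init so w_hat == w initially (assumption)
w_hat = gift_weight(w, phi, psi)                # (d_out, d_in) fine-tuned weight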
With GIFT, we can write the output of a Linear layer as

\(\hat{y}^l_{ N\times d_{out}} = x^l_{ N\times d_{in}} \cdot (\hat{\omega}^l_{d_{out}\times d_{in}})^{\top} + b^l_{d_{out}}\)
and equivalently,

\(\hat{y}^l_{ N\times d_{out}}= (x^l_{ N\times d_{in}} + x^l_{ N\times d_{in}}\cdot \psi^{\top}_{d_{in}\times r}\cdot \phi^{\top}_{r\times d_{in}}) \cdot (\omega^l_{d_{out}\times d_{in}})^{\top} + b^l_{d_{out}}\)

i.e., GIFT is equivalent to a low-rank edit of the input representation followed by the frozen pretrained layer, hence bridging parameter-efficient fine-tuning (PEFT) and representation fine-tuning (ReFT).
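The equivalence between the two views can be checked numerically; the sketch below (illustrative, not the released code) compares the weight-space (PEFT) and representation-space (ReFT) computations:

import torch

N, d_in, d_out, r = 4, 768, 768, 16
x = torch.randn(N, d_in)
w = torch.randn(d_out, d_in)
b = torch.randn(d_out)
phi, psi = torch.randn(d_in, r), torch.randn(r, d_in)

# PEFT view: apply the generated weights to the unchanged input
w_hat = w + (w @ phi) @ psi
y_peft = x @ w_hat.T + b

# ReFT view: edit the input representation, keep the pretrained weights frozen
x_edit = x + (x @ psi.T) @ phi.T
y_reft = x_edit @ w.T + b

print(torch.allclose(y_peft, y_reft, rtol=1e-4, atol=1e-3))  # True, up to float32 rounding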

Simple Formulation, Strong Performance


We perform experiments on multiple Natural Language and Visual Recognition tasks. The figure above briefly summarizes performance on Instruction Following, Arithmetic Reasoning, and Commonsense Reasoning using the Llama family of models.

(i) Instruction Following with Llama-2: Our GIFT\(^{16}_{\alpha}\) obtains a higher win rate than LoReFT, a recently proposed representation fine-tuning method. With slightly more parameters (still 4 times fewer than LoRA), our GIFT\(^{128}_{\alpha}\) even outperforms GPT-3.5 Turbo (1106).

(ii) Arithmetic Reasoning with Llama-1: Our GIFT\(_{\beta}^{64}\) significantly outperforms VeRA under the same fine-tuning parameter budget.

(iii) Commonsense Reasoning with three Llama models: Our GIFT\(_{\gamma}^{64}\) obtains a 6% absolute increase in average accuracy over LoRA with 53 times fewer parameters when fine-tuning Llama-3. It is also consistently better than LoReFT while using about half the number of parameters.

See full results in our paper.


Natural Language Understanding on GLUE


Visual Recognition

FGVC Benchmark


Results on the fine-grained visual classification (FGVC) tasks. The number of trainable parameters is reported without the classification head, which has the same number of parameters for all methods. GPU memory usage is measured via torch.cuda.max_memory_allocated() during training with a batch size of 32.
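For reference, peak-memory numbers of this kind can be obtained with a pattern like the following (a generic sketch, not the exact benchmarking script; the toy linear model is a stand-in for the fine-tuned ViT):

import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(768, 200).to(device)          # stand-in model, not the actual ViT
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

torch.cuda.reset_peak_memory_stats(device)
for _ in range(10):                              # a few training steps, batch size 32
    x = torch.randn(32, 768, device=device)
    y = torch.randint(0, 200, (32,), device=device)
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

peak_gib = torch.cuda.max_memory_allocated(device) / 1024**3
print(f"peak GPU memory during training: {peak_gib:.2f} GiB")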


When GIFT is applied to fine-tune the projection layers in the multi-head self-attention modules of Vision Transformers on image classification tasks, the output of the first linear layer \((C^l_{d_{out}\times r}=\omega^l_{d_{out}\times d_{in}}\cdot \phi_{d_{in}\times r})\) plays the role of an \(r\)-way segmentation/token-clustering head. This localization ability emerges as a by-product, without any direct supervision of segmentation maps, using only the standard cross-entropy loss during fine-tuning. The maps can form over objects/parts in images, handle occlusions (e.g., the bird body in the fifth column), and find relevant objects (the full bird and the head in the third column) even when the object occupies only a small part of the image.
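A rough sketch of how per-token maps could be derived from \(C^l\) is given below. The scoring rule (dot products between each token's frozen-layer output and the \(r\) columns of \(C^l\)) and all names are our illustrative assumptions, not the paper's exact visualization pipeline:

import torch

def cluster_map(w, phi, tokens, grid_hw):
    # w:      frozen projection weight of the attention block, (d_out, d_in)
    # phi:    learned GIFT factor, (d_in, r)
    # tokens: patch tokens entering the layer (CLS token removed), (N, d_in)
    C = w @ phi                      # (d_out, r): the "r-way head" from the text
    y = tokens @ w.T                 # (N, d_out) frozen-layer outputs
    scores = y @ C                   # (N, r) per-token cluster scores (assumed rule)
    labels = scores.argmax(dim=-1)   # (N,) hard assignment to one of r clusters
    return labels.reshape(grid_hw)   # (H, W) segmentation/clustering map

# Example with a 14x14 patch grid (ViT-B/16 on 224x224 images)
w = torch.randn(768, 768)
phi = torch.randn(768, 16)
tokens = torch.randn(14 * 14, 768)
print(cluster_map(w, phi, tokens, (14, 14)).shape)  # torch.Size([14, 14])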


Meaningful visual segmentation/token-clustering maps are formed. We show examples of the head, wings, and legs of birds in the top-left, flower petals in the top-right, the head, ears, and legs of dogs in the bottom-left, and the tires, windshield, and bumper of cars in the bottom-right. We can see global (object-level) as well as part-level maps.


VTAB-1k Benchmark


BibTeX


@misc{savadikar2024gift,
  title={Generative Parameter Efficient Fine-Tuning}, 
  author={Chinmay Savadikar and Xi Song and Tianfu Wu},
  year={2024},
  eprint={2312.00700},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}