WeGeFT: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models

North Carolina State University · Independent Researcher
ICML 2025

Parameter-Efficient Fine-Tuning (PEFT) and Representation Fine-Tuning (ReFT) allow adaptation of large models at much lower cost than full fine-tuning. Low-Rank Adaptation (LoRA) has become the de facto PEFT method, offering multi-faceted efficiency in terms of performance, parameters, memory, and compute (training wall time). However, recent variants sacrifice some of these aspects. Weight-Generative Fine-Tuning (WeGeFT, pronounced "wee-gift") introduces a novel formulation in which the fine-tuning weight residuals are generated from the pretrained weights by a trainable weight generator, leveraging the knowledge already encoded in the pretrained weights for better expressivity. WeGeFT preserves multi-faceted efficiency, achieves similar or better performance than prior methods at a fraction of the parameter cost, and unifies PEFT and ReFT.

WeGeFT in Detail

We show that the weight generator can be formulated as a simple linear transformation of the pretrained weights, leading to multiplicative updates in the weight space that enable richer and more structured transformations than additive methods. This linearity also preserves the multi-faceted efficiency and unifies PEFT and ReFT.
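A minimal PyTorch-style sketch of this formulation, assuming the generator consists of two small trainable matrices applied to the frozen pretrained weight: phi (d_in × r, matching the notation used later on this page) and psi (r × d_in, an assumed second factor). The class name, initialization, and the per-layer (rather than shared) placement of phi are illustrative choices, not the official implementation.

import torch
import torch.nn as nn

class WeGeFTLinear(nn.Module):
    """Sketch of a linear layer adapted with a WeGeFT-style weight generator.

    The fine-tuning residual is generated linearly from the frozen pretrained
    weight W, giving a multiplicative update in weight space:
        W' = W + W @ phi @ psi = W @ (I + phi @ psi).
    Shapes, names, and initialization here are illustrative assumptions.
    """

    def __init__(self, pretrained: nn.Linear, r: int = 8):
        super().__init__()
        self.weight = pretrained.weight            # (d_out, d_in), kept frozen
        self.bias = pretrained.bias
        self.weight.requires_grad_(False)
        if self.bias is not None:
            self.bias.requires_grad_(False)
        d_in = self.weight.shape[1]
        # Trainable weight-generator factors (phi could also be shared across layers).
        self.phi = nn.Parameter(torch.zeros(d_in, r))          # zero init: no change at start
        self.psi = nn.Parameter(torch.randn(r, d_in) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual generated from the pretrained weight itself: (d_out, d_in).
        delta_w = self.weight @ self.phi @ self.psi
        return nn.functional.linear(x, self.weight + delta_w, self.bias)

Under this sketch the generated residual is a fixed matrix once training ends, so it can be merged into W for inference, in the same spirit as LoRA's merged deployment.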

Multi-Faceted Efficiency


Comparison of the ratios of performance, parameters, GPU memory, and compute (training wall time) of various PEFT methods relative to LoRA, using Llama 2 on the Math10k benchmark. WeGeFT maintains the performance and efficiency of LoRA, unlike the other methods.

WeGeFT Bridges PEFT and ReFT

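One way to see the bridge concretely, using the notation from the caption further below (\(\omega^l\) the pretrained weight and \(\phi_{d_{in}\times r}\) the first generator factor), and assuming, as an illustration, that the generated residual takes the form \(\omega^l\phi\,\psi\) with a second factor \(\psi_{r\times d_{in}}\):

\[
\left(\omega^l + \omega^l\phi\,\psi\right)x \;=\; \omega^l x + \left(\omega^l\phi\right)\left(\psi x\right) \;=\; \omega^l x + C^l\left(\psi x\right), \qquad C^l = \omega^l\phi .
\]

Read as a residual on \(\omega^l\), this is a parameter-efficient update in weight space (the PEFT view); read as adding \(C^l(\psi x)\) to the layer's output, it is a rank-\(r\) edit of the representation (the ReFT view). The same trainable parameters admit both readings, which is the sense in which WeGeFT bridges the two families.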

Simple Formulation, Strong Performance


Comparison of performance vs. trainable parameters between WeGeFT and baseline methods on three tasks using the Llama model family. WeGeFT maintains the compute and memory efficiency of LoRA, achieving strong multi-faceted efficiency across parameters, representations, compute, and memory.

Visual Recognition

FGVC Benchmark


Results on the fine-grained visual classification (FGVC) tasks. The number of trainable parameters is reported excluding the classification head, which has the same number of parameters for all methods.


When WeGeFT is applied to fine-tune the projection layers in the multi-head self-attention modules of Vision Transformers on image classification tasks, the output of the first linear layer \((C^l_{d_{out}\times r}=\omega^l_{d_{out}\times d_{in}}\cdot \phi_{d_{in}\times r})\) plays the role of an \(r\)-way segmentation/token-clustering head. This localization ability emerges as a by-product, without any direct supervision for the segmentation maps, using only the standard cross-entropy loss during fine-tuning. The maps form around objects and parts in images, handle occlusions (e.g., the bird body in the fifth column), and find relevant objects (the full bird and the head in the third column) even when the object occupies only a small part of the image.
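A rough sketch of how such maps could be rendered from \(C^l\) (an assumed visualization recipe with illustrative names, not necessarily the exact procedure used for the figures): score each patch token against the \(r\) columns of \(C^l\), take the argmax, and reshape the per-token assignments to the patch grid.

import torch

def token_clustering_map(tokens: torch.Tensor, C: torch.Tensor, grid_hw: tuple) -> torch.Tensor:
    """Rough sketch of an r-way token-clustering visualization (assumed recipe).

    tokens:  (num_patches, d_out) patch features from the adapted layer
    C:       (d_out, r) generated matrix C^l = w^l @ phi
    grid_hw: (H, W) patch-grid size with H * W == num_patches
    Returns an (H, W) map of hard cluster assignments.
    """
    scores = tokens @ C                  # (num_patches, r): similarity to each column of C
    assignments = scores.argmax(dim=-1)  # pick one of the r clusters per token
    return assignments.reshape(grid_hw)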


Meaningful visual segmentation/token-clustering maps are formed: heads, wings, and legs of birds (top-left), flower petals (top-right), heads, ears, and legs of dogs (bottom-left), and tires, windshields, and bumpers of cars (bottom-right). Both global (object-level) and part-level maps are visible.

BibTeX


@inproceedings{savadikar2025wegeft,
  title={WeGe{FT}: Weight-Generative Fine-Tuning for Multi-Faceted Efficient Adaptation of Large Models},
  author={Chinmay Savadikar and Xi Song and Tianfu Wu},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=K0sv5T2usb}
}