Summary
Weight-decomposed low-rank adaptation (abbreviated DoRA) is a modification of Low-rank Adaptation introduced by (1) that first decomposes the modified weights into a directional matrix and a magnitude vector. For a constant rank, this adds slightly more parameters, but the authors found that rank can be halved relative to DoRA without any issue and that it made the method more robust in general.
Code
class LinearWithDoRAMerged(nn.Module):
def __init__(self, linear, rank, alpha):
super().__init__()
self.linear = linear
self.lora = LoRALayer(
linear.in_features, linear.out_features, rank, alpha
)
# dynamic normalization
self.m = nn.Parameter(
self.linear.weight.norm(p=2, dim=0, keepdim=True))
# Code loosely inspired by
# https://github.com/catid/dora/blob/main/dora.py
def forward(self, x):
lora = self.lora.A @ self.lora.B
numerator = self.linear.weight + self.lora.alpha*lora.T
denominator = numerator.norm(p=2, dim=0, keepdim=True)
directional_component = numerator / denominator
new_weight = self.m * directional_component
return F.linear(x, new_weight, self.linear.bias)Code from https://magazine.sebastianraschka.com/p/lora-and-dora-from-scratch
Figures
Ref https://magazine.sebastianraschka.com/p/lora-and-dora-from-scratch
1.
Liu S-Y, Wang C-Y, Yin H, Molchanov P, Wang Y-CF, Cheng K-T, et al. DoRA: Weight-Decomposed Low-Rank Adaptation. In: International Conference on Machine Learning. PMLR; 2024. p. 32100–21. Available from: https://proceedings.mlr.press/v235/liu24bn.html