Low-rank Adaptation

Summary

Low-rank adaptation (LoRA) is an approach to speed up fine-tuning of neural networks, typically transformers such as language models, originally proposed by (1). In contrast with standard fine-tuning, which updates all weights, LoRA keeps the original weights $W$ frozen and trains only a small low-rank update: $W' = W + A B$, where $A$ and $B$ are two small weight matrices.

Details

$A B$ is proposed as a low-rank approximation of $\Delta W$ (the change to $W$ that full fine-tuning would otherwise learn through backprop)
$A$ and $B$ have shapes $(W.\mathrm{shape}[0], r)$ and $(r, W.\mathrm{shape}[1])$, where $r$ is much less than $W.\mathrm{shape}[0]$ or $W.\mathrm{shape}[1]$
In practice, the new $A B$ weight matrix can be kept separate from the original matrix $W$
Two hyperparameters:
$r$ : The rank of the LoRA matrices
$α$ : Scales the influence of the $A B$ matrices relative to the original weight matrix
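
The points above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation from (1): the layer size, the `lora_forward` helper, the zero initialization of $B$, and the `alpha / r` scaling convention are all assumptions made for the example, and the "trained" values of $B$ are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer size and hyperparameters, chosen only for illustration.
d_in, d_out = 64, 32
r, alpha = 4, 8.0

W = rng.normal(size=(d_in, d_out))                # frozen pretrained weight
A = rng.normal(scale=0.01, size=(W.shape[0], r))  # trainable, shape (W.shape[0], r)
B = np.zeros((r, W.shape[1]))                     # trainable, shape (r, W.shape[1])

# Assumption: a common convention scales the update by alpha / r.
scale = alpha / r


def lora_forward(x, W, A, B, scale):
    """Apply W and the low-rank update A @ B without merging them."""
    return x @ W + scale * (x @ A) @ B


x = rng.normal(size=(5, d_in))

# With B initialized to zero (an assumption here), A @ B is zero,
# so the adapted layer starts out identical to the original one.
starts_unchanged = np.allclose(lora_forward(x, W, A, B, scale), x @ W)

# Stand-in for training: give B some non-zero values.
B = rng.normal(scale=0.01, size=B.shape)

# The update can stay separate (lora_forward) or be merged into W.
W_merged = W + scale * (A @ B)
merge_matches = np.allclose(lora_forward(x, W, A, B, scale), x @ W_merged)

full_params = W.size           # weights updated by standard fine-tuning
lora_params = A.size + B.size  # weights updated by LoRA

print(f"full fine-tuning trains {full_params} weights, LoRA trains {lora_params}")
print(f"starts unchanged: {starts_unchanged}, merge matches: {merge_matches}")
```

Keeping $A$ and $B$ separate, as in `lora_forward`, means one copy of $W$ can be shared across several adapters, while merging into `W_merged` removes the extra matrix multiplications at inference time.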

Figures

From https://magazine.sebastianraschka.com/p/lora-and-dora-from-scratch
