Text-to-LoRA (T2L)
2026-03-20
https://arxiv.org/pdf/2506.06105

1. What is the paper about as a whole?
It's about generating a LoRA weight update given an arbitrary task description. The authors train a hypernetwork (T2L) that can generate those weights, evaluate several methods of training it, and compare them.
2. What is being said in detail, and how?
A hypernetwork can be used to generate weight updates on the fly. Basically, given an LLM and a task description, the authors embed the target layer type (e.g. an FFN or Q, K, V weight matrix), its depth, and the task description, and train a hypernetwork on existing LoRA weights so that it can generalize.
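The conditioning scheme above can be sketched as follows. This is a minimal illustration, not the paper's architecture: all sizes, the single-linear-layer "hypernetwork", and the function names are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes and the single-linear-layer "hypernetwork" are assumptions
# for illustration, not the paper's exact architecture.
D_TASK, D_LAYER, D_DEPTH = 32, 8, 8   # conditioning embedding sizes
D_MODEL, RANK = 64, 4                 # base model width and LoRA rank
D_IN = D_TASK + D_LAYER + D_DEPTH
D_OUT = 2 * D_MODEL * RANK            # flattened A and B matrices

W = rng.normal(0, 0.02, size=(D_IN, D_OUT))  # hypernetwork parameters

def generate_lora(task_emb, layer_emb, depth_emb):
    # Concatenate the three conditioning signals and map them to a
    # flat vector that is reshaped into the low-rank factors A and B.
    z = np.concatenate([task_emb, layer_emb, depth_emb])
    flat = z @ W
    A = flat[:D_MODEL * RANK].reshape(RANK, D_MODEL)
    B = flat[D_MODEL * RANK:].reshape(D_MODEL, RANK)
    return A, B

A, B = generate_lora(rng.normal(size=D_TASK),
                     rng.normal(size=D_LAYER),
                     rng.normal(size=D_DEPTH))
delta_W = B @ A  # low-rank weight update applied to the base layer
print(delta_W.shape)  # (64, 64)
```

One hypernetwork call per target module: changing the layer/depth embeddings yields the update for a different module, and changing only the task embedding yields a different "LoRA" with no per-task training.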
The authors experiment with two basic ideas: 1. Reconstructing existing LoRA weights with the hypernetwork; here the LoRA to be reconstructed is embedded as a one-hot vector and the rest of the network is reused. 2. Using a text embedder model to create arbitrary embeddings for any LoRA; this relies on a dataset of task descriptions paired with the existing LoRA updates.
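The difference between the two conditioning schemes can be sketched as below. Sizes and the MSE reconstruction target are my assumptions; the point is that a one-hot input can only index the training LoRAs, while a text embedding accepts unseen descriptions.

```python
import numpy as np

rng = np.random.default_rng(1)

N_TASKS, D_TEXT, RANK, D = 8, 32, 4, 64  # assumed sizes
task_id = 3

# (1) Reconstruction: the task is identified by a one-hot vector, so
# the hypernetwork can only reproduce the N_TASKS training LoRAs.
onehot_emb = np.eye(N_TASKS)[task_id]

# (2) SFT-style: the task description goes through a text embedder
# (random stand-in here), so unseen descriptions are valid inputs.
text_emb = rng.normal(size=D_TEXT)

# In the reconstruction setup the target is an existing trained LoRA,
# and the loss is plain regression on its matrices, e.g. MSE:
target_A = rng.normal(size=(RANK, D))
pred_A = target_A + rng.normal(0, 0.1, size=(RANK, D))  # fake prediction
recon_loss = np.mean((pred_A - target_A) ** 2)
print(onehot_emb.shape, text_emb.shape, recon_loss)
```

In the SFT variant the hypernetwork is instead trained end-to-end on the downstream task loss, so it never needs to match any particular LoRA's weights exactly.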
They also vary the architecture of the hypernetwork, i.e. how its output head is structured. For example, it can emit both the A and B parts of the LoRA weights together, or one at a time, selected via an extra embedding. The smaller architectures require more forward passes to generate all the parts of the LoRA (the final output).
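The output-head trade-off can be sketched as follows. Naming and shapes are my assumptions; `cond` stands in for the combined task/layer/depth conditioning vector.

```python
import numpy as np

rng = np.random.default_rng(2)

D, RANK, D_COND = 64, 4, 16  # assumed sizes
cond = rng.normal(size=D_COND)  # combined task/layer/depth conditioning

# Large head: one forward pass emits both A and B for a module.
W_large = rng.normal(0, 0.02, size=(D_COND, 2 * D * RANK))
flat = cond @ W_large
A = flat[:D * RANK].reshape(RANK, D)
B = flat[D * RANK:].reshape(D, RANK)

# Small head: half the output width, but one pass per matrix; an extra
# two-dimensional indicator embedding selects A or B, so the same
# module needs two forward passes instead of one.
W_small = rng.normal(0, 0.02, size=(D_COND + 2, D * RANK))

def emit(which):  # which: 0 -> emit A, 1 -> emit B
    indicator = np.eye(2)[which]
    return np.concatenate([cond, indicator]) @ W_small

A2 = emit(0).reshape(RANK, D)
B2 = emit(1).reshape(D, RANK)
print(A.shape, B.shape, A2.shape, B2.shape)
```

The smaller head has fewer parameters but must be queried more times per module, which is the speed/size trade-off the notes above refer to.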
Results show that SFT training works substantially better than reconstruction. Also, generating the whole LoRA update with the largest architecture works best. The efficiency gains are more than promising, since a single hypernetwork can serve an unlimited number of LoRAs.
3. Is the paper true, in whole or part?
The paper does seem to generate useful weights, but the method is difficult to apply. Training a hypernetwork is a brittle process and is LLM-specific. The benefits seem promising, but it's unclear whether the effort is better spent on training this network or on alternative context-engineering methods.
Also, it's unclear how well the hypernetwork can generalize to tasks not covered by the LoRA training dataset.
4. What of it?
It's something worth experimenting with if one is playing with LoRAs. Still, I'm not sure that going straight to the hypernetwork is the best choice without first trying an already-trained LoRA.