• 0ops@lemm.ee · 8 days ago

    I’ve never built or trained an LLM, but I did get to play around with some smaller neural networks. My neural networks professor described this phenomenon as “overfitting”: when a model is trained for too long on a dataset that’s too small for it, the model sort of cheats by “memorizing” arbitrary details of the training data (flaws in the images like compression artifacts, minor correlations that appear only coincidentally in the training set, etc.) to improve its evaluation performance on that same training data.

    The problem is that because the model is now hung up on random details of the training data instead of the broad strokes that actually define the subject matter, evaluation results on the validation and testing datasets suffer. My understanding is that the effect is more pronounced when a model is oversized for its data, because a smaller model wouldn’t have the “resolution” to overanalyze like that in the first place.
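    If you want to see the capacity effect for yourself, here’s a minimal toy sketch (my own illustration, not from the homework below; dataset and layer sizes are arbitrary assumptions) that fits an oversized and a small scikit-learn MLP to the same tiny, noisy dataset and compares their train vs. validation accuracy:

    ```python
    # Toy demonstration of overfitting: an oversized MLP memorizes a tiny,
    # noisy dataset, while a smaller one can't. Hypothetical example.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Small dataset with label noise -- the "arbitrary details" to memorize.
    X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                               flip_y=0.1, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3,
                                                      random_state=0)

    # Deliberately oversized network for ~140 training samples.
    big = MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=2000,
                        random_state=0)
    big.fit(X_train, y_train)
    print("oversized  train:", big.score(X_train, y_train),
          " val:", big.score(X_val, y_val))

    # A smaller model lacks the "resolution" to memorize the noise.
    small = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                          random_state=0)
    small.fit(X_train, y_train)
    print("small      train:", small.score(X_train, y_train),
          " val:", small.score(X_val, y_val))
    ```

    The oversized model typically scores near 100% on its training split while doing noticeably worse on validation; the small one shows a much narrower gap.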

    Here’s an example I found in my old homework of a model that started to become overfit after the 5th epoch or so:

    [plot: training vs. validation accuracy and loss over epochs]

    By the end of the training session, accuracy on the training dataset was pushing the high 90s, but validation accuracy never broke 80%. Training vs. validation loss diverged even more dramatically. Evidently, whatever the model learned to hit 90-something percent in training isn’t broadly applicable to the subject matter, so this model is considered “overfit” to its training data.
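    The standard way to catch that divergence automatically is early stopping on validation loss. Here’s a minimal Keras sketch (a hypothetical setup with made-up toy data, not the homework model) that halts training once the validation curve stops improving:

    ```python
    # Sketch: stop training before the model starts memorizing the
    # training set, by watching validation loss instead of training loss.
    import numpy as np
    import tensorflow as tf

    # Tiny, noisy toy dataset -- an assumption for the sketch.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 20)).astype("float32")
    y = (x[:, 0] + 0.5 * rng.normal(size=200) > 0).astype("float32")

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",         # watch the validation curve, not training
        patience=5,                 # tolerate 5 epochs without improvement
        restore_best_weights=True,  # roll back to the best epoch
    )

    history = model.fit(x, y, validation_split=0.2, epochs=100,
                        callbacks=[early_stop], verbose=0)
    print("stopped after", len(history.history["loss"]), "epochs")
    ```

    With `restore_best_weights=True` you end up with the parameters from the epoch where validation loss bottomed out, i.e., roughly where the curves in a plot like the one above start to split.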

• cx40@programming.dev · 9 days ago

    We’ve seen similar effects in reinforcement learning (see the “primacy bias” work of Evgenii Nikishin). It makes sense that the same effect would show up in LLMs, or any other ML model.

• otter@lemmy.dbzer0.com · 9 days ago

    So, pouring in slop does not make a consommé? Well, damn. That’s good to know.