Large language models, explained with a minimum of math and jargon

@redcalcium@lemmy.institute

We love this example because it illustrates just how difficult it will be to fully understand LLMs. The five-member Redwood team published a 25-page paper explaining how they identified and validated these attention heads. Yet even after they did all that work, we are still far from having a comprehensive explanation for why GPT-2 decided to predict Mary as the next word.

The current approach to ML model development has the same vibe as people writing a block of code that somehow works and then adding a comment like "no idea why but it works, modify at your own risk".

@Jumper775@lemmy.world

Perhaps we could see even greater improvements if we stopped and looked at how this works. Eventually we will need to, as there is a limit to how much real text exists.
