Want to really understand how large language models work? Here’s a gentle primer.

We love this example because it illustrates just how difficult it will be to fully understand LLMs. The five-member Redwood team published a 25-page paper explaining how they identified and validated these attention heads. Yet even after they did all that work, we are still far from having a comprehensive explanation for why GPT-2 decided to predict Mary as the next word.

Current approach to ML model development has the same vibe with people writing a block of code that somehow works and then put comments like "no idea why but it works, modify at your own risk’

@Jumper775@lemmy.world
link
fedilink
English
21Y

Perhaps we could see even greater improvements if we stopped and looked at how this works. Eventually we will need to as there is a limit to how much real text exists.

Create a post

Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross posting is strongly encouraged in the instance. If you feel your post or another person’s post makes sense in another community cross post into it.

Hope you enjoy the instance!

Rules

Rules

  • Follow the programming.dev instance rules
  • Keep content related to programming in some way
  • If you’re posting long videos try to add in some form of tldr for those who don’t want to watch videos

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev



  • 1 user online
  • 1 user / day
  • 1 user / week
  • 1 user / month
  • 1.11K users / 6 months
  • 1 subscriber
  • 1.21K Posts
  • 17.8K Comments
  • Modlog