Feedback gets rid of overt biases but leaves subtle racism intact.
@gnu@lemmy.zip

It’d be interesting to see how much this changes if you restricted the training dataset to books written in the last twenty years. I suspect the model would be a lot less negative. Older books tend to include material that doesn’t fit modern ideals, and it’d be a real struggle to avoid this if such texts are used for training.

For example, I was recently reading a couple of the sequels to The Thirty-Nine Steps (written during WW1), and they include multiple passages that really date them to an earlier era, with the main character casually throwing out jarringly racist stuff about black South Africans, Germans, the Irish, and basically anyone else who wasn’t properly English. Train an AI on that and you’re introducing the chance of problematic output, and chances are most LLMs have been trained on this series, since it’s now public domain and easily available.

I don’t like the idea of restricting the model’s corpus further. Rather, I think it would be good if it used a bigger corpus, but added the date of origin for each element as further context.
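To make the date-of-origin idea a bit more concrete, here is a rough sketch of what tagging each corpus element might look like. The `Document` class, the `with_date_context` helper, and the `<|published:YEAR|>` tag format are all made up for illustration; I don't know how any real training pipeline handles metadata.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """A hypothetical corpus element with its year of first publication."""
    text: str
    year: int


def with_date_context(doc: Document) -> str:
    # Prepend the publication year as a plain-text tag so the model
    # sees when the text was written before it sees the text itself.
    # The tag format here is an assumption, not an established convention.
    return f"<|published:{doc.year}|>\n{doc.text}"


corpus = [
    Document("An adventure novel from the First World War era.", 1916),
    Document("A recent technology article.", 2023),
]

tagged = [with_date_context(d) for d in corpus]
```

The hope would be that the model learns to associate attitudes with the era they came from instead of treating everything as equally current, though whether that works in practice is an open question.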

Separately, I think it could be good to train another LLM to recognize biases in various content, and then use that to add further context for the main LLM when it ingests that content. I’m not sure how to avoid bias in that second LLM, though. Maybe a complete lack of bias is an unattainable ideal that you can only approach without ever reaching.
