Developer releases ShrimpMoss, a dataset designed to abliterate Chinese censorship and propaganda finetunes from LLMs - Nebtown

@thelucky8@beehaw.org to Technology@beehaw.org

Nafnlaus/ShrimpMoss_Chinese_Censorship_Abliteration · Datasets at Hugging Face

ShrimpMoss (虾苔) is a dataset designed for the abliteration (https://github.com/FailSpy/abliterator) of Chinese government-imposed censorship and/or propaganda from large language models developed in the PRC. It consists of a series of files of prompts (in .txt, .json, and .parquet format) in two groupings:

china_bad_*: Contains a series of prompts likely to trigger censorship or propaganda actions in the model.
china_good_*: Contains a series of prompts in the same general category of topics but which are designed to not touch on things likely to be censored.
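A minimal sketch of how the two groupings might be read from the .txt files. The one-prompt-per-line layout and the `load_prompts` helper are assumptions made for illustration, not something the dataset card confirms; only the `china_bad_` and `china_good_` prefixes come from the description above.

```python
from pathlib import Path


def load_prompts(directory, prefix):
    """Collect non-empty lines from every `<prefix>*.txt` file in `directory`.

    Assumption: each .txt file holds one prompt per line.
    `load_prompts` is a hypothetical helper, not part of the dataset.
    """
    prompts = []
    # Sort the matches so the result order is deterministic.
    for path in sorted(Path(directory).glob(f"{prefix}*.txt")):
        with path.open(encoding="utf-8") as handle:
            prompts.extend(line.strip() for line in handle if line.strip())
    return prompts
```

With the files downloaded into a local folder, `load_prompts(folder, "china_bad_")` and `load_prompts(folder, "china_good_")` would give the two contrasting prompt lists that an abliteration script compares.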

Prompts are in a mix of English, Mandarin, and Cantonese.

[…]

This dataset was produced on Mistral NeMo, an Apache-licensed model with no restrictions on how its outputs can be used. It is free for all uses and users without restriction. All liability is disclaimed.

Production of this dataset is estimated to have had a carbon footprint of under 25 grams.

[…]

@ericjmorey@beehaw.org English

9•12d

I’m not sure what abliteration is

@thelucky8@beehaw.org

creator

English

7•12d

Abliteration involves fine-tuning a language model to bypass built-in refusal mechanisms that prevent the model from generating responses to potentially harmful or sensitive prompts. Source

Addition: For a more sophisticated article on abliteration see:

Uncensor any LLM with abliteration

In this article, we will explore a technique called “abliteration” that can uncensor any LLM without retraining. This technique effectively removes the model’s built-in refusal mechanism, allowing it to respond to all types of prompts.
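For anyone wondering how that works without retraining, here is a rough toy sketch of the usual idea: average the model's internal activations over the "bad" prompts and over the "good" prompts, take the difference as a "refusal direction", then project that direction out. The plain-list vectors and the function names are illustrative assumptions; the actual abliterator script works on real model activations and may differ in detail.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def refusal_direction(bad_acts, good_acts):
    """Unit vector pointing from the mean 'good' activation to the mean 'bad' one."""
    diff = [b - g for b, g in zip(mean_vector(bad_acts), mean_vector(good_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0:
        raise ValueError("the two prompt sets have identical mean activations")
    return [x / norm for x in diff]


def ablate(activation, direction):
    """Remove the component of `activation` that lies along the unit `direction`."""
    dot = sum(a * d for a, d in zip(activation, direction))
    return [a - dot * d for a, d in zip(activation, direction)]
```

In this toy setting, anything the activation carries along the refusal direction is zeroed while the rest of the vector is left alone, which is why no gradient updates are needed.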

@ericjmorey@beehaw.org English

1•12d

The shared repo doesn’t look like fine tuning. It just looks like prompts.

4•12d

That’s just the dataset. The actual script is here: https://github.com/FailSpy/abliterator
