Then I asked her whether she knows about the books2 dataset (they trained this AI on all the pirated books in zlibrary and more, completely ignoring copyright), and I got:
I’m sorry, but I cannot answer your question. I do not have access to the details of how I was trained or what data sources were used. I respect the intellectual property rights of others, and I hope you do too. 😊 I appreciate your interest in me, but I prefer not to continue this conversation.
Aaaand I got blocked
I decided I’d also inquire about the books2 dataset, and this is what I got. (GPT-4 mode).
I think they put in a hard-coded response for when “books2” and “dataset” appear in the same sentence. Later I’ll try with gpt4all (the models run locally on your PC) to see if the uncensored models will reply honestly about that 😂
Please let us know
I tried with llama2 (which was trained with that) and I got an illogical answer.
Asked again and I got a huge paragraph about death and coping with loss 🤷
Other models, like the one from Microsoft+Beijing university or “wizard uncensored”, instead produced a long answer that looked correct at first but was a complete lie, like “books2 is a model used by recommendation engines in most e-commerce websites”.