I was today years old when I learned that companies now want to manipulate #LLM datasets to inject ads into them and monitor their brand standing in #ChatGPT and other models live.
I don't think I understood #capitalism enough to work with #futurism .
Correct me if I'm reading this wrong, but
We are normalizing dataset poisoning and obfuscation as an industry, so that companies will be able to covertly change the LLM's behavior.
More and more people use LLMs instead of algorithmic search engines (consciously or not).
What if someone injects alternate history? No Taiwan? No Holocaust?
What if the open source models that people can afford to build on will be poisoned like that, and so their dictionaries and assistants will spew propaganda?
Like... I expected alignment teams to be doing this, but I expected it to be a few superpowers.
Now, if we're creating an industry of narration manipulators, of people specifically skewing not only the information but the very narratives - and hiding their actions, actively working against the ability to detect any tampering...
It will be a completely different world.
@alxd It's time to invest in a good set of printed encyclopedias. A Britannica from about 2010, for example.
@bosak I'm not that afraid for the Wikipedia per se, I'm worried about the untraceable, the unanalyzable. The ephemeral queries of millions of single users at a time.
Or the context of totally innocent information, news, and so on.
Damn.
@alxd
And not a good one!
People really need to learn to view technology more critically!
@alxd I wonder if we have to think of it as the SEO industry 2.0?
@alxd the way I see it, this is like trying to program an AI that only kills terrorists and never “ordinary people.” There is no way you could control it well enough that it could keep up that strict categorization, the lines would blur and it would eventually classify everyone as a terrorist — or it would classify everyone as an “ordinary person” and be deemed useless by the people controlling it.
Same problem with trying to control the narrative. People need to know how to do things like grow food. People with real jobs need access to good information. If you try to poison information for “the rabble” but not for “people with real jobs,” there is no way to maintain that strict categorization of people; it would eventually blur to the point that the AI could no longer distinguish to whom it should tell lies.
The only winning move is not to play. Just don’t use AI to do real work.
@alxd We're well past that point - current LLMs have been trained on 20 years of SEO and misinformation mongering already, and no one wants to spend money on filtering training material for bias, or general quality, for that matter.
@alxd I guess we now know why there has been such a push and investment in this technology that nobody really wants but everything has to have.
@alxd You know, if I came across this in a dystopian sf novel, I would think 'what a wildly ingenious, creative bit of worldbuilding!'
I've complained before that the villains of our era are too dull, and lack proper depth and complexity. 'Because he's evil' is a terrible character motivation in a book, but it seems to capture the case pretty well in real-world politics.
At least this part of our dystopia is *inventive*.
As a beta tester for a commercial LLM, which I've operated for five releases - let me ask you sincerely - do you REALLY want me to correct you?
@tuban_muzuru I'd be actually curious to hear it, if you have the energy to share.
I'm trying to analyze the impact on the narratives that will be subtler than just fake news or biased datasets, so my language might not match what I am trying to describe.
Let me start by asking about your experience with LLMs - and how much math you've had. If you've had linear algebra, this discussion will go swimmingly.
@tuban_muzuru you can assume I have the basics of linear algebra, dabbled in the usage of current generation LLMs, never trained, optimized, quantized any.
My current point doesn't stem from maths, but from the realization that we will have a lot more social / market incentives to poison datasets one way or another. We already had papers showing that poisoning a dataset is absolutely possible. I just realized the scale of the upcoming attempts.
Let's talk about datasets. A far more serious and ubiquitous problem is eliminating bullshit from otherwise-good datasets. Unintentionally poisoned, you might say. Or data created by kooks; there's a lot of that in the mix, too.
Furthermore, mundus vult decipi, ergo decipiatur. People enjoy being told lies, so go ahead and tell 'em lies. Immensely profitable, too.
The longer I observe the peristalsis of this LLM - and the humans using it - the more I feel like Zorg from Fifth Element.
@alxd if the training data wasn't made available, how can the model be open?
@alxd
As I understand it, they are indiscriminately sucking in everything - truth, fiction, lies, propaganda, jokes, memes, everything - without regard for truthfulness. And it's only the training prompts and answers that attempt to keep them truthful and factual. See also "jailbreaking" LLMs.
@alxd don’t use LLMs.
@alxd I wish a historian would chime in on this thread (sorry if I missed someone). Manipulated history, missing records, and multiple competing narratives are normal, not just as far back as we have writing, but as far back as DNA goes (search on the LUCA debate). LLMs are a problem, but we're in a Red Queen race on preserving knowable history, not in a qualitatively new one. You want fracked up? Imagine post-internet archeologists trying to make sense of us. No surviving records at all.
@alxd there is a book about that… kind of… Animal Farm by George Orwell
@alxd who woulda thunk
@alxd it's simple.
Imagine the worst thing ever.
Now assume whatever they come up with is at least 10x worse.