#datapoisoning


How does AI handle insufficient information? 🤔 We tested an AI with questions about the Eiffel Tower, Big Ben, and the bastions of Valletta. The AI gave inconsistent answers where its training data was limited or unclear. We also touch on AI poisoning, where AI models can be misled by fake data.
▶️ buff.ly/yRDWPTf
#AI #InsufficientData #DataPoisoning #EiffelTower #BigBen #Valletta #TestingAI #Accuracy #TTMO

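A minimal sketch of that kind of consistency probe (the stub `ask()` function and its canned answers are purely illustrative stand-ins for a real model call, not the setup used in the video):

```python
import random
from collections import Counter

def ask(question: str) -> str:
    # Stand-in for a real chat-API call; it simulates a model that
    # answers unreliably when its training data is thin.
    canned = {
        "How many bastions surround Valletta?": ["9", "11", "Not sure"],
    }
    return random.choice(canned.get(question, ["about 330 metres"]))

def consistency(question: str, trials: int = 10) -> float:
    """Fraction of trials agreeing with the most common answer."""
    answers = [ask(question).strip().lower() for _ in range(trials)]
    _, count = Counter(answers).most_common(1)[0]
    return count / trials

print(consistency("How tall is the Eiffel Tower?"))          # well-covered topic
print(consistency("How many bastions surround Valletta?"))   # thin coverage
```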

Hi #Admins 👋,

Can you give me quotes that explain your fight against #AIScraping? I'm looking for (verbal) images, metaphors, comparisons, etc. that explain to non-techies what's going on (efforts, goals, resources...).

I intend to publish your quotes in a text on @campact 's blog¹ (DE, German NGO).

The quotes should make your work 🙏 visible in a generally understandable way.

¹ blog.campact.de/author/friedem

Campact Blog · Friedemann Ebelt: Friedemann Ebelt campaigns for digital fundamental rights. On the Campact blog he writes about how digitalisation can succeed fairly, freely, and sustainably. He studied ethnology and communication science and is interested in everything that happens between politics, technology, and society. His preliminary conclusion: we need to digitalise better!

“We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy…”

#LLM #misinformation #datapoisoning
nature.com/articles/s41591-024

Nature · Medical large language models are vulnerable to data-poisoning attacks - Nature Medicine: Large language models can be manipulated to generate misinformation by poisoning of a very small percentage of the data on which they are trained, but a harm mitigation strategy using biomedical knowledge graphs can offer a method for addressing this vulnerability.
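For a sense of scale, a back-of-the-envelope calculation of what a 0.001% poisoning rate means (the corpus size below is an assumed round figure, not The Pile's actual token count):

```python
# Illustrative arithmetic only: how many tokens 0.001% works out to
# for a corpus of a given size.
corpus_tokens = 300_000_000_000       # assumption: a 300B-token corpus
poison_fraction = 0.001 / 100         # 0.001% expressed as a fraction
poisoned_tokens = int(corpus_tokens * poison_fraction)
print(f"{poisoned_tokens:,} poisoned tokens")   # 3,000,000 for these numbers
```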

"The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development. We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy that captures 91.9% of harmful content (F1 = 85.7%). Our algorithm provides a unique method to validate stochastically generated LLM outputs against hard-coded relationships in knowledge graphs. In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety."

nature.com/articles/s41591-024

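A conceptual sketch of the knowledge-graph screening idea described in the abstract (the graph entries, the extracted triple, and the extraction step are invented for illustration; the paper's actual pipeline is more involved):

```python
# Check claims extracted from an LLM answer against hard-coded
# relationships in a biomedical knowledge graph; unsupported claims
# are flagged for review.
KNOWLEDGE_GRAPH = {
    ("metformin", "treats", "type 2 diabetes"),
    ("warfarin", "interacts_with", "aspirin"),
}

def extract_triples(llm_output: str) -> list[tuple[str, str, str]]:
    # Stand-in for a real biomedical relation-extraction step.
    return [("metformin", "treats", "hypertension")]

def screen(llm_output: str) -> list[tuple[str, str, str]]:
    """Return extracted claims that the knowledge graph does not support."""
    return [t for t in extract_triples(llm_output) if t not in KNOWLEDGE_GRAPH]

print(screen("Metformin is a first-line treatment for hypertension."))
# [('metformin', 'treats', 'hypertension')] -> flagged as potentially harmful
```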

#Nightshade is an offensive #DataPoisoning tool, a companion to a defensive-style protection tool called #Glaze, which The Register covered in February last year.

Nightshade poisons #ImageFiles to give indigestion to models that ingest data without permission. It's intended to make those who train image-oriented models respect content creators' wishes about the use of their work. #LLM #AI

How artists can poison their pics with deadly Nightshade to deter #AIScrapers
theregister.com/2024/01/20/nig

The Register · How artists can poison their pics with deadly Nightshade to deter AI scrapers, by Thomas Claburn
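As a rough illustration of the "imperceptible change within a small pixel budget" idea only (not Nightshade's actual method, which optimises perturbations against a text-to-image model so it learns wrong concept associations):

```python
# Generic, bounded pixel perturbation: visually unchanged for a human,
# altered training data for a scraper. NOT Nightshade's algorithm.
import numpy as np

def perturb(image: np.ndarray, budget: float = 4 / 255) -> np.ndarray:
    """Add noise bounded by `budget` per pixel, keeping values in [0, 1]."""
    noise = np.random.uniform(-budget, budget, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)

artwork = np.random.rand(256, 256, 3)        # stand-in for a real image
poisoned = perturb(artwork)
print(float(np.abs(poisoned - artwork).max()) <= 4 / 255)   # True: bounded
```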

@mhoye The thought occurs: #chaffing / #DataPoisoning.

If we're going to live in a world in which every utterance and action is tracked, issue and utter as much as possible.

Wire up a speech-capable GPT-3 to your phone and have it handle telemarketers, scammers, and political calls, simply to tie up their time.

Create positive-emotive socmed bots to #pumpUp your #socialcredit score.

Unleash bots on your political opposition's media channels. Have them call in to talk radio, and #ZoomBomb calls and conferences.

Create plausible deniability. Post selfies from a dozen, or a thousand, places you're not.

Create #DigitalSmog to choke the #FAANG s.

Fight fire with fire.
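As a toy sketch of the chaffing idea above, generating decoy signals from places you are not (every name and value here is invented; this is a thought experiment, not a working tool):

```python
# Emit plausible but fake "check-ins" so real signals are harder to
# single out from the noise.
import random
from datetime import datetime, timedelta

PLACES = ["Lisbon", "Oslo", "Valparaíso", "Tbilisi", "Sapporo"]

def fake_checkins(n: int = 12) -> list[dict]:
    """Generate n decoy check-ins spread over the past few days."""
    now = datetime.now()
    return [
        {
            "place": random.choice(PLACES),
            "time": (now - timedelta(hours=random.randint(1, 72))).isoformat(),
        }
        for _ in range(n)
    ]

for checkin in fake_checkins(3):
    print(checkin)
```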