When you have encountered *text* and suspected it was AI-generated, what caused you to feel something was off?
@futurebird Other: we were offered a free year of virtual PT through our insurance. My first "visit" was with someone over the phone, but all subsequent consultations were text-based. At first it just gave me the exercises and asked me to rate my pain, etc., but then the questions started getting uncanny, asking me to tell them what I liked best, what exercises worked best for what, and none of it was conversational anymore. I stopped answering when I realized I was training the app.
@VampiresAndRobots @futurebird
Oh ew. We've got "free PT" in our "health plan." I haven't tried it, but wonder if that's what it is? Ew ew ew....
@cavyherd @futurebird I feel like at this point, all "Free virtual ____" is either shitty AI or shitty AI training.
@VampiresAndRobots @futurebird
I've heard it posited that all those captchas "click all instances of motorcycles" are secretly AI training, & I find this disturbingly plausible.
@cavyherd @VampiresAndRobots @futurebird
Oh, they are one hundred percent training data. Any of those traffic captcha images is just labelling training data. Or rather, it was.
It's a holdover from a previous unsustainable tech bubble, that of self-driving cars. Companies were anxious to persuade the market that they'd be the first with a viable product, and so they used whatever advantage they could.
Nowadays, it's not as if self-driving cars are entirely gone (bitcoin isn't entirely gone either), but the bubble has burst and the credulous investors have moved on to new topics. The traffic captchas are sort of a leftover from that era.
@cavyherd @VampiresAndRobots @futurebird This is not a secret. Many captchas are openly training ML models and have been for many years: https://www.google.com/recaptcha/intro/?zbcode=inc5000#creation-of-value
(I feel similarly disturbed, but it's not a secret.)
@cavyherd @VampiresAndRobots @futurebird remember when it was words clearly pulled from old writing samples? Pretty sure those were problem items when Google was digitizing books.
@fencepost @cavyherd @futurebird that's exactly what those were.
@cavyherd @VampiresAndRobots @futurebird Oh, it absolutely is, I haven't an iota of doubt. Which is why I opt out of anything with such a captcha if at all possible (you'll sometimes find them on services you can't avoid, e.g. public orgs...)
@jwcph @VampiresAndRobots @futurebird
Yeah, the only time I've encountered them, alternatives are generally not given.
@cavyherd @VampiresAndRobots @futurebird I'm pretty sure that's well documented. For reCAPTCHA it can be blindingly obvious at times, given you're often tagging what is clearly data for training self-driving cars. Other services like hCaptcha won't even try to hide that that's what they're doing. Remember the Yoko? https://www.vice.com/en/article/captcha-is-asking-users-to-identify-objects-that-dont-exist/
@cavyherd @VampiresAndRobots @futurebird
I figure they're training not just AI, but in particular, AI for self-driving cars or something along those lines. Thus all the "street things" they generally ask about. That, or it's for labelling/parsing Google Earth data.
The focus of AI hype is well past self-driving cars by now, but I think it wasn't yet when those image captchas really began to replace the text ones.
@aearo @VampiresAndRobots @futurebird
Several comments else-thread back up this notion.
@cavyherd @VampiresAndRobots @futurebird it's not a secret. From the beginning it was openly explained that CAPTCHA was being used to train 'AI', but in a much more innocent way than the kind of generative AI that powers ChatGPT. The model gets better at image recognition (in theory) but doesn't steal content and resynthesise it into empty garbage. It's a much simpler exercise.