Random post crossed into my field of view today about public sites being used to provide training data for LLM instances like Chat-GPT3.
When one day someone asks Chat-gpt3 - ‘how is chat-gpt3 like a Wasp Spider’, chat-gpt3 will quote my article (because who else would make such ridiculous analogies) without attribution.
But what it won’t do is offer the user a link to the source of the information and it will never result in a user visiting the OnePub blog.
— OnePub, The threat that chat-gpt3 poses to bloggers
“I made this” is one of the oldest frustrations of the Internet, to the point of being immortalized in comic form at least once. Heck, it’s part of why I set my site content to a Creative Commons license. Why fight it? But yeah. Any possible external value has already been extracted and claimed. Let’s assume it’s too late for my ancient blog.
Your newer site, with words and images you care deeply about? You may want to put some locks on those doors.
The only caveats about the advice from the article (aside from the fact that it’s probably kinda satirical, what with getting its answers from chat-GPT3):

- robots.txt is often ignored, but it’s still a nice gesture to the few who acknowledge it
- User-agent restrictions can be worked around with minimal effort, but will work against the large number of folks who can’t be bothered with minimal effort
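
To make those two caveats concrete, here's a rough Python sketch of both locks. Everything in it is an assumption for illustration: the robots.txt rules, the crawler tokens in the deny-list, the example.com URL, and both helper function names are mine, not the article's. The first helper shows what a polite crawler does when it bothers to read robots.txt; the second is the kind of check a server might do on the User-Agent header.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the crawler name is an example.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical deny-list of user-agent substrings (lowercase).
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot")


def polite_crawler_may_fetch(user_agent: str, url: str) -> bool:
    """What a well-behaved crawler concludes after reading robots.txt."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def server_should_block(user_agent_header: str) -> bool:
    """Server-side check: refuse requests whose User-Agent names a blocked bot."""
    header = user_agent_header.lower()
    return any(token in header for token in BLOCKED_AGENT_TOKENS)


if __name__ == "__main__":
    url = "https://example.com/some-post"
    # The named crawler is asked to stay out; everyone else is welcome.
    print(polite_crawler_may_fetch("GPTBot", url))
    print(polite_crawler_may_fetch("Mozilla/5.0", url))
    # An honest bot header gets blocked; a spoofed browser header does not.
    print(server_should_block("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(server_should_block("Mozilla/5.0 (X11; Linux x86_64)"))
```

Note that nothing forces a scraper to call anything like the first function, and the second one is defeated by sending a browser-looking header. That's the whole caveat: both locks only stop the folks who are polite or lazy.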
Added to vault 2024-01-15. Updated on 2024-02-01