

Okay I get it but that’s a different argument. Starting fresh only gets you so far. Once am LLM exists and is exposed to the public users can submit any data they like and the LLM has no idea the source.
You could argue then that these models shouldn’t be able to use user submitted data but that would be a devastating restriction to the technology and that starts to become a question of whatever we want this tech to exist at all.
When you buy something from a streaming service you’re only buying the right to stream it, nothing more.
You can’t compare it to owning physical media because there are ongoing costs involved for Amazon to host it and ever changing contracts with media companies outlining what they are allowed to host.