Reading List

★ Training Large Language Models on the Public Web from Daring Fireball RSS feed.

The whole point of the public web is that it’s there to learn from — even if the learner isn’t human. Is there a single LLM that was *not* trained on the public web? To my knowledge there is not, and a model that is ignorant of all information available on the public web would be, well, pretty ignorant of the world.