Thing is, once they’re out there, they’re free utilities, and they can’t be taken back.
Also, they don’t really need to aggressively scrape the internet. There are many good public datasets now, and the Chinese are already making excellent use of synthetic dataset generation on (relative) shoestring budgets. Also, several nations and other large organizations are already funding open model efforts, but they just haven’t had the opportunity to catch up yet.
Corporate, for now.
Thing is, once they’re out there, they’re free utilities, and they can’t be taken back.
Also, they don’t really need to aggressively scrape the internet. There are many good public datasets now, and the Chinese are already making excellent use of synthetic dataset generation on (relative) shoestring budgets. Also, several nations and other large organizations are already funding open model efforts, but they just haven’t had the opportunity to catch up yet.