'People will buy intelligence from us on a meter': ChatGPT's Sam Altman's AI vision worries critics

lechekaflan@lemmy.world · 4 days ago

'People will buy intelligence from us on a meter': ChatGPT's Sam Altman's AI vision worries critics

Entertainmeonly (she/her)@lemmy.blahaj.zone · 4 days ago

I’m still just so curious who is paying to use any llm?

𝕱𝖎𝖗𝖊𝖜𝖎𝖙𝖈𝖍@lemmy.world · 4 days ago

People like this:

comic about brainrot

newton@feddit.online · 4 days ago

Ty, best cartoon I’ve seen today

BillyCrystalMeth@slrpnk.net · 4 days ago

Businesses who don’t want their data used to training. Gotta pay the “trust me bro” fee

TronBronson@lemmy.world · 4 days ago

There’s no way in hell they don’t backdoor data lmaoo

spectrums_coherence@piefed.social · edit-2 3 days ago

Many academics around me have a paid plan of LLM of some sort, most are on $100 plan some are on $200, all of them are getting reimbursed for their plan.

Most of them uses it to optimize code, generate visualization, or formalize pen and paper proof.

I hated it, and don’t use much of it myself. But it seems too useful for these people and it is hard to stop them. As an example, formalizing a pen and paper proof can take an expert weeks, if not month of work, whereas it only takes codex a week.

But I do feel this success is tied to the nature and value of academia, and might not transfer to other fields or industrial projects:

we usually have tiny codebases: it is not uncommon to have a 10-line algorithm with a 70-page paper explaining its correctness
90%, if not more, of the codes are proof of concepts, without the expectation for long term maintainance.
the work is highly specialized, everyone is running out of time, and there is high expectation of the outcome of the work: in our recent work, we do have an expert in formalization, but he doesn’t have enough time, so the grad student formalized the project using codex. The overall architecture is probably much much worse than what would have been done by the expert. One interesting outcome is that codex is able to prove a more general result than the expert intended: not because it found a better proof, but because it is much better at bruteforcing a solution than human.

ZDL@lazysoci.al · 3 days ago

As an example, formalizing a pen and paper proof can take an expert weeks, if not month of work, whereas it only takes codex a week.

And how badly does it hallucinate while doing so?

spectrums_coherence@piefed.social · edit-2 2 days ago

That is the thing about formal proof: if the definition is correct, which usually is relatively short and should be written by human, there is almost no chance of the prove being wrong. The only exception are when the LLM exploits a bug in the proof assistant kernel, and these kernel are usually designed to be exceptionally small, thus making bugs unlikely.

That being said, opus 4.6 found a bug that eventually lead to the proof of false (opus is unable to produce the proof of false, hence unlikely to exploit it): https://github.com/rocq-prover/rocq/issues/21682

However, like I said, the code quality of the llm is usually not on par with an expert, and they have a tendency to produce unnecessary lemmas and complications that will need to be cleaned up by human.

Also, we have a very detailed pen and paper proof, which are designed to be easily translatable to proof assistants. We have also setup all the lemma and theorems to reach the end goal. All of these are done by humans, without these, I don’t believe any LLM can make much progress on this project.

redhorsejacket@lemmy.world · 3 days ago

I hate to hit you with such a tangentially related query to your main point, but what is “formalizing a proof”, and why is there such a discrepancy between the time it might take an expert (weeks if not months) vs an LLM?

(It probably goes without saying, but my college career was spent in the Humanities, where there was not much emphasis on proofs, formal or informal, so I’m curious how the other half lives.)

spectrums_coherence@piefed.social · edit-2 3 days ago

There are proof assistants https://en.wikipedia.org/wiki/Proof_assistant that would encode a mathematical proof as code, and verify its correctness for you.

Writing completely formal proof is very painstaking, because it means we will need to flash out a lot and a lot of details (which are mostly trivial for experts) for computers to accept it, and we also need to know how to work with proof assistants.

Human proofs often ignore these details to make it readable, yet also make it more prone to mistakes. Whereas formalized prove in proof assistant can very rarely be wrong (unless there is an unlikely bug in the assistant kernel), but mostly unreadable (unless the proof is incredibly elegant).

So in general, translating good human proof to computer proof requires more expert labor than huge conceptual innovation, yet it usually require the steep learning curve of understanding the ins and outs of a proof assistant, which can take years of experience.

LLM used to be pretty bad at this because even filling in trivial details can quickly derail them. Recently a few flagship coding model are finally able to do this, albeit with a large amount of token consumption in thinking.

redhorsejacket@lemmy.world · 3 days ago

Fascinating. My godfather is a mathematician at a local university, I’ll ask him whether this is something he runs into. I’ve heard him grumble about formatting his work in the past, but usually in the context of using Latex, which I gather is something else entirely.

spectrums_coherence@piefed.social · 3 days ago

Yeah LaTeX is a bit different, think about describing a process in MS Word, v.s. writing a program that performs such process on a computer: LaTeX is more like description aimed for human consumption, where as formal proof is more like a program that computers can rigorously execute.

Proof assistant have only attracted the attension of mathematicians very recently, thanks to the organization surrounding MathLib in Lean, and the promption of Terance Tao.

It also rides the AI train quite a bit, as AI have a tendency to confidently be wrong, having a computer to check its proof can be very useful.

Andrew Beveridge@sh.itjust.works · 4 days ago

Me, tons, both as part of both of my day jobs, and in my free time I’m building startups, all powered by claude code. I’m building tons faster and accomplishing more outcomes than ever before in my 20+ year career in tech. AMA

Voroxpete@sh.itjust.works · 4 days ago

What’s your long-term business plan for when Anthropic starts charging per token, at a rate that actually reflects their costs?

Andrew Beveridge@sh.itjust.works · 3 days ago

My businesses don’t depend on Anthropic, so that doesn’t worry me at all - I’m using Claude code as my daily driver when I’m building software just because after trying a bunch of options it has consistently made me more productive than anything else over the last 9 months. Every month or so I try the latest open weight models which I can run locally, when they get good enough for me to have an acceptable productive dev experience fully offline locally I’ll be very happy. Gemma 4 is already a major step forwards there and I’ve used it for some small local hobby projects, but I still get much more done when I’m working on large complex code projects if I use Claude code. If they removed their $200/month subscription and stopped subsidizing compute, I’d likely switch to another tool/chase whoever else is burning VC money for a bit, while accelerating my adoption of local models, accepting lower productivity.

floofloof@lemmy.ca · 4 days ago

AMA

We’re good, thanks.

Peppycito@sh.itjust.works · 4 days ago

Are you the asshole?

Andrew Beveridge@sh.itjust.works · 3 days ago

Nah, I’m just moving with the times in an industry which didn’t give me much choice - was laid off twice and the contract roles I ended up managing to get were companies who were already heavily using and expecting usage of AI. Tbh though I see it like the early days of the internet - and while I’m not a fan of big tech, wish the US was less corrupt and wish capitalism wasn’t so rampant, at this point IMO the cat is very much out of the bag and the industry has moved on - at least where I am right now. I’d imagine if I move back to Britain the usage/attitudes towards AI are probably pretty different

ZDL@lazysoci.al · 2 days ago

I see it like the early days of the internet

Yes. The early days of the Internet when workers famously had to be threatened to use the Internet for their work or they’d be fired. The early days of the Internet where the public had to cajoled into using the Internet by cramming it into everything just so the early Internet companies could inflate user count statistics.

Yes, the parallels are very strong!

Andrew Beveridge@sh.itjust.works · 2 days ago

I guess we have different memories of that era, I remember a lot of people moaning about everything going online, businesses feeling the pressure to set up websites, people being shut out of services if they didn’t get with the times and get an internet connection, etc.

Same shit, different tech. It’s just the brutal march of tech under late stage capitalism, my opinion is that you can’t change it, you can only decide whether you’ll ride the wave and move with the times or bury your head in the sand and hope it blows over. Tbh I did the latter for the first few years but once I realised the job I’d done for decades was basically no longer a job, I realised I had to just suck it up and move with the times 🤷

ZDL@lazysoci.al · 2 days ago

I have memories of people sneaking some form of Internet connection into the workplace against the rules of corporate IT. I have memories of young people arguing with their (older) bosses that the Internet was the future and bosses pushing back and banning it. I have memories of people being eager to get hooked up to the Internet at home; people actively looking to find an ISP that would allow another sign-up (because most of them were saturated at the time).

I also have memories of how every new user on the Internet increased the profitability of the Internet sphere as a whole instead of, you know—like with modern AI—each new user being a further drain.

So yes, we have different memories. Mine just happen to line up with facts.

Peppycito@sh.itjust.works · 3 days ago

I’m not a fan of big tech

Lol! Maybe stop grinding the slop machine then?

Andrew Beveridge@sh.itjust.works · 2 days ago

Not sure what you mean, I don’t work for big tech and never have. I’m not churning out slop, I’m building products which work, still maintaining the quality standards I always have - in fact I feel like I’m about to raise the bar much more easily now, I can build a thorough test suite which covers the whole test pyramid way faster than ever before which means most of my code is more thoroughly tested now

TronBronson@lemmy.world · 4 days ago

What’s your favorite project you’re building and what is the biggest market you are targeting?

Andrew Beveridge@sh.itjust.works · 3 days ago

Well realistically my favourite product is my most recent passion project, which is my karaoke generator platform https://nomadkaraoke.com/

It makes people happy every day, and I get the superpower of being able to let guests at my weekly karaoke night sing literally anything they want (cos I can generate it on the night so they can sing a good quality karaoke version of it even if it’s a super niche song)

TronBronson@lemmy.world · 5 hours ago

Okay I like this one too!

ZDL@lazysoci.al · 3 days ago

Why do you end your post with AMA (presumably “Ask Me Anything”) and then promptly ignore the questions?

Andrew Beveridge@sh.itjust.works · edit-2 3 days ago

Entertainment (also realistically I’m just super busy, I don’t really use social media and only check this kinda app once a day when I’m winding down)

JimFox@lemmy.zip · 4 days ago

Go and touch some grass…

Andrew Beveridge@sh.itjust.works · 3 days ago

I don’t think that’s a question 😅 I work on my garden at the weekends and cycle a bunch thanks - I don’t have (or want) kids so I have a bit more free energy to build stuff without sacrificing my work/life balance too much compared to most folks my age