Pat walking into the last board meeting
BigMuffin69
I remember several months back (a year ago?) when the news got out that gpt-3.5-turbo-papillion-grumpalumpgus could play chess at around 1600 Elo. I suspected the apparent skill was just a hacked-on patch to stop folks from clowning on their models on xitter. Like if an LLM had just read the instructions of chess and started playing like a competent player, that would be genuinely impressive. But if what happened is they generated 10^12 synthetic games of chess played by stonk fish and used that to train the model- that ain't an emergent ability, that's just brute forcing chess. The fact that larger, open-source models that perform better on other benchmarks still flail at chess is just a glaring red flag that something funky was going on w/ gpt-3.5-turbo-instruct to drive home the "eMeRgEnCe" narrative. I'd bet decent odds if you played with modified rules (knights move in a one-space-longer L shape, you cannot move a pawn 2 moves after it last moved, etc.), gpt-3.5 would fuckin suck.
Edit: the author asks "why skill go down tho" on later models. Like isn't it obvious? At that moment in time, chess skills weren't a priority so the trillions of synthetic games weren't included in the training? Like this isn't that big of a mystery...? It's not like other NNs haven't been trained to play chess...
If they do press conferences this time around, every question should just be "does Elon approve of decision ____ ?" Will drive Trump fkn insane.
I voted for Liz in 2020 :( instead they gave me diamond joe
The American electorate has just covered itself with gasoline because eggs cost 2 dollars more. Come January they strike the match. gg. HATE. LET ME TELL YOU HOW MUCH I'VE COME TO HATE YOU SINCE NOVEMBER 5TH. My only consolation is that I'll hopefully get to watch some of the Magas/non voters/vote-your-conscience peeps suffer before the end. But Ol musky and peter thiel will be in their gilded bunkers while the fires consume us all.
I know it's Halloween, but this popped up in my feed and was too spooky even for me 😱
As a side note, what are people's feelings about Wolfram? Smart dude for sho, but some of the shit he says just comes across as straight up pseudoscientific gobbledygook. But can he out-guru Big Yud in a 1v1 on Final Destination (fox only, no items)? 🤔
Actual message I got while renewing my insurance plan last night. Thank you for adding a shitty chat bot which will give me false information about my life and death decisions, bravo.
Babe wake up, new missing email scandal just dropped
Bless. You know I'm here for the hot goss.
After he started rambling about his Mathematical Universe Hypothesis, it was obvious his brain was cooked.
As humanity gets closer to Artificial General Intelligence (AGI)
Arrow of time and all that, innit? And God help me, I actually read part of the post as well as the discussion comments where the prompt fondlers were lamenting that all it takes is one rogue AI code to end the world because it will "optimize against you!" I assume Evil GPT is constructing antimatter bombs using ingredients it finds under the kitchen sink.
Yes, the classical algo achieves perfect accuracy and is way faster. There is also a table that shows the cost of running o1 is enormous. Like comically bad. Boil a small ocean bad. We'll just 10x the size and it will achieve 15 steps inshallah.
Imo, this is the same behavior we see on math problems. The more steps it takes, the higher the chance it just decoheres completely. I can't see any reason why this type of thing would just "click" for the models if they are also unable to do multiplication.
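To put a rough number on the "more steps, more decoherence" thing, here's a back-of-the-envelope sketch. Big assumptions, all mine: every step is independently right with the same probability, and the 0.95 per-step accuracy is made up for illustration, not taken from any benchmark. `chain_success` is just a hypothetical helper name.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Chance that every one of n_steps is right.

    Assumes each step is independently correct with probability p_step,
    which is a toy model and not a claim about how any real LLM behaves.
    """
    return p_step ** n_steps


if __name__ == "__main__":
    # 0.95 is an illustrative per-step accuracy, not a measured figure.
    for n in (1, 5, 10, 15, 30):
        print(f"{n:>2} steps: {chain_success(0.95, n):.3f}")
```

Under that toy model the whole-chain success rate decays geometrically with the number of steps, so a per-step accuracy that looks fine on short problems still falls apart on long ones.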
I mean this just reeks of pure hopium from OAI and co that things will magykly work out. (But the newer model is clearly better™!) I still don't see any indication that one day that chart is just going to be 100s across the board.
The ARC scores don't matter too much to me at 3k a problem. Like the original goal of the prize had a compute limit. You can't break that rule and then claim victory (I mean I guess you can, but like not everyone is gonna be as wowed as xitter randos, ensemble methods were already hitting 80%+ acc. to Francois).
And unfortunately, with FrontierMath, the lack of transparency w.r.t. which problems were solved and how they were solved makes it frustrating as hell to me, as someone who actually would like to see a super math robot. According to the senior math advisor to the people who created the data set, iirc 40% of the solved problems were in the easiest category, 40% in the second tier category and 10% in the "hard" tier, but he said that he looked at the solutions and that they looked like they were mostly solved 'heuristically' instead of plopping out any 'new' insights.
Again, none of this is good science, just pure shock and awe. I've heard rumors that OAI is hiring strong competition style mathematicians to supervise the reinforcement learning for these types of problems, and if they are letting O3 take the test, then how the hell does that not leak the problem set? Like the whole test is compromised now, right? Since this behemoth uses enough electricity to power a city block, there's no way they would be able to run it locally. Now OAI can literally pay their peeps to solve the rest and surprise surprise O3++ will hit 80%.
OTOH, with Codeforces scores and math scores this high, I can now put on my LW cap and say this model has 2 trillion IQ, so why hasn't it exterminated me and my family yet like big Yud promised? It's almost as if there is no little creature inside trying to take over the world or something.