~www_lesswrong_com | Bookmarks (702)
-
Steelmanning heuristic arguments — LessWrong
Published on April 13, 2025 1:09 AM GMT. Introduction: This is a nuanced “I was wrong” post. Something I...
-
MONA: Three Month Later - Updates and Steganography Without Optimization Pressure — LessWrong
Published on April 12, 2025 11:15 PM GMT. We published the MONA paper about three months ago. Since...
-
The Era of the Dividual—are we falling apart? — LessWrong
Published on April 12, 2025 10:35 PM GMT. In his famous 1644 treatise on freedom of speech,...
-
Commitment Races are a technical problem ASI can easily solve — LessWrong
Published on April 12, 2025 10:22 PM GMT. A vivid introduction to Commitment Races. Why committed agents defeat...
-
Experts have it easy — LessWrong
Published on April 12, 2025 7:32 PM GMT. Something that's painfully understudied is how experts are more...
-
Luna Lovegood and the Chamber of Secrets, Part 3 (Луна Лавгуд и Комната Тайн, Часть 3) — LessWrong
Published on April 12, 2025 7:20 PM GMT. Disclaimer: This is Kongo Landwalker's translation of lsusr's fiction...
-
Is the ethics of interaction with primitive peoples already solved? — LessWrong
Published on April 11, 2025 2:56 PM GMT. The interactions between a misaligned AI and mankind have...
-
Can LLMs learn Steganographic Reasoning via RL? — LessWrong
Published on April 11, 2025 4:33 PM GMT. TLDR: We show that Qwen-2.5-3B-Instruct can learn to encode...
-
My day in 2035 — LessWrong
Published on April 11, 2025 4:31 PM GMT. I wake up as usual by immediately going on...
-
Youth Lockout — LessWrong
Published on April 11, 2025 3:05 PM GMT. Cross-posted from Substack. AI job displacement will affect young people...
-
OpenAI Responses API changes models' behavior — LessWrong
Published on April 11, 2025 1:27 PM GMT. Summary: OpenAI recently released the Responses API. Most models are...
-
Weird Random Newcomb Problem — LessWrong
Published on April 11, 2025 1:09 PM GMT. Epistemic status: I'm pretty sure the problem is somewhat...
-
On Google’s Safety Plan — LessWrong
Published on April 11, 2025 12:51 PM GMT. Google Lays Out Its Safety Plans. I want to...
-
Luna Lovegood and the Chamber of Secrets, Part 2 (Луна Лавгуд и Комната Тайн, Часть 2) — LessWrong
Published on April 11, 2025 12:42 PM GMT. Disclaimer: This is Kongo Landwalker's translation of lsusr's fiction...
-
Paper — LessWrong
Published on April 11, 2025 12:20 PM GMT. Paper is good. Somehow, a blank page and a...
-
Why are neuro-symbolic systems not considered when it comes to AI Safety? — LessWrong
Published on April 11, 2025 9:41 AM GMT. I am really not sure of why neuro-symbolic systems...
-
Nuanced Models for the Influence of Information — LessWrong
Published on April 10, 2025 6:28 PM GMT
-
Playing in the Creek — LessWrong
Published on April 10, 2025 5:39 PM GMT. When I was a really small kid, one of...
-
The Three Boxes: A Simple Model for Spreading Ideas — LessWrong
Published on April 10, 2025 5:15 PM GMT. This is cross-posted from my blog. We need more people...
-
Reactions to METR task length paper are insane — LessWrong
Published on April 10, 2025 5:13 PM GMT. Epistemic status: Briefer and more to the point than...
-
Existing Safety Frameworks Imply Unreasonable Confidence — LessWrong
Published on April 10, 2025 4:31 PM GMT. This is part of the MIRI Single Author Series....
-
Arguments for and against gradual change — LessWrong
Published on April 10, 2025 2:43 PM GMT. Essentially all solutions in life are conditional: you apply...
-
Disempowerment spirals as a likely mechanism for existential catastrophe — LessWrong
Published on April 10, 2025 2:37 PM GMT. When complex systems fail, it is often because they...
-
My day in 2035 — LessWrong
Published on April 10, 2025 2:09 PM GMT. Partially inspired by AI 2027, I've put to paper...