~www_lesswrong_com | Bookmarks (715)
-
Dance Weekend Pay II — LessWrong
Published on February 28, 2025 3:10 PM GMT The world would be better with a lot...
-
Existentialists and Trolleys — LessWrong
Published on February 28, 2025 2:01 PM GMT How might an existentialist approach this notorious thought experiment...
-
On Emergent Misalignment — LessWrong
Published on February 28, 2025 1:10 PM GMT One hell of a paper dropped this week. It...
-
Do safety-relevant LLM steering vectors optimized on a single example generalize? — LessWrong
Published on February 28, 2025 12:01 PM GMT This is a linkpost for our recent paper on...
-
Cycles (a short story by Claude 3.7 and me) — LessWrong
Published on February 28, 2025 7:04 AM GMT Content warning: this story is AI generated slop. The kitchen...
-
January-February 2025 Progress in Guaranteed Safe AI — LessWrong
Published on February 28, 2025 3:10 AM GMT Ok this one got too big, I’m done grouping...
-
Weirdness Points — LessWrong
Published on February 28, 2025 2:23 AM GMT Vegans are often disliked. That's what I read online...
-
[New Jersey] HPMOR 10 Year Anniversary Party 🎉 — LessWrong
Published on February 27, 2025 10:30 PM GMT It's been 10 years since the final chapter of...
-
OpenAI releases GPT-4.5 — LessWrong
Published on February 27, 2025 9:40 PM GMT This is not o3; it is what they'd internally...
-
The non-tribal tribes — LessWrong
Published on February 26, 2025 5:22 PM GMT Author note: This is basically an Intro to the...
-
Fuzzing LLMs sometimes makes them reveal their secrets — LessWrong
Published on February 26, 2025 4:48 PM GMT Scheming AIs may have secrets that are salient to...
-
You can just wear a suit — LessWrong
Published on February 26, 2025 2:57 PM GMT I like stories where characters wear suits. Since I like...
-
Minor interpretability exploration #1: Grokking of modular addition, subtraction, multiplication, for different activation functions — LessWrong
Published on February 26, 2025 11:35 AM GMT Epistemic status: small exploration without previous predictions, results low-stakes...
-
Optimizing Feedback to Learn Faster — LessWrong
Published on February 26, 2025 2:24 PM GMT (This post is to a significant extent just a...
-
[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations — LessWrong
Published on February 26, 2025 12:50 PM GMT We just published a paper aimed at discovering “computational...
-
outlining is a historically recent underutilized gift to family — LessWrong
Published on February 26, 2025 1:58 PM GMT outlining is specialized work which reduces a text to...
-
Osaka — LessWrong
Published on February 26, 2025 1:50 PM GMT The more I learn about urban planning, the more...
-
Time to Welcome Claude 3.7 — LessWrong
Published on February 26, 2025 1:00 PM GMT Anthropic has reemerged from stealth and offers us Claude...
-
Name for Standard AI Caveat? — LessWrong
Published on February 26, 2025 7:07 AM GMT I have discussions that ignore the future disruptive effects...
-
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? — LessWrong
Published on February 24, 2025 6:31 PM GMT A new paper by Yoshua Bengio and the Safe...
-
Understanding Agent Preferences — LessWrong
Published on February 24, 2025 5:46 PM GMT epistemic status: clearing my own confusion. I'm going to discuss...
-
What We Can Do to Prevent Extinction by AI — LessWrong
Published on February 24, 2025 5:15 PM GMT
-
Dream, Truth, & Good — LessWrong
Published on February 24, 2025 4:59 PM GMT One way in which I think current AI models...
-
Forecasting Frontier Language Model Agent Capabilities — LessWrong
Published on February 24, 2025 4:51 PM GMT This work was done as part of the MATS Program...