~www_lesswrong_com | Bookmarks (713)

Gradual Disempowerment: Concrete Research Projects — LessWrong

lesswrong.com

Published on May 29, 2025 6:55 PM GMT
This post benefitted greatly from comments, suggestions, and ongoing discussions with David Duvenaud, David Krueger, and Jan Kulveit. All errors are my own. A few months ago, my coauthors and I published Gradual Disempowerment (GD hereafter). It was mostly about how things might go wrong, but naturally a lot of the resulting interest has been about solutions. We have some...
Do you even have a system prompt? (PSA / repo) — LessWrong

lesswrong.com

Published on May 29, 2025 6:49 PM GMT
Everyone around me has a notable lack of system prompt. And when they do have a system prompt, it’s either the eigenprompt or some half-assed 3-paragraph attempt at telling the AI to “include less bullshit”. I see no systematic attempts at making a good one anywhere.[1] (For clarity, a system prompt is a bit of text—that's a subcategory of "preset"...
Fun With Veo 3 and Media Generation — LessWrong

lesswrong.com

Published on May 28, 2025 6:30 PM GMT
Since Claude 4 Opus, things have been refreshingly quiet. Video break!
The First Good AI Videos
First up we have Prompt Theory, made with Veo 3, which I am considering the first legitimately good AI-generated video I’ve seen. It perfectly combines form and function. Makes you think. Here’s a variant, to up the stakes a bit, then...
What LLMs lack — LessWrong

lesswrong.com

Published on May 28, 2025 4:19 PM GMT
Introduction
I have long been very interested in the limitations of LLMs because understanding them seems to be the most important step to getting timelines right. Right now there seems to be great uncertainty about timelines, with very short timelines becoming plausible, but also staying hotly contested. This led me to revisit LLM limitations, and I think I noticed a...
Playlist Inspired by Manifest 2024 — LessWrong

lesswrong.com

Published on May 28, 2025 4:03 PM GMT
Okay, I think it's time to stop polishing this playlist & post it. This is music about the moods of people I met at Manifest. I'm pretty delighted with it!
AISN #56: Google Releases Veo 3 — LessWrong

lesswrong.com

Published on May 28, 2025 4:00 PM GMT
Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required. In this edition: Google released a frontier video generation model at its annual developer conference; Anthropic’s Claude Opus 4 demonstrates the danger of relying on voluntary governance. Listen to the AI Safety Newsletter for free...
How Self-Aware Are LLMs? — LessWrong

lesswrong.com

Published on May 28, 2025 12:57 PM GMT
An interim research report
Summary
We introduce a novel methodology for quantitatively evaluating metacognitive abilities in LLMs.
We present evidence that some frontier LLMs introduced since early 2024, but not older or smaller ones, show some metacognitive abilities.
The metacognitive abilities that current LLMs do show are relatively weak, and manifest in a context-dependent manner; the models often prefer...
Can We Hack Hedonic Treadmills? — LessWrong

lesswrong.com

Published on May 28, 2025 11:42 AM GMT
During a visit to a Hong Kong children’s welfare home, I met a 12-year-old girl I'll call Kylie. She had suffered a severe illness that left her blind, deaf, non-verbal, and nearly immobile, yet no identified damage was done to her brain. The staff described her, without hesitation, as “always cheerful,” and indeed she smiled the entire time...
Provability Inclusion as a Short Analogy — LessWrong

lesswrong.com

Published on May 28, 2025 10:50 AM GMT
The following analogy is intended to illustrate a novel proof-theoretic concept. It is metaphorical in nature and should not be interpreted literally.
The Vanishing Sky
Consider the following analogy: In a universe undergoing accelerating expansion, distant galaxies gradually slip beyond our horizon, until the point at which their light can no longer reach us. These galaxies do not disappear in...
AI’s goals may not match ours — LessWrong

lesswrong.com

Published on May 28, 2025 9:30 AM GMT
Context: This is a linkpost for https://aisafety.info/questions/NM3I/6:-AI%E2%80%99s-goals-may-not-match-ours
This is an article in the new intro to AI safety series from AISafety.info. We'd appreciate any feedback. The most up-to-date version of this article is on our website.
Making AI goals match our intentions is called the alignment problem. There’s some ambiguity in the term “alignment”. For example, when people talk about...
AI may pursue goals — LessWrong

lesswrong.com

Published on May 28, 2025 9:30 AM GMT
Context: This is a linkpost for https://aisafety.info/questions/NM3J/5:-AI-may-pursue-goals
This is an article in the new intro to AI safety series from AISafety.info. We'd appreciate any feedback. The most up-to-date version of this article is on our website.
Suppose that, as argued previously, in the next few decades we’ll have superintelligent systems. What role will they play? One way to imagine these...
The Best Way to Align an LLM: Inner Alignment is Now a Solved Problem? — LessWrong

lesswrong.com

Published on May 28, 2025 6:21 AM GMT
This is a link-post for a new paper I read: Safety Pretraining: Toward the Next Generation of Safe AI by Pratyush Maini, Sachin Goyal, et al. For a couple of years I (and others) have been proposing an approach to alignment: what the authors of this recent paper name "safety pretraining". In a nutshell: that it's best to...
Poetic Methods II: Rhyme as a Focusing Device — LessWrong

lesswrong.com

Published on May 26, 2025 6:29 PM GMT
As promised in the previous instalment on meter, let’s explore rhyming from a methodological perspective. The first difference between meter and rhyme lies in their opposite obviousness: the first one is subtle, requiring a learned and attuned ear; the second is so sonorous and clear that children hear it as self-evident. Take one of my favorite Robert Frost poems: ...
Is Building Good Note-Taking Software an AGI-Complete Problem? — LessWrong

lesswrong.com

Published on May 26, 2025 6:26 PM GMT
In my experience, the most annoyingly unpleasant part of research[1] is reorganizing my notes during and (especially) after a productive research sprint. The "distillation" stage, in Neel Nanda's categorization. I end up with a large pile of variously important discoveries, promising threads, and connections, and the task is to then "refactor" that pile into something compact and well-organized,...
Does the Universal Geometry of Embeddings paper have big implications for interpretability? — LessWrong

lesswrong.com

Published on May 26, 2025 6:20 PM GMT
Rishi Jha, Collin Zhang, Vitaly Shmatikov and John X. Morris published a new paper last week called Harnessing the Universal Geometry of Embeddings.
Abstract of the paper (bold was added by me): We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised...
Socratic Persuasion: Giving Opinionated Yet Truth-Seeking Advice — LessWrong

lesswrong.com

Published on May 26, 2025 5:38 PM GMT
The full post is long, but you can 80/20 the value with the 700-word summary! Over half the post is eight optional case studies. Thanks to Jemima Jones, Claude 4 Opus and Gemini 2.5 Pro for help copy-editing and drafting.
TL;DR: I recommend giving advice by asking questions to walk someone through key steps in my argument — often I’m...
An observation on self-play — LessWrong

lesswrong.com

Published on May 26, 2025 5:22 PM GMT
At NeurIPS 2024, Ilya Sutskever delivered a short keynote address in honor of his Seq2seq paper, published a decade earlier. It was his first—and so far only—public appearance to discuss his research since parting ways with OpenAI. The talk itself shed little light on his current work. Instead, he reaffirmed the prevailing view that the “age of pre-training”...
[Beneath Psychology] Case study on chronic pain: First insights, and the remaining challenge — LessWrong

lesswrong.com

Published on May 26, 2025 5:29 PM GMT
In the last post I took the seemingly-naive stance that "pain is just information" and "can't actually be a problem", implying that painful situations can be dealt with without suffering by just looking through the pain towards the reality at which it points. I did give an example of how this allowed me to quickly resolve an...
Asking for AI Safety Career Advice — LessWrong

lesswrong.com

Published on May 26, 2025 3:26 PM GMT
Hi! I'm a rising junior in undergrad, working on a cognitive science major with neuroscience and AI focuses, and I was hoping to get some advice/pointers on AI safety work. I'm interested in both the governance and technical sides, but my academic work slightly predisposes me to the latter. Any advice, help, ideas, links to other posts,...
New website analyzing AI companies' model evals — LessWrong

lesswrong.com

Published on May 26, 2025 4:00 PM GMT
I'm making a website on AI companies' model evals for dangerous capabilities: AI Safety Claims Analysis. This is approximately the only analysis of companies' model evals, as far as I know. This site is in beta; I expect to add lots more content and improve the design in June. I'll add content on evals, but I also tentatively...
New scorecard evaluating AI companies on safety — LessWrong

lesswrong.com

Published on May 26, 2025 4:00 PM GMT
The new scorecard is on my website, AI Lab Watch. This replaces my old scorecard. I redid the content from scratch; it's now up-to-date and higher-quality. I'm also happy with the scorecard's structure: you can click on rows, columns, and cells and zoom in to various things. Check it out! Thanks to Lightcone for designing the site. While it is...
Nerve Blisters: A Stoic Response — LessWrong

lesswrong.com

Published on May 26, 2025 3:07 PM GMT
The chickenpox virus waited for decades, attacking the moment my immune system wobbled.[1] It advanced down my nerves, spreading blisters along its path. Known as shingles, this kind of viral attack is generally considered a very bad time. The blisters make nerves go haywire. They start sending chaotic signals back to the brain, jagged and dissonant.[2] The brain struggles to...
Consider buying voting shares — LessWrong

lesswrong.com

Published on May 25, 2025 6:01 PM GMT
One of the best and easiest ways to influence a corporation is to own it. Google offers both $GOOG, class C non-voting, and $GOOGL, class A voting. If you especially don’t care about other issues, a small portion of the voting owner base caring about advanced AI issues can strongly affect how GDM (Google DeepMind) operates.
Can you donate to AI advocacy? — LessWrong

lesswrong.com

Published on May 25, 2025 5:54 PM GMT
I posted a quick take that advocacy may be more effective than direct donation to alignment research. I am not an AI researcher and I'm not an influencer, so I'm not well positioned to do either. I see on the "How can I help" FAQ that there are options to donate, but they look like donating to...
