www.lesswrong.com | Bookmarks (693)
-
Is AI Alignment Enough? — LessWrong
Published on January 10, 2025 6:57 PM GMT. Virtually everyone I see in the AI safety community...
-
Recommendations for Technical AI Safety Research Directions — LessWrong
Published on January 10, 2025 7:34 PM GMT. Anthropic’s Alignment Science team conducts technical research aimed at...
-
What are some scenarios where an aligned AGI actually helps humanity, but many/most people don't like it? — LessWrong
Published on January 10, 2025 6:13 PM GMT. One can call it "deceptive misalignment": the aligned AGI...
-
Human takeover might be worse than AI takeover — LessWrong
Published on January 10, 2025 4:53 PM GMT. Epistemic status -- sharing rough notes on an important...
-
The Alignment Mapping Program: Forging Independent Thinkers in AI Safety - A Pilot Retrospective — LessWrong
Published on January 10, 2025 4:22 PM GMT. The Alignment Mapping Program: Forging Independent Thinkers in AI...
-
Discursive Warfare and Faction Formation — LessWrong
Published on January 9, 2025 4:47 PM GMT. Response to Discursive Games, Discursive Warfare. The discursive distortions you...
-
Can we rescue Effective Altruism? — LessWrong
Published on January 9, 2025 4:40 PM GMT. Last year Timothy Telleen-Lawton and I recorded a podcast...
-
AI #98: World Ends With Six Word Story — LessWrong
Published on January 9, 2025 4:30 PM GMT. The world is kind of on fire. The world...
-
Many Worlds and the Problems of Evil — LessWrong
Published on January 9, 2025 4:10 PM GMT. Summary: The Many-Worlds interpretation of quantum mechanics helps us...
-
PIBBSS Fellowship 2025: Bounties and Cooperative AI Track Announcement — LessWrong
Published on January 9, 2025 2:23 PM GMT. We're excited to announce that the PIBBSS Fellowship 2025 now...
-
Thoughts on the In-Context Scheming AI Experiment — LessWrong
Published on January 9, 2025 2:19 AM GMT. These are thoughts in response to the paper "Frontier...
-
A Systematic Approach to AI Risk Analysis Through Cognitive Capabilities — LessWrong
Published on January 9, 2025 12:18 AM GMT. A Systematic Approach to AI Risk Analysis Through Cognitive...
-
Aristocracy and Hostage Capital — LessWrong
Published on January 8, 2025 7:38 PM GMT. There’s a conventional narrative by which the pre-20th century...
-
What is the most impressive game LLMs can play well? — LessWrong
Published on January 8, 2025 7:38 PM GMT. Epistemic status: This is an off-the-cuff question. ~5 years ago...
-
Ann Altman has filed a lawsuit in US federal court alleging that she was sexually abused by Sam Altman — LessWrong
Published on January 8, 2025 2:59 PM GMT. On January 6, 2025, Ann Altman filed a lawsuit...
-
Rebuttals for ~all criticisms of AIXI — LessWrong
Published on January 7, 2025 5:41 PM GMT. Written as part of the AIXI agent foundations sequence,...
-
OpenAI #10: Reflections — LessWrong
Published on January 7, 2025 5:00 PM GMT. This week, Altman offers a post called Reflections, and...
-
Other implications of radical empathy — LessWrong
Published on January 7, 2025 4:10 PM GMT.
-
Actualism, asymmetry and extinction — LessWrong
Published on January 7, 2025 4:02 PM GMT.
-
Meditation insights as phase shifts in your self-model — LessWrong
Published on January 7, 2025 10:09 AM GMT. Introduction: In his exploration of "Intuitive self-models" and PNSE (Persistent...
-
D&D.Sci Dungeonbuilding: the Dungeon Tournament Evaluation & Ruleset — LessWrong
Published on January 7, 2025 5:02 AM GMT. This is a follow-up to last week's D&D.Sci scenario:...
-
Incredibow — LessWrong
Published on January 7, 2025 3:30 AM GMT. Back in 2011 I got sick of breaking...
-
Building Big Science from the Bottom-Up: A Fractal Approach to AI Safety — LessWrong
Published on January 7, 2025 3:08 AM GMT. Epistemic Status: This post is an attempt to condense...
-
You should delay engineering-heavy research in light of R&D automation — LessWrong
Published on January 7, 2025 2:11 AM GMT. tl;dr: LLMs rapidly improving at software engineering and math...