Bookmarks (705)

Second Order Retreat - June 13th to 16th — LessWrong

lesswrong.com

Is Escalation Inevitable? — LessWrong

lesswrong.com

Policy Entropy, Learning, and Alignment (Or Maybe Your LLM Needs Therapy) — LessWrong

lesswrong.com

An Opinionated Guide to P-Values — LessWrong

lesswrong.com

Legal Personhood for Models: Novelli et. al & Mocanu — LessWrong

lesswrong.com

The Unseen Hand: AI's Problem Preemption and the True Future of Labor — LessWrong

lesswrong.com

The 80/20 playbook for mitigating AI scheming in 2025 — LessWrong

lesswrong.com

The best approaches for mitigating "the intelligence curse" (or gradual disempowerment); my quick guesses at the best object-level interventions — LessWrong

lesswrong.com

How Epistemic Collapse Looks from Inside — LessWrong

lesswrong.com

When will AI automate all mental work, and how fast? — LessWrong

lesswrong.com

50 Ideas for Life I Repeatedly Share — LessWrong

lesswrong.com

Virtues related to honesty — LessWrong

lesswrong.com

AI 2027 - Rogue Replication Timeline — LessWrong

lesswrong.com

Letting Kids Be Kids — LessWrong

lesswrong.com

The Geometry of LLM Logits (an analytical outer bound) — LessWrong

lesswrong.com

Experimental CFAR Mini-Workshop @ Arbor Summer Camp — LessWrong

lesswrong.com

CFAR is running an experimental mini-workshop (June 2-6, Berkeley CA)! — LessWrong

lesswrong.com

Orphaned Policies (Post 5 of 6 on AI Governance) — LessWrong

lesswrong.com

Gradual Disempowerment: Concrete Research Projects — LessWrong

lesswrong.com

Do you even have a system prompt? (PSA / repo) — LessWrong

lesswrong.com

Fun With Veo 3 and Media Generation — LessWrong

lesswrong.com

What LLMs lack — LessWrong

lesswrong.com

Playlist Inspired by Manifest 2024 — LessWrong

lesswrong.com

AISN #56: Google Releases Veo 3 — LessWrong

lesswrong.com

How Self-Aware Are LLMs? — LessWrong

lesswrong.com

Can We Hack Hedonic Treadmills? — LessWrong

lesswrong.com

Provability Inclusion as a Short Analogy — LessWrong

lesswrong.com

AI’s goals may not match ours — LessWrong

lesswrong.com

AI may pursue goals — LessWrong

lesswrong.com

The Best Way to Align an LLM: Inner Alignment is Now a Solved Problem? — LessWrong

lesswrong.com

Poetic Methods II: Rhyme as a Focusing Device — LessWrong

lesswrong.com

Is Building Good Note-Taking Software an AGI-Complete Problem? — LessWrong

lesswrong.com

Does the Universal Geometry of Embeddings paper have big implications for interpretability? — LessWrong

lesswrong.com

Socratic Persuasion: Giving Opinionated Yet Truth-Seeking Advice — LessWrong

lesswrong.com

An observation on self-play — LessWrong

lesswrong.com

[Beneath Psychology] Case study on chronic pain: First insights, and the remaining challenge — LessWrong

lesswrong.com

Asking for AI Safety Career Advice — LessWrong

lesswrong.com

New website analyzing AI companies' model evals — LessWrong

lesswrong.com

New scorecard evaluating AI companies on safety — LessWrong

lesswrong.com

Nerve Blisters: A Stoic Response — LessWrong

lesswrong.com

Consider buying voting shares — LessWrong

lesswrong.com

Can you donate to AI advocacy? — LessWrong

lesswrong.com

Rant: the extreme wastefulness of high rent prices — LessWrong

lesswrong.com

Claude 4 You: Safety and Alignment — LessWrong

lesswrong.com

Alignment Proposal: Adversarially Robust Augmentation and Distillation — LessWrong

lesswrong.com

Meditations on Doge — LessWrong

lesswrong.com

Lie Detectors. Technical solutions to the cooperation problem. — LessWrong

lesswrong.com

Case Studies in Simulators and Agents — LessWrong

lesswrong.com

On safety of being a moral patient of ASI — LessWrong

lesswrong.com

We Need a Baseline for LLM-Aided Experiments — LessWrong

lesswrong.com
