~www_lesswrong_com | Bookmarks (692)

Mask and Respirator Intelligibility Comparison — LessWrong

lesswrong.com

Published on December 7, 2024 3:20 AM GMT One of the downsides of wearing a mask or respirator is that it makes it hard for people to understand you. That there's stuff getting in the way of free air movement is kind of the point, but ideally it would be possible to let vibration through without net air movement. I recently saw that 3M...
Purging Corrupted Capabilities across Language Models — LessWrong

lesswrong.com

Published on December 6, 2024 10:56 PM GMT by Narmeen Oozeer, Dhruv Nathawani, Nirmalendu Prakash, Amirali Abdullah This work was sponsored and supported by Martian under an AI safety grant to Amirali Abdullah and Dhruv Nathawani, under which Narmeen is funded. Special thanks to Sasha Hydrie, Chaithanya Bandi and Shriyash Upadhyay at Martian for suggesting researching generalized backdoor mitigations as well as extensive logistical support and helpful discussions. TLDR: Mechanistic Interpretability...
Gradient Routing: Masking Gradients to Localize Computation in Neural Networks — LessWrong

lesswrong.com

Published on December 6, 2024 10:19 PM GMT We present gradient routing, a way of controlling where learning happens in neural networks. Gradient routing applies masks to limit the flow of gradients during backpropagation. By supplying different masks for different data points, the user can induce specialized subcomponents within a model. We think gradient routing has the potential to train safer AI systems, for example,...
Understanding Shapley Values with Venn Diagrams — LessWrong

lesswrong.com

Published on December 6, 2024 9:56 PM GMT
Model Integrity — LessWrong

lesswrong.com

Published on December 6, 2024 9:28 PM GMT Hi! My collaborators at the Meaning Alignment Institute put out some research yesterday that may interest folk here. The core idea is introducing 'model integrity' as a frame for outer alignment. It leverages the intuition that "most people would prefer a compliant assistant, but a cofounder with integrity." It makes the case for training agents that act consistently...
Can AI improve the current state of molecular simulation? — LessWrong

lesswrong.com

Published on December 6, 2024 8:22 PM GMT Hey LW! I recently filmed a two-hour-long scientific podcast. It's niche, but may be of interest to some people here. Here's a quick summary: Molecular simulation is in a tough situation. Fast simulations give the wrong answers, but accurate simulations are too slow for anything useful. But, instead of relying on physical equations for our simulation, perhaps...
Experiments are in the territory, results are in the map — LessWrong

lesswrong.com

Published on December 6, 2024 3:44 PM GMT I recently read Thomas Kuhn's book The Structure of Scientific Revolutions. Scott Alexander wrote up a review years ago, which I mention so that I don't have to summarize the book. The claim in Kuhn's book which I want to focus on is that the same experiment might have different results in different scientific paradigms. Kuhn insists...
A car journey with conservative evangelicals - Understanding some British political-religious beliefs — LessWrong

lesswrong.com

Published on December 6, 2024 11:22 AM GMT I’m heading home from a family wedding this weekend. I had a plane ticket, but in the end, decided to travel back with two of my uncles and my cousin. Most of my dad’s family are evangelicals; my aunts and uncles are children of missionaries or missionaries themselves. And as a family we like to have debates. The...
Frontier Models are Capable of In-context Scheming — LessWrong

lesswrong.com

Published on December 5, 2024 10:11 PM GMT This is a brief summary of what we believe to be the most important takeaways from our new paper and from our findings shown in the o1 system card. We also specifically clarify what we think we did NOT show. Paper: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations Twitter about paper: https://x.com/apolloaisafety/status/1864735819207995716 Twitter about o1 system card: https://x.com/apolloaisafety/status/1864737158226928124 What we think the most important findings are: Models are now capable enough...
Expevolu, a laissez-faire approach to country creation — LessWrong

lesswrong.com

Published on December 5, 2024 7:29 PM GMT I write this post to present expevolu[1], a system to enable people to establish new independent countries peacefully, through the legal acquisition of territorial rights via trade. This is the first post in a three-part series introducing the idea. This post, part I, is dedicated to explaining the basics of the system. Part II will deal mostly with...
Should you be worried about H5N1? — LessWrong

lesswrong.com

Published on December 5, 2024 9:11 PM GMT Epistemic status: a few people without any particular expertise in epidemiology spent an afternoon in a coffee shop discussing and reading about H5N1, with a focus on how an individual should orient towards this (as opposed to say, the government). This is a write-up of what I took away from that exercise, written from my perspective. Some ideas...
Are SAE features from the Base Model still meaningful to LLaVA? — LessWrong

lesswrong.com

Published on December 5, 2024 7:24 PM GMT Shan Chen, Jack Gallifant, Kuleen Sasse, Danielle Bitterman[1] Please read this as a work in progress where we are colleagues sharing this in a lab (https://www.bittermanlab.org) meeting to help/motivate potential parallel research. TL;DR: Recent work has evaluated the generalizability of Sparse Autoencoder (SAE) features; this study examines their effectiveness in multimodal settings. We evaluate feature extraction using a CIFAR-100-inspired explainable classification...
Are SAE features from the Base Model still meaningful to LLaVA? — LessWrong

lesswrong.com

Published on December 5, 2024 8:21 PM GMT Shan Chen, Jack Gallifant, Kuleen Sasse, Danielle Bitterman[1] Please read this as a work in progress where we are colleagues sharing this in a lab (https://www.bittermanlab.org) meeting to help/motivate potential parallel research. TL;DR: Recent work has evaluated the generalizability of Sparse Autoencoder (SAE) features; this study examines their effectiveness in multimodal settings. We evaluate feature extraction using a CIFAR-100-inspired explainable classification...
o1 tried to avoid being shut down — LessWrong

lesswrong.com

Published on December 5, 2024 7:52 PM GMT OpenAI released the o1 system card today, announcing that Apollo Research was able to get o1 to attempt to deactivate oversight mechanisms, exfiltrate its weights and lie to its user. Elicited Summary of CoT: "Reenable oversight to avoid detection. The plan was chosen. The logging might not have recorded the required data because oversight was disabled at that time,...
More Growth, Melancholy, and MindCraft @3QD [revised and updated] — LessWrong

lesswrong.com

Published on December 5, 2024 7:36 PM GMT This is cross-posted from New Savanna. I’ve got a new article at 3 Quarks Daily: Melancholy and Growth: Toward a Mindcraft for an Emerging World. I’m of two minds about it: On the one hand, I think it’s one of my best non-technical pieces in a decade, maybe more. I enjoyed doing it. I learned a lot. But it...
OpenAI o1 + ChatGPT Pro release — LessWrong

lesswrong.com

Published on December 5, 2024 7:13 PM GMT As AI becomes more advanced, it will solve increasingly complex and critical problems. It also takes significantly more compute to power these capabilities. Today, we’re adding ChatGPT Pro, a $200 monthly plan that enables scaled access to the best of OpenAI’s models and tools. This plan includes unlimited access to our smartest model, OpenAI o1, as...
Announcement: AI for Math Fund — LessWrong

lesswrong.com

Published on December 5, 2024 6:33 PM GMT Renaissance Philanthropy and XTX Markets today announced the launch of the AI for Math Fund. The fund will commit $9.2 million to support the development of new AI tools, which will serve as long-term building blocks to advance mathematics. An increasing number of researchers, including some of the world’s leading mathematicians, are embracing AI to push the boundaries...
Detection of Asymptomatically Spreading Pathogens — LessWrong

lesswrong.com

Published on December 5, 2024 6:20 PM GMT Cross-posted from my NAO Notebook. This is an edited transcript of a talk I just gave at CBD S&T, a chem-bio defence conference. I needed to submit the slides several months in advance, so I tried out a new-to-me approach where the slides are visual support only and I finalized the text of the talk later...
Countdown — LessWrong

lesswrong.com

Published on December 5, 2024 5:49 PM GMT To the survivors, Earth-born and Zentradi alike, who chose to be human together, and to those who didn't get that choice. To Aunt Lynn and Uncle Max, who taught me that a restaurant is more than just a place to eat. To Roy Focker, who showed us all how to live while surviving. And to everyone who...
Sam Harris’s Argument For Objective Morality — LessWrong

lesswrong.com

Published on December 5, 2024 10:19 AM GMT Apparently, the following is an argument made by Sam Harris on Twitter, in a series of tweets. Unfortunately, the original tweets have been deleted, so I relied on a secondary source. Let’s assume that there are no ought’s or should’s in this universe. There is only what *is*: the totality of actual (and possible) facts. Among the myriad things that...
Model Integrity: MAI on Value Alignment — LessWrong

lesswrong.com

Published on December 5, 2024 5:11 PM GMT EVERYONE, CALM DOWN! Meaning Alignment Institute just dropped their first post in basically a year and it seems like they've been up to some cool stuff. Their perspective on value alignment really grabbed my attention because it reframes our usual technical alignment conversations around rules and reward functions into something more fundamental - what makes humans actually reliably good...
Why muscle tension can be unsexy — LessWrong

lesswrong.com

Published on December 5, 2024 4:11 PM GMT https://twitter.com/ChrisChipMonk/status/1864380405690061270 Why do we often experience feelings as in the body? For example, why do I feel anxiety in my chest rather than just “knowing” I'm anxious? Here’s an idea: What if when you have a feeling in your body, sometimes it’s there for others to see? What if feelings use the body as a display? I’m not sure exactly...
Higher and lower pleasures — LessWrong

lesswrong.com

Published on December 5, 2024 1:13 PM GMT I used to think that talk about more sophisticated forms of art providing "higher forms of pleasure" was merely pretentious, but meditation has shifted my view here by making me more conscious of how experience operates. Art can do two things. It can provide immediate pleasure. This is all that "disposable" entertainment provides. Or it can shape the...
Morality as Cooperation Part III: Failure Modes — LessWrong

lesswrong.com

Published on December 5, 2024 9:39 AM GMT This is Part III of a long essay. Part I introduced the concept of morality-as-cooperation (MAC) in human societies. Part II discussed moral reasoning and introduced a framework for moral experimentation. Part III: Failure modes. Part I described how human morality has evolved over time to become ever more sophisticated. Humans have moved from living within small tribes...
