Quick Takes

Raemon

I've recently updated on how useful it'd be to have small icons representing users. Previously some people were like "it'll help me scan the comment section for people!" and I was like "...yeah that seems true, but I'm scared of this site feeling like facebook, or worse, LinkedIn."

I'm not sure whether that was the right tradeoff, but, I was recently sold after realizing how space-efficient it is for showing lots of commenters. Like, in slack or facebook, you'll see things like:

This'd be really helpful, esp. in the Quick Takes and Popular comments sections,... (read more)

Clarifying the relationship between mechanistic anomaly detection (MAD), measurement tampering detection (MTD), weak to strong generalization (W2SG), weak to strong learning (W2SL), and eliciting latent knowledge (ELK). (Nothing new or interesting here, I just often lose track of these relationships in my head)

Eliciting latent knowledge is an approach to scalable oversight which hopes to use the latent knowledge of a model as a supervision signal or oracle. 

Weak to strong learning is an experimental setup for evaluating scalable oversight protocols, ... (read more)
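As a concrete anchor for the W2SL setup described above, here is a minimal toy sketch (hypothetical models and data, with sklearn classifiers standing in for a weak supervisor and a strong student; not the protocol from any particular paper):

```python
# Toy sketch of the weak-to-strong learning setup (hypothetical models/data):
# a weak supervisor labels data, a stronger student is trained on those
# (imperfect) labels, and we check how much of the gap to ground truth it recovers.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, n_features=40, n_informative=10, random_state=0)
X_sup, X_student, y_sup, y_student = train_test_split(X, y, test_size=0.5, random_state=0)
X_student_train, X_test, y_student_train, y_test = train_test_split(
    X_student, y_student, test_size=0.4, random_state=0)

# "Weak supervisor": a deliberately limited model trained on a small labeled set.
weak = LogisticRegression(max_iter=1000).fit(X_sup[:300, :5], y_sup[:300])
weak_labels = weak.predict(X_student_train[:, :5])  # noisy supervision signal

# "Strong student": a more capable model trained only on the weak labels.
strong = GradientBoostingClassifier(random_state=0).fit(X_student_train, weak_labels)

print("weak supervisor acc:", accuracy_score(y_test, weak.predict(X_test[:, :5])))
print("strong student acc: ", accuracy_score(y_test, strong.predict(X_test)))
# Weak-to-strong generalization = the student beating its own (weak) supervision.
```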

RobertM

I've seen a lot of takes (on Twitter) recently suggesting that OpenAI and Anthropic (and maybe some other companies) violated commitments they made to the UK's AISI about granting them access for e.g. predeployment testing of frontier models.  Is there any concrete evidence about what commitment was made, if any?  The only thing I've seen so far is a pretty ambiguous statement by Rishi Sunak, who might have had some incentive to claim more success than was warranted at the time.  If people are going to breathe down the necks of AGI labs abou... (read more)

Akash

I haven't followed this in great detail, but I do remember hearing from many AI policy people (including people at the UK AISI) that such commitments had been made.

It's plausible to me that this was an example of "miscommunication" rather than "explicit lying." I hope someone who has followed this more closely provides details.

But note that I personally think that AGI labs have a responsibility to dispel widely-believed myths. It would shock me if OpenAI/Anthropic/Google DeepMind were not aware that people (including people in government) believed that they... (read more)

jeffreycaruso
Have you read this? https://www.politico.eu/article/rishi-sunak-ai-testing-tech-ai-safety-institute/

"“You can’t have these AI companies jumping through hoops in each and every single different jurisdiction, and from our point of view of course our principal relationship is with the U.S. AI Safety Institute,” Meta’s president of global affairs Nick Clegg — a former British deputy prime minister — told POLITICO on the sidelines of an event in London this month."

"OpenAI and Meta are set to roll out their next batch of AI models imminently. Yet neither has granted access to the U.K.’s AI Safety Institute to do pre-release testing, according to four people close to the matter."

"Leading AI firm Anthropic, which rolled out its latest batch of models in March, has yet to allow the U.K. institute to test its models pre-release, though co-founder Jack Clark told POLITICO it is working with the body on how pre-deployment testing by governments might work. “Pre-deployment testing is a nice idea but very difficult to implement,” said Clark."

Contra both the 'doomers' and the 'optimists' on (not) pausing. Rephrased: RSPs (done right) seem right.

Contra 'doomers'. Oversimplified, 'doomers' (e.g. PauseAI, FLI's letter, Eliezer) ask(ed) for pausing now / even earlier (e.g. the Pause Letter). I expect this would be / have been very much suboptimal, even purely in terms of solving technical alignment. For example, Some thoughts on automating alignment research suggests that timing the pause so that we can use automated AI safety research could result in '[...] each month of lead that the leader started ... (read more)

Selected fragments (though not really cherry-picked, no reruns) of a conversation with Claude Opus on operationalizing something like Activation vector steering with BCI by applying the methodology of Concept Algebra for (Score-Based) Text-Controlled Generative Models to the model from High-resolution image reconstruction with latent diffusion models from human brain activity (website with nice illustrations of the model).

My prompts bolded:

'Could we do concept algebra directly on the fMRI of the higher visual cortex?
Yes, in principle, it should be possible... (read more)
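For what it's worth, a minimal sketch of a concept-algebra-style operation on synthetic data; the subspace estimation via paired differences is my guess at the shape of the method, not the actual pipeline from the paper, and none of this touches real fMRI data:

```python
# Minimal sketch (hypothetical data) of a concept-algebra-style edit:
# estimate a low-rank "concept subspace" from paired representations that differ
# only in the target concept, then swap that component between two samples.
import numpy as np

rng = np.random.default_rng(0)
D = 512                                        # dimensionality of the representation
pairs_a = rng.normal(size=(100, D))            # representations with concept value A
pairs_b = pairs_a + rng.normal(size=(100, D)) @ np.diag(np.r_[np.ones(8), np.zeros(D - 8)])
# (stand-in for real paired data where only the concept differs)

diffs = pairs_b - pairs_a
# Top principal directions of the differences span the (approximate) concept subspace.
_, _, Vt = np.linalg.svd(diffs - diffs.mean(0), full_matrices=False)
concept_basis = Vt[:8]                         # shape (k, D)

def project(z, basis):
    return (z @ basis.T) @ basis               # component of z inside the subspace

z_source, z_target = rng.normal(size=D), rng.normal(size=D)
# Replace the concept component of z_source with that of z_target, keep the rest.
z_edited = z_source - project(z_source, concept_basis) + project(z_target, concept_basis)
```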

Also a positive update for me on interdisciplinary conceptual alignment being automatable differentially soon. This has seemed plausible to me for a long time, since LLMs have 'read the whole internet' and interdisciplinary insights often seem (to me) to require relatively few inferential hops (plausibly because it's hard for humans to have [especially deep] expertise in many different domains), making them potentially feasible for LLMs differentially early (reliably making long inferential chains still seems among the harder things for LLMs).


 

Do we expect future model architectures to be biased toward out-of-context reasoning (reasoning internally rather than in a chain-of-thought)? As in, what kinds of capabilities would lead companies to build models that reason less and less in token-space?

I mean, the first obvious thing would be that you are training the model to internalize some of the reasoning rather than having to pay for the additional tokens each time you want to do complex reasoning.

The thing is, I expect we'll eventually move away from just relying on transformers with scale. And so... (read more)
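A toy illustration of that first pressure (all data and the token counting below are made up): distilling chain-of-thought traces into direct question→answer pairs, so the deployed model spends fewer tokens reasoning out loud.

```python
# Toy illustration (hypothetical data) of why labs might train reasoning "into the weights":
# take (question, chain-of-thought, answer) traces and build a distillation set that keeps
# only question -> answer, so the deployed model emits fewer tokens per query.
traces = [
    {"question": "Is 91 prime?",
     "chain_of_thought": "91 = 7 * 13, so it has a nontrivial factor.",
     "answer": "No"},
    {"question": "What is 17 + 26?",
     "chain_of_thought": "17 + 26 = 17 + 20 + 6 = 43.",
     "answer": "43"},
]

def token_count(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

distillation_set = [{"prompt": t["question"], "completion": t["answer"]} for t in traces]

cot_tokens = sum(token_count(t["chain_of_thought"]) + token_count(t["answer"]) for t in traces)
direct_tokens = sum(token_count(t["answer"]) for t in traces)
print(f"tokens emitted with CoT: {cot_tokens}, after distillation: {direct_tokens}")
```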

This is an excellent point.

While LLMs seem (relatively) safe, we may very well blow right on by them soon.

I do think that many of the safety advantages of LLMs come from their understanding of human intentions (and therefore implied values). Those would be retained in improved architectures that still predict human language use. If such a system's thought process was entirely opaque, we could no longer perform Externalized reasoning oversight by "reading its thoughts".

But I think it might be possible to build a reliable agent from unreliable parts. I t... (read more)

quila

On Pivotal Acts

I was rereading some of the old literature on alignment research sharing policies after Tamsin Leake's recent post and came across some discussion of pivotal acts as well.

Hiring people for your pivotal act project is going to be tricky. [...] People on your team will have a low trust and/or adversarial stance towards neighboring institutions and collaborators, and will have a hard time forming good-faith collaboration. This will alienate other institutions and make them not want to work with you or be supportive of you.

This is in a cont... (read more)

Showing 3 of 7 replies

See minimality principle:

the least dangerous plan is not the plan that seems to contain the fewest material actions that seem risky in a conventional sense, but rather the plan that requires the least dangerous cognition from the AGI executing it

quila
Reflecting on this more, I wrote in a discord server (then edited to post here):

I wasn't aware the concept of pivotal acts was entangled with the frame of formal inner+outer alignment as the only (or only feasible?) way to cause safe ASI.

I suspect that by default, I and someone operating in that frame might mutually believe each other's agendas to be probably-doomed. This could make discussion more valuable (as in that case, at least one of us should make a large update).

For anyone interested in trying that discussion, I'd be curious what you think of the post linked above. As a comment on it says:

In my view, solving formal inner alignment, i.e. devising a general method to create ASI with any specified output-selection policy, is hard enough that I don't expect it to be done.[1] This is why I've been focusing on other approaches which I believe are more likely to succeed.

1. ^ Though I encourage anyone who understands the problem and thinks they can solve it to try to prove me wrong! I can sure see some directions and I think a very creative human could solve it in principle. But I also think a very creative human might find a different class of solution that can be achieved sooner. (Like I've been trying to do :)
quila
(see reply to Wei Dai)

Decomposability seems like a fundamental assumption for interpretability and a condition for it to succeed. E.g. from Toy Models of Superposition:

'Decomposability: Neural network activations which are decomposable can be decomposed into features, the meaning of which is not dependent on the value of other features. (This property is ultimately the most important – see the role of decomposition in defeating the curse of dimensionality.) [...]

The first two (decomposability and linearity) are properties we hypothesize to be widespread, while the latte... (read more)
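A minimal sketch of what "decomposing activations into features" can look like in practice, here with sparse dictionary learning on synthetic activations (hypothetical data and hyperparameters; sparse autoencoders are the more common tool on real models):

```python
# Sketch (synthetic data) of decomposing activations into sparsely-active features,
# the property the quoted "decomposability" hypothesis is about.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
n_features, d_model, n_samples = 32, 16, 2000   # more features than dimensions: superposition
true_features = rng.normal(size=(n_features, d_model))
codes = rng.binomial(1, 0.05, size=(n_samples, n_features)) * rng.random((n_samples, n_features))
activations = codes @ true_features             # each activation = sparse sum of features

learner = DictionaryLearning(n_components=n_features, alpha=0.5, max_iter=200, random_state=0)
learned_codes = learner.fit_transform(activations)
print("avg. active features per sample:", (np.abs(learned_codes) > 1e-6).sum(1).mean())
# If decomposability (and sparsity) holds, each activation is explained by a few features.
```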

Pretending not to see when a rule you've set is being violated can be optimal policy in parenting sometimes (and I bet it generalizes).

Example: suppose you have a toddler and a "rule" that food only stays in the kitchen. The motivation is that each time food is brought into the living room there is a small chance of an accident resulting in a permanent stain. There's a cost to enforcing the rule, as the toddler will put up a fight. Suppose that one night you feel really tired and the cost feels particularly high. If you enforce the rule, it will be much more p... (read more)
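A toy expected-cost version of that tradeoff, with entirely made-up numbers:

```python
# Made-up numbers for the tradeoff above: enforcing costs a tantrum tonight,
# not enforcing risks a permanent stain.
p_stain = 0.05            # chance tonight's violation ruins the couch
cost_stain = 200.0        # replacing/cleaning the couch
cost_tantrum_typical = 5.0
cost_tantrum_tired = 40.0 # enforcement is much more costly when you're exhausted

for cost_tantrum in (cost_tantrum_typical, cost_tantrum_tired):
    enforce = cost_tantrum
    look_away = p_stain * cost_stain
    print(f"tantrum cost {cost_tantrum:>5}: enforce={enforce}, look away={look_away:.1f}",
          "-> enforce" if enforce < look_away else "-> pretend not to see")
```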

RobertM
Huh, that went somewhere other than where I was expecting.  I thought you were going to say that ignoring letter-of-the-rule violations is fine when they're not spirit-of-the-rule violations, as a way of communicating the actual boundaries.

Perhaps that can work depending on the circumstances. In the specific case of a toddler, at the risk of not giving him enough credit, I think that type of distinction is too nuanced. I suspect that in practice this will simply make him litigate every particular application of any given rule (since it gives him hope that it might work), which raises the cost of enforcement dramatically. Potentially it might also make him more stressed, as I think there's something very mentally soothing / non-taxing about bright line rules.

I think with older kids though, it'... (read more)

keltan
Teacher here, can confirm.

Anyone know folks working on semiconductors in Taiwan and Abu Dhabi, or on fiber at Tata Industries in Mumbai? 

I'm currently travelling around the world and talking to folks about various kinds of AI infrastructure, and looking for recommendations of folks to meet! 

If so, feel free to DM me! 

(If you don't know me, I'm a dev here on LessWrong and was also part of founding Lightcone Infrastructure.)

mesaoptimizer
Could you elaborate on how Tata Industries is relevant here? Based on a DDG search, the only news I find involving Tata and AI infrastructure is one where a subsidiary named TCS is supposedly getting into the generative AI gold rush.

That's more about me being interested in key global infrastructure. I've been curious about them for quite a few years, after realising how significant what they're building is versus how few folks know about them. I don't know that they have any particularly generative-AI-related projects in the short term. 

lc

I seriously doubt on priors that Boeing corporate is murdering employees.

Showing 3 of 6 replies
lc
Robin Hanson has apparently asked the same thing. It seems like such a bizarre question to me:

* Most people do not have the constitution or agency for criminal murder
* Most companies do not have secrets large enough that assassinations would reduce the size of their problems on expectation
* Most people who work at large companies don't really give a shit if that company gets fined or into legal trouble, and so they don't have the motivation to personally risk anything organizing murders to prevent lawsuits
Ben Pace
I think my model of people is that people are very much changed by the affordances that society gives them and the pressures they are under. In contrast with this statement, a lot of hunter-gatherer people had to be able to fight to the death, so I don't buy that it's entirely about the human constitution.

I think if it was a known thing that you could hire an assassin to kill an employee, and that unless you messed up and left quite explicit evidence connecting you, you'd get away with it, then there'd be enough pressure to cause people in extremis to do it a few times per year even in just high-stakes business settings.

Also my impression is that business or political assassinations exist to this day in many countries; a little searching suggests Russia, Mexico, Venezuela, possibly Nigeria, and more.

I generally put a lot more importance on tracking which norms are actually being endorsed and enforced by the group / society, as opposed to primarily counting on individual ethical reasoning or individual ethical consciences.

(TBC, I also am not currently buying that this is an assassination in the US, but I didn't find this reasoning compelling.)
lc

Also my impression is that business or political assassinations exist to this day in many countries; a little searching suggests Russia, Mexico, Venezuela, possibly Nigeria, and more.

Oh definitely. In Mexico in particular, business pairs up with organized crime all of the time to strong-arm competitors. But this happens when there's an "organized crime" that tycoons can cheaply (in terms of risk) pair up with. Also, OP asked specifically about why companies don't assassinate whistleblowers all the time.

a lot of hunter-gatherer people had to be able to fight

... (read more)
William_S

I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people which worked on trying to understand language model features in context, leading to t... (read more)

Showing 3 of 27 replies
Linch

I can see some arguments in your direction but would tentatively guess the opposite. 

Lech Mazur
Do you know if the origin of this idea for them was a psychedelic or dissociative trip? I'd give it at least even odds, with most of the remaining chances being meditation or Eastern religions...
JenniferRM
Wait, you know smart people who have NOT, at some point in their life: (1) taken a psychedelic NOR (2) meditated, NOR (3) thought about any of buddhism, jainism, hinduism, taoism, confucianism, etc???

To be clear to naive readers: psychedelics are, in fact, non-trivially dangerous. I personally worry I already have "an arguably-unfair and a probably-too-high share" of "shaman genes" and I don't feel I need exogenous sources of weirdness at this point. But in the SF bay area (and places on the internet memetically downstream from IRL communities there) a lot of that is going around, memetically (in stories about) and perhaps mimetically (via monkey see, monkey do).

The first time you use a serious one you're likely getting a permanent modification to your personality (+0.5 stddev to your Openness?) and arguably/sorta each time you do a new one, or do a higher dose, or whatever, you've committed "1% of a personality suicide" by disrupting some of your most neurologically complex commitments.

To a first approximation my advice is simply "don't do it".

HOWEVER: this latter consideration actually suggests: anyone seriously and truly considering suicide should perhaps take a low dose psychedelic FIRST (with at least two loving tripsitters and due care) since it is also maybe/sorta "suicide" but it leaves a body behind that most people will think is still the same person and so they won't cry very much and so on?

To calibrate this perspective a bit, I also expect that even if cryonics works, it will also cause an unusually large amount of personality shift. A tolerable amount. An amount that leaves behind a personality that is similar-enough-to-the-current-one-to-not-have-triggered-a-ship-of-theseus-violation-in-one-modification-cycle. Much more than a stressful day and then bad nightmares and a feeling of regret the next day, but weirder. With cryonics, you might wake up to some effects that are roughly equivalent to "having taken a potion of youthful rejuvenation, an

I was going to write an April Fool's Day post in the style of "On the Impossibility of Supersized Machines", perhaps titled "On the Impossibility of Operating Supersized Machines", to poke fun at bad arguments that alignment is difficult. I didn't do this partly because I thought it would get downvotes. Maybe this reflects poorly on LW?

Showing 3 of 6 replies
Algon

I think you should write it. It sounds funny, and a bunch of people have been calling out what they see as bad arguments that alignment is hard lately, e.g. TurnTrout, QuintinPope, ZackMDavis, and karma-wise they did fairly well. 

Ronny Fernandez
I think you should still write it. I'd be happy to post it instead or bet with you on whether it ends up negative karma if you let me read it first.
mako yass
You may be interested in Kenneth Stanley's serendipity-oriented social network, maven

I wish I could bookmark comments/shortform posts.

faul_sname
Yes, that would be cool. Next to the author name of a post or comment, there's a post-date/time element that looks like "1h 🔗". That is a copyable/bookmarkable link.

Sure, I just prefer a native bookmarking function.

habryka

Does anyone have any takes on the two Boeing whistleblowers who died under somewhat suspicious circumstances? I haven't followed this in detail, and my guess is it is basically just random chance, but it sure would be a huge deal if a publicly traded company now was performing assassinations of U.S. citizens. 

Curious whether anyone has looked into this, or has thought much about baseline risk of assassinations or other forms of violence from economic actors.

Showing 3 of 7 replies

I find this a very suspect detail, though the base rate of conspiracies is very low.

"He wasn't concerned about safety because I asked him," Jennifer said. "I said, 'Aren't you scared?' And he said, 'No, I ain't scared, but if anything happens to me, it's not suicide.'"

https://abcnews4.com/news/local/if-anything-happens-its-not-suicide-boeing-whistleblowers-prediction-before-death-south-carolina-abc-news-4-2024

ChristianKl
Poisoning someone with an MRSA infection seems possible, but if that's what happened, it involves capabilities that are not easily available. If such a thing happened in another case, people would likely speak of nation-state capabilities. 
aphyer
Shouldn't that be counting the number squared rather than the number?

More dakka with festivals

In the rationality community people are currently excited about the LessOnline festival. Furthermore, my impression is that similar festivals are generally quite successful: people enjoy them, have stimulating discussions, form new relationships, are exposed to new and interesting ideas, express that they got a lot out of it, etc.

So then, this feels to me like a situation where More Dakka applies. Organize more festivals!

How? Who? I dunno, but these seem like questions worth discussing.

Some initial thoughts:

  1. Assurance contracts seem
... (read more)
niplav
I don't think that's true. I've co-organized one weekend-long retreat in a small hostel for ~50 people, and the cost was ~$5k. Me & the co-organizers probably spent ~50h in total on organizing the event, as volunteers.
Adam Zerner
I was envisioning that you can organize a festival incrementally, investing more time and money into it as you receive more and more validation, and that taking this approach would de-risk it to the point where overall, it's "not that risky". For example, to start off you can email or message a handful of potential attendees. If they aren't excited by the idea you can stop there, but if they are then you can proceed to start looking into things like cost and logistics. I'm not sure how pragmatic this iterative approach actually is though. What do you think?

Also, it seems to me that you wouldn't have to actually risk losing any of your own money. I'd imagine that you'd 1) talk to the hostel, agree on a price, have them "hold the spot" for you, 2) get sign ups, 3) pay using the money you get from attendees. Although now that I think about it I'm realizing that it probably isn't that simple. For example, the hostel cost ~$5k and maybe the money from the attendees would have covered it all, but maybe fewer attendees signed up than you were expecting and the organizers ended up having to pay out of pocket. On the other hand, maybe there is funding available for situations like these.
niplav

Back then I didn't try to get the hostel to sign the metaphorical assurance contract with me, maybe that'd work. A good dominant assurance contract website might work as well.

I guess if you go camping together then conferences are pretty scalable, and if I was to organize another event I'd probably try to first message a few people to get a minimal number of attendees together. After all, the spectrum between an extended party and a festival/conference is fluid.
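For readers unfamiliar with the mechanism, a toy sketch of a dominant assurance contract with made-up numbers (and assuming attending requires pledging):

```python
# Made-up numbers: a venue costs $5k, tickets are $100, the event happens only if at
# least 50 people commit. In a *dominant* assurance contract the organizer pays each
# early committer a small bonus if the threshold is NOT reached, which makes committing
# a dominant strategy for anyone who wants to attend (assuming attendance requires a pledge).
threshold, ticket, bonus = 50, 100.0, 10.0
value_of_attending = 150.0          # what attending is worth to a given person

def payoff(commit: bool, others_reach_threshold: bool) -> float:
    if not commit:
        return 0.0                              # no pledge: no ticket, no bonus
    if others_reach_threshold:
        return value_of_attending - ticket      # event happens, you paid, you attend
    return bonus                                # ticket refunded plus the failure bonus

for others in (True, False):
    print(f"threshold reached without you={others}: "
          f"commit={payoff(True, others):+.0f}, abstain={payoff(False, others):+.0f}")
# Committing beats abstaining in both cases, so pledges should be easier to collect.
```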

Way back in the halcyon days of 2005, a company called Cenqua had an April Fools' Day announcement for a product called Commentator: an AI tool which would comment your code (with, um, adjustable settings for usefulness). I'm wondering if (1) anybody can find an archived version of the page (the original seems to be gone), and (2) if there's now a clear market leader for that particular product niche, but for real.

A.H.
Here is an archived version of the page: http://web.archive.org/web/20050403015136/http://www.cenqua.com/commentator/
Garrett Baker
Archived website

You are a scholar and a gentleman.

Dalcy

I am curious how often asymptotic results, proven using features of the problem that seem basically practically irrelevant, end up becoming relevant in practice.

Like, I understand that there are many asymptotic results (e.g., the free energy principle in SLT) that are useful in practice, but I feel like there's something sus about similar results from information theory or complexity theory, where the way in which they prove certain bounds (or inclusion relationships, for complexity theory) seems totally detached from practicality?

  • joint source coding theorem is of
... (read more)
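(For concreteness, one example of the kind of purely asymptotic statement in question is Shannon's lossless source coding theorem, stated here informally and from memory, so treat it as a sketch:)

```latex
% Shannon's lossless source coding theorem, an example of a purely asymptotic guarantee.
% For an i.i.d. source $X_1, X_2, \dots$ with entropy $H(X)$: for any fixed tolerable
% error probability $\varepsilon \in (0,1)$, the minimal number of bits per symbol
% needed to encode blocks of length $n$ converges to the entropy, but only as $n \to \infty$.
\lim_{n \to \infty} \frac{1}{n}\,
\min\bigl\{\, \ell : \exists\ \text{a code of length } \ell \text{ bits for } (X_1,\dots,X_n)
\text{ with } \Pr[\text{error}] \le \varepsilon \,\bigr\}
\;=\; H(X), \qquad \text{for every } 0 < \varepsilon < 1 .
```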

P v NP: https://en.wikipedia.org/wiki/Generic-case_complexity

Alexander Gietelink Oldenziel
Great question. I don't have a satisfying answer. Perhaps a cynical answer is survival bias: we remember the asymptotic results that eventually become relevant (because people develop practical algorithms or a deeper theory is discovered) but don't remember the irrelevant ones.

Existence results are categorically easier to prove than explicit algorithms. Indeed, an existence result may hold classically while failing intuitionistically. We would expect non-explicit existence results to appear before explicit algorithms.

One minor remark on 'quantifying over all boolean algorithms'. Unease with quantification over large domains may be a vestige of set-theoretic thinking that imagines types as (platonic) boxes. But a term of a for-all quantifier is better thought of as an algorithm/method to check the property for any given term (in this case a Boolean circuit). This doesn't sound divorced from practice to my ears. 
Fabien Roger

I listened to The Failure of Risk Management by Douglas Hubbard, a book that vigorously criticizes qualitative risk management approaches (like the use of risk matrices), and praises a rationalist-friendly quantitative approach. Here are 4 takeaways from that book:

  • There are very different approaches to risk estimation that are often unaware of each other: you can do risk estimations like an actuary (relying on statistics, reference class arguments, and some causal models), like an engineer (relying mostly on causal models and simulations), like a trader (r
... (read more)
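A minimal sketch of the quantitative style the book advocates, with made-up risks: Monte Carlo the annual loss and read off exceedance probabilities, instead of plotting a risk matrix.

```python
# Sketch (made-up numbers) of a quantitative risk estimate: model each risk as
# (annual probability of occurrence, lognormal loss if it occurs), simulate the
# total annual loss, and report a loss-exceedance curve rather than a risk matrix.
import numpy as np

rng = np.random.default_rng(0)
risks = [  # (p_event_per_year, median_loss_usd, sigma_of_log_loss) -- all hypothetical
    (0.05, 2e6, 1.0),
    (0.30, 1e5, 0.8),
    (0.01, 2e7, 1.2),
]

n_trials = 100_000
total_loss = np.zeros(n_trials)
for p, median, sigma in risks:
    occurs = rng.random(n_trials) < p
    losses = rng.lognormal(mean=np.log(median), sigma=sigma, size=n_trials)
    total_loss += occurs * losses

for threshold in (1e5, 1e6, 1e7):
    print(f"P(annual loss > ${threshold:,.0f}) ~ {(total_loss > threshold).mean():.3f}")
```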
Showing 3 of 8 replies

I also listened to How to Measure Anything in Cybersecurity Risk 2nd Edition by the same author. It had a huge amount of overlapping content with The Failure of Risk Management (and the non-overlapping parts were quite dry), but I still learned a few things:

  • Executives of big companies now care a lot about cybersecurity (e.g. citing it as one of the main threats they have to face), which wasn't true in ~2010.
  • Evaluation of cybersecurity risk is not at all synonymous with red teaming. This book is entirely about risk assessment in cyber and doesn't speak about r
... (read more)
romeostevensit
Is there a short summary on the rejecting Knightian uncertainty bit?
Fabien Roger
By Knightian uncertainty, I mean "the lack of any quantifiable knowledge about some possible occurrence", i.e. you can't put a probability on it (Wikipedia).

The TL;DR is that Knightian uncertainty is not a useful concept for making decisions, while the use of subjective probabilities is: if you are calibrated (which you can be trained to become), then you will be better off taking different decisions on p=1% "Knightian uncertain events" and p=10% "Knightian uncertain events".

For a more in-depth defense of this position in the context of long-term predictions, where it's harder to know if calibration training obviously works, see the latest Scott Alexander post.
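A toy decision illustrating the point, with made-up numbers: the choice flips between a calibrated 1% and a calibrated 10%, even if both would be called "Knightian".

```python
# Toy decision (made-up numbers) for why assigning a subjective probability beats
# declaring an event "Knightian": whether to buy insurance against a loss depends
# on whether your calibrated estimate is ~1% or ~10%, even if both feel "unknowable".
loss, premium = 100_000.0, 4_000.0

for p in (0.01, 0.10):
    expected_loss_uninsured = p * loss
    decision = "buy insurance" if expected_loss_uninsured > premium else "self-insure"
    print(f"p={p:.0%}: expected uninsured loss = {expected_loss_uninsured:,.0f} -> {decision}")
```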

I wonder how much near-term interpretability [V]LM agents (e.g. MAIA, AIA) might help with finding better probes and better steering vectors (e.g. by iteratively testing counterfactual hypotheses against potentially spurious features, a major challenge for Contrast-consistent search (CCS)). 

This seems plausible since MAIA can already find spurious features, and feature interpretability [V]LM agents could have much lengthier hypotheses iteration cycles (compared to current [V]LM agents and perhaps even to human researchers).
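A minimal sketch of the objects in question, on synthetic activations (a difference-of-means steering vector plus a logistic-regression probe; hypothetical data, not MAIA's actual procedure):

```python
# Minimal sketch (synthetic activations) of the objects being searched for: a steering
# vector as the mean difference of activations on contrastive prompts, and a linear
# probe trained on the same split.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 256
acts_pos = rng.normal(size=(200, d_model)) + 0.5   # activations on "concept present" prompts
acts_neg = rng.normal(size=(200, d_model))         # activations on "concept absent" prompts

steering_vector = acts_pos.mean(0) - acts_neg.mean(0)   # add to the residual stream to steer

probe = LogisticRegression(max_iter=1000).fit(
    np.vstack([acts_pos, acts_neg]),
    np.r_[np.ones(len(acts_pos)), np.zeros(len(acts_neg))],
)
# An interpretability agent could iterate here: propose counterfactual prompts, recompute
# activations, and check whether the probe/vector tracks the concept or a spurious feature.
```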
