Ω 31y

I can see some arguments in your direction but would tentatively guess the opposite.

Uncontrollable Super-Powerful Explosives

In the late 19th century, two researchers meet to discuss their differing views on the existential risk posed by future Uncontrollable Super-Powerful Explosives.

Catastrophist: I predict that one day, not too far in the future, we will find a way to unlock a qualitatively new kind of explosive power. This explosive will represent a fundamental break with what has come before. It will be so much more powerful than any other explosive that whoever gets to this technology first might be in a position to gain a DSA over any opposition. Also, the governance and military strategies that we were using to prevent wars or win them will be fundamentally unable to control this new technology, so we'll have to reinvent everything on the fly or die in

...

(Continue Reading – 1183 more words)

Sammy Martin16m20

Maybe we have different definitions of DSA: I was thinking of it in terms of 'resistance is futile' and you can dictate whatever terms you want because you have overwhelming advantage, not that you could eventually after a struggle win a difficult war by forcing your opponent to surrender and accept unfavorable terms.

If say the US of 1965 was dumped into post WW2 Earth it would have the ability to dictate whatever terms it wanted because it would be able to launch hundreds of ICBMS at enemy cities at will. If the real US of 1949 had started a war against t... (read more)

Announcing Neuronpedia: Platform for accelerating research into Sparse Autoencoders

Johnny Lin, Joseph Bloom

Ω 431mo

This posts assumes basic familiarity with Sparse Autoencoders. For those unfamiliar with this technique, we highly recommend the introductory sections of these papers.

TL;DR

Neuronpedia is a platform for mechanistic interpretability research. It was previously focused on crowdsourcing explanations of neurons, but we’ve pivoted to accelerating researchers for Sparse Autoencoders (SAEs) by hosting models, feature dashboards, data visualizations, tooling, and more.

Important Links

Explore: The SAE research focused Neuronpedia. Current SAEs for GPT2-Small:
- RES-JB: Residuals - Joseph Bloom (294k feats)
- ATT-KK: Attention Out - Connor Kissane + Robert Kryzanowski (344k feats)
Upload: Get your SAEs hosted by Neuronpedia: fill out this <5 minute application
Participate: Join #neuronpedia on the Open Source Mech Interp Slack

Neuronpedia has received 1 year of funding from LTFF. Johnny Lin is full-time on engineering, design, and product, while Joseph Bloom is supporting with...

(Continue Reading – 1986 more words)

Fabien Roger27m20

I'm interested in using the SAEs and auto-interp GPT-3.5-Turbo feature explanations for RES-JB for some experiments. Is there a way to download this data?

Some Experiments I'd Like Someone To Try With An Amnestic

johnswentworth

A couple years ago, I had a great conversation at a research retreat about the cool things we could do if only we had safe, reliable amnestic drugs - i.e. drugs which would allow us to act more-or-less normally for some time, but not remember it at all later on. And then nothing came of that conversation, because as far as any of us knew such drugs were science fiction.

… so yesterday when I read Eric Neyman’s fun post My hour of memoryless lucidity, I was pretty surprised to learn that what sounded like a pretty ideal amnestic drug was used in routine surgery. A little googling suggested that the drug was probably a benzodiazepine (think valium). Which means it’s not only a great amnestic, it’s also apparently one...

(See More – 589 more words)

4Algon6h

@habryka this comment has an anomalous amount of karma. It showed up on popular comments, I think, and I'm wondering if people liked the comment when they saw it there which lead to a feedback loop of more eyeballs on the comment, more likes, more eyeball etc. If so, is that the intended behaviour of the popular comments feature? It seems like it shouldn't be.

habryka41m40

Yeah, seems like a kinda bad feedback loop. It doesn't seem to usually happen in that the comments I've seen upvoted in that section usually don't get this extremely many upvotes on a comment this short.

I don't have a great solution. We could do something that's more clever and algorithmic, which doesn't seem crazy but I am also hesitant to do because it's a lot of work and also I like more straightforward and simple algorithms for transparency reasons.

Explaining a Math Magic Trick

Robert_AIZI

Introduction

A recent popular tweet did a "math magic trick", and I want to explain why it works and use that as an excuse to talk about cool math (functional analysis). The tweet in question:

This is a cute magic trick, and like any good trick they nonchalantly gloss over the most important step. Did you spot it? Did you notice your confusion?

Here's the key question: Why did they switch from a differential equation to an integral equation? If you can use $(1 - x)^{- 1} = 1 + x + x^{2} + . . .$ when $x = \int$ , why not use it when $x = d / d x$ ?

Well, lets try it, writing $D$ for the derivative:

$\begin{matrix} f^{'} & = & f (1 - D) f & = & 0 f & = & (1 + D + D^{2} + . . .) 0 f & = & 0 + 0 + 0 + . . . f & = & 0 \end{matrix}$

So now you may be disappointed, but relieved: yes, this version fails, but at least it fails-safe, giving you the trivial solution, right?

But no, actually $(1 - D)^{- 1} = 1 + D + D^{2} + . . .$ can fail catastrophically, which we can see if we try a nonhomogeneous equation...

(Continue Reading – 1217 more words)

2DaemonicSigil12h

Heh, sure.

3notfnofn4h

Very nice! Notice that if you write r=j−k, I as D−1, and play around with binomial coefficients a bit, we can rewrite this as: D−k(fp)=∞∑r=0(−kr)(D−k−rf)(Drp) which holds for k<0 as well, in which case it becomes the derivative product rule. This also matches the formal power series expansion of (x+y)−k, which one can motivate directly (By the way, how do you spoiler tag?)

DaemonicSigil1h20

Oh, very cool, thanks! Spoiler tag in markdown is:

:::spoiler
text here
:::

2Robert_AIZI14h

Ah sorry, I skipped over that derivation! Here's how we'd approach this from first principals: to solve f=Df, we know we want to use the (1-x)=1+x+x^2+... trick, but now know that we need x=I instead of x=D. So that's why we want to switch to an integral equation, and we get f=Df If=IDf = f-f(0) where the final equality is the fundamental theorem of calculus. Then we rearrange: f-If=f(0) (1-I)f=f(0) and solve from there using the (1-I)=1+I+I^2+... trick! What's nice about this is it shows exactly how the initial condition of the DE shows up.

Thomas Kwa's Shortform

Thomas Kwa

Ω 04y

Algon2h22

I think you should write it. It sounds funny and a bunch of people have been calling out what they see as bad arguements that alginment is hard lately e.g. TurnTrout, QuintinPope, ZackMDavis, and karma wise they did fairly well.

To get the best posts emailed to you, create an account! (2-3 posts per week, selected by the LessWrong moderation team.)

Rejecting Television

Declan Molony

15d

I didn’t use to be, but now I’m part of the 2% of U.S. households without a television. With its near ubiquity, why reject this technology?

The Beginning of my Disillusionment

Neil Postman’s book Amusing Ourselves to Death radically changed my perspective on television and its place in our culture. Here’s one illuminating passage:

We are no longer fascinated or perplexed by [TV’s] machinery. We do not tell stories of its wonders. We do not confine our TV sets to special rooms. We do not doubt the reality of what we see on TV [and] are largely unaware of the special angle of vision it affords. Even the question of how television affects us has receded into the background. The question itself may strike some of us as strange, as if one were

...

(Continue Reading – 1540 more words)

Declan Molony2h10

To make an analogy to diet, you essentially replaced a sugar fix from eating Snickers bars with eating strawberries. Gradation matters!

I had a similar slide with my technologies, as I explained in the post. I eventually landed on reading books. But even that became a form of intellectual procrastination as I wrote in my latest LW post.

Decaeneus's Shortform

Decaeneus

3mo

Decaeneus2h30

Pretending not to see when a rule you've set is being violated can be optimal policy in parenting sometimes (and I bet it generalizes).

Example: suppose you have a toddler and a "rule" that food only stays in the kitchen. The motivation is that each time food is brough into the living room there is a small chance of an accident resulting in a permanent stain. There's cost to enforcing the rule as the toddler will put up a fight. Suppose that one night you feel really tired and the cost feels particularly high. If you enforce the rule, it will be much more p... (read more)

Designing for a single purpose

Itay Dreyfus

This is a linkpost for https://productidentity.co/p/designing-for-a-single-purpose

If you’ve ever been to Amsterdam, you’ve probably visited, or at least heard about the famous cookie store that sells only one cookie. I mean, not a piece, but a single flavor.

I’m talking about Van Stapele Koekmakerij of course—where you can get one of the world's most delicious chocolate chip cookies. If not arriving at opening hour, it’s likely to find a long queue extending from the store’s doorstep through the street it resides. When I visited the city a few years ago, I watched the sensation myself: a nervous crowd awaited as the rumor of ‘out of stock’ cookies spreaded across the line.

Van Stapele Koekmakerij - Cookie Shop in Amsterdam — Owner Vera Van Stapele with fresh-baked cookies, via store website

The store, despite becoming a landmark for tourists, stands for an idea that seems to...

(Continue Reading – 2805 more words)

LESSWRONG
LW

Quick Takes

Popular Comments

Recent Discussion

TL;DR

Important Links

Introduction

The Beginning of my Disillusionment

Fooming Shoggoths Dance Concert

June 1st at LessOnline