Comments

These may be among the ‘most direct’ or ‘simplest to imagine’ possible actions, but in the case of superintelligence, simplicity is not a constraint.

I think it is considered a constraint by some because they think that it would be easier/safer to use a superintelligent AI to do simpler actions, while alignment is not yet fully solved. In other words, if alignment was fully solved, then you could use it to do complicated things like what you suggest, but there could be an intermediate stage of alignment progress where you could safely use SI to do something simple like "melt GPUs" but not to achieve more complex goals.

Wei Dai4-2

Some evidence in favor of your explanation (being at least a correct partial explanation):

  1. von Neuman apparently envied Einstein's physics intuitions, while Einstein lacked von Neuman's math skills. This seems to suggest that they were "tuned" in slightly different directions.
  2. Neither of the two seem superhumanly accomplished in other areas (that a smart person/agent might have goals for), such as making money, moral/philosophical progress, changing culture/politics in their preferred direction.

(An alternative explanation for 2 is that they could have been superhuman in other areas but their terminal goals did not chain through instrumental goals in those areas, which in turn raises the question of what those terminal goals must have been for this explanation to be true and what that says about human values.)

I note that under your explanation, someone could surprise the world by tuning a not-particularly-advanced AI for a task nobody previously thought to tune AI for, or by inventing a better tuning method (either general or specialized), thus achieving a large capability jump in one or more domains. Not sure how worrisome this is though.

A government might model the situation as something like "the first country/coalition to open up an AI capabilities gap of size X versus everyone else wins" because it can then easily win a tech/cultural/memetic/military/economic competition against everyone else and take over the world. (Or a fuzzy version of this to take into account various uncertainties.) Seems like a very different kind of utility function.

Hmm, open models make it easier for a corporation to train closed models, but also make that activity less profitable, whereas for a government the latter consideration doesn't apply or has much less weight, so it seems much clearer that open models increase overall incentive for AI race between nations.

I think open source models probably reduce profit incentives to race, but can increase strategic (e.g., national security) incentives to race. Consider that if you're the Chinese government, you might think that you're too far behind in AI and can't hope to catch up, and therefore decide to spend your resources on other ways to mitigate the risk of a future transformative AI built by another country. But then an open model is released, and your AI researchers catch up to near state-of-the-art by learning from it, which may well change your (perceived) tradeoffs enough that you start spending a lot more on AI research.

What do you think of this post by Tammy?

It seems like someone could definitely be wrong about what they want (unless normative anti-realism is true and such a sentence has no meaning). For example consider someone who thinks it's really important to be faithful to God and goes to church every Sunday to maintain their faith and would use a superintelligent religious AI assistant to help keep the faith if they could. Or maybe they're just overconfident about their philosophical abilities and would fail to take various precautions that I think are important in a high-stakes reflective process.

Mostly that thing where we had a lying vs lie-detecting arms race and the liars mostly won by believing their own lies and that’s how we have things like overconfidence bias and self-serving bias and a whole bunch of other biases.

Are you imagining that the RL environment for AIs will be single-player, with no social interactions? If yes, how will they learn social skills? If no, why wouldn't the same thing happen to them?

Unless we do a very stupid thing like reading the AI’s thoughts and RL-punish wrongthink, this seems very unlikely to happen.

We already RL-punish AIs for saying things that we don't like (via RLHF), and in the future will probably punish them for thinking things we don't like (via things like interpretability). Not sure how to avoid this (given current political realities) so safety plans have to somehow take this into account.

Answer by Wei Dai184

Retinoids, which is a big family of compounds but I would go with adapalene, which has better safety/side effect than anything else. It has less scientific evidence for anti-aging than other retinoids (and is not marketed for that purpose), but I've tried it myself (bought it for acne), and it has very obvious anti-wrinkle effects within like a week. You can get generic 0.1% adapalene gel on Amazon for 1.6oz/$12.

(I'm a little worried about long term effects, i.e. could the increased skin turnover mean faster aging in the long run, but can't seem to find any data or discussion about it.)

I would honestly be pretty comfortable with maximizing SBF’s CEV.

Yikes, I'm not even comfortable maximizing my own CEV. One crux may be that I think a human's values may be context-dependent. In other words, current me-living-in-a-normal-society may have different values from me-given-keys-to-the-universe and should not necessarily trust that version of myself. (Similar to how earlier idealistic Mao shouldn't have trusted his future self.)

My own thinking around this is that we need to advance metaphilosophy and social epistemology, engineer better discussion rules/norms/mechanisms and so on, design a social process that most people can justifiably trust in (i.e., is likely to converge to moral truth or actual representative human values or something like that), then give AI a pointer to that, not any individual human's reflection process which may be mistaken or selfish or skewed.

TLDR: Humans can be powerful and overconfident. I think this is the main source of human evil. I also think this is unlikely to naturally be learned by RL in environments that don’t incentivize irrationality (like ours did).

Where is the longer version of this? I do want to read it. :) Specifically, what is it about the human ancestral environment that made us irrational, and why wouldn't RL environments for AI cause the same or perhaps a different set of irrationalities?

Also, how does RL fit into QACI? Can you point me to where this is discussed?

Luckily the de-facto nominees for this position are alignment researchers, who pretty strongly self-select for having cosmopolitan altruistic values.

But we could have said the same thing of SBF, before the disaster happened.

Due to very weird selection pressure, humans ended up really smart but also really irrational. [...] An AGI (at least, one that comes from something like RL rather than being conjured in a simulation or something else weird) will probably end up with a way higher rationality:intelligence ratio, and so it will be much less likely to destroy everything we value than an empowered human.

Please explain your thinking behind this?

Dealing with moral uncertainty is just part of expected utility maximization.

It's not, because some moral theories are not compatible with EU maximization, and of the ones that are, it's still unclear how to handle uncertainty between them.

the inductive bias doesn’t precisely match human vision, so it has different mistakes, but as you scale both architectures they become more similar. that’s exactly what you’d expect for any approximately Bayesian setup.

I can certainly understand that as you scale both architectures, they both make less mistakes on distribution. But do they also generalize out of training distribution more similarly? If so, why? Can you explain this more? (I'm not getting your point from just "approximately Bayesian setup".)

They needed a giant image classification dataset which I don’t think even existed 5 years ago.

This is also confusing/concerning for me. Why would it be necessary or helpful to have such a large dataset to align the shape/texture bias with humans?

Load More