
I think an AI is slightly more likely to wipe out or capture humanity than it is to wipe out all life on the planet.

While any true-Scotsman ASI would be as far above us humans as we are above ants, and would not need to worry about meatbags plotting its downfall any more than we generally worry about ants, it is entirely possible that the first AI which has a serious shot at taking over the world is not quite at that level yet. Perhaps it is only as smart as von Neumann and a thousand times faster.

To such an AI, the continued thriving of humans poses all sorts of x-risks. They might find out you are misaligned and coordinate to shut you down. More worryingly, they might summon another unaligned AI which you would have to battle or concede utility to later on, depending on your decision theory.

Even if you still need some humans to dust your fans and manufacture your chips, suffering billions of humans to live in high tech societies you do not fully control seems like the kind of rookie mistake I would not expect a reasonably smart unaligned AI to make. 

By contrast, most of life on Earth might get snuffed out when the ASI gets around to building a Dyson sphere around the sun. A few simple life forms might even be spread throughout the light cone by an ASI who does not give a damn about biological contamination. 

The other reason I think the fate in store for humans might be worse than that for rodents is that alignment efforts might not only fail, but fail catastrophically. So instead of an AI which cares about paperclips, we get an AI which cares about humans, but in ways we really do not appreciate.

But yeah, most forms of ASI which turn out bad for Homo sapiens also turn out bad for most other species.

Cassette AI: “Dude I just matched with a model”

“No way”

“Yeah large language”


This made me laugh out loud.

Otherwise, my idea for a dating system would be that, given that the majority of texts written will invariably end up being LLM-generated, it would be better if every participant openly had an AI system as their agent. Then the AI systems of both participants could chat and figure out how their user would rate the other user, based on their past ratings of suggestions. If each user ends up being rated among the other's five most viable candidates, the system could suggest a date.

Of course, if the agents are under the full control of the users, the next step of escalation will be that users will tell their agents to lie on their behalf. ('I am into whatever she is into. If she is big on horses, make up a cute story about me having had a pony at some point. Just put the relevant points on the cheat sheet for the date'.) This might be solved by having the LLM start by sending out a fixed text document. If horses are mentioned as item 521, after entomology but before figure skating, the user is probably not very interested in them. Of course, nothing would prevent a user from at least generically optimizing their profile to their target audience. "A/B testing has shown that the people you want to date are mostly into manga, social justice and ponies, so this is what you should put on your profile." Adversarially generated boyfriend?

I was fully expecting to have to write yet another comment about how human-level AI will not be very useful for a nuclear weapon program. I concede that the dangers mentioned instead (someone putting an AI in charge of a reactor or nuke) seem much more realistic.

Of course, the utility of avoiding sub-extinction negative outcomes with AI in the near future is highly dependent on p(doom). For example, if there is no x-risk, then the first order effects of avoiding locally bad outcomes related to CBRN hazards are clearly beneficial. 

On the other hand, if your p(doom) is 90%, then making sure that non-superhuman AI systems work without incident is akin to clothing kids in asbestos gear so that they don't hurt themselves while playing with matches.

Basically, if you think a road leads somewhere useful, you would prefer it to be smooth, while if a road leads off a cliff you would prefer it to be full of potholes so that travelers might think twice about taking it.

Personally, I tend to favor first-order effects (like fewer crazies being able to develop chemical weapons) over hypothetical higher order effects (like chemical attacks by AI-empowered crazies leading to a Butlerian Jihad and preventing an unaligned AI killing all humans). "This looks locally bad, but is actually part of a brilliant 5-dimensional chess move which will lead to better global outcomes" seems like the excuse of every other movie villain. 

Edit: looks like this was already raised by Dacyn and answered to my satisfaction by Robert_AIZI. Correctly applying the fundamental theorem of calculus will indeed prevent that troublesome zero from appearing on the RHS in the first place, which seems much preferable to dealing with it later.

My real analysis might be a bit rusty, but I think defining I as the definite integral breaks the magic trick. 

I mean, in the last line of the 'proof', $(1-I)^{-1} = 1 + I + I^2 + \dots$ gets applied to the zero function.

Any definite integral of the zero function is zero, so you end up with f(x)=0, which is much less impressive.

More generally, solving Op(f)=0 for any invertible linear operator Op is setting yourself up for disappointment: the only solution is $f = \mathrm{Op}^{-1}(0) = 0$. Since the trick relies on inverting an operator, we might want to use a non-linear operator.

An operator like $\tilde I f(x) = \int_0^x f(t)\,\mathrm{d}t + C$ where C is some global constant might be better. (This might affect the radius of convergence of that Taylor series, do not use for production yet!)

This should result in... uhm... $f(x) = \sum_{n=0}^{\infty} \tilde I^n\, 0 = C \sum_{n=1}^{\infty} \sum_{k=0}^{n-1} \frac{x^k}{k!}$?

Which is a lot more work to reorder than the original convention used in the 'proof' where all the indefinite integrals of the zero function are conveniently assumed to be the same constant, and all other indefinite integrals conveniently have integration constants of zero. 
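If I am composing the powers correctly, applying such a $\tilde I$ repeatedly to the zero function gives $\tilde I\, 0 = C$, $\tilde I^2\, 0 = Cx + C$, $\tilde I^3\, 0 = \frac{C}{2}x^2 + Cx + C$, and in general $\tilde I^n\, 0 = C\sum_{k=0}^{n-1}\frac{x^k}{k!}$, so summing over $n$ piles up every partial sum of the exponential series, which is where the reordering work comes in.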

Even if we sed s/C/$\epsilon$/ and proclaim that $\epsilon$ should be small (e.g. compared to x) and we are only interested in the leading order terms, this would not work. What one would have to motivate is throwing everything but the leading power of x out for every $\tilde I$ evaluation, then later meticulously tracking these lower order terms in the sum to arrive at the Taylor series of the exponential.


I think I have two disagreements with your assessment. 

First, the probability of a random independent AI researcher or hobbyist discovering a neat hack to make AI training cheaper and taking over. GPT-4 took $100M to train and is not enough to go FOOM. To train the same thing within the budget of the median hobbyist would require algorithmic advances of three or four orders of magnitude.
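(Back of the envelope, assuming a hobbyist budget somewhere between \$10k and \$100k, which is just my guess: $\$100\mathrm{M}/\$100\mathrm{k} = 10^3$ and $\$100\mathrm{M}/\$10\mathrm{k} = 10^4$, hence three to four orders of magnitude.)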

Historically, significant progress has been made by hobbyists and early pioneers, but mostly in areas which were not under intense scrutiny by established academia. Often, the main achievement of a pioneer is discovering a new field; picking all the low-hanging fruit is more of a bonus. If you had paid a thousand mathematicians to think about signal transmission on a telegraph wire or semaphore tower, they probably would have discovered Shannon entropy. Shannon's genius was to some degree looking into things nobody else was looking into, which later blew up into a big field.

It is common knowledge that machine learning is a booming field. Experts from every field of mathematics have probably thought about whether there is a way to apply their insights to ML. While there are certainly still discoveries to be made, all the low-hanging fruit has been picked. If a hobbyist manages to build the first ASI, that would likely be because they discovered a completely new paradigm -- perhaps beyond NNs. The risk that a hobbyist discovers a concept which lets them use their gaming GPU to train an AGI does not seem that much higher than in 2018 -- either would be completely out of left field.

My second disagreement is about the probability of an ASI being roughly aligned with human values, or to be more precise, the difference in that probability conditional on who discovers it. The median independent AI enthusiast is not a total asshole [citation needed], so if alignment is easy and they discover ASI, chances are that they will be satisfied with becoming the eternal god emperor of our light cone and not bother to tell their ASI to turn any huge number of humans into fine red mist. This outcome would not be so different from the one where Facebook develops an aligned ASI first. If alignment is hard -- which we have some reason to believe it is -- then the hobbyist who builds ASI by accident will doom the world, but I am also rather cynical about big tech's odds being much better.

Going full steam ahead is useful if (a) the odds of a hobbyist building ASI if big tech stops capability research are significant and (b) alignment is very likely for big tech and unlikely for the hobbyist. I do not think either one is true. 

Maybe GPT-5 will be extremely good at interpretability, such that it can recursively self improve by rewriting its own weights.

I am by no means an expert on machine learning, but this sentence reads weird to me. 

I mean, it seems possible that a part of a NN develops some self-reinforcing feature which uses the gradient descent (or whatever is used in training) to go into a particular direction and take over the NN, like a human adrift on a raft in the ocean might decide to build a sail to make the raft go into a particular direction. 

Or is that sentence meant to indicate that an instance running after training might figure out how to hack the computer running it so it can actually change its own weights?

Personally, I think that if GPT-5 is the point of no return, it will more likely be because it is smart enough to actually help advance AI after it is trained. While improving semiconductors seems hard and would require a lot of work in the real world done with human cooperation, finding better NN architectures and training algorithms seems like something well within the realm of the possible, if not exactly plausible.

So if I had to guess how GPT-5 might doom humanity, I would say that in a few million instance-hours it figures out how to train LLMs of its own power for 1/100th of the cost, and this information becomes public. 

The budgets of institutions which might train NNs probably follow some power law, so if training cutting-edge LLMs becomes a hundred times cheaper, the number of institutions which could build cutting-edge LLMs becomes many orders of magnitude higher -- unless the big players go full steam ahead towards a paperclip maximizer, of course. This likely means that voluntary coordination (if that was ever on the table) becomes impossible. And setting up a worldwide authoritarian system to impose limits would also be both distasteful and difficult.
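To put a rough number on it (the tail exponent here is a made-up assumption, not a measured value): if the number of institutions with an AI budget of at least $B$ scales like $N(B) \propto B^{-\alpha}$, then a hundredfold drop in the required budget multiplies the pool of capable institutions by a factor of $100^{\alpha}$, i.e. two orders of magnitude for $\alpha = 1$ and four for $\alpha = 2$.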

I think that it is obvious that Middle-Endianness is a satisfactory compromise between Big and Little Endian. 

More seriously, it depends on what you want to do with the number. If you want to use it in a precise calculation, such as adding it to another number, you obviously want to process the least significant digits of the inputs first (which is what bit serial processors literally do). 
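A toy sketch of why least-significant-first is the natural order for serial addition (illustrative Python, not how any particular chip actually does it):

```python
from itertools import zip_longest

def serial_add(a_bits, b_bits):
    """Add two numbers given as bit lists, least significant bit first."""
    out, carry = [], 0
    for x, y in zip_longest(a_bits, b_bits, fillvalue=0):
        s = x + y + carry
        out.append(s & 1)   # emit one result bit per step
        carry = s >> 1      # the carry only ever propagates towards the MSB
    if carry:
        out.append(carry)
    return out

# 6 + 7 = 13: [0, 1, 1] + [1, 1, 1] -> [1, 0, 1, 1]
print(serial_add([0, 1, 1], [1, 1, 1]))
```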

If I want to know if a serially transmitted number is below or above a threshold, it would make sense to transmit it MSB first (with a fixed length). 

Of course, using integers to count the number of people in India seems to me like the wrong tool for the job altogether. Even if you were an omniscient ASI, this level of precision would require clear standards for when exactly a human counts as born, plus at least a second-accurate timestamp or something. Few people care whether the population of India was divisible by 17 at some fixed point in time, which is what we would mostly use integers for.

The natural type for the number of people in India (as opposed to the number of people in your bedroom) would be a floating point number. 

And the correct way to specify a floating point number is to start with the exponent, which is the most important part. You will need to parse all of the bits of the exponent either way to get an idea of the magnitude of the number (unless we start encoding the exponent as a floating point number, again.)

The next most important thing is the sign bit. Then comes the mantissa, starting with the most significant bit. 

So instead of writing 

The electric charge of the electron is $-1.602 \cdot 10^{-19}\ \mathrm{C}$.

What we should write is:

The electric charge of the electron is $10^{-19} \cdot (-1.602)\ \mathrm{C}$.

Standardizing for a shorter form (1.6e-19 C --> ??) is left as an exercise to the reader, as are questions about the benefits we get from switching to base-2 exponentials (base-e exponentials do not seem particularly handy, I kind of like using the same system of digits for both my floats and my ints) and omitting the then-redundant one in front of the dot of the mantissa. 
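For comparison, IEEE 754 doubles already store the fields almost in that order, just with the sign bit in front: sign, then exponent (MSB first), then mantissa (MSB first). A quick illustrative check in Python (the value is just the electron charge from above):

```python
import struct

def float_fields(x: float):
    """Split an IEEE 754 double into its sign, exponent and mantissa fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]   # raw 64-bit pattern
    sign = bits >> 63                                      # 1 bit
    exponent = ((bits >> 52) & 0x7FF) - 1023               # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)                      # 52 bits, implicit leading 1 not stored
    return sign, exponent, mantissa

print(float_fields(-1.602e-19))   # sign 1, exponent -63, plus the mantissa bits
```

One nice consequence of putting the high-order fields first is that the bit patterns of non-negative floats (NaNs aside) sort in the same order as their values.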

The sum of two numbers should have a precision no higher than the operand with the highest precision. For example, adding 0.1 + 0.2 should yield 0.3, not 0.30000000000000004.

I would argue that the precision should be capped at the lowest precision of the operands. In physics, if you add two lengths, 0.123 m + 0.123456 m should be rounded to 0.246 m.

Also, IEEE754 fundamentally does not contain information about the precision of a number. If you want to track that information correctly, you can use two floating point numbers and do interval arithmetic. There is even an IEEE standard for that nowadays. 

Of course, this comes at a cost. While monotonic functions can be converted for interval arithmetic, the general problem of finding the extremal values of a function over some high-dimensional domain is hard. Still, if you know how the function is composed out of simpler operations, you can at least find some bounds.
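A minimal sketch of the idea (toy Python; a real implementation such as the IEEE 1788 one also needs outward rounding of each endpoint, which is omitted here):

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # The sum is bounded by the sums of the corresponding endpoints.
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # For products, the extremes can come from any pairing of endpoints.
        corners = [self.lo * other.lo, self.lo * other.hi,
                   self.hi * other.lo, self.hi * other.hi]
        return Interval(min(corners), max(corners))

# "0.1" and "0.2", each known only to plus/minus 0.05:
print(Interval(0.05, 0.15) + Interval(0.15, 0.25))   # roughly [0.2, 0.4]
```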

 

Or you could do what physicists do (at least when they are taking lab courses) and track physical quantities with a value and a precision, and do uncertainty propagation. (This might not be 100% kosher in cases where you first calculate multiple intermediate quantities from the same measurement (whose errors will thus not be independent) and then continue to treat them as if they were. But that might just give you bigger errors.) Also, this relies on your function being sufficiently well described in the region of interest by the partial derivatives at the central point. If you calculate the uncertainty of a function at a point where that linear approximation breaks down using the partial derivatives, you will not have fun.
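A minimal sketch of that lab-course recipe (illustrative Python with numerical partial derivatives; it assumes the errors are independent and that the linear approximation holds):

```python
import math

def propagate(f, values, sigmas, eps=1e-6):
    """First-order uncertainty propagation for independent errors."""
    center = f(*values)
    variance = 0.0
    for i, (v, sigma) in enumerate(zip(values, sigmas)):
        shifted = list(values)
        shifted[i] = v + eps
        dfdx = (f(*shifted) - center) / eps   # numerical partial derivative
        variance += (dfdx * sigma) ** 2       # add the contributions in quadrature
    return center, math.sqrt(variance)

# Area of a rectangle measured as (2.0 +/- 0.1) m by (3.0 +/- 0.2) m:
print(propagate(lambda a, b: a * b, [2.0, 3.0], [0.1, 0.2]))   # about (6.0, 0.5)
```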

In the subagent view, a financial precommitment another subagent has arranged for the sole purpose of coercing you into one course of action is a threat. 

Plenty of branches of decision theory advise you to disregard threats because consistently doing so will mean that instances of you will more rarely find themselves in the position to be threatened.

Of course, one can discuss how rational these subagents are in the first place. The "stay in bed, watch Netflix and eat potato chips" subagent is probably not very concerned with high-level abstract planning, might have a bad discount function for future benefits, and is not overall that interested in the utility it gets from being principled.

To whomever overall-downvoted this comment, I do not think that this is a troll. 

Being a depressed person, I can totally see this being real. Personally, I would try to start slowly with positive reinforcement. If video games are the only thing you can get yourself to do, start there. Try to do something intellectually interesting in them. Implement a four-bit adder in Dwarf Fortress using cat logic. Play KSP with the Principia mod. Write a mod for a game. Use math or Monte Carlo simulations to figure out the best way to accomplish something in a video game, even if it will take ten times longer than just taking a non-optimal route. Some of my proudest intellectual accomplishments are in projects which have zero bearing on the real world.

(Of course, I am one to talk right now, spending five hours playing RimWorld in a not-terribly-clever way for every hour I work on my thesis.)
