Building gears-level models is expensive - often prohibitively expensive. Black-box approaches are usually cheaper and faster. But black-box approaches rarely generalize - they need to be rebuilt when conditions change, don’t identify unknown unknowns, and are hard to build on top of. Gears-level models, on the other hand, offer permanent, generalizable knowledge which can be applied to many problems in the future, even if conditions shift.
I'm looking for computer games that involve strategy, resource management, hidden information, and management of "value of information" (i.e. figuring out when to explore or exploit), which:
This is for my broader project of "have a battery of exercises that train/test people's general reasoning on open-ended problems." Each exercise should ideally be pretty different from the other ones.
In this case, I don't expect anyone to have such a game that they beat on their first try, but I'm looking for games where this seems at least plausible if you were taking a long time to think each turn, or pausing a lot.
The strategy/resource/value-of-information aspect is meant to correspond to some real-world difficulties of ambitious long-term planning.
(One example game that's been given to me in this category is "Luck Be a Landlord")
If your definition of "hidden information" implies that chess has it, then I think you will predictably be misunderstood.
Terms that I associate with (gaining advantage by spending time modeling a situation) include: thinking, planning, analyzing, simulating, computing ("running the numbers")
Today, the AI Extinction Statement was released by the Center for AI Safety, a one-sentence statement jointly signed by a historic coalition of AI experts, professors, and tech leaders.
Geoffrey Hinton and Yoshua Bengio have signed, as have the CEOs of the major AGI labs–Sam Altman, Demis Hassabis, and Dario Amodei–as well as executives from Microsoft and Google (but notably not Meta).
The statement reads: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”
We hope this statement will bring AI x-risk further into the Overton window and open up discussion around AI’s most severe risks. Given the growing number of experts and public figures who take risks from advanced AI seriously, we hope to improve epistemics by encouraging discussion and focusing public and international attention on this issue.
I have anxiety and depression.
The kind that doesn’t go away, and you take pills to manage.
This is not a secret.
What’s more interesting is that I just switched medications from one that successfully managed the depression but not the anxiety to one that successfully manages the anxiety but not the depression, giving me a brief window to see my two comorbid conditions separated from each other, for the first time since ever.
What follows is a (brief) digression on what they’re like from the inside.
I’m still me when I’m depressed.
Just a version of me that’s sapped of all initiative, energy, and tolerance for human contact.
There are plenty of metaphors for depression - a grey fog being one of the most popular - but I often think of it in...
These are really good descriptions! (Going by my own and my friends' experience.) For me, I might just tweak it to make anxiety the height rather than the gravity. Thank you for writing these up!
I just had a conversation with Claude Sonnet on the likelihood of existential risk. This is the most interesting part of the conversation:
AJ
Based on what you know, what is your best guess about how likely it is that future ASI would harm humanity? (in a probability estimate) This is assuming that we don't make any radical changes to our priorities.
Claude:
Based on the views and concerns expressed by Eliezer Yudkowsky and others in the field of AI alignment, my best rough estimate is that if we don't make radical changes to prioritize and solve the AI alignment problem before developing advanced artificial superintelligence (ASI), there is a fairly high probability, perhaps around 30-50%, that ASI could pose an existential risk and cause severe harm to humanity.
A few...
If it’s worth saying, but not worth its own post, here's a place to put it.
If you are new to LessWrong, here's the place to introduce yourself. Personal stories, anecdotes, or just general comments on how you found us and what you hope to get from the site and community are invited. This is also the place to discuss feature requests and other ideas you have for the site, if you don't want to write a full top-level post.
If you're new to the community, you can start reading the Highlights from the Sequences, a collection of posts about the core ideas of LessWrong.
If you want to explore the community more, I recommend reading the Library, checking recent Curated posts, seeing if there are any meetups in your area, and checking out the Getting Started section of the LessWrong FAQ. If you want to orient to the content on the site, you can also check out the Concepts section.
The Open Thread tag is here. The Open Thread sequence is here.
Yeah and a couple of relevant things:
An LLM is a simulator of token-generation processes, generally ones that are human-like agents. You can fine-tune or RLHF it to preferentially create some sorts of agents (i.e., to generate a different distribution of agents than was in its pretraining data), such as more of the ones that won't engage in undesired/unaligned behaviors, but it's very hard (a paper claims impossible) to stop it from ever creating them at all in response to some sufficiently long, detailed prompt.
Suppose we didn't really try. Let us assume that we mildly fine-tune/RLHF the LLM to normally prefer simulating agents who answer questions helpfully, honestly, and harmlessly, but we acknowledge that there are still prompts/other text inputs/conversations that may cause it to instead start generating tokens from, say, a supervillain (like the prompt...
One issue is figuring out who will watch the supervillain light. If we need someone monitoring everything the AI does, that puts some serious limits on what we can do with it (we can't use the AI for anything that we want to be cheaper than a human, or anything that requires superhuman response speed).
Lex Fridman posts timestamped transcripts of his interviews. It's an 83-minute read here and a 115-minute watch on YouTube.
It's neat to see Altman's side of the story. I don't know whether his charisma is more like +2SD or +5SD above the average American (concept origin: planecrash; it likely doesn't follow a normal distribution), and I only have a vague grasp of what kinds of shenanigans +5SD-ish types can do when they pull out all the stops in face-to-face interactions, so maybe you'll prefer to read the transcript over watching the video (although those shenanigans are largely about reading and responding to your facial expressions and body language on the fly, not projecting their own).
If you've missed it, Gwern's side of the story is here.
...Lex Fridman (00:01:05) Take me through
This post was produced as part of the Astra Fellowship under the Winter 2024 Cohort, mentored by Richard Ngo. Thanks to Martín Soto, Jeremy Gillien, Daniel Kokotajlo, and Lukas Berglund for feedback.
Discussions around the likelihood and threat models of AI existential risk (x-risk) often hinge on some informal concept of a “coherent”, goal-directed AGI in the future maximizing some utility function unaligned with human values. Whether and how coherence may develop in future AI systems, especially in the era of LLMs, has been a subject of considerable debate. In this post, we provide a preliminary mathematical definition of the coherence of a policy as how likely it is to have been sampled via uniform reward sampling (URS), or uniformly sampling a reward function and then sampling from the set...
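To make the quoted definition a bit more concrete, here is one rough reconstruction in math. Note that the excerpt cuts off mid-sentence, so treating the sampled set as the reward function's (near-)optimal policies is my assumption about where the sentence was going, not necessarily the authors' exact formulation:

```latex
% Rough sketch of the quoted definition (assumption: the reward-dependent
% set \Pi^*(R) is something like the policies optimal for R; the excerpt is
% truncated, so this is a reconstruction rather than the authors' wording).
% URS: draw a reward function R uniformly, then draw a policy uniformly
% from \Pi^*(R). The coherence of a policy \pi is, up to normalization,
% the probability that URS produces \pi.
\[
  \operatorname{coh}(\pi)
  \;\propto\;
  \Pr[\pi \mid \text{URS}]
  \;=\;
  \mathbb{E}_{R \sim \operatorname{Unif}(\mathcal{R})}
  \left[ \frac{\mathbf{1}\{\pi \in \Pi^*(R)\}}{\lvert \Pi^*(R) \rvert} \right]
\]
```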
It's a long story, but I wanted to see what the functional landscape of coherence looked like for goal-misgeneralizing RL environments after doing essential dynamics. Results forthcoming.
Say you want to plot some data. You could just plot it by itself:
Or you could put lines on the left and bottom:
Or you could put lines everywhere:
Or you could be weird:
Which is right? Many people treat this as an aesthetic choice. But I’d like to suggest an unambiguous rule.
First, try to accept that all axis lines are optional. I promise that readers will recognize a plot even without lines around it.
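If it helps to see this concretely, here is a minimal matplotlib sketch (the library choice and the data are my own illustration; the post itself doesn't name a tool) showing that axis lines are just "spines" you can switch on or off per side:

```python
# Minimal matplotlib illustration: axis lines ("spines") are optional and
# can be toggled per side. The data and the variant names are placeholders.
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
y = np.sin(x) + 0.1 * x  # arbitrary example data

variants = [
    ("no lines", []),
    ("left and bottom", ["left", "bottom"]),
    ("lines everywhere", ["left", "bottom", "top", "right"]),
    ("weird", ["top", "right"]),
]

fig, axes = plt.subplots(1, len(variants), figsize=(12, 3), sharey=True)
for ax, (title, visible) in zip(axes, variants):
    ax.plot(x, y)
    ax.set_title(title)
    # Show only the spines listed for this variant; hide the rest.
    for side in ("left", "bottom", "top", "right"):
        ax.spines[side].set_visible(side in visible)
plt.tight_layout()
plt.show()
```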
So consider these plots:
Which is better? I claim this depends on what you’re plotting. To answer, mentally picture these arrows:
Now, ask yourself, are the lengths of these arrows meaningful? When you draw that horizontal line, you invite people to compare those lengths.
You use the same principle for deciding if you should draw a y-axis line. As...
Curated. Beyond the object-level arguments here about how to do plots, which are pretty interesting, I like this post for the periodic reminder/extra evidence that relatively "minor" details in how information is presented can nudge/bias interpretation and understanding.
I think the claims around border lines would become strongly true if there were an established convention, and hold more weakly the way things currently are. Obviously one ought to be conscious, in reading and creating graphs, of whether 0 is included.