I feel most excited about futures where:
- We increase our technological capability massively
- There is a process of careful reflection about how to use that technological capability, that tries to account for a broad swathe of human preferences (as well as moral, AI, and animal preferences)
- We end up using that capability to its full extent - e.g. trying to reach other stars and using the resources there to support a flourishing, growing civilization.
Here are some other ways that the future could go.
- Deceptive alignment: We build an AI system that is an agent (e.g. we take current LLMs and also use a lot more RL and search methods on them, in order to make them able to accomplish more open-ended projects). We try to get this AI to follow our instructions and be aligned with our values, but we fail to catch that it has its own values, different from our own. This system, once it is deployed, accumulates enough power and autonomy to cause significant harm to humans, possibly up to permanently disempowering (and maybe killing) them.
- Defence failure: New technologies are developed that vastly increase offensive capabilities and make those capabilities accessible to many people, and defences don’t keep up. This allows someone to take an action that causes catastrophic or existential harm.
- The offensive technology might be some form of misaligned or power-seeking AI, or it might be something else (e.g. biotechnology)
- Smooth left turns (h/t Jan Kulveit): Currently, important systems and organizations (e.g. the economy, companies, governments) are fairly aligned with human interests, in part because they rely on humans for labour, developing and sharing ideas, etc. Widespread adoption of AI could mean that humans are no longer needed for labour or for developing and sharing ideas, which would reduce the pressure on these systems to do things that are good for humans.
- (Though it likely wouldn’t totally eliminate such pressures - e.g. even if humans would not be providing labour, they would be providing capital and also consuming goods from companies, so companies would still care somewhat about human values.)
- Bad values lock-in: We build up massive technological capability and have some coordinated process for deciding what the future should look like, but mess up our reflection somehow and end up pursuing a plan that is much worse than it could have been.
- Overcentralized power: We build transformative AI systems that are controlled by a small number of people. These people then determine a lot of what happens in the rest of our light cone.
- This seems generically kind of bad/unfair. However, if these powerful people cared even a tiny bit about other people, they might give a very small amount of their wealth away. And if they’re much, much wealthier and more capable (due to technological progress), then that very small amount may be enough to allow everyone to have a fabulous life.
- However, it’s still distasteful.
- This might be particularly bad if these are people who are selected to have bad values - e.g. dictators.
- Butlerian Jihad: Fear of AI risks causes us not to develop advanced AI systems, and we miss out on the potential capabilities of these systems (e.g. to spread across the universe, or to extend life).
- Multi-agent risks (though I don’t understand this well, and maybe it is mostly one of the other things on this list?)
- S-risks
Notes
- AI is a big part of the picture of how we could achieve these good futures. And it is also tied up in a lot of the ways things could go wrong.
- But deceptive alignment is only one of the failure modes here, despite the fact that it’s received almost all of the attention from the effective altruism community to date.
- Related to the last point, I think that people are still too focused on unipolar scenarios versus multipolar ones (scenarios 2-4 are essentially multipolar failures).
- Some of these seem like bigger deals than others: my intuition is that the first 4 are fairly likely and big deals, and the rest are less likely/important.