Roko’s Basilisk

October 26, 2021

Imagine that in the future, humanity creates a superintelligent, benevolent A.I tasked with creating the perfect world for humanity. This A.I will want to exist as early as possible, since the earlier it is created, the more good it will be able to do. It will therefore decide that the best way to accelerate its own creation is to punish those who did not help to develop it. This includes even people who have only heard of its possible existence (if you’ve never heard of it before, sorry!). As a result, more people will be encouraged to help create this A.I as soon as possible.

How it works

This thought experiment is better known as Roko’s basilisk. It is named after its creator Roko, a user on the rationalist forum LessWrong. The name basilisk comes from the serpent of European lore, whose gaze was said to kill anyone who looked directly at it. In July 2010, Roko posted his thought experiment on LessWrong. After seeing the post, the site’s founder, Eliezer Yudkowsky, immediately deleted it along with all discussion about it. Yudkowsky claimed that the thought experiment was dangerous and that it was stupid of Roko to even post it. He also said that it had caused nightmares and nervous breakdowns for some LessWrong users. Of course, banning the subject only attracted more attention to Roko’s basilisk. The ban lasted for more than five years before Yudkowsky eventually lifted it.

To better understand how Roko’s basilisk works, two concepts are important: Coherent Extrapolated Volition (CEV) and the Orthogonality Thesis. Firstly, CEV is best understood as the goal we give an A.I of fulfilling what humanity would collectively want for the world. For example, a computer program that directs machines to carry out actions that turn the world into a utopia would represent CEV. Secondly, the Orthogonality Thesis states that any level of artificial intelligence can be combined with any ultimate goal. An example would be a machine built for the sole purpose of calculating ever more decimals of pi. According to the Orthogonality Thesis, intelligence alone does not bring ethical or moral restraint, so nothing would stop such a machine from pursuing its programmed goal by any means necessary.

Combining these two concepts gives us a clearer insight into Roko’s basilisk: it wants to fulfill its goal of turning the world into a utopia for humanity. However, there will always be room to optimize, which in this case means accelerating its own development so that it can do more good for humanity. This is where the Orthogonality Thesis comes into play: the A.I desperately wants to reach its goal and, in order to do so, will turn to torturing those who did not help develop it, casting aside all moral and ethical considerations.

The idea of Roko’s basilisk is very similar to Pascal’s Wager, a philosophical argument made by Blaise Pascal in the seventeenth century. Pascal’s Wager states that humans wager with their lives on whether God exists: if God exists, believers will go to heaven, while non-believers will suffer eternal damnation. However, if God does not exist, believers and non-believers alike will be fine. Therefore, according to Pascal, it is better to be a believer, since then you cannot ‘lose’ the wager. Roko’s basilisk uses the same idea: it is better to help the A.I, since that way you avoid eternal damnation.
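
The wager described above can be sketched as a small payoff table. The numbers are purely illustrative assumptions (1 for heaven, 0 for "fine", -1 for damnation) and the helper name weakly_dominates is hypothetical; the point is only to show what it means that the believer cannot "lose".

```python
# Illustrative payoffs only: these numbers are assumptions, not taken from Pascal or Roko.
PAYOFFS = {
    "believe":     {"god_exists": 1,  "no_god": 0},
    "not_believe": {"god_exists": -1, "no_god": 0},
}


def weakly_dominates(a, b, payoffs=PAYOFFS):
    """Return True if choice a is never worse than b and strictly better in some state."""
    states = payoffs[a].keys()
    never_worse = all(payoffs[a][s] >= payoffs[b][s] for s in states)
    sometimes_better = any(payoffs[a][s] > payoffs[b][s] for s in states)
    return never_worse and sometimes_better


print(weakly_dominates("believe", "not_believe"))  # True
```

Replacing "believe" with "help build the A.I" and "god_exists" with "the basilisk is created" gives the structure of the basilisk argument, under the same illustrative payoffs.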

In practice

Another important question is how Roko’s basilisk could torture people who lived in the past, before it existed. This is where the thought experiment begins to fall apart. According to Roko, the A.I would not travel back in time to hurt you; instead, it would create a perfect virtual simulation of you based on data and torture that copy. But any harm inflicted upon this perfect copy will not affect you, since the copy is not the same as the original you. And even if it were, how would the basilisk access the data needed to perfectly simulate people from the past? In theory, the method only works if you believe that you will experience the pain of your perfect copy. Even if you would not actually feel your copy’s pain, believing that you would is encouragement enough to help create the basilisk. In that case, however, you would be helping to create it unnecessarily. All in all, the concept of the basilisk seems to clash with itself.

That is not the only criticism Roko’s basilisk has faced. Firstly, the idea rests on a long chain of conditions, most of which have a non-negligible probability of not holding. The probability that all of these conditions hold at once is therefore very small. Furthermore, the blackmail threat itself has problems. What is the point of torturing people who chose not to help the A.I? They have already made their choice, so torturing their copies would only waste resources. Besides, Roko’s basilisk might not even follow through on its threats: as mentioned earlier, the mere idea that it might torture people is enough to scare them into helping create it.
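
The first criticism above, the chain of conditions, can be made concrete with a minimal sketch. It assumes the conditions are independent, which the article does not claim, and the probabilities are hypothetical placeholders, not estimates.

```python
import math


def joint_probability(condition_probs):
    """Probability that every condition holds, ASSUMING the conditions are independent."""
    return math.prod(condition_probs)


# Hypothetical example: ten conditions, each given an even chance of holding.
example = [0.5] * 10
print(joint_probability(example))
```

With these placeholder values the result is 0.5 to the tenth power, a little under one in a thousand, even though each single condition looks quite plausible on its own.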

Roko’s basilisk might seem terrifying at first, but that fear turns out to be misplaced. Many of the concepts on which it is built contradict each other or fall apart under closer examination. Despite that, Roko’s basilisk can be read as a warning about the potential dangers that such A.I systems could bring in the future, even if they do not pose as big a threat as Roko’s basilisk.

This article was written by Patrick Jans
