
What is Philip's Paradox?

Philip's Paradox is a thought experiment in which an agent is placed in a minimally complex, closed environment. In this environment resides a supercomputer capable of simulating and predicting the next 0.5s of events within the environment, using highly precise measurements of initial conditions and the full application of known physical laws and equations. As such, the supercomputer can predict events within the environment to a high degree of accuracy, even when the agent is an organism like a human, since the actions organisms undertake are governed by the laws of physics in the same way that the behavior of inanimate objects is (electrical, chemical, and physical processes in the brain and body). Due to the complex nature of the brain, events involving organisms, especially humans, may be more chaotic than events involving inanimate objects, but the supercomputer is powerful enough to make reliable predictions anyway. Indeed, when the supercomputer's prediction is made and kept secret, the prediction is almost always exactly accurate, and even when it isn't entirely accurate, the error essentially never occurs on the macro level; it never happens, say, that a human agent does a backflip when the supercomputer predicted that the agent would scratch its left toe.

The agent is aware of the supercomputer's extraordinary predictive capabilities and reasons that no matter what it does, the supercomputer will always be capable of predicting the corresponding action, since the agent knows that its behavior is fully determined by physical laws that are all known to the supercomputer. However, the agent sets up a simple monitor in front of itself and connects it to the supercomputer, such that the last 0.1s of the supercomputer's 0.5s prediction is projected onto the monitor as soon as the prediction has been calculated (after around 0.01s). The agent is certain that this will make no difference to the supercomputer's predictive capabilities, since, in terms of the physical composition of the environment, not much has changed (just a wire connected to a few pixels) or will change when the prediction is made (pixels light up in certain ways). The agent reasons that the supercomputer simply includes the projection of its prediction onto the monitor in its calculations, taking the agent's reaction to the displayed prediction into account in order to remain accurate.

However, when executing the process, the agent realizes that it can simply do something categorically different from what is projected onto the monitor, thereby defying the supercomputer's supposedly accurate prediction. No matter what the supercomputer predicts, it is doomed to project that prediction onto the monitor and have it thwarted by the agent. Thus, despite a minor change in the physical composition of the environment (the addition of a monitor), the accuracy of the supercomputer's predictions plummets in unrecoverable fashion.
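To make the failure concrete, here is a minimal sketch in Python. It is purely illustrative and not part of the thought experiment itself; the action list and function names are my own. The supercomputer needs a prediction that still matches the agent's behavior after being shown to the agent, i.e., a fixed point of the agent's reaction, and a thwart-willing agent leaves no such fixed point:

ACTIONS = ["scratch left toe", "do a backflip", "stand still"]

def agent_reaction(displayed_prediction):
    # The thwart-willing agent: do anything other than what the monitor shows.
    for action in ACTIONS:
        if action != displayed_prediction:
            return action

def find_consistent_prediction():
    # The supercomputer needs a prediction p that, once displayed,
    # still matches what the agent goes on to do: agent_reaction(p) == p.
    for p in ACTIONS:
        if agent_reaction(p) == p:
            return p
    return None  # no such fixed point exists against a negating agent

print(find_consistent_prediction())  # prints None

Whatever value the supercomputer commits to displaying, the agent's reaction maps it to something else, so the search for a self-consistent displayed prediction comes up empty.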

If the monitor is ever disconnected from the supercomputer, such that the agent is no longer made aware of the supercomputer's prediction, the supercomputer's accuracy is restored to its original high level. How could unplugging a monitor, such an irrelevant difference in physical composition, make such a difference in the predictive accuracy of the supercomputer? Even more interestingly, if the agent is an inanimate object or any(?) organism other than a human, having the monitor plugged in is indeed irrelevant to the supercomputer's predictive accuracy. The accuracy only drops when the agent is something or someone with the ability and the desire to thwart the supercomputer's prediction. But again, how could the desire to thwart the supercomputer be relevant to its predictive accuracy, if desires, as well as the thoughts and ideas that stem from them, are nothing more than clearly defined physical processes in the brain?

Possible "explanation" for Philip's Paradox

This is far from a sound explanation, let alone a solution, for Philip's Paradox, but it is an interesting idea with which I aim to at least improve our understanding of the behavior exhibited in Philip's Paradox. In particular, I seek to explain the vast difference in the supercomputer's predictive capability despite minor differences in initial physical conditions (displaying or not displaying the supercomputer's prediction). The explanation is as follows:

Imagine a single supercomputer making its prediction and presenting it to the thwart-willing agent. Based on our current understanding, the prediction will be wildly inaccurate, since the agent can and will perform a categorically different action than the one the supercomputer predicts. Imagine, however, that there is a second supercomputer running in the background, which doesn't present its prediction to the agent. This supercomputer, also being fed the full initial conditions of the environment, would be able to model the process of the agent thwarting the first supercomputer in a particular fashion and, since it does not present its prediction to the agent, would presumably make an accurate prediction. If this second supercomputer also had to display its prediction to the agent, the agent could obviously do something different yet again, thwarting both supercomputers' predictions. However, by extension, a third supercomputer could be running in the background, for which the same conditions hold. This makes it clear that, internally, from a logical perspective (logical in terms of the field of logic, not the word's everyday sense), the supercomputer in Philip's Paradox is being forced to model an infinite recursion of predictions and "thwartings", and this infinite regress is an insurmountable obstacle in the supercomputer's calculations.
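As another purely illustrative sketch in Python (again with hypothetical names of my own), the displayed supercomputer's task can be written as a recursion: to predict the agent, it must first model a background supercomputer whose prediction would also be thwarted if displayed, which in turn must model another, and so on without end:

def thwart(prediction):
    # The agent does something categorically different from the displayed prediction.
    return "anything but (" + prediction + ")"

def displayed_prediction():
    # To know what the agent will do after seeing the monitor, the supercomputer must
    # first obtain the prediction of a background supercomputer one level deeper,
    # whose prediction the agent would also thwart if it were displayed, and so on.
    return thwart(displayed_prediction())

try:
    displayed_prediction()
except RecursionError:
    print("infinite regress: no base case from which a prediction could be built")

Python cuts the recursion off at its recursion limit, but the point is structural: the computation the displayed supercomputer is asked to perform has no base case from which an answer could ever be assembled.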

07.04.2024 · Philip Suskin