University of California, Berkeley
1. Hello, this is your brain, reading about your brain, reading about your brain
Consider the following question: why are we conscious?
I get it; pondering consciousness sounds like an activity only enjoyed by nerds, people who are high, those of us who have found a moment of post-yoga stillness, or people who fit in all three categories at once. But notice that we do not tell our heart to beat or our cells to grow, we do not have to think about focusing our eyes, and we do not consciously will our bodies to inject adrenaline into our bloodstream when something scary happens. These things happen more or less automatically, and if such highly complex tasks can happen without our attention or willpower, why should other complex tasks—like choosing what to eat for breakfast—require conscious awareness? How hard is choosing which flavor of yogurt to eat? And do we really need to be conscious to determine that we should peel a banana before biting one?
To get to the question of why we are conscious, let us first cover some background on how it happens. Despite what people may think during post-yoga philosophy sessions, everything you have ever thought of (every time you have felt thirsty or experienced the feeling of being loved, every ambience you have been aware of, every piece of visual, musical, and kinesthetic thoughtstuff you have used, every ounce of hard-earned wisdom, and every single bit of fluff passing through your mind) has been, and continues to be, encoded by brain cells that ‘pulse.’ Our conscious awareness is underpinned by pulsating neurons. Contrast this with how computers, a.k.a. non-conscious information processors, deal with information: by working with the states of tiny binary on-off switches and electromagnetic storage. In exceedingly rare cases (which, inexplicably, have historically involved the Russians) information is encoded by the states of three-state ternary switches.
But the neurons in your brain don’t use any particular cell ‘state’ to encode information. Instead, your neurons keep track of information by pulsing: rapidly releasing electro-chemical discharges and then recharging. It is like what your heart does, but quicker and nimbler, and on a much smaller scale. More specifically, information in your brain is encoded in the rate (and the change in the rate) of pulsing, rather than in any particular ‘charged’ or ‘discharged’ condition.
I admit that using the rate of a pulse sounds like an overly simplistic mechanism for keeping track of information that ultimately gives rise to conscious experience. If you were to describe a piece of music by the mechanism of foot-tapping, you could have one person to tap how quickly the song should go, the volume could be indicated by how quickly another person would tap, yet another could tap slowly for minor chords and quickly for major chords, and so on. Clearly, you’d need to hire a lot of interns to describe ‘Wonderwall’, and you would need an army of foot-tappers to store the information of a Beethoven symphony. But if you were forced to be clever — perhaps due to constrained resources — you could start to get interesting interactivity with only, say, four neurons. For example, you might have one neuron that only pulses when three other neurons all pulse quickly. Or you could have one neuron that only pulses when three other neurons pulse at the same rate, in agreement. Connect up several of these four-neuron units and you could set up a system of voting and tie-breaking. With such a system you could even get the neurons to fire in the same cascade when given the right input; we might call this a memory. This would be rather impressive for a handful of cells that only know how to change how fast they pulse.
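The four-neuron units described above can be sketched in a few lines of Python. This is only a toy illustration of the idea, not a model of real neurons: the function names, the thresholds, and the firing rates (`FAST_HZ`, `TOLERANCE_HZ`, `ACTIVE_HZ`) are all assumptions made up for the example.

```python
from typing import Sequence

# All numbers below are illustrative assumptions, not physiological values.
FAST_HZ = 40.0       # a rate at or above this counts as "pulsing quickly"
TOLERANCE_HZ = 2.0   # rates within this spread count as "the same rate"
ACTIVE_HZ = 50.0     # output rate of a neuron that has decided to pulse
QUIET_HZ = 0.0       # output rate of a neuron that stays silent


def all_fast_neuron(input_rates: Sequence[float]) -> float:
    """Pulse only when every input neuron is pulsing quickly."""
    return ACTIVE_HZ if all(r >= FAST_HZ for r in input_rates) else QUIET_HZ


def agreement_neuron(input_rates: Sequence[float]) -> float:
    """Pulse only when the input neurons pulse at roughly the same rate."""
    spread = max(input_rates) - min(input_rates)
    return ACTIVE_HZ if spread <= TOLERANCE_HZ else QUIET_HZ


def majority_vote(unit_outputs: Sequence[float]) -> bool:
    """Combine several four-neuron units: True when more than half are pulsing."""
    active = sum(1 for rate in unit_outputs if rate > QUIET_HZ)
    return active * 2 > len(unit_outputs)


if __name__ == "__main__":
    units = [
        all_fast_neuron([45.0, 50.0, 60.0]),   # all three inputs are fast
        all_fast_neuron([45.0, 10.0, 60.0]),   # one input is slow
        agreement_neuron([10.0, 11.0, 10.5]),  # inputs agree on a rate
    ]
    print(units, majority_vote(units))
```

Each function only ever looks at rates and emits a rate, which is the point: voting and tie-breaking fall out of nothing more than cells changing how fast they pulse.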
It turns out that each neuron in your brain connects to, on average, about seven thousand other neurons. In total, you are thought to contain roughly eighty-six billion of these pulsing chumps, to say nothing of the connections between them or any of the other types of brain cells you possess. The number of neurons in your brain is so large that I feel morally, spiritually, and ethically obliged to announce the figure more than once: eighty-six billion. Each one is feverishly pulsing and changing its rate over time, like club-goers over the course of an evening. No matter what you are doing, including sleeping, your neurons are always dancing on electricity, tripping the light fantastic with approximately seven thousand brethren.
Within the crazed activity of our brains, so much information is being processed, transformed and manipulated by our pulsing neurons that we are conscious of only a tiny part of it. Yet any effort to understand why we are conscious is fruitless without articulating what our pulsing neurons enable in the first place.
2. You are a prediction machine
Many parts of our brains are dedicated to figuring out what will happen next. For example, while I write this sentence, some parts of my brain are laboring to figure out how parts of your brain will think about what is going to happen when you read it. Other parts of the brain attempt to predict the errors in the main prediction systems, all in an effort for accuracy. If you can predict, better than chance, where your next meal will come from, you don’t have to worry as much about that aspect of your survival.
However, some things are too complex for brains to predict on their own. It turns out you can’t predict how Mars will move across the night sky by yourself: if you observe it each night, you’ll find that Mars sometimes appears to go backward. Every so often, the red planet seems to reverse its trajectory for a little while and then it gets right back to continuing its original trek across the sky.
If you wanted, you could come up with a formal Martian prediction scheme. To do your scheme justice you’d need research, tools, measurement, theory, other people, and coffee. If you were doing things right, you would need single-origin beans for the hipsters and caffeinated turmeric tea for the hippies. The caffeine would help your researchers deal with the following fundamental difficulty: Mars’s movement contains more variation and unpredictability than what we are used to. We have evolved to learn and then intuitively predict the movement of a ball when it flies at us, but not so much when Mars does something similar. Despite our limitations, we are drawn to predicting the motion of the planets; we want to formally predict lots of things, including how people will respond to different medications, when a heart will stop, when we might find ourselves in the midst of a maple syrup shortage, or—most importantly—precisely when certain individuals (like me) will run out of crunchy peanut butter.
Though we may start off absolutely terrible at formal, explicit predictions in any given domain, we tend to improve. It has taken human beings thousands of years, but we currently have a system of civilization that includes things like predictable sources of food and water, in the form of supermarkets, restaurants, farm supply chains and pipes. To aid our prediction attempts, we have — throughout the entirety of human existence — come up with rules of thumb, mental models, and various tools of thought that make it easier to learn from experience as well as navigate the world we live in. These mental models and concepts serve to translate brain-cell pulsing to real-world phenomena and vice versa. The retrograde motion of Mars, for instance, makes perfect sense as long as the following concepts are part of your worldview: Mars is permanently farther away from the sun than the earth, planets orbit the sun due to gravitational attraction, and when we look at the night sky we see a snapshot of where Mars is in three-dimensional space. Each of these concepts builds on more basic elements, like that an orbit consists of an elliptical or roughly circular shape around a point, the very notion of space, and of physics, gravity, and so on. And each of these concepts is a ‘tool’ we use to think with.
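To see how those three concepts are enough to produce the backward drift, here is a minimal sketch. It assumes circular orbits lying in the same plane (a simplification; the real orbits are slightly elliptical and tilted) and uses the standard approximate values of 1.524 astronomical units and 1.881 years for Mars. The function names are invented for the example.

```python
import math

# Simplifying assumption: circular, coplanar orbits around the sun.
EARTH_RADIUS_AU, EARTH_PERIOD_YR = 1.0, 1.0
MARS_RADIUS_AU, MARS_PERIOD_YR = 1.524, 1.881

# Time between successive line-ups of sun, Earth, and Mars (about 2.135 years).
SYNODIC_PERIOD_YR = 1.0 / (1.0 / EARTH_PERIOD_YR - 1.0 / MARS_PERIOD_YR)


def position(radius_au: float, period_yr: float, t_yr: float) -> tuple:
    """Position of a planet on its circular orbit, with the sun at the origin."""
    angle = 2.0 * math.pi * t_yr / period_yr
    return radius_au * math.cos(angle), radius_au * math.sin(angle)


def apparent_longitude(t_yr: float) -> float:
    """Direction (radians) in which Mars appears when viewed from Earth."""
    ex, ey = position(EARTH_RADIUS_AU, EARTH_PERIOD_YR, t_yr)
    mx, my = position(MARS_RADIUS_AU, MARS_PERIOD_YR, t_yr)
    return math.atan2(my - ey, mx - ex)


def is_retrograde(t_yr: float, dt_yr: float = 1e-4) -> bool:
    """True when Mars's apparent direction is moving backward at time t."""
    change = apparent_longitude(t_yr + dt_yr) - apparent_longitude(t_yr - dt_yr)
    # Wrap the difference into [-pi, pi) so crossing the +/-pi seam is harmless.
    change = (change + math.pi) % (2.0 * math.pi) - math.pi
    return change < 0.0


if __name__ == "__main__":
    # t = 0 puts Earth directly between the sun and Mars (opposition).
    print("retrograde at opposition:", is_retrograde(0.0))
    print("retrograde half a cycle later:", is_retrograde(SYNODIC_PERIOD_YR / 2))
```

At `t = 0` the faster-moving Earth is overtaking Mars on the inside track, so the line of sight swings backward; half a synodic period later Mars is on the far side of the sun and appears to move forward again.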
But enough with formal, explicit predictions. Communication is an excellent example of informal, implicit predicting: language consists of words and other tools of thought we use to predict people’s mental states and encourage those people to predict ours. When these predictions are accurate enough, we say that we have successfully communicated. In other words, when we communicate, we trade recipes for predicting (at least some part of) what we are thinking to other people. Most of the time, our brains do this prediction so blazing fast it is as if we have skipped the recipe and instead exchanged actual ‘meaning.’ But the fact remains that when you say ‘chair’ you rely on me to infer what you mean, as it turns out that there are many more types of chairs than anyone can conceive of.
Both formal and informal predictions rely on pulsing neurons as well as on concepts and mental models. Having the right tools of thought is like having a bike that fits your body: different combinations of gears and lever lengths on a bike will beneficially interact with the different lengths and strengths of your limbs. The bike geometry serves to translate your effort into forward motion most effectively for the environment you find yourself in. Similarly, the right set of thought-tools can help translate the stuff you interact with into, literally speaking, a quirky configuration of pulsing neurons that fits within your other unique neuronal firing patterns. Without the germ theory of disease, for instance, people interested in advancing human health can work hard, have creative ideas, and burn the midnight oil; they just would not get too far for their efforts. Moreover, everybody understands the germ theory of disease slightly differently.
It is not clear, however, that one needs to be conscious to be equipped with the right set of concepts or thought-tools to predict what might happen in the world: the field of machine learning is dedicated to getting computers to perform non-conscious prediction. Its basic premise is that computers can form models of the world and, through a lot of trial and error, figure out how to tweak those models to get better results. At the risk of wading too far into a swamp of poorly defined terms, I submit that the model(s) and parameters that a machine learning algorithm uses for prediction are a computer’s ‘tools of thought.’ Without the right models, parameters, and weights, accurately predicting aspects of the world becomes impossible for a machine learning algorithm, just as it does for humans who lack the right concepts.
One glaring difference between a computer’s predictive models and ours is that we consciously experience utilizing what we know. Not only are we aware of information passing through our minds and bodies, but if you pay attention, you’ll notice that thinking about different subjects or different people corresponds to a different flavor of subjective experience, even during what we would currently call the same emotional state. Yet computers capable of machine learning pose a challenge, which brings us back to our opening question: if reasonably accurate predictions can be made without conscious awareness, why in Darwin’s name are we conscious at all? What advantages does conscious awareness confer?
3. Consciousness is the quickest route to context
To understand what advantages consciousness confers for predicting the world, let us further consider the case of unconscious computer prediction. Imagine you had a digital camera strapped to your bike helmet. You could, if you wanted, instruct a computer to predict what kinds of colors it would see next during your commute. If you were to use modern machine learning algorithms, you would avoid giving the computer ‘smart rules’ regarding how to go about making the predictions. ‘Smart rules’ are the kinds of things we humans can observe and articulate when we think we are being clever, like ‘yellow is a rare color in many urban areas, so don’t expect it’ or ‘mauve and teal are only likely to appear during fashion week in New York, or whenever Christo rolls through town.’
The reason you would avoid verbalized smart rules is that they often fail to cover edge cases, not to mention that they are rarely precise enough to be encapsulated in computer code. By the time you come up with all the exceptions, and exceptions to the exceptions, and so on, French verb conjugations would feel like a walk in the park. By the time you pin down precise instructions regarding each possibility (like what constitutes a patch of color in a computer image, or how you might account for the fact that your eyes see color rather differently than computers do, since your eyes and brain automatically color-correct for changes in the ambient light, and so on), your code would be much longer than this nearly interminable sentence.
Instead of smart rules, modern machine-learning algorithms take another approach entirely. If we apply the modern machine learning approach to the problem of bike-route color prediction, part of the instructions — or code — you’d develop would define how the computer would know if its predictions were improving. I admit that ‘minimize the error associated with each prediction’ isn’t exactly a James-Bond-worthy mission to charge your computer with, but as long as computers cannot drink martinis, we should be okay with giving them boring things to do. And once you define how the computer should measure its own error, you can leave it to work things out on its own, leaving you free to drink as many martinis as you see fit.
Prediction error has turned out to be both a very difficult and a very fruitful thing to attempt to minimize. One of the biggest single advances in unconscious computer prediction happened about thirty years ago, when David Rumelhart, Geoffrey Hinton, and Ronald Williams worked out how to convince a computer to minimize its own prognostic errors. Their breakthrough was in figuring out how to get the computer to fairly distribute responsibility for the prediction errors across all of the parts of the prediction process. It sounds simple in retrospect: instead of coming up with smart rules, you can have the computer come up with the rules itself, as long as it knows how to improve and who to blame. Since then, almost all progress in neural networks, a type of machine learning, has consisted of developing fancier ways of utilizing the technique they discovered. One reason for the boom in AI over the past decade is that computer hardware has finally gotten cheap enough that plenty of it can be used for the computationally intensive process Rumelhart et al. derived.
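The idea of distributing blame for an error, commonly known as backpropagation, can be sketched with a deliberately tiny predictor: one ‘hidden’ stage feeding one output stage. Everything here is an illustrative assumption (the two-stage shape, the parameter names `w1`, `b1`, `w2`, `b2`, the toy data, the learning rate); it is a sketch of the principle, not the setup used in the 1986 paper.

```python
import math


def forward(params: dict, x: float) -> tuple:
    """Two-stage prediction: a tanh 'hidden' stage feeding a linear output stage."""
    hidden = math.tanh(params["w1"] * x + params["b1"])
    prediction = params["w2"] * hidden + params["b2"]
    return hidden, prediction


def loss_and_blame(params: dict, x: float, target: float) -> tuple:
    """Return the squared-error loss and each parameter's share of the blame."""
    hidden, prediction = forward(params, x)
    error = prediction - target
    loss = 0.5 * error ** 2
    # Blame for the output stage: how much the loss changes per unit change.
    blame = {"w2": error * hidden, "b2": error}
    # Pass the error backward through the output weight and the tanh slope.
    error_at_hidden = error * params["w2"] * (1.0 - hidden ** 2)
    blame["w1"] = error_at_hidden * x
    blame["b1"] = error_at_hidden
    return loss, blame


def total_loss(params: dict, data: list) -> float:
    return sum(loss_and_blame(params, x, y)[0] for x, y in data)


def train(params: dict, data: list, learning_rate: float = 0.1, epochs: int = 500) -> dict:
    """Repeatedly nudge every parameter against its accumulated blame."""
    params = dict(params)
    for _ in range(epochs):
        totals = {name: 0.0 for name in params}
        for x, y in data:
            _, blame = loss_and_blame(params, x, y)
            for name in params:
                totals[name] += blame[name]
        for name in params:
            params[name] -= learning_rate * totals[name]
    return params


# Toy data and starting point, both made up for the example.
DATA = [(-1.0, -0.8), (0.0, 0.0), (1.0, 0.8)]
START = {"w1": 0.5, "b1": 0.0, "w2": 0.5, "b2": 0.0}

if __name__ == "__main__":
    trained = train(START, DATA)
    print("loss before:", total_loss(START, DATA))
    print("loss after: ", total_loss(trained, DATA))
```

No rule about the data is ever written down; the only instructions are how to measure the error and how to pass it backward, one stage at a time.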
But what would you do if you were not sure of the problem in advance? What if the only thing you knew was that your information processor would come across completely novel problems? Well, you would probably take the best general prediction-error minimizer you could conjure, and hook it up to something that could define its own problems. This way, no matter what problem your information processor defines for itself, it can learn. Roughly speaking, this describes how (parts of) your mind work. When you learn to throw a baseball, you don’t think about the path of your arm in terms of coordinate positions, or in terms of numerically specified velocity. Instead, you attempt to throw a ball, observe the result, and try again. The specific learning takes place on its own. This general idea applies to other parts of your brain, too: you consciously choose which problems to solve, and then use a combination of conscious and unconscious processing to solve them.
Choosing what problems are worth pursuing in the first place is much harder than doing the ‘learn by error minimization’ task, especially if you use the same non-conscious paradigm of being blind to the qualities of the information you work with. In a computer, the processor is robbed of context: it ‘knows’ nothing about what the rest of the computer is doing, only that certain operations should be applied to bits. More than that, the main processing chip in your laptop does the same sorts of things regardless of whether you are watching a video of a hydraulic press squishing a toy or working on financial models in a spreadsheet. If you made a computer conscious and kept everything else the same, its internal experience of the information it processes would feel uniform. Or to flip this metaphor around, if we were mostly unconscious, understanding links on Reddit would feel exactly the same as processing the information from the wind on our face. It is this uniformity of information that makes it nearly impossible to define problems worth solving, because all problems worth solving (not to mention almost all of the tasks we do on a daily basis) have lots of aspects to consider.
Why? If a problem has a lot of aspects to consider, it is technically ‘multi-dimensional.’ Having only one experience of data, no matter what it is you are processing or computing, does not work for trying to quickly understand and solve multi-dimensional problems. The problems we solve on a daily basis are so multifaceted, in fact, that we use our attention to ignore what we deem irrelevant: there is no way to efficiently process everything. But by consciously experiencing information in fundamentally different ways (emotion, music, sound, touch) we gain access to irreducibly different types of data.
Physiological thirst, for example, corresponds to your body detecting the water content (and osmotic pressure) of select cells. When enough cells are low on water, we experience this information as the visceral feeling of thirst and mild dehydration. Imagine if the information content that corresponds with being thirsty and everything else were to use the same mechanism and present itself as push notifications to your smart phone. It would be much more difficult to distinguish vital signals from ones you could safely ignore. We would likely all develop elaborate systems to help filter the signals we were given. Instead, it is much more efficient to have the feeling of thirst manifest viscerally. By being able to perceive the world in multiple, irreducible ways, we are able to use a larger set of tools to quickly perceive, reflect, decide, and learn —conscious processing of information is more efficient.
To see how, consider one of the more esoteric data visualization techniques, known as Chernoff faces. Generating a Chernoff face involves mapping multi-dimensional data to different parameters which govern the face’s appearance: imagine that, when drawing a face, the width of the eyebrows, the diameter of the pupils, and the size of the mouth are all driven by the values in select columns within a single row of data. With training, a data analyst could read Chernoff faces (or Chernoff chairs, tables, rooms, and so on) just as easily as we do scatterplots. But the effectiveness of a Chernoff object is only possible with conscious experience. To a computer, the only difference between a set of high-dimensional Chernoff faces and the same data represented as a 1,000-column spreadsheet would be a set of labels or memory pointers; to us, the difference is much more profound than a few bits of information. Our conscious experience allows us to perceive (and quickly learn about) the world as a collection of Chernoff objects. A computer, well, not so much.
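The column-to-feature mapping behind a Chernoff face can be sketched as follows. The feature names, their drawing ranges (`FEATURE_RANGES`), and the example weather columns are hypothetical, and the sketch stops at computing the parameters rather than drawing anything.

```python
from typing import Dict, List

# Hypothetical drawing ranges for each facial feature (arbitrary units).
FEATURE_RANGES = {
    "eyebrow_width": (0.2, 1.0),
    "pupil_diameter": (0.05, 0.3),
    "mouth_size": (0.1, 0.8),
}


def chernoff_parameters(
    rows: List[Dict[str, float]], column_to_feature: Dict[str, str]
) -> List[Dict[str, float]]:
    """Turn each row of data into one face's worth of feature parameters."""
    faces: List[Dict[str, float]] = [{} for _ in rows]
    for column, feature in column_to_feature.items():
        values = [row[column] for row in rows]
        low, high = min(values), max(values)
        span = high - low
        feature_min, feature_max = FEATURE_RANGES[feature]
        for face, value in zip(faces, values):
            # Rescale the column to 0..1; a constant column sits at the midpoint.
            fraction = (value - low) / span if span else 0.5
            face[feature] = feature_min + fraction * (feature_max - feature_min)
    return faces


if __name__ == "__main__":
    weather = [
        {"temperature": 10.0, "rainfall": 0.0, "wind": 5.0},
        {"temperature": 30.0, "rainfall": 20.0, "wind": 5.0},
    ]
    mapping = {
        "temperature": "eyebrow_width",
        "rainfall": "pupil_diameter",
        "wind": "mouth_size",
    }
    for face in chernoff_parameters(weather, mapping):
        print(face)
```

From the program's side, the output is just another table of numbers with different labels; the advantage of the face only appears once someone looks at it.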
In other words, consciousness is the most efficient route to context. It is within the context of how our body is feeling and what we have in the fridge—not to mention how we feel about the things in that fridge—that we make the decision about what to have for breakfast. Call it the argument from multi-dimensionality: it is much easier to make decisions like these by perceiving such contextual pieces of information as different types of sensations rather than have them all be reduced to the same kind of information and processed by a ‘blind’ mechanism.
There are four implications that stem from this current consideration of consciousness that are worth mentioning.
First. The argument from multidimensionality implies that synesthesia should be relatively rare; and that multi-domain synesthesia (where the experience of information from one sense informs the experiencing of multiple, other senses) should be practically non-existent both across and within species.
Second. After inspecting his own thought process and first-person experience, Descartes famously concluded ‘I think therefore I am.’ While Descartes’ introspective analysis was hugely influential on the course of Western thought, it clearly could have been better. Had Descartes been more skilled at (or perhaps more aware of Buddhistic approaches to) introspective exploration, he might have alighted on the fact his conscious awareness was more ‘upstream’ than his senses or rational thought-processes. He could have (rightly) concluded that everything he perceived or thought about is ‘downstream’ of a specific non-verbal sense of self, had he ‘looked’ in the right way. Alas, we can only wonder at what the course of Western thought might have been had Descartes been more skilled at the difficult task of introspecting his own mental processes.
Third. The argument from multidimensionality helps solve one of the old zombie problems (in particular, the kind that can’t be solved via video games and practice). Many have conceived of a P-zombie, a realistic and convincing humanoid that can navigate the world but is entirely lacking in conscious experience. Given that such a thing would have to process information to survive, it would presumably rely on (lots of) self-directed machine learning. However, this means that its ‘brain’ or set of information processors would be massive. Not only would a self-directed machine learning automaton require lots of hardware (potentially on the order of rooms of servers), it would also require lots of training data: some of the most cutting-edge neural networks (e.g. Generative or Conditional Adversarial Networks) require on the order of 100,000 training images to work effectively. Not only would our automaton need considerable bandwidth for receiving such information, it is beyond anyone’s ken how our automaton would go about gathering enough training data merely as a by-product of trying to get through the day. One of the main reasons for the rapid progress of machine learning and neural networks is that the necessary hardware has become cheaper, which has enabled the use of more hardware and training data, not less. In other words, self-contained P-zombies are impossible as long as our artificial approaches to multidimensional information processing rely on the relatively inefficient mechanism of unidimensional, non-conscious experience.
Fourth. When I said that the entirety of human existence could be described as a progression towards better predictions and better tools for predicting stuff, I was not kidding. As I alluded to above, evidence for a long-term trend of striving for more predictability can be found in several places, particularly in the scientific domains.
Scientific progress has generally concerned itself with explaining and predicting what would otherwise be random data points and facts that don’t quite make sense on their own. Want to know what happens when you split an atom? Is there a formal way to figure out what happens if you take one thing and mix it with another? What if you took three pounds of Taylor Swift CDs and used them to reflect light onto a small sphere? How much useful energy could you create? Would there be any meaningful difference between using her early and her late recordings? Would Mozart symphonies be any better?
As covered above, predicting any of these things requires domain-specific concepts, mental models and formalized rules of thumb that have been developed as a part of the scientific method. In certain cases, scientific progress towards making the world more predictable has not come from explaining data per se, but from improving the process by which predictions are made more accurate. In the thirteenth century, for instance, a friar named Roger Bacon articulated the abstract concept of experimentation. In what is one of the more important breakthroughs of all time, he helped make the rate of true-knowledge acquisition (i.e. scientific progress) much more predictable. My point is that the history of progress can be seen as a long-term trend of striving towards more predictability. While some aspects of the world are more predictable now, the large increase in the planet’s population means that the world is not becoming uniformly more predictable. Human beings are the most complex and hard-to-predict things we know of, and more than seven billion of them makes for an interesting world. However, an orientation towards more predictability is, on some level, a constant in every person’s life, in every society, in every country, and every city on the planet.
If you think about it, this is not just a human trend: the history of evolution consists of life forms evolving to better predict the environment they find themselves in. In terms of predicting where food will be next, we are a hell of a lot better at it than bacteria are. What is curious is that the laws of physics tell us that it always takes energy to move and do any sort of work, including getting food. This means that any living thing must attempt to conserve energy, and one great way to do so is to not move to the right when the deer (or complex sugars) are to the left. The argument from multi-dimensionality implies that, over the long term, any spark of life will eventually evolve to include life forms that are on some level aware of the data they process. Given time, the mechanism of random mutation, in the context of a world with multi-dimensional problems and where training data is not woefully abundant, the impulse towards living and reproducing will change from a bacterium to a creature that is conscious of some of the information passing through it. All that it takes is (a) a system of non-homogenous self-replicating units, (b) said units to have the ability to utilize information to alter their behavior, (c) issues of survival to be able to be described in terms of high information dimensionality, and (d) lots of time.
Whether it is on Mars or on a planet way beyond Alpha Centauri, once there is life, you can extrapolate that at some point in the future the creatures that evolve from it will have conscious experiences. Though the conditions for life to flourish are exceedingly rare, the impulse towards increasing in complexity resulting in conscious awareness is not rare at all.
The title for this piece comes from a phrase I heard when noted neuroscientist and consciousness chaser Christof Koch spoke at Berkeley a few years ago. When he spoke about the concept then, I understood it on some level, but was unable to articulate it as I understand it now: just as the conditions for plasma or ice are baked into the physical laws that govern our world, so too are the conditions for consciousness. In the most non-mystical way possible, conscious experience is an inevitable product of the universe’s structure. This reasoning raises the question: what else might be inevitable? It turns out that not only do rats laugh when tickled, they respond to ambiguous stimuli more optimistically immediately afterwards. If consciousness is inevitable, perhaps we will find that laughter is too. The possibilities are incredibly intriguing. But it is your turn now; I will leave it up to you to wonder on your own.
Bacon, Roger. “Opus Majus.” Trans. Robert Belle Burke (1928).
Bridges, John Henry. “The Life & Work of Roger Bacon: An Introduction to the Opus Majus.” (1914).
Clark, Andy. “Whatever next? Predictive brains, situated agents, and the future of cognitive science.” Behavioral and Brain Sciences 36.03 (2013): 181-204.
Descartes, Rene. “Discourse on the method of rightly conducting the reason, and seeking truth in the sciences.” (1850). Project Gutenberg. Retrieved from http://www.gutenberg.org/files/59/59-h/59-h.htm.
Hirsh, Jacob B., Raymond A. Mar, and Jordan B. Peterson. “Psychological entropy: a framework for understanding uncertainty-related anxiety.” Psychological review 119.2 (2012): 304.
Ishiyama, S., and M. Brecht. “Neural correlates of ticklishness in the rat somatosensory cortex.” Science 354.6313 (2016): 757-760.
Isola, Phillip, et al. “Image-to-image translation with conditional adversarial networks.” arXiv preprint arXiv:1611.07004 (2016).
Koch, Christof. California Cognitive Science Conference, U.C. Berkeley (2012)
Koch, Christof, et al. “Neural correlates of consciousness: progress and problems.” Nature Reviews Neuroscience 17.5 (2016): 307–321.
McKinley, Michael J., and Alan Kim Johnson. “The physiological regulation of thirst and fluid intake.” Physiology 19.1 (2004): 1–6.
Niiniluoto, Ilkka, “Scientific Progress”, The Stanford Encyclopedia of Philosophy (Summer 2015 Edition), Edward N. Zalta (ed.)
Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. “Learning representations by back-propagating errors.” Nature 323 (1986): 533-536.
Schwartenbeck, Philipp, Thomas HB FitzGerald, and Ray Dolan. “Neural signals encoding shifts in beliefs.” NeuroImage 125 (2016): 578-586.
Van Gulick, Robert, “Consciousness”, The Stanford Encyclopedia of Philosophy (Summer 2017 Edition), Edward N. Zalta (ed.), forthcoming URL = <https://plato.stanford.edu/archives/sum2017/entries/consciousness/>.
Xu, Yang, Terry Regier, and Barbara C. Malt. “Historical semantic chaining and efficient communication: The case of container names.” Cognitive science (2015).