Logarithms Unmuddled

February 21st, 2020

I often write about topics that I myself have struggled to understand. If I’ve struggled, I assume that many others have struggled as well. Over the years, I’ve found several mathematical concepts confusing, not because I’m mathematically disinclined or disinterested, but because my formal training in mathematics was rather limited and, in some cases, poorly taught. My formal training consisted solely of basic arithmetic in elementary school, basic algebra in middle school, basic geometry in high school, and an introductory statistics course in undergraduate school. When I was in school, I didn’t recognize the value of mathematics—at least not for my life. Later, once I became a data professional, a career that I stumbled into without much planning or preparation, I learned mathematical concepts on my own and on the run whenever the need arose. That wasn’t always easy, and it occasionally led to confusion. Like many mathematical topics, logarithms can be confusing, and they’re rarely explained in clear and accessible terms. How logarithms relate to logarithmic scales and logarithmic growth isn’t at all obvious. In this article, I’ll do my best to cut through the confusion.

Until recently, my understanding (and misunderstanding) of logarithms stemmed from limited encounters with the concept in my work. As a data professional who specialized in data visualization, my knowledge of logarithms consisted primarily of three facts:

  1. Along logarithmic scales, each labeled value that typically appears along the scale is a consistent multiple of the previous value (e.g., multiples of 10 resulting in a scale such as 1, 10, 100, 1,000, 10,000, etc.).
  2. Logarithmic scales make it easy to compare rates of change in line graphs because equal slopes represent equal rates of change.
  3. Logarithmic growth exhibits a pattern that goes up by a constantly decreasing amount.

If you, like me, became involved in data sensemaking (a.k.a., data analysis, business intelligence, analytics, data science, so-called Big Data, etc.) with a meagre foundation in mathematics, your understanding of logarithms might be similar to mine—similarly limited and confused. For example, if you think that the sequence of values 1, 10, 100, 1,000, 10,000, and so on is a sequence of logarithms, you’re mistaken, and should definitely read on.

Before reading on, however, I invite you to take a few minutes to write a definition for each of the following concepts:

  • Logarithm
  • Logarithmic scale
  • Logarithmic growth

In addition to definitions, take some time to describe how these concepts relate to one another. For example, how does a logarithmic scale relate to logarithmic growth? Give it a shot now before reading any further.

Regardless of how much you struggle to define these concepts and their relationships to one another, it’s useful to prime your brain for the topic. Now that you have, let’s dive in.

Logarithms

The logarithm (a.k.a., log) of a number is the power to which the log’s base must be raised to equal that number. I realize this definition might not seem clear, but hang in here with me. I promise that greater clarity will emerge. Logarithms always have a base (i.e., a number on which the log is based). The most common base is 10, expressed as log₁₀, but any positive number other than 1 may serve as the base. To determine the log₁₀ value of the number 100, we must determine the power of 10 that equals 100. What this means will become clear in a moment through an example, but before getting to that, let’s review what raising a number to a power means in mathematics.

Raising a number to a power involves multiplying the number by itself a specific number of times. The power indicates how many instances of the number are multiplied. For example, 10 to the power of 3, written as 10³ (the 3 in this case is called the exponent), involves multiplying 10 * 10 * 10, which equals 1,000. Raising a number to the power of 1 involves only one instance of that number—there is nothing to multiply—so the number remains unchanged. For example, 10¹ remains 10. Raising a number to the power of 2 involves multiplying two instances of that number, so 10² is 10 * 10, which equals 100. In these examples so far, the only time multiplication wasn’t involved was with the power of 1. Multiplication is also not involved whenever the exponent is zero or negative. In those cases, raising a number to a power involves division. For example, with the power of 0, rather than multiplying instances of the number by itself, we divide the number by itself, so 10⁰ is equal to 1, for 10 / 10 = 1. Here’s a list of values that result from raising the number 10 to the powers of 0 through 6:

10⁰ = 1
10¹ = 10
10² = 100
10³ = 1,000
10⁴ = 10,000
10⁵ = 100,000
10⁶ = 1,000,000
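Such a list of powers can be generated in a couple of lines of Python; this is just an illustrative sketch, not anything you need in order to follow along:

```python
# Raising 10 to each power from 0 through 6
powers = {exponent: 10 ** exponent for exponent in range(7)}

for exponent, value in powers.items():
    print(f"10^{exponent} = {value:,}")

# 10^0 = 1
# 10^1 = 10
# ...
# 10^6 = 1,000,000
```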

Now that we’ve reviewed what it means to raise a number to a particular power, we can get back to logarithms. Remember that the log of a number is the power to which the log’s base must be raised to equal that number. So, to find the log₂ value of the number 8, we must determine the power of 2 (the log’s base) that is equal to 8. In other words, we must determine how many times 2 must be multiplied by itself to equal 8. Since 2¹ = 2 and 2² = 4 (i.e., 2 * 2 = 4) and 2³ = 8 (i.e., 2 * 2 * 2 = 8), we know that the log₂ of 8 is 3. Given this procedure, what is the log₁₀ value of 100? It is 2, for 10 must be raised to the power of 2 (i.e., 10 * 10) to equal 100. What’s the log₂ of 64? It is 6, for 2 must be raised to the power of 6 (i.e., 2 * 2 * 2 * 2 * 2 * 2) to equal 64.

So far, we’ve only dealt with logs that result in nice, round numbers, but that isn’t always the case. For example, what is the log₂ of 100? The log₂ of 64 is 6 and the log₂ of 128 is 7, so the log₂ of 100 is somewhere between 6 and 7. When expressed to eight decimal places, the log₂ of 100 is 6.64385619. What is the log₁₀ of 5? It must be less than 1, because 5 is less than 10. To nine decimal places, the answer is 0.698970004.
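If you’d like to verify such values yourself, Python’s standard math module computes logs in any base; here’s a quick sketch:

```python
import math

# The log of a number is the power to which the base must be raised
# to produce that number.
assert math.log2(8) == 3      # 2 * 2 * 2 = 8
assert math.log10(100) == 2   # 10 * 10 = 100
assert math.log2(64) == 6     # 2 multiplied by itself six times = 64

# Logs are rarely whole numbers.
print(round(math.log2(100), 8))  # 6.64385619 -- between 6 and 7
print(round(math.log10(5), 9))   # 0.698970004 -- less than 1
```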

Have you ever examined a list of the logarithms associated with an incremental sequence of numbers? Doing this is revealing. Here’s a list of the log₂ values for the numbers 1 through 32, with an additional column that shows the proportional relationship between log₂ values and the numbers on which they’re based:

Notice that, aside from the first few entries (the percentage rises from 0% for the number 1 to 50% for 2 and 52.832% for 3), as we read down the list, each log is a decreasing percentage of the number on which it is based. Keep this fact in mind. It will come in handy as we examine logarithmic scales and logarithmic growth.
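A table like this is easy to reproduce; here’s a quick sketch that prints each number, its log₂, and that log expressed as a percentage of the number:

```python
import math

# log2 of each number from 1 through 32, and that log expressed
# as a percentage of the number itself
for n in range(1, 33):
    log = math.log2(n)
    print(f"{n:>3}  {log:9.6f}  {log / n:8.3%}")
```

Run it and you’ll see the percentage column shrink as the numbers grow: 50.000% at 4, 37.500% at 8, and only 15.625% at 32.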

Logarithmic Scales

A logarithmic scale (a.k.a., log scale) is one in which equal distances along the scale correspond to equal logarithmic distances. Because of the nature of logarithms, each number that typically appears along the scale is a consistent multiple of the previous number. The example below includes a log₁₀ scale along the Y axis.

Along a log₁₀ scale, because the base is 10, each number is 10 times the previous number. The example above begins at 1, but it could begin at any number. For example, a log₁₀ scale could begin at 40 and continue with 400, 4,000, 40,000, and so on, each ten times the previous. A log₂ scale that begins with 1 would continue with 2, 4, 8, 16, and so on, each two times the previous. Unlike a linear scale in which the intervals from one number to the next are always equal in value, such as 0, 10, 20, 30, 40, etc., along a log scale the intervals (i.e., the quantitative distances between the labeled values) consistently increase in value, each time multiplied by the base.
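The reason such labels end up equally spaced is that a value’s position along a log scale is simply its logarithm. A quick sketch, assuming a log₁₀ scale that begins at 40:

```python
import math

# On a log10 scale, each value's position is its base-10 logarithm.
labels = [40, 400, 4_000, 40_000]
positions = [math.log10(v) for v in labels]

# Each tenfold step covers the same distance along the scale.
steps = [round(b - a, 10) for a, b in zip(positions, positions[1:])]
print(steps)  # [1.0, 1.0, 1.0]
```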

The numbers 1, 10, 100, 1,000, 10,000, 100,000, and 1,000,000 in the graph above correspond to logarithms with a base of 10, but those numbers are not themselves logarithms. Instead, they are the numbers from which the logarithms were derived. Here’s the scale that appears along the Y axis of the graph above, this time with the actual log₁₀ values 0 through 6 labeled in addition to the numbers 1 through 1,000,000 from which those logarithms were derived.

We usually label the log scales with the numbers from which the logarithms were derived rather than the logarithms themselves because the former are typically more familiar and useful.

Another characteristic of a log scale that reinforces its nature bears mentioning, which I’ll illustrate below by featuring only a single interval along the Y axis of the graph shown previously.

Notice that the minor tick marks between 1,000 and 10,000 get closer and closer together from bottom to top. This is easier to see if the scale is enlarged and the minor tick marks are labeled, as I’ve done below.

Each interval from one tick mark to the next (1,000 to 2,000, 2,000 to 3,000, etc.) consistently covers a numeric range of 1,000, but the spaces between the marks get smaller and smaller because the differences in the logarithms corresponding to those numbers get smaller and smaller. To illustrate this, I included a column of the log₁₀ values that correspond to each tick mark in the example above. The decreasing distances between the tick marks correspond precisely to the decreasing differences between the log values.
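You can compute those shrinking gaps directly: the distance between adjacent tick marks on a log scale is just the difference between their logs. A quick sketch:

```python
import math

# Minor tick marks between 1,000 and 10,000 on a log10 scale
ticks = range(1_000, 10_001, 1_000)
logs = [math.log10(t) for t in ticks]

# The gap between each pair of consecutive ticks keeps shrinking
gaps = [round(b - a, 4) for a, b in zip(logs, logs[1:])]
print(gaps)
# [0.301, 0.1761, 0.1249, 0.0969, 0.0792, 0.0669, 0.058, 0.0512, 0.0458]
```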

Logarithmic Growth

Because the numbers that typically appear as labels on a log scale are each a consistent multiple of the previous number, if you didn’t already understand logarithms, you might assume that logarithmic growth involves a series of values that are each a consistent multiple of the previous value. Here’s an example of how that might look as a series of values:
Day 1: 1
Day 2: 2
Day 3: 4
Day 4: 8
Day 5: 16
Day 6: 32
Day 7: 64

In this example, each daily value is double the previous value. This, however, is not an example of logarithmic growth. It is instead an example of exponential growth (a.k.a., compound growth). With exponential growth, the amount of increase from one value to the next is always greater. Compound interest earned on money in a savings account is an example of exponential growth. As the balance grows, even though the rate of interest remains constant, the amount of growth in dollars consistently increases because of the growing balance. For example, 10% interest on $100 (i.e., $10) would increase the balance to $110 during the first period; during the next period, the interest would be based on $110, resulting in $11 of interest, a dollar more. Even though the interest rate remains constant, because the balance grows from one period to the next, the amount of increase grows as well.
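The compound-interest arithmetic can be sketched in a few lines, assuming a 10% rate and a $100 starting balance:

```python
# Exponential (compound) growth: a constant rate applied to an
# ever-growing balance yields an ever-growing amount of interest.
balance = 100.0
rate = 0.10

for period in range(1, 4):
    interest = balance * rate
    balance += interest
    print(f"Period {period}: ${interest:.2f} interest, new balance ${balance:.2f}")

# Period 1: $10.00 interest, new balance $110.00
# Period 2: $11.00 interest, new balance $121.00
# Period 3: $12.10 interest, new balance $133.10
```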

Contrary to exponential growth, logarithmic growth (a.k.a., log growth) exhibits a continually decreasing amount of growth from one value to the next. In other words, it always grows, but it does so to a decreasing degree over time. A simple example is the height of a bullet shot straight up into the air, from the moment it leaves the gun until it reaches its apex, before beginning its descent. The bullet’s height starts off by increasing quickly, but those increases constantly decrease in amount from one interval of time to the next due to the pull of gravity.

So, how does log growth relate to log scales? It’s not at all obvious, is it? Good luck finding an explanation on the web that’s understandable if you’re not fluent in mathematics. Here’s a graphical example of log growth, based on the log₂ values for the numbers 1 through 64:

I’ve annotated this graph with lines connected to points in time when the logarithm has increased by a whole unit (i.e., from 0 to 1, 1 to 2, etc.). Starting on day 1, the log value is zero, and whole-unit increases are subsequently reached on days 2, 4, 8, 16, 32, and 64. Do you recognize this pattern of days along the X axis? It matches the numbers that would appear along a log₂ scale that begins with 1. In other words, the intervals between the days on which the logarithms increased by a whole unit consistently grew by a multiple of 2.
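Those annotated days can be found programmatically: they’re simply the days on which log₂ lands on a whole number, which are the powers of 2. A quick sketch:

```python
import math

# Days 1 through 64 on which log2 of the day is a whole number
whole_unit_days = [day for day in range(1, 65) if math.log2(day).is_integer()]
print(whole_unit_days)  # [1, 2, 4, 8, 16, 32, 64]

# Each such day is double the previous one -- exactly the labels
# that would appear along a log2 scale beginning at 1.
```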

Have you noticed that the pattern formed by log growth is the inverse of the pattern formed by exponential growth (i.e., each curve is the other mirrored across the diagonal, with the roles of the two axes swapped)? To illustrate this, the graph below displays three different patterns of growth: logarithmic, linear, and exponential.

This inverted relationship between patterns of logarithmic and exponential growth visually confirms the inverted relationship that exists between logarithms and the exponential powers that are used to produce them.

Given the nature of logarithms, what do you think would happen to the shape of the blue exponential line above if I changed the scale along the Y axis from linear to logarithmic? If your answer is that the blue line would now take on the shape of logarithmic growth similar to the orange line above, you’re thinking in the right direction, but you went too far. The nature of logarithms to progressively decrease in the amount that they grow from one value to the next would cancel out the nature of exponents to progressively increase in the amount that they grow from one value to the next, resulting in a linear pattern similar to the gray line in the graph above.
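That cancellation is easy to check numerically: take the log of each value in an exponentially growing series, and the results climb by a constant step, which is what a straight line means on a log scale. A quick sketch:

```python
import math

# An exponential series in which each value doubles
series = [2 ** n for n in range(8)]  # 1, 2, 4, 8, 16, 32, 64, 128

# Its logs grow linearly: a constant step of 1 from one value to the next
logs = [math.log2(value) for value in series]
steps = [b - a for a, b in zip(logs, logs[1:])]
print(steps)  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```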

I hope you agree that these concepts actually make sense when they’re explained with clear words and examples. You still might not have much use for logarithms unless your work involves advanced mathematics, but now you’re less likely to embarrass yourself by saying something dumb about them, as I’ve done on occasion.

Context Is for Kings

February 20th, 2020

In season 1, episode 3 of the television series “Star Trek: Discovery,” when faced with a particularly wicked problem the captain of the starship Discovery speaks these words: “Universal law is for lackeys; context is for kings.” I suspect that the writers of this show consciously crafted these words for quotability. They rise to the heights of wisdom that Star Trek occasionally reaches. When I heard these words, I quickly paused the show and ran to my computer to record them because they eloquently expressed an important truth that I’ve been teaching for many years. Simple rules can serve as guides for novices, but experts operate in the more subtle realm of context.

In my work in the field of data visualization, I teach many simple rules of thumb to encourage best practices, but I’m always careful to explain why these guidelines work. I encourage my students to root their decisions in a nuanced consideration of context, not in a simplistic algorithm. When you fully understand why good rules of thumb work well in general, you can identify specific situations when they don’t apply. In other words, you can break the rules when the situation demands it.

Good teachers help people think at the conceptual level, navigating nuance, not merely at the procedural level. We humans are capable of thinking that is more sophisticated than blind obedience to algorithms. Procedural knowledge (“If A happens, then do X; if B happens, then do Y; else do Z.”) exhibits little if any understanding. Conceptual knowledge, on the other hand, allows us to master context, the realm of kings. If you want to become an expert in data visualization (or an any other field), avoid teachers, books, and courses that say “Do it this way” without explaining why. Don’t settle for being a lackey when you can become a king.

Data Sensemaking, Science, and Atheism

January 13th, 2020

I’m an atheist. Despite the stigma that most Americans still attach to atheism, I embrace it without reservation. My perspective as an atheist is tightly coupled with my perspectives as an advocate of science and a data sensemaking professional. Atheism, science, and data sensemaking all embrace evidence and reason as the basis for understanding. All three shun beliefs that are not based on evidence and reason as the enemy of understanding.

I wasn’t always an atheist. Like most Americans born in the 1950s, I was raised as a Christian—in my case, a version of fundamentalist Protestantism known as Pentecostalism. Not satisfied with a nominal commitment to religion, I was that weird kid who carried his Bible with him to high school every day. While still a teenager, I felt called by God into the ministry and pursued that calling as my initial profession. Despite a genuine commitment, however, I sometimes felt a bit uneasy about my faith. From time to time, I was faced with facts and a sense of morality that conflicted with my faith. These conflicts became increasingly difficult to ignore. In my mid-20s, after many dark nights of the soul, I pulled out of the ministry and gradually abandoned my faith altogether while searching for a new foundation to build my life upon. Eventually, science became that foundation. The transition was painful, but also exciting. I went on to study religion in graduate school from an academic perspective (comparative religion, psychology of religion, sociology of religion, history of religion, etc.) because I wanted to better understand the powerful role of religion in people’s lives and in the world at large.

After leaving the ministry, I didn’t embrace atheism immediately. I first spent a few years exploring liberal expressions of religion (e.g., Unitarianism, the Society of Friends, and even Reform Judaism), hoping to find a like-minded community, but they all had something in common that never felt right to me. That something was faith. As I increasingly embraced science as the best path to understanding, I increasingly recognized faith as a problem. Faith delivers ready-made answers based on authority—end of story—but science encourages open-ended curiosity, continuous self-correction, and discovery.

For many years, I called myself an agnostic. Without reservation, I could say, “I don’t know if a god exists.” Since no one really knows, in a sense everyone is an agnostic, whether they admit it or not. At the time, I didn’t think of myself as an atheist because I misunderstood the term. I thought that an atheist was someone who claimed to know for sure that no gods exist. That isn’t the case. Agnosticism is an epistemological expression—it’s concerned with knowledge, or more precisely, with the lack of knowledge: “I don’t know.” Atheism, on the other hand, is an expression of belief, or more precisely, the absence of belief. An atheist says, “I don’t believe that any gods exist.” One can also embrace a slightly firmer version of atheism that declares “I believe that no gods exist.” Either way, atheism does not claim “I know for sure that no gods exist.” Agnosticism and atheism represent the same epistemological perspective: “I don’t know if any gods exist.” Atheism just goes one step further by extending a lack of knowledge to the realm of belief.

Science resists certainty; it deals in probabilities. Based on the available evidence, something is either likely or unlikely to a statistically calculated degree. I lack belief in gods because I’m not aware of any evidence for their existence. During my years as a Christian, I accepted the Christian god’s existence as a matter of faith. At the time, I made this leap to make sense of the world, but I no longer need faith in a god to make sense of the world or my role in it. If evidence for a god’s existence ever emerges, I’ll reconsider my position.

It bears mentioning that, just as everyone is in a sense an agnostic, whether they realize it or not, everyone is also an atheist. Even if you’re a religious fundamentalist, as I was, you’re also an atheist. This is because, while you believe in your god, you don’t believe in other gods. In other words, in respect to most of the gods that people believe in—all but your own—you’re an atheist. In this respect, you and I are a lot alike. We only differ in that I include one more god on my list of those that I don’t believe in.

Religions codify faith-based beliefs. They declare what is true about the world, about humans, about our role in the world, and, of course, about the role of supernatural beings. They do so without evidence. Faith discourages curiosity and the search for truth. As Richard Dawkins wrote, “One of the truly bad effects of religion is that it teaches us that it is a virtue to be satisfied with not understanding” (The God Delusion, 2008, page 126). As a data sensemaking professional, my commitment to reason and evidence as the basis for understanding puts me at odds with faith.

We can thank the late Harvard evolutionary biologist Stephen Jay Gould for the conceptual basis on which many scientists and data sensemakers who are also religious reconcile these conflicting perspectives. Gould proposed that science and religion occupy two “Nonoverlapping Magisteria.” I admire Gould’s work greatly. He was a marvelous scientist who did a great deal to popularize science, but I find this awkward construction intolerable. According to Gould, science has its domain, religion has its domain, and the two don’t overlap. Furthermore, these two domains should respect one another and consistently stick to their own distinct areas of expertise. As explained by Adam Neiblum in his book Unexceptional when describing Gould’s position:

Each magisteria has its own epistemic foundation, each fulfilling a different role in human needs and affairs. Science, epistemologically founded on empirical observation, evidence, data and reason, necessarily deals with facts about the world, while religion, epistemologically founded on personal revelation and faith, deals with values and morality, which have nothing to do with matters of fact about the world. (Unexceptional: Darwin, Atheism and Humanity, Adam Neiblum, 2017, p. 166)

Since the emergence of modern science, science and religion have co-existed uncomfortably. I suspect that Gould wanted to make science more palatable for religious folks—the majority of Americans—so he separated science and religion into exclusive, non-competing realms.

According to a Pew Research Center survey of scientists (specifically members of the American Association for the Advancement of Science), only 33% believe in a god and over 40% identify themselves as atheists or agnostics (“Scientists and Belief,” Pew Research Center, November 5, 2009, www.pewforum.org/2009/11/05/scientists-and-belief). This is extraordinary in light of the following statistics, also reported by Pew:

The vast majority of Americans (90%) believe in some kind of higher power, with 56% professing faith in God as described in the Bible and another 33% saying they believe in another type of higher power or spiritual force. Only one-in-ten Americans say they don’t believe in God or a higher power of any kind. (“Key Findings about Americans’ Belief in God,” Pew Research Center, April 25, 2018, www.pewresearch.org/fact-tank/2018/04/25/key-findings-about-americans-belief-in-god/)

The correlation between scientific work and atheism, while extraordinary, is not surprising. Pursuit of science is not necessarily responsible for a lack of theistic belief, but my own exposure to science definitely influenced my departure from theism, and to a great degree.

There is a fundamental problem with Gould’s concept of Nonoverlapping Magisteria: it isn’t scientific. Science is definitely concerned with religion’s claims that the world was created by a god and that supernatural entities (gods, angels, spirits, demons, leprechauns, dead people, etc.) continue to intervene in the world’s affairs. Religion is definitely agitated by the fact that more and more of its territory is being reduced by scientific discoveries. This conflict cannot be defined out of existence. To do so defies the tenets and methods of science.

I reject the notion that morality is the rightful and exclusive domain of religion. Morality does not require religion. To say that it does makes morality an obligation that’s imposed on us by an external authority rather than a personal choice. I am no less moral as an atheist than I was as a Christian. Actually, I am more moral, for my behavior is based entirely on a personal sense of good behavior, never on a belief that I must behave in certain ways because a god demands it. When I was religious, my morality was governed, at least in part, by fear. You don’t dare piss off a god.

As religions develop, they codify morality in various ways, but they don’t create it. Morality began to evolve in social animals before the emergence of Homo sapiens. Certain ways of behaving towards others naturally evolved as moral instincts in all social animals, not just humans. Altruism, justice, and fairness are exhibited quite naturally in our species and in several others as well.

If you’re a scientist, or similarly, if you’re a data sensemaking professional, and you’re also religious, you must come to grips with the conflict that exists between these perspectives. You must divide your life, as Gould proposed, into two distinct realms. You can’t allow your willingness to accept things on faith to influence your work. Professionally, you must always go where the evidence leads you. If you do this successfully in your work, it may become increasingly difficult to do otherwise in your personal life.

Despite the stigma about atheism that still persists, an increasing number of people embrace it as a reasonable position. This is especially true among the young. They are less militant about it than my generation, however. Unlike my generation, many of them haven’t needed to claw their way out of religion. Atheism simply makes sense to them and has from an early age.

As with almost everything that I write about in this blog, this article was prompted by a particular event. Not long ago I was approached by the business school of a nearby religiously affiliated college to help them put together a data analytics program, and potentially, to also teach in the program, so I reviewed their website to find out just how religious they were. Despite the fact that members of the denomination that founded and runs this college are often quite liberal and known for their work as social activists, I found that this college is quite fundamentalist in its statement of faith. Here it is, word for word:

The Trinity
We believe in one eternal God, the source and goal of life, who exists as three persons in the Trinity: the Father, the Son, and the Holy Spirit. In love and joy, God creates and sustains the universe, including humanity, male and female, who are made in God’s image.
God the Father
We believe in God the Father Almighty, whose love is the foundation of salvation and righteous judgment, and who calls us into covenant relationship with God and with one another.
God the Son
We believe in Jesus Christ, the Word, who is fully God and fully human. He came to show us God and perfect humanity, and, through his life, death, and resurrection, to reconcile us to God. He is now actively present with us as Savior, Teacher, Lord, Healer, and Friend.
God the Holy Spirit
We believe in the Holy Spirit, who breathed God’s message into the prophets and apostles, opens our eyes to God’s Truth in Jesus Christ, empowers us for holy living, and carries on in us the work of salvation.
Salvation
We believe that salvation comes through Jesus Christ alone, to whom we must respond with repentance, faith, and obedience. Through Christ we come into a right relationship with God, our sins are forgiven, and we receive eternal life.
The Bible
We believe that God inspired the Bible and has given it to us as the uniquely authoritative, written guide for Christian living and thinking. As illumined by the Holy Spirit, the Scriptures are true and reliable. They point us to God, guide our lives, and nurture us toward spiritual maturity.
The Christian Life
We believe that God has called us to be and to make disciples of Jesus Christ and to be God’s agents of love and reconciliation in the world. In keeping with the teaching of Jesus, we work to oppose violence and war, and we seek peace and justice in human relationships and social structures.
The Church
We believe in the church as the people of God, composed of all who believe in Jesus Christ, who support and equip each other through worship, teaching, and accountability, who model God’s loving community, and who proclaim the gospel to the world.
Christian Worship
We believe Christ is present as we gather in his name, seeking to worship in spirit and in truth. All believers are joined in the one body of Christ, are baptized by the Spirit, and live in Christ’s abiding presence. Christian baptism and communion are spiritual realities, and, as Christians from many faith traditions, we celebrate these in different ways.
The Future
We believe in the personal return of Jesus Christ, in the resurrection of the dead, in God’s judgment of all persons with perfect justice and mercy, and in eternal reward and punishment. Ultimately, Christ’s kingdom will be victorious over all evil, and the faithful will reign.

Wow. This is an incredible statement of faith. I mean this quite literally: it isn’t credible. Not a shred of verifiable evidence exists for any of these assertions, but faculty members at this college must affirm these articles of faith in writing. Obviously, I can’t make this affirmation. I pointed this problem out to my contact at the college and she suggested a way to get around it. I didn’t want to offend her, but I had to make it clear that I could not affiliate myself with a faith-based religious organization, even if they allowed it. What I didn’t say was that I found the college’s statement of faith frightening. It brought back vivid memories of the faith-based beliefs that I fought hard to abandon when I was young.

This encounter left me wondering how people of faith manage to avoid the pitfalls of this orientation when making sense of data or doing science. The cognitive dissonance must be exhausting. I have no desire to offend anyone who navigates this tension, but I am genuinely concerned that it affects the work. Faith primes us to accept certain things as true, without question, regardless of the evidence. This is never a good approach to data sensemaking or science. The magisteria definitely overlap. The conflict is real. If you’re religious and also a scientist or a data sensemaker, you must navigate these conflicting perspectives with care. I, for one, couldn’t do it.

Linear Versus Logarithmic Thinking about Numbers

December 26th, 2019

Some folks argue that humans intuitively think about numbers logarithmically rather than linearly. My experience strongly suggests that this is not the case. If you’ve ever tried to explain logarithms or logarithmic scales to people, or asked them to interpret graphs with logarithmic scales, as I have often done, you probably share my belief that logarithms are not cognitively intuitive. The behaviors that are sometimes described as intuitive logarithmic thinking about numbers can be reasonably explained as something else entirely.

According to some sources, a research study found that six-year-old children, when asked to identify the number that falls halfway between one and nine, often selected three. Unfortunately, after extensive searching I cannot find a study that actually performed this particular experiment. One article that makes this claim cites a study titled “A Framework for Bayesian Optimality of Psychophysical Laws” as the source, but that study does not mention this particular experiment or finding. Instead, it addresses the logarithmic nature of perception, especially auditory perception. Keep in mind that perception and cognition are related but different. Many aspects of perception do indeed appear to be logarithmic. As the authors of the study mentioned above observed about auditory perception, “…under the Weber–Fechner law, a multiplicative increase in stimulus intensity leads to an additive increase in perceived intensity,” but that’s a different matter. I’m talking about cognition. Even if many kids actually did select three as halfway between one and nine in an experiment, I doubt that they were thinking logarithmically. At age six, children have not yet learned to think quantitatively beyond a rudimentary understanding of numbers. Until they begin to learn mathematics, children tend to think with a limited set of numbers consisting of one, two, three, and more, which corresponds to the preattentive perception of numerosity that is built into our brains. With this limited understanding, three is the largest number that they identify individually, so it might be natural for them to select three as the value that falls halfway between one and numbers that are larger than three. If the numbers were displayed linearly and in sequence for the children to see when asked to select the number in the middle (illustrated below), however, I suspect that they would correctly select five.

1   2   3   4   5   6   7   8   9

You might argue that this works simply because it allows children to rely on spatial reasoning to identify the middle number. That is absolutely true. We intentionally take advantage of spatial reasoning when introducing several basic concepts of mathematics to children. This works as a handy conceptual device to kickstart quantitative reasoning. Believing that children naturally think logarithmically would lead us to predict that, if asked to identify the number halfway between 1 and 100, they would be inclined to choose 10. Somehow, I doubt that they would.
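Incidentally, the “logarithmic” answer to a midpoint question is the geometric mean of the two endpoints, not the arithmetic mean. A minimal Python sketch (my own illustration, not something taken from any study) makes the contrast concrete:

```python
import math

def geometric_midpoint(low, high):
    # The midpoint on a logarithmic scale is the geometric mean.
    return math.sqrt(low * high)

def linear_midpoint(low, high):
    # The midpoint on a linear scale is the arithmetic mean.
    return (low + high) / 2

print(geometric_midpoint(1, 9))    # 3.0 -- the "logarithmic" answer
print(linear_midpoint(1, 9))       # 5.0 -- the linear answer
print(geometric_midpoint(1, 100))  # 10.0 -- not 50.5
```

Note that the logarithmic midpoint of 1 and 100 is 10, which is exactly the prediction discussed above.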

Another research-based example that has been used to affirm the intuitive nature of logarithmic thinking about numbers is the fact that people tend to think of the difference between the numbers one and two as greater than the difference between the numbers eight and nine. I suspect that they do this, however, not because they’re thinking logarithmically, but more simply because they’re thinking in terms of relative magnitude (i.e., proportions). Even though the incremental difference between both pairs of numbers is a value of one (i.e., 2 – 1 = 1 and 9 – 8 = 1), the number two represents twice the magnitude of one while the number nine is only 12.5% greater than eight, a significantly lesser proportion. I anticipate that some of you who are mathematically inclined might object: “But logarithmic thinking and proportional thinking are one and the same.” Actually, this is not the case. While logarithms always involve proportions, not all proportions involve logarithms. A logarithmic scale involves a consistent proportional sequence. For example, with a log base 10 scale (i.e., log10), each number along the scale is ten times the previous number. Only when we think of a sequence of numbers in which each number exhibits a consistent proportion relative to the previous number are we thinking logarithmically. We do not appear to do that naturally.

Another example, occasionally cited, is that people tend to think of differences between one thousand, one million, one billion, one trillion, etc., as equal when in fact each of these numbers is 1,000 times the previous. Is this because people are thinking logarithmically? I doubt it. I suspect that it is simply because each of these values exhibits the next change in the label (e.g., from the label “thousand” to the label “million”), and changes in the labels suggest equal distances. If people intuitively thought about numbers logarithmically, they would automatically recognize that each of these values (one billion versus one million versus one thousand, etc.) is 1,000 times the previous, but most of us don’t realize this fundamental fact about our decimal system without first doing the math.
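For those inclined to actually do the math, a short Python sketch (my own illustration) shows that these values are equally spaced only after taking their logarithms:

```python
import math

values = [1_000, 1_000_000, 1_000_000_000, 1_000_000_000_000]

# On a linear scale, the gaps between consecutive values are wildly unequal.
linear_gaps = [b - a for a, b in zip(values, values[1:])]
print(linear_gaps)  # [999000, 999000000, 999000000000]

# On a log10 scale, however, every gap is 3, because each value is
# 10^3 (i.e., 1,000) times the previous one.
log_gaps = [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]
print(log_gaps)
```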

Along linear scales, increments from one value to the next are determined by addition—you always add a particular value to the previous value to produce the next value in the sequence, such as by adding a value of one to produce the scale 0, 1, 2, 3, 4, etc. or a value of ten to produce the scale 0, 10, 20, 30, 40, etc. Along logarithmic scales, on the other hand, increments are determined by multiplication—you always multiply the previous value by a particular number to produce the next value in the sequence, such as by multiplying each value by two to produce the scale 1, 2, 4, 8, etc., or by ten to produce the scale 1, 10, 100, 1,000, etc. The concept of logarithms, when clearly explained, is not difficult to understand once you’ve learned the mathematical concept of multiplication, but thinking about numbers logarithmically does not appear to be intuitive. It takes training.
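The additive-versus-multiplicative distinction above is easy to express in code. Here’s a brief Python sketch (my own, purely illustrative) that builds a linear scale by repeated addition and a logarithmic scale by repeated multiplication:

```python
def linear_scale(start, step, count):
    # Each value is the previous value plus a constant step.
    values = [start]
    for _ in range(count - 1):
        values.append(values[-1] + step)
    return values

def logarithmic_scale(start, factor, count):
    # Each value is the previous value times a constant factor.
    values = [start]
    for _ in range(count - 1):
        values.append(values[-1] * factor)
    return values

print(linear_scale(0, 10, 5))       # [0, 10, 20, 30, 40]
print(logarithmic_scale(1, 2, 4))   # [1, 2, 4, 8]
print(logarithmic_scale(1, 10, 4))  # [1, 10, 100, 1000]
```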

Repeat Due to Pathology

October 31st, 2019

Automated information systems only work if they actually inform and do so clearly. Too often, however, they create confusion. This was not what we had in mind when I and others created some of the earliest automated information systems back in the 1980s, when the personal computer began its rapid and thorough takeover of the workplace.

Back then, I was starry-eyed, convinced that everything imaginable should be automated using computers. Unfortunately, I and my colleagues at the time rarely, if ever, questioned the merits of automation. We were having too much fun replacing old manual processes with new automated systems. We were rock stars! We were convinced that those new systems could only do good. My oh my, were we mistaken. Not everything benefits from automation, and even good candidates become counter-productive when they’re poorly designed. Choosing good candidates for automation and then building systems that do the job well takes time and care—two rare ingredients in a “move fast and break things” IT culture.

The most recent reminder of this problem arrived in the form of an email from my health plan yesterday. The email informed me that a new test result was available through the plan’s web-based information system, called MyChart. I assumed that the test result was related to the colonoscopy that I endured the previous week. To put things in perspective, the first time that I had a colonoscopy, the doctor perforated my colon, which landed me in the hospital facing potentially dire consequences. So, as you might imagine, I dread colonoscopies even more than most people.

When I opened the test result in MyChart, it was indeed related to my recent colonoscopy. Here’s what I found:

Other than the date, which matched the date of the procedure, nothing else in this so-called test result made sense to me.

  • What does “Colonoscopy Impression, External” mean? Nothing about the procedure was external.
  • Who is this person identified as “Historical Provider, MD”? My doctor had a name.
  • This was identified as a “Final result,” but I didn’t know that I was awaiting further results. Before leaving the doctor’s office, I thought I was given a full account of the doctor’s findings both verbally and in writing.
  • Most alarmingly, what does a “Your Value” of “repeat based on pathology” mean? Did I have to go through this again? Why? What was wrong?
  • And, to top it all off, I couldn’t tell how the “repeat based on pathology” value compared to the “Standard Range” (i.e., a normal result), because it was blank.

In a panic, I clicked on the “About this Test” icon in the upper-right corner, hoping for an explanation, but none was provided.

The stupidity of this automated system not only produced a panic, it also led me to contact an actual human to resolve the confusion. In other words, a system that was supposed to reduce the work of humans actually added to it, which happens all too often. The human that I contacted, a friendly woman named Beth, didn’t understand what “repeat based on pathology” meant any more than I did, but she was able to access a letter that was placed in the mail to me yesterday, which provided an answer. As it turns out, because a single polyp was found and removed during the procedure, I’m at greater risk than most people of future polyps that could become malignant, so I should have another colonoscopy in five years. What a relief.

Could the test result that was posted to MyChart have provided clear and useful information? Absolutely, but it didn’t, and this wasn’t the first time. I had a similar experience a few months ago while reviewing the results posted in MyChart of a lengthy blood panel. On that occasion, I had to get my doctor on the phone to interpret several obscure lab results.

Information technologies are not a panacea. They aren’t useful for everything, and when they are useful, they must be well designed. Otherwise, they complicate our lives.

The Myth of Technology’s Neutrality

August 5th, 2019

In a recent op-ed that appeared in the New York Times, tech billionaire Peter Thiel said this about artificial intelligence (AI):

The first users of the machine learning tools being created today will be generals…AI is a military technology.

Thiel’s opinion has created a backlash among some of AI’s proponents who don’t want their baby to be cast in this light. There is no doubt, however, that AI will be used for military purposes. It’s already happening. But that’s not the topic that I want to address in this blog post. Instead, I want to warn against a dangerous belief that is prevalent among many digital technologists. In a response to Thiel’s op-ed, Dawn Song, a computer science professor at the University of California, Berkeley, who works in the Berkeley Artificial Intelligence Research Lab, told Business Insider,

I don’t think we can say AI is a military technology. AI, machine learning technology is just like any other technologies. Technology itself is neutral.

According to Business Insider, Song went on to say that, just like nuclear or security encryption technologies, AI can be used in either good ways or bad. Basing her claim that technology is neutral on the fact that it can be used in both good and bad ways is a logical error. Anything can be used in both good and bad ways, but that doesn’t make everything neutral. Digital technologies in particular are never neutral. That’s because they are created by people and people are never neutral. Digital technologies consist of programs that are written by people with assumptions, interests, biases, beliefs, perspectives, and agendas that become embedded in those programs.

Not only are digital technologies not neutral, they shouldn’t be. They should be designed to do good and to prevent harm. If the potential for harm exists, technologies should be designed to prevent it. If the potential for harm is great and it cannot be prevented, the technology should not be developed. That’s right. Creating something that does great harm is immoral. This is especially true of AI, for its potential for harm is enormous. If general AI—a truly sentient machine—were ever developed, that machine would not only exhibit the non-neutral objectives that were programmed into it, it would soon develop its own interests and objectives that might be quite different from those of humanity. At that point, we would be faced with a silicon-based competitor that could work at speeds that would leave us in the dust. Our puny interests probably wouldn’t count for much. Could we create a superintelligent AI that would respect our interests? At this point, we don’t know.

Fortunately, some of the folks who are positioned at the forefront of AI research recognize its great potential for harm and are working fervently and thoughtfully to prevent this from happening. They are painfully aware of the fact that this might not be possible. Unfortunately, however, there are probably even more people working in AI who exhibit the same naivete as Dawn Song. Believing that AI is neutral is a convenient way of relinquishing responsibility for the results of their work. Look at the many ways that digital technologies are being used for harm today and ask yourself, was this the result of neutrality? No, those behaviors were either intentionally designed into the products and services or were the result of negligence. There is a great risk that harmful behaviors would develop within AI that were neither anticipated nor intended. The claim that digital technologies in general and AI in particular are neutral should concern us. Technologies are human creations. We must take responsibility for them. The cost of not taking responsibility is too high. Sometimes this means that we must prevent particular technologies from ever being developed. Whether or not this is true of general AI has yet to be determined. While the jury is still out, I’d like the jury to be composed of people who are working hard to understand the costs and to take them seriously, not people who naively believe in technology’s neutrality.

Ethical Data Sensemaking

July 22nd, 2019

Simply stated, data sensemaking is what we do to make sense of data. We do this in an attempt to understand the world, based on empirical evidence. Those who work to make sense of data and communicate their findings are data sensemakers. Data sensemaking, as a profession, is currently associated with several job titles, including data analyst, business intelligence professional, statistician, and data scientist. Helping people understand the world based on data is important work. Without understanding, we often make bad decisions. When done well, data sensemaking requires a broad and deep set of skills and a commitment to ethical conduct. When data sensemaking professionals fail to do their jobs well, whether through a lack of skills or through ethical misconduct, confusion and misinformation result, which encourages bad decisions—decisions that do harm. Making sense of data is not ethically or morally neutral; it can be done for good or ill. “I did what I was told” is not a valid excuse for unethical behavior.

In recent years, misuses of data have led to a great deal of discussion about ethics related to invasions of privacy and discriminatory uses of data. Most of these discussions focus on the creation and use of analytical algorithms. I’d like to extend the list of ethical considerations to address the full range of data sensemaking activities. The list of ethical practices that I’m proposing below is neither complete nor sufficiently organized nor fully described. I offer it only as an initial effort that we can discuss, expand, and clarify. Once we’ve done that, we can circle back and refine the work.

The ethical practices that can serve as a code of conduct for data sensemaking professionals are, in my opinion, built upon a single fundamental principle. It is the same principle that medical doctors swear as an oath before becoming licensed: Do no harm.

Here’s the list:

  1. You should work, not just to provide information, but to enable understanding that can be used in beneficial ways.
  2. You should develop the full range of skills that are needed to do the work of data sensemaking effectively. Training in a data analysis tool is not sufficient. This suggests the need for an agreed-upon set of skills for data sensemaking.
  3. You should understand the relevant domain. For instance, if you’re doing sales analysis, you should understand the sales process as well as the sales objectives of your organization. When you don’t understand the domain well enough, you must involve those who do.
  4. You should know your audience (i.e., your clients; those who are asking you to do the work)—their interests, beliefs, values, assumptions, biases, and objectives—in part to identify potentially unethical inclinations.
  5. You should understand the purpose for which your work will be used. In other words, you should ask “Why?”.
  6. You should strive to anticipate the ways in which your findings could be used for harm.
  7. When asked to do something harmful, you should say “No.” Furthermore, you should also discourage others from doing harm.
  8. When you discover harmful uses of data, you should challenge them, and if they persist, you should expose them to those who can potentially end them.
  9. You should primarily serve the needs of those who will be affected by your work, which is not necessarily those who have asked you to do the work.
  10. You should not examine data that you or your client have no right to examine. This includes private data that you have not received explicit permission to examine. To do this, you must acquaint yourself with data privacy laws, but you should not limit your concern to data that has been legally deemed private; if it seems reasonable that data should be considered private, treat it as private nonetheless.
  11. You should not do work that will result in the unfair and discriminatory treatment of particular groups of people based on race, ethnicity, gender, religion, age, etc.
  12. If you cannot enable the understanding that’s needed with the data that’s available, you should point this out, identify what’s needed, and do what you can to acquire it.
  13. If the quality of the data that’s available is insufficient for the data sensemaking task, you should point this out, describe what’s lacking, and insist that the data’s quality be improved to the level that’s required before proceeding.
  14. You should always examine data within context.
  15. You should always examine data from all potentially relevant perspectives.
  16. You should present your findings clearly.
  17. You should present your findings as comprehensively as necessary to enable the level of understanding that’s needed.
  18. You should present your findings truthfully.
  19. You should describe the uncertainty of your findings.
  20. You should report any limitations that might have had an effect on the validity of your findings.
  21. You should confirm that your audience understands your findings.
  22. You should solicit feedback during the data sensemaking process and invite others to critique your findings.
  23. You should document the steps that you took, including the statistics that you used, and maintain the data that you produced during the course of your work. This will make it possible for others to review your work and for you to reexamine your findings at a later date.
  24. When you’re asked to do work that doesn’t make sense or to do it in a way that doesn’t make sense (i.e., in ways that are ineffective), you should propose an alternative that does make sense and insist on it.
  25. When people telegraph what they expect you to find in the data, you should do your best to ignore those expectations or to subject them to scrutiny.
As data sensemakers, we stand at the gates of understanding. Ethically, it is our job to serve as gatekeepers. In many cases, we will be the only defense against harm.

I invite you to propose additions to this list and to discuss the merits of the practices that I’ve proposed. If you are part of an organization that employs other data sensemakers, I also invite you to discuss the ethical dimensions of your work with one another.

The Inflated Role of Storytelling

July 14th, 2019

People increasingly claim that the best and perhaps only way to convince someone of something involves telling them a story. In his new book Ruined By Design—a book that I largely agree with and fully appreciate—designer Mike Monteiro says that “If you’re not persuading people, you’re not telling a good enough story.” Furthermore, “…while you should absolutely include the data in your approach, recognize that when you get to the point where you’re trying to persuade someone…, you need a story.” Really? Where’s the evidence for this claim? On what empirical research is it based? And what the hell is a story, anyway? Can you only persuade people by constructing a narrative—a presentation that has a beginning, middle, and end, with characters and plot, tension and resolution? In truth, stories are only one of several ways that we can persuade. In some cases, a simple photograph might do the trick. A gesture, such as a look of anger or a raised fist, sometimes works. A single sentence or a chart might do the job. Even a straightforward, unembellished presentation of the facts will sometimes work. The notion that stories are needed to convince people is itself a story—a myth—nothing more.

It reminds me of the silly notion that people only use 10% of their brains, which someone fabricated long ago from thin air and others have since quoted without ever checking the facts. This notion is absurd. If we used only 10% of our brains, the other 90% would wither and die. Stories are not the exclusive path to persuasion. Not everyone can be convinced in the same way and most people can be convinced in various ways, depending on the circumstances. While potentially powerful and useful, the role of stories is overblown.

One of the common errors that people sometimes make when promoting the power of stories is the notion that stories work because they appeal to emotions. For example, Monteiro wrote that “…people don’t make decisions based on data; they make them based on feelings.” This is the foundation for his rationale that stories are the only path to persuasion. Stories can certainly appeal to emotions, but stories can also present facts without any emotional content whatsoever. We all, no matter how rational, are subject to emotion, but not exclusively so. Stories structure information in narrative form and those narratives can appeal to emotions, to the rational mind, or both. In other words, saying that stories are powerful is not the same as saying that appeals to people’s feelings are powerful.

Don’t get me wrong, stories are great; they’re just not the panacea that many people now claim. The current emphasis on storytelling is a fad. In time, it will fade. In time, some of the people who promote stories to the exclusion of other forms of communication will look back with embarrassment. No matter what they claim, no one actually believes that only stories can convince people. No one exclusively uses stories to persuade. We all use multiple means and that’s as it should be. The sooner we get over this nonsense that only stories can persuade, the sooner we can get on to the real task of presenting truths that matter in all the ways that work.

Breadth Before Depth

June 25th, 2019

I’ve long recognized the value of broad experience, education, and interests. Such breadth enables us to see the world from multiple perspectives and to connect ideas from multiple domains. I’ve always felt that my own meandering path through multiple areas of study, interest, and work has allowed me to think in ways that a narrow path would have never produced. Deep experience and study are valuable as well, but without breadth, depth breeds myopia. Given this notion, I was thrilled to find a new book that articulates this case eloquently and backs it with a wealth of evidence. The book, written by David Epstein, is titled Range: Why Generalists Triumph in a Specialized World.

It is, in my opinion, the most important book about thinking, learning, and problem solving since Daniel Kahneman’s book Thinking, Fast and Slow.

Back in 2008, when Malcolm Gladwell wrote the book Outliers, he promoted the value of narrow, repetitive, and extensive training. Gladwell highlighted the notion that genuine expertise in any endeavor requires around 10,000 hours of focused training. While it is true that some areas of endeavor can be mastered through extensive repetition of specific tasks (e.g., learning to play golf or chess), many others cannot. As it turns out, the notion that people learn best if they pick a specific area of endeavor when they’re young and stick to it with unflagging commitment and discipline is not a model that works in most cases. The skills that can be developed in this way are rather isolated. In fact, by gaining broad experience—generalizing—we can learn to think in ways that are more flexible and better able to fathom complexities. This is an important insight, for the world in which we live today is increasingly complex. The cross-fertilization of ideas that is nurtured by generalization prepares us to deal with modern challenges. You might find it interesting to note what Gladwell thinks of Epstein’s new book:

For reasons I cannot explain, David Epstein manages to make me thoroughly enjoy the experience of being told that everything I thought about something was wrong. I loved Range.

This is a classy admission by someone who, as a generalist himself, is well positioned to recognize flaws in his former thesis.

Fairly early in the book, Epstein writes:

The challenge we all face is how to maintain the benefits of breadth, diverse experience, interdisciplinary thinking, and delayed concentration in a world that increasingly incentivizes, even demands, hyperspecialization. While it is undoubtedly true that there are areas that require individuals with…precocity and clarity of purpose, as complexity increases—as technology spins the world into vaster webs of interconnected systems in which each individual only sees a small part—we also need more…people who start broad and embrace diverse experiences and perspectives while they progress. People with range.

There are “kind” learning environments, in which “patterns repeat over and over, and feedback is extremely accurate and usually very rapid” (e.g., golf and chess). “The learning environment is kind because a learner improves simply by engaging in the activity and trying to do better.” There are also “wicked” domains, in which “the rules of the game are often unclear or incomplete, there may or may not be repetitive patterns and they may not be obvious, and feedback is often delayed, inaccurate, or both.” To an increasing extent, the modern world is not kind. To navigate it successfully, we need range. Computers are great at handling kind environments, hence the growing success of narrow AI. We humans, however, assuming that we cultivate the abstract and multifaceted thinking that our brains have evolved to handle, are much better at handling the wicked problems that pose our greatest challenges today.

“AI systems are like savants.” They need stable structures and narrow worlds.

When we know the rules and answers, and they don’t change over time—chess, golf, playing classical music—an argument can be made for savant-like hyperspecialization practice from day one. But those are poor models of most things humans want to learn.

Our educational system is not doing a good job of preparing future generations for the increasingly wicked world in which they will live, and employers often fail to recognize the benefits of generalization. This needs to change. David Epstein does a great job of explaining why and suggesting some of the ways to make this happen. It is never too late to broaden your horizons.

Bullshit about Bullshitters and other Misadventures in Social Science

May 20th, 2019

I recently came across a news story about a social science research study that caught my attention. How could I resist a story about bullshitters? According to the study, titled “Bullshitters. Who Are They and What Do We Know about Their Lives?”, this is “an important new area of social science research.” Reviewing the research paper revealed more about problems in social science research, however, than anything meaningful and useful about bullshit, bullshitting, or bullshitters. In this blog post, I’ll describe a few of these problems.

A Useless Definition

The researchers defined “bullshitters” as “individuals who claim knowledge or expertise in an area where they actually have little experience or skill.” If you read the study, however, you will find that this does not accurately describe the behavior that they examined. A more accurate and specific description would state that bullshitters are “people who claim, for any reason, to be familiar with and perhaps even understand concepts that don’t actually exist.” The study is based on the responses of 15-year-old students in English-speaking countries to questions about three bogus mathematical concepts that they answered while taking the Programme for International Student Assessment (PISA) exam. According to this study, students who claim knowledge of a bogus mathematical concept, for whatever reason, are bullshitters. This, however, is not what people typically mean by the term. Typically, we think of bullshitters as people who make shit up, not as people who simply make mistakes, but the authors didn’t make this distinction. If you turn to someone and ask, “Are you bullshitting me?” you are asking if they intentionally fabricated or exaggerated what they just told you. Bullshitting involves intention. The act of intentionally claiming expertise that you lack to inflate your worth in the eyes of others is indeed a behavior that could be studied, but the mixture of intentional deception and unintentional error does not qualify as a single specific behavior.

Why did the researchers define bullshitters as they did? I suspect it is because they couldn’t determine the difference between intentional deceit and confusion about the bogus mathematical concepts. Defining bullshitters as they did, however convenient, produced a useless study. What can we possibly do with the results? Unfortunately, many social science research studies fall into this category. In part, this is a result of the current myopic emphasis in academia on publication. To get ahead as an academic in a research-oriented discipline, you must publish, publish, publish. For individuals, getting published, and for academic institutions, having published studies cited in other publications, is more valuable than useful research. This is a travesty.

Unreliable Measures and Questionable Statistics

By reviewing many social science research studies over the years, I’ve learned that you should take their claims with a grain of salt until you examine the work carefully. To do this, you must not only read the papers closely, you must also examine the data on which the research was based, including the ways the data was manipulated. By “manipulated,” I don’t mean that the researchers intentionally screwed with the data to support particular conclusions, although this does occur, but merely that they produced their own data from the original data on which the research was based, usually by means of statistical operations (e.g., statistical models of various types) that rely on assumptions. To take research conclusions seriously, we must confirm that the data, the statistical models, and the assumptions on which they are based are all valid and reliable. When researchers don’t provide us with the means to validate their data, we should never accept their conclusions on faith. In my opinion, studies that don’t clearly describe the data on which their findings are based and don’t make that data readily available for inspection don’t qualify as legitimate science.

Social science is challenged by the fact that it often cannot directly measure the phenomenon that it seeks to understand. For example, you cannot place a human subject into a machine that’s capable of measuring their bullshitting behavior. You’re forced to use a proxy—that is, to measure something that you believe is closely related and representative—as the best means available. In this particular study, the researchers chose to treat students’ answers to questions about three bogus mathematical concepts as their proxy for bullshitting.

While taking the PISA exam, students were asked about a series of sixteen mathematical concepts, including three bogus concepts—“Proper Number,” “Subjunctive Scaling,” and “Declarative Fraction”—and for each they were asked to select from the following list the response that best described their familiarity with the concept:

    1. Never heard of it
    2. Heard of it once or twice
    3. Heard of it a few times
    4. Heard of it often
    5. Know it well, understand the concept

These five potential responses comprise something called a Likert scale. The items are supposed to represent the full range of possible responses. Another more typical set of Likert items that often appears in questionnaires asks people to assess the merits of something, such as a particular product, by selecting from a list of responses like the following:

    1. Extremely Poor
    2. Poor
    3. Moderate
    4. Good
    5. Extremely Good

A Likert scale is ordinal (i.e., the items have a proper order, in this case from extremely poor to extremely good), not quantitative. Along a quantitative scale, distances between consecutive values are equal. For example, the quantitative scale 10, 20, 30, 40, 50, etc., exhibits equal intervals of 10 units from one value to the next. Distances between items on a Likert scale, however, are not necessarily equal. For example, the difference between “Extremely Poor” and “Poor” is not necessarily the same as the difference between “Poor” and “Moderate.” Also, with the quantitative scale mentioned above, 50 is five times greater than 10, but with the sample Likert scale, “Extremely Good” is not five times better than “Extremely Poor.” In the Likert scale that was used in this study, the distance between “Heard of it often” and “Know it well, understand the concept” seems quite a bit greater than the distance between any other two consecutive items, such as between “Never heard of it” and “Heard of it once or twice.” Likert scales require special handling when they’re used in research.

To quantify people’s responses to Likert scales (i.e., to convert them into quantitative scores), merely taking either of the sample Likert scales above and assigning the values 1 through 5 to the items (i.e., the value of 1 for “Extremely Poor,” etc.) would not produce a particularly useful measure. Researchers use various techniques for assigning values to items on Likert scales, and some are certainly better than others, but they are all pseudo-quantitative to some degree.
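To see why the choice of numeric coding matters, consider the following sketch. The respondents, weighting schemes, and scoring function are all hypothetical, invented purely for illustration; this is not the researchers' method. Two students who tie under the naive 1-through-5 coding can be ranked quite differently under an equally defensible alternative coding:

```python
# Hypothetical illustration: two ways to quantify the same Likert
# responses. Neither scheme is "correct" because the scale is ordinal,
# so any numeric assignment is an assumption.

# The familiarity items from the study, in order.
ITEMS = [
    "Never heard of it",
    "Heard of it once or twice",
    "Heard of it a few times",
    "Heard of it often",
    "Know it well, understand the concept",
]

# Scheme A: naive equal-interval coding, 1 through 5.
equal_interval = {item: i + 1 for i, item in enumerate(ITEMS)}

# Scheme B: treat the jump to "Know it well" as larger than the other
# jumps (weights chosen arbitrarily for illustration).
wide_top_gap = dict(zip(ITEMS, [1, 2, 3, 4, 8]))

def score(responses, weights):
    """Average the numeric codes assigned to a list of responses."""
    return sum(weights[r] for r in responses) / len(responses)

# Two hypothetical respondents answering three bogus-concept items.
student_a = ["Heard of it often"] * 3
student_b = ["Heard of it once or twice",
             "Know it well, understand the concept",
             "Know it well, understand the concept"]

# Under equal-interval coding the two students tie at 4.0...
assert score(student_a, equal_interval) == score(student_b, equal_interval)
# ...but under the alternative coding, student B scores far higher.
assert score(student_b, wide_top_gap) > score(student_a, wide_top_gap)
```

Because the scale is ordinal, neither coding can be declared correct, which is exactly why any single score derived this way deserves scrutiny.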

Imagine what it would be like to rely on people to determine air temperature using the following Likert scale:

    1. Extremely Cold
    2. Cold
    3. Average Temperature
    4. Hot
    5. Extremely Hot

Obviously, we wouldn’t use a Likert scale if we had an objective means, such as a thermometer, to measure something in a truly quantitative manner. Subjective measures of objective reality are always suspect. When we convert subjective Likert scales into quantitative scores, as the researchers did in this study, the quantitative values that we assign to items along the scale are rough estimates at best. We must keep this in mind when we evaluate the merits of claims that are based on Likert scales.

Social science research studies are often plagued by many challenges, which is one of the reasons why attempts to replicate them frequently fail. This doesn’t seem to discourage many researchers, however, from making provocative claims.

Provocative Claims

Based on their dysfunctional definition of bullshitters, the researchers made several claims. I found one in particular to be especially provocative: a ranking of English-speaking countries based on their percentages of bullshitters, with Canada on top followed by the USA. As an American, I find it rather difficult to believe that our polite neighbors to the north are more inclined to bullshitting than we are. If we set aside our concerns about the researchers’ definition of bullshit for the moment and accept students’ responses to the three bogus mathematical concepts as a potentially reliable measure of bullshitting, we must then determine a meaningful way to convert those responses into a reliable bullshitter score before we can make any claims, especially provocative claims. Unfortunately, it is difficult to evaluate the method that the researchers used to do this because it’s hidden in a black box and they won’t explain it, except to say that they used an “Item Response Theory (IRT) model to produce a latent construct.” That was the answer that I received when I asked one of the researchers about this via email. Telling me that they used an IRT model didn’t really answer my question, did it? I want to know the exact logical and mathematical steps that they or their software took to produce their bullshitter score. How were the various Likert responses weighted quantitatively and why? Only by knowing this can we evaluate the merits of their results.

Social scientists aren’t supposed to obscure their methods. Because I couldn’t evaluate the researchers’ methods directly, I examined the data for myself and eventually tried several scoring approaches of my own. Upon examining the data, I soon became suspicious when I noticed that the bogus mathematical concept “Proper Number” elicited quite different responses than the other two. Notice how the patterns in the following graphs differ.

Only the items that I’ve numbered 1 through 4 indicate that the students claimed to be familiar with the bogus concepts. More than 50% of students indicated that they were familiar with the concept “Proper Number,” but only about 25% indicated familiarity with each of the other two concepts. Notice that responses indicating increasing degrees of familiarity with “Proper Number” correspond to increasing percentages of students. Far more students selected “Know it well, understand the concept” than “Heard of it once or twice.” This is the opposite of what we would expect if greater degrees of familiarity represented greater degrees of bullshitting. Declining percentages from left to right are what we would expect if students were bullshitting, which is exactly what we see in their responses to the concepts “Subjunctive Scaling” and “Declarative Fraction.”

I suspect that this difference in behavior occurred because many students (perhaps most) who claimed to be familiar with the “Proper Number” concept were confusing it with some other concept that actually exists. To test this, I did a quick Google search on “Proper Number,” and all of the links that it provided referenced “Perfect Number” instead, a legitimate concept, yet Google didn’t mention that it had substituted “Perfect Number” for “Proper Number.” Nothing similar occurred when I Googled the other two bogus concepts. This suggests that people search for “Proper Number” when they’re really looking for “Perfect Number” frequently enough for Google to make the substitution automatically. When I pointed this out to the primary researcher, expressed my concern, and asked her about it in our third email exchange, I never heard back. It is never a good sign when researchers stop responding to reasonable questions or legitimate concerns about their work.

If responses to the three bogus concepts were due to the same behavior (i.e., bullshitting), we should see similar responses to all three, but this isn’t the case. In fact, when I compared responses per country, I found that the rank order of so-called bullshitting behavior per country was nearly identical for “Subjunctive Scaling” and “Declarative Fraction,” but quite different for “Proper Number.” Something different was definitely going on.

When I made variously weighted attempts to convert students’ Likert responses into bullshitter scores, I found that, if you consider all three bogus concepts, Canada does indeed take the prize for bullshitting, but if you exclude the question about “Proper Number,” Canada drops below the USA, which seems much more reasonable. As an American living at a time when the executive branch of government is being led by a prolific bullshitter, I can admit, albeit with great embarrassment, that we are plagued by an extraordinary tolerance of bullshitting.
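The sensitivity of such a ranking to a single suspect item can be illustrated with a toy calculation. The per-country numbers below are invented for illustration only; they are not the study's data and not my actual results:

```python
# Hypothetical per-country mean familiarity scores (invented numbers,
# for illustration only) for each of the three bogus concepts.
scores = {
    "Canada": {"Proper Number": 3.4, "Subjunctive Scaling": 2.1,
               "Declarative Fraction": 2.0},
    "USA":    {"Proper Number": 2.9, "Subjunctive Scaling": 2.3,
               "Declarative Fraction": 2.2},
}

def overall(country_scores, exclude=()):
    """Mean score across the items, optionally excluding some items."""
    kept = [v for k, v in country_scores.items() if k not in exclude]
    return sum(kept) / len(kept)

# With all three items, Canada "wins" the bullshitting contest...
all_three = {c: overall(s) for c, s in scores.items()}
assert all_three["Canada"] > all_three["USA"]

# ...but dropping the suspect "Proper Number" item flips the order.
minus_pn = {c: overall(s, exclude=("Proper Number",))
            for c, s in scores.items()}
assert minus_pn["USA"] > minus_pn["Canada"]
```

The point is not these particular numbers but the fragility they demonstrate: when one of only three items behaves differently from the other two, the ranking can hinge entirely on whether that item is included.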

Regardless, I don’t actually believe that we can put our trust even in students’ responses to the bogus concepts “Subjunctive Scaling” and “Declarative Fraction” as a reliable measure of bullshitting. Before I would be willing to publish scientific claims, I would need better measures.

Concluding Thoughts

I was trained in the social sciences and I value them greatly. For this reason, I’m bothered by practices that undermine the credibility of social science. The bullshitters study does not actually produce any reliable or useful knowledge about bullshitting behavior. Ironically, according to their own definition, the researchers are themselves bullshitters, for they are claiming knowledge that doesn’t actually exist. Social science can do better than this. At a time when voices in opposition to science are rising in volume, it’s imperative that it does.