Data Sensemaking, Science, and Atheism

January 13th, 2020

I’m an atheist. Despite the stigma that most Americans still attach to atheism, I embrace it without reservation. My perspective as an atheist is tightly coupled with my perspectives as an advocate of science and a data sensemaking professional. Atheism, science, and data sensemaking all embrace evidence and reason as the basis for understanding. All three shun beliefs that are not based on evidence and reason as the enemy of understanding.

I wasn’t always an atheist. Like most Americans born in the 1950s, I was raised as a Christian—in my case, a version of fundamentalist Protestantism known as Pentecostalism. Not satisfied with a nominal commitment to religion, I was that weird kid who carried his Bible with him to high school every day. While still a teenager, I felt called by God into the ministry and pursued that calling as my initial profession. Despite a genuine commitment, however, I sometimes felt a bit uneasy about my faith. From time to time, I was faced with facts and a sense of morality that conflicted with my faith. These conflicts became increasingly difficult to ignore. In my mid-20s, after many dark nights of the soul, I pulled out of the ministry and gradually abandoned my faith altogether while searching for a new foundation to build my life upon. Eventually, science became that foundation. The transition was painful, but also exciting. I went on to study religion in graduate school from an academic perspective (comparative religion, psychology of religion, sociology of religion, history of religion, etc.) because I wanted to better understand the powerful role of religion in people’s lives and in the world at large.

After leaving the ministry, I didn’t embrace atheism immediately. I first spent a few years exploring liberal expressions of religion (e.g., Unitarianism, the Society of Friends, and even Reform Judaism), hoping to find a like-minded community, but they all had something in common that never felt right to me. That something was faith. As I increasingly embraced science as the best path to understanding, I increasingly recognized faith as a problem. Faith delivers ready-made answers based on authority—end of story—but science encourages open-ended curiosity, continuous self-correction, and discovery.

For many years, I called myself an agnostic. Without reservation, I could say, “I don’t know if a god exists.” Since no one really knows, in a sense everyone is an agnostic, whether they admit it or not. At the time, I didn’t think of myself as an atheist because I misunderstood the term. I thought that an atheist was someone who claimed to know for sure that no gods exist. That isn’t the case. Agnosticism is an epistemological expression—it’s concerned with knowledge, or more precisely, with the lack of knowledge: “I don’t know.” Atheism, on the other hand, is an expression of belief, or more precisely, the absence of belief. An atheist says, “I don’t believe that any gods exist.” One can also embrace a slightly firmer version of atheism that declares “I believe that no gods exist.” Either way, atheism does not claim “I know for sure that no gods exist.” Agnosticism and atheism represent the same epistemological perspective: “I don’t know if any gods exist.” Atheism just goes one step further by extending a lack of knowledge to the realm of belief.

Science resists certainty; it deals in probabilities. Based on the available evidence, something is either likely or unlikely to a statistically calculated degree. I lack belief in gods because I’m not aware of any evidence for their existence. During my years as a Christian, I accepted the Christian god’s existence as a matter of faith. At the time, I made this leap to make sense of the world, but I no longer need faith in a god to make sense of the world or my role in it. If evidence for a god’s existence ever emerges, I’ll reconsider my position.

It bears mentioning that, just as everyone is in a sense an agnostic, whether they realize it or not, everyone is also an atheist. Even if you’re a religious fundamentalist, as I was, you’re also an atheist. This is because, while you believe in your god, you don’t believe in other gods. In other words, with respect to most of the gods that people believe in—all but your own—you’re an atheist. In this respect, you and I are a lot alike. We only differ in that I include one more god on my list of those that I don’t believe in.

Religions codify faith-based beliefs. They declare what is true about the world, about humans, about our role in the world, and, of course, about the role of supernatural beings. They do so without evidence. Faith discourages curiosity and the search for truth. As Richard Dawkins wrote, “One of the truly bad effects of religion is that it teaches us that it is a virtue to be satisfied with not understanding” (The God Delusion, 2008, page 126). As a data sensemaking professional, my commitment to reason and evidence as the basis for understanding puts me at odds with faith.

We can thank the late Harvard evolutionary biologist Stephen Jay Gould for the conceptual basis on which many scientists and data sensemakers who are also religious reconcile these conflicting perspectives. Gould proposed that science and religion occupy two “Nonoverlapping Magisteria.” I admire Gould’s work greatly. He was a marvelous scientist who did a great deal to popularize science, but I find this awkward construction intolerable. According to Gould, science has its domain, religion has its domain, and the two don’t overlap. Furthermore, these two domains should respect one another and consistently stick to their own distinct areas of expertise. As explained by Adam Neiblum in his book Unexceptional when describing Gould’s position:

Each magisteria has its own epistemic foundation, each fulfilling a different role in human needs and affairs. Science, epistemologically founded on empirical observation, evidence, data and reason, necessarily deals with facts about the world, while religion, epistemologically founded on personal revelation and faith, deals with values and morality, which have nothing to do with matters of fact about the world. (Unexceptional: Darwin, Atheism and Humanity, Adam Neiblum, 2017, p. 166)

Since the emergence of modern science, it has always co-existed uncomfortably with religion. I suspect that Gould wanted to make science more palatable for religious folks—the majority of Americans—so he separated science and religion into exclusive, non-competing realms.

According to a Pew Research Center survey of scientists (specifically members of the American Association for the Advancement of Science), only 33% believe in a god and over 40% identify themselves as atheists or agnostics (“Scientists and Belief,” Pew Research Center, November 5, 2009, www.pewforum.org/2009/11/05/scientists-and-belief). This is extraordinary in light of the following statistics, also reported by Pew:

The vast majority of Americans (90%) believe in some kind of higher power, with 56% professing faith in God as described in the Bible and another 33% saying they believe in another type of higher power or spiritual force. Only one-in-ten Americans say they don’t believe in God or a higher power of any kind. (“Key Findings about Americans’ Belief in God,” Pew Research Center, April 25, 2018, www.pewresearch.org/fact-tank/2018/04/25/key-findings-about-americans-belief-in-god/)

The correlation between scientific work and atheism, while extraordinary, is not surprising. Pursuit of science is not necessarily responsible for a lack of theistic belief, but my own exposure to science definitely influenced my departure from theism, and to a great degree.

There is a fundamental problem with Gould’s concept of Nonoverlapping Magisteria: it isn’t scientific. Science is definitely concerned with religion’s claims that the world was created by a god and that supernatural entities (gods, angels, spirits, demons, leprechauns, dead people, etc.) continue to intervene in the world’s affairs. Religion is definitely agitated by the fact that more and more of its territory is being reduced by scientific discoveries. This conflict cannot be defined out of existence. To do so defies the tenets and methods of science.

I reject the notion that morality is the rightful and exclusive domain of religion. Morality does not require religion. To say that it does makes morality an obligation that’s imposed on us by an external authority rather than a personal choice. I am no less moral as an atheist than I was as a Christian. Actually, I am more moral, for my behavior is based entirely on a personal sense of good behavior, never on a belief that I must behave in certain ways because a god demands it. When I was religious, my morality was governed, at least in part, by fear. You don’t dare piss off a god.

As religions develop, they codify morality in various ways, but they don’t create it. Morality began to evolve in social animals before the emergence of Homo sapiens. Certain ways of behaving towards others naturally evolved as moral instincts in all social animals, not just humans. Altruism, justice, and fairness are exhibited quite naturally in our species and in several others as well.

If you’re a scientist, or similarly, if you’re a data sensemaking professional, and you’re also religious, you must come to grips with the conflict that exists between these perspectives. You must divide your life, as Gould proposed, into two distinct realms. You can’t allow your willingness to accept things on faith to influence your work. Professionally, you must always go where the evidence leads you. If you do this successfully in your work, it may become increasingly difficult to do otherwise in your personal life.

Despite the stigma about atheism that still persists, an increasing number of people embrace it as a reasonable position. This is especially true among the young. They are less militant about it than my generation, however. Unlike my generation, many of them haven’t needed to claw their way out of religion. Atheism simply makes sense to them and has from an early age.

As with almost everything that I write about in this blog, this article was prompted by a particular event. Not long ago I was approached by the business school of a nearby religiously affiliated college to help them put together a data analytics program, and potentially, to also teach in the program, so I reviewed their website to find out just how religious they were. Despite the fact that members of the denomination that founded and runs this college are often quite liberal and known for their work as social activists, I found that this college is quite fundamentalist in its statement of faith. Here it is, word for word:

The Trinity
We believe in one eternal God, the source and goal of life, who exists as three persons in the Trinity: the Father, the Son, and the Holy Spirit. In love and joy, God creates and sustains the universe, including humanity, male and female, who are made in God’s image.
God the Father
We believe in God the Father Almighty, whose love is the foundation of salvation and righteous judgment, and who calls us into covenant relationship with God and with one another.
God the Son
We believe in Jesus Christ, the Word, who is fully God and fully human. He came to show us God and perfect humanity, and, through his life, death, and resurrection, to reconcile us to God. He is now actively present with us as Savior, Teacher, Lord, Healer, and Friend.
God the Holy Spirit
We believe in the Holy Spirit, who breathed God’s message into the prophets and apostles, opens our eyes to God’s Truth in Jesus Christ, empowers us for holy living, and carries on in us the work of salvation.
Salvation
We believe that salvation comes through Jesus Christ alone, to whom we must respond with repentance, faith, and obedience. Through Christ we come into a right relationship with God, our sins are forgiven, and we receive eternal life.
The Bible
We believe that God inspired the Bible and has given it to us as the uniquely authoritative, written guide for Christian living and thinking. As illumined by the Holy Spirit, the Scriptures are true and reliable. They point us to God, guide our lives, and nurture us toward spiritual maturity.
The Christian Life
We believe that God has called us to be and to make disciples of Jesus Christ and to be God’s agents of love and reconciliation in the world. In keeping with the teaching of Jesus, we work to oppose violence and war, and we seek peace and justice in human relationships and social structures.
The Church
We believe in the church as the people of God, composed of all who believe in Jesus Christ, who support and equip each other through worship, teaching, and accountability, who model God’s loving community, and who proclaim the gospel to the world.
Christian Worship
We believe Christ is present as we gather in his name, seeking to worship in spirit and in truth. All believers are joined in the one body of Christ, are baptized by the Spirit, and live in Christ’s abiding presence. Christian baptism and communion are spiritual realities, and, as Christians from many faith traditions, we celebrate these in different ways.
The Future
We believe in the personal return of Jesus Christ, in the resurrection of the dead, in God’s judgment of all persons with perfect justice and mercy, and in eternal reward and punishment. Ultimately, Christ’s kingdom will be victorious over all evil, and the faithful will reign.

Wow. This is an incredible statement of faith. I mean this quite literally: it isn’t credible. Not a shred of verifiable evidence exists for any of these assertions, but faculty members at this college must affirm these articles of faith in writing. Obviously, I can’t make this affirmation. I pointed this problem out to my contact at the college and she suggested a way to get around it. I didn’t want to offend her, but I had to make it clear that I could not affiliate myself with a faith-based religious organization, even if they allowed it. What I didn’t say was that I found the college’s statement of faith frightening. It brought back vivid memories of the faith-based beliefs that I fought hard to abandon when I was young.

This encounter left me wondering how people of faith manage to avoid the pitfalls of this orientation when making sense of data or doing science. The cognitive dissonance must be exhausting. I have no desire to offend anyone who navigates this tension, but I am genuinely concerned that it affects the work. Faith primes us to accept certain things as true, without question, regardless of the evidence. This is never a good approach to data sensemaking or science. The magisteria definitely overlap. The conflict is real. If you’re religious and also a scientist or a data sensemaker, you must navigate these conflicting perspectives with care. I, for one, couldn’t do it.

Linear Versus Logarithmic Thinking about Numbers

December 26th, 2019

Some folks argue that humans intuitively think about numbers logarithmically versus linearly. My experience strongly suggests that this is not the case. If you’ve ever tried to explain logarithms or logarithmic scales to people, or asked them to interpret graphs with logarithmic scales, as I have often done, you probably share my belief that logarithms are not cognitively intuitive. The behaviors that are sometimes described as intuitive logarithmic thinking about numbers can be reasonably explained as something else entirely.

According to some sources, a research study found that six-year-old children, when asked to identify the number that falls halfway between one and nine, often selected three. Unfortunately, after extensive searching I cannot find a study that actually performed this particular experiment. One article that makes this claim cites a study titled “A Framework for Bayesian Optimality of Psychophysical Laws” as the source, but that study does not mention this particular experiment or finding. Instead, it addresses the logarithmic nature of perception, especially auditory perception. Keep in mind that perception and cognition are related but different. Many aspects of perception do indeed appear to be logarithmic. As the authors of the study mentioned above observed about auditory perception, “…under the Weber–Fechner law, a multiplicative increase in stimulus intensity leads to an additive increase in perceived intensity,” but that’s a different matter. I’m talking about cognition. Even if many kids actually did select three as halfway between one and nine in an experiment, I doubt that they were thinking logarithmically. At age six, children have not yet learned to think quantitatively beyond a rudimentary understanding of numbers. Until they begin to learn mathematics, children tend to think with a limited set of numbers consisting of one, two, three, and more, which corresponds to the preattentive perception of numerosity that is built into our brains. With this limited understanding, three is the largest number that they identify individually, so it might be natural for them to select three as the value that falls halfway between one and numbers that are larger than three. If the numbers were displayed linearly and in sequence for the children to see when asked to select the number in the middle (illustrated below), however, I suspect that they would correctly select five.

1   2   3   4   5   6   7   8   9

You might argue that this works simply because it allows children to rely on spatial reasoning to identify the middle number. That is absolutely true. We intentionally take advantage of spatial reasoning when introducing several basic concepts of mathematics to children. This works as a handy conceptual device to kickstart quantitative reasoning. Believing that children naturally think logarithmically would lead us to predict that, if asked to identify the number halfway between 1 and 100, they would be inclined to choose 10. Somehow, I doubt that they would.

Another research-based example that has been used to affirm the intuitive nature of logarithmic thinking about numbers is the fact that people tend to think of the difference between the numbers one and two as greater than the difference between the numbers eight and nine. I suspect that they do this, however, not because they’re thinking logarithmically, but more simply because they’re thinking in terms of relative magnitude (i.e., proportions). Even though the incremental difference between both pairs of numbers is a value of one (i.e., 2 – 1 = 1 and 9 – 8 = 1), the number two represents twice the magnitude of one while the number nine is only 12.5% greater than eight, a significantly lesser proportion. I anticipate that some of you who are mathematically inclined might object: “But logarithmic thinking and proportional thinking are one and the same.” Actually, this is not the case. While logarithms always involve proportions, not all proportions involve logarithms. A logarithmic scale involves a consistent proportional sequence. For example, with a log base 10 scale (i.e., log10), each number along the scale is ten times the previous number. Only when we think of a sequence of numbers in which each number exhibits a consistent proportion relative to the previous number are we thinking logarithmically. We do not appear to do that naturally.

Another example, occasionally cited, is that people tend to think of differences between one thousand, one million, one billion, one trillion, etc., as equal when in fact each of these numbers is 1,000 times the previous. Is this because people are thinking logarithmically? I doubt it. I suspect that it is simply because each of these values exhibits the next change in the label (e.g., from the label “thousand” to the label “million”), and changes in the labels suggest equal distances. If people intuitively thought about numbers logarithmically, they would automatically recognize that each of these values (one billion versus one million versus one thousand, etc.) is 1,000 times the previous, but most of us don’t realize this fundamental fact about our decimal system without first doing the math.

Along linear scales, increments from one value to the next are determined by addition—you always add a particular value to the previous value to produce the next value in the sequence, such as by adding a value of one to produce the scale 0, 1, 2, 3, 4, etc. or a value of ten to produce the scale 0, 10, 20, 30, 40, etc. Along logarithmic scales, on the other hand, increments are determined by multiplication—you always multiply the previous value by a particular number to produce the next value in the sequence, such as by multiplying each value by two to produce the scale 1, 2, 4, 8, etc., or by ten to produce the scale 1, 10, 100, 1,000, etc. The concept of logarithms, when clearly explained, is not difficult to understand once you’ve learned the mathematical concept of multiplication, but thinking about numbers logarithmically does not appear to be intuitive. It takes training.
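To make the distinction concrete, here is a small illustrative sketch in Python (my own illustration, not part of the original post; the function names are arbitrary). It builds a linear scale by repeated addition and a logarithmic scale by repeated multiplication, and it also computes the proportional differences discussed above.

def linear_scale(start, step, count):
    # Each value is produced by adding a fixed step to the previous value.
    return [start + step * i for i in range(count)]

def logarithmic_scale(start, factor, count):
    # Each value is produced by multiplying the previous value by a fixed factor.
    return [start * factor ** i for i in range(count)]

print(linear_scale(0, 10, 5))       # [0, 10, 20, 30, 40]
print(logarithmic_scale(1, 10, 5))  # [1, 10, 100, 1000, 10000]

# The proportional differences mentioned earlier:
print((2 - 1) / 1)   # 1.0   (two is 100% greater than one)
print((9 - 8) / 8)   # 0.125 (nine is only 12.5% greater than eight)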

Repeat Due to Pathology

October 31st, 2019

Automated information systems only work if they actually inform and do so clearly. Too often, however, they create confusion. This was not what we had in mind when I and others created some of the earliest automated information systems back in the 1980s, when the personal computer began its rapid and thorough takeover of the workplace.

Back then, I was starry-eyed, convinced that everything imaginable should be automated using computers. Unfortunately, I and my colleagues at the time rarely, if ever, questioned the merits of automation. We were having too much fun replacing old manual processes with new automated systems. We were rock stars! We were convinced that those new systems could only do good. My oh my, were we mistaken. Not everything benefits from automation, and even good candidates become counter-productive when they’re poorly designed. Choosing good candidates for automation and then building systems that do the job well takes time and care—two rare ingredients in a “move fast and break things” IT culture.

The most recent reminder of this problem arrived in the form of an email from my health plan yesterday. The email informed me that a new test result was available through the plan’s web-based information system, called MyChart. I assumed that the test result was related to the colonoscopy that I endured the previous week. To put things in perspective, the first time that I had a colonoscopy, the doctor perforated my colon, which landed me in the hospital facing potentially dire consequences. So, as you might imagine, I dread colonoscopies even more than most people.

When I opened the test result in MyChart, it was indeed related to my recent colonoscopy. Here’s what I found:

Other than the date, which matched the date of the procedure, nothing in this so-called test result made sense to me.

  • What does “Colonoscopy Impression, External” mean? Nothing about the procedure was external.
  • Who is this person identified as “Historical Provider, MD”? My doctor had a name.
  • This was identified as a “Final result,” but I didn’t know that I was awaiting further results. Before leaving the doctor’s office, I thought I was given a full account of the doctor’s findings both verbally and in writing.
  • Most alarmingly, what does a “Your Value” of “repeat based on pathology” mean? Did I have to go through this again? Why? What was wrong?
  • And, to top it all off, I couldn’t tell how the “repeat based on pathology” value compared to the “Standard Range” (i.e., a normal result), because it was blank.

In a panic, I clicked on the “About this Test” icon in the upper-right corner, hoping for an explanation, but it provided nothing.

The stupidity of this automated system not only produced a panic but also led me to contact an actual human to resolve the confusion. In other words, a system that was supposed to reduce the work of humans actually added to it, which happens all too often. The human that I contacted, a friendly woman named Beth, didn’t understand what “repeat based on pathology” meant any more than I did, but she was able to access a letter that was placed in the mail to me yesterday, which provided an answer. As it turns out, because a single polyp was found and removed during the procedure, I’m at greater risk than most people of future polyps that could become malignant, so I should have another colonoscopy in five years. What a relief.

Could the test result that was posted to MyChart have provided clear and useful information? Absolutely, but it didn’t, and this wasn’t the first time. I had a similar experience a few months ago while reviewing the results of a lengthy blood panel posted in MyChart. On that occasion, I had to get my doctor on the phone to interpret several obscure lab results.

Information technologies are not a panacea. They aren’t useful for everything, and when they are useful, they must be well designed. Otherwise, they complicate our lives.

The Myth of Technology’s Neutrality

August 5th, 2019

In a recent op-ed that appeared in the New York Times, tech billionaire Peter Thiel said this about artificial intelligence (AI):

The first users of the machine learning tools being created today will be generals…AI is a military technology.

Thiel’s opinion has created a backlash among some of AI’s proponents who don’t want their baby to be cast in this light. There is no doubt, however, that AI will be used for military purposes. It’s already happening. But that’s not the topic that I want to address in this blog post. Instead, I want to warn against a dangerous belief that is prevalent among many digital technologists. In a response to Thiel’s op-ed, Dawn Song, a computer science professor at the University of California, Berkeley, who works in the Berkeley Artificial Intelligence Research Lab, told Business Insider:

I don’t think we can say AI is a military technology. AI, machine learning technology is just like any other technologies. Technology itself is neutral.

According to Business Insider, Song went on to say that, just like nuclear or security encryption technologies, AI can be used in either good ways or bad. Basing her claim that technology is neutral on the fact that it can be used in both good and bad ways is a logical error. Anything can be used in both good and bad ways, but that doesn’t make everything neutral. Digital technologies in particular are never neutral. That’s because they are created by people and people are never neutral. Digital technologies consist of programs that are written by people with assumptions, interests, biases, beliefs, perspectives, and agendas that become embedded in those programs.

Not only are digital technologies not neutral, they shouldn’t be. They should be designed to do good and to prevent harm. If the potential for harm exists, technologies should be designed to prevent it. If the potential for harm is great and it cannot be prevented, the technology should not be developed. That’s right. Creating something that does great harm is immoral. This is especially true of AI, for its potential for harm is enormous. If general AI—a truly sentient machine—were ever developed, that machine would not only exhibit the non-neutral objectives that were programmed into it, it would soon develop its own interests and objectives that might be quite different from those of humanity. At that point, we would be faced with a silicon-based competitor that could work at speeds that would leave us in the dust. Our puny interests probably wouldn’t count for much. Could we create a superintelligent AI that would respect our interests? At this point, we don’t know.

Fortunately, some of the folks who are positioned at the forefront of AI research recognize its great potential for harm and are working fervently and thoughtfully to prevent this from happening. They are painfully aware of the fact that this might not be possible. Unfortunately, however, there are probably even more people working in AI who exhibit the same naivete as Dawn Song. Believing that AI is neutral is a convenient way of relinquishing responsibility for the results of their work. Look at the many ways that digital technologies are being used for harm today and ask yourself, was this the result of neutrality? No, those behaviors were either intentionally designed into the products and services or were the result of negligence. There is a great risk that harmful behaviors would develop within AI that were neither anticipated nor intended. The claim that digital technologies in general and AI in particular are neutral should concern us. Technologies are human creations. We must take responsibility for them. The cost of not taking responsibility is too high. Sometimes this means that we must prevent particular technologies from ever being developed. Whether or not this is true of general AI has yet to be determined. While the jury is still out, I’d like the jury to be composed of people who are working hard to understand the costs and to take them seriously, not people who naively believe in technology’s neutrality.

Ethical Data Sensemaking

July 22nd, 2019

Simply stated, data sensemaking is what we do to make sense of data. We do this in an attempt to understand the world, based on empirical evidence. Those who work to make sense of data and communicate their findings are data sensemakers. Data sensemaking, as a profession, is currently associated with several job titles, including data analyst, business intelligence professional, statistician, and data scientist. Helping people understand the world based on data is important work. Without understanding, we often make bad decisions. When done well, data sensemaking requires a broad and deep set of skills and a commitment to ethical conduct. When data sensemaking professionals fail to do their jobs well, whether through a lack of skills or through ethical misconduct, confusion and misinformation result, which encourages bad decisions—decisions that do harm. Making sense of data is not ethically or morally neutral; it can be done for good or ill. “I did what I was told” is not a valid excuse for unethical behavior.

In recent years, misuses of data have led to a great deal of discussion about ethics related to invasions of privacy and discriminatory uses of data. Most of these discussions focus on the creation and use of analytical algorithms. I’d like to extend the list of ethical considerations to address the full range of data sensemaking activities. The list of ethical practices that I’m proposing below is neither complete nor sufficiently organized nor fully described. I offer it only as an initial effort that we can discuss, expand, and clarify. Once we’ve done that, we can circle back and refine the work.

The ethical practices that can serve as a code of conduct for data sensemaking professionals are, in my opinion, built upon a single fundamental principle. It is the same principle that medical doctors swear as an oath before becoming licensed: Do no harm.

Here’s the list:

  1. You should work, not just to provide information, but to enable understanding that can be used in beneficial ways.
  2. You should develop the full range of skills that are needed to do the work of data sensemaking effectively. Training in a data analysis tool is not sufficient. This suggests the need for an agreed-upon set of skills for data sensemaking.
  3. You should understand the relevant domain. For instance, if you’re doing sales analysis, you should understand the sales process as well as the sales objectives of your organization. When you don’t understand the domain well enough, you must involve those who do.
  4. You should know your audience (i.e., your clients; those who are asking you to do the work)—their interests, beliefs, values, assumptions, biases, and objectives—in part to identify potentially unethical inclinations.
  5. You should understand the purpose for which your work will be used. In other words, you should ask “Why?”.
  6. You should strive to anticipate the ways in which your findings could be used for harm.
  7. When asked to do something harmful, you should say “No.” Furthermore, you should also discourage others from doing harm.
  8. When you discover harmful uses of data, you should challenge them, and if they persist, you should expose them to those who can potentially end them.
  9. You should primarily serve the needs of those who will be affected by your work, which is not necessarily those who have asked you to do the work.
  10. You should not examine data that you or your client have no right to examine. This includes data that is private, which you have not received explicit permission to examine. To do this, you must acquaint yourself with data privacy laws, but not limit yourself to concern only for data that has been legally deemed private if it seems reasonable that it should be considered private nonetheless.
  11. You should not do work that will result in the unfair and discriminatory treatment of particular groups of people based on race, ethnicity, gender, religion, age, etc.
  12. If you cannot enable the understanding that’s needed with the data that’s available, you should point this out, identify what’s needed, and do what you can to acquire it.
  13. If the quality of the data that’s available is insufficient for the data sensemaking task, you should point this out, describe what’s lacking, and insist that the data’s quality be improved to the level that’s required before proceeding.
  14. You should always examine data within context.
  15. You should always examine data from all potentially relevant perspectives.
  16. You should present your findings clearly.
  17. You should present your findings as comprehensively as necessary to enable the level of understanding that’s needed.
  18. You should present your findings truthfully.
  19. You should describe the uncertainty of your findings.
  20. You should report any limitations that might have had an effect on the validity of your findings.
  21. You should confirm that your audience understands your findings.
  22. You should solicit feedback during the data sensemaking process and invite others to critique your findings.
  23. You should document the steps that you took, including the statistics that you used, and maintain the data that you produced during the course of your work. This will make it possible for others to review your work and for you to reexamine your findings at a later date.
  24. When you’re asked to do work that doesn’t make sense or to do it in a way that doesn’t make sense (i.e., in ways that are ineffective), you should propose an alternative that does make sense and insist on it.
  25. When people telegraph what they expect you to find in the data, you should do your best to ignore those expectations or to subject them to scrutiny.

As data sensemakers, we stand at the gates of understanding. Ethically, it is our job to serve as gatekeepers. In many cases, we will be the only defense against harm.

I invite you to propose additions to this list and to discuss the merits of the practices that I’ve proposed. If you are part of an organization that employs other data sensemakers, I also invite you to discuss the ethical dimensions of your work with one another.

The Inflated Role of Storytelling

July 14th, 2019

People increasingly claim that the best and perhaps only way to convince someone of something involves telling them a story. In his new book Ruined By Design—a book that I largely agree with and fully appreciate—designer Mike Monteiro says that “If you’re not persuading people, you’re not telling a good enough story.” Furthermore, “…while you should absolutely include the data in your approach, recognize that when you get to the point where you’re trying to persuade someone…, you need a story.” Really? Where’s the evidence for this claim? On what empirical research is it based? And what the hell is a story, anyway? Can you only persuade people by constructing a narrative—a presentation that has a beginning, middle, and end, with characters and plot, tension and resolution? In truth, stories are only one of several ways that we can persuade. In some cases, a simple photograph might do the trick. A gesture, such as a look of anger or a raised fist, sometimes works. A single sentence or a chart might do the job. Even a straightforward, unembellished presentation of the facts will sometimes work. The notion that stories are needed to convince people is itself a story—a myth—nothing more.

It reminds me of the silly notion that people only use 10% of their brains, which someone fabricated long ago from thin air and others have since quoted without ever checking the facts. This notion is absurd. If we used only 10% of our brains, the other 90% would wither and die. Stories are not the exclusive path to persuasion. Not everyone can be convinced in the same way and most people can be convinced in various ways, depending on the circumstances. While potentially powerful and useful, the role of stories is overblown.

One of the errors that people commonly make when promoting the power of stories is the notion that stories work because they appeal to emotions. For example, Monteiro wrote that “…people don’t make decisions based on data; they make them based on feelings.” This is the foundation for his rationale that stories are the only path to persuasion. Stories can certainly appeal to emotions, but stories can also present facts without any emotional content whatsoever. We all, no matter how rational, are subject to emotion, but not exclusively so. Stories structure information in narrative form and those narratives can appeal to emotions, to the rational mind, or both. In other words, saying that stories are powerful is not the same as saying that appeals to people’s feelings are powerful.

Don’t get me wrong, stories are great; they’re just not the panacea that many people now claim. The current emphasis on storytelling is a fad. In time, it will fade. In time, some of the people who promote stories to the exclusion of other forms of communication will look back with embarrassment. No matter what they claim, no one actually believes that only stories can convince people. No one exclusively uses stories to persuade. We all use multiple means and that’s as it should be. The sooner we get over this nonsense that only stories can persuade, the sooner we can get on to the real task of presenting truths that matter in all the ways that work.

Breadth Before Depth

June 25th, 2019

I’ve long recognized the value of broad experience, education, and interests. Breadth enables us to see the world from multiple perspectives and to connect ideas from multiple domains. I’ve always felt that my own meandering path through multiple areas of study, interest, and work has allowed me to think in ways that a narrow path would have never produced. Deep experience and study are valuable as well, but without breadth, depth breeds myopia. Given this notion, I was thrilled to find a new book that articulates this case eloquently and backs it with a wealth of evidence. The book, written by David Epstein, is titled Range: Why Generalists Triumph in a Specialized World.

It is, in my opinion, the most important book about thinking, learning, and problem solving since Daniel Kahneman’s book Thinking, Fast and Slow.

Back in 2008, when Malcolm Gladwell wrote the book Outliers, he promoted the value of narrow, repetitive, and extensive training. Gladwell highlighted the notion that genuine expertise in any endeavor requires around 10,000 hours of focused training. While it is true that some areas of endeavor can be mastered through extensive repetition of specific tasks (e.g., learning to play golf or chess), many others cannot. As it turns out, the notion that people learn best if they pick a specific area of endeavor when they’re young and stick to it with unflagging commitment and discipline is not a model that works in most cases. The skills that can be developed in this way are rather isolated. In fact, by gaining broad experience—generalizing—we can learn to think in ways that are more flexible and better able to fathom complexities. This is an important insight, for the world in which we live today is increasingly complex. The cross-fertilization of ideas that is nurtured by generalization prepares us to deal with modern challenges. You might find it interesting to note what Gladwell thinks of Epstein’s new book:

For reasons I cannot explain, David Epstein manages to make me thoroughly enjoy the experience of being told that everything I thought about something was wrong. I loved Range.

This is a classy admission by someone who, as a generalist himself, is well positioned to recognize flaws in his former thesis.

Fairly early in the book, Epstein writes:

The challenge we all face is how to maintain the benefits of breadth, diverse experience, interdisciplinary thinking, and delayed concentration in a world that increasingly incentivizes, even demands, hyperspecialization. While it is undoubtedly true that there are areas that require individuals with…precocity and clarity of purpose, as complexity increases—as technology spins the world into vaster webs of interconnected systems in which each individual only sees a small part—we also need more…people who start broad and embrace diverse experiences and perspectives while they progress. People with range.

There are “kind” learning environments, in which “patterns repeat over and over, and feedback is extremely accurate and usually very rapid” (e.g., golf and chess). “The learning environment is kind because a learner improves simply by engaging in the activity and trying to do better.” There are also “wicked” domains, in which “the rules of the game are often unclear or incomplete, there may or may not be repetitive patterns and they may not be obvious, and feedback is often delayed, inaccurate, or both.” To an increasing extent, the modern world is not kind. To navigate it successfully, we need range. Computers are great at handling kind environments, hence the growing success of narrow AI. We humans, however, assuming that we cultivate the abstract and multifaceted thinking that our brains have evolved to handle, are much better at handling the wicked problems that pose our greatest challenges today.

“AI systems are like savants.” They need stable structures and narrow worlds.

When we know the rules and answers, and they don’t change over time—chess, golf, playing classical music—an argument can be made for savant-like hyperspecialization practice from day one. But those are poor models of most things humans want to learn.

Our educational system is not doing a good job of preparing future generations for the increasingly wicked world in which they will live, and employers often fail to recognize the benefits of generalization. This needs to change. David Epstein does a great job of explaining why and suggesting some of the ways to make this happen. It is never too late to broaden your horizons.

Bullshit about Bullshitters and other Misadventures in Social Science

May 20th, 2019

I recently came across a news story about a social science research study that caught my attention. How could I resist a story about bullshitters? According to the study, titled “Bullshitters. Who Are They and What Do We Know about Their Lives?”, this is “an important new area of social science research.” Reviewing the research paper revealed more about problems in social science research, however, than anything meaningful and useful about bullshit, bullshitting, or bullshitters. In this blog post, I’ll describe a few of these problems.

A Useless Definition

The researchers defined “bullshitters” as “individuals who claim knowledge or expertise in an area where they actually have little experience or skill.” If you read the study, however, you will find that this does not accurately describe the behavior that they examined. A more accurate and specific description would state that bullshitters are “people who claim, for any reason, to be familiar with and perhaps even understand concepts that don’t actually exist.” The study is based on the responses of 15-year-old students in English-speaking countries to questions about three bogus mathematical concepts that they answered while taking the Programme for International Student Assessment (PISA) exam. According to this study, students who claim knowledge of a bogus mathematical concept, for whatever reason, are bullshitters. This, however, is not what people typically mean by the term. Typically, we think of bullshitters as people who make shit up, not people who merely make mistakes, but the authors didn’t make this distinction. If you turn to someone and ask, “Are you bullshitting me?” you are asking if they intentionally fabricated or exaggerated what they just told you. Bullshitting involves intention. The act of intentionally claiming expertise that you lack to inflate your worth in the eyes of others is indeed a behavior that could be studied, but the mixture of intentional deception and unintentional error does not qualify as a single specific behavior.

Why did the researchers define bullshitters as they did? I suspect it is because they couldn’t determine the difference between intentional deceit and confusion about the bogus mathematical concepts. Defining bullshitters as they did, however convenient, produced a useless study. What can we possibly do with the results? Unfortunately, many social science research studies fall into this category. In part, this is a result of the current myopic emphasis in academia on publication. To get ahead as an academic in a research-oriented discipline, you must publish, publish, publish. For individuals, getting published, and for academic institutions, having published studies cited in other publications, is valued more highly than useful research. This is a travesty.

Unreliable Measures and Questionable Statistics

By reviewing many social science research studies over the years, I’ve learned that you should take their claims with a grain of salt until you examine the work carefully. To do this, you must not only read the papers closely, you must also examine the data on which the research was based, including the ways the data was manipulated. By “manipulated,” I don’t mean that the researchers intentionally screwed with the data to support particular conclusions, although this does occur, but merely that they produced their own data from the original data on which the research was based, usually by means of statistical operations (e.g., statistical models of various types) that rely on assumptions. To take research conclusions seriously, we must confirm that the data, the statistical models, and the assumptions on which they are based are all valid and reliable. When researchers don’t provide us with the means to validate their data, we should never accept their conclusions on faith. In my opinion, studies that don’t clearly describe the data on which their findings are based and don’t make that data readily available for inspection don’t qualify as legitimate science.

Social science is challenged by the fact that it often cannot directly measure the phenomenon that it seeks to understand. For example, you cannot place a human subject into a machine that’s capable of measuring their bullshitting behavior. You’re forced to use a proxy—that is, to measure something that you believe is closely related and representative—as the best means available. In this particular study, the researchers chose to treat students’ answers to questions about three bogus mathematical concepts as their proxy for bullshitting.

While taking the PISA exam, students were asked about a series of sixteen mathematical concepts, including three bogus concepts—”Proper Number,” “Subjunctive Scaling,” and “Declarative Fraction”—and for each they were asked to select from the following list the response that best described their familiarity with the concept:

    1. Never heard of it
    2. Heard of it once or twice
    3. Heard of it a few times
    4. Heard of it often
    5. Know it well, understand the concept

These five potential responses comprise something called a Likert scale. The items are supposed to represent the full range of possible responses. Another more typical set of Likert items that often appears in questionnaires asks people to assess the merits of something, such as a particular product, by selecting from a list of responses like the following:

    1. Extremely Poor
    2. Poor
    3. Moderate
    4. Good
    5. Extremely Good

A Likert scale is ordinal (i.e., the items have a proper order, in this case from extremely poor to extremely good), not quantitative. Along a quantitative scale, distances between consecutive values are equal. For example, the quantitative scale 10, 20, 30, 40, 50, etc., exhibits equal intervals of 10 units from one value to the next. Distances between items on a Likert scale, however, are not necessarily equal. For example, the difference between “Extremely Poor” and “Poor” is not necessarily the same as the difference between “Poor” and “Moderate.” Also, with the quantitative scale mentioned above, 50 is five times greater than 10, but with the sample Likert scale, “Extremely Good” is not five times better than “Extremely Poor.” In the Likert scale that was used in this study, the distance between “Heard of it often” and “Know it well, understand the concept” seems quite a bit greater than the distance between any other two consecutive items, such as between “Never heard of it” and “Heard of it once or twice.” Likert scales require special handling when they’re used in research.

To quantify people’s responses to Likert scales (i.e., to convert them into quantitative scores), merely taking either of the sample Likert scales above and assigning the values 1 through 5 to the items (i.e., the value of 1 for “Extremely Poor,” etc.) would not produce a particularly useful measure. Researchers use various techniques for assigning values to items on Likert scales, and some are certainly better than others, but they are all pseudo-quantitative to some degree.
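To illustrate why this is only pseudo-quantitative, here is a small sketch (my own illustration, with hypothetical responses rather than data from the study). It assigns the integers 1 through 5 to the items used in the PISA questions and then averages them, a calculation that silently assumes the intervals between items are equal.

likert_items = [
    "Never heard of it",
    "Heard of it once or twice",
    "Heard of it a few times",
    "Heard of it often",
    "Know it well, understand the concept",
]

# Naive coding: assign the integers 1 through 5 to the ordered items.
naive_scores = {item: i + 1 for i, item in enumerate(likert_items)}

# Hypothetical responses from four students to one bogus concept.
responses = [
    "Never heard of it",
    "Heard of it often",
    "Know it well, understand the concept",
    "Heard of it once or twice",
]

mean_score = sum(naive_scores[r] for r in responses) / len(responses)
print(mean_score)  # 3.0 -- meaningful only if the intervals between items are
                   # equal, which an ordinal scale does not guarantee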

Imagine what it would be like to rely on people to determine air temperature using the following Likert scale:

    1. Extremely Cold
    2. Cold
    3. Average Temperature
    4. Hot
    5. Extremely Hot

Obviously, we wouldn’t use a Likert scale if we had an objective means, such as a thermometer, to measure something in a truly quantitative manner. Subjective measures of objective reality are always suspect. When we convert subjective Likert scales into quantitative scores, as the researchers did in this study, the quantitative values that we assign to items along the scale are rough estimates at best. We must keep this in mind when we evaluate the merits of claims that are based on Likert scales.

Social science research studies are often plagued by many challenges, which is one of the reasons why attempts to replicate them frequently fail. This doesn’t seem to discourage many researchers, however, from making provocative claims.

Provocative Claims

Based on their dysfunctional definition of bullshitters, the researchers made several claims. I found one in particular to be especially provocative: a ranking of English-speaking countries based on their percentages of bullshitters, with Canada on top followed by the USA. As an American, I find it rather difficult to believe that our polite neighbors to the north are more inclined to bullshitting than we are. If we set aside our concerns about the researchers’ definition of bullshit for the moment and accept students’ responses to the three bogus mathematical concepts as a potentially reliable measure of bullshitting, we must then determine a meaningful way to convert those responses into a reliable bullshitter score before we can make any claims, especially provocative claims. Unfortunately, it is difficult to evaluate the method that the researchers used to do this because it’s hidden in a black box and they won’t explain it, except to say that they used an “Item Response Theory (IRT) model to produce a latent construct.” That was the answer that I received when I asked one of the researchers about this via email. Telling me that they used an IRT model didn’t really answer my question, did it? I want to know the exact logical and mathematical steps that they or their software took to produce their bullshitter score. How were the various Likert responses weighted quantitatively and why? Only by knowing this can we evaluate the merits of their results.

Social scientists aren’t supposed to obscure their methods. Given the fact that I couldn’t evaluate the researchers’ methods directly, I examined the data for myself and eventually tried several scoring approaches of my own. Upon examining the data, I soon became suspicious when I noticed that the bogus mathematical concept “Proper Number” elicited quite different responses than the other two. Notice how the patterns in the following graphs differ.

Only the items that I’ve numbered 1 through 4 indicate that the students claimed to be familiar with the bogus concepts. More than 50% of students indicated that they were familiar with the concept “Proper Number,” but only about 25% indicated that they were familiar with each of the other two concepts. Notice that responses indicating increasing degrees of familiarity with “Proper Number” correspond to increasing percentages of students. Far more students indicated that they “Knew it well, understand the concept,” than those who indicated that they “Heard of it once or twice.” This is the opposite of what we would expect if greater degrees of familiarity represented greater degrees of bullshitting. Declining percentages from left to right are what we would expect if students were bullshitting, which is exactly what we see in their responses to the concepts “Subjunctive Scaling” and “Declarative Fraction.” I suspect that this difference in behavior occurred because many students (perhaps most) who claimed to be familiar with the “Proper Number” concept were confusing it with some other concept that actually exists. To test this, I did a quick Google search on “Proper Number” and all of the links that were provided referenced “Perfect Number” instead, a legitimate concept, yet Google didn’t bother to mention that it substituted “Perfect Number” for “Proper Number.” Nothing similar occurred when I Googled the other two bogus concepts. This suggests that people search for “Proper Number” when they’re really looking for “Perfect Number” frequently enough for Google to make this automatic substitution. When I pointed this out to the primary researcher, expressed my concern, and asked her about it in our third email exchange, I never heard back. It is never a good sign when researchers stop responding to you when you ask reasonable questions or express legitimate concerns about their work. If responses to the three bogus concepts were due to the same behavior (i.e., bullshitting), we should see similar responses to all three, but this isn’t the case. In fact, when I compared responses per country, I found that the rank order of so-called bullshitting behavior per country was nearly identical for “Subjunctive Scaling” and “Declarative Fraction,” but quite different for “Proper Number.” Something different was definitely going on.

When I made variously weighted attempts to convert students’ Likert responses into bullshitter scores, I found that, if you consider all three bogus concepts, Canada does indeed take the prize for bullshitting, but if you exclude the question about “Proper Number,” Canada drops below the USA, which seems much more reasonable. As an American living at a time when the executive branch of government is being led by a prolific bullshitter, I can admit, albeit with great embarrassment, that we are plagued by an extraordinary tolerance of bullshitting.
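For what it’s worth, the sketch below shows the kind of simple, transparent scoring that can be applied to responses like these. The weights, the sample data, and the names (WEIGHTS, bullshitter_score) are entirely hypothetical, and this is not the researchers’ IRT-based method, which remains a black box; it merely illustrates how excluding “Proper Number” can change a score.

# Hypothetical weights for the degree of claimed familiarity with a bogus concept.
WEIGHTS = {
    "Never heard of it": 0.0,
    "Heard of it once or twice": 0.25,
    "Heard of it a few times": 0.5,
    "Heard of it often": 0.75,
    "Know it well, understand the concept": 1.0,
}

def bullshitter_score(responses_by_concept, exclude=()):
    # Average weighted familiarity across bogus concepts, optionally excluding one.
    included = {c: r for c, r in responses_by_concept.items() if c not in exclude}
    per_concept = [
        sum(WEIGHTS[answer] for answer in answers) / len(answers)
        for answers in included.values()
    ]
    return sum(per_concept) / len(per_concept)

# Hypothetical responses from ten students in one country, keyed by bogus concept.
sample = {
    "Proper Number": ["Know it well, understand the concept"] * 6 + ["Never heard of it"] * 4,
    "Subjunctive Scaling": ["Heard of it once or twice"] * 3 + ["Never heard of it"] * 7,
    "Declarative Fraction": ["Heard of it a few times"] * 2 + ["Never heard of it"] * 8,
}

print(bullshitter_score(sample))                              # ~0.258 (all three concepts)
print(bullshitter_score(sample, exclude=("Proper Number",)))  # ~0.088 (excluding "Proper Number")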

Regardless, I don’t actually believe that we can put our trust even in students’ responses to the bogus concepts “Subjunctive Scaling” and “Declarative Fraction” as a reliable measure of bullshitting. Before I would be willing to publish scientific claims, I would need better measures.

Concluding Thoughts

I was trained in the social sciences and I value them greatly. For this reason, I’m bothered by practices that undermine the credibility of social science. The bullshitters study does not actually produce any reliable or useful knowledge about bullshitting behavior. Ironically, according to their own definition, the researchers are themselves bullshitters, for they are claiming knowledge that doesn’t actually exist. Social science can do better than this. At a time when voices in opposition to science are rising in volume, it’s imperative that it does.

The Data Loom Is Now Available!

May 16th, 2019

After a few months of waiting, my new book The Data Loom: Weaving Understanding by Thinking Critically and Scientifically with Data is now available. By clicking on the image below, you can order it for immediate delivery from Amazon.

Data, in and of itself, is not valuable. It only becomes valuable when we make sense of it. Unfortunately, most of us who are responsible for making sense of data have never been trained in two of the job’s most essential thinking skillsets: critical thinking and scientific thinking. The Data Loom does something that no other book does—it covers the basic concepts and practices of both critical thinking and scientific thinking and does so in a way that is tailored to the needs of data sensemakers. If you’ve never been trained in these essential thinking skills, you owe it to yourself and your organization to read this book. This simple book will bring clarity and direction to your thinking.

The Smart Enough City: Avoiding the Myopia of Tech Goggles

May 8th, 2019

At this juncture in human history, few issues should concern us more than our relationship to digital technologies. They are shaping our brains, influencing our values, and changing the nature of human discourse—not always in good ways. When we determine our relationship to any new technology, there is a middle ground between the doe-eyed technophile and the intransigent Luddite. Only fools dwell on the extremes of this continuum; wisdom lies somewhere in between. When developing and managing cities—those places where most of us live our lives—wisdom definitely demands something less technophilic and more human than the self-serving visions of “smart cities” that technology vendors are promoting today. Ben Green makes this case compellingly in his thoughtful new book The Smart Enough City: Putting Technology in Its Place to Reclaim Our Urban Future.

In the first few pages of the book, Ben asks the following questions about cities:

Nobody likes traffic, but if eliminating it requires removing people from streets, what kinds of cities are we poised to create?

Nobody wants crime, but if preventing it means perpetuating discriminatory practices, what kinds of cities are we poised to create?

Everybody desires better public services, but if deploying them entails setting up corporate surveillance nodes throughout urban centers, what kinds of cities are we poised to create?

As a whole, the book is Ben’s response.

This book is about why, far too often, applications of technology in cities produce adverse consequences—and what we must do to ensure that technology helps create a more just and equitable urban future.

The term “smart city” has emerged as shorthand for cities that focus on the latest technologies as the solution to human problems. If you buy into this term, you believe that failing to implement the latest technologies is dumb, and who wants to live in a dumb city? It isn’t that simple, however. Technologies indeed offer benefits, but only good technologies, and only when they’re designed well and applied wisely. Framing all of a city’s problems as solvable through technologies ignores the complexities that successful urban development and governance must understand and address. Technology vendors love to promote this reductionist vision of smart cities, but those who actually work in the trenches to make cities livable, just, and equitable recognize a nuanced interplay of forces and concerns that must be considered and coordinated.

Although represented as utopian, the smart city in fact represents a drastic and myopic reconceptualization of cities into technology problems. Reconstructing the foundations of urban life and municipal governance in accordance with this perspective will lead to cities that are superficially smart but under the surface are rife with injustice and inequity. The smart city threatens to be a place where self-driving cars have the run of downtowns and force out pedestrians, where civic engagement is limited to requesting services through an app, where police use algorithms to justify and perpetuate racist practices, and where governments and companies surveil public space to control behavior.

Technology can be a valuable tool to promote social change, but a technology-driven approach to social progress is doomed from the outset to provide limited benefits or beget unintended negative consequences.

Ben calls this problematic perspective “technology goggles,” or simply “tech goggles.”

At their core, tech goggles are grounded in two beliefs: first, that technology provides neutral and optimal solutions to social problems, and second, that technology is the primary mechanism to social change. Obscuring all barriers stemming from social and political dynamics, they cause whoever wears them to perceive every ailment of urban life as a technology problem and to selectively diagnose only issues that technology can solve…The fundamental problem with tech goggles is that neat solutions to complex social issues are rarely, if ever, possible.

Technologies are not neutral and objective; they incorporate values and strive to achieve particular outcomes that can undermine the livable and equitable cities that we desire. Technologies, in and of themselves, are never the solution. Only when good technologies are well designed and used wisely can they contribute to real solutions.

The smart city is thus founded on a false dichotomy and blinds us to the broader possibilities of technology and social change. We become stuck asking a meaningless, tautological question—is a smart city preferable to a dumb city?—instead of debating a more fundamental one: does the smart city represent the urban future that best fosters democracy, justice, and equity?

I believe that the answer is no—that our essential task is to defy the logic of tech goggles and recognize our agency to pursue an alternative vision: the “Smart Enough City.” It is a city free from the influence of tech goggles, a city where technology is embraced as a powerful tool to address the needs of urban residents, in conjunction with other forms of innovation and social change, but is not valued for its own sake or viewed as a panacea. Rather than seeing the city as something to optimize, those who embrace the Smart Enough City place their policy goals at the forefront and, recognizing the complexity of people and institutions, think holistically about how to better meet their needs.

Throughout this book, Ben examines the many ways in which technologies can affect urban life, for better or for worse. He dives deeply into specific issues regarding transportation, police work, civic engagement, and the provision of human services. He examines specific technologies, including autonomous vehicles, sensors, and machine-learning algorithms. He makes his case with example after example, both of smart city failures and smart enough city successes. This story features some bad actors, but quite a few heroes as well. I never imagined that I would find a book about cities so engaging. Even though the book focuses on the ways that technologies are shaping cities—a topic to which I haven’t given much thought in the past—the concerns and potential responses that it considers apply much more broadly to technologies and their use. Those of us who work as technology professionals should heed this book’s wise counsel.

When Ben first approached me and asked if I’d be willing to review his book, I was somewhat apprehensive. As someone who has been writing about information technologies for many years, I am frequently approached by authors with similar requests. More often than not, I don’t like their books well enough to recommend them, and I take no pleasure in telling authors why. For this reason, when I encounter a book like The Smart Enough City, I’m relieved. More than relieved, I’m happy to recommend it to my readers. In this particular case, I’m more than happy, I’m thrilled, because this book is an extraordinarily well-researched and well-written treatise on an important topic. The choices that we make about technologies today will fundamentally shape our future. It’s up to us to shape a future that benefits people rather than oppresses them.