Bullshit about Bullshitters and other Misadventures in Social Science

I recently came across a news story about a social science research study that caught my attention. How could I resist a story about bullshitters? According to the study, titled “Bullshitters. Who Are They and What Do We Know about Their Lives?”, this is “an important new area of social science research.” Reviewing the research paper revealed more about problems in social science research, however, than anything meaningful and useful about bullshit, bullshitting, or bullshitters. In this blog post, I’ll describe a few of these problems.

A Useless Definition

The researchers defined “bullshitters” as “individuals who claim knowledge or expertise in an area where they actually have little experience or skill.” If you read the study, however, you will find that this does not accurately describe the behavior that they examined. A more accurate and specific description would state that bullshitters are “people who claim, for any reason, to be familiar with and perhaps even understand concepts that don’t actually exist.” The study is based on the responses of 15-year-old students in English-speaking countries to questions about three bogus mathematical concepts, which they answered while taking the Programme for International Student Assessment (PISA) exam. According to this study, students who claim knowledge of a bogus mathematical concept, for whatever reason, are bullshitters. This, however, is not what people typically mean by the term. Typically, we think of bullshitters as people who make shit up, not as people who simply make mistakes, but the authors didn’t make this distinction. If you turn to someone and ask, “Are you bullshitting me?” you are asking if they intentionally fabricated or exaggerated what they just told you. Bullshitting involves intention. The act of intentionally claiming expertise that you lack to inflate your worth in the eyes of others is indeed a behavior that could be studied, but the mixture of intentional deception and unintentional error does not qualify as a single specific behavior.

Why did the researchers define bullshitters as they did? I suspect it is because they couldn’t determine the difference between intentional deceit and confusion about the bogus mathematical concepts. Defining bullshitters as they did, however convenient, produced a useless study. What can we possibly do with the results? Unfortunately, many social science research studies fall into this category. In part, this is a result of the current myopic emphasis in academia on publication. To get ahead as an academic in a research-oriented discipline, you must publish, publish, publish. For individuals, getting published, and for academic institutions, having published studies cited in other publications, is more valuable than useful research. This is a travesty.

Unreliable Measures and Questionable Statistics

By reviewing many social science research studies over the years, I’ve learned that you should take their claims with a grain of salt until you examine the work carefully. To do this, you must not only read the papers closely, you must also examine the data on which the research was based, including the ways the data was manipulated. By “manipulated,” I don’t mean that the researchers intentionally screwed with the data to support particular conclusions, although this does occur, but merely that they produced their own data from the original data on which the research was based, usually by means of statistical operations (e.g., statistical models of various types) that rely on assumptions. To take research conclusions seriously, we must confirm that the data, the statistical models, and the assumptions on which they are based are all valid and reliable. When researchers don’t provide us with the means to validate their data, we should never accept their conclusions on faith. In my opinion, studies that don’t clearly describe the data on which their findings are based and don’t make that data readily available for inspection don’t qualify as legitimate science.

Social science is challenged by the fact that it often cannot directly measure the phenomenon that it seeks to understand. For example, you cannot place a human subject into a machine that’s capable of measuring their bullshitting behavior. You’re forced to use a proxy—that is, to measure something that you believe is closely related and representative—as the best means available. In this particular study, the researchers chose to treat students’ answers to questions about three bogus mathematical concepts as their proxy for bullshitting.

While taking the PISA exam, students were asked about a series of sixteen mathematical concepts, including three bogus concepts (“Proper Number,” “Subjunctive Scaling,” and “Declarative Fraction”), and for each they were asked to select from the following list the response that best described their familiarity with the concept:

    1. Never heard of it
    2. Heard of it once or twice
    3. Heard of it a few times
    4. Heard of it often
    5. Know it well, understand the concept

These five potential responses comprise something called a Likert scale. The items are supposed to represent the full range of possible responses. Another more typical set of Likert items that often appears in questionnaires asks people to assess the merits of something, such as a particular product, by selecting from a list of responses like the following:

    1. Extremely Poor
    2. Poor
    3. Moderate
    4. Good
    5. Extremely Good

A Likert scale is ordinal (i.e., the items have a proper order, in this case from extremely poor to extremely good), not quantitative. Along a quantitative scale, distances between consecutive values are equal. For example, the quantitative scale 10, 20, 30, 40, 50, etc., exhibits equal intervals of 10 units from one value to the next. Distances between items on a Likert scale, however, are not necessarily equal. For example, the difference between “Extremely Poor” and “Poor” is not necessarily the same as the difference between “Poor” and “Moderate.” Also, with the quantitative scale mentioned above, 50 is five times as great as 10, but with the sample Likert scale, “Extremely Good” is not five times as good as “Extremely Poor.” In the Likert scale that was used in this study, the distance between “Heard of it often” and “Know it well, understand the concept” seems quite a bit greater than the distance between any other two consecutive items, such as between “Never heard of it” and “Heard of it once or twice.” Likert scales require special handling when they’re used in research.

To quantify people’s responses to Likert scales (i.e., to convert them into quantitative scores), merely taking either of the sample Likert scales above and assigning the values 1 through 5 to the items (i.e., the value of 1 for “Extremely Poor,” etc.) would not produce a particularly useful measure. Researchers use various techniques for assigning values to items on Likert scales, and some are certainly better than others, but they are all pseudo-quantitative to some degree.
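
To make the point concrete, here is a minimal Python sketch. Both numeric codings and both groups of responses are invented for illustration; neither comes from the study or from the PISA data. It shows that two codings that respect the order of the items equally well can disagree about which group scores higher on average.

```python
from statistics import mean

# The five responses from the questionnaire, in order of claimed familiarity.
LEVELS = [
    "Never heard of it",
    "Heard of it once or twice",
    "Heard of it a few times",
    "Heard of it often",
    "Know it well, understand the concept",
]

# Two hypothetical codings. Both preserve the order of the items.
EQUAL_STEPS = [1, 2, 3, 4, 5]      # assumes equal distances between items
BIG_LAST_STEP = [1, 2, 3, 4, 10]   # assumes the last step is a much bigger jump

# Invented responses, stored as indexes into LEVELS.
group_a = [3, 3, 3, 3]  # everyone answers "Heard of it often"
group_b = [0, 0, 4, 4]  # half "Never heard of it", half "Know it well..."


def mean_score(responses, coding):
    """Average score of a group under a given numeric coding."""
    return mean(coding[r] for r in responses)


for name, coding in [("equal steps", EQUAL_STEPS), ("big last step", BIG_LAST_STEP)]:
    a = mean_score(group_a, coding)
    b = mean_score(group_b, coding)
    higher = "A" if a > b else "B"
    print(f"{name}: A={a}, B={b}, higher group: {higher}")
```

Under the first coding group A comes out ahead; under the second, group B does. Nothing in the responses themselves says which coding is the right one.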

Imagine what it would be like to rely on people to determine air temperature using the following Likert scale:

    1. Extremely Cold
    2. Cold
    3. Average Temperature
    4. Hot
    5. Extremely Hot

Obviously, we wouldn’t use a Likert scale if we had an objective means, such as a thermometer, to measure something in a truly quantitative manner. Subjective measures of objective reality are always suspect. When we convert subjective Likert scales into quantitative scores, as the researchers did in this study, the quantitative values that we assign to items along the scale are rough estimates at best. We must keep this in mind when we evaluate the merits of claims that are based on Likert scales.

Social science research studies are often plagued by many challenges, which is one of the reasons why attempts to replicate them frequently fail. This doesn’t seem to discourage many researchers, however, from making provocative claims.

Provocative Claims

Based on their dysfunctional definition of bullshitters, the researchers made several claims. I found one in particular to be especially provocative: a ranking of English-speaking countries based on their percentages of bullshitters, with Canada on top followed by the USA. As an American, I find it rather difficult to believe that our polite neighbors to the north are more inclined to bullshitting than we are. If we set aside our concerns about the researchers’ definition of bullshit for the moment and accept students’ responses to the three bogus mathematical concepts as a potentially reliable measure of bullshitting, we must then determine a meaningful way to convert those responses into a reliable bullshitter score before we can make any claims, especially provocative claims. Unfortunately, it is difficult to evaluate the method that the researchers used to do this because it’s hidden in a black box and they won’t explain it, except to say that they used an “Item Response Theory (IRT) model to produce a latent construct.” That was the answer that I received when I asked one of the researchers about this via email. Telling me that they used an IRT model didn’t really answer my question, did it? I want to know the exact logical and mathematical steps that they or their software took to produce their bullshitter score. How were the various Likert responses weighted quantitatively and why? Only by knowing this can we evaluate the merits of their results.

Social scientists aren’t supposed to obscure their methods. Given the fact that I couldn’t evaluate the researchers’ methods directly, I examined the data for myself and eventually tried several scoring approaches of my own. Upon examining the data, I soon became suspicious when I noticed that the bogus mathematical concept “Proper Number” elicited quite different responses than the other two. Notice how the patterns in the following graphs differ.

Only the responses other than “Never heard of it” indicate that the students claimed to be familiar with the bogus concepts. More than 50% of students indicated that they were familiar with the concept “Proper Number,” but only about 25% indicated that they were familiar with each of the other two concepts. Notice that responses indicating increasing degrees of familiarity with “Proper Number” correspond to increasing percentages of students. Far more students selected “Know it well, understand the concept” than selected “Heard of it once or twice.” This is the opposite of what we would expect if greater degrees of familiarity represented greater degrees of bullshitting. Declining percentages from left to right are what we would expect if students were bullshitting, which is exactly what we see in their responses to the concepts “Subjunctive Scaling” and “Declarative Fraction.”

I suspect that this difference in behavior occurred because many students (perhaps most) who claimed to be familiar with the “Proper Number” concept were confusing it with some other concept that actually exists. To test this, I did a quick Google search on “Proper Number” and all of the links that were provided referenced “Perfect Number” instead, a legitimate concept, yet Google didn’t bother to mention that it substituted “Perfect Number” for “Proper Number.” Nothing similar occurred when I Googled the other two bogus concepts. This suggests that people search for “Proper Number” when they’re really looking for “Perfect Number” frequently enough for Google to make this automatic substitution. When I pointed this out to the primary researcher, expressed my concern, and asked her about it in our third email exchange, I never heard back. It is never a good sign when researchers stop responding to you when you ask reasonable questions or express legitimate concerns about their work.

If responses to the three bogus concepts were due to the same behavior (i.e., bullshitting), we should see similar responses to all three, but this isn’t the case. In fact, when I compared responses per country, I found that the rank order of so-called bullshitting behavior per country was nearly identical for “Subjunctive Scaling” and “Declarative Fraction,” but quite different for “Proper Number.” Something different was definitely going on.
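
One way to put a number on “nearly identical” versus “quite different” rank orders is a rank correlation. The Python sketch below is only an illustration: the country labels and percentages are invented, not taken from the PISA data, and Spearman’s coefficient is a choice made for this example, not necessarily the comparison described above. The simple formula used here assumes there are no tied values.

```python
def ranks(values):
    """Return the rank of each value (1 = largest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Invented percentages of students claiming familiarity, for six
# hypothetical countries labeled A through F (same order in each list).
subjunctive_scaling = [30, 27, 24, 21, 18, 15]
declarative_fraction = [29, 28, 22, 23, 17, 14]
proper_number = [40, 62, 45, 58, 66, 50]

print("Subjunctive vs. Declarative:",
      round(spearman(subjunctive_scaling, declarative_fraction), 2))
print("Subjunctive vs. Proper Number:",
      round(spearman(subjunctive_scaling, proper_number), 2))
```

A coefficient near 1 means two concepts order the countries almost the same way; a coefficient near zero or below means the orderings have little in common. With these invented numbers the first pair lands close to 1 and the second pair is negative.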

When I made variously weighted attempts to convert students’ Likert responses into bullshitter scores, I found that, if you consider all three bogus concepts, Canada does indeed take the prize for bullshitting, but if you exclude the question about “Proper Number,” Canada drops below the USA, which seems much more reasonable. As an American living at a time when the executive branch of government is being led by a prolific bullshitter, I can admit, albeit with great embarrassment, that we are plagued by an extraordinary tolerance of bullshitting.
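
The kind of sensitivity check described above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the two countries “X” and “Y,” their response shares, and both sets of weights are invented and are not the PISA data or the weights actually tried. The sketch scores each country by weighting the share of students who chose each response, then ranks the countries with and without “Proper Number.”

```python
# Invented shares of students per response, from "Never heard of it"
# to "Know it well, understand the concept", for two hypothetical countries.
DATA = {
    "X": {
        "Proper Number":        [0.40, 0.10, 0.10, 0.15, 0.25],
        "Subjunctive Scaling":  [0.80, 0.10, 0.05, 0.03, 0.02],
        "Declarative Fraction": [0.80, 0.10, 0.05, 0.03, 0.02],
    },
    "Y": {
        "Proper Number":        [0.55, 0.10, 0.10, 0.10, 0.15],
        "Subjunctive Scaling":  [0.70, 0.15, 0.08, 0.04, 0.03],
        "Declarative Fraction": [0.72, 0.14, 0.07, 0.04, 0.03],
    },
}

ALL_CONCEPTS = ["Proper Number", "Subjunctive Scaling", "Declarative Fraction"]
WITHOUT_PROPER = ["Subjunctive Scaling", "Declarative Fraction"]

# Two hypothetical weightings of the five responses.
WEIGHTINGS = {
    "equal steps": [0, 1, 2, 3, 4],
    "big last step": [0, 1, 2, 3, 6],
}


def score(country, concepts, weights):
    """Sum of weighted response shares over the chosen concepts."""
    return sum(
        w * share
        for concept in concepts
        for w, share in zip(weights, DATA[country][concept])
    )


def ranking(concepts, weights):
    """Country labels ordered from highest to lowest score."""
    return sorted(DATA, key=lambda c: score(c, concepts, weights), reverse=True)


for label, weights in WEIGHTINGS.items():
    print(label, "| all three:", ranking(ALL_CONCEPTS, weights),
          "| without Proper Number:", ranking(WITHOUT_PROPER, weights))
```

With these invented numbers, country X ranks first when all three concepts are counted and second once “Proper Number” is excluded, under both weightings. A ranking that flips when a single questionable item is dropped is not a sturdy basis for a headline claim.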

Regardless, I don’t actually believe that we can put our trust even in students’ responses to the bogus concepts “Subjunctive Scaling” and “Declarative Fraction” as a reliable measure of bullshitting. Before I would be willing to publish scientific claims, I would need better measures.

Concluding Thoughts

I was trained in the social sciences and I value them greatly. For this reason, I’m bothered by practices that undermine the credibility of social science. The bullshitters study does not actually produce any reliable or useful knowledge about bullshitting behavior. Ironically, according to their own definition, the researchers are themselves bullshitters, for they are claiming knowledge that doesn’t actually exist. Social science can do better than this. At a time when voices in opposition to science are rising in volume, it’s imperative that it does.

One Comment on “Bullshit about Bullshitters and other Misadventures in Social Science”

By Ana. May 21st, 2019 at 10:13 am

To add to the possible confusion arising from the term “Proper Number”: I’m Spanish, and though my math studies were long ago, the term sounded familiar as soon as I read it. A Google search in Spanish for the term “Número propio” returned a definition for “Fracción propia” (“Proper Fraction”). At least in Spanish, although Proper Fraction would be the more correct terminology, I think both names were used when I was studying. Is it possible that the same occurs in English?
