The Malady of Lost Connections

August 14th, 2018

I just finished reading the most important book that I’ve encountered in years: “Lost Connections,” by Johann Hari. It succeeds in doing what its subtitle claims: “Uncovering the real causes of depression—and the unexpected solutions.”

As one of the many people who have struggled with depression, I greatly appreciate the insights that Hari shares in this book. My appreciation extends well beyond this, however, for this book isn’t just about depression and anxiety. It is also a thoughtful and thoroughly researched assessment of modern society. Depression and anxiety are symptoms of deep and systemic flaws in our modern, industrialized, consumerized, and technologized world, which has caused us to lose vital connections that are essential to human fulfillment. As it turns out, depression and anxiety are clear signals that something is very much amiss with our world.

If you collect all of the research data regarding anti-depressants—not just what the pharmaceutical industry has made public—and assess it without bias, you will find that depression is not the result of a chemical imbalance. There is no credible evidence that boosting serotonin actually reduces depression beyond the placebo effect. Furthermore, even though research indicates that some genes can predispose us to respond to our circumstances with depression, the causes of depression do not reside in human biology. Rather, they are rooted in human society and, in some cases, in traumatic experiences. Nevertheless, depression and anxiety are almost always treated by the medical community with drugs that are designed to correct a chemical imbalance that isn’t the cause. This approach has failed miserably.

As the book’s title suggests, depression and anxiety are rooted in disconnections:

  • Disconnection from meaningful work
  • Disconnection from other people
  • Disconnection from meaningful values
  • Disconnection from childhood trauma
  • Disconnection from status and respect
  • Disconnection from the natural world
  • Disconnection from a hopeful or secure future

It took Hari several years to track down the data and interview the experts, resulting in an incredible story. These disconnections are intricately woven into the fabric of modern society. Nevertheless, there are still places where depression and anxiety are rare. In those places still exist the connections that have been disrupted elsewhere.

There are steps that we can and should take as individuals to reestablish the connections that are vital to our lives, but the full solution lies in societal change. Hari lays out many of the steps that we can take to make this happen. Societal change isn’t easy and it takes time, but it’s the only thorough and lasting solution. The change that’s needed doesn’t require the rejection of useful advances in science and technology, but we must embrace these artifacts of modernity more intelligently and with greater care.

Please read this book. Please contribute to the restoration of connections in society that humankind sorely needs to endure and thrive.

The Perils of Technochauvinism

August 1st, 2018

More and more these days people are waking up to the fact that digital technologies often fail us and sometimes do great harm. The default assumption that digital technologies are always needed and beneficial is now being questioned by an increasing number of thoughtful people who understand these technologies well. One such person is Meredith Broussard, who, in her new book, Artificial Unintelligence, labels this erroneous assumption technochauvinism.

“Technochauvinism” is roughly equivalent to Evgeny Morozov’s term “technological solutionism,” which I’ve been using for years. Better than other writers so far, Broussard explains the nature of digital technologies—what they are, how they work, what they do well, the ways in which they’re limited, and how they fail—in a manner that’s practical and accessible to anyone who’s interested. An accomplished journalist, she writes clearly and grounds her claims in evidence. An experienced digital technologist, she speaks from well-informed experience.

As the title suggests, much of this book focuses on artificial intelligence (AI), which is fitting given the prolific hype and common misunderstandings that obscure AI technologies in particular. Broussard explains what artificial intelligence is and isn’t. While others have described the important distinction between general AI and narrow AI, Broussard explains this difference more clearly and illustrates AI more realistically, using interesting examples. In one chapter, she walks readers through the use of algorithms to make sense of who survived the Titanic disaster, and in so doing reveals both the strengths and weaknesses of machine learning. In another chapter, she takes readers along on a ride in an autonomous vehicle to illustrate the dangers of AI that overreaches. She puts the proper application of AI into perspective.
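Her Titanic chapter makes the point well. Here is a minimal sketch of that kind of exercise—my own illustration, not code from the book—assuming Python with pandas, scikit-learn, and seaborn (whose bundled Titanic dataset stands in for the passenger records she discusses). A simple model predicts survival far better than chance while understanding nothing about why:

```python
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the well-known Titanic passenger table bundled with seaborn.
titanic = sns.load_dataset("titanic").dropna(subset=["age"])

# A handful of passenger attributes; 'is_female' encodes the 'sex' column.
X = titanic[["pclass", "age", "fare"]].assign(
    is_female=(titanic["sex"] == "female").astype(int)
)
y = titanic["survived"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# A shallow decision tree picks up strong statistical regularities
# (class, sex, age) and predicts survival well above chance...
model = DecisionTreeClassifier(max_depth=3, random_state=0)
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")

# ...but it has no notion of lifeboats, social norms, or why those
# patterns held on one ship in 1912. It computes; it doesn't understand.
```

The particulars here (features, model, accuracy) are not the point; the point is that the model merely encodes statistical regularities from a single voyage, which is both the strength and the weakness of machine learning that Broussard illustrates.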

Apart from AI in particular, she also describes the historical roots of technochauvinism as a byproduct of the worldview that is shared by most of high tech’s power elite. When this limited, self-serving worldview is incorporated into digital technologies, problems result, often promoting injustice.

Computers compute—they do math. As such, they’re better than humans at tasks that are based on mathematical computation. Despite the ubiquitous metaphor, computers are not like human brains. Computers don’t think and they aren’t sentient, even though terms such as artificial intelligence and machine learning suggest otherwise. Computers excel at computational tasks, but because they lack understanding, we can’t rely on them to perform tasks that require it without humans in the loop.

This book will not be well received by those who are so invested in digital technologies that they refuse to think critically about them. I’ve already noticed a few undeserved, negative reviews of this book on Amazon that reflect this closed-minded, self-serving perspective. Writing this book took courage. You don’t write a book like this to gain popularity or make money. You do it because you care deeply about the world. This book speaks the truth—a truth that needs to be heard.

The Danger of Technological Arrogance

June 10th, 2018

Technologists have never been particularly good at considering the ramifications of their work apart from those outcomes that are intended and beneficial. Only the rare technologist is inclined to step back from the work and think deeply about unintended and harmful consequences. Technologists want to feel good about their work, and they want to rock the world with their achievements as quickly as possible. Taking time to consider the larger implications of their work is annoyingly inconvenient. This myopic perspective is dangerous.

This is especially true regarding AI research and development. The potential risk of a superintelligent AI departing from the interests of humanity to pursue its own interests, assuming that general intelligence in a machine can be achieved, is all too real. No one who is informed about AI can responsibly ignore the risks or claim that they don’t exist.

With this in mind, I grew concerned while reading a thoughtful article published on June 9, 2018 in the New York Times titled “Mark Zuckerberg, Elon Musk and the Feud Over Killer Robots” by Cade Metz. At an exclusive conference organized by Jeff Bezos that took place in March of this year, Rodney Brooks, an MIT roboticist, debated the potential dangers of AI with neuroscientist and philosopher Sam Harris. When Harris warned that, because the world was in an arms race toward AI, researchers might not have the time needed to ensure that superintelligence is built in a safe way, Brooks responded: “This is something you have made up,” implying that Harris’ argument was not based on evidence and reason. Spurred on by Brooks, Oren Etzioni, who leads the Allen Institute for Artificial Intelligence, laid into Harris as well. He argued:

Today’s AI systems are so limited, spending so much time worrying about superintelligence just doesn’t make sense. The people who take Mr. Musk’s side are philosophers, social scientists, writers — not the researchers who are working on AI.

Etzioni further claimed that worrying about superintelligence “is very much a fringe argument.” Anyone familiar with this debate, however, knows that concerns about AI do not reside only on the fringes, and they are certainly not “made up.” Etzioni cannot actually believe this without a great deal of self-imposed, head-in-the-sand delusion.

What especially concerned me about Etzioni’s position was the insinuation that only AI researchers were qualified to have informed opinions on the matter. This is the height of technological arrogance. As a longtime information technologist myself, I have always been annoyed by the tendency of my colleagues to see themselves as the smartest guys in the room. The intelligence of technologists is no greater on average than the intelligence of philosophers, social scientists, and writers. In fact, it’s quite possible that a comparison of intelligence, if such a thing were feasible, would relegate technologists to an inferior position. In my experience, philosophers, social scientists, and writers tend to think more broadly and with greater nuance than most technologists. Ethical considerations are more central to their training and therefore to their thinking. They are much more likely to consider the potential threats of technologies than technologists themselves.

We cannot leave decisions about the future of AI solely in the hands of AI researchers and developers any more than we can leave decisions about the protection and use of our personal data in the hands of social media executives and their employees. Recent findings regarding Facebook have underscored this fact. Despite their well-publicized rhetoric about “making the world a better place,” people who run technology companies are motivated by personal interests that bias their perspectives. They want to believe in the inherent good of their creations. The technologies that we create and the ways that we design them must be subject to thoughtful discussion that extends well beyond technologists. Technologies affect us all, and not always for good.

“Technically Wrong” Is Absolutely Right

April 11th, 2018

I’ve worked in high tech for 35 years. Over the years I’ve developed a love-hate relationship with this industry. I love technologies that are needed and work well. I love technology companies that respect their customers and employees. All too often, however, technologies and the companies that make them don’t deserve our love. Sara Wachter-Boettcher echoes this sentiment in her wonderful book Technically Wrong. Sara is not anti-technology, but she firmly believes that we should hold technologies and the companies that create them responsible for their failures, especially when they do harm.

Systemic problems in the ways that tech companies are managed and products are created are surfacing more and more often these days. In recent days, Facebook has been the tech company whose irresponsible behavior has dominated the news. Facebook is not alone. Tech companies can function responsibly and ethically, but those that do are the exceptions, not the norm. Tech companies have created the mystique that they are special, and for this reason we give them a pass. I’ve always been uncomfortable with this mystique, which veils the dysfunction of tech companies. People who work in tech are no more special on average than those who work in other organizations. They are neither smarter nor more talented, despite the fact that they are compensated as if they were.

Most tech companies are dominated by the rather narrow perspective of privileged white men, which contributes to many of their problems. Their lack of diversity and their assumption that they’re smarter than others lead to a myopic view of the world—one that misunderstands the needs of a large portion of their users. They think of a significant portion of their users as “edge cases,” and edge cases aren’t deemed important enough to consider.

Yes, I’m a privileged white guy myself, but I know that my success has been due in many respects to good fortune—the luck of privileged birth. Perhaps my background in the humanities and social sciences has helped me to see the world more broadly than many of my privileged high-tech brethren.

The book Technically Wrong exposes these problems eloquently and suggests solutions. Here’s the description that appears on the book’s dust cover:

Buying groceries, tracking our health, finding a date: whatever we want to do, odds are that we can now do it online. But few of us ask why all these digital products are designed the way they are. It’s time we change that. Many of the services we rely on are full of oversights, biases, and downright ethical nightmares. Chatbots that harass women. Signup forms that fail anyone who’s not straight. Social media sites that send peppy messages about dead relatives. Algorithms that put more black people behind bars.

Sara Wachter-Boettcher takes an unflinching look at the values, processes, and assumptions that lead to these and other problems. Technically Wrong demystifies the tech industry, leaving those of us on the other side of the screen better prepared to make informed choices about the services we use—and demand more from the companies behind them.

We should have started demanding more of tech companies long ago. If we had, many problems could have been prevented. It’s not too late, however, to turn this around, and turn it around we must.

Know Your Audience — Good Luck with That

March 15th, 2018

I’ve long appreciated the fact that knowing your audience is an important prerequisite for effective communication. Over time, however, I’ve learned that this can rarely be achieved with specificity. The reason is simple: audiences are rarely homogeneous. If your audience is composed of two or more people, it is to some extent diverse. Consequently, it is only possible to finely tailor communication for an audience of one, and even then it’s challenging.

In most scenarios, we should do our best to communicate in ways that work well for people in general rather than for particular individuals. At best, we can assess the interests, abilities, proclivities, and experiences of our audience to determine a range of communication approaches that are suitable and to perhaps discard some approaches that don’t fit. For example, if you were the warm-up act at a Trump political rally, you could safely assume that discourse suitable for a convention of physicists should be avoided. You could also assume that emotionally charged statements would carry more weight for most of your audience than a rational presentation of facts. (To be fair, this is true of most audiences.) You could not, however, narrow your approach to suit people who exhibit a particular intelligence as defined by Howard Gardner’s seven intelligences (visual-spatial, bodily kinesthetic, musical, etc.), although you could certainly cover the same content in multiple ways to broaden its effectiveness. As diversity in audiences increases, our communication approach must increasingly be informed by general rather than specific principles of communication. In the business of communication, knowing what works best for most people is more often useful than knowing what works best for particular people.

In the interest of communicating in the ways that suit people’s interests, abilities, proclivities, and experiences, we often shape our audiences to narrow their diversity. Schools do this by grouping students into grade levels and by offering multiple courses in a particular subject to suit the interests and abilities of particular groups. With unlimited time and resources, we could finely select our audiences to match a tailored communication approach, but this isn’t practical.

One of the best ways to accommodate the diverse needs of an audience is to practice empathy. If we can see them, we can pay attention to them. We can read their reactions. In my data visualization workshops, I’ve always limited the number of participants to 70, in part to make sure that I could see everyone well enough to read their reactions and adapt my teaching accordingly. Obviously, there are limits to what I can discern in facial expressions and physical gestures, but such cues can be quite informative. It is also for this reason that I’ve never taught my courses remotely, but only in face-to-face settings. Web-based courses, though sometimes necessary given the circumstances, are an inferior substitute for face-to-face interaction.

Another way that we can accommodate the diverse needs of an audience is to address the same content in multiple ways. Though this introduces some redundancy, the redundancy is useful, and it needn’t annoy the audience. Covering the same content in multiple ways takes more time, so it comes with a cost, but it usually pays off.

“Know your audience” is useful advice, but it can only be applied to communications in limited ways. In the business of communications, it is more useful overall to understand how people process information in general and to base most of our communications on that knowledge.

Take care,

Randomness is Often Not Random

March 12th, 2018

In statistics, what we often identify as randomness in data is not actually random. Bear in mind, I am not talking about randomly generated numbers or random samples. Instead, I am referring to events about which data has been recorded. We learn of these events when we examine the data. We refer to an event as random when it is not associated with a discernible pattern or cause. Random events, however, almost always have causes. We just don’t know them. Ignorance of cause is not the absence of cause.

Randomness is sometimes used as an excuse for preventable errors. I was poignantly reminded of this a decade or so ago when I became the victim of a so-called random event that occurred while I was undergoing one of the most despised medical procedures known to humankind: a colonoscopy. I was in my early fifties at the time, and it was my first encounter with this dreaded procedure. After this initial encounter, which I’ll now describe, I hoped that it would be my last.

While the doctor was removing one of five polyps that he discovered during his spelunking adventure into my dark recesses, he inadvertently punctured my colon. Apparently, however, he didn’t know it at the time, so he sent me home with the encouraging news that I was polyp free. Having the contents of one’s colon leak out into other parts of the body isn’t healthy. During the next few days, severe abdominal pain developed, and I began to suspect that my 5-star rating was not deserved. Once I was admitted to the emergency room at the same facility where my illness was created, a scan revealed the truth of the colonoscopic transgression. Thus began my one and only overnight stay so far in a hospital.

After sharing a room with a fellow who was drunk out of his mind and wildly expressive, I hope to never repeat the experience. Things were touch and go for a few days as the medical staff pumped me full of antibiotics and hoped that the puncture would seal itself without surgical intervention. Had it not sealed, the alternative would have involved removing a section of my colon and being fitted with a stylish bag for collecting solid waste. To make things more frightening than they needed to be, the doctor who provided this prognosis failed to mention that the bag would be temporary, lasting only about two months while my body rid itself of infection, followed by another surgery to reconnect my plumbing.

In addition to a visit from the doctor whose communication skills and empathy were sorely lacking, I was also visited during my stay by a hospital administrator. She politely explained that punctures during a routine colonoscopy are random events that occur a tiny fraction of the time. According to her, these events should not be confused with medical error, for they are random in nature, without cause, and therefore without fault. Lying there in pain, I remember thinking, but not expressing, “Bullshit!” Despite the administrator’s assertion of randomness, the source of my illness was not a mystery. It was that pointy little device that the doctor snaked up through my plumbing for the purpose of trimming polyps. Departing from its assigned purpose, the trimmer inadvertently forged a path through the wall of my colon. This event definitely had a cause.

Random events are typically rare, but the cause of something rare is not necessarily unknown and certainly not unknowable. The source of the problem in this case was known, but what was not known was the specific action that initiated the puncture. Several possibilities existed. Perhaps the doctor involuntarily flinched in response to an itch. Perhaps he was momentarily distracted by the charms of his medical assistant. Perhaps his snipper tool got snagged on something and then jerked to life when the obstruction was freed. Perhaps the image conveyed from the scope to the computer screen lost resolution for a moment while the computer processed the latest Windows update. In truth, the doctor might have known why the puncture happened, but if he did, he wasn’t sharing. Regardless, when we have reliable knowledge of several potential causes, we should not ignore an event just because we can’t narrow it down to the specific culprit.

The hospital administrator engaged in another bit of creative wordplay during her brief intervention. Apparently, according to the hospital, and perhaps to medical practice in general, something that happens this rarely doesn’t actually qualify as an error. Rare events, however harmful, are designated as unpreventable and therefore are not errors after all. This is a self-serving bit of semantic nonsense. Whether or not rare errors can be easily prevented, they remain errors.

We shouldn’t use randomness as an excuse for ongoing ignorance and negligence. While it makes no sense to assign blame without first understanding the causes of undesirable events, it also makes no sense to dismiss them as inconsequential and as necessarily beyond the realm of understanding. Think of random events as invitations to deepen our understanding. We needn’t necessarily make them a priority for responsive action, for problems that are already understood might deserve our attention more, but we shouldn’t dismiss them either. Randomness should usually be treated as a temporary label.

Take care,

When Metrics Do Harm

March 6th, 2018

We are obsessed with data. One aspect of this obsession is our fixation on metrics. Quantitative measures—metrics—can be quite useful for monitoring and managing performance, but only when they are skillfully used in the right circumstances for the right purposes. In his wonderful new book, The Tyranny of Metrics, Jerry Muller convincingly argues that the balance has shifted toward counterproductive and often harmful misuses of metrics.

As an historian, Muller brought a high degree of scholarship to his examination of metrics. I’ll let the description that appears on the inside flap of the book’s dust jacket give you a sense of its contents.

Today, organizations of all kinds are fueled by the belief that the path to success is quantifying human performance, publicizing the results, and dividing up the rewards based on the numbers. But in our zeal to instill the evaluation process with scientific rigor, we’ve gone from measuring performance to fixating on measuring itself. The result is a tyranny of metrics that threatens the quality of our lives and most important institutions. In this timely and powerful book, Jerry Muller uncovers the damage our obsession with metrics is causing—and shows how we can begin to fix the problem.

Filled with examples from education, medicine, business and finance, government, the police and military, and philanthropy and foreign aid, this brief and accessible book explains why the seemingly irresistible pressure to quantify performance distorts and distracts, whether by encouraging “gaming the stats” or “teaching to the test.” That’s because what can and does get measured is not always worth measuring, may not be what we really want to know, and may draw effort away from the things we care about. Along the way, we learn why paying for measured performance doesn’t work, why surgical scorecards may increase deaths, and much more. But metrics can be good when used as a complement to—rather than a replacement for—judgment based on personal experience, and Muller also gives examples of when metrics have been beneficial.

Complete with a checklist of when and how to use metrics, The Tyranny of Metrics is an essential corrective to a rarely questioned trend that increasingly affects us all.

I appreciate it when thoughtful people courageously challenge popular opinion by questioning what we blindly assume is good. It is the rare individual who struggles to row against the current. It is in this direction that we must set our course, however, when the wellspring of truth is located upstream.

Many skilled professionals who work with metrics already recognize ways in which metrics do harm when they are ill-defined, inappropriately chosen, improperly measured, or misapplied. If you’re one of these professionals, this book will help you make your concerns heard above the din that keeps your organization distracted and confused. This is a welcome voice of sanity in a world that worships data but seldom uses it meaningfully and skillfully.

Tony Stark is Not a Real Dude

March 2nd, 2018

The world that has emerged from the imagination of Stan Lee and his Marvel Comics colleagues is great fun. In recent years, Deadpool has become my new favorite superhero, with Wolverine close on his heels. Today, however, I want to talk about another Marvel superhero—Iron Man—or more specifically about Tony Stark, the man encased in that high-tech armor.

It’s important that, when we consider fictional characters like this wealthy high-tech entrepreneur and inventor, we clearly distinguish fantasy from reality. Tony Stark isn’t real. Furthermore, no one like Tony Stark actually exists. You know this, right? Our best and brightest high-tech moguls—Bill Gates of Microsoft, Elon Musk of Tesla and SpaceX, Larry Page and Sergey Brin of Google, Mark Zuckerberg of Facebook, and even the late Steve Jobs of Apple—don’t come close to the abilities of Tony Stark. No one does and no one can. Even if we combined all of these guys into a single person and threw a top scientist such as Stephen Hawking into the mix, we still wouldn’t have someone who could do what Tony Stark not only does but does with apparent ease.

What am I getting at? There is a tendency today to believe that high-tech entrepreneurs and their inventions are much more advanced than they actually are. High-tech entrepreneurs and their inventions are buggy as hell. Most high-tech products are poorly designed. Even though good technologies can and do provide wonderful benefits, they are not magical in the ways that Marvel’s universe or high-tech marketers suggest. Technologies cannot always swoop in and save us in the nick of time. Furthermore, technologies are not intrinsically good as their advocates often suggest.

We should pursue invention, always looking for that next tool that will extend our reach in useful ways, but we should not bet our future on technological solutions. We dare not allow the Doomsday Clock to approach midnight hoping for a last second invention that will turn back time. We must face the future with a more realistic assessment of our abilities and the limitations of technologies.

Earlier today, in his state of the nation address, Vladimir Putin announced the latest in Russian high-tech innovation: nuclear projectiles that cannot be intercepted. Assuming that his claim is true, and it probably is, Putin has just placed himself and Russia at the top of the potential threats list. A bully with the ability to destroy the world brings back frightening memories of my youth when we had to perform duck-and-cover drills, trusting that those tiny metal and wooden desks would shield us from a nuclear assault.

There is no Tony Stark to save us. Iron Man won’t be paying a visit to Putin to put that bully in his place. As I was reading the news story about Putin’s announcement in the Washington Post this morning, an ad appeared in the middle of the text with a photo of Taylor Swift and the caption “Look Inside Taylor Swift’s New $18 Million NYC Townhouse.” A news story that is the stuff of nightmares is paired with celebrity fluff, lulling us into complacency. If the news story makes you nervous, you can easily escape into Taylor’s luxurious abode and pull her 10,000 thread count satin sheets over your head. Perhaps we have nothing to fear, for our valiant and brave president, Donald Trump, will storm the Kremlin and take care of Putin with his bare hands if necessary (after the hugging is over, of course), just as he would have dispatched that high school shooter in Parkland, Florida.

We have every reason to believe in his altruism and utter superiority in a fight, don’t we? Anything less would be fake news.

Ah, but I digress. I was distracted by the allure of Taylor Swift and her soft sheets. Where was I? Oh yeah, Tony Stark is not a real dude. If we hope to survive, let alone thrive, we’ll need to focus on building character, improving our understanding of the world, and making some inconvenient decisions. Technologies will play a role, but they aren’t the main actors in this real-world drama. We are.

Big Data, Big Dupe: A Progress Report

February 23rd, 2018

My new book, Big Data, Big Dupe, was published early this month. Since its publication, several readers have expressed their gratitude in emails. As you can imagine, this is both heartwarming and affirming. Big Data, Big Dupe confirms what these seasoned data professionals recognized long ago on their own, and in some cases have been arguing for years. Here are a few excerpts from emails that I’ve received:

I hope your book is wildly successful in a hurry, does its job, and then sinks into obscurity along with its topic.  We can only hope! 

I hope this short book makes it into the hands of decision-makers everywhere just in time for their budget meetings… I can’t imagine the waste of time and money that this buzz word has cost over the past decade.

Like yourself I have been doing business intelligence, data science, data warehousing, etc., for 21 years this year and have never seen such a wool over the eyes sham as Big Data…The more we can do to destroy the ruse, the better!

I’m reading Big Data, Big Dupe and nodding my head through most of it. There is no lack of snake oil in the IT industry.

Having been in the BI world for the past 20 years…I lead a small (6 to 10) cross-functional/cross-team collaboration group with like-minded folks from across the organization. We often gather to pontificate, share, and collaborate on what we are actively working on with data in our various business units, among other topics.  Lately we’ve been discussing the Big Data, Big Dupe ideas and how within [our organization] it has become so true. At times we are like ‘been saying this for years!’…

I believe deeply in the arguments you put forward in support of the scientific method, data sensemaking, and the right things to do despite their lack of sexiness.

As the title suggests, I argue in the book that Big Data is a marketing ruse. It is a term in search of meaning. Big Data is not a specific type of data. It is not a specific volume of data. (If you believe otherwise, please identify the agreed-upon threshold in volume that must be surpassed for data to become Big Data.) It is not a specific method or technique for processing data. It is not a specific technology for making sense of data. If it is none of these, what is it?

The answer, I believe, is that Big Data is an irredeemably ill-defined and therefore meaningless term that has been used to fuel a marketing campaign that began about ten years ago to sell data technologies and services. Existing data products and services at the time were losing their luster in public consciousness, so a new campaign emerged to rejuvenate sales without making substantive changes to those products and services. This campaign has promoted a great deal of nonsense and downright bad practices.

Big Data cannot be redeemed by pointing to an example of something useful that someone has done with data and exclaiming “Three cheers for Big Data,” for that useful thing would have still been done had the term Big Data never been coined. Much of the disinformation that’s associated with Big Data is propagated by good people with good intentions who prolong its nonsense by erroneously attributing beneficial but unrelated uses of data to it. When they equate Big Data with something useful, they make a semantic connection that lacks a connection to anything real. That semantic connection is no more credible than attributing a beneficial use of data to astrology. People do useful things with data all the time. How we interact with and make use of data has been gradually evolving for many years. Nothing that is qualitatively different about data or its use emerged roughly ten years ago to correspond with the emergence of the term Big Data.

Although there is no consensus about the meaning of Big Data, one thing is certain: the term is responsible for a great deal of confusion and waste.

I read an article yesterday titled “Big Data – Useful Tool or Fetish?” that exposes some failures of Big Data. For example, it cites the failed $200,000,000 Big Data initiative of the Obama administration. You might think that I would applaud this article, but I don’t. I certainly appreciate the fact that it recognizes failures associated with Big Data, but its argument is logically flawed. Big Data is a meaningless term. As such, Big Data can neither fail nor succeed. By pointing out the failures of Big Data, this article endorses its existence, and in so doing perpetuates the ruse.

The article correctly assigns blame to the “fetishization of data” that is promoted by the Big Data marketing campaign. While Big Data now languishes with an “increasingly negative perception,” the gradually growing ranks of skilled professionals and useful technologies continue to make good use of data, as they always have.

Take care,

P.S. On March 6th, Stacey Barr interviewed me about Big Data, Big Dupe. You can find an audio recording of the interview on Stacey’s website.

Different Tools for Different Tasks

February 19th, 2018

I am often asked a version of the following question: “What data visualization product do you recommend?” My response is always the same: “That depends on what you do with data.” Tools differ significantly in their intentions, strengths, and weaknesses. No one tool does everything well. Truth be told, most tools do relatively little well.

I’m always taken by surprise when the folks who ask me for a recommendation fail to understand that I can’t recommend a tool without first understanding what they do with data. A fellow emailed this week to request a tool recommendation, and when I asked him to describe what he does with data, he responded by describing the general nature of the data that he works with (medical device quality data) and the amount of data that he typically accesses (“around 10k entries…across multiple product lines”). He didn’t actually answer my question, did he? I think this was, in part, because he and many others like him don’t think of what they do with data as consisting of different types of tasks. This is a fundamental oversight.

The nature of your data (marketing, sales, healthcare, education, etc.) has little bearing on the tool that’s needed. Even the quantity of data has relatively little effect on my tool recommendations unless you’re dealing with excessively large data sets. What you do with the data—the tasks that you perform and the purposes for which you perform them—is what matters most.

Your work might involve tasks that are somewhat unique to you, which should be taken into account when selecting a tool, but you also perform general categories of tasks that should be considered. Here are a few of those general categories:

  • Exploratory data analysis (Exploring data in a free-form manner, getting to know it in general, from multiple perspectives, and asking many questions to understand it)
  • Rapid performance monitoring (Maintaining awareness of what’s currently going on as reflected in a specific set of data to fulfill a particular role)
  • A routine set of specific analytical tasks (Analyzing the data in the same specific ways again and again)
  • Production report development (Preparing reports that will be used by others to look up data that’s needed to do their jobs)
  • Dashboard development (Developing displays that others can use to rapidly monitor performance)
  • Presentation preparation (Preparing displays of data that will be presented in meetings or in custom reports)
  • Customized analytical application development (Developing applications that others will use to analyze data in the same specific ways again and again)

Tools that do a good job of supporting exploratory data analysis usually do a poor job of supporting the development of production reports and dashboards, which require fine control over the positioning and sizing of objects. Tools that provide the most flexibility and control often do so by using a programming interface, which cannot support the fluid interaction with data that is required for exploratory data analysis. Every tool specializes in what it can do well, assuming it can do anything well.

In addition to the types of tasks that we perform, we must also consider the level of sophistication to which we perform them. For example, if you engage in exploratory data analysis, the tool that I recommend would vary significantly depending on the depth of your data analysis skills. I wouldn’t recommend a complex statistical analysis product such as SAS JMP if you’re untrained in statistics, just as I wouldn’t recommend a general-purpose tool such as Tableau Software if you’re well trained in statistics, except for performing statistically lightweight tasks.

Apart from the tasks that we perform and the level of skill with which we perform them, we must also consider the size of our wallet. Some products require a significant investment to get started, while others can be purchased for an individual user at little cost or even downloaded for free.

So, what tool do I recommend? It depends. Finding the right tool begins with a clear understanding of what you need to do with data and of your ability to do it.

Take care,