Confusing Expressions of Relative Proportions

July 17th, 2017

During elementary school we learn to reason quantitatively in fundamental ways. One of the concepts that we learn along the way is that of proportions. We are taught to express a value that is greater than another either in terms of multiplication (e.g., “The value of A is three times the value of B”), as a ratio (e.g., a 3 to 1 ratio), as a fraction in which the numerator is greater than the denominator, usually with a denominator of 1 (e.g., 3/1), or as a percentage that is greater than 100% (e.g., 300%). We are taught to express a value that is less than another either as a ratio (e.g., 1 to 3), as a fraction with a numerator that is less than the denominator, usually with a numerator of 1 (e.g., 1/3), or as a percentage that is less than 100% (e.g., 33%). In later years, however, some of us begin to express proportions in confusing and sometimes inaccurate ways.

Consider a case in which the value of A is $100 and the value of B is $300. To express the greater value of B proportionally as a percentage of A’s value, would it be accurate to say that B is 300% greater than A? No, it wouldn’t. B is only 200% greater than A (300% – 100% = 200%). It is correct, however, to say that “the value of B is 300% the value of A.” To avoid confusion for most audiences, it usually works better to express this proportional difference in terms of multiplication, such as “The value of B is three times the value of A.”
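To make the arithmetic concrete, here is a minimal Python sketch, using the hypothetical $100 and $300 values from the example above, that prints each of these phrasings:

```python
# A minimal sketch using the hypothetical values from the example above.
a = 100  # value of A in dollars
b = 300  # value of B in dollars

percent_of = b / a * 100             # 300: "B is 300% the value of A"
percent_greater = (b - a) / a * 100  # 200: "B is 200% greater than A"
times = b / a                        # 3:   "B is three times the value of A"

print(f"B is {percent_of:.0f}% the value of A")
print(f"B is {percent_greater:.0f}% greater than A")
print(f"B is {times:.0f} times the value of A")
```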

Confusion can also occur when we describe lesser proportions. Recently, while reading a book by a neuroscientist who has closely studied how humans reason quantitatively, I came across the unexpectedly confusing expression “a million times less.” As I understand it, you can reduce a value through multiplication only by multiplying it by a value that is less than one (e.g., a fraction such as 1/3 or a negative value such as -1). The author should have expressed the lesser proportion as “one millionth,” which is conceptually clear.

Consider the following results that I encountered in Google News:

Mac Management Cost Headline

Notice the sentence attributed to Business Insider: “Macs are 3 times cheaper to own than Windows PCs…” Is the meaning of this proportion clear? It isn’t clear to me. It makes sense to say that something is three times greater, but not three times less. What the writer should have said was “Macs are one-third as expensive to own as Windows PCs,” or the writer could have reversed the comparison by describing the greater proportion, as Computerworld did: “IBM says it is 3X more expensive to manage PCs than Macs.” When you describe a lesser proportion, express the difference either as a fraction with a numerator that is less than the denominator, usually with a numerator of 1 (e.g., 1/3rd the cost), as a percentage less than 100% (e.g., 33% of the cost), or as a ratio that begins with the smaller value (e.g., a cost ratio of 1 to 3).
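Here is a similar sketch for the lesser direction. The dollar figures are hypothetical; they simply preserve the 1-to-3 cost relationship described above, since the actual numbers from the IBM study aren’t given here.

```python
from fractions import Fraction

# Hypothetical costs that preserve the 1-to-3 relationship described above;
# the actual figures from the IBM study are not reproduced here.
mac_cost = 100
pc_cost = 300

frac = Fraction(mac_cost, pc_cost)   # reduces to 1/3
percent = mac_cost / pc_cost * 100   # 33.3...

print(f"Macs are {frac} as expensive to own as Windows PCs")        # 1/3 the cost
print(f"Macs cost {percent:.0f}% of what Windows PCs cost")         # 33% of the cost
print(f"The cost ratio is {frac.numerator} to {frac.denominator}")  # 1 to 3
```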

People often struggle to understand proportions. We can prevent many of these misunderstandings by expressing proportions properly.

Take care,

Signature

Data Analysts Must Be Critical Thinkers

July 13th, 2017

During my many years of teaching, I have often been surprised to discover a lack of essential thinking and communication skills among the educated. Back when I was in graduate school in Berkeley studying religion from a social science perspective, I taught a religious studies course to undergraduate students at San Jose State University. When I first began grading my students’ assignments, I was astounded to discover how poorly many of my students expressed themselves in writing. There were delightful exceptions, of course, but several of my students struggled to construct a coherent sentence. Much of my time was spent correcting failures of communication rather than failures in grasping the course material. Many years later, when I taught data visualization in the MBA program at U.C. Berkeley’s Haas School of Business, I found that several of my students struggled to think conceptually, even though the concepts that I taught were quite simple. They were more comfortable following simple procedures (“Do this; don’t do that.”) without understanding why. In the 14 years since I founded Perceptual Edge, I’ve had countless opportunities—in my courses, on my blog, in my discussion forum, and when reviewing academic research—to observe people making arguments that are based on logical fallacies. These are people whose work either directly involves or indirectly supports data analysis. This horrifies me. This is one of the reasons why analytics initiatives frequently fail. No analytical technologies or technical skills will overcome a scarcity of sound reason.

Many of those tasked with data sensemaking—perhaps most—have never been trained in critical thinking, including basic logic. Can you analyze data if you don’t possess critical thinking skills? You cannot. How many of you took a critical thinking course in college? I’ll wager that relatively few of you did. Perhaps you later recognized this hole in your education and worked to fill the gap through self-study. Good for you if you did. Critical thinking does not come naturally; it requires study. Even though I received instruction in critical thinking during my school years, I’ve worked diligently since that time to supplement these skills. Many books on critical thinking line my bookshelves.

Good data analysts have developed a broad range of skills. Training in analytical technologies is of little use if you haven’t already learned to think critically. If you recognize this gap in your own skills, you needn’t despair, for you can still develop these skills now. A good place to start is the book Asking the Right Questions: A Guide to Critical Thinking, by M.N. Browne and S.M. Keeley.

Take care,

Signature

The Devaluation of Expertise

July 11th, 2017

Like it or not, we rely heavily on experts to function as a society. Expertise—high levels of knowledge and skill in particular realms—fuels human progress and continues to maintain it. For this reason, it is frightening to observe the ways in which expertise has been devalued in the modern world, nowhere more so than in America.

My most vivid and direct observations of this problem involve the ways that my own area of expertise—data visualization—has been diluted by the ease with which anyone with a modicum of experience can claim to be a data visualization expert. Learn how to use a product such as Tableau or Power BI today, or Xcelsius a few years ago, and you’re suddenly a data visualization expert. Write a blog about data visualization and you certainly must be an expert. With the relative ease of publication today, you can even write a book about data visualization without ever developing more than a superficial understanding. This is nonsense, it is frustrating to those of us who have actually developed expertise, and it is downright harmful to people who accept advice from faux-experts.

My other direct observation of this phenomenon is the way in which the Internet has inclined people to believe that they are instant experts in anything that they can read about on the Web. Not only do some of the people with scant data visualization knowledge who write comments in response to this blog believe that they know more about it than I do, but many of us are inclined to instruct our medical doctors or our attorneys after an hour or two of Web browsing. We even have the temerity to call simple Web searches “research,” disrespecting those whose work involves actual research. The phrases “I’m doing research on…” and “I’m an expert in…” used to mean more than they do today.

I’m not alone in my concern about this. I just finished reading a book by Tom Nichols entitled The Death of Expertise: The Campaign Against Established Knowledge and Why it Matters, which clearly describes this problem in great breadth and depth.

Death of Expertise

The book’s title is a bit of a misnomer, no doubt chosen to get our attention, for Nichols isn’t arguing that expertise is going away, but that it is being devalued and ignored. Here’s an excerpt from the book’s jacket:

Thanks to technological advances and increasing levels of education, we have access to more information than ever before. Yet rather than ushering in a new era of enlightenment, the information age has helped fuel a surge in narcissistic and misguided intellectual egalitarianism that has crippled informed debates on any number of issues. Today, everyone knows everything: with only a quick trip through WebMD or Wikipedia, average citizens believe themselves to be on an equal intellectual footing with doctors and diplomats. All voices, even the most ridiculous, demand to be taken with equal seriousness, and any claim to the contrary is dismissed as undemocratic elitism.

As I mentioned earlier, this problem is perhaps most extreme in America. We have always prided ourselves on being self-made and resistant to intellectual elitism. It’s a deeply ingrained strain of the American myth. Nichols writes:

Americans have reached a point where ignorance, especially of anything related to public policy, is an actual virtue. To reject the advice of experts is to assert autonomy, a way for Americans to insulate their increasingly fragile egos from ever being told they’re wrong about anything. It is a new Declaration of Independence: no longer do we hold these truths to be self-evident, we hold all truths to be self-evident, even the ones that aren’t true. All things are knowable and every opinion on any subject is as good as any other…The foundational knowledge of the average American is now so low that it has crashed through the floor of “uninformed,” passed “misinformed” on the way down, and is now plummeting to “aggressively wrong.” People don’t just believe dumb things; they actively resist further learning rather than let go of those beliefs.

This isn’t all due to the Internet. Other factors are contributing to the devaluation of expertise as well, including our institutions of higher learning.

Higher education is supposed to cure us of the false belief that everyone is as smart as everyone else. Unfortunately, in the twenty-first century the effect of widespread college attendance is just the opposite: the great number of people who have been in or near a college think of themselves as the educated peers of even the most accomplished scholars and experts. College is no longer a time devoted to learning and personal maturation; instead, the stampede of young Americans into college and the consequent competition for their tuition dollars have produced a consumer-oriented experience in which students learn, above all else, that the customer is always right.

I observed during my own time of teaching at U.C. Berkeley that institutions of higher learning have become businesses that do what they must to compete for customers. Professors must please their students (customers) by providing them with an enjoyable experience if they wish to keep their jobs. Learning, however, is hard work.

Journalism also contributes to this problem when it focuses on giving readers what they want, making the news entertaining, rather than seeking to truthfully and thoroughly inform the public. The customer is not always right. The public can be easily entertained into a state of ignorance.

Experts sometimes get it wrong, but true experts still know a lot more about their fields of knowledge than the rest of us and they get it right a lot more often than we do. Occasional errors by experts are no excuse for turning our backs on knowledge.

Democracy cannot function when every citizen is an expert. Yes, it is unbridled ego for experts to believe they can run a democracy while ignoring its voters; it is also, however, ignorant narcissism for laypeople to believe that they can maintain a large and advanced nation without listening to the voices of those more educated and experienced than themselves.

Look where the devaluation of expertise has taken us in America. We now have a president who is the poster child of narcissistic ignorance whose only expertise is in being a media celebrity. This is a slap in the face of the expertise that built this nation and made it strong. America did not become a city on a hill for the world to see and emulate by celebrating ignorance. History has revealed more than once what happens when you place extraordinary power into the hands of a narcissistic bully. This has perhaps never been done, however, with someone who exhibits Trump’s degree of prideful ignorance.

What do we do? Nichols reminds us that “Most causes of ignorance can be overcome, if people are willing to learn.” Are we willing to learn? That doesn’t seem to be the case.

The creation of a vibrant intellectual and scientific culture in the West and in the United States required democracy and secular tolerance. Without such virtues, knowledge and progress fall prey to ideological, religious, and populist attacks. Nations that have given in to such temptations have suffered any number of terrible fates, including mass repression, cultural and material poverty, and defeat in war.

How can we get back on track? It might take a disaster of spectacular scale to turn the tide. I hope this isn’t the case, but no divine power will bail us out if we continue on our current course. We must do what humans have always done to thrive and advance. We must use our brains.

Take care,

Signature

Basta, Big Data: It’s Time to Say Arrivederci

June 27th, 2017

One of my favorite Italian words is “basta,” followed by an exclamation point. No, basta does not mean “bastard”; it means “enough,” as in “I’ve had ENOUGH of you!” We’ve definitely had enough of Big Data. As a term, Big Data has been an utter failure. It has never managed to mean anything in particular. A term that means nothing in particular means nothing at all. The term can legitimately claim two outcomes that some consider useful:

  1. It has sold a great many products and services. Those who have collected the revenues love the term.
  2. It has awakened some people to the power of data to inform decisions. The usefulness of this outcome, however, is tainted by the deceitful claim that some of today’s data is substantially different from the data of the past. As a result, Big Data encourages organizations to waste time and money seeking an illusion.

If you’ve thought much about Big Data, you’ve noticed the confusion that plagues the term. What is Big Data? This question lacks a clear answer for the following reasons:

  1. There are almost as many definitions of Big Data as there are people with opinions.
  2. None of the many definitions that have been proposed describe anything about data and its use that is substantially different from the past.
  3. Most of the definitions are so vague or ambiguous that they cannot be used to determine, one way or the other, if a particular set of data or use of data qualifies as Big Data.

The term remains what it was when it first became popular—a marketing campaign, and as such, a source of fabricated need and endless confusion. Nevertheless, like spam, it refuses to go away. Why does this matter? Because chasing illusions is a waste of time and money that also carries a high cost of lost opportunity. It makes no sense to chase Big Data, whatever you think it is, if you continue to derive little or no value from the data that you already have.

Ill-defined terms that capture minds and hearts, as Big Data has, often exert influence in irresponsible and harmful ways. Big Data has been the basis for several audacious claims, such as, “Now that we have Big Data…”

  • “…we no longer need to be concerned with data quality”
  • “…we no longer need to understand the nature of causality”
  • “…science has become a thing of the past”
  • “…we can’t survive without it”

People who make such claims either don’t understand data and its use or they are trying to sell you something. Even more disturbing in some respects are the ways in which the seemingly innocuous term Big Data has been used to justify unethical practices, such as gleaning information from our private emails to support targeted ads—a practice that Google is only now abandoning.

Data has always been big and getting bigger. Data has always been potentially valuable for informing better evidence-based decisions. On the other hand, data has always been useless unless it can inform us about something that matters. Even potentially informative data remains useless until we manage to make sense of it. How we make sense of data involves skills and methods that have, with few exceptions, been around for a long time. Skilled data sensemakers have always made good use of data. Those who don’t understand data and its use mask their ignorance and ineffectiveness by introducing new terms every few years as a bit of clever sleight of hand.

The definitions of Big Data that I’ve encountered fall into a few categories. Big Data is…

  1. …data sets that are extremely large (i.e., an exclusive emphasis on volume)
  2. …data from various sources and of various types, some of which are relatively new (i.e., an exclusive emphasis on variety)
  3. …data that is both large in volume and derived from various sources (and is sometimes also produced and acquired at fast speeds, to complete the three Vs of volume, velocity, and variety)
  4. …data that is especially complex
  5. …data that provides insights and informs decisions
  6. …data that is processed using advanced analytical methods
  7. …any data at all that is associated with a current fad

Let’s consider the problems that are associated with the definitions in each of these categories.

Data Sets That Are Extremely Large

According to the statistical software company SAS:

Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.

SAS.com

This definition fails in several respects, not the least of which is its limitation to business data. The fundamental problem with definitions such as this, which focus primarily on the size of data as the defining factor, is their failure to specify how large data must be to qualify as Big Data rather than just plain data. Large data sets have always existed. What threshold must be crossed to transition from data to Big Data? This definition doesn’t say.

Here’s a definition that attempts to identify the threshold:

Big Data is a phrase used to mean a massive volume of both structured and unstructured data that is so large it is difficult to process using traditional database and software techniques.

Vangie Beal, Webopedia.com

Do you recognize the problem of defining the threshold in this manner? What are “traditional database and software techniques”? The following definition is slightly less vague:

Big data means data that cannot fit easily into a standard relational database.

Hal Varian, Chief Economist, Google

(Source Note: All of the definitions quoted here that are attributed to an individual rather than to a particular publication appeared in an article written by Jennifer Dutcher of the U.C. Berkeley School of Information titled “What is Big Data?” on September 3, 2014.)

In theory, there are no limits to the amount of data that can be stored in a relational database; in practice, however, databases of all types have limits. People have suggested technology-based volume thresholds of various types, including anything that cannot fit into an Excel spreadsheet. All of these definitions establish arbitrary limits. Some are based on arbitrary measures, such as the following:

Big data is data that even when efficiently compressed still contains 5-10 times more information (measured in entropy or predictive power, per unit of time) than what you are used to right now.

Vincent Granville, Co-Founder, Data Science Central

So, if you are accustomed to 1,000-row Excel tables, a simple SQL Server database consisting of 5,000 to 10,000 rows qualifies as Big Data. Such definitions highlight the uselessness of arbitrary limits on data volume. Here’s another definition that acknowledges its arbitrary nature:

Big data is when…the standard, simple methods (maybe it’s SQL, maybe it’s k-means, maybe it’s a single server with a cron job) break down on the size of the data set, causing time, effort, creativity, and money to be spent crafting a solution to the problem that leverages the data without simply sampling or tossing out records.

 John Foreman, Chief Data Scientist, MailChimp

Some definitions acknowledge the arbitrariness of the threshold without recognizing it as a definitional failure:

The term big data is really only useful if it describes a quantity of data that’s so large that traditional approaches to data analysis are doomed to failure. That can mean that you’re doing complex analytics on data that’s too large to fit into memory or it can mean that you’re dealing with a data storage system that doesn’t offer the full functionality of a standard relational database. What’s essential is that your old way of doing things doesn’t apply anymore and can’t just be scaled out.

John Myles White

What good is a definition that is based on a subjective threshold in data volume?

The following definition acknowledges that, when based on data volume, what qualifies as Big Data not only varies from organization to organization, but over time as well:

Big data is data that contains enough observations to demand unusual handling because of its sheer size, though what is unusual changes over time and varies from one discipline to another. Scientific computing is accustomed to pushing the envelope, constantly developing techniques to address relentless growth in dataset size, but many other disciplines are now just discovering the value — and hence the challenges — of working with data at the unwieldy end of the scale.

Annette Greiner, Lecturer, UC Berkeley School of Information

Not only do these definitions identify Big Data in a manner that lacks objective boundaries, they also acknowledge (perhaps inadvertently) that so-called Big Data has always been with us, for data has always been on the increase in a manner that leads to processing challenges. In other words, Big Data is just data.

There is a special breed of volume-based definitions that advocate “Collect and store everything.” Here is the most thorough definition of this sort that I’ve encountered:

The rising accessibility of platforms for the storage and analysis of large amounts of data (and the falling price per TB of doing so) has made it possible for a wide variety of organizations to store nearly all data in their purview — every log line, customer interaction, and event — unaggregated and for a significant period of time. The associated ethos of “store everything now and ask questions later” to me more than anything else characterizes how the world of computational systems looks under the lens of modern “big data” systems.

Josh Schwartz, Chief Data Scientist, Chartbeat

These definitions change the nature of the threshold from a measure of volume to an assumption: you should collect everything at the lowest level of granularity, whether useful or not, for you never know when it might be useful. Definitions of this type are a hardware vendor’s dream, but they are an organization’s nightmare, for the cost of unlimited storage extends well beyond the cost of hardware. The time and resources that are required to do this are enormous and rarely justified. Anyone who actually works with data knows that the vast majority of the data that exists in the world is noise and will always be noise. Don’t line the pockets of hardware vendor executives with gold by buying into this ludicrous assumption.

Data from Various Sources and of Various Types

Independent of a data set’s size, some definitions of Big Data emphasize its variety. Here’s one of the clearest:

What’s “big” in big data isn’t necessarily the size of the databases, it’s the big number of data sources we have, as digital sensors and behavior trackers migrate across the world. As we triangulate information in more ways, we will discover hitherto unknown patterns in nature and society — and pattern-making is the wellspring of new art, science, and commerce.

Quentin Hardy, Deputy Tech Editor, The New York Times

Definitions that emphasize variety suffer from the same problems as those that emphasize volume: where is the threshold? How many data sources are needed to qualify data as Big Data? In what sense does the addition of new data sources—something that is hardly new—change the nature of data or its use? It doesn’t. New data sources are sometimes useful and sometimes not. Collecting and storing every possible source of data is no more productive than collecting and storing every instance of data.

Data That Exhibits High Volume, Velocity, and Variety

I’ll use Gartner’s definition to represent this category in honor of the fact that Doug Laney of Gartner was the first to identify the three Vs (volume, velocity, and variety) as game changers.

Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.

Gartner

Combining volume and variety, plus adding velocity—the speed at which data is generated and acquired—produces definitions that suffer from all of the problems that I’ve already identified. Increases in volume, velocity, and variety have been with us always. They have not fundamentally changed the nature of data or its use.

Data That Is Especially Complex

Some definitions focus on the complexity of data.

While the use of the term is quite nebulous and is often co-opted for other purposes, I’ve understood “big data” to be about analysis for data that’s really messy or where you don’t know the right questions or queries to make — analysis that can help you find patterns, anomalies, or new structures amidst otherwise chaotic or complex data points.

Philip Ashlock, Chief Architect, Data.gov

You can probably anticipate what I’ll say about definitions of this sort: once again, they lack a clear threshold and identify a quality that has always been true of data. How complex is complex enough, and at what point in history has data not exhibited complexity?

Data That Provides Insights and Informs Decisions

As you no doubt already anticipate, definitions in this category exhibit the same problems as those in the categories that we’ve already considered. Here’s an example:

Big Data has the potential to help companies improve operations and make faster, more intelligent decisions. This data…can help a company to gain useful insight to increase revenues, get or retain customers, and improve operations.

Vangie Beal, Webopedia.com

Data That Is Processed Using Advanced Analytical Methods

According to definitions in this category, it is not anything about the data itself that determines whether it qualifies as Big Data; it is instead the methods that are used to make sense of it.

The term “big data” often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data.

Wikipedia

Some of these definitions allow quite a bit of leeway regarding the nature of the advanced methods, while others are more specific, such as the following:

Big data is an umbrella term that means…the possibility of doing extraordinary things using modern machine learning techniques on digital data. Whether it is predicting illness, the weather, the spread of infectious diseases, or what you will buy next, it offers a world of possibilities for improving people’s lives.

Shashi Upadhyay, CEO and Founder, Lattice Engines

What analytical methods qualify as Big Data? The answer usually depends on the methods that the person who is quoted uses or sells. Can you guess what kind of software Lattice Engines sells?

Methods that are considered advanced have been around for a long time, including most of those that are cited as advanced in today’s definitions of Big Data. For example, even though computers were not always powerful enough to run machine-learning algorithms on large data sets, these algorithms are fundamentally based on traditional statistical methods.

A few of the definitions in this category have emphasized advanced skills rather than technologies, such as the following:

As computational efficiency continues to increase, “big data” will be less about the actual size of a particular dataset and more about the specific expertise needed to process it. With that in mind, “big data” will ultimately describe any dataset large enough to necessitate high-level programming skill and statistically defensible methodologies in order to transform the data asset into something of value.

Reid Bryant, Data Scientist, Brooks Bell

Once again, however, there is nothing new about these skills.

Any Data at All that Is Associated With a Current Fad

Some definitions of Big Data apply the term to anything data related that is trending. Here’s an example:

I see big data as storytelling — whether it is through information graphics or other visual aids that explain it in a way that allows others to understand across sectors.

Mike Cavaretta, Data Scientist and Manager, Ford Motor Company

This tendency has been directly acknowledged by Ryan Swanstrom in his Data Science 101 blog: “Now big data has become a buzzword to mean anything related to data analytics or visualization.” This is what happens with fuzzy definitions. They can be easily manipulated to mean anything you wish. As such, they are meaningless and useless.

Now What?

The definitional messiness and thus uselessness of the term Big Data is far from unique. Many information technology terms exhibit these dysfunctional traits. I’ve worked in the field that goes by the name “business intelligence” for many years, but this industry has never adhered to or lived up to the definition provided by Howard Dresner, who coined the term: “Concepts and methods to improve business decision making by using fact-based support systems.” Instead, the term has primarily functioned as a name for technologies and processes that are used to collect, store, and produce automated reports of data. Rarely has there been an emphasis on “concepts and methods to improve business decision making,” which features humans rather than technologies. This failure of emphasis has resulted in the failure of most business intelligence efforts, which have produced relatively little intelligence.

All of the popular terms that have emerged during my career to describe the work that I and many others do with data, including decision support, data warehousing, analytics, data science, and of course, Big Data, have been plagued by definitional dysfunction, leading to confusion and bad practices. I prefer the term “data sensemaking” for the concepts, methods, and practices that we engage in to understand data. And to promote the value of data as the raw material from which understanding is woven, the healthcare field has contributed one of the most useful terms: “evidence-based medicine.” In its generic form, “evidence-based decision making” is simple, straightforward, and clear. If we used these terms to describe the work and its importance, we would stop wasting time chasing illusions and would focus on what’s fundamentally needed: data sensemaking skills, augmented by good technologies, to support evidence-based decision making. Perhaps then we would make more progress.

Let’s say “goodbye” to the term Big Data. It doesn’t mean anything in particular and all of the many things that people have used it to mean merely refer to data. Do we really need a new term to promote the importance of evidence-based decision making? The only people who are prepared to glean real value from data don’t need a new term or a marketing campaign. Rallying those who don’t understand data or its use will only lead to good outcomes if we begin by helping them understand. Meaningless terms lead in the opposite direction.

Take care,

Signature

Building Better BI

June 14th, 2017

I was recently made aware of an article about Business Intelligence (BI) that struck me as extraordinary and smart. The author, Paul Holland, is a BI consultant who has a great deal of real-world BI experience. In the article, he emphasizes the role of people rather than technologies in BI success, which is correct and under-appreciated. Paul has given me permission to share his article with you here in my blog.


Building Better BI: Should I Ride the Donkey or the Bullet Train?

Recently, I was in conversation with a senior executive of a business who was explaining to me how spending many hundreds of thousands of pounds on a very shiny, new and aggressively marketed BI visualisation platform would enable them to access even more of their data than ever before.

Now it would have been impolite of me to point out that accessing data, any and nearly all of this company’s data, is not a problem to begin with, and that this statement alone indicates a deeply flawed understanding and approach that many senior managers in need of analysing and understanding their business(es) seem to arrive at – namely, that we should spend more money on cutting-edge IT systems to gain a competitive advantage. Furthermore, these senior managers will often control the purse strings of an organisation or remain deeply influential in how a company invests in its IT infrastructure. On these occasions, such processes, such as they are, are little more than a fait accompli; your organisation will end up with a new and shiny silver-bullet IT/information system whether you like it or not or, worse still, whether you need it or not.

Consequently, it should come as no surprise to many of you reading this that many organisations out there are putting down enough money to buy a nice apartment in Paris in order to buy a contemporary BI visualisation tool; the one you’ve been enthusiastically told by someone ‘will solve all your reporting problems, believe me, really it will’. What is a surprise, though, to me at least, is that having sunk this money into something so powerful (and I’ve seen what I’m going to say next happen on many, many occasions), they simply then connect this expensive, shiny ‘BI system’ full of Promethean promise straight up to their existing data sources. Or, in a similar vein, they will take the myriad of spreadsheets they’ve built over the years and just bang them into their shiny new system, spending an incalculable amount of time trying to get what they had before working exactly like it did before, looking exactly like it did before. So, to reduce it to its crudest and least sophisticated form, they unplug the old thing and then simply plug the old thing into the new thing, thereby producing a new old thing. Get it?

Job done then.

And so the rallying cry goes out through the organisation: “Relax everybody, you are now going to get access to the most powerful information you’ve ever seen!”

Only you are not.

You see, BI is not simply an IT system any more than an F1 car is simply a driving machine. It is the amalgamation and unification of a set of processes, different business units, strategies, people and skill-sets combining within an organisation to produce meaningful and viable information. And this makes it most of all a ‘people thing’ rather than a technology thing. In fact, to be more precise, I’m going to argue that BI is about what I call the three Ps: people, purposes and processes. So in my world, there’s no Ts in the Ps!

Certainly, from my experience, understanding and managing these three factors is what makes producing good BI much harder than simply buying what are increasingly becoming very large investments in visualisation systems or BI suites. And if you rely on the IT system delivering your reporting salvation, no matter how advanced and cutting edge you’ve been told it is, then you are probably heading straight for the interminable BI graveyard. Don’t just take my word for it; any review of the literature on the subject will reveal you are in good company here. I know of one large organisation in England that has invested its scarce resources in three, yes that is three, new shiny silver-bullet BI visualisation tools in the last decade. All of them failures. In fact, two of them went virtually unused for the lifespan of the investment. This should serve as a cautionary tale to anyone thinking of getting an easy fix for their money.

Furthermore, what has most commonly been gained for this considerable investment is simply easier access to a morass of existing company data resplendent with all its inherent data quality problems. And all too often this comes with the added ‘benefit’ of actually increasing costs or workloads due to the subsequent addition of a ‘BI’ team to focus on making these old data connections work just like they did before. Typically, in a classic case of technological determinism, the system creates the organisation, the workflow and the resourcing post facto, after the thing itself has been bought. So rather than simplifying things and reducing costs, you can end up with a larger team consisting of business or data analysts and IT people, all of whom will spend considerable time working towards returning you to, potentially, the state you began the exercise in. I mean, what is the point in spending something like a quarter of a million pounds if you can’t replicate the trusty, ill-defined spreadsheet you had before, huh? I’ve actually seen people sit with their spreadsheet, which they call a ‘dashboard’, spending valuable time recreating the exact same thing in their sophisticated and powerful visualisation tool, usually sucking in the resources of two or three other office staff nearby. I’ve also been in training sessions where people have asked for the system to be redesigned so that they can recreate their local, specific and rather limited-use spreadsheet, which they then keep active afterwards to cross-check the new view that has been created at great cost for them in their shiny new system full of Promethean promise. Is it just me or does this seem wrong, crazy even?

This type of behaviour, which I think is not uncommon across organisations (just look at the books and conferences out there), seems to me akin to building something like a railroad and then continuing to ride along it with your donkeys instead of trains. Sure, your donkeys have better and easier access to the existing route, and you know your donkey so well too, you can keep it straight on the tracks, but you absolutely do not need a train line to ride along on a donkey, do you? So why spend so much money on a railroad to keep your donkeys running? Similarly, I ask myself, why does any organisation spend close to £300k on hooking up a sophisticated visualisation platform only to recreate what it already had or has had before: a proliferation of rows and columns and some red and green (far too often) bar charts and pie graphs? I expect you to get bullet trains for £300k, not donkeys.

Now I’d argue that you don’t buy a new system just because you want better, or more efficient, or easier access to existing reports. You buy a system like this because your organisation has made a strategic decision to understand its business better, to measure activity and performance, to seek out inefficiencies and wasted resources or use of time; to understand, measure and refine its processes or proposition to its customers. Bear in mind though that often this is still looking backwards at your business, what’s happened already, but with a good strategy and the right people you can also use such a system to look forwards, to predict certain outcomes for you or to measure the magnitude of some change that is occurring in your business too. That’s the kind of power I expect to get at my fingertips for £300k.

I’m willing to argue that 90% or more of the companies, organisations and large corporations out there already have access to more data than they can possibly manipulate, never mind contemplate. We’ve been collecting lots of data for a long time now and, furthermore, accessing this data has not really been a problem; people have been doing it routinely for decades. I think understanding how to define and make better use of this data is actually what we have been doing wrong for decades, and for me it is this work that is fundamental to successfully building and deploying effective BI solutions in an organisation. This is where I think the focus needs to change: to move away in the first instance from the software, the data and the databases, and to focus your time and investment instead on engaging with people, purposes and processes. The three ‘Ps’, if you will.

In the work I do with clients, it’s the three Ps, rather than the data, that are the focus of any BI undertaking I am involved in. Even NASA and contemporary astrophysicists know that people are really what you need to build a model, to confirm a hypothesis, to verify the data and help turn it into useful information. It is surprising in this day and age, but there are some jobs people are just better at than a computer. That’s why Professor Brian Cox, on the latest series of Stargazing Live, ‘farmed’ out to viewers the task of analysing large amounts of astronomical data to identify patterns that might indicate a 9th body in the solar system. Surely science departments the world over have supercomputers and programmers to analyse this data, no? And yet it is deemed that people at home can do this job better. And that’s because data is just that, data, but with people you garner understanding, comprehension, nuances and connections about the subject of your inquiry too.

See it here: https://www.zooniverse.org/projects/marckuchner/backyard-worlds-planet-9/about/research

So, even with the greatest dataset, computers and powerful algorithms to hand, some jobs are done best by people. In keeping with this point of view, when it comes to BI I always start with people and not with data. Data will not build you an effective BI system, no matter how much data you cram into your data warehouse. But people who require information about their business to make informed decisions, to predict problems, to deploy resources efficiently: they will help you build an effective BI system, one that is fit for purpose, one that informs decision making and one that they themselves will have confidence in and will actually use.

So what do I do then? It would be too much to detail here, so I’ll briefly outline my methodology as a counterpoint to the arguments above.

Well, you now know I don’t start with data when I help someone to build a BI solution. Instead, I start with the purpose, the reason why someone needs it. I investigate the processes that are the subject of the purpose, which helps me understand the breadth of the subject area and the systems related to it. I discuss these needs and work out with individuals and groups why they need that information, when they need it, and to whom they need to present it, whether internally or externally to the organisation. In unison with the relevant people, we then grade the importance of each item/category identified for serving the purpose and collate it, thereby building a record of prioritised needs for the technical team and any associated project members to continue working on. In short, we essentially build what I think of as an information landscape, an information map of requirements for the organisation that leads to the compilation of a set of contained business questions that address the purpose(s) we started out with. I call this my ‘virtuous circle’: everything done should be harmonious with, and work towards, satisfying the purpose. Ultimately, this process also helps to delimit the scope of the design and solution, thus helping to avoid the insidious ‘scope creep’ of a project. These processes also have the benefit of producing a definitive record of what has been included, excluded, assessed, defined and agreed upon by the business unit/owner of the solution.

It is only after all of this work has been done that we begin to sit down with technical database staff and the like to identify the right data items to bring into the data warehouse and, subsequently, how that data will be treated when it comes into the data warehouse to ensure the veracity of the information that is produced. This process ensures that the data being brought into the warehouse matches the business’s definitions and meets the purpose of the inquiry, rather than a technical definition of a field somewhere or a piece of data that could potentially be compiled from things that are unknown to the business.

I have no doubt that sometime in the future, this method/record will also prove invaluable to you when system problems occur, data items change in core systems and for those arguments that happen in meetings when people claim you have the wrong numbers or are measuring the wrong thing. You simply pull out your lovely fat A4 file and patiently take them through it and if you are feeling cheeky, you can ask them to show how their numbers were derived, who defined them, who agreed they are fit for this purpose etc. These definitions should, in practice, become the authoritative source for reporting in the area concerned. No more arguments about what something means, well, maybe a lot less argument! It also provides more confidence in using and sharing the reports built from this approach between departments, managers and analysts alike.

And of course we do end up talking about data with ‘the business’ no matter how hard I try and avoid it in the nascent stages. This is to a large degree historical in that often people are conditioned to see data as information and vice versa. Often the people responsible for providing requested information are also system/data gatekeepers in some part of the organisation. Understandably, they often make their own decisions about how to compile or consolidate different data to create a metric. They think in fields and tables and look ups and not in terms of information and the life-cycle it encapsulates for the consumer, how it will be consumed, its potential audience(s) and its purpose.

I know I’ve covered a lot of issues and ideas here already, but consider this before I finish: I haven’t once mentioned the BI system itself, the software, have I? No technology-company fireworks and sexy quadrants, no industry white-papers, no product names, no slick features, no concurrent and fashionable packages or systems at all. And that’s because you don’t really need a silver bullet to begin building your BI suite or programme. You can do all the things I’ve suggested above without handing over a single cent to a software seller. In fact, remaining system or solution agnostic at an early stage will allow for more open thinking and for ideas to percolate to the top. So it will be no surprise to you by now to learn that I’m of the opinion that if you do some of the things I’ve suggested above prior to tendering for a system, it will only aid your journey in finding the best-aligned and most economical visualisation system for your organisation. And who knows, you may even find that the tools you have in house are already capable of delivering the types of information and visualisations you need, which means you get to keep that hard-earned £300k in your back pocket after all. Good for you and good for your business.

Paul Holland

Lollipop Charts: “Who Loves You, Baby?”

May 17th, 2017

If you were around in the ‘70s, you probably remember the hard-edged, bald-headed TV police detective named Kojak. He had a signature phrase—“Who loves you, baby?”—and a signature behavior—sucking a lollipop. The juxtaposition of a tough police detective engaged in the childish act of sucking a lollipop was entertaining. The same could be said of lollipop charts, but data visualization isn’t a joke (or shouldn’t be). “Lollipop Chart” is just a cute name for a malformed bar graph.

Bar graphs encode quantitative values in two ways: the length of the bar and the position of its end. So-called lollipop charts encode values in the same two ways: the length of the line, which functions as a thin bar, and the position of its bulbous end.

Lollipop Chart Example

A lollipop chart is malformed in that its length has been rendered harder to see by making the bar thin, and its end has been rendered imprecise and inaccurate by making it large and round. The center of the circle at the end of the lollipop marks the value, but the location of the center is difficult to judge, making it imprecise compared to the straight edge of a bar, and half of the circle extends beyond the value that it represents, making it inaccurate.
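If you want to see the difference for yourself, here is a minimal matplotlib sketch, with made-up categories and values, that draws the same data as an ordinary bar graph and as a lollipop chart:

```python
# A minimal sketch with made-up data: the same values drawn as a bar graph
# and as a lollipop chart for comparison.
import matplotlib.pyplot as plt
import numpy as np

categories = ["North", "South", "East", "West"]  # hypothetical categories
values = np.array([42, 35, 58, 27])              # hypothetical values
y = np.arange(len(categories))

fig, (ax_bar, ax_lolli) = plt.subplots(1, 2, sharey=True, figsize=(8, 3))

# Bar graph: the value is marked by the bar's straight end.
ax_bar.barh(y, values, color="steelblue")
ax_bar.set_yticks(y)
ax_bar.set_yticklabels(categories)
ax_bar.set_title("Bar graph")

# Lollipop chart: a thin line plus a large round marker. The marker's center
# sits on the value, but half of the circle extends beyond it.
ax_lolli.hlines(y, 0, values, colors="steelblue", linewidths=1.5)
ax_lolli.plot(values, y, "o", color="steelblue", markersize=12)
ax_lolli.set_title("Lollipop chart")

plt.tight_layout()
plt.show()
```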

What inspired this less effective version of a bar graph? I suspect that it’s the same thing that has inspired so many silly graphs: a desire for cuteness and novelty. Both of these qualities wear off quickly, however, and you’re just left with a poorly designed graph.

You might feel that this is “much ado about nothing.” After all, you might argue, lollipop charts are not nearly as bad as other dessert or candy charts, such as pies and donuts. This is true, but when did it become our objective to create new charts that aren’t all that bad, rather than those that do the best job possible? Have we run out of potentially new ways to visualize data effectively? Not at all. Data visualization is still a fledgling collection of visual representations, methods, practices, and technologies. Let’s focus our creativity and passion on developing new approaches that work as effectively as possible and stop wasting our time striving for good enough.

Take care,

Signature

What Is Data Visualization?

May 4th, 2017

Since I founded Perceptual Edge in 2003, data visualization has transitioned from an obscure area of interest to a popular field of endeavor. As with many fields that experience rapid growth, the meaning and practice of data visualization have become muddled. Everyone has their own idea of its purpose and how it should be done. For me, data visualization has remained fairly clear and consistent in meaning and purpose. Here’s a simple definition:

Data visualization is a collection of methods that use visual representations to explore, make sense of, and communicate quantitative data.

You might bristle at the fact that this definition narrows the scope of data visualization to quantitative data. It is certainly true that non-quantitative data may be visualized, but charts, diagrams, and illustrations of this type are not typically categorized as data visualizations. For example, neither a flow chart, nor an organization chart, nor an ER (entity relationship) diagram qualifies as a data visualization unless it includes quantitative information.

The immediate purpose of data visualization is to improve understanding. When data visualization is done in ways that do not improve understanding, it is done poorly. The ultimate purpose of data visualization, beyond understanding, is to enable better decisions and actions.

Understanding the meaning and purpose of data visualization isn’t difficult, but doing the work well requires skill, augmented by good technologies. Data visualization is primarily enabled by skills—the human part of the equation—and these skills are augmented by technologies. The human component is primary, but sadly it receives much less attention than the technological component. For this reason data visualization is usually done poorly. The path to effective data visualization begins with the development of relevant skills through learning and a great deal of practice. Tools are used during this process; they do not drive it.

Data visualization technologies only work when they are designed by people who understand how humans interact with data to make sense of it. This requires an understanding of human perception and cognition. It also requires an understanding of what we humans need from data. Interacting with data is not useful unless it leads to an understanding of things that matter. Few data visualization technology vendors have provided tools that work effectively because their knowledge of the domain is superficial and often erroneous. You can only design good data visualization tools if you’ve engaged in the practice of data visualization yourself at an expert level. Poor tools exist, in part, because vendors care primarily about sales, and most consumers of data visualization products lack the skills that are needed to differentiate useful from useless tools, so they clamor for silly, dysfunctional features. Vendors justify the development of dumb tools by arguing that it is their job to give consumers what they want. I understand their responsibility differently. As parents, we don’t give our children what they want when it conflicts with what they need. Vendors should be good providers.

Data visualization can contribute a great deal to the world, but only if it is done well. We’ll get there eventually. We’ll get there faster if we have a clear understanding of what data visualization is and what it’s for.

Take care,

Signature

We Never Think Alone: The Distribution of Human Knowledge

May 3rd, 2017

Only a small portion of the knowledge that humans have acquired resides in your head. Even the brightest of us is mostly ignorant. Despite this fact, we all suffer from the illusion that we know more than we actually do. We suffer from the “knowledge illusion,” in part, because we fail to draw accurate boundaries between the knowledge that we carry in our own heads and the knowledge that resides in the world around us and in the minds of others. A wonderful new book by two cognitive scientists, Steven Sloman and Philip Fernbach, titled The Knowledge Illusion: Why We Never Think Alone, describes the distributed nature of human knowledge and suggests how we can make better use of it.

The Knowledge Illusion

The following four excerpts from the book provide a sense of the authors’ argument:

The human mind is both genius and pathetic, brilliant and idiotic. People are capable of the most remarkable feats, achievements that defy the gods…And yet we are equally capable of the most remarkable demonstrations of hubris and foolhardiness. Each of us is error-prone, sometimes irrational, and often ignorant…And yet human society works amazingly well…

The secret of our success is that we live in a world in which knowledge is all around us. It is in the things we make, in our bodies and workspaces, and in other people. We live in a community of knowledge.

The human mind is not like a desktop computer, designed to hold reams of information. The mind is a flexible problem solver that evolved to extract only the most useful information to guide decisions in new situations. As a consequence, individuals store very little detailed information about the world in their heads. In that sense, people are like bees and society a beehive: Our intelligence resides not in individual brains but in the collective mind.

Being smart is about having the ability to extract deeper, more abstract information from the flood of data that comes into our senses…The mind is busy trying to choose actions by picking out the most useful stuff and leaving the rest behind. Remembering everything gets in the way of focusing on the deeper principles that allow us to recognize how a new situation resembles past situations and what kinds of actions will be effective.

In a world with rapidly increasing stores of information, it is critical that we learn how to find the best information (the signals) among the mounds of meaningless, erroneous, or irrelevant information (the noise) that surrounds us. Individually, we can only be experts in a few domains, so we must identify and rely on the best expertise in other domains. We don’t benefit from more knowledge; we benefit from valid and useful knowledge. One of the great challenges of our time is to find ways to identify, bring together, and encourage the best of what we know.

The power of crowdsourcing and the promise of collaborative platforms suggest that the place to look for real superintelligence is not in a futuristic machine that can outsmart human beings. The superintelligence that is changing the world is the community of knowledge. The great advances in technology aren’t to be found in creating machines with superhuman horsepower; instead, they’ll come from helping information flow smoothly through ever-bigger communities of knowledge and by making collaboration easier. Intelligent technology is not replacing people so much as connecting them.

This book is well written and accessible. It provided me with many valuable insights. I’m confident that it will do the same for you.

Take care,

Signature

Can VR Enhance Data Visualization?

May 1st, 2017

In addition to the growing hype about AI (artificial intelligence) and NLP (natural language processing) as enhancers of data visualization, the same erroneous claim is now being made for VR (virtual reality). How does VR enhance data visualization? Here’s an answer that was recently given by Sony Green, the head of business development for a startup named Kineviz:

For a lot of things, 2D is still the best solution. But VR offers a lot of advantages over existing data visualization solutions, especially for certain kinds of data. When you get into really high dimensional data, something like 100 different dimensions per node. It’s difficult to keep track of all that info with lots of 2D graphics and it becomes a very large cognitive load for people to track them on multiple screens at once.

VR allows us to tap into our natural ability to process special information. Without looking around, we have an innate understanding of the spaces we are in because that’s how our brains are wired. In a simulated environment created by VR, we use these natural ways of processing information that a 2D screen can’t offer.

Furthermore, VR opens up use cases that were previously impossible by lowering the barrier for common users. You don’t have to be a data scientist: anyone who can play a game can use VR to explore data science in a way that is intuitive.

TechNode, Emma Lee, April 28, 2017

So, VR supposedly “offers a lot of advantages.” What are these advantages? According to Green, VR makes it possible for our brains to process “100 different dimensions.” This isn’t true. VR adds a simulation of a single spatial dimension: depth. I can think of no way that VR can enable our brains to process more than one additional dimension of data compared to what we can process using 2-D displays. Furthermore, the simulation of depth is of little benefit, for we don’t perceive depth well, unlike our perception of 2-D space (up and down, left and right). And let us not forget that we can hold only three or four chunks of visual information in working memory at once, so even if VR could somehow add many more dimensions of data, they would be of no use to our limited brains, because we couldn’t process all of those dimensions simultaneously.

What else can VR do? “VR allows us to tap into our natural ability to process special information.” Apparently, this special information has something to do with spatial awareness, but how does this help us visualize data? According to Sony Green, we’d better figure it out and get on board, because, with VR, data exploration and analysis can be done by anyone who can play a game. Who knew that data analysis was so easy? The claim that “without looking around, we have an innate understanding of the spaces we are in” is humorous. We have no understanding of the spaces that we’re in without looking around or exploring them in some other way, such as by touch.

VR attempts to simulate the 3-D world that we live in. In the actual world, I can place data visualizations throughout a room on various screens or printed pages, and I can then walk up to and examine one at a time. Similarly, VR can place data visualizations throughout a virtual room, and when it does I must still virtually walk around to view them one at a time. Are the data visualizations themselves enhanced? Not in the least. Making the graphs appear more three-dimensional than they appear on a flat screen adds no real value.

Years ago I was approached by someone who was creating data visualizations for the VR environment Second Life. She was enthusiastic about her work. When I took a look, I found a collection of 3-D bar graphs, line graphs, scatterplots, etc., which I could walk around and even upon, looking down from the lofty heights of tall bars and lines, and, with virtual superpowers, I could even fly around them, but all of this actually made the graphs harder to read. It is much easier and more efficient to sit still and view 2-D data visualizations on my desktop monitor.

Just to make sure that I haven’t missed any new uses of VR for data visualization, I did a quick search and found nothing but more of the same. In the example below, the Wall Street Journal allows us to ride along a line graph of the NASDAQ, much like riding a roller coaster:

WSJ VR

Imagine that you’re viewing this using a VR headset. What useless fun! And in the example below, Nirvana Labs allows us to view a map (currently off the bottom of the screen), a bar graph (the transparent vertical cylinders), and a line graph (the bottom edge appears at the top of the screen), but they are much harder to read in VR than they would be as a 2-D screen display. A VR headset makes it possible for us to walk around the graphs, but that isn’t useful.

Nirvana VR

I have seen 3-D displays of physical objects that are actually useful, but 3-D displays of graphs are almost never useful, and placing them in VR doesn’t change this fact.

Don’t let yourself be suckered in by false marketing claims. Software vendors are always looking for some new way to separate us from our money. When you encounter people who claim that VR adds value to data visualization, ask them to prove it. Request an example of VR that works better than a 2-D display of the same data. Look past the cool factor and attempt to make sense of the data. If you come across a beneficial use case for data visualization in VR, I’d love to see it.

Take care,

Signature

Do Tooltips Reduce the Need for Precision in Graphs?

April 18th, 2017

This blog entry was written by Nick Desbarats of Perceptual Edge.

Should you include gridlines in your graph? If so, how many? Is it O.K. to add another variable to your graph by encoding it using size (for example, by varying the size of points in a scatterplot) or color intensity (for example, by representing lower values as a lighter shade of blue and higher ones as darker)? How small can you make your graph before it becomes problematic from a perceptual perspective? These and many other important graph design decisions depend, in part, on the degree of precision that we think our audience will require. In other words, they depend on how precisely we think our audience will need to be able to “eyeball” (i.e., visually estimate) the numerical values of bars, lines, points, etc. in our graph. If we think that our audience will require no more than approximate precision for our graph to be useful to them, then we can probably safely do away with gridlines and add that other variable as color intensity or size. If we have limited space in which to include our graph on a page or screen, we could safely make it quite small, since we know that, in this particular case, the reduction in precision that occurs when reducing a graph’s size wouldn’t be a problem.

Small Graph

Our audience won’t be able to eyeball values all that precisely (what exactly are Gross Sales or Profit for the South region?), though such a design would provide enough precision for our audience to see that, for example, the West had the highest Gross Sales, roughly four times those of the East, which had relatively low Profit, and so on. Despite its lack of high precision, this graph contains many useful insights and may be sufficient for our audience’s needs.
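For readers who want to experiment with this kind of design, here is a minimal sketch of how such a compact, low-precision graph might be built in Python with matplotlib. The library choice, region names, and values are mine, invented purely for illustration; they are not the data behind the graph shown above.

    # A compact, low-precision design: Gross Sales encoded as bar length,
    # Profit encoded as color intensity, no gridlines, small overall size.
    # All values below are invented for illustration.
    import matplotlib.pyplot as plt
    import matplotlib.cm as cm
    import matplotlib.colors as mcolors

    regions = ["East", "South", "Central", "West"]
    gross_sales = [120, 180, 260, 480]   # hypothetical values
    profit = [15, 40, 55, 90]            # hypothetical values

    # Map Profit onto a light-to-dark blue ramp so lower values appear lighter.
    norm = mcolors.Normalize(vmin=min(profit), vmax=max(profit))
    bar_colors = cm.Blues(norm(profit))

    fig, ax = plt.subplots(figsize=(3, 1.8))   # deliberately small
    ax.barh(regions, gross_sales, color=bar_colors)
    ax.set_xlabel("Gross Sales")
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)
    ax.grid(False)   # no gridlines: approximate precision is all we need here
    plt.tight_layout()
    plt.show()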

If, on the other hand, we’ve spoken to members of our audience and have realized that they’ll need to visually estimate values in the graph much more precisely in order for the graph to be useful to them, then we’re going to have to make some design changes to increase the precision of this graph. We might need to add gridlines, break the quantitative scale into smaller intervals, find a more precise way to represent Profit (for example, by encoding it using the varying lengths of bars or the 2-D positions of points), and perhaps make our graph larger on the screen or page.

Side-by-side graphs

As you probably noticed, adding gridlines, making the graph larger, and breaking the quantitative scale into smaller intervals all came at a cost. While these changes did increase the precision of our graph, they also made it busier and bigger. Being forced to make these types of design trade-offs is, of course, very common. In nearly every graph that we design, we must struggle to balance the goal of providing the required level of precision with other, competing goals such as producing a clean, compact, uncluttered, and quick-to-read design.
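As a counterpart to the earlier sketch, here is one way the higher-precision variant just described might be built, again in matplotlib and again with invented data: Profit gets its own panel so that it is encoded by bar length rather than color intensity, and both panels gain gridlines and a finer quantitative scale.

    # A higher-precision variant: two side-by-side panels, gridlines,
    # and a finer quantitative scale. All values are invented for illustration.
    import matplotlib.pyplot as plt
    import matplotlib.ticker as mticker

    regions = ["East", "South", "Central", "West"]
    gross_sales = [120, 180, 260, 480]   # hypothetical values
    profit = [15, 40, 55, 90]            # hypothetical values

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 2.5), sharey=True)
    for ax, values, label in ((ax1, gross_sales, "Gross Sales"), (ax2, profit, "Profit")):
        ax.barh(regions, values, color="#4878a8")
        ax.set_xlabel(label)
        ax.xaxis.set_major_locator(mticker.MaxNLocator(nbins=10))  # smaller intervals
        ax.grid(axis="x", color="#dddddd")                         # gridlines aid estimation
        ax.set_axisbelow(True)                                     # draw gridlines behind bars
    plt.tight_layout()
    plt.show()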

What if we know, however, that our graph will only ever be viewed in a software application that supports tooltips, i.e., that allows our audience to hover their mouse cursor or finger over any bar, line, point, etc. to see its exact textual value(s) in a small popup box?

Small graph with tooltip

In this case, perfect precision is always available via what researcher Ben Shneiderman termed a “details-on-demand” feature: whenever the audience needs to know the exact value of any bar, point, etc., they can simply hover over it. In the 1990s, Shneiderman noted that suppressing details from a visualization and showing specific details only when the user requests them enables the user to see a potentially large amount of information without being overwhelmed by an overly detailed display. A well-designed visualization enables users to see where interesting details may lie, and the details-on-demand feature then enables them to see those details when they’re needed, but then quickly hide them again so that they can return to an uncluttered view and look for other potentially interesting details.
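As a concrete illustration of details-on-demand, here is a minimal sketch using Plotly Express, one of many libraries that display a tooltip on hover by default. The library, column names, and values are my own choices for illustration, not something prescribed by the examples above.

    # A clean, low-precision bar graph whose exact values (including Profit)
    # appear in a tooltip only when the viewer hovers over a bar.
    # All values below are invented for illustration.
    import pandas as pd
    import plotly.express as px

    df = pd.DataFrame({
        "Region": ["East", "South", "Central", "West"],
        "Gross Sales": [120, 180, 260, 480],   # hypothetical values
        "Profit": [15, 40, 55, 90],            # hypothetical values
    })

    fig = px.bar(
        df,
        x="Gross Sales",
        y="Region",
        orientation="h",
        hover_data=["Profit"],   # exact Profit is shown on demand in the tooltip
    )
    fig.update_xaxes(showgrid=False)   # the visible graph stays clean and approximate
    fig.show()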

So, does the availability of details-on-demand tooltips mean that we can stop worrying about precision when making design decisions and optimize solely for other considerations such as cleanness? Can we set the “precision vs. other design considerations” trade-off aside in this case? I think that the answer to this question is a conditional “yes.” We can worry less about precision if we know all of the following:

  1. Our graph will only be viewed in a software application that supports tooltips (which most data visualization products now support and enable by default). If we think that there’s anything more than a small risk that our audience will, for example, share the graph with others by taking a screenshot of it or printing it (thereby disabling the tooltips), then precision must become one of our primary design considerations again.
  2. Our audience is aware of the tooltips feature.
  3. Our audience will only need to know precise values of the bars, points, lines, etc. in our graph occasionally. If we think that our audience will frequently need to know the precise values, giving them a lower-precision graph with tooltips will force them to hover over elements too often, which would obviously be far from ideal. In my experience, however, it’s rare that audiences really do need to know the precise values of elements in a graph very often—although they may claim that they do.

Even if we don’t know whether all three of these conditions will be true for a given graph, we don’t necessarily have to ramp up its size, add gridlines, etc. in order to increase its precision. If we have a little more screen or page real estate with which to work, another solution is to show a clean, compact, lower-precision version of our graph, but then add the textual values just below or to the side of it. If the audience requires a precise value for a given bar, point, etc. in our graph, it’s available just below or beside the graph.

Small graph and table
Graph with columns of text
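Here is a rough sketch of this “graph plus adjacent textual values” approach in matplotlib, once more with invented data: a compact, low-precision graph on the left and the exact values as plain text on the right, so that precise values survive even in a static screenshot or printout.

    # A compact graph with the exact values listed beside it instead of in tooltips.
    # All values below are invented for illustration.
    import matplotlib.pyplot as plt

    regions = ["East", "South", "Central", "West"]
    gross_sales = [120, 180, 260, 480]   # hypothetical values

    fig, (ax_plot, ax_text) = plt.subplots(
        1, 2, figsize=(5, 2), gridspec_kw={"width_ratios": [3, 1]}
    )
    ax_plot.barh(regions, gross_sales, color="#4878a8")
    ax_plot.set_xlabel("Gross Sales")
    ax_plot.grid(False)

    # The exact values, written as text so they remain readable when printed.
    ax_text.axis("off")
    for i, (region, value) in enumerate(zip(regions, gross_sales)):
        ax_text.text(0, i, f"{region}: {value}", va="center")
    ax_text.set_ylim(ax_plot.get_ylim())
    plt.tight_layout()
    plt.show()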

If we think that, for example, our audience is going to be embedding this graph into a PDF for a management meeting (thus disabling the tooltips) and that higher precision will be required by the meeting attendees, this would be a reasonable solution. For some graphs, however, the set of textual values may end up being bigger than a higher-precision version of the graph, in which case the higher-precision graph may actually be more compact.

As with so many other data visualization design decisions, knowing how to balance precision versus other design considerations requires knowing your audience, what they’re going to be using your visualizations for, and—particularly in this case—what devices, applications or media they’ll be using to view the visualization.

Nick Desbarats