The ability to create eye-catching visuals is not an inherited skill. The skills required for most effectively displaying information are not intuitive and rely largely on principles that can be learned. There are, indeed, some visualizations that are best left to designers, but others, such as displays of audit findings, key performance indicators (KPIs) and cybermonitoring indicators, do not need the designer’s touch.
Scientific evidence supports the importance of data visualization. As Neil deGrasse Tyson once said, “The good thing about science is that it’s true whether or not you believe in it.” And there is as much science behind data visualization as there is behind analytics.
The brain receives 8.96 megabits of data from the eye every second. The average person comprehends 120 words per minute when reading, which is equivalent to 81.6 bits of data per second.1
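The reading-rate comparison is simple unit arithmetic. The Python sketch below reproduces it from the figures quoted above and computes how much larger the visual channel is. It assumes that "megabits" means 10^6 bits; the function name is hypothetical, and the code makes no claim about how the 81.6 bits-per-second figure was originally derived beyond the division shown.

```python
# Figures quoted in the text.
EYE_TO_BRAIN_BITS_PER_SECOND = 8.96e6  # assumes 1 megabit = 10**6 bits
READING_WORDS_PER_MINUTE = 120
READING_BITS_PER_SECOND = 81.6


def reading_breakdown(words_per_minute: float, bits_per_second: float) -> tuple[float, float]:
    """Return (words read per second, implied bits of data per word)."""
    words_per_second = words_per_minute / 60
    bits_per_word = bits_per_second / words_per_second
    return words_per_second, bits_per_word


words_per_second, bits_per_word = reading_breakdown(
    READING_WORDS_PER_MINUTE, READING_BITS_PER_SECOND
)
# How many times more data the eye delivers than reading conveys.
ratio = EYE_TO_BRAIN_BITS_PER_SECOND / READING_BITS_PER_SECOND

print(f"{words_per_second:.1f} words/s, about {bits_per_word:.1f} bits per word")
print(f"The visual channel carries roughly {ratio:,.0f} times as much data as reading")
```

At 2 words per second, the quoted reading rate implies about 40.8 bits per word, and the eye delivers on the order of 100,000 times as much data per second.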
Humans are not wired to read quickly; they are wired to visualize quickly. Brains perform more efficiently and more information is retained when the learning comes from visuals.
A well-designed dashboard allows the viewer to analyze massive data sets at a glance. Learning how to represent data in a way that immediately tells a story, sparks an insight or provokes discussion is as important as being able to run data analytics.
Data visualization comprises a set of tools and techniques for creating graphs (also called charts or diagrams) that, when used the right way, are extremely powerful. It is not about flashy three-dimensional rainbow graphs. Data visualization is about simplicity and representing data effectively.
Before design principles are considered, several important layers must be covered in preparation for the design layer.
Know the Data
What data types are in the data set? Are the variables mostly categorical, e.g., regions and departments? Do the data have geospatial variables, such as country, city, address or postal code? What about dates? What are the quantitative variables? Are the categorical data nominal, ordinal, hierarchical or interval (e.g., ≥ 70 points, ≥ 50 and < 70 points, and < 50 points) in nature?
Understanding the data types can aid in selecting the best graph to visualize the data. After identifying the data types, it is necessary to understand the relationships among all variables, because one variable by itself is not very interesting. The first question to ask about a number is “Compared to what?”2
Relationships are how data tell their story. It is important to identify whether one or more of the following relationships exist in the data set: nominal comparison, time-series, correlation, ranking, deviation, distribution, part-to-whole and geospatial.
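Interval categories like the point bands in the example above are usually derived from a quantitative variable by binning. A minimal Python sketch, assuming the three bands quoted in the text; the label strings and the function name are hypothetical:

```python
from bisect import bisect_right

# Band boundaries from the text: < 50, >= 50 and < 70, >= 70 points.
BOUNDARIES = [50, 70]
LABELS = ["< 50", ">= 50 and < 70", ">= 70"]


def score_band(points: float) -> str:
    """Map a numeric score to its interval category."""
    # bisect_right counts the boundaries that are <= points, so a score of
    # exactly 50 or 70 lands in the higher band, matching ">=" in the text.
    return LABELS[bisect_right(BOUNDARIES, points)]


scores = [42, 50, 69.5, 70, 88]
print([score_band(s) for s in scores])
```

Because the resulting bands have a natural order, a graph of them should keep that order (low to high) rather than sorting the labels alphabetically.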
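To make “Compared to what?” concrete, the sketch below compares a single KPI with two different reference values. The KPI (open audit findings) and all of the numbers are hypothetical, chosen only to show that the same figure can tell opposite stories depending on the comparison.

```python
def compare(actual: float, reference: float) -> tuple[float, float]:
    """Return the absolute and percentage difference of actual versus a reference value."""
    if reference == 0:
        raise ValueError("reference must be non-zero to compute a percentage difference")
    difference = actual - reference
    return difference, 100 * difference / reference


# Hypothetical KPI: 42 open audit findings this quarter (fewer is better).
actual = 42
for label, reference in [("target", 35), ("prior quarter", 56)]:
    diff, pct = compare(actual, reference)
    print(f"vs {label}: {diff:+} ({pct:+.1f}%)")
```

Here, 42 open findings is 20 percent worse than the target but 25 percent better than the prior quarter; either comparison alone would give the audience only half the story.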
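Once the relationships in a data set have been identified, they can be matched to candidate graphs. The Python lookup below is an illustrative assumption: it lists commonly suggested starting points for each relationship named above, not rules, and it is not a reproduction of any published chart chooser. The function name is hypothetical.

```python
# Commonly suggested starting points for each relationship (illustrative only).
SUGGESTED_GRAPHS: dict[str, list[str]] = {
    "nominal comparison": ["bar chart"],
    "time-series": ["line chart"],
    "correlation": ["scatter plot"],
    "ranking": ["sorted bar chart"],
    "deviation": ["bar chart against a reference line", "line chart of differences"],
    "distribution": ["histogram", "box plot"],
    "part-to-whole": ["stacked bar chart", "pie chart (few categories only)"],
    "geospatial": ["map"],
}


def suggest_graphs(relationships: list[str]) -> dict[str, list[str]]:
    """Return candidate graphs for each relationship found in a data set."""
    unknown = [r for r in relationships if r not in SUGGESTED_GRAPHS]
    if unknown:
        raise KeyError(f"unrecognized relationship(s): {unknown}")
    return {r: SUGGESTED_GRAPHS[r] for r in relationships}


print(suggest_graphs(["time-series", "part-to-whole"]))
```

Raising an error for an unrecognized relationship, rather than silently skipping it, makes gaps in the analysis visible before a graph is chosen.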
Know the Message
What is being conveyed? Is the purpose to find the story the data are telling or simply to provide an explanation of a known issue?
Visualization for exploring is useful when what the data have to tell is unknown and it is still necessary to get a first sense of the relationships and patterns contained within them. It allows for imprecision.
Certain graphs are better at enabling exploration, while others, such as pie charts, provide only a simple explanation. Interactive dashboards allow internal auditors to find their own stories in the data. Some graphs are more forgiving, allowing for the use of many categories and colors since users will be able to apply filters.
Visualization for explaining is best when it is cleanest.3 Here, the ability to pare down the information to its simplest form—to strip away the noise entirely—will increase the efficiency with which a decision maker can understand it. This is the approach to take once it is understood what the data are saying and when that message is ready to be communicated to the audience.
Know the Audience
Knowing the audience goes a long way toward making a connection and maximizing the chances that management will understand and retain the information being conveyed. What experience, skill and understanding of the subject will the audience members have? What is their ability to focus? How interested are they?
Organize, group or prioritize the information in order to emphasize what should be conveyed to the audience, and do not bury the key messages in a mass of detail and graphs.
Most audiences are used to plain bar, line and pie charts. Bar charts can be useful to convey information, but it may help to provide the audience with some initial attention-grabbing visuals to “wow” them and, as a result, draw their attention to the story that follows. Consider compelling visuals that capture the reader’s attention without sacrificing or obscuring the message. Do not be afraid of trying different chart types. As Steve Jobs said, “A lot of times, people don’t know what they want until you show it to them.”
And finally, color blindness and cultural differences must be taken into consideration as well. Both need to be factored in when selecting a color palette for graphs.
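One common way to account for color blindness is to start from a palette designed for color vision deficiency. The Python sketch below uses the Okabe-Ito palette as one such option; the choice of palette and the function name are assumptions, not something this article prescribes, and the sketch does nothing about cultural meanings of color, which still require judgment.

```python
# Okabe-Ito palette: eight colors chosen to remain distinguishable for the
# most common forms of color vision deficiency (listed here with black last).
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]


def assign_colors(categories: list[str]) -> dict[str, str]:
    """Map each category to a distinct palette color, in order."""
    if len(categories) > len(OKABE_ITO):
        # Reusing a color would make two categories look identical, so
        # refuse instead of cycling; group or filter the categories first.
        raise ValueError(
            f"{len(categories)} categories exceed the {len(OKABE_ITO)}-color palette"
        )
    return dict(zip(categories, OKABE_ITO))


print(assign_colors(["North", "South", "East"]))
```

Capping the number of categories at the palette size also nudges the designer toward the simpler graphs that work best for explaining.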
Know the Options
This is one of the most interesting parts of data visualization—choosing the correct graph for the data type and relationships in the data set. This is when the relationships identified earlier come to life.
When putting together a graph, it should show one of five things about the data: a relationship between or among data points, a comparison of data points, the composition of the data, the distribution of the data or the geospatial location of data points.4
To help determine which graph to use, see figure 1, created by Andrew Abela, author of the book Advanced Presentations by Design.5
Some other graphs not featured in figure 1 are shown in figure 2:
- Bubble chart—This is a variation of the scatter plot. This is one of the charts that should be used only as eye candy or when getting to know the data set. This chart looks fun, but it is ineffective in communicating meaningful data, and it is often troublesome to get the scaling right.
- Tree map—This is a way to compare categories in which categorical data are represented by color and numerical variables by the size of each rectangle. Tree maps can display more values than could be shown simply and effectively in a bar graph.
- Heat map—This is a representation of data in which the individual values contained in a matrix are represented as colors. There are many types of heat maps, but they all have one thing in common—they use color to communicate the relationship between categorical and numerical variables.
- Trellis chart—This is not a grouping of line charts copied and pasted together. It is one graph divided into panels by category, with every panel using the same scale and axes. Trellis charts are useful for finding the structure and patterns in complex data.
- Word cloud—This is an engaging way to visualize the frequency distribution of words in textual data; however, it should be used sparingly, as word clouds cannot show how a word is used. It is best used to highlight categorical data types, e.g., departments with the most sales. When an interactive tool is used, a word cloud can display the filter being applied.
- Bullet graph—This graph was developed by Stephen Few6 to replace the meters and gauges that are often used on dashboards. Its linear, no-frills design provides a rich display of data in a small space, which is essential on a dashboard. Like most meters and gauges, bullet graphs feature a single quantitative measure (e.g., year-to-date revenue) along with complementary measures to enrich the meaning of the featured measure. Its design not only gives it a small footprint, but also supports more efficient reading than radial meters.
- Box plot—This represents data by showing the lowest value, the highest value, the median and the first and third quartiles, which form the edges of the box. The box plot is useful in analyzing small data sets that do not lend themselves easily to histograms.
- Doughnut chart—This is basically a pie chart with a hole in the middle. The beauty of it is that the blank space in the middle gives room to add text or even one more variable, e.g., total percentage complete. A doughnut chart can be combined with a pie chart to represent more than one categorical variable or data set (assuming those relate somehow). Each variable in a doughnut chart adds a ring to the chart. The first variable is displayed in the center of the chart.
- Map—This is the perfect graph to use if there are geospatial data types in the data set. It is not necessary to have street addresses; just the city is sufficient to map, for example, where most of the transactions are occurring.
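Two of the items above come down to a small calculation. A box plot is drawn from a five-number summary, and the bubble chart's scaling problem usually arises when bubbles are sized by radius instead of by area. The Python sketch below shows both; the function names are hypothetical, and the quartile method used is one of several conventions in use, so other tools may draw slightly different box edges.

```python
import math
import statistics


def five_number_summary(values: list[float]) -> tuple[float, float, float, float, float]:
    """Return (minimum, first quartile, median, third quartile, maximum) for a box plot."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute quartiles")
    # method="inclusive" interpolates linearly between data points, treating
    # the data as the whole population; other conventions exist.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def bubble_radius(value: float, max_value: float, max_radius: float) -> float:
    """Scale a bubble so that its AREA, not its radius, is proportional to value."""
    if value < 0 or max_value <= 0:
        raise ValueError("value must be non-negative and max_value positive")
    # Area grows with the square of the radius, so the radius must grow with
    # the square root of the value. Scaling the radius linearly would make a
    # value twice as large look four times as large.
    return max_radius * math.sqrt(value / max_value)


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(bubble_radius(25, 100, 10))
```

With this scaling, a bubble for 25 is drawn with half the radius of a bubble for 100, which gives it one quarter of the area, matching the ratio of the values.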
For awareness, figure 3 shows examples of other graphs that may be useful. Many data visualization gurus make strong cases against using some of these, for very valid reasons. Each of these graphs combines two types of graphs in order to best convey the information.
As part 1 of this series, this article is meant to provide the basics of data visualization, but it also establishes the importance of data science. Knowledge is not power. Power is what is done with this knowledge and the decisions and actions taken as a result of understanding the information. And data visualization is key in making sense of it. Part 2 of this series will discuss how to make appropriate and effective use of colors, fonts and gestalt principles, and how to avoid chartjunk.
Endnotes
1 Koch, K., “How Much the Eye Tells the Brain,” Current Biology, 25 July 2006; 16(14): p. 1428–1434, www.ncbi.nlm.nih.gov/pmc/articles/PMC1564115/
2 Few, Stephen, “Effectively Communicating Numbers,” Perceptual Edge, November 2005
3 Steele, Julie, “Why Data Visualization Matters,” Radar, 15 February 2012, http://radar.oreilly.com/2012/02/why-data-visualization-matters.html
4 Op. cit., Few
5 Abela, Andrew, “Choosing a Good Chart,” The Extreme Presentation Method, 6 September 2006, http://extremepresentation.typepad.com/blog/2006/09/choosing_a_good.html
6 Stephen Few is an innovator, consultant and educator in the fields of business intelligence and information design. http://perceptualedge.com/about.php
Karina Korpela, CISA, CISM, CISSP, PMP, is the IT audit manager at AltaLink, a Berkshire Hathaway Energy Company and Alberta’s largest transmission provider. She has more than 13 years of experience with IT risk and controls, performing data analytics and developing continuous controls monitoring applications for many different business processes. She began her career at Coopers & Lybrand as a system administrator and was later invited to join its Computer Audit Assistance Group (CAAG) as an IT auditor. Korpela later became a manager in PricewaterhouseCoopers’ Global Risk Management Solution practice. She can be reached at karina.korpela@altalink.ca.