Bar Graph Error Bars

Overview

Graphs are the quintessential way to visualize data, and bar graphs in particular have become a popular method for reporting comparative data. But such graphs, when not prepared correctly, can be misleading at best and downright deceptive at worst. As Mark Twain famously wrote, attributing the line to Benjamin Disraeli, “There are three kinds of lies: lies, damned lies, and statistics.”

One factor that can greatly influence the interpretation of a bar graph is the margin of error. When a bar graph is used to report statistical information, such as from a poll or survey, the average value for each question is often used for the height of each bar. However, if we were to run the same poll or survey again with a different sample of people, we might find a different result (i.e., a different height of the bar). This sample-to-sample variation is what we call “error” in survey research.

The margin of error defines a range of values around our average. If we ran our poll or survey many times, each time with a different sample of people from the same broader population, we’d be confident that ranges built this way would contain the population’s true average most of the time. We often use a 95% level of confidence for the margin of error. Statistical formulas can help you calculate this using your sample size, average, and standard deviation (a measure of how “spread out” your data are). Generally, the margin of error decreases as sample size increases or standard deviation decreases.
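As a rough illustration of the calculation described above, the sketch below computes a 95% margin of error for a sample average using the normal approximation (a critical value of about 1.96). The function name and the ratings are hypothetical and are not data from our study. For small samples, a t-distribution critical value would be more appropriate.

```python
import math
import statistics

def margin_of_error(values, confidence=0.95):
    """Approximate margin of error for the mean of `values` (normal approximation)."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation
    # Two-sided critical value, about 1.96 for 95% confidence
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / math.sqrt(n)

# Hypothetical 1-7 ratings, made up for illustration
ratings = [4, 5, 6, 3, 5, 4, 6, 5, 4, 5]
mean = statistics.mean(ratings)
moe = margin_of_error(ratings)
print(f"mean = {mean:.2f}, 95% margin of error = +/-{moe:.2f}")
```

Repeating the same ratings in a larger sample, or using ratings that are less spread out, gives a smaller margin of error, in line with the general pattern noted above.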

There’s a simple way of reporting the margin of error in a bar graph: error bars. Error bars are lines in the shape of a capital “T” that are centered at the height of each bar on the bar graph (i.e., each bar’s average). Below is an example, a happiness survey with 20 people spanning three days. Although one bar is higher than the others, the large margin of error (evidenced by the overlapping error bars) suggests that this difference may just be due to chance sampling.

Despite their usefulness for reporting more accurate statistical information, error bars may come across as confusing to those unfamiliar with them. It remains an open question whether the reporting of error bars increases statistical rigor at the cost of understandability. In essence, is there a trade-off?

To answer this question, we designed a large-scale research experiment involving a bar graph and error bars.

The Experiment

We recruited 1,200 people from the research platform Prolific to take part in a research study. Participants viewed a graph that either included or excluded error bars (randomly assigned), then answered survey questions measuring our outcomes of interest: how understandable, rigorous, and interesting the bar graph seemed.

To extend the generalizability of our findings, we also randomized another feature of the bar graph, whether its bars were vertically or horizontally formatted. This enabled us to test the main effect of error bars, any main effect for vertical vs. horizontal orientation, and any potential interaction between the two.
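The 2x2 design described above can be sketched as follows. This is only an illustration: it assumes each participant is assigned independently and with equal probability to one of the four conditions, which may differ from how the survey software actually handled randomization. The function name, condition labels, and seed are hypothetical.

```python
import random
from collections import Counter

def assign_conditions(n_participants, seed=None):
    """Randomly assign each participant to one cell of a 2x2 design."""
    rng = random.Random(seed)
    conditions = [
        (error_bars, orientation)
        for error_bars in ("error_bars", "no_error_bars")
        for orientation in ("vertical", "horizontal")
    ]
    # Assumption: simple randomization, so cell sizes are only roughly equal
    return [rng.choice(conditions) for _ in range(n_participants)]

assignments = assign_conditions(1200, seed=42)
counts = Counter(assignments)
for condition, count in sorted(counts.items()):
    print(condition, count)
```

Crossing the two factors this way is what allows both main effects and their interaction to be estimated from the same sample.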

Participants were provided a description of the bar graph and the study it illustrated, followed by the bar graph itself. We randomized whether the bar graph included or excluded error bars representing a 95% level of confidence (as well as a short sentence describing the error bars) while also randomizing whether the graph’s bars ran vertically or horizontally. The description and vertical bar graphs are depicted below. Horizontal bar graphs can be found on our separate study page.

Description shown with the graph without error bars:

Below is a graph of some survey results about how much men and women enjoy different types of games. 400 people from Amazon MTurk were surveyed. Each was asked "To what extent do you enjoy playing the following types of games? (1 = Not at all, 7 = Very much)." Answer options were on a 1-7 survey scale.

Please look at this graph and answer the questions that follow.

Description shown with the graph with error bars:

Below is a graph of some survey results about how much men and women enjoy different types of games. 400 people from Amazon MTurk were surveyed. Each was asked "To what extent do you enjoy playing the following types of games? (1 = Not at all, 7 = Very much)." Answer options were on a 1-7 survey scale. Error bars represent the margin of error with a 95% confidence interval.

Please look at this graph and answer the questions that follow.

After viewing the bar graph, participants were asked three survey questions to measure our outcomes. Answer options were on a 1-7 scale (1 = Not at all, 7 = Very much). The three survey questions are listed below.

“To what extent is this graph easy to understand?”
“How scientifically rigorous is this graph?”
“How interesting is this graph?”

Results

We analyzed the data using OLS regression analyses and found a number of interesting effects. First, we tested the “trade-off hypothesis,” that including error bars in the graph might increase perceived rigor at the expense of understandability.

Understandability was 2.1% lower for the error-bar graph (mean = 5.98) than for the no-error-bar graph (mean = 6.11), a difference that fell just short of our significance threshold (p = 0.054). Perceived rigor, by contrast, was 8.3% higher for the error-bar graph (mean = 4.07) than for the no-error-bar graph (mean = 3.76), a statistically significant increase (p < 0.001). Interestingness was also 3.8% higher with error bars, though this difference likewise did not reach statistical significance (p = 0.057). Comparisons using our own bar graphs are presented below.
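For readers who want to check the arithmetic, the percentage differences can be approximated from the reported means as shown below. The helper name is ours, for illustration only. Because the means above are rounded to two decimals, values recomputed this way can differ slightly from the reported percentages.

```python
def percent_change(baseline, new):
    """Percentage change from `baseline` to `new`."""
    return (new - baseline) / baseline * 100

# Means as reported above (no-error-bar graph first, error-bar graph second)
understandability = percent_change(6.11, 5.98)
rigor = percent_change(3.76, 4.07)

print(f"understandability: {understandability:+.2f}%")
print(f"perceived rigor:   {rigor:+.2f}%")
```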

Interestingly, error bars were particularly beneficial for the horizontal bar graph. Including error bars in the horizontal bar graph significantly increased its perceived rigor by 16%, or 0.58 points on a 1-7 scale (p < 0.001). Error bars had no significant effect on the perceived rigor of the vertical bar graph (p = 0.627). This difference between orientations was supported by a significant interaction in the OLS regression (p = 0.003). No such interaction was found for understandability (p = 0.548) or interestingness (p = 0.102).

Conclusion

Including error bars to disclose the margin of error not only makes your graph more objectively rigorous but also makes it look more rigorous to readers. It does so at little cost to understandability, and it is particularly beneficial for horizontal bar graphs. So next time you’re creating a bar graph to present sampling data, consider adding error bars.

Methods Note

For our analyses, two-sample t-tests were used to test for significant differences in how understandable, rigorous, and interesting each bar graph seemed to be. Ordinary Least Squares (OLS) regression analyses with interaction terms were used to test whether the results differed based on other factors, in this case horizontal vs. vertical orientation. Our statistical significance threshold was a p-value below 0.05. The data and survey materials used for this study are available upon request.
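The two kinds of tests mentioned above can be sketched in plain Python. This is an illustration under stated assumptions, not our analysis code: it uses Welch's version of the two-sample t-test (the note above does not specify a variant), approximates the p-value with a normal distribution (reasonable only for large samples), and uses made-up ratings. For a 2x2 design with two dummy-coded factors and their product term, the OLS interaction coefficient equals the difference-in-differences of the four cell means, which is what the second function computes. All function names and data are hypothetical.

```python
import math
import statistics

def welch_t_test(a, b):
    """Return (t, approximate two-sided p) for two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    # Normal approximation to the t distribution; rough for small samples
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, p

def interaction_estimate(cells):
    """Difference-in-differences of cell means in a 2x2 design."""
    m = {key: statistics.mean(values) for key, values in cells.items()}
    effect_horizontal = m[("error_bars", "horizontal")] - m[("no_error_bars", "horizontal")]
    effect_vertical = m[("error_bars", "vertical")] - m[("no_error_bars", "vertical")]
    return effect_horizontal - effect_vertical

# Hypothetical rigor ratings on a 1-7 scale, made up for illustration
cells = {
    ("no_error_bars", "vertical"): [4, 3, 4, 5, 4, 3],
    ("error_bars", "vertical"): [4, 4, 3, 5, 4, 4],
    ("no_error_bars", "horizontal"): [3, 4, 3, 4, 3, 4],
    ("error_bars", "horizontal"): [4, 5, 4, 5, 4, 4],
}

t, p = welch_t_test(cells[("error_bars", "horizontal")],
                    cells[("no_error_bars", "horizontal")])
interaction = interaction_estimate(cells)
print(f"t = {t:.2f}, approximate p = {p:.4f}")
print(f"interaction estimate = {interaction:.2f}")
```

A positive interaction estimate here means the effect of error bars is larger for the horizontal graph than for the vertical one. Testing whether that estimate is statistically significant requires its standard error, which a regression package would report.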
