Introduction & Practice Data File
There are 3 ways to create boxplots in SPSS:
The first approach is the simplest but it also has fewer options than the others. This tutorial walks you through all 3 approaches while creating different types of boxplots.
- Boxplot for 1 Variable - 1 Group of Cases
- Boxplot for Multiple Variables - 1 Group of Cases
- Boxplot for 1 Variable - Multiple Groups of Cases
- Tip 1 - Remove Outliers for Single Group
- Tip 2 - Show Outlier Values in Boxplot
- Tip 3 - Adding Titles to Boxplots
Example Data
All examples in this tutorial use driving-test.sav, partly shown below.
Our data file contains a sample of N = 238 people who were examined in a driving simulator. Participants were presented with 5 dangerous situations to which they had to respond as fast as possible. The data hold their reaction times and some other variables.
Boxplot for 1 Variable - 1 Group of Cases
We'll first run a boxplot for the reaction times on trial 1 for all cases. One option is
which opens the dialogs shown below. Completing these steps results in the syntax below.
EXAMINE VARIABLES=r01
/COMPARE VARIABLE
/PLOT=BOXPLOT
/STATISTICS=NONE
/NOTOTAL
/ID=id
/MISSING=LISTWISE.
Result
Our boxplot shows some potential outliers as well as extreme values. Interpreting these (and all other boxplot elements) is discussed in Boxplots - Beginners Tutorial. Also note that our boxplot doesn't have a title yet. Options for adding one are discussed in Tip 3 - Adding Titles to Boxplots.
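The following minimal Python sketch (not SPSS syntax) shows how a boxplot could sort points into potential outliers and extreme values. It assumes the box edges are Tukey's hinges, which is, as far as we know, what EXAMINE uses for its boxplots. It then applies the SPSS convention: values more than 1.5 box lengths outside the box are potential outliers, and values more than 3 box lengths outside it are extreme values. The function names `tukey_hinges` and `classify` are made up for this illustration.

```python
import math

def tukey_hinges(values):
    """Return (lower hinge, median, upper hinge) using Tukey's depth rule."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    depth_median = (n + 1) / 2
    depth_hinge = (math.floor(depth_median) + 1) / 2
    # Each statistic averages the values at floor(depth) and ceil(depth),
    # counted from the bottom (lower hinge) or from the top (upper hinge).
    lo, hi = math.floor(depth_hinge) - 1, math.ceil(depth_hinge) - 1
    lower = (xs[lo] + xs[hi]) / 2
    upper = (xs[n - 1 - lo] + xs[n - 1 - hi]) / 2
    m_lo, m_hi = math.floor(depth_median) - 1, math.ceil(depth_median) - 1
    median = (xs[m_lo] + xs[m_hi]) / 2
    return lower, median, upper

def classify(values):
    """List (value, label) pairs for points drawn beyond the whiskers."""
    lower, _, upper = tukey_hinges(values)
    box = upper - lower  # box length, i.e. the hinge-based interquartile range
    flagged = []
    for v in values:
        distance = max(lower - v, v - upper, 0)  # 0 for values inside the box
        if distance > 3 * box:
            flagged.append((v, "extreme"))
        elif distance > 1.5 * box:
            flagged.append((v, "outlier"))
    return flagged

print(classify([1, 2, 3, 4, 5, 6, 7, 8, 15]))  # [(15, 'outlier')]
print(classify([1, 2, 3, 4, 5, 6, 7, 8, 30]))  # [(30, 'extreme')]
```

The numbers above are made up. With these 9 values the box runs from 3 to 7, so a value becomes a potential outlier above 13 and an extreme value above 19.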
Boxplot for Multiple Variables - 1 Group of Cases
We'll now create a single boxplot for our 5 reaction time variables for all participants. We navigate to
and fill out the dialogs as shown below.
“Dependents together” means that all dependent variables are shown together in each boxplot. If you enter a factor (say, sex), you'll get a separate boxplot for each factor level (female and male respondents). “Factor levels together” creates a separate boxplot for each dependent variable, showing all factor levels together in each boxplot.
“Exclude cases pairwise” means that the results for each variable are based on all cases that don't have a missing value on that variable. “Exclude cases listwise” uses only cases that have no missing values on any of the variables.
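A small sketch makes the difference between both options easy to see. The Python below uses a made-up mini data set in which None marks a missing value. It simply counts how many cases each option would use. The helper names are hypothetical.

```python
# Made-up mini data set; None marks a missing value.
rows = [
    {"r01": 510, "r02": 620, "r05": 700},
    {"r01": 480, "r02": None, "r05": None},
    {"r01": 530, "r02": 590, "r05": None},
    {"r01": 495, "r02": 605, "r05": None},
]
variables = ["r01", "r02", "r05"]

def pairwise_n(rows, variables):
    """Number of cases used per variable under 'Exclude cases pairwise'."""
    return {v: sum(row[v] is not None for row in rows) for v in variables}

def listwise_n(rows, variables):
    """Number of cases used for all variables under 'Exclude cases listwise'."""
    return sum(all(row[v] is not None for v in variables) for row in rows)

print(pairwise_n(rows, variables))  # {'r01': 4, 'r02': 3, 'r05': 1}
print(listwise_n(rows, variables))  # 1
```

A single variable with many missing values (r05 here) pulls the listwise N down for every variable, even though r01 is complete.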
A minor note here is that many SPSS users select “Normality plots and tests” in this dialog for running a
Anyway. Completing these steps results in the syntax below. Let's run it.
EXAMINE VARIABLES=r01 r02 r03 r04 r05
/COMPARE VARIABLE
/PLOT=BOXPLOT
/STATISTICS=NONE
/NOTOTAL
/ID=id
/MISSING=PAIRWISE /* IMPORTANT! */.
Result
Before inspecting our boxplot, take a close look at the Case Processing Summary table.
The first columns show how many cases were used for each variable. Note that trial 5 has 205 (86.1%) missing values. Remember that “Exclude cases listwise” was the default in the Explore dialog. If we hadn't changed that, none of our variables would have used more than N = 33 cases. The boxplot itself, however, wouldn't show any sign of this. This really is a major pitfall, so please avoid it.
Anyway, the figure below shows our actual boxplot.
First, note that the first box (trial 1) is the one we already saw in our previous example. Second, trials 2 and 4 seem strongly positively skewed. Both variables look odd, so we'd better inspect their histograms to see what's really going on.
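One rough way to put a number on the skewness you see in a boxplot is Bowley's quartile skewness, (Q3 + Q1 - 2*Q2) / (Q3 - Q1). It is positive when the median sits closer to the bottom of the box. The sketch below is only an illustration with made-up data. It uses Python's `statistics.quantiles`, whose quartiles may differ slightly from the box edges SPSS draws. It doesn't replace inspecting the histograms.

```python
import statistics

def bowley_skewness(values):
    """Quartile skewness: > 0 means the median sits closer to Q1 (right skew)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    if q3 == q1:
        raise ValueError("Q3 equals Q1; quartile skewness is undefined")
    return (q3 + q1 - 2 * q2) / (q3 - q1)

print(bowley_skewness([1, 2, 3, 4, 5, 6, 7, 8, 9]))     # 0.0
print(bowley_skewness([1, 2, 2, 3, 3, 4, 10, 20, 40]))  # 0.75
```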
Boxplot for 1 Variable - Multiple Groups of Cases
We'll now run a boxplot for trial 3 for age groups separately. We first navigate to
and fill out the dialogs as shown below. Select “Point ID Label” in this tab and then drag & drop r03 into the ID box on the canvas. Doing so will show actual outlier values in the final boxplot.
Completing these steps results in the syntax below.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=agegroup r03 MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: agegroup=col(source(s), name("agegroup"), unit.category())
DATA: r03=col(source(s), name("r03"))
GUIDE: axis(dim(1), label("Age Group"))
GUIDE: axis(dim(2), label("Reaction time trial 3"))
GUIDE: text.title(label("I CAN TYPE MY AMAZING TITLE RIGHT HERE!"))
SCALE: cat(dim(1), include("1", "2", "3"))
SCALE: linear(dim(2), include(0))
ELEMENT: schema(position(bin.quantile.letter(agegroup*r03)), label(r03))
END GPL.
Result
This boxplot shows increasing medians and increasing spread (longer boxes) with increasing age. Note that our boxplot also shows the actual outlier values. In this example, these are reaction times of 1,441 and 1,455 milliseconds, which are flagged as outliers for the youngest age group only.
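The same reaction time can be an outlier in one age group but not in another because the fences are computed from each group's own box. The Python sketch below illustrates this with made-up numbers, not the driving-test.sav data. It assumes Tukey's hinges for the box edges and flags anything more than 1.5 box lengths outside the box. `hinges` and `flag_by_group` are hypothetical helper names.

```python
import math

def hinges(values):
    """Lower and upper Tukey hinges of a list of numbers."""
    xs = sorted(values)
    n = len(xs)
    depth = (math.floor((n + 1) / 2) + 1) / 2
    lo, hi = math.floor(depth) - 1, math.ceil(depth) - 1
    return (xs[lo] + xs[hi]) / 2, (xs[n - 1 - lo] + xs[n - 1 - hi]) / 2

def flag_by_group(data):
    """data maps group -> values; returns group -> values beyond 1.5 box lengths."""
    flagged = {}
    for group, values in data.items():
        lower, upper = hinges(values)
        box = upper - lower
        flagged[group] = sorted(v for v in values
                                if v < lower - 1.5 * box or v > upper + 1.5 * box)
    return flagged

made_up = {
    1: [400, 420, 430, 440, 450, 460, 470, 1441, 1455],
    3: [900, 1000, 1100, 1200, 1300, 1400, 1441, 1455, 1500],
}
print(flag_by_group(made_up))  # {1: [1441, 1455], 3: []}
```

In group 1 the box runs from 430 to 470, so anything above 530 is flagged. In group 3 the much longer box pushes the upper fence to 1952.5.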
Tip 1 - Remove Outliers for Single Group
If you'd like to remove outliers based on boxplot results, you'd normally set them as user missing values. For example, MISSING VALUES r03 (1441 THRU HI). sets values of 1441 and higher as missing for r03. In our example, however, this won't work: the aforementioned values are potential outliers only for the youngest age group. For the other age groups, they're within a normal range.
A solution is to recode these values into a different value for the youngest age group only. One option is combining DO IF with RECODE. The syntax below, however, shows a shorter option based on IF.
*Run checktable.
means r03 by agegroup
/cells count min max mean stddev.
*Recode potential outliers into 999999998 but only for agegroup 1.
if(agegroup = 1 and r03 >= 1441) r03 = 999999998.
*Set recoded outliers as user missing values.
missing values r03 (999999998).
*Apply value label to recoded outliers.
add value labels r03 999999998 'Value removed because outlier'.
*Rerun checktable.
means r03 by agegroup
/cells count min max mean stddev.
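If you process your data outside SPSS, the IF plus MISSING VALUES approach above comes down to a conditional replacement followed by ignoring a sentinel value. The minimal Python sketch below assumes rows are dictionaries with `agegroup` and `r03` keys. The data are made up and the function names are hypothetical.

```python
SENTINEL = 999999998  # same code the SPSS syntax above uses for removed outliers

def recode_group_outliers(rows, group=1, cutoff=1441):
    """Mimic: if(agegroup = 1 and r03 >= 1441) r03 = 999999998."""
    for row in rows:
        value = row["r03"]
        # Skip missing values: in SPSS the IF condition isn't true for them either.
        if row["agegroup"] == group and value is not None and value >= cutoff:
            row["r03"] = SENTINEL
    return rows

def valid_values(rows, group):
    """r03 values still treated as valid, i.e. not missing and not the sentinel."""
    return [row["r03"] for row in rows
            if row["agegroup"] == group and row["r03"] not in (None, SENTINEL)]

rows = [
    {"agegroup": 1, "r03": 500},
    {"agegroup": 1, "r03": 1455},
    {"agegroup": 3, "r03": 1455},
]
recode_group_outliers(rows)
print(valid_values(rows, 1))  # [500]
print(valid_values(rows, 3))  # [1455]
```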
Tip 2 - Show Outlier Values in Boxplot
You can show data values for potential outliers and extreme values in boxplots. This only works if each boxplot involves a single dependent variable. Simply use this dependent variable as the ID variable too.
The only dialog that supports this is the Chart Builder. If you prefer the other dialogs, modifying the /ID subcommand in the syntax also does the trick.
EXAMINE VARIABLES=r03 BY agegroup
/PLOT=BOXPLOT
/STATISTICS=NONE
/NOTOTAL
/ID=r03 /*Label outliers with actual data values */.
Tip 3 - Adding Titles to Boxplots
There are 3 options for showing titles in SPSS boxplots:
- create your boxplot via the Chart Builder as in example 3;
- use a chart template that has a fixed title and/or subtitle;
- add a title manually after creating your boxplot.
For this last option, open a Chart Editor window by double-clicking your chart. You can now add a title from the menu. Note that you can adjust your title after adding it.
Final Notes
There are many more variations on boxplots, especially clustered boxplots. However, I think you'll get them done fairly easily after studying this tutorial.
If you've any questions or remarks, please throw me a comment below.
Thanks for reading!
THIS TUTORIAL HAS 14 COMMENTS:
By Jon Peck on June 18th, 2024
And a fourth method (!): Graphs > Graphboard Template Chooser.
There is also an extension command installable from the Extension Hub, STATS BAGPLOT, that does a two-dimensional generalization of a boxplot.
By Ruben Geert van den Berg on June 19th, 2024
Interesting suggestion!
However, the basic boxplot in the Graphboard Template Chooser seems to require an X-axis variable.
So it may be suitable for comparing subgroups but not variables - unless I missed something...
By Jon K Peck on June 19th, 2024
The GTC does require an X axis variable. Most often, boxplot charts would be used for group comparisons. BTW, the GTC also provides vase or violin plots, which generalize the standard boxplot.
By Ruben Geert van den Berg on June 20th, 2024
I find that a little tricky because it's not obvious what to use as the X-variable.
However, this subjective decision may have consequences for which values are labeled as outliers.
So if I don't want to see outliers, I'll just try several likely X-variables and settle for the one that doesn't result in any outliers. Right?
Also, I realized that one could simply add a constant to some dataset and use that as the X-variable if one does not want to compare subgroups. So that would overcome the "mandatory X problem".