Making publication quality plots using R's base graphics


Hi all,

Here are some notes from today’s R users group on creating publication quality plots. We’re focusing on plotting using the base R graphics (so not ggplot).

My goal is to

  • share some of my favourite plotting tricks
  • provide you with a solid worked example showing how to use them.

This post does not try to give complete coverage of the topics discussed, just a
working example.

Please download the data from this link and save it in your current working directory: gapminder-FiveYearData.csv. This is the gapminder data on GDP per capita through time.

We want to make a plot, maybe a little like Hans Rosling’s plot from his great TED talk.

First, here is the not so nice version using R’s defaults:

data <- read.csv('gapminder-FiveYearData.csv', stringsAsFactors=FALSE)
df <- data[data$year==2002, ]
plot(df$gdpPercap, df$lifeExp)

Now let’s make something a bit nicer. We’re going to:

  • log-scale the axes
  • put in a label
  • make nicely formatted power expressions as labels
  • make points a little transparent and coloured by continent
  • size the points by country’s population size

You’ll want to read in some handy utility functions, included at the end of this post – copy them and save into a file called plot-utils.R, then load it:

source("plot-utils.R")

This is a file of helpful functions I copy into many of my projects.

Using some of these functions, we can make a nicer plot:

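# set up an empty, log-scaled canvas
plot(df$gdpPercap, df$lifeExp, log="xy", type='n', ann=FALSE, axes=FALSE,
  xlim=c(10^2, 10^5), ylim = c(30, 90))
# points coloured by continent (colour helper from plot-utils.R), sized by population
points(df$gdpPercap, df$lifeExp,
  col=make.transparent(colour.by.category(df$continent), 0.8), pch=16,
  cex = linear.rescale(sqrt(df$pop)))
mtext("GDP per capita ($ p.a.)", 1, line=3)
mtext("Life expectancy (yr)", 2, line=3)
axis.log10(1)   # x axis with power-of-ten labels
axis(2, las=1)
label("2002", cex=2)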

Note that part of the trick is to build the plot up in layers.
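To see the layering idea on its own, here is a minimal sketch that uses R's built-in `cars` data instead of the gapminder file (the dataset and axis titles are just for illustration). Start with an empty canvas, then add one layer at a time:

```r
# Sketch only: built-in `cars` data, not the gapminder file
x <- cars$speed
y <- cars$dist

# Layer 1: empty canvas with a log-scaled x axis and no decorations
plot(x, y, log = "x", type = "n", ann = FALSE, axes = FALSE)

# Layer 2: the data, as semi-transparent filled points
points(x, y, pch = 16, col = rgb(0, 0, 1, alpha = 0.5))

# Layer 3: axes, a box and axis titles
axis(1)
axis(2, las = 1)
box()
mtext("Speed (mph)", side = 1, line = 3)
mtext("Stopping distance (ft)", side = 2, line = 3)
```

Because each layer is a separate call, any one of them can be swapped out or restyled without touching the rest.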

Next we want to wrap this into a function, so that we can make a similar plot
for any year:

my_plot <- function(data, year) {

  df <- data[data$year==year, ]

  plot(df$gdpPercap, df$lifeExp, log="xy", type='n', ann=FALSE, axes=FALSE,
    xlim=c(10^2, 10^5), ylim = c(30, 90))
  points(df$gdpPercap, df$lifeExp,
    col=make.transparent(colour.by.category(df$continent), 0.8), pch=16,
    cex = linear.rescale(sqrt(df$pop)))
  mtext("GDP per capita ($ p.a.)", 1, line=3)
  mtext("Life expectancy (yr)", 2, line=3)
  axis.log10(1)
  axis(2, las=1)
  label(year, cex=2)
}

Now we can make a plot for any old year:

my_plot(data, 1987)

Finally, save the plot to PDF using the handy to.pdf function (for more on this function see Rich FitzJohn’s blog post on the topic):

to.pdf(my_plot(data, 1987),
  "plot.pdf", width=6, height=6)

Now we have things set up so nicely it’s also possible to generate a series of plots, one for each year:

for (y in unique(data$year)) {
  to.pdf(my_plot(data, y),
    sprintf("plot%s.pdf", y),
    width=6, height=6)
}

Now here’s my list of handy plotting utility functions

Save these into a file called plot-utils.R.

# Returns up to 80 unique, nice colors, generated using
# Starts repeating after 80
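The original list of 80 hand-picked colours isn't reproduced here. As a stand-in so the rest of the file runs, here is a minimal sketch that keeps the same name and behaviour (up to 80 colours, then repeats) but generates the palette with `grDevices::hcl`. The hue spacing, chroma and luminance values are my own assumptions, not the original palette.

```r
# Stand-in for the original niceColors: NOT the original hand-picked palette.
# Generates 80 colours by stepping around the HCL hue wheel in large jumps,
# so that neighbouring colours in the vector look clearly different.
niceColors <- function(n = 80) {
  hues <- (seq_len(80) * 137.508) %% 360          # large step between hues
  cols <- grDevices::hcl(h = hues, c = 70, l = 60)
  rep(cols, length.out = n)                       # recycles once n exceeds 80
}
```

For example, `niceColors(5)` gives five colours, and `niceColors(85)` starts reusing the first colours from position 81 onwards.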

##' Adds a text label at a fixed specified percentage inside plot boundaries.
##' To get the label outside the boundary change px or py to be negative.
##' @title Adds a text label at a fixed specified percentage inside plot boundaries.
##' @param text label text
##' @param px position in x dimension
##' @param py position in y dimension
##' @param ... other arguments to pass through to \code{text} function
label <- function(text, px=0.03, py=NULL, ..., adj=c(0, 1)) {
  if (is.null(py)) {
    fin <- par("fin")
    r <- fin[[1]] / fin[[2]]
    if (r > 1) { # x is longer.
      py <- 1 - px
      px <- (1 - py) / r
    } else {
      py <- 1 - px * r
    }
  }
  usr <- par("usr")
  x <- usr[1] + px*(usr[2] - usr[1])
  y <- usr[3] + py*(usr[4] - usr[3])

  ## NOTE: base 10 log:
  if (par("xlog")) {
    x <- 10^x
  }
  if (par("ylog")) {
    y <- 10^y
  }

  text(x, y, text, adj=adj, ...)
}

##' Make a given color transparent
##' @param col Base colour
##' @param opacity Desired opacity
##' @author Rich FitzJohn
make.transparent <- function(col, opacity=.5) {
  alpha <- opacity
  if (length(alpha) > 1 && any(is.na(alpha))) {
    # handle missing opacities: return NA for those entries
    n <- max(length(col), length(alpha))
    alpha <- rep(alpha, length.out=n)
    col <- rep(col, length.out=n)
    ok <- !is.na(alpha)
    ret <- rep(NA, length(col))
    ret[ok] <- make.transparent(col[ok], alpha[ok])
    ret
  } else {
    tmp <- col2rgb(col)/255
    rgb(tmp[1,], tmp[2,], tmp[3,], alpha=alpha)
  }
}

##' Save the output of \code{expr} to the specified device. For more information see
##' Rich FitzJohn's blog post on the topic.
##' @param expr Expression creating plot
##' @param dev Plotting device, e.g. \code{pdf} or \code{png}
##' @param filename Name of the file to create
##' @param ... Other arguments to pass through to device
##' @author Rich FitzJohn
to.dev <- function(expr, dev, filename, ..., verbose=TRUE) {
  if ( verbose )
    cat(sprintf("Creating %s\n", filename))
  dev(filename, ...)
  on.exit(dev.off())   # always close the device, even if plotting fails
  eval.parent(substitute(expr))
}

to.pdf <- function(expr, filename, ...) {
  to.dev(expr, pdf, filename, ...)
}

to.png <- function(expr, filename, ...) {
  to.dev(expr, png, filename, ...)
}

##' Add log-scaled axes to current plot
##' Generates suitable tick positions at powers of ten.
##' @param side Side of plot to add axis
##' @param add.labels Add labels?
##' @param wholenumbers Only use whole numbers?
##' @param labelEnds Include label ends?
##' @param las text orientation
##' @author Daniel Falster
axis.log10 <- function(side=1, add.labels=TRUE, wholenumbers=TRUE,
  labelEnds=TRUE, las=1) {

  #get range on axis (in log10 units when the axis is logged)
  if(side ==1 | side ==3) {
    r <- par("usr")[1:2]   #upper and lower limits of x-axis
    logged <- par("xlog")
  } else {
    r <- par("usr")[3:4] #upper and lower limits of y-axis
    logged <- par("ylog")
  }

  # make pretty intervals
  at <- pretty(r)

  # drop ends if desirable
  if(!labelEnds)
    at <- at[at > r[1] & at < r[2]]

  # restrict to whole numbers if desirable
  if(wholenumbers)
    at <- at[is.wholenumber(at)]

  # build labels of the form 10^i
  labels <- do.call(expression, lapply(at, function(i) bquote(10^.(i))))

  at <- 10^at

  # make labels
  if(add.labels) {
    axis(side, at=at, labels=labels, las=las)
  } else {
    axis(side, at=at, labels=FALSE, las=las)
  }
}

is.wholenumber <-  function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
}

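##' Return a vector of colours based on values of \code{x}
##' NOTE: the function name used here is an assumption; the original
##' name is missing from this copy of the file.
##' @param x Categorical data
##' @author Daniel Falster
colour.by.category <- function(x=NULL){

  if(is.null(x)) {
    return(NULL)
  }
  v <- unique(x)

  # one colour per category, looked up for each element of x
  ret <- niceColors(length(v))
  names(ret) <- as.character(v)
  unname(ret[as.character(x)])
}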

##' Linear rescale x between range specified by \code{r.out}
##' @param x Data to be rescaled
##' @param r.out Range of rescaled data (min, max)
##' @author Daniel Falster
linear.rescale <- function(x, r.out= c(0.2, 10)) {
  p <- (x - min(x)) / (max(x) - min(x))
  r.out[[1]] + p * (r.out[[2]] - r.out[[1]])
}


Thanks Dan for reviewing your last session here! This gives me a chance to go through it and finally find out the REAL difference between R basic plotting and ggplot2…


Thank you Dan for sharing your plot utils file!
I just thought I’d also share Dan’s advice on how to generate a color ramp or gradient. You can use the RColorBrewer package, which is quite simple. You can also use the colorRampPalette function in the grDevices package.

Syntax: colorRampPalette(c("blue", "white", "red"))(100)

This returns 100 colors along a blue-white-red color ramp. A bit weird I know, but the colorRampPalette function returns another function and the (100) is its argument.
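Splitting the call into two steps may make this clearer. A small sketch (the variable names `ramp` and `pal` are just for illustration):

```r
# Step 1: colorRampPalette() returns a *function* that generates colours
ramp <- grDevices::colorRampPalette(c("blue", "white", "red"))

# Step 2: call that function with the number of colours wanted
pal <- ramp(5)

# The same thing in one go, as in the syntax above
pal100 <- grDevices::colorRampPalette(c("blue", "white", "red"))(100)
```

Here `pal` runs from blue through white (the middle entry) to red, and `pal100` is the same ramp cut into 100 steps.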


Differences between basic plots and ggplot2

Since this discussion has come up every once in a while, I would like to make it public to get the most out of it. It fits right in here.

I have read through Wickham’s ggplot2 book once or twice, and the first few chapters try to point out the differences from other packages and from base plotting. Some things I get, others not (yet…). I copied some lines from the book so you can skim them and give me some feedback on where you think the main differences are!

Here we go:

ggplot2 is an R package for producing statistical, or data, graphics, but
it is unlike most other graphics packages because it has a deep underlying
grammar.

This makes ggplot2 very powerful, because you are
not limited to a set of pre-specified graphics, but you can create new graphics
that are precisely tailored for your problem.

ggplot2 is designed to work in a layered fashion, starting with a layer
showing the raw data then adding layers of annotations and statistical summaries.

Learning the grammar will help you not only create graphics that you know
about now, but will also help you to think about new graphics that would be
even better. Without the grammar, there is no underlying theory and existing
graphics packages are just a big collection of special cases. For example, in
base R, if you design a new graphic, it’s composed of raw plot elements like
points and lines, and it’s hard to design new components that combine with
existing plots. In ggplot2, the expressions used to create a new graphic are
composed of higher-level elements like representations of the raw data and
statistical transformations, and can easily be combined with new datasets and
other plots.

• The data that you want to visualise and a set of aesthetic mappings
describing how variables in the data are mapped to aesthetic attributes
that you can perceive.
• Geometric objects, geoms for short, represent what you actually see on
the plot: points, lines, polygons, etc.
• Statistical transformations, stats for short, summarise data in many useful
ways. For example, binning and counting observations to create a histogram,
or summarising a 2d relationship with a linear model. Stats are optional,
but very useful.
• The scales map values in the data space to values in an aesthetic space,
whether it be colour, or size, or shape. Scales draw a legend or axes, which
provide an inverse mapping to make it possible to read the original data
values from the graph.
• A coordinate system, coord for short, describes how data coordinates are
mapped to the plane of the graphic. It also provides axes and gridlines to
make it possible to read the graph. We normally use a Cartesian coordinate
system, but a number of others are available, including polar coordinates
and map projections.
• A faceting specification describes how to break up the data into subsets
and how to display those subsets as small multiples. This is also known as
conditioning or latticing/trellising.

I’m also happy to have a coffee and some discussions about plotting. It appears to me that ggplot2 only makes a big difference when used to its full extent.

Awesome panel here:-)



Hi Rene,

I’m not much interested in debating which approach is “better”, especially since you know little about plotting with base graphics and I know little about using ggplot. The points I would make are:

  • The two approaches (base, ggplot) are quite different and largely incompatible. It is important to know which approach you are using and not to try to apply tweaks from one style to a plot made using the other style.
  • There is no right or best approach, they are just different. People should use whatever works for them. Mostly that will be the style you have most experience with.
  • Even if ggplot is very powerful and your preferred style, it still helps to know how to make a nice plot in base R, as sometimes you have no other choice (e.g. you’re collaborating or using a built-in function).

I know very little about ggplot; but next week I’ll have the pleasure of hearing Hadley Wickham himself talk on “the grammar of graphics” so I’ll let you know how that goes!

All the best,


Wow…I didn’t mean to debate which one is better! I didn’t even mention the word “better”. I’m just trying to understand the differences and was hoping to get some input from other users in this forum (including you). I would love to get some feedback from Wickham’s talk. I am just keen to get a deeper understanding of ggplot2 and other stuff.



Sounds good, understanding and appreciating the strengths of different approaches is an excellent ambition. And apologies if my response came across badly.
It turns out Hadley’s talk will be available online so you’ll be able to listen in:


Awesome!!! Thank you! Here is an interesting source that compares both approaches