## Interpolation between a start and end position

In these two examples, we are dealing with scatterplots for pairs of continuous variables. Unlike the line charts we have looked at, we do not have a data point for each frame of the video, so drawing the graphs requires us to interpolate between the points we do have. There are a variety of ways of interpolating, but I am only considering linear formulas here, because I want to keep the parts that are not directly relevant to animation simple. By linear interpolation, I mean that we have a position at time t and a position at time (t+1), and we want to get from one to the other in m frames. So we divide the distance between the two positions, in each of the x and y directions, by m, and move each dot in the graph by that amount in every frame.

In the first example, we only have a start point and end point, so the entire video is interpolation. You can download the artificial dataset here. There are variables x0 and y0 which contain the starting co-ordinates, and x1 and y1 which contain the stopping co-ordinates.
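As a rough illustration of the arithmetic, here is a minimal sketch in Python (not the R or Stata used for the actual videos). The function name `interpolate_frames` and the two dots in the usage lines are made up for this example; only the variable names x0, y0, x1 and y1 come from the dataset described above. It assumes the start and end positions are both drawn, so m steps give m + 1 frames.

```python
def interpolate_frames(x0, y0, x1, y1, m):
    """Return m + 1 frames; each frame is a list of (x, y) positions, one per dot."""
    if m < 1:
        raise ValueError("m must be at least 1")
    frames = []
    for k in range(m + 1):
        frac = k / m  # 0.0 at the start position, 1.0 at the stopping position
        frame = [
            (xs + (xe - xs) * frac, ys + (ye - ys) * frac)
            for xs, ys, xe, ye in zip(x0, y0, x1, y1)
        ]
        frames.append(frame)
    return frames


# Two made-up dots, purely for illustration (not taken from the dataset)
frames = interpolate_frames([0.0, 10.0], [0.0, 5.0], [4.0, 2.0], [8.0, 5.0], m=4)
```

Each element of `frames` would then be drawn as one scatterplot and saved as one image of the video.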

## Interpolation between successive measurements in a time series

This next example is only slightly more complicated: there are 62 time points at which measurements are taken, but we want to produce an animated graph with hundreds of frames, so we have to interpolate within each interval between time points. In theory, linear interpolation could make the motion appear jerky, because speed and direction can only change at the measured time points and do so abruptly. Most of the time, though, the viewer's brain works wonders in interpreting what it sees, and more complex smooth interpolation would probably only be necessary in really bad cases where the speed and direction of movement change suddenly throughout the video.
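The interval-by-interval version can be sketched the same way. This is an illustrative Python fragment, not the code behind the video; the function name `interpolate_series` and the choice of a fixed number of frames per interval are assumptions made for the example.

```python
def interpolate_series(values, frames_per_interval):
    """Linearly interpolate one variable measured at successive time points.

    Each interval between two measurements is split into
    `frames_per_interval` equal steps; the final measurement is appended
    once at the end so that it gets its own frame.
    """
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    if frames_per_interval < 1:
        raise ValueError("frames_per_interval must be at least 1")
    out = []
    for a, b in zip(values, values[1:]):
        for k in range(frames_per_interval):
            out.append(a + (b - a) * k / frames_per_interval)
    out.append(values[-1])
    return out


# Made-up measurements at three time points, two frames per interval
path = interpolate_series([0.0, 10.0, 4.0], 2)
```

With 62 time points there are 61 intervals, so 10 frames per interval would give 61 x 10 + 1 = 611 frames. The same function would be applied to each country's x and y values separately.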

The real challenge in this example is dealing with additional variables: we want to draw a scatterplot of GDP (on a log scale) against life expectancy, with continent shown as the colour of the markers and population as their size. This is called a bubble plot. In fact, this is the "200 Countries, 200 Years" animation. In R, plotting bubble plots is very straightforward because you can supply numbers from the continent and population variables directly to control colour and size (the col and cex parameters respectively). In Stata, you have to be a bit more cunning, and specify a collection of superimposed scatterplots, one for each colour. However, this introduces a problem: the usual way of controlling marker size (by weights) is relative within each plot, so the proportionality does not hold across the entire graph (there has been more of a discussion about this on StataList). The best way I have of getting round this is to categorise your populations into 6 or so groups and make 4x6=24 superimposed scatterplots, one for each combination of colour and size group, fixing the marker size and colour within each. This is interesting because it is a useful fall-back whenever you cannot easily draw a smooth gradation of an attribute in your graph. It is not graphically ideal, and it is slow, but it conveys the message just as clearly.
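The grouping step behind that fall-back can be sketched independently of any plotting package. The Python below is only an illustration of the idea, not the Stata do-file: the cut-points, the function names `size_group` and `build_layers`, and the example rows are all made up, and each resulting layer would correspond to one superimposed scatterplot with a fixed colour and marker size.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical cut-points (population counts); 5 cut-points give 6 size groups
CUTPOINTS = [1e6, 5e6, 2e7, 1e8, 5e8]


def size_group(population, cutpoints=CUTPOINTS):
    """Return the index (0 to len(cutpoints)) of the population's size group."""
    return bisect_right(cutpoints, population)


def build_layers(rows):
    """Group (continent, population, x, y) rows into one layer per
    (continent, size group) pair; each layer is a list of (x, y) points."""
    layers = defaultdict(list)
    for continent, population, x, y in rows:
        layers[(continent, size_group(population))].append((x, y))
    return dict(layers)


# Made-up rows, purely for illustration: (continent, population, x, y)
rows = [
    ("Asia", 1.3e9, 8.0, 70.0),
    ("Asia", 1.2e9, 7.5, 65.0),
    ("Europe", 6e7, 10.0, 78.0),
]
layers = build_layers(rows)
```

Only the combinations that actually occur in a frame produce a layer, so the number of scatterplots drawn per frame is at most the number of colours times the number of size groups.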

You can download the data from Gapminder, but I have cut out any pre-1950 measurements and simplified the files here for GDP, population and life expectancy. You can download the Stata do-file here and the R code here. The example shown on YouTube is from R, but the Stata one looks very similar.