1.2.2. Surface Analysis

Information

In this tutorial we are using visualization to explore and evaluate the spatial and temporal structure of a point data set (the photographs georeferenced through Flickr) and, in doing so, comparing it with data about the underlying population. Here we consider how to transform the 1.7 million photo records into a raster surface for analysis as part of this process.

A common way of transforming such data is to count the number of point values within a fixed area on the ground to generate a density measure. If this is done for each cell in a raster, a density surface can be generated for the area covering the distribution of point values. However, simply counting the number of points per cell is likely to produce a very 'spiky' surface with plenty of cells with densities of 0 (where no points are found), and a few with high densities (where many points are found within a single cell). GI Science provides plenty of alternative ways of calculating smoother density surfaces that spread each density measure around the points that contribute to it. Spreading is not just for visual convenience: it can also be used to reflect the inherent spatial uncertainty of the point locations in a dataset.
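The simple per-cell count described above can be sketched in a few lines of Python. This is only an illustration of why the raw surface is 'spiky'; the function name, the cell size and the sample points are all hypothetical and are not part of LandSerf.

```python
from collections import Counter

def count_points_per_cell(points, cell_size):
    """Count how many (x, y) points fall in each square cell of a grid."""
    counts = Counter()
    for x, y in points:
        # Floor division maps a coordinate to the index of its containing cell.
        col = int(x // cell_size)
        row = int(y // cell_size)
        counts[(row, col)] += 1
    return counts

# Hypothetical points: three clustered in one cell, one isolated.
points = [(0.2, 0.3), (0.4, 0.1), (0.9, 0.8), (3.5, 2.5)]
counts = count_points_per_cell(points, cell_size=1.0)
```

Every cell that contains no point has a count of 0, so a sparse point set gives a surface that is mostly empty with a few isolated peaks.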

The method used by LandSerf and explored here is so-called "Cressman analysis" (Cooper 1992). There are more sophisticated interpolation methods, but this approach is relatively simple to understand and is used as the basis for several population density estimation algorithms, e.g. (Wood et al. 1999); (Atkinson et al. 1999); (Martin 2003). A kernel of a fixed size is passed over the area to estimate density. The number of points falling within every cell in the kernel is counted, and the weighted average of cell counts is assigned to the central cell in that window. The weight given to each cell depends on its distance from the centre of the window and drops to zero beyond the kernel radius.
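In its commonly cited form, the Cressman weight for a cell at distance d from the centre of a kernel of radius R can be written as below. This is the standard textbook expression, given here as an assumption; LandSerf's implementation may differ in detail. The second line reads "weighted average" as a sum of cell counts n_i normalised by the sum of weights, which is also an assumption.

```latex
w(d) =
\begin{cases}
  \dfrac{R^{2} - d^{2}}{R^{2} + d^{2}} & \text{if } d < R \\[1ex]
  0 & \text{otherwise}
\end{cases}
\qquad
\hat{\rho}_{\text{centre}} = \frac{\sum_{i} w(d_i)\, n_i}{\sum_{i} w(d_i)}
```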

The effect of this spreading function is shown below for three different window sizes.

Left: Cressman spreading functions for three window sizes. Right: Four point values in a 9x9 window with different weights according to their distance from the centre of the window. Note that point 3 is outside the radius (4.5), so is excluded from the density calculation for the central cell.
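The windowed calculation can be sketched in Python as follows. This is a minimal illustration, assuming the standard Cressman weight (R² − d²)/(R² + d²) and distances measured in cell units from the central cell; the function names are hypothetical and this is not LandSerf's own code.

```python
import math

def cressman_weight(d, radius):
    """Standard Cressman weight: 1 at the centre, 0 at or beyond the radius."""
    if d >= radius:
        return 0.0
    return (radius**2 - d**2) / (radius**2 + d**2)

def cressman_density(counts, radius):
    """Weighted average of the cell counts in a square window (a list of
    rows), to be assigned to the window's central cell."""
    size = len(counts)
    centre = (size - 1) / 2
    weighted_sum = 0.0
    weight_total = 0.0
    for row in range(size):
        for col in range(size):
            # Distance of this cell from the central cell, in cell units.
            d = math.hypot(row - centre, col - centre)
            w = cressman_weight(d, radius)
            weighted_sum += w * counts[row][col]
            weight_total += w
    return weighted_sum / weight_total

# A 9x9 window with one point in a corner cell, which lies outside radius 4.5.
window = [[0] * 9 for _ in range(9)]
window[0][0] = 1
corner_only = cressman_density(window, radius=4.5)
```

The corner cell is about 5.7 cells from the centre, so with a radius of 4.5 it receives zero weight and does not contribute to the central density, as with point 3 in the figure.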

Constructing a density surface of a phenomenon that is in some way related to the location of people allows us to perform an important statistical comparison: that between the spatial distribution of the phenomenon of interest and the underlying population. It allows us to say whether the number of observations found at a given location is more or less than would be expected given the underlying distribution of people. This can be achieved by calculating the chi-statistic for each cell, which scales the difference between the observed and expected values by the square root of the expected value.
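Written symbolically, and consistent with the expression used in the flickrChi.lsc script in the Example, the statistic for a single cell is as follows. The symbols O and E are notation introduced here for the observed and expected values; in the script both are expressed as proportions of their respective totals.

```latex
\chi = \frac{O - E}{\sqrt{E}}
```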

Mapping the chi-statistic for every cell in a raster produces a range of values that are negative where there are fewer observations than expected and positive where there are more. This form of distribution is best mapped using a diverging colour scheme that contrasts positive and negative deviations from the expected values (for example the ColorBrewer diverging schemes (Brewer 2002)).

Example

The figure below shows a Flickr photo density map for the continental United States. This represents the density of the 1.7 million fully georeferenced photos calculated per pixel. In this example, each pixel measures 15 arc-seconds square, which is roughly equivalent to an area of 5km x 5km on the ground. Density varies from 0 to over 7000 photos per pixel, although most of the spatial variation is in the range of 0-20 photos per pixel. The colour table uses Brewer's 'Orange-Red' colour scheme, but scaled exponentially to account for the non-linear variation in density. Additionally, the scheme has been reversed (red representing the lower densities) and black added to indicate no photos. This helps to differentiate areas of low photo density from those outside of the study region.

Photo density calculated using a Cressman density function with kernel set to 5x5 at a 15 arc-second resolution

While the map of photo density reveals some interesting patterns, it largely reflects the density of population across the United States, so we can use LandSerf to calculate the chi expectation statistic, taking local population into account. The LandScript code for producing a chi surface in LandSerf is shown below. The observed values (flickrDensity) are represented by a raster containing the density of Flickr photos across the United States, calculated using Cressman weighting of a 5x5 kernel. The expected values (population) contain the 2005 population density over the same area; we considered population density in raster form in the previous exercise.

flickrChi.lsc

# Script to calculate the chi expectation surface for Flickr photo locations.
# Expected values based on 2005 gridded population data.
version(1.0);

# Change the directory below to match your data directory.
basedir = "/Users/jwo/tutorial/data/";

# Open the two surfaces to compare and calculate proportion in each cell.
population = open(basedir&"continentalUSPopulation.srf");
popTotal = info(population,"sum");
pPop = new(population);
pPop = population/popTotal;

flickrDensity = open(basedir&"flickrDensity.srf");
flickrTotal = info(flickrDensity,"sum");
pFlickr = new(flickrDensity);
pFlickr = flickrDensity/flickrTotal;

# Calculate a new surface of chi-statistics
chi = new(flickrDensity);
edit(chi,"title","Chi expectation surface");
chi = ifelse(population >0.01,(pFlickr-pPop)/sqrt(pPop), null());


# Give new surface a diverging colour scheme that emphasises the
# non-outlier differences.
chiMin = info(chi,"min")/5;
chiMax = -chiMin;
colouredit(chi,"diverging1",chiMin&" "&chiMax);

# Save expectation surface as a LandSerf raster and KML ground overlay.
save(chi,basedir&"flickrChi.srf");
save(chi,basedir&"flickrChi.kmz","kmz");
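The arithmetic in flickrChi.lsc can be mirrored outside LandSerf, as in this minimal Python sketch on a tiny grid. The function name and the sample values are hypothetical, and None stands in for LandScript's null(); the sketch only illustrates the calculation and is not a replacement for the script.

```python
import math

def chi_surface(observed, expected, min_expected=0.01):
    """Return per-cell chi values, or None where expected is too small."""
    obs_total = sum(sum(row) for row in observed)
    exp_total = sum(sum(row) for row in expected)
    chi = []
    for obs_row, exp_row in zip(observed, expected):
        chi_row = []
        for obs, exp in zip(obs_row, exp_row):
            if exp > min_expected:
                # Compare each cell's share of the two totals.
                p_obs = obs / obs_total
                p_exp = exp / exp_total
                chi_row.append((p_obs - p_exp) / math.sqrt(p_exp))
            else:
                # Avoid dividing by (nearly) zero expected values.
                chi_row.append(None)
        chi.append(chi_row)
    return chi

# Hypothetical 2x2 rasters of population and photo density.
population = [[100.0, 300.0], [0.0, 600.0]]
photos = [[30.0, 20.0], [5.0, 45.0]]
chi = chi_surface(photos, population)

# Symmetric colour range, as in the script: a fifth of the minimum, mirrored.
valid = [v for row in chi for v in row if v is not None]
colour_min = min(valid) / 5
colour_max = -colour_min
```

Dividing both surfaces by their totals puts photos and population on the same scale before they are compared. Making the colour range symmetric about zero keeps the midpoint of the diverging scheme on cells where observed and expected proportions are equal, and narrowing it to a fifth of the minimum stops outliers from dominating the colours.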

We would expect there to be a greater density of photos in areas where there are more people to take them (e.g. in cities), but the flickrChi output of the script above is a surface that tells us if we have a greater or lesser number of photos in any given area than we would expect given the local population density. This provides us with greater insight than simply examining the density of photos alone. The result of applying the script to our data is shown below.

Chi expectation surface produced by flickrChi.lsc at a 15 arc-second resolution. Compares Flickr local photo density with local population density.

The chi expectation statistic allows us to compare any pair of observed and expected values over space. As a further example, consider the spatial pattern of photos uploaded to Flickr using the Apple iPhone (Apple Inc. 2008). Using the iPhone's built-in GPS, users can upload photos taken with the camera to Flickr with automatic geocoding (Airme Inc. 2008). We can compare the distribution of these iPhone photos (13,636 uploaded in the first month after the launch of the iPhone 3G) with that of all other photos on Flickr, which take the place of population density as the 'expected' distribution. The resulting chi expectation surface therefore shows areas where there is a higher or lower proportion of iPhone photos on Flickr than the US average of about 1 in 100.

Chi expectation surface comparing iPhone photos with all other georeferenced photos in Flickr

Exercise

  1. Start LandSerf and load the files ContinentalUSPopulation.srf, flickrDensity.srf, flickrChi.srf (created by the LandScript above) and iPhoneChi.srf.
    Optional: If you would rather, you can use LandScript to generate the chi surfaces directly:
    • start the LandScript Editor by selecting Edit->LandScript editor from the LandSerf menu
    • copy and paste the code provided in the example shown above
    • change the basedir to the folder on your computer in which you saved the LandSerf surface data
    • use Run->Run (or click the 'run' button) in the LandScript editor to run the script; this may take a few minutes to complete
    • To create iPhoneChi.srf, open iPhoneChi.lsc in the LandScript editor. Before running the script make sure basedir is pointing to the correct directory on your machine.
  2. Using the zooming and panning discussed in the previous exercise, explore flickrChi.srf and try to identify locations where significantly more or significantly fewer photos than expected have been taken.
    • What are the possible causes of the patterns you have noted?
    • Are there any erroneous values in the distribution? What might be the cause of any errors you have identified?
  3. Perform similar exploration of iPhoneChi.srf (remember this surface shows where photos uploaded with an iPhone 3G are more or less dense than expected).
    • How typical are iPhone photo locations of all those in the Flickr database?
    • Have any of the errors identified in the previous task been eliminated?
    • The lowest chi value in this surface is a point just south of the northern boundary of Kansas (an isolated dark blue circle). What does this value represent, and why do you think its chi value is so low? (Further clues are revealed when we create a Google Earth mashup in the next section.)