Following my map of London’s green and blue infrastructure, I have been working on some analysis of the land uses.
I was inspired and encouraged to try this by Liliana’s interesting work called “imagining all of Southwark“. Lili and Ari have managed to get the council to release lots of data on properties and car parking, and they are producing analysis of this data by postal code area and by street. They haven’t managed to get anything on land uses, so I thought, why not produce this with OpenStreetMap data?
A few evenings later, here is the result shared on Google docs (direct link) covering the eight postal code areas that between them cover most of the borough (SE1, SE5, SE15, SE16, SE17, SE21, SE22, SE24):
What the data means
The “summary” worksheet shows the total land area, expressed in hectares (10,000 m2), for various different types of land coverage. I have also calculated the percentage of that postal code area that the land uses represent, which gives an interesting insight into the differences between the areas.
Some of the land uses will overlap, for example miscellaneous bits of green space are often mapped on top of residential areas. So the numbers aren’t supposed to add up to anything like 100%.
The spreadsheet also contains worksheets for each postal code area. These contain a dump of all the objects in OpenStreetMap in those postal code areas, and this is the raw data the summary spreadsheet uses to get the totals.
Flaws in the data
You should use this data with a large spoonful of salt. Here are the significant flaws I have noticed:
Postal code areas are approximate, for example the boundary between SE15 and SE22 should mark the boundary between Peckham Rye Common (SE15) and Peckham Rye Park (SE22). In my data both the park and the common show up in both of the postal codes, because the boundary isn’t quite right. Read down to my method to see why. The errors introduced are pretty tiny in most places (plus or minus a few meters along the full boundary), and probably cancel themselves out for big land uses like residential, but they probably also introduce some significant errors for parks where the boundaries go awry by 20-30m in places. Sadly there aren’t any accurate open data polygons I can use.
Data is missing because OpenStreetMap contributors haven’t mapped it. Of course the easy solution here is to get more of it mapped and up to date! My estimate of the different types is as follows:
- Allotments: complete for the whole borough.
- Parks and commons: all major and district parks complete.
- Misc green spaces: very poor coverage of, for example, large areas of grass on estates, especially in SE5, the north pat of SE15 and SE17.
- Woods/forest: all major woods complete, coverage of big clumps of trees e.g. on a housing estate or in a park is very uneven.
- Residential: complete except for SE16.
- Industrial, retail, commercial: large areas are complete, but small shopping parades, industrial parks and rows of offices are very patchy.
- Brownfield/construction: patchy across the borough and sometimes out of date as sites are built on.
Data is also sometimes missing because of flaws in the Geofabrik shapefiles, not all of which I have corrected. For example, I noticed they were missing commons so I manually added those in, but I may have missed other land uses. One major omission, a shame given the interest in them, is the humble sports pitch/playing field.
How I produced this
After a lot of experimentation – I’ve never been trained to use GIS tools – I worked out this method. If you know of an easier way I’d love to hear about it.
- Prepare the boundary data:
- Extract a polygon for the London Borough of Southwark from the OS Boundary-Line data.
- Download the OS Code-Point-Open data, open the spreadsheet for the SE area in QGIS and use the ftools ‘Voronoi polygons’ plugin to infer polygons for the postal codes from the centroids. Post code centroids are very dense in the middle of residential areas, so the boundary between SE15 4HR and SE22 9BD is only going to be out by a few meters, but are quite far apart with large parks and commons, so the inferred boundaries get less accurate in those areas. See this map for an illustration of the Peckham Rye Park / Common problem mentioned above.
- Merge together postal codes into the areas (e.g. SE22 9QF, SE22 4DU etc. into SE22) by quering the shapefile for all objects with postal codes starting with SE22, then using the mmqgis merge tool to merge them into single polygons. Clean up the attributes so the shapefile just has one attribute for the correct postal code area.
- Clip the postal codes by the Southwark polygon and save the result – finally – as the postal codes shapefile for Southwark.
- Prepare the land use data:
- Download the OpenStreetMap shapefiles from Geofabrik for Greater London.
- Download common and marsh ways/relations using the Overpass API (with the meta flag on), import the data into QGIS using the OpenStreetMap plugin, and save the data as a Shapefile.
- Merge together the Geofabrik natural and landuse shapefiles with my Overpass-derived shapefile into one land use shape file using the mmqgis plugin.
- Clip the land use file by the Southwark polygon and save the result – finally – as the land uses shapefile for Southwark.
- Produce the postal code stats; for each postal code:
- Select the postal code, and clip the land use layer to that selected code, saving it as a new shapefile.
- Open that shapefile, then save it in a new projection that will be in meters rather than degrees (I used EPSG:32631 – WGS 84 / UTM zone 31N).
- Open the new shapefile, then run the ftools ‘Export/add geometry columns’ tool (in Vector/Geometry Tools) to add two attributes to the objects for the area and perimeter.
- Save the layer again as a CSV file.
- Produce the stats for the area of each postal code so we can calculate % of the area as well as ha for each land use:
- Save the Southwark postal codes polygon in the meters projection, add the geometry columns, and save as a CSV file.
- Collate all the data
- Tidy up and copy the data from each CSV file into a spreadsheet, then add in the formulae to tot everything up. You’re done!
For reference, some of the totals in the summary work off more than one land use type so here are the categories and the corresponding OpenStreetMap tags:
- Allotments – landuse=allotments
- Parks and commons – leisure=park / leisure=common
- Misc green spaces – landuse=conservation / landuse=farm / leisure=garden / landuse=grass / landuse=greenfield / landuse=greenspace / landuse=meadow / landuse=orchard / landuse=recreation_ground
- Woods and forest – landuse=forest / natural=wood
- Residential, industrial, retail, commercial, brownfield, construction – corresponding landuse tags
One obvious improvement would be to get more data in. Perhaps this first analysis will encourage people to help out with that? I have also emailed Geofabrik about the flaws I have discovered in their shapefiles, so I hope those get fixed.
Another thought is to produce the stats by council ward. But given that there are far more wards, I’d like to find a quicker way of producing the stats for each ward (step three above) first.
It would also be interesting to do it by town/suburb, for example comparing Peckham to East Dulwich. But we don’t have any meaningful boundaries for those natural areas. It would be really interesting to do a mass version of “this isn’t fucking Dalston” for a whole borough, using the Voronoi polygons method to infer areas from surveys at thousands of locations around the borough. One day…