For the last few months I’ve been attempting to map every bit of “grey belt” land within London’s green belt. As there’s been some uncertainty over exactly what qualifies as grey belt, I’ve adopted a cautious definition, covering areas which appear to have been previously developed, such as car parks and yards, earthworks and hard standing.
This builds on work I’ve previously undertaken which attempted to use AI to locate small site development opportunities in London’s suburban areas. You can read more about this here.
I used our existing learning model (based on the InceptionV3 convolutional neural network) to assess an area of green belt using higher-resolution aerial photography provided by aerial mapping specialist Bluesky, with the aim of categorising the types of land cover. Our learning model was trained on aerial photography and satellite imagery from Google.
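For readers interested in the mechanics, the sketch below shows how a classifier of this kind might be assembled in Keras. The class count and weights file are placeholders rather than the actual model artefacts, and the details of our own pipeline differ.

```python
# Sketch only: an InceptionV3 backbone with a small land cover classification
# head. The number of classes and the weights filename are hypothetical.
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras import layers, models

NUM_CLASSES = 12  # placeholder count of land cover categories

def build_classifier(num_classes: int = NUM_CLASSES) -> tf.keras.Model:
    """InceptionV3 backbone with a softmax head for land cover categories."""
    base = InceptionV3(include_top=False, weights="imagenet",
                       input_shape=(299, 299, 3), pooling="avg")
    head = layers.Dense(num_classes, activation="softmax")(base.output)
    return models.Model(inputs=base.input, outputs=head)

model = build_classifier()
model.load_weights("landcover_inceptionv3.h5")  # hypothetical trained weights
```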
The following describes the methodology followed and some of the initial findings.
The area of sample data provided by Bluesky is contained within the white rectangle shown in the following image. The pink masked area is not within the metropolitan green belt; the remainder is. The sample data covers approximately 390ha, with just over three quarters of this within the metropolitan green belt.
Within this area I created a grid of points at 25m centres, each point representing a 625sqm tile. The image to the right shows the distribution of these within the examination area.
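A grid like this is straightforward to build programmatically. The sketch below assumes coordinates in British National Grid (EPSG:27700) so that units are metres; the bounding box values are illustrative only and match the rough 390ha sample size rather than the real extent.

```python
# Sketch: building a 25m point grid over a bounding box (metres, EPSG:27700).
import numpy as np

def make_grid(xmin: float, ymin: float, xmax: float, ymax: float,
              spacing: float = 25.0) -> np.ndarray:
    """Return an (N, 2) array of tile centres at the given spacing in metres."""
    xs = np.arange(xmin + spacing / 2, xmax, spacing)
    ys = np.arange(ymin + spacing / 2, ymax, spacing)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])

# Illustrative extent of roughly 390ha (2,500m x 1,560m)
points_25m = make_grid(519_000, 213_000, 521_500, 214_560)
print(len(points_25m), "points, each representing a 625 sq m tile")
```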
Using our existing learning model (trained on Google satellite data), I assessed the ground conditions at each data point. The image on the right shows how the AI learning model has categorised the landscape features at each point. The pattern of ground cover can be seen here, with dark green representing woodland and tree cover and blue corresponding to areas of water (although there are some limitations to this, described below).
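In outline, the assessment at each point amounts to cropping an image chip around that point and running it through the classifier. The sketch below assumes the model and grid from the earlier snippets; `extract_tile` is a hypothetical helper that crops a tile-sized chip from the aerial mosaic and resizes it to the 299x299 input InceptionV3 expects, and the class list is a simplified version of the categories mentioned in this post.

```python
import numpy as np

# Simplified class list based on the categories discussed in this post
CLASS_NAMES = ["woodland", "water", "crops", "golf course", "car park",
               "buildings - domestic", "allotments and garden centres",
               "fallow field", "hard court", "sports pitch", "earthworks",
               "other"]

def classify_points(points: np.ndarray, mosaic, model,
                    tile_size_m: float = 25.0) -> list[str]:
    """Predict a land cover label for the tile centred on each grid point."""
    labels = []
    for x, y in points:
        chip = extract_tile(mosaic, x, y, tile_size_m)  # hypothetical helper
        probs = model.predict(chip[np.newaxis, ...], verbose=0)[0]
        labels.append(CLASS_NAMES[int(np.argmax(probs))])
    return labels
```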
One limitation is that, with the higher-resolution Bluesky photography, some tiles contain multiple “types”, and the AI struggles to distinguish between, say, a domestic garden and a residential house. Because of the limited resolution of the Google imagery this was not a significant issue, but at higher resolutions the data points can miss key features, such as entire houses, classifying the gardens to the front and rear but not the structures themselves.
To address this, I introduced a data point grid with double the resolution, with points at 12.5m intervals. The image here shows the distribution of these on the same map. Given that the 25m grid across the entirety of London’s metropolitan green belt results in more than 8 million data points, adopting a 12.5m grid will result in four times that number.
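The scaling is easy to verify: halving the spacing quadruples the point count. Re-using the grid helper from the earlier sketch at 12.5m spacing (over the same illustrative extent) shows this directly.

```python
# Same illustrative extent as before, at double the resolution.
# Halving the spacing quadruples the number of points, so the ~8 million
# 25m points across the green belt become roughly 32 million at 12.5m.
points_12_5m = make_grid(519_000, 213_000, 521_500, 214_560, spacing=12.5)
print(len(points_12_5m), "points - four times the 25m count")
```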
Running the AI again with the Bluesky imagery results in this image, which includes the non-green belt area in the south west of the map (included in error, but useful in demonstrating the efficacy of the process). The patterns of land cover are clearly visible at this scale.
The image below shows a small area of the sample data. Moving the slider across the page reveals the AI predictions for each point at 12.5m intervals.
Golf courses are generally identified correctly, likely due to the presence of sand bunkers and the curved shapes of the fairways visible in the image, although there are some odd predictions for “water” which need further investigation.
Water is being identified correctly, as this lake within Aldwickbury Park Golf Club shows. There are some anomalous categorisations occurring around the periphery of the lake, likely due to surface markings fooling the AI into thinking these are sports pitches, as the pale line around the edge resembles the white lines of a football pitch.
As with the golf course, crops and woodland which appear as dark areas of consistent colour are incorrectly being identified as water. An improved learning model will be needed to account for this, as at this resolution these types should be distinguishable from bodies of water.
Car parks appear to be identified correctly, although due to the resolution of the image some parts of the car parking areas are being incorrectly identified. In some cases the markings on roads, or areas where no cars are parked, are fooling the AI into thinking that these are hard courts. The presence of parked cars is generally necessary for the AI to correctly identify land used for this purpose.
Domestic gardens need to be included within the “allotments and garden centres” type. Previously, most gardens fell within the “buildings – domestic” type as the 25m tile would generally include both. At 12.5m, the two are usually found in separate image tiles and therefore correctly identified.
Some crops appear to be categorised as golf courses; the reason is unclear and requires further investigation. It is likely that adding these areas to the test and training data will teach the AI to distinguish between these types with a greater degree of accuracy.
Fallow fields appear to be correctly identified, although the sample data from Bluesky does not include any “earthwork” sites, so the accuracy of this prediction cannot be assessed. Note some anomalous identifications as “water” to the right of this image. Further training of the learning model using a larger sample of the Bluesky data should assist in correcting these.
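As a rough illustration of what that retraining step might look like, the sketch below folds additional labelled chips (such as the misclassified crop and dark-field examples) back into a short fine-tuning pass. The directory layout, hyperparameters and epoch count are assumptions, and the model is the Keras sketch from earlier rather than our actual training setup.

```python
import tensorflow as tf

# Hypothetical folder of labelled chips, one sub-folder per land cover class
train_ds = tf.keras.utils.image_dataset_from_directory(
    "training_tiles/",
    image_size=(299, 299),
    batch_size=32,
)
# Apply the InceptionV3 input scaling expected by the backbone
train_ds = train_ds.map(
    lambda x, y: (tf.keras.applications.inception_v3.preprocess_input(x), y))

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)  # brief fine-tune on the expanded sample
```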
This exploratory exercise has demonstrated that higher-resolution mapping data, with imagery taken within a narrower seasonal timeframe, results in more accurate predictions, although taking advantage of this requires a significant increase in the number of data points. The time needed to generate the imagery and process the data will grow accordingly, and will demand an increase in processing power.
All aerial imagery is copyright Bluesky International Limited, all rights reserved.