In this challenge, we were tasked with finding automated methods for extracting map-ready road networks from high-resolution satellite imagery. Moving towards more accurate, fully automated extraction of road networks will help bring innovation to computer vision methodologies applied to high-resolution satellite imagery, and ultimately help create better maps where they are needed most. The goal is to extract navigable road networks from satellite images. The linestrings the algorithm returns are compared to ground truth data, and the quality of the solution is judged by the Average Path Length Similarity (APLS) metric.
- The official problem statement: https://community.topcoder.com/longcontest/?module=ViewProblemStatement&rd=17036&pm=14735.
- The challenge website: https://www.topcoder.com/spacenet.
- How the scoring function works: https://medium.com/the-downlinq/spacenet-road-detection-and-routing-challenge-part-i-d4f59d55bfce.
- Data: https://aws.amazon.com/public-datasets/spacenet/.
We approached the road detection problem in the SpaceNet challenge as a semantic segmentation task in computer vision. Our model is based on a variant of the fully convolutional neural network U-Net [https://arxiv.org/abs/1505.04597], one of the most successful and popular convolutional neural network architectures for image segmentation.
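Our exact architecture and hyperparameters are not reproduced here, but the core U-Net idea (a contracting encoder, an expanding decoder, and skip connections between them) can be sketched in PyTorch. The channel counts and depth below are illustrative only; the real model is deeper:

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # Two 3x3 convolutions with ReLU, as in the original U-Net paper.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """Two-level U-Net sketch (illustrative sizes, not the challenge model)."""
    def __init__(self, in_ch=3, out_ch=1):
        super().__init__()
        self.enc1 = conv_block(in_ch, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = conv_block(32, 16)   # 32 = 16 skip channels + 16 upsampled
        self.head = nn.Conv2d(16, out_ch, 1)

    def forward(self, x):
        e1 = self.enc1(x)                # full-resolution features
        e2 = self.enc2(self.pool(e1))    # half-resolution features
        d1 = self.dec1(torch.cat([e1, self.up(e2)], dim=1))  # skip connection
        return self.head(d1)             # per-pixel road logits
```

The decoder concatenates upsampled deep features with the encoder's full-resolution features, which is what lets the network produce sharp per-pixel road masks.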
First, we created a function that draws the ground truth LineString objects into a binary mask, with configurable road widths.
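A minimal, NumPy-only sketch of such a rasterizer (the original function is not shown; `linestrings_to_mask` and its disk-stamping approach are our illustration, and in practice a library routine such as OpenCV's `cv2.polylines` would be much faster):

```python
import numpy as np

def linestrings_to_mask(linestrings, shape=(100, 100), width=10):
    """Rasterize road linestrings (lists of (x, y) points) into a binary
    mask, drawing each road `width` pixels wide.  Hypothetical helper:
    it densely samples each segment and stamps a disk at every sample."""
    mask = np.zeros(shape, dtype=np.uint8)
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    r = width / 2.0
    for line in linestrings:
        for (x0, y0), (x1, y1) in zip(line[:-1], line[1:]):
            n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
            for t in np.linspace(0.0, 1.0, n + 1):
                cx, cy = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
                mask[(xx - cx) ** 2 + (yy - cy) ** 2 <= r * r] = 1
    return mask
```

For example, `linestrings_to_mask([[(10, 50), (90, 50)]], width=10)` produces a horizontal road band 10 pixels wide.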
Here are the results of skeletonizing a 10-pixel-wide mask.
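Skeletonization thins each road band down to a one-pixel-wide centerline. A small sketch using scikit-image's `skimage.morphology.skeletonize` (assumed here for illustration; the writeup does not name the exact function used):

```python
import numpy as np
from skimage.morphology import skeletonize

# A toy 10-pixel-wide horizontal "road" mask.
mask = np.zeros((40, 100), dtype=bool)
mask[15:25, 10:90] = True

# Thin the band down to a one-pixel-wide centerline.
skeleton = skeletonize(mask)
```

The resulting skeleton preserves the road's topology while containing far fewer white pixels than the original band.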
Assuming we could generate single-pixel-wide roads (possible with the skeletonizing function), we would follow a line of white pixels until it ended, writing each point into the linestring. At an intersection, the other paths would be saved to explore later. For our test example, we were able to perfectly replicate the image using 16 linestrings but 7000 points. We further improved this solution, reducing the number of points to 3500, by tracking our current direction and only adding a point when the direction changed.
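The tracing procedure above can be sketched as follows (our reconstruction from the description, not the original code): walk 8-connected white pixels, push branch pixels onto a stack at intersections, and record a point only when the walking direction changes:

```python
import numpy as np

# 8-connected neighbor offsets (row, col).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
             (0, 1), (1, -1), (1, 0), (1, 1)]

def trace_linestrings(skeleton):
    """Convert a one-pixel-wide boolean skeleton into linestrings
    (lists of (row, col) points)."""
    remaining = {tuple(p) for p in np.argwhere(skeleton)}
    stack, lines = [], []
    while remaining or stack:
        cur = stack.pop() if stack else next(iter(remaining))
        if cur not in remaining:
            continue                   # branch already consumed elsewhere
        remaining.discard(cur)
        line, prev_dir = [cur], None
        while True:
            nbrs = [(cur[0] + dr, cur[1] + dc) for dr, dc in NEIGHBORS
                    if (cur[0] + dr, cur[1] + dc) in remaining]
            if not nbrs:
                if line[-1] != cur:
                    line.append(cur)   # close the line at its endpoint
                break
            nxt = nbrs[0]
            stack.extend(nbrs[1:])     # intersection: save other paths for later
            d = (nxt[0] - cur[0], nxt[1] - cur[1])
            if prev_dir is not None and d != prev_dir:
                line.append(cur)       # keep a point only where direction changes
            remaining.discard(nxt)
            cur, prev_dir = nxt, d
        if len(line) > 1:
            lines.append(line)
    return lines
```

On a perfectly straight segment this keeps only the two endpoints, which is exactly the direction-change optimization that cut our point count roughly in half.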
Here is the mask generated when we draw the linestrings extracted from the skeletonized 10-pixel-wide mask.