A Deep Learning Approach to Enhance 3D City Models

In this post we describe a system built to automatically add windows and doors to the 3D city model of Amsterdam (available at http://3d.amsterdam.nl/). Computer vision is used to extract the locations of windows and doors from panoramic images of the city. Since this type of street-level imagery is widely available, the method can be applied over a large geographical region.

The 3D city model of Amsterdam, which is still in development, can be used to communicate spatial plans to the public more easily. Moreover, it allows the public to be more involved in the municipality's planning decisions. The city model consists of buildings with simplified shapes; in CityGML¹ terms, they are at Level of Detail 2 (LOD2). Adding windows and doors to the buildings enables a number of new use cases, including emergency response planning, urban sustainability studies and urban simulations (e.g., the ‘right to light’ impact of a potential new building).

3D city model of Amsterdam

The proposed system can be divided into the following three steps:

Step 1: Extract façade textures from panoramic images

The first step identifies, rectifies and extracts the texture region of a building from street-level panoramic images by utilising building footprint data.

An example illustrating the idea of the first stage of the proposed pipeline; (Left) a panoramic image taken in the Hartenstraat in Amsterdam. (Middle) The location of the panoramic image relative to the building footprint data. (Right) The rectified façade texture.
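To make the rectification step concrete, here is a minimal Python sketch using OpenCV. It assumes the four image-plane corners of the façade have already been found by projecting the building footprint into the panorama via the camera pose; the file names, corner values and output size are illustrative, not taken from the project.

```python
# Minimal sketch of façade rectification with OpenCV. The façade corners
# are assumed to come from projecting the building footprint into the
# panorama using the camera pose; all values here are illustrative.
import cv2
import numpy as np

def rectify_facade(panorama, corners, out_w=512, out_h=768):
    """Warp a quadrilateral façade region to a fronto-parallel texture.

    corners: four (x, y) pixel coordinates, ordered top-left,
             top-right, bottom-right, bottom-left.
    """
    src = np.asarray(corners, dtype=np.float32)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    # Homography mapping the slanted façade quad onto a rectangle.
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(panorama, H, (out_w, out_h))

pano = cv2.imread("panorama.jpg")  # hypothetical input image
facade_corners = [(1210, 340), (1580, 310), (1600, 900), (1190, 880)]
texture = rectify_facade(pano, facade_corners)
cv2.imwrite("facade_texture.jpg", texture)
```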

Step 2: Detect windows and doors from those textures

The second step detects windows and doors in the extracted texture regions using Mask R-CNN², a deep convolutional neural network. We generated over 980 high-quality segmentation mask images to train the network.

(Left) A rectified façade texture. (Middle-left) A manually annotated segmentation mask image, the ground truth. (Middle-right) Bounding boxes, segmentation masks and the corresponding class tags. (Right) The bounding boxes indicate detected windows and doors.
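The post does not state which Mask R-CNN implementation was used, so the sketch below runs inference with torchvision's reference Mask R-CNN instead; the class list, weight file and score threshold are assumptions.

```python
# Hedged sketch of window/door detection using torchvision's Mask R-CNN;
# the project's own implementation may differ.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor
from PIL import Image

CLASSES = ["background", "window", "door"]  # assumed label set

model = maskrcnn_resnet50_fpn(num_classes=len(CLASSES))
model.load_state_dict(torch.load("facade_maskrcnn.pth"))  # hypothetical weights
model.eval()

image = to_tensor(Image.open("facade_texture.jpg").convert("RGB"))
with torch.no_grad():
    pred = model([image])[0]  # dict with boxes, labels, scores, masks

for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score >= 0.7:  # illustrative confidence threshold
        x0, y0, x1, y1 = box.tolist()
        print(f"{CLASSES[label.item()]}: "
              f"({x0:.0f}, {y0:.0f})-({x1:.0f}, {y1:.0f}) score={score:.2f}")
```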

Step 3: Add the detected windows and doors to the 3D model

In the third and final step, the previously detected windows and doors are aligned with the input CityGML LOD2 model to construct a CityGML LOD3 model.

Virtual street scenes generated in LOD3 using the proposed system. Visualised with Azul CityGML viewer³.
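The post leaves the alignment details out, but the core geometric idea can be sketched: because the rectified texture spans a known wall polygon, a pixel bounding box can be converted to metric offsets on the wall plane and lifted to 3D corner points, which can then be written out as a CityGML opening. All coordinates and helper names below are hypothetical.

```python
# Minimal sketch of lifting a 2D detection onto the 3D wall plane,
# assuming the rectified texture spans the full wall polygon; all
# values and names are illustrative, not taken from the project.
import numpy as np

def pixel_box_to_wall_ring(box, tex_size, origin, u_dir, up_dir, wall_w, wall_h):
    """Map a pixel bounding box on the texture to a 3D polygon ring.

    box:      (x0, y0, x1, y1) in texture pixels, y pointing down.
    tex_size: (width, height) of the texture in pixels.
    origin:   3D coordinate of the wall's lower-left corner.
    u_dir:    unit vector along the wall's width.
    up_dir:   unit vector along the wall's height.
    wall_w/h: wall width and height in metres.
    """
    tw, th = tex_size
    x0, y0, x1, y1 = box
    # Pixel extents -> metric offsets on the wall plane
    # (flip y because image rows grow downwards).
    u0, u1 = x0 / tw * wall_w, x1 / tw * wall_w
    v0, v1 = (1 - y1 / th) * wall_h, (1 - y0 / th) * wall_h
    corners = [(u0, v0), (u1, v0), (u1, v1), (u0, v1)]
    return [origin + u * u_dir + v * up_dir for u, v in corners]

origin = np.array([121500.0, 486800.0, 0.0])  # hypothetical RD coordinates
ring = pixel_box_to_wall_ring((120, 300, 240, 520), (512, 768), origin,
                              np.array([1.0, 0.0, 0.0]),
                              np.array([0.0, 0.0, 1.0]), 8.0, 12.0)
print(ring)  # corner ring usable as a CityGML window opening geometry
```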

Here is a demo video of the project:

More information on this project will be available soon.