Source: Deep Learning on Medium
Our team finished in 12th place at AlphaPilot, the Lockheed Martin AI Drone Racing Innovation Challenge. The challenge tested our ability to implement a drone racing gate detection mechanism, as well as to design an autonomous drone control system.
Lockheed Martin is a global security and aerospace company that strives to change the future of flight by developing modern AI-enabled autonomy. They have now teamed up with the Drone Racing League (DRL), a professional drone racing circuit for elite FPV pilots, to launch the AlphaPilot innovation challenge. Designed by DRL, the autonomous drone powered by the NVIDIA Xavier GPU will be used for the new Artificial Intelligence Robotic Racing (AIRR) Circuit, a four-event season starting in fall 2019.
The AlphaPilot challenge qualifier launched on the HeroX platform, testing 424 teams of up to 10 members from 81 countries, out of which 9 finalists earned the opportunity to compete in autonomous flight racing.
The challenge consisted of three different tasks:
- Answering a series of questions about the team’s strategic approach to the challenge. The participants also needed to make a short video showcasing the experience, skills and strengths of the team members.
- A computer vision problem: developing an object detection algorithm that is capable of detecting the exact coordinates of the corners on a racing gate in images. Apart from having to be accurate, the solution also needed to be very fast, in order to score high.
- Designing an autonomous drone control system. This final, most challenging task tested each team’s ability to design a system for guidance, navigation and control of an autonomous drone, with a goal of setting the fastest laps.
The finalists will need to design an AI framework capable of flying a drone through race courses, using computer vision, navigation and control algorithms. With no prior knowledge of the course layouts, no GPS, and no human intervention of any kind, the teams must make the drone as fast as possible on the circuit. The winning team will take the prize of $1,000,000, and the first AlphaPilot team that beats a human-piloted drone in a race will win an extra $250,000.
Our team for this qualifier consisted of six Netcetera employees, each of us having a unique skill set, contributing using our biggest strengths.
In this challenge, we were faced with a problem of image localization: the image contains multiple objects (gates) and our system needs to predict the locations of the objects in the image. As a starting point, we are given a training set consisting of around 9000 images, each one labeled with coordinates of the corners that we need to detect.
This is not a classic bounding box regression problem, since the gates are shot from many different perspectives, resulting in coordinates that do not form a square. Besides the accuracy of the algorithm, the score also depends on the computation time the algorithm needs to process the test images. We were faced with a trade-off: when we improved the accuracy, the speed score got worse, and vice versa. Our team had to find a sweet spot between accuracy and performance.
First, we approached the detection problem by treating each of the corners as a separate object class, since ordinary bounding box detection would greatly reduce our accuracy on all of the images that were not shot head-on. We used a strictly deep learning approach: a neural network based on the YOLO architecture.
We start with a pre-trained DarkNet model, a 53-layer network trained on the ImageNet dataset. For the task of detection, 53 more layers are stacked on top, resulting in the 106-layer fully convolutional architecture underlying YOLO. The detection is done by applying 1×1 detection kernels on feature maps of three different sizes at three different places in the network, obtained by downsampling the dimensions of the input image by 32, 16 and 8, respectively. These detections at different scales address the issue of detecting small objects. The gate itself is not small, but the corner areas that we are trying to predict are in fact pretty small. The upsampled layers preserve the features that help in detecting these corner areas. Finally, for each visible corner in the image, a class is predicted by applying softmax over the class scores and choosing the class with the maximum probability.
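The two ingredients described above can be sketched in a few lines. This is a minimal illustration, not our production code: `detection_grid_sizes` shows how the three YOLO strides turn a 416×416 input into 13×13, 26×26 and 52×52 detection grids, and `corner_class` shows the softmax-and-argmax step over per-corner class scores (the four-class TL/TR/BR/BL labeling is an assumption for the example).

```python
import numpy as np

def detection_grid_sizes(input_size=416, strides=(32, 16, 8)):
    """YOLOv3 predicts on three feature maps, obtained by downsampling
    the input image by each stride; returns the grid size at each scale."""
    return [input_size // s for s in strides]

def corner_class(scores):
    """Pick the corner class (e.g. TL/TR/BR/BL) with the highest
    softmax probability for one detected box."""
    exp = np.exp(scores - np.max(scores))  # numerically stable softmax
    probs = exp / exp.sum()
    return int(np.argmax(probs)), probs
```

For a 416×416 input this yields grids of 13, 26 and 52 cells per side, which is where the small-object sensitivity of the finest scale comes from.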
The final implementation consists of two models, YoloV3 and TinyYolo. The reason for using both models is to combine the power of YoloV3 with the speed of TinyYolo. If the objects are not detected using the fast TinyYolo, YoloV3 does the job instead. This approach balances speed and accuracy, with only a small speed penalty when TinyYolo fails to detect the coordinates. After detecting the corners, additional corrections are applied when only 1, 2 or 3 corners are detected: the positions of the missing corners are estimated from the gate width and the sizes and positions of the detected corners.
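The fallback logic between the two models is simple to express. The sketch below assumes a hypothetical interface in which each model is a callable returning a list of `(x, y, corner_class)` tuples; the real YoloV3/TinyYolo wrappers would look different, but the control flow is the same.

```python
def detect_corners(image, tiny_model, full_model, min_corners=4):
    """Run the fast model first; fall back to the accurate model
    only when too few corners were found."""
    corners = tiny_model(image)
    if len(corners) < min_corners:
        # TinyYolo missed some corners -> pay the extra cost of YoloV3
        corners = full_model(image)
    return corners
```

The common case (TinyYolo finds all four corners) stays fast, and the expensive model runs only on the harder frames.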
On the runtime side, we used TensorFlow with Keras to enable GPU usage, thus speeding up the detection.
Task 3: Autonomous Drone Control
This final task required the design and implementation of a real-time, vision-based navigation system for simulated drones. We developed a GNC (guidance, navigation and control) algorithm for flying a simulated drone in a virtual reality environment, using on-board sensor feedback and the ACRO/RATE flight mode for control inputs. The test environment is a structured circuit; however, the gates are perturbed between different test cases, and the algorithm needs to be able to deal with this.
The racer drone itself is equipped with sensors that provide several data streams, including:
- 60Hz camera feed
- Gate detection algorithms
- IR marker locations of the gates
- Noisy IMU data (accelerometer and gyroscope)
- Noisy downward-facing laser (rangefinder)
In summary, given a visual 2D input of the space that the drone is flying in and the noisy input from the IMU/laser sensors, we need to find the fastest way around the circuit while passing through the gates without any collision with the surrounding environment. The system should utilize the data received from 3 sensors (accelerometer, gyroscope and an IR sensor) to autonomously navigate through the detected gates, starting from an arbitrary initial position and orientation.
We apply the divide & conquer approach in order to split the given problem into multiple sub-problems, such as: drone stabilization on each axis, drone height control and flight path calculation.
Drone stabilization: in order to eliminate the noise from the sensor feeds, we first apply a low-pass filter. This filter attenuates all frequencies above 15Hz, which reduces the errors accumulated while integrating the data from the gyroscope and the accelerometer, and it allows us to accurately calculate the current orientation, velocity and position of the racing drone. Each of these parameters is very important, because they are used to calculate and control the flight path, and they need to be appropriately set at every moment of the flight.
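A minimal sketch of such a filter, assuming a simple first-order IIR design (the actual filter order and sample rate in our system may differ; the 60Hz sample rate matches the camera feed and is an assumption for the IMU here):

```python
import math

class LowPassFilter:
    """First-order IIR low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
    Attenuates content above the cutoff before the IMU data is integrated."""

    def __init__(self, cutoff_hz=15.0, sample_hz=60.0):
        rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # filter time constant
        dt = 1.0 / sample_hz                    # sample period
        self.alpha = dt / (rc + dt)             # smoothing factor in (0, 1)
        self.y = None

    def step(self, x):
        if self.y is None:
            self.y = x  # initialise on the first sample
        else:
            self.y += self.alpha * (x - self.y)
        return self.y
```

Each raw accelerometer and gyroscope channel would be passed through its own filter instance before integration, so high-frequency noise does not accumulate into the position and velocity estimates.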
Drone height control: using the on-board rangefinder, we approximate the distance between the floor and the drone, which is then used to correct the noisy drone position calculated from the accelerometer. To perform the correction, we first check whether the difference between the global z position and the value obtained from the rangefinder is above a predefined threshold. If it is, an error has accumulated in the z-axis velocity, so we gradually increase or decrease the calculated velocity until the desired height is reached.
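The correction step can be sketched as a simple threshold-and-nudge rule. The threshold and gain values below are illustrative assumptions, not the tuned constants from our controller:

```python
def correct_vertical_velocity(z_est, z_range, vz, threshold=0.2, gain=0.5):
    """If the integrated z position drifts away from the rangefinder
    reading by more than `threshold` metres, nudge the estimated
    vertical velocity toward closing the gap."""
    error = z_range - z_est
    if abs(error) > threshold:
        vz += gain * error  # accumulated drift detected -> correct velocity
    return vz
```

Small disagreements are ignored (the rangefinder is itself noisy), while persistent drift from integrating the accelerometer gets pulled back toward the rangefinder reading.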
Flight path calculation: the drone operates in two modes: search and focus.
The search mode occurs when the drone does not have the gate in its viewport: it orients itself towards the approximate position of the gate using the ground-truth data provided at the beginning, and keeps searching until the gate is detected.
The system starts operating in the focus mode when the gate is detected (meaning 2 or more markers are present in the field of vision). In this mode, we first calculate the x-coordinate of the center of the gate and start moving towards it. The y-coordinate is not needed, as we regulate the height using the rangefinder. The movement itself is performed by rotating about the z-axis by an angle calculated from the distance between the center of the gate and the center of the screen. Meanwhile, thrust is applied on the y-axis while trying to maintain the desired height. Since this movement creates a drag force, we counteract it by rotating by a predetermined angle about the x-axis.
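The focus-mode steering can be sketched as a proportional yaw command derived from the horizontal offset between the gate center and the screen center. The normalisation and clamping below are illustrative assumptions; the real controller maps this angle onto ACRO/RATE control inputs:

```python
def yaw_command(gate_center_x, image_width, max_yaw_rate=1.0):
    """Proportional yaw-rate command from the horizontal offset between
    the detected gate center and the image center, clamped to
    [-max_yaw_rate, max_yaw_rate]."""
    # Offset normalised to [-1, 1]: negative = gate is left of center
    offset = (gate_center_x - image_width / 2) / (image_width / 2)
    return max(-max_yaw_rate, min(max_yaw_rate, offset * max_yaw_rate))
```

When the gate is centered the command is zero, so the drone flies straight through; the farther the gate drifts from the screen center, the harder the drone turns toward it.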
Despite not qualifying for the finals, we finished in 12th place on the final leaderboard (out of 420 teams).
With a longer timeframe, future work would focus on applying machine learning to estimate the errors and noise on the sensors, a better algorithm for enforcing or reducing drag when needed in order to get a better pace around the track, and a different approach based on reinforcement learning.
Thanks for reading!
Alex, Panche, Vojche, Ivan, Kjanko, Borijan