Run Five Simultaneous Neural Networks on VOXL 2 with TensorFlow Lite
Written by Matt Turi
Like a sentinel guard keeping watch at Buckingham Palace, the VOXL 2 Sentinel development drone is built for perception, equipping its user with an arsenal of six embedded image sensors for maximum surveillance coverage. The Sentinel, powered by VOXL 2, can run five concurrent neural networks with TensorFlow Lite. TensorFlow Lite is an open source framework that lets developers run pre-trained models on embedded devices for machine learning and computer vision applications.
VOXL 2 Unlocks a Dedicated Neural Processing Unit for Computer Vision
With a low-power Neural Processing Unit (NPU) embedded in the Qualcomm QRB5165 and ModalAI’s voxl-tflite-server onboard, VOXL 2 can simultaneously run five different neural networks at 30 frames per second out of the box. Instead of running neural networks solely on the central processing unit (CPU), VOXL 2 uses TensorFlow Lite’s built-in NNAPI support to run networks in parallel on the dedicated NPU and the graphics processing unit (GPU), sustaining 30 Hz inference while freeing up CPU resources. That leaves VOXL 2’s CPU available for the rest of an autonomous robotics stack.
VOXL 2 Runs 5 Concurrent Imager Inputs Out of the Box with TensorFlow Lite
The VOXL SDK included with the VOXL 2 is optimized for advanced computer vision. To accelerate time to market, voxl-tflite-server is enabled with five pre-trained neural networks that developers can run with TensorFlow Lite out of the box. The number of use cases for image-based deep learning is growing, and with VOXL 2, developers have access to important visual data for their use cases. Attach a Hi-Res 4K30 imx214 or imx412 to VOXL 2 to unlock these five computer vision models:
Object Detection: Identify known objects in your robot’s FOV. Object detection combines localization and classification to label each object and describe where it is in the image. The models we provide for this task are optimized for onboard inference and use either the SSD (single-shot detector) or YOLO (you only look once) architecture to achieve low latency. This is extremely useful onboard a drone, as it enables intelligent surveillance of a scene and can provide key information depending on the task. Object detection can be used to find and track objects from the air, as an aid in autonomous flight or exploration, or for more specific use cases like warehouse or asset inspection.
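To make "localization and classification" concrete: SSD- and YOLO-style detectors typically produce many candidate boxes, each with a label and a confidence score, and near-duplicate candidates are pruned with non-maximum suppression. The Python sketch below illustrates that step on made-up detections. The `Detection` type, the thresholds, and the sample values are hypothetical and are not part of the voxl-tflite-server API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    # Hypothetical container; box is (x_min, y_min, x_max, y_max) in pixels.
    label: str
    score: float
    box: Tuple[float, float, float, float]


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(dets: List[Detection], score_thresh=0.5, iou_thresh=0.5):
    """Drop weak detections, then keep only the best box per overlapping cluster."""
    candidates = sorted(
        (d for d in dets if d.score >= score_thresh),
        key=lambda d: d.score,
        reverse=True,
    )
    kept = []
    for d in candidates:
        # Suppress d if a higher-scoring box with the same label overlaps it heavily.
        if all(d.label != k.label or iou(d.box, k.box) < iou_thresh for k in kept):
            kept.append(d)
    return kept


# Made-up candidates for illustration only.
raw = [
    Detection("person", 0.90, (50, 40, 150, 260)),
    Detection("person", 0.75, (55, 45, 155, 265)),   # near-duplicate of the first
    Detection("car", 0.60, (180, 120, 290, 200)),
    Detection("person", 0.30, (10, 10, 40, 60)),     # below the score threshold
]
kept = non_max_suppression(raw)
for d in kept:
    print(d.label, d.score, d.box)
```

Only the strongest "person" box and the "car" box survive here; tuning the two thresholds trades missed objects against duplicate boxes.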
Image Classification: Discern the predominant object in your robot’s FOV. Image classification labels the most important feature in an image, and can provide similar information to an object detector at a much faster speed. In cases where the location of the object within the image is unimportant, classification models offer a very efficient alternative. VOXL 2 comes equipped with pre-trained image classification models covering more than 1,000 known categories.
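A classifier emits one score per known category, and the application usually keeps only the top few. The Python sketch below shows that ranking step under stated assumptions: the four labels and raw scores are invented, and some quantized models already output normalized scores, in which case the softmax step would be skipped.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores, labels, k=3):
    """Return the k (label, score) pairs with the highest scores."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical labels and raw model outputs, for illustration only.
labels = ["drone", "bird", "tree", "car"]
logits = [2.0, 0.5, 1.0, -1.0]

probs = softmax(logits)
best = top_k(probs, labels, k=2)
for label, p in best:
    print(f"{label}: {p:.3f}")
```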
Depth Estimation: Build depth maps with VOXL 2 from monocular images. VOXL 2 can infer the distance between its Hi-Res 4K30 image sensor and objects in its field of view (FOV). Monocular depth estimation predicts a depth value for each pixel given a single RGB image as input. Depth estimation is a crucial computer vision feature for autonomous drones and ground robots, as it allows them to perceive their environment and navigate safely.
Pose Estimation: Identify the orientation and position of human targets. VOXL 2 can use human pose estimation to identify key points on a person’s face, body, arms, and legs, with four key points per category. Pose estimation enables developers to track a person, or multiple people, in real time and monitor or study their movements. This computer vision technique is useful in applications such as tracking human movements for animation, AR/VR, sport or dance technique analysis, or security and surveillance enhancement.
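Pose models in the MoveNet family report each key point as a normalized (y, x, score) triple, so a common first step is to discard low-confidence points and map the rest to pixel coordinates. The Python sketch below shows that step. The function name, the 0.3 confidence threshold, and the sample values are assumptions for illustration, not output from VOXL 2.

```python
def keypoints_to_pixels(keypoints, width, height, min_score=0.3):
    """Map normalized (y, x, score) key points to (x, y) pixels, dropping weak ones."""
    visible = {}
    for name, (y, x, score) in keypoints.items():
        if score >= min_score:
            visible[name] = (round(x * width), round(y * height))
    return visible


# Made-up key points; each value is (y, x, score) with y and x in [0, 1].
sample = {
    "nose": (0.20, 0.50, 0.95),
    "left_shoulder": (0.35, 0.40, 0.80),
    "right_shoulder": (0.35, 0.60, 0.82),
    "left_ankle": (0.90, 0.45, 0.10),   # too uncertain, will be dropped
}

pixels = keypoints_to_pixels(sample, width=192, height=192)
for name, (px, py) in pixels.items():
    print(name, px, py)
```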
Image Segmentation: Understand what the objects in your robot’s FOV consist of. Image segmentation divides the images your robot captures into segments, creating a pixel-based mask of each object. By eliminating regions that don’t contain pertinent information (think of the boxes from object detection), image segmentation identifies an accurate shape for each object. Drones can use image segmentation to navigate through a cluster of trees without bumping into branches.
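One way to picture the pixel-based mask: a segmentation network scores every class at every pixel, and the mask keeps the best-scoring class index per pixel. The Python sketch below does this on a tiny 2x3 "image" with two invented classes. The scores and the class names are hypothetical, and a real model output would be a much larger tensor.

```python
def argmax_mask(scores):
    """scores[row][col] holds per-class scores; return the winning class index per pixel."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


def class_pixel_fraction(mask, class_id):
    """Fraction of pixels in the mask assigned to class_id."""
    total = sum(len(row) for row in mask)
    hits = sum(value == class_id for row in mask for value in row)
    return hits / total if total else 0.0


# Hypothetical classes: index 0 = background, index 1 = tree.
scores = [
    [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]],
    [[0.6, 0.4], [0.1, 0.9], [0.8, 0.2]],
]

mask = argmax_mask(scores)
tree_fraction = class_pixel_fraction(mask, class_id=1)
print(mask)
print(tree_fraction)
```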
TensorFlow Lite on VOXL 1 vs. VOXL 2
VOXL 2 unlocks lightweight, powerful computing. See how the neural networks perform on VOXL 2 compared to the previous-generation VOXL 1. A dedicated NPU enables VOXL 2 to process images at a high rate, keeping latency low in computer vision applications.
VOXL 1
Model | Task | Avg. CPU Inference (ms) | Avg. GPU Inference (ms) | Max Frames Per Second (fps) | Input Dimensions | Source |
MobileNet V2-SSDlite | Object Detection | 127.78 | 21.82 | 37.29 | [1,300,300,3] | link |
MobileNet V1-SSD | Object Detection | 75.48 | 64.40 | 14.62 | [1,300,300,3] | link |
MobileNet V1-SSD | Classifier | 56.70 | 56.85 | 16.47 | [1,224,224,3] | link |
VOXL 2
Model | Task | Avg. CPU Inference (ms) | Avg. GPU Inference (ms) | Avg. NNAPI Inference (ms) | Max Frames Per Second (fps) | Input Dimensions | Source |
MobileNetV2-SSDlite | Object Detection | 33.89 | 24.68 | 34.42 | 34.87 | [1,300,300,3] | link |
EfficientNet Lite4 | Classifier | 115.30 | 24.74 | 16.42 | 48.97 | [1,300,300,3] | link |
FastDepth | Monocular Depth | 37.34 | 18.00 | 37.32 | 45.45 | [1,320,320,3] | link |
DeepLab V3 | Segmentation | 63.03 | 26.81 | 61.77 | 32.46 | [1,321,321,3] | link |
Movenet SinglePose Lightning | Pose Estimation | 24.58 | 28.49 | 24.61 | 34.99 | [1,192,192,3] | link |
YoloV5 | Object Detection | 88.49 | 23.37 | 83.87 | 36.54 | [1,320,320,3] | link |
MobileNetV1-SSD | Object Detection | 19.56 | 21.35 | 7.72 | 85.32 | [1,300,300,3] | link |
MobileNetV1 | Classifier | 19.66 | 6.28 | 3.98 | 125.31 | [1,224,224,3] | link |
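The VOXL 2 numbers are easier to compare with a little arithmetic. The Python sketch below copies the average inference times from the table, picks the fastest backend for each model, computes the speedup over CPU-only inference, and checks the result against the 33.3 ms per-frame budget implied by 30 frames per second. It treats each model in isolation, so it is a rough reading of the table and not a measurement of five networks sharing the hardware.

```python
# Average inference times in ms, copied from the VOXL 2 table: (CPU, GPU, NNAPI).
VOXL2_MS = {
    "MobileNetV2-SSDlite": (33.89, 24.68, 34.42),
    "EfficientNet Lite4": (115.30, 24.74, 16.42),
    "FastDepth": (37.34, 18.00, 37.32),
    "DeepLab V3": (63.03, 26.81, 61.77),
    "Movenet SinglePose Lightning": (24.58, 28.49, 24.61),
    "YoloV5": (88.49, 23.37, 83.87),
    "MobileNetV1-SSD": (19.56, 21.35, 7.72),
    "MobileNetV1": (19.66, 6.28, 3.98),
}
BACKENDS = ("CPU", "GPU", "NNAPI")
FRAME_BUDGET_MS = 1000.0 / 30.0  # time available per frame at 30 fps


def fastest_backend(times):
    """Return (backend name, latency in ms) for the lowest latency in the row."""
    idx = min(range(len(times)), key=times.__getitem__)
    return BACKENDS[idx], times[idx]


summary = {}
for model, times in VOXL2_MS.items():
    backend, best_ms = fastest_backend(times)
    summary[model] = {
        "backend": backend,
        "best_ms": best_ms,
        "speedup_vs_cpu": times[0] / best_ms,
        "fits_30fps": best_ms <= FRAME_BUDGET_MS,
    }
    print(
        f"{model}: {backend} at {best_ms:.2f} ms, "
        f"{summary[model]['speedup_vs_cpu']:.1f}x vs CPU, "
        f"within 30 fps budget: {summary[model]['fits_30fps']}"
    )
```

Read this way, the fastest backend differs by model: NNAPI wins for the MobileNetV1 variants and EfficientNet Lite4, the GPU wins for most of the others, and Movenet is marginally fastest on the CPU.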
Vision-Based Drones for Mission-Critical Operations
An autonomous drone running multiple simultaneous neural networks reduces the pilot’s cognitive load during mission-critical flight operations. The more image data a drone can process, the safer and more capable its autonomous navigation can be. VOXL 2 supports five simultaneous neural networks out of the box with TensorFlow Lite. To learn more about TensorFlow Lite on VOXL 2, visit: https://docs.modalai.com/voxl-tflite-server/