Run Five Simultaneous Neural Networks on VOXL 2 with TensorFlow Lite

Written by Matt Turi

Like a sentinel standing guard at Buckingham Palace, the VOXL 2 Sentinel development drone keeps close watch on its surroundings, with six embedded image sensors for wide perception coverage. The Sentinel, powered by VOXL 2, can run five concurrent neural networks with TensorFlow Lite. TensorFlow Lite is an open source framework for embedded devices that lets developers run pre-trained models for machine learning and computer vision applications.

VOXL 2 Unlocks a Dedicated Neural Processing Unit for Computer Vision

With a low-power Neural Processing Unit (NPU) embedded in the Qualcomm QRB5165 and ModalAI’s voxl-tflite-server onboard, VOXL 2 can run five different neural networks simultaneously, at 30 frames per second, out of the box. Instead of running neural networks solely on the central processing unit (CPU), VOXL 2 uses TensorFlow Lite’s built-in NNAPI support to run networks in parallel on the dedicated NPU and the graphics processing unit (GPU) at 30 Hz. That frees VOXL 2’s CPU for the rest of an autonomous robotics stack.

VOXL 2 Runs 5 Concurrent Imager Inputs Out of the Box with TensorFlow Lite

The VOXL SDK included with VOXL 2 is optimized for advanced computer vision. To accelerate time to market, voxl-tflite-server ships with five pre-trained neural networks that developers can run with TensorFlow Lite out of the box. Use cases for image-based deep learning keep growing, and VOXL 2 gives developers the visual data those use cases depend on. Attach a Hi-Res 4K30 IMX214 or IMX412 image sensor to VOXL 2 to use these five computer vision models:

Object Detection: Identify known objects in your robot’s field of view (FOV). Object detection combines localization and classification to categorize objects and describe where they are in the image. The models we provide for this task are optimized for onboard inference and use either the SSD (single-shot detector) or YOLO (you only look once) architecture to keep latency low. This is extremely useful onboard a drone, as it enables intelligent surveillance of a scene and can provide key information depending on the task. Object detection can be used to find and track objects from the air, as an aid in autonomous flight or exploration, or for more specific use cases like warehouse and asset inspection.
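
As an illustration of what typically happens to a detector's raw output, here is a minimal sketch of confidence thresholding followed by greedy non-maximum suppression (NMS). It is not taken from voxl-tflite-server; the function names, thresholds, and sample boxes are hypothetical, and some exported models already perform this step inside the model graph.

```python
from typing import List, Sequence, Tuple

# Box layout assumed for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(
    boxes: Sequence[Box],
    scores: Sequence[float],
    score_threshold: float = 0.5,  # hypothetical default
    iou_threshold: float = 0.5,    # hypothetical default
) -> List[int]:
    """Return indices of detections kept after thresholding and greedy NMS."""
    # Drop low-confidence detections, then visit the rest best-first.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Keep a box only if it does not heavily overlap a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Made-up detections: two near-duplicates, one separate box, one weak box.
boxes = [(10, 10, 110, 110), (12, 8, 112, 108), (200, 50, 260, 120), (0, 0, 20, 20)]
scores = [0.90, 0.75, 0.80, 0.30]
kept = filter_detections(boxes, scores)
print(kept)
```

The near-duplicate box and the low-confidence box are discarded, leaving one box per object. This sketch ignores class labels; a per-class variant would run the same loop separately for each class.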

Image Classification: Discern the predominant object in your robot’s FOV. Image classification labels the most important feature in an image and can provide similar information to an object detector at a much faster speed. In cases where the location of the object within the image is unimportant, classification models are the more efficient choice. VOXL 2 comes equipped with pre-trained image classification models that cover more than 1,000 known categories.
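
A classifier's output is a vector with one score per category, and the usual last step is to rank those scores. The sketch below assumes the model emits raw logits and uses a four-entry hypothetical label list in place of the real 1,000-category label file.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(scores: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], scores[i]) for i in ranked[:k]]


# Hypothetical labels and logits, for illustration only.
labels = ["person", "bicycle", "car", "dog"]
logits = [2.0, 0.5, 3.1, -1.0]

probs = softmax(logits)
best = top_k(probs, labels, k=2)
for label, p in best:
    print(f"{label}: {p:.2f}")
```

If the model's output is already a probability vector, the softmax step can be skipped and `top_k` applied directly.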

Depth Estimation: Build depth maps with VOXL 2 from monocular images. VOXL 2 can infer the distance between its Hi-Res 4K30 image sensor and certain objects in its field of view (FOV). Monocular depth estimation predicts a depth value for each pixel from a single RGB image. Depth estimation is a crucial computer vision feature for autonomous drones and ground robots, as it allows them to perceive their environment and navigate safely and autonomously.
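
One simple way a depth map can feed navigation is to look for the nearest surface in the part of the image the vehicle is heading toward. The sketch below assumes the depth map has already been converted to meters and uses a hypothetical 2.5 m safety threshold; both are assumptions made for illustration, not details from the VOXL SDK.

```python
from typing import List


def nearest_in_center(depth_m: List[List[float]], fraction: float = 0.5) -> float:
    """Smallest depth (meters) inside the central window of a depth map.

    `fraction` is the share of each image dimension covered by the window.
    """
    rows, cols = len(depth_m), len(depth_m[0])
    r_margin = int(rows * (1 - fraction) / 2)
    c_margin = int(cols * (1 - fraction) / 2)
    window = [
        depth_m[r][c]
        for r in range(r_margin, rows - r_margin)
        for c in range(c_margin, cols - c_margin)
    ]
    return min(window)


# Tiny made-up 4x4 depth map; the 0.4 m reading sits in a corner,
# outside the central window, so it is ignored.
depth_m = [
    [9.0, 9.0, 9.0, 0.4],
    [9.0, 3.5, 2.1, 9.0],
    [9.0, 4.0, 6.0, 9.0],
    [9.0, 9.0, 9.0, 9.0],
]

SAFETY_DISTANCE_M = 2.5  # hypothetical threshold
nearest = nearest_in_center(depth_m)
too_close = nearest < SAFETY_DISTANCE_M
print(nearest, too_close)
```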

Pose Estimation: Identify the orientation and position of human targets. VOXL 2 can use human pose estimation to identify key points on a person’s face, body, arms, and legs, with four key points per category. Pose estimation enables developers to track a person, or multiple people, in real time and monitor or study their movements. This computer vision technique is useful in applications such as tracking human movements for animation, AR/VR, sport or dance technique analysis, and security and surveillance.
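
Pose models such as MoveNet report each key point as a normalized (y, x, score) triple, so a common first step is to discard low-confidence points and scale the rest to pixel coordinates. The sketch below uses made-up values, a hypothetical 0.3 score cutoff, and an assumed 640x480 frame.

```python
from typing import Dict, Tuple


def keypoints_to_pixels(
    keypoints: Dict[str, Tuple[float, float, float]],
    width: int,
    height: int,
    min_score: float = 0.3,  # hypothetical cutoff
) -> Dict[str, Tuple[int, int]]:
    """Map normalized (y, x, score) key points to (x_px, y_px), dropping weak ones."""
    visible = {}
    for name, (y, x, score) in keypoints.items():
        if score >= min_score:
            visible[name] = (round(x * width), round(y * height))
    return visible


# Made-up key points in normalized (y, x, score) form.
keypoints = {
    "nose": (0.20, 0.50, 0.92),
    "left_wrist": (0.55, 0.30, 0.71),
    "right_ankle": (0.95, 0.60, 0.12),  # low score, e.g. occluded
}

pixels = keypoints_to_pixels(keypoints, width=640, height=480)
print(pixels)
```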

Image Segmentation: Understand what the objects in your robot’s FOV consist of. Image segmentation divides the images your robot captures into segments, creating a pixel-based mask of each object. By eliminating regions that don’t contain pertinent information (such as the background inside an object detection box), image segmentation identifies the accurate shape of each object. Drones can use image segmentation to navigate through a cluster of trees without bumping into branches.
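
A segmentation network typically emits a score for every class at every pixel, and the mask comes from taking the highest-scoring class per pixel. The sketch below does this on a tiny 2x3 example with three hypothetical classes; real outputs are much larger but follow the same idea.

```python
from typing import List

Scores = List[List[List[float]]]  # scores[row][col][class]


def class_mask(scores: Scores) -> List[List[int]]:
    """Per-pixel argmax: index of the highest-scoring class at each pixel."""
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]


def binary_mask(mask: List[List[int]], class_id: int) -> List[List[int]]:
    """1 where the pixel belongs to class_id, 0 elsewhere."""
    return [[1 if c == class_id else 0 for c in row] for row in mask]


# Hypothetical classes: 0 = background, 1 = tree, 2 = person.
scores = [
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.6, 0.3]],
    [[0.9, 0.05, 0.05], [0.3, 0.3, 0.4], [0.2, 0.7, 0.1]],
]

mask = class_mask(scores)
tree = binary_mask(mask, class_id=1)
print(mask)
print(tree)
```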

TensorFlow Lite on VOXL 1 vs. VOXL 2

VOXL 2 unlocks lightweight, powerful computing. See how the neural networks perform on VOXL 2 compared to the previous-generation VOXL 1. A dedicated NPU enables VOXL 2 to process images quickly enough for low-latency computer vision applications.

VOXL 1

Model	Task	Avg. CPU Inference (ms)	Avg. GPU Inference (ms)	Max Frames Per Second (fps)	Input Dimensions	Source
MobileNet V2-SSDlite	Object Detection	127.78	21.82	37.29	[1,300,300,3]	link
MobileNet V1-SSD	Object Detection	75.48	64.40	14.62	[1,300,300,3]	link
MobileNet V1	Classifier	56.70	56.85	16.47	[1,224,224,3]	link

VOXL 2

Model	Task	Avg. CPU Inference (ms)	Avg. GPU Inference (ms)	Avg. NNAPI Inference (ms)	Max Frames Per Second (fps)	Input Dimensions	Source
MobileNetV2-SSDlite	Object Detection	33.89	24.68	34.42	34.87	[1,300,300,3]	link
EfficientNet Lite4	Classifier	115.30	24.74	16.42	48.97	[1,300,300,3]	link
FastDepth	Monocular Depth	37.34	18.00	37.32	45.45	[1,320,320,3]	link
DeepLab V3	Segmentation	63.03	26.81	61.77	32.46	[1,321,321,3]	link
MoveNet SinglePose Lightning	Pose Estimation	24.58	28.49	24.61	34.99	[1,192,192,3]	link
YoloV5	Object Detection	88.49	23.37	83.87	36.54	[1,320,320,3]	link
MobileNetV1-SSD	Object Detection	19.56	21.35	7.72	85.32	[1,300,300,3]	link
MobileNetV1	Classifier	19.66	6.28	3.98	125.31	[1,224,224,3]	link
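
To read the VOXL 2 table at a glance, the sketch below picks the fastest backend for each model and checks it against a 30 fps frame budget. It also tests one observation: in every VOXL 2 row, the Max Frames Per Second value equals 1000 / (fastest average inference time + 4 ms) to two decimal places. The 4 ms term is inferred from the numbers and presumably reflects fixed per-frame overhead; the article does not state this, so treat it as an assumption.

```python
# (cpu_ms, gpu_ms, nnapi_ms, listed_max_fps), copied from the VOXL 2 table
# with frame rates rounded to two decimals.
VOXL2 = {
    "MobileNetV2-SSDlite": (33.89, 24.68, 34.42, 34.87),
    "EfficientNet Lite4": (115.30, 24.74, 16.42, 48.97),
    "FastDepth": (37.34, 18.00, 37.32, 45.45),
    "DeepLab V3": (63.03, 26.81, 61.77, 32.46),
    "MoveNet SinglePose Lightning": (24.58, 28.49, 24.61, 34.99),
    "YoloV5": (88.49, 23.37, 83.87, 36.54),
    "MobileNetV1-SSD": (19.56, 21.35, 7.72, 85.32),
    "MobileNetV1": (19.66, 6.28, 3.98, 125.31),
}

BACKENDS = ("CPU", "GPU", "NNAPI")
OVERHEAD_MS = 4.0               # assumption inferred from the table, not documented
FRAME_BUDGET_MS = 1000.0 / 30   # time available per frame at 30 fps


def fastest_backend(cpu_ms, gpu_ms, nnapi_ms):
    """Return (backend name, average inference time in ms) for the quickest backend."""
    times = (cpu_ms, gpu_ms, nnapi_ms)
    best = min(range(len(times)), key=times.__getitem__)
    return BACKENDS[best], times[best]


summary = {}
for model, (cpu, gpu, nnapi, listed_fps) in VOXL2.items():
    backend, ms = fastest_backend(cpu, gpu, nnapi)
    predicted_fps = 1000.0 / (ms + OVERHEAD_MS)
    fits_30fps = ms + OVERHEAD_MS <= FRAME_BUDGET_MS
    summary[model] = (backend, ms, predicted_fps, listed_fps, fits_30fps)
    print(f"{model}: {backend} {ms:.2f} ms, ~{predicted_fps:.2f} fps "
          f"(listed {listed_fps:.2f}), fits 30 fps: {fits_30fps}")
```

On these numbers the fastest backend varies by model (CPU, GPU, or NNAPI), which suggests benchmarking each model on each backend rather than assuming one is always best.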

Vision-Based Drones for Mission-Critical Operations

An autonomous drone running multiple simultaneous neural networks reduces the pilot’s cognitive load during mission-critical flight operations. The more image data a drone can process, the safer and more capable its autonomous navigation becomes. VOXL 2 is preconfigured to support five simultaneous neural networks out of the box with TensorFlow Lite. To learn more about TensorFlow Lite on VOXL 2, visit: https://docs.modalai.com/voxl-tflite-server/