Deep Learning with Time-of-Flight 3D Sensors

June 16, 2020

Using image processing and artificial intelligence (AI) methods, we at Data Spree implement specialized solutions for our customers based on our AI platform Deep Learning DS. We support you from beginning to end, i.e. from data acquisition and annotation through the training of AI models to the final deployment of the solution on the target hardware.

Deep Learning offers advantages over traditional image processing methods not only through increased accuracy but also through the reduced development time of ready-to-use systems. Furthermore, the applications can be continuously improved over the complete life cycle of the systems and thus achieve consistent results when boundary conditions change. Although the methods can be applied to many areas, the strengths of Deep Learning based image processing are particularly evident when the analysed objects are highly varied. A good example is the sorting and processing of agricultural products. These products can vary immensely in shape and colour, which poses a great challenge to classical image processing methods. Additionally, varying light conditions often make it difficult to create generalized solutions, so that RGB cameras have only slight advantages over grayscale cameras.

This is where 3D cameras, such as the Basler blaze, step in. Using the time-of-flight (ToF) method, they not only generate grayscale intensity images but also measure the distance for each individual pixel by measuring the travel time of light pulses in the near-infrared (NIR) range. The resulting image can then be processed as a 2D depth image or as a 3D point cloud and provides additional information about the depicted scene. Compared to 2D RGB images, the color information is replaced by shape information, which not only helps to detect red and green apples simultaneously but also enables additional applications, such as the precise positioning and measurement of the detected objects.
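The distance measurement follows from the round-trip time of the light pulse: the light travels to the object and back, so the distance is half the travel time multiplied by the speed of light. The following is a minimal sketch of that relation only. The function name is hypothetical, and it makes no claim about how the camera computes distances internally.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_distance_m(round_trip_time_s: float) -> float:
    """Distance to the reflecting surface for a given round-trip time.

    Hypothetical helper for illustration; not part of any camera SDK.
    """
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse covers the camera-to-object distance twice (out and back).
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


# A round trip of 10 nanoseconds corresponds to roughly 1.5 m.
distance = tof_distance_m(10e-9)
```

The tiny time scales involved (nanoseconds per meter) explain why the measurement is done in the sensor hardware and not in software.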

Basler blaze ToF 3D camera

For highly accurate and robust applications, the strengths of Deep Learning and ToF can be combined to reliably solve previously unsolved problems. In an exemplary application for fruit detection and classification, we demonstrate how a real-time solution can be developed with the Basler blaze 3D camera and our Deep Learning DS AI platform without any prior programming or deep learning experience. Due to the daylight capability and IP67 protection class of the Basler blaze, this solution can also be used directly on mobile machines in harsh environments.

The workflow for creating deep learning models can generally be divided into five steps:

  • Data acquisition: Acquisition of sample images
  • Annotation: Enrichment with metadata
  • Training: Optimization of the Deep Neural Network (DNN)
  • Deployment: Running the network on the target hardware
  • Continuous improvement of the neural network through new data

Since these steps can seem like a big challenge at first, we have developed Deep Learning DS, a platform that makes it as easy as possible for users to develop their own Deep Learning solution in the shortest possible time.

For the example above, we first had to take pictures of the fruits that we wanted to detect and classify. For this purpose, we took about 500 images of bananas, apples, and pears with the Basler blaze camera. Our acquisition software creates 2-channel image data from the grayscale intensity image and the depth image, which contains the distance in millimeters for each pixel. This image data can be loaded directly into the Deep Learning DS platform.
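To illustrate what such 2-channel image data can look like, here is a minimal sketch using NumPy. The function name, the channel order, the float32 data type, and the synthetic input values are assumptions for illustration. The actual format produced by the acquisition software is not described here.

```python
import numpy as np


def stack_intensity_and_depth(intensity: np.ndarray, depth_mm: np.ndarray) -> np.ndarray:
    """Combine two single-channel images of equal size into one H x W x 2 array.

    Hypothetical helper; channel order and dtype are assumptions.
    """
    if intensity.ndim != 2 or intensity.shape != depth_mm.shape:
        raise ValueError("expected two 2D arrays of identical shape")
    # Channel 0: grayscale intensity, channel 1: distance in millimeters.
    return np.stack(
        [intensity.astype(np.float32), depth_mm.astype(np.float32)],
        axis=-1,
    )


# Synthetic stand-in data; a real frame would come from the camera.
intensity = np.full((4, 6), 128, dtype=np.uint16)
depth_mm = np.full((4, 6), 750, dtype=np.uint16)
frame = stack_intensity_and_depth(intensity, depth_mm)
```

Keeping both channels in one array means that each pixel carries its brightness and its distance together, so a network can consume the frame like an ordinary multi-channel image.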

Subsequently, the data is enriched with metadata. For this purpose, boxes are manually drawn around the fruits and the corresponding category (apple, pear, etc.) is assigned. This determines what will be “taught” to the neural network in the following step. We can speed up this process after only about 100 manually annotated images by training an initial deep learning model. This model generates suggestions for further images, which we only have to correct.
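Conceptually, each annotation pairs a category with a rectangle in pixel coordinates. The sketch below shows one possible way to represent this. The class name, the field names, and the example coordinates are hypothetical and do not reflect the storage format of the Deep Learning DS platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxAnnotation:
    """One labeled bounding box in pixel coordinates (hypothetical format)."""

    category: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def __post_init__(self) -> None:
        # Reject degenerate boxes early so they never reach training.
        if self.x_max <= self.x_min or self.y_max <= self.y_min:
            raise ValueError("box must have positive width and height")

    @property
    def area(self) -> int:
        """Box area in square pixels."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


# Made-up annotations for a single image.
annotations = [
    BoxAnnotation("apple", 40, 60, 120, 150),
    BoxAnnotation("banana", 200, 80, 380, 160),
]
```

A model suggestion can use the same structure, so correcting a suggestion amounts to adjusting the coordinates or the category of an existing record.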

Once all 500 images are annotated, we can create another model with just a few mouse clicks and train it automatically. Depending on the amount of data and the complexity of the task, this process takes between a few hours and a day. During training, we periodically evaluate the recognition accuracy on a withheld test data set to estimate the current quality of the model. Once sufficient accuracy is achieved, we continue the training a bit longer to improve the robustness of the recognition.
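The idea of a withheld test data set can be sketched as follows: a fixed part of the annotated images is set aside before training and is used only for evaluation. The function name, the 20 % test fraction, and the fixed random seed are assumptions for illustration. The split actually used by the platform is not stated here.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle the items reproducibly and set aside a held-out test part.

    Hypothetical helper; the fraction and seed are illustrative assumptions.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(items)
    # A dedicated, seeded generator keeps the split identical between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# 500 image identifiers, matching the number of images in the example.
image_ids = list(range(500))
train_ids, test_ids = split_train_test(image_ids)
```

Because the test images never take part in training, the accuracy measured on them indicates how well the model generalizes to unseen images.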

After the training is completed, we download the fully trained model and can run it directly in our Inference DS execution software. In addition to USB, network, and standard industrial cameras, the Basler blaze ToF camera is fully integrated, including pre-processing, so the Deep Learning application can be started directly.

As with any image processing method, Deep Learning is a tool for extracting specific information from the camera image, which can then be used to build applications. With the additional depth information provided by the ToF camera, we can also locate the detected fruits in three-dimensional space, e.g. to transfer the exact position to a robot in a sorting plant.
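One common way to obtain such a 3D position is to back-project the center of a detected box through a pinhole camera model, using the depth measured inside the box. The sketch below rests on several assumptions: the depth value is the distance along the optical axis, a value of 0 marks an invalid pixel, and the intrinsic parameters (fx, fy, cx, cy) are made-up numbers, not values of the Basler blaze. The function name is hypothetical.

```python
import numpy as np


def locate_box_center_mm(depth_mm, box, fx, fy, cx, cy):
    """Return (X, Y, Z) in millimeters for the center of a bounding box.

    box is (x_min, y_min, x_max, y_max) in pixels; fx, fy, cx, cy are
    pinhole intrinsics in pixels. Hypothetical helper for illustration.
    """
    x_min, y_min, x_max, y_max = box
    patch = depth_mm[y_min:y_max, x_min:x_max]
    valid = patch[patch > 0]  # assumption: 0 marks pixels without a measurement
    if valid.size == 0:
        raise ValueError("no valid depth values inside the box")
    # The median is less sensitive to background pixels than the mean.
    z = float(np.median(valid))
    u = (x_min + x_max) / 2.0
    v = (y_min + y_max) / 2.0
    # Pinhole back-projection of the pixel (u, v) at depth z.
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


# Synthetic scene: a flat surface 1000 mm in front of the camera.
depth_mm = np.full((480, 640), 1000.0)
position = locate_box_center_mm(
    depth_mm, (370, 190, 470, 290), fx=500.0, fy=500.0, cx=320.0, cy=240.0
)
# position is (200.0, 0.0, 1000.0)
```

In a real setup, the resulting coordinates would still have to be transformed from the camera frame into the coordinate frame of the robot.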

The combination of time-of-flight cameras and Deep Learning makes it possible to solve complex tasks in a time- and cost-efficient manner, since the training of neural networks benefits greatly from the spatial information. Furthermore, the captured 3D point cloud allows the precise positioning and measurement of objects, making complementary sensor technology unnecessary for a broad range of applications.