Depth Perception — Using Intel RealSense

Dharsan Kd
7 min read · Mar 19, 2021

An article on pedestrian detection and distance estimation using Intel depth-sensing technology — vision intelligence.

A project by OptiSol Data Labs.

Here is a quick demo video of our project:

DEPTH PERCEPTION — AN OVERVIEW:

Depth is a key prerequisite for tasks such as perception, navigation, and planning across many industries. Human eyes view the world in three dimensions, which enables us to perform a wide range of tasks.

Similarly, giving depth perception to machines that already have computer vision opens up a broad range of applications in robotics, industrial automation, and various autonomous systems.

So how does depth perception work?

Depth estimation sensors available on the market today primarily use two technologies: infrared (IR) light-based and laser-based systems, each with its own advantages over the other.

Both systems estimate the depth of surrounding objects by emitting light onto them and measuring the time taken for the reflected light to return.
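As a back-of-the-envelope check, this round-trip timing translates to distance as distance = (speed of light × round-trip time) / 2. A two-line sketch of the relationship described above:

```python
SPEED_OF_LIGHT = 299_792_458  # meters per second

def distance_from_round_trip(round_trip_s):
    """Distance to the object, given the measured round-trip time of the light."""
    return SPEED_OF_LIGHT * round_trip_s / 2

print(distance_from_round_trip(6.67e-9))  # ~1 m for a 6.67 ns round trip
```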

These sensors' functionality depends on factors such as resolution, range, and field of view. Laser-based lidars are typically more accurate, up to ±1 inch, while IR light-based sensors are noticeably less precise.

Choosing the right hardware

We should choose our hardware specifications based on the target application.

A few common and essential specs to evaluate:

Range: The operating range of the depth sensor. In most cases this is the most important factor, as it determines whether the sensor is usable for the application at all.

Accuracy: It is important to understand the sensor’s accuracy, as it helps in identifying objects rather than merely detecting them, and improves the overall functionality of the system.

Resolution: Resolution paired with accuracy determines the overall precision of the system.

Field of view: The field of view determines the scope of the sensor. A wide field of view lets the system process more of the scene at once, but at a cost to the processor. Conversely, when only a limited area needs to be monitored, a sensor with a narrower field of view produces comparatively less data to process, easing the load on the processor.

Frame rate: For applications involving fast-moving objects, or use cases requiring continuous monitoring, most sensors support frame rates of up to 90 fps. As with field of view, increasing the frame rate also increases the load on the processor.

Processing power: Sensors with a built-in processor are available on the market, but an on-board processor has its limitations: it offers only a fixed set of capabilities, with little room for adjustment. Such sensors suit a single, fixed application well, but lack flexibility when the same device must serve multiple applications. Ultimately it is the processor that does all the computation, so choosing one with a little more headroom than the required specs is advisable.

RGB camera: To present the output in a human-understandable format, we also need an RGB camera, i.e., a standard visible-light camera, so that objects can be identified both by computer vision and by the naked eye.

So, in conclusion, we chose to move ahead with the Intel RealSense Depth Camera D455, as we needed the longest range and the widest field of view available. Similar models that have the same interfacing options and work with the same SDK are the Intel RealSense D435, Intel RealSense D435i, and Intel RealSense D415. These variants offer a reduced range and field of view at a more affordable price.

Calibration of the sensors

Like any other sensors, these devices require calibration. To calibrate them, it is important to understand a few parameters:

Depth Field of View at Distance (Z)

Depth Field of View (Depth FOV) at any distance Z can be calculated using the equation:

Depth FOV = HFOV/2 + tan⁻¹(tan(HFOV/2) − B/Z)

where:

Depth FOV = Depth Field of View

HFOV = Horizontal Field of View of Left Imager on Depth Module

B = Baseline

Z = Distance of Scene from Depth Module
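This formula is easy to evaluate in Python as a quick sanity check. The HFOV and baseline values below are approximate figures for the D455 (roughly 87° and 95 mm); substitute the values from your own camera’s datasheet:

```python
import math

def depth_fov(hfov_deg, baseline_m, z_m):
    """Horizontal depth FOV (in degrees) at distance z_m, per the formula above."""
    half_hfov = math.radians(hfov_deg) / 2
    return math.degrees(half_hfov + math.atan(math.tan(half_hfov) - baseline_m / z_m))

# Approximate D455 figures: ~87 deg HFOV, 95 mm baseline
print(depth_fov(87.0, 0.095, 1.0))  # depth FOV at 1 m
print(depth_fov(87.0, 0.095, 4.0))  # depth FOV at 4 m
```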

Depth Field of View to Depth Map illustration

Depth start point (ground zero reference)

The depth start point, or ground zero reference, can be described as the starting point or plane where depth = 0. For Intel RealSense cameras, this point is referenced from the front of the camera’s cover glass.

Illustration of depth Camera Depth Start Point Reference

Depth camera functions

Firmware

The firmware contains the camera’s operating instructions. At runtime, the Vision Processor D4 loads the firmware and programs the component registers. If the Vision Processor D4 is configured for update or recovery, the unlocked R/W region of the firmware can be changed.

Initializing & setting-up the device

For this build, we will proceed with an Intel RealSense depth camera. After unboxing the camera, plug it into any PC, preferably one running a Windows operating system. Then download and install the Intel RealSense SDK, which includes the following:

Intel® RealSense™ Viewer — This application can be used to view, record, and play back depth streams, set camera configurations, and adjust other controls.

Depth Quality Tool — This application can be used to test depth quality, including distance to plane accuracy, Z accuracy, the standard deviation of the Z accuracy, and fill rate.

Debug Tools — These command-line tools gather data and generate logs to assist in debugging the camera.

Code Examples — Examples to demonstrate the use of SDK to include D400 Series camera code snippets into applications.

Wrappers — Software wrappers supporting common programming languages and environments such as ROS, Python, MATLAB, node.js, LabVIEW, OpenCV, PCL, .NET, and more.

Programming the sensor to detect a person and estimate their depth

To begin, we need a code editor and Python 3.7 or above installed. Following the steps below will let us detect and identify a person and measure their distance from the camera.

Step 1: Import

First, we import the necessary libraries: pyrealsense2, NumPy, and cv2 (OpenCV).
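A minimal import block might look like this (assuming the pyrealsense2 and opencv-python packages have been installed via pip):

```python
import pyrealsense2 as rs  # Intel RealSense SDK Python wrapper
import numpy as np         # array handling for image frames
import cv2                 # OpenCV for image processing and display
```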

Step 2: Configure depth & color streams

Then we can start streaming image data from both cameras (depth and RGB), specifying the resolution and frame rate for each stream.

RGB / BGR image frames & Depth point-cloud frames
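A minimal configuration sketch, assuming 640×480 at 30 fps for both streams (the exact modes available depend on the camera model):

```python
pipeline = rs.pipeline()
config = rs.config()

# Enable the depth stream (16-bit depth) and the color stream (8-bit BGR)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

profile = pipeline.start(config)
```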

Step 3: Defining the point of measurement

Using Intel RealSense, we can measure the distance at any given pixel, so it is important to define the pixel at which the measurement is to be taken.
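The SDK’s get_distance() call returns the depth (in meters) at a single pixel. The sketch below continues from the configuration above, reads one frame pair, and queries the image center; the center coordinates are an arbitrary choice for illustration:

```python
frames = pipeline.wait_for_frames()
depth_frame = frames.get_depth_frame()
color_frame = frames.get_color_frame()

# Convert the color frame to a NumPy array for OpenCV
color_image = np.asanyarray(color_frame.get_data())

# Measure depth (in meters) at the center pixel of a 640x480 frame
cx, cy = 320, 240
distance_m = depth_frame.get_distance(cx, cy)
print(f"Distance at ({cx}, {cy}): {distance_m:.2f} m")
```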

Step 4: Person detection

We have used MobileNetSSD as the detection model, as it is lightweight and widely compatible.
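A sketch of loading and running the detector with OpenCV’s DNN module. The file names MobileNetSSD_deploy.prototxt and MobileNetSSD_deploy.caffemodel are assumed local paths to the downloaded Caffe model; in the VOC label set this model was trained on, class 15 is “person”:

```python
# Load the Caffe MobileNetSSD model (paths assume the files are in the working directory)
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")
PERSON_CLASS_ID = 15  # 'person' in the VOC label set

def detect_persons(bgr_image, conf_threshold=0.5):
    """Return bounding boxes (x1, y1, x2, y2) of detected persons."""
    h, w = bgr_image.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(bgr_image, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()

    boxes = []
    for i in range(detections.shape[2]):
        class_id = int(detections[0, 0, i, 1])
        confidence = detections[0, 0, i, 2]
        if class_id == PERSON_CLASS_ID and confidence > conf_threshold:
            # Detection coordinates are normalized; scale back to pixels
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            boxes.append(box.astype(int))
    return boxes
```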

Step 5: Extracting depth of the detected object

The measurement point is placed inside each detected bounding box, and the depth value at that pixel is read from the depth frame.
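One way to combine the two, continuing the sketches above and using the box center as the measurement point (an assumption; any pixel inside the box would work):

```python
for (x1, y1, x2, y2) in detect_persons(color_image):
    # Use the center of the bounding box as the measurement pixel
    px = int((x1 + x2) / 2)
    py = int((y1 + y2) / 2)
    depth_m = depth_frame.get_distance(px, py)
    print(f"Person at ({px}, {py}) is {depth_m:.2f} m away")
```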

Step 6: Displaying the depth values on the detected object

The measured depth value is then drawn on top of each detected person in the output window, so the distance can be read directly from the live video feed.
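A sketch of the overlay, extending the detection loop above to draw the box and the distance label with OpenCV:

```python
for (x1, y1, x2, y2) in detect_persons(color_image):
    px, py = int((x1 + x2) / 2), int((y1 + y2) / 2)
    depth_m = depth_frame.get_distance(px, py)

    # Draw the bounding box and the measured distance above it
    cv2.rectangle(color_image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(color_image, f"{depth_m:.2f} m", (x1, y1 - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

cv2.imshow("Person detection with depth", color_image)
cv2.waitKey(1)
```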

Step 7: Stopping the pipeline

Finally, the pipeline is stopped to end the streaming.
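Wrapping the capture loop in try/finally ensures the pipeline is released even if an error occurs, a minimal sketch:

```python
try:
    while True:
        frames = pipeline.wait_for_frames()
        # ... detection, depth lookup, and display as in the steps above ...
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
finally:
    pipeline.stop()          # release the camera
    cv2.destroyAllWindows()  # close the output window
```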

The final output of the person detection with depth estimation project
