
Several studies have delved into movement detection models and their application in Kinect environments. Some of this research, such as Chin et al (2013), focuses on the optimum distance for Kinect detection. The researchers examined the accuracy and reliability of Kinect cameras and sensors and, apart from giving insights on picture quality, calculated absolute error percentages at varying distances.

The researchers studied the Kinect camera as a research camera; little prior research illustrates the reliability and accuracy of the Kinect camera in that role.

The Kinect camera hardware components are as shown below.

Figure 32- Chin et al (2013) Three Kinect sensors, IR light, RGB camera, IR detector

According to the product specifications, the Kinect sensor has a dual depth range: default and near. In both ranges the depth sensor returns 3-D images with x, y and z coordinates. In the default range there is a blind spot at approximately 0-0.8 m, within which the camera cannot return accurate depth data, and no depth data can be generated beyond 4 m. In the near range the blind spot lies at 0-0.4 m, and the camera cannot generate raw depth data beyond 3 m.

The distance analysis is as seen below.

Figure 33- Chin et al (2013) Depth sensor distance

The C programming language is used to program the SDK of the Kinect sensor. The developer kit gives access to the source code and other technical resources such as Kinect Studio, all of which make application development easier.

The sensor calculates the distance along a straight, perpendicular line between itself and the object. When an image frame is captured, the Kinect sensor returns the maximum and minimum depth ranges in millimetres.

The diagram below shows the 16-bit raw depth frame returned by the Kinect sensor:

Figure 34- Chin et al (2013) Depth frame bit pixel

Technically, all the bits have specific functions: the first three bits are used as player identifiers, while the following 13 give the distance in millimetres. The following programming operations are used to extract these bits.

Figure 35- Chin et al (2013) Algorithm depth distance
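To make the bit layout concrete, the following minimal C sketch (illustrative, not the authors' code from figure 35) unpacks the player identifier and the millimetre distance from one 16-bit depth pixel:

#include <stdint.h>
#include <stdio.h>

/* The low 3 bits of a raw depth pixel hold the player identifier;
 * the upper 13 bits hold the distance in millimetres. */
static uint8_t player_index(uint16_t raw)
{
    return (uint8_t)(raw & 0x0007);  /* mask off the 3 player-ID bits */
}

static uint16_t depth_mm(uint16_t raw)
{
    return raw >> 3;                 /* shift out the player-ID bits */
}

int main(void)
{
    uint16_t raw = (1200u << 3) | 2u;  /* hypothetical pixel: 1200 mm, player 2 */
    printf("player %u at %u mm\n",
           (unsigned)player_index(raw), (unsigned)depth_mm(raw));
    return 0;
}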

To calculate depth distance, five tests were done for each of the two ranges, default and near. The test distances ranged from 200 mm to 4000 mm in steps of 100 mm. The objective of the five tests was to approximate the average distance. The first equation calculated the average distance for the experiment. Additionally, the absolute mean percentage error (AMPE) was calculated to establish the best range estimate, and the standard deviation was calculated to ascertain the precision of the provided depth data. To analyze consistency, the Kuder-Richardson formula 20 was used.

Summary of the equations used for the experiment is as shown below.

Average, $\bar{x}$ (mm) $= \frac{1}{n}\sum_{i=1}^{n} x_i$  (1)

AMPE (%) $= \left|\frac{\bar{x} - X}{X}\right| \times 100$  (2)

Standard deviation $= \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$  (3)

where $i$ is the number of each test, $i = 1, 2, 3, 4, 5$; $x_i$ is the distance measured in test $i$; $\bar{x}$ is the average of each distance; $\sum$ indicates summation; $n$ is the total number of tests taken, $n = 5$; and $X$ is the actual distance.

$r_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{\sigma^2}\right)$  (4)

where $r_{KR20}$ is the Kuder-Richardson formula 20, $k$ is the total number of test items, $p$ is the proportion of tests where the measured distance passes (within ±5 mm of the actual distance), $q$ is the proportion where it fails, and $\sigma^2$ is the variance of the entire test.

Equation 1- Chin et al (2013) Experiment equations
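As a compact worked example of equations (1)-(3) (a hypothetical sketch; the readings and code are illustrative, not taken from the paper), the per-distance statistics could be computed as follows:

#include <math.h>
#include <stdio.h>

#define N 5  /* number of tests per distance */

int main(void)
{
    double x[N] = {1498, 1502, 1497, 1503, 1500};  /* hypothetical readings, mm */
    double actual = 1500.0;                        /* hypothetical true distance, mm */

    /* (1) average of the N readings */
    double mean = 0.0;
    for (int i = 0; i < N; i++) mean += x[i];
    mean /= N;

    /* (2) absolute mean percentage error */
    double ampe = fabs((mean - actual) / actual) * 100.0;

    /* (3) standard deviation of the readings */
    double var = 0.0;
    for (int i = 0; i < N; i++) var += (x[i] - mean) * (x[i] - mean);
    var /= (N - 1);
    double sd = sqrt(var);

    printf("mean %.1f mm, AMPE %.3f%%, sd %.2f mm\n", mean, ampe, sd);
    return 0;
}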

To conduct the experiment, the researchers placed a cardboard box in the field of view of the sensor. The range of the application is between 200 mm and 4000 mm. The experiment was designed to focus primarily on the centre of the view. Cardboard bodies were used instead of human bodies in order to channel the focus to depth distance rather than detection of a human frame; human bodies may introduce errors due to their curved outline surfaces.

Different range estimations.

The experiment used the Microsoft SDK as the software framework for the programming part. The SDK, available for Windows, allows for depth-based skeleton tracking which creates animated avatar images. According to the SDK software package, depth values of up to 4000 mm are supported; the SDK has an upper limit depth of 800 mm and a lower limit of 500 mm. Below is an experimental comparison between measured depth distance and actual distance.

Equation 2- Chin et al (2013) Default range vs near range

Figure 36- Chin et al (2013) Average depth distance vs Actual distance

With the default range, the Kinect sensor was able to show images for objects as far as 4000 mm in front of the camera and as close as 500 mm. At these distances, the sensor was capable of assuring accuracy, reliability and precision. Further, less than 1% error (AMPE) was recorded in the range of 600 mm to 2900 mm. The above graph shows a similar quadratic shape that plots near the actual range; the depth error is at 1.5%.

The experiment concluded that the two ranges produce different depth image quality. The default range returns all 20 human joints, while the near range returns 10 joints. In near range, the sensor tends to focus on the user's head, hands and torso, because at close distance the sensor has a limited view. The default range can be used for many applications such as facial recognition, human pose estimation and robotics.

Figure 37- Chin et al (2013) Accuracy analysis AMPE vs Distance

Figure 38- Chin et al (2013) Precision analysis std vs Distance

In general, the researchers concluded that the Kinect sensor provides object information with a high level of precision and accuracy, and that the Kinect can be relied upon to provide accurate distances. Additionally, the researchers drew the following conclusions:

• The Kinect sensor has low errors in the measurement of depth distance. The error becomes pronounced only below the 600 mm low end and above the 2900 mm high end of the range.

• The random error of the sensor increases quadratically, reaching a maximum of about 40 mm at the far end of the range.

• The Kinect sensor shows consistency in different distance ranges.

The researchers recommend about 600 mm to 3000 mm for biomedical applications.

Other researchers, such as Alexiadis et al (2017), have proposed alternative methods for motion and 3-D body detection using RGB-D streams. Their method is based on a volumetric Fourier transformation. The researchers also proposed a qualitative evaluation framework for real-time 3-D reconstruction systems. In their paper, they propose elements and techniques for capturing and reconstructing human 3-D appearance.

For the system setup, the devices were placed on a radius pointing towards the location of the object being captured. The radius used was [2 m, 4 m]. Because of the limitation presented by Kinect v2 (one sensor per computer), a network architecture was set up. For image storage, RGB JPEG and LZ4 (for compression) were used. The models allowed for online construction of 3-D images and higher quality results.
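As a minimal sketch of the storage step, assuming the standard liblz4 C API (the paper does not show its code), one depth frame could be compressed like this:

#include <lz4.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* One hypothetical Kinect v2 depth frame: 512 x 424 16-bit pixels. */
    const int src_size = 512 * 424 * (int)sizeof(uint16_t);
    char *src = calloc(src_size, 1);            /* stand-in frame buffer */
    int max_dst = LZ4_compressBound(src_size);  /* worst-case output size */
    char *dst = malloc(max_dst);
    if (!src || !dst) return 1;

    int written = LZ4_compress_default(src, dst, src_size, max_dst);
    if (written > 0)
        printf("compressed %d bytes to %d bytes\n", src_size, written);

    free(src);
    free(dst);
    return 0;
}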

The mapping calibration was approximated using a fixed KRT matrix, with the approximations based on a dense 3-D rigid registration. The Kinect SDK package was used for the programming part. The figure below shows the calibration setup.
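To make the role of the KRT matrix concrete, the following sketch applies the standard pinhole projection P = K[R | t] to one 3-D point; the numeric values are illustrative assumptions, not the authors' calibration:

#include <stdio.h>

int main(void)
{
    double K[3][3] = {{525, 0, 256}, {0, 525, 212}, {0, 0, 1}};  /* intrinsics */
    double R[3][3] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};          /* rotation */
    double t[3]    = {0.1, 0.0, 0.0};                            /* translation, m */
    double X[3]    = {0.5, 0.2, 2.0};                            /* 3-D point, m */

    /* camera coordinates: Xc = R * X + t */
    double Xc[3];
    for (int i = 0; i < 3; i++) {
        Xc[i] = t[i];
        for (int j = 0; j < 3; j++) Xc[i] += R[i][j] * X[j];
    }

    /* homogeneous pixel coordinates: p = K * Xc, then divide by depth */
    double p[3];
    for (int i = 0; i < 3; i++) {
        p[i] = 0.0;
        for (int j = 0; j < 3; j++) p[i] += K[i][j] * Xc[j];
    }
    printf("pixel (%.1f, %.1f)\n", p[0] / p[2], p[1] / p[2]);
    return 0;
}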

The researchers achieved external calibration through the use of a "novel registration model". The model uses an easy-to-build structure that works as an anchor on which all registration is featured, and it is built on the Scale Invariant Feature Transform (SIFT). The advantage of the calibration is that, once set up, no further human input was required. For the calibration object, the researchers went for easy-to-obtain materials with unique patterns that could support SIFT features. A standard IKEA package box was used as the calibration structure. The box, shown in figure 40, measured 56 × 33 × 41 cm. The calibration procedure involved placing the calibration structure at the center of the room, where it could be properly captured by all the sensors. For each viewpoint, a color image and a depth image were captured. Since there was more than one Kinect sensor in each section of the space, the researchers concluded it was better not to operate the sensors simultaneously, to avoid interference. Additionally, the researchers performed a quick post-synchronization procedure on the obtained data; the data was synchronized within 16 ms.

As for the 3-D image texture, the vertices were clearly visible in multiple RGB cameras, and the different colors from the RGB cameras were combined to produce every reconstructed vertex. The researchers found that the color quality of the image depended significantly on the angle of view. To this end, they assigned a smaller weight to color information at the object boundaries.
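A minimal sketch of that weighting idea, assuming one plausible formulation (the paper's exact scheme is not reproduced here): each camera's color sample contributes to the vertex color in proportion to a per-camera weight, with smaller weights near object boundaries.

#include <stdio.h>

/* Weighted blend of per-camera color samples for one vertex. */
static void blend_vertex_color(const double rgb[][3], const double w[],
                               int n_cams, double out[3])
{
    double wsum = 0.0;
    out[0] = out[1] = out[2] = 0.0;
    for (int c = 0; c < n_cams; c++) {
        wsum += w[c];
        for (int k = 0; k < 3; k++) out[k] += w[c] * rgb[c][k];
    }
    if (wsum > 0.0)                           /* normalize by total weight */
        for (int k = 0; k < 3; k++) out[k] /= wsum;
}

int main(void)
{
    double rgb[2][3] = {{200, 120, 90}, {180, 110, 95}};  /* two camera samples */
    double w[2] = {1.0, 0.3};  /* second camera sees the vertex near a boundary */
    double out[3];
    blend_vertex_color(rgb, w, 2, out);
    printf("blended color (%.0f, %.0f, %.0f)\n", out[0], out[1], out[2]);
    return 0;
}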

For performance evaluation, the researchers used a capturing system fitted with calibrated RGB-D sensors. The Kinect sensors were also used in the reconstruction procedure and served as checks and balances to ensure accuracy. The capturing system is shown below.

Figure 39- Alexiadis et al (2017) 3-D camera and sensor setup

In terms of volume, the researchers observed that the image was distorted: the 3-D image suffered from cut limbs, holes and other distortions. The appearance quality, defined by the image quality, was measured using the Structural Similarity Index-based Measure (SSIM).
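For reference, the standard SSIM definition from the image-quality literature (the authors do not restate it) compares two image windows $x$ and $y$ as

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$

where $\mu_x$, $\mu_y$ are the window means, $\sigma_x^2$, $\sigma_y^2$ the variances, $\sigma_{xy}$ the covariance, and $c_1$, $c_2$ small stabilizing constants.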

After evaluating and determining the algorithm to be used in volume-based tracking, the researchers proposed the stages below for the model.

[Figure: sensors C0-C5 arranged on a 2-4 m radius around the captured object]

Figure 40- Alexiadis et al (2017) Stages for the proposed model

Results of the experiments show a reliably high-quality reconstruction technique. The results were mostly obtained through Kinect v2 sensors in different configurations and spatial setups. Even though the Kinect v2 provided above-board picture quality, the reconstruction done by the researchers presents image quality with less distortion.

The reconstruction method employed resembles Poisson reconstruction, as it renders images that are properly blended in color and texture. The quality of the reconstruction is better than TSDF-based reconstruction.

Figure 41- Alexiadis et al (2017) Image quality reconstruction; Kinect data, watertight geometry and Poisson

Generally, the researchers set out to describe the key elements of a system that tracks and captures real-time 3-D images, including skeletal motion. They propose a novel system for 3-D reconstruction that is replicable, since the elements used are widely available. Limitations, such as the imperfect synchronization of the RGB cameras, are discussed and expounded. Further, the researchers recommended areas of improvement such as visual quality and frame rates.

For instance, a pre-scanned user's face can be used to reconstruct the face, since it is one of the most important parts of the body reconstruction.

For the experiment, the Kinect sensor was the most important component, since it was able to correctly recreate and provide high-quality images.

In the field of motion tracking, researchers like Tahavori et al (2013) have also contributed substantially to testing the technical capabilities of the Kinect sensor. In their research, they used both Kinect for Windows and Kinect for Xbox to determine the better device for detecting respiratory motion in patients. The result was that Kinect for Windows gave more accurate detection, with errors of less than 2 mm. The goal of the experiment was to use the Kinect to measure the depth distribution over the body of the patient, which would then allow monitoring of the patient's motion. The researchers wanted to know the potential of the Kinect for measuring and detecting respiratory motion.

To investigate the technical capabilities of the Kinect, the researchers used a planar object mounted on an optical rail; the rail was used to ensure precision of measurement. They also made use of the Gail motion controller to investigate respiratory displacement. Volunteers took part in the experiment.

To compare the technical capabilities of the Kinect for Xbox and the Kinect for Windows, both were mounted on the rail.

The graph below shows the performance of the two devices.

Figure 42- Tahavori et al (2013) Kinect for Xbox vs Windows

The above data was analysed using Matlab and the Kinect SDK programming package.

To reduce noise, the data was averaged over 1000 depth frames. For the experiment, the distance was varied over a range of 40-140 cm, and data was recorded for both devices in normal and near mode. As seen from figure 43, both devices have a lower limit of 50 cm.
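A minimal sketch of that 1000-frame noise-averaging step (hypothetical data; the frame count follows the description above, and Kinect v1 depth frames are 640 × 480 pixels):

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Average n_frames raw depth frames pixel-wise to suppress sensor noise. */
static void average_depth(const uint16_t *const *frames, int n_frames,
                          size_t n_pixels, double *out_mm)
{
    for (size_t p = 0; p < n_pixels; p++) {
        double sum = 0.0;
        for (int f = 0; f < n_frames; f++)
            sum += frames[f][p];
        out_mm[p] = sum / n_frames;
    }
}

int main(void)
{
    /* tiny hypothetical example: 3 frames of 4 pixels each */
    const uint16_t f0[4] = {800, 801, 799, 802};
    const uint16_t f1[4] = {801, 800, 800, 801};
    const uint16_t f2[4] = {799, 802, 801, 800};
    const uint16_t *frames[3] = {f0, f1, f2};
    double out[4];
    average_depth(frames, 3, 4, out);
    printf("pixel 0 average: %.1f mm\n", out[0]);
    return 0;
}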

The results showed that Kinect for Windows had higher accuracy and precision than the Xbox Kinect. To further check the performance of the Kinect for Windows, the researchers conducted further tests using a rectangular box measuring 20 cm × 20 cm × 5 cm. The box was placed on the rail and the distance was varied around the range of 80-100 cm. The box was then moved away from the sensor in steps of 2 mm, 5 mm and 10 mm, both in normal and near mode. It was concluded that the Kinect sensor for Windows in near mode showed an error of <1 mm.

The researchers also analyzed the rotational accuracy of the Kinect sensor, using a rectangular object placed in front of it.

To estimate the accuracy of the Kinect sensor, the known rotational difference between the Kinect sensor and the test object was compared with the measured one. The results obtained were as below:

Ground Truth    Normal Mode    Near Mode
3°              1.4°           2.5°
4°              2.4°           4.8°
7°              3.6°           6.8°

Table 2- Tahavori et al (2013) Normal mode vs near mode rotational results

Research also exists on other movement detection aspects, such as pose estimation. A study conducted by Sengupta and Ohya (1996) showed how multiple cameras can be effectively used to analyze a person's pose. The aim of the paper was to introduce a method of easily obtaining an approximation of the pose of a 3-D or 2-D image.

The researchers make use of a 3-D CAD model on which they hypothesize a set of points using a spatial extent function. The hypothesized points are then used to derive the pose parameters.

In the experimental setting, the researchers assume that everything in the space is 3-D and modelled in advance. To estimate the pose of a human body image, a new two-stage scheme is proposed, as shown in the image below.

Figure 43- Sengupta and Ohya (1996) Two staged pose estimation illustration

Figure 44- Sengupta and Ohya (1996) back projection method estimation

The process of approximating the pose first involves processing the images obtained from the multiple cameras, as seen in figure 46. The images are obtained by background subtraction and thresholding. To obtain a pose estimate, the image is solved as a 3-D image and not as a 2-D CAD image. The 3×4 camera calibration matrix is obtained and is then used to calculate the back-projected ray.

When finding the approximate pose estimate, the researchers obtain a 3×3 rotation matrix and a 3×1 translation vector which, when calibrated, map a 3-D point with coordinate values X in the CAD model. The exact mapping between the three non-collinear points is projected with volume V.
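As a small illustration of the back-projection step, the following sketch computes a viewing-ray direction for one pixel using the standard relation d = Rᵀ K⁻¹ (u, v, 1)ᵀ; the calibration values are illustrative assumptions, not those of the paper:

#include <stdio.h>

int main(void)
{
    double fx = 525, fy = 525, cx = 256, cy = 212;       /* intrinsics */
    double R[3][3] = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};  /* camera rotation */
    double u = 300, v = 180;                             /* pixel of interest */

    /* K^{-1} * (u, v, 1): ray direction in the camera frame */
    double dc[3] = {(u - cx) / fx, (v - cy) / fy, 1.0};

    /* rotate into the world frame: d = R^T * dc */
    double d[3];
    for (int i = 0; i < 3; i++) {
        d[i] = 0.0;
        for (int j = 0; j < 3; j++) d[i] += R[j][i] * dc[j];
    }
    printf("ray direction (%.3f, %.3f, %.3f)\n", d[0], d[1], d[2]);
    return 0;
}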

Experimental results showed that the cameras could successfully extract the edges of the model and transform them into the appropriate pose parameter set defined by the researchers.

Results from this experiment provided the basis upon which several later experiments were conducted in regard to Kinect development.

To illustrate the pose estimation technique, the researchers conducted the experiment in a controlled environment with a CAD model of a human head positioned at equal intervals from the cameras in a semicircle. The edges of the models were found using a zero-crossing edge detector. For the experiments, the silhouette was separated manually, and transformation parameters were calculated for each rigidity constraint. The figures obtained from the experiments are shown below.

Figure 45- Sengupta and Ohya (1996) images used for the experiment

Figure 46- Sengupta and Ohya (1996) extracted silhouette images

Figure 47- Sengupta and Ohya (1996) rendered images from the parameter set

Figure 48- Sengupta and Ohya (1996) rendered images of the transferred model

Through the experiment, the researchers presented a theoretical technique of pose estimation. The designed algorithm could extract and estimate the edges of the silhouettes through use of a spatial extent function. To verify the pose parameters, each image was projected onto the model images, which leads to better refining of the pose parameters.

Finally, a stable value is reached by repeating this process until a reasonable pose is obtained.