Generalized Vision-Based Detection, Identification and Pose Estimation of Lamps for BIM Integration

This paper introduces a comprehensive computer-vision approach for the automatic detection, identification and pose estimation of lamps in a building using image and location data from low-cost sensors, allowing their incorporation into the building information modelling (BIM). The procedure is based on our previous work, but the algorithms are substantially improved by generalizing the detection to any light surface type, including polygonal and circular shapes, and by refining the BIM integration. We validate the complete methodology with a case study at the Mining and Energy Engineering School and achieve reliable results, increasing the number of successful real-time detections while using low computational resources, which leads to an accurate, cost-effective and advanced method. The results demonstrate the suitability and adequacy of the method.


Introduction
Lighting accounts for approximately 19% of the electricity consumed worldwide [1], but considerable savings can be achieved by replacing inefficient lighting sources [2,3]. Indeed, over the past decade, the worldwide demand for artificial lighting increased at an average rate of 2.4% per year [1]. In buildings, artificial lighting is a significant contributor to energy consumption and costs and is the largest consumer of electrical energy, accounting for approximately one-third of the electricity used [3][4][5]. Therefore, knowledge of the real lighting inventory and conditions and adequate management of lighting systems are crucial when addressing energy conservation measures (ECMs) [5]. Not only does this knowledge allow us to reduce energy consumption, but it can also save money for the building's owners [3]. Consequently, the building lighting must be accurately known and then reliably integrated into the building information modelling (BIM).
BIM is a technology widely recognized and increasingly investigated in the architecture, engineering and construction (AEC) industry [6][7][8]. BIM can be defined as "a set of interacting policies, processes and technologies producing a methodology to manage essential building design and project data in digital format throughout the building's lifecycle" [9]. It represents the digital model of the building as an integrated and coordinated database that enables sharing and transferring information about the whole building [8]. BIM tools are designed mainly for the analysis of multiple performance criteria, including lighting as a main issue [7,8,10]. Typically, BIM software internally implements a lighting condition analysis, differentiating between natural and artificial lighting [7]. However, the main obstacle is the lack of accurate information [7]. The work presented in this article tries to solve this issue by looking for new methods that allow the accurate identification of lamps and their state.
Although [...] Lastly, the pose is refined using D²CO [17]. In the second step, a clustering operation is performed on the set of individual detections, and a centre is calculated for each of the resulting clusters, leading to a collection of localized objects. In the last step, the information from the detected objects is inserted into the BIM model of the building, assigning the detections to the corresponding space. We introduce the following major enhancements to our previous work [14]: (i) the generalization of the shape and pose estimation to automatically detect polygonal shapes with different numbers of sides and elliptical shapes; and (ii) the use of the available BIM information in the final insertion step by means of a surface projection method. These improvements yield more refined results and provide a wider range of application.
The complete system and each of the custom algorithms presented in this work have been developed in C++, with the help of the following supporting software libraries: OpenCV [26] for general artificial vision algorithms, OpenMesh [27] to read and process the 3D geometric information of object models, Ceres Solver [28] to solve the different optimization problems involved in the method, and OpenGL [29] to obtain the occlusion information on the 3D projections.

Generalized Shape and Pose Estimation
In our previous work [14], we introduced an algorithm to obtain the shape and the pose of objects projecting a quadrilateral on the image. Here, we generalize the shape estimation to automatically detect the number of sides of the final polygon, with the possibility of also detecting elliptical shapes, and introduce the necessary changes to the pose estimation to be compatible with either polygonal or elliptical shapes. We use the term pose to denote a rigid transformation of an object, composed of a vector in $\mathbb{R}^3$ that determines the translation and a vector in $\mathfrak{so}(3)$ (the Lie algebra associated with the special orthogonal group $SO(3)$) that determines the orientation.
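To make the pose representation concrete, the following minimal sketch converts a rotation vector into a rotation matrix with Rodrigues' formula and applies the resulting rigid transformation to a point. The function names `rotation_matrix` and `apply_pose` are hypothetical and are not taken from the original implementation, which was written in C++.

```python
import math


def rotation_matrix(omega):
    """Rodrigues' formula: rotation vector (unit axis times angle) -> 3x3 matrix."""
    wx, wy, wz = omega
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    if theta < 1e-12:
        # A zero rotation vector corresponds to the identity rotation.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = wx / theta, wy / theta, wz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def apply_pose(translation, omega, point):
    """Apply the rigid transformation x -> R(omega) x + t to a 3D point."""
    rot = rotation_matrix(omega)
    return tuple(
        sum(rot[r][k] * point[k] for k in range(3)) + translation[r]
        for r in range(3)
    )
```

For example, a rotation vector of (0, 0, π/2) turns the x axis into the y axis before the translation is added.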

Polygon Estimation
The method presented in [14] aims to obtain an estimation of the shape of a polygon with a fixed number of sides $k$ based on an initial contour with $n > k$ sides. The method is an extension of the work of Visvalingam et al. [30] for strictly inner, strictly outer, or general polygons, based on a predefined score function. Here, we use an area-based score function to detect polygons with an arbitrary number of sides based on a threshold $a_{\max}$ as the termination criterion. This method is presented in Algorithm 1 with the additional functions in Algorithm 2. We use the method of Sklansky [31] to make the initial contour convex. The method stops when the next best area relative to the original contour area is greater than $a_{\max}$. This method is based on the fact that the reduction of the area should be relatively small until the final number of sides is reached, at which point there should be a noticeable increase in the area reduction.

Algorithm 1 Fit polygon.
Require: P = {p_k} is a sequence of n points
         a_max is the area threshold to stop removing sides
Ensure: [...]
    [...]
    for k ← 1, len F do
        r_k ← INNERSCORE(F, k)  (Algorithm 2)
        s_k, q_k ← OUTERSCORE(F, k)  (Algorithm 2)
    end for
    while true do
        i ← arg min{r_k} ; j ← arg min{s_k}
        [...]
        if a/A > a_max then
            break
        end if
        l ← arg min(r_i, s_j)
        REMOVEELEMENT(R, l) ; REMOVEELEMENT(S, l) ; REMOVEELEMENT(F, l)
        if s_i < r_j then
            f_l ← q_l
        [...]
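The area-based termination criterion can be illustrated with a simplified sketch. It assumes that only vertices are removed (the strictly inner case) and scores each vertex by the area of the triangle it forms with its two neighbours. The outer score and the replacement points q_k of Algorithm 2 are not modelled, and the names `fit_polygon_inner`, `triangle_area` and `polygon_area` are hypothetical.

```python
def polygon_area(points):
    """Absolute area of a closed polygon (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def triangle_area(a, b, c):
    """Area of the triangle spanned by three 2D points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def fit_polygon_inner(contour, a_max):
    """Remove the cheapest vertex until the relative area loss exceeds a_max."""
    pts = list(contour)
    original = polygon_area(pts)
    while len(pts) > 3:
        n = len(pts)
        # Area lost if vertex k is removed from the (convex) polygon.
        scores = [
            triangle_area(pts[k - 1], pts[k], pts[(k + 1) % n]) for k in range(n)
        ]
        k_best = min(range(n), key=scores.__getitem__)
        if scores[k_best] / original > a_max:
            break  # The next removal would change the shape noticeably.
        del pts[k_best]
    return pts
```

With a convex contour made of a square plus one slightly bulging point on each side, the four bulging points are removed at a small area cost, and the loop stops as soon as removing a true corner would discard a large fraction of the original area.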

Shape Estimation
The polygon estimation method is included in a more general shape and pose estimation technique presented in Algorithm 3. First, we obtain a coefficient to determine if the shape is polygonal or elliptical based on a predefined threshold $s_{th}$. In the first case, we estimate the polygon using Algorithm 1; in the second case, we use the method introduced by Fitzgibbon and Fisher [32] to obtain the final shape parameters.

Algorithm 3 Fit shape.
Require: P = {p_k} is a sequence of n points
         a_max is the area threshold to stop removing sides
         s_th is the shape coefficient threshold above which the shape is considered a polygon
         M = {m_i} is a set of object models
         C are the parameters of the camera model
    [...]
        for all m_i ∈ M do
            if m_i has a non-circular shape then
                [...]
            end if
        end for
    else
        [...]
        for all m_i ∈ M do
            if m_i has a circular shape then
                Π_i ← ESTIMATECIRCULAR(F, P, C, m_i)  (Section 2.1.3)
            end if
        end for
    end if
    return F, C
end function

The shape coefficient $s$ is obtained based on the circularity [33] of the shape as follows:
$$ s = \frac{p^2}{a} \qquad (1) $$
where $p$ is the shape perimeter and $a$ is its area. The aim is to obtain higher values for polygons compared to those for ellipses.
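The classification step can be sketched as follows, assuming that the coefficient takes the form s = p²/a (an inverse circularity without the 4π factor), which is consistent with the threshold range reported in the results. The function names and the example threshold of 28.0, chosen inside that reported range, are illustrative assumptions.

```python
import math


def perimeter(points):
    """Perimeter of a closed polygon given as a list of 2D vertices."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def polygon_area(points):
    """Absolute area of a closed polygon (shoelace formula)."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def shape_coefficient(points):
    """Assumed form of the shape coefficient: squared perimeter over area."""
    return perimeter(points) ** 2 / polygon_area(points)


def is_polygonal(points, s_th):
    """Shapes with a coefficient above the threshold are treated as polygons."""
    return shape_coefficient(points) > s_th
```

Under this assumption a circle gives the minimum value 4π ≈ 12.57 and a square gives 16, so only elongated rectangles, such as an 8 × 1 light surface with s = 40.5, exceed a threshold in the reported range.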

Pose Estimation
We use two different methods to estimate the pose based on the image shape. In the case of a polygon, we solve a PnP (Perspective-n-Point) problem using an iterative method based on the Levenberg-Marquardt optimization [34,35] as described in [14]. However, if the shape is elliptical, we do not have a direct correspondence between points in 2D and in 3D. We could use the four axis points from the projected ellipse, but Luhmann [36] showed that the eccentricity in the projection of circular target centres should not be ignored in real applications. Therefore, we have to modify the classic PnP problem to account for the absence of a direct correspondence. Using the contour points from the image, we formulate a minimization problem based on the distance of the projected image points on the circle plane to its circumference.
Let $E$ be an ellipse with a centre $q_E = (u_E, v_E)$, a semi-major axis of length $a$ and a semi-minor axis of length $b$, rotated by an angle $\theta$. Let $C$ be a circle for which $E$ is a projection on the image plane, with a centre $p_C = (x_C, y_C, z_C)$ and a radius $R_C$, included in the plane $P$ with a unit normal vector $\hat{n} = (x_n, y_n, z_n)$. Let $K$ be the matrix of the intrinsic parameters of the camera:
$$ K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} $$
with focal lengths $f_x$ and $f_y$, and principal point offsets $c_x$ and $c_y$. For each point $q_i = (u_i, v_i)$ on the contour of the ellipse, we can obtain its corresponding position $p_i = (x_i, y_i, z_i)$ in the camera coordinate system on the plane $z = 1$ as
$$ p_i = K^{-1} (u_i, v_i, 1)^T = \left( \frac{u_i - c_x}{f_x}, \frac{v_i - c_y}{f_y}, 1 \right)^T $$
Let $L$ be the projection line from the camera origin to the point $p_i$. The intersection point $p'_i$ between this line and the circle plane is given by their corresponding equations, $L: p = t\, p_i$ and $P: \hat{n} \cdot (p - p_C) = 0$, so that
$$ p'_i = \frac{\hat{n} \cdot p_C}{\hat{n} \cdot p_i}\, p_i $$
Then, for each point, we try to minimize the distance from its projection to the circumference:
$$ \min_{p_C, \hat{n}} \sum_i \left( \lVert p'_i - p_C \rVert - R_C \right)^2 $$
As for the classic PnP problem, we solve the minimization using an iterative method based on the Levenberg-Marquardt optimization [34,35]. The constraint on the unit normal vector is taken into account by performing a local parameterization of $\hat{n}$ in the tangent space of the unit sphere.
To improve the convergence of the method, we adopt the following initial guess of $p_C$ and $\hat{n}$: [...] where $p_E$ is the corresponding position of $q_E$ in the camera coordinate system on the plane $z = 1$ and $m = (m_x, m_y, 0)$ is a unit vector along the direction of the minor axis of the projected ellipse. Lastly, we obtain the rotation vector from the resulting unit normal vector of the plane as follows: [...]
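The residual minimized for circular lamps can be sketched as below. This is a reading of the description above rather than the original C++ code: each contour pixel is back-projected to the plane z = 1, the ray through it is intersected with the candidate circle plane, and the residual is the distance from that intersection to the circumference. The function names are hypothetical, and the optimizer itself (Levenberg-Marquardt with a parameterization of the unit normal) is not included.

```python
import math


def back_project(q, fx, fy, cx, cy):
    """Pixel (u, v) -> point on the plane z = 1 in camera coordinates."""
    u, v = q
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def intersect_plane(ray, p_c, n):
    """Point where the ray t * ray from the origin meets the plane n.(p - p_c) = 0."""
    denom = sum(a * b for a, b in zip(n, ray))
    t = sum(a * b for a, b in zip(n, p_c)) / denom
    return tuple(t * r for r in ray)


def circle_residuals(contour, intrinsics, p_c, n, radius):
    """Signed distance of each projected contour point to the circumference."""
    fx, fy, cx, cy = intrinsics
    residuals = []
    for q in contour:
        p = intersect_plane(back_project(q, fx, fy, cx, cy), p_c, n)
        residuals.append(math.dist(p, p_c) - radius)
    return residuals
```

When the candidate centre, normal and radius match the circle that generated the contour, all residuals vanish; a least-squares solver would adjust the centre and the normal to drive the sum of squared residuals towards zero.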

Surface Projection in the BIM Integration
The BIM model of the building represents an additional source of information that can be used to improve the accuracy of the detections. Apart from the insertion of the new data exemplified in [14], we can also use the geometric information from the BIM model to extract a list of surfaces with spatial information and use them to adjust the positions of the detections. Assuming gbXML [37], an open schema created to facilitate the transfer of building data stored in BIM to engineering analysis tools, as the supporting format for the BIM information, we can obtain the required data by accessing the elements with path "gbXML/Campus/Surface/PlanarGeometry" in the XML tree.
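As an illustration of this access pattern, the following sketch reads the planar geometry of each surface with the Python standard library. Only the path quoted above comes from the text; the namespace URI and the `PolyLoop`, `CartesianPoint` and `Coordinate` elements, as well as the `id` and `surfaceType` attributes, are taken from the public gbXML schema and should be checked against the schema version of the actual model. The function name `read_surface_loops` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Default namespace used by gbXML documents (assumed from the public schema).
GBXML_NS = {"gb": "http://www.gbxml.org/schema"}


def read_surface_loops(xml_text):
    """Return (id, surfaceType, vertices) for every Surface with a PlanarGeometry."""
    root = ET.fromstring(xml_text)  # The root element is <gbXML>.
    surfaces = []
    for surface in root.findall("gb:Campus/gb:Surface", GBXML_NS):
        geometry = surface.find("gb:PlanarGeometry", GBXML_NS)
        if geometry is None:
            continue
        loop = []
        for point in geometry.findall("gb:PolyLoop/gb:CartesianPoint", GBXML_NS):
            coords = [float(c.text) for c in point.findall("gb:Coordinate", GBXML_NS)]
            loop.append(tuple(coords))
        surfaces.append((surface.get("id"), surface.get("surfaceType"), loop))
    return surfaces
```

Each returned vertex loop can then be reduced to a plane (one point and a unit normal vector) for the projection step.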
Given that the detected lamps are embedded in the ceilings of the building, we can perform a projection in the 3D space of each of the detections to the nearest building surface. Let $S = \{s_k\}$ be the set of the surfaces of a building model, each one with a unit normal vector $\hat{n}_k$ and a point $x_k$ included in the plane defined by the surface. Then, the surface in the model that is the closest to a point $p$ is given by
$$ K = \arg\min_k \left| \hat{n}_k \cdot (p - x_k) \right| $$
Then, the projected location $p'$ of a detection positioned at $p$, with the nearest model surface $s_K$ at a signed distance $d_K = \hat{n}_K \cdot (p - x_K)$ and with a unit normal vector $\hat{n}_K$, is
$$ p' = p - d_K\, \hat{n}_K $$
With this method, we can improve the location of the detections and, at the same time, assign the detections to the corresponding space in the building model based on the nearest surface. This is a more effective and simpler approach compared to the point-in-polyhedron test used in [14].
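The projection step can be sketched in a few lines, assuming that each surface is reduced to an unbounded plane described by a unit normal vector and one point on the plane, and that the nearest surface is the one with the smallest absolute point-to-plane distance. The function names `nearest_surface` and `project_to_surface` are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def nearest_surface(p, surfaces):
    """Index and signed distance of the plane closest to p.

    surfaces is a list of (unit_normal, point_on_plane) pairs.
    """
    best_k, best_d = None, None
    for k, (normal, x) in enumerate(surfaces):
        d = dot(normal, tuple(pi - xi for pi, xi in zip(p, x)))
        if best_d is None or abs(d) < abs(best_d):
            best_k, best_d = k, d
    return best_k, best_d


def project_to_surface(p, surfaces):
    """Move p along the normal of its nearest plane until it lies on that plane."""
    k, d = nearest_surface(p, surfaces)
    normal = surfaces[k][0]
    return k, tuple(pi - d * ni for pi, ni in zip(p, normal))
```

For a detection at (2, 1, 2.8) in a room with a floor at z = 0, a ceiling at z = 3 and a wall at x = 0, the ceiling is the nearest surface and the detection is moved to (2, 1, 3). The returned index also identifies the surface, and therefore the space, to which the detection is assigned.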

Description of the Experimental System
The acquisition of the experimental data took place in two locations at the Mining and Energy Engineering School of the University of Vigo in Spain. Figure 2 shows the geometry of the BIM model of this building. The two locations used for our tests are displayed in Figure 3. The first one consists of a corridor of a classroom area with rectangular lamps, while the second one is a hall with circular lamps. Both lamp types are embedded in the ceiling.
We used point clouds extracted with high-accuracy sensors as the ground truth for our experiments for the position of the lamps. These clouds are shown in Figure 4. The cloud in Figure 4a was obtained using a backpack-based inspection system based on LiDAR sensors and an inertial measurement unit (IMU), whose data were processed with simultaneous localization and mapping (SLAM) techniques [38,39]. The second cloud, in Figure 4b, was captured with a FARO Focus3D X 330 Laser Scanner from FARO Technologies Inc. (Lake Mary, FL, USA). The technical characteristics of both systems are presented in Table 1.
We obtained the greyscale images and the location data for the two places using a Lenovo Phab 2 Pro with Google Tango [40]. The images were extracted at an approximate rate of 30 frames per second and had an original resolution of 1920 × 1080 but were later downscaled to 960 × 540 before the processing to improve the speed of the method. The location data were obtained from the information provided by the IMU of the device combined with the visual features of the environment using advanced computer vision and image processing techniques to improve the accuracy of the motion tracking information [40]. Some statistics of the complete dataset of images and the two locations used in the experiments are displayed in Table 2. The acquisition process, depicted in Figure 5, was done at a walking speed of ≈1 m/s, positioning the camera at 1.5 m from the floor with a pitch of ≈60° with respect to the horizontal plane.
Regarding the 3D models, we added two new items to the ones presented in [14], corresponding to the lamps found in the locations of the experiments. With this addition, the geometric characteristics of all the elements in the database used for the experiments are shown in Figure 6, including the two new lamp models (Models 4 and 5). We keep the original three lamps to assess the identification capability of our system with additional models of similar geometries. The specifications of the lamp bulbs for each model are shown in Table 3.

Results and Discussion
We performed tests for each of the technical contributions presented in this work. In this section, we show their outcomes as well as the final values for the new case study described in Section 3. Figure 7 includes some examples of the detections for this new case study for each lamp type.

Generalized Shape and Pose Estimation
We verified the generalized polygon estimation technique presented in Section 2.1.1. Figure 8a shows the area ratio used to stop eliminating points in Algorithm 1 with respect to the number of sides for the shapes obtained from light surfaces with four sides. The light surface instances used in this test were obtained from the image test dataset, comprising a total of 1343 contours of rectangular lamps. As presented in Figure 8b, the great majority of the shapes were correctly classified as quadrilaterals, with an equal error rate (EER) of 0.003723. The few remaining shapes corresponded to very distorted light surface detections with a higher number of apparent sides.
The second verification corresponds to the generalized shape estimation. The results of the shape coefficient of Equation (1) for the subset of light surface shapes in the dataset are shown in Figure 9b. This subset contains 1343 shapes corresponding to the rectangular lamps and 4020 corresponding to the circular lamps. We can see that all shapes were correctly classified as polygonal or elliptical for this dataset when we selected a shape threshold of $21.437 < s_{th} < 35.869$.

Identification
These results are related to the identification of the specific lamp model among the ones registered in the database. As previously mentioned, there are a total of five lamp models in the database, resulting in five target and output classes in the classification problem. However, the input consists of instances of Models 4 and 5 only, while the others are kept to test the ability of the system to identify the correct lamp even in the presence of additional models, verifying the validity of the system in a more realistic case of a potentially larger database with additional elements not included in a specific area of the building. Figure 10a shows the confusion matrix for the individual detections with the five classes corresponding to the five lamp models for a total of 1335 and 4012 detections of the rectangular (Model 4) and circular (Model 5) lamps, respectively. We can see that all detections were correctly classified, and, even when the three additional models were included, none of the detections were incorrectly identified as one of these, as shown in the first three rows/columns of the confusion matrix. Moreover, there are no errors between Model 4 and Model 5, which is expected from the results of the shape type classification procedure, with 100% correct classifications in the last two rows/columns of the confusion matrix. Figure 10b illustrates the distribution of detections for each cluster. Some of the clusters for the circular lamps have a very low number of detections, due to blurred images caused by fast motion or to low ambient lighting conditions that make the target light appear too bright, removing important edge information from the surrounding area. Nevertheless, the average number of detections per cluster is 64.42, which is sufficiently high to compensate for the potential negative effect of outliers in the cluster.

Localization, State and Surface Projection
These results are intended to quantify the errors in the localization outcome and the improvements of the surface projection method. Figure 11 shows the positions of all the cluster centres obtained from the detections of our system as well as the reference values based on the high-accuracy point clouds with their corresponding ON/OFF state. There should be one detection per turned-on lamp, whereas the lamps that are turned off should not be registered by the system, so that the lamp state is correctly identified.
Figure 12 presents the confusion matrices for the lamp state of the rectangular lamps, the circular lamps, and both, where Class 0 corresponds to the OFF state and Class 1 to the ON state. As shown in Figure 12a, the state of all rectangular lamps was captured accurately, while Figure 12b shows that there were some errors for the circular lamps: three of them were incorrectly detected as OFF, while two were incorrectly detected as ON. Altogether, 95.7% of the lamps were assigned to the correct state, as represented in Figure 12c. Regarding the localization of the lamps, Figure 13 shows the distance from the detected to the reference lamp positions. We include the results with and without the surface projection step. We can see that the use of the surface projection method reduces the distance to the reference values when assigning the detections to the corresponding BIM space. As displayed in Table 4, the error was reduced by 2.94% for the rectangular lamps, 36.0% for the circular lamps and 26.3% for the entire dataset.

Conclusions
We have presented a complete method for the automatic detection, identification and localization of lamps so that they can be directly integrated into the BIM of the building. The method is based on our previous work, extending its applicability to a much wider range of lamp types and improving the method of integration into the BIM. We have applied this method to a completely new case study with different lamp models to assess the performance benefits and the enhanced versatility accomplished with the introduction of the novel contributions.
The results show that there is a high percentage of polygonal shapes correctly identified as quadrilaterals, with an EER of 0.003723. Moreover, all 5363 light surface contours in the dataset are accurately classified as either polygonal or elliptical. Finally, the identification of 5347 detections has a 100% success rate, even when three additional models are kept in the database. With respect to the lamp state, there is a high percentage of correct classification, with 95.7% of the lamps assigned to the appropriate state. Additionally, the distance between the detected and actual lamp positions in the building is 14.54 cm on average and is reduced to 10.71 cm if the surface projection step is included, which results in a 26.3% decrease in the location error. Considering all the results obtained in the experiments, we have verified that the method can be applied to the intended use cases and that the new additions lead to better results in terms of the identification and the localization.
Our method relies only on single-image information; thus, it cannot distinguish lamps with the same shape but different sizes. We are working on extensions to our methodology to overcome this limitation by leveraging the combined information of the same detection from different camera views and by using the available depth information provided by the Tango platform. Moreover, the BIM information is known beforehand and can be used in prior steps of the methodology. Therefore, we are working on methods to utilize this information earlier to better adjust the data to the specific model for each of the individual detections and improve the overall accuracy of the results.
Funding: The authors thank the Xunta de Galicia (Grant ED481A). This investigation was partially supported by the CANDELA project, through the Xunta de Galicia CONECTA PEME 2016 (IN852A/81).

Conflicts of Interest:
The authors declare no conflict of interest.
