Non-contact object measurement has a long history of applications in several fields of industry and study. A few applications include measuring industrial fractures/cracks [1], measuring plant/leaf size [2], wound morphology [3], forensics [4], and archaeology [5]. Non-contact object measurement refers to measuring objects via a method or device that does not interrupt or come into contact with the object of focus. This can be done either with a device in real time, or via software after an image of the object is captured. To measure captured images, a reference or proxy marker is often used to spatially calibrate the resolution of the image [6]. In this sense, a graduated device or ruler placed in close proximity can also be used to better spatially comprehend the contents of an image. Therefore, a reference marker, specifically a ruler, can be used to spatially register the size of the contents in an image. For these images, the digital measurement in pixels (px) and the spatial reference marker, such as a ruler, need to be captured in the same plane and then measured. A common metric combining these two measurements is dots per inch (DPI), the number of pixels per inch.
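The calibration described above reduces to a simple ratio. As a minimal sketch (the function name and values are illustrative, not part of the system described here):

```python
def pixels_per_unit(px_distance: float, real_distance: float) -> float:
    """Spatial calibration ratio: pixels spanned per unit of real-world length."""
    if real_distance <= 0:
        raise ValueError("real_distance must be positive")
    return px_distance / real_distance

# A one-inch span of a ruler covering 300 px in the image gives 300 DPI;
# the same span expressed in millimetres (25.4 mm) gives the DPM equivalent.
dpi = pixels_per_unit(300, 1.0)    # 300.0 dots per inch
dpm = pixels_per_unit(300, 25.4)   # ~11.81 dots per millimetre
```

Once this ratio is known, any pixel measurement taken in the same plane can be converted to real-world units.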
Measuring an image in pixels often involves manual work somewhere in the pipeline, and maintaining a confident level of accuracy and consistency makes the task time consuming, setting aside the capabilities of semantic segmentation [8] for the moment. Dynamically or automatically achieving this level of image calibration would overcome two limitations. The first is that manually measuring each image in a database can consume many hours of human labor. The second is that manual measurement introduces subjectivity into a task that is objective in nature, meaning different people will obtain different measurements for the same image. The weight of this second hurdle can be judged by considering each of the aforementioned applications' individual needs for consistently accurate measurements.
In this work, we chose to implement non-contact object measurement for images containing rulers, primarily because rulers are well standardized and readily available in common practice. This system can be used either to automatically convert pixel measurements for an image containing an object and a ruler, or to manually measure elements using the generated graduation-to-pixel ratio. If pixel measurements are provided, the system can take the object's morphological data, measured in pixels, and convert it to whichever graduation system was provided.
This system is capable of calculating the graduation-to-pixel ratio of an image in DPI or DPM (dots per millimeter), provided the image contains a ruler. To do this, we created a heuristic technique for approximating the distances between the graduations on a ruler. In this work, the measurement in pixels from one graduation to the next is taken as the mode of the spatial frequencies along a region of the ruler, i.e., the spacing that occurs most often in the set of spatial frequencies.
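The mode-of-spacings idea can be illustrated as follows; the function name and the tick positions are hypothetical, not taken from the system itself:

```python
from collections import Counter

def graduation_spacing_mode(tick_positions_px):
    """Distance (px) between successive graduations that occurs most often."""
    spacings = [b - a for a, b in zip(tick_positions_px, tick_positions_px[1:])]
    return Counter(spacings).most_common(1)[0][0]

# Hypothetical x-coordinates of detected graduation lines; one spacing (13 px)
# is slightly off due to noise, but the mode is robust to such outliers.
ticks = [10, 22, 34, 46, 59, 71, 83]
spacing = graduation_spacing_mode(ticks)   # -> 12
```

Taking the mode rather than the mean keeps occasional noisy or missed graduations from skewing the recovered spacing.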
Hough transforms have been cited as perhaps the most popular technique for measuring rulers in images to determine scale and measure objects. Hough lines are extracted following a methodology similar to edge detection and often require some initial metric of edges to retrieve a result. Although this is a very straightforward solution to the problem, it typically requires specific parameters for different rulers and images. One system uses the Hough transform to detect measurement tools such as rulers and concentric circles of fixed radii [10]. This system works by gathering several regions of vertical lines with respect to a large horizontal line. Its filtering examines the sample mean, variance, and standard deviation to evaluate the weighted input information for graduations on a ruler. The specified inputs to the system are the type of ruler, the graduation distances, and the minimum number of elements to be considered [10]. For the two images listed in the results section of their paper, the differences between manual and semi-automatic measurements ranged from 0.027 mm to 2.3 mm.
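As an illustration of the underlying idea (not of the cited system), the special case of strictly vertical graduations reduces the Hough accumulator to a column-wise vote count; real implementations, e.g., OpenCV's `HoughLinesP`, generalize this to arbitrary angles, and the vote threshold below is exactly the kind of per-image parameter these methods require:

```python
import numpy as np

# Synthetic binary edge map of a ruler strip: vertical graduations every 12 px.
edges = np.zeros((40, 200), dtype=np.uint8)
edges[:, 10:200:12] = 1

# For strictly vertical lines, the Hough accumulator degenerates to one bin per
# column: every edge pixel votes for the line x = its column index.
votes = edges.sum(axis=0)
threshold = 30                          # minimum votes to accept a line (tunable)
tick_xs = np.flatnonzero(votes >= threshold)   # columns accepted as graduations
```

A threshold suited to one ruler's contrast, tick height, and resolution may fail on another, which is the parameter-tuning drawback noted above.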
Another method, which uses the Hough transform in forensics [11], extracts the spatial frequencies of a ruler by first applying a two-dimensional Discrete Fourier Transform (2-D DFT) modeled after a known ruler. This methodology and Calatron et al.'s [10] method are largely similar in how they extract the graduation distances, although here the evaluation is done using a block-based DFT. This method is capable of sampling at a sub-pixel level, which is a necessity when assessing some forensic samples such as fingerprints [11]. However, this method appears to work only for forensic rulers, which have very distinctive features. Our system was not targeted at one particular reference ruler; therefore, we chose to forgo Hough transforms, as they would narrow the solution to a specific ruler category or type.
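The frequency-domain idea can be sketched in one dimension: project the ruler onto its long axis and read the graduation spacing off the dominant peak of the DFT magnitude spectrum. The cited method works on 2-D blocks; the profile below is synthetic and the variable names are illustrative:

```python
import numpy as np

# Synthetic 1-D intensity profile of a ruler: 3 px wide dark ticks every 12 px.
n = 480
profile = np.ones(n)
for start in range(0, n, 12):
    profile[start:start + 3] = 0.0

# Remove the DC component, then locate the dominant spatial frequency.
spectrum = np.abs(np.fft.rfft(profile - profile.mean()))
freqs = np.fft.rfftfreq(n)              # cycles per pixel
peak = freqs[np.argmax(spectrum)]       # fundamental of the graduation comb
spacing_px = 1.0 / peak                 # recovered graduation spacing, ~12 px
```

Because frequency can be estimated between DFT bins (e.g., by interpolating around the peak), this family of methods is what enables the sub-pixel precision mentioned above.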
Object detection and recognition have been major topics in computer vision, with a variety of solutions addressing problems such as character recognition, semantic segmentation, self-driving cars, and visual search engines [12]. In brief, object detection is a way to identify whether or not instances of an object are contained in an image [17], and object recognition is a way to classify an object given a label that characterizes it [18]. State-of-the-art solutions for object detection and recognition today rely on one or a few combinations of deep neural network techniques. In particular, region proposal based Convolutional Neural Networks (R-CNN) [19] are among the most proficient. Others, such as You Only Look Once (YOLO) [20], Fast YOLO [21], Single Shot Detector (SSD) [22], Neural Architecture Search Network (NASNet) [23], and Region-based Fully Convolutional Network (R-FCN) [24], apply different approaches. These discoveries and optimizations, along with advances in R-CNNs, all occurred in a relatively short period of just a few years. Enhancements to the standard R-CNN architecture, such as Fast R-CNN [25], Faster R-CNN [26], and Mask R-CNN (MRCNN) [27], have led to improved speed, accuracy, and other benefits in the realm of object detection and recognition [28].
Year over year, smartphone devices and cameras are used for an increasing number of smart technologies and applications. Our heuristic system uses a combination of methods as a novel solution for detecting and measuring rulers to scale objects in the same plane. The capacity of our system is made possible by taking advantage of the high-resolution images that have become the norm over the past few years. Our proposed method can be split into two distinct pieces in order to achieve heuristic analysis for in-plane non-contact calibration of rulers. First, we perform semantic segmentation by adapting the Mask R-CNN architecture, which extracts the area containing a ruler. In Figure 1, this area is represented as “Segmented Result”, which is where the proposed heuristic method begins to assess the ruler. We then perform the in-plane non-contact measurement of the segmented ruler, returning a calibrated measurement; the architecture in this figure represents the proposed method end to end. In Section 3 and Section 4, we expand on the output of the “Pixel Conversion” region, which is an integer given as pixels per unit of measurement, e.g., pixels per millimeter.
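In other words, once the pixels-per-unit ratio is recovered from the segmented ruler, any in-plane pixel measurement converts directly to real-world units. A minimal sketch, with an illustrative ratio (the function name and values are not from the actual system):

```python
def to_millimetres(length_px: float, px_per_mm: float) -> float:
    """Convert an in-plane pixel measurement using the recovered ratio."""
    if px_per_mm <= 0:
        raise ValueError("px_per_mm must be positive")
    return length_px / px_per_mm

# Suppose the pipeline recovered ~11.8 px/mm (roughly a 300 DPI image), and an
# object lying in the same plane as the ruler measures 236 px across.
width_mm = to_millimetres(236, 11.8)    # 20.0 mm
```

This conversion is only valid for measurements taken in the same plane as the ruler, as stated above.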
In this paper, we proposed a new system for automated calibration of images when a ruler is present in the scene. This is done by extracting the ruler and measuring its graduations as a spatial frequency. The system produces accurate and reproducible results, removing the tedious manual labor required to perform metrological tasks. The robust and extensible OpenCV packages allowed us to formulate and execute the image transforms necessary to move through the data extraction pipeline. Aside from OpenCV, we created the initial/deep search from scratch instead of applying pre-built methods, which allowed us to fine-tune the search and consider broader scenarios. We used Mask R-CNN to produce satisfactory ruler masks; with this technique, we were able to remove the majority of uncertainty when handling rulers with widely varying backgrounds. Our system samples the image data at the pixel level, mainly to broaden the range of inputs the system can handle, namely different resolutions. Sampling at a sub-pixel level requires assumptions about the recording device's parameters to be made and manually input to the system. To overcome this difficulty, we trained Mask R-CNN to perform semantic segmentation on rulers, normalizing the segmented masks.
Rather than relying on Hough line transforms or Fourier transforms to extract the graduations on a ruler, at the cost of the ability to perform sub-pixel measurements, we iteratively reduce the search space of the graduations and search for spatial frequencies corresponding to the ruler's graduations. This provides reproducible results on a wide variety of images. From end to end, the extracted pixel-to-metric ratios, or DPI/DPM, can be found on average in 5.6 s.
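The iterative search-space reduction can be sketched as a coarse-to-fine scan over candidate graduation spacings, scoring each candidate against a 1-D ruler profile. The scoring function, candidate ranges, and step sizes below are simplified stand-ins for the actual initial/deep search, not its implementation:

```python
import numpy as np

def score(profile, spacing):
    """How well dark pixels align with a comb of this spacing (higher is better)."""
    idx = np.arange(0, len(profile), spacing)
    return -profile[idx].mean()          # graduations are dark -> low intensity

def coarse_to_fine(profile, lo=4, hi=40):
    # Initial search: evaluate only every 4th candidate spacing.
    coarse = range(lo, hi, 4)
    best = max(coarse, key=lambda s: score(profile, s))
    # Deep search: refine around the coarse winner only, shrinking the space.
    fine = range(max(lo, best - 3), min(hi, best + 4))
    return max(fine, key=lambda s: score(profile, s))

# Synthetic profile with dark graduations every 12 px.
profile = np.ones(480)
profile[::12] = 0.0
spacing = coarse_to_fine(profile)        # -> 12
```

Skipping most candidates in the coarse pass and only refining near the winner is what keeps the total number of evaluated candidates, and hence the runtime, low.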
In the future, we look forward to optimizing the system, further reducing the search time and space of the deep search stage. We are currently working on implementing several methods to decrease the number of search candidates that are unlikely to lie in the region of the ruler's graduations. The system today stores many search results in case the heuristic model points backwards, which ultimately leads to many empty or wasted searches. Additionally, we look forward to finding a solution for halting the system early, potentially prior to performing either the initial or the deep search, skipping the deep search when possible. This would lead to faster results from our system.