Identification of Markers in Challenging Conditions for People with Visual Impairment Using Convolutional Neural Network

Abstract: People with visual impairment face many difficulties in their daily activities. Several studies have been conducted to find smart solutions that use mobile devices to help people with visual impairment perform tasks. This paper focuses on using assistive technology to help people with visual impairment navigate indoors using markers. The essential steps of a typical navigation system are identifying the current location, finding the shortest path to the destination, and navigating safely to the destination using navigation feedback. In this research, the authors propose a system that helps people with visual impairment navigate indoors using markers. In this system, the identification step is re-defined as a classification problem, and convolutional neural networks are used to identify markers. The main contributions of this paper are: (1) a system to help people with visual impairment in indoor navigation using markers; (2) a comparison of QR codes with Aruco markers showing that Aruco markers work better; (3) a convolutional neural network implemented and simplified to detect candidate markers in challenging conditions and improve response time; (4) a comparison of the proposed model with another model showing that it gives better accuracy for training and testing.


Introduction
People with visual impairment (PVI) have limitations in the function of their visual system. These limitations prevent them from seeing and from performing daily activities such as navigation or shopping [1][2][3][4][5]. It is therefore important to develop solutions that help PVI improve their mobility, protect them from injury, and encourage them to interact socially [6]. PVI also face challenges when navigating in shops and when accessing and identifying products [7]. While outdoor navigation is largely solved by the global positioning system, indoor navigation is still problematic [8][9][10][11].
As a result, multiple studies have been carried out to help PVI navigate indoors and identify products in shops or other objects using Wi-Fi, Bluetooth, RFID, NFC, and tags. These solutions differ in the types of signals used, the positioning method, and the accuracy [12][13][14][15][16][17][18][19]. They fall into three categories. Tag-based systems: such as radio-frequency identification (RFID) and near-field communication (NFC), which use wireless components to transfer data from a tag attached to an object for the purpose of automatic identification and tracking [20,21]. Computer-vision-based systems: these systems install or attach unique visual tags, such as quick response (QR) codes or augmented reality (AR) markers, to aid the navigation and recognition process [22][23][24][25]; other systems do not require tags to be installed, as they use computer vision (CV) techniques to analyze images and extract features that help in navigating and identifying objects [26][27][28][29][30][31][32][33]. Hybrid systems: these systems combine the strengths of two or more systems into a new one that delivers better accuracy and performance [34][35][36]. In this research, the authors have focused on CV techniques.
CV non-tag-based systems are cost-effective, as they need little or no infrastructure, and most of them can easily be installed on smartphones. However, they have several limitations. Their performance may be unpredictable in real-world environments, as image quality may be affected by factors including motion blur and changes in conditions such as illumination, orientation, and scale. These systems also require extensive computational power, which makes them unsuitable for real-time usage. It is also difficult for PVI to capture photos of good quality [26,37]. CV tag-based systems rely on detectable visual landmarks such as QR codes, barcodes, and AR markers, which improve precision, robustness, and speed [5,37]. However, CV tag-based techniques require a prior installation of tags in the correct places and the building of a map that stores their locations. Moreover, if many tagged items are installed in a small area, PVI may be confused by receiving information about all of them at the same time. Furthermore, tags not related to the system may confuse PVI. These tags are also susceptible to damage caused by movement through the supply chain or by the weather. In addition, it is difficult for a smartphone camera to detect CV tags if the PVI are moving fast, and the recognition rate decreases as the distance between the reader and the tags increases [38]. Based on our evaluation of the available technologies, the authors have concentrated on CV tag-based techniques.
Our goal is to build an indoor navigation system for PVI using CV tag-based techniques. A wide range of tags may be used, but QR codes and square markers are the most popular, as they provide four correspondence points, which are enough to perform camera position estimation. However, captured images in many real-life applications may suffer from problems such as motion blur, marker distortion, and marker occlusion, which make identification difficult. The authors propose a navigation system that improves marker identification in challenging conditions using a convolutional neural network (CNN). By using a CNN, the identification problem is converted into a typical classification problem.
This manuscript presents a system to help PVI navigate using markers. In this system, markers are printed on paper and installed at indoor locations. When a marker is detected during navigation using the smartphone camera, the system uses it as the starting position. Then, it searches for the shortest path from that node to the destination node. During navigation, whenever these markers are detected, navigation commands are sent as feedback to help the PVI. The main contributions of this article are the following:

•	Building a system to help PVI in indoor navigation using markers.
•	Making a comparison between multiple CV tags to select the best one. The results show that Aruco markers are the most fit for purpose.
•	Formulating the marker identification problem of the CV tag-based system as a classification problem and solving it using a convolutional neural network (CNN). The CNN has been implemented to detect candidate markers in challenging conditions. The authors have tested this system on real examples and achieved significantly high accuracy in marker identification.
•	Simplifying the CNN model to improve response time and make it suitable for real-time usage.
•	Comparing the authors' model with the model presented in [39] to evaluate the accuracy of training and testing.
The next part of this article is structured in the following way. Section 2 reviews the most relevant related works. Section 3 discusses the design of the proposed system. Section 4 presents the experiments carried out, while Section 5 draws conclusions and outlines future work.

Related Work
The main task of indoor navigation is to find the location of PVI and allow them to navigate safely in indoor environments such as public buildings and shopping malls. In recent years, smartphones have become useful because of their integrated cameras and various sensors. These technologies allow developers to build many applications that help PVI navigate indoors and avoid obstacles [40]. Additionally, multiple researchers have used CV to improve the quality of these applications by improving accuracy and making them suitable for real-time use. In the marker-based method, the system consists of a mobile device with a camera, markers, and a server. The camera is used to scan markers, while the server stores information such as the map [14]. In this section, the authors have focused on the available CV solutions based on QR codes and markers.

QR Codes
Ebsar is an Arabic system that provides indoor navigation for PVI by preparing the building and then guiding them using navigation commands [41]. First, the system constructs a graph in which each node represents a place or a checkpoint, and a QR code is generated for each. Each edge is labelled with the number of steps and the direction between the nodes it connects. To start navigating, the system seeks the node nearest to the PVI's location. After finding this node, it searches for the shortest path from that node to the destination node. During navigation, it provides Arabic voice feedback to the PVI using Google Glass. The use of Google Glass facilitates detecting QR codes and communicating with the user, which makes the system inexpensive and allows PVI to navigate hands-free. However, this system requires an internet connection to download the building graphs from the server. Moreover, it would be better to use markers that can be detected from a long distance rather than QR codes, and adding haptic feedback would enable operation in noisy environments.
AssisT-In used QR codes to help cognitively impaired people navigate inside new and complex environments [42]. QR codes are installed inside the building, and the user scans one of them as a starting point. After the desired destination is determined, the system calculates the shortest path to reach it. The system starts navigation by guiding the user from the start node through subsequent nodes until the destination node is reached. When the user scans a QR code during navigation, the system provides feedback as a text message in the voice of a virtual pet, such as a cartoon dog. If the user scans a QR code not belonging to AssisT-In, it notifies them to keep searching for the correct QR code. However, it is difficult for PVI to capture good-quality photos with a smartphone camera, as most photos may be blurry. Furthermore, if more than one QR code is detected at the same time, it would be better to select the one to use based on distance rather than randomly.
Zhang et al. proposed a navigation approach for a mobile robot in an indoor environment based on QR codes as landmarks [38]. These QR codes are placed in a grid-like distribution on the ceiling, and the system constructs a map of them. Furthermore, an industrial camera is added to the robot to rapidly identify the QR codes. With this configuration, the camera can detect at least one QR code in its field of view and can estimate the position of the robot. The proposed recognition algorithm can localize the robot accurately and is suitable for real-time tasks. However, the robot failed to recognize QR codes in a completely dark environment, and it is also hard to identify QR codes in adverse conditions such as motion blur and occlusion.
A smartphone system was developed to help PVI navigate in unknown indoor places using QR codes [43]. It starts by determining the type of the current position, then fetches the environmental information from colour QR codes using a simple CV algorithm based on their colour and edges. When in motion, the change in location is computed continuously using two inertial sensors, and the PVI's routes are recorded to guide them on the return route. During navigation, the system provides feedback using beeping or text-to-speech (TTS). The proposed method combined spatial language and virtual sounds to provide productive feedback, which leads to better performance and fewer navigation errors. The system only requires minor modifications to the environment, such as installing QR codes, and the coloured QR codes are easy to separate and identify against the background. However, only objects within 2.5 m were detected, which needs improvement, and in adverse conditions such as motion blur the system has difficulty identifying QR codes.
An Android navigation application that utilizes the smartphone's camera was introduced for PVI using QR codes [44]. QR codes intended for PVI are installed on the floor. Initially, the current location is defined by scanning one of the existing QR codes. Then, the application finds the shortest and most optimal path to the PVI's destination. During navigation, any deviation from the predefined path is detected and corrected, and all instructions are given to the PVI in audio form. The application provides automatic navigation on pre-defined paths, does not require any additional hardware, and is capable of scanning QR codes of different sizes in different challenging environments. However, instructions in both audio and haptic form should be added to increase performance and reduce navigation errors. Moreover, it would be better to use markers that can be detected from a long distance rather than QR codes.
Blind Shopping is a solution that offers a better shopping experience for PVI, with features including product search and navigation inside the store using voice messages [45]. The system combined an RFID reader on the tip of a white cane with mobile technology to identify RFID tags and navigate inside the shop. It provided a web-based management component for configuration, generating QR codes for product shelves and RFID tag markers attached to the supermarket floor. It also gives navigation feedback to PVI using voice commands via their smartphone. However, a Wi-Fi connection is required to retrieve data from an online database, and RFID tags and QR codes cannot be detected from a long distance.

Markers
Square markers are square-shaped tags, as shown in Figure 1. They have a thick black border, and the inner region contains images or binary codes represented as grids of black and white regions. The thick black border ensures quick detection on any surface. For example, in the case of Aruco markers [25], which are used in our system, detection is performed in two main steps: detection of the marker candidates and codification analysis.

Dash et al. proposed an AR system to be used in kindergartens for learning the alphabet by detecting the markers that may be present in a scene using a CNN [35]. Markers are printed on paper within a rectangular box. Children can then show them in front of the attached camera, and the system automatically renders the virtual object over the marker with the appropriate position and orientation. This system achieved high accuracy in marker identification and augmentation of the virtual objects, making it resistant to environmental noise and position variation. However, it may fail to detect markers from a long distance. Delfa et al.
proposed an approach for indoor localization and navigation using Bluetooth and the smartphone's embedded camera [46,47]. It operates in two modes: background and foreground. The background mode gives a low-accuracy position estimate using Bluetooth. The foreground mode provides high accuracy by using the smartphone's camera to detect visual tags deployed on the floor at known points. The system can detect tags in real time to estimate the PVI's position with a high level of accuracy and navigate towards the target. The markers' colours are chosen to differ from the colour of the floor to enhance speed and efficiency, as these colours guarantee high contrast between the floor and the tag's border. However, the system fails to detect markers from a long distance and cannot detect more than one tag at the same time. Moreover, adding haptic feedback would enable it to work in noisy places.
Bacik et al. presented an autonomous flying quadrocopter that uses a single onboard camera and augmented reality markers for localization and mapping [48]. The system is capable of estimating the position of the quadrocopter using a coordinate system defined by the first detected marker. To improve the robustness of marker-based navigation, it uses fuzzy control to achieve fully autonomous flight. However, the precision of the mapping approach and the response time require improvement, and the system also fails to detect markers from a long distance.

Materials and Methods
Navigating and tracking PVI and objects inside buildings faces many obstacles. While the Global Positioning System (GPS) is used for outdoor navigation with good results, it is not suitable for indoor use, as it may give inaccurate results. For example, it is impossible to use GPS to automatically determine which floor of a tall building the user is currently on. Using the mobile device's accelerometer and compass to track users has shown better results and proved more available than GPS tracking, but it requires a wireless connection or General Packet Radio Service (GPRS) to function. Tags can be used to ensure that even when PVI deviate from the correct path, the system can immediately re-establish the exact location and continue navigating. This manuscript proposes a system to help PVI navigate indoors using markers. First, tags are printed on multiple pieces of paper and placed at the specified locations of interest. Then, a graph is created in which nodes represent the markers' positions, and edges are labelled with the number of steps and the direction between the nodes they connect. The system starts by requesting the PVI to select their starting point based on the surrounding tags. When a marker is detected by the smartphone camera, the system uses it as the starting position.
To start navigation, the system asks the PVI to choose their destination using voice commands. Then, it searches the database for the shortest path from this point to the destination. This path is a list of checkpoints that the PVI should pass to arrive at their destination. During navigation, continuous guidance is given as they move from one point to the next. Figure 2 shows a diagram illustrating these steps.
The authors found that the essential components of this prototype are: (1) accurately finding the PVI's location at any time based on the installed tags, so that they can continue navigating safely; and (2) navigating from the starting point to the destination based on these tags. Therefore, the authors divided the proposed system into two parts: identifying the PVI's location and navigating to a destination.

Navigating to a Destination
The proposed system functions in two major phases: (a) preparing the building for PVI by installing markers at the specified locations and building a map of them; (b) guiding PVI during navigation using TTS. The following sections describe these phases in detail.

Building a Map
Before PVI navigate inside the building, a sighted person should construct a floor plan. This person moves around the building to explore the available paths between points of interest such as labs and classrooms. After that, markers are generated and attached to the walls at those points to accurately guide PVI. Figure 3 shows a floor plan of the fourth floor of the Faculty of Information Technology at the University of Pannonia in Veszprem, Hungary. As shown in Figure 3, the points of interest are marked with red circles. The authors then built an internal map using a graph to store the relations between these points. Our prototype used the number of steps between markers during map construction; to obtain these, the authors tracked a sighted person's movements between markers and recorded them on the graph edges. Figure 4 shows the constructed graph representation of the fourth-floor plan. Finally, the nodes and edges are stored in a database to be used by PVI during navigation.
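The map described above can be sketched as a simple adjacency structure in which each directed edge stores the recorded step count and walking direction. The node names, step counts, and directions below are hypothetical, not taken from the paper's floor plan.

```python
# Hypothetical fragment of a floor map: nodes are marker IDs, and each
# directed edge stores the step count and the direction to walk.
floor_map = {}

def add_edge(graph, a, b, steps, direction, reverse_direction):
    """Insert both directions of a corridor segment between markers a and b."""
    graph.setdefault(a, {})[b] = {"steps": steps, "direction": direction}
    graph.setdefault(b, {})[a] = {"steps": steps, "direction": reverse_direction}

add_edge(floor_map, "entrance", "corridor", 12, "forward", "backward")
add_edge(floor_map, "corridor", "lab_401", 8, "left", "right")
add_edge(floor_map, "corridor", "room_405", 15, "right", "left")

print(floor_map["corridor"]["lab_401"])  # {'steps': 8, 'direction': 'left'}
```

In the prototype, these nodes and edges would be persisted in the database rather than kept in memory.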


Navigation
Tracking the exact location of PVI indoors is impeded by many obstacles. While GPS is widely used for outdoor tracking, it is inaccurate when used for tracking objects indoors. Using the accelerometer and the compass (or magnetometer) to track users has proved to work better than GPS tracking, which requires a wireless connection to function. First, the PVI search for a marker around them to use as a starting point. Once a marker is detected, the system tells them that the initial location has been identified. After that, it asks the PVI to specify the desired target location. Then, it estimates the shortest path from the start point to the target location and instructs the PVI to start walking in the appropriate direction. Several techniques could be used, but the authors have chosen the Dijkstra algorithm. While they are walking, the system guides them to the destination using navigation commands. When the PVI reach the first location on the path and scan the marker placed on the wall, the system guides them to the next marker on the graph. This process is repeated until the PVI arrive at their destination. The main feature of the navigation module is to give continuous navigation commands until the destination is reached.
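The shortest-path step can be sketched with Dijkstra's algorithm over the marker graph, using the recorded step counts as edge weights. This is a minimal version using Python's heapq; the graph below is a hypothetical fragment, not the paper's actual floor graph.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra over the marker graph; edge weight is the recorded step count."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, edge in graph.get(node, {}).items():
            nd = d + edge["steps"]
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    # Walk predecessors back from the goal to recover the checkpoint list.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return list(reversed(path))

# Tiny hypothetical map: entrance -> corridor -> lab_401 / room_405.
g = {
    "entrance": {"corridor": {"steps": 12}},
    "corridor": {"lab_401": {"steps": 8}, "room_405": {"steps": 15}},
}
print(shortest_path(g, "entrance", "lab_401"))  # → ['entrance', 'corridor', 'lab_401']
```

The returned list is exactly the sequence of checkpoints the navigation module would announce one by one as each marker is scanned.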

Identifying PVI Location
A typical CV tag-based system for identifying the PVI's location consists of tags, a database, a camera, a processing unit, and audio and haptic feedback [41]. Identifying markers is the most crucial part of our system: as long as markers are identified accurately, they help the PVI find the destination successfully. The authors therefore put together the following three research questions. RQ1: Which tag gives better accuracy based on the chosen criteria? RQ2: Are there any difficulties or problems in identifying tags? RQ3: Is this solution suitable for real-time usage?
To answer these questions, the authors divided the implementation of this system into three steps: (1) comparing QR codes with Aruco markers to select the best tag; (2) solving the problem of detecting markers in challenging conditions using a convolutional neural network (CNN) model; (3) simplifying the CNN model to minimize the response time.


Comparing QR Codes with Aruco Markers
To select the best tag, the authors compared the candidates based on certain criteria [5]. The first criterion is the cost of applying the technology to a solution. CV tag techniques can be used at almost no cost beyond printing the QR codes or AR markers and placing them in the correct locations; when barcodes are used, there is no need to print them, as they are already placed on each product. Non-CV tag-based techniques can also be used at low cost, as shops only need RFID or NFC tags installed in the correct places. If pure CV techniques are used, high-quality equipment such as cameras is required for satisfactory results. The second criterion is the equipment needed to detect and identify products or places. For CV tag-based solutions, PVI only need their smartphone cameras to detect and identify items. For non-tag CV techniques, some solutions only need a smartphone camera, while others need high-quality cameras to capture high-resolution images and machines with powerful processors for computation. For non-CV tag-based techniques, PVI need RFID readers or smartphones supporting NFC. The third criterion is the number of items that can be scanned simultaneously. Only RFID readers, AR markers, and CV techniques can scan multiple items at the same time, which is useful in situations such as identifying and counting the items in a shopping cart. The fourth criterion is whether the PVI must be in the line of sight of the identified products. RFID and NFC tags do not need to be in the line of sight, and PVI can identify them from any direction, while CV solutions depend on other parameters, such as the tag size for QR codes and barcodes, and the marker and camera parameters for CV techniques. The fifth criterion is storage capacity: some tags, such as RFID, NFC, and QR codes, have storage capacity, while others, such as AR markers and barcodes, do not. The last criterion is the detection range: RFID tags must be within 3 m and NFC tags within 10 cm, whereas in CV tag-based solutions, tags can be detected from a distance that depends on tag size and image resolution. Based on this evaluation, the authors found QR codes and markers to be the most suitable tags.
The authors then compared QR codes with markers to determine which gives higher accuracy, and put together the following hypotheses. H1: Aruco markers are better than QR codes. H2: Aruco markers can be detected from long distances.
The authors have developed two applications to identify the location of PVI with the architecture shown in Figure 5. An important task after the detection and identification of markers is to obtain the camera's position from them. For the position estimation, the authors need to know the parameters of the camera, which come from calibration: the camera matrix and the distortion coefficients. More generally, a camera has extrinsic parameters, intrinsic parameters, and distortion coefficients. Estimating these parameters requires 3D world points of a real scene and the corresponding 2D image points; this process is called camera calibration. Camera calibration estimates the parameters of the lens and sensor of a camera, which makes it important for AR applications. It can be done using multiple images of a pattern such as a checkerboard, square grid, circle hexagonal grid, or circle regular grid. So, before using the two applications, the authors performed camera calibration to obtain the camera position from markers. These applications work the following way: At first, the application opens the camera to get a live stream of images. Then, it converts each image to grayscale and sends it to the desired library to detect and identify the marker. After that, it calculates the distance to the marker and gives feedback to the PVI using voice commands. QR codes were used as markers in the first application, while Aruco markers were used instead of QR codes in the second one.
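The distance-feedback step can be illustrated with the standard pinhole-camera relation, where the apparent size of a marker of known physical size yields its distance. The focal length and marker size below are hypothetical values, not taken from the paper:

```python
def distance_to_marker(focal_px: float, marker_side_m: float, side_px: float) -> float:
    """Pinhole-camera estimate: distance = focal_length * real_size / apparent_size."""
    return focal_px * marker_side_m / side_px

# Hypothetical numbers: an 800 px focal length (from calibration) and
# a 10 cm marker that appears 40 px wide in the captured frame.
d = distance_to_marker(800.0, 0.10, 40.0)
print(round(d, 2))  # 2.0 (metres)
```

A marker that appears twice as wide in the image is, under this model, half as far away, which is why the camera matrix from calibration is needed before either application can report distances.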

Detecting Markers in Challenging Conditions
After comparing them, the authors found that Aruco markers perform better and give higher accuracy than QR codes, so the authors decided to use Aruco markers in the proposed system. The authors also found that neither can be detected in challenging conditions such as long distances, blurring effects, distortion of the marker, and occlusion. To solve this problem, the authors converted the identification part of the problem to a classification one and used a CNN to identify the markers, as shown in Figure 6. The authors formed the following hypothesis: H3: CNN models can be used to detect Aruco markers in challenging conditions. The architecture consists of two main units: the first unit, marked in white, is used to extract markers from captured frames, while the second one, marked in grey, is used for classification. Following the conversion of the image to grayscale and the detection of markers, a CNN model is used to identify them. This model returns the correct ID of the detected markers or, if it fails to do so, determines that no marker is available. In a typical marker-based application, image frames are affected by various noisy conditions such as blurring and distortion of markers, and such noise affects the accuracy of marker identification. The authors have created a dataset that covers most of the challenges above, with classes corresponding to the challenging conditions and different orientations. This dataset contains 40 classes of 10 markers and one class to indicate no markers [37]. As shown in Figure 7, the authors created different effects by randomly applying various transformations to the original images. For each class, the authors generated 500 samples, so a total of 14,000 images were created for the 28 classes. To identify that no markers were visible in the image captured by the camera, the authors assigned the null class images taken from a dataset [35]. 75% of the input samples from each class were used for training and the rest for validation.

A CNN automatically learns the most effective features from raw data instead of relying on hand-crafted features as in traditional machine learning techniques. A typical CNN has a convolutional layer, which is its basis. It also contains an activation function for transforming the summed weighted input of a node into that node's output. Another, so-called pooling layer performs a downsampling operation along the spatial dimensions. Together, these layers perform effective feature extraction and are called a feature extraction unit (FEU). For classification, a CNN also contains a fully connected (FC) layer. As shown in Figure 8, the authors used three FEUs in the proposed model. In the first FEU, the authors used a convolutional layer with a dimension of 20 × 5 × 5 and a sigmoid activation function. Then, max-pooling with a pooling filter size of 2 and a stride of 2 is used to downsample the outputs by a scale factor of 2. The second and the third FEU have the same structure as the first one with a slight variation: the authors used a convolutional layer with a dimension of 50 × 5 × 5 in the second FEU, while a convolutional layer with a dimension of 200 × 5 × 5 was used in the third one. For classification, the output of the last feature extraction unit is flattened into a vector and provided as input to the fully connected layer, whose output is converted into probabilities corresponding to the 29 classes of the classifier. The fully connected layer after the flatten operation contains 500 components, which are mapped to the 29 classes. During the training stage, the Adam optimizer has been used over the cost function to determine the optimum values of the weight parameters. The results showed that Aruco markers can be detected in challenging conditions using this CNN model. However, the model still failed to detect markers under occlusion. Moreover, the time required for this model to detect markers should be minimized.
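As a sketch only, the three-FEU model described above might look as follows in PyTorch (the paper does not name its framework; the 64 × 64 grayscale input size and the use of a sigmoid after the 500-unit fully connected layer are assumptions):

```python
import torch
import torch.nn as nn

class MarkerCNN(nn.Module):
    """Three-FEU classifier: conv layers with 20/50/200 filters of size 5x5,
    sigmoid activations, 2x2 max-pooling with stride 2, then FC 500 -> 29.
    The 64x64 grayscale input is an assumption, not stated in the text."""
    def __init__(self, num_classes: int = 29):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2),    # FEU 1: 64 -> 60 -> 30
            nn.Conv2d(20, 50, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2),   # FEU 2: 30 -> 26 -> 13
            nn.Conv2d(50, 200, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2),  # FEU 3: 13 -> 9 -> 4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(200 * 4 * 4, 500), nn.Sigmoid(),
            nn.Linear(500, num_classes),  # softmax applied by the loss at train time
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = MarkerCNN()
out = model(torch.zeros(1, 1, 64, 64))
print(out.shape)  # torch.Size([1, 29])
```

Training would pair this with `nn.CrossEntropyLoss` and `torch.optim.Adam`, matching the Adam optimizer mentioned in the text.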

Simplified CNN Model to Minimize the Response Time
The results showed that Aruco markers can be detected in adverse conditions, albeit with an execution time that is not suitable for real-time usage. To minimize the time required to detect markers, the authors formulated the following hypothesis: H4: CNN models can be simplified by removing some convolutional layers to improve the response time.
To minimize the response time, the authors have simplified the convolutional layers of the CNN model and used the same parameters for training and validation. As shown in Figure 9, the authors used two FEUs in the simplified model. In the first FEU, the authors used a convolutional layer with a dimension of 20 × 5 × 5 and a sigmoid activation function; then max-pooling was used with a pooling filter size of 2 and a stride of 2 to downsample the layer's outputs by a scale factor of 2. In the second FEU, a convolutional layer with a dimension of 200 × 5 × 5 was used. Dropout layers are placed to reduce overfitting.
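Under the same assumptions as before (framework, input size), and with the dropout rate and placement also being assumptions since the text does not specify them, the two-FEU simplified model could be sketched as:

```python
import torch
import torch.nn as nn

class SimplifiedMarkerCNN(nn.Module):
    """Two-FEU variant: the 50-filter middle stage is removed to cut
    response time. Dropout rate (0.5) and its placement before the FC
    layer are assumptions; the paper only says dropout layers are used."""
    def __init__(self, num_classes: int = 29):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 20, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2),    # FEU 1: 64 -> 60 -> 30
            nn.Conv2d(20, 200, 5), nn.Sigmoid(), nn.MaxPool2d(2, 2),  # FEU 2: 30 -> 26 -> 13
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(200 * 13 * 13, 500), nn.Sigmoid(),
            nn.Linear(500, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimplifiedMarkerCNN().eval()  # eval() disables dropout for inference
out = model(torch.zeros(1, 1, 64, 64))
print(tuple(out.shape))  # (1, 29)
```

Removing one convolution-pooling stage removes a large share of the per-frame multiply-accumulate work, which is the intuition behind H4.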

After testing these two applications in different situations, the authors found that Aruco markers can be detected from distances up to 4 m, while QR codes were limited to 2 m only. Aruco markers could be detected from both long and short distances without the camera needing to be in their line of sight. QR codes, in contrast, could not be detected from farther than 2 m regardless of whether the camera was in their line of sight. From these results, the authors have deduced that Aruco markers are better than QR codes, which leads to the verification of H1. So, the authors formulate Thesis T1: Aruco markers work better than QR codes. As a result, the authors selected Aruco markers as tags for the prototype.

Navigation Using Aruco Markers
After selecting Aruco markers for the system, the authors built maps for the first and the fourth floor. Depending on which floor the PVI wish to navigate, the right map is loaded from the database once the application has been launched. At that point, the prototype is controlled by the user via voice commands. Figure 11 shows screenshots of the system. As shown in part (a) of Figure 11, it asks the PVI to select the starting point by pressing on the screen. As shown in part (b), it launches the mobile camera and guides them to search for any markers to be used as a starting point. As shown in part (c), the third step is to select the destination using voice commands; for example, the PVI say "lab 417" to make it their destination. The system calculates the shortest path from the starting point to the destination using the Dijkstra algorithm. Then, it launches the smartphone camera and starts guiding the PVI to the next point using voice commands, as shown in part (d). The authors used the commands shown in Table 1.


Table 1. Commands given by the PVI and navigation commands by the prototype.

Command Type | Name | Description
PVI's voice commands | "Go to" + destination | The PVI order the prototype to lead them toward the predefined destination.
PVI's voice commands | "Start" | The PVI order the prototype to go to the start activity to select the start point.
PVI's voice commands | "Exit" | The PVI order the prototype to exit.
Navigation instructions | "Incorrect destination, you should press on the screen and select it again" | The prototype informs the user that they should provide another destination.
Navigation instructions | "Go straight" + number of steps | The prototype directs the user to go straight for the given number of steps.
Navigation instructions | "Turn left", "Turn right" | The prototype directs the user to turn left or right.
Navigation instructions | "You have detected your next point, so, you should go straight to reach it" | The prototype informs the user that the next point is detected and that they should approach it.
Navigation instructions | "You have passed this point successfully" | The prototype informs the user that they passed this point successfully and have started navigating to the next point.
Navigation instructions | "You have reached your destination so, go straight to it" | Once the user reaches the desired destination, the prototype informs them.

For example, the PVI select the elevator as a starting point, which is stored in the database as node 1. They also choose "lab 417" as their destination, which is stored as node 5. Using the system, the shortest path between node 1 and node 5 is calculated as shown in Figure 12. To reach laboratory number 417 (node 5), the PVI start walking from the elevator (node 1). They go from node 1 to the next node, which is node 7. At node 7, navigation commands guide them to turn left and move straight for ten steps to reach node 6. Finally, the PVI walk for twenty steps to reach node 5, which is the destination.
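The shortest-path step in this example can be reproduced with a small Dijkstra implementation. The graph below is hypothetical, with edge weights as step counts: only the 7→6 (ten steps) and 6→5 (twenty steps) counts appear in the text, and the rest are assumed.

```python
import heapq

def dijkstra(graph, start, goal):
    """Shortest path over a weighted adjacency dict; returns the node list."""
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for nxt, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    # Walk the predecessor chain back from the goal.
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1]

# Hypothetical floor graph keyed by node ID; weights are step counts.
floor_map = {
    1: {7: 5},
    7: {1: 5, 6: 10},
    6: {7: 10, 5: 20},
    5: {6: 20},
}
print(dijkstra(floor_map, 1, 5))  # [1, 7, 6, 5]
```

The returned node list is exactly the sequence of interest points the voice commands walk the user through.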
The authors divided testing of the prototype into two test cases. The first case was to test it with blindfolded people or PVI, collect the feedback, and update the prototype. The second case was to evaluate the prototype again after applying modifications based on the PVI's comments. The prototype was tested in the corridors of the first and the fourth floor (Video S1, Video S2). In the beginning, a short introduction to the case study was provided to the participants. The authors trained the users for 30 min so that they would know how to use the prototype for navigating from one place to another. The goal was to test whether the prototype was easy to use and whether the users could effectively interpret the feedback. The authors made sure that there were no obstacles on the way to the destination. During navigation, the user held the smartphone in their hands roughly at chest level with the screen facing towards them. The smartphone was held in portrait orientation while slightly tilted at an angle nearly perpendicular to the horizontal plane. As shown in Figure 13, this angle is enough for covering the walking area in front of the PVI and identifying markers. For a hands-free option, the smartphone may also be mounted on the user's chest. Audio feedback is provided to the user via headphones connected to the smartphone or by the smartphone's speaker.

First Test Case
After learning how to use the prototype, the participants tested it several times by selecting a start point and a destination. The prototype assists them in moving from the starting point to the destination using navigation feedback. During the process, the authors have discovered some problems:
1. Sometimes the PVI failed to understand the feedback, so the feedback needed to be improved.
2. The PVI could hardly detect markers because the markers were placed higher than the view of the camera, so installing markers at a lower position is necessary.
3. The PVI move their hands rapidly during navigation, which causes images to be captured with occlusion.
4. The PVI cannot detect markers because they move their hands a lot and the tags move out of the smartphone camera's view.
5. The PVI take shorter steps than blindfolded participants, so the number of steps should be calculated based on the PVI rather than the blindfolded individuals.
6. The PVI occasionally create situations that cannot be managed by the prototype. For example, if their next point is node 7 and they go in the wrong direction, they may reach another point. The prototype should check whether this point is node 6 or not. If it is node 6, the prototype should continue navigating, because node 6 is the point after node 7 on the path to the destination. However, if it is another point, the prototype should ask the PVI to go back and search for node 7 again.
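The recovery rule in problem 6 can be sketched as a small check against the computed path; the function name and return values are illustrative, not taken from the prototype's code:

```python
def handle_detected_marker(path, expected_idx, detected):
    """Recovery rule from problem 6: if the PVI overshoot the expected node
    and land on the one immediately after it on the path, keep navigating;
    otherwise ask them to go back and search for the expected node again."""
    expected = path[expected_idx]
    if detected == expected:
        return "advance", expected_idx + 1
    if expected_idx + 1 < len(path) and detected == path[expected_idx + 1]:
        return "advance", expected_idx + 2  # skipped the expected node
    return "go_back", expected_idx

# Path 1 -> 7 -> 6 -> 5; the user misses node 7 but reaches node 6.
print(handle_detected_marker([1, 7, 6, 5], 1, 6))  # ('advance', 3)
```

Any detected marker that is neither the expected node nor its successor on the path triggers the "go back" prompt.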

Second Test Case
The authors tried to solve the problems occurring during the first test case. For the first problem, they improved the feedback based on the comments of the PVI. As shown in Figure 14, markers were installed in a different style to solve the second problem: instead of adding one marker at each interest point, eight markers with the same ID are installed. This implementation makes detection easier and solved the third and fourth problems. It also helps PVI of different heights to detect markers easily. The authors count the steps in the same way as the PVI walk to solve the fifth problem, and they tried to manage all situations and conditions raised during the testing phase of the prototype. Users tested the prototype several times by selecting a starting point and a destination. The PVI found it easier to detect markers faster than before: with this arrangement, markers can be detected easily even while the PVI move their hands rapidly. Additionally, they found the audio feedback to be satisfactory.

CNN Performance Evaluation in Challenging Conditions
As discussed, the authors trained the CNN models using the created dataset. The authors have simplified the convolutional layers of the CNN model and used the same parameters for training and validation. The simplified model can detect Aruco markers very well in challenging conditions, as it gives approximately 95.5% accuracy for training and 99.82% accuracy for testing. Furthermore, the training and testing accuracy of the simplified model is better than that of the other model, and the training and testing curves of the models are close to each other. The execution time for detecting markers with the simplified model is better than with the complex one, so the simplified model is suitable for real-time identification of markers. This minimized response time leads to the verification of H4, and the authors formulate T4: CNN models can be simplified by removing some convolutional layers to improve the response time. However, this model still failed to detect markers under occlusion.
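Real-time suitability of the two models can be judged by measuring the mean per-frame identification time. The helper below is a generic sketch with a stand-in classifier, not the authors' benchmark; in practice the CNN's inference function and a captured frame would be passed in.

```python
import time

def mean_latency_ms(fn, frame, runs: int = 50) -> float:
    """Average wall-clock time of one identification call, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        fn(frame)
    return (time.perf_counter() - start) / runs * 1000.0

# Stand-in for the marker classifier: returns a fake class ID in 0..28.
identify = lambda frame: sum(frame) % 29
frame = list(range(1000))
latency = mean_latency_ms(identify, frame)
print(latency >= 0.0)  # True; compare against a budget such as 33 ms/frame
```

A budget of roughly 33 ms per frame corresponds to 30 fps, a common threshold for calling a pipeline "real time" on a smartphone camera stream.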
In summary, the authors found the following results based on the above experiments. Aruco markers were selected based on a comparison with QR codes in different situations. A navigation system was proposed to help PVI navigate indoors using Aruco markers. After the prototype was tested by PVI and blindfolded individuals, feedback and comments were discussed; based on the discussion, the prototype was updated and tested again by the PVI. The authors solved the problem of detecting markers in challenging conditions by using a CNN and compared the results to another model; the results showed that their CNN gives better accuracy. Finally, the CNN model was simplified to be suitable for real-time usage.

Conclusions
The goal was to design a navigation system for PVI using markers. First, a sighted user is required to walk through the building to indicate different points of interest. Then, markers are printed and placed on walls at the specified locations. A map is built with a graph to be used for finding the shortest path to the destination. PVI use a smartphone to find their current location based on the markers around them. Finally, they are guided to the destination using voice feedback. The evaluation showed that Aruco markers are better for localization than QR codes, as they can be detected from distances twice as long. Moreover, they can be detected in adverse conditions using a CNN, and the identification time can be minimized by simplifying the CNN.
For future work, the prototype should be improved to add automatic map construction. The authors identified markers in challenging conditions using a CNN architecture. This work has only dealt with the identification step, while detection has been done using a method based on image thresholding and rectangle extraction. CNN models such as You Only Look Once (YOLO) or the Single Shot Detector (SSD) could be used to fully perform the detection and identification steps, as they can be trained to do both automatically. The authors did not analyze the behavior of their proposal against occlusions, which are indeed a common problem when markers are used. Consequently, a set of experiments to evaluate the proposal against the occlusion problem, and its solutions in case of bad performance, could be of great interest.

Figure 2 .
Figure 2. Components of the proposed system.

Figure 3 .
Figure 3. The plan of the fourth floor's corridor of the Faculty of Information Technology at the University of Pannonia.

Figure 4 .
Figure 4. A graph of the fourth floor's corridor at the same faculty.

When using a barcode, there is no need to print tags, as barcodes are already placed on each product. Tag-based techniques can be used at a low cost, as shops only need the RFID or NFC tags to be installed in the correct places. If CV techniques are used, high-quality equipment, such as cameras, is required for satisfactory results. The second criterion is the equipment needed to detect and identify products or places. For CV tag-based solutions, PVI only need their smartphone cameras to detect and identify items. For non-CV tag-based techniques, some solutions only need a smartphone's camera, while others need high-quality cameras to capture high-resolution images and machines with powerful processors for computation. In tag-based techniques, PVI need RFID readers or smartphones supporting NFC technology. The third criterion is the number of items that can be scanned simultaneously. Only RFID readers, AR markers, and CV techniques can scan multiple items at the same time, which is useful in some situations, for example if PVI want to identify and count the items in their shopping cart. The fourth criterion is whether the PVI must be in the line of sight of the identified products or not. In tag-based solutions, there is no need for RFID tags and NFC tags to be in the line of sight, and the PVI can identify them from any direction, while CV solutions depend on other parameters, such as the tag size for QR codes or barcodes, and the marker or camera parameters for CV techniques. The fifth criterion is the storage capacity of each solution. Only some tags, such as RFID, NFC, and QR codes, have a storage capacity, while others, such as AR markers and barcodes, do not. The last criterion is the detection range: in tag-based solutions, tags must be within 3 m for RFID tags and within 10 cm for NFC tags, while in CV tag-based solutions, tags can be detected from a distance that depends on the tag size and image resolution. Based on this evaluation, the authors found that QR codes and Aruco markers were the most suitable markers to select.

Figure 5. Main components of the comparison application.

Figure 6. Main components of the application using CNN.

Figure 7. A few samples of markers with illumination change and motion blur: (a) Original; (b) Illumination change; (c) Motion blur; (d) Horizontal motion blur; (e) Vertical motion blur.
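The illumination-change and motion-blur variants shown in Figure 7 can be generated synthetically. A minimal NumPy sketch; the gain and kernel length below are illustrative assumptions, not parameters from the paper:

```python
import numpy as np

def illumination_change(img, gain=1.6):
    """Simulate an illumination change by scaling intensities.
    img is a float image in [0, 1]; gain > 1 brightens it."""
    return np.clip(img * gain, 0.0, 1.0)

def motion_blur(img, length=9, axis=1):
    """Simulate motion blur with a 1-D box filter:
    axis=1 gives horizontal blur, axis=0 gives vertical blur."""
    kernel = np.ones(length) / length
    blur_line = lambda line: np.convolve(line, kernel, mode="same")
    return np.apply_along_axis(blur_line, axis, img)

# Apply the variations to a stand-in 64x64 marker image.
marker = np.random.default_rng(0).random((64, 64))
variants = {
    "illumination": illumination_change(marker),
    "horizontal_blur": motion_blur(marker, axis=1),
    "vertical_blur": motion_blur(marker, axis=0),
}
```

A general (non-directional) motion blur, as in Figure 7c, could be approximated by applying the horizontal and vertical blurs in sequence.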

effective feature extraction and are called feature-extraction units (FEUs). For classification, the CNN also contains the FC layer. As shown in Figure 8, the authors used three FEUs in the proposed model. In the first FEU, the authors used a convolutional layer with a dimension of 20 × 5 × 5 and a sigmoid activation function. Then, max pooling with a pooling filter size of 2 and a stride of 2 is used to downsample the outputs by a scale factor of 2. The second and third FEUs have the same structure as the first, with a slight variation: the authors used a convolutional layer with a dimension of 50 × 5 × 5 in the second FEU, while a convolutional layer with a dimension of 200 × 5 × 5 was used in the third.
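The three-FEU pipeline described above can be traced in plain NumPy to follow the feature-map shapes. This is a hedged reconstruction: the paper does not state the padding, stride, or input channel count of the convolutions, so a valid (no-padding) 5 × 5 convolution with stride 1 on a single-channel 64 × 64 input is assumed, with random weights standing in for the trained ones:

```python
import numpy as np

def feu(x, n_filters, k=5, seed=0):
    """One feature-extraction unit (FEU): a valid (no-padding) k x k
    convolution with random stand-in weights, a sigmoid activation,
    and 2 x 2 max pooling with stride 2.  x has shape (C, H, W)."""
    rng = np.random.default_rng(seed)
    c, h, w = x.shape
    weights = rng.standard_normal((n_filters, c, k, k)) * 0.01
    hc, wc = h - k + 1, w - k + 1
    conv = np.empty((n_filters, hc, wc))
    for f in range(n_filters):
        for i in range(hc):
            for j in range(wc):
                conv[f, i, j] = np.sum(weights[f] * x[:, i:i + k, j:j + k])
    act = 1.0 / (1.0 + np.exp(-conv))              # sigmoid activation
    hp, wp = hc // 2, wc // 2                      # 2 x 2 max pool, stride 2
    return act[:, :hp * 2, :wp * 2].reshape(n_filters, hp, 2, wp, 2).max(axis=(2, 4))

# Trace a single-channel 64 x 64 marker image through the three FEUs
# (20, 50, and 200 filters, as in Figure 8).
x = np.random.default_rng(1).random((1, 64, 64))
shapes = []
for n_filters in (20, 50, 200):
    x = feu(x, n_filters)
    shapes.append(x.shape)
# shapes is now [(20, 30, 30), (50, 13, 13), (200, 4, 4)]
```

Under these assumptions, the flattened output of the third FEU (200 × 4 × 4 = 3200 values) would then feed the FC layer that produces the 29 class scores.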

Figure 8. The proposed convolutional neural network (CNN) architecture used in training of markers.

Figure 11. Screenshots of our prototype: (a) User presses this button to select the starting point; (b) User moves the smartphone's camera left and right to detect the starting point; (c) User presses this button to select the destination point; (d) User selects the destination using voice commands to start navigation; (e) User moves the smartphone's camera left and right to detect the next point until reaching the destination.

Figure 13. Screenshots of our prototype: (a) A screenshot of a blindfolded person while testing our prototype on the first floor; (b) a screenshot of a PVI while testing our prototype on the fourth floor.

Figure 14. Screenshots of our prototype: (a) A screenshot of markers after updates; (b) a screenshot of PVI while testing our prototype on the fourth floor.

4.3. CNN Performance Evaluation in Challenging Conditions

As discussed, the authors trained the CNN model using the created dataset. The training was conducted in batches of size 32. The Keras framework for deep learning was used to implement the CNN model. The original size of the markers in the synthetic dataset was 512 × 512. For training the CNN model, the original images were resized to 64 × 64 and given as input to the first convolutional layer. The entire dataset was divided into two parts: 75% for training and 25% for validation. The training and validation datasets were used to determine the layer parameters during the training phase of the model. Additionally, the authors tested the model with new data that had not been encountered before; this test was carried out on the trained model. The output was produced with 29 class values. For all the experimental results, the training phase of the CNN model was carried out for 100 epochs, and results were obtained for all classes. The authors also trained the model with the proposed dataset to find which marker produces better results. The training and validation plots for accuracy and loss are shown in Figures 15 and 16.
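The data preparation described above can be sketched as follows. The paper does not specify the resizing method or the split procedure, so simple block-mean downsampling (512 / 64 = 8) and a shuffled 75/25 split are assumed, with random stand-in data in place of the synthetic marker dataset:

```python
import numpy as np

def resize_block_mean(img, factor=8):
    """Downsample by averaging factor x factor blocks
    (512 x 512 -> 64 x 64 when factor is 8)."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def train_val_split(x, y, val_frac=0.25, seed=0):
    """Shuffle, then hold out val_frac of the data for validation."""
    idx = np.random.default_rng(seed).permutation(len(x))
    n_val = int(len(x) * val_frac)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (x[train_idx], y[train_idx]), (x[val_idx], y[val_idx])

# Stand-in dataset: 16 synthetic 512 x 512 marker images, 29 classes.
rng = np.random.default_rng(1)
images = rng.random((16, 512, 512))
labels = rng.integers(0, 29, size=16)

small = np.stack([resize_block_mean(im) for im in images])   # (16, 64, 64)
(train_x, train_y), (val_x, val_y) = train_val_split(small, labels)
# len(train_x) == 12, len(val_x) == 4
```

The resulting training split would then be fed to the CNN in batches of 32 for 100 epochs, as described above.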

Figure 15. Comparative accuracy graphs after applying our model on two datasets, where our dataset is drawn in red and the other dataset is drawn in blue: (a) Results of the training accuracy of the two datasets; (b) results of the validation accuracy of the two datasets.


Figure 18. Comparative loss graphs after applying our first model to our dataset, which is drawn in red, and applying the simplified models to the two datasets, where our dataset is drawn in green and the other dataset is drawn in blue: (a) Results of the training loss of the two datasets; (b) results of the validation loss of the two datasets.