Research on a Web System Data-Filling Method Based on Optical Character Recognition and Multi-Text Similarity



Introduction
The development of web systems refers to the use of various technologies and tools to create and build web-based applications or systems. These applications typically run in web browsers and communicate with servers through the HTTP protocol [1]. Nowadays, with the emergence of new technologies and tools, as well as the iterative updates of frameworks and libraries, the development of web systems has become more flexible and efficient.
Data uploading is a common requirement in web system development. Web systems support many upload channels, including form uploading, the File Transfer Protocol (FTP), remote interface (API) uploads, and so on [2]. Among them, form uploading is a common method that uses form controls to collect data and send them to the server as key-value pairs [3]. There are also various ways to fill in the data collected by the form controls. The most basic method is for users to fill in each field manually and sequentially, which is relatively inefficient and error-prone [4]. Another common method is to complete data filling through data communication between systems: the corresponding fields are filled automatically by accessing and reading data from a database. This significantly improves efficiency, but manual filling cannot be avoided when the database tables were created in the upstream system [5].
In recent years, deep learning has been widely applied in the field of OCR. Convolutional neural networks (CNNs) have achieved strong results in handwritten document retrieval: a word recognition method based on Monte Carlo dropout CNNs reached accuracy superior to existing methods in both query-by-example and query-by-string scenarios [6]. In addition, an end-to-end trainable hybrid CNN-RNN architecture has been proposed to build powerful text recognition systems for Urdu and other cursive scripts [7]. A method combining CRNN, LSTM, and CTC has shown good results in searching and deciphering handwritten text and can run on relatively modest machines [8]. These achievements provide new ideas and methods for OCR technology and broaden its application.
Handwritten form recognition is also relatively mature, with many achievements in automatic data filling. Common application scenarios include handwritten case forms [9] and medical insurance reimbursement application forms [10] in the health care field, where recognition technology automatically fills a patient's handwritten content into the corresponding form fields and accelerates the medical service process, as well as exam answer sheets and student evaluation forms in education and training [11], where handwriting recognition converts student handwriting into machine-readable text and fills the corresponding form fields to improve the efficiency and accuracy of data entry. In the examples above, however, the framework of the data collection form is fixed and the fields are known in advance, so these methods do not handle multi-source data well. In response to the reality that data in certain industries often exist as images while the data form frameworks are inconsistent, this paper proposes a new data filling method. By combining OCR technology with multiple text similarity algorithms, it automatically parses complex form images from different frameworks and fills them into web systems, with a final data filling accuracy above 90%.

OCR Recognition Technology
OCR refers to the process in which electronic devices examine printed characters on paper, determine their shapes by detecting dark and bright patterns, and then translate the shapes into computer text using character recognition methods [12]. The OCR recognition process mainly includes several steps, as shown in Figure 1 [13].
Appl. Sci. 2024, 14, x FOR PEER REVIEW

In some special cases, the obtained images may have problems such as angular tilt, blur, noise, or information loss [14], so before character recognition it is necessary to pre-process the image to improve the accuracy of subsequent recognition. Common pre-processing operations include geometric transformation, grayscale conversion, binarization, denoising, etc. [15]. Grayscale conversion transforms the original image from three channels to a single channel, converting the color information into a single brightness value in order to reduce the influence of irrelevant information in the pixels. The weighted average method is the most commonly used grayscale method [16], calculated as

Gray = 0.299R + 0.587G + 0.114B

where R, G, and B represent the values of the three channels, and the weights are determined by the sensitivity of the human eye to different colors. The resulting grayscale image is more in line with human visual perception [17]. After converting the image into a grayscale image, threshold segmentation methods can be used for binarization, converting the grayscale image into a binary image with only black and white values. This further highlights the contours and edges of the characters, facilitating subsequent recognition [18].
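As a minimal sketch of the two pre-processing steps above: the 0.299/0.587/0.114 weights are the widely used luminance weights implied by "sensitivity of the human eye", and the binarization threshold here is purely illustrative.

```java
// Sketch of weighted-average grayscale conversion and global-threshold
// binarization; weights follow the common luminance convention, and the
// threshold value is an illustrative assumption, not the paper's setting.
public class GrayscaleDemo {

    // Weighted average method: Gray = 0.299*R + 0.587*G + 0.114*B
    static int toGray(int r, int g, int b) {
        return (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
    }

    // Global-threshold binarization: map a gray level to pure black or white.
    static int binarize(int gray, int threshold) {
        return gray < threshold ? 0 : 255;
    }
}
```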
After image pre-processing, feature extraction and character classification are performed. Features are the key information used to recognize text; each character can be distinguished from others through its features [19]. Character classification passes the extracted features to a classifier, allowing the trained classifier to recognize the given features as the corresponding text [20]. In recent years, most scholars in the field of computer vision have used the CRNN algorithm to solve this problem [21]. The network structure of the CRNN algorithm is shown in Figure 2 [22].
The network structure consists of three parts: a convolutional layer, a recurrent layer, and a transcription layer [23]. The convolutional layer extracts features from the input image, and the extracted feature sequence is fed into the recurrent layer. The recurrent layer predicts labels for the feature sequence, and the transcription layer integrates the per-time-step predictions: the sequence label with the highest probability of occurrence is taken as the final recognition result [24]. The parameters of the network generally need to be tuned for the specific problem and dataset. For example, in the article "CRNN: A Joint Neural Network for Redundancy Detection" [25], Xinyu Fu configured global training parameters to achieve better results, setting the filters to 400, the hidden size to 400, the window size to 20, the pooling window to two, the stride to one, the learning rate to 0.01, the training steps to 1000, and the optimizer to Adam. OCR recognition is not the focus of this paper, so the online recognition service of Baidu Zhiyun is adopted; only a brief introduction to the network structure is given here, and no specific parameter configuration is required.
The final post-processing step further processes and optimizes the classification results to improve accuracy, eliminate errors, and provide more reliable and usable recognition results [26]. Specific processing can include text correction, semantic parsing, error correction, and so on.


Field Matching Technology
In practical work, pre-defined form templates are common. In scenarios such as opening a bank account, insurance claims [27], and school exams, forms with the same format are often used to collect information. The field positions and sizes of these forms are usually fixed, so the form template can be defined in advance and the fields matched and recognized directly against the template. The specific field matching steps are shown in Figure 3. The preceding OCR image recognition yields a body of actionable text data, which contains the useful information required for the task as well as a large number of irrelevant characters such as spaces and line breaks. Therefore, the text data must first be pre-processed, including removing spaces and line breaks and normalizing specially formatted data. The pre-processed data also retain the position information of the relevant fields, that is, the relative positions and layout rules of the fields in the form, which helps in understanding the structure and context of the form. For tasks with pre-defined form templates, this greatly simplifies field matching: the matching range can be narrowed directly based on the pre-defined field positions, and since the corresponding fields of pre-defined forms are very similar, matching can be performed on a small scale against the field names defined in the template, with filtering carried out using simple regular expressions or string matching [28]. However, this approach has its limitations. When the form framework is not fixed, field position information is less useful, since field positions may vary greatly across forms. In addition, this method cannot accurately fill fields that have different names but similar meanings. Therefore, this paper introduces a multiple-similarity algorithm that compares fields from the perspectives of both key and value in order to further improve the accuracy of field filling.
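As a small illustration of the template case (not the paper's code), "name: content" style fields can be pulled out of OCR text with a simple regular expression; the class name and pattern are assumptions, and a full-width colon is accepted alongside the ASCII one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: extract "name: content" pairs from OCR text,
// as template-based filtering with regular expressions would allow.
public class TemplateFieldMatcher {
    // Key = run of non-colon characters, then a colon, then the value.
    private static final Pattern PAIR =
            Pattern.compile("([^:：\\n]+)[:：]\\s*([^\\n]*)");

    static Map<String, String> extract(String ocrText) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(ocrText);
        while (m.find()) {
            fields.put(m.group(1).trim(), m.group(2).trim());
        }
        return fields;
    }
}
```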


Levenshtein Editing Distance
The edit distance obtained using the Levenshtein algorithm is an indicator of the degree of difference between two strings [29]. It is defined as the minimum number of operations required to convert a source string into a target string, where the allowed operations are insertion, deletion, and replacement. Using dynamic programming, the strings are compared position by position; the algorithm has a time complexity of O(mn) and a space complexity of O(mn), where m and n are the lengths of the source string S and the target string T, respectively [30]. The edit distance D(S, T) is computed by the recurrence

D(i, j) = min( D(i−1, j) + Wa, D(i, j−1) + Wb, D(i−1, j−1) + Wc·δ(Si, Tj) )

where Dij = D(S0…Si, T0…Tj), 0 ≤ i ≤ m and 0 ≤ j ≤ n, S0…Si is a prefix of the source string, T0…Tj is a prefix of the target string, and δ(Si, Tj) is 0 if Si = Tj and 1 otherwise. The weights Wa, Wb, and Wc correspond to the three operations (deletion, insertion, and replacement). In general experimental studies, researchers set the cost of deletion and insertion to 1 and the cost of replacement to 2 [31]. Dmn is then the minimum edit cost from the source string S to the target string T and is used to calculate the similarity between S and T.
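The dynamic program above, with the common cost setting Wa (delete) = Wb (insert) = 1 and Wc (replace) = 2, can be sketched as follows; this is a generic implementation, not the paper's code.

```java
// Weighted Levenshtein edit distance with delete/insert cost 1 and
// replacement cost 2, filling the (m+1) x (n+1) DP table D.
public class EditDistance {
    static int distance(String s, String t) {
        int m = s.length(), n = t.length();
        int[][] d = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) d[i][0] = i;   // i deletions
        for (int j = 0; j <= n; j++) d[0][j] = j;   // j insertions
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int replace = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 2; // Wc = 2
                d[i][j] = Math.min(Math.min(
                        d[i - 1][j] + 1,            // delete, Wa = 1
                        d[i][j - 1] + 1),           // insert, Wb = 1
                        d[i - 1][j - 1] + replace);
            }
        }
        return d[m][n];
    }
}
```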

Similarity Calculation Method
After obtaining the edit distance, the similarity between the two strings can be computed. The traditional similarity formula is [32]

sim(S, T) = 1 − ld / max(m, n)

where ld is the edit distance between the two strings and m and n are their lengths. However, this formula cannot handle inverted strings, and it does not consider common substrings, so it lacks universal applicability. Scholars have therefore proposed an improved similarity formula [33], in which S is the source string, T is the target string, lcs is the length of the longest common substring, lm is the length of S, and p is the starting position of the common substring. After introducing the longest common substring, the problem of inverted strings can be alleviated, and more accurate judgments can be made when edit distances are equal. The term p/(lm + p) makes a further distinction when both ld and lcs are equal; that is, the higher the starting position of the common substring, the greater its impact on similarity. The most commonly used method for calculating the length of the longest common substring is dynamic programming [34]:

dp(i, j) = dp(i−1, j−1) + 1 if Si = Tj, and dp(i, j) = 0 otherwise, with dp(0, j) = dp(i, 0) = 0

where S and T are the two strings and i and j index their characters. If either string has length 0, there is no common substring, hence dp(0, j) = dp(i, 0) = 0; computing the rest by the recurrence, the maximum value of dp(i, j) is the required LCS length.
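The longest-common-substring dynamic program and the traditional similarity formula can be sketched as follows; the combined formula from [33] is only described verbally above, so it is not reproduced here.

```java
// Sketch of the LCS (longest common substring) DP and the traditional
// similarity 1 - ld/max(m, n); generic code, not the paper's implementation.
public class SimilarityDemo {
    static int longestCommonSubstring(String s, String t) {
        int best = 0;
        int[][] dp = new int[s.length() + 1][t.length() + 1];
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= t.length(); j++) {
                if (s.charAt(i - 1) == t.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;  // extend the matching run
                    best = Math.max(best, dp[i][j]);
                }                                      // else dp[i][j] stays 0
            }
        }
        return best;
    }

    // Traditional similarity based on the edit distance ld.
    static double similarity(int ld, int m, int n) {
        return 1.0 - (double) ld / Math.max(m, n);
    }
}
```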

Filling Method Based on OCR and Text Similarity
On the basis of an existing web system development framework, by integrating OCR technology and text similarity algorithms and adapting the field similarity comparison process to the system's actual functional needs, we achieve the proposed goal: for different data form frameworks, images containing data content are automatically filled into the corresponding web pages. Figure 4 below shows the overall structure of the system's functions.
In the front-end of the system, the most common approach is to build pages with a combination of Vue and Element [35]. We therefore choose the el-upload component of Element to implement the image file upload function; it converts images into binary data and sends them to the backend through the Axios plugin. The most common backend choice is the Spring Boot framework [36]. When an image is sent to the backend Spring Boot server through a POST request, the backend parses the binary data in the request body and converts it into a usable byte array according to the HTTP protocol. At this point, the image has been successfully uploaded to the backend server for subsequent recognition processing.
After the image has been converted into a byte array in the backend, OCR recognition can be performed. The OCR recognition process of this system uses the relevant interfaces of Baidu Zhiyun [37]; Figure 5 shows the recognition process. After obtaining the byte array of the image, the first step is to convert it into a Base64-encoded string, since every character of a Base64 encoding is an ASCII character and can be transmitted easily over various communication protocols. The string is then URL-encoded using the java.net encoding utilities and concatenated into a POST request parameter, completing the configuration of the params parameter. To ensure security and timeliness, a valid token must also be obtained from the cloud platform before the OCR function can be used [38]; this requires first passing the API_Key and Secret_Key to the cloud. Once both the token and params have been obtained, we can request the cloud again and receive the desired recognition result. In this process, the cloud service also performs pre-processing, feature extraction, character classification, post-processing, and other operations [39], and it likewise uses the CRNN network structure in the intermediate feature extraction and character recognition stages.
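The parameter preparation step (byte array → Base64 → URL-encoded request parameter) can be sketched with the JDK's own utilities; the parameter name "image" and the method name are assumptions, and the endpoint and token exchange with Baidu's cloud are omitted.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of building the OCR request body: image bytes -> Base64 string
// -> URL-encoded "image=..." parameter. URL-encoding matters because
// Base64 output may contain '+' and '/' characters.
public class OcrParamsDemo {
    static String buildImageParam(byte[] imageBytes) {
        String b64 = Base64.getEncoder().encodeToString(imageBytes);
        return "image=" + URLEncoder.encode(b64, StandardCharsets.UTF_8);
    }
}
```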
After the information in the image has been obtained, it must be filtered, because not all of it is needed. Since this system fills web forms, only the data related to form filling in the image are required. These data are observed to exist in the form of "name: content", so this pattern can be used for segmentation; Figure 6 shows a flowchart of the segmentation method. A HashMap is used to store the "key" and "value" of the filtered data. Before the formal text similarity comparison, the data information of the form fields must also be established. We store it in another HashMap, whose key is the field name and whose value is the corresponding field's data table number, which corresponds to the standard answer for the field. At this point, the filtered image data OCR_Map and the field data f_Map have been obtained, ready for the similarity comparison.
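The two maps described above can be sketched as follows; the segmentation simply splits each line at its first colon, and the field names and table numbers in f_Map are hypothetical examples, not the paper's actual schema.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: OCR_Map holds "name: content" pairs segmented from the OCR
// text; f_Map maps each form field name to its data-table number.
public class MapBuildingDemo {
    static Map<String, String> buildOcrMap(String ocrText) {
        Map<String, String> ocrMap = new HashMap<>();
        for (String line : ocrText.split("\n")) {
            int idx = line.indexOf(':');          // segment on "name: content"
            if (idx > 0) {
                ocrMap.put(line.substring(0, idx).trim(),
                           line.substring(idx + 1).trim());
            }                                      // lines without ':' are dropped
        }
        return ocrMap;
    }

    static Map<String, Integer> buildFieldMap() {
        Map<String, Integer> fMap = new HashMap<>();
        fMap.put("reporting time", 1);            // hypothetical field names
        fMap.put("reporting location", 2);
        return fMap;
    }
}
```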
For similarity determination on short Chinese texts, methods have continually improved, from SOW/BOW statistical frequency [40] to n-gram sliding windows [41], and from topic models [42] to deep learning [43]. The evolution of methods serves different similarity scenarios. For the field matching problem in this system, however, the texts are short, the semantics concise, and the training data small, so deep learning is not suitable for similarity judgment. We therefore return to the most basic way of judging similarity, based on edit distance and common substrings. For this task, we propose the concept of importance, which assigns importance levels to fields to ensure the accuracy of the calculation, as shown in Figure 7.
As can be seen in the figure, if the demonstration field "reporting time" is scored with the usual edit distance formula, the distances to the "reporting" and "time" fields are equal, which does not meet expectations. When we assign an importance of 0.8 to "time" and 0.2 to "reporting" and recalculate, the distances differ: "time" is more important, meaning that removing or adding components with higher importance produces a larger edit distance. For the storage of importance we again use a nested HashMap structure, where the key stores the corresponding field and the value stores the field's importance-related content.
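As an illustrative sketch of the importance idea (not the paper's exact formula): when a weighted sub-token is present on one side but missing on the other, its importance weight is added to the distance, so dropping "time" (0.8) costs more than dropping "reporting" (0.2). The method and map contents are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical importance-weighted gap between a target field name and
// a candidate: each sub-token present on only one side adds its weight.
public class ImportanceDistanceDemo {
    static double weightedGap(String target, String candidate,
                              Map<String, Double> importance) {
        double gap = 0.0;
        for (Map.Entry<String, Double> e : importance.entrySet()) {
            boolean inTarget = target.contains(e.getKey());
            boolean inCandidate = candidate.contains(e.getKey());
            if (inTarget != inCandidate) {
                gap += e.getValue();   // penalize by the token's importance
            }
        }
        return gap;
    }

    // Example weights from the paper's "reporting time" demonstration.
    static Map<String, Double> demoImportance() {
        Map<String, Double> imp = new HashMap<>();
        imp.put("time", 0.8);          // dominant sub-token
        imp.put("reporting", 0.2);
        return imp;
    }
}
```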
After the above data have been obtained, the similarity comparison stage can begin, extracting the information content required by the front-end form. The specific process is shown in Figure 8.
In the figure, the result is obtained by a double comparison of the "key" and "value" values. The resulting map is the final desired result; after some packaging and integration operations, the data can be sent to the front-end for filling. The evaluation criteria for the two sim thresholds in the figure were also determined through multiple experiments. At this point, the entire image filling method based on OCR and text similarity has been introduced.
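The double comparison can be sketched as a decision rule over the two similarity scores; the threshold values and the fallback condition below are illustrative assumptions, since the paper determines its sim criteria experimentally.

```java
// Hypothetical acceptance rule: a candidate pair matches when the key
// similarity is high on its own, or when the key is borderline and the
// value similarity backs it up. Thresholds are assumed, not the paper's.
public class DoubleCompareDemo {
    static final double SIM_KEY = 0.8, SIM_VALUE = 0.6;  // assumed thresholds

    static boolean matches(double keySim, double valueSim) {
        if (keySim >= SIM_KEY) return true;              // keys agree strongly
        return keySim >= 0.5 && valueSim >= SIM_VALUE;   // fall back to value
    }
}
```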
Appl. Sci. 2024, 14, x FOR PEER REVIEW

Experimental Results and Analysis
To verify the effectiveness of the proposed method, we conducted tests on 80 self-made images that roughly meet the upload requirements. Forty of the images were used to train the two similarity thresholds (sims) required for comparison in the system; the remaining forty were used to test the filling accuracy of the final tuned system. Each image contains multiple similar fields and irrelevant edge information, which matches the complexity of real scenarios. The approximate content of such an image is shown in Figure 9. As the figure shows, the form contains several groups of similar information, such as "reporting location", "reporting department", "reporting name", "detection name", and "detection method". It also contains several groups of unrelated information, such as the form name, a description, and warning signs. Because the submitted forms are designed by different departments in different regions, their content may include additions or deletions, and fields with the same requirement may carry different names. This is what our similarity algorithm must distinguish.
The information content in the self-made images comprises useful information and irrelevant information, and the useful information is further divided into similar and dissimilar field information. Figures 10 and 11 show the distribution of the ratio of irrelevant information to overall information and the ratio of similar field information to useful information in the images.

The evaluation criteria for the sims mentioned above are the final results obtained through multiple experiments. For the evaluation of the results, we use the following accuracy formula [44]: the filling accuracy of a single image is the number of correctly filled fields (TP) divided by the total number of fields it fills (TP + FP) [45]:

P = TP / (TP + FP)

The accuracy corresponding to a given sim setting is the average over all N images, P_avg, as shown in the following formula:

P_avg = (1/N) Σ_{i=1}^{N} P_i

We use the method of controlling variables to determine the two sims sequentially. The results are shown in the following figures.
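The accuracy computation above can be sketched directly; the class and method names below are illustrative, not from the paper.

```java
public class FillingAccuracy {
    // Per-image accuracy: P = TP / (TP + FP).
    static double accuracy(int tp, int fp) {
        return (double) tp / (tp + fp);
    }

    // P_avg: the mean of the per-image accuracies over all images.
    static double average(double[] p) {
        double sum = 0;
        for (double v : p) sum += v;
        return sum / p.length;
    }

    public static void main(String[] args) {
        double p1 = accuracy(9, 1); // 9 correct fields out of 10 filled -> 0.9
        double p2 = accuracy(8, 2); // 8 correct fields out of 10 filled -> 0.8
        System.out.println(average(new double[]{p1, p2}));
    }
}
```

Note that unfilled fields do not enter the denominator, which is why images with mostly blank forms can drag the per-image score down sharply, as discussed below.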
As shown in Figure 12, setting the first similarity threshold (sim1) to around 0.5 yields the highest accuracy. Fixing sim1 and varying sim2, Figure 13 shows that accuracy peaks when sim2 is around 0.8. Finally, we fixed the values of sim1 and sim2 and tested the remaining 40 images; the results are shown in Figure 14. The filling accuracy for the vast majority of the 40 tested images exceeds 90%. In actual work, staff can first upload the relevant images for automatic field filling and then manually check and supplement the content, which greatly improves efficiency and reduces the probability of manual error. The low filling accuracy of a few images is due to most of their form fields being left blank, which rarely happens in practice; even when it does, such cases are quickly screened out during manual inspection within the allowable error range. These results meet our expectations and demonstrate the usability of the proposed method.
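The controlled-variable tuning can be sketched as a one-dimensional sweep: fix sim1, evaluate each candidate sim2 on the training images, and keep the best. The evaluate callback below is a hypothetical stand-in for running the full pipeline over the 40 training images; the toy accuracy surface in main is for illustration only.

```java
import java.util.function.DoubleBinaryOperator;

public class SimTuning {
    // Sweep sim2 over a grid with sim1 held fixed; return the sim2 value that
    // maximizes the evaluation score (average filling accuracy in the paper's setup).
    public static double bestSim2(double sim1, double[] grid, DoubleBinaryOperator evaluate) {
        double best = grid[0], bestAcc = -1;
        for (double sim2 : grid) {
            double acc = evaluate.applyAsDouble(sim1, sim2);
            if (acc > bestAcc) { bestAcc = acc; best = sim2; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy accuracy surface peaking at sim2 = 0.8, mimicking the trend in Figure 13.
        DoubleBinaryOperator toy = (s1, s2) -> 1.0 - Math.abs(s2 - 0.8);
        System.out.println(bestSim2(0.5, new double[]{0.5, 0.6, 0.7, 0.8, 0.9}, toy)); // 0.8
    }
}
```

The same routine, with the roles of the two thresholds swapped, determines sim1 first; tuning them sequentially is what "controlling variables" amounts to here.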

Conclusions
This article addresses the problem of data uploading and filling in web systems. By combining and improving existing OCR technology and text similarity algorithms, the goal of filling fields from complex form images was effectively achieved. The test results show that the accuracy of image recognition and filling in practical applications can exceed 90%. However, the method also has limitations: because it was proposed for a concrete engineering problem, the field information it considers is tied to this project, and if the fields change, the method may need to be readjusted. In the future, it can be further optimized, for example by updating the database tables corresponding to the fields in real time, to better adapt to different image forms.

Figure 10. The proportion of irrelevant information in the image.

Figure 11. The proportion of similar fields in the image.
