Compression of GNSS Data with the Aim of Speeding up Communication to Autonomous Vehicles

Abstract: Autonomous vehicles contain many sensors, enabling them to drive by themselves. Autonomous vehicles also need to communicate wirelessly with other vehicles (V2V) and with infrastructure (V2I), such as satellites, over diverse connections in order to achieve safety, reliability, and efficiency. Information transfer from remote communication appliances is a critical task and should be accomplished quickly, in real time, and with maximum reliability. A message that arrives late, arrives with errors, or does not arrive at all can create an unsafe situation. This study aims at employing data compression to efficiently transmit GNSS information to an autonomous vehicle or other infrastructure, such as a satellite, with maximum accuracy and efficiency. We developed a method for compressing NMEA data. Our results were better than those reported in current studies, while also supporting error tolerance and data omission.


Introduction
Autonomous vehicles [1,2] are able to control the full vehicle system in collaboration with human interaction. Sometimes, the vehicle's control and computer system can also take full control when the driver cannot, for example when falling asleep behind the steering wheel, experiencing a medical emergency, or undergoing a vehicle emergency due to a flat tire or a mechanical problem. One of the challenges today is gaining public trust in the concept of autonomous driving [3].
Today, technology giants and automakers have been working toward full automation with the goal of selling cars that can drive safely and efficiently with an emphasis on reducing GNSS errors [4]. In our former paper, we explained the correlation between compression and errors [5].
Ideally, essential information passes in a fraction of a second, without delays and losses. Unfortunately, navigation software is often slow in delivering initial results, or, when there are no satellite signals, the navigation software appears to work well while actually receiving wrong data. The compression and decompression methods presented in this investigation endeavor to improve this situation.
In this paper we review some relevant aspects of the suggested system: autonomous vehicles, data compression, GNSS devices, and the NMEA standard [6]. We deal with several compression methods, some of them known; however, we made an effort to produce a new (hybrid) method that is based on known methods and algorithms.
During our research, we noticed that adjacent frames of GNSS data are frequently very similar; accordingly, we looked for a compression method that efficiently works with differences to prepare the data for an entropy encoder like Huffman coding. We analyzed several compression methods, eliminating ones like JPEG2000 [7] because, not making use of the comparison of frames, they were unsuitable for our method as a preprocessing step.
The aim of this work is not to correct errors but to suggest a time-saving method to transmit information with no less reliability than raw information transmission.
The Huffman codes are an algorithm that belongs to a group called prefix codes. This algorithm provides good data compression and stores items in a minimum number of bits according to the probability with which each item appears. The method is based on assigning a variable-length code to each item according to its frequency, so that a frequent item is represented by a small number of bits whereas an infrequent item is represented by a longer code.
Huffman coding has a legendary and important status in the field of computer science and engineering, for its simplicity and applicability make it an ideal example in algorithm courses. Moreover, it is one of the most common techniques used for data compression [17].
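The authors' tools are written in C#; purely as an illustrative sketch of the idea described above (not the paper's implementation), the following Python builds a Huffman code table from symbol frequencies, so that frequent symbols receive short codes:

```python
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict:
    """Build a Huffman code table {symbol: bit string} for `data`."""
    freq = Counter(data)
    # Heap entries: (frequency, tie-breaker, {symbol: partial code}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single distinct symbol
        (_, _, table), = heap
        return {s: "0" for s in table}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        # Prepend one bit to every code in each subtree, then merge.
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes("aaaabbc")
# The most frequent symbol ('a') gets the shortest code.
assert len(codes["a"]) < len(codes["c"])
```

The resulting code is prefix-free: no codeword is a prefix of another, which is what allows unambiguous decoding of a concatenated bit stream.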

Global Navigation Satellite System
The Global Navigation Satellite System (GNSS) makes use of satellites in space around the Earth. The idea began in the 1970s among the U.S. military as Americans sought a way to overcome the difficulties of previous navigation systems in use. GNSS began to be extensively used in the 1990s [18].
The principle of the GNSS is sending the location of the transmitter to the receiver in space (satellite) and then retrieving information from the receiver to the transmitter on Earth [19]. GNSS receivers require access to open skies; otherwise, interference or loss of GNSS signals can occur. This type of communication experiences failures in areas with very tall construction or in areas of forests or mountains. GNSS tracking is applied in a variety of fields, including animal tracking, car travel, hiking, and even sports [20][21][22].
The GNSS device is essential to autonomous vehicles in need of high availability and accuracy. A vehicle can plot a pre-known or pre-programmed route autonomously without any human control [23]. Therefore, it is very important that an autonomous vehicle is able to receive and send its location promptly to the mobile network and/or satellite communication. A major advantage of using GNSS is that the data do not depend on previously received information and therefore localization errors do not accumulate over time.
One of the GNSS's essential attributes is its accuracy, which depends on the number of satellites the receiver reads at any given time and on the location and environment of the receiver: for example, an urban environment, underground parking, a forest, or an open space.

NMEA Standard
The NMEA (National Marine Electronics Association) standard, also known as Standard 0183, was introduced in 1983 as a standard for data communication between ships. The NMEA protocol uses ASCII codes, and its data transfer is slow, at 4800 bits per second. However, it is still widely used and is perfectly suited to situations where one end, such as a GNSS device, needs to be connected to another end, such as a satellite [24,25].
The default transmission rate of the NMEA GNSS standard is 4.8 kb/s. It uses 8 bits for the data of ASCII characters and 1 stop bit. Years later, the NMEA2000 protocol, much more advanced than the previous one, was introduced. The new protocol allows multiple units to transmit and receive data simultaneously; its cables are less sensitive to noise (with wired connections) and its information transfer is superior to that of NMEA0183. Furthermore, it allows data transfer rates of up to 250 kb/s (about 50 times faster) [6].
The use of NMEA in ships is very significant and important because at sea there are no signs, and in the modern world one of the options for navigating at sea is GNSS; this is in contrast with other transportation, such as vehicles, which can navigate the roads thanks to road signs and directions.
Most systems that provide real-time positioning ensure that the data are in NMEA form. These data include, among others, PVT: Position, Velocity, and Time [26]. Most often, standard NMEA sentences appear in commercially produced GNSS devices [27]. It is also possible to define unique, proprietary sentences for a particular purpose instead of the existing ones. For example, a Garmin sentence would start with PGRM [28]. We discuss standard sentences below; all the sentences have something in common.
Each NMEA sentence is represented by ASCII codes, starting with a '$' sign and a prefix of "GP", which identifies GNSS receivers, followed by three letters that mark the sentence type [29].
Each NMEA sentence can contain no more than 80 characters of plain text, with data items separated by commas. Each sentence type can include a checksum: it begins with '*' and consists of two hex digits representing the XOR of all characters between the '$' (exclusive) and the '*' (exclusive). A checksum is not required for all sentence types.
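As a concrete illustration of this checksum rule (a sketch, not the paper's tool), the two hex digits can be computed as follows; the short GPGSA sentence used here is a constructed example, not taken from the paper's data:

```python
def nmea_checksum(sentence: str) -> str:
    """XOR of all characters between '$' (exclusive) and '*' (exclusive),
    formatted as two uppercase hex digits."""
    body = sentence.split("$", 1)[1].split("*", 1)[0]
    cs = 0
    for ch in body:
        cs ^= ord(ch)
    return f"{cs:02X}"

# Constructed example: the XOR of the characters in "GPGSA,A,1" is 0x32.
assert nmea_checksum("$GPGSA,A,1*32") == "32"
```

A receiver validates a sentence by recomputing this XOR over the body and comparing it with the two digits that follow the '*'.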
In autonomous vehicles and vehicles in general, NMEA sentences are used in the following formats: GPGGA, GPGSV, GPGSA, GPRMC [30].

GNSS Data Compression Review and Related Work
Today's processors are very powerful and can process data in RAM rapidly. Nevertheless, a busy communication channel can be challenging: the busier the communication channel and the more information passes through it, the more information can be delayed. The transmission time is significantly greater than the processing time, which is negligible by comparison [31]. That is why previous investigations have assessed the amount of information transferred and not the processing times [32], a procedure we also follow later in the Results section.
Among several recent studies and articles about GNSS data compression, one recently examined the compression of GNSS data in maritime usage by ships and vessels. To the best of our knowledge, there is no study of compression of GNSS data in combination with autonomous vehicles. That work reports the ability to compress data at very high efficiency, obtaining a compression ratio of about 4% of the raw information [33]. However, naval vessels generally avoid turns due to their dimensions and usually perform only prearranged turns in very wide areas. Commonly, ships do not make many turns or U-turns, unlike vehicles, but rather travel in straight lines [34]. Therefore, the changes in the GNSS information are typically small and yield a much better compression ratio.
A recently published paper explored and proposed GNSS data compression for IoT components and trajectory reconstruction. Unlike in our data, the authors of this paper consider several different trajectory typologies. They compress these data by combining their suggested technique with a lossless compression method, trying the well-known lossless methods Huffman, LZ77, and LZW. They reached a conclusion similar to ours: combining the Huffman codes with another method indeed gives better results than compressing the data using Huffman codes alone [35].
The subject of GNSS data compression has been studied previously; however, those studies take different directions and usually dissimilar approaches from our research.
Some of the studies on the subject compress completely different information [36]. One proposed compression algorithm [37] requires accompanying hardware and other supporting equipment, such as a server that performs data analysis, compression, and transmission. Working with a server is not practical for our research because it operates in a real-time environment.
An algorithm more similar to our work is suggested in [38]. It analyzes NMEA data compression using some combinations of LZ77 and Huffman coding. The compression in that paper results in an output of about 30% of the original information. Furthermore, the writers removed some of the information so that all records would have the same fields and the compression ratio would be enhanced. Even with these features, that paper achieved significantly inferior results because a differences method, such as the one used in H.264, which can substantially improve performance, is not employed. In our research, we made use of the H.264 approach and as a result were able to get much better results of about 13%.
The goal of compression is to efficiently reduce the amount of information transmitted by GNSS. Raw data transmitted by GNSS is very expensive, costing thousands of dollars per day and millions of dollars per year for only about 4000 vehicles that use it [39].
Today every vehicle (even non-autonomous) has built-in GNSS components [40], but the topic becomes very significant when we talk about GNSS in autonomous vehicles [41]. The amount of information that these vehicles transmit will be significantly bigger. One of the components constantly changed and transmitted across bandwidths is the location data of the vehicle.
This research presents a method for compressing GNSS data as a difference between location and time. An additional example can be found in [42], which shows that GNSS information contains many commas between the parameters. Commas are information repeated over and over again within the message; this undoubtedly takes up bandwidth and therefore needs to be compressed more efficiently (e.g., with Huffman coding).
In the algorithm discussed above, compressed information is transferred most of the time, but occasionally full information is transferred to avoid retaining errors over time.

Employing H.264-Like Compression
The main objectives of the H.264/AVC standardization efforts were improving the compression performance, providing "network-friendly" video representation, and compressing the information more efficiently than in previous standards, such as H.263 [43].
H.264/AVC represents advances in standard video encoding technology, improving encoding efficiency and flexibility for use in a wide range of networks and applications [43,44].
H.264 provides about 50% bit-rate savings for equal perceptual quality compared to the performance of previous standards, such as H.263 [45,46].
H.264 makes use of three frame types (I-frames, P-frames, and B-frames) to improve error resilience, avoid failures in video streaming, and improve the efficiency of compression and of the compressed stream [46][47][48].
Compression can be improved by further modifications, such as by using differences between time and location data and compressing the commas by using the Huffman Code. Moreover, vehicles often get stuck in traffic jams; the more vehicles, the bigger the traffic jams will be [49]. The average speed today in big cities can be even under 30 km/h, as in New York [50] or Tel Aviv (around 15 km/h).
Beyond the traffic jams, the vehicles idle at traffic lights or stop for various purposes. On all these occasions, the vehicle sends information about its location using the method described in [42] but with several changes.
In autonomous vehicles, several types of connections help vehicles to receive information from the environment (other AVs) and vice versa, as with vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I), and vehicle-to-everything (V2X) [51].
H.264 takes advantage of the property that adjacent video frames are usually very similar. Therefore, the values of the adjacent blocks' differences will be zero or close to zero. The more zeros we obtain, the better the Huffman coding efficiency. The information that GNSS generates also usually has this feature of similarity between adjacent blocks, so we adapted the idea of H.264 and achieved an effective and efficient preprocessing step.
We analyzed the GNSS data compression and decompression, which can affect the total time and resources used by the GNSS. The differences between consecutive NMEA sentences are compressed in the way that H.264 handles consecutive frames with small differences.
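The difference idea can be sketched as follows. This is an illustrative Python fragment (the paper's tool is written in C#), and the sample GPRMC sentences and field-handling rules are simplified assumptions rather than the authors' exact format: numeric fields become numeric deltas, which are mostly zeros for similar consecutive frames, and non-numeric fields are kept only when they change.

```python
def diff_fields(prev: str, curr: str) -> list:
    """Field-wise difference of two NMEA sentences of the same type.
    Numeric fields become numeric deltas; non-numeric fields are kept
    only when they change (empty string means 'unchanged')."""
    out = []
    for p, c in zip(prev.split(","), curr.split(",")):
        try:
            out.append(str(round(float(c) - float(p), 6)))
        except ValueError:
            out.append("" if p == c else c)
    return out

# Simplified, constructed GPRMC-like sentences one second apart:
prev = "$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4"
curr = "$GPRMC,123520,A,4807.038,N,01131.002,E,022.4,084.4"
print(diff_fields(prev, curr))
# Mostly zeros and empty fields: only time and longitude changed.
```

The many zeros produced by this preprocessing step are exactly what makes the subsequent entropy coder (Huffman) so effective.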

Methodology
This investigation was performed with several compression and optimization tools and methods, which we describe below, explaining their advantages and disadvantages. We furthermore explain why some of the methods achieved better results than others we tried but rejected. We wrote our compression and decompression tool in C#, along with another tool that aims at comparing results and checking them visually. Examples of output files from the C# tool are shown in Figures A1-A5. The visual tool is explained in more detail below in Section 7.
Accordingly, the algorithm for encoding a raw GNSS data file is (Algorithm 1):

Algorithm 1: Compression of GNSS data
Input: raw GNSS data. Output: compressed binary data.
(1) Check the correctness of the GNSS data using the GPRMC protocol, as indicated by its third parameter (V = invalid data).
(2) If the information is incorrect: remove the incorrect information from the data file.

Correspondingly, the algorithm for decoding a data file is (Algorithm 2):
(2) Perform decompression using the difference method.

Experiments
Data were collected by traveling on many roads in several vehicles with the same receiver (a smartphone). For data collection, the NMEA Logger application was utilized during the trips [52].
This application allowed us to record NMEA data in raw form and transfer the data file to a computer for analysis and further processing. The application is intended for smartphones running Google's Android operating system and is available from Google Play. In our research, NMEA Logger version 2.3.35 was used on a Samsung S20 Ultra device (Samsung, Seoul, South Korea) running Android. To analyze information and collect data, we incorporated urban and intercity routes and interspersed long and short ranges.
This application can create a large log file with all the GNSS data and sub-protocols. We did not need some of the data for this study, so we filtered it for only the relevant data. Nowadays, a filtering method before analyzing results or performing actions is prevalent in many applications, as in [53].
Several short and long samples have been extracted and are shown in the tables that appear in the Results section. The shortest trip was about 50 km in 20 min, and the longest was about 4.5 h and 300 km, which included driving on a highway and standing in urban traffic jams.
In Figure A1 in Appendix A, we see many information lines not relevant to the content that we want to compress. For us it is a sort of noise that we have filtered out.
The contents of the desired protocols were extracted; these are the four protocols mentioned above: GPGGA, GPGSV, GPGSA, and GPRMC. After this noise was filtered out, a new text file was built, and through it, repeating patterns within each line of each protocol were examined. An example of a filtered file can be seen in Figure A2 in Appendix A, which includes only these four protocols.
For example, one of the methods tried in the research was calculating differences between rows' values within each block. This gave very good compression, but recovering the information is very difficult, so we rejected this method.
Sometimes the vehicle sends or receives incorrect satellite information, which can happen for many reasons: for example, the vehicle enters an underground parking lot or a tunnel, or a momentary malfunction occurs in the reception of GNSS satellites [54][55][56]. As a first step, it was decided to remove incorrect information.
In Figure A3 in Appendix A, we see noise in both the GPGSA and GPRMC protocols, i.e., incorrect data. This noise causes the omission of most of the information block, which includes additional protocols; for example, in the GPRMC protocol, a third value of V means the information is incorrect. Incorrect information can also be detected in the GPGSA protocol when the third parameter is 1 (Mode: 1 = fix not available).
If there is incorrect information for any reason, the algorithm will delete most of the block with the incorrect information, as exemplified in Figure A3 in Appendix A.
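A minimal sketch of this validity check (illustrative Python, not the authors' C# implementation; the sample sentences are hypothetical and heavily truncated) could look like this:

```python
def is_valid_block(lines) -> bool:
    """Reject a block when GPRMC reports 'V' (void) in its third field
    or GPGSA reports mode 1 (fix not available) in its third field."""
    for line in lines:
        fields = line.split(",")
        if line.startswith("$GPRMC") and len(fields) > 2 and fields[2] == "V":
            return False
        if line.startswith("$GPGSA") and len(fields) > 2 and fields[2] == "1":
            return False
    return True

# A hypothetical block with a void GPRMC fix is dropped entirely:
assert not is_valid_block(["$GPRMC,123519,V,,,,,,"])
assert is_valid_block(["$GPRMC,123519,A,4807.038,N"])
```

Blocks that fail either test are removed before the difference step, as described above.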
We suggest calculating differences between rows of the same protocol, as is done in the H.264 method; in each block there are k rows per protocol (for example, GPRMC is a 1-line message). The example in Figure 1 demonstrates how this works.
One of the difficulties was with GPGSV. The challenge emerges when certain blocks of the GPGSV protocol contain a certain number of lines and then, after a few iterations, a GPGSV block appears with a different number of lines (more or fewer). Figure A4 in Appendix A shows such a case: in iteration 4 (lines 22-29), the GPGSV lines are not the same as those of iteration 1 (lines 1-7).
Since the H.264-like algorithm employs differences between iterations, it is important to have the same number of rows in each iteration in order to calculate a difference from another line or from the average. Monitoring and testing showed that such changes occur infrequently, so it was decided to split the file: as soon as an iteration is received in which the number of GPGSV lines differs from the previous one, the file is closed and a new file is initialized with the line count of the latest GPGSV block, and so on.
In the process of building a binary file, another problem was detected. When a computer writes a file, the file size is rounded up to a multiple of 8 bits because computers work with bytes. This caused a problem in the decoding stage, as the decoding came out wrong due to the bits added at the end of the file.
As a result, it was decided to append another byte at the end of the binary file, indicating how many padding zeros must be removed from the previous byte.
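This padding scheme can be sketched as follows (an illustrative Python version; the function names are ours, not the paper's):

```python
def pack_bits(bitstring: str) -> bytes:
    """Pack a bit string into bytes, zero-padding the last byte and
    appending one extra byte that records how many padding zeros to strip."""
    pad = (8 - len(bitstring) % 8) % 8
    padded = bitstring + "0" * pad
    out = bytearray(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    out.append(pad)  # final byte: number of padding zeros in the byte before it
    return bytes(out)

def unpack_bits(data: bytes) -> str:
    """Invert pack_bits: read the final pad-count byte and strip the padding."""
    pad = data[-1]
    bits = "".join(f"{b:08b}" for b in data[:-1])
    return bits[:len(bits) - pad] if pad else bits

msg = "1011001"  # 7 bits, so one padding zero is added
assert unpack_bits(pack_bits(msg)) == msg
```

The one-byte overhead is negligible compared with the decoding errors it prevents.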
An issue that should be handled occurs when lines from different iterations have different fields. Figure A5 in Appendix A shows such a case, where there is a value between the commas and the algorithm needs to take a difference from a row in an upper block where there is no value between the commas (NULL), or vice versa. For such cases we used a special pattern, seen in Figure 2.
For example, there may be no values between the commas in block 4 of a certain protocol compared with block 1 (or vice versa).
An example of the difference file is shown in Figure A6 in Appendix A.
• After receiving the difference file in Figure A6 in Appendix A, the algorithm prepares a file that contains very long prefixes that usually repeat in various files. Our algorithm maps each of these prefixes to a distinct symbol.

• The algorithm takes the output file from step 1 and applies the mapping file, creating a Huffman encoding for the file from step 1.
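The prefix-mapping step can be sketched like this (illustrative Python; the prefix table and single-byte marker symbols here are hypothetical, not the paper's actual mapping):

```python
def map_prefixes(lines, prefix_table):
    """Replace long, frequently repeating line prefixes with short
    single-symbol markers before entropy coding."""
    out = []
    for line in lines:
        for prefix, symbol in prefix_table.items():
            if line.startswith(prefix):
                line = symbol + line[len(prefix):]
                break
        out.append(line)
    return out

# Hypothetical mapping of sentence-type prefixes to control characters:
table = {"$GPGSV,": "\x01", "$GPRMC,": "\x02"}
mapped = map_prefixes(["$GPRMC,123519,A", "$GPGSV,3,1,11"], table)
assert mapped == ["\x02123519,A", "\x013,1,11"]
```

Because each marker must be decodable, the mapping table itself is shared between encoder and decoder; the bytes saved per line multiply across the whole log.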
A simulation program was built for compression and decompression in C#. This tool tests and calculates different compression, coding, and decoding methods.
A diagram illustrating the compression is in Figure 3.
We stopped calculating after (9 + 3)/2, and we do not need to save all the information after index 4 (2 + null), because iteration 1 appears repeatedly and can simply be recovered/decoded. After performing the actions mentioned above, we get a difference file, saving only around 28% in space. At this point, we obtain a file with many repeating zeros, which allows us to use Huffman coding more efficiently and thus attain a better compression ratio later.
We developed simulation software to implement the suggestions of this work. The interface of the simulation program is shown in Figure 4. The simulator is given a raw GNSS data file. The course of action is: A.
For encoding:

Results
This compression method contains three steps: the difference method (based on H.264), the mapping of repeated prefixes, and entropy coding.
One of the significant advantages of this compression method (compared to zip compression) is that ours has error resilience because of the Huffman Codes [5].
ZIP does not have this attribute of error resilience, and it is almost impossible to recover damaged files. In addition, if the files using ZIP were sent in parts and one part were not received, then the information could not be recovered at all [57].
In each message transmission using the method proposed in this paper, we send a first packet that is always original and unchanged. Because of this feature, we can recover the subsequent packets.
If some of the packets are lost, then because of the Huffman property of synchronization, we can recover the rest of the information at a relatively high speed. In addition, the first packet is transmitted at a certain frequency and each time the number of messages in the GPGSV protocol changes: when there is such a change, the existing file is closed and a new file is opened in a renewed procedure, creating additional resilience to information loss thanks to the repetition of the first packet.
This durability does not exist in the Zip compression method (not in real time). If some of the information does not arrive or is corrupted in transmission, ZIP will not be able to restore the information even partially. The suggested compression method has been evaluated by several benchmark tests employing real GNSS information obtained from the GNSS receivers of real vehicles. We tried Zip as it is, as well as Diff, the method employing the concept of H.264 described above. The output of Diff was sent to Huffman, to Zip, and to a mapping of repeated strings before the Zip.
We have marked in bold in the following tables the method we propose in this study, the winning method among all those tested and reviewed. The results of the first benchmark are detailed in Table 1 (GNSS data file and compression results). In the subsequent results, we see a slight improvement with larger files, but at a certain stage the improvement is no longer significant, and the results remain about the same.
We see that after calculating and using the difference method (Diff file), which is a preprocessing stage, we get a reduction of 28%. The significant gain is not just the 28% but the number of zeros we get after taking the differences. These help us compress the information more efficiently using the Huffman code, achieving significantly better compression percentages.
After using the difference method, which is a preprocessor for Huffman coding, we get a significantly better compression ratio (compared to the original file) of about 87%. Nevertheless, if we want to improve and compete with ZIP, we first perform the difference method, which preprocesses for Huffman, and then directly apply ZIP compression, thus getting a better compression percentage of about 91.3%, compared to about 90% obtained by directly using ZIP on an original file. Furthermore, it is possible to compress even better by performing the difference method, then the mapping method, and then Zip; the result is significantly better: 93.4%.
Naturally, we must take into account that the world is not perfect and sometimes successes come at the expense of something else. The high compression rates we get here, better than with Zip, come at the expense of not being able to recover if the file or parts of it are damaged during transmission or construction (survivability).
We also tried some larger files, and the results were slightly better; they can be found in Table 2 (GNSS data file and compression results). Figure 5 compares the average percentages of size saved by each of the methods. It can be concluded that Diff&Mapping&Zip gives the best compression, but there is no error resilience in this method, whereas Diff&Huffman, even with somewhat lower results, has the feature of automatic error resilience.
Another option that has been tested is performing ZIP compression and then running compression using our algorithm. This option was ruled out because after doing ZIP, a file that is mostly random will be created, and random files cannot be compressed [58].
Huffman coding is a very popular method used by many applications, such as MP3 and JPEG. Another possible method is Shannon-Fano coding [59], but its compression is inferior to Huffman coding [60]; therefore, we preferred Huffman codes. In the table in Appendix B, we see how many bits we save per symbol by using Huffman codes. However, we save even more, because the compression method presented here has several stages: in the preliminary stage, we have already converted sequences of repeating characters (prefixes) into single symbols, thereby already reducing the data size.
To evaluate the efficiency of the proposed method, we calculate the Shannon entropy, $S(x) = -\sum_{i=0}^{n} p(x_i)\log_2(p(x_i))$, where $p(x_i)$ is the probability of getting the value $x_i$. More explanation of the Shannon entropy formula can be found in [61].
It can be seen in the detailed spreadsheet in Appendix B that the Shannon entropy of the data is 5.548 bits per symbol, which is the theoretical optimum. This is very close to the average codeword length of 5.581 bits obtained in this work with Huffman coding. The results are shown in Figure 6. The slight increment occurs because the Huffman algorithm rounds the length of each codeword up to a whole number of bits; the Huffman algorithm therefore attains an average length very close to the entropy, even if not optimal.
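The entropy figure can be reproduced directly from symbol frequencies. The sketch below implements the formula above (the function name `shannon_entropy` and the single-sentence sample are illustrative; the 5.548 value in Appendix B was computed over the full data set):

```python
import math
from collections import Counter

def shannon_entropy(data):
    """S(x) = -sum over symbols of p(x_i) * log2(p(x_i))."""
    freq = Counter(data)
    n = len(data)
    return -sum((f / n) * math.log2(f / n) for f in freq.values())

sample = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
print(f"entropy: {shannon_entropy(sample):.3f} bits/symbol")
```

The entropy is a lower bound on the average codeword length of any symbol-by-symbol code, which is why Huffman coding, with its integer-length codewords, lands slightly above it.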

Conclusions
Several compression methods, as well as combinations of different compression methods, and their results have been presented. Compression using only ZIP saves about 90%. This is undoubtedly a substantial compression ratio, but it comes with quite a few shortcomings, as noted in this work.
Compressing with the full method suggested in this research reduces the raw information by over 87%. Considering the good error resilience provided by Huffman coding, and the occasional transmission of raw information, it can be concluded that there is an evident trade-off here, at a cost of about 3% compared with ZIP.
A better compression ratio can be obtained by combining part of the method developed in this paper with subsequent ZIP compression. This combination outperforms the two previous approaches and achieves a data reduction of about 94% (3% better than ZIP and 6% better than the Huffman-based method suggested here). Nevertheless, it should be noted that the combined method has no good error resilience, because Huffman coding is not used.
Our choice and recommendation is to use the difference method together with Huffman coding, rather than a pure ZIP method or the other combinations mentioned above. We prefer this combination because the other methods do not provide error resilience, even though they can slightly improve the compression ratio.
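The difference step can be illustrated in simplified form. The sketch below transmits only the NMEA fields that changed since the previous sentence, leaving unchanged fields empty; the field mapping and symbol encoding in our actual method are more elaborate, and `diff_fields` is a hypothetical helper name:

```python
def diff_fields(prev, curr):
    """Keep only comma-separated fields that changed since the previous
    sentence; unchanged fields collapse to empty placeholders."""
    p, c = prev.split(","), curr.split(",")
    return ",".join("" if i < len(p) and p[i] == f else f
                    for i, f in enumerate(c))

prev = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
curr = "$GPGGA,123520,4807.040,N,01131.002,E,1,08,0.9,545.4,M,46.9,M,,*48"
print(diff_fields(prev, curr))
# Only the timestamp, coordinates, and checksum survive; consecutive GNSS
# fixes differ in very few fields, so the difference stream is highly
# compressible by the subsequent Huffman stage.
```

Because consecutive position fixes change slowly, most fields repeat from sentence to sentence, and differencing converts that repetition into long runs that the later stages compress well.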
It should also be noted that during the coding, we used static rather than dynamic coding tables. This choice has pros and cons. Using a static table that is known in advance is common because it is efficient and avoids the need to update and transfer compression tables: both sides already hold the known tables, so the transmitted information does not have to include new tables each time. A custom table, on the other hand, can produce a shorter average codeword. Nevertheless, we chose static tables, as is customary in most compression methods, such as JPEG, TIFF, and MP3.
We believe that using the methods presented in this paper can significantly improve the efficiency and speed of information transfer. This is particularly important nowadays, since NMEA data compression is not used in GNSS systems and not enough research has been done on this subject. Therefore, further attention should be given to NMEA raw data, which is currently transmitted uncompressed. In a future study, we suggest investigating how to improve the compression and reach optimal results by using alternatives to Huffman codes. Methods that do not round the number of bits up, such as arithmetic coding, should be considered. The possibility of adding error resilience and/or error checking, such as a checksum, to the more efficient ZIP-based method presented here (Diff&Mapping&Zip) can also be considered.
Huffman codes resynchronize after an error with almost 100% probability. That is, after several wrong codewords, the Huffman decoder automatically recovers and starts reading valid codewords again. In contrast, arithmetic coding does not resynchronize after an error, and all the data read and decoded after the error is wrong. Working with arithmetic coding would improve the compression percentages, but at the cost of no synchronization after an error, so a mechanism for resynchronization after errors, such as a checksum, may need to be considered.
We also suggest considering in advance the possibility of not transmitting information that is unchanged relative to the previous transmission (for example, when the vehicle is stopped at a traffic light or in a traffic jam).
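Such suppression is straightforward at the sender side. The following is a minimal sketch of the idea (the class name `ChangeFilter` and its interface are hypothetical, not part of the method evaluated in this paper):

```python
class ChangeFilter:
    """Suppress retransmission of sentences identical to the last one sent,
    e.g., while the vehicle stands at a traffic light."""

    def __init__(self):
        self.last = None

    def should_send(self, sentence):
        if sentence == self.last:
            return False  # unchanged: skip transmission entirely
        self.last = sentence
        return True

f = ChangeFilter()
msgs = ["A", "A", "A", "B", "B", "A"]
sent = [m for m in msgs if f.should_send(m)]
print(sent)  # ['A', 'B', 'A']
```

A real deployment would also need a periodic keep-alive or timeout so that the receiver can distinguish "no change" from "link lost", which is exactly the kind of reliability consideration this paper emphasizes.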

Acknowledgments:
We are very grateful to all those who contributed to the success of this research. First and foremost, we would like to thank Eduard Yakubov and Eugen Mandrescu for all their support and encouragement throughout the research process. We would also like to express our gratitude to Radel Ben-Av, who provided valuable information, insights, and assistance drawing on his extensive experience in the field. In addition, many thanks to our good friend Alrajoub Eyas for his intense and dedicated help in developing tools that were very important to the success of this research.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A

Figure A1. Untouched file from NMEA Logger before filtering.

Appendix B
This spreadsheet is the static conversion table for each symbol. It shows the occurrence frequency of each symbol and its Huffman code, how much per-symbol space we were able to save, and an entropy calculation relative to the per-character average.
Average bits per symbol is 5.5814. Entropy is 5.5481.
