An Automated Refactoring Approach to Improve IoT Software Quality

Internet of Things (IoT) software must support IoT devices, which are growing in both number and complexity. Because communication between IoT devices is largely concurrent, ensuring the correctness of concurrent access is a major challenge in IoT software development. This paper proposes a general refactoring framework for fine-grained read–write locking and implements an automated refactoring tool that helps developers convert built-in monitors into fine-grained ReentrantReadWriteLocks. Several program analysis techniques, including visitor pattern analysis, alias analysis, and side-effect analysis, support the refactoring. We evaluated the tool on several real-world applications, including HSQLDB, Cassandra, JGroups, Freedomotic, and MINA. In total, 1072 built-in monitors were refactored into ReentrantReadWriteLocks. The experiments show that our tool helps developers refactor for ReentrantReadWriteLocks and saves them time and effort.


Introduction
The exponential growth of IoT devices is changing our world [1,2]. According to a recent Gartner report [3], 8.4 billion devices, including smartphones, tablets, and laptops, will be connected by 2020, and this number is expected to grow to 20.4 billion by 2022. To enable smooth interaction among these devices, IoT software must provide adequate support for them [4,5].
Communication between IoT devices is primarily concurrent, so ensuring the correctness of concurrent access is a major challenge in IoT software development [6,7]. Java has become one of the most popular programming languages for IoT software development, partly because of its support for handling concurrency-related problems [8]. The Java virtual machine (JVM) allows Java-based IoT applications to run on almost any chip. Java supports IoT software in several areas, such as cloud computing, big data, sensors, and machine-to-machine (M2M) computing. Its ability to combine different devices makes it a good choice for developing IoT applications.
Writing high-quality concurrent programs nevertheless remains challenging. Developers often employ coarse-grained locks, which introduce lock contention and decrease performance. This paper makes the following contributions:

• We developed algorithms that convert built-in monitor locks into fine-grained ReentrantReadWriteLocks.
• We developed an automated refactoring tool, implemented as an Eclipse plugin.
• We evaluated our tool on several real-world applications.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents some advantages of the ReentrantReadWriteLock over built-in monitor locks. Section 4 describes our refactoring framework and the design of our refactoring and analysis algorithms. Section 5 discusses some practical issues. Section 6 evaluates the proposed tool on a set of Java applications, and Section 7 presents conclusions and future work.

Related Work
In early work on lock-oriented refactoring, Aldrich et al. [9] and Bogda et al. [10] focused on eliminating unnecessary synchronization, but their approaches operated at the compiler level, which made them complex and subject to many limitations. Tao et al. [11] proposed a lock-splitting method based on synchronization requirement analysis. Diniz et al. [13] reduced overhead by coarsening the granularity at which the computation locks objects. Schäfer et al. [14] designed a refactoring tool, Relocker, that converts built-in monitors into ReentrantLocks and ReentrantReadWriteLocks. Inspired by their work, Zhang et al. [15] refactored built-in monitors into StampedLocks using lock downgrading/upgrading and optimistic synchronization.
Bavarsad et al. [16] proposed two optimization techniques to overcome the overhead of the global clock in Software Transactional Memory (STM). The first, read-write lock allocation (RWLA), improved STM performance only when transactions committed successfully; if conflicts occurred frequently, RWLA increased the abort cost and reduced performance. The second, a dynamic selection baseline scheme (an adaptive technique), reduced the abort cost of RWLA.
Emmi et al. [17] proposed an automatic lock allocation technique that infers where locks should be placed in a program, ensures that the locking is correct, and avoids deadlocks. Kawachiya et al. [18] proposed a lock reservation algorithm that allowed a lock to be reserved for a thread. When a thread tried to acquire a lock reserved for it, no atomic operation was needed; otherwise, the thread acquired the lock in the traditional way. Hofer et al. [19] proposed a new method for analyzing lock contention in Java applications by tracking locking events in the Java virtual machine. Their method detected not only when threads were blocked on locks but also which threads held the blocking locks, and it recorded their call chains. This method could reveal the causes of lock contention and identify lock-related performance bottlenecks. In our previous research, we implemented an automated refactoring tool that converts built-in monitors into StampedLocks [15] and refactored Java programs to use customized locks [20].
Among refactoring tools for concurrency, Dig et al. proposed CONCURRENCER [21], which refactors sequential Java code into parallel Java code using the java.util.concurrent library. It converts int fields into AtomicIntegers and HashMaps into ConcurrentHashMaps to make data access thread-safe, and it converts recursive divide-and-conquer algorithms to the Fork/Join framework. Tip et al. focused on validating software correctness in the early stages of refactoring and designed a refactoring tool, Reentrancer [22], that makes programs reentrant by transforming a sequential program into a reentrant one.
The impact of IoT is worldwide [23,24]. Refactoring can improve software quality, but it may also introduce security risks [25,26]. Some researchers have focused on the security of IoT software [27,28].

Motivation
In this section, we first introduce the background of the ReentrantReadWriteLock and present some of its possible application scenarios. We then compare the performance of the synchronized lock and the ReentrantReadWriteLock.

Background
The ReentrantReadWriteLock [29] is a locking mechanism introduced in JDK 1.5. It maintains a pair of associated locks: a read lock and a write lock. As long as no thread holds the write lock, the read lock may be held simultaneously by multiple reader threads. The write lock is exclusive and can be held by only one thread at a time.
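The shared/exclusive behavior can be observed directly with the standard java.util.concurrent.locks API. In this sketch (our own illustration; the class and method names are hypothetical), the calling thread holds the read lock while a second thread probes the lock. The second thread is admitted as another reader but is refused the write lock; tryLock() is used so the refused attempt returns false instead of blocking.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class SharedReadDemo {
    /** Returns {second reader admitted, writer admitted} while this thread reads. */
    static boolean[] probeWhileReading() {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        ExecutorService other = Executors.newSingleThreadExecutor();
        rwl.readLock().lock();                              // this thread is now a reader
        try {
            Future<boolean[]> probe = other.submit(() -> {
                boolean gotRead = rwl.readLock().tryLock();  // shared mode: may overlap
                if (gotRead) {
                    rwl.readLock().unlock();
                }
                boolean gotWrite = rwl.writeLock().tryLock(); // exclusive mode
                if (gotWrite) {
                    rwl.writeLock().unlock();
                }
                return new boolean[] {gotRead, gotWrite};
            });
            return probe.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            rwl.readLock().unlock();
            other.shutdown();
        }
    }

    public static void main(String[] args) {
        boolean[] r = probeWhileReading();
        System.out.println("second reader admitted: " + r[0]); // true
        System.out.println("writer admitted: " + r[1]);        // false
    }
}
```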
The ReentrantReadWriteLock supports lock downgrading: a thread holding the write lock can acquire the read lock and then release the write lock. Acquiring the read lock before releasing the write lock ensures the visibility of the data the thread has just written. However, the ReentrantReadWriteLock does not support upgrading from a read lock to a write lock, again to ensure data visibility. If upgrading were allowed while multiple threads held the read lock, any of them could acquire the write lock and update the data, and the update would not be visible to the other threads still holding the read lock.
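A minimal single-thread sketch of both rules, with hypothetical names: downgrade() takes the write lock, acquires the read lock, and then releases the write lock, so the thread ends up holding only the read lock; tryUpgrade() shows that the write lock cannot be obtained while the read lock is held. tryLock() is used for the upgrade attempt because writeLock().lock() would block the thread indefinitely in that situation.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class DowngradeDemo {
    /** Downgrades write -> read and reports the lock state afterwards. */
    static String downgrade() {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.writeLock().lock();
        try {
            // ... update shared data under the exclusive write lock ...
            rwl.readLock().lock();      // allowed: the writer may also take the read lock
        } finally {
            rwl.writeLock().unlock();   // downgrade: the read lock is still held
        }
        try {
            return "writeLocked=" + rwl.isWriteLocked()
                    + ", readHolds=" + rwl.getReadHoldCount();
        } finally {
            rwl.readLock().unlock();
        }
    }

    /** Attempts read -> write upgrading; returns whether it succeeded. */
    static boolean tryUpgrade() {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.readLock().lock();
        try {
            boolean upgraded = rwl.writeLock().tryLock();   // refused while reading
            if (upgraded) {
                rwl.writeLock().unlock();
            }
            return upgraded;
        } finally {
            rwl.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(downgrade());   // writeLocked=false, readHolds=1
        System.out.println(tryUpgrade());  // false
    }
}
```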
The ReentrantReadWriteLock allows more concurrency when accessing shared data. In theory, ReentrantReadWriteLocks should perform significantly better than mutually exclusive locks. In practice, however, their performance also depends on the concurrent processing power of multi-core processors and on the access patterns to the shared data.
The java.util.concurrent library [30], through its java.util.concurrent.locks package, provides classes and interfaces for flexible locking, such as ReentrantLock, ReentrantReadWriteLock, and StampedLock, which allow programs to use fine-grained locks. Nevertheless, Pinto et al. [31] analyzed 2227 Java projects with concurrent structures on SourceForge.net and concluded that the Java concurrency library was underused: only about 23% of the Java projects with concurrent programming structures used it.
Figure 1b is implemented with ReentrantReadWriteLocks. The query operation inquire() is a read operation, so it uses the read lock; the insertion operation insert() is a write operation, so it uses the write lock. When ReentrantReadWriteLocks are used for synchronization, unlock() must be called to release the read lock or the write lock after the synchronized operation. A try-finally construct is therefore usually used, with the lock release placed in the finally block so that the lock is always released, even if an exception is thrown, which avoids deadlocks.
The method processCachedData() in Figure 1c accesses a database through a local cache: if the data exist in the cache, they are used directly; otherwise, the data are read from the database and written into the local cache. Figure 1d shows processCachedData() implemented with the lock downgrading mode of ReentrantReadWriteLocks. The code first acquires the read lock to look up the data in the local cache. If the data are not in the local cache, the current thread releases the read lock, acquires the write lock, retrieves the data from the database, and writes them into the local cache. Finally, the thread acquires the read lock and then releases the write lock, completing the lock downgrade.
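The inquire()/insert() and processCachedData() patterns described above can be sketched as follows. This is not the code of Figure 1: the class name ReadWriteCache, the HashMap-backed cache, the database function, and the databaseHits counter are hypothetical stand-ins. The downgrading method follows the shape of the CachedData example in the ReentrantReadWriteLock Javadoc, including the recheck after the write lock is acquired.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

class ReadWriteCache {
    private final Map<String, String> cache = new HashMap<>();
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Function<String, String> database;  // hypothetical slow lookup
    int databaseHits = 0;                               // updated under the write lock

    ReadWriteCache(Function<String, String> database) {
        this.database = database;
    }

    /** Read operation: many threads may query concurrently. */
    String inquire(String key) {
        rwl.readLock().lock();
        try {
            return cache.get(key);
        } finally {
            rwl.readLock().unlock();     // released even if an exception is thrown
        }
    }

    /** Write operation: exclusive access. */
    void insert(String key, String value) {
        rwl.writeLock().lock();
        try {
            cache.put(key, value);
        } finally {
            rwl.writeLock().unlock();
        }
    }

    /** Read; on a miss, switch to the write lock, load, then downgrade. */
    String processCachedData(String key) {
        rwl.readLock().lock();
        String value = cache.get(key);
        if (value == null) {
            rwl.readLock().unlock();     // must release the read lock before writing
            rwl.writeLock().lock();
            try {
                value = cache.get(key);  // recheck: another writer may have loaded it
                if (value == null) {
                    value = database.apply(key);
                    databaseHits++;
                    cache.put(key, value);
                }
                rwl.readLock().lock();   // downgrade: take the read lock ...
            } finally {
                rwl.writeLock().unlock(); // ... then drop the write lock
            }
        }
        try {
            return value;                // use the data while holding the read lock
        } finally {
            rwl.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        ReadWriteCache cache = new ReadWriteCache(key -> "value-of-" + key);
        System.out.println(cache.processCachedData("sensor-1")); // loaded from the "database"
        System.out.println(cache.processCachedData("sensor-1")); // served from the cache
        System.out.println(cache.databaseHits);                  // 1
    }
}
```

Only a cache miss takes the write lock; repeated lookups of a cached key run entirely under the shared read lock.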

Performance Evaluation
This section first compares the performance of the synchronized lock and the ReentrantReadWriteLock, and then compares the synchronized lock with lock downgrading of ReentrantReadWriteLocks. Figure 2 shows the results of four code fragments executed under different configurations. We plot the execution time, where WT is the number of write threads and RT is the number of read threads. All results are the mean of 10 runs.
Figure 2a presents the results of operations using synchronized locks and ReentrantReadWriteLocks (the source code is similar to that in Figure 1a,b) with a total of 10 threads, each performing 100 operations. When RT = 9 and WT = 1, ReentrantReadWriteLocks and synchronized locks show a notable difference in execution time. However, the execution time is basically the same when RT = 1 and WT = 9. Figure 2b shows the results for a total of 100 threads, each performing 100 operations; the difference in execution time between the ReentrantReadWriteLock and the synchronized lock is even more significant when RT = 90 and WT = 10.
Figure 2c presents the results of operations using synchronized locks and lock downgrading (the source code is similar to that in Figure 1c,d) with a total of 10 threads, each executing 100 operations. A read thread models the case in which the data already exist in the local cache and are used directly; a write thread models the case in which the data are not in the local cache and must be loaded from the database into the cache. The figure shows that the numbers of read and write threads have no significant impact on the performance of synchronized locks, whereas the execution time of the program using ReentrantReadWriteLocks decreases as the number of read threads increases. Figure 2d shows the results with 100 threads; the overall trend is similar to that in Figure 2c.
The results indicate that a program using ReentrantReadWriteLocks performs better than one using synchronized locks when read operations outnumber write operations and each read takes a relatively long time. With lock downgrading, the program using ReentrantReadWriteLocks also performs better.
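A measurement setup of this kind can be sketched as a small harness: RT reader threads and WT writer threads each perform a fixed number of operations on a shared store, and the reported value is the mean wall-clock time over several runs. This is a hypothetical harness, not the authors' benchmark; the store, the 10-microsecond busy-wait that stands in for a long read, and all names are assumptions. It requires Java 9 or later (Thread.onSpinWait) and does no JIT warm-up, so a harness such as JMH would be preferable for real measurements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockBench {
    interface Store { long read(); void write(); }

    static final long READ_NANOS = 10_000;   // assumed cost of one read: 10 microseconds

    /** Busy-waits to simulate a relatively long read operation. */
    static void spin(long nanos) {
        long end = System.nanoTime() + nanos;
        while (System.nanoTime() < end) {
            Thread.onSpinWait();
        }
    }

    /** Built-in monitor version: readers also exclude each other. */
    static final class SyncStore implements Store {
        private long value;
        public synchronized long read() { spin(READ_NANOS); return value; }
        public synchronized void write() { value++; }
    }

    /** Read-write lock version: readers may overlap. */
    static final class RwStore implements Store {
        private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        private long value;
        public long read() {
            rwl.readLock().lock();
            try { spin(READ_NANOS); return value; } finally { rwl.readLock().unlock(); }
        }
        public void write() {
            rwl.writeLock().lock();
            try { value++; } finally { rwl.writeLock().unlock(); }
        }
    }

    /** Mean execution time in milliseconds over the given number of runs. */
    static double meanMillis(Store store, int rt, int wt, int opsPerThread, int runs) {
        long totalNanos = 0;
        for (int run = 0; run < runs; run++) {
            List<Thread> threads = new ArrayList<>();
            for (int i = 0; i < rt; i++) {
                threads.add(new Thread(() -> { for (int k = 0; k < opsPerThread; k++) store.read(); }));
            }
            for (int i = 0; i < wt; i++) {
                threads.add(new Thread(() -> { for (int k = 0; k < opsPerThread; k++) store.write(); }));
            }
            long start = System.nanoTime();
            threads.forEach(Thread::start);
            for (Thread t : threads) {
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                }
            }
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) runs / 1_000_000.0;
    }

    public static void main(String[] args) {
        int[][] configs = {{9, 1}, {5, 5}, {1, 9}};   // {RT, WT}: 10 threads in total
        for (int[] c : configs) {
            double sync = meanMillis(new SyncStore(), c[0], c[1], 100, 10);
            double rw = meanMillis(new RwStore(), c[0], c[1], 100, 10);
            System.out.printf("RT=%d WT=%d  synchronized=%.2f ms  ReentrantReadWriteLock=%.2f ms%n",
                    c[0], c[1], sync, rw);
        }
    }
}
```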

Refactoring Framework
The refactoring framework is shown in Figure 3. We used WALA [32] to design our analysis algorithms. Visitor pattern analysis is employed to find the target code, alias analysis to check for aliases of monitor objects, and side-effect analysis to analyze each critical section and generate a character sequence. We designed five lock modes for fine-grained ReentrantReadWriteLocks, which are inferred by our analysis according to inference rules.

Visitor Pattern Analysis
We parse the Java code into an abstract syntax tree (AST) through ASTParser [33] (a Java language parser for creating abstract syntax trees in Eclipse JDT). An AST node represents a Java source code construct, such as a name, type, expression, statement, or declaration. We use the visitor pattern to traverse all nodes on the AST and find all monitors in the program.
We must distinguish the kinds of built-in monitors and collect their monitor objects. Synchronized methods and synchronized blocks must be handled separately, as must static and non-static methods.
For an object instance:
• For a synchronized instance method, the monitor object is this.
• For a synchronized block with an instance monitor object o, the monitor object is o.
For a class:
• For a synchronized static method, the monitor object is the class object.
• For a synchronized block with a static monitor object O, the monitor object is O.
For monitors that act on an object instance, we declare a new ReentrantReadWriteLock instance field in the class. For monitors that act within the scope of a class, we declare a static ReentrantReadWriteLock field in that class.
We use a HashMap, lockmap, that maps each monitor object (the key) to its corresponding lock field (the value).
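The monitor-object rules and the lockmap lookup can be sketched with a toy model. A real implementation would visit Eclipse JDT AST nodes; here each synchronized construct found by the visitor is described by a small record, and monitor expressions are assumed to be already resolved to qualified names such as "A.this" or "A.class". All type, field, and method names are hypothetical, and the code needs Java 16 or later for records and switch expressions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MonitorCollector {
    enum Kind { INSTANCE_METHOD, STATIC_METHOD, BLOCK }

    /** One synchronized construct found by the AST visitor (hypothetical shape). */
    record SyncSite(String className, Kind kind, String blockMonitor, boolean staticMonitor) {}

    /** The ReentrantReadWriteLock field that will replace a monitor. */
    record LockField(String name, boolean isStatic) {
        String declaration() {
            return "private " + (isStatic ? "static " : "")
                    + "final ReentrantReadWriteLock " + name + " = new ReentrantReadWriteLock();";
        }
    }

    /** lockmap: monitor object -> lock field. */
    final Map<String, LockField> lockmap = new LinkedHashMap<>();

    /** The monitor-object rules for methods and blocks. */
    static String monitorOf(SyncSite s) {
        return switch (s.kind()) {
            case INSTANCE_METHOD -> s.className() + ".this";   // synchronized instance method
            case STATIC_METHOD -> s.className() + ".class";    // synchronized static method
            case BLOCK -> s.blockMonitor();                    // synchronized (o) { ... }
        };
    }

    /** Reuses the lock field of a known monitor, or declares a new one. */
    LockField lockFor(SyncSite s) {
        String monitor = monitorOf(s);
        LockField existing = lockmap.get(monitor);
        if (existing != null) {
            return existing;
        }
        boolean classScope = s.kind() == Kind.STATIC_METHOD
                || (s.kind() == Kind.BLOCK && s.staticMonitor());
        LockField created = new LockField("rwLock" + lockmap.size(), classScope);
        lockmap.put(monitor, created);
        return created;
    }

    public static void main(String[] args) {
        MonitorCollector mc = new MonitorCollector();
        mc.lockFor(new SyncSite("A", Kind.INSTANCE_METHOD, null, false));
        mc.lockFor(new SyncSite("A", Kind.BLOCK, "A.class", true));
        mc.lockmap.forEach((m, f) -> System.out.println(m + " -> " + f.declaration()));
    }
}
```

Because synchronized (A.this) and a synchronized instance method of A share one monitor, they map to the same lock field.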

Lock Mode
Our refactoring tool transforms a built-in monitor into a fine-grained ReentrantReadWriteLock by lock downgrading and lock splitting.
Our refactoring tool directly applies read locks to methods and synchronized blocks that have no side effects. For methods or blocks with side effects, it uses write locks, lock downgrading, or lock splitting.
The fine-grained lock mode shown in Figure 4 is implemented by lock downgrading [29]. Figure 4a shows the program before refactoring. The method cache() first checks the variable flag; the write operations are executed only when flag is true. Figure 4b presents the program after refactoring. The flag condition is read under the read lock (Line 4). If the condition holds, write operations must be executed, so the current thread releases the read lock (Line 5) and acquires the write lock (Line 6). Note that a thread must release the read lock before acquiring the write lock. After acquiring the write lock, the thread rechecks the condition (Line 7), because another thread may have acquired the write lock and modified the state in the meantime.
Figure 5 shows three fine-grained locking modes obtained by splitting the ReentrantReadWriteLock. Figure 5a shows the code before refactoring, and Figure 5b shows the code after refactoring. In Figure 5b, the read lock is used to read the condition (Line 5). If the condition is true, the read lock is released (Line 6) and the write lock is acquired. The finally block checks which lock the thread holds (Line 13): it releases the write lock if the thread holds the write lock, and the read lock otherwise.
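The two refactored shapes just described can be sketched as follows. This is our reconstruction from the prose, not the code in Figures 4b and 5b: the fields flag and data, the rule that a successful write clears flag, and the helper methods are hypothetical. Before refactoring, both methods would be synchronized and would write data only when flag is true.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RefactoredModes {
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private boolean flag = true;   // condition read under the read lock
    private int data;              // shared state written under the write lock

    /** Downgrading mode: check, write if needed, then keep reading. */
    int cacheDowngrade(int v) {
        rwl.readLock().lock();
        if (flag) {
            rwl.readLock().unlock();       // no upgrade: release the read lock first
            rwl.writeLock().lock();
            try {
                if (flag) {                // recheck: another writer may have run
                    data = v;
                    flag = false;
                }
                rwl.readLock().lock();     // downgrade before releasing the write lock
            } finally {
                rwl.writeLock().unlock();
            }
        }
        try {
            return data;                   // trailing read still under the read lock
        } finally {
            rwl.readLock().unlock();
        }
    }

    /** Splitting mode: the finally block releases whichever lock is held. */
    void cacheSplit(int v) {
        rwl.readLock().lock();
        try {
            if (flag) {
                rwl.readLock().unlock();
                rwl.writeLock().lock();
                if (flag) {                // recheck after switching locks
                    data = v;
                    flag = false;
                }
            }
        } finally {
            if (rwl.isWriteLockedByCurrentThread()) {
                rwl.writeLock().unlock();
            } else {
                rwl.readLock().unlock();
            }
        }
    }

    /** Sets flag again so that the next call writes (helper for the example). */
    void invalidate() {
        rwl.writeLock().lock();
        try {
            flag = true;
        } finally {
            rwl.writeLock().unlock();
        }
    }

    /** True if any thread still holds either lock. */
    boolean anyLockHeld() {
        return rwl.isWriteLocked() || rwl.getReadLockCount() > 0;
    }

    public static void main(String[] args) {
        RefactoredModes m = new RefactoredModes();
        System.out.println(m.cacheDowngrade(7));  // 7: flag was true, so 7 is written
        System.out.println(m.cacheDowngrade(9));  // 7: flag is now false, nothing is written
    }
}
```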
Other refactoring implementations, such as the one shown in Figure 5c, may cause threads to miss data updates. For instance, if the section protected by the write lock writes the shared variable s and the section protected by the read lock reads s, a thread may fail to read the data it has just modified. In general, synchronization problems may arise when the section protected by the write lock and the section protected by the read lock access the same shared variables.
We therefore impose a precondition for refactoring in this mode: the read and write operations must not access the same variable, because otherwise data visibility is lost. For example, thread A acquires the read lock, reads the value of variable i, and releases the read lock. Thread B then acquires the write lock and modifies i, but thread A is unaware of this update.
We put the shared variables read under the read lock into a list, readlist, and the shared variables written under the write lock into a collection, writelist. If the two have no element in common, the refactoring tool uses the read lock; otherwise, it uses the write lock.
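The precondition reduces to a set-disjointness test. This sketch uses the standard Collections.disjoint method; the class name, the enum, and the variable names in the example are hypothetical.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class SplitPrecondition {
    enum Choice { READ_LOCK_FOR_READS, WRITE_LOCK }

    /**
     * readlist: shared variables read under the read lock;
     * writelist: shared variables written under the write lock.
     */
    static Choice choose(Collection<String> readlist, Collection<String> writelist) {
        // No common variable: the read-locked part never reads what the
        // write-locked part updates, so it may keep the read lock.
        return Collections.disjoint(readlist, writelist)
                ? Choice.READ_LOCK_FOR_READS
                : Choice.WRITE_LOCK;
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("a", "b"), Set.of("c")));  // READ_LOCK_FOR_READS
        System.out.println(choose(List.of("s"), Set.of("s", "t")));  // WRITE_LOCK
    }
}
```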

Alias Analysis
When synchronized blocks are transformed, our tool analyzes the lock set. The monitor objects of different synchronized blocks may have different names but refer to the same memory location. We use the WALA program analysis framework [32] to implement an alias analysis over the lock set. Our alias analysis is based on a context-sensitive pointer analysis.
WALA uses a HeapModel to abstract pointers and heap locations and provides a HeapGraph to navigate the results of a pointer analysis. The nodes in a HeapGraph are PointerKeys and InstanceKeys. A PointerKey represents an abstract pointer, and an InstanceKey represents an abstract heap location. There is an edge from a PointerKey to an InstanceKey when the PointerKey points to the InstanceKey, and there is an edge from an InstanceKey to a PointerKey when the PointerKey represents a field of an object instance modeled by the InstanceKey.
For example, given a HeapGraph h and the PointerKey p of a monitor object, we first use h.getSuccNodes(p) to find all InstanceKeys that p may point to. Then, for each such InstanceKey i, we use h.getPredNodes(i) to find the other PointerKeys that may point to i, that is, the possible aliases of p. Our alias analysis is built on this procedure.
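The two-step alias query can be illustrated without WALA by a toy graph that keeps only pointer-to-instance edges. This is a stand-in model, not WALA's API: pointer keys and instance keys are plain strings, and the maps play the roles of getSuccNodes and getPredNodes. The key names in the example are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class ToyHeapGraph {
    private final Map<String, Set<String>> succ = new HashMap<>(); // pointer -> instances
    private final Map<String, Set<String>> pred = new HashMap<>(); // instance -> pointers

    /** Records that the pointer may point to the abstract instance. */
    void pointsTo(String pointer, String instance) {
        succ.computeIfAbsent(pointer, k -> new LinkedHashSet<>()).add(instance);
        pred.computeIfAbsent(instance, k -> new LinkedHashSet<>()).add(pointer);
    }

    /** Pointers that may alias p because they share at least one abstract instance. */
    Set<String> aliasesOf(String p) {
        Set<String> result = new LinkedHashSet<>();
        for (String instance : succ.getOrDefault(p, Set.of())) {      // cf. getSuccNodes(p)
            for (String q : pred.getOrDefault(instance, Set.of())) {  // cf. getPredNodes(i)
                if (!q.equals(p)) {
                    result.add(q);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyHeapGraph h = new ToyHeapGraph();
        h.pointsTo("A.lock", "new Object()@12");
        h.pointsTo("B.mutex", "new Object()@12");
        h.pointsTo("C.other", "new Object()@40");
        System.out.println(h.aliasesOf("A.lock"));   // [B.mutex]
    }
}
```

Two synchronized blocks whose monitors alias in this sense should be mapped to the same lock field.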

Side Effect Analysis
An operation, method, or expression has a side effect if it modifies state outside its local environment. Our side-effect analysis identifies whether a critical section has side effects. WALA uses its Intermediate Representation (IR) to obtain all instructions in a method. The WALA IR is the central data structure that represents the instructions of a particular method. It expresses a method's instructions in a language close to JVM bytecode, but as an SSA-based register transfer language that eliminates the stack abstraction and relies instead on a set of symbolic registers. As shown in Figure 6, we analyze each instruction in a method and generate a read/write sequence string for each method.
The side-effect analysis algorithm is shown in Figure 6. We first collect all instructions of the method (Line 4), then traverse them and analyze their side effects with the method getAnalysis (Line 14). The analysis determines whether any instruction modifies memory. If an instruction is an InvokeInstruction, the analysis obtains the called method and traverses its instructions in turn. A method has side effects if it contains a write instruction. Because ReentrantReadWriteLocks support several locking modes, such as read/write locks, lock downgrading, and lock splitting, the choice of lock mode depends on the side effects of the critical section. The side-effect analysis therefore analyzes the critical section and generates a character sequence.
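The traversal can be sketched on a simplified instruction set instead of WALA's SSA instructions. In this toy version (hypothetical names; Java 16 or later for records and instanceof patterns), reads, writes, and the start and end of an if emit R, W, C, and T; an invocation is analyzed by walking the callee's instructions in place, with a guard against recursive calls. A method has side effects if its sequence contains a W.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SideEffects {
    interface Insn {}
    record Read(String field) implements Insn {}     // read of shared state  -> 'R'
    record Write(String field) implements Insn {}    // write to shared state -> 'W'
    record If() implements Insn {}                   // start of an if        -> 'C'
    record EndIf() implements Insn {}                // end of the if         -> 'T'
    record Invoke(String method) implements Insn {}  // callee analyzed in place

    private final Map<String, List<Insn>> program;   // method name -> body

    SideEffects(Map<String, List<Insn>> program) {
        this.program = program;
    }

    /** Builds the R/W/C/T sequence of a method, inlining the sequences of callees. */
    String sequence(String method) {
        StringBuilder out = new StringBuilder();
        walk(method, out, new HashSet<>());
        return out.toString();
    }

    private void walk(String method, StringBuilder out, Set<String> onStack) {
        if (!onStack.add(method)) {
            return;                                  // recursive call: already being analyzed
        }
        for (Insn insn : program.getOrDefault(method, List.of())) {
            if (insn instanceof Read) {
                out.append('R');
            } else if (insn instanceof Write) {
                out.append('W');
            } else if (insn instanceof If) {
                out.append('C');
            } else if (insn instanceof EndIf) {
                out.append('T');
            } else if (insn instanceof Invoke call) {
                walk(call.method(), out, onStack);
            }
        }
        onStack.remove(method);
    }

    /** A method has side effects if its sequence contains a write. */
    boolean hasSideEffects(String method) {
        return sequence(method).indexOf('W') >= 0;
    }
}
```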
To match the character sequence, we define five regular expressions for inferring lock modes. The regulation sequences and the representation of the characters are shown in Table 1.
Table 1. Regulation sequence and character representation.
Regulation 1: R+
Regulation 2: ((C|T)*|(R|W)*)*
Regulation 3: R*CR*W(W|R)*T
Regulation 4: R+W+|W+R+
Regulation 5: R*CR*W(W|R)*TR+
R: read operation; W: write operation; C: if condition; T: end of if condition; *: zero or more times; +: one or more times.
Regulation 1: The read lock mode. Regulation 1 represents the read lock mode, in which the critical section has at least one read operation and no write operations.
Regulation 2: The write lock mode. Regulation 2 represents a mode in which the critical section has at least one write operation.
Regulation 3: The lock downgrading mode. Regulation 3 represents a mode in which the critical section has an if statement containing write operations and ends with at least one read operation.
Regulation 4: The lock splitting mode. Regulation 4 represents a mode in which the critical section has only one if statement, with write operations in its body.
Regulation 5: The lock splitting mode. Regulation 5 represents the separation of read and write operations in the critical section.
We now describe the refactoring algorithm in more detail. The code in Figure 7 first obtains the monitor object of the method m (Line 2) and then looks it up in the lock set lockmap. If the monitor object exists in lockmap, the corresponding lock field is retrieved. Otherwise, the algorithm creates an appropriate lock field based on the type of the monitor object and adds the new mapping to lockmap.
After the lock field is obtained, synchronized methods and synchronized blocks are refactored accordingly. For a synchronized method (Line 19), the method is first analyzed by the side-effect analysis, which generates the corresponding read/write sequence. A finite automaton then recognizes this sequence (Line 20). Because the monitor object of a synchronized block may have aliases, alias analysis is applied across all methods when synchronized blocks are refactored (Line 27).
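Recognizing a read/write sequence can be sketched with java.util.regex, which here stands in for the finite automaton. The pattern strings come from Table 1. Pairing the three if/split patterns with modes follows our own reading of the prose (a trailing read after the if suggests downgrading), so that pairing and the enum names are assumptions. The specific patterns are tried first; Regulation 2 accepts every sequence over {C, T, R, W} and therefore serves as the write-lock fallback.

```java
import java.util.regex.Pattern;

class LockModeInference {
    enum Mode { READ, DOWNGRADE, SPLIT_IF, SPLIT_SEPARATE, WRITE }

    private static final Pattern READ_PATTERN = Pattern.compile("R+");                     // Regulation 1
    private static final Pattern DOWNGRADE_PATTERN = Pattern.compile("R*CR*W(W|R)*TR+");   // reads after the if
    private static final Pattern SPLIT_IF_PATTERN = Pattern.compile("R*CR*W(W|R)*T");      // writes in one if
    private static final Pattern SPLIT_SEPARATE_PATTERN = Pattern.compile("R+W+|W+R+");    // reads and writes apart
    // Regulation 2, ((C|T)*|(R|W)*)*, accepts every string over {C, T, R, W};
    // the equivalent and simpler [CTRW]* is used for the fallback.
    private static final Pattern ANY_PATTERN = Pattern.compile("[CTRW]*");

    /** Tries the specific modes first; any other well-formed sequence gets the write lock. */
    static Mode infer(String seq) {
        if (READ_PATTERN.matcher(seq).matches()) return Mode.READ;
        if (DOWNGRADE_PATTERN.matcher(seq).matches()) return Mode.DOWNGRADE;
        if (SPLIT_IF_PATTERN.matcher(seq).matches()) return Mode.SPLIT_IF;
        if (SPLIT_SEPARATE_PATTERN.matcher(seq).matches()) return Mode.SPLIT_SEPARATE;
        if (ANY_PATTERN.matcher(seq).matches()) return Mode.WRITE;
        throw new IllegalArgumentException("not an R/W/C/T sequence: " + seq);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"RR", "RCWTR", "RCWT", "RRWW", "WRW"}) {
            System.out.println(s + " -> " + infer(s));
        }
        // RR -> READ, RCWTR -> DOWNGRADE, RCWT -> SPLIT_IF, RRWW -> SPLIT_SEPARATE, WRW -> WRITE
    }
}
```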

Refactoring Tool
We implemented our refactoring tool as an Eclipse plugin (the source code and the jar are available at https://uzhangyang.github.io/refactoring.html). Figure 8 is a screenshot of the tool comparing the code before and after refactoring: the source code before refactoring is on the left, and the refactored code is on the right.

Practical Issues
Because our refactoring tool places part of the code inside a try-finally construct, the scope of some variables defined in the critical section may change. To resolve this problem, our tool checks these variables and moves their declarations outside the try block.
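A hypothetical illustration of this scope fix: in a downgrading refactoring, the write part and the trailing read part of a critical section land in different try blocks, so a variable declared inside the first try would be out of scope in the second. Declaring the variable before the first try keeps it visible. The class, fields, and method name are ours, not taken from the tool's output.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Before refactoring (hypothetical):
//
//     synchronized String nextTicket() {
//         int result = ++counter;          // write
//         return prefix + "-" + result;    // read
//     }
class ScopeHoisting {
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private String prefix = "T";
    private int counter;

    String nextTicket() {
        int result;                    // declaration moved out of the try block
        rwl.writeLock().lock();
        try {
            result = ++counter;        // was: int result = ++counter;
            rwl.readLock().lock();     // downgrade before releasing the write lock
        } finally {
            rwl.writeLock().unlock();
        }
        try {
            return prefix + "-" + result;   // `result` is still in scope here
        } finally {
            rwl.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        ScopeHoisting s = new ScopeHoisting();
        System.out.println(s.nextTicket());  // T-1
        System.out.println(s.nextTicket());  // T-2
    }
}
```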
HSQLDB, Cassandra, and JGroups are widely used and contain many built-in monitors. HSQLDB is fully multi-threaded and supports the high-performance 2PL (two-phase locking) and MVCC (multiversion concurrency control) transaction control models. HSQLDB is used as a database and persistence engine in over 1700 open-source software projects and many commercial products. Cassandra is an Apache distributed database that can manage large amounts of structured data, and it is one of the most widely used NoSQL databases. Because IoT data are typically time series, Cassandra is often used to store data generated by sensors and devices in IoT applications. More than 1500 companies worldwide with massive, active data sets use Cassandra. JGroups is a reliable group communication toolkit written in Java. It is widely used in distributed systems, including JBoss and ElasticMQ.
Freedomotic is an open-source, flexible, and secure IoT application framework for building and managing modern smart spaces. Freedomotic can run on a Raspberry Pi and can easily interact with DIY Arduino projects, and it is widely used in IoT applications. Apache MINA is a network application framework that helps developers build high-performance, highly scalable network applications. MINA has several sub-projects, such as AsyncWeb, FtpServer, and SSHD.
Table 2 presents the evaluation results, including the lines of source code and the number of built-in monitors in each benchmark. The last four columns show how many instances of each of four lock modes our tool produced for each benchmark. Our refactoring tool detected and refactored 621 built-in monitors in HSQLDB, 239 in Cassandra, 179 in JGroups, 21 in Freedomotic, and 21 in MINA. In total, our refactoring tool refactored 1072 built-in monitors across all benchmarks and modified 12,008 SLOC. These results show that our refactoring tool can effectively save developers time and effort.

Results
For IoT applications, our tool refactored Freedomotic, an IoT framework, and Cassandra, a widely used database engine. Our tool does not itself run in the IoT environment, but it can refactor concurrent IoT software that does. We found that our real-world applications contain code that conforms to our lock downgrading and lock splitting rules. In most cases, however, the tool uses write locks, and lock downgrading is rare. We do not suggest converting all built-in monitors into ReentrantReadWriteLocks, because a ReentrantReadWriteLock does not necessarily perform better than a built-in monitor; the actual situation should be considered during refactoring.

Correctness Of Refactoring
The HSQLDB benchmark is evaluated by connecting to the database in several connection modes, such as in-memory mode, standalone mode, and server mode. The results show that HSQLDB can connect to the database in all modes. We also create databases, run SQL statements to insert and delete data, and perform other database operations; all of them execute correctly. We run JDBCBench and TestBench in the package org.hsqldb.test. JDBCBench tests JDBC connections, and TestBench is a stress test of transaction processing. Both run correctly and return a benchmark report.
For Cassandra, we connect to the database and execute some CQL statements after the refactoring; they all work correctly. We also run all 648 unit tests in Cassandra's test folder, which cover almost all classes in the source code, and all of them pass without errors. Part of Cassandra's code already uses ReentrantReadWriteLock. We manually refactor this code back to synchronized locks and use our tool to infer the original ReentrantReadWriteLock usage. The method mayReload() in class CompactStrategyManager uses a write lock before refactoring, whereas our tool infers a splitting lock for it. A manual check confirms that the splitting lock does not change the behavior of mayReload(). All other locks are inferred as in the original code.
After refactoring, we used JGroups to deploy a three-node cluster and successfully completed communication among the nodes. JGroups includes a number of test programs; we ran 49 of them, and all ran without errors.
Because Freedomotic and MINA have fewer built-in monitors, we manually inspected all of their refactored locks. We checked (1) whether the refactoring changed the behavior; (2) whether the correct kind of lock was inferred; (3) whether each lock was inserted at the correct position; (4) whether each lock structure was used correctly; and (5) whether each critical section was safely protected. The inspection found that the refactoring did not change the behavior of the original programs and that each critical section was assigned a kind of lock according to the lock mode, almost always accurately. The lock insertion positions and the lock structures were correct. Finally, every critical section was surrounded by locks and safely protected.

Comparison With Relocker
Schäfer et al. [14] proposed an algorithm for refactoring to ReentrantLocks and ReentrantReadWriteLocks, together with a refactoring tool, Relocker.
Because Relocker requires an earlier JDK version, we used JDK 1.6 in this experiment, with HSQLDB 1.8.0.10 and Cassandra 0.4.0. Table 3 compares Relocker with our tool. Whereas Relocker uses only read locks or write locks for synchronization, our tool also uses lock downgrading and lock splitting to achieve finer-grained locking. Our tool inferred more read locks than Relocker, and manual verification confirmed that these read locks are used correctly. Relocker still relies heavily on manual selection of the code to refactor, whereas our tool is more automated.

Conclusions and Future Work
The JDK library provides flexible locking constructs that can improve software performance by reducing lock contention. In this paper, we presented an approach that may improve software quality by using ReentrantReadWriteLocks. We proposed a refactoring algorithm for fine-grained ReentrantReadWriteLocks and implemented a refactoring tool as an Eclipse plugin. We evaluated the tool on several real-world applications. The refactoring approach is applicable not only to IoT software but also to other concurrent software.
The main limitation of this study is that the selected applications do not represent all applications, which may have different concurrent behaviors. In future work, we will use our refactoring tool to refactor more IoT programs, identify more application scenarios suitable for fine-grained read-write locking, and explore further refactoring modes that reduce lock contention.