Smart Home Forensics—Data Analysis of IoT Devices

Abstract: A smart home is a residence that provides a variety of automation services based on Internet of Things (IoT) devices equipped with sensors, cameras, and lights. These devices can be remotely controlled through controllers such as smartphones and smart speakers. In a smart home, IoT devices collect and process data related to motion, temperature, lighting control, and other factors, and they store increasingly diverse and complex user data. This data can be useful in forensic investigations, but extracting meaningful data from various smart home devices is challenging because the devices use different data storage methods. Therefore, collecting data from different smart home devices and identifying and analyzing the data that can be used in digital forensics is crucial. This study focuses on how to acquire, classify, and analyze smart home data from Google Nest Hub, Samsung SmartThings, and Kasa cam for forensic purposes. We analyzed the smart home data collected using companion apps, Web interfaces, and APIs to identify meaningful data available for investigations. Moreover, the paper discusses various types of smart home data and their use as core evidence in several forensic scenarios.


Introduction
A smart home provides a variety of automation services based on Internet of Things (IoT) devices connected to a central hub or a gateway, which can be remotely controlled by controllers such as smartphones, tablets, and smart speakers. Each smart home device provides an independent service, while its companion apps, typically installed on smartphones, can be used to operate and monitor the device. In addition to smartphones, smart speakers such as Google Nest Hub and Alexa can be used as controllers of smart home devices connected to a smart home via voice commands.
Data from devices such as smart speakers, fitness wearables, pacemakers, and biometric devices are often used in courts as evidence during trials. For example, a file recorded by a smart speaker played a crucial role in proving the innocence of a murder suspect in 2015 [1]. In the same year, records from a fitness tracker were used to establish that the statement of a suspect was false [2]. In 2017, pacemaker data were used as evidence to prove an insurance fraud [3].
IoT devices exist in various forms in service platforms such as smart homes, smart cities, smart farms, and AMR systems. Devices belonging to each platform store and manage data in their own way, and the data of connected devices are related to one another. Therefore, to use these data as evidence, it is necessary to analyze the relationships among the data of connected devices as well as the data of each individual device.
In this paper, we analyze data collected from companion apps, Web interfaces, and APIs for the smart home, one of the IoT service platforms, and propose digital forensic scenarios that make use of these data.
We identify the sources and formats of smart home data and classify the useful data. In addition, we perform correlational analysis between data to obtain accurate and diverse information. This study focuses on using data in forensic investigations. Figure 1 illustrates the workflow of our framework for smart home forensics. The workflow starts with identifying the detailed functions of each smart home device and conducting experiments to accumulate data. The functional profiling of each smart home device is important for conducting efficient experiments and generating investigation artifacts. The second step involves collecting the accumulated data. Due to the diversity of devices in a smart home, several data extraction methods, one for each device, should be considered. The third step involves analyzing the smart home data. In particular, correlational analysis is performed in this step to identify the relationships between acquired data and user behavior. The final step involves employing the analyzed data in forensic investigations. In this step, the meaning of each data source is established and the data to be correlated are identified.

Contributions
This study demonstrates that a variety of smart home data can be used in forensic investigations and suggests scenarios of applying digital forensics to smart home devices. In particular, the contributions of this study are the following.
(a) Identifying the data that can be used as forensic evidence and establishing the sources of these data. The data are identified and classified based on the functions of each smart home device.
(b) Describing ways of applying the analyzed data to specific forensic scenarios. Examples of data sources in these scenarios include smart home devices, movements, voice commands, and the call history of incident responses.
(c) Performing correlational analysis to obtain user information, including event-specific time and user behavior information, and employing the relationships between data elements in digital forensics.
Table 1 lists the data considered in this study. 'Item' refers to an app or a device list, while 'Source' refers to the origin of the data. 'Data type' refers to the format of the acquired data; one potential scenario where this information can be used is discussed in Section 6.

Related Work
Analyzing data generated by smart homes for forensic purposes involves data extraction, data analysis, and an app with forensic tools for a variety of devices such as smart TVs and smart speakers. The development of such forensic tools, including methods for acquiring root privileges, pre-imaging, testing, post-imaging, and comparing binary data from smart TV forensics to trace user behavior, is described in Reference [4]. A method for extracting data from a smart TV using embedded MultiMediaCard (eMMC) chips and companion apps is proposed in Reference [5]. The extraction and analysis of data from the fitness bands Xiaomi Mi Band2 and Fitbit Alta HR for monitoring user activity and sleep, together with schemes for recovering deleted data, are described in Reference [6]. Li et al. proposed an IoT-based forensic model and verified it using the Amazon Echo [7]. Seila et al. acquired and analyzed data of a Samsung Gear S3 Frontier smart watch [8].
Building on these studies, research has been conducted on the collection and analysis of data from controllers controlling one or more devices, rather than merely analyzing data from a single device. Gokila et al. examined the Nest artifacts produced by an iPhone and developed a forensic tool called the Forensic Evidence Acquisition and Analysis System [9]. Guidance on the forensic data acquisition and analysis of artifacts from Securifi Almond+, another smart home hub environment, was reported in Reference [10]. Amazon Echo, a smart speaker widely used as a smart device controller, was forensically analyzed by a proof-of-concept tool called Cloud-based IoT Forensic Toolkit [11]. Douglas et al. suggested that the evidentiary value of data from Amazon Echo is considerable [12].
Fensel et al. proposed an IoT semantic platform called OpenFridge, which incorporated IoT, Web, and Semantic Web technologies and enabled users to collect, process, link, and view various IoT data [13].
This study considers devices not previously studied in order to make use of various IoT devices in forensic investigations. Existing studies that focus on single-device analysis are difficult to apply to a smart home environment in which various IoT devices are connected. In this study, we extract data from various IoT devices in the smart home, perform correlation analysis, and present scenarios in which these data can be used in digital forensic investigations.
Figure 2 illustrates the smart home environment purposely built for this study to enable data collection. The environment included a Google Nest Hub, a Kasa Cam, a SmartThings Multipurpose Sensor, a SmartThings Motion Sensor, and a SmartThings Outlet [25][26][27]. The environment was set up as follows to collect data. The SmartThings Hub, connected to the low-end smart home devices, serves as a base station for data exchange between the home devices. Each smart home kit employed in the experiment can be used alone or in conjunction with a supported smart speaker. In this study, the Google Nest Hub was employed as the smart speaker given its popularity. The installed smart home devices can be controlled by a smart speaker using voice commands or by an app on a smartphone. Table 2 lists the devices employed in the experiment. Table 3 lists the features and functions of each device used in the experiments.
The Google Nest Hub can control numerous elements of the home, from an outlet to a camera. The hub is equipped with a display, making it possible to see and control connected devices in a single window, with no need to switch between apps. The display can also show videos from the smart camera, the Kasa cam. Calls between a smartphone and the Google Nest Hub are possible through the Google Duo video calling app.
The Kasa cam feed can be watched live; alternatively, the feed can be recorded on a smartphone using the Kasa Smart app. When motion or sound is detected, the camera sends a notification to a smartphone and saves the video. The cam has a built-in microphone and speaker, enabling two-way communication through the camera and its companion app. The SmartThings Outlet can switch power on or off. The SmartThings Multipurpose Sensor can detect open and closed status, as well as vibration. The multipurpose sensor and the motion sensor can both detect temperature.

Data Acquisition from Smart Home Devices
In this study, the data of the smart home devices' companion apps were collected from an Android smartphone. Google Nest Hub data were obtained from its application programming interface (API) and from Web pages returned by Google.

Companion Apps
The manufacturers of smart home devices provide companion apps that are synchronized with the devices to enable device control, management, and configuration. Due to the limited storage capacity of smart home devices, information about the operation of these devices is generally stored on a smartphone, in the folder named after the companion app package in the data partition. Table 4 lists the package names and versions of the apps used in this study. The SmartThings and SmartThings Classic apps support Samsung smart home devices. SmartThings Classic is the older version of the app, and SmartThings is the newer version.

Google Web Interface
The Google Web interface utility called My Activity was used to generate data, including the service usage history for applications such as YouTube, Google Maps, the Chrome browser, and a variety of other apps. The Google Nest Hub voice commands were utilized to obtain data about the use of Google services; a Google account is required to access this utility.
The path we used to obtain the Google product data in My Activity was 'Google Account > Data & personalization > Download your data.' The following steps were performed to acquire data:
(a) Select data: Product data is stored in 'Home App' and 'My Activity'. Home App data can be exported only in the JSON format, whereas My Activity data can be exported in either the HTML or JSON format.
(b) Customize archive format: A link for downloading data can be sent by email, or the data can be added to Drive, Dropbox, OneDrive, or Box. The file type is either .zip or .tgz. Large archives can be split into multiple files.
(c) Output generated file: The output file name format is 'YYYYMMDDThhmmssZ-number.' The field 'number' indicates the order if the acquired data is split across multiple files. The timestamp in the file name is in UTC+0.
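When several exported archives have to be ordered, the creation time and part number can be recovered from the file name. The following Python sketch assumes only the 'YYYYMMDDThhmmssZ-number' pattern stated above; the 'takeout-' prefix and the sample file name are hypothetical and serve only to exercise the parser.

```python
import re
from datetime import datetime, timezone

# Matches the 'YYYYMMDDThhmmssZ-number' portion of an archive name.
# Anything before or after it (prefix, extension) is ignored.
ARCHIVE_NAME = re.compile(r"(\d{8}T\d{6})Z-(\d+)")


def parse_archive_name(file_name):
    """Return (UTC creation time, part number), or None if the name does not match."""
    match = ARCHIVE_NAME.search(file_name)
    if match is None:
        return None
    created = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
    # The text states that the timestamp in the file name is UTC+0.
    return created.replace(tzinfo=timezone.utc), int(match.group(2))


if __name__ == "__main__":
    # Hypothetical file name, not taken from the study.
    print(parse_archive_name("takeout-20191002T031500Z-001.zip"))
```

Keeping the timestamp timezone-aware makes it easier to compare the archive time with device logs that are recorded in local time.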

Private Google Home API
The Google Home app communicates with devices through local APIs. While the majority of these APIs are not publicly documented, we learned how to use them by referring to unofficial documentation available through communities such as GitHub. Data from Google Home was obtained using the HTTP methods of the local Google Home APIs as described in Reference [28]. The GET method was used to obtain data from Google Home; the POST method is mainly used for changing settings. A GET request can be issued from a browser: when the request is concatenated to the base URL and sent to the server, the response data is returned in the JSON format. Since Google Home uses server port 8008, a request can be sent in the form http://(google-home-ip):8008/(request). The Google Home IP address can be identified via the Google Home app (Device Settings-Info) or the Google Nest Hub (System Settings).
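The request form described above can also be scripted instead of being typed into a browser. The following Python sketch is a minimal illustration using only the standard library. The IP address is a placeholder, and the request path 'setup/eureka_info' is an assumption taken from community documentation rather than from this study; whether a given device still answers on port 8008 depends on its firmware.

```python
import json
from urllib.request import urlopen

GOOGLE_HOME_PORT = 8008  # server port stated in the text


def build_request_url(device_ip, request):
    """Concatenate the request to the base URL http://(google-home-ip):8008/."""
    return f"http://{device_ip}:{GOOGLE_HOME_PORT}/{request.lstrip('/')}"


def fetch_json(device_ip, request, timeout=5.0):
    """Send a GET request to the device and decode the JSON response body."""
    url = build_request_url(device_ip, request)
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder IP and an assumed request path; only the URL is built here,
    # so running this file does not touch the network.
    print(build_request_url("192.168.0.10", "setup/eureka_info"))
```

Calling fetch_json with the address of a device on the same network would return the decoded response as a Python dictionary, which can then be saved alongside the acquisition time.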

Analyzing Data Generated by Smart Home Devices
This section presents the analysis of the data obtained from companion apps, Web interfaces, and an API. In particular, the storage paths and contents of the collected data were identified, and the useful data were classified.

Companion Apps
Each companion app can register a respective smart home device and then manage activity records generated whenever the device detects an action. Table 5 lists the file paths of data stored by smart home apps. Google Nest Hub stores data about location, accounts, Wi-Fi SSID, homegroup, and registered devices on a smartphone; information such as voice commands is not recorded as part of the app data. The home_graph_Base64Encoding (Google account).proto file, which contains the homegroup and registered device information, uses an encoded protocol buffer to store the data; hence, decoding is required to confirm the content. Decoding can be performed using an online decoding tool [29].
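As an offline alternative to an online decoding tool, the top-level fields of an encoded protocol buffer can be listed without the original schema by walking the standard protobuf wire format. The following Python sketch is an illustration under that assumption and is not the tool used in this study. Field numbers are reported without names because the schema is unknown, and length-delimited values are returned as raw bytes that may hold text or a nested message.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def parse_fields(buf):
    """Return a list of (field number, wire type, value) for one message level."""
    pos = 0
    fields = []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_no, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, or nested message
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_no, wire_type, value))
    return fields


if __name__ == "__main__":
    # Made-up message: field 1 = 150 (varint), field 2 = b"Home" (length-delimited).
    sample = bytes([0x08, 0x96, 0x01, 0x12, 0x04]) + b"Home"
    print(parse_fields(sample))
```

Applying parse_fields again to a length-delimited value is one way to test whether it holds a nested message, for example a room or device entry.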
The call history between a smartphone and a Google Nest Hub is stored as Duo app data. In tachyon.db, the [activity_history] table stores the call history. The other_id column stores the phone number of the other party, and the self_id column stores the user's phone number. The Google Nest Hub is set up with the same phone number and Google account as the smartphone. Therefore, if other_id and self_id are the same, the call took place between the smartphone and the Google Nest Hub. A call_state value of 1 indicates that a call failed, whereas a value of 2 means that the call succeeded. The call duration can be obtained from the data of the default call app.
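The rule above translates directly into a query on the activity_history table. The following Python sketch uses only the three columns named in the text (self_id, other_id, call_state); the in-memory database, the column types, and the phone numbers are made up for demonstration, and the real tachyon.db contains further columns that are not modeled here.

```python
import sqlite3

# Meaning of call_state as described in the text.
CALL_STATE = {1: "failed", 2: "succeeded"}


def nest_hub_calls(conn):
    """List calls where both parties share one number (smartphone <-> Nest Hub)."""
    rows = conn.execute(
        "SELECT self_id, call_state FROM activity_history "
        "WHERE self_id = other_id ORDER BY rowid"
    ).fetchall()
    return [(number, CALL_STATE.get(state, f"unknown ({state})"))
            for number, state in rows]


def demo_connection():
    """Build an in-memory stand-in for tachyon.db with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE activity_history "
        "(self_id TEXT, other_id TEXT, call_state INTEGER)"
    )
    conn.executemany(
        "INSERT INTO activity_history VALUES (?, ?, ?)",
        [
            ("+15550100", "+15550100", 2),  # call to the Nest Hub, succeeded
            ("+15550100", "+15550199", 2),  # call to another person
            ("+15550100", "+15550100", 1),  # call to the Nest Hub, failed
        ],
    )
    return conn


if __name__ == "__main__":
    print(nest_hub_calls(demo_connection()))
```

For a real acquisition, the query would be run against a working copy of tachyon.db opened with sqlite3.connect, never against the original evidence file.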

Kasa Cam
The TP-Link camera stores videos that are recorded by users or recorded automatically when motion or sound is detected. Given a storage limit of 1 GB, the video history is kept by default for up to two days, or less than four hours of 1080p video. However, video data is stored on the smartphone only if users download it; otherwise, only the thumbnail file remains. The video title format is 'KC_Camera name_ddmmyyyy_msmsmsms.mp4', and the downloaded video files are stored in the gallery. Thumbnails represent either a video recorded when motion was detected or an image taken while the live video was viewed through the smartphone app. The former generates a .PNG file, while the latter generates a .JPG file. However, since the extension of the thumbnail recorded in the app data is .0, it is necessary to check the file signature and change the extension accordingly to be able to view the image. The camera is equipped with a built-in microphone and speaker, allowing two-way communication between the camera and the smartphone companion app.
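The signature check for thumbnails stored with the .0 extension can be automated. The following Python sketch compares the first bytes of a file with the standard PNG and JPEG signatures. The function names are illustrative, and the sketch assumes that the image data starts at the first byte of the cached file.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG file signature
JPEG_SIGNATURE = b"\xff\xd8\xff"      # JPEG start-of-image marker plus next marker byte


def thumbnail_extension(header):
    """Return '.png' or '.jpg' for a matching signature, otherwise None."""
    if header.startswith(PNG_SIGNATURE):
        return ".png"
    if header.startswith(JPEG_SIGNATURE):
        return ".jpg"
    return None


def classify_thumbnail(path):
    """Read the first bytes of a cached '.0' file and report the image type."""
    with open(path, "rb") as handle:
        return thumbnail_extension(handle.read(len(PNG_SIGNATURE)))


if __name__ == "__main__":
    print(thumbnail_extension(b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"))
    print(thumbnail_extension(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
```

Following the distinction made above, a '.png' result points to a motion-triggered recording, while a '.jpg' result points to a live view opened from the smartphone app.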

Samsung SmartThings
The SmartThings door and motion sensors can recognize the open and closed states of a door, motion, and temperature. When the temperature changes, the sensor sends the new data to a smartphone, where it is then stored. The outlet also records data about power consumption and whether the power is on or off.

SmartThings Classic App
The http directory contains two files, 16bytes_hex_value.0 and 16bytes_hex_value. [messagesUI] contains notification information generated when the app alerts the smartphone.

Google Web Interface
My Activity shows the user's activity history in chronological order. Item Details contain the voice commands, the voice records, the answers given by the Google Nest Hub, the time, the device type, and the location. Every command to the Google Nest Hub is recorded as a voice command, but voice records exist only for commands that were actually spoken. The location, defined by longitude and latitude, links to the corresponding Google Maps position.
My Activity and Google Home data acquired from the Web is stored in MyActivity.json or MyActivity.html files according to the option chosen. 'Assistant' and 'Voice and Audio' directories contain .mp3 files of voice commands; the file name format is YYYY-MM-DD_hh_mm_fff_UTC.mp3.
The HomeApp.json file contains all Google Nest Hub app data such as owner emails and homes, rooms and location, and the details of mapping devices to rooms.
The My Activity > Voice and Audio > MyActivity.json file contains objects comprising a header, a title, a titleUrl, time, audioFiles, products, and details. The meaning of each value is as follows:
The My Activity > Assistant > MyActivity.json file contains a header, a title, a titleUrl, a subtitle, time, products, and locations (name, url). The meaning of each value that is not defined above is as follows:
• Subtitle: answer to the user command
-Name: answer to the user command;
-Url: URL of the answer to the search query.
• Locations: device location information
-Name: action inside the device;
-Url: Google Map URL of the location.
The Voice and Audio directory stores not only the voice data processed through the Assistant but also the data used for voice learning, such as Voice Match. The Assistant directory is therefore a subset of the Voice and Audio directory, but the MyActivity.json file of the Assistant directory is more detailed than that of the Voice and Audio directory.
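An exported MyActivity.json file can be turned into a chronological timeline with a few lines of code. The following Python sketch relies only on the keys named above (title, time, audioFiles). It assumes that the file holds a JSON array of objects and that 'time' is an ISO 8601 string ending in 'Z'; the sample entries and file names are made up.

```python
import json
from datetime import datetime


def parse_time(value):
    """Parse an ISO 8601 UTC timestamp such as '2019-10-02T03:15:07.120Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timeline(raw_json):
    """Return (time, title, audio files) tuples sorted from oldest to newest."""
    entries = json.loads(raw_json)
    rows = [
        (parse_time(entry["time"]), entry.get("title", ""), entry.get("audioFiles", []))
        for entry in entries
    ]
    return sorted(rows, key=lambda row: row[0])


# Hypothetical sample in the assumed export layout.
SAMPLE = """[
  {"header": "Assistant", "title": "Said turn on the outlet",
   "time": "2019-10-02T03:15:07.120Z", "audioFiles": ["a.mp3"]},
  {"header": "Assistant", "title": "Said what time is it",
   "time": "2019-10-01T22:40:00.000Z"}
]"""

if __name__ == "__main__":
    for when, title, audio in timeline(SAMPLE):
        print(when.isoformat(), title, audio)
```

Entries without an audioFiles key are kept with an empty list, which makes it easy to see which commands have a recording attached.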

Private Google Home APIs
Google Nest Hub data can be obtained through private APIs and the Web in the JSON format. Table 6 lists the APIs selected as useful for digital forensics among all available private Google Home APIs. The output content type is JSON, so the data fields can be distinguished easily. Both the reboot and factory reset functions, which can serve as anti-forensic measures, are available through the private Google Home APIs. Rebooting can cause the data remaining in memory to disappear, while a factory reset deletes all information, including user accounts and settings. These two functions can be exploited by an attacker who is on the same network as the Google Nest Hub and knows its IP address. The reboot and factory reset functions can be invoked with tools such as curl using the POST method.

Exploiting Smart Home Data in Digital Forensics
This section outlines possible scenarios of utilizing smart home data described in the previous sections in digital forensics. The focus is on data generated by the motion/door sensor and AI speaker for intrusion detection. Correlation analysis is demonstrated to provide insights into the meaning of the data.

Information from Smart Home Devices
Due to the miniaturization and diversity of smart home devices, it is difficult to identify the ones that can be useful for digital forensics. The configuration information of a smart home can be exploited as the basis for identifying smart home devices. The .proto file (Google Home app) and the HomeApp.json file (Google My Activity utility) contain information about the smart home that is organized around the Google Nest Hub. The files can provide such details as the name of the smart home, information about each room, and a list of devices installed in each room. In Samsung SmartThings, the device information is included in the 'devices' table of cloudDb.db. Based on these details, the investigator can decide which devices and data to target.

Information about Movement
Data about movement can help to determine the time of entry in the case of an intrusion event. For example, such data can be used to determine the actual time of a break-in and to establish whether the house owner made a false statement.
The multipurpose SmartThings sensors, which can be used as door or window sensors, detect the open and closed states of the objects they are attached to. The data collected by each such sensor is recorded in ActivityLogDb.db (Samsung SmartThings app) and /cache/http/16bytes_hex_value.1 (Samsung SmartThings Classic app), which includes the following:
These data can provide information about whether or not a person was in a specific place at a particular time and the duration of the stay. Motion data can be obtained from two devices, the motion sensor of the Samsung SmartThings and the TP-Link camera. In the Samsung SmartThings, motion sensor data such as the time of motion start and finish and temperature is stored in the same file as data from the multipurpose sensor. The TP-Link camera stores thumbnails of videos recorded when motion occurs in /cache/image manager disk cache/32bytes_hex_value.0. These thumbnails can provide information about the appearance or shape of a suspect in a housebreaking incident, for example.
By combining motion and door sensor information, it is possible to determine whether a person has entered or left the house. A door open and close event that occurs before motion detection indicates that someone has entered the home, whereas one that occurs after motion detection indicates that someone has left.
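The entry and exit rule can be expressed as a small correlation routine. The following Python sketch labels each door event according to whether indoor motion was detected shortly after it (entered) or shortly before it (left). The 60-second window, the function name, and the sample timestamps are assumptions made for illustration and are not parameters from this study.

```python
from datetime import datetime, timedelta


def classify_door_events(door_times, motion_times, window=timedelta(seconds=60)):
    """Label each door event as 'entered', 'left', or 'undetermined'."""
    results = []
    for door in sorted(door_times):
        # Motion shortly after the door event: someone came in.
        motion_after = any(timedelta(0) < m - door <= window for m in motion_times)
        # Motion shortly before the door event: someone went out.
        motion_before = any(timedelta(0) < door - m <= window for m in motion_times)
        if motion_after and not motion_before:
            label = "entered"
        elif motion_before and not motion_after:
            label = "left"
        else:
            label = "undetermined"
        results.append((door, label))
    return results


if __name__ == "__main__":
    # Hypothetical timestamps for one day.
    doors = [datetime(2019, 10, 2, 8, 0, 0), datetime(2019, 10, 2, 18, 0, 0)]
    motions = [datetime(2019, 10, 2, 7, 59, 30), datetime(2019, 10, 2, 18, 0, 20)]
    for when, label in classify_door_events(doors, motions):
        print(when.isoformat(), label)
```

Door events with motion on both sides of the window, or on neither side, are left as 'undetermined' so that the investigator reviews them manually instead of relying on an automatic label.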

Information about Voice Commands
Voice information can be very useful for identification. Voice files are generated when voice commands are used to control devices or retrieve information. These data provide information about the activity history and voices of users. For example, suppose a person identified as a suspect in a crime used a voice command on a Google Nest Hub to get directions to the crime scene. Investigators could identify the voice through the recorded file. Voice command data can also be used for purposes such as verifying statements or understanding the situation at the site.

Information about Calls
The user call logs, which include the call duration and information about the other party, can be used as very important evidence. If a user has a call log with the Google Nest Hub, investigators can infer that someone was home at that time. To place a call between the Google Nest Hub and a smartphone, the user can use the "Call my Home devices" function within the Google Duo app. In this case, the phone numbers for other_id and self_id are the same in the call log of the Duo app. If the user tries to call his/her own mobile number using the default system call app, the call is directed to voicemail. This means that the call log in the default system call app alone does not reveal whether the user tried to call the Google Nest Hub or to check the voicemail. In this case, investigators need to compare the default system call app data with the Google Duo app data to identify the purpose of the call. If the call record exists in the default system call app but not in the Google Duo app, the call was connected to voicemail. Otherwise, the user tried to call the Google Nest Hub.
Figure 3 visualizes the sources for obtaining different types of information. The Google Home, SmartThings, SmartThings Classic, and Google Duo apps (but not the Kasa Smart app) do not allow users to delete the user data in the app except by deleting the app or the account; hence, users cannot arbitrarily manipulate the data. Therefore, the data sources identified in this study are expected to be useful for providing convincing evidence in courts.
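The comparison between the default system call app and the Google Duo app for a call to the user's own number can be sketched as a timestamp match. In the following Python sketch, a system call record that has a Duo record close in time is treated as a call to the Google Nest Hub, and one without a match is treated as a voicemail call. The 30-second tolerance, the function name, and the timestamps are hypothetical; how the two apps actually timestamp their records would have to be verified on the examined device.

```python
from datetime import datetime, timedelta


def classify_self_call(system_call_time, duo_call_times, tolerance=timedelta(seconds=30)):
    """Classify a call to the user's own number found in the system call log."""
    for duo_time in duo_call_times:
        # A Duo record (self_id equal to other_id) near the same time means
        # the call went to the Google Nest Hub.
        if abs(duo_time - system_call_time) <= tolerance:
            return "call to Google Nest Hub"
    # No matching Duo record: the call was connected to voicemail.
    return "voicemail"


if __name__ == "__main__":
    # Hypothetical timestamps.
    system_time = datetime(2019, 10, 2, 9, 0, 0)
    duo_times = [datetime(2019, 10, 2, 9, 0, 10)]
    print(classify_self_call(system_time, duo_times))
```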

Conclusions and Future Development
This study examined the data that can be acquired from smartphones, the Google Web interface, and private Google Home APIs, and identified the data that can be useful for digital forensic analysis. Companion apps installed on smartphones and paired with smart home devices can provide information on the location and model of each device. The activity history recorded by each device can be useful for forensic investigations. For example, the statement of a suspect can be verified through motion, temperature, command, and voice data, which are also useful for inferring the time at which an event occurred. This study identified the sources of various types of data generated by a limited set of smart home devices; these data can be cross-checked to establish the actual sequence of events. Future work should focus on analyzing a wider range of smart home devices to cover more scenarios in forensic investigations, since the type and amount of data generated by smart home devices vary across manufacturers and models. Moreover, physical memory forensics of smart home devices should be investigated.