A Methodology and Tool for Investigation of Artifacts Left by the BitTorrent Client



Introduction
Internet technologies are improving every day, occupying new areas of our daily life, presenting new services, making things easier, and changing the way we live. Millions of individuals benefit from the online virtual world. This phenomenon has grown exponentially in recent years. One of these Internet technologies is peer-to-peer (P2P) applications [1]. P2P applications enable file sharing over the Internet. There are many kinds of P2P network protocols. One of the best known is the BitTorrent protocol [2]. This protocol introduced a new technology and a new terminology to file sharing and became the de facto standard for file distribution over the Internet. The BitTorrent protocol was designed to ease the sharing of large files, but this technology was so innovative and simple to use that people adopted it to carry out illegal activities, like sharing of copyrighted material [2]. Illegal copies of copyrighted content can be found in more than two-thirds of torrents registered at one of the most popular BitTorrent trackers [3]. Such a use of the protocol reduces the copyright holder's possible revenue due to lost sales of the original copy. Consequently, the use of the BitTorrent protocol for illegal purposes creates a new type of cybercrime and new challenges for the forensic investigators who fight against it. Moreover, users involved in the use of the BitTorrent protocol have to allow their resources to be used for file sharing, since BitTorrent applications may punish non-uploaders by limiting their download bandwidth [4]. In such a way, users who desire the service have to choose either to be involved in committing cybercrime or to refuse the service. Only informed users can make this choice. However, many users are unaware and take part in cybercrime without knowing it. Therefore, the fight against illegal uses of the BitTorrent protocol must be intensified.
Symmetry 2016, 8, 40; doi:10.3390/sym8060040; www.mdpi.com/journal/symmetry
The universe of the BitTorrent protocol can be regarded as a structure consisting of several levels of hierarchy. At the highest level, the universe of the BitTorrent network can be represented as being divided into many BitTorrent swarms [5]. Each shared piece of content forms a BitTorrent swarm that consists of trackers and peers [6]. A peer is an agent that runs an implementation of the protocol. To initiate a download of the file, there are two possibilities: (1) the user downloads a metadata .torrent file from a BitTorrent indexing website, or (2) the user obtains a magnet uniform resource identifier (URI) from a BitTorrent indexing website. Then, the user's BitTorrent client application interprets the metadata and uses it to detect other peers using one of the following methods: a tracker, a distributed hash table (DHT), or peer exchange (PEX). A tracker is a server that maintains a list of peers. Peers are required to distribute the file and to exchange control messages in order to update their status and keep the list of active peers up to date. DHT is a distributed tracker that allows peers to locate other peers by requesting information from BitTorrent clients without the requirement for a central server. PEX enables a direct interchange of peer lists with other peers.
The family of products using the BitTorrent protocol constantly improves and expands. The most recent product in this family is BitTorrent Sync. It is a file replication utility released in April 2013 [7]. The utility is very convenient for those who are involved in illegal activities, because it guarantees that the data transfer is secure from inspection while in transit [8]. Therefore, the utility can be exploited for several potential cybercrimes, such as distribution of child pornography, distribution of copyrighted material, delivery of malicious software, industrial espionage, etc. [9].
Not only are new applications and software released, but new versions of operating systems are released as well. Several years ago, Microsoft released a new version of its operating system, Windows 8, which was then improved and upgraded to Windows 8.1. This operating system is the newest one in the Windows family, except for Windows 10. Windows 8.1 is quite successful, and its popularity is growing [10]. The purpose of this paper is to investigate and locate the digital fingerprints [11], called artifacts at the higher abstraction level, left by the BitTorrent client on a local computer and to present the tool we created that provides a convenient way to study those artifacts.
The artifacts of the BitTorrent client can be separated into two parts: (1) the artifacts left on the local client computer; and (2) the artifacts observed on the computer network. In the next section, we review the related work that considers the artifacts left by the BitTorrent client on a local computer.

Review of Related Work
We present the related work in chronological order, because this makes it easier to follow how the forensic investigation of BitTorrent clients on local computers has developed.
Adelstein and Joyce [12] introduced the File Marshal tool for automatic detection and analysis of P2P application usage. The tool is oriented toward the analysis of a static image of the computer disk. File Marshal is a universal tool; it is not dedicated to a specific P2P protocol. The tool's management is based on a configuration file. The file indicates the location of log files and names of registry keys. If special code is required to analyze a file (e.g., to decode a peer list or time format), the configuration file enumerates the Java modules that are used for parsing; new parsers have to be created as needed. Consequently, experience is needed to successfully use the File Marshal tool. Nevertheless, the tool was successful and its maintenance was ensured. Later, the File Marshal project was renamed P2P Marshal [13]. One might note that the File Marshal tool initially did not include the analysis of BitTorrent protocol artifacts.
Anyone consciously participating in illegal activity tries to hide the evidence. Special erasure programs are created for such a purpose. Woodward and Valli [14] examined whether the available erasure programs remove the evidence of BitTorrent protocol artifacts. The erasure programs MaxErase, P2PDoctor, Privacy Suite, Window Washer, Windows R-Clean and Wipe were investigated on a machine that had used the BitTorrent client Azureus. Woodward and Valli concluded that the available erasure tools are not effective at removing evidence of the BitTorrent protocol and that the forensic investigator can successfully recover the data.
In the next study, Woodward [15] studied whether the BitTorrent clients running on Windows 7 leave behind meaningful data. The BitTorrent client programs BitComet, BitTornado, µTorrent, and Vuze (formerly Azureus) were investigated using default settings. The research was dedicated to studying the registry and local data area within the Windows operating system. Woodward concluded that all BitTorrent client programs created the same data during their operation. This data could be used as a basis for the location of the initial source of a downloaded file.
Lallie and Briggs [16] explored three BitTorrent client applications, namely BitComet, Vuze and µTorrent. The purpose of the study was to outline the registry artifacts created during the installation of the BitTorrent client on a Windows 7 machine. Several authors [15,17] were already doubtful about the evidential value of registry keys. The study showed that many artifacts are created in the registry keys. However, these artifacts can mainly determine that a BitTorrent client has been used on the computer. The most significant discovery in the registry was the identification of the BitComet subkey that contains a record of the website URL from which the last torrent was opened and downloaded. Lallie and Briggs confirmed the already known result that the artifacts in the registry keys can only show who installed and who used the application.
The latest product in the BitTorrent family is BitTorrent Sync, which is mainly devoted to file replication. Farina et al. [7] investigated in detail the evidence left behind by BitTorrent Sync on the client computer. The authors conducted an experiment during which they installed the BitTorrent Sync client on a Windows XP machine. A complete list of the files created during the installation procedure is provided. The registry keys created during the installation procedure are presented as well. It is worth noting that the default installation process creates a BTSync folder with three hidden items: .SyncID, .SyncIgnore, and .SyncArchive (folder). Finally, the BitTorrent Sync application was uninstalled. The Windows registry keys remaining after uninstallation are provided as well.
Venčkauskas et al. [18] investigated the latest version of the BitTorrent client for the Windows 8.1 operating system. The contents of the changed registry keys and the configuration files are provided. During the experiment, the authors considered the possibility of hiding the evidence of the use of the BitTorrent client. The investigation found that the evidence is left in the Windows registry or in the BitTorrent configuration files, or both, depending on the method used to remove the BitTorrent client application.
The BitTorrent client program always generates several files that contain the information related to the BitTorrent program. All these files are stored in BEncode format [19]. Dedicated tools are needed to open and read such files. The BEncode Editor [20] allows BEncoded files to be opened. However, the editor provides only the basic functionality of extracting the keys and values. Many values are presented in integer or binary format that cannot be comprehended by a human reader. Other utilities are needed to convert them into a more readable form.
The Plaso framework [21] allows automatic information extraction from a large number of sources, including the Windows registry, browser histories, file systems and others. The Plaso framework provides the basic interface for BEncode plugins [22]. However, the framework lacks many functions useful for forensic investigators when analyzing BitTorrent files. Therefore, more elaborate tools are needed to aid forensic investigators in the analysis of BitTorrent files.
The review revealed that the BitTorrent protocol and the supporting environment are constantly evolving. New BitTorrent clients, which generate their own application data, enter the market. The application data generated by the different BitTorrent clients have separate structures and distinct contents. Therefore, new research is needed in order to keep pace with the state of the art for the BitTorrent protocol and its supporting environment. Next, the reviewed tools lack many functions useful for forensic investigators to analyze BitTorrent files. Consequently, new, more elaborate tools, which would enable analysis of BitTorrent client artifacts without knowing the particular structure of the application data, are needed to facilitate analysis.

Methodology to Locate BitTorrent Client Artifacts and Results of the Experiment
To locate the artifacts left by the BitTorrent client on a local computer, we suggest the following methodology:

1)
Create an experimental environment consisting of the server and virtual machines.

2)
Install the BitTorrent client onto the virtual machines and analyze the changes in the file system.

3)
Download a file using the BitTorrent client without an Internet connection and analyze the changes in the BitTorrent configuration files.

4)
Download the file using the BitTorrent client when the virtual machines are connected to the Internet and analyze the changes in the BitTorrent configuration files.

The goal of the second step of the methodology is to locate the files which are added by the BitTorrent client application during installation and to investigate their contents. The goal of the third step is to locate the evidence of BitTorrent activity in the BitTorrent configuration files. In order to keep a clean experimental environment and to avoid possible pollution by Internet artifacts, the virtual machines are not connected to the Internet during the third step. The BitTorrent configuration files are thus first investigated in a limited environment, which ensures that the files have minimal contents.
The goal of the fourth step of the methodology is to locate the evidence of BitTorrent activity in the BitTorrent configuration files when the virtual machines are connected to the Internet. This allows us to observe the artifacts added by the Internet connection and to investigate the real environment. That is the difference from the third step.
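The phrase "analyze the changes in the file system" can be illustrated with a before/after snapshot comparison. The following Python sketch is not the procedure used in the experiment; the function names, the use of SHA-256 hashes, and the simulated "client run" in a temporary directory are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            state[os.path.relpath(path, root)] = digest
    return state


def diff_snapshots(before, after):
    """Return sorted lists of added, removed, and modified relative paths."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    modified = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return added, removed, modified


def demo():
    """Simulate a client run in a temporary directory and report the changes."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "settings.dat"), "wb") as handle:
            handle.write(b"de")
        before = snapshot(root)
        # Hypothetical stand-in for "install the client" or "download a file".
        with open(os.path.join(root, "settings.dat"), "wb") as handle:
            handle.write(b"d1:ai1ee")
        with open(os.path.join(root, "resume.dat"), "wb") as handle:
            handle.write(b"de")
        after = snapshot(root)
    return diff_snapshots(before, after)
```

Taking one snapshot before and one after each step of the methodology would isolate the files touched by that step alone.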
To start an experiment according to the methodology, we created an experimental environment that consisted of a host machine and a virtual environment running on the host machine (Figure 1). The virtual environment included three virtual machines: two client machines, VM1 and VM2, and a virtual server that connects these machines to the virtual local network and provides an Internet connection. This is a model of an ISP (Internet service provider). The preparation steps of the virtual environment were as follows:

1)
We created a virtual machine image using Windows 8.1, installed all updates available on the day of the experiment, and downloaded [23] and saved the installation file BitTorrent.exe of the latest version of the BitTorrent download client application (version 7.9.3 (build 40634)) to the user download directory.

2)
On the host computer using the Hyper-V manager, we created a default gateway virtual machine using Windows Server 2008 and Forefront Security, which is used to model an ISP in the real world.The default gateway was set up to have two virtual networks: an external virtual network to provide the Internet connection to the virtual machines, and an internal virtual network to connect the virtual machines to the default gateway.

3)
On the host computer using the Hyper-V manager, we created two identical virtual machines, VM1 and VM2, using an image created at step 1.

4)
VM1 and VM2 were connected to the internal network using static internal IP addresses as shown in Figure 1.
The file resume.dat had many keys; however, many values were empty. We have chosen the keys and values which are of interest for a forensic investigator (Table 2).
The most valuable information for a forensic investigator is the IP addresses of peers that took part in the file downloading process. We examined the key peers6 (54) and decoded it as 18 bytes for every peer IP address.
The key nodes (b) (6240) contained a lot of coded information (Figure 3). Its value is described as a binary array having a length of 3120 bytes (6240 hexadecimal digits). Therefore, we further examined the coded information. The key nodes (b) is decoded as 26 bytes (52 hexadecimal digits) for every node IP address. We found 240 IP addresses (6240/26 = 240). The first four of them were decoded and they are presented in Table 4.
Investigation of the file resume.dat revealed that the value of the key peers6 increased to 216, whereas the value at the third step of the experiment was 54. Consequently, we obtained 12 IP addresses (216/18 = 12), which were decoded and are presented in Table 5.
To summarize the fourth step of the experiment, we conclude that the file dht.dat contains the IP addresses of many nodes to aid in the process of downloading the file. The number of participating peers recorded in the file resume.dat was higher when the virtual machines were connected to the Internet than when the connection was disabled.
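The fixed-size records described above can be split and decoded with a few lines of Python. This is a minimal sketch, not the authors' decoder. The internal layout of each record is an assumption based on the usual compact BitTorrent encodings: an 18-byte peers6 record is taken to be a 16-byte IPv6 address (possibly an IPv4-mapped one) followed by a 2-byte big-endian port, and a 26-byte nodes record is taken to be a 20-byte node ID followed by a 4-byte IPv4 address and a 2-byte big-endian port.

```python
import ipaddress
import struct


def split_records(blob, size):
    """Split a byte string into fixed-size records; reject a ragged tail."""
    if len(blob) % size:
        raise ValueError(f"length {len(blob)} is not a multiple of {size}")
    return [blob[i:i + size] for i in range(0, len(blob), size)]


def decode_peers6(blob):
    """Decode 18-byte records: assumed 16-byte IPv6 address + 2-byte port."""
    peers = []
    for record in split_records(blob, 18):
        address = ipaddress.IPv6Address(record[:16])
        # Assumption: IPv4 peers are stored as IPv4-mapped IPv6 addresses.
        if address.ipv4_mapped is not None:
            address = address.ipv4_mapped
        (port,) = struct.unpack("!H", record[16:])
        peers.append((str(address), port))
    return peers


def decode_nodes(blob):
    """Decode 26-byte records: assumed 20-byte node ID + IPv4 + 2-byte port."""
    nodes = []
    for record in split_records(blob, 26):
        node_id = record[:20].hex()
        address = ipaddress.IPv4Address(record[20:24])
        (port,) = struct.unpack("!H", record[24:])
        nodes.append((node_id, str(address), port))
    return nodes
```

With these record sizes, a peers6 value of 216 bytes yields 216/18 = 12 records, and 6240 bytes split into 26-byte records yield 240 records, matching the counts reported in the text.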
During the experiment, we repeatedly found it quite inconvenient to examine the BEncoded BitTorrent configuration files. Therefore, we propose to implement a tool to locate the BitTorrent client artifacts and to present them in a human-readable format. In the next section, we review the implementation of this tool and provide illustrative results of its use.

Tool to Locate BitTorrent Client Artifacts and Results of the Experiment
The BEncode Editor is designed to view and edit the encoded BitTorrent files. However, our experiment revealed that many additional utilities are required to work comfortably with the data opened in BEncode Editor. Therefore, we designed and implemented a tool that presents all the data valuable to the forensic investigator in a human-readable format. The choice of the data to be presented is based on our experience during the experiment presented in the previous section. Therefore, we included all the data presented in Tables 1-3. Additionally, we decided to include the data from the file settings.dat. The file settings.dat stores the information about configuration parameters of the BitTorrent client. A lot of keys are used, and many of them are not empty. However, only a few of the keys of the file settings.dat are of interest to the forensic investigator. We report them in Table 6.
The most suitable format to represent data consisting of keys and their values is the Extensible Markup Language (XML). XML is a simple, very flexible text format designed to meet the challenges of large-scale electronic publishing [24]. The XML format looks much like the HTML format used to represent Internet pages. However, XML is much more flexible, since it allows the introduction of one's own tags. XML documents can be viewed in many browsers. Special programs have been created to view and edit XML documents, like XML Notepad [25], which displays friendly colorful views.
We decoded the values of all the keys presented in Tables 1-3 and 6 and placed them into the XML document. The tag names are composite: the file name followed by the name of the key.
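The composite tag naming can be illustrated with Python's standard xml.etree.ElementTree module. This is only a sketch of the idea: the underscore separator, the root element name "report", and the sample values are assumptions and are not taken from the tool's actual output.

```python
import xml.etree.ElementTree as ET


def build_report(decoded):
    """Build an XML tree from {file name: {key name: decoded value}}."""
    root = ET.Element("report")
    for file_name, keys in decoded.items():
        for key_name, value in keys.items():
            # Composite tag: file name first, then key name. The underscore
            # separator and the "report" root element are assumptions.
            tag = f"{file_name}_{key_name}".replace(" ", "_")
            ET.SubElement(root, tag).text = str(value)
    return root


# Hypothetical sample values, used only to show the shape of the output.
sample = {
    "resume.dat": {"peers6": "192.0.2.1:6881"},
    "settings.dat": {"settings_saved_systime": 1400000000},
}
xml_text = ET.tostring(build_report(sample), encoding="unicode")
```

Because each tag carries the file name, an investigator reading the report can tell from which configuration file a value originates without consulting the tool's source code.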
When a cybercrime is committed using a BitTorrent utility, it is helpful for the forensic investigator to know all the peers who shared the copyrighted or otherwise illegal material. Therefore, we have decided to aid the forensic investigator and supply additional information on the basis of the decoded values. That is, we look up every IP address taken from the considered files in the open Whois database [26]. This database allows the provider to be obtained for every IP address. This information enables the forensic investigator to contact the provider to find out who is using the particular IP address.
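A Whois lookup is a plain-text exchange over TCP port 43 (RFC 3912), so it can be sketched with the standard socket module. The code below is an illustration only, not the tool's implementation: the default server whois.arin.net, the function names, and the simple "Name: value" parsing are assumptions, and the field names returned differ between registries.

```python
import socket


def query_whois(ip, server="whois.arin.net", timeout=10.0):
    """Send one Whois query (RFC 3912: TCP port 43, query ends with CRLF)."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(ip.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(response):
    """Collect 'Name: value' lines, skipping comments and blank lines."""
    fields = {}
    for line in response.splitlines():
        if not line.strip() or line.startswith(("#", "%")):
            continue
        name, sep, value = line.partition(":")
        if sep and value.strip():
            # Keep the first occurrence of each field name.
            fields.setdefault(name.strip(), value.strip())
    return fields
```

A caller would pass the text returned by query_whois to parse_fields and copy the fields of interest, such as the organization name, into the report.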
Additionally, we used the GeoLite2 [27] database, which provides the following for the IP address in question: the country name, subdivision name, postal code, and two coordinates on the map (latitude and longitude). These coordinates show the location of the IP address considered. They are not very accurate; however, they enable the forensic investigator to estimate the approximate location of the IP address considered.
The tool is implemented as a command line utility with three parameters: the letter of the disk drive to search, the full path where the report file is to be placed, and the name of the report file. The initial plan was to include the created utility in a larger open program system, for example Autopsy [28].
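The three-parameter interface can be sketched with Python's argparse module. The argument names drive, report_dir and report_name are hypothetical, as is the validation of the drive letter; the original tool's exact syntax is not given in the text.

```python
import argparse
import os


def parse_arguments(argv):
    """Parse the three positional parameters described in the text."""
    parser = argparse.ArgumentParser(description="Locate BitTorrent artifacts")
    parser.add_argument("drive", help="letter of the disk drive to search, e.g. C")
    parser.add_argument("report_dir", help="full path where the report is placed")
    parser.add_argument("report_name", help="name of the report file")
    args = parser.parse_args(argv)
    # Assumed check: the drive is given as a single letter.
    if len(args.drive) != 1 or not args.drive.isalpha():
        parser.error("drive must be a single letter")
    return args


def report_path(args):
    """Join the report directory and file name into one path."""
    return os.path.join(args.report_dir, args.report_name)
```

Keeping the report directory and the report name as separate parameters mirrors the description above and lets a calling framework choose the output location independently of the file name.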
One of the main challenges was to decode the BEncoded file. Figure 4 shows the hexadecimal view and illustrates the process of the file decoding.
The C# programming language was chosen as the programming language of the implementation. Firstly, the programming language is supported by the rich .NET platform. Next, this programming language is modern and it has features enabling easy analysis of text files. One such feature is the dictionary data type. This data type provides a mapping from the set of keys to the set of values.
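To make the decoding step concrete, the following is a minimal BEncode decoder. It is written in Python rather than C#, so it is a sketch of the technique and not the tool's code; Python's dict plays the role of the dictionary data type mentioned above. The function name bdecode is an assumption.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":  # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        pos += 1
        mapping = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            mapping[key], pos = bdecode(data, pos)
        return mapping, pos + 1
    if lead.isdigit():  # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"unexpected byte at offset {pos}")
```

Byte strings are returned as raw bytes, because values such as peer lists are binary and must be interpreted separately after decoding.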
The architectural view of the program, which provides more implementation details, is shown in Figure 5. The presented architecture (Figure 5) implements all the ideas presented so far in Section 3. The tool searches for the family of BitTorrent files on the supplied disk drive and presents their contents in the XML format. The family of BitTorrent files consists of the following files: dht.dat, resume.dat, settings.dat, and *.torrent. The first three files are common to BitTorrent client users. The *.torrent file is separate for every downloaded file (Figure 6).
Figure 6 presents a contracted view of the results produced by the tool. As we can see, the current user downloaded six files using the BitTorrent client. The information for every item can be expanded. The fully expanded file of the results is quite lengthy. We provide two small excerpts from this file in Figures 7 and 8.
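The search for the family of BitTorrent files can be sketched as a directory walk with file name matching. The Python code below is an illustration under assumptions, not the tool's search routine: the function name, the case-insensitive matching, and the temporary directory used in the demonstration are hypothetical.

```python
import fnmatch
import os
import tempfile

# File names taken from the text; matching is case-insensitive by assumption.
PATTERNS = ("dht.dat", "resume.dat", "settings.dat", "*.torrent")


def find_bittorrent_files(root):
    """Walk root and return sorted paths whose names match the known family."""
    hits = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if any(fnmatch.fnmatchcase(name.lower(), p) for p in PATTERNS):
                hits.append(os.path.join(folder, name))
    return sorted(hits)


def demo():
    """Create a small directory tree and return the matching file names."""
    with tempfile.TemporaryDirectory() as root:
        nested = os.path.join(root, "AppData", "BitTorrent")
        os.makedirs(nested)
        for name in ("resume.dat", "Movie.TORRENT", "notes.txt"):
            open(os.path.join(nested, name), "wb").close()
        return [os.path.basename(p) for p in find_bittorrent_files(root)]
```

On a real image, the root would be the supplied drive, for example "C:\\", and each hit would then be decoded and written to the report.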

We see from Table 7 that our implemented tool provides many useful features for the forensic investigator. Our possible competitors do not have many of these features. Our tool is implemented as a stand-alone tool and it can be integrated into a larger framework like Autopsy or Plaso.

Conclusions
The BitTorrent application is a popular tool dedicated to downloading large files from the Internet. It is a very effective instrument for downloading large video and audio files. However, this powerful instrument can be employed to commit cybercrimes, like sharing of copyrighted movie files, child pornography, and others. In order to help forensic investigators fight against such cybercrimes, we suggested a methodology to locate the artifacts left by the BitTorrent client on the local computer.

Figure 1. Diagram of the experimental setup.

Figure 2. Decoding of key age (i) in the dht.dat file.

Key: Value
labelDirectoryMap (d)(3): Directories for audio, documents and video files
    audio (b)(24)
    documents (b)(28)
    video (b)(25)
labelRuleMap (d)(3): Rules for audio, documents and video files
    audio (b)(18)
    documents (b)(21)
    video (b)(18)
settings_saved_systime (i): Settings saved system time (UNIX timestamp)

For all the presented files, we collected the following data concerning the file creation and access from the file system:

Figure 5. The architecture of the program.

Figure 6. View of the results in the Mozilla Firefox browser.

Figure 7. View of the results in the XML Notepad editor.

Figure 8. View of the results in the Internet Explorer browser.

Table 1. Keys and values of the .torrent file.

Table 2. Non-empty keys and values of the resume.dat file.

Table 3. Keys from the dht.dat file.

Table 4. Decoded IP addresses of the file dht.dat.

Table 5. Decoded IP addresses in the file resume.dat.

Table 7. Summary of features and comparison with possible competitors.