Systematic Approach to Malware Analysis (SAMA)

Featured Application: The systematic and methodological process of analysis described in this document will provide a complete understanding of the life cycle of a malware specimen in terms of its behavior, operation, interaction with the environment, methods of concealment and obfuscation, system updates, and communications. Abstract: Malware threats pose new challenges to analytic and reverse engineering tasks. It is needed for a systematic approach to that analysis, in an attempt to fully uncover their underlying attack vectors and techniques and ﬁnd commonalities between them. In this paper, a method of malware analysis is described, together with a report of its application to the case of Flame and Red October. The method has also been used by di ﬀ erent analysts to analyze other malware threats like ‘Stuxnet’, ‘Dark Comet’, ‘Poison Ivy’, ‘Locky’, ‘Careto’, and ‘Sofacy Carberp’. The method presented in this work is a systematic and methodological process of analysis, whose main objective is the acquisition of knowledge as well as to gain a full understanding of a particular malware. Using the proposed method to analyze two well-known malware as ‘Flame’ and ‘Red October’ will help to understand the added value of the method.


Introduction
'Stuxnet' was the first malware that received worldwide attention for being able of causing physical damage in an industrial infrastructure. However, there are also a significant number of other known malware that can be grouped together as a recent and particular phenomenon known as Advanced Persistent Threats (APT), that deserves separate attention from other kinds of malware. 'Flame', as a part of this group, belongs to a family of complex malware that also includes 'Stuxnet', 'Duqu', and 'Gauss'. It was intended to attack particular types of networks and infrastructures using different techniques to persist during long periods of time [1]. On the other hand, another complex malware that was deployed mainly in Eastern European and Central Asian countries was 'Red October'. It was intended to attack government organizations and diplomatic services. Its operation focuses on information exfiltration in order to improve the intelligence of attackers.
WannaCry, Locky, etc. are other malware, called 'ransomware' and developed to steal money. Today there are many specimens of malware, as mentioned above, that use multiple attack techniques and vectors and are conducted stealthily to avoid detection, in order to retain unnoticed control over targeted systems for long periods of time [2]. On the other hand, in recent decades with the new educational technologies and the growth of the Massive Open Online Course (MOOCs) [3][4][5][6][7][8] there are new opportunities for this kind of malware to 'catch' new objectives and personal data which can be really important.
In this paper, we describe a new systematic approach or methodology of malware analysis and its evaluation (case studies) by applying it to two of the most complex and well-known malwares that have emerged so far: Red October and Flame. It provides a structured analysis process and the definition of the steps to be followed, the objectives to be achieved and the specific techniques and tools to be used. It proposes a four stages implementation in order to that will help the analyst to work securely to produce a repeatable and objective result, to cooperate with other analysts. In the end, using this method will provide a better understanding of the analyzed malware without forgetting the importance of using the right information in the right moment enhancing the analyst effectiveness.
Regarding the toolset that is needed for any malware analysis, it must be confirmed that the proposed method is independent of any specific tool or technic allowing an easy and cost-effective adaptation to malware analyst resources. Main characteristics of the presented method to analyze malware are as follows: • It does not depend on the complexity of the malware to be analyze • It is independent of the tools used making easier its adoption • In enhances the importance of the malware classification as one of the first steps to be done • It considers the integration of any existing information of the malware as an input for the analysis • It defines a specific sequence for the analysis based on making the analysis of code, before the dynamic or behavioral analysis • It includes a systematic feedback loop to retake the process when malware complexity requires to do it, as well as to improve the current analysis feeding previous stages with the obtained results of later ones The rest of this paper is structured as follows. Section 2 surveys our current knowledge on the types of malware analysis techniques and available tools. Then, Sections 3 and 4 discuss the design of a methodological framework to integrate such techniques in a coherent and systematic way. The detailed results of applying that method to the analysis of the malware Red October and Flame are provided in Section 5. Finally, conclusions and future research guidelines are included in Section 6.

Background
Today, malicious code, hereinafter referred to as malware, has evolved into one of the most important dangers and threats that affects the security of information technology (IT) systems. The degree of the technique complexity and the level of knowledge needed to analyze malware is proportional to the level of sophistication of the sample. The main benefits of malware analysis are the following [2,9]: To assess the malware detection capability of organizations' protection systems • To evaluate the damage caused by the intrusion and malware actions • To discover other machines that have been affected by the same malware • To identify the vulnerability that was exploited by the malware, to obtain the software update that mitigates it, if available • To obtain enough data and information to implement the appropriate defense mechanisms to mitigate and neutralize the damage caused by the malware, including firewall rules, host and network intrusion detection systems, firewall, and antivirus • To determine the level of malware sophistication and complexity • If it is possible, to determine the origin of an attack and to identify the intruder and malware developer In the analysis of malware, a series of methods and techniques are required for any analysis. They will help analysts to understand the risks, threats, and intentions associated with the malware. Thus, the obtained information may be used to reconfigure the organization's defenses in order to detect it and being protected against it.
The first and oldest analysis method used in analysis of malware is observation. Jiang, in his doctoral thesis [10], use this method to compare the states of a system before and after verified malware activity. Data is achieved by the mere observation of a machine for clues modifications of registers, files or creation of new ones as, for example, 'enbiei.exe' activity which indicates 'Blaster Worm' malware. Theerthagiri, in the article "Reverting Malware, Confirms the Following: A Detection Intelligence with In-Depth Security Analysis" [11], provides a review of the most relevant malware analysis techniques. Specifically, three specific and differentiated lines of work are identified, such as the following: • Static Code Analysis: technique that collects as much information as possible of a binary without actually executing it. To do it, disassembly and reverse engineering techniques are used. This latter technique of analysis includes others more specific as string analysis, reverse engineering, packaging, obfuscation, restricted execution environments, etc. A tool to perform this type of analysis is IDA PRO [12] or GHIDRA [13]. • Dynamic code analysis: the analysis is basically done through a type of tools called 'debuggers' that are associated with the program code under analysis to take full control of them, allowing the possibility of executing it code line by code line. Some popular debuggers for malware analysis are OllyDbg [14], ImmunityDbg [15], and WinDbg [16]. • Dynamic or behavioral analysis: It analyzes the binary or specimen malware while it is running focusing on the supervision and monitorization of the malware interaction with the environment. The aim is to look for changes in the infected machine where it is running, network traffic and potential external communications with command and control servers. It can be done manually in a virtual or physical environment using different tools that collect the data of its interaction with the machine infected or automatically by "Sandbox" toolset [17].
The Figure 1 summarizes the main methods and techniques for the analysis of malware that we have identified as a framework. As it has been mentioned in the previous paragraphs, all of them present a series of advantages and disadvantages that mean that malware analysis cannot be carried out using a single method or technique.
Appl. Sci. 2020, 10, x 3 of 31 In the analysis of malware, a series of methods and techniques are required for any analysis. They will help analysts to understand the risks, threats, and intentions associated with the malware. Thus, the obtained information may be used to reconfigure the organization's defenses in order to detect it and being protected against it.
The first and oldest analysis method used in analysis of malware is observation. Jiang, in his doctoral thesis [10], use this method to compare the states of a system before and after verified malware activity. Data is achieved by the mere observation of a machine for clues modifications of registers, files or creation of new ones as, for example, 'enbiei.exe' activity which indicates 'Blaster Worm' malware. Theerthagiri, in the article "Reverting Malware, Confirms the Following: A Detection Intelligence with In-Depth Security Analysis" [11], provides a review of the most relevant malware analysis techniques. Specifically, three specific and differentiated lines of work are identified, such as the following:


Static Code Analysis: technique that collects as much information as possible of a binary without actually executing it. To do it, disassembly and reverse engineering techniques are used. This latter technique of analysis includes others more specific as string analysis, reverse engineering, packaging, obfuscation, restricted execution environments, etc. A tool to perform this type of analysis is IDA PRO [12] or GHIDRA [13].  Dynamic code analysis: the analysis is basically done through a type of tools called 'debuggers' that are associated with the program code under analysis to take full control of them, allowing the possibility of executing it code line by code line. Some popular debuggers for malware analysis are OllyDbg [14], ImmunityDbg [15], and WinDbg [16].  Dynamic or behavioral analysis: It analyzes the binary or specimen malware while it is running focusing on the supervision and monitorization of the malware interaction with the environment. The aim is to look for changes in the infected machine where it is running, network traffic and potential external communications with command and control servers. It can be done manually in a virtual or physical environment using different tools that collect the data of its interaction with the machine infected or automatically by "Sandbox" toolset [17].
The figure 1 summarizes the main methods and techniques for the analysis of malware that we have identified as a framework. As it has been mentioned in the previous paragraphs, all of them present a series of advantages and disadvantages that mean that malware analysis cannot be carried out using a single method or technique. Once the different types and methods of malware analysis have been shown, the following paragraphs present various considerations reflected in different scientific research articles.
Malware is constantly evolving by developing obfuscation techniques using metamorphic and polymorphic techniques to frustrate static code analysis [2]. Moser et al. [18] proposed an obfuscation approach that demonstrates that static analysis techniques alone might no longer be sufficient to analyze complex specimen of malware. Also, in [19], the "Problems of Static Malware Analysis Once the different types and methods of malware analysis have been shown, the following paragraphs present various considerations reflected in different scientific research articles.
Malware is constantly evolving by developing obfuscation techniques using metamorphic and polymorphic techniques to frustrate static code analysis [2]. Moser et al. [18] proposed an obfuscation approach that demonstrates that static analysis techniques alone might no longer be sufficient to analyze Appl. Sci. 2020, 10, 1360 4 of 31 complex specimen of malware. Also, in [19], the "Problems of Static Malware Analysis Approaches", they stated that sometimes the source code of malware samples is not readily available. As the analysis of binaries is complex and it requires high knowledge of reverse engineering, it is probable that the disassembly of binaries of the malware presents ambiguous results. Given the problems indicated for this type of analysis, this article demonstrates that it must be complemented with other types of analysis in order to maximize its effectiveness.
Derivative of the cost of resources and time of a manual analysis, some methodologies to automate the process have been developed, such as the one that is presented in this paper. Its implementation is done via virtual and isolated run environments known as "Sandbox" [11]. Basically, it is a supervised and controlled execution environment of malicious code so that it cannot do any harm to the real system that hosts the virtual environment. It is usually implemented through a virtual environment where the processor, memory, and file system are simulated.
Given the complexity of carrying out manual analysis, the need to automate analysis arises, in principle, due to the high cost in human resources and time. However, this automation can only be carried out, nowadays, in behavioral analysis using virtualization environments. This does not constitute a global solution of malware analysis. Nevertheless, it can be used as an element of support, taking into consideration some inconveniences that are presented below.
The main purpose of an automated dynamic malware analysis [20] is to support the analysts to understand the malware behavior and to obtain data, effectively, in order to facilitate reverse engineering to design effective countermeasures.
Furthermore, in the [9], it is indicated that using 'Sandbox' presents important drawbacks, for example, the inconvenience of running the malware and lose information due to the lack of registering all events because its complexity. To address this limitation, it is always necessary to perform dynamic code analysis.
All of them record the actions taken by the malware under analysis automatically, but they have the drawback of analyzing only a single execution path, so they could ignore relevant behavior of the malware. (i.e., malware logic behavior under certain conditions like system date). According to Gandotra et al. [2], "The virtual environment in which malwares are executed is different from the real one and the malwares may perform in different ways resulting in artificial behavior rather than the exact one".
Another problem that has to be faced using this type of tools and performing dynamic analysis and behavioral techniques, are the methods of malware evasion. It means that, in order to prevent the analysis of the malware, it can detect the presence of an instrumented environment of analysis forcing malware to hide or inhibit its malicious behavior [19]. It is possible to use the Pafish [28] tool that allows the analyst to test if the environment for the analysis is properly implemented, thanks to its capability to detect sandboxes and analysis environments in the same way as malware do.
An additional important aspect of a malware analysis methodology is to identify the tools that are used in the different types of malware analysis. As it has been said before, there are plenty of tools both free and paid available.
So as to create an effective laboratory for malware analysis, a study of these tools is necessary. It means that depending on the machines that will be used and the type of analysis that will be performed, a specific toolset must be chosen. The specific toolset shown in this paper is a result of a study based on the needs of each phase of the methodology and their functionality taking into consideration the ease of use and the capabilities they provide. The Table 1 shows a classification of tools based on its functionality. It was found quite complex to identify a toolkit which could be seen as a standard. The wide variety of tools, techniques and methods for analyzing malware, the disadvantages of the dynamic analysis automatization tools, sandbox, and the increasing complexity of malware have pushed the development of many tools. This situation forced us to study around one hundred of tools in order to select the most adequate ones to be used in the in the present article.
It has been confirmed that having a method independent of technical tools may be useful to adapt it to the complexity that different types of malware may present [29]. Beyond technical tools or individual types of analysis, it seems that having a methodology that allows us to respond systematically to the complexity that malware presents today could be necessary. It is therefore necessary to develop systematic and methodological analysis process in order to provide a set of procedures, methods, and techniques with their associated tools to help the analyst to gain an understanding and complete information malware under study. In this way, by applying this methodological process, "systematize the analysis of malware" could be possible as well as the contribution to enhance the prevention by identifying malware thanks to the results obtained after applying the methodology.
Without a high level structured approach, it is known that it is not only difficult to know which tool must be used, but also when, where an in which order should be used. That is why the need of a methodology is justified, because it provides the basis for conducting a systematic and methodological analysis process for malware analysis.
In addition, establishing this methodology can allow the distribution of the malware analysis phases among different groups of analysts, laboratories or computer, each one with its own strengths, knowledge, and specific resources for the analysis. Using this methodology could also allow the sharing of the result with other groups of analysts able to understand it (if they know the methodology that was used).

Related Work
At present, we have identified only one malware analysis methodology [30]. It presents a comprehensive methodology for malware analysis that includes a structured analysis process. It also defines the steps to follow to achieve the objectives and the techniques and tools used in each phase.
It proposes four logical phases which will help the analyst to produce a repeatable and objective output in order to obtain a better understanding of the analyzed malware. These phases are as follows: 1. Detection 2.
Isolation and extraction 3.
Code analysis However, we do not consider it very appropriate, since there are other procedures for obtaining malware that are more effective because the malware has already been compressed or it can even be obtained from another organization that has already extracted it with its own forensic procedures Appl. Sci. 2020, 10, 1360 6 of 31 assuring always the chain of custody. It also proposes its application to the judicial sector, which is only a particular case of the one described above.
This methodology includes the 'detection' phase first. This implies high complexity due to the need of very different ways to get the malware including complex and expensive defense systems such as antivirus, intrusion detection systems (IDS), firewall, or security information and event management (SIEM). It means that the difficulty to systematize and standardize is very high, so it is the inclusion of this phase.
The 'isolation and extraction' phase is focus on the rootkit malware extraction process. Nowadays, more dangerous and complex malware such as Advanced Persistent Threat (APT) or Ransomware, do not work in the same way that old fashion malwares making this phase invalid for these types of specimen. The conclusion is that this phase could not respond adequately to the complexity of the current malware. The main goal of these tasks is the collection, analysis, and preservation of data from an IT machine as part of forensic processes, to make it admissible in a court of law [31,32].
For the reasons mentioned above, it is considered not necessary to include the two previous phases in the methodology set out in this article. Apart from the aforementioned, a number of important differences are also identified:

•
One of the main differences with the methodology proposed in this paper is the execution order of the phases, performing the 'code analysis' phase first, before the 'dynamic or behavioral analysis' phase. That is why the information gathered during 'code analysis' is usually very useful for the following phase as passwords, installation commands, orders of command and control, possible execution paths, etc. • MARE does not include the technique of 'dynamic analysis of code' by means of a debugger. This technique is very important to understand how malware works. • It has not been published its application to a specific case study of a malware in order to test it and be validated. However, the methodology advanced has been applied to eight cases, two of them being included in this article.
The Figure 2 represents MARE and SAMA comparison with the differences indicated in previous paragraphs.
Appl. Sci. 2020, 10, x 6 of 31 obtained from another organization that has already extracted it with its own forensic procedures assuring always the chain of custody. It also proposes its application to the judicial sector, which is only a particular case of the one described above. This methodology includes the 'detection' phase first. This implies high complexity due to the need of very different ways to get the malware including complex and expensive defense systems such as antivirus, intrusion detection systems (IDS), firewall, or security information and event management (SIEM). It means that the difficulty to systematize and standardize is very high, so it is the inclusion of this phase.
The 'isolation and extraction' phase is focus on the rootkit malware extraction process. Nowadays, more dangerous and complex malware such as Advanced Persistent Threat (APT) or Ransomware, do not work in the same way that old fashion malwares making this phase invalid for these types of specimen. The conclusion is that this phase could not respond adequately to the complexity of the current malware. The main goal of these tasks is the collection, analysis, and preservation of data from an IT machine as part of forensic processes, to make it admissible in a court of law [31,32].
For the reasons mentioned above, it is considered not necessary to include the two previous phases in the methodology set out in this article. Apart from the aforementioned, a number of important differences are also identified:


One of the main differences with the methodology proposed in this paper is the execution order of the phases, performing the 'code analysis' phase first, before the 'dynamic or behavioral analysis' phase. That is why the information gathered during 'code analysis' is usually very useful for the following phase as passwords, installation commands, orders of command and control, possible execution paths, etc.  MARE does not include the technique of 'dynamic analysis of code' by means of a debugger. This technique is very important to understand how malware works.  It has not been published its application to a specific case study of a malware in order to test it and be validated. However, the methodology advanced has been applied to eight cases, two of them being included in this article.
The Figure 2 represents MARE and SAMA comparison with the differences indicated in previous paragraphs.

Method Proposal
The aim of the proposed method is to provide an effective guidance during any detailed analysis of complex malware. The outcome of such a method will be valuable information that will allow the analyst to develop effective countermeasures as well as to extend our knowledge about the origin of malware and attack vectors used during the campaign.

Method Proposal
The aim of the proposed method is to provide an effective guidance during any detailed analysis of complex malware. The outcome of such a method will be valuable information that will allow the analyst to develop effective countermeasures as well as to extend our knowledge about the origin of malware and attack vectors used during the campaign. Conducting a systematic, repeatable, and methodological process during the analysis is very important in order to facilitate the acquisition of knowledge about a particular malware. Then the main objective is to analyze and get a full understanding of it, in terms of its operation, its identification and the ways of removing the threat. The objectives and specific information to be obtained in the process of applying a data analysis methodology are listed in the Table 2. The main structure of the methodology with its main phases is depicted in the following figure with the workflow of its application.
As it has been highlighted before, the 'code analysis' phase is performed before the 'behavioral analysis' since the data obtained in the first step can be used to improve the second, obtaining more and better data.
SAMA methodology includes a feedback loop between the final stage of 'behavioral analysis' phase and 'classification phase'. This feature would be useful when malware under analysis uses more than one files or specimens when it is running. General flow chart of the methodology and its associated processes are as follows on Figure 3: The process begins with a series of actions with the aim of starting the analysis of malware in a clean state without any possible infection. Next, in the 'classification' stage, the malware compressed and protected by password is the input. During this phase the code analysis is performed.
In the last stage, 'behavioral analysis', malware security measures to avoid its execution on a virtual environment are checked. To do so, using IDA Pro tool the magic code of VMware can be searched. If it is founded, the laboratory must be patched to allow running the analysis in the virtual environment. If it fails, the last option is to use physical environment to do the analysis. The process begins with a series of actions with the aim of starting the analysis of malware in a clean state without any possible infection. Next, in the 'classification' stage, the malware compressed and protected by password is the input. During this phase the code analysis is performed.
In the last stage, 'behavioral analysis', malware security measures to avoid its execution on a virtual environment are checked. To do so, using IDA Pro tool the magic code of VMware can be searched. If it is founded, the laboratory must be patched to allow running the analysis in the virtual environment. If it fails, the last option is to use physical environment to do the analysis.
Subsequently, the dynamic behavioral analysis is performed. If the malware is complex, it could generate more specimens during its execution, forcing the analyst to go back to the 'classification' for each of the new samples or specimens created. It means that depending on the complexity of the malware, SAMA could be applied at least one time for each piece of malware generated.
Once the full process is finished, all the information listed in Table 2 will have been provided. The phases or steps proposed in this methodology are detailed below: Table 2 should be moved here, is it fine with you?
• Initial Actions. It consists of performing a series of actions mainly to obtain a record of the configuration of the machines involved in the analysis, with the purpose of obtaining a reference enabling us to compare the state of the same before and after running the malware under study. More specifically, in the next flow chart of the steps, it is as follows on Figure 4: Preserving integrity by deactivating services: System recovery and automatic updates Preservation of integrity by baseline generation of system configuration: snapshot (VMWare) of victim's virtual machines, monitor and services; snapshot units C: (Systracer) Guarantee of the integrity of the generated samples: MD5 hash of the snapshot of the C units: by means of the tools "Md5sum" and verification with "WinMD5".
Appl. Sci. 2020, 10, x 9 of 31 Subsequently, the dynamic behavioral analysis is performed. If the malware is complex, it could generate more specimens during its execution, forcing the analyst to go back to the 'classification' for each of the new samples or specimens created. It means that depending on the complexity of the malware, SAMA could be applied at least one time for each piece of malware generated.
Once the full process is finished, all the information listed in Table 2 will have been provided. The phases or steps proposed in this methodology are detailed below: Table 2 should be moved here, is it fine with you?
 Initial Actions. It consists of performing a series of actions mainly to obtain a record of the configuration of the machines involved in the analysis, with the purpose of obtaining a reference enabling us to compare the state of the same before and after running the malware under study. More specifically, in the next flow chart of the steps, it is as follows on Figure 4: Guarantee of the integrity of the generated samples: MD5 hash of the snapshot of the C units: by means of the tools "Md5sum" and verification with "WinMD5".
The tools selected to these phases are Systracer, MD5Sum, and Gmer.
 Classification. It is to examine the malware executable file, without accessing your code, to  Figure 4. Diagram of the initial actions step.
The tools selected to these phases are Systracer, MD5Sum, and Gmer.
• Classification. It is to examine the malware executable file, without accessing your code, to confirm it is some kind of malware and to obtain information in open sources, like the Internet, of its functionality that aid to realization of the following phases of analysis. It even enables the realization simple network firms, should be necessary for reasons of urgency in protecting systems. The tasks executed in this phase are:  The objectives and specific information to be obtained in this step will be shown in the Table 3.  The objectives and specific information to be obtained in this step will be shown in the Table 3. Table 3. Information to obtain in classification step.

Information to Obtain in Classification
Step In case of the malware being obfuscated, the process to follow would be (see Figure 6): In case of the malware being obfuscated, the process to follow would be (see Figure 6): The tools selected to these phases are Winmd5, Ssdeep, Bintext, String, PEiD, Detect It, Scylla, Yara (with packer and cipher rules), Binwalk, PEBrowse, Winhex, PE Explorer, ProcDump 32, and Dependecy Walquer.


Code Analysis. It consists of a static and dynamic analysis of the malware's code, in assembly language, performing a detailed browsing process through it in order to get a better understanding of its functionality. It is a complex process that provides information for carrying out the next phase. It requires a reverse engineering analyst to find hidden features that involve new execution paths, which could be tried by the malware during the next phase to interact with the user. These discoveries will help to understand better its purpose and its functionality. In the next flow chart, the detail of the analysis steps is shown on Figure 7: The tools selected to these phases are Winmd5, Ssdeep, Bintext, String, PEiD, Detect It, Scylla, Yara (with packer and cipher rules), Binwalk, PEBrowse, Winhex, PE Explorer, ProcDump 32, and Dependecy Walquer.

•
Code Analysis. It consists of a static and dynamic analysis of the malware's code, in assembly language, performing a detailed browsing process through it in order to get a better understanding of its functionality. It is a complex process that provides information for carrying out the next phase. It requires a reverse engineering analyst to find hidden features that involve new execution paths, which could be tried by the malware during the next phase to interact with the user. These discoveries will help to understand better its purpose and its functionality. In the next flow chart, the detail of the analysis steps is shown on Figure 7: Appl. Sci. 2020, 10, x 12 of 31 After finishing the dynamic code analysis stage, the system will be returned to its original uninfected state because using a debugger means the execution of malicious code in the victim machine.
The objectives and specific information to be obtained in this step will be shown in the Table 4: Table 4. Information to obtain in classification step

Information to Obtain in Code Analysis
Step Information for running the malware. Installation information. Execution command. Passwords. Command and control commands.
Other execution routes, etc. IRC channel and connection password. Information to avoid operating restrictions in virtual environments.
Hidden functionalities. Compression of the operation of the malware.
The tools selected to perform this phase are PE Explorer, IDA Pro, Ghidra, Reverse Engineering Compiler, and Ollydbg.


Behavioral Analysis (See Figure 8). It consists in analyzing the implementation of malware in a lab that simulates a real environment, in order to observe malware behavior, network traffic generated and thus, the actions that are taken on the target system (registry modifications, files, etc.). After finishing the dynamic code analysis stage, the system will be returned to its original uninfected state because using a debugger means the execution of malicious code in the victim machine.
The objectives and specific information to be obtained in this step will be shown in the Table 4: Table 4. Information to obtain in classification step.

Information to Obtain in Code Analysis
Step Information for running the malware. Installation information. Execution command. Passwords. Command and control commands.
Other execution routes, etc. IRC channel and connection password. Information to avoid operating restrictions in virtual environments.
Hidden functionalities. Compression of the operation of the malware.
The tools selected to perform this phase are PE Explorer, IDA Pro, Ghidra, Reverse Engineering Compiler, and Ollydbg.

•
Behavioral Analysis (See Figure 8). It consists in analyzing the implementation of malware in a lab that simulates a real environment, in order to observe malware behavior, network traffic generated and thus, the actions that are taken on the target system (registry modifications, files, etc.). The objectives and specific information to be obtained in this step will be shown in the Table 5. This phase is divided in five stages. First, one consists in taking a snapshot of the system to compare their condition before and after the execution of the malware, using specific tools. This helps to identify files that have been added or modified in the system and to understand what changes have occurred after the malware execution. The objectives and specific information to be obtained in this step will be shown in the Table 5. This phase is divided in five stages. First, one consists in taking a snapshot of the system to compare their condition before and after the execution of the malware, using specific tools. This helps to identify files that have been added or modified in the system and to understand what changes have occurred after the malware execution.
In this phase, it is needed to analyze the network traffic between machines in order to detect the interaction between the malware and command and control trough a simulated channel. After launching the different malware functionalities, all the captured traffic will allow the analyst to perform a more detailed analysis of the malware.
Added to the other analyses it is the RAM memory check that could be done with the tools volatility or memorize as follows on Figure 9: Appl. Sci. 2020, 10, x 14 of 31 In this phase, it is needed to analyze the network traffic between machines in order to detect the interaction between the malware and command and control trough a simulated channel. After launching the different malware functionalities, all the captured traffic will allow the analyst to perform a more detailed analysis of the malware.
Added to the other analyses it is the RAM memory check that could be done with the tools volatility or memorize as follows on Figure 9: The proposed methodology is intended as a basis for conducting a systematic and methodological process of analyzing complex and flexible malware that may require slight variations given the large number of existing types of malicious codes and the used technologies in its development, which may apply to specific or granular parts of the code. However, this situation does not change the general structure thereof.

Study Cases
The research shown in this article corresponds to the category 'experimental development' (systematic work based on existing knowledge, derived from research and/or practical experience, aimed at the production of new materials, products, devices, processes, systems and services, or the improvement of existing ones) [33], since it is a scientific work oriented to obtain a new methodology or systematic process of malware analysis.
Likewise, taking as a reference the article "Research Methods in Computer Science", by Serge Demeyer [34], and taking into account that the article falls within the field of Computer Science, the research method used is the dominant one within this type of science, are the so-called 'case studies': useful to "investigate a contemporary phenomenon within its real context, when the boundaries between the phenomenon and the context are not clearly evident [35]. This is why two case studies of the application of the malware analysis methodology proposed to the Red October and Flame malware have been included in the article. The first one is presented in a more complete way to reflect in detail the application of the methodology and a second one more focused on the presentation of the result of the analysis.
The reason for choosing these two malware has been because they are considered the two most complex to date, one of Russian and one of Anglo-Saxon descent. This requires an organizational approach to allocating resources in order to be effective in their use. The proposed methodology provides a systematic process of analysis and reverse engineering of complex pieces of malware, using techniques and methods that include, for example, the re-engineering of a complete malware lifecycle analysis. This means that analysis of its behavior; stealth methods, obfuscation, updates, and communications are included.
The validity of the applicability of the methodology used as a reference is confirmed by consistency between the results obtained using it and the ones obtained from expert sources [36] and [37] respectively, demonstrating its success.

Red October Analysis
This section presents the results and experience gained from the application of the methodology presented in this article, to the analysis of the malware Red October.
The main objective of this malware is the theft of classified information on systems with Windows OS and external devices like BlackBerry and using spear-phishing campaigns to spread the The proposed methodology is intended as a basis for conducting a systematic and methodological process of analyzing complex and flexible malware that may require slight variations given the large number of existing types of malicious codes and the used technologies in its development, which may apply to specific or granular parts of the code. However, this situation does not change the general structure thereof.

Study Cases
The research shown in this article corresponds to the category 'experimental development' (systematic work based on existing knowledge, derived from research and/or practical experience, aimed at the production of new materials, products, devices, processes, systems and services, or the improvement of existing ones) [33], since it is a scientific work oriented to obtain a new methodology or systematic process of malware analysis.
Likewise, taking as a reference the article "Research Methods in Computer Science", by Serge Demeyer [34], and taking into account that the article falls within the field of Computer Science, the research method used is the dominant one within this type of science, are the so-called 'case studies': useful to "investigate a contemporary phenomenon within its real context, when the boundaries between the phenomenon and the context are not clearly evident [35]. This is why two case studies of the application of the malware analysis methodology proposed to the Red October and Flame malware have been included in the article. The first one is presented in a more complete way to reflect in detail the application of the methodology and a second one more focused on the presentation of the result of the analysis.
The reason for choosing these two malware has been because they are considered the two most complex to date, one of Russian and one of Anglo-Saxon descent. This requires an organizational approach to allocating resources in order to be effective in their use. The proposed methodology provides a systematic process of analysis and reverse engineering of complex pieces of malware, using techniques and methods that include, for example, the re-engineering of a complete malware lifecycle analysis. This means that analysis of its behavior; stealth methods, obfuscation, updates, and communications are included.
The validity of the applicability of the methodology used as a reference is confirmed by consistency between the results obtained using it and the ones obtained from expert sources [36,37] respectively, demonstrating its success.

Red October Analysis
This section presents the results and experience gained from the application of the methodology presented in this article, to the analysis of the malware Red October.
The main objective of this malware is the theft of classified information on systems with Windows OS and external devices like BlackBerry and using spear-phishing campaigns to spread the malware.
Given its complexity, it has been considered to be developed by a skillful and resourceful organization, which fall within the category of 'Advanced Persistent Threats'.
The following objectives were pursued: • To test the practical applicability of the presented methodology • Identify future areas of development, to guarantee the evolution and adaptability of the methodology to the evolution of malware • Reinforce the recognition of the importance of the development and evolution of methodologies that can effectively counteract the development of malware and serve to generate transferable knowledge • To broaden the existing knowledge base on the evolution of malware development, a clear end to the design and use of malware analysis methodologies First, the laboratory environment is prepared for malware analysis Red October. The complete laboratory is based on the article "Malware Analysis. Environment Designs" by A. Sanabria [38], adapting it to the size of the experiment.
After defining the laboratory and the malware analysis tools, it is identified that the methodology is independent of both technological aspects, allowing each analyst or team of analysts to adapt the methodology to the real needs and existing capabilities. The architecture of the working environment defined for testing the methodology is presented below in the Figure 10: Appl. Sci. 2020, 10, x 15 of 31 malware. Given its complexity, it has been considered to be developed by a skillful and resourceful organization, which fall within the category of 'Advanced Persistent Threats'. The following objectives were pursued:  To test the practical applicability of the presented methodology  Identify future areas of development, to guarantee the evolution and adaptability of the methodology to the evolution of malware  Reinforce the recognition of the importance of the development and evolution of methodologies that can effectively counteract the development of malware and serve to generate transferable knowledge  To broaden the existing knowledge base on the evolution of malware development, a clear end to the design and use of malware analysis methodologies First, the laboratory environment is prepared for malware analysis Red October. The complete laboratory is based on the article "Malware Analysis. Environment Designs" by A. Sanabria [38], adapting it to the size of the experiment.
After defining the laboratory and the malware analysis tools, it is identified that the methodology is independent of both technological aspects, allowing each analyst or team of analysts to adapt the methodology to the real needs and existing capabilities. The architecture of the working environment defined for testing the methodology is presented below in the Figure 10:


Host Server. To host the virtual environment to support the working environment.  Victim. Its objective will be to constitute the platform where the malware will be executed and monitored in order to study its behavior. Built on a Windows machine.  Monitoring and Windows services. Its objective is to record and monitor network traffic produced by malware, provide specific Windows system services such as SMB and another like DNS, FTP, and WEB.  Services. The objective of this server is to provide services to malware in its interaction with the environment, such as HTTP, DHCP, Chat, IRC Server, FTP, DNS, and SMTP, to simulate its execution environment. Built on Sans Institute REMNUX malware analysis distribution. It has installed tools such as Inetsim and Netcat to simulate services for malware.
Secondly, the working tools for malware analysis must be selected. In this case, the main choice is to use freeware tools instead of licensed ones. It is important to highlight the independence of the methodology with any specific tools. This fact is demonstrated by the following table, which shows that the tools used in this case are a subset of the total tools identified in Section 4.
The tools shown in the Table 6 are those, which were considered necessary to perform the analysis. The importance of this characteristic lies in the optimization of the working tools in each analysis, without altering the effectiveness of the methodology, as well as being ready to the update process of the set of tools according to their evolution.

•
Host Server. To host the virtual environment to support the working environment. • Victim. Its objective will be to constitute the platform where the malware will be executed and monitored in order to study its behavior. Built on a Windows machine.

•
Monitoring and Windows services. Its objective is to record and monitor network traffic produced by malware, provide specific Windows system services such as SMB and another like DNS, FTP, and WEB.

•
Services. The objective of this server is to provide services to malware in its interaction with the environment, such as HTTP, DHCP, Chat, IRC Server, FTP, DNS, and SMTP, to simulate its execution environment. Built on Sans Institute REMNUX malware analysis distribution. It has installed tools such as Inetsim and Netcat to simulate services for malware.
Secondly, the working tools for malware analysis must be selected. In this case, the main choice is to use freeware tools instead of licensed ones. It is important to highlight the independence of the methodology with any specific tools. This fact is demonstrated by the following table, which shows that the tools used in this case are a subset of the total tools identified in Section 4.
The tools shown in the Table 6 are those, which were considered necessary to perform the analysis. The importance of this characteristic lies in the optimization of the working tools in each analysis, without altering the effectiveness of the methodology, as well as being ready to the update process of the set of tools according to their evolution. The following are the most relevant findings from the point of view of both the analysis of the selected malware and the use of the methodology.

•
Transfer. The malware is transferred via encrypted USB and decompressed: red_oct.document.exploit-(exploit) red_oct.bin.drop-(dropper) • Identification (see Table 7). Obtaining the HASH MD5 signature of the files that theoretically contain the malware: • Check for malware (see Table 8). Tested using online tools (virus total) and installable application Avira. Considering that Red October was detected in 2012, the results obtained from the verification phase confirm that we are analyzing a piece of malware. Identification of the level of complexity of the malware in analysis. It is confirmed that it is an APT, which allows us to prepare and be aware that it is a highly complex malware. The code of a critical file 'MSC.BAT' is identified. It is also verified that a command is executed to change the reference character table for coding which means that can be a correlation with the use of Cyrillic characters in its origin. MSC.BAT file has the following contents: chcp 1251 Repeat attrib -a -s -h -r "%DROPPER_FILE%" of "%DROPPER_FILE%" if exist "%DROPPER_FILE%" goto Repeat of "%TEMP%msc.bat" It is identified that the malware does not interact if it is not able to establish an outside connection, in particular with the command and control (C & C). It is identified that once the back door is established, an exchange of information is initiated allowing the malware to install specific modules in order to develop a precisely attack adapted to the victim.
It is true that in this case, a sample of malware detected in 2012 is being analyzed, and this allows access to a lot of information that allows for being more effective during the perform of the analysis. However, it is considered that this is a great advantage of the methodology being tested. Both from the point of view of efficiency and effectiveness in the use of resources, and the possibility of increasing the degree of knowledge about malware by understanding what has already been discovered, allowing the analyst to focus on new aspects.

•
Strings. 'Bintext' and 'Strings' tools are used to look for text strings in the code that can be helpful to the analysis. This part of the analysis yields a lot of information that needed to be analyzed in detail. The most relevant results obtained are presented below: Reference identification to the Excel.exe application Identification of TLS functions that can be used to avoid debuggers as well as a reference to the function 'IsdebuggerPresent' Identification of strings related to encoding 'Encodepointer' and 'DecodePointer' Sequences that may be related to memory write attempts and buffer overflow attacks (!%s) Identification of references to specific websites (SamLab.ws) • Obfuscation techniques (see Figure 11). Thanks to the previous phase, it was identified that the file 'red_oct.document.exploit' had almost no ASCII chains that is a potential indicator of being an obfuscated program. The 'PEiD' tool is used, but is not capable of producing any valid results.
Appl. Sci. 2020, 10, x 18 of 31  Obfuscation techniques (see Figure 11). Thanks to the previous phase, it was identified that the file 'red_oct.document.exploit' had almost no ASCII chains that is a potential indicator of being an obfuscated program. The 'PEiD' tool is used, but is not capable of producing any valid results.
The same test is carried out with the file 'red_oct.bin.drop' and interesting results are obtained. An entropy value of 6.62 is detected, which suggests that obfuscation techniques may be being used. According to the methodology, at this point it would be necessary to continue with the classification of the obfuscation techniques used, through a manual process of the packaging analysis and potential encryption of the file 'red_oct.document.exploit'.
Bearing in mind that the object of this analysis is aimed at verifying the applicability of the methodology presented, and getting empirical proof of the benefits it provides, it was decided not to develop the tasks of the obfuscation techniques analysis of used through manual unpacking and decryption because they require significant resources.
In this way, during the following stages of the application of the methodology, the work will focus on the file 'red_oct.bin.drop'.


Format and structure. We are in the last step of the classification stage and it is about analyzing the structure of the file to identify the use and the relationships between libraries, versions, dates and entry sections that may suggest that the code under analysis is a malware.
During the practical exercise, the tools 'Dependency Walker' and 'PEBrowser' were used. Due to the dependence of the analysis with the tools used, in this step only the archive 'red_oct.bin.drop' could be analyzed since the other file was not recognized as a PE file, limiting our analysis. Thanks to the extensive identification of modules and dependencies, it is possible to identify the following relevant aspects: 'OVERLAY' function that allows to embed code, so once it is in execution time, malware can use additional memory to the one reserved for that program According to the methodology, at this point it would be necessary to continue with the classification of the obfuscation techniques used, through a manual process of the packaging analysis and potential encryption of the file 'red_oct.document.exploit'.
Bearing in mind that the object of this analysis is aimed at verifying the applicability of the methodology presented, and getting empirical proof of the benefits it provides, it was decided not to develop the tasks of the obfuscation techniques analysis of used through manual unpacking and decryption because they require significant resources.
In this way, during the following stages of the application of the methodology, the work will focus on the file 'red_oct.bin.drop'.

•
Format and structure. We are in the last step of the classification stage and it is about analyzing the structure of the file to identify the use and the relationships between libraries, versions, dates and entry sections that may suggest that the code under analysis is a malware.
During the practical exercise, the tools 'Dependency Walker' and 'PEBrowser' were used. Due to the dependence of the analysis with the tools used, in this step only the archive 'red_oct.bin.drop' could be analyzed since the other file was not recognized as a PE file, limiting our analysis. Thanks to the extensive identification of modules and dependencies, it is possible to identify the following relevant aspects: Identification of the use of the 'CRYPT32.DLL' library and the 'CryptStringToBinayW' function that indicate the need to process encrypted code at runtime Identification of the use of the 'MPR.DLL' (multi provider router) module related to the establishment of different connections and the identification of key changes and session starts Identification of the use of the 'SHELL32.DLL' library, used to open different Windows components Finally, the risk of the archive is analyzed using the 'PEStudio' tool, which allows the identification of indicators of suspicion of the code being analyzed. The analysis performed gives a result of 17 risk indicators with the highest severity, highlighting aspects such as: 'OVERLAY' function that allows to embed code, so once it is in execution time, malware can use additional memory to the one reserved for that program

Code Analysis
At this stage of the analysis, the code is analyzed using debugging tools. In this case, according to the methodology that is being used, the dynamic analysis was carried out first and then the static analysis.
As a conclusion within this same phase, it is experimentally proved that thanks to performing first the dynamic analysis, it is possible to know with greater accuracy how the code is developed. It will allow the analyst to optimize the static analysis and its effectiveness. Below are two aspects that confirm the above statement: • Identification of the use of obfuscation techniques that increase the complexity of static code analysis • Identification of additional malware protection mechanisms such as the need for external connection to be able to be fully executed.
The results obtained during this phase are presented below. • Dynamic code analysis. The 'Ollydbg' tool is used to develop this analysis. In the first place, it is decided to navigate through the code to obtain a generalized image and the following findings are made.
Identification of protection measure by which it detects if the code is running in debug mode (see Figure 12) The point of creation of the file 'MSC.bat' previously identified in the classification phase is detected (see Figure 13) References to the files 'scvhost.exe' and 'lhfd.gcp', mentioned in the open sources consulted, are also identified Access to the 'RPCRT4.DLL' library is identified that means, with high probability, that it will be used to establish the back door Coded messages and text strings that mention the service 'sshd' Appl. Sci. 2020, 10, x 19 of 31

Code Analysis
At this stage of the analysis, the code is analyzed using debugging tools. In this case, according to the methodology that is being used, the dynamic analysis was carried out first and then the static analysis.
As a conclusion within this same phase, it is experimentally proved that thanks to performing first the dynamic analysis, it is possible to know with greater accuracy how the code is developed. It will allow the analyst to optimize the static analysis and its effectiveness. Below are two aspects that confirm the above statement:


Identification of the use of obfuscation techniques that increase the complexity of static code analysis  Identification of additional malware protection mechanisms such as the need for external connection to be able to be fully executed.
The results obtained during this phase are presented below.   Dynamic code analysis. The 'Ollydbg' tool is used to develop this analysis. In the first place, it is decided to navigate through the code to obtain a generalized image and the following findings are made.
o Identification of protection measure by which it detects if the code is running in debug mode (see Figure 12) o The point of creation of the file 'MSC.bat' previously identified in the classification phase is detected (see Figure 13) Access to the 'RPCRT4.DLL' library is identified that means, with high probability, that it will be used to establish the back door o Coded messages and text strings that mention the service 'sshd' Finally, it should be noted that during the development of the analysis it is concluded that the quality of the dynamic analysis phase is directly proportional to the analyst's expertise. This being so, the possibility to work the methodology in different phases but well-coordinated, can facilitate the management and organization of specific teams for each phase. The most interesting conclusion obtained after the application of the methodology is the possibility to carry out iterations between dynamic and static analysis, allowing the analyst to check the discoveries that are obtained and thus to optimize the resources and time used. The most important results obtained during this phase are presented below: o It is identified that within the structure of the code there is obfuscated code in each section, as well as empty memory blocks o The use of the 'ISDebuggerPresent' function stored in the memory position '004180F0' is identified, potentially related to the protection measure detected in the previous phase (see Figure 14) Finally, it should be noted that during the development of the analysis it is concluded that the quality of the dynamic analysis phase is directly proportional to the analyst's expertise. This being so, the possibility to work the methodology in different phases but well-coordinated, can facilitate the management and organization of specific teams for each phase.
• Static Code Analysis. The disassembly tool 'IDA PRO' (evaluation version) is used to develop this analysis. This analysis has made it possible to obtain information of analyzed file structure at the level of memory locations, used functions and the relationship between them.
The most interesting conclusion obtained after the application of the methodology is the possibility to carry out iterations between dynamic and static analysis, allowing the analyst to check the discoveries that are obtained and thus to optimize the resources and time used. The most important results obtained during this phase are presented below: It is identified that within the structure of the code there is obfuscated code in each section, as well as empty memory blocks The use of the 'ISDebuggerPresent' function stored in the memory position '004180F0 is identified, potentially related to the protection measure detected in the previous phase (see Figure 14) Appl. Sci. 2020, 10, x 20 of 31 Access to the 'RPCRT4.DLL' library is identified that means, with high probability, that it will be used to establish the back door o Coded messages and text strings that mention the service 'sshd' Finally, it should be noted that during the development of the analysis it is concluded that the quality of the dynamic analysis phase is directly proportional to the analyst's expertise. This being so, the possibility to work the methodology in different phases but well-coordinated, can facilitate the management and organization of specific teams for each phase.


Static Code Analysis. The disassembly tool 'IDA PRO' (evaluation version) is used to develop this analysis. This analysis has made it possible to obtain information of analyzed file structure at the level of memory locations, used functions and the relationship between them.
The most interesting conclusion obtained after the application of the methodology is the possibility to carry out iterations between dynamic and static analysis, allowing the analyst to check the discoveries that are obtained and thus to optimize the resources and time used. The most important results obtained during this phase are presented below: o It is identified that within the structure of the code there is obfuscated code in each section, as well as empty memory blocks o The use of the 'ISDebuggerPresent' function stored in the memory position '004180F0' is identified, potentially related to the protection measure detected in the previous phase (see Figure 14)

Behavior Analysis
In this phase, we proceed to analyze the behavior of malware at runtime in order to catch and monitor the changes suffered by the victim to compare with its initial status.
The main conclusion to be drawn from executing this phase within the proposed methodological model is the large amount of information that the analyst already has at this point of analysis, reducing the number of baseline hypotheses required. This fact makes it possible to substantially optimize the analysis to be carried out.
Likewise, the application of the iteration proposal between the behavioral analysis phase and the code analysis phase demonstrates the continuous and optimized generation of information during the analysis, which translates into a continuous optimization of the analysis phases based on the findings obtained previously during the perform in the previous phases. The most relevant findings during this phase are presented below.


Initial Tasks. Situation of the analysis starting point and establishing two work areas: the execution environment and the monitoring environment.
o Considering that the malware has been identified requires a network communication environment, the victim's host file has been modified to allow the monitoring team resolves queries about domains o The monitor computer has been configured as the main DNS server being the authority over the domain. o The monitoring tool used was 'Wireshark' o The configuration of the environment of the victim computer has been prepared by capturing its initial state using the tools 'Systracer', 'Regshot. The paths in which the malware, supposedly, creates the files have been marked. Tools 'Diskpulse', 'ProcessMonitor' and VMMap' have been installed to monitor the changes in real time. Considering that, the malware being analyzed is the piece in charge of the initial infection and establishment of the Backdoor, so the detected behavior makes sense.
 Subsequent Tasks. During this stage in the behavior analysis phase, the results are analyzed after the malware execution period. It is identified that the greatest activity after the execution of the file to analyze is done during the first 20 min. The changes detected at the victim and network are presented below.
 Victim host Figure 14. Ida Pro results.

Behavior Analysis
In this phase, we proceed to analyze the behavior of malware at runtime in order to catch and monitor the changes suffered by the victim to compare with its initial status.
The main conclusion to be drawn from executing this phase within the proposed methodological model is the large amount of information that the analyst already has at this point of analysis, reducing the number of baseline hypotheses required. This fact makes it possible to substantially optimize the analysis to be carried out.
Likewise, the application of the iteration proposal between the behavioral analysis phase and the code analysis phase demonstrates the continuous and optimized generation of information during the analysis, which translates into a continuous optimization of the analysis phases based on the findings obtained previously during the perform in the previous phases. The most relevant findings during this phase are presented below.

•
Initial Tasks. Situation of the analysis starting point and establishing two work areas: the execution environment and the monitoring environment.
Considering that the malware has been identified requires a network communication environment, the victim's host file has been modified to allow the monitoring team resolves queries about domains The monitor computer has been configured as the main DNS server being the authority over the domain. The monitoring tool used was 'Wireshark' The configuration of the environment of the victim computer has been prepared by capturing its initial state using the tools 'Systracer', 'Regshot. The paths in which the malware, supposedly, creates the files have been marked. Tools 'Diskpulse', 'ProcessMonitor' and VMMap' have been installed to monitor the changes in real time.

Running malware
The file 'red_oct.bin.drop' is executed It is detected that after 15 s of its execution, the executed file disappears from the computer • Activation of malware services Three modifications are detected in the DNS services. A first deactivation, an activation as a secondary server and an activation as a primary server. Considering that, the malware being analyzed is the piece in charge of the initial infection and establishment of the Backdoor, so the detected behavior makes sense.
• Subsequent Tasks. During this stage in the behavior analysis phase, the results are analyzed after the malware execution period. It is identified that the greatest activity after the execution of the file to analyze is done during the first 20 min. The changes detected at the victim and network are presented below. It is interesting to detect the attempt to access routes related to web browsing applications such as Opera or Firefox, which could mean that the sample was aimed at a victim using this type of browser.
The files responsible for the infection process are analyzed: The file 'msc.bat' remains temporarily in the system. This file is in charge of activating the infection inside the system (see Figure 15). The file 'svchost.exe' is a file that cannot be removed from the system. It is executed automatically by creating a process that contains encrypted code. Code analysis will be necessary. The 'hafd.gcp' file is a hidden and resident file. It is accessed only during the infection phase. It is an encrypted file. The use of the 'VMMAP' tool has made it possible to monitor the 'svchost.exe' process with PID2464. This monitoring has led to the following findings: Text strings contained in the 'Private Data' section (00400000-00421FF) that refer to the 'Get http' method for downloading specific modules and names of external servers that can serve as proxies to communicate with the C & C. The use of the tools 'Systracer' and 'Regshot' have allowed identifying the modification of user values (HKU) and the local computer (HKLM), used to allow the execution of all the tasks associated with the infection process. It is interesting to detect the attempt to access routes related to web browsing applications such as Opera or Firefox, which could mean that the sample was aimed at a victim using this type of browser.
The files responsible for the infection process are analyzed: o The file 'msc.bat' remains temporarily in the system. This file is in charge of activating the infection inside the system (see Figure 15). o The file 'svchost.exe' is a file that cannot be removed from the system. It is executed automatically by creating a process that contains encrypted code. Code analysis will be necessary. o The 'hafd.gcp' file is a hidden and resident file. It is accessed only during the infection phase. It is an encrypted file. o The use of the 'VMMAP' tool has made it possible to monitor the 'svchost.exe' process with PID2464. This monitoring has led to the following findings:  Text strings contained in the 'Private Data' section (00400000-00421FF) that refer to • Network traffic analysis. It is identified that the malware is programmed to initiate communication several minutes after the infection process. In this case, it has been 19 min, although this time may vary depending on the environment where the infection occurs. The analysis has been divided into three parts: First malware execution with the DNS service disabled, where random requests are monitored trying to contact external DNS servers that operate as proxy.
Second execution with the active DNS server where an interaction with the malware is checked, who asks the DNS server if it is an authorized server. Once the check is established, the malware no longer interacts. It was probably waiting for confirmation from C & C. Thirdly, the malware was executed and the file "hafd.gcp" was deleted, the communication sequence being identical to the second scenario.
• Dump and memory analysis. This is the last step in behavioral analysis. The 'Winpmem' tool has been used together with the 'Volatility' tool and its specific 'malfind' plugin to analyze malware. The most relevant findings based on the analysis of the file 'svchost.exe' with PID 1540 are the following: Identification of suspicious 'hooks' in both 'user mode' and 'kernel mode' Identification of code and suspiciously injected or hidden DLLs

Flame Analysis
To test the methodology proposed in this paper, the analysis of a malware 'Flame' has been made. It belongs to APT malware type, one of the most complex and dangerous malwares discovered today. The objective of this analysis is to obtain a comprehensive understanding of its operation, composition, modules, types of encryption, injection mechanisms, communications with its command and control system, data types, etc. A more complete analysis is beyond our purpose as it would be unfeasible in a limited time.
The main objective of this malware is the theft of classified information (cyber espionage) on systems with Windows OS and using USB sticks to spread exfiltration data. Given its complexity, it has been considered to be developed by a state, which fall within the category of 'Advanced Persistent Threats'.
Their level of complexity would be very high considering its size, the number of files it handles, languages and libraries used, different encryption methods used, or the time needed for its analysis. It means that so far can be considered the most complex existing type of malware.

•
Transfer. The malware is transferred via encrypted USB and decompressed: mssecmgr.ocx.orig

• Identification
File Type (see Table 9): type dynamic library.ocx

Code Analysis
One method used to get the real text strings that malware can use is to work with a debugger such as 'ollydbg'. It allows you to execute and analyze the code step by step in a dynamic way, it is especially interesting when text strings could be encrypted or encoded (see Figure 16).
Initially a string analysis was performed on the original malware file 'mssecmgr.ocx.orig', resulting in a few interesting strings because they could be encrypted or encoded initially. However, by this method of analysis the actual text strings used by the malware were obtained.
An analysis of the text strings used by the malware processes loaded in memory, i.e., 'running', was performed. For example, the Olly DBG tool was used to decode the running process 'services.exe' in memory in real time, from which strings were extracted and used as initial input information for the next stage of 'Behavioral Analysis' in a more effective way. Strings of the following type were obtained:

Code Analysis
One method used to get the real text strings that malware can use is to work with a debugger such as 'ollydbg'. It allows you to execute and analyze the code step by step in a dynamic way, it is especially interesting when text strings could be encrypted or encoded (see Figure 16). Initially a string analysis was performed on the original malware file 'mssecmgr.ocx.orig', resulting in a few interesting strings because they could be encrypted or encoded initially. However, by this method of analysis the actual text strings used by the malware were obtained.
An analysis of the text strings used by the malware processes loaded in memory, i.e., 'running', was performed. For example, the Olly DBG tool was used to decode the running process 'services.exe' in memory in real time, from which strings were extracted and used as initial input information for The following shows what seem to be modules that are part of the design of this malware. It is extracted from the text strings, which by their name are interesting: GATOR, FROG, MICROBE, GADGET, EUPHORIA, BEETLEJUICE, Snack, Munch, AUTORUN_INFECTOR, SUICIDE, JIMMY, and FLASK.

Behavior Analysis
It is a very complex type of APT with a large number of components. It is also composed of large files, something customary in the malware.

•
Running malware Scripts and executables Implementation of the specimen: start/ wait runddll32.exe mssecmgr.ocx.orig

Capacity of Infection
The malware timed their actions so that they appear intertwined with legitimate activities and migrates between processes as needed. We also see that it creates many temporary files, some are deleted shortly after being created, and others remain at least for several minutes. It migrates the process 'rundll32.exe' after its first execution to the process 'services.exe'. It means that it infects using overwriting memory technique. It performs a 'hook' back in the Windows API, in the 'SHGetSpecialFolderPathW' call 'Explorer.exe'. It also infects the 'iexplore.exe' process.
In the process 'services.exe', specifically in the memory range '0x7e6a0000-0x7ecc7fff', we can see that the 'services.exe' is another executable.

•
Activation of malware services Files created (see Table 12) Registry modifications

HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\LSA\Authentication
Packages added mssecmgr.ocx indicates packages that are loaded when a user logs on the victim machine.

Memory analysis
In the memory of infected reference 'services.exe', it appears a process that performs an action of 'not installed programs' such as antivirus, firewalls, network monitors, etc., indicating that the malware has its own 'blacklist' within the code to avoid detection by these applications.
The Table 13 shows the result of the different phases of the methodology applied to the aforementioned malware. The degree of fulfilment will be 'C' for complete, 'P' for partial, and 'X' for none.
Through these analyses, it is demonstrated that the application of this methodology, its procedures and activities, in the study and analysis of a malware, allows obtaining effective results on the action of a malware on the infected systems. It provides techniques and knowledge that will generate an effective application base. The results of the 'Code Analysis' depends greatly on the analyst's reverse engineering skills and the complexity of the malware.
The complexity of the malware requires not only organizational resources such as proposed methodology, tools and a good laboratory, but also it is essential to have a very high knowledge in reverse engineering, by the analyst who will perform the analysis.

Discussion
The main lessons learned from the completion of the two study cases are related to the proposed methodology of malware analysis and to its suggested modifications of the analysis process: • Considering the current situation in the development of malware and its industrialization, within the analysis of malware, the technical analysis of the solution is a fundamental issue. It has been verified that the phases of classification and search in open sources are key to establish an optimal strategy of analysis in the following phases. Currently, there is a lot of information published by expert analysts that can help greatly to economize the use of resources in a specific analysis, either by reuse of code, by reuse of vulnerabilities to exploit or reuse of entire malware packages if the attack is characterized by the use of specific modules.

•
During the application of the methodology to the piece of malware, two highly complex defense mechanisms have been identified: one is the packaging and obfuscation and the second is the detection of operation in debugging mode. To be able to deal with these types of measures, it requires a high level of specialization of the analyst team that is not available to everyone. • However, thanks to the structure of the methodology, it is possible to identify this situation and, in case of incapacity, to derive that part of the analysis to qualified teams. This situation implies an optimization of resources at the level of analysis, guaranteeing a good use of work teams, starting with the detection of highly complex situations, as well as the possibility of deriving parts of the analysis to different teams.

•
Code analysis capabilities are essential to develop a good analysis. This implies that the methodology depends, of course, on the capacity of the analyst, especially in the static analysis phase. However, the structure of the methodology makes it possible to derive this phase to a team or several teams in parallel, providing the possibility of crossing results and improving the analysis in two ways: By iteration between code analysis and behavior analysis By iteration between static and dynamic code analysis Through the joint work of different teams in these phases in order to consolidate the results

Conclusions
It has been reported the application of the method to a case study with two of the most complex and sophisticated malwares known as a first evaluation of the method. These two APT are 'Flame' and 'Red October' malware. The main conclusions of its applications are the following:

•
The methodology is applicable regardless of the complexity of the malware. This makes it possible to analyze different types of malware models and obtain a homogeneous analysis result.

•
The identification of the phases allows its execution by human teams with its specific tools. It allows to systematize the analysis optimizing each phase based on the existing capacities and to share the results of each phase with the rest of the involved analysts to improve the effectiveness of the results.

•
The methodology establishes the type of analysis to be performed but it is independent of the tools. This guarantees the versatility and adaptability of the method to different analysis environments.

•
The order established by the methodology guarantees an effective obtaining information process, starting from the classification based on searching in open sources. Later, this will make it possible to define an operative strategy in the analysis of the selected malware. Likewise, it allows, in the face of the same malware, to develop specific analysis work on specific aspects, a highly interesting issue for the development of intelligence on complex malware such as APTs.
• The inclusion of an iteration process in different phases of the methodology has made it possible to resolve and complete the different phases of the analysis by generating a greater global knowledge than the sum of each of the phases independently.

•
The validity of the applicability of the methodology taken as a reference is strongly confirmed. The execution of the complete process according to the proposed order has guaranteed an orderly and coherent advance in the analysis of the 'Red October' and 'Flame' malware. The results of which, in accordance with those obtained by expert sources, proves its success.

•
The classification phase has been identified as a critical one for the realization of this pilot. The information gathered has allowed the tests carried out to be effectively oriented, optimizing the analysis. This fact shows that this phase is vital in the analysis of this type malware that have been previously detected by third parties. In the case of unidentified malware, the relevance of this phase would diminish, drastically making the analysis more complex.

•
During the code analysis phase, it is confirmed that the proposed innovation of exchanging the order, placing the dynamic code analysis first to the static one, facilitates the development of the malware study if it is highly complex and uses advanced obfuscation techniques. However, the need to include a cyclical and iterative process between both analyses has been detected-possibly conditioned by the level of the analyst-with the aim of focusing and performing deeper investigation as the knowledge about malware increases. This could be an innovation derived from the one described above.

•
The need for the malware behavior analysis phase is confirmed, as it greatly increases the knowledge about the malware behavior being analyzed. However, the effectiveness of this phase is conditioned by the quality of the information obtained in the previous phases which drastically reduces any possible interest in executing this phase before the previous ones. The methodology includes an iterative mechanism consistent with the possibility of having to analyze new samples due to the complexity of malware. It can be materialized during execution, a situation that would have occurred if phase 2 of the APT 'Red October' had been initiated and specific modules were being downloaded from the C & C.

•
During the execution of the tests, taking into account their scope, it has become clear that the tests in the phases of analysis of code analysis (static as well as behavior) can focus on the victim system, simplifying the number of tools to install in the rest of systems involved in the lab. This allows the analyst to focus on the target of the monitor system and the system services needed as key parts in the constitution of a realistic scenario in order to force the malware to show all its behavior.
As a continuation of this paper, it has been identified in the FIGURE lines of future work: • Management and classification of tools. The application of the methodology is based on the use of multiple tools, many of them of free use and others previous acquisition of license. Bearing in mind that the second type of tool already has legal conditions, the first one is more difficult to control (support, continuity, etc.). A line of development focused on the effective management and maintenance of these tools would improve the management of technical and human resources and could lead to possible standardizations in specific sectors.

•
Apply the proposed methodology to other operating systems. While here it has been developed and validated for Windows based systems, as this type of operating system used mainly in the world, is can also be valid for other operating systems like Linux, Android, etc. Since, nowadays, malware is present in any of them. New lab design should consider other victim machines' operating systems such as Linux and Android.

•
Determine whether different analysis tools are compatible and, if not, look for the equivalent versions for Linux and Android operating systems.