E-mail-Based Phishing Attack Taxonomy
Round 1
Reviewer 1 Report
The paper proposes a taxonomy of e-mail-based phishing attacks. The paper is clear and pleasant to read, and I appreciate the work done on the related work section. However, the paper lacks a clear definition of phishing and a clearer identification of its sub-categories (e.g., what pharming, whaling, etc. are).
My main issue, though, and the main reason for my grades, is that the purpose of the paper is unclear to me. Why is it relevant to distinguish e-mail from other means of text communication? Most of the description of the different phases would also apply to instant messaging such as WhatsApp, Telegram or Facebook Messenger. The main differences between these means of communication are technical (protocol, availability of addresses, ease of hacking into a server, ...). From the user's point of view, the difference is merely one of habits and a different icon on the desktop.
Finally, I also have doubts about why this classification is better than the ones presented in the related work. The authors claim that it provides more first-level nodes, but why is that a good thing? What do these new dimensions bring to the classification process? What was not possible before? The argument would be greatly improved by an example of an attack that the other taxonomies misclassify.
Author Response
Thank you for your time and valuable advice. The response summary is in the attached file.
Author Response File: Author Response.pdf
Reviewer 2 Report
The authors present a taxonomy for e-mail-based phishing attacks.
The paper is well structured, easy to read and follow. There are no major issues with the paper.
One minor issue is with subsection 4.2. It is not well elaborated and stands out from the rest of the paper. I don't see a need for the subsection; if there is one, the subsection needs substantial improvement. Since you are comparing two averages, free-form notation and taxonomy-based notation, you should apply a statistical test (e.g., a t-test or a non-parametric equivalent) to show that your taxonomy is better. Additionally, you should use several annotators and compare their inter-rater agreement, to make sure the result is due to your systematic work rather than to chance alone.
I suggest you omit the subsection completely and leave the usability study for future work in a new paper.
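Reviewer 2's suggestion to use several annotators and compare their inter-rater agreement can be illustrated with Cohen's kappa, one common agreement measure for two annotators (Fleiss' kappa extends the idea to more raters). The reviewer does not name a specific measure, so choosing Cohen's kappa is an assumption; the function `cohens_kappa` and the category labels below are hypothetical and not taken from the paper. A minimal sketch:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each annotator's label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when both annotators use one identical label")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels assigned by two annotators to six e-mails:
annotator_1 = ["cat_A", "cat_A", "cat_B", "cat_B", "cat_C", "cat_C"]
annotator_2 = ["cat_A", "cat_B", "cat_B", "cat_B", "cat_C", "cat_A"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

A kappa near 1 indicates agreement well beyond chance, and a value near 0 indicates agreement no better than chance; for the hypothetical labels above it is 0.5.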
Author Response
We value your opinion, but we believe subsection 4.2 is needed, as it shows the practical application of the proposed taxonomy.
Your advice on a non-parametric equivalent of the t-test was very useful; therefore, we added text explaining why the obtained results are significantly different (we used the Wilcoxon signed-rank test to test our hypothesis). The added text is marked with a light blue background in the paper.
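The Wilcoxon signed-rank test mentioned by the authors compares paired observations without assuming normally distributed differences. The minimal sketch below computes the exact two-sided test by enumerating all sign patterns, which is feasible only for small samples; the function names and the paired scores are hypothetical and are neither the paper's data nor the authors' code. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
from itertools import product


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples x and y.

    Returns (w_plus, p_value), where w_plus is the rank sum of the positive
    differences y - x. Zero differences are discarded.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    if n > 20:
        raise ValueError("exact enumeration is only practical for n <= 20")
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)
    # Under H0 every rank is equally likely to carry a + or - sign,
    # so the null distribution comes from enumerating all 2**n sign patterns.
    extreme = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if min(w, total - w) <= w_obs + 1e-9:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical paired scores per e-mail (not data from the paper):
free_form = [1, 2, 4, 2, 3, 1, 2]
taxonomy_based = [5, 8, 10, 6, 7, 4, 6]
w_plus, p = wilcoxon_signed_rank_exact(free_form, taxonomy_based)
print(f"W+ = {w_plus}, exact two-sided p = {p:.4f}")
```

With seven pairs that all favour the second sample, the sketch reports W+ = 28.0 and p = 0.015625 (printed as 0.0156), the smallest two-sided p-value attainable with seven non-zero differences.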
Reviewer 3 Report
The paper is a little lengthy and could be simplified.
More explanation of the presented figures and tables would help readers follow the development of the results.
English could be further improved, and all the grammar and composition mistakes should be corrected in the revision.
Author Response
Thank you for your time and valuable advice. The response summary is in the attached file.
Author Response File: Author Response.pdf
Reviewer 4 Report
- The idea of the paper is good and has some merit, as it is useful, but it does not bring significant novelty; it is more of an incremental work.
- I suggest refining your taxonomy as follows:
  - "Sending email" -> "Real"
    - Add: "Sending email" -> "Real" -> "Real (hacked accounts)"
    - Add: "Sending email" -> "Real" -> "Real (phisher owned/generated account)"
  - "Data gathering" -> "Secret Data"
    - Add: "Data gathering" -> "Secret Data" -> "PII (such as DoB, SSN)"
  - "Usage of gathered data"
    - Add: "Usage of gathered data" -> "Identity theft/fraud"
- The evaluation in "Section 4.2" does not seem well designed, thought out, or executed. It does not describe the dataset well enough. Furthermore, the numerical comparison of {1,2,4} free-form vs. {5,7.8,10} notation-based is not well explained: the section does not convince me whether higher or lower numbers are better. Why is a bigger number better? (E.g., from an ML point of view, increasing the number of categories enlarges the feature set, which brings its own set of challenges.) Simply "comparing whose numbers are bigger", without a sound explanation of why this is the right thing to do, does not work well in science.
- The interested audience would appreciate it if the authors' taxonomies were released as open source in machine-readable formats (XML, UML, JSON, etc.). Since you already designed your taxonomy in a tool, it would be straightforward to release it on GitHub or a similar platform.
- The paper requires proofreading and improvements:
  - The title of Section 2.1 is long, confusing and not suggestive.
  - The theory on taxonomy representation could also reference:
    - Automatic construction of lexicons, taxonomies, ontologies, and other knowledge structures (Olena Medelyan, Ian H. Witten, Anna Divoli, Jeen Broekstra)
    - Ontologies, Taxonomies and Thesauri in Systems Science and Systematics (Emilia Currás)
    - Metadata? Thesauri? Taxonomies? Topic Maps! Making Sense of it all (Lars Marius Garshol)
  - Line 59: "Here, there" needs proofreading.
  - The citation style is odd and should be made consistent (lines 109, 113, 122, etc.), for example:
    - Line 109: "Katharina Krombholtz et al." -> "Krombholtz et al."
    - Drop first names from citations.
    - With more than two authors, use the first author's "LastName et al."
    - With two authors, use "LastName1 and LastName2".
    - With one author, use "LastName".
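As a sketch of Reviewer 4's suggestion to release the taxonomy in a machine-readable format, the snippet below encodes only the nodes named in the reviewer's proposed refinement as nested JSON and lists every root-to-leaf path. The `name`/`children` schema and the `leaf_paths` helper are hypothetical choices, not the authors' format, and the fragment is not the full taxonomy.

```python
import json

# Illustrative fragment only: it contains just the nodes named in Reviewer 4's
# suggested refinement, not the authors' full taxonomy.
taxonomy = [
    {"name": "Sending email", "children": [
        {"name": "Real", "children": [
            {"name": "Real (hacked accounts)", "children": []},
            {"name": "Real (phisher owned/generated account)", "children": []},
        ]},
    ]},
    {"name": "Data gathering", "children": [
        {"name": "Secret Data", "children": [
            {"name": "PII (such as DoB, SSN)", "children": []},
        ]},
    ]},
    {"name": "Usage of gathered data", "children": [
        {"name": "Identity theft/fraud", "children": []},
    ]},
]


def leaf_paths(nodes, prefix=()):
    """Yield every root-to-leaf path as an 'A -> B -> C' string."""
    for node in nodes:
        path = prefix + (node["name"],)
        if node["children"]:
            yield from leaf_paths(node["children"], path)
        else:
            yield " -> ".join(path)


serialized = json.dumps(taxonomy, indent=2)  # machine-readable export
for leaf in leaf_paths(taxonomy):
    print(leaf)
```

A nested `name`/`children` layout keeps each node's position explicit, so the same file could be converted to XML or loaded by other tools without losing the hierarchy.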
Author Response
Thank you for your time and valuable advice. The response summary is in the attached file.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The authors properly answered the minor comments expressed before. I am still dubious about the relevance of that particular angle (focusing on e-mails) and of the advantages of that particular taxonomy with respect to others, but the paper contains no major flaw or omission and can therefore be published.
Author Response
We understand your doubts, but our next step is to create a solution for automatic phishing e-mail identification based on the retrieved phishing attack classification data. Therefore, we need an e-mail-based phishing attack taxonomy only as a meta-model of the dataset. This is why we do not want to include all types of phishing attacks: doing so would expand the taxonomy, as some types of phishing attacks would require additional classification criteria, etc.
Reviewer 3 Report
English could be further improved, and all the grammar and composition mistakes should be corrected in the revision.
Author Response
Some errors we noticed were corrected.