USENIX Security '24 Fall Paper #121 Reviews and Comments
===========================================================================
Paper #121 PhishDecloaker: Detecting CAPTCHA-cloaked Phishing Websites via Hybrid Vision-based Interactive Models

Review #121A
===========================================================================

Paper summary
-------------
The paper presents PhishDecloaker, a framework that bypasses captchas so that phishing detection approaches can identify phishing websites that reside behind captcha mechanisms. The framework consists of several computer vision components for detecting the captcha region, identifying the captcha type, and then solving the captcha. The paper evaluates the framework, demonstrating its applicability in solving captchas and detecting phishing websites, as well as the framework's adversarial robustness.

Detailed comments for authors
-----------------------------
Overall, I believe this is an interesting and nice submission that focuses on an important issue. At the same time, I believe that the paper has some room for improvement. I provide some suggestions below.

**Results and human authentication:** One of my concerns with the paper is how the results are affected by the fact that the framework cannot assess whether a detection failure is caused by the components of the system or by the human authentication. This is problematic as it does not allow us to properly assess the performance of the entire framework as presented in Section 5.1. The paper briefly touches upon this topic but leaves it entirely to future work, which is not ideal given that this factor is likely affecting the reported performance.

**Evaluation:** Overall, the evaluation touches upon a lot of components and aspects that might affect the performance of the framework; however, the evaluation of each part is rather shallow.
For instance, in Section 5.2, the evaluation simply presents aggregate numbers for the performance of the captcha detection component, without evaluating how the type or the difficulty of the captcha might affect the performance. Overall, I suggest that the authors expand their evaluation and, if they are constrained by the page limits, put the additional evaluation in an appendix.

**Inconsistencies:** The paper has several inconsistencies, both in reporting results and in how the evaluation is conducted. For example, in Section 5.1.2, the numbers and the results described in the text do not match the results presented in Table 2, so I am wondering which results are correct there. Also, given that there is a huge difference between the two reported results, I am skeptical about whether the overall performance of the framework is adequate or not. Another concern is that the evaluation does not use the same set of captcha mechanisms throughout the paper. For instance, many parts of the evaluation cover hCAPTCHA v1 and hCAPTCHA v2, while others simply mention hCAPTCHA without the version. Overall, I suggest that the authors be more systematic in their evaluation and ensure that they use the same set of captcha mechanisms, captcha difficulties, etc.

**Other minor issues:**
- Please clarify in Section 2 what the number of configurations is and whether the 92 detectors from VirusTotal are considered one configuration
- Figure 1: There are typos in the word CAPTCHA (both in the figure and the caption)
- Section 5.2.1: There is an appendix reference missing

Ethics consideration
--------------------
1. No

Required changes
----------------
1. Expand the discussion on human authentication and how this affects the reported performance of the framework
2. Expand the evaluation so that it covers the same set of captcha types and difficulties throughout the paper
3. Fix the inconsistencies in the reporting of the numbers and the inconsistencies in how the evaluation is conducted

Reasons to accept the paper
---------------------------
1. The paper focuses on an important problem
2. The presented framework extends the state-of-the-art and can be used by others to identify/solve captchas
3. The paper evaluates the framework against adversarial attacks

Reasons to not accept the paper
-------------------------------
1. Not clear how the presented results are affected by human authentication beyond captcha
2. Some parts of the evaluation do not go into adequate depth
3. There are some inconsistencies with the results in the paper

Recommended decision
--------------------
3. Accept Conditional on Major Revision

Questions for authors' response
-------------------------------
1. How does the human authentication beyond the captcha affect the reported performance metrics?
2. What are the correct metrics for the framework's performance? (the ones reported in Table 2 or the ones mentioned in the text)

Writing quality
---------------
3. Adequate

Confidence in recommended decision
----------------------------------
2. Fairly confident

Review #121B
===========================================================================

Paper summary
-------------
This paper proposes a machine learning-based captcha solver designed to help anti-phishing crawlers detect phishing pages protected by captchas. The authors start with a small-scale experiment to show that URLs submitted to VirusTotal do not result in successful captcha solves. They then propose a framework, "PhishDecloaker", to bypass a variety of modern captcha types and thus de-cloak potential phishing websites. The authors test the framework by first adding captchas to phishing page samples from the DynaPhish dataset, and then deploying PhishDecloaker, combined with two phishing detection systems, to classify the samples.

Detailed comments for authors
-----------------------------
Thank you for submitting this paper! The authors study an important problem which I believe has not received sufficient attention in recent literature. At the same time, much of the evaluation in the paper is purely academic in nature, with a major focus on the ML-based captcha detector and solver itself; to solidify the contribution of this work at a security venue, phishing websites that use phishing cloaking *in the wild* must be studied, and gaps in crawlers' ability to detect them need to be clearly shown. Detailed comments follow:

**Definitively showing that captcha-cloaked phishing websites are a threat**

* In Section 2, more details/experiments are needed to show that crawling of the submitted URLs was actually attempted by the anti-phishing services that VirusTotal monitors. "We leave a script to record the visiting event if its protecting CAPTCHA is solved" implies that the authors only collected telemetry behind the captcha. At minimum, a control group without captchas should be added.
* With such a small sample of URLs in the experiment (30), it is unclear why the authors chose to avoid setting up lookalike phishing websites, as a positive detection from one crawler would likely propagate to other crawlers (prior work such as [46] has shown that this can be done ethically).
* To further ascertain that bypassing captchas is not possible, the authors should also consider direct submissions to Google Safe Browsing and Microsoft Defender, which collectively protect the majority of modern web browsers.
* The authors should identify captcha-cloaked websites in the wild and measure their coverage rates compared to non-captcha-cloaked examples.
* The link from reference [49] does not work (404 error).

**Understanding real-world phishing websites that use captcha cloaking**

The evaluation in Section 5 considers a variety of different captcha types, from which I believe many interesting insights can be gleaned. However, the authors' experiment relies on "synthetically" adding captchas to a dataset of known phishing kits/pages, which means that no insights are provided on real-world captcha cloaking. Thus, it is not possible to contextualize the current evaluation results against the types of threats that anti-phishing engines may need to classify in the wild. Although I appreciate the authors' effort to be as generic as possible in being able to solve many different captchas, the current performance metrics (Table 5) suggest that the automated captcha solving has limitations which could otherwise be quite significant in practice.

In the discussion, the authors point out that phishing websites may employ other cloaking techniques, such as human behavior fingerprinting / bot detection, but that this is outside of the scope of this work. I believe that this should be directly addressed as part of an evaluation of phishing websites in the wild. For example, if network profiling or bot detection techniques could trivially inform the phishing website that the proposed captcha solver is automated, it could render the latter effort moot.

**Showing that PhishDecloaker could be part of a real-world deployment**

Figure 1 correctly suggests that a captcha solver could be part of a pre-processing stage of an anti-phishing engine, or the post-processing stage of a crawler. However, the authors do not discuss how this could be deployed in practice (i.e., at scale within existing systems). For example, the performance overhead of the proposed system could make it impractical or cost-prohibitive to deploy, especially when combined with the imperfect solve rates.

In particular, I would recommend that the authors disclose their findings to anti-phishing providers (especially if more measurements are added on real-world sites with captchas), and augment the discussion with insights from any feedback obtained. For example, a system such as Google Safe Browsing could reasonably be expected to be able to programmatically bypass all Google-issued challenges (reCAPTCHA), and potentially others, and would thus need to prioritize other captcha types.

**Minor issues**

There are several small typos throughout the paper, especially in the introduction and early sections of the paper, which collectively hamper readability. Please make a pass to address these.

Ethics consideration
--------------------
2. Yes: submission appropriately mitigates potential risks or harms

Comments for ethics consideration
---------------------------------
Hosting and/or reporting phishing websites, even if innocuous, can lead to potential ethical concerns. However, due to the authors' conservative approach in Section 2, I do not have any concerns with the current work.

Required changes
----------------
* more thoroughly measure the current ecosystem's ability/inability to take action on websites with captcha cloaking
* evaluate the framework on live phishing websites that use captchas
* discuss the performance improvements / considerations required for a practical (at-scale) deployment

Reasons to accept the paper
---------------------------
* the authors address an important type of cloaking with limited coverage in prior work
* the evaluation considers a variety of modern captcha techniques

Reasons to not accept the paper
-------------------------------
* the ecosystem's inability to detect captcha-cloaked phishing pages is not properly validated or measured
* approach is not evaluated on any live phishing pages protected by captchas (i.e., to verify performance of PhishDecloaker, test the effectiveness of bot detection cloaking, and better understand techniques used in the wild)
* no disclosure to, or discussion with, ecosystem entities regarding practical deployment at scale
* the current performance overhead and detection limitations could become significant in a real-world deployment

Recommended decision
--------------------
3. Accept Conditional on Major Revision

Questions for authors' response
-------------------------------
* please elaborate on the telemetry collected in the VirusTotal experiment, and any evidence the authors may have to show that submission to VirusTotal would lead to crawling activity across all of the detection engines
* please discuss any insights (or evaluation of PhishDecloaker) you may have on real-world phishing websites that use captcha cloaking

Writing quality
---------------
3. Adequate

Confidence in recommended decision
----------------------------------
3. Highly confident (would try to convince others)

Review #121C
===========================================================================

Paper summary
-------------
This paper proposes PhishDecloaker, a tool that detects, classifies, and solves CAPTCHAs by employing a hybrid vision-based interactive model. The tool can be integrated into web crawlers and can enable them to access potentially malicious content that would otherwise be inaccessible due to the presence of CAPTCHAs. The authors calibrate their tool to detect and solve 4 kinds of CAPTCHAs and evaluate the effectiveness of the proposed approach by comparing the output of state-of-the-art phishing-detection solutions with and without the preprocessing offered by PhishDecloaker. While the introduction of CAPTCHAs makes phishing detectors unable to recognize phishing websites because the content is no longer available, the use of PhishDecloaker can restore their effectiveness while introducing a tolerable runtime overhead.

Detailed comments for authors
-----------------------------
Undoubtedly, phishing websites have become more sophisticated in recent years, and the integration of fingerprinting and cloaking techniques (including CAPTCHAs) has contributed to making their detection far more difficult. I thank the authors for submitting this manuscript to USENIX: I think that providing solutions to defeat cloaking for phishing detection represents a valuable contribution for both academia and industry.

Although I've enjoyed reading the work and I personally think this can be a valuable contribution, I'm convinced that the effectiveness of the proposed solution must be better evaluated, not confining the experiments to a single dataset, but showing how much and at what cost (i.e., overhead) the introduction of PhishDecloaker can help in catching real-world phishing campaigns. Please find below some critical points I think must be addressed with new experiments in a major revision.

**Focus on phishing**

While the premises of this work clearly focus on phishing (i.e., CAPTCHAs are becoming very popular in phishing websites, and state-of-the-art phishing solutions such as Phishpedia and PhishIntention are part of the experiments), I feel the approach and the evaluation diverge towards general-purpose CAPTCHA detection and solving, losing the phishing context.

- `re-CAPTCHA, hCAPTCHA, slider, and rotation, take 98.9\% of the CAPTCHA market share`. While this is true in general, such an evaluation must be contextualized in the phishing ecosystem by looking at what attackers do.
- In addition, adding different CAPTCHAs to phishing kits and testing the effectiveness of state-of-the-art phishing detectors (Section 2) provides biased statistics, as not all phishing websites employ cloaking techniques. This evaluation must be carried out in a real-world setting (see the next comment).
- Section 5.2.1 evaluates CAPTCHA detection on Alexa top 1M. Again, this is clearly different from the phishing context, and the efficacy of the proposed solution must be claimed in the phishing context.

**Effectiveness**

As already pointed out, one of the main improvements that this work needs to meet the USENIX bar concerns the evaluation. The way the authors presented their tool makes the focus diverge from defeating cloaking due to CAPTCHAs in phishing websites to a general-purpose CAPTCHA solver. I suggest the authors conduct a real-world evaluation to compare the capabilities of state-of-the-art phishing detectors with and without PhishDecloaker. This can be, for instance, on a live stream of suspicious URLs (e.g., the VirusTotal URL feed). As previous work did, a key point is to show that the proposed solution can help in detecting 0-days that were undetectable with previous solutions.
Such an evaluation will also provide non-biased statistics about A) CAPTCHA-protected phishing websites, B) the CAPTCHA type breakdown in an adversarial context (i.e., the phishing one), C) the solving success ratio, D) phishing websites detected with the use of the tool, and E) overhead in real-world hunting.

**FPs and FNs**

In each of the tool phases (e.g., Detection, Recognition, Solving), the authors reported some metrics to prove efficacy. I recommend not limiting the results to reported figures, but also understanding and commenting on failure cases (FNs) and false positives (FPs). Carrying out this analysis on real-world feeds would also strengthen the paper and underline the usefulness of the proposed tool.

**Editorial**

I've found many typos in the manuscript, so I suggest proofreading it before submitting a revision. Please find some of them below:

- Abstract: CPATCHA -> CAPTCHA
- His or her -> their
- Does not reply on code -> does not rely?
- The our CAPTCHA -> our CAPTCHA
- Appendix ?? (broken reference)

Ethics consideration
--------------------
1. No

Required changes
----------------
- Better contextualize claims in the phishing context
- Evaluation and efficacy must be measured on real data in phishing contexts
- Authors must prove their solution helps in detecting 0-day phishing websites

Reasons to accept the paper
---------------------------
- Hot topic
- Relevant for both industry and academia

Reasons to not accept the paper
-------------------------------
- Proper evaluation on real-world data is missing
- Focus diverges from the phishing context to general-purpose CAPTCHA solving
- Need to fix writing

Recommended decision
--------------------
3. Accept Conditional on Major Revision

Questions for authors' response
-------------------------------
- Why was inserting CAPTCHAs into existing phishing kits chosen as the test bench, instead of measuring the effectiveness of your tool at catching phishing in the wild?

Writing quality
---------------
4. Needs improvement

Confidence in recommended decision
----------------------------------
3. Highly confident (would try to convince others)

Review #121D
===========================================================================

Paper summary
-------------
The paper studies an interesting problem: phishing websites using CAPTCHA-based cloaking. The paper develops PhishDecloaker to solve the CAPTCHA and detect phishing websites. PhishDecloaker uses 5 types of vision models in 3 stages: detection, recognition, and solving. The experiments show that PhishDecloaker improves the detection performance of existing phishing detectors against CAPTCHA-cloaked phishing websites.

Detailed comments for authors
-----------------------------
* The paper studies an important problem: phishing websites abusing CAPTCHA techniques. Recent studies have shown that phishing websites increasingly use cloaking to evade detection. The effort the paper puts into designing the multi-stage approach, solving different CAPTCHA types, and performing experiments with existing phishing detectors is appreciated. However, some points should be clarified and improved.
* The technical novelty of the paper may be limited. While it is interesting to design 3 stages (detection, recognition, and solving) in Section 4, most approaches come from existing studies. For example, the detection in Section 4.1 uses OLN from [30], and the solving in Section 4.3 uses existing approaches for different types of CAPTCHAs, e.g., reCAPTCHA v2 uses the approach in [27], hCaptcha Version 1 uses the approach in [64], hCaptcha Version 2 uses the approach in [25], and Rotation CAPTCHA uses the approach in [60]. The only novel design seems to be the dual-branch architecture for recognition in Section 4.2. Therefore, the novelty of the work is questionable. It is suggested to clarify the technical novelty of the developed approaches.
* Another point relevant to technical novelty is that the paper does not compare performance with other CAPTCHA recognition approaches. The paper introduces the dual-branch architecture for CAPTCHA recognition in Section 4.2. Although Table 4 shows recognition performance on different types of CAPTCHAs, the paper lacks experiments to show the performance improvement of the new design in Section 4.2. It is suggested to add experiments to evaluate the recognition performance improvement.
* There is an ethics concern about the developed approach. PhishDecloaker was developed to solve CAPTCHAs employed at phishing websites. However, PhishDecloaker is at risk of being abused by adversaries to compromise CAPTCHAs at benign websites. While the paper has an "Ethical Considerations" paragraph on P12, it does not discuss the potential risk, mitigation suggestions, or benefits to society at large. It is suggested to clarify and discuss the ethics concern.
* Some details of approaches and results need clarification. Specifically, the paper mentions "5 types of deep computer vision models". Do the 5 models refer to the CAPTCHA solving approaches in Section 4.3? It is suggested to clarify the "deep computer vision models". In Section 5.5.2, the "JSMA" row of Table 6 shows -49.1% under the column "Accuracy (no Def.)", but the value should be (0.97 - 0.50) / 0.97 = 48.5%. Similarly, the "DPatch" row of Table 7 shows 0.28 (-71.7%) under the column "mAP (no Def.)", and the percentage value is not correct based on the other values.
* Minor points and typos:
  - P2 "The results show that PhishDecloaker can (1) ... (2) ... (2) ..." The sentence uses the numbering "(2)" twice.
  - P9 "Examples of the synthetic samples generated are shown in Appendix ??" has a malformed reference.
  - Table 5 is not referred to in the text. On P10 "Table 4 shows the performance of our model on the training and testing dataset. Table 4 presents our results on the open-set dataset."
    The second "Table 4" was probably meant to be "Table 5".

Ethics consideration
--------------------
3. Yes: submission may not appropriately mitigate potential risks or harms

Comments for ethics consideration
---------------------------------
PhishDecloaker may be at risk of being abused by adversaries to compromise CAPTCHAs at benign websites. It is suggested to clarify and discuss the ethics concern.

Required changes
----------------
* It is suggested to clarify the technical novelty.
* It is suggested to add experiments to compare CAPTCHA recognition performance.
* It is suggested to clarify the ethics concern.

Reasons to accept the paper
---------------------------
+ The paper studies an important problem of detecting CAPTCHA-cloaked phishing.
+ The paper develops solvers for different types of CAPTCHAs.
+ The experiments show detection improvement based on existing phishing detectors.

Reasons to not accept the paper
-------------------------------
- The technical novelty of the developed approaches may be limited.
- The paper lacks a comparison of CAPTCHA recognition improvement.
- The developed tool raises an ethics concern.
- Some details of approaches and results need clarification.

Recommended decision
--------------------
3. Accept Conditional on Major Revision

Questions for authors' response
-------------------------------
* It is suggested to clarify the technical novelty, particularly about Section 4.
* It is suggested to clarify the ethics concern.

Writing quality
---------------
2. Well-written

Confidence in recommended decision
----------------------------------
2. Fairly confident

AuthorFeedback Response by Author [Yun Lin] (897 words)
---------------------------------------------------------------------------
We sincerely thank all reviewers for helping to improve our work; we are truly grateful for your efforts in advancing the development of our community! We will address all the comments in our revision.

# Reviewer#A

## Q1: How to address human authentication beyond CAPTCHA?

We agree that other human authentication (e.g., browser fingerprint, targeted IP, etc.) is important. We leave it to future work because existing work such as [28, 67] has already provided relevant solutions. Our focus is to address the emerging CAPTCHA-based cloaking techniques, which is orthogonal/complementary to existing solutions. We also believe that a commercialized security crawler can be a hybrid solution, i.e., one that includes diverse decloaking techniques against human authentication. We will discuss this case in our revision.

## Q2: Inconsistent data in Table 2 and Section 5.1.2.

Table 2 presents the phishing detection rate, while Section 5.1.2 shows the detection recovery rate, i.e., 78% = 57/73, indicating that we recover 78% of the detection rate from Phishpedia. The other figures follow the same pattern. We thank the reviewer for pointing out the confusion and will clarify this in our revision.

## Q3: Shallow evaluation

Given the space limit, we selected the most summarized results for readability. More discussion is provided on our anonymous Google Sites page (https://sites.google.com/view/phishdecloaker/field-study), where more settings and examples are presented; the reviewer may refer to Reviewer#B#Q2 and the website for more details. We will include more in-depth discussion in our revision.

# Reviewer#B

## Q1: Telemetry collected in the VirusTotal experiment? Any evidence to show the submissions have been accessed by all crawlers?

In the experiment, for each of our websites, we collect:

(1) the timestamp when it was accessed
(2) the timestamp when the CAPTCHA was solved (i.e., when the protected content was accessed)

VirusTotal provides an API to query the analysis report of each submitted URL. This report lists the verdict of each phishing detection engine, and the overall result (how many phishing/malicious/suspicious/clean/unrated verdicts).
We use this information to determine which crawlers have analyzed the URL. In our revision, we will follow the suggestions of the reviewers to further strengthen the study, and discuss the threats to validity.

## Q2: Missing field study

We thank the reviewer for the suggestion. To gain further insights and drive this research forward, we carried out a 3-week field study after the submission. With the configuration *PhishIntention+PhishDecloaker*, we crawled about 500,000 zero-day websites from CertStream, detecting 1024 websites using CAPTCHA, of which 175 websites are reported as CAPTCHA-cloaked phishing websites. For CAPTCHA detection, we achieve a precision of 0.85 and a recall of 0.92 (by sampling). Since false positives can be removed in practice if they are not interactable, the actual precision is higher. Further, we find that:

- Phishing attackers prefer convenient and free CAPTCHAs like reCAPTCHA to others like the GeeTest slider; and
- CAPTCHA-cloaked phishing websites share similar targets with conventional phishing websites.

More details (e.g., brand distribution, CAPTCHA type distribution, etc.) can be found at https://sites.google.com/view/phishdecloaker/field-study

Overall, we do find a considerable number of CAPTCHA-cloaked phishing websites in the wild. We believe an alarm should be raised to the phishing detection community. We would also welcome the reviewer's comments to further improve the study.

## Q3: The link from reference [49] does not work (404 error)

The link was blocked by Google during the review period because it includes phishing content. We have appealed to Google to have it restored.

## Q4: deployment in practice

Our practice (field study) is to use containerized services where the crawler, CAPTCHA detector, CAPTCHA solver, and phishing detectors are deployed as independent microservices.

# Reviewer#C

## Q1: Missing field/wild study

We thank the reviewer for pointing this out.
Please check our response for Reviewer#B#Q2.

## Q2: Diverge from phishing context

CAPTCHA-based cloaking techniques are emerging. As shown in our wild/field study, we discovered 175 CAPTCHA-cloaked phishing websites in 3 weeks, which merits an alarm to our community. Our countermeasure is to solve CAPTCHAs. As for phishing relevance, we design our framework to be extensible, considering that phishing attackers can adopt novel CAPTCHAs. The extensibility allows us to evolve quickly in the cat-and-mouse game. We will also include our field study in the revision to strengthen the phishing context.

Following the reviewer's suggestion, we will fix all the typos in our revision.

# Reviewer#D

## Q1: Novelty: how is PhishDecloaker different from existing CAPTCHA solvers?

We agree that solving an individual CAPTCHA cannot be counted as novel, and we do not claim a contribution in solving an individual CAPTCHA. Our novelty lies in proposing an extensible CAPTCHA-solving framework built from CAPTCHA detection, recognition, and solving. In the cat-and-mouse game, attackers can adopt new types of CAPTCHA. In this case, the extensibility allows us to report the new CAPTCHA and evolve the CAPTCHA-solving system quickly. Given the extensibility consideration, it is the first CAPTCHA detection-recognition-solving framework to the best of our knowledge.

## Q2: Ethics concern.

We admit that security crawlers are more penetrating than normal ones, which can be a double-edged sword. To ensure their ethical use, we can take the following actions:

- Keep it closed-source, and share it exclusively with a whitelist of researchers, security companies, and government.
- Offer a controlled cloud service to the public to prevent abuse. This way, we can stop the service if we detect that it is being used against benign websites.

We would greatly appreciate any further suggestions the reviewer can provide to our community.
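
The percentages quoted in this response and in Review #121D can be re-derived from the raw counts. The following is a minimal sketch in Python, assuming the counts exactly as quoted in the thread; the helper `pct` and all variable names are illustrative, the crawl size is stated only as "about 500,000", and the F1 value is derived here rather than reported by the authors.

```python
def pct(numerator: float, denominator: float) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    return round(100.0 * numerator / denominator, 1)


# Detection recovery rate from the response to Reviewer A (Q2): 57 of 73.
recovery_rate = pct(57, 73)

# Relative accuracy drop that Reviewer D recomputed for the JSMA row of Table 6.
jsma_drop = pct(0.97 - 0.50, 0.97)

# Field-study counts from the response to Reviewer B (Q2).
crawled = 500_000        # "about 500,000", so the derived shares are approximate
with_captcha = 1024      # crawled sites on which a CAPTCHA was detected
cloaked_phishing = 175   # of those, reported as CAPTCHA-cloaked phishing

captcha_share = pct(with_captcha, crawled)
phishing_share = pct(cloaked_phishing, with_captcha)

# F1 is not stated in the thread; it is derived from the quoted precision/recall.
precision, recall = 0.85, 0.92
f1 = round(2 * precision * recall / (precision + recall), 3)

print(recovery_rate, jsma_drop, captcha_share, phishing_share, f1)
```

Under these assumptions, 57/73 rounds to the 78% stated in the response, and the JSMA drop rounds to the 48.5% computed by Reviewer D rather than the 49.1% shown in Table 6.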

Comment @A1 by Administrator
---------------------------------------------------------------------------
[REC] Ethics discussion summary: Thank you for responding to the ethical concerns raised in the reviews and setting out a mitigation approach. This satisfies the ethical question raised and we encourage you to include the mitigation in a revised version of the paper.

Comment @A2 by Author [Xiwen Teoh]
---------------------------------------------------------------------------
Dear reviewers, could you share the list of finalized changes needed for the major revision? Thanks for your time!

Comment @A3 by Reviewer C
---------------------------------------------------------------------------
Dear authors, after the rebuttal, the reviewers agreed that the paper can be a valuable contribution for the conference but it needs to undergo a major revision. Please find below the list of revision criteria. Please let us know if you'd like any clarifications. Could you also let us know the timeline by which you'll provide us with a revised version? Please keep in mind the conference deadlines.

- Clearly state your position / limitations concerning failures in your pipeline due to other human authentication techniques
- Strengthen the study that evaluates the detection capabilities of the current systems in detecting CAPTCHA-cloaked websites (RevB#Q1)
- Evaluate and compare how the dual-branch architecture for the CAPTCHA recognition introduced in Section 4.2 improves detection performance (RevD)
- Measure on real-world data and prove that CAPTCHA-cloaked phishing websites are a considerable threat. We all agree that cloaking is heavily used in phishing websites, but your analysis should reveal in what percentage of them cloaking also involves CAPTCHAs.
- Measure on real-world data the distribution and the types of CAPTCHAs used by the attackers. This will improve the shallow evaluation currently presented in the paper.
  Such a study can complement the analysis of the point above.
- Measure on real-world data that your solution helps in discovering more 0-day websites that were otherwise inaccessible to previously proposed solutions.
- Since the goal is to prove that more 0-days are detected, please provide a clear indication of how such detections were systematically verified (i.e., one could expect that systems such as VT, Safe Browsing, and Windows Defender will be cloaked and thus not report the detections). Please provide specific details on the characteristics of the sites that did ultimately get detected and blocked (and how long it took), compared to those that did not, similar to the approach taken in [46]
- Discuss the overhead that might be introduced by your system in a real-world experiment.
- Properly disclose the findings (e.g., 0-day websites) to relevant ecosystem entities (e.g., anti-phishing providers)
- Please address the typos, inconsistencies in Tables, and missing references

Comment @A4 by Author [Xiwen Teoh]
---------------------------------------------------------------------------
We thank the reviewers for the suggestions. We will address the required changes by Apr 30. To confirm the required changes, we provide more details below on how we are going to implement them. We kindly request confirmation from the shepherd and reviewers.

# R1: Clearly state your position / limitations concerning failures in your pipeline due to other human authentication techniques

To address the comment, we will emphasize the cases in which PhishDecloaker (alone) is blocked by other human authentication factors. We will also discuss that a commercial security crawler needs to be a hybrid solution (e.g., PhishDecloaker + Proxy/VPN + Browser Fingerprint Obfuscation), and we will provide more qualitative examples in the Appendix. If the decision is made to accept the paper, we will move all the relevant information to our PhishDecloaker website.

# R2: Strengthen the study that evaluates the detection capabilities of the current systems in detecting CAPTCHA-cloaked websites
Referencing [3] and [46], we plan to redo the empirical study by hosting 5 kinds of phishing websites: a baseline without any cloaking, and 4 different CAPTCHA-cloaked phishing sites (reCAPTCHA, hCaptcha, slider, rotation). Throughout the study, we will submit the phishing URLs to each of the following anti-phishing entities: VirusTotal, Google Safe Browsing, and Microsoft Defender. To prevent pre-emptive blocklisting of our websites without scanning, we will avoid deceptive keywords and randomly generate URLs as in our original study. After a predefined threshold of *N* days, we will query the URLs on these anti-phishing entities; if a URL is not on the blacklist, we consider the evasion successful. We want to point out that hosting phishing kits bears the risk of whole-account takedown and blacklisting by hosting providers, and requests for immunity are not guaranteed as in [3]. Nonetheless, we will still carry out the task. We kindly ask the reviewers/shepherd for additional suggestions on hosting phishing kits in a reliable manner.

# R3: Evaluate and compare how the dual-branch architecture for the CAPTCHA recognition introduced in Section 4.2 improves detection performance (RevD)
To verify the effectiveness of both the OLN detector and the dual-branch Siamese architecture, we will provide an ablation study comparing the performance of 4 different groups on CAPTCHA recognition:
(1) OLN Detector + Dual-branch Siamese
(2) OLN Detector + ResNet-50
(3) Faster R-CNN + Dual-branch Siamese
(4) Faster R-CNN + ResNet-50

# R4: Measure on real-world data and prove that CAPTCHA-cloaked phishing websites are a considerable threat. We all agree that cloaking is deeply used in phishing websites, but your analysis should reveal in what percentage of them cloaking also involves CAPTCHAs.
We will re-conduct the wild study by crawling newly registered domains from Certstream and prepare 6 different study groups to analyze the crawled sites for phishing.

Group 1: PhishIntention
Group 2: PhishIntention + Basics
Group 3: PhishIntention + Basics + Anti-interaction-cloaking
Group 4: PhishIntention + Basics + Anti-fingerprint-cloaking
Group 5: PhishIntention + Basics + Anti-behavior-cloaking
Group 6: PhishIntention + Basics + Anti-CAPTCHA-cloaking (i.e., PhishDecloaker)

- base detector: we select PhishIntention as the base phishing detector, as it has been shown to be the detector with the best performance on real-world phishing websites
- basic human-authentication function (Basics): we enable JavaScript rendering for all the settings
- Anti-interaction-cloaking: handle alert and notification windows, random mouse movement and clicking
- Anti-fingerprint-cloaking: enable cookies, use randomized user agent, use spoofed referrer, use stealth headless browser
- Anti-behavior-cloaking: follow 3XX redirects, wait 5 seconds after DOM is loaded, retry up to 3 times if the page is blank
- Anti-CAPTCHA-cloaking: use PhishDecloaker

We will run the field study for 3-4 weeks to evaluate: (1) how many more phishing websites Groups 2-6 can find than Group 1 (for the *percentage*), and (2) how many more phishing websites Group 6 (PhishDecloaker) can find than Group 2.

# R5: Measure on real-world data the distribution and types of CAPTCHAs used by the attackers. This will improve the shallow evaluation currently presented in the paper. Such a study can complement the analysis of the point above.
In the wild study, we will analyze the CAPTCHA-cloaked phishing websites reported by Group 6 and report the distribution of CAPTCHA types and phishing categories (banking, technology, logistics, etc.).

# R6: Measure on real-world data that your solution helps in discovering more 0-day websites that were otherwise inaccessible to previously proposed solutions.
In the wild study, we will compare the number of 0-day phishing websites discovered by Groups 2 and 6. Ideally, Group 6 should discover more 0-day phishing websites than the baseline Group 2.

# R7: Since the goal is to prove that more 0-days are detected, please provide a clear indication of how such detections were systematically verified. Please provide specific details on the characteristics of the sites that did ultimately get detected and blocked (and how long it took), compared to those that did not, similar to the approach taken in [46]
To verify that a phishing website is 0-day in the wild study, we will manually inspect potential phishing websites reported by all groups. If a site is confirmed as phishing, we will submit its URL to VirusTotal. URLs that are not blacklisted by VirusTotal in the scan report are considered 0-day and placed on a watchlist. To assess the longevity of 0-day websites, we will incorporate a monitoring module that loads each URL from the watchlist and periodically checks whether it has been taken down (i.e., blank page / status code 404) or blacklisted by Google Safe Browsing and Microsoft SmartScreen. We will analyze these URLs for insights such as:
1. Percentage of 0-day phishing websites using CAPTCHA-cloaking
2. Alive time (in days) of 0-day phishing websites without CAPTCHA-cloaking
3. Alive time (in days) of 0-day phishing websites with CAPTCHA-cloaking
4. Characteristics of phishing websites with CAPTCHA-cloaking (e.g., CAPTCHA type, geolocation, brand)

# R8: Discuss the overhead that might be introduced by your system in a real-world experiment.
In the wild study, we will provide a breakdown of the average time taken for CAPTCHA detection, recognition, and solving in Group 6.

# R9: Properly disclose the findings (e.g., 0-day websites) to relevant ecosystem entities (e.g., anti-phishing providers).
We will send an email disclosing the threat of CAPTCHA-cloaked phishing websites to the relevant anti-phishing entities.
We will report any discussion or acknowledgement of our disclosure if they respond to us.

To conclude, we will conduct 3 experiments:
1. A new empirical study to evaluate the detection capabilities of the current systems in detecting CAPTCHA-cloaked websites. (addresses R2)
2. An ablation study. (addresses R3)
3. A wild study with 6 different groups, to show the number of CAPTCHA-cloaked phishing websites compared to other types of cloaking, to show the number of 0-day CAPTCHA-cloaked phishing websites discovered by PhishDecloaker, and to analyze these websites further. (addresses R4, R5, R6, R7, R8, R9)

Comment @A5 by Shepherd
---------------------------------------------------------------------------
Dear authors,
Thanks for sending the very detailed plan for your changes in the revised paper. The changes and the timeline look good to me! I am looking forward to reading the revised paper.
Thanks,
Your Shepherd

Comment @A6 by Author [Yun Lin]
---------------------------------------------------------------------------
Dear Shepherd and reviewers,
Thanks again for your insightful and valuable suggestions to improve our work! With your help and advice, we have addressed all the requested changes, resulting in a work of higher quality. To save your effort, we have highlighted the key paragraphs and labeled each with the index of the comment it addresses (R1, R2, etc.). The shepherd and the reviewers can search for an index in the paper to see how we addressed each comment. Please check the response letter to find the corresponding change in the revision. Finally, our more sophisticated field study (with your advice) does show that CAPTCHA-cloaked phishing websites are emerging in modern phishing campaigns. The voice is worth hearing in our community.

Comment @A7 by Author [Yun Lin]
---------------------------------------------------------------------------
Dear Shepherd and reviewers,
Sorry to keep reminding you.
However, we might need some time to prepare visas for the conference. Thus, your early decision matters a lot to us. Could you kindly check whether you are satisfied with our revision? Also, please feel free to let us know whether you need us to make extra changes to the paper. Many thanks!

Comment @A8 by Shepherd
---------------------------------------------------------------------------
Dear authors,
Thanks for sending the revised version of your paper. I will consult with the other reviewers about the changes and come back to you as soon as possible.
Thanks,
Your Shepherd

Comment @A9 by Author [Yun Lin]
---------------------------------------------------------------------------
Thanks a lot! We are looking forward to hearing from you!

Comment @A10 by Shepherd
---------------------------------------------------------------------------
Dear authors,
Thanks again for submitting your revised version. I can now confirm that the paper aligns with the revision criteria and can be accepted! Congratulations! There are two very minor issues with the revised paper that I strongly encourage you to address in the final version.
- Page 11: "0.93 over 0.91 in precision" --> If I read the table correctly, it should be 0.93 over 0.92 in precision
- Page 12: "CATPCHA types, lifespan, and time to be blacklisted." --> CAPTCHA types....
Thanks,
Your Shepherd

Comment @A11 by Author [Yun Lin]
---------------------------------------------------------------------------
Dear Shepherd and the reviewers,
We sincerely thank you for your appreciation and support! We have updated the paper accordingly and attached the new version. If you are fine with the new version, would you mind updating our paper status to *Accept*? Thanks a lot for your kind advice and suggestions!

Comment @A13 by Author [Yun Lin]
---------------------------------------------------------------------------
Dear Shepherd and the reviewers,
We've attached the latest version, with the fixes for your minor issues marked in red.
If everything is good, could you please update our paper status to *Accept*? Thank you very much!