Detection of Hidden Fraudulent URLs within Trusted Sites using Lexical Features

posted Jul 8, 2013, 3:05 AM by Eric Medvet   [ updated Nov 13, 2013, 1:57 AM ]
  • 8th International Conference on Availability, Reliability and Security (ARES), 2013, Regensburg (Germany)
  • Enrico Sorio, Alberto Bartoli, Eric Medvet
Internet security threats often involve the fraudulent modification of a web site, typically by adding new pages at URLs where no page should exist. Detecting such hidden URLs is very difficult because they do not appear during normal navigation and are usually not indexed by search engines. Most importantly, drive-by attacks that lead users to hidden URLs, for example to phish credentials, may fool even tech-savvy users, because such URLs are increasingly hosted within trusted sites, which renders HTTPS authentication ineffective. In this work, we propose an approach for detecting such URLs based only on their lexical features, which allows alerting the user before the page is actually fetched. We assess our proposal on a dataset composed of thousands of URLs, with promising results.
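
To give a rough idea of what "lexical features" can mean in practice, the sketch below computes a few numbers from the URL string alone, without fetching the page. The function name `lexical_features` and the specific features chosen (length, path depth, digit count, number of query parameters, character entropy of the path) are assumptions made for illustration; the abstract does not list the features actually used in the paper, nor the classifier applied to them.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Compute illustrative lexical features from a URL string.

    Hypothetical feature set: not taken from the paper.
    No network access is performed; only the string is inspected.
    """
    parts = urlsplit(url)
    path = parts.path
    query = parts.query

    # Non-empty path segments, e.g. "/a/b/login.php" -> ["a", "b", "login.php"]
    segments = [s for s in path.split("/") if s]

    # Shannon entropy (bits per character) of the path; 0.0 for an empty path
    if path:
        counts = Counter(path)
        total = len(path)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    else:
        entropy = 0.0

    return {
        "url_length": len(url),
        "path_depth": len(segments),
        "digit_count": sum(ch.isdigit() for ch in path + query),
        "query_param_count": len([p for p in query.split("&") if p]),
        "path_entropy": entropy,
    }


if __name__ == "__main__":
    # Made-up URL, used only to show the shape of the output
    print(lexical_features("https://example.org/a/b/login.php?x=1&y=2"))
```

A feature vector of this kind could be passed to any standard classifier trained on labelled URLs. Because it is derived from the string alone, a warning can be raised before the browser requests the page.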