Dorking - exploiting search engine capabilities to discover security gaps

Search engines, whether public ones like Google or a company’s internal intranet search tool, are normally used to find information on a topic of interest. However, a more nefarious use of these tools has recently gained prominence: cybercriminals hunt for sensitive information hidden in publicly accessible content by using search terms called “dorks”. This technique is referred to as either “Google hacking” or “Google dorking”.

Typically, a user enters keywords to find information about a topic. For example, someone who wanted to improve their tennis serve would probably search for “tennis serve tips” and get back links to articles matching those terms. Search engines crawl content across the internet and index it so that it can be searched quickly. When a user provides a keyword, the engine looks it up in its indexes and returns results ranked by its relevance algorithms. Search engines can index anything on a website that is publicly accessible, including unsecured content that was never meant to be seen, such as information buried in a page’s code. In addition to keyword search, search engines provide a number of advanced operators, and these are exactly what Google dorking takes advantage of. For example, the operator ‘site’ restricts a search to a specific website instead of every site indexed by the search engine.
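The indexing and lookup described above can be illustrated with a toy inverted index. This is only a minimal sketch, not how any real search engine is implemented: the page data, the function names (build_index, search) and the way the ‘site’ restriction is applied are all assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical crawled pages (URL -> visible text); purely illustrative data.
PAGES = {
    "https://tennis.example/serve": "tennis serve tips for beginners",
    "https://tennis.example/gear": "tennis racket reviews",
    "https://shop.example/tips": "gardening tips and tools",
}

def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query, site=None):
    """Return URLs containing every query word, optionally limited to one host."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    if site is not None:
        # Toy equivalent of the 'site' operator: keep only matching hosts.
        hits = {u for u in hits if urlparse(u).hostname == site}
    return sorted(hits)

INDEX = build_index(PAGES)
print(search(INDEX, "tips"))
print(search(INDEX, "tips", site="tennis.example"))
```

The second call returns fewer results than the first because the host filter narrows the same keyword match to a single website, which is the effect the ‘site’ operator has on a real search.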

Large-scale security events involving Google dorking have occurred recently and have been widely reported. Two notable examples are the leak of a French political party’s data, which was found on the site of its web host using “dorks” of the type “Index of /”, and the discovery of numerous websites used by the CIA for communication, which led to numerous executions of agents working for the US.

To give an example of a specific search that uses Google dorking, the query “inurl:files intext:nationality filetype:xls intext:<first name or last name type>” is likely to find Excel files that contain individuals’ information, with columns for name and nationality. At the same time, a single well-chosen keyword can be enough to find highly sensitive information: for example, the name of an enterprise application searched on the Internet, or the word “salary” searched on the company intranet.
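As a rough illustration of how such a query is assembled from operators, the sketch below composes a dork string and URL-encodes it. The helper names (build_dork, search_url), the example.com domain and the choice of base URL are assumptions for illustration; the sketch restricts the query to a single domain with ‘site’, as an organization would when checking its own exposure.

```python
from urllib.parse import quote_plus

def build_dork(keywords="", **operators):
    """Compose a query from operator:value pairs plus optional free keywords.

    Operators are written with no space after the colon. Values containing
    spaces are wrapped in double quotes. A list value repeats the operator.
    """
    parts = []
    for name, value in operators.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            v = str(v)
            if " " in v:
                v = f'"{v}"'
            parts.append(f"{name}:{v}")
    if keywords:
        parts.append(keywords)
    return " ".join(parts)

def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query for a search endpoint (the base URL is an assumption)."""
    return f"{base}?q={quote_plus(query)}"

query = build_dork(
    site="example.com",          # hypothetical domain owned by the tester
    inurl="files",
    intext=["nationality", "name"],
    filetype="xls",
)
print(query)
print(search_url(query))
```

Keeping the operators as keyword arguments makes it easy to see which part of the query narrows the location (site, inurl), which narrows the content (intext) and which narrows the file format (filetype).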

Search engines internal to certain websites can also be exploited. For example, websites hosting application source code (e.g. GitHub), technical forums run by software publishers, and job posting websites that describe sensitive technical environments can all be searched with dorking techniques to find exploitable information.

Google dorking has become highly accessible to the public because of the numerous tutorials posted online, specialized private search engines (e.g. startpage.com), and websites listing thousands of “dorks” (e.g. the Google Hacking Database) organized by specific use case (e.g. finding files containing passwords).

Google dorking is typically performed manually, but it can be automated with “dork scanners” such as “Zeus-scanner” or with the help of PowerShell tools (PnP-PowerShell) for searches in Office365.
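The core idea behind this kind of automation is simply expanding a list of query templates across a list of targets. The sketch below shows that step only and does not reproduce how Zeus-scanner or PnP-PowerShell work; the templates are modeled on the examples in this article, and the domain names and the expand_dorks function are hypothetical.

```python
from itertools import product

# Illustrative templates; {domain} is filled with each domain the organization owns.
TEMPLATES = [
    'site:{domain} intitle:"index of /"',
    "site:{domain} filetype:xls intext:nationality",
    "site:{domain} intext:salary",
]

def expand_dorks(domains, templates=TEMPLATES):
    """Return one query per (domain, template) pair, grouped by domain."""
    return [t.format(domain=d) for d, t in product(domains, templates)]

# Hypothetical domains under the organization's own control.
QUERIES = expand_dorks(["example.com", "intranet.example.com"])
for q in QUERIES:
    print(q)
```

A real scanner would then submit each query to a search service and collect the results, subject to that service’s terms of use; that part is deliberately left out here.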

To guard against exploitation of internal search engines, organizations can:

Use Data Loss Prevention software and services to detect data leakage of sensitive information, including tools that search non-indexed websites, like the Dark Web.
Implement Data Classification and Governance procedures, including oversight of how data is shared (for example, with Office365 Groups), starting with the data that is most critical to business operations and whose exposure would lead to the largest risk events (e.g. sensitive trade groups, client data, HR information).
Appropriately oversee outsourced activities with a Security Assurance Plan and raise providers’ awareness of the importance of adequately protecting, and not disclosing, the information accessible to them. When possible, require evidence of data destruction after a contract has expired.
Supervise the transfer of sensitive information to parties whose information systems security capabilities (on their company intranet or on the Internet) are not always equivalent to the organization’s own.

Organizations can also take measures to limit the impact of a known data leak:

Develop a process for managing identified leaks, including actions to be taken with search engines and websites that have indexed the leak (e.g. Google search engine optimization (SEO) management).
Have procedures for security incident management, including data breaches (with regard to GDPR), and for crisis management activities, noting the potential need to notify regulatory authorities and impacted individuals.
Have a monitoring procedure and tools in place for social media networks, and prepare responses in advance for engaging with individuals on these platforms during a crisis.

Google dorking can also be leveraged by an organization to test the security of its own systems, for example during security audits or Red Team activities, which aim to think from a malicious agent’s perspective in order to discover, before the agent does, the gaps that could be exploited to cause harm to the organization. Google dorking is a powerful technique that bad actors are using to exploit rarely seen gaps in an organization’s architecture, and the possibility of its use should be kept in mind when assessing an organization’s information security risk. It is one of many examples of how cybercriminals continue to evolve their activities, and it highlights the need for an organization to continuously evolve how it handles security.
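As one possible starting point for such a self-assessment, the sketch below checks pages an organization has crawled from its own sites against a few dork-like rules drawn from the examples in this article. The Page structure, the rule set and the sample data are all hypothetical, and a real audit would need far broader coverage.

```python
import re
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    title: str
    text: str

# Hypothetical rules modeled on the dorks mentioned in this article.
RULES = {
    "open directory listing": lambda p: p.title.lower().startswith("index of /"),
    "spreadsheet exposed": lambda p: re.search(r"\.xlsx?$", p.url, re.I) is not None,
    "salary data": lambda p: "salary" in p.text.lower(),
}

def audit(pages):
    """Return (url, finding) pairs for every rule a page matches."""
    findings = []
    for page in pages:
        for label, rule in RULES.items():
            if rule(page):
                findings.append((page.url, label))
    return findings

# Illustrative crawl results from the organization's own domain.
SAMPLE = [
    Page("https://example.com/backup/", "Index of /backup", ""),
    Page("https://example.com/hr/staff.xls", "", "name nationality salary"),
    Page("https://example.com/about", "About us", "company history"),
]

for url, finding in audit(SAMPLE):
    print(f"{finding}: {url}")
```

Running the rules against the organization’s own crawl data, rather than against a public search engine, keeps the check repeatable and lets the rule set grow as new dorks are published.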
