Beneath the Surface: Tools and Techniques to Investigate the Ownership of Websites

Websites present information to the public the way their owners want it shown; however, plenty of technical information sits hidden behind their polished design.
There are many reasons to investigate the ownership of a website. On the legitimate side, we may want to contact the webmaster for a link-building campaign, ask the webmaster to remove a link or a piece of content, or contact the website owner to place an advertisement or to purchase the domain name. Journalists and online investigators need to investigate a web presence to connect their findings with information collected from other sources (e.g., social media profiles, blogs, discussion forums, and data collected from the dark web, such as the Tor network).

On the illegal side, threat actors (e.g., cybercriminals, phishers, and spammers) aim to collect as much information as possible about their targets before launching direct attacks against them. For example, by investigating website ownership information, an adversary can identify other websites run by the same owner, find the owner’s private contact information, and discover the technology used to build the target website, all of which helps establish a beachhead for further attacks.
This article introduces various tools and techniques for investigating a website’s elements to reveal the valuable information hidden beneath its public interface.

Anything published online may disappear suddenly. For example, a webpage, an image, a video, or an entire website may vanish over time. The Wayback Machine is a free online service for retrieving historical snapshots of websites (see Figure 1).
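As a rough illustration, the Internet Archive exposes a public availability endpoint (https://archive.org/wayback/available) that reports the archived snapshot closest to a given date. The sketch below only builds the request URL and reads a response of that shape. The helper names and the sample response are made up for illustration, and sending the actual HTTP request is left to the reader.

```python
import json
from urllib.parse import urlencode

WAYBACK_API = "https://archive.org/wayback/available"

def build_availability_url(site, timestamp=None):
    """Return the availability-endpoint URL for `site`, optionally near a YYYYMMDD date."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)

def closest_snapshot(response_text):
    """Return (snapshot_url, timestamp) from an availability response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]

# Hypothetical response body, shaped like the endpoint's JSON output.
sample = (
    '{"archived_snapshots": {"closest": {"available": true, '
    '"url": "http://web.archive.org/web/20130919044612/http://example.com/", '
    '"timestamp": "20130919044612", "status": "200"}}}'
)

print(build_availability_url("example.com", "20130101"))
print(closest_snapshot(sample))
```

Splitting URL construction from response parsing keeps the sketch usable with whichever HTTP client you prefer.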

A website’s domain name can reveal important information about its owner. A domain name is unique; for instance, there is only one domain with the name “yahoo.com.” When a person purchases a domain name, a registration (Whois) record containing information about the owner, mainly personal and contact details, is created. Sometimes the owner makes this information private. However, we can still search historical domain data for earlier versions of the record, which may contain the owner’s details from before the registration was made private. Many services retrieve domain name information:

DomainTools: This service gives access to historical Whois records.
Whoisology: This service offers reverse Whois lookups and reveals connections between domain names and their owners.
DomainBigData: In addition to finding domain name information, this service finds other domains owned by the same person.
GoDaddy: This registrar offers a Whois lookup for finding information about website owners.
ICANN: This tool lets you look up the registration data for domain names. ICANN is a non-profit organization that coordinates the domain name system worldwide.
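The services above are front ends to Whois data, which can also be queried directly over the plain-text WHOIS protocol on TCP port 43. The following is a minimal sketch, not how any of the listed services works internally. The default server (the .com registry's WHOIS host), the function names, and the sample record are assumptions chosen for illustration, and real records vary in layout from registry to registry.

```python
import socket

def whois_query(domain, server="whois.verisign-grs.com", timeout=10):
    """Send a raw WHOIS query over TCP port 43 and return the text reply."""
    with socket.create_connection((server, 43), timeout=timeout) as conn:
        conn.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_whois(text):
    """Collect 'Key: value' lines into a dict of lists (keys may repeat)."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the comment/footer lines some servers emit.
        if not line or line.startswith(("%", "#", ">>>")):
            continue
        key, sep, value = line.partition(":")
        if sep and value.strip():
            record.setdefault(key.strip(), []).append(value.strip())
    return record

# Hypothetical record used instead of a live query.
sample = """Domain Name: EXAMPLE.COM
Registrar: Example Registrar, Inc.
Creation Date: 1995-08-14T04:00:00Z
Name Server: A.EXAMPLE-DNS.NET
Name Server: B.EXAMPLE-DNS.NET
"""

print(parse_whois(sample)["Name Server"])
```

To run a live lookup, pass the output of whois_query("example.com") to parse_whois. Other top-level domains use different WHOIS servers.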

Figure 1 – The Wayback Machine can be used to investigate historical website data 

Every website’s files are stored on a web server that exists somewhere in the world. To find out who hosts a website, use any of the following services:

  1. Hostingchecker
  2. hostadvice
  3. webhostinghero

Many websites use shared hosting to reduce hosting costs; websites on the same shared hosting server typically have the same IP address. Once we know the target website’s IP address, we can conduct a reverse IP search to see all websites hosted on that address. Before listing reverse IP search sites, let us see how to retrieve a website’s IP address using the Windows command prompt (see Figure 2).

Figure 2 – Retrieve a website IP address using Windows ping command
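The same lookup can be scripted instead of reading the address off the ping output. This is a small sketch using Python's standard socket module; the function name resolve_ipv4 is illustrative only.

```python
import socket

def resolve_ipv4(hostname):
    """Return the sorted, de-duplicated IPv4 addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Calling resolve_ipv4 with the target domain name returns every IPv4 address the resolver reports. Large sites often return more than one, and each can be fed into a reverse IP search.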

Analyzing other domain names that share the same IP address can lead to the owner of the target website (e.g., many web administrators host websites belonging to the same person on the same IP address).
The following are some services for conducting a reverse IP search:

  1. ViewDNS.info (see Figure 3)
  2. Bing: Conduct a reverse IP search by typing ip: followed by the target IP address into the search box.
  3. Robtex
  4. Netcraft: Displays useful information about the target website, including domain information, the technology used to build the website, web trackers, and hosting history (see Figure 4).

Figure 3 – Conduct a reverse IP search using ViewDNS.info
Figure 4 – Netcraft displays various technical information about the target website, including its hosting history

What you see in your web browser when visiting a website is the graphical rendering of the HTML and JavaScript code used to build its webpages. In most web browsers, we can see a webpage’s source code by right-clicking anywhere on the page and selecting “View Page Source”. The page source may contain comments left by the website’s developers (HTML comments begin with <!-- and end with -->) that mention the developer’s company name, contact information, and the plugins used to create the website (e.g., many WordPress plugins automatically add comments to a webpage’s source code) (see Figure 5).
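Reading comments by eye works for a single page; for several pages, the extraction can be scripted. The sketch below uses the standard library's html.parser module. The class and function names and the sample page, including the plugin name, are hypothetical.

```python
from html.parser import HTMLParser

class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called by HTMLParser for each <!-- ... --> block.
        text = data.strip()
        if text:
            self.comments.append(text)

def extract_comments(html):
    """Return the comments found in `html`, in document order."""
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser.comments

# Hypothetical page source for illustration.
sample_page = """<html><head>
<!-- Built by Example Web Studio, contact: dev@example.com -->
<!-- This site is optimized with the Hypothetical SEO plugin v1.2 -->
</head><body><p>Hello</p></body></html>"""

for comment in extract_comments(sample_page):
    print(comment)
```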

Imagine this scenario: if we discover from the target website’s source code that it uses a specific WordPress plugin, we can go to www.cvedetails.com and check whether any vulnerabilities are associated with this plugin (see Figure 6).

Website owners use the Google Analytics service to monitor traffic to their websites. Commonly, webmasters use the same Analytics account to monitor multiple websites. By conducting a reverse Google Analytics search, we can find websites belonging to the same owner. We can find a website’s Google Analytics ID in its source code by searching for “UA-” (see Figure 7). There are also many free services that conduct this search automatically.

Finding related websites using a Google Analytics ID is an efficient method. However, keep in mind that some web developers copy another webpage’s source code and paste it into their own website without deleting the original website’s Google Analytics ID. This leads to confusing results. Make sure to use multiple sources when conducting a reverse search, and check whether the related websites are relevant to the target (e.g., a gaming website or a website offering free software downloads may not be relevant to a government entity or a big enterprise, even though they use the same Google Analytics ID).
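Searching the page source for the ID can be sketched with a regular expression. The pattern below assumes the “UA-” format described here, with the account number followed by a property number. The function names and the sample source are illustrative.

```python
import re

# Assumed format: UA-<account number>-<property number>, e.g. UA-12345678-1.
GA_ID_PATTERN = re.compile(r"\bUA-(\d+)-(\d+)\b", re.IGNORECASE)

def find_ga_ids(page_source):
    """Return the sorted, de-duplicated Analytics IDs found in a page's source."""
    ids = {
        f"UA-{match.group(1)}-{match.group(2)}"
        for match in GA_ID_PATTERN.finditer(page_source)
    }
    return sorted(ids)

def account_number(ga_id):
    """Return the account part of an ID; it is shared by all properties on one account."""
    return ga_id.split("-")[1]

# Hypothetical page source for illustration.
sample_source = """
<script>ga('create', 'UA-12345678-1', 'auto');</script>
<script>ga('create', 'ua-12345678-2', 'auto');</script>
"""

print(find_ga_ids(sample_source))
```

Comparing account numbers rather than full IDs helps group websites that are tracked as separate properties under one account.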

Webmasters use robots.txt to instruct search engines on how they should crawl pages when visiting their websites. For instance, this file allows website administrators to stop search engine crawlers from indexing some parts or files of their websites. This practice is also commonly used to restrict some services, such as the Wayback Machine’s crawlers, from archiving part of a website’s content or files.
For an investigator, analyzing robots.txt may reveal important information. Some webmasters include sensitive URLs (to files and directories) in this file to hide them from search engine crawlers. The robots.txt file is public and can be accessed by appending /robots.txt to the domain name (see Figure 8).
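Once the file has been downloaded, the paths a webmaster asked crawlers to avoid can be pulled out with a few lines of code. This is a minimal sketch; the function name and the sample file are made up, and the sketch ignores which user agent each rule applies to.

```python
def disallowed_paths(robots_txt):
    """Return the unique Disallow paths listed in a robots.txt body, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            path = value.strip()
            # An empty Disallow means "nothing is blocked", so skip it.
            if path and path not in paths:
                paths.append(path)
    return paths

# Hypothetical robots.txt content for illustration.
sample_robots = """User-agent: *
Disallow: /admin/
Disallow: /backup/   # old database dumps
Allow: /public/
Disallow:
"""

print(disallowed_paths(sample_robots))
```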

Figure 5 – Viewing a webpage’s source code
Figure 6 – Use cvedetails to search for security vulnerabilities
Figure 7 – Search for a Google Analytics ID within a website’s source code. Google Analytics IDs generally begin with “UA-” followed by a string of numbers.
Figure 8 – Sample robots.txt file: https://www.bankofamerica.com/robots.txt

To successfully collect information about a website, you need to use a range of tools and online services and to understand where to search for such information. This article offers an introduction from which to expand your knowledge of website investigation techniques.

Khera, V., 2021. [online] Available at: https://www.linkedin.com/pulse/beneath-surface-tools-techniques-investigate-ownership-khera/?trackingId=c435Jn8GTde27daavtnO8Q%3D%3D [Accessed 30 June 2021].