Extract and Normalize Root Domain from URL
A raw URL often contains more than the domain you actually need: protocol, subdomains, folders, query strings, tracking parameters, and fragments. Root domain normalization turns messy URLs into one consistent value, such as example.com. This guide explains the process, common edge cases, and the fastest way to do it in bulk.
What Is a Normalized Root Domain?
A root domain (also called the registrable domain) is the public suffix plus the one label directly to its left. For example, the root domain of https://blog.example.com/pricing?ref=ad is example.com, where com is the public suffix and example is the registered label. Normalization means applying the same cleanup rules to every URL so that equivalent URLs produce the same domain.
Example normalization: https://www.example.com/blog/post?utm_source=email becomes example.com.
Why Normalize Root Domains?
Deduplicate URL Lists
Backlink exports, analytics reports, search results, and crawl data often contain many URLs from the same site. Normalizing to root domains lets you count unique websites instead of unique pages.
Clean SEO and Outreach Data
SEO workflows usually care about referring domains, prospects, competitors, or publisher websites. A normalized root domain is easier to group, filter, enrich, and compare across tools.
Make Reporting Consistent
Without normalization, http://example.com, https://www.example.com/, and https://blog.example.com/post can appear as separate records. Normalization collapses them into one value.
Root Domain Normalization Rules
| Step | Input Part | Action |
|---|---|---|
| 1 | Protocol | Remove http, https, and ftp prefixes. |
| 2 | Path and filename | Remove everything after the hostname. |
| 3 | Query string and fragment | Remove tracking parameters, IDs, and hash fragments. |
| 4 | www prefix | Normalize www.example.com to example.com. |
| 5 | Subdomains | Strip subdomains unless you intentionally need hostnames. |
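The five steps in the table can be sketched in Python with only the standard library. This is a minimal illustration, not a production parser: the function name `root_domain` and the hard-coded `MULTI_PART_SUFFIXES` set are assumptions made for the example, and a real dataset needs the full Public Suffix List.

```python
from urllib.parse import urlsplit

# Illustrative subset only (an assumption for this sketch); a real
# dataset needs the full Public Suffix List.
MULTI_PART_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def root_domain(url: str) -> str:
    """Apply steps 1-5 from the table to a single absolute URL."""
    # Steps 1-3: urlsplit separates scheme, path, query, and fragment;
    # .hostname keeps only the lowercased host, without port or credentials.
    host = urlsplit(url).hostname or ""
    # Step 4: drop a leading "www." label.
    if host.startswith("www."):
        host = host[4:]
    # Step 5: keep the suffix plus one label to its left.
    labels = host.split(".")
    suffix_len = 2 if ".".join(labels[-2:]) in MULTI_PART_SUFFIXES else 1
    return ".".join(labels[-(suffix_len + 1):])


print(root_domain("https://news.bbc.co.uk/sport/football"))
```

Note that `urlsplit` only finds a hostname when the input has a scheme, so this sketch returns an empty string for a bare value such as example.com/path.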
Before and After Examples
| URL | Normalized Root Domain |
|---|---|
| https://www.example.com/blog/post?utm_source=email | example.com |
| http://shop.example.com/products/123#reviews | example.com |
| https://news.bbc.co.uk/sport/football | bbc.co.uk |
| https://docs.github.com/en/actions?query=test | github.com |
| https://mail.yahoo.co.jp/login | yahoo.co.jp |
How to Extract and Normalize Root Domains
1. Use a Free URL to Domain Tool
Paste one or more URLs into the tool, click Extract Domains, and copy or export the normalized root domains. This is the fastest option for audits, spreadsheets, and one-off cleanup tasks.
- Paste URLs, one per line or mixed inside text.
- Run the extractor.
- Review the normalized root domains.
- Copy the results or export them as CSV.
2. JavaScript Approach
The built-in URL API can isolate the hostname. For production root-domain logic, pair it with a Public Suffix List parser.
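const input = 'https://www.blog.example.co.uk/path?utm_source=email';
// The URL API isolates the hostname; the regex drops a leading "www."
const hostname = new URL(input).hostname.replace(/^www\./, ''); // 'blog.example.co.uk'
// Naive parser, valid only for single-label suffixes such as .com:
const parts = hostname.split('.');
const simpleRoot = parts.slice(-2).join('.'); // 'co.uk' for this input, which is wrong
// Use a PSL-aware library for real datasets: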
// psl.parse(hostname).domain -> 'example.co.uk'
3. Python Approach
In Python, tldextract is a common choice because it handles public suffixes.
import tldextract
url = 'https://www.blog.example.co.uk/path?utm_source=email'
parts = tldextract.extract(url)
root_domain = f'{parts.domain}.{parts.suffix}'
print(root_domain)  # example.co.uk
Edge Cases to Watch For
Multi-Part Suffixes
Domains like example.co.uk and example.com.au need suffix-aware parsing. Splitting on dots and taking the last two parts is not enough.
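To make the failure concrete, the sketch below compares the last-two-labels heuristic with a longest-suffix match. Both function names and the tiny `KNOWN_SUFFIXES` set are assumptions for illustration; a library backed by the full Public Suffix List should replace the hard-coded set in practice.

```python
from typing import Optional

# Hypothetical, deliberately tiny suffix set; the real Public Suffix List
# has thousands of entries.
KNOWN_SUFFIXES = {"com", "uk", "co.uk", "au", "com.au"}


def naive_root(hostname: str) -> str:
    """Last-two-labels heuristic, shown only to illustrate the failure."""
    return ".".join(hostname.split(".")[-2:])


def suffix_aware_root(hostname: str) -> Optional[str]:
    """Find the longest known suffix, then keep one label to its left."""
    labels = hostname.lower().split(".")
    # Scanning from the left tries the longest candidate suffix first.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in KNOWN_SUFFIXES:
            # i == 0 means the hostname is itself a bare suffix.
            return ".".join(labels[i - 1:]) if i > 0 else None
    return None


for host in ("blog.example.com", "blog.example.co.uk", "shop.example.com.au"):
    print(host, "| naive:", naive_root(host), "| suffix-aware:", suffix_aware_root(host))
```

The naive version returns co.uk and com.au for the last two hosts, while the suffix-aware version returns example.co.uk and example.com.au.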
Subdomains That Matter
For SEO deduplication, root domains are usually best. For security review, app inventory, or hosting analysis, you may need to preserve full hostnames such as api.example.com.
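One way to keep both views is to store the hostname next to the root domain and group on the root only when needed. The rows and the `hosts_by_root` helper below are hypothetical; the sketch assumes the (hostname, root domain) pairs were already produced by a suffix-aware extractor.

```python
from collections import defaultdict

# Hypothetical rows: (hostname, root_domain) pairs, assumed to come from
# a suffix-aware extractor upstream.
rows = [
    ("api.example.com", "example.com"),
    ("blog.example.com", "example.com"),
    ("www.example.com", "example.com"),
    ("news.bbc.co.uk", "bbc.co.uk"),
]


def hosts_by_root(pairs):
    """Group distinct hostnames under their root domain."""
    grouped = defaultdict(set)
    for hostname, root in pairs:
        grouped[root].add(hostname)
    # Sort each group so the output is stable across runs.
    return {root: sorted(hosts) for root, hosts in grouped.items()}


for root, hosts in hosts_by_root(rows).items():
    print(root, len(hosts), hosts)
```

Counting the keys gives unique sites for SEO reporting, while the grouped values preserve the individual hosts for security or inventory work.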
Malformed URLs
Real datasets often include missing protocols, trailing punctuation, copied email text, or markdown links. A good extractor should handle plain domains and URLs embedded in text, not only perfectly formatted links.
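A rough sketch of that tolerance in Python: find URL-like tokens in free text, trim trailing punctuation, and add a scheme when it is missing so the standard parser can read the hostname. The pattern `HOST_PATTERN` and the helper `hostnames_in_text` are assumptions for illustration. The pattern is intentionally loose, so it will also match tokens such as file names that merely look like domains.

```python
import re
from urllib.parse import urlsplit

# Rough pattern: optional scheme, dot-separated labels ending in letters,
# then any trailing path characters up to whitespace, quotes, or brackets.
HOST_PATTERN = re.compile(
    r"(?:[a-z][a-z0-9+.-]*://)?(?:[a-z0-9-]+\.)+[a-z]{2,}[^\s<>()\[\]\"']*",
    re.IGNORECASE,
)


def hostnames_in_text(text: str) -> list[str]:
    """Return hostnames for URL-like tokens found in free text."""
    hosts = []
    for match in HOST_PATTERN.finditer(text):
        # Drop sentence punctuation that was copied along with the link.
        candidate = match.group(0).rstrip(".,;:!?")
        # urlsplit only recognizes a hostname when a scheme is present.
        if "://" not in candidate:
            candidate = "http://" + candidate
        try:
            host = urlsplit(candidate).hostname
        except ValueError:
            continue  # skip tokens the parser rejects
        if host:
            hosts.append(host)
    return hosts


sample = "Visit https://www.example.com/pricing, or example.org. Docs: [guide](https://docs.example.net/start)."
print(hostnames_in_text(sample))
```

The hostnames returned here still need the www and subdomain handling described earlier before they count as normalized root domains.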
Frequently Asked Questions
What is the difference between a hostname and a root domain?
A hostname can include subdomains, such as blog.example.com. The root domain is the registered domain, such as example.com.
Does normalization remove tracking parameters?
Yes. Root domain extraction keeps only the domain, so the entire query string is discarded, including tracking parameters such as utm_source and fbclid.
Is root domain normalization safe for all use cases?
It is ideal for deduplication, SEO reporting, and prospect lists. If your workflow depends on individual hosts or subdomains, keep a hostname column alongside the root domain column.