Extract and Normalize Root Domain from URL

A raw URL often contains more than the domain you actually need: protocol, subdomains, folders, query strings, tracking parameters, and fragments. Root domain normalization turns messy URLs into one consistent value, such as example.com. This guide explains the process, common edge cases, and the fastest way to do it in bulk.

What Is a Normalized Root Domain?

A root domain, also called the registrable domain, is the public suffix (such as com or co.uk) plus the one label directly to its left. For example, the root domain of https://blog.example.com/pricing?ref=ad is example.com. Normalization means applying the same cleanup rules to every URL so equivalent URLs produce the same domain.

Example normalization:

https://www.blog.example.co.uk/articles?id=42&utm_source=newsletter#intro
becomes
example.co.uk

Why Normalize Root Domains?

Deduplicate URL Lists

Backlink exports, analytics reports, search results, and crawl data often contain many URLs from the same site. Normalizing to root domains lets you count unique websites instead of unique pages.

Clean SEO and Outreach Data

SEO workflows usually care about referring domains, prospects, competitors, or publisher websites. A normalized root domain is easier to group, filter, enrich, and compare across tools.

Make Reporting Consistent

Without normalization, http://example.com, https://www.example.com/, and https://blog.example.com/post can appear as separate records. Normalization collapses them into one value.
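As a rough illustration of this collapse, the sketch below reduces the three URLs above to a single value. The helper name simple_root is hypothetical, and it keeps only the last two labels of the hostname. That shortcut is an assumption that holds for single-label suffixes such as .com, not for suffixes like .co.uk.

```python
from urllib.parse import urlsplit

def simple_root(url):
    # Simplification: keep the last two hostname labels.
    # Only correct when the public suffix is a single label (e.g. "com").
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])

urls = [
    "http://example.com",
    "https://www.example.com/",
    "https://blog.example.com/post",
]

# Three distinct URL strings, but only one distinct website.
unique = {simple_root(u) for u in urls}
print(len(set(urls)), "URLs ->", sorted(unique))
```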

Root Domain Normalization Rules

Step | Input Part | Action
1 | Protocol | Remove http, https, and ftp prefixes.
2 | Path and filename | Remove everything after the hostname.
3 | Query string and fragment | Remove tracking parameters, IDs, and hash fragments.
4 | www prefix | Normalize www.example.com to example.com.
5 | Subdomains | Strip subdomains unless you intentionally need hostnames.
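The five rules can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a production parser: the function name normalize_root_domain and the tiny MULTI_PART_SUFFIXES set are hypothetical stand-ins for a full Public Suffix List lookup, and the input is assumed to include a scheme such as https://.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny suffix set for illustration only.
# Real datasets need the full Public Suffix List.
MULTI_PART_SUFFIXES = {"co.uk", "com.au", "co.jp"}

def normalize_root_domain(url: str) -> str:
    # Rules 1-3: urlsplit separates scheme, path, query, and fragment;
    # .hostname keeps only the lowercased host (no port or credentials).
    host = urlsplit(url).hostname or ""
    # Rule 4: drop a leading "www." label.
    if host.startswith("www."):
        host = host[4:]
    # Rule 5: keep the suffix plus the one label to its left.
    labels = host.split(".")
    suffix_len = 2 if ".".join(labels[-2:]) in MULTI_PART_SUFFIXES else 1
    return ".".join(labels[-(suffix_len + 1):])

print(normalize_root_domain(
    "https://www.blog.example.co.uk/articles?id=42&utm_source=newsletter#intro"
))  # example.co.uk
```

Checking only the last two labels against a short set keeps the sketch readable, but it silently falls back to the naive result for any suffix missing from the set.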

Before and After Examples

URL | Normalized Root Domain
https://www.example.com/blog/post?utm_source=email | example.com
http://shop.example.com/products/123#reviews | example.com
https://news.bbc.co.uk/sport/football | bbc.co.uk
https://docs.github.com/en/actions?query=test | github.com
https://mail.yahoo.co.jp/login | yahoo.co.jp

How to Extract and Normalize Root Domains

1. Use a Free URL to Domain Tool

Paste one or more URLs into the tool, click Extract Domains, and copy or export the normalized root domains. This is the fastest option for audits, spreadsheets, and one-off cleanup tasks.

  1. Paste URLs, one per line or mixed inside text.
  2. Run the extractor.
  3. Review the normalized root domains.
  4. Copy the results or export them as CSV.

2. JavaScript Approach

The built-in URL API can isolate the hostname. For production root-domain logic, pair it with a Public Suffix List parser.

const input = 'https://www.blog.example.co.uk/path?utm_source=email';
const hostname = new URL(input).hostname.replace(/^www\./, '');

// Naive split for simple cases only. It keeps the last two labels,
// so this input yields 'co.uk' rather than 'example.co.uk'.
const parts = hostname.split('.');
const simpleRoot = parts.slice(-2).join('.');

// Use a PSL-aware library for real datasets:
// psl.parse(hostname).domain -> 'example.co.uk'

3. Python Approach

In Python, tldextract is a common choice because it handles public suffixes.

import tldextract

url = 'https://www.blog.example.co.uk/path?utm_source=email'
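parts = tldextract.extract(url)
# suffix is empty for inputs without a public suffix (e.g. localhost or an IP address),
# so fall back to the bare domain instead of emitting a trailing dot.
root_domain = f'{parts.domain}.{parts.suffix}' if parts.suffix else parts.domain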

print(root_domain)  # example.co.uk

Edge Cases to Watch For

Multi-Part Suffixes

Domains like example.co.uk and example.com.au need suffix-aware parsing. Splitting on dots and taking the last two parts is not enough.
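To make the failure concrete, the sketch below compares the last-two-labels shortcut with a longest-suffix match. The names naive_root, suffix_aware_root, and SUFFIXES are hypothetical, and the suffix set is a hand-picked excerpt; the real Public Suffix List is far larger and also has wildcard and exception rules that this sketch ignores.

```python
# Hypothetical excerpt of public suffixes, for illustration only.
SUFFIXES = {"com", "uk", "co.uk", "au", "com.au"}

def naive_root(hostname):
    # Last two labels, regardless of the suffix.
    return ".".join(hostname.split(".")[-2:])

def suffix_aware_root(hostname):
    labels = hostname.lower().split(".")
    # Try the longest candidate suffix first so "co.uk" wins over "uk".
    for start in range(1, len(labels)):
        if ".".join(labels[start:]) in SUFFIXES:
            # Keep the matched suffix plus the one label before it.
            return ".".join(labels[start - 1:])
    return hostname  # no known suffix: leave the input unchanged

print(naive_root("blog.example.co.uk"))         # co.uk
print(suffix_aware_root("blog.example.co.uk"))  # example.co.uk
```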

Subdomains That Matter

For SEO deduplication, root domains are usually best. For security review, app inventory, or hosting analysis, you may need to preserve full hostnames such as api.example.com.
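One way to serve both needs is to store the hostname and the root domain side by side. The sketch below is an assumption about how such an export might look: build_rows and root_of are hypothetical names, and root_of is a placeholder that should be replaced with a Public Suffix List-aware parser for real data.

```python
import csv
import io
from urllib.parse import urlsplit

def root_of(hostname):
    # Placeholder: last two labels, correct only for single-label
    # suffixes such as .com. Swap in a PSL-aware parser for real data.
    return ".".join(hostname.split(".")[-2:])

def build_rows(urls):
    rows = []
    for url in urls:
        hostname = urlsplit(url).hostname or ""
        rows.append({
            "url": url,
            "hostname": hostname,              # kept for host-level analysis
            "root_domain": root_of(hostname),  # used for deduplication
        })
    return rows

rows = build_rows([
    "https://api.example.com/v1/users",
    "https://www.example.com/pricing",
])

# Write the rows as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["url", "hostname", "root_domain"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```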

Malformed URLs

Real datasets often include missing protocols, trailing punctuation, copied email text, or markdown links. A good extractor should handle plain domains and URLs embedded in text, not only perfectly formatted links.
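A rough sketch of that kind of tolerant extraction is shown below. The pattern and the function name hostnames_in_text are illustrative assumptions, not a complete URL grammar: the regex will also match dotted tokens that are not domains (for example a file name like report.pdf), so the output still needs suffix-aware validation.

```python
import re
from urllib.parse import urlsplit

# Rough pattern: optional http(s) scheme, dot-separated labels, an
# alphabetic final label, then any path/query up to whitespace or a bracket.
CANDIDATE = re.compile(
    r"(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}[^\s<>()\[\]]*",
    re.IGNORECASE,
)

def hostnames_in_text(text):
    hosts = []
    for match in CANDIDATE.finditer(text):
        # Drop trailing sentence punctuation and quotes.
        token = match.group().rstrip(".,;:!?\"'")
        # urlsplit only fills in .hostname when a scheme is present.
        if "://" not in token:
            token = "http://" + token
        host = urlsplit(token).hostname
        if host:
            hosts.append(host)
    return hosts

sample = (
    "See example.com, or visit https://www.shop.example.co.uk/cart?x=1. "
    "Also [docs](https://docs.example.org/start)!"
)
print(hostnames_in_text(sample))
```

The returned hostnames can then be passed to whatever root-domain parser the workflow uses.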

Try the Free URL to Domain Tool

Paste full URLs below and convert them into normalized root domains. The tool works in your browser and supports bulk input.

Frequently Asked Questions

What is the difference between a hostname and a root domain?

A hostname can include subdomains, such as blog.example.com. The root domain is the registered domain, such as example.com.

Does normalization remove tracking parameters?

Yes. Root domain extraction keeps only the domain, so the entire query string is discarded, including tracking parameters such as utm_source and fbclid.

Is root domain normalization safe for all use cases?

It is ideal for deduplication, SEO reporting, and prospect lists. If your workflow depends on individual hosts or subdomains, keep a hostname column alongside the root domain column.
