Extract URLs

Extract all URLs and links from text.

How to Use Extract URLs

  1. Paste your text containing URLs
  2. Click Extract URLs
  3. Copy or download the list of extracted URLs

About Extract URLs

Extract URLs finds and extracts all web URLs and links from any block of text using regex pattern matching. The tool detects URLs starting with http://, https://, or ftp:// and returns a deduplicated, one-per-line list that is ready to copy, analyze, or import into another tool.
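
The behavior described above can be approximated outside the browser. The following is a minimal Python sketch, not the tool's actual source: the pattern, the name `extract_urls`, and the choice to keep URLs in order of first appearance are all assumptions made for illustration.

```python
import re

# Assumed pattern: one of the three supported schemes, then every character
# up to the first whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s\"'<>]+")

def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    # dict.fromkeys drops exact duplicates while preserving insertion order.
    return list(dict.fromkeys(URL_PATTERN.findall(text)))

sample = '<a href="https://example.com/page">Link</a> and <img src="https://cdn.example.com/img.png">'
print("\n".join(extract_urls(sample)))
```

Because deduplication here compares whole strings, it matches the exact-match behavior the tool documents.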

The extractor works on any text source, including HTML page source, Markdown files, plain text documents, log entries, and data exports. It captures the full URL including query strings and fragments, making it suitable for link auditing and reference extraction.

All extraction runs locally in your browser with no server required. This tool is useful for harvesting links during web scraping, finding URLs in log files, extracting references from documents, and auditing links in HTML source code.

Key Features of Extract URLs

  • Extracts all URLs starting with http://, https://, or ftp://
  • Automatically deduplicates the extracted URL list
  • Captures full URLs including query strings and URL fragments
  • Outputs one URL per line for easy copying or importing
  • Works on plain text, HTML source, Markdown, and log files
  • Shows a count of unique URLs found
  • One-click copy button for the full extracted list
  • Runs entirely in-browser with no data transmission

Supported Formats

Input Formats

  • Plain text containing URLs
  • HTML source code with href and src attributes
  • Markdown files with inline and reference links
  • Log entries with request URLs

Output Formats

Deduplicated URL list (one per line)

Detects http://, https://, and ftp:// URLs only. Bare domain names without a protocol (e.g., example.com) are not detected. URLs are extracted as-is, including query parameters.

Examples

Extract links from HTML source code

Pull out all href and src URLs from an HTML page source for a link audit.

Input

<a href="https://example.com/page">Link</a> and <img src="https://cdn.example.com/img.png">

Output

https://example.com/page
https://cdn.example.com/img.png

Extract URLs from log file entries

Find all requested URLs in a server access log for analysis.

Input

2025-01-01 GET https://api.example.com/users 200
2025-01-01 POST https://api.example.com/orders 201

Output

https://api.example.com/users
https://api.example.com/orders

Common Use Cases

  • Extracting all links from an HTML page source for link auditing
  • Finding URLs in server access logs for traffic analysis
  • Pulling references from Markdown or plain text documents
  • Harvesting links from scraped web page content for further crawling
  • Extracting API endpoint URLs from documentation or log files
  • Collecting all external links in a document for broken link checking

Troubleshooting

Bare domain names like "example.com" not being detected

Solution

The tool only detects URLs with explicit protocol prefixes (http://, https://, ftp://). Bare domain names without a protocol are not extracted. Add "https://" to bare domains before extracting if needed.
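
One way to add the prefix in bulk is a small pre-processing pass over the text before pasting it into the tool. The sketch below is a hypothetical heuristic, not part of Extract URLs: `BARE_DOMAIN` and `add_default_scheme` are illustrative names, and the pattern will also match things that merely look like domains (for example, a filename such as report.pdf), which is the kind of false positive the tool avoids by requiring a protocol.

```python
import re

# Hypothetical heuristic: a dotted hostname ending in a 2+ letter label,
# optionally followed by a path, query, or fragment.
BARE_DOMAIN = re.compile(r"(?:[a-z0-9-]+\.)+[a-z]{2,}(?:[/?#]\S*)?", re.IGNORECASE)

def add_default_scheme(text, scheme="https"):
    """Prefix whitespace-separated tokens that look like bare domains."""
    def fix(match):
        token = match.group(0)
        # Leave tokens that already carry a scheme, and anything that is
        # not entirely domain-shaped (emails, words, trailing punctuation).
        if "://" in token or not BARE_DOMAIN.fullmatch(token):
            return token
        return f"{scheme}://{token}"
    # Rewriting only the non-whitespace runs keeps the original spacing.
    return re.sub(r"\S+", fix, text)

print(add_default_scheme("see example.com/docs and https://example.org"))
```

Tokens with trailing punctuation (such as "example.com,") are left untouched by this sketch, so review the result before extracting.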

URLs being cut off at special characters like quotes or angle brackets

Solution

The regex stops the URL at common delimiter characters including quotes, angle brackets, and spaces. URLs that end at these characters are extracted correctly. If a URL is being cut short unexpectedly, check whether the URL itself contains one of these characters unencoded; percent-encoding it in the source text (for example, %20 for a space) keeps the URL intact.

Duplicate URLs still appearing in the output

Solution

Deduplication is exact-match: two URLs that differ by a trailing slash or by query parameter order are treated as different. If you need semantic deduplication, normalize the URLs in the source text before extracting, or post-process the extracted list.
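
Semantic deduplication can be done as a post-processing step on the extracted list. The following Python sketch uses the standard `urllib.parse` module; the function names and the specific normalization rules (lowercase scheme and host, drop trailing slashes, sort query parameters) are assumptions chosen for illustration, and some servers do treat "/path" and "/path/" as different resources.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_url(url):
    """Return a canonical form of url for comparison purposes."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    # Simplification: lowercases the whole authority, including any userinfo.
    netloc = parts.netloc.lower()
    # Assumption: "/path/" and "/path" refer to the same resource.
    path = parts.path.rstrip("/") or "/"
    # Sort parameters so ordering does not matter; values are re-encoded.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, netloc, path, query, parts.fragment))

def dedupe_normalized(urls):
    """Keep the first original URL seen for each normalized form."""
    seen = {}
    for url in urls:
        seen.setdefault(normalize_url(url), url)
    return list(seen.values())

urls = [
    "https://example.com/a/",
    "https://example.com/a",
    "https://example.com/a?x=1&y=2",
    "https://example.com/a?y=2&x=1",
]
print("\n".join(dedupe_normalized(urls)))
```

The sketch returns the original spelling of each URL rather than the normalized one, so the output can still be matched against the source text.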

Frequently Asked Questions

Does it find bare domain names without "http://"?

No. The tool only detects URLs that start with http://, https://, or ftp://. Bare domain names like "example.com" without a protocol prefix are not extracted. This is by design to avoid false positives.

Does it extract URLs from HTML attributes?

Yes. The regex scans for URL patterns regardless of surrounding context, so it detects URLs in href, src, action, data-src, and other attributes as well as plain text.

Are duplicate URLs removed?

Yes. Exact-duplicate URLs are automatically removed. Two URLs that differ only by a trailing slash, query parameter order, or letter case are treated as different URLs; only exact string matches are considered duplicates.

Does it capture query strings and URL fragments?

Yes. The full URL is extracted including query strings (?param=value) and fragments (#section). The URL ends at the first whitespace or common delimiter character like a quote or angle bracket.

Does it work on Markdown files?

Yes. URLs in Markdown links ([text](https://example.com)), reference links, and bare URLs in Markdown text are all detected as long as they start with the recognized protocol prefixes.

Is there a text length limit?

No fixed limit is imposed. The tool processes text locally in your browser, so you can paste entire HTML pages, logs, or large documents; very large inputs are bounded only by the memory available to the browser.

Is my text stored or sent to a server?

No. All extraction runs in client-side JavaScript. Your text is never uploaded, stored, or transmitted to any server.

Can it extract FTP URLs?

Yes. URLs starting with ftp:// are detected and extracted along with http:// and https:// URLs. Other custom protocols (custom://, myapp://) are not detected.
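
If you need custom protocols, one workaround is to run a similar pattern yourself with a configurable scheme list. This is a hypothetical Python sketch, not a feature of Extract URLs; `build_url_pattern` and the `myapp` scheme are illustrative names, and the delimiter set is assumed from the behavior described on this page.

```python
import re

def build_url_pattern(schemes):
    """Compile a pattern matching URLs that start with any given scheme."""
    # re.escape guards against schemes containing regex metacharacters
    # such as "+" or "." (for example, "git+ssh").
    alternation = "|".join(re.escape(s) for s in schemes)
    # Assumed delimiters: whitespace, quotes, and angle brackets end the URL.
    return re.compile(rf"(?:{alternation})://[^\s\"'<>]+")

pattern = build_url_pattern(["http", "https", "ftp", "myapp"])
print(pattern.findall("open myapp://settings/profile or https://example.com"))
```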