Last month, I reviewed a website and found 17 broken links that could affect user experience and SEO rankings. Using a broken link checker tool available online, I scanned 50 pages of the website and found 254 links.
To scan more than 50 pages, I would have had to pay a subscription fee of ₹375. Rather than paying for an expensive tool, I decided to build my own broken link checker. In just one hour, I built a tool that solved the problem and taught me a lot about CORS restrictions and Node.js servers.
Hello everyone, I am Hemanta. Today, I'll show you exactly how to build a broken link checker tool and how it actually works behind the scenes.
Broken Link: Imagine sending a letter to an address that doesn't exist anymore. Then the post office returns the letter, just like a broken link. It's a URL that points to content that's gone, unreachable, or deleted.
Why I Built This SEO Tool
The first reason is, of course, the subscription fee. Also, instead of relying on third-party broken link checkers, I wanted to learn how the checking process actually works. The main challenge is collecting the website's content, because CORS (Cross-Origin Resource Sharing) blocks direct requests from the browser.
To solve this problem, I set up a Node.js proxy server, which acts as an intermediary between the tool and the target website.
By building this yourself, you'll learn:
- How web scraping actually works
- How to bypass CORS safely
- How to parse HTML with DOMParser()
- Node.js server implementation
How My Broken Link Checker Works
Before we build the tool step-by-step, let me show you exactly how the entire system and server works together. Understanding this will make the code much clearer.
Step 1: User Enters a Website URL
This tool displays an input box where the user pastes a website URL (e.g. https://codehemu.in). When they click "Scan Links", the JavaScript reads the URL and starts the process.
Step 2: The Browser Sends a Request to the Proxy Server
At the beginning of the process, the browser asks the proxy server to fetch the HTML from the target website.
Here is how the browser sends the request:
fetch('http://localhost:3000/?url=https://example.com')
Step 3: Proxy Server Fetches the Website
The proxy server requests the page from the target website's server and returns the HTML content. Since this is a server-to-server request, CORS doesn't apply.
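The proxy we'll build later doesn't hand back raw HTML. As the server code below will show, it wraps the fetched page in a small JSON object, roughly like this (the HTML is shortened here for readability):
{ "contents": "<!DOCTYPE html><html> ... </html>" }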
Step 4: Extracts All Links
This tool uses DOMParser() to read the HTML and find every link on the page.
The tool doesn't stop at the URL given by the user. It follows each page's internal links and checks them one by one.
Each time a page is fetched through the proxy, the HTML is parsed and all anchor links are collected.
For each link, the tool tests the response. Any link that fails to load or returns an error code is highlighted in red, while links that return a normal page are shown in green.
What Do You Need?
To create this tool, you need to have basic knowledge of HTML, JavaScript, and CSS. But don't worry, every step will be explained.
You will also need a few things installed on your computer:
- VS Code Editor:
Visual Studio (VS) Code is a free code editor. It will help you with coding and project management.
- Node.js:
It allows you to run JavaScript outside of a browser. We will use Node.js to run the CORS proxy server.
How I Built This SEO Tool in 1 Hour
It took me 1 hour to create this tool, but if you follow the steps below, it may take you only 15 to 20 minutes.
So, let's start creating the broken link checker tool.
Project Setup
- Create a New Folder: To create the Broken Link Checker Tool, start by creating a folder on your computer. You can name the folder broken-link-checker.
- Create Files: Create 4 files inside the broken-link-checker folder. These are:
- index.html
- style.css
- script.js
- server.js
- Open Code Editor: Next, open this project folder in your VS Code editor.
Building the Server
The main challenge in building this tool is getting the HTML from the target website. You may be wondering: can't we just fetch websites directly with JavaScript?
Good idea, but here's what happens if you try:
// This JavaScript looks simple, right?
fetch('https://www.codehemu.in/')
.then(response => response.text())
.then(html => console.log(html))
But you get this error:
Access to fetch at 'https://www.codehemu.in/' from origin 'http://localhost:3000'
has been blocked by CORS policy: No 'Access-Control-Allow-Origin'
header is present on the requested resource.
GET https://www.codehemu.in/ net::ERR_FAILED 200 (OK)
This error comes from a browser security rule called CORS, which stops a web page from reading responses from other websites unless those sites explicitly allow it.
The CORS feature is actually good for browser security, but it's a problem for this tool.
The CORS Solution: A Proxy Server
Since the browser can't fetch the website's HTML directly, we'll create a middleman (a proxy server) that does the fetching for the tool.
Here's the key idea behind the proxy server: CORS only blocks browser-to-server requests. Server-to-server requests are perfectly fine!
Set Up the Node.js Proxy Server
Open the server.js file in your VS Code editor and paste the following JavaScript code:
const httpModule = require("http");
const httpsModule = require("https");
const { URL } = require("url");
function fetchWebContent(targetWebsite, callback) {
const client = targetWebsite.startsWith("https") ? httpsModule : httpModule;
client.get(targetWebsite, {
headers: {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
}
}, (resp) => {
let PageContent = "";
resp.on("data", chunk => PageContent += chunk);
resp.on("end", () => callback(null, PageContent));
}).on("error", err => callback(err, null));
}
const server = httpModule.createServer((req, res) => {
const reqUrl = new URL(req.url, `http://${req.headers.host}`);
const targetWebsite = reqUrl.searchParams.get("url");
if (!targetWebsite) {
res.writeHead(400, { "Content-Type": "application/json" });
return res.end(JSON.stringify({ error: "Please add a ?url= parameter" }));
}
fetchWebContent(targetWebsite, (err, result) => {
if (err) {
res.writeHead(500, { "Content-Type": "application/json" });
return res.end(JSON.stringify({ error: "Failed to fetch target Website"}));
}
res.writeHead(200, {
"Access-Control-Allow-Origin": "*",
"Content-Type": "application/json"
});
res.end(JSON.stringify({ contents: result }));
});
});
server.listen(3000, () => {
console.log("Proxy running - http://localhost:3000/?url=https://example.com");
});
Here's what each part does:
- Importing Required Modules
const httpModule = require("http");
const httpsModule = require("https");
const { URL } = require("url");
- Imports the Node.js modules used to make HTTP or HTTPS requests.
- Imports the URL class so that the server can read query parameters like ?url=.
- The fetchWebContent() Function
function fetchWebContent(targetWebsite, callback) {
const client = targetWebsite.startsWith("https") ? httpsModule : httpModule;
This picks the correct module (httpModule or httpsModule) based on whether the URL starts with https.
client.get(targetWebsite, {
headers: {
"User-Agent": "Mozilla/5.0..."
}
}, (resp) => {
Why use the User-Agent header? Many websites check this header to identify the browser making the request. If you don't set a User-Agent, the target website may treat your proxy server as a bot and block the request.
resp.on("data", chunk => PageContent += chunk);
resp.on("end", () => callback(null, PageContent));
resp.on("data"): Collects chunks of data from the target website as they arrive.
resp.on("end"): When data collection is complete, it returns the full page content through the callback function.
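To make the callback pattern concrete, here is a minimal usage sketch you could paste temporarily at the bottom of server.js (the URL is just an example):
fetchWebContent("https://example.com", (err, html) => {
  // err is set if the request failed; otherwise html holds the full page content
  if (err) return console.error("Fetch failed:", err.message);
  console.log("Received", html.length, "characters of HTML");
});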
- Creating the Proxy Server
const server = httpModule.createServer((req, res) => {
const reqUrl = new URL(req.url, `http://${req.headers.host}`);
const targetWebsite = reqUrl.searchParams.get("url");
This reads the ?url=... parameter from the request.
Suppose someone visits http://localhost:3000/?url=https://codehemu.in/. Then targetWebsite will be https://codehemu.in/.
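If you want to see this step in isolation, here's a tiny sketch you can run in Node on its own (the request path is hypothetical):
// Build the same URL object the server builds, then read the query parameter
const reqUrl = new URL("/?url=https://codehemu.in/", "http://localhost:3000");
console.log(reqUrl.searchParams.get("url")); // https://codehemu.in/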
- The CORS Headers
res.writeHead(200, {
"Access-Control-Allow-Origin": "*",
"Content-Type": "application/json"
});
The Access-Control-Allow-Origin: * header tells the browser that any web page, including the one running our tool, is allowed to read this response. If this header were missing, the browser would block our front end from accessing the proxy's data.
- Starting the Server
server.listen(3000, () => {
console.log("Proxy running - http://localhost:3000/?url=https://example.com");
});
This starts listening on port 3000.
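Before moving on, you can give the proxy a quick test. This is just a sketch: run node server.js, then paste the snippet below into your browser's developer console (example.com is only a placeholder):
fetch('http://localhost:3000/?url=https://example.com')
  .then(res => res.json())
  .then(data => console.log(data.contents.slice(0, 200))); // prints the first 200 characters of the fetched HTML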
Building the Crawler
Now that the proxy server is ready, the next step is to create a crawler that finds and checks all the links on a website.
Open the script.js file in your VS Code editor and paste the following JavaScript code:
const corsProxy = 'http://localhost:3000/?url=';
const resultsDiv = document.getElementById('results');
const visited = new Set();
async function crawl(startUrl) {
const queue = [startUrl];
resultsDiv.innerHTML = `<p>Starting crawl...</p>`;
while (queue.length > 0) {
const url = queue.shift();
if (visited.has(url)) continue;
visited.add(url);
try {
const response = await fetch(corsProxy + encodeURIComponent(url));
const htmlContent = await response.json();
if (!htmlContent.contents) {
resultsDiv.innerHTML += `
<p class="broken">Broken: ${url}</p>`;
continue;
}
resultsDiv.innerHTML += `
<p class="valid">Valid: ${url}</p>`;
if (!url.startsWith(startUrl)) continue;
const parser = new DOMParser();
const doc = parser.parseFromString(htmlContent.contents, 'text/html');
const links = Array.from(doc.querySelectorAll('a[href]'))
.map(a => new URL(a.getAttribute('href'), url).href)
.filter(link => link.startsWith('http'));
queue.push(...links);
} catch (err) {
resultsDiv.innerHTML += `
<p class="broken">Error: ${url} - ${err.message}</p>`;
}
}
resultsDiv.innerHTML += `<p>Finished crawling. Total scanned: ${visited.size}</p>`;
}
document.getElementById('checkLinks').addEventListener('click', () => {
const startUrl = document.getElementById('urlInput').value.trim();
if (!startUrl) return alert('Please enter a valid URL.');
visited.clear();
crawl(startUrl);
});
A Closer Look at the Crawler Code:
- The Queue System
const queue = [startUrl];
while (queue.length > 0) {
const url = queue.shift();
The loop starts with the user's URL in the queue. On each iteration, shift() removes and returns the first URL in the list. Because newly discovered links are added to the end of the queue, the crawler processes URLs in the same order they are found, as the small example below shows.
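Here is a small sketch of how the queue evolves, using made-up URLs:
const queue = ['https://example.com/'];
console.log(queue.shift()); // 'https://example.com/' — the start URL is scanned first
// Suppose the home page contains two internal links; they go to the back of the queue
queue.push('https://example.com/about', 'https://example.com/blog');
console.log(queue.shift()); // 'https://example.com/about' — processed in the order it was found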
- The Visited Set
if (visited.has(url)) continue;
visited.add(url);
This pattern tracks previously processed links, prevents duplicate scans, and avoids infinite loops in the crawler.
visited is a Set() that keeps track of the URLs already processed. If the current url is in visited, the loop skips the rest of the current iteration and moves on to the next URL.
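A tiny demo of the skip logic, with a hypothetical URL:
const visited = new Set();
const url = 'https://example.com/contact';
console.log(visited.has(url)); // false — first time seeing it, so we scan it
visited.add(url);
console.log(visited.has(url)); // true — any later occurrence of this URL gets skipped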
- Fetching Through Proxy
const response = await fetch(corsProxy + encodeURIComponent(url));
const htmlContent = await response.json();
This fetches the webpage content through the Node.js proxy server.
encodeURIComponent(): Safely encodes the URL so it can be passed as a query parameter.
response.json(): Parses the JSON object returned by the proxy server.
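To see why encodeURIComponent() matters, consider a target URL that has its own query string (a hypothetical example):
const corsProxy = 'http://localhost:3000/?url=';
const target = 'https://example.com/?page=2';
console.log(corsProxy + encodeURIComponent(target));
// http://localhost:3000/?url=https%3A%2F%2Fexample.com%2F%3Fpage%3D2
// Without encoding, "?page=2" would look like a separate query parameter to the proxy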
- Parsing HTML with DOMParser
const parser = new DOMParser();
const doc = parser.parseFromString(htmlContent.contents, 'text/html');
DOMParser(): It creates a parser to read HTML content.
This code converts the fetched HTML content into a document.
- Extracting Links
const links = Array.from(doc.querySelectorAll('a[href]'))
.map(a => new URL(a.getAttribute('href'), url).href)
.filter(link => link.startsWith('http'));
doc.querySelectorAll('a[href]'): Finds every anchor element (<a href="...">) in the document.
.map(...): Converts each href into an absolute URL, resolving relative links against the current page's URL.
.filter(...): Removes non-HTTP links (such as mailto: or tel:) from the results.
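Here is a short sketch of how those steps behave, with made-up hrefs:
const base = 'https://example.com/blog/';
console.log(new URL('/about', base).href);      // https://example.com/about
console.log(new URL('post-1.html', base).href); // https://example.com/blog/post-1.html
console.log(new URL('mailto:hi@example.com', base).href); // "mailto:hi@example.com" — doesn't start with 'http', so it gets filtered out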
Building the Interface
At this point, our proxy server and crawler logic is ready. Now we need an interface so users can interact with our tool.
Step 1: Update the HTML Structure
- Open the index.html file in your VS Code editor.
- You can type the markup manually or copy the HTML code below.
<!DOCTYPE html>
<html>
<head>
  <title>Broken Link Checker Tool</title>
  <link rel="stylesheet" href="style.css">
</head>
<body>
  <h1>Broken Link Checker</h1>
  <input type="text" id="urlInput" placeholder="https://example.com">
  <button id="checkLinks">Scan Links</button>
  <div id="results"></div>
  <script src="script.js"></script>
</body>
</html>
What Each HTML Element Does:
<h1>Broken Link Checker</h1>: Displays the main heading of the tool.
<input id="urlInput" ...>: An input element where the user enters the URL of the website to scan for broken links.
<button id="checkLinks">Scan Links</button>: A button that starts the scanning process when clicked.
<div id="results"></div>: An empty container where the tool will display the scan results.
Here's our tool's progress so far:
Step 2: Style with CSS
Open the style.css file in your VS Code editor and paste the following code:
body {
display: flex;
flex-direction: column;
font-family: sans-serif;
max-width: 800px;
margin: 40px auto;
padding: 20px;
background-color: #f9f9f9;
}
h1 {
text-align: center;
color: #2c3e50;
}
input[type="text"] {
padding: 10px;
font-size: 1em;
border: 1px solid #ccc;
border-radius: 5px;
margin-right: 10px;
}
button {
padding: 10px 20px;
font-size: 1em;
margin: 10px;
border: none;
background-color: #56bbff;
color: white;
border-radius: 5px;
cursor: pointer;
}
button:hover {
background-color: #2980b9;
}
#results {
margin-top: 20px;
background-color: #fff;
padding: 20px;
border-radius: 10px;
box-shadow: 0 0 10px #0000001a;
}
#results p {
line-height: 1.5;
font-size: 1em;
}
.valid {
color: green;
}
.broken {
color: red;
}
What Each CSS Style Does:
body: Styles the document body. It organizes everything in a column and gives the page a light gray background.
margin: 40px auto;: Centers the main content of the tool.
h1: Centers the title and changes the text color.
input[type="text"]: Styles the URL input box to make typing easier.
button: Styles the scan button. Its background color is blue (hex code #56bbff).
background-color: #2980b9;: A slightly darker shade used for hover feedback.
.valid and .broken: Green for working links and red for broken links, so the results are clear at a glance.
Here's our tool's progress so far:
Testing and Results
- Start the proxy server (Node.js): Open the VS Code terminal and run the following command:
node server.js
You should see:
Proxy running - http://localhost:3000/?url=https://example.com
This starts the CORS proxy server required for fetching websites.
- Open your HTML file (index.html) with Live Server in VS Code.
- Try scanning:
- Enter URL: https://www.codehemu.com
- Click on the "Scan Links" button
- See live results!
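As the crawl runs, the results area fills with one line per link, using the format from script.js. The URLs and count below are only illustrative:
Valid: https://www.codehemu.com/
Valid: https://www.codehemu.com/about
Broken: https://www.codehemu.com/old-post
Finished crawling. Total scanned: 3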
Final Tool
Congratulations! You've successfully created a broken link checker tool using HTML, CSS, and JavaScript with a Node.js proxy server.
This project is a great way to fetch content from external websites and parse links. You can further enhance this tool by adding features like downloading broken link reports or scanning for specific error codes.
Keep exploring and happy coding!