Overwrite robots.txt with CloudFlare workers

CloudFlare workers handle HTTP requests and overwrite responses on the fly. Use them to overwrite the robots.txt file and keep crawlers away from your self-hosted instances.

Robots.txt is a text file that follows the robots exclusion standard to tell well-known internet crawlers not to scan specific locations on your website. It serves as a guideline for robots that respect it; it does not actually block crawlers and scrapers that ignore it. Robots.txt can also carry directives such as a crawl-delay and the location of a site's sitemap.
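For example, a robots.txt that keeps crawlers out of one directory, asks them to slow down, and points to a sitemap could look like this (the directory and the sitemap URL are just placeholders):

User-agent: *
Disallow: /private/
Crawl-delay: 10
Sitemap: https://www.example.com/sitemap.xml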

Sometimes you cannot easily change the contents of the robots.txt file. One scenario is a self-hosted instance that you don't want showing up in search results, but where it isn't obvious where the file comes from. Libreddit, the self-hosted Reddit privacy frontend, builds its robots.txt during compilation, and less technically inclined users won't want to mess with the code and risk breaking something.

However, if you use CloudFlare, there's a solution. CloudFlare acts as a reverse proxy and offers serverless workers that handle HTTP requests and overwrite responses on the fly. The free tier includes a generous allowance of 100,000 worker requests per day, which is plenty for something this simple.

You’ll need a CloudFlare account and a domain operating through it. Navigate to the workers section in the sidebar, create a new service, name it, and go to edit its code. The gif below shows the service creation and edit process.

Now paste the following code into the editor on the left side. The code is simple: it takes a request and returns a plain-text response containing the `textContent` variable. Edit the content inside the backtick quotes to set your robots.txt content.

addEventListener("fetch", event => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  const init = {
    headers: {
      'content-type': 'text/plain;charset=UTF-8',
      'cache-control': 'max-age=31536000',
      'X-Frame-Options': 'SAMEORIGIN',
      'Referrer-Policy': 'no-referrer',
      'content-security-policy': 'upgrade-insecure-requests',
      'X-XSS-Protection': '1; mode=block',
    },
  }
  return new Response(textContent,init)
}

const textContent =  `User-agent: *
Disallow: /
`
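The snippet above uses the older Service Worker syntax. If your editor starts you off with the newer ES module format instead, a minimal sketch of the same idea, keeping just the content-type header, would be:

export default {
  async fetch(request) {
    // Same robots.txt body as above: disallow everything for every crawler
    const textContent = `User-agent: *
Disallow: /
`
    return new Response(textContent, {
      headers: { 'content-type': 'text/plain;charset=UTF-8' },
    })
  },
}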

Send a request to the service to see if it returns the text. If it works as expected, save and deploy.
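For example, you can hit the worker from a terminal with curl; the URL below is a placeholder, so use the preview URL shown in your editor or your worker's workers.dev address:

curl https://robots-worker.your-subdomain.workers.dev/robots.txt

It should print the contents of the textContent variable.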

Screenshot: the worker responding with the contents of robots.txt

Now you need to create a route so the worker handles your domain's robots.txt file. Navigate to the 'Triggers' section of the worker you just created; it should look like the image below. Enter the route to your robots.txt file as '*.yourdomain.com/robots.txt' and choose the matching domain from the drop-down.

Screenshot: adding a route for your domain's robots.txt file as a trigger for the worker
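A couple of illustrative route patterns (yourdomain.com is a placeholder, and the exact matching rules are described in CloudFlare's routes documentation):

*.yourdomain.com/robots.txt    matches robots.txt on subdomains such as libreddit.yourdomain.com
yourdomain.com/robots.txt      matches robots.txt on the apex domain itself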

Once the route is set, navigate to your robots.txt URL to see the change. Try it in a private window so a cached copy of your previous robots.txt doesn't get in the way. That's it: you now have a robots.txt that you can edit right from your CloudFlare dashboard. Since CloudFlare acts as a reverse proxy with a CDN all over the world, these requests take milliseconds, and the new file will keep rule-abiding crawlers out of your website.
