The robots.txt file is a powerful but often overlooked tool that gives website owners control over how search engine bots interact with their site. Whether you're managing SEO strategy or protecting sensitive content, understanding how robots.txt works is essential.
- What Is Robots.txt
- Why Is Robots.txt Important
- How Does Robots.txt Work
- Key Robots.txt Directives
- Example Robots.txt File
- How to View a Website’s Robots.txt File
- How to Create a Robots.txt File
What Is Robots.txt
The robots.txt file is a plain text file placed in the root directory of your website. It gives search engine crawlers (also known as bots or spiders) instructions about which parts of your site they are allowed to crawl and which parts they should avoid.
Why Is Robots.txt Important
A properly configured robots.txt file helps ensure your website is crawled efficiently and that only relevant or public-facing content is indexed. Here are a few common uses:
- Optimize Crawling
Direct crawlers to focus on important pages and avoid wasting resources on low-priority or duplicate content.
- Block Unwanted Pages
Prevent search engines from accessing private, internal, or temporary pages, such as login portals or staging environments.
- Restrict Access to Resources
Prevent crawlers from fetching non-HTML assets like images, PDFs, or videos, if desired.
How Does Robots.txt Work
When a search engine crawler visits your site, the first thing it checks is the robots.txt file, located at:
https://yourdomain.com/robots.txt
The crawler reads the rules defined in this file to determine which URLs it is allowed to access.
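To see this behavior from the crawler's side, you can use Python's built-in robotparser module, which fetches a robots.txt file and applies its rules the way a well-behaved crawler would. This is only an illustrative sketch; yourdomain.com and the paths are placeholders:

from urllib import robotparser

# Fetch and parse a site's robots.txt the way a polite crawler would.
# "yourdomain.com" is a placeholder; substitute the site you want to check.
rp = robotparser.RobotFileParser()
rp.set_url("https://yourdomain.com/robots.txt")
rp.read()

# Ask whether a given user-agent is allowed to crawl a given URL.
print(rp.can_fetch("Googlebot", "https://yourdomain.com/admin/"))
print(rp.can_fetch("*", "https://yourdomain.com/blog/"))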
Key Robots.txt Directives
Here’s a breakdown of the most common directives used in a robots.txt file:
User-agent
This directive specifies which crawler the rules apply to. You can target specific bots (e.g., Googlebot) or apply the rules to all bots using an asterisk (*).
User-agent: Googlebot
Disallow
Prevents bots from crawling certain directories or pages.
Disallow: /admin/
To allow access to everything (no restrictions), leave the value blank:
Disallow:
Allow
Overrides a Disallow rule for specific files or directories. It is supported by major crawlers such as Googlebot and Bingbot, but not necessarily by every bot.
Disallow: /blog/
Allow: /blog/featured-article.html
Sitemap
Specifies the URL of your sitemap to help search engines discover your content more efficiently.
Sitemap: https://yourdomain.com/sitemap.xml
Note: If the sitemap URL is not included in the robots.txt file, Ubersuggest will report a “Missing Sitemap” issue in the Site Audit.
This happens because Ubersuggest looks for the sitemap reference specifically within the robots.txt file. To avoid this alert, make sure the sitemap URL is listed there.
Crawl-delay
Tells bots to wait a specified number of seconds between crawl requests, helping reduce server load.
Crawl-delay: 10
Note: Googlebot and the Ubersuggest bot do not support Crawl-delay. For Google, adjust crawling through the crawl rate settings in Google Search Console instead. Other crawlers, such as Bing and Yandex, still honor this directive.
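Because Googlebot ignores Crawl-delay, one common approach is to scope the directive to a bot that does honor it. A brief illustration, with Bingbot as the target and the delay value chosen arbitrarily:

User-agent: Bingbot
Crawl-delay: 10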
Example Robots.txt File
Here’s an example of a more complete robots.txt setup:
User-agent: Googlebot
Disallow: /clients/
User-agent: *
Disallow: /archive/
Disallow: /support/
Sitemap: https://yourdomain.com/sitemap.xml
This configuration blocks Googlebot from the /clients/ directory while blocking all other bots from /archive/ and /support/. Because a crawler follows only the most specific group that matches it, Googlebot ignores the rules in the * group here. The file also provides the sitemap location to help with indexing.
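If you want to sanity-check a configuration like this before publishing it, one option is to feed it to Python's built-in robotparser and test a few URLs. The snippet below reuses the example file above; yourdomain.com remains a placeholder:

from urllib import robotparser

# The example robots.txt from above, as a string.
rules = """
User-agent: Googlebot
Disallow: /clients/

User-agent: *
Disallow: /archive/
Disallow: /support/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot follows only its own group; all other bots fall back to the * group.
print(rp.can_fetch("Googlebot", "https://yourdomain.com/clients/"))      # False
print(rp.can_fetch("Googlebot", "https://yourdomain.com/archive/"))      # True
print(rp.can_fetch("SomeOtherBot", "https://yourdomain.com/archive/"))   # False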
How to View a Website’s Robots.txt File
To check the robots.txt file of any website, simply go to:
https://example.com/robots.txt
This file is publicly accessible by design and is typically located in the root directory of the site.
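If you prefer to pull the file from a script rather than a browser, a couple of lines of Python from the standard library will do it (example.com is just a placeholder for the domain you want to inspect):

from urllib.request import urlopen

# Download and print a site's robots.txt; replace example.com with a real domain.
with urlopen("https://example.com/robots.txt") as response:
    print(response.read().decode("utf-8"))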
How to Create a Robots.txt File
If You're Managing a Custom Website
If your website is custom-built or you're managing the server directly, creating a robots.txt file is straightforward:
- Use a plain text editor (like Notepad, Sublime Text, or VS Code). Avoid using rich-text editors like Microsoft Word, as they can add formatting that breaks the file.
- Name the file exactly robots.txt.
- Add your directives, using the format shown above (a minimal starter file is sketched just after this list).
- Upload the file to the root directory of your domain (e.g., https://yourdomain.com/robots.txt).
- You can also use online tools to generate a valid robots.txt file if you're unsure where to start.
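If you just want a safe starting point, a minimal file like the one below allows crawlers to access everything except a single private directory and points them at your sitemap. The directory path and domain are placeholders to adapt to your own site:

User-agent: *
Disallow: /private/
Sitemap: https://yourdomain.com/sitemap.xml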
If You're Using a CMS (e.g., WordPress, Shopify, Wix)
For CMS platforms, the method for editing or creating a robots.txt file may vary or be limited:
- WordPress: If you're using an SEO plugin like Yoast or Rank Math, you can edit your robots.txt file directly within the plugin's settings. Without a plugin, WordPress auto-generates a virtual file that may require custom development to override.
- Shopify: Shopify auto-generates a robots.txt file that can’t be edited directly. However, you may be able to request changes via their support or use theme customizations (depending on your plan).
- Wix / Squarespace / Others: Many website builders provide limited access to robots.txt. You’ll need to check their documentation or reach out to support for guidance on how to make changes, if supported at all.
Tip: If you're using a CMS and aren't sure how to proceed, it's best to contact your platform's support team or consult their help center for the recommended method to create or edit your robots.txt file.
While robots.txt helps control crawling behavior, it does not secure your content.
If there are pages you don’t want anyone to access, make sure they are protected through authentication or server-level permissions, not just by listing them in robots.txt.
If you’re experiencing any issues with your robots.txt settings or configuring allowances for Ubersuggest, feel free to reach out to our support team. Simply click the “Help” button located at the bottom-right corner of this article, and we’re here to assist you.