Deactivate search engine crawling

Prevent search engines from indexing pages, folders, your entire site, or just your webflow.io subdomain.

You can control which pages search engines crawl on your site in 2 ways: by creating a robots.txt file or by adding a noindex tag to specific pages. This lets you stop search engines from crawling and indexing particular pages, folders, your entire site, or your webflow.io subdomain. This is useful for keeping pages, such as your site’s 404 page, out of search results.

Important: Your site’s content may still get indexed, even if it hasn’t been crawled. This happens when a search engine already knows about your content, either because it was previously published or because other sites link to it. To make sure a previously indexed page is removed from the index, don’t block it in the robots.txt file, because a crawler that can’t fetch the page never sees its noindex tag. Instead, use the Sitemap indexing toggle to remove that content from Google’s index.

In this tutorial:

  1. How to deactivate indexing of the Webflow subdomain
  2. How to activate or deactivate indexing of site pages
  3. Best methods for privacy protection
  4. FAQ and tips for problem solving

How to deactivate indexing of the Webflow subdomain

To prevent Google and other search engines from indexing your website’s webflow.io subdomain, disable indexing in your Site settings.

  1. Access Site settings > SEO tab > Indexing section
  2. Toggle Disable Webflow subdomain indexing to “Yes” 
  3. Click Save changes and publish your website

This action will publish a specific robots.txt solely on the subdomain, instructing search engines to disregard this domain.

Note: To prevent search engines from indexing the Webflow subdomain, you need a Site plan or a paid Workspace. Get more information about Site and Workspace plans.

How to activate or deactivate indexing of site pages

There are 2 ways to deactivate indexing of site pages:

  • By utilizing the Sitemap indexing toggle in Page settings
  • By generating a robots.txt file

Note that if you deactivate indexing of a site page through a robots.txt file, the page will still be included in your website’s auto-generated sitemap (if the sitemap is enabled). Likewise, if you’ve previously added a noindex tag to a site page via custom code, the page will still be part of your website’s auto-generated sitemap (unless you switch off the Sitemap indexing toggle).

How to deactivate indexing of site pages with the Sitemap indexing toggle

Disabling indexing of a static site page with the Sitemap indexing toggle prevents the page from being indexed by search engines and excludes it from your website’s sitemap. You can only disable indexing using the toggle if your website’s auto-generated sitemap is enabled.

Note: Switching off the Sitemap indexing toggle adds <meta content="noindex" name="robots"> to your site page, which tells search engines not to index it.
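To confirm that a published page actually carries the noindex tag, you can inspect its HTML. The sketch below uses only Python's standard library; the class and function names (`RobotsMetaFinder`, `has_noindex`) are illustrative assumptions and not part of Webflow, and you would pass in the HTML source of your own published page.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # A robots meta tag may list several comma-separated directives.
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


sample = '<head><meta content="noindex" name="robots"></head>'
print(has_noindex(sample))
```

This only checks the markup of a page you already have; it does not tell you whether a search engine has processed the tag yet.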

To stop search engines from indexing specific site pages:

  1. Access the page you wish to prevent Google from indexing
  2. Go to Page settings > SEO settings
  3. Switch off Sitemap indexing
  4. Publish your website

How to enable indexing of site pages with the Sitemap indexing toggle

To permit search engines to index specific site pages:

  1. Navigate to the page you wish to allow Google to index
  2. Visit Page settings > SEO settings
  3. Enable Sitemap indexing
  4. Publish your website

How to create a robots.txt file 

The robots.txt file is commonly used to list the URLs on a website that search engines should not crawl. You can also include your website’s sitemap in the robots.txt file to tell search engine crawlers which content they should crawl.

Similar to a sitemap, the robots.txt file lives in the top-level directory of your domain. Once you set it up in your Site settings, Webflow will generate the /robots.txt file for your website.

To generate a robots.txt file:

  1. Navigate to Site settings > SEO tab > Indexing section
  2. Include the desired robots.txt rule(s)
  3. Click Save changes and publish your website

Important: Your site’s content may still get indexed, even if it hasn’t been crawled. This happens when a search engine already knows about your content, either because it was previously published or because other sites link to it. To make sure a previously indexed page is removed from the index, don’t add it to the robots.txt file. Instead, use the Sitemap indexing toggle to remove that content from Google’s index.

Robots.txt rules

You can incorporate any of these rules in your robots.txt file.

  • User-agent: * means that this section applies to all robots.
  • Disallow: tells the robot not to crawl the website, page, or directory that follows it.

To hide your entire website

User-agent: *
Disallow: /

To hide individual pages

User-agent: *
Disallow: /page-name

To hide an entire directory of pages

User-agent: *
Disallow: /folder-name/
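Before publishing, it can help to check how rules like the ones above apply to specific URLs. The following is a minimal sketch using Python's standard urllib.robotparser module. The helper name `is_allowed` and the example URLs are illustrative assumptions, and real search engine crawlers may treat edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules: str, url: str, agent: str = "*") -> bool:
    """Parse robots.txt text and report whether `agent` may crawl `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


HIDE_SITE = "User-agent: *\nDisallow: /\n"
HIDE_PAGE = "User-agent: *\nDisallow: /page-name\n"
HIDE_FOLDER = "User-agent: *\nDisallow: /folder-name/\n"

# Every path on the site is blocked.
print(is_allowed(HIDE_SITE, "https://your-site.com/about"))           # False

# Rules match by path prefix, so /page-name-2 is blocked as well.
print(is_allowed(HIDE_PAGE, "https://your-site.com/page-name"))       # False
print(is_allowed(HIDE_PAGE, "https://your-site.com/page-name-2"))     # False
print(is_allowed(HIDE_PAGE, "https://your-site.com/about"))           # True

# Only URLs inside the folder are blocked.
print(is_allowed(HIDE_FOLDER, "https://your-site.com/folder-name/a")) # False
print(is_allowed(HIDE_FOLDER, "https://your-site.com/other/a"))       # True
```

Because matching is prefix-based, a rule for a single page can also block other pages whose paths begin with the same characters.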

To include a sitemap

Sitemap: https://your-site.com/sitemap.xml
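As a quick check that a Sitemap line is picked up, the sketch below reads it back with Python's standard urllib.robotparser module. It assumes Python 3.8 or later (where `site_maps()` is available), and the helper name `listed_sitemaps` is a hypothetical chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def listed_sitemaps(rules: str) -> list:
    """Return the sitemap URLs declared in robots.txt text (empty if none)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


rules = (
    "User-agent: *\n"
    "Disallow: /folder-name/\n"
    "Sitemap: https://your-site.com/sitemap.xml\n"
)
print(listed_sitemaps(rules))
```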

Note: Webflow automatically adds a link to your sitemap in your robots.txt file.

Valuable resources

Discover more helpful robots.txt rules.

Note: Anyone can access your website’s robots.txt file, so the paths it lists could help someone identify and access your private content.
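To see why this matters, consider how little effort it takes to read the blocked paths back out of a robots.txt file. The sketch below is illustrative only: the function name `disallowed_paths` is an assumption, and the sample rules reuse the placeholder paths from this tutorial.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """List every non-empty Disallow path found in robots.txt text."""
    paths = []
    for raw in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value".
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


sample = """\
User-agent: *
Disallow: /page-name
Disallow: /folder-name/
"""
print(disallowed_paths(sample))  # ['/page-name', '/folder-name/']
```

Any visitor can do the same with your published file, so a Disallow rule should never be the only thing standing between the public and sensitive content.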

Best methods for privacy protection 

If you want to keep a specific page or URL on your website from being discovered, don’t rely on the robots.txt file to block it from being crawled. Instead, use one of the following options:

  • Switch off the Sitemap indexing toggle to prevent search engines from indexing your content and to remove it from the search engine index.
  • Save pages that contain sensitive content as drafts and don’t publish them. Password-protect pages that you do need to publish.

FAQ and tips for problem solving

Can a robots.txt file be used to prevent indexing of assets on my Webflow site? 

It’s not feasible to use a robots.txt file to prevent indexing of assets on your Webflow site as the robots.txt file must be on the same domain as the content it applies to (in this case, where the assets are hosted). Webflow serves assets from our global CDN rather than from the custom domain where the robots.txt file is located. 
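The point about domains can be illustrated with a short sketch: a crawler looks for robots.txt on the same host as the URL it wants to fetch. The helper name `robots_txt_url` and the CDN hostname below are hypothetical, chosen only for illustration, and do not reflect Webflow's actual asset URLs.

```python
from urllib.parse import urlsplit


def robots_txt_url(url: str) -> str:
    """Return the robots.txt location that governs the given URL."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


page = "https://your-site.com/about"
asset = "https://cdn.example.com/images/hero.png"  # hypothetical CDN host

print(robots_txt_url(page))   # https://your-site.com/robots.txt
print(robots_txt_url(asset))  # https://cdn.example.com/robots.txt
```

Because the asset lives on a different host, the rules published on your custom domain never apply to it.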

I deleted the robots.txt file from my Site settings, but it remains visible on my published site. How can I resolve this? 

Once the robots.txt file has been created, it cannot be completely removed. However, you can replace it with new rules that allow the site to be crawled, e.g.:

User-agent: *
Disallow:

Be sure to save the changes and republish your site. If the old robots.txt rules are still visible on your published site after that, please reach out to customer support.
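An empty Disallow value blocks nothing, which is why the replacement rules above reopen the site to crawlers. A minimal sketch with Python's standard urllib.robotparser module shows the effect; the helper name `crawling_allowed` and the example URL are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def crawling_allowed(rules: str, url: str) -> bool:
    """Report whether a generic crawler may fetch `url` under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch("*", url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

print(crawling_allowed(ALLOW_ALL, "https://your-site.com/any-page"))  # True
print(crawling_allowed(BLOCK_ALL, "https://your-site.com/any-page"))  # False
```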

Ewan Mak