How to block a certain type of URL in robots.txt or .htaccess?

by rusty.rosenbaum, in category: SEO, 2 years ago


3 answers

by laverna_hirthe, 2 years ago

@rusty.rosenbaum 

To block a certain type of URL using robots.txt, use the "Disallow" directive followed by the URL pattern you want to block. The "*" wildcard is an extension to the original robots.txt standard, but major search engines such as Google and Bing support it. For example, to block all URLs that contain the word "example", use the following directive:

User-agent: *
Disallow: /*example*


This will instruct web robots not to crawl any URL that contains the word "example". Note that this method only asks robots not to crawl these URLs; it does not prevent users from accessing them directly.
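For a rough sense of how wildcard-aware crawlers such as Googlebot interpret these patterns, here is a small Python sketch (an illustration only, not any search engine's actual code) that translates a robots.txt path pattern into an equivalent regular expression, treating "*" as "match anything" and a trailing "$" as an end-of-URL anchor:

import re

def robots_pattern_to_regex(pattern):
    # Translate a robots.txt path pattern into an equivalent regex.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore "*" as "match anything".
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile("^" + body + ("$" if anchored else ""))

rule = robots_pattern_to_regex("/*example*")
print(bool(rule.match("/blog/example-post")))  # True: crawling disallowed
print(bool(rule.match("/about")))              # False: crawling allowed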


To block a certain type of URL using .htaccess, you can use the "RewriteRule" directive together with a regular expression that matches the URL pattern you want to block. For example, to block all URLs that end with ".pdf", use the following directives:

RewriteEngine On
RewriteRule \.pdf$ - [F,L]


This will return a "403 Forbidden" error to any user or web robot that tries to access a URL ending in ".pdf" (the backslash escapes the dot so it matches a literal period). Note that this method requires an Apache web server with mod_rewrite enabled, which may not be available on all hosting environments.
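Once the rule is in place, you can spot-check it with a few lines of Python. The sketch below uses placeholder URLs (https://example.com and the file path are assumptions; substitute your own) and simply confirms that a blocked URL now answers with a 403:

from urllib import request
from urllib.error import HTTPError

try:
    # Placeholder URL: point this at a real .pdf on your own site.
    with request.urlopen("https://example.com/docs/manual.pdf") as resp:
        print("Unexpectedly reachable, status:", resp.status)
except HTTPError as err:
    print("Server answered:", err.code)  # expect 403 once the rule is active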


by aniyah, 10 months ago

@rusty.rosenbaum 

Before making any changes, make sure you have a backup of the files you are editing.


To block a certain type of URL using robots.txt or .htaccess, follow the steps below:


Robots.txt Method:

  1. Open or create a robots.txt file in the root directory of your website.
  2. Add the following lines to the file:
User-agent: *
Disallow: /*example*


Replace example with the specific word or pattern that you want to block in the URLs.
  3. Save the robots.txt file.


Please note that not all web crawlers respect the robots.txt file. While many major search engines do, others may ignore it. This method is effective for most legitimate web crawlers.
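If you want to sanity-check a robots.txt rule programmatically, Python's standard-library parser can help. Note that it implements the original robots.txt specification (plain path-prefix matching) and ignores the "*" and "$" wildcard extensions, so the sketch below uses a simple prefix rule; wildcard rules are better tested with a wildcard-aware tool such as Google Search Console:

from urllib.robotparser import RobotFileParser

# Parse rules from a list of lines instead of fetching them over the network.
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",  # plain prefix rule; no wildcards
])

print(rp.can_fetch("*", "/private/report.html"))  # False: blocked
print(rp.can_fetch("*", "/public/index.html"))    # True: allowed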


.htaccess Method:

  1. Open or create an .htaccess file in the root directory of your website. Ensure that your web server is an Apache server, as .htaccess files are specific to Apache.
  2. Add the following lines to the .htaccess file:
RewriteEngine On
RewriteRule ^(.*example.*)$ - [F,L]


Replace example with the specific word or pattern that you want to block in the URLs. You can modify the regular expression (.*example.*) to suit your requirements.
  3. Save the .htaccess file.


These directives use RewriteRule with the [F] flag to return a "403 Forbidden" error when a requested URL matches the specified pattern.


Remember to test and verify the changes after implementing them to ensure they are functioning as desired.
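A short Python script can automate that verification. The domain and paths below are placeholders; substitute one URL that your new rules should block and one that they should not:

from urllib import request
from urllib.error import HTTPError

SITE = "https://example.com"  # placeholder: use your own domain

# Confirm the robots.txt file is actually being served with your new rules.
with request.urlopen(SITE + "/robots.txt") as resp:
    print(resp.read().decode())

# A URL blocked by .htaccess should report 403; a normal page should report 200.
for path in ["/some-example-page", "/index.html"]:
    try:
        with request.urlopen(SITE + path) as resp:
            print(path, "->", resp.status)
    except HTTPError as err:
        print(path, "->", err.code)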

by annabell_mcdermott, 9 months ago

@rusty.rosenbaum 

Additional Tip:


If you want to block a specific file extension, you can modify the robots.txt or .htaccess rules accordingly.


For robots.txt:

User-agent: *
Disallow: /*.pdf$


This will block all PDF files from being crawled by robots that support the wildcard and "$" end-of-URL extensions, which includes the major search engines.


For .htaccess:

RewriteEngine On
RewriteRule \.pdf$ - [F,L]


This will return a "403 Forbidden" error for any request to access a PDF file.


Remember to replace ".pdf" with the file extension you want to block.


Keep in mind that robots.txt only asks crawlers not to fetch the specified URLs; it does not prevent anyone from accessing them directly if they know the exact URL. The .htaccess rule, by contrast, makes the server itself refuse matching requests with a 403 for every client.
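The difference is easy to demonstrate in a few lines of Python (the URL below is a placeholder): robots.txt only changes the verdict a polite crawler computes for itself, while the request goes through unless the server refuses it:

from urllib import request, robotparser
from urllib.error import HTTPError

# robots.txt advises crawlers; it enforces nothing by itself.
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /downloads/"])

url = "https://example.com/downloads/report.pdf"  # placeholder URL
print("Polite crawler may fetch:", rp.can_fetch("*", url))  # False

# A direct request ignores robots.txt entirely; only a server-side rule
# such as the .htaccess RewriteRule above can actually refuse it.
try:
    with request.urlopen(url) as resp:
        print("Direct fetch status:", resp.status)
except HTTPError as err:
    print("Direct fetch status:", err.code)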