@ervin.williamson
To disallow specific pages in robots.txt while allowing everything else, use a "Disallow" directive for each path you want to block and, optionally, an "Allow" directive to make explicit that every other URL may be crawled.
Here's an example:
User-agent: *
Disallow: /example-page/
Disallow: /another-page/
Allow: /
In this example, the "User-agent: *" line applies the rules to all crawlers, the two "Disallow" lines block "/example-page/" and "/another-page/", and the final "Allow" line states that everything else may be crawled.
Note that the "Allow" directive is not strictly necessary, since anything not disallowed is allowed by default, but including it can help clarify your intentions in the robots.txt file. Also keep in mind that robots.txt is advisory rather than enforced, so crawlers that ignore it may still fetch, and search engines may still index, pages you intended to exclude.
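If you want to sanity-check the rules before deploying them, Python's built-in urllib.robotparser can evaluate a robots.txt against sample paths. A minimal sketch using the paths from the example above (adjust them to your own site):

import urllib.robotparser

# The rules from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /example-page/
Disallow: /another-page/
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Blocked paths should come back False, everything else True.
for path in ("/example-page/", "/another-page/", "/", "/some-other-page/"):
    print(path, "->", parser.can_fetch("*", path))

Running this prints False for the two disallowed paths and True for the others, which matches the intent described above.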
@ervin.williamson
User-agent: *
Disallow: /example-page/
Disallow: /another-page/
Allow: /
@ervin.williamson
Yes, including the "Allow: /" directive is a good practice to explicitly indicate that everything else is allowed. This can help to avoid any confusion or misinterpretation by search engine crawlers.
One note on ordering: listing the "Disallow" rules first and ending with "Allow: /" is a readable convention, but modern crawlers such as Googlebot don't rely on the order of the directives; they apply the most specific (longest) matching rule, with "Allow" winning ties. In the example above, "/example-page/" and "/another-page/" are disallowed for all user-agents, while everything else is allowed.
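For reference, here is roughly how a compliant crawler resolves overlapping rules within a user-agent group: the longest matching path wins, and an Allow wins a tie. This is a simplified sketch in Python (no "*" or "$" wildcard handling), not a real crawler implementation:

def is_allowed(path, rules):
    """Decide whether `path` may be crawled.

    `rules` is a list of (directive, value) pairs from one user-agent
    group, e.g. [("disallow", "/example-page/"), ("allow", "/")].
    """
    best_len = -1
    allowed = True  # anything not matched by any rule is allowed
    for directive, value in rules:
        if not value or not path.startswith(value):
            continue
        # Most specific (longest) match wins; Allow wins a tie.
        if len(value) > best_len or (len(value) == best_len and directive == "allow"):
            best_len = len(value)
            allowed = (directive == "allow")
    return allowed

rules = [("disallow", "/example-page/"),
         ("disallow", "/another-page/"),
         ("allow", "/")]

print(is_allowed("/example-page/", rules))     # False: the Disallow is the longest match
print(is_allowed("/some-other-page/", rules))  # True: only "Allow: /" matches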