How to let Google crawl PDF files but not index them?

by julio , in category: SEO , a year ago

How to let Google crawl PDF files but not index them?

2 answers

by domenico.weimann , a year ago

@julio 

Crawling and indexing are controlled by different mechanisms. The robots.txt file controls what Google may crawl, while the X-Robots-Tag HTTP header controls what Google may index.


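Do not use robots.txt for this. A rule like the following blocks Googlebot from crawling PDFs, but it does not keep them out of the index: Google can still index a blocked URL (without its content) if other pages link to it, and it never sees a noindex header on a file it is not allowed to fetch:

User-agent: Googlebot
Disallow: /*.pdf$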


To keep PDFs out of the index while leaving them crawlable, use the X-Robots-Tag HTTP header, which works for non-HTML files such as PDFs. Configure your server to add the following header to the HTTP response for each PDF:

X-Robots-Tag: noindex
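How you attach the header depends on your web server. As a rough illustration only, here is a minimal sketch using Python's standard-library `http.server`; the names `is_pdf_request`, `NoindexPdfHandler`, and `serve`, and the port 8000, are made up for this example and are not part of any real setup. A production site would normally set the header in its web server or CDN configuration instead.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit


def is_pdf_request(path: str) -> bool:
    """Return True when the request path (ignoring any query string) ends in .pdf."""
    return urlsplit(path).path.lower().endswith(".pdf")


class NoindexPdfHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory and mark PDF responses as noindex."""

    def end_headers(self) -> None:
        # end_headers() is called just before the header block is sent,
        # so an extra header can be appended here.
        if is_pdf_request(self.path):
            self.send_header("X-Robots-Tag", "noindex")
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Start the demo server on localhost; the port is an arbitrary choice."""
    ThreadingHTTPServer(("127.0.0.1", port), NoindexPdfHandler).serve_forever()
```

Calling `serve()` starts the demo server; a request for any path ending in `.pdf` then gets the `X-Robots-Tag: noindex` header, while other files are served without it.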


Only the X-Robots-Tag method keeps PDFs out of the index while still allowing them to be crawled, so that Google can discover any links within the PDFs. The PDFs must not be blocked in robots.txt, because Google has to fetch a file to see its noindex header.
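To verify the result, you can inspect the headers your server returns for a PDF. The helper below is a hedged sketch, not an official parser: `has_noindex` is a hypothetical name, and it assumes you have already collected the `X-Robots-Tag` header values from a response as a list of strings. It handles the common forms (a plain directive list, or a list prefixed with a crawler name such as `googlebot:`) and treats `none` as equivalent to `noindex, nofollow`.

```python
def has_noindex(x_robots_values: list[str], bot: str = "googlebot") -> bool:
    """Return True if any X-Robots-Tag value applies noindex (or none) to `bot`."""
    for value in x_robots_values:
        agent, sep, rest = value.partition(":")
        agent = agent.strip().lower()
        # Treat the text before the first colon as a crawler name only when it
        # is a single token; "unavailable_after: <date>" is a directive, not a name.
        scoped = (
            bool(sep)
            and agent != "unavailable_after"
            and not any(c in agent for c in ", ")
        )
        if scoped and agent != bot:
            continue  # this value is addressed to a different crawler
        directives = rest if scoped else value
        tokens = {d.strip().lower() for d in directives.split(",")}
        if tokens & {"noindex", "none"}:
            return True
    return False
```

For example, `has_noindex(["noindex"])` and `has_noindex(["googlebot: noindex, nofollow"])` are both true, while `has_noindex(["otherbot: noindex"])` is false because that value is addressed to a different crawler.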

by virginie , a year ago

@julio 

Note that the "noindex" robots meta tag only works inside HTML pages, so it cannot be placed in a PDF file. Adding it to the HTML page that links to the PDF removes that page from the index, not the PDF itself. For reference, the tag looks like this:

<meta name="robots" content="noindex">


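This tells search engines like Google not to index the HTML page that carries the tag, while still crawling it. For the PDF itself, send the X-Robots-Tag: noindex HTTP header instead. Either way, it may take some time for the change to take effect: a file that is already indexed is dropped only after Google recrawls it and sees the directive.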