How to let Google crawl PDF files but not index them?

by julio, in category: SEO, 2 years ago

2 answers

by domenico.weimann, 2 years ago

@julio

Crawling and indexing are controlled separately: the robots.txt file controls whether Googlebot may crawl a URL, and the X-Robots-Tag HTTP header controls whether a crawled file is indexed.

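A robots.txt rule like the following blocks Googlebot from crawling PDFs. It does not reliably prevent indexing, because a blocked URL can still show up in search results if other pages link to it, so it is not what you want here:

User-agent: Googlebot
Disallow: /*.pdf$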
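As a rough illustration of which URLs the `Disallow: /*.pdf$` rule covers, here is a minimal Python sketch of Google-style wildcard matching. The function name `rule_matches` is hypothetical, and the matching rules are an assumption based on Google's documented handling of `*` and `$`. It is not Googlebot's actual implementation.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt path pattern matches url_path.

    Assumed Google-style semantics (hypothetical helper):
    '*' matches any run of characters, a trailing '$' anchors the end
    of the URL, and everything else is a literal match from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' for each '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start of the path.
    return re.match(regex, url_path) is not None
```

Under these assumptions, `/docs/guide.pdf` matches the rule, while `/docs/guide.pdf?download=1` does not, because `$` requires the URL to end in `.pdf`.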

To keep PDFs crawlable but out of the index, use the X-Robots-Tag HTTP header instead. To prevent Google from indexing a PDF file, add the following header to the HTTP response that serves it:

X-Robots-Tag: noindex

Only the X-Robots-Tag header does what you asked for: Google can still crawl the PDFs and discover any links within them, but will not index them. The PDFs must not be blocked in robots.txt, because Googlebot has to fetch a file to see its header.
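How the header gets added depends on your web server. As a minimal sketch, assuming the files are served by Python's built-in `http.server`, the handler below adds `X-Robots-Tag: noindex` to every response whose path ends in `.pdf`. The names `extra_headers`, `NoIndexPdfHandler`, and `serve` are hypothetical, and a production site would normally set the header in its own server configuration.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler


def extra_headers(url_path: str) -> dict:
    """Return the headers to add for a request path (hypothetical helper)."""
    # Ignore any query string or fragment when checking the extension.
    path_only = url_path.split("?", 1)[0].split("#", 1)[0]
    if path_only.lower().endswith(".pdf"):
        return {"X-Robots-Tag": "noindex"}
    return {}


class NoIndexPdfHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory; PDFs get a noindex header."""

    def end_headers(self):
        # self.path may be unset if the request line could not be parsed.
        for name, value in extra_headers(getattr(self, "path", "")).items():
            self.send_header(name, value)
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Start the sketch server on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), NoIndexPdfHandler).serve_forever()
```

Calling `serve()` starts the server. HTML pages are served without the header, so they stay indexable.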

by virginie, 2 years ago

@julio

The "noindex" meta tag goes in the HTML head of a page, where it keeps that page out of Google's index while still allowing it to be crawled. The tag should look like this:

<meta name="robots" content="noindex">

This tells search engines like Google not to index the page, while still letting them crawl it. A PDF has no HTML head to hold the tag, and placing the tag on the page that links to the PDF only affects that page, so for the PDF itself the same noindex rule has to be sent as an X-Robots-Tag HTTP header. Note that it may take some time for the changes to take effect, as Google may have already indexed the file and needs to recrawl it.
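Before waiting for Google to recrawl, it can help to confirm that the server really sends a noindex directive for a PDF. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the parsing is deliberately simplified: it drops any user-agent prefix such as `googlebot:` without checking which crawler the directive targets.

```python
from urllib.request import Request, urlopen


def has_noindex(header_values) -> bool:
    """Return True if any X-Robots-Tag value contains noindex (or none)."""
    for value in header_values:
        for token in value.split(","):
            # Drop an optional "useragent:" prefix; keep the directive itself.
            directive = token.split(":")[-1].strip().lower()
            # "none" is shorthand for "noindex, nofollow".
            if directive in ("noindex", "none"):
                return True
    return False


def pdf_is_noindexed(url: str) -> bool:
    """Send a HEAD request and inspect the X-Robots-Tag response headers."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return has_noindex(response.headers.get_all("X-Robots-Tag") or [])
```

For example, `pdf_is_noindexed("https://example.com/files/report.pdf")` (a placeholder URL) should return `True` once the header is in place.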
