How to read the sitemap URL from a robots.txt file?

by ervin.williamson , in category: SEO , 6 months ago

1 answer

by hanna , 6 months ago

@ervin.williamson 

To read the sitemap URL from a robots.txt file, follow the steps below:

  1. Locate the robots.txt file: The robots.txt file lives in the root directory of the website. To access it, append "/robots.txt" to the domain name (e.g., https://www.example.com/robots.txt).
  2. Read the robots.txt file: Open the URL in a browser, or fetch it with whatever HTTP client or file-reading method suits your programming language.
  3. Find the sitemap directive: Look for lines that start with "Sitemap:". Each such line gives the URL of one XML sitemap for the site, and a file may contain more than one of them. For example, if the line is "Sitemap: https://www.example.com/sitemap.xml", then the URL is "https://www.example.com/sitemap.xml".
  4. Extract the sitemap URL: Take the text that follows "Sitemap:" on each matching line, splitting on the first colon only because the URL itself contains colons. Remove any leading or trailing spaces, and store the URL for further use.


Here's an example in Python:

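import requests

# Retrieve the robots.txt file
response = requests.get('https://www.example.com/robots.txt', timeout=10)
response.raise_for_status()  # stop early if the file could not be fetched
robots_txt = response.text

# Find the first Sitemap directive (the match is case-insensitive)
for line in robots_txt.splitlines():
    line = line.strip()
    if line.lower().startswith('sitemap:'):
        # Split on the first colon only, so the "https://" in the URL stays intact
        sitemap_url = line.split(':', 1)[1].strip()
        print(sitemap_url)
        break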


Make sure to replace 'https://www.example.com' with your actual site domain.
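
A robots.txt file can list more than one sitemap, so it may be useful to collect every "Sitemap:" line instead of stopping at the first. Below is a minimal sketch of that idea. The function name extract_sitemap_urls and the sample text are made up for illustration, and the sketch assumes that anything after a "#" on a line is a comment.

```python
def extract_sitemap_urls(robots_txt: str) -> list[str]:
    """Return every URL named in a Sitemap directive, in file order.

    Hypothetical helper for illustration; not part of any library.
    """
    urls = []
    for raw_line in robots_txt.splitlines():
        # Assumption: text after '#' is a comment, so drop it
        line = raw_line.split('#', 1)[0].strip()
        # Split on the first colon only; the URL itself contains colons
        key, sep, value = line.partition(':')
        if sep and key.strip().lower() == 'sitemap' and value.strip():
            urls.append(value.strip())
    return urls


# Made-up robots.txt content with two sitemap lines
sample = """\
User-agent: *
Disallow: /private/
sitemap:https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/news-sitemap.xml  # news section
"""

for url in extract_sitemap_urls(sample):
    print(url)
```

To run it against a live site, pass response.text from the requests example above to the function. If you would rather not parse the file yourself, Python's standard library also has urllib.robotparser.RobotFileParser, whose site_maps() method (Python 3.8 and later) returns the listed sitemap URLs, or None if there are none.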