How to parse a robots.txt file with php?

by genevieve_boehm , in category: SEO , a year ago



1 answer

Member

by jamison , a year ago

@genevieve_boehm 

To parse a robots.txt file with PHP, you can use the following code:

<?php
$url = 'https://www.example.com/robots.txt'; // Replace with the URL of the robots.txt file

// Fetch the content of the robots.txt file
$content = file_get_contents($url);
if ($content === false) {
    die('Could not fetch the robots.txt file');
}

// Split the content into lines
$lines = explode("\n", $content);

$rules = array();

// Iterate through each line
foreach ($lines as $line) {
    $line = trim($line);

    // Skip blank lines or comments
    if ($line === '' || $line[0] == '#') {
        continue;
    }

    // Extract the directive and value from the line
    // (limit of 2 so values that contain colons, e.g. Sitemap URLs, stay intact)
    $parts = explode(':', $line, 2);
    if (count($parts) < 2) {
        continue; // Skip malformed lines without a colon
    }
    $directive = trim($parts[0]);
    $value = trim($parts[1]);

    // Add the rule to the array
    if (!isset($rules[$directive])) {
        $rules[$directive] = array();
    }
    $rules[$directive][] = $value;
}

// Print the parsed rules
print_r($rules);
?>


This code will fetch the content of the robots.txt file specified by the $url variable using file_get_contents(). It then splits the content into separate lines and iterates through each line.


For each line, it checks if it is a blank line or a comment (marked by '#' at the beginning). If it is not, it extracts the directive (e.g., User-agent or Disallow) and the corresponding value from the line.
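One detail worth care when splitting on the colon: directive values can themselves contain colons (a Sitemap URL, for example), so passing a limit of 2 to explode() makes it split only on the first colon. A minimal sketch:

```php
<?php
// A Sitemap line whose value contains colons of its own
$line = 'Sitemap: https://www.example.com/sitemap.xml';

// Limit of 2: split only at the first colon, keep the rest together
$parts = explode(':', $line, 2);
$directive = trim($parts[0]); // "Sitemap"
$value = trim($parts[1]);     // "https://www.example.com/sitemap.xml"
```

Without the limit, explode() would also split inside "https:", truncating the value.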


The code then adds the rule to an array, with each directive having an array of corresponding values. Finally, it prints the parsed rules using print_r().
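To see the array shape this produces, here is the same parsing loop run against a small inline sample (the sample robots.txt content is made up for illustration):

```php
<?php
// A small sample robots.txt, inlined for illustration
$content = "User-agent: *\nDisallow: /admin/\nDisallow: /tmp/\n# a comment\n";

$rules = array();
foreach (explode("\n", $content) as $line) {
    $line = trim($line);
    if ($line === '' || $line[0] == '#') {
        continue; // skip blank lines and comments
    }
    $parts = explode(':', $line, 2);
    $rules[trim($parts[0])][] = trim($parts[1]);
}

print_r($rules);
// Expected shape:
// Array ( [User-agent] => Array ( [0] => * )
//         [Disallow]   => Array ( [0] => /admin/ [1] => /tmp/ ) )
```

Note that this flat structure loses the association between a User-agent line and the Disallow lines that follow it; for a multi-agent robots.txt you would group rules per user-agent instead.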


Note: Make sure the allow_url_fopen directive is enabled in your PHP configuration to allow fetching remote files using file_get_contents().
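If allow_url_fopen is disabled and you cannot change the configuration, the cURL extension is a common alternative for fetching the file. A minimal sketch, assuming the cURL extension is installed:

```php
<?php
// Fetch robots.txt with cURL instead of file_get_contents()
$ch = curl_init('https://www.example.com/robots.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
$content = curl_exec($ch); // false on failure
curl_close($ch);
```

From here, $content can be fed into the same line-splitting loop as above.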