@genevieve_boehm
To parse a robots.txt file with PHP, you can use the following code:
<?php
$url = 'https://www.example.com/robots.txt'; // Replace with the URL of the robots.txt file

// Fetch the content of the robots.txt file
$content = file_get_contents($url);

// Split the content into lines
$lines = explode("\n", $content);

$rules = array();

// Iterate through each line
foreach ($lines as $line) {
    $line = trim($line);

    // Skip blank lines or comments
    if (empty($line) || $line[0] == '#') {
        continue;
    }

    // Extract the directive and value from the line
    // (limit of 2 so values containing ':' , such as sitemap URLs, stay intact)
    $parts = explode(':', $line, 2);
    if (count($parts) < 2) {
        continue; // Skip malformed lines with no ':'
    }
    $directive = trim($parts[0]);
    $value = trim($parts[1]);

    // Add the rule to the array
    if (!isset($rules[$directive])) {
        $rules[$directive] = array();
    }
    $rules[$directive][] = $value;
}

// Print the parsed rules
print_r($rules);
?>
This code fetches the content of the robots.txt file specified by the $url variable using file_get_contents(). It then splits the content into separate lines and iterates through each one. For each line, it checks whether the line is blank or a comment (marked by '#' at the beginning). If it is not, it extracts the directive (e.g., User-agent or Disallow) and the corresponding value from the line.
The code then adds the rule to an array, with each directive mapping to an array of its values. Finally, it prints the parsed rules using print_r().
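Once parsed, the $rules array can be put to use, for example to check whether a path is blocked by a Disallow rule. Here is a minimal sketch with a hypothetical isDisallowed() helper; note that it ignores user-agent grouping (which the flat array above does not preserve) and wildcard rules, so it is an illustration rather than a spec-compliant matcher:

```php
<?php
// Hypothetical helper: returns true if $path begins with any Disallow prefix.
// Ignores user-agent grouping and '*' wildcards; illustration only.
function isDisallowed(array $rules, string $path): bool {
    foreach ($rules['Disallow'] ?? array() as $prefix) {
        if ($prefix !== '' && strpos($path, $prefix) === 0) {
            return true;
        }
    }
    return false;
}

// Example rules as the parser above would produce them
$rules = array('Disallow' => array('/private/', '/tmp/'));

var_dump(isDisallowed($rules, '/private/page.html')); // bool(true)
var_dump(isDisallowed($rules, '/public/page.html'));  // bool(false)
```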
Note: Make sure the allow_url_fopen directive is enabled in your PHP configuration to allow fetching remote files with file_get_contents().
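If allow_url_fopen cannot be enabled, the file can be fetched with the cURL extension instead. A sketch, assuming the curl extension is installed (the URL is the example one from above):

```php
<?php
$url = 'https://www.example.com/robots.txt';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects to the final location
$content = curl_exec($ch);
curl_close($ch);

if ($content === false) {
    // curl_exec() returns false on failure; handle the error as appropriate
    $content = '';
}

// $content can now be passed to the same line-splitting and parsing loop as before
```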