To monitor a website for changes in Python, you can use the requests library to fetch the website’s content and the hashlib library to create a hash (a short fingerprint) of that content.
By periodically checking the website and comparing its current hash to the previous one, you can detect if any changes have been made.
Code
import requests
import hashlib
import time

# URL of the website to monitor
url = "https://www.example.com"

# Interval between checks in seconds
check_interval = 60  # 60 seconds (1 minute)

# Function to get the hash of the website's content
def get_hash(url):
    try:
        # Send a GET request to fetch the content
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for HTTP errors
        # Generate a hash of the content
        return hashlib.md5(response.content).hexdigest()
    except requests.RequestException as e:
        print(f"Error accessing {url}: {e}")
        return None

# Monitor for changes in the website content
def monitor_website(url, check_interval):
    print(f"Starting to monitor {url} for changes every {check_interval} seconds.")

    # Initial content hash
    last_hash = get_hash(url)
    if last_hash is None:
        print("Failed to retrieve the initial content. Exiting.")
        return

    while True:
        time.sleep(check_interval)  # Wait for the specified interval
        current_hash = get_hash(url)  # Get the latest content hash

        # Check if there was an error in fetching the content
        if current_hash is None:
            print("Error in fetching content. Skipping this check.")
            continue

        # Compare the current hash with the last hash
        if current_hash != last_hash:
            print("Change detected on the website!")
            last_hash = current_hash  # Update the last hash with the new one
        else:
            print("No change detected.")

# Start monitoring the website
monitor_website(url, check_interval)
Explanation of the Code
- Import Required Libraries:
- requests is used to fetch the website’s content.
- hashlib is used to create a unique hash of the website content.
- time is used to introduce delays between each check.
- Define the URL and Check Interval:
- url is the URL of the website you want to monitor.
- check_interval is the time (in seconds) between each check. Here, it’s set to 60 seconds (1 minute), but you can adjust it as needed.
- get_hash() Function:
- This function fetches the website content and creates a hash.
- response = requests.get(url): Sends a GET request to retrieve the website’s content.
- response.raise_for_status(): Raises an exception if there’s an HTTP error.
- hashlib.md5(response.content).hexdigest(): Generates an MD5 hash of the website content. The hash acts as a compact fingerprint of the content at that time: if the content changes, the hash changes.
- If there’s an error fetching the content, the function prints an error message and returns None. (A variant that also passes a request timeout to requests.get() is sketched after this list.)
- monitor_website() Function:
- This function monitors the website for changes in its content.
- last_hash = get_hash(url): Retrieves and stores the initial content hash.
- If last_hash is None, the initial request failed, and the function returns without starting the monitoring loop.
- Loop for Monitoring:
- time.sleep(check_interval): Pauses execution for the specified interval.
- current_hash = get_hash(url): Fetches the latest hash of the website content.
- If current_hash is None, it skips this check due to an error.
- If current_hash != last_hash, the content has changed:
- It prints “Change detected on the website!”
- Updates last_hash with current_hash for the next comparison.
- If current_hash == last_hash, it prints “No change detected.”
- Start Monitoring:
- monitor_website(url, check_interval): Calls the function to start monitoring the website.
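As noted in the get_hash() walkthrough above, requests.get() with no timeout can wait indefinitely if the server stops responding. The sketch below is one way to harden the function; the name get_hash_with_timeout and the 10-second default are illustrative assumptions, not part of the original program.

import hashlib
import requests

# Variant of get_hash() that gives up if the server is slow to respond.
# The 10-second timeout is an assumed, adjustable value.
def get_hash_with_timeout(url, timeout=10):
    try:
        response = requests.get(url, timeout=timeout)  # abort after `timeout` seconds
        response.raise_for_status()
        return hashlib.md5(response.content).hexdigest()
    except requests.RequestException as e:
        print(f"Error accessing {url}: {e}")
        return None

Because requests.Timeout is a subclass of requests.RequestException, the existing error handling already covers a timed-out request, so nothing else in the program needs to change.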
How It Works
- The initial content hash of the website is saved.
- Every check_interval seconds, the website is fetched again, and a new hash is generated.
- The new hash is compared with the previous hash (a short standalone illustration of this comparison follows this list):
- If they differ, the content has changed, and a message is printed.
- If they match, no change has been detected.
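To make the comparison step concrete, here is a tiny standalone illustration of the same idea, using two hard-coded byte strings in place of two fetches of the page (the strings are purely illustrative):

import hashlib

# Two snapshots of "page content"; in the real program these come from response.content
old_content = b"<html><body>Hello</body></html>"
new_content = b"<html><body>Hello, world</body></html>"

old_hash = hashlib.md5(old_content).hexdigest()
new_hash = hashlib.md5(new_content).hexdigest()

# Any difference in the bytes produces a different digest
if new_hash != old_hash:
    print("Change detected on the website!")
else:
    print("No change detected.")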
Sample Output
Starting to monitor https://www.example.com for changes every 60 seconds.
No change detected.
No change detected.
Change detected on the website!
No change detected.
...
Important Notes
- Hashing Method: MD5 is used here for simplicity and speed, and it is sufficient for detecting changes. It is not collision-resistant, though, so if you prefer a cryptographically stronger hash, SHA-256 is a drop-in replacement (see the sketch after these notes).
- Network Errors: The code includes error handling for network issues, so it will continue checking in the next interval if a request fails.
- Frequency: Avoid setting the check_interval too low, as this may overload the server and your internet connection. Use a reasonable interval based on the frequency of expected updates.
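A minimal sketch of the SHA-256 swap mentioned in the notes above; only the hash constructor changes, and the function name get_hash_sha256 is an illustrative choice:

import hashlib
import requests

def get_hash_sha256(url):
    try:
        response = requests.get(url)
        response.raise_for_status()
        # hashlib.sha256 replaces hashlib.md5; the rest of the program is unchanged
        return hashlib.sha256(response.content).hexdigest()
    except requests.RequestException as e:
        print(f"Error accessing {url}: {e}")
        return None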
Summary
This Python program monitors a website for content changes by periodically fetching its content, hashing it, and comparing hashes.
It’s a simple way to detect website updates or changes, with flexibility in monitoring frequency and error handling.