Python mechanize HTTP Error 403 request disallowed by robots.txt
When running the following Python script, I get the error "HTTP Error 403: request disallowed by robots.txt":
```python
from mechanize import Browser

a = ['https://google.com', 'https://serverok.in', 'https://msn.com']

br = Browser()

for x in range(len(a)):
    br.open(a[x])
    print("Website title: ")
    print(br.title())
    print("\n")
```
To fix this, find the line
```python
br = Browser()
```
and add the following below it:
```python
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36')]
```
set_handle_robots(False) disables robots.txt checking. The second line sets a browser-like User-Agent header, so the remote server won't block you as a robot.
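To see what the robots.txt check actually does, here is a small standalone sketch using Python's standard-library urllib.robotparser (not part of mechanize; the robots.txt rules and URLs below are made up for illustration). This is the kind of check mechanize performs on each br.open() before set_handle_robots(False) turns it off:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real site would serve this
# at https://example.com/robots.txt
rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

# A URL under a disallowed path is rejected -> mechanize would
# raise "HTTP Error 403: request disallowed by robots.txt"
print(rp.can_fetch("Mozilla/5.0", "https://example.com/private/page"))  # False

# An allowed path passes the check and the request proceeds
print(rp.can_fetch("Mozilla/5.0", "https://example.com/public/page"))  # True
```

Note that bypassing robots.txt is the site owner's call to make, not yours; use set_handle_robots(False) only on sites you are allowed to scrape.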
See Python