Python mechanize HTTP Error 403 request disallowed by robots.txt

When running the following Python script, I get the error "HTTP Error 403: request disallowed by robots.txt":

from mechanize import Browser

a = ['https://google.com', 'https://serverok.in', 'https://msn.com']

br = Browser()

for url in a:
    br.open(url)
    print("Website title: ")
    print(br.title())
    print("\n")

To fix this, find the line

br = Browser()

and add the following below it:

br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36')]

set_handle_robots(False) disables robots.txt checking. The second line sets a browser User-Agent header, so the remote server won't block you as a robot.
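To see why the 403 happens in the first place: by default mechanize fetches each site's robots.txt and refuses any URL it disallows. Python's standard library exposes the same check through urllib.robotparser, so here is a small offline sketch (the robots.txt content below is a made-up example, not fetched from any real site):

from urllib.robotparser import RobotFileParser

# A sample robots.txt that disallows everything for all user agents --
# this is the kind of rule mechanize consults before each request.
sample_robots = """
User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(sample_robots.splitlines())

# mechanize raises "HTTP Error 403: request disallowed by robots.txt"
# when this check returns False for its User-agent.
print(rp.can_fetch("Python-urllib", "https://example.com/page"))  # False

Calling set_handle_robots(False) simply tells mechanize to skip this check entirely.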
