HTTPError: HTTP Error 403: Forbidden

I'm writing a Python script for personal use, but it doesn't work for Wikipedia …

This works:

    import urllib2, sys
    from bs4 import BeautifulSoup

    site = "http://youtube.com"
    page = urllib2.urlopen(site)
    soup = BeautifulSoup(page)
    print soup

This doesn't work:

    import urllib2, sys
    from bs4 import BeautifulSoup

    site = "http://en.wikipedia.org/wiki/StackOverflow"
    page = urllib2.urlopen(site)
    soup = BeautifulSoup(page)
    print soup

This is the error:

    Traceback (most recent call last):
      File "C:\Python27\wiki.py", line 5, in <module>
        page = urllib2.urlopen(site)
      File "C:\Python27\lib\urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "C:\Python27\lib\urllib2.py", line 406, in open
        response = meth(req, response)
      File "C:\Python27\lib\urllib2.py", line 519, in http_response
        'http', request, response, code, msg, hdrs)
      File "C:\Python27\lib\urllib2.py", line 444, in error
        return self._call_chain(*args)
      File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
        result = func(*args)
      File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    HTTPError: HTTP Error 403: Forbidden
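If you'd rather have the script handle this error gracefully instead of crashing, you can catch urllib's HTTPError. A Python 3 sketch; the exception is raised by hand here only so the example runs without network access:

```python
from urllib.error import HTTPError

url = "http://en.wikipedia.org/wiki/StackOverflow"
try:
    # Simulate what urlopen() raises when the server answers 403;
    # in a real script this would be `urlopen(url)` instead.
    raise HTTPError(url, 403, "Forbidden", None, None)
except HTTPError as e:
    # e.code is the HTTP status, e.reason the server's message
    print(e.code, e.reason)  # 403 Forbidden
```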

Here is code that works:

Python 2.X

    import urllib2, sys
    from BeautifulSoup import BeautifulSoup

    site = "http://en.wikipedia.org/wiki/StackOverflow"
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(site, headers=hdr)
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)
    print soup

Python 3.X

    from bs4 import BeautifulSoup
    from urllib.request import Request, urlopen

    site = "http://en.wikipedia.org/wiki/StackOverflow"
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = Request(site, headers=hdr)
    page = urlopen(req)
    soup = BeautifulSoup(page)
    print(soup)

Python 3.X with Selenium (executes JavaScript)

    from selenium import webdriver as driver

    browser = driver.PhantomJS()
    p = browser.get("http://en.wikipedia.org/wiki/StackOverflow")
    assert "Stack Overflow - Wikipedia" in browser.title

The reason the modified version works is that Wikipedia checks that the User-Agent belongs to a "popular browser" before serving the page.
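You can verify that the header is actually attached without touching the network, since urllib stores it on the Request object itself. One detail to watch: urllib normalizes header names with `capitalize()`, so `'User-Agent'` is stored as `'User-agent'`:

```python
from urllib.request import Request

site = "http://en.wikipedia.org/wiki/StackOverflow"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)

# Look the header up under its normalized name
print(req.get_header('User-agent'))  # Mozilla/5.0
```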