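How to use a proxy IP in a Python crawler
Using a proxy IP with Python requests and selenium
A crawler's IP address often gets banned. When that happens, you need to route requests through a proxy IP.
Using a proxy IP with requests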
Assume the proxy's username, password, IP, and port are as follows:
proxyUser = "4858555769507840"
proxyPass = "X7nEeMi"
proxyHost = "122.239.176.108"
proxyPort = "5021"
The requests code:
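import requests

# Both schemes are routed through the same authenticated HTTP proxy.
proxy_url = "http://4858555769507840:X7nEeMi@122.239.176.108:5021"
proxies = {"http": proxy_url, "https": proxy_url}

# url, collect_data, headers, and cookies are defined elsewhere in the crawler.
# verify=False turns off TLS certificate verification.
response = requests.post(url, data=collect_data, headers=headers, cookies=cookies, proxies=proxies, verify=False)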
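Writing the credentials into the proxy URL by hand is easy to get wrong, so a small helper can build the `proxies` dictionary from the four values above. This is only a minimal sketch: `build_proxies` is a hypothetical name, not part of requests, and it percent-encodes the username and password on the assumption that they may contain characters such as `@` or `:`.

```python
from urllib.parse import quote


def build_proxies(user, password, host, port, scheme="http"):
    """Return a requests-style proxies dict for an authenticated proxy.

    Hypothetical helper for illustration; not part of the requests library.
    """
    # Percent-encode the credentials so characters like "@" or ":" cannot
    # break the structure of the proxy URL.
    auth = f"{quote(str(user), safe='')}:{quote(str(password), safe='')}"
    proxy_url = f"{scheme}://{auth}@{host}:{port}"
    # The same proxy URL is used for both HTTP and HTTPS targets.
    return {"http": proxy_url, "https": proxy_url}


proxies = build_proxies("4858555769507840", "X7nEeMi", "122.239.176.108", "5021")
```

The resulting dictionary can be passed straight to `requests.post(..., proxies=proxies)`.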
Using a proxy IP with selenium
import string
import zipfile

from selenium import webdriver


def create_proxy_auth_extension(proxy_host, proxy_port, proxy_username, proxy_password,
                                scheme="http", plugin_path=None):
    """Build a Chrome extension (zip) that sets the proxy and answers its auth prompt."""
    if plugin_path is None:
        plugin_path = r"./proxy_auth_plugin.zip"

    manifest_json = """
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version": "22.0.0"
    }
    """

    # The ${...} placeholders are filled in by string.Template below.
    background_js = string.Template(
        """
        var config = {
            mode: "fixed_servers",
            rules: {
                singleProxy: {
                    scheme: "${scheme}",
                    host: "${host}",
                    port: parseInt(${port})
                },
                bypassList: ["foobar.com"]
            }
        };

        chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

        // Supply the proxy credentials whenever the proxy asks for authentication.
        function callbackFn(details) {
            return {
                authCredentials: {
                    username: "${username}",
                    password: "${password}"
                }
            };
        }

        chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ["blocking"]
        );

        // "requestHeaders" must be requested for details.requestHeaders to be populated.
        chrome.webRequest.onBeforeSendHeaders.addListener(function (details) {
            details.requestHeaders.push({name: "connection", value: "close"});
            return {
                requestHeaders: details.requestHeaders
            };
        },
            {urls: ["<all_urls>"]},
            ["blocking", "requestHeaders"]
        );
        """
    ).substitute(
        host=proxy_host,
        port=proxy_port,
        username=proxy_username,
        password=proxy_password,
        scheme=scheme,
    )

    # Pack both files into the zip archive that Chrome loads as an extension.
    with zipfile.ZipFile(plugin_path, "w") as zp:
        zp.writestr("manifest.json", manifest_json)
        zp.writestr("background.js", background_js)
    return plugin_path


chrome_options = webdriver.ChromeOptions()
proxy_auth_plugin_path = create_proxy_auth_extension(
    proxy_host=proxyHost,
    proxy_port=proxyPort,
    proxy_username=proxyUser,
    proxy_password=proxyPass)
chrome_options.add_extension(proxy_auth_plugin_path)
# Selenium 4 takes the options object through the "options" keyword;
# the older "chrome_options" keyword is deprecated.
driver = webdriver.Chrome(options=chrome_options)
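Before launching Chrome, it may be worth confirming that the archive written by `create_proxy_auth_extension` contains both files and that no `${...}` placeholder was left unsubstituted. The following is a rough sketch under that assumption; `check_plugin` is a hypothetical helper introduced here for illustration and is not part of selenium.

```python
import zipfile


def check_plugin(plugin_path):
    """Return (sorted file names, placeholder_left) for a plugin zip.

    placeholder_left is True when background.js still contains "${",
    which would mean a template value was not filled in.
    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(plugin_path) as zp:
        names = sorted(zp.namelist())
        background = zp.read("background.js").decode("utf-8")
    return names, "${" in background
```

A call such as `check_plugin("./proxy_auth_plugin.zip")` would be expected to report `manifest.json` and `background.js` with no leftover placeholder.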
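Note: when selenium uses a proxy IP that requires a username and password, headless mode cannot be used, because this approach depends on loading a Chrome extension and Chrome's classic headless mode does not load extensions.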
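When selenium is detected as an automation tool
Adding the following arguments to the code makes the window.navigator.webdriver value that sites check come out as undefined (newer Chrome versions may report false instead):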
chrome_options.add_argument("--disable-blink-features")
chrome_options.add_argument("--disable-blink-features=AutomationControlled")
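To see whether the arguments took effect, the flag can be read back from the running browser. This is a small sketch that assumes `driver` is an already started selenium WebDriver; `read_webdriver_flag` is a hypothetical name used only for illustration.

```python
def read_webdriver_flag(driver):
    """Return navigator.webdriver as seen by scripts on the current page."""
    # execute_script runs the JavaScript in the page and converts the result
    # to a Python value; a JavaScript undefined or null comes back as None.
    return driver.execute_script("return navigator.webdriver")
```

A result of `True` would suggest the page can still see the automation flag.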