Automated Push of Grade Website Notices with a Python Crawler
Preface
How It Works
Libraries
import time
import datetime
import urllib.request
import requests
import json
from bs4 import BeautifulSoup
import smtplib
from email import header
from email.mime import text, multipart
Installation
pip install --upgrade pip
pip install beautifulsoup4
pip install requests
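Functions
1. Simulate a browser request to the site and load the page locally
def getTitle(url):
    # Send a browser-like User-Agent so the site serves the normal page
    headers = ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36")
    opener = urllib.request.build_opener()
    opener.addheaders = [headers]
    urllib.request.install_opener(opener)
    # Download the page and decode it as UTF-8, ignoring undecodable bytes
    html = urllib.request.urlopen(url).read().decode("utf-8", "ignore")
    bs = BeautifulSoup(html, "html.parser")
    # Placeholder: replace with the CSS selector of the target tags
    Title_links = bs.select("specific tag")
    return Title_links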
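As a variation on the function above, the User-Agent header can be attached to a single request instead of installing a global opener. The following is only a minimal sketch: the helper names `build_request` and `fetch_html` and the 10-second timeout are assumptions made for illustration and do not come from the original script.

```python
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
)


def build_request(url):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url, timeout=10):
    """Download the page and decode it as UTF-8, ignoring undecodable bytes."""
    # timeout=10 is an illustrative value, not taken from the original script.
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", "ignore")
```

The returned HTML string can then be passed to `BeautifulSoup(html, "html.parser")` exactly as in `getTitle`.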
2. Get the current date
def getNowDate():
    now_time = datetime.datetime.now()
    # Shift back three days, so the returned date is three days before today
    yes_time = now_time + datetime.timedelta(days=-3)
    current_time = yes_time.strftime("%Y-%m-%d")
    return current_time
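The function above hard-codes the three-day offset. One possible refinement, shown here only as a sketch, is to make the offset a parameter. The name `get_target_date` and the parameters `days_back` and `today` are hypothetical; `today` exists so the function can be checked against a fixed date.

```python
import datetime


def get_target_date(days_back=3, today=None):
    """Return the date `days_back` days before `today` as YYYY-MM-DD."""
    if today is None:
        today = datetime.date.today()
    target = today - datetime.timedelta(days=days_back)
    return target.strftime("%Y-%m-%d")
```

For example, `get_target_date(3, datetime.date(2021, 4, 10))` returns `"2021-04-07"`, and `get_target_date(0)` returns today's date.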
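3. Consolidate the filtered data
contents, links, dates = [], [], []
send_data = ""
for link in linklist_Title:
    contents.append(link.text.strip())
    links.append(link.get("href"))
for date in linklist_Date:
    dates.append(date.text.strip())
# Collect the articles published on the target date
for date, text, link in zip(dates, contents, links):
    data = date + " " + text + ":http://xxx.xxx.com" + link
    if date == Now_Date:
        # The separator was lost from the original text; a newline is assumed
        send_data = send_data + data + "\n"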
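The consolidation step can also be wrapped in a function that takes plain lists, which makes it easy to try out without fetching a page. This is a sketch under stated assumptions: the name `build_message` is hypothetical, `base_url` defaults to the placeholder host used in the article, and the newline separator is a choice of this sketch.

```python
def build_message(dates, titles, links, target_date,
                  base_url="http://xxx.xxx.com"):
    """Join the entries published on `target_date` into one message string."""
    lines = []
    for date, title, link in zip(dates, titles, links):
        if date == target_date:
            # Layout: "<date> <title>:<base_url><href>"
            lines.append(f"{date} {title}:{base_url}{link}")
    # One entry per line; the separator is an assumption of this sketch.
    return "\n".join(lines)
```

If no entry matches the target date, the function returns an empty string, which the caller can use to skip the push entirely.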
4. Send bulk email
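No code is given for this step. The following is a minimal sketch of how it might look with the `smtplib` and `email` modules from the standard library. Everything specific here is an assumption: the function names `build_email` and `send_bulk_email`, the use of SMTP over SSL, and the host, port, and credentials the caller must supply.

```python
import smtplib
from email.header import Header
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_email(sender, recipients, subject, body):
    """Build a UTF-8 plain-text message addressed to all recipients."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = Header(subject, "utf-8")
    msg.attach(MIMEText(body, "plain", "utf-8"))
    return msg


def send_bulk_email(host, port, user, password, recipients, subject, body):
    """Log in to the SMTP server over SSL and send one message to everyone."""
    # host, port, user, and password are supplied by the caller; this sketch
    # assumes the mail provider accepts SMTP over SSL.
    msg = build_email(user, recipients, subject, body)
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(user, password)
        server.sendmail(user, recipients, msg.as_string())
```

Separating message construction from sending keeps the part that needs network access and credentials small.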
5. Push the articles to push+ as JSON
token = "4bxxxxxxxxxxxxxxxxxxxxxxx5"
title = "Today's grade website update notice"
content = send_data
url = "http://pushplus.hxtrip.com/send"
data = {
    "token": token,
    "title": title,
    "content": content
}
# Serialize the payload to JSON and encode it as UTF-8 bytes
body = json.dumps(data).encode(encoding="UTF-8")
headers = {"Content-Type": "application/json"}
requests.post(url, data=body, headers=headers)
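The same JSON POST can be expressed with the standard library only, which also makes the request easy to inspect before it is sent. Note that this sketch swaps `requests` for `urllib.request`; the function names `build_push_request` and `push` and the 10-second timeout are assumptions for illustration, and the endpoint is the one shown in the article.

```python
import json
import urllib.request

PUSH_URL = "http://pushplus.hxtrip.com/send"  # endpoint used in the article


def build_push_request(token, title, content, url=PUSH_URL):
    """Build a JSON POST request carrying token, title, and content."""
    body = json.dumps(
        {"token": token, "title": title, "content": content}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push(token, title, content, timeout=10):
    """Send the request and return the raw response text."""
    # timeout=10 is an illustrative value, not taken from the original script.
    req = build_push_request(token, title, content)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", "ignore")
```

Returning the response text lets the caller log what the service answered instead of discarding it.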
Source Code
GitHub repository:
Scheduled Task on a Cloud Server
Run the shell command on a schedule
/usr/bin/python /www/server/panel/class/Notice_Spider.py
Blog: