Python video download, Python Tencent Video

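I have been learning to write web crawlers in Python for a while. To reinforce what I have learned and leave notes for later study, I am recording the whole crawling process here. The example uses Dianying Tiantang (电影天堂) and extracts the latest movies listed on the current page.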

# -*- coding: utf-8 -*-
import urllib2
import os
import re

url = 'http://www.dy2018.com/html/gndy/dyzz/index.html'  # Dianying Tiantang's latest-movies page
conent = urllib2.urlopen(url)
conent = conent.read()
conent = conent.decode('gb2312', 'ignore').encode('utf-8', 'ignore')  # avoids garbled Chinese text
f = open('conent.txt', 'w')
f.write(conent)
f.close()
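
For readers on Python 3, where urllib2 no longer exists, the same fetch, decode and save step might look roughly like the sketch below. It is not the original script: fetch_page, decode_gb2312 and save_text are hypothetical helper names, and urllib.request.urlopen is assumed as the stand-in for urllib2.urlopen. The offline demonstration at the end shows what the 'ignore' argument does: bytes that are not valid GB2312 are dropped instead of raising UnicodeDecodeError.

```python
import urllib.request


def fetch_page(url):
    """Download the raw bytes of a page (assumed Python 3 stand-in for urllib2.urlopen)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def decode_gb2312(raw):
    """Decode GB2312 bytes to text, silently dropping bytes that cannot be decoded."""
    return raw.decode('gb2312', 'ignore')


def save_text(text, path):
    """Write text to a UTF-8 file; the with-block closes the file automatically."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)


# Offline demonstration: a stray 0xff byte is not valid GB2312.
sample = '电影'.encode('gb2312') + b'\xff' + b'abc'
print(decode_gb2312(sample))
```

In Python 3 the decoded result is already a str, so the extra .encode('utf-8', 'ignore') step from the Python 2 code is replaced by passing encoding='utf-8' to open().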

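The generated conent.txt file contains the same markup that the browser's F12 view shows for the Dianying Tiantang page, as shown in the figure below. (The file has 1126 lines.)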

A first filtering pass narrows this down to 586 lines:

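# First select the part of the page source that contains the latest movies, then filter it again in the next step.
pattern = re.compile('.*?(.*?)')  # the literal HTML markers that anchored these wildcards are missing from the source
items = re.findall(pattern, conent)
str = ''.join(items)
f = open('str.txt', 'w')
f.write(str)
f.close()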
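
The idea of this first pass can be illustrated on a small offline fragment. This is only a sketch under stated assumptions: sample_html and the movie_list class name are made up for illustration, and the real markup on the site (and therefore the real pattern) may differ. It shows how a non-greedy capture combined with re.S cuts out one region of the page.

```python
import re

# Hypothetical page fragment; the real site's markup may differ.
sample_html = """<html>
<div class="nav">menu</div>
<div class="movie_list">
<a href="/i/1.html" title="Movie A">Movie A</a>
<a href="/i/2.html" title="Movie B">Movie B</a>
</div>
<div class="footer">footer</div>
</html>"""

# re.S lets '.' match newlines, so the capture can span several lines.
# The non-greedy (.*?) stops at the first </div> after the opening marker.
region_pattern = re.compile(r'<div class="movie_list">(.*?)</div>', re.S)

items = re.findall(region_pattern, sample_html)
region = ''.join(items)  # findall returns a list, so join it back into one string
print(region)
```

Without re.S the pattern would match nothing here, because '.' would stop at the first line break inside the div.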

A second pass filters further and extracts the entries:

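pattern = re.compile('(.*?)', re.S)  # the literal HTML markers and any further groups are missing from the source
news = re.findall(pattern, str)
f = open('movie.txt', 'w')
for j in news:
    f.write(j[1] + '\n')  # j[1] is the second captured group of each match
f.close()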
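
The j[1] indexing works because re.findall returns a list of tuples when the pattern has more than one capture group. A minimal sketch, assuming a hypothetical link markup with href and title attributes (the pattern actually used on the site is not known here):

```python
import re

# Hypothetical region, as it might look after the first filtering pass.
region = '''
<a href="/i/1.html" title="Movie A">Movie A</a>
<a href="/i/2.html" title="Movie B">Movie B</a>
'''

# Two capture groups: group 1 is the link, group 2 is the title.
link_pattern = re.compile(r'<a href="(.*?)" title="(.*?)">', re.S)

# With two groups, findall yields one (link, title) tuple per match.
news = re.findall(link_pattern, region)

# j[0] is the link and j[1] is the title, mirroring f.write(j[1] + '\n').
titles = [j[1] for j in news]
output = ''.join(title + '\n' for title in titles)
print(output)
```

Here the titles are collected into a string instead of being written to movie.txt, so the sketch runs without touching the file system.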

And that's it, done!

The complete code is as follows:

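# -*- coding: utf-8 -*-
import urllib2
import os
import re

url = 'http://www.dy2018.com/html/gndy/dyzz/index.html'  # Dianying Tiantang's latest-movies page
conent = urllib2.urlopen(url)
conent = conent.read()
conent = conent.decode('gb2312', 'ignore').encode('utf-8', 'ignore')
# I almost forgot this 'ignore'. It matters for the encoding step: it skips
# characters that cannot be converted. I did not think of it all afternoon,
# and the script kept raising errors.
f = open('conent.txt', 'w')
f.write(conent)
f.close()

# First select the part of the page source that contains the latest movies, then filter it again in the next step.
pattern = re.compile('.*?(.*?)')  # the literal HTML markers that anchored these wildcards are missing from the source
items = re.findall(pattern, conent)
str = ''.join(items)
f = open('str.txt', 'w')
f.write(str)
f.close()

pattern = re.compile('(.*?)', re.S)  # the literal HTML markers and any further groups are missing from the source
news = re.findall(pattern, str)
f = open('movie.txt', 'w')
for j in news:
    f.write(j[1] + '\n')  # j[1] is the second captured group of each match
f.close()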

