Scraping Bilibili videos with a Python crawler


The Python video tutorial column explains how to scrape videos.


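  This article shows how to actually use Python to scrape videos from Bilibili. First, a word about me: I am a big-data development engineer, and web crawling is just a hobby. Fellow crawler enthusiasts are welcome to exchange ideas. Enough talk; if you like the article, please bookmark it, and when reposting, please include a link to the original. Thanks.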

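1. Environment setup

  I used the following environment, for reference only:

  Development tool: PyCharm
  Python environment: Python 3.8.0
  Dependencies: shutil, os, re, json, choice (from random), requests, lxml

2. Page analysis

  Here I use Mr. Ma's (马老师) video, which went viral a while ago, as the example.

  Video link: https://www.bilibili.com/video/BV1Ef4y1i78b?from=search&seid=12072538764197074893

  Link analysis: we only need BV1Ef4y1i78b, the part after video/ and before the ?. Packet capture: inspecting the network traffic shows that the video is delivered in many small segments. Source analysis: in the page source we can parse the contents of a <script>...</script> element, which holds a JSON string; parsing that string yields the data we want. Within the returned JSON, the information that is actually useful to us sits under data.

  Under data we can clearly see what we are after, such as the video quality and the video URL. Note: if you take a URL and request it directly, the request fails, because the page adds a Referer header to its own requests. Opening the URL directly in a browser, without a Referer, gives a page-not-found error.

  What we need to parse:

  the video duration, the video quality, the video URL, and the audio URL; the audio and video are then merged.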

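3.1 Dependencies

import json
import os
import re
import shutil
import ssl
import time
from concurrent.futures import ThreadPoolExecutor
from random import choice

import requests
from lxml import etree

Add the request headers and a random User-Agent

# Set request headers and related parameters so the crawler is less likely to be blocked
headers = {
   'Accept': '*/*',
   'Accept-Language': 'en-US,en;q=0.5',
   'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36'
}

def get_user_agent():
   '''Return a random User-Agent string'''
   user_agents = [
   "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; AcooBrowser; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",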

   "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; Acoo Browser; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506)",

   "Mozilla/4.0 (compatible; MSIE 7.0; AOL 9.5; AOLBuild 4337.35; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)",

   "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)",

   "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 2.0.50727; Media Center PC 6.0)",

   "Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET CLR 1.0.3705; .NET CLR 1.1.4322)",

   "Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 3.0.04506.30)",

   "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.3 (Change: 287 c9dfb30)",

   "Mozilla/5.0 (X11; U; Linux; en-US) AppleWebKit/527+ (KHTML, like Gecko, Safari/419.3) Arora/0.6",

   "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2pre) Gecko/20070215 K-Ninja/2.1.1",

   "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/20080705 Firefox/3.0 Kapiko/3.0",

   "Mozilla/5.0 (X11; Linux i686; U;) Gecko/20070322 Kazehakase/0.4.5",

   "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko Fedora/1.9.0.8-1.fc10 Kazehakase/0.5.6",

   "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",

   "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/535.20 (KHTML, like Gecko) Chrome/19.0.1036.7 Safari/535.20",

   "Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; fr) Presto/2.9.168 Version/11.52",

   "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",

   "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",

   "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",

   "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",

   "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 LBBROWSER",

   "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",

   "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",

   "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",

   "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; 360SE)",

   "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",

   "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",

   "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",

   "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1",

   "Mozilla/5.0 (iPad; U; CPU OS 4_2_1 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8C148 Safari/6533.18.5",

   "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13pre",

   "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",

   "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",

   "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",

   "MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 2.3.7; zh-cn; MB200 Build/GRJ22; CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",

   "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1",

   "Mozilla/5.0 (Linux; Android 5.1.1; Nexus 6 Build/LYZ28E) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.23 Mobile Safari/537.36",

   "Mozilla/5.0 (iPod; U; CPU iPhone OS 2_1 like Mac OS X; ja-jp) AppleWebKit/525.18.1 (KHTML, like Gecko) Version/3.1.1 Mobile/5F137 Safari/525.20",

   "Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",

   "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

   ]

   # Pick a random entry from the user_agents list to act as the simulated browser

   user_agent = choice(user_agents)

   return user_agent
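
In the code shown in this article, the requests are sent with the fixed `headers` dictionary, so `get_user_agent()` only has an effect if its result is copied into the headers before a request. A minimal sketch of one way to wire that in; `build_headers`, `BASE_HEADERS`, and the shortened two-entry `USER_AGENTS` list are hypothetical stand-ins, not part of the original code:

```python
from random import choice

# Shortened stand-in for the article's user_agents list (assumption, for brevity).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0",
]

# Shared defaults; never modified by build_headers.
BASE_HEADERS = {"Accept": "*/*", "Accept-Language": "en-US,en;q=0.5"}


def build_headers(referer=None):
    """Return a fresh headers dict with a randomly chosen User-Agent."""
    headers = dict(BASE_HEADERS)  # copy, so the shared defaults stay untouched
    headers["User-Agent"] = choice(USER_AGENTS)
    if referer is not None:
        headers["Referer"] = referer  # the stream URLs are rejected without a Referer
    return headers
```

Building a new dictionary per request, instead of mutating one global, also keeps a Referer set for one video from leaking into requests for another.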

3.2 Download code: parsing the video information

def single_download(aid, acc_quality):
   '''Download a single video'''
   # Request the video page and extract its information
   origin_video_url = 'https://www.bilibili.com/video/' + aid
   res = requests.get(origin_video_url, headers=headers)
   html = etree.HTML(res.text)
   title = html.xpath('//*[@id="viewbox_report"]/h1/span/text()')[0]
   print('Now downloading:', title)
   video_info_temp = re_video_info(res.text, '__playinfo__=(.*?)</script><script>')
   video_info = {}
   # Video quality label
   quality = video_info_temp['data']['accept_description'][acc_quality]
   # Video duration in seconds
   video_info['duration'] = video_info_temp['data']['dash']['duration']
   # Video stream URL
   video_url = video_info_temp['data']['dash']['video'][acc_quality]['baseUrl']
   # Audio stream URL; the audio list may be shorter than the quality list,
   # so clamp the index rather than risk an IndexError
   audio_list = video_info_temp['data']['dash']['audio']
   audio_url = audio_list[min(acc_quality, len(audio_list) - 1)]['baseUrl']
   # Convert the duration to minutes and seconds
   video_time = int(video_info.get('duration', 0))
   video_minute = video_time // 60
   video_second = video_time % 60
   print('Current quality: {}, duration: {} min {} s'.format(quality, video_minute, video_second))
   # Download and save the video
   download_video_single(origin_video_url, video_url, audio_url, title)
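
`single_download` calls `re_video_info`, which the article does not show. A minimal sketch of what it might look like, assuming it simply applies the regular expression to the page source and decodes the captured text as JSON; the sample page fragment is made up for illustration:

```python
import json
import re


def re_video_info(text, pattern):
    """Return the JSON object captured by group 1 of pattern in text."""
    match = re.search(pattern, text, re.S)  # re.S lets '.' span line breaks
    if match is None:
        raise ValueError("pattern not found in page source")
    return json.loads(match.group(1))


# Tiny made-up page fragment, only to show the shape of the call.
sample = ('<script>window.__playinfo__={"data":{"dash":{"duration":95}}}'
          '</script><script>window.other=1</script>')
info = re_video_info(sample, '__playinfo__=(.*?)</script><script>')
print(info["data"]["dash"]["duration"])  # prints 95
```

Raising an explicit error when the pattern is missing makes a changed page layout easier to diagnose than a later `TypeError` on `None`.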

3.3 Download code: saving the video and audio

def download_video_single(referer_url, video_url, audio_url, video_name):
   '''Download the video and audio streams of a single video'''
   # Update the request headers: the stream URLs cannot be fetched without a Referer
   headers.update({"Referer": referer_url})
   print("Video download started: %s" % video_name)
   # Download and save the video; 'bytes=0-' requests the stream from its first byte
   received_video = 0
   headers['Range'] = 'bytes=' + str(received_video) + '-'
   response = requests.get(video_url, headers=headers)
   print('%s\tvideo size:' % video_name, round(int(response.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
   # 'wb' instead of 'ab', so rerunning the script does not append a second copy
   with open('%s_video.mp4' % video_name, 'wb') as output:
      output.write(response.content)
      received_video += len(response.content)
   # Download and save the audio in the same way
   received_audio = 0
   headers['Range'] = 'bytes=' + str(received_audio) + '-'
   response = requests.get(audio_url, headers=headers)
   print('%s\taudio size:' % video_name, round(int(response.headers.get('content-length', 0)) / 1024 / 1024, 2), '\tMB')
   with open('%s_audio.mp4' % video_name, 'wb') as output:
      output.write(response.content)
      received_audio += len(response.content)
   print("Video download finished: %s" % video_name)
   video_audio_merge_single(video_name)
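
The `Range` header and the received-byte counters in the function above suggest that resumable downloads were the goal, although the function always starts from byte 0. A minimal sketch of the resumable variant, assuming the server honours `Range` and answers with status 206; `range_header_for` and `append_chunks` are hypothetical helper names, and the `requests` call is shown only in comments because it needs network access:

```python
import os


def range_header_for(path):
    """Return a Range header asking for the bytes that path does not hold yet."""
    received = os.path.getsize(path) if os.path.exists(path) else 0
    return {"Range": "bytes=%d-" % received}


def append_chunks(chunks, path):
    """Append an iterable of byte chunks to path; return the number of bytes written."""
    written = 0
    with open(path, "ab") as output:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                output.write(chunk)
                written += len(chunk)
    return written


# Intended use with requests (not executed here):
#   headers.update(range_header_for("clip_video.mp4"))
#   with requests.get(video_url, headers=headers, stream=True) as response:
#       append_chunks(response.iter_content(chunk_size=1024 * 1024), "clip_video.mp4")
```

Streaming in chunks also avoids holding the whole video in memory. If the server ignores `Range` and returns status 200 with the full body, appending would corrupt the file, so checking `response.status_code == 206` before appending is advisable.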

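3.4 Merging the downloaded audio and video

def video_audio_merge_single(video_name):
   '''Merge the video and audio of a single video with ffmpeg'''
   print("Merge started: %s" % video_name)
   import subprocess
   # Pass the arguments as a list (no shell), so titles with spaces or quotes are safe
   command = ['ffmpeg', '-i', '%s_video.mp4' % video_name, '-i', '%s_audio.mp4' % video_name,
              '-c', 'copy', '%s.mp4' % video_name, '-y', '-loglevel', 'quiet']
   # run() waits for ffmpeg to exit, so the message below is printed after the merge
   subprocess.run(command)
   print("Merge finished: %s" % video_name)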
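
Both the download step and the ffmpeg step build file names from the page title, and titles can contain characters such as `/` or `?` that are not valid in file names. A sketch of a hypothetical helper, `safe_filename` (not part of the original code), that could be applied to `title` before it is passed to `download_video_single`:

```python
import re


def safe_filename(title, replacement="_"):
    """Replace characters that are invalid in Windows or POSIX file names."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', replacement, title).strip(" .")
    return cleaned or "video"  # fall back to a fixed name if nothing is left


print(safe_filename('a/b: c?'))  # prints a_b_ c_
```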

3.5 Run and test
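
A minimal sketch of how a test run might be started, assuming the functions from the sections above are defined in the same module and ffmpeg is on the PATH. `extract_bvid` is a hypothetical helper that pulls the BV id out of the video link, and its pattern assumes BV ids consist of letters and digits only:

```python
import re


def extract_bvid(url):
    """Return the BV id from a Bilibili video URL, or None if there is none."""
    match = re.search(r'/video/(BV[0-9A-Za-z]+)', url)
    return match.group(1) if match else None


url = 'https://www.bilibili.com/video/BV1Ef4y1i78b?from=search&seid=12072538764197074893'
bvid = extract_bvid(url)
print(bvid)  # prints BV1Ef4y1i78b

# A test run could then look like this (0 selects the first listed quality):
#   single_download(bvid, 0)
```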

  

4. Summary

That's it: we have now successfully scraped a video from Bilibili. If anything is unclear or confusing, leave me a message and I will answer it.

  
