Python module development, Python high-concurrency frameworks

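This article introduces Futures in Python and explains in detail the Futures module used for concurrent programming. Readers who need it can use it as a reference, and I hope it helps. Best wishes for your progress and an early promotion and raise.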

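Contents: Distinguishing concurrency from parallelism; Futures in concurrent programming; What exactly are Futures?; Why does only one thread run at a time under multithreading?; Conclusion

Whatever the language, concurrent programming is a very important skill. Take the web crawler we used in the previous chapter: crawlers are widely used across industry, and a large share of the news we read every day on websites and apps is gathered by concurrent crawlers.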

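Used correctly, concurrent programming can bring a large performance gain to our programs. Today we will learn about concurrent programming in Python with Futures.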

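With threads, the operating system knows everything about each thread, so it preemptively switches between threads whenever it sees fit. The benefit is that the code is easy to write, because the programmer does not have to handle any switching. However, a switch can happen in the middle of a single statement (for example, x += 1), which makes race conditions more likely.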
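
To make the race-condition point concrete, here is a minimal sketch of one common remedy: guarding the shared update with a lock. The function name run_counter and its parameters are hypothetical and chosen only for illustration.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # counter += 1 is a read-modify-write sequence; holding the lock
            # keeps another thread from interleaving with it.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_counter())
```

With the lock in place the final count always equals num_threads * increments. Without it, the result can come up short, depending on when the threads happen to be switched.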

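With asyncio, a task must explicitly tell the main program when it can be switched out, and the main program switches tasks only at those points. This avoids the race condition described above.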
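
As a rough illustration of this cooperative switching, the sketch below lets several coroutines update shared state without a lock. The names bump and run_tasks are hypothetical. The only points where another task can run are the await expressions.

```python
import asyncio


async def bump(state, times):
    for _ in range(times):
        # There is no await between the read and the write, so no other
        # task can run in the middle of this update.
        state["count"] += 1
        # Explicit switch point: control goes back to the event loop here.
        await asyncio.sleep(0)


async def run_tasks(num_tasks=4, times=1000):
    state = {"count": 0}
    await asyncio.gather(*(bump(state, times) for _ in range(num_tasks)))
    return state["count"]


if __name__ == "__main__":
    print(asyncio.run(run_tasks()))
```

Because the update never spans an await, the total is exactly num_tasks * times.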

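Parallelism, on the other hand, means things really do happen at the same moment. In Python this is what multi-processing refers to, and it corresponds to multiple processes. Put simply, if our computer has an 8-core CPU, we can have Python start 8 processes that run at the same time and speed up the program. The idea is roughly what the figure below shows.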

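Concurrency, by contrast, is usually applied where I/O operations are frequent. For example, when we download multiple files from a website, the I/O takes far longer than the CPU work, so concurrency is the better fit. Parallelism is better suited to CPU-heavy scenarios, where we speed up computation by using more machines or more processors.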

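I recall summarizing in an earlier blog post that multithreading in Python is a kind of "pseudo multithreading" implemented by switching contexts on the CPU. Heavy thread switching consumes extra CPU resources. During I/O operations, however (whether exchanging data over the network or reading and writing data in memory or on disk), the CPU is not needed for computation. So multithreading suits I/O-bound workloads, not compute-bound ones.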
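
The following sketch illustrates why threads help with I/O-bound work. It is not a benchmark: time.sleep() stands in for a network or disk wait, and the names fake_io, run_sequential, and run_threaded are assumptions made for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.1):
    """Stand-in for a network or disk wait; the CPU is idle while sleeping."""
    time.sleep(delay)
    return task_id


def run_sequential(task_ids):
    start = time.perf_counter()
    results = [fake_io(t) for t in task_ids]
    return results, time.perf_counter() - start


def run_threaded(task_ids, max_workers=5):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(fake_io, task_ids))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(5))
    _, seq = run_sequential(ids)
    _, thr = run_threaded(ids)
    print(f"sequential: {seq:.2f}s, threaded: {thr:.2f}s")
```

The sequential version pays for every wait in turn, while the threaded version overlaps the waits, so its total time is close to the longest single wait.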

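Distinguishing concurrency from parallelism

Performance comparison: single-threaded vs. multithreaded

Let's use an example to understand Futures in concurrent programming from the code's point of view, and then compare its performance with a single-threaded version.

Suppose our task is to download some content from a list of websites and print it. Done in a single thread, it looks like this: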

import requests
import time


def download_one(url):
    # Fetch one page and report how many bytes were read
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))


def download_all(urls):
    # Download the pages one after another
    for url in urls:
        download_one(url)


def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)',
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))


if __name__ == '__main__':
    main()

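This is the simplest, most direct, brute-force approach:

First, iterate over the list of sites.

Download the current site.

When that finishes, do the same for the next site, and so on until the end.

Running it shows a total time of a little over 2 s. The single-threaded approach is simple and clear, but its biggest problem is inefficiency: most of the program's time is spent waiting on I/O (and this version only prints; writing to disk would take even longer). In a real production environment the number of sites to visit is at least in the tens of thousands, so this approach is simply not workable.

Next, let's look at the multithreaded version of the code.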

import concurrent.futures
import requests
import time


def download_one(url):
    # Fetch one page and report how many bytes were read
    resp = requests.get(url).content
    print('Read {} from {}'.format(len(resp), url))


def download_all(sites):
    # Hand the downloads to a pool of 5 worker threads
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_one, sites)


def main():
    sites = [
        'https://en.wikipedia.org/wiki/Portal:Arts',
        'https://en.wikipedia.org/wiki/Portal:History',
        'https://en.wikipedia.org/wiki/Portal:Society',
        'https://en.wikipedia.org/wiki/Portal:Biography',
        'https://en.wikipedia.org/wiki/Portal:Mathematics',
        'https://en.wikipedia.org/wiki/Portal:Technology',
        'https://en.wikipedia.org/wiki/Portal:Geography',
        'https://en.wikipedia.org/wiki/Portal:Science',
        'https://en.wikipedia.org/wiki/Computer_science',
        'https://en.wikipedia.org/wiki/Python_(programming_language)',
        'https://en.wikipedia.org/wiki/Java_(programming_language)',
        'https://en.wikipedia.org/wiki/PHP',
        'https://en.wikipedia.org/wiki/Node.js',
        'https://en.wikipedia.org/wiki/The_C_Programming_Language',
        'https://en.wikipedia.org/wiki/Go_(programming_language)',
    ]
    start_time = time.perf_counter()
    download_all(sites)
    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))


if __name__ == '__main__':
    main()

This code runs in about 0.2 s, more than a tenfold speedup. The main difference from the single-threaded version is the following:

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_one, sites)

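The code above creates a thread pool with 5 threads available. executor.map() is similar to Python's built-in map() function covered earlier: it calls download_one() concurrently on every element of sites.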
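
To show the resemblance to the built-in map(), here is a small sketch that runs the same function both ways. The helper names square, squares_with_builtin_map, and squares_with_executor_map are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


def squares_with_builtin_map(numbers):
    return list(map(square, numbers))


def squares_with_executor_map(numbers, max_workers=5):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Results are yielded in input order, even if the calls
        # finish in a different order inside the pool.
        return list(executor.map(square, numbers))


if __name__ == "__main__":
    nums = [1, 2, 3, 4, 5]
    print(squares_with_builtin_map(nums))
    print(squares_with_executor_map(nums))
```

One difference is worth keeping in mind: with executor.map(), an exception raised inside the function surfaces only when the corresponding result is retrieved from the returned iterator.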

Incidentally, the requests.get() call used in download_one() is thread-safe, so it can be used safely in a multithreaded environment without causing a race condition.

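Also, although we can choose the number of threads ourselves, more threads is not always better, because creating, maintaining, and destroying threads has its own overhead. Setting the number too high can actually slow things down, so we usually need to run some tests against the actual workload to find the best thread count.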
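
One way to run such a test is to time the same workload under several pool sizes. The sketch below assumes a simulated download (time.sleep()) rather than real requests. The names fake_download, time_pool, and sweep are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(_):
    time.sleep(0.02)  # assumed stand-in for one network request


def time_pool(max_workers, num_tasks=20):
    """Return the seconds needed to finish num_tasks with this pool size."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(fake_download, range(num_tasks)))
    return time.perf_counter() - start


def sweep(worker_counts=(1, 5, 20)):
    return {n: time_pool(n) for n in worker_counts}


if __name__ == "__main__":
    for n, elapsed in sweep().items():
        print(f"max_workers={n}: {elapsed:.3f}s")
```

For a real workload, the function under test would be the actual download call, and the best value depends on the network, the remote servers, and the machine.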

Of course, we can also use parallelism to improve efficiency. Only the following change to download_all() is needed:

def download_all(sites):
    # Use a process pool instead of a thread pool; max_workers is left
    # at its default
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.map(download_one, sites)

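In the modified part, ProcessPoolExecutor() creates a process pool and runs the program in parallel across multiple processes. The max_workers argument is usually omitted here, because it then defaults to the number of CPUs on the machine.

As mentioned above, parallelism is generally used for CPU-bound work. I/O-bound operations spend most of their time waiting, so multiple processes bring no gain over multiple threads, and because the number of CPUs is limited, the multi-process version is often slower than the multithreaded one.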

What exactly are Futures?

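Futures in Python live in concurrent.futures and asyncio. In both places they represent deferred operations. A Future wraps an operation that is waiting to run and places it in a queue. The state of these operations can be queried at any time, and their results or exceptions can be retrieved once the operation has finished.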

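As users, we normally do not need to think about how to create Futures, because the underlying layer handles that for us. What we do is schedule them for execution. For example, with the Executor class, calling executor.submit(func) schedules func() to run and returns the Future that was created, so that we can query it later. A Future's done() method reports whether the corresponding operation has finished: True means finished, False means not yet. Note that done() is non-blocking and returns immediately. The related method add_done_callback(fn) registers fn to be notified and called once the Future completes.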
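
A small sketch of done() and add_done_callback() follows. To make the "not finished yet" state observable, it holds the task back with a threading.Event, which is an assumption of this example and not part of the article's code. The name demo_done_and_callback is also hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo_done_and_callback():
    gate = threading.Event()
    events = []

    def task():
        gate.wait()  # block until the main thread opens the gate
        return "finished"

    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(task)
        # The callback receives the completed future as its only argument.
        future.add_done_callback(lambda f: events.append(f.result()))
        # done() returns immediately; the task is still held at the gate.
        done_before = future.done()
        gate.set()
    # Leaving the with block waits for the worker thread, so by now the
    # task has completed and the callback has been called.
    done_after = future.done()
    return done_before, done_after, events


if __name__ == "__main__":
    print(demo_done_and_callback())
```

The callback runs in the worker thread that completes the future, so it should stay short and must not assume it runs in the main thread.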

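Another very important method is result(): once the future has completed, it returns the corresponding result or raises the corresponding exception. as_completed(fs) takes an iterable of futures, fs, and returns an iterator that yields each future as it completes.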
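
The sketch below shows result() both returning a value and re-raising an exception from the worker. The functions parse_positive and collect are invented for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_positive(text):
    value = int(text)  # raises ValueError for non-numeric text
    if value <= 0:
        raise ValueError(f"not positive: {value}")
    return value


def collect(texts):
    """Return (text, value) on success or (text, exception name) on failure."""
    outcomes = []
    with ThreadPoolExecutor(max_workers=2) as executor:
        futures = [executor.submit(parse_positive, t) for t in texts]
        for text, future in zip(texts, futures):
            try:
                # result() blocks until the call finishes, then returns its
                # value or re-raises the exception raised in the worker.
                outcomes.append((text, future.result()))
            except ValueError as exc:
                outcomes.append((text, type(exc).__name__))
    return outcomes


if __name__ == "__main__":
    print(collect(["3", "-1", "abc"]))
```

Handling the exception at the result() call keeps one failed task from hiding the outcomes of the others.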

So the example above can also be written as follows:

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        to_do = []
        for site in sites:
            # submit() schedules the call and returns a Future
            future = executor.submit(download_one, site)
            to_do.append(future)
        for future in concurrent.futures.as_completed(to_do):
            # result() re-raises any exception raised in download_one
            future.result()

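Here we first call executor.submit() to schedule the download of each site and collect the returned futures in the list to_do, where they wait to be executed. Then as_completed() yields each future as it finishes, and we retrieve its result.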

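One thing to note: the order in which the futures complete is not necessarily their order in the list. Which one finishes first depends on system scheduling and on how long each future takes to run.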
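
Since completion order can differ from submission order, a common pattern is to keep a dictionary from each future back to its input. The sketch below uses time.sleep() with made-up delays to force an out-of-order finish. The names slow_echo and completion_order are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_echo(name, delay):
    time.sleep(delay)  # assumed stand-in for work of varying duration
    return name


def completion_order(jobs):
    """jobs is a list of (name, delay) pairs; return names as they finish."""
    finished = []
    with ThreadPoolExecutor(max_workers=len(jobs)) as executor:
        # Map each future back to the input it was created for.
        future_to_name = {
            executor.submit(slow_echo, name, delay): name
            for name, delay in jobs
        }
        for future in as_completed(future_to_name):
            future.result()  # re-raises if the task failed
            finished.append(future_to_name[future])
    return finished


if __name__ == "__main__":
    print(completion_order([("slow", 0.6), ("fast", 0.05), ("medium", 0.3)]))
```

The dictionary lookup is what lets the caller tell which input a finished future belongs to, regardless of the order in which as_completed() yields it.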