Python pandas: reading CSV files and working with CSV data

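This article shares some common techniques for handling CSV files with Python pandas. These include counting how often each column value occurs, filtering rows by a specific column value, and iterating over rows. It is offered for reference.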

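Processing a CSV file with pandas breaks down into the following steps: reading the CSV file, counting column values, filtering by specific column values, iterating over rows, and plotting a histogram (bar chart).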
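The original walkthrough does not show the loading step. Here is a minimal sketch that assumes a UTF-8 encoded file (the output file name used later contains `utf8`). An in-memory `io.StringIO` buffer with hypothetical sample rows stands in for a real path, so the snippet runs on its own:

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for a real CSV file on disk
csv_text = """板子编号,图名称,扰动类别
plate1,img_001,DMSO
plate1,img_002,PBS
plate2,img_003,DMSO
"""

# With a real file, pass its path instead (placeholder name):
# df = pd.read_csv("your_file.csv", encoding="utf-8")
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # (3, 3)
```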

Counting occurrences of column values

Use df[column_name].value_counts(). For example, df["扰动类别"].value_counts() counts each value of 扰动类别 (perturbation category):

df["扰动类别"].value_counts()

Output:

coated OKT3 720
OKT3 720
coated OKT3+anti-CD28 576
DMSO 336
anti-CD28 288
PBS 288
Nivo 288
Pemb 288
empty 192
coated OKT3 + anti-CD28 144
Name: 扰动类别, dtype: int64
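In the output above, `coated OKT3+anti-CD28` and `coated OKT3 + anti-CD28` appear as separate categories because their spacing differs. If they denote the same treatment (an assumption; the original does not say), you can normalize the whitespace before counting to merge them. The following self-contained sketch uses made-up data:

```python
import pandas as pd

# Made-up labels that mimic the inconsistent spacing seen in the output above
df = pd.DataFrame({
    "扰动类别": ["coated OKT3+anti-CD28", "coated OKT3 + anti-CD28",
             "coated OKT3+anti-CD28", "DMSO", "DMSO", "PBS"],
})

# Raw counts: the two spellings are counted separately
raw_counts = df["扰动类别"].value_counts()

# Strip whitespace around "+" so both spellings collapse into one label
normalized = df["扰动类别"].str.replace(r"\s*\+\s*", "+", regex=True)
merged_counts = normalized.value_counts()

print(raw_counts)
print(merged_counts)
```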

To plot the value_counts() result directly as a bar chart (see the pandas "Chart Visualization" guide):

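import matplotlib.pyplot as plt
# %matplotlib inline is a Jupyter/IPython magic; remove it when running a plain .py script
%matplotlib inline
plt.close("all")
plt.figure(figsize=(20, 8))
df["扰动类别"].value_counts().plot(kind="bar")
# plt.xticks(rotation="vertical", fontsize=10)  # optional: vertical x-axis labels
plt.show()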
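Outside Jupyter, `%matplotlib inline` is not valid Python. The sketch below runs as a plain script, assuming a non-interactive environment. It selects the `Agg` backend and writes the chart into an in-memory PNG buffer instead of calling `plt.show()`. The data is made up, and the x-axis label is set explicitly in English because the default fonts may not render CJK text:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window is opened
import matplotlib.pyplot as plt
import pandas as pd

# Made-up category column
df = pd.DataFrame({"扰动类别": ["DMSO", "PBS", "DMSO", "Nivo", "DMSO", "PBS"]})

counts = df["扰动类别"].value_counts()  # sorted by count, descending

fig, ax = plt.subplots(figsize=(8, 4))
counts.plot(kind="bar", ax=ax)               # one bar per category
ax.set_xlabel("perturbation category")
ax.tick_params(axis="x", labelrotation=90)   # same effect as rotation="vertical"

buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```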

[Figure: bar chart of df["扰动类别"].value_counts()]

Filtering by a specific column value

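Use df.loc[condition]. After filtering on a column value (here 板子编号 = plate ID), assign the result to a new variable so that later steps only process the filtered rows. The result can also be written to a CSV file.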

df_plate1 = df.loc[df["板子编号"] == "plate1"]
df_plate1.info()
# df.loc[df["板子编号"] == "plate1"].to_csv("batch3_IOStrain_klasses_utf8_plate1.csv")  # save as a CSV file
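The sketch below applies the same filter-then-save pattern to made-up data. It adds `.copy()` so that later modifications to the subset do not trigger pandas' SettingWithCopyWarning, which is a choice made here and not in the original. It also writes to an in-memory buffer with `index=False` instead of using the original file name:

```python
import io

import pandas as pd

df = pd.DataFrame({
    "板子编号": ["plate1", "plate2", "plate1", "plate3"],
    "图名称": ["img_001", "img_002", "img_003", "img_004"],
})

# Boolean mask built from the same DataFrame that is being indexed
mask = df["板子编号"] == "plate1"
df_plate1 = df.loc[mask].copy()  # original row labels 0 and 2 are kept

# Write the filtered rows; index=False omits the row-index column
buffer = io.StringIO()
df_plate1.to_csv(buffer, index=False)
print(buffer.getvalue())
```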

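Note: the DataFrame used inside the condition must be the same one being filtered. More precisely, the index of the boolean Series must align with the index of the filtered DataFrame. Otherwise pandas raises: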

pandas loc IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
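This error is easy to reproduce. In the sketch below, which uses made-up data, a mask computed on a differently indexed DataFrame cannot be aligned to `df`'s index, so `.loc` rejects it. A mask built from `df` itself works:

```python
import pandas as pd

df = pd.DataFrame({"板子编号": ["plate1", "plate2", "plate1"]})  # index 0, 1, 2
other = pd.DataFrame({"板子编号": ["plate1", "plate2"]}, index=[10, 11])

# Mask whose index (10, 11) does not match df's index (0, 1, 2)
bad_mask = other["板子编号"] == "plate1"

try:
    df.loc[bad_mask]
    error_name = None
except Exception as exc:  # pandas raises IndexingError for an unalignable mask
    error_name = type(exc).__name__

# Mask built from df itself aligns row by row
good = df.loc[df["板子编号"] == "plate1"]

print(error_name)
print(len(good))  # 2
```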

Output: the number of rows drops from 3840 to 1280.

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1280 entries, 0 to 1279
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 实验时间批次 1280 non-null object
1 物镜倍数 1280 non-null object
2 板子编号 1280 non-null object
3 板子编号及物镜倍数 1280 non-null object
4 图名称 1280 non-null object
5 细胞类型 1280 non-null object
6 板子孔位置 1280 non-null object
7 孔拍摄位置 1280 non-null int64
8 细胞培养基 1280 non-null object
9 细胞培养时间（小时） 1280 non-null int64
10 扰动类别 1280 non-null object
11 扰动处理时间（小时） 1280 non-null int64
12 扰动处理浓度（ug/ml） 1280 non-null float64
13 标注激活(1/0) 1280 non-null int64
14 unique 1280 non-null object
15 tvt 1280 non-null int64
dtypes: float64(1), int64(5), object(10)
memory usage: 170.0+ KB

Iterating over rows

Iterate with for idx, row in df_plate1_lb0.iterrows(): and read individual values with row[column_name] (here 图名称 = image name). For example:

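import os

import cv2

# df_plate1_lb0, img_format and plate1_img_folder are defined earlier in the original workflow (not shown)
for idx, row in df_plate1_lb0.iterrows():
    img_name = row["图名称"]
    img_ch_format = img_format.format(img_name, "{}")
    for i in range(1, 7):
        img_path = os.path.join(plate1_img_folder, img_ch_format.format(i))
        img = cv2.imread(img_path)  # returns None if the file cannot be read
        print("[Info] img shape: {}".format(img.shape))
    break  # only inspect the first row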
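The loop relies on a two-stage `str.format` trick. `img_format.format(img_name, "{}")` fills in the image name and puts a literal `{}` back as a placeholder for the channel number, which the inner loop then fills. The original does not show `img_format`, `plate1_img_folder` or the file naming scheme. The sketch below therefore uses a hypothetical template `"{}_ch{}.png"` and a hypothetical folder, and only builds the paths, so no OpenCV or image files are needed:

```python
import os

import pandas as pd

# Hypothetical stand-ins for names the original defines elsewhere
img_format = "{}_ch{}.png"           # assumed pattern: <image name>_ch<channel>.png
plate1_img_folder = "images/plate1"  # assumed folder
df_plate1_lb0 = pd.DataFrame({"图名称": ["img_001", "img_002"]})

paths = []
for idx, row in df_plate1_lb0.iterrows():
    img_name = row["图名称"]  # row is a Series indexed by column name
    # Stage 1: fill in the image name, keep "{}" as the channel placeholder
    img_ch_format = img_format.format(img_name, "{}")  # "img_001_ch{}.png"
    for i in range(1, 7):  # channels 1..6
        # Stage 2: fill in the channel number
        paths.append(os.path.join(plate1_img_folder, img_ch_format.format(i)))
    break  # like the original, only the first row is processed

print(len(paths))  # 6
print(paths[0])
```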

Output:

[Info] img shape: (1080, 1080, 3)
[Info] img shape: (1080, 1080, 3)
[Info] img shape: (1080, 1080, 3)
[Info] img shape: (1080, 1080, 3)
[Info] img shape: (1080, 1080, 3)
[Info] img shape: (1080, 1080, 3)

Plotting a histogram (bar chart)

First, count the grayscale pixel values into a dictionary, with the background color removed:

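import collections

import cv2
import numpy as np

# img_gray: a single-channel uint8 image, assumed to come from e.g. cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) (not shown in the original)
# Remove the background: the most frequent pixel value is taken as the background color
pix_bkg = np.argmax(np.bincount(img_gray.ravel()))
img_gray = np.where(img_gray <= pix_bkg + 2, 0, img_gray)  # values up to background + 2 count as background
img_gray = img_gray.astype(np.uint8)
# Compute the 256-bin histogram (one bin per pixel value 0..255)
hist = cv2.calcHist([img_gray], [0], None, [256], [0, 256])
hist = hist.ravel()
# Convert to a {pixel value: count} dictionary
hist_dict = collections.defaultdict(int)
for i, v in enumerate(hist):
    hist_dict[i] += int(v)
# All background pixels were mapped to 0, so bin 0 is huge; zero it out to see the rest of the distribution
hist_dict[0] = 0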
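`cv2.calcHist([img_gray], [0], None, [256], [0, 256])` counts how many pixels take each value from 0 to 255. For an 8-bit image, `np.bincount(..., minlength=256)` produces the same counts, as an integer array rather than OpenCV's float32 column. The sketch below reproduces the background-removal step with NumPy only, on a small synthetic image with made-up values. It keeps the original `+ 2` tolerance:

```python
import collections

import numpy as np

# Synthetic 8-bit "image": mostly background value 10, plus a few other pixels
img_gray = np.full((4, 4), 10, dtype=np.uint8)
img_gray[0, 0] = 11   # within background + 2, so it is treated as background
img_gray[1, 1] = 50
img_gray[2, 2] = 50
img_gray[3, 3] = 200

# The most frequent pixel value is assumed to be the background
pix_bkg = np.argmax(np.bincount(img_gray.ravel()))
# Pixels up to background + 2 become 0
img_gray = np.where(img_gray <= pix_bkg + 2, 0, img_gray).astype(np.uint8)

# Same counts as cv2.calcHist(..., [256], [0, 256]) for uint8 data
hist = np.bincount(img_gray.ravel(), minlength=256)

hist_dict = collections.defaultdict(int)
for i, v in enumerate(hist):
    hist_dict[i] += int(v)
hist_dict[0] = 0  # drop the (huge) background bin

print(pix_bkg)                         # 10
print(hist_dict[50], hist_dict[200])   # 2 1
```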

Then draw the bar chart:

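plt.subplots: creates the figure and its subplot(s); figsize sets the figure size, facecolor the background color
ax.set_title: sets the title
ax.bar: draws bars from the x-axis values and the y-axis values
ax.set_xticks: sets the x-axis tick positions (the tick spacing)
plt.savefig: saves the figure to a file
plt.show: displays the figure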

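import matplotlib.pyplot as plt

# ci (channel index) and res_path (output file path) are defined elsewhere in the original workflow
fig, ax = plt.subplots(1, 1, figsize=(10, 8), facecolor="white")
ax.set_title("channel {}".format(ci))
n_bins = 100
ax.bar(range(n_bins+1), [hist_dict.get(xtick, 0) for xtick in range(n_bins+1)])
ax.set_xticks(range(0, n_bins, 5))
plt.savefig(res_path)
plt.show()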
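Below is a runnable version of the plotting step, assuming a non-interactive run. The excerpt does not define `ci` (the channel index) or `res_path`. The sketch therefore uses a hypothetical channel number and a made-up `hist_dict`, and it saves to an in-memory buffer instead of a file:

```python
import collections
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Made-up pixel-value counts standing in for the real hist_dict
hist_dict = collections.defaultdict(int, {20: 5, 35: 12, 50: 7, 80: 3})
ci = 1  # hypothetical channel index

n_bins = 100
x = list(range(n_bins + 1))             # pixel values 0..100
y = [hist_dict.get(v, 0) for v in x]    # values with no pixels count as 0

fig, ax = plt.subplots(1, 1, figsize=(10, 8), facecolor="white")
ax.set_title("channel {}".format(ci))
ax.bar(x, y)
ax.set_xticks(range(0, n_bins, 5))      # a tick every 5 pixel values

buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

The code uses `hist_dict.get(v, 0)` rather than `hist_dict[v]` because indexing a `defaultdict` with a missing key inserts that key, which would silently grow the dictionary.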

[Figure: resulting bar chart of the grayscale pixel-value distribution]

