Reading a JSON file into a list with Python, and converting the text data into a table

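　　This article shows how to read JSON data with Python and restore tables in bulk as HTML. The table recognition results of an OCR system had to be verified, and comparing the returned JSON files directly is tedious, so the recognition results in the JSON files are rebuilt as tables for checking. The details follow.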

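　　Contents: I. Hands-on (1. Creating a new document; 2. Adding text). II. Converting Word to HTML (1. Converting with pydocx; 2. Using the win32 module).

　　Background: the table recognition results of an OCR system need to be verified. Comparing the returned JSON files by eye is tedious, so the recognition results in each JSON file are restored into a table for verification.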

　　Part of the file looks like this:

　　{"row": 6, "col": 5, "start_row": 0, "start_column": 0, "end_row": 0, "end_column": 18, 93, "org_position": [50, 60, 167, 62, 166, 84, 49, 82], "char_position": [[86, 83,

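　　The task is to generate the corresponding table from each cell's start and end row and column coordinates together with its content.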
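　　As a rough sketch of the reading step, the snippet below parses a recognition result with the standard json module and flattens it into a list of cell records. The excerpt above is truncated, so the overall layout is an assumption: the cells are assumed to sit in a list under a hypothetical "cells" key, with each cell's content under a hypothetical "text" key. Only row, col, start_row, start_column, end_row and end_column come from the excerpt.

```python
import json

# Hypothetical sample: the "cells" and "text" keys are assumptions;
# the remaining field names follow the excerpt above.
SAMPLE = """
{
  "row": 2, "col": 3,
  "cells": [
    {"start_row": 0, "start_column": 0, "end_row": 0, "end_column": 2, "text": "Title"},
    {"start_row": 1, "start_column": 0, "end_row": 1, "end_column": 0, "text": "a"},
    {"start_row": 1, "start_column": 1, "end_row": 1, "end_column": 2, "text": "b"}
  ]
}
"""


def load_cells(raw):
    """Parse the JSON text and return (rows, cols, list of cell tuples)."""
    data = json.loads(raw)
    cells = [
        (c["start_row"], c["start_column"], c["end_row"], c["end_column"], c.get("text", ""))
        for c in data["cells"]
    ]
    return data["row"], data["col"], cells


rows, cols, cells = load_cells(SAMPLE)
for cell in cells:
    print(cell)
```

　　For a file on disk, the same function can be given the text read from the file opened with encoding="utf-8".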

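　　I originally planned to do this in JavaScript, but I had forgotten some of the syntax, so I went with Python instead.

　　After some research I found that python-docx can generate tables automatically. Its output is a Word document, though, so the plan is to build everything in Word first and then convert the Word file to HTML.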

I. Hands-on

1. Creating a new document

from docx import Document
document = Document()

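　　Next, use the add_table method of the Document class to add a table. Here rows is the number of rows, cols is the number of columns, and style is the table style. See the official documentation for details.

table = document.add_table(rows=37, cols=13, style="Table Grid")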

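　　The code above inserts a table with 37 rows and 13 columns into the Word document (37 * 13 = 481 cells).

　　Every generated cell has a "coordinate": in the table above, the top-left cell is (0, 0) and the bottom-right cell is (36, 12).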

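　　The next step is to merge cells until the grid matches the table we actually need.

table.cell(0, 0).merge(table.cell(2, 2))

　　The code above merges every cell in the rectangle from (0, 0) to (2, 2) into a single cell.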

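　　Note that merged cells do not disappear from the grid: every original coordinate still exists. For example, after merging (0, 0) and (0, 1), the merged cell is reachable as both (0, 0) and (0, 1).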
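　　Because a merged region still answers to every coordinate it covers, it can help to check the spans taken from the JSON before merging anything. The following is a minimal pure-Python sketch, independent of python-docx, that lists the coordinates a span covers and reports coordinates claimed by more than one span. The (start_row, start_column, end_row, end_column) tuple layout and both function names are assumptions made for illustration.

```python
def covered(span):
    """Return every (row, col) coordinate inside an inclusive span."""
    start_row, start_col, end_row, end_col = span
    return [
        (r, c)
        for r in range(start_row, end_row + 1)
        for c in range(start_col, end_col + 1)
    ]


def find_overlaps(spans):
    """Return the coordinates claimed by more than one span."""
    seen = set()
    clashes = set()
    for span in spans:
        for coord in covered(span):
            if coord in seen:
                clashes.add(coord)
            seen.add(coord)
    return sorted(clashes)


# Merging (0, 0) through (2, 2) covers a 3 x 3 block of coordinates.
print(len(covered((0, 0, 2, 2))))                   # 9
print(find_overlaps([(0, 0, 2, 2), (2, 2, 3, 3)]))  # [(2, 2)]
```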

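　　If there are many cells and the coordinates are hard to read off by eye, the following code writes each cell's coordinates into the cell, which makes planning the merges easier: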

from docx import Document

# Create a 37 x 13 grid and save it
document = Document()
table = document.add_table(rows=37, cols=13, style="Table Grid")
document.save("table-1.docx")

# Reopen the file and write each cell's "row,col" coordinates into it
document1 = Document("table-1.docx")
table = document1.tables[0]
for row, obj_row in enumerate(table.rows):
    for col, cell in enumerate(obj_row.cells):
        cell.text = cell.text + "%d,%d " % (row, col)
document1.save("table-2.docx")

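2. Adding text

　　Once all the cells have been merged, text has to be added to the merged cells.

　　The table's rows property gives access to each row, and a row's cells property returns a list containing all the cells in that row: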

hdr_cells0 = table.rows[0].cells

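　　The line above returns all the cells in the first row of the merged table, so hdr_cells0[0] is the first cell of that row. Use the add_paragraph method to add text to a cell:

hdr_cells0[0].add_paragraph("data text")

　　For other usage, see the python-docx documentation: https://www.osgeo.cn/python-docx/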

II. Converting Word to HTML

1. Converting with pydocx

pip install pydocx

from pydocx import PyDocX

# Convert the Word document to an HTML string and write it out as UTF-8
html = PyDocX.to_html("test.docx")
with open("test.html", "w", encoding="utf-8") as f:
    f.write(html)

　　To upload the Word document through a web page and accept only .docx files:

<form method="post" enctype="multipart/form-data">
  <input type="file" name="file" accept="application/vnd.openxmlformats-officedocument.wordprocessingml.document">
</form>
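　　The accept attribute only filters what the browser's file picker offers, so the server still needs its own check. Below is a minimal standard-library sketch of one such check: a .docx file is a ZIP archive that contains word/document.xml, so the function tests for exactly that. The function name looks_like_docx is hypothetical, and how the uploaded bytes reach it depends on the web framework in use.

```python
import io
import zipfile


def looks_like_docx(data):
    """Return True if the bytes form a ZIP archive containing word/document.xml."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        return False
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        return "word/document.xml" in archive.namelist()


# Build a tiny stand-in archive in memory to exercise the check.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")

print(looks_like_docx(sample.getvalue()))  # True
print(looks_like_docx(b"plain text"))      # False
```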

2. Using the win32 module

pip3 install pypiwin32

import os
import sys
from win32com import client as wc

WD_FORMAT_FILTERED_HTML = 10  # WdSaveFormat value for filtered HTML


def wordsToHtml(rootdir):
    # Start Word through COM (requires Windows with Word installed)
    word = wc.Dispatch("Word.Application")
    try:
        for path, subdirs, files in os.walk(rootdir):
            for wordFile in files:
                # Pass absolute paths, since Word does not share the script's working directory
                wordFullName = os.path.abspath(os.path.join(path, wordFile))
                dotIndex = wordFile.rfind(".")
                if dotIndex == -1:
                    print(wordFullName + "********************ERROR: no file extension found!")
                    continue
                fileSuffix = wordFile[dotIndex + 1:]
                if fileSuffix in ("doc", "docx"):
                    htmlName = wordFile[:dotIndex] + ".html"
                    htmlFullName = os.path.join(os.path.dirname(wordFullName), htmlName)
                    print("generate html:" + htmlFullName)
                    # Open only files that are actually Word documents
                    doc = word.Documents.Open(wordFullName)
                    doc.SaveAs(htmlFullName, WD_FORMAT_FILTERED_HTML)
                    doc.Close()
    finally:
        # Always shut Word down, even if a conversion fails
        word.Quit()
    print("")
    print("Finished!")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python funcName.py rootdir")
        sys.exit(100)
    wordsToHtml(sys.argv[1])