
Inserting comments into a docx file with Python

Updated: 2023-06-25 10:07:03


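My project needed to insert comments into docx files automatically, and Python was the first choice. Python has the python-docx library, but so far it does not support inserting comments. Someone raised this in the python-docx project, and the author, scanny, replied with some pointers.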

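The general idea is as follows. Unzipping a docx file yields a number of files and folders. Comparing the unzipped contents of a document with a comment against one without shows that inserting the first comment adds a new comments.xml file and modifies word/_rels/document.xml.rels and document.xml. Inserting further comments only modifies comments.xml and document.xml. So once you understand how document.xml.rels, comments.xml, and document.xml change, comment insertion can be automated.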

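You can try this by hand: rename the docx file to .zip, unzip it, edit the files inside, compress them back into a .zip, and rename the result to .docx. Be aware that compressing back to .zip can run into problems of its own.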
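The rename, unzip, edit, re-zip round trip can also be scripted with the standard `zipfile` module, which sidesteps manual re-compression. This is a sketch under assumptions: `replace_part` is a hypothetical helper, not part of any library. The article does not say which re-zipping problems it has in mind; one plausible cause (an assumption on my part) is that part names end up nested under an extra folder, so the sketch keeps every name exactly as it was in the source archive.

```python
import io
import zipfile


def replace_part(src, dst, part_name, new_data):
    """Copy the package `src` to `dst`, swapping in `new_data` for `part_name`.

    If `part_name` does not exist in `src`, it is appended at the end.
    `src` and `dst` may be paths or file-like objects, but must not be the same file.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        seen = False
        for info in zin.infolist():
            if info.filename == part_name:
                zout.writestr(part_name, new_data)
                seen = True
            else:
                # Reuse the original ZipInfo so the name and metadata are kept.
                zout.writestr(info, zin.read(info.filename))
        if not seen:
            zout.writestr(part_name, new_data)


if __name__ == "__main__":
    source = io.BytesIO()
    with zipfile.ZipFile(source, "w") as z:
        z.writestr("word/document.xml", "<old/>")
    source.seek(0)
    target = io.BytesIO()
    replace_part(source, target, "word/document.xml", "<new/>")
    target.seek(0)
    with zipfile.ZipFile(target) as z:
        print(z.namelist(), z.read("word/document.xml"))
```

Writing to a separate destination and replacing the original only after the copy succeeds avoids leaving a half-written docx behind if something fails midway.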

[Figure: unzipped file structure without comments]

[Figure: unzipped file structure with a comment inserted]

The most obvious difference is the new comments.xml file. The contents of word/_rels/document.xml.rels and document.xml also change.
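Wiring the new comments.xml part into the package can be sketched as below. Two things here are assumptions rather than statements from the article: the helper names are hypothetical, and the `[Content_Types].xml` override is not among the changes the article lists. It is included because the Open Packaging Conventions declare each part's content type in that file, and some consumers may reject a comments part that is not declared there.

```python
import re
import xml.etree.ElementTree as ET

PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
COMMENTS_REL_TYPE = ("http://schemas.openxmlformats.org/officeDocument/"
                     "2006/relationships/comments")
COMMENTS_CONTENT_TYPE = ("application/vnd.openxmlformats-officedocument."
                         "wordprocessingml.comments+xml")
XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'


def add_comments_relationship(rels_xml):
    """Return document.xml.rels text with a comments relationship present."""
    ET.register_namespace("", PKG_REL_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(rels_xml.encode("utf-8"))
    rels = root.findall("{%s}Relationship" % PKG_REL_NS)
    if any(r.get("Type") == COMMENTS_REL_TYPE for r in rels):
        return rels_xml  # already registered, nothing to do
    used = []
    for r in rels:
        m = re.fullmatch(r"rId(\d+)", r.get("Id", ""))
        if m:
            used.append(int(m.group(1)))
    new_id = "rId%d" % (max(used, default=0) + 1)  # next free relationship id
    ET.SubElement(root, "{%s}Relationship" % PKG_REL_NS,
                  {"Id": new_id, "Type": COMMENTS_REL_TYPE, "Target": "comments.xml"})
    return XML_DECL + ET.tostring(root, encoding="unicode")


def add_comments_content_type(content_types_xml):
    """Return [Content_Types].xml text with an Override for /word/comments.xml."""
    ET.register_namespace("", CT_NS)
    root = ET.fromstring(content_types_xml.encode("utf-8"))
    for override in root.findall("{%s}Override" % CT_NS):
        if override.get("PartName") == "/word/comments.xml":
            return content_types_xml
    ET.SubElement(root, "{%s}Override" % CT_NS,
                  {"PartName": "/word/comments.xml", "ContentType": COMMENTS_CONTENT_TYPE})
    return XML_DECL + ET.tostring(root, encoding="unicode")
```

Both functions return their input unchanged when the entry already exists, so they are safe to call for every comment, not only the first one.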

First, compare the contents of word/_rels/document.xml.rels.

Before inserting a comment:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"><Relationship Id="rId5" Type="http://schemas.openxmlformats.org/office…

After inserting a comment:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships"><Relationship Id="rId6" Type="http://schemas.openxmlformats.org/office…

Next, compare the changes in document.xml.

Before insertion:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compa…

After insertion:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compa…

Finally, compare comments.xml after inserting one comment and after inserting two.

With one comment:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<w:comments xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compa…

With two comments:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<w:comments xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compa…

You can explore the differences yourself.
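How comments.xml grows from one comment to two can be sketched as follows: each new `w:comment` element gets the next free `w:id` and is spliced in before the closing `</w:comments>` tag. This is a minimal illustration, not a definitive implementation. `next_comment_id` and `append_comment` are hypothetical names, and the element layout is reduced to a single paragraph and run. The ids are compared as integers, since comparing them as strings would rank "9" above "10".

```python
import re
from xml.sax.saxutils import escape

QUOTE = {'"': "&quot;"}  # extra escaping needed inside attribute values


def next_comment_id(comments_xml):
    """Return the lowest id above every existing w:comment id (0 if there are none)."""
    ids = [int(m) for m in
           re.findall(r'<w:comment\b[^>]*\bw:id="(\d+)"', comments_xml)]
    return max(ids, default=-1) + 1


def append_comment(comments_xml, text, author, initials, date):
    """Return (new_id, updated_xml) with one more w:comment element appended."""
    new_id = next_comment_id(comments_xml)
    entry = (
        '<w:comment w:id="{}" w:author="{}" w:date="{}" w:initials="{}">'
        '<w:p><w:r><w:t>{}</w:t></w:r></w:p></w:comment>'
    ).format(new_id, escape(author, QUOTE), escape(date, QUOTE),
             escape(initials, QUOTE), escape(text))
    head, closing, tail = comments_xml.rpartition("</w:comments>")
    if not closing:
        raise ValueError("no closing </w:comments> tag found")
    return new_id, head + entry + closing + tail


if __name__ == "__main__":
    xml = ('<w:comments xmlns:w="http://schemas.openxmlformats.org/'
           'wordprocessingml/2006/main"></w:comments>')
    first_id, xml = append_comment(xml, "first", "g", "g", "2019-03-13T15:10:06Z")
    second_id, xml = append_comment(xml, "a < b", "g", "g", "2019-03-13T15:11:00Z")
    print(first_id, second_id)
    print(xml)
```

Splicing the text rather than re-serializing the whole file leaves the original namespace declarations on the root element untouched.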

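Without further ado, here is the implementation.

Usage: python3 [script file name] [docx file path] [text to annotate] [comment text]

Example: python3 insert_comments.py /Users/guochuanxiang/Desktop/comments.docx 文本 批注

Here 文本 is the text to annotate and 批注 is the comment to attach to it.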

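# coding:utf-8
import os
import re
import shutil
import sys
import tempfile
from xml.sax.saxutils import escape
from zipfile import ZIP_DEFLATED, ZipFile

COMMENTS = 'word/comments.xml'
DOCUMENT = 'word/document.xml'
RELS = 'word/_rels/document.xml.rels'
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
REL_TYPE = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/comments'


def write_comments(comments_file_content, comments):
    # comments: [annotated text, comment text, comment id]
    comments_id = comments[2]
    # The template was cut off after '<w:t>' in the source; the rest is reconstructed.
    tmp = ('<w:comment w:id="{}" w:author="guochuanxiang" w:date="2019-03-13T15:10:06Z" w:initials="g">'
           '<w:p><w:pPr><w:pStyle w:val="2"/></w:pPr><w:r><w:t>{}</w:t></w:r></w:p></w:comment>'
           '</w:comments>').format(comments_id, escape(comments[1]))
    # Drop the closing '</w:comments>' (13 characters) and append the new entry.
    content_comments = comments_file_content.rstrip()[:-13] + tmp
    return content_comments


def write_document(document_file_content, comments):
    comments_id = comments[2]
    target = escape(comments[0])  # text is stored XML-escaped in document.xml
    if target not in document_file_content:
        raise ValueError('text to annotate not found in document.xml')
    # Close the current run, wrap the text in a comment range, then reopen a run.
    # Everything after w:commentRangeEnd was cut off in the source and is reconstructed.
    tmp = ('</w:t></w:r><w:commentRangeStart w:id="{0}"/>'
           '<w:r><w:rPr><w:rFonts w:hint="eastAsia"/></w:rPr><w:t>{1}</w:t></w:r>'
           '<w:commentRangeEnd w:id="{0}"/>'
           '<w:r><w:commentReference w:id="{0}"/></w:r><w:r><w:t>').format(comments_id, target)
    # Only the first occurrence is replaced; this is a plain string match on the raw XML.
    content_document = document_file_content.replace(target, tmp, 1)
    return content_document


def write_rel(rel_file_content, comments):
    if rel_file_content.find('comments.xml') == -1:
        print('comments.xml not in rels file')
        # The source line was cut off after 'relationships/comments" T'.
        # Reconstructed: use the highest existing rId plus one and target comments.xml.
        ids = [int(i) for i in re.findall(r'Id="rId(\d+)"', rel_file_content)]
        rel_id = 'rId{}'.format(max(ids, default=0) + 1)
        # Drop the closing '</Relationships>' (16 characters) and append the new entry.
        content_rel = rel_file_content.rstrip()[:-16] + (
            '<Relationship Id="{}" Type="{}" Target="comments.xml"/>'
            '</Relationships>').format(rel_id, REL_TYPE)
        return content_rel
    else:
        print('comments.xml in rels file')
        return rel_file_content


def run(file_path='/Users/guochuanxiang/Desktop/test.docx', comments=('内容', '批注1')):
    comments = list(comments)  # copy, so the caller's list is not modified
    with ZipFile(file_path) as doc:
        file_name = doc.namelist()  # names of all parts in the package
        if COMMENTS not in file_name:
            print('no comments.xml yet')
            # The source listed more namespace declarations, but the line was cut off.
            # Only the w namespace, which the template needs, is declared here.
            comments_file = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
                             '<w:comments xmlns:w="{}"></w:comments>').format(W_NS)
            comments.append(0)
        else:
            comments_file = doc.read(COMMENTS).decode('utf-8')  # current comments.xml
            ids = [int(i) for i in re.findall(r'w:id="(\d+)"', comments_file)]
            comments.append(max(ids, default=-1) + 1)  # new id: numeric maximum + 1
        document_file = doc.read(DOCUMENT).decode('utf-8')  # current document.xml
        rel_file = doc.read(RELS).decode('utf-8')  # current document.xml.rels
        work_dir = tempfile.mkdtemp()
        doc.extractall(work_dir)  # unpack into a scratch directory

    try:
        new_parts = {
            COMMENTS: write_comments(comments_file, comments),
            DOCUMENT: write_document(document_file, comments),
            RELS: write_rel(rel_file, comments),
        }
        print('get all content')
        for name, content in new_parts.items():
            print('writing {} ...'.format(name))
            with open(os.path.join(work_dir, name), 'w', encoding='utf-8', newline='') as f:
                f.write(content)
            print('done')
        if COMMENTS not in file_name:
            file_name.append(COMMENTS)
        print('create {}'.format(file_path))
        # Note: this overwrites the input docx, as the original script did.
        with ZipFile(file_path, mode='w', compression=ZIP_DEFLATED) as new_file:
            for name in file_name:
                path = os.path.join(work_dir, name)
                if os.path.isfile(path):
                    print('add {}'.format(name))
                    new_file.write(path, arcname=name)  # compress back into the docx
    finally:
        print('cleaning up')
        shutil.rmtree(work_dir)
    print('done')


if __name__ == '__main__':
    file_path = sys.argv[1]
    text = sys.argv[2]
    comment = sys.argv[3]
    comments = [text, comment]
    print(comments)
    run(file_path, comments)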

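Summary: according to scanny, python-docx provides ways to insert content into the XML, but I have not used that module, so I did not look into how to do this with python-docx. The approach shown here has limitations: annotating the same piece of text more than once may cause problems, which python-docx's insertion methods might be able to solve. Feel free to give it a try.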
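The repeated-annotation limitation comes from matching the first occurrence of the text anywhere in the raw XML. One way around it, shown here as a sketch rather than the python-docx route suggested above, is to search only inside `w:t` element content and let the caller choose which occurrence to annotate. `wrap_occurrence` is a hypothetical helper, and the markup it inserts is a simplified version of the comment range structure (no run properties are carried over).

```python
import re
from xml.sax.saxutils import escape

# Matches <w:t> or <w:t attr="..."> with plain text content, but not <w:tab/>, <w:tbl> etc.
T_ELEMENT = re.compile(r'<w:t(?:\s[^>]*)?>([^<]*)</w:t>')


def wrap_occurrence(document_xml, text, comment_id, occurrence=0):
    """Wrap the n-th occurrence of `text` (0-based) found in w:t content in a comment range."""
    needle = escape(text)  # text is stored XML-escaped in document.xml
    seen = 0
    for match in T_ELEMENT.finditer(document_xml):
        content = match.group(1)
        pos = content.find(needle)
        while pos != -1:
            if seen == occurrence:
                start = match.start(1) + pos
                end = start + len(needle)
                marked = (
                    '</w:t></w:r>'                                 # close the current run
                    '<w:commentRangeStart w:id="{0}"/>'
                    '<w:r><w:t>{1}</w:t></w:r>'                    # the annotated text
                    '<w:commentRangeEnd w:id="{0}"/>'
                    '<w:r><w:commentReference w:id="{0}"/></w:r>'
                    '<w:r><w:t>'                                   # reopen a run for the rest
                ).format(comment_id, needle)
                return document_xml[:start] + marked + document_xml[end:]
            seen += 1
            pos = content.find(needle, pos + len(needle))
    raise ValueError("occurrence {} of {!r} not found".format(occurrence, text))


if __name__ == "__main__":
    doc = '<w:p><w:r><w:t>foo bar foo</w:t></w:r></w:p>'
    print(wrap_occurrence(doc, "foo", 0, occurrence=1))
```

Because the search skips tag names and attributes, a short search string such as "w" can no longer corrupt the markup. The sketch still cannot find text that Word has split across several runs, and text left at the edge of a split run may lose leading or trailing spaces unless `xml:space="preserve"` is set on it.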

Readers who want to dig deeper can study the docx document structure further.


Article link: https://www.wtabcd.cn/fanwen/fan/82/1035492.html
