Stop Comparing Files by Hand: Python Makes It Easy to Compare File Contents and Directories
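The problem: when I grade homework, students submit their source files to me, and each time I have to open the files and check them one by one. Once there are many files, this becomes slow and tedious. Is there a faster way? Yes: have the students fill in their answers in a format I specify, then let Python compare the file contents automatically. The rest of this post covers how to diff file contents and how to compare files and directories.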
1. Comparing file contents
1.1 Diffing two strings
This example uses the difflib module to diff two strings and prints the result in a version-control style. Example code:
import difflib
from pprint import pprint

# Split each text into lines so they can be compared line by line
text1_lines = ''' 1. Beautiful is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.'''.splitlines(keepends=True)

text2_lines = ''' 1. Beautiful is better than ugly.
3. Simple is better than complex.
4. Complicated is better than complex.
5. Flat is better than nested.'''.splitlines(keepends=True)

d = difflib.Differ()  # create a Differ object
result = list(d.compare(text1_lines, text2_lines))  # compare() yields the delta lines
pprint(result)
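This example uses the Differ class to compare the two strings. difflib also provides the SequenceMatcher class, which can compare sequences of any type, and the HtmlDiff class, which renders the comparison result as HTML. Running the example prints the delta as a list of lines.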
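Each line of a Differ delta begins with a two-letter code:
'- '  line unique to sequence 1
'+ '  line unique to sequence 2
'  '  line common to both sequences
'? '  line not present in either input sequence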
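The two-letter codes can also be consumed programmatically. The snippet below is a minimal sketch, not part of the original example: summarize_delta and the sample fruit lists are hypothetical names made up for illustration, and only difflib.Differ().compare() comes from the standard library.

```python
import difflib


def summarize_delta(old_lines, new_lines):
    """Group the lines of a Differ delta by their two-character prefix."""
    summary = {"removed": [], "added": [], "unchanged": []}
    for line in difflib.Differ().compare(old_lines, new_lines):
        code, text = line[:2], line[2:]
        if code == "- ":
            summary["removed"].append(text)
        elif code == "+ ":
            summary["added"].append(text)
        elif code == "  ":
            summary["unchanged"].append(text)
        # "? " lines are intraline hints, not content from either input; skip them
    return summary


# Hypothetical sample data
old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "cherry\n", "date\n"]
summary = summarize_delta(old, new)
print(summary)
```

For grading, a summary like this makes it possible to flag only the submissions whose "removed" or "added" lists are non-empty.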
1.2 Generating a readable HTML comparison document
Example code:
import difflib

# Split each text into lines so they can be compared line by line
text1_lines = ''' 1. Beautiful is better than ugly.
2. Explicit is better than implicit.
3. Simple is better than complex.
4. Complex is better than complicated.'''.splitlines(keepends=True)

text2_lines = ''' 1. Beautiful is better than ugly.
3. Simple is better than complex.
4. Complicated is better than complex.
5. Flat is better than nested.'''.splitlines(keepends=True)

d = difflib.HtmlDiff()  # create an HtmlDiff object
with open("test.html", "w", encoding="utf-8") as file:
    # make_file() compares the two line lists and returns a complete HTML page
    file.write(d.make_file(text1_lines, text2_lines))
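Open test.html in a browser. As the figure below shows, the HTML document includes line numbers, change markers, and a legend, which makes it much more readable.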
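The same approach can be applied to files on disk, which is closer to the grading scenario described at the start. The following is a sketch under stated assumptions: write_html_diff and the file names reference.py, submission.py, and diff.html are hypothetical, and the files are assumed to be UTF-8 text.

```python
import difflib
import tempfile
from pathlib import Path


def write_html_diff(path_a, path_b, out_path):
    """Write a side-by-side HTML diff of two UTF-8 text files to out_path."""
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines(keepends=True)
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines(keepends=True)
    html = difflib.HtmlDiff().make_file(
        lines_a,
        lines_b,
        fromdesc=Path(path_a).name,  # column heading for the left side
        todesc=Path(path_b).name,    # column heading for the right side
    )
    Path(out_path).write_text(html, encoding="utf-8")
    return Path(out_path)


# Demo with two throwaway files (hypothetical names and contents)
with tempfile.TemporaryDirectory() as tmp:
    reference = Path(tmp, "reference.py")
    submission = Path(tmp, "submission.py")
    reference.write_text("x = 1\nprint(x)\n", encoding="utf-8")
    submission.write_text("x = 2\nprint(x)\n", encoding="utf-8")
    report = write_html_diff(reference, submission, Path(tmp, "diff.html"))
    html = report.read_text(encoding="utf-8")

print("<table" in html)
```

Passing fromdesc and todesc labels the two columns, which helps when many reports are generated in one run.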
2. Comparing files and directories
cmp (single-file comparison):
filecmp.cmp(f1, f2, shallow=True)
Compare the files named f1 and f2, returning True if they seem equal, False otherwise.
cmpfiles (multi-file comparison):
filecmp.cmpfiles(dir1, dir2, common, shallow=True)
Compare the files in the two directories dir1 and dir2 whose names are given by common.
Returns three lists of file names: match, mismatch, errors.
For example, cmpfiles('a', 'b', ['c', 'd/e']) will compare a/c with b/c and a/d/e with b/d/e.
'c' and 'd/e' will each be in one of the three returned lists.
dircmp (directory comparison):
class filecmp.dircmp(a, b, ignore=None, hide=None)
Construct a new directory comparison object, to compare the directories a and b.
ignore is a list of names to ignore, and defaults to filecmp.DEFAULT_IGNORES.
hide is a list of names to hide, and defaults to [os.curdir, os.pardir].
2.1 Single-file comparison
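Single-file comparison uses filecmp.cmp(f1, f2, shallow=True) to compare the files named f1 and f2, returning True if they are the same and False otherwise. shallow defaults to True, which means the two files are taken to be equal when their os.stat() signatures (file type, size, and modification time) are identical, without reading the contents; if the signatures differ, the contents are compared. When shallow is False, the file contents are always compared. The contents of the test files are as follows: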
The complete example code:
import filecmp

# The file names are omitted in the original
print(filecmp.cmp("", ""))  # False
print(filecmp.cmp("", ""))  # True
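To make the effect of shallow concrete, here is a small self-contained sketch. The file names a.txt and b.txt, their contents, and the fixed timestamp are all assumptions chosen for illustration: the two files have the same size and are given the same modification time, so their os.stat() signatures match even though the contents differ.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "a.txt")  # hypothetical file names
    path_b = os.path.join(tmp, "b.txt")
    with open(path_a, "w", encoding="utf-8") as handle:
        handle.write("print(1)\n")
    with open(path_b, "w", encoding="utf-8") as handle:
        handle.write("print(2)\n")  # same size, different content

    # Force identical access and modification times so the signatures match
    os.utime(path_a, (1_000_000_000, 1_000_000_000))
    os.utime(path_b, (1_000_000_000, 1_000_000_000))

    shallow_equal = filecmp.cmp(path_a, path_b, shallow=True)
    deep_equal = filecmp.cmp(path_a, path_b, shallow=False)

print(shallow_equal, deep_equal)  # True False
```

For checking student submissions, shallow=False is likely the safer choice, since it does not trust the metadata alone.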
2.2 Multi-file comparison
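Multi-file comparison uses filecmp.cmpfiles(dir1, dir2, common, shallow=True) to compare a given list of files in the directories dir1 and dir2. It returns three lists of file names: match, mismatch, and errors. The match list contains the files that are the same in both directories, and the mismatch list contains those that differ. The errors list contains the files that could not be compared, for example because a file does not exist in one of the directories or cannot be read. The directory listing is as follows: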
The complete example code:
import filecmp

# The file names are omitted in the original
print(filecmp.cmpfiles('one', 'two', ['', '', '', '', '']))
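The three returned lists are easier to see with a concrete layout. The sketch below builds two temporary directories named one and two; the file names same.py, changed.py, and only_in_one.py and their contents are hypothetical, chosen so that each file lands in a different list.

```python
import filecmp
import os
import tempfile


def write_file(path, text):
    """Create a small UTF-8 text file (helper for this sketch only)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as tmp:
    one = os.path.join(tmp, "one")
    two = os.path.join(tmp, "two")
    os.mkdir(one)
    os.mkdir(two)

    write_file(os.path.join(one, "same.py"), "x = 1\n")
    write_file(os.path.join(two, "same.py"), "x = 1\n")
    write_file(os.path.join(one, "changed.py"), "y = 1\n")
    write_file(os.path.join(two, "changed.py"), "y = 22\n")
    write_file(os.path.join(one, "only_in_one.py"), "z = 1\n")  # missing in two

    match, mismatch, errors = filecmp.cmpfiles(
        one, two, ["same.py", "changed.py", "only_in_one.py"], shallow=False
    )

print(match, mismatch, errors)
```

A file that is missing from one side ends up in errors rather than mismatch, so both lists are worth checking.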
2.3 Directory comparison
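The class filecmp.dircmp(a, b, ignore=None, hide=None) creates a directory comparison object, where a and b are the names of the directories to compare. ignore is a list of names to ignore, and hide is a list of names to hide, which defaults to [os.curdir, os.pardir]. A dircmp object gives detailed information about the comparison, such as the files found only in directory a, the subdirectories present in both a and b, and the files that match. It also supports recursion. dircmp provides three methods that print a report: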
1. report(): Print (to sys.stdout) a comparison between a and b.
2. report_partial_closure(): Print a comparison between a and b and common immediate subdirectories.
3. report_full_closure(): Print a comparison between a and b and common subdirectories (recursively).
The dircmp class offers a number of interesting attributes that may be used to get various bits of information about the directory trees being compared.
1. left: The directory a (the left directory).
2. right: The directory b (the right directory).
3. left_list: Files and subdirectories in a, filtered by hide and ignore.
4. right_list: Files and subdirectories in b, filtered by hide and ignore.
5. common: Files and subdirectories in both a and b.
6. left_only: Files and subdirectories only in a.
7. right_only: Files and subdirectories only in b.
8. common_dirs: Subdirectories in both a and b.
9. common_files: Files in both a and b.
10. common_funny: Names in both a and b, such that the type differs between the directories, or names for which os.stat() reports an error.
11. same_files: Files which are identical in both a and b, using the class's file comparison operator.
12. diff_files: Files which are in both a and b, whose contents differ according to the class's file comparison operator.
13. funny_files: Files which are in both a and b, but could not be compared.
14. subdirs: A dictionary mapping names in common_dirs to dircmp objects.
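A few of these attributes can be read directly from a dircmp object. The sketch below is illustrative only: the directory layout and every file name in it are hypothetical. Note that dircmp compares files with the shallow os.stat() check by default, so the changed file is deliberately given a different size.

```python
import filecmp
import os
import tempfile


def write_file(path, text):
    """Create a small UTF-8 text file, making parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as tmp:
    one = os.path.join(tmp, "one")
    two = os.path.join(tmp, "two")

    write_file(os.path.join(one, "same.py"), "x = 1\n")
    write_file(os.path.join(two, "same.py"), "x = 1\n")
    write_file(os.path.join(one, "changed.py"), "y = 1\n")
    write_file(os.path.join(two, "changed.py"), "y = 22\n")  # different size
    write_file(os.path.join(one, "only_in_one.py"), "z = 1\n")
    write_file(os.path.join(two, "only_in_two.py"), "z = 2\n")
    write_file(os.path.join(one, "lib", "util.py"), "a = 1\n")
    write_file(os.path.join(two, "lib", "util.py"), "a = 1\n")

    dcmp = filecmp.dircmp(one, two)
    # Attributes are computed lazily, so read them while the directories exist
    info = {
        "left_only": dcmp.left_only,
        "right_only": dcmp.right_only,
        "common_dirs": dcmp.common_dirs,
        "common_files": dcmp.common_files,
        "same_files": dcmp.same_files,
        "diff_files": dcmp.diff_files,
    }

for name, value in info.items():
    print(name, value)
```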
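Example: compare the directories one and two. The example creates a dircmp object to compare the two directories and prints its report; the attributes listed above can be read from the same object. Code:
import filecmp

cmp = filecmp.dircmp("one", "two")
cmp.report()  # report() prints to sys.stdout itself and returns None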
The program's output is shown in the figure below.
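Since subdirs maps each common subdirectory to its own dircmp object, the comparison can also be walked recursively in code instead of being printed. The following is a sketch under stated assumptions: collect_diff_files and the sample tree (main.py, lib/util.py) are hypothetical names, and the differing file is given a different size because dircmp uses the shallow check by default.

```python
import filecmp
import os
import tempfile


def collect_diff_files(dcmp, prefix=""):
    """Return relative paths of differing files, descending into common subdirectories."""
    found = [os.path.join(prefix, name) for name in dcmp.diff_files]
    for name, sub_dcmp in dcmp.subdirs.items():
        found.extend(collect_diff_files(sub_dcmp, os.path.join(prefix, name)))
    return found


def write_file(path, text):
    """Create a small UTF-8 text file, making parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as tmp:
    one = os.path.join(tmp, "one")
    two = os.path.join(tmp, "two")
    write_file(os.path.join(one, "main.py"), "print('hi')\n")
    write_file(os.path.join(two, "main.py"), "print('hi')\n")
    write_file(os.path.join(one, "lib", "util.py"), "a = 1\n")
    write_file(os.path.join(two, "lib", "util.py"), "a = 100\n")  # differs

    differing = collect_diff_files(filecmp.dircmp(one, two))

print(differing)
```

Returning a list rather than printing makes the result easy to feed into a follow-up step, such as generating an HTML diff for each differing file.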