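tree-sitter: a parsing toolkit for programming languages
tree-sitter is a parser generator tool and an incremental parsing library. It builds a concrete syntax tree for a source file and updates that tree efficiently as the file is edited. Grammars are available for many programming languages, including Python, Java, and C, and the library itself can be used from many languages through bindings.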
[For more on using tree-sitter, see the PLP column on my homepage.]
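Advantages of tree-sitter:
1. General enough to parse any programming language
2. Fast enough to parse on every keystroke in a text editor
3. Robust enough to provide useful results even in the presence of syntax errors
4. Dependency-free, so the runtime library (written in pure C) can be embedded in any application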
[There are hardly any blog posts about tree-sitter online, so the tutorial on the official site is the main thing to go on.]
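Before using it, set up the grammar for the target language by following the instructions in the GitHub repository of the Python bindings (this is what makes it possible to parse that language).
The following uses the tree-sitter Python package to parse Java code: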
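from tree_sitter import Language, Parser

# Load the Java grammar from the compiled library prepared during setup.
JAVA_LANGUAGE = Language('my-languages.so', 'java')
parser = Parser()
parser.set_language(JAVA_LANGUAGE)

src = '/* this is the code */public class Hello{\nprivate String text = "Hello World!";\npublic void print(int value) {\nif(value>100) System.out.println(value);}}'
# Strip the comments. remove_comments_and_docstrings is my own helper (not shown here).
# You can also leave the comments in and filter out the comment-type nodes afterwards.
src_clean = remove_comments_and_docstrings(src, 'java')
print(src_clean)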
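The alternative mentioned in the comment above, leaving the comments in the source and dropping comment nodes from the collected node list, might look like the sketch below. The helper name `drop_comment_nodes` and the set of comment type names are assumptions made for illustration; comment node types differ between grammars and grammar versions, so check the grammar you compiled. The stand-in objects only mimic the `.type` attribute of real nodes so that the snippet runs without a compiled grammar.

```python
from types import SimpleNamespace

# Assumed comment node type names; verify them against your grammar.
COMMENT_TYPES = {"comment", "line_comment", "block_comment"}


def drop_comment_nodes(nodes, comment_types=COMMENT_TYPES):
    """Return the nodes whose .type is not one of the comment types."""
    return [node for node in nodes if node.type not in comment_types]


# Stand-ins for tree-sitter nodes: only the .type attribute is used here.
demo_nodes = [
    SimpleNamespace(type="block_comment"),
    SimpleNamespace(type="class_declaration"),
    SimpleNamespace(type="line_comment"),
    SimpleNamespace(type="identifier"),
]

kept = drop_comment_nodes(demo_nodes)
print([node.type for node in kept])
```

With real nodes, the same call would be applied to the list produced by the traversal.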
Define a recursive traversal function:
def make_move(cursor, move, all_nodes):
    # Recursively walk the tree and record every node, which carries its type
    # and the start and end positions of the code it spans.
    # cursor: the current cursor position; cursor.node is the current node
    # move: the direction the cursor has just moved in
    # all_nodes: list collecting the nodes in preorder (root, left, right)
    if (move == "down"):
        all_nodes.append(cursor.node)
        if (cursor.goto_first_child()):
            make_move(cursor, "down", all_nodes)
        elif (cursor.goto_next_sibling()):
            make_move(cursor, "right", all_nodes)
        elif (cursor.goto_parent()):
            make_move(cursor, "up", all_nodes)
    elif (move == "right"):
        all_nodes.append(cursor.node)
        if (cursor.goto_first_child()):
            make_move(cursor, "down", all_nodes)
        elif (cursor.goto_next_sibling()):
            make_move(cursor, "right", all_nodes)
        elif (cursor.goto_parent()):
            make_move(cursor, "up", all_nodes)
    elif move == "up":
        if (cursor.goto_next_sibling()):
            make_move(cursor, "right", all_nodes)
        elif (cursor.goto_parent()):
            make_move(cursor, "up", all_nodes)
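One caveat with the recursive version: it calls itself once per cursor move and never returns until the walk is finished, so the call depth grows with the number of nodes rather than with the depth of the tree. On larger files this can exceed Python's default recursion limit. A loop-based variant avoids that. The sketch below is one way to write it, not the author's method; `preorder`, `ToyNode`, and `ToyCursor` are hypothetical names, and the toy classes exist only so the snippet runs without a compiled grammar. They imitate the three cursor methods used above (`goto_first_child`, `goto_next_sibling`, `goto_parent`) and the `node` attribute.

```python
def preorder(cursor):
    """Collect nodes in preorder using a loop instead of recursion."""
    nodes = []
    visited_children = False
    while True:
        if not visited_children:
            nodes.append(cursor.node)
            # Go down if possible; otherwise this subtree is finished.
            if not cursor.goto_first_child():
                visited_children = True
        elif cursor.goto_next_sibling():
            visited_children = False
        elif not cursor.goto_parent():
            break  # back at the root with nothing left to visit
    return nodes


class ToyNode:
    """Hypothetical stand-in for a syntax tree node."""

    def __init__(self, type, children=()):
        self.type = type
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


class ToyCursor:
    """Hypothetical stand-in that mimics the cursor methods used above."""

    def __init__(self, root):
        self.node = root

    def goto_first_child(self):
        if not self.node.children:
            return False
        self.node = self.node.children[0]
        return True

    def goto_next_sibling(self):
        parent = self.node.parent
        if parent is None:
            return False
        index = parent.children.index(self.node)
        if index + 1 >= len(parent.children):
            return False
        self.node = parent.children[index + 1]
        return True

    def goto_parent(self):
        if self.node.parent is None:
            return False
        self.node = self.node.parent
        return True


root = ToyNode("program", [
    ToyNode("class_declaration", [
        ToyNode("modifiers"),
        ToyNode("identifier"),
        ToyNode("class_body"),
    ]),
])
visited = [node.type for node in preorder(ToyCursor(root))]
print(visited)
```

With a real tree, `preorder(tree.walk())` would take the place of the `make_move` call.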
To traverse a large number of nodes efficiently, use the TreeCursor approach described on the official site:
tree = parser.parse(bytes(src_clean, 'utf8'))
cursor = tree.walk()
all_nodes = []
make_move(cursor, "down", all_nodes)
print(all_nodes)
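The result is the preorder traversal of the AST. Each entry gives the node type (for tokens, the token itself) together with the start and end positions of the code it covers. From here, keep or filter out tokens as your use case requires.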
[<Node kind=program, start_point=(0,1), end_point=(3,42)>,
<Node kind=class_declaration, start_point=(0,1), end_point=(3,42)>,
<Node kind=modifiers, start_point=(0,1), end_point=(0,7)>,
<Node kind="public", start_point=(0,1), end_point=(0,7)>,
<Node kind="class", start_point=(0,8), end_point=(0,13)>,
<Node kind=identifier, start_point=(0,14), end_point=(0,19)>,
<Node kind=class_body, start_point=(0,19), end_point=(3,42)>,
<Node kind="{", start_point=(0,19), end_point=(0,20)>,
<Node kind=field_declaration, start_point=(1,0), end_point=(1,37)>,
<Node kind=modifiers, start_point=(1,0), end_point=(1,7)>,
<Node kind="private", start_point=(1,0), end_point=(1,7)>,
<Node kind=type_identifier, start_point=(1,8), end_point=(1,14)>,
<Node kind=variable_declarator, start_point=(1,15), end_point=(1,36)>,
<Node kind=identifier, start_point=(1,15), end_point=(1,19)>,
<Node kind="=", start_point=(1,20), end_point=(1,21)>,
<Node kind=string_literal, start_point=(1,22), end_point=(1,36)>,
<Node kind=";", start_point=(1,36), end_point=(1,37)>,
<Node kind=method_declaration, start_point=(2,0), end_point=(3,41)>,
<Node kind=modifiers, start_point=(2,0), end_point=(2,6)>,
<Node kind="public", start_point=(2,0), end_point=(2,6)>,
<Node kind=void_type, start_point=(2,7), end_point=(2,11)>,
<Node kind=identifier, start_point=(2,12), end_point=(2,17)>,
<Node kind=formal_parameters, start_point=(2,17), end_point=(2,28)>,
<Node kind="(", start_point=(2,17), end_point=(2,18)>,
<Node kind=formal_parameter, start_point=(2,18), end_point=(2,27)>,
<Node kind=integral_type, start_point=(2,18), end_point=(2,21)>,
<Node kind="int", start_point=(2,18), end_point=(2,21)>,
<Node kind=identifier, start_point=(2,22), end_point=(2,27)>,
<Node kind=")", start_point=(2,27), end_point=(2,28)>,
<Node kind=block, start_point=(2,29), end_point=(3,41)>,
<Node kind="{", start_point=(2,29), end_point=(2,30)>,
<Node kind=if_statement, start_point=(3,0), end_point=(3,40)>,
<Node kind="if", start_point=(3,0), end_point=(3,2)>,
<Node kind=parenthesized_expression, start_point=(3,2), end_point=(3,13)>,
<Node kind="(", start_point=(3,2), end_point=(3,3)>,
<Node kind=binary_expression, start_point=(3,3), end_point=(3,12)>,
<Node kind=identifier, start_point=(3,3), end_point=(3,8)>,
<Node kind=">", start_point=(3,8), end_point=(3,9)>,
<Node kind=decimal_integer_literal, start_point=(3,9), end_point=(3,12)>,
<Node kind=")", start_point=(3,12), end_point=(3,13)>,
<Node kind=expression_statement, start_point=(3,14), end_point=(3,40)>,
<Node kind=method_invocation, start_point=(3,14), end_point=(3,39)>,
<Node kind=field_access, start_point=(3,14), end_point=(3,24)>,
<Node kind=identifier, start_point=(3,14), end_point=(3,20)>,
<Node kind=".", start_point=(3,20), end_point=(3,21)>,
<Node kind=identifier, start_point=(3,21), end_point=(3,24)>,
<Node kind=".", start_point=(3,24), end_point=(3,25)>,
<Node kind=identifier, start_point=(3,25), end_point=(3,32)>,
<Node kind=argument_list, start_point=(3,32), end_point=(3,39)>,
<Node kind="(", start_point=(3,32), end_point=(3,33)>,
<Node kind=identifier, start_point=(3,33), end_point=(3,38)>,
<Node kind=")", start_point=(3,38), end_point=(3,39)>,
<Node kind=";", start_point=(3,39), end_point=(3,40)>,
<Node kind="}", start_point=(3,40), end_point=(3,41)>,
<Node kind="}", start_point=(3,41), end_point=(3,42)>]
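The listing gives positions only, so getting back the text of a token means slicing the source with those (row, column) pairs. The sketch below shows one way to do that. `slice_by_points` is a hypothetical helper, and the cleaned source is assumed to begin with a single space, which is what the column offsets in the listing imply. Columns are treated as byte offsets, so the helper slices the UTF-8 bytes rather than the characters.

```python
def slice_by_points(source, start_point, end_point):
    """Return the text between two (row, column) points; columns are byte offsets."""
    lines = source.encode("utf8").split(b"\n")
    (start_row, start_col), (end_row, end_col) = start_point, end_point
    if start_row == end_row:
        chunk = lines[start_row][start_col:end_col]
    else:
        chunk = b"\n".join([
            lines[start_row][start_col:],
            *lines[start_row + 1:end_row],
            lines[end_row][:end_col],
        ])
    return chunk.decode("utf8")


# Assumed form of src_clean: the original src with the comment replaced by one space.
src_clean = (
    ' public class Hello{\n'
    'private String text = "Hello World!";\n'
    'public void print(int value) {\n'
    'if(value>100) System.out.println(value);}}'
)

class_name = slice_by_points(src_clean, (0, 14), (0, 19))
literal = slice_by_points(src_clean, (1, 22), (1, 36))
method_body = slice_by_points(src_clean, (2, 29), (3, 41))
print(class_name)
print(literal)
print(method_body)
```

Nodes in the Python bindings also expose `start_byte` and `end_byte`, so slicing the encoded source with those two offsets is a more direct route when the node objects themselves are at hand.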