analyze

Updated: 2022-12-27 22:04:34

Published: 2022-12-27 (author: 国际高中排名)

Analyzers

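#StandardAnalyzer - the default analyzer; splits on word boundaries and lowercases

#SimpleAnalyzer - splits on non-letter characters (symbols are dropped) and lowercases

#StopAnalyzer - lowercases and removes stop words (the, a, is)

#WhitespaceAnalyzer - splits on whitespace; does not lowercase

#KeywordAnalyzer - no tokenization; the input is emitted unchanged as a single token

#PatternAnalyzer - splits by regular expression, default \W+ (splits on non-word characters)

#Language - analyzers for more than 30 common languages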

1. StandardAnalyzer

2. SimpleAnalyzer

3. WhitespaceAnalyzer

4. StopAnalyzer

5. KeywordAnalyzer

6. PatternAnalyzer
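The splitting rules listed above can be approximated in a few lines of plain Python. This is only a rough sketch for building intuition, not Elasticsearch's implementation: the function names are hypothetical, the stop list is an assumed copy of the default English list, and Python's `\W` is Unicode-aware, so results may differ for non-ASCII text.

```python
import re

# Assumption: this mirrors the default English stop list; verify it
# against the Elasticsearch version you actually run.
ENGLISH_STOP_WORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or such "
    "that the their then there these they this to was will with".split()
)


def simple_analyze(text):
    """Keep runs of letters only (digits and symbols split), then lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]


def stop_analyze(text, stop_words=ENGLISH_STOP_WORDS):
    """Same as simple_analyze, followed by stop word removal."""
    return [t for t in simple_analyze(text) if t not in stop_words]


def whitespace_analyze(text):
    """Split on whitespace only; case and punctuation are preserved."""
    return text.split()


def keyword_analyze(text):
    """No tokenization: the whole input is one token."""
    return [text]


def pattern_analyze(text, pattern=r"\W+"):
    """Split on the regular expression, drop empty pieces, lowercase."""
    return [t.lower() for t in re.split(pattern, text) if t]
```

For a sentence such as `2 running Quick brown-foxes leap over lazy dogs in the summer evening.`, `simple_analyze` drops the `2`, while `whitespace_analyze` keeps `brown-foxes` and `evening.` intact.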


#2 running Quick brown-foxes leap over lazy dogs in the summer evening

#Compare the output of the different analyzers

#standard
GET _analyze
{
  "analyzer": "standard",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

#simple
GET _analyze
{
  "analyzer": "simple",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

#stop
GET _analyze
{
  "analyzer": "stop",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

#whitespace
GET _analyze
{
  "analyzer": "whitespace",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

#keyword
GET _analyze
{
  "analyzer": "keyword",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

#pattern
GET _analyze
{
  "analyzer": "pattern",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}

#english
GET _analyze
{
  "analyzer": "english",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
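The same comparison can be scripted instead of pasted request by request. Below is a minimal sketch using only the Python standard library. It assumes a node reachable at `http://localhost:9200` without authentication; that URL and the helper names `build_analyze_request`, `analyze`, and `compare_analyzers` are assumptions for illustration, not part of the original post.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication


def build_analyze_request(analyzer, text, base_url=ES_URL, index=None):
    """Build a POST request for the _analyze API (optionally index-scoped)."""
    path = f"/{index}/_analyze" if index else "/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(analyzer, text, **kwargs):
    """Send the request and return just the token strings from the response."""
    request = build_analyze_request(analyzer, text, **kwargs)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [token["token"] for token in payload["tokens"]]


def compare_analyzers(text, analyzers=("standard", "simple", "stop",
                                       "whitespace", "keyword", "pattern",
                                       "english")):
    """Run the same text through several analyzers and collect the tokens."""
    return {name: analyze(name, text) for name in analyzers}
```

Calling `compare_analyzers(...)` with the sample sentence needs a running cluster; `build_analyze_request` on its own can be inspected offline.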

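#icu_analyzer (requires the analysis-icu plugin); the text means "What he said is indeed reasonable"
POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "他说的确实在理"
}

#standard on the same Chinese text, for comparison
POST _analyze
{
  "analyzer": "standard",
  "text": "他说的确实在理"
}

#icu_analyzer; the text means "This apple doesn't taste very good"
POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "这个苹果不⼤好吃"
}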
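The reason for comparing `standard` with `icu_analyzer` on Chinese text is that the standard analyzer emits each Han character as its own token, whereas a dictionary-aware analyzer can group characters into words. The sketch below imitates only the per-character behaviour so the difference is easy to see; `standard_like_tokens` is a hypothetical helper, it covers only the basic CJK Unified Ideographs block, and it is not how Elasticsearch is implemented.

```python
import re

# One Han ideograph at a time, or a run of other word characters.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\W\u4e00-\u9fff]+")


def standard_like_tokens(text):
    """Approximate the standard analyzer: single-character Han tokens,
    whole runs for other scripts, punctuation dropped, lowercased."""
    return [match.group().lower() for match in _TOKEN.finditer(text)]


if __name__ == "__main__":
    print(standard_like_tokens("他说的确实在理"))
```

Every character comes back as a separate token, which is why a Chinese-aware analyzer is usually preferred for search over Chinese text.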

demo

//Test by specifying the analyzer directly
GET _analyze
{
  "analyzer": "icu_analyzer",
  "text": "你好中国"
}

2. Other Chinese analysis plugins

//Custom analyzer
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["& => and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}
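A custom analyzer runs three stages in order: character filters on the raw string, then one tokenizer, then token filters on the resulting tokens. The sketch below walks the same stages as `my_analyzer` in plain Python. It is only an approximation under stated assumptions: the regex tag stripping stands in for `html_strip`, `\w+` stands in for the standard tokenizer, and all function names are hypothetical.

```python
import html
import re

STOPWORDS = {"the", "a"}  # mirrors the my_stopwords filter above


def html_strip(text):
    """Rough stand-in for html_strip: drop tags, then decode entities."""
    return html.unescape(re.sub(r"<[^>]+>", "", text))


def amp_to_and(text):
    """Stand-in for the &_to_and mapping char filter."""
    return text.replace("&", "and")


def my_analyzer(text):
    # Stage 1: character filters, applied in the configured order.
    for char_filter in (html_strip, amp_to_and):
        text = char_filter(text)
    # Stage 2: tokenizer (assumption: \w+ approximates the standard tokenizer).
    tokens = re.findall(r"\w+", text)
    # Stage 3: token filters, lowercase first, then stop word removal.
    tokens = [token.lower() for token in tokens]
    return [token for token in tokens if token not in STOPWORDS]


if __name__ == "__main__":
    print(my_analyzer("The quick & brown fox"))  # ['quick', 'and', 'brown', 'fox']
```

The order matters: because lowercasing happens before the stop filter, `The` is removed even though the stop list contains only the lowercase `the`.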

//Set the mapping
PUT /my_index/_mapping
{
  "properties": {
    "username": {
      "type": "text",
      "analyzer": "my_analyzer"
    },
    "password": {
      "type": "text"
    }
  }
}

//Insert a document
PUT /my_index/_doc/1
{
  "username": "The quick & brown fox",
  "password": "The quick & brown fox"
}

//Verify: username uses my_analyzer, password falls back to the default standard analyzer
GET /my_index/_analyze
{
  "field": "username",
  "text": "The quick & brown fox"
}

GET /my_index/_analyze
{
  "field": "password",
  "text": "The quick & brown fox"
}

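1. Download the analyzer plugin version that matches your Elasticsearch version

2. Create an analysis-ik directory under the plugins folder

3. Copy the downloaded zip file into the analysis-ik directory and unzip it

4. Start Elasticsearch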
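The manual steps above (create the plugin directory, unpack the archive into it) can be sketched as a small Python helper. This is only an illustration under assumptions: `install_plugin_zip` is a hypothetical name, the Elasticsearch home path and zip location are supplied by the caller, and the archive is extracted straight into the target directory rather than copied first. Downloading the matching release and restarting the node are left out.

```python
import zipfile
from pathlib import Path


def install_plugin_zip(es_home, zip_path, plugin_name="analysis-ik"):
    """Create <es_home>/plugins/<plugin_name> and extract the zip into it."""
    plugin_dir = Path(es_home) / "plugins" / plugin_name
    plugin_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(plugin_dir)
    return plugin_dir
```

The function returns the plugin directory, so a caller can list its contents and confirm the files are in place before starting Elasticsearch.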

//ik_max_word: finest-grained segmentation
//ik_smart: coarsest-grained segmentation
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": ["剑桥分析公司多位⾼管对卧底记者说,他们确保了唐纳德·特朗普在总统⼤选中获胜"]
}
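To get a feel for the difference between the two modes (`ik_max_word` aims for the finest-grained split, `ik_smart` for the coarsest), here is a toy sketch with a tiny hand-made dictionary. It is not IK's algorithm or dictionary: `all_dictionary_words` simply lists every dictionary word found in the text, and `forward_max_match` is a plain forward maximum matching pass. Both function names and the dictionary are assumptions for illustration.

```python
def all_dictionary_words(text, dictionary):
    """Fine-grained view: every substring that is a dictionary word."""
    return [
        text[start:end]
        for start in range(len(text))
        for end in range(start + 1, len(text) + 1)
        if text[start:end] in dictionary
    ]


def forward_max_match(text, dictionary):
    """Coarse view: greedily take the longest dictionary word at each
    position, falling back to a single character when nothing matches."""
    tokens, position = [], 0
    max_len = max(map(len, dictionary), default=1)
    while position < len(text):
        for size in range(min(max_len, len(text) - position), 0, -1):
            piece = text[position:position + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                position += size
                break
    return tokens


if __name__ == "__main__":
    # Hypothetical toy dictionary, far smaller than a real one.
    words = {"中华", "中华人民", "中华人民共和国", "人民", "共和", "共和国", "国歌"}
    print(all_dictionary_words("中华人民共和国国歌", words))
    print(forward_max_match("中华人民共和国国歌", words))
```

The fine-grained function yields overlapping words, which helps recall at search time, while the greedy pass yields a short, non-overlapping split.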

HanLP analysis plugin
