Analyzers
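#SimpleAnalyzer - splits on non-letter characters (symbols are discarded) and lowercases
#StopAnalyzer - same as SimpleAnalyzer, plus stop-word removal (the, a, is)
#WhitespaceAnalyzer - splits on whitespace; does not lowercase
#KeywordAnalyzer - no tokenization; the input is emitted as a single token
#PatternAnalyzer - splits by regular expression; the default is \W+ (runs of non-word characters)
#Language - analyzers for more than 30 common languages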
1,StandardAnalyzer
2,SimpleAnalyzer
3,WhitespaceAnalyzer
4,StopAnalyzer
5,KeywordAnalyzer
6,PatternAnalyzer
#Sample text: 2 running Quick brown-foxes leap over lazy dogs in the summer evening.
#Compare the output of the different analyzers
#standard
GET _analyze
{
  "analyzer": "standard",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#simple
GET _analyze
{
  "analyzer": "simple",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#stop
GET _analyze
{
  "analyzer": "stop",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#whitespace
GET _analyze
{
  "analyzer": "whitespace",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#keyword
GET _analyze
{
  "analyzer": "keyword",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#pattern
GET _analyze
{
  "analyzer": "pattern",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
#english
GET _analyze
{
  "analyzer": "english",
  "text": "2 running Quick brown-foxes leap over lazy dogs in the summer evening."
}
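Five of the built-in analyzers are simple enough to approximate in a few lines of Python, which makes it easier to predict what the requests above return for the sample sentence. This is only a sketch for plain English text, not Lucene's implementation: the function names are made up for illustration, and the stop-word set is assumed to match the default English list used by the stop analyzer. The standard and english analyzers are left out because they rely on Unicode word-boundary rules and stemming.

```python
import re

TEXT = "2 running Quick brown-foxes leap over lazy dogs in the summer evening."

# Assumption: mirrors the default English stop-word list of the stop analyzer.
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}

def simple_analyze(text):
    """Keep runs of letters only (digits and symbols split and vanish), lowercase."""
    return [t.lower() for t in re.findall(r"[^\W\d_]+", text)]

def stop_analyze(text):
    """simple_analyze plus stop-word removal."""
    return [t for t in simple_analyze(text) if t not in ENGLISH_STOP_WORDS]

def whitespace_analyze(text):
    """Split on whitespace only; case and punctuation are preserved."""
    return text.split()

def keyword_analyze(text):
    """No tokenization: the whole input is one token."""
    return [text]

def pattern_analyze(text, pattern=r"\W+"):
    """Split on the regex (default: runs of non-word characters), lowercase."""
    return [t.lower() for t in re.split(pattern, text) if t]

if __name__ == "__main__":
    for fn in (simple_analyze, stop_analyze, whitespace_analyze,
               keyword_analyze, pattern_analyze):
        print(fn.__name__, fn(TEXT))
```

Note how only the whitespace sketch keeps "brown-foxes" and "evening." intact, and how the simple and stop sketches drop the leading "2" because it is not a letter.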
POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "他说的确实在理"
}
POST _analyze
{
  "analyzer": "standard",
  "text": "他说的确实在理"
}
POST _analyze
{
  "analyzer": "icu_analyzer",
  "text": "这个苹果不大好吃"
}
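The reason for comparing icu_analyzer with standard on Chinese text is that the standard analyzer has no Chinese dictionary: it emits every Han character as its own token. A minimal Python sketch of that behavior follows. The function name is hypothetical, and the character range covers only the basic CJK Unified Ideographs block, which is an assumption that holds for the sample sentences here.

```python
def standard_like_cjk_tokens(text):
    """Approximate the standard analyzer on Chinese text.

    Each Han character becomes a single token; punctuation and anything
    outside the basic CJK Unified Ideographs block is dropped.
    """
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]

if __name__ == "__main__":
    print(standard_like_cjk_tokens("他说的确实在理"))
```

Single-character tokens lose word boundaries entirely, which is why a dictionary-based analyzer such as icu_analyzer gives more useful results for Chinese.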
demo
// Test by specifying the analyzer directly
GET _analyze
{
  "analyzer": "icu_analyzer",
  "text": "你好中国"
}
2, Other Chinese analysis plugins
// Custom analyzer
PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["& => and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}
// Set the mapping
PUT /my_index/_mapping
{
  "properties": {
    "username": {
      "type": "text",
      "analyzer": "my_analyzer"
    },
    "password": {
      "type": "text"
    }
  }
}
// Insert a document
PUT /my_index/_doc/1
{
  "username": "The quick & brown fox",
  "password": "The quick & brown fox"
}
// Verify: the first field uses my_analyzer, the second falls back to the default standard analyzer
GET my_index/_analyze
{
  "field": "username",
  "text": "The quick & brown fox"
}
GET my_index/_analyze
{
  "field": "password",
  "text": "The quick & brown fox"
}
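A custom analyzer runs three stages in order: character filters rewrite the raw text, the tokenizer splits it, and token filters transform the resulting tokens. The Python sketch below walks through those stages for my_analyzer and compares the result with a field that uses the default standard analyzer. It is a rough approximation for short English input only: the regex tag stripper and the `\w+` tokenizer are crude stand-ins for html_strip and the standard tokenizer, and all function names are hypothetical.

```python
import re

CHAR_MAPPINGS = {"&": "and"}   # the "&_to_and" char filter
STOPWORDS = {"the", "a"}       # the "my_stopwords" token filter

def strip_html(text):
    """Crude stand-in for html_strip: remove anything that looks like a tag."""
    return re.sub(r"<[^>]+>", "", text)

def apply_mappings(text):
    """Mapping char filter: replace each key with its value."""
    for src, dst in CHAR_MAPPINGS.items():
        text = text.replace(src, dst)
    return text

def tokenize(text):
    """Rough stand-in for the standard tokenizer on English text."""
    return re.findall(r"\w+", text)

def my_analyzer(text):
    # 1. char filters, 2. tokenizer, 3. token filters (lowercase, then stop)
    text = apply_mappings(strip_html(text))
    tokens = [t.lower() for t in tokenize(text)]
    return [t for t in tokens if t not in STOPWORDS]

def standard_analyzer(text):
    # Default analyzer: tokenize and lowercase, no stop-word removal.
    return [t.lower() for t in tokenize(text)]

if __name__ == "__main__":
    sample = "The quick & brown fox"
    print("my_analyzer:", my_analyzer(sample))
    print("standard:   ", standard_analyzer(sample))
```

With my_analyzer the "&" survives as the token "and" and "the" is removed; with the standard analyzer "&" is dropped as punctuation and "the" is kept. The order of the stages matters: if the mapping ran after tokenization, the "&" would already be gone.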
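1, Download the IK analyzer release whose version matches your Elasticsearch version
2, Create an analysis-ik directory under the plugins folder
3, Copy the downloaded zip file into the analysis-ik directory and unzip it there
4, Start Elasticsearch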
//ik_max_word
//ik_smart
POST _analyze
{
  "analyzer": "ik_max_word",
  "text": ["剑桥分析公司多位高管对卧底记者说,他们确保了唐纳德·特朗普在总统大选中获胜"]
}
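The two IK analyzers differ in granularity: ik_max_word produces the finest-grained split and ik_smart the coarsest. The toy Python sketch below illustrates that difference with a tiny hand-written dictionary. It is not IK's algorithm: the fine-grained function simply lists every dictionary word found at any position, and the coarse one uses plain forward maximum matching. The dictionary, the sample string, and the function names are all made up for illustration.

```python
# Hypothetical toy dictionary; a real plugin ships a far larger one.
DICTIONARY = {"中华人民共和国", "中华", "华人", "人民", "共和国", "共和", "国歌"}
MAX_LEN = max(len(word) for word in DICTIONARY)

def all_matches(text):
    """Fine-grained: every dictionary word found at any start position."""
    tokens = []
    for i in range(len(text)):
        for j in range(min(len(text), i + MAX_LEN), i, -1):
            if text[i:j] in DICTIONARY:
                tokens.append(text[i:j])
    return tokens

def forward_max_match(text):
    """Coarse: greedily take the longest dictionary word at each position.

    Falls back to a single character when no dictionary word matches.
    """
    tokens, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + MAX_LEN), i, -1):
            if text[i:j] in DICTIONARY or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

if __name__ == "__main__":
    sample = "中华人民共和国国歌"
    print("fine:  ", all_matches(sample))
    print("coarse:", forward_max_match(sample))
```

Overlapping fine-grained tokens improve recall at index time, while a coarse split gives fewer, more precise tokens, which is one reason the two modes are often paired for indexing and searching respectively.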
HanLP analysis plugin
Published: 2022-12-27 22:04:34
Article link: http://www.wtabcd.cn/fanwen/fan/90/42901.html