ES Summary (3): Stop Word and Synonym Configuration

Updated: 2023-05-18 21:03:20

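Preface
Last time I summarized ES search matching algorithms and related topics. This time the topic is analysis: when ES searches, it splits the text into tokens, so how do we get ES to tokenize better?
Search
Stop word configuration
Once stop words are configured, words of that kind no longer appear in the analyzed tokens.
The IK plugin is commonly used for tokenization (search online for details). Download it and unzip it into the plugins/ik directory of the ES installation.
Take a look at IKAnalyzer.cfg.xml: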
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "/dtd/properties.dtd">
<properties>
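<comment>IK Analyzer extension configuration</comment>
<!-- Users can configure their own extension dictionaries here -->
<entry key="ext_dict"></entry>
<!-- Users can configure their own extension stop word dictionaries here -->
<entry key="ext_stopwords"></entry>
<!-- Users can configure a remote extension dictionary here -->
<!-- <entry key="remote_ext_dict">words_location</entry> -->
<!-- Users can configure a remote extension stop word dictionary here -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->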
</properties>
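Fill in the entries above to configure the dictionaries. Note that updating the dictionary text requires restarting ES, which is why some people online modify the plugin source to load the configuration dynamically from a database.
Synonyms
This is very useful. For example, when I search for "beauty makeup" I may want related items such as liquid foundation to match as well. Synonyms can be configured for this; see the link above for details.
Follow the steps there. One thing to watch is versions: I use ES 5.6.15 with the latest version of the plugin above, and it works. After that comes the plugin configuration, which depends on your own setup.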
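To make the two local entries concrete, here is a minimal Python sketch that reads a filled-in IKAnalyzer.cfg.xml and lists the dictionary files each entry points to. The .dic file names are hypothetical, and the sketch assumes that several files in one entry are separated by semicolons.

```python
import xml.etree.ElementTree as ET

# Hypothetical filled-in config; the .dic file names are illustrative only.
SAMPLE_CFG = """<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <entry key="ext_dict">custom/my_words.dic</entry>
    <entry key="ext_stopwords">custom/my_stopwords.dic;custom/more_stopwords.dic</entry>
</properties>
"""


def read_ik_entries(xml_text):
    """Return {entry key: [file paths]} for every non-empty <entry>."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = {}
    for entry in root.findall("entry"):
        value = (entry.text or "").strip()
        if value:
            # Assumption: several files in one entry are separated by ';'.
            entries[entry.get("key")] = [p.strip() for p in value.split(";") if p.strip()]
    return entries


if __name__ == "__main__":
    for key, paths in read_ik_entries(SAMPLE_CFG).items():
        print(key, "->", paths)
```

Empty entries, like the ones in the default file, are simply skipped, so an untouched config yields an empty result.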
plugin-descriptor.properties
# 'description': simple summary of the plugin
description=ElasticSearch Plugin for Dynamic Synonym Token Filter.
#
# 'version': plugin's version
version=5.2.2
#
# 'name': the plugin name
name=dynamic-synonym
### mandatory elements for site plugins:
#
# 'site': set to true to indicate contents of the _site/
#  directory in the root of the plugin should be served.
site=true
#
### mandatory elements for jvm plugins :
#
# 'jvm': true if the 'classname' class should be loaded
#  from jar files in the root directory of the plugin.
#  Note that only jar files in the root directory are
#  added to the classpath for the plugin! If you need
#  other resources, package them into a resources jar.
jvm=true
#
# 'classname': the name of the class to load, fully-qualified.
classname=com.ginobefunny.elasticsearch.plugins.synonym.DynamicSynonymPlugin
#
# 'java.version' version of java the code is built against
# use the system property java.specification.version
# version string must be a sequence of nonnegative decimal integers
# separated by "."'s and may have leading zeros
java.version=1.8
#
# 'elasticsearch.version' version of elasticsearch compiled against
# You will have to release a new version of the plugin for each new
# elasticsearch release. This version is checked when the plugin
# is loaded so Elasticsearch will refuse to start in the presence of
# plugins with the incorrect elasticsearch.version.
elasticsearch.version=5.6.15
#
### deprecated elements for jvm plugins :
#
# 'isolated': true if the plugin should have its own classloader.
# passing false is deprecated, and only intended to support plugins
# that have hard dependencies against each other. If this is
# not specified, then the plugin is isolated by default.
isolated=true
#
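Since the descriptor says ES refuses to start when the declared version does not match, it can help to check the file before dropping the plugin in. The following is a rough Python sketch of such a check, not anything the plugin ships with; the function names are my own, and the sample text only repeats the key values from the descriptor.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_es_version(descriptor_text, running_es_version):
    """Return True when the descriptor targets the running ES version."""
    props = parse_properties(descriptor_text)
    return props.get("elasticsearch.version") == running_es_version


# Key values repeated from the descriptor for illustration.
DESCRIPTOR = """
name=dynamic-synonym
version=5.2.2
java.version=1.8
elasticsearch.version=5.6.15
"""

if __name__ == "__main__":
    print(check_es_version(DESCRIPTOR, "5.6.15"))
    print(check_es_version(DESCRIPTOR, "5.6.16"))
```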
1. I use Postman to create the index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "analyzer_with_dynamic_synonym": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "my_synonym"
          ]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "dynamic-synonym",
          "expand": true,
          "ignore_case": true,
          "tokenizer": "whitespace",
          "db_url": "jdbc:mysql://localhost:3306/test?user=root&password=ys123456&useUnicode=true&characterEncoding=UTF8"
        }
      }
    }
  }
}
Replace the database connection string in there with your own.
2. Create the mapping
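If you would rather script this step than click through Postman, the same request can be sketched in Python with only the standard library. This is a sketch under assumptions: the ES address http://localhost:9200 and the USER/PASSWORD placeholders are mine, and the index name index_synonym is the one used in the mapping request in step 2.

```python
import json
import urllib.request

# Assumptions: local ES address and the index name from the mapping step.
ES_URL = "http://localhost:9200"
INDEX = "index_synonym"


def build_settings(db_url):
    """Build the index settings from this step as a plain dict."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "analyzer_with_dynamic_synonym": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["my_synonym"],
                    }
                },
                "filter": {
                    "my_synonym": {
                        "type": "dynamic-synonym",
                        "expand": True,
                        "ignore_case": True,
                        "tokenizer": "whitespace",
                        "db_url": db_url,
                    }
                },
            }
        }
    }


def create_index(es_url, index, body):
    """Send PUT /<index> with a JSON body and return the decoded response."""
    request = urllib.request.Request(
        url=f"{es_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # USER and PASSWORD are placeholders for your own credentials.
    body = build_settings("jdbc:mysql://localhost:3306/test?user=USER&password=PASSWORD")
    print(json.dumps(body, indent=2))
    # create_index(ES_URL, INDEX, body)  # uncomment with a running ES node
```

Keeping the settings as a dict also makes it easy to pass the connection string in from configuration instead of hard-coding it.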
POST /index_synonym/product/_mapping
{
  "product": {
    "properties": {
      "productName": {
        "type": "text",
        "analyzer": "analyzer_with_dynamic_synonym"
      }
    }
  }
}
3. Run a query to check whether synonym search works
Search result:
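The query itself is not shown in this post, so here is only a minimal Python sketch of what such a check could look like: a match query on productName, plus a helper that pulls id, score and product name out of the response. The ES address and the search term SOME_SYNONYM are placeholders of mine, not the term I actually searched for.

```python
import json
import urllib.request


def build_match_query(field, text):
    """Body of a match query; the text is analyzed with the field's analyzer."""
    return {"query": {"match": {field: text}}}


def search(es_url, index, doc_type, body):
    """POST /<index>/<type>/_search and return the decoded JSON response."""
    request = urllib.request.Request(
        url=f"{es_url}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_hits(response):
    """Return (id, score, productName) for each hit, in response order."""
    return [
        (hit["_id"], hit["_score"], hit["_source"]["productName"])
        for hit in response["hits"]["hits"]
    ]


if __name__ == "__main__":
    # "SOME_SYNONYM" is a placeholder search term.
    body = build_match_query("productName", "SOME_SYNONYM")
    print(json.dumps(body))
    # Uncomment with a running ES node (address is an assumption):
    # result = search("http://localhost:9200", "index_synonym", "product", body)
    # for row in summarize_hits(result):
    #     print(row)
```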
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.9547894,
    "hits": [
      {
        "_index": "index_synonym",
        "_type": "product",
        "_id": "4",
        "_score": 0.9547894,
        "_source": {
          "productName": "This is a adidas sports jacket"
        }
      },
      {
        "_index": "index_synonym",
        "_type": "product",
        "_id": "3",
        "_score": 0.2824934,
        "_source": {
          "productName": "This is a adidas shoes"
        }
      }
    ]
  }
}
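As you can see, the synonyms took effect.
How can synonyms be configured dynamically?
Many posts online configure synonyms through a txt file. Here the source code also has to be modified; hats off to the expert who did it.
Higher ES versions do not support configuring this by hand; it can only be done through an API request, that is, createIndex(xx.class, setting), where the setting argument is the settings data above converted to a map and passed in.
Significance
Synonyms make search more accurate; without them the only options are algorithms and machine learning. For example, a search for "eye shadow" also brings up eye shadow palettes. Synonyms let our searches be more precise.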
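To picture what the synonym filter does to a query, here is a small pure-Python illustration. It is not the plugin's implementation: it assumes comma-separated rules where every term in a rule is equivalent to every other (the idea behind "expand": true), lowercases terms (like "ignore_case": true), and splits on whitespace. The rule itself is a made-up example.

```python
def build_synonym_map(rules):
    """Turn 'a, b, c' style rules into {term: set of equivalent terms}.

    Every term in a rule maps to all terms in that rule, and terms are
    lowercased so that matching ignores case.
    """
    synonym_map = {}
    for rule in rules:
        terms = {t.strip().lower() for t in rule.split(",") if t.strip()}
        for term in terms:
            synonym_map.setdefault(term, set()).update(terms)
    return synonym_map


def expand_tokens(text, synonym_map):
    """Whitespace-tokenize text and expand each token to its synonym list."""
    expanded = []
    for token in text.lower().split():
        expanded.append(sorted(synonym_map.get(token, {token})))
    return expanded


if __name__ == "__main__":
    # Hypothetical rule modeled on the eye shadow example.
    rules = ["eyeshadow, eyeshadow-palette"]
    print(expand_tokens("Eyeshadow sale", build_synonym_map(rules)))
```

A token without a rule passes through unchanged, which is why only the configured terms pull in extra results.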

Article link: https://www.wtabcd.cn/fanwen/fan/89/914195.html
