Similarity search with BERT + ES7 (Chinese BERT pretrained model still to be tested and updated)
1. Deploy Elasticsearch and bert-serving with docker-compose

docker-compose.yml:

version: '3.7'
services:
  # web:
  #   build: ./web
  #   ports:
  #     - "5000:5000"
  #   environment:
  #     - INDEX_NAME
  #   depends_on:
  #     - elasticsearch
  #     - bertserving
  #   deploy:
  #     resources:
  #       limits:
  #         memory: 500M
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.7.1
    ports:
      - "9200:9200"
    volumes:
      - es-data:/usr/share/elasticsearch/data
    tty: true
    environment:
      - discovery.type=single-node  # run ES as a standalone node
    deploy:
      resources:
        limits:
          memory: 1G
  bertserving:
    build: ./bertserving
    ports:
      - "5555:5555"
      - "5556:5556"
    environment:
      - PATH_MODEL=${PATH_MODEL}
    volumes:
      - "${PATH_MODEL}:/model"
    deploy:
      resources:
        limits:
          memory: 8G  # bert-serving needs a large memory allowance
volumes:
  es-data:
    driver: local
Before bringing the stack up with docker-compose up -d, point PATH_MODEL at the downloaded BERT model directory:

export PATH_MODEL=./cased_L-12_H-768_A-12
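Once both containers are up, a quick sanity check (a sketch; it assumes the default bert-serving ports 5555/5556 and ES on localhost:9200) confirms the services respond and that the embedding size matches the 768-dim mapping created in the next step:

from bert_serving.client import BertClient
from elasticsearch import Elasticsearch

# check that Elasticsearch answers on localhost:9200
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
assert es.ping(), "Elasticsearch is not reachable"

# encode one test sentence; BERT-Base returns 768-dimensional vectors,
# which must match the dims of the dense_vector field defined below
bc = BertClient()
print(bc.encode(['hello world']).shape)  # expected: (1, 768)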
2. Create the ES index mapping

Run in Kibana:
PUT /quotes
{
  "settings": {
    "index": {"number_of_shards": "1", "number_of_replicas": "0"}
  },
  "mappings": {
    "properties": {
      "quote": {"type": "text"},
      "vector": {"type": "dense_vector", "dims": 768}
    }
  }
}
This creates an index named quotes with two fields: quote holds the raw text, and vector holds its 768-dimensional BERT embedding (768 matches the hidden size of BERT-Base).
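If you would rather create the index from Python than from Kibana, the same request can be issued through the elasticsearch client (a sketch):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# identical settings/mappings to the Kibana request above
es.indices.create(
    index='quotes',
    body={
        "settings": {"index": {"number_of_shards": "1", "number_of_replicas": "0"}},
        "mappings": {
            "properties": {
                "quote": {"type": "text"},
                "vector": {"type": "dense_vector", "dims": 768}
            }
        }
    },
    ignore=400  # don't fail if the index already exists
)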
3. Encode the data with BERT and import it into the index

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from bert_serving.client import BertClient

bc = BertClient()
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

def getQuotes():
    # one quote per line; the file path is truncated in the original post
    f = open('/Users/linxier/Downloads/', 'r')
    for line in f:
        quote = line.strip().lower()
        print(quote)
        if len(quote.split()) < 510:  # 510 IS THE MAX sequence length BERT accepts
            vector = bc.encode([quote])[0].tolist()
            yield {
                "quote": quote,
                "vector": vector
            }

bulk(client=es, actions=getQuotes(), index="quotes", chunk_size=1000, request_timeout=120)
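After the bulk call returns, it is worth refreshing the index and counting the documents to confirm the import worked (a small sketch using the same es client):

# make freshly indexed documents searchable, then count them
es.indices.refresh(index='quotes')
print(es.count(index='quotes')['count'], 'documents indexed')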
4. Run the similarity search

from bert_serving.client import BertClient
from elasticsearch import Elasticsearch

bc = BertClient()
client = Elasticsearch([{'host': 'localhost', 'port': 9200}])

def findRelevantHits(inQuiry):
    inQuiry_vector = bc.encode([inQuiry])[0].tolist()
    queries = {
        'bert': {
            "script_score": {
                "query": {
                    "match_all": {}
                },
                "script": {
                    "source": "cosineSimilarity(params.inQuiry_vector, doc['vector']) + 1.0",
                    "params": {
                        "inQuiry_vector": inQuiry_vector
                    }
                }
            }
        },
        'mlt': {
            "more_like_this": {
                "fields": ["quote"],
                "like": inQuiry,
                "min_term_freq": 1,
                "max_query_terms": 50,
                "min_doc_freq": 1
            }
        }
    }
    result = {'bert': [], 'mlt': []}
    for metric, query in queries.items():
        body = {"query": query, "size": 10, "_source": ["quote"]}
        response = client.search(index='quotes', body=body)
        result[metric] = [a['_source']['quote'] for a in response['hits']['hits']]
    return result

inQuiry = "could i help you"
result = findRelevantHits(inQuiry.strip().lower())
print(result)
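For reference, the Painless expression in the 'bert' query is a plain cosine similarity shifted by +1.0, since a script_score query in ES must not produce negative scores; the same computation in numpy looks like this (a sketch):

import numpy as np

def script_score(query_vec, doc_vec):
    # cosineSimilarity(params.inQuiry_vector, doc['vector']) in Painless
    cos = np.dot(query_vec, doc_vec) / (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec))
    return cos + 1.0  # shift [-1, 1] into [0, 2] so the score is never negative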
This lets us compare the accuracy of BERT-based retrieval (cosineSimilarity over embeddings) against ES's built-in more_like_this (mlt) query. Results:
{'bert': ['can i help?', 'could you take a picture for me?', 'the telephone is ringing, would you answer it, please?', 'do you have some change?', 'i hope you have a good time on your trip.', "i'd like a bowl of tamoto soup, please.", 'what would you like to eat?', "intelligent life on other planets? i'm not even sure there is on earth!", 'if we can only encounter each other rather than stay with each other,then i wish we had never encountered.', 'i would like weeping with the smile rather than repenting with the cry,when my heart is broken ,is it needed to fix?'],
 'mlt': ['can i help?', 'could you take a picture for me?', 'i hope you have a good time on your trip.', "you will have it if it belongs to you,whereas you don't kvetch for it if it doesn't appear in your life.", 'do you have some change?', 'what would you like to eat?', 'if we can only encounter each other rather than stay with each other,then i wish we had never encountered.', 'i would like weeping with the smile rather than repenting with the cry,when my heart is broken ,is it needed to fix?', 'the telephone is ringing, would you answer it, please?', 'you have your choice of three flavors of ice cream.']}