Similarity search with BERT + ES7 (Chinese BERT pretrained model still to be tested and updated)
1. Deploy Elasticsearch and bert-serving with docker-compose

docker-compose.yml:

version: '3.7'
services:
  # web:
  #   build: ./web
  #   ports:
  #     - "5000:5000"
  #   environment:
  #     - INDEX_NAME
  #   depends_on:
  #     - elasticsearch
  #     - bertserving
  #   deploy:
  #     resources:
  #       limits:
  #         memory: 500M
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.7.1
    ports:
      - "9200:9200"
    volumes:
      - es-data:/usr/share/elasticsearch/data
    tty: true
    environment:
      - discovery.type=single-node  # run ES as a standalone node
    deploy:
      resources:
        limits:
          memory: 1G
  bertserving:
    build: ./bertserving
    ports:
      - "5555:5555"
      - "5556:5556"
    environment:
      - PATH_MODEL=${PATH_MODEL}
    volumes:
      - "${PATH_MODEL}:/model"
    deploy:
      resources:
        limits:
          memory: 8G  # bert-serving needs a large memory allowance
volumes:
  es-data:
    driver: local
Before bringing the stack up with docker-compose up -d, point PATH_MODEL at the downloaded BERT model directory:

export PATH_MODEL=./cased_L-12_H-768_A-12
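Once both containers are up, a quick sanity check (a sketch; it assumes the default bert-serving ports 5555/5556 and ES on localhost:9200) confirms the services respond and that the embedding size matches the 768-dim mapping created in the next step:

from bert_serving.client import BertClient
from elasticsearch import Elasticsearch

# check that Elasticsearch answers on localhost:9200
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
assert es.ping(), "Elasticsearch is not reachable"

# encode one test sentence; BERT-Base returns 768-dimensional vectors,
# which must match the dims of the dense_vector field defined below
bc = BertClient()
print(bc.encode(['hello world']).shape)  # expected: (1, 768)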
2. Create the ES index mapping

Run in Kibana:
PUT /quotes
{
  "settings": {
    "index": {"number_of_shards": "1", "number_of_replicas": "0"}
  },
  "mappings": {
    "properties": {
      "quote": {"type": "text"},
      "vector": {"type": "dense_vector", "dims": 768}
    }
  }
}
This creates an index named quotes with two fields: quote holds the raw text, and vector holds its 768-dimensional BERT embedding (768 matches the hidden size of BERT-Base).
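If you would rather create the index from Python than from Kibana, the same request can be issued through the elasticsearch client (a sketch):

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# identical settings/mappings to the Kibana request above
es.indices.create(
    index='quotes',
    body={
        "settings": {"index": {"number_of_shards": "1", "number_of_replicas": "0"}},
        "mappings": {
            "properties": {
                "quote": {"type": "text"},
                "vector": {"type": "dense_vector", "dims": 768}
            }
        }
    },
    ignore=400  # don't fail if the index already exists
)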
3. Encode the data with BERT and import it into the index

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from bert_serving.client import BertClient

bc = BertClient()
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

def getQuotes():
    # one quote per line; the file path is truncated in the original post
    f = open('/Users/linxier/Downloads/', 'r')
    for line in f:
        quote = line.strip().lower()
        print(quote)
        if len(quote.split()) < 510:  # 510 IS THE MAX sequence length BERT accepts
            vector = bc.encode([quote])[0].tolist()
            yield {
                "quote": quote,
                "vector": vector
            }

bulk(client=es, actions=getQuotes(), index="quotes", chunk_size=1000, request_timeout=120)
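After the bulk call returns, it is worth refreshing the index and counting the documents to confirm the import worked (a small sketch using the same es client):

# make freshly indexed documents searchable, then count them
es.indices.refresh(index='quotes')
print(es.count(index='quotes')['count'], 'documents indexed')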
4. Run the similarity search

from bert_serving.client import BertClient
from elasticsearch import Elasticsearch

bc = BertClient()
client = Elasticsearch([{'host': 'localhost', 'port': 9200}])

def findRelevantHits(inQuiry):
    inQuiry_vector = bc.encode([inQuiry])[0].tolist()
    queries = {
        'bert': {
            "script_score": {
                "query": {
                    "match_all": {}
                },
                "script": {
                    "source": "cosineSimilarity(params.inQuiry_vector, doc['vector']) + 1.0",
                    "params": {
                        "inQuiry_vector": inQuiry_vector
                    }
                }
            }
        },
        'mlt': {
            "more_like_this": {
                "fields": ["quote"],
                "like": inQuiry,
                "min_term_freq": 1,
                "max_query_terms": 50,
                "min_doc_freq": 1
            }
        }
    }
    result = {'bert': [], 'mlt': []}
    for metric, query in queries.items():
        body = {"query": query, "size": 10, "_source": ["quote"]}
        response = client.search(index='quotes', body=body)
        result[metric] = [a['_source']['quote'] for a in response['hits']['hits']]
    return result

inQuiry = "could i help you"
result = findRelevantHits(inQuiry.strip().lower())
print(result)
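For reference, the Painless expression in the 'bert' query is a plain cosine similarity shifted by +1.0, since a script_score query in ES must not produce negative scores; the same computation in numpy looks like this (a sketch):

import numpy as np

def script_score(query_vec, doc_vec):
    # cosineSimilarity(params.inQuiry_vector, doc['vector']) in Painless
    cos = np.dot(query_vec, doc_vec) / (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec))
    return cos + 1.0  # shift [-1, 1] into [0, 2] so the score is never negative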
This lets us compare the accuracy of BERT-based retrieval (cosineSimilarity over embeddings) against ES's built-in more_like_this (mlt) query. Results:
{'bert': ['can i help?', 'could you take a picture for me?', 'the telephone is ringing, would you answer it, please?', 'do you have some change?', 'i hope you have a good time on your trip.', "i'd like a bowl of tamoto soup, please.", 'what would you like to eat?', "intelligent life on other planets? i'm not even sure there is on earth!", 'if we can only encounter each other rather than stay with each other,then i wish we had never encountered.', 'i would like weeping with the smile rather than repenting with the cry,when my heart is broken ,is it needed to fix?'],
 'mlt': ['can i help?', 'could you take a picture for me?', 'i hope you have a good time on your trip.', "you will have it if it belongs to you,whereas you don't kvetch for it if it doesn't appear in your life.", 'do you have some change?', 'what would you like to eat?', 'if we can only encounter each other rather than stay with each other,then i wish we had never encountered.', 'i would like weeping with the smile rather than repenting with the cry,when my heart is broken ,is it needed to fix?', 'the telephone is ringing, would you answer it, please?', 'you have your choice of three flavors of ice cream.']}