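ChatGPT: Semantic search using embeddings
We can search all of our reviews semantically, efficiently and at very low cost, by embedding the search query and then finding the reviews whose embeddings are most similar to it. The dataset is created in the Obtain_dataset Notebook.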
import pandas as pd
import numpy as np

datafile_path = "data/fine_food_reviews_with_embeddings_1k.csv"

df = pd.read_csv(datafile_path)
# The embeddings are stored as strings in the CSV; parse each one into a NumPy array.
df["embedding"] = df.embedding.apply(eval).apply(np.array)
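The eval call above executes whatever text is stored in the embedding column, which is acceptable for a file you generated yourself. As a more cautious variant, the following minimal sketch parses the same kind of column with ast.literal_eval, which accepts only Python literals. The in-memory CSV, its two rows, and the name toy_df are made up for illustration; the sketch assumes each embedding is stored as a list literal such as "[0.1, 0.2, 0.3]".

```python
import ast
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: two rows with 3-dimensional embeddings.
csv_text = (
    "combined,embedding\n"
    '"Title: A; Content: B","[0.1, 0.2, 0.3]"\n'
    '"Title: C; Content: D","[0.4, 0.5, 0.6]"\n'
)

toy_df = pd.read_csv(io.StringIO(csv_text))
# literal_eval parses the list literal without running arbitrary code.
toy_df["embedding"] = toy_df.embedding.apply(ast.literal_eval).apply(np.array)

print(toy_df.embedding.iloc[0].shape)
```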
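Remember to use the document embedding engine for documents (in this case, reviews) and the query embedding engine for queries. In the code below, a single model, text-embedding-ada-002, embeds the query. Note that here we simply compare the cosine similarity between the query embedding and each document embedding, and show the n best matches.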
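To make the ranking step concrete, here is a minimal sketch of cosine similarity and top-n selection written with plain NumPy. The helper names cosine_sim and top_n_matches and the 2-dimensional vectors are hypothetical and exist only for illustration; real embeddings have many more dimensions, and this is not the library's own implementation.

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def top_n_matches(query_vec, doc_vecs, n=3):
    """Return (index, similarity) pairs for the n most similar documents."""
    scores = [(i, cosine_sim(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


# Made-up 2-dimensional "embeddings", for illustration only.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]

ranked = top_n_matches(query, docs, n=2)
print(ranked)
```

Because cosine similarity depends only on direction, the query [2.0, 0.0] matches the first document perfectly even though their lengths differ.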
# Note: openai.embeddings_utils ships with openai-python releases before 1.0.
from openai.embeddings_utils import get_embedding, cosine_similarity


# search through the reviews for a specific product
def search_reviews(df, product_description, n=3, pprint=True):
    # Embed the query text.
    product_embedding = get_embedding(
        product_description,
        engine="text-embedding-ada-002"
    )
    # Score every review against the query.
    df["similarity"] = df.embedding.apply(lambda x: cosine_similarity(x, product_embedding))

    # Keep the n most similar reviews and tidy the text for display.
    results = (
        df.sort_values("similarity", ascending=False)
        .head(n)
        .combined.str.replace("Title: ", "")
        .str.replace("; Content:", ": ")
    )
    if pprint:
        for r in results:
            print(r[:200])  # show only the first 200 characters
            print()
    return results


results = search_reviews(df, "delicious beans", n=3)
Good Buy: I liked the beans. They were vacuum sealed, plump and moist. Would recommend them for any use. I personally split and stuck them in some vodka to make vanilla extract. Yum!

Jamaican Blue beans: Excellent coffee bean for roasting. Our family just purchased another 5 pounds for more roasting. Plenty of flavor and mild on acidity when roasted to a dark brown bean and befor

Delicious!: I enjoy this white beans seasoning, it gives a rich flavor to the beans I just love it, my mother in law didnt know about this Zatarains brand and now she is traying different seasoning
results = search_reviews(df, "whole wheat pasta", n=3)
Tasty and Quick Pasta: Barilla Whole Grain Fusilli with Vegetable Marinara is tasty and has an excellent chunky vegetable marinara. I just wish there was more of it. If you arent starving or on a

sooo good: tastes so good. Worth the money. My boyfriend hates wheat pasta and LOVES this. cooks fast tastes great.I love this brand and started buying more of their pastas. Bulk is best.

Handy: Love the idea of ready in a minute pasta and for that alone this product gets praise. The pasta is whole grain so thats a big plus and it actually comes out al dente. The vegetable marinara
We can search through these reviews easily. To speed up the computation, we can use a specialized algorithm designed for faster search over embeddings.
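The text does not name a particular algorithm. As one simple speedup, the sketch below keeps the search exact but replaces the per-row apply with a single matrix-vector product over unit-length embeddings, then selects the top n with np.argpartition. This is a vectorized brute-force search, not an approximate nearest-neighbor index; the helper names build_index and search_index and the random 8-dimensional vectors are hypothetical.

```python
import numpy as np


def build_index(embeddings):
    """Stack embeddings into one matrix and scale each row to unit length."""
    matrix = np.vstack(embeddings).astype(float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / norms


def search_index(index, query_vec, n=3):
    """Return the row indices of the n most similar rows (best first) and all scores."""
    q = np.asarray(query_vec, dtype=float)
    q = q / np.linalg.norm(q)
    sims = index @ q  # cosine similarity for every row at once
    n = min(n, len(sims))
    top = np.argpartition(-sims, n - 1)[:n]  # the n best rows, in no particular order
    return top[np.argsort(-sims[top])], sims


# Made-up data: 100 random 8-dimensional vectors standing in for review embeddings.
rng = np.random.default_rng(0)
toy_embeddings = [rng.normal(size=8) for _ in range(100)]

index = build_index(toy_embeddings)
best, sims = search_index(index, toy_embeddings[42], n=3)
print(best[0])
```

With the review data, the index could be built once with build_index(list(df.embedding)) and reused across queries, so each search costs one matrix-vector product instead of a Python-level loop over rows.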
results = search_reviews(df, "bad delivery", n=1)
great product, poor delivery: The coffee is excellent and I am a repeat buyer. Problem this time was with the UPS delivery. They left the box in front of my garage door in the middle of the drivewa
As we can see, this can deliver a lot of value immediately. In this example, we show how to quickly find examples of delivery failures.
results = search_reviews(df, "spoilt", n=1)
Extremely dissapointed: Hi,<br />I am very disappointed with the past shipment I received of the ONE coconut water. 3 of the boxes were leaking and the coconut water was spoiled.<br /><br />Thanks.<b
results = search_reviews(df, "pet food", n=2)
Good food: The only dry food my queen cat will eat. Helps prevent hair balls. Good packaging. Arrives promptly. Recommended by a friend who sells pet food.

The cats like it: My 7 cats like this food but it is a little yucky for the human. Pieces of mackerel swimming in a dark broth. It is billed as a "complete" food and contains carrots, peas and pasta.