elasticsearch-py: A Highly Practical Python Library

Latest recommended article published on 2025-03-29 15:30:58

黑马聊AI

Article link: https://blog.csdn.net/2401_83617404/article/details/141613072

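elasticsearch-py is the Python client library for Elasticsearch. It lets developers communicate with an Elasticsearch cluster and work with its data directly from a Python application.

Features of elasticsearch-py

High performance: supports asynchronous operation, which speeds up data-heavy workloads.
Ease of use: simplifies interaction with Elasticsearch and makes common operations more intuitive.
Flexibility: supports multiple data formats and complex queries.
Stability: extensively tested to keep the library stable and reliable.
Community support: backed by an active community that keeps updating and improving it.

How to install elasticsearch-py

To use elasticsearch-py, first install it with pip, Python's package manager. Run the following command in a terminal: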

pip install elasticsearch

Once the installation finishes, import the client in your Python code:

from elasticsearch import Elasticsearch

You are now ready to use elasticsearch-py to talk to Elasticsearch.

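Basic features of elasticsearch-py

The basic operations cover the full document lifecycle: indexing, retrieving, updating, deleting, bulk processing, and searching.

Indexing a document

Indexing a document with elasticsearch-py takes a single call. Here is a simple example:

from elasticsearch import Elasticsearch

# Connect to the Elasticsearch service
# (assumes a local, unsecured node at this URL; adjust for your cluster)
es = Elasticsearch("http://localhost:9200")

# Index a document
doc = {
    'author': 'Author Name',
    'title': 'A title',
    'content': 'A content'
}
response = es.index(index="test-index", id=1, body=doc)
print(response)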

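Retrieving a document

Retrieving a document that has already been indexed is just as simple:

# Retrieve a document by ID
get_response = es.get(index="test-index", id=1)
print(get_response['_source'])

Updating a document

A document can be updated either by replacing it entirely (indexing again under the same ID) or by a partial update:

# Partially update a document: only the fields under "doc" are changed
update_response = es.update(index="test-index", id=1, body={"doc": {"title": "New Title"}})
print(update_response)

Deleting a document

Deleting a document also only requires the index name and the document ID:

# Delete a document
delete_response = es.delete(index="test-index", id=1)
print(delete_response)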

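Bulk operations

elasticsearch-py supports bulk operations, so index, update, and delete actions can be sent in a single request:

from elasticsearch import helpers

# Bulk actions: "_op_type" selects the operation and defaults to "index"
actions = [
    {"_index": "test-index", "_id": 1, "_source": {"title": "A title", "content": "A content"}},
    {"_op_type": "update", "_index": "test-index", "_id": 1, "doc": {"title": "Updated title"}},
    {"_op_type": "delete", "_index": "test-index", "_id": 1}
]
# helpers.bulk returns a tuple: (number of successful actions, list of errors)
result = helpers.bulk(es, actions)
print(result)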

Searching documents

Search is one of the core features of elasticsearch-py. Here is a simple example:

# Search for all documents in the index
search_response = es.search(index="test-index", body={"query": {"match_all": {}}})
print(search_response['hits']['hits'])
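
Each hit in the response wraps the stored document together with its metadata. As a minimal sketch, the hypothetical helper below (`extract_sources` is not part of elasticsearch-py) pulls the `_id` and `_source` out of a response-shaped dictionary. The sample response is hand-written for illustration and is not real cluster output.

```python
def extract_sources(response):
    """Return a list of (id, source) pairs from a search response dict."""
    return [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]


# Hand-written sample that mimics the shape of a search response
sample_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "test-index", "_id": "1", "_score": 1.0,
             "_source": {"title": "A title"}},
            {"_index": "test-index", "_id": "2", "_score": 1.0,
             "_source": {"title": "Another title"}},
        ],
    }
}

pairs = extract_sources(sample_response)
for doc_id, source in pairs:
    print(doc_id, source["title"])
```

With a live cluster, the same function could be applied to the value returned by es.search.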

With these basic operations, developers can integrate Elasticsearch into a Python application and retrieve and manage data efficiently.

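Advanced features of elasticsearch-py

Scroll queries

When a query returns a large amount of data, the scroll parameter lets you page through the results instead of loading everything at once.

from elasticsearch import Elasticsearch

# Create the Elasticsearch client (assumes a local node at this URL)
es = Elasticsearch("http://localhost:9200")

# Define the query and how long to keep the scroll context alive
query = {"query": {"match_all": {}}}
scroll_time = '1m'

# Run the search and open a scroll context
response = es.search(index="my_index", body=query, scroll=scroll_time)

# Collect the first page of results
scroll_id = response['_scroll_id']
data = response['hits']['hits']

# Fetch the remaining pages
while True:
    response = es.scroll(scroll_id=scroll_id, scroll=scroll_time)
    if not response['hits']['hits']:
        break
    # The scroll ID can change between requests, so always use the latest one
    scroll_id = response['_scroll_id']
    data.extend(response['hits']['hits'])

# Release the scroll context once finished
es.clear_scroll(scroll_id=scroll_id)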
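
The scroll loop can also be wrapped in a generator so that callers process hits one at a time instead of collecting them all in a list. The sketch below is an illustration under stated assumptions: `scroll_hits` and `FakeClient` are hypothetical names, and the fake client only imitates the `search`, `scroll`, and `clear_scroll` calls so the example runs without a cluster.

```python
def scroll_hits(client, index, body, scroll="1m"):
    """Yield every hit for a query, page by page, then clear the scroll context."""
    response = client.search(index=index, body=body, scroll=scroll)
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = client.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = response["_scroll_id"]
    finally:
        # Always release the server-side scroll context
        client.clear_scroll(scroll_id=scroll_id)


class FakeClient:
    """Hypothetical in-memory stand-in for the real client (demo only)."""

    def __init__(self, pages):
        self._pages = list(pages)
        self.cleared = False

    def _next_page(self):
        hits = self._pages.pop(0) if self._pages else []
        return {"_scroll_id": "demo-scroll-id", "hits": {"hits": hits}}

    def search(self, index, body, scroll):
        return self._next_page()

    def scroll(self, scroll_id, scroll):
        return self._next_page()

    def clear_scroll(self, scroll_id):
        self.cleared = True


fake = FakeClient([[{"_id": "1"}, {"_id": "2"}], [{"_id": "3"}]])
ids = [hit["_id"] for hit in scroll_hits(fake, "my_index", {"query": {"match_all": {}}})]
print(ids)
```

The library's own helpers.scan function wraps the same search-then-scroll pattern and is usually the more convenient choice in real code.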

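Filter caching

Clauses placed in the filter context of a bool query do not affect scoring, and Elasticsearch can cache them. This improves performance, especially for queries that are repeated often.

from elasticsearch import Elasticsearch

# Create the Elasticsearch client (assumes a local node at this URL)
es = Elasticsearch("http://localhost:9200")

# Define the query with its filters
# (term queries match exact values, so these fields are assumed to be mapped as keyword)
query = {
    "bool": {
        "filter": [
            {"term": {"author": "John"}},
            {"term": {"tags": "python"}}
        ]
    }
}

# Run the search
response = es.search(index="my_index", body={"query": query})

# Inspect the results
print(response['hits']['hits'])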
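
When the set of filters varies from request to request, the query body can be assembled programmatically. This is a small sketch; `build_filter_query` is a hypothetical helper, not a function provided by elasticsearch-py.

```python
def build_filter_query(exact_matches):
    """Build a bool query whose filter clauses require each field to match exactly."""
    return {
        "bool": {
            "filter": [
                {"term": {field: value}} for field, value in exact_matches.items()
            ]
        }
    }


query = build_filter_query({"author": "John", "tags": "python"})
print(query)
```

The resulting dictionary can be passed to es.search as body={"query": query}, exactly like the hand-written version.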

Mappings and settings

elasticsearch-py also makes it easy to manage an index's mappings and settings.

from elasticsearch import Elasticsearch

# Create the Elasticsearch client (assumes a local node at this URL)
es = Elasticsearch("http://localhost:9200")

# Define the index settings and mappings
settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "name": {"type": "text"},
            "age": {"type": "integer"}
        }
    }
}

# Create the index
es.indices.create(index="my_index", body=settings)
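
Creating an index that already exists raises an error, so setup scripts often check for the index first. The sketch below shows one way to do that; `ensure_index`, `FakeIndices`, and `FakeClient` are hypothetical names, and the fake objects only imitate `indices.exists` and `indices.create` so the example runs without a cluster.

```python
def ensure_index(client, name, body):
    """Create the index only if it is missing; return True when it was created."""
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name, body=body)
    return True


class FakeIndices:
    """Hypothetical in-memory stand-in for client.indices (demo only)."""

    def __init__(self):
        self.created = {}

    def exists(self, index):
        return index in self.created

    def create(self, index, body):
        self.created[index] = body


class FakeClient:
    def __init__(self):
        self.indices = FakeIndices()


fake = FakeClient()
settings = {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}
first = ensure_index(fake, "my_index", settings)
second = ensure_index(fake, "my_index", settings)
print(first, second)
```

Note that a check followed by a create is not atomic; if several processes may create the same index concurrently, the create call should still be prepared to handle an "already exists" error.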

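Bulk operations

elasticsearch-py supports bulk operations, including bulk inserts, updates, and deletes.

from elasticsearch import Elasticsearch, helpers

# Create the Elasticsearch client (assumes a local node at this URL)
es = Elasticsearch("http://localhost:9200")

# Prepare the bulk actions
actions = [
    # Index operations (the default); explicit IDs let the later actions refer to these documents
    {"_index": "my_index", "_id": "1", "_source": {"name": "Alice", "age": 30}},
    {"_index": "my_index", "_id": "2", "_source": {"name": "Bob", "age": 25}},
    # Update operation: the partial document goes under "doc"
    {"_op_type": "update", "_index": "my_index", "_id": "1", "doc": {"age": 32}},
    # Delete operation
    {"_op_type": "delete", "_index": "my_index", "_id": "2"}
]

# Execute the bulk actions
helpers.bulk(es, actions)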
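
helpers.bulk accepts any iterable of actions, so for larger datasets the actions can come from a generator instead of a list held in memory. The sketch below assumes each record carries its document ID under an "id" key; that layout and the name `generate_actions` are illustrative choices, not part of the library.

```python
def generate_actions(records, index):
    """Yield one bulk "index" action per record, using the record's "id" as _id."""
    for record in records:
        source = {key: value for key, value in record.items() if key != "id"}
        yield {"_index": index, "_id": record["id"], "_source": source}


people = [
    {"id": "1", "name": "Alice", "age": 30},
    {"id": "2", "name": "Bob", "age": 25},
]
actions = list(generate_actions(people, "my_index"))
print(actions)
```

With a real client, the generator could be passed directly, as in helpers.bulk(es, generate_actions(people, "my_index")), without building the list first.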

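Aggregation queries

Aggregations are one of the advanced features available through elasticsearch-py and are used for data analysis and statistics.

from elasticsearch import Elasticsearch

# Create the Elasticsearch client (assumes a local node at this URL)
es = Elasticsearch("http://localhost:9200")

# Define the aggregation: group documents by the value of "age"
aggs = {
    "group_by_age": {
        "terms": {"field": "age"}
    }
}

# Run the search
response = es.search(index="my_index", body={"query": {"match_all": {}}, "aggs": aggs})

# Inspect the results
print(response['aggregations']['group_by_age']['buckets'])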
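
Each bucket of a terms aggregation carries a `key` and a `doc_count`, which makes it easy to turn the result into a plain dictionary. The helper `buckets_to_counts` is hypothetical, and the sample response is hand-written to mimic the response shape rather than taken from a real cluster.

```python
def buckets_to_counts(response, agg_name):
    """Map each bucket key of a terms aggregation to its document count."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Hand-written sample that mimics the shape of an aggregation response
sample_response = {
    "aggregations": {
        "group_by_age": {
            "buckets": [
                {"key": 25, "doc_count": 3},
                {"key": 30, "doc_count": 1},
            ]
        }
    }
}

counts = buckets_to_counts(sample_response, "group_by_age")
print(counts)
```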

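Near-real-time search

Elasticsearch search is near-real-time: a newly indexed document becomes searchable after the next index refresh, which by default happens about once per second. An explicit refresh makes recent writes visible immediately.

from elasticsearch import Elasticsearch

# Create the Elasticsearch client (assumes a local node at this URL)
es = Elasticsearch("http://localhost:9200")

# Force a refresh so that recently indexed documents are visible to search
es.indices.refresh(index="my_index")

# Define the query
query = {
    "query": {"match_all": {}}
}

# Run the search with a 10-second request timeout
response = es.search(index="my_index", body=query, request_timeout=10)

# Inspect the results
print(response['hits']['hits'])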

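Practical use cases for elasticsearch-py

In day-to-day development, elasticsearch-py helps us interact with an Elasticsearch cluster efficiently. Some common use cases follow.

Data search and retrieval

Issue search requests from your program to retrieve data quickly.

from elasticsearch import Elasticsearch

# Create the Elasticsearch client (assumes a local node at this URL)
es = Elasticsearch("http://localhost:9200")

# Run a full-text search on the "title" field
response = es.search(index="my_index", body={
    "query": {
        "match": {
            "title": "Elasticsearch"
        }
    }
})

# Print the matching documents
for doc in response['hits']['hits']:
    print(doc['_source'])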

Data indexing and bulk operations

Index data into Elasticsearch, one document at a time or in bulk.

from elasticsearch import helpers

# Index a single document
doc = {
    'author': 'kimchy',
    'title': 'Elasticsearch: cool tool',
    'content': 'It is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases.'
}
es.index(index="my_index", id=1, body=doc)

# Bulk-index documents (helpers.bulk converts these dictionaries into the bulk request format)
docs = [
    {'_index': 'my_index', '_id': 2, '_source': {'title': 'Elasticsearch: cool tool', 'content': '...'}},
    {'_index': 'my_index', '_id': 3, '_source': {'title': 'Python: powerful language', 'content': '...'}}
]
helpers.bulk(es, docs)
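
For background, the low-level `_bulk` endpoint expects newline-delimited JSON: one action line per operation, followed by a source line for operations that carry a document. The sketch below builds such a body with the standard library only; `to_bulk_body` and its (operation, metadata, source) tuple layout are hypothetical conventions chosen for this illustration.

```python
import json


def to_bulk_body(operations):
    """Serialize (op_type, metadata, source) tuples to newline-delimited JSON."""
    lines = []
    for op_type, metadata, source in operations:
        lines.append(json.dumps({op_type: metadata}))
        if source is not None:  # delete operations have no source line
            lines.append(json.dumps(source))
    # The bulk body must end with a newline
    return "\n".join(lines) + "\n"


body = to_bulk_body([
    ("index", {"_index": "my_index", "_id": 2}, {"title": "Elasticsearch: cool tool"}),
    ("delete", {"_index": "my_index", "_id": 3}, None),
])
print(body)
```

In practice the helpers module takes care of this serialization, which is why the dictionary-based action format is usually preferred.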

Updating and deleting data

Update or delete specific documents in Elasticsearch.

# Partially update a document
update_response = es.update(index="my_index", id=1, body={
    "doc": {
        "title": "Elasticsearch: the tool to use",
        "content": "It is the most popular tool for search and analytics."
    }
})

# Delete a document
delete_response = es.delete(index="my_index", id=1)

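Monitoring and performance analysis

Monitor the health of the Elasticsearch cluster and analyze the execution time and resource usage of queries.

# Get cluster health information
health_response = es.cluster.health()

# Get node statistics
stats_response = es.nodes.stats()

# Profile a query: "profile": True adds timing details to the response
profile_response = es.search(index="my_index", body={
    "query": {
        "match": {
            "title": "Elasticsearch"
        }
    },
    "profile": True
})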
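
The cluster health response includes a `status` field that is "green", "yellow", or "red". A monitoring script might reduce the response to the few values it cares about, as in this sketch; `summarize_health` is a hypothetical helper and the sample dictionary is hand-written, not real cluster output.

```python
def summarize_health(health):
    """Reduce a cluster health response to a few fields of interest."""
    status = health["status"]
    return {
        "status": status,
        "nodes": health["number_of_nodes"],
        # Anything other than green means some shards are not fully allocated
        "needs_attention": status != "green",
    }


# Hand-written sample that mimics part of a cluster health response
sample_health = {
    "cluster_name": "demo",
    "status": "yellow",
    "number_of_nodes": 1,
    "unassigned_shards": 1,
}

summary = summarize_health(sample_health)
print(summary)
```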

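Real-time analytics

Run statistical analysis over live data using aggregation queries.

# Run an aggregation query
# ("author.keyword" assumes the default dynamic mapping, which adds a keyword
# sub-field to text fields; terms aggregations do not work on plain text fields)
agg_response = es.search(index="my_index", body={
    "size": 0,
    "aggs": {
        "popular_authors": {
            "terms": {
                "field": "author.keyword",
                "size": 10
            }
        }
    }
})

# Print each author and the number of documents
for bucket in agg_response['aggregations']['popular_authors']['buckets']:
    print(bucket['key'], bucket['doc_count'])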

Log management and monitoring

Use Elasticsearch as a log management system to collect and analyze log data.

# Index a log entry
log_data = {
    "timestamp": "2021-12-01T12:00:00",
    "level": "INFO",
    "message": "This is a log message"
}
es.index(index="my_logs", body=log_data)

# Query log entries from the start of yesterday through the end of today
log_query_response = es.search(index="my_logs", body={
    "query": {
        "range": {
            "timestamp": {
                "gte": "now-1d/d",
                "lte": "now/d"
            }
        }
    }
})
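
Instead of date math such as "now-1d/d", the time window can be computed in Python and sent as explicit ISO 8601 timestamps. The sketch below is one possible approach; `recent_logs_query` is a hypothetical helper, and the field names "timestamp" and "level" follow the log entry shown in this section.

```python
from datetime import datetime, timedelta, timezone


def recent_logs_query(hours, level=None, now=None):
    """Build a query for logs from the last `hours` hours, optionally for one level."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=hours)
    filters = [
        {"range": {"timestamp": {"gte": start.isoformat(), "lte": now.isoformat()}}}
    ]
    if level is not None:
        filters.append({"match": {"level": level}})
    return {"query": {"bool": {"filter": filters}}}


# A fixed "now" keeps the example reproducible
fixed_now = datetime(2021, 12, 1, 12, 0, 0, tzinfo=timezone.utc)
query = recent_logs_query(6, level="INFO", now=fixed_now)
print(query)
```

The resulting dictionary can be passed to es.search as the request body.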

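Summary

elasticsearch-py offers a rich set of methods and features for working with Elasticsearch, letting Python programmers index and search data more efficiently. It handles everything from basic index, search, update, and delete operations to advanced features such as mappings, aggregations, and filtering. The use cases above show how useful elasticsearch-py is for log analysis, full-text search, and data analytics. Mastering it greatly broadens what we can do in data processing and retrieval.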
