Official site: https://www.elastic.co/cn/

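filebeat: deployed on each application host; it polls the log files at regular intervals, picks up newly appended lines, and forwards them to kafka for temporary buffering.
kafka: acts as the message system; its high throughput lets it absorb the logs that filebeat collects and forwards. It is usually run as a cluster.
logstash: pulls logs from kafka and runs them through its three stages (input, filter, output) to modify or filter them, emitting only the data of interest. The number of logstash instances is usually set 1:1 against the number of kafka partitions, and the instances form one consumer group that consumes the logs.
elasticsearch: stores and indexes the data forwarded by logstash; kibana then queries and visualizes it, so logs can be analyzed in near real time.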
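
The 1:1 sizing of logstash instances against partitions can be illustrated with a small sketch. It assumes a plain round-robin assignment, which is a simplified model and not Kafka's actual assignor; the function name `assign_partitions` is made up for this example. The point it shows is that consumers beyond the partition count receive nothing to read.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int, num_consumers: int) -> Dict[int, List[int]]:
    """Spread partitions over consumers round-robin (simplified model)."""
    if num_consumers < 1:
        raise ValueError("need at least one consumer")
    assignment: Dict[int, List[int]] = {c: [] for c in range(num_consumers)}
    for partition in range(num_partitions):
        assignment[partition % num_consumers].append(partition)
    return assignment


if __name__ == "__main__":
    # Hypothetical topic with 6 partitions, consumed by 3, 6, or 8 instances.
    for consumers in (3, 6, 8):
        plan = assign_partitions(6, consumers)
        idle = [c for c, parts in plan.items() if not parts]
        print(consumers, "consumers:", plan, "idle:", idle)
```

With fewer instances than partitions each instance reads several partitions; with more, the extra instances sit idle, which is why matching the two counts is a common starting point.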

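Elasticsearch is a distributed document store. It can store and retrieve complex data structures, serialized as JSON documents, in near real time.
In other words, once a document is stored in Elasticsearch, it can be retrieved from any node in the cluster.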

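elasticsearch and kibana can also be extended by installing the x-pack plugin, which adds features such as monitoring the elasticsearch cluster state and access control for data.

Count the documents in the cluster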

[root@b45 ~]# curl -XGET http://192.168.241.46:9200/_count?pretty
{
"count" : 1129757827,
"_shards" : {
"total" : 53,
"successful" : 53,
"failed" : 0
}
}
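
A response like the one above can be checked programmatically. The sketch below only parses the JSON shown in the transcript; `summarize_count` is a name invented for this example, and treating "failed is 0 and successful equals total" as a complete count is an assumption about how one might want to read the `_shards` block.

```python
import json


def summarize_count(body: str) -> str:
    """Turn a _count response body into a one-line summary."""
    data = json.loads(body)
    shards = data["_shards"]
    complete = shards["failed"] == 0 and shards["successful"] == shards["total"]
    note = "complete" if complete else "partial (some shards did not answer)"
    return (
        f'{data["count"]:,} documents, '
        f'{shards["successful"]}/{shards["total"]} shards, {note}'
    )


if __name__ == "__main__":
    # Body copied from the transcript above.
    sample = '{"count": 1129757827, "_shards": {"total": 53, "successful": 53, "failed": 0}}'
    print(summarize_count(sample))
```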

Configuration file

/usr/local/elasticsearch-5.1.1/config/elasticsearch.yml

The elasticdump tool can back up and migrate data by reindexing it:

1. index -> index (in practice the source and target are different clusters or index names):
elasticdump --input=http://localhost:9200/test --output=http://localhost:9200/test

2. index -> file:
elasticdump --input=http://localhost:9200/test --output=./json.log

Example:
[root@b37 bin]# /maixun/tools/node-v0.10.32-linux-x64/bin/elasticdump --input=http://192.168.241.37:9200/community2 \
--output=$ | gzip > /disk2/backup_es_18.4.8/community2.json.gz

3. file -> index:
elasticdump --input=./json.log --output=http://localhost:9200/test
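
The three directions differ only in whether each side is an Elasticsearch URL or a file. As a rough sketch, the helper below builds the argument list for the `--input` and `--output` options used above and labels the direction; the function names are invented here, and treating anything that is not an http(s) URL (including `$` for stdout) as a file is an assumption of this example.

```python
import shlex
from typing import List


def is_es_url(location: str) -> bool:
    """Assume a location is an index when it is an http(s) URL."""
    return location.startswith(("http://", "https://"))


def transfer_mode(source: str, target: str) -> str:
    """Label the direction, e.g. 'index -> file'."""
    kinds = {True: "index", False: "file"}
    return f"{kinds[is_es_url(source)]} -> {kinds[is_es_url(target)]}"


def elasticdump_command(source: str, target: str) -> List[str]:
    """Build the argv for elasticdump; this does not run anything."""
    return ["elasticdump", f"--input={source}", f"--output={target}"]


if __name__ == "__main__":
    src, dst = "http://localhost:9200/test", "./json.log"
    argv = elasticdump_command(src, dst)
    print(transfer_mode(src, dst))
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The list returned by `elasticdump_command` could be handed to `subprocess.run` on a host where elasticdump is installed.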

Check cluster health

[root@b46 community2]# curl -XGET http://192.168.241.46:9200/_cluster/health?pretty
{
"cluster_name" : "elasticsearch",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 9,
"number_of_data_nodes" : 6,
"active_primary_shards" : 53,
"active_shards" : 71,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}

status

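green: all primary and replica shards are running normally
yellow: all primary shards are running normally, but at least one replica shard is not
red: at least one primary shard is not running normally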
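
For monitoring scripts, the status can be mapped to its meaning directly from the health response. This is a minimal sketch that only reads fields present in the output above; `describe_health` and the wording of the messages are this example's own, and the replica count is derived as `active_shards` minus `active_primary_shards`.

```python
import json

STATUS_MEANING = {
    "green": "all primary and replica shards are running",
    "yellow": "all primaries are running, at least one replica is not",
    "red": "at least one primary shard is not running",
}


def describe_health(body: str) -> str:
    """Summarize a _cluster/health response body in one line."""
    health = json.loads(body)
    status = health["status"]
    meaning = STATUS_MEANING.get(status, "unknown status")
    primaries = health["active_primary_shards"]
    replicas = health["active_shards"] - primaries
    unassigned = health["unassigned_shards"]
    return (
        f"{status}: {meaning} "
        f"({primaries} primaries, {replicas} replica copies, {unassigned} unassigned)"
    )


if __name__ == "__main__":
    # Relevant fields copied from the transcript above.
    sample = json.dumps({
        "status": "green",
        "active_primary_shards": 53,
        "active_shards": 71,
        "unassigned_shards": 0,
    })
    print(describe_health(sample))
```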

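Document metadata
A document consists not only of its data but also of metadata. The three required metadata elements are:
_index: where the document is stored
_type: the class of object the document represents
_id: the unique identifier of the document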

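GET: retrieve
PUT: insert (create or replace the document at the given _id)
POST: update (or create a document with an auto-generated _id)
DELETE: delete
-H: set a request header (for example Content-Type)
-d: set the request body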

Insert a document

[root@linux-node3 index]# curl -XPUT 'http://192.168.56.13:9200/bbs/1/index' -d '{

"user": "xu",
"post_date": "2018-02-08",
"message": "this is test"
}'
{"_index":"bbs","_type":"1","_id":"index","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}
[root@linux-node3 index]# echo $?
0

/bbs/1/index
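The three path segments map to _index, _type, and _id. In the ES versions shown here, every document is addressed by exactly these three segments, no more and no fewer.
_id can be a number or a string. The JSON document to insert follows -d.
-XPUT inserts one document, which ES calls indexing a document. The response contains a _version field holding the document's current version number; each time a document is PUT to the same path again, the version is incremented by one.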
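
The three-segment addressing described above can be sketched as a pair of helpers that build and parse such paths. `DocRef`, `doc_path`, and `parse_doc_path` are names made up for this illustration; percent-encoding each segment is an assumption about how one would handle ids with special characters, not something the notes above state.

```python
from dataclasses import dataclass
from urllib.parse import quote, unquote


@dataclass(frozen=True)
class DocRef:
    index: str     # _index
    doc_type: str  # _type
    doc_id: str    # _id


def doc_path(ref: DocRef) -> str:
    """Build /_index/_type/_id, percent-encoding each segment."""
    parts = (ref.index, ref.doc_type, ref.doc_id)
    return "/" + "/".join(quote(part, safe="") for part in parts)


def parse_doc_path(path: str) -> DocRef:
    """Split a path into its three segments; reject any other depth."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected /_index/_type/_id, got {path!r}")
    return DocRef(*(unquote(part) for part in parts))


if __name__ == "__main__":
    ref = DocRef("bbs", "1", "index")
    print(doc_path(ref))
    print(parse_doc_path("/bbs/1/index"))
```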

Delete a document

[root@linux-node3 indices]# curl -XDELETE 'http://localhost:9200/blogs/0/index'
{"found":true,"_index":"blogs","_type":"0","_id":"index","_version":2,"_shards":{"total":2,"successful":1,"failed":0}}
[root@linux-node3 indices]#

Query a document

[root@linux-node3 indices]# curl -XGET 'http://localhost:9200/bbs/1/index'
{"_index":"bbs","_type":"1","_id":"index","_version":1,"found":true,"_source":{
"user": "xu",
"post_date": "2018-02-08",
"message": "this is test"
}}[root@linux-node3 indices]#

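The _source field holds the stored document. When querying, _type can be set to _all, in which case ES returns the first document matching the _id across all types under the _index.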

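Delete data matching a condition

Note: the DELETE .../_query endpoint used here is the ES 1.x form; ES 5.x provides POST /<index>/_delete_by_query instead.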

[root@linux-node3 bin]# curl -XGET 'http://192.168.56.13:9200/bbs/1/index/_source?pretty'
{
"user" : "liu",
"post_date" : "2017-02-08",
"message" : "test"
}

[root@linux-node3 bin]# curl -XDELETE 'http://localhost:9200/bbs/1/_query?pretty' -d '{

"query":{
"match" :{
"message" : "test"
}
}
}'
{
"took" : 204,
"timed_out" : false,
"_indices" : {
"_all" : {
"found" : 1,
"deleted" : 1,
"missing" : 0,
"failed" : 0
},
"bbs" : {
"found" : 1,
"deleted" : 1,
"missing" : 0,
"failed" : 0
}
},
"failures" : [ ]
}

[root@linux-node3 bin]# curl -XGET 'http://192.168.56.13:9200/bbs/1/index/_source?pretty'

[root@linux-node3 bin]#

Query data from before 2017-01-01

curl -XGET http://192.168.241.36:9200/_search?pretty -d '{
"query" : {
"range" : {
"post_time" : {
"lt" : "2017-01-01 00:00:00"
}
}
}
}'
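
When the cutoff changes from run to run, the request body can be generated instead of typed by hand. The sketch below produces the same JSON structure as the command above; `before_query` is a hypothetical helper, and the `%Y-%m-%d %H:%M:%S` format is assumed to match the date format of the `post_time` field's mapping.

```python
import json
from datetime import datetime


def before_query(field: str, cutoff: datetime) -> str:
    """Return a search body selecting documents with field < cutoff."""
    body = {
        "query": {
            "range": {
                # Assumed to match the field's date format in the mapping.
                field: {"lt": cutoff.strftime("%Y-%m-%d %H:%M:%S")}
            }
        }
    }
    return json.dumps(body)


if __name__ == "__main__":
    # The string printed here can be passed to curl with -d.
    print(before_query("post_time", datetime(2017, 1, 1)))
```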

Query types

1. match_all
{"match_all":{}}
Matches all documents. It is often used in combination with a filter.

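2. match
Generally used for full-text search: the query string is first run through the analyzer and then searched. When the field is numeric, a date, a boolean, or a not_analyzed string, the query string is not analyzed and the match is exact. Two simple examples: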

{ "match": { "tweet": "About Search" }}
{ "match": { "age": 26 }}
3. term
term is for exact lookups on numbers, dates, booleans, or not_analyzed strings. The query string is not analyzed; the value is matched exactly.

{ "term": { "date": "2014-09-01" }}
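
The difference between match and term comes down to whether the query string is analyzed before lookup. The toy model below is a sketch, not Elasticsearch's implementation: `analyze` is a stand-in analyzer that lowercases and splits on non-alphanumeric characters, the field value "All About Search" is made up, and match is modeled with "any token matches" semantics.

```python
import re
from typing import List, Set


def analyze(text: str) -> List[str]:
    """Toy analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def index_terms(value: str, analyzed: bool) -> Set[str]:
    """Terms stored for a field: tokens if analyzed, else the raw value."""
    return set(analyze(value)) if analyzed else {value}


def match_query(indexed: Set[str], query: str, analyzed: bool) -> bool:
    """match: analyze the query on analyzed fields; any token hit counts."""
    tokens = analyze(query) if analyzed else [query]
    return any(tok in indexed for tok in tokens)


def term_query(indexed: Set[str], query: str) -> bool:
    """term: look the query up verbatim, with no analysis."""
    return query in indexed


if __name__ == "__main__":
    tweet = index_terms("All About Search", analyzed=True)
    print(match_query(tweet, "About Search", analyzed=True))  # True
    print(term_query(tweet, "About Search"))                  # False
    print(term_query(tweet, "search"))                        # True
```

On an analyzed field the stored terms are the lowercase tokens, so a term lookup for the original phrase finds nothing, while the same lookup on a not_analyzed field would succeed.
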
4. terms
terms is like term but accepts multiple values; a document satisfies the query if it matches any one of them. Like term, terms performs exact matching.

{ "terms": { "tag": [ "search", "full_text", "nosql" ] }}
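Note that terms expresses a contains relationship, not an equals relationship: a document whose field holds additional values still matches.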
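
The contains semantics can be modeled with a set intersection. This is an illustrative sketch only; `terms_matches` and the sample tag lists are invented for the example.

```python
from typing import Iterable


def terms_matches(doc_values: Iterable[str], wanted: Iterable[str]) -> bool:
    """True if the document's values share at least one entry with wanted."""
    return not set(doc_values).isdisjoint(wanted)


if __name__ == "__main__":
    wanted = ["search", "full_text", "nosql"]
    # Extra tags on the document do not prevent a match.
    print(terms_matches(["search", "java"], wanted))  # True
    print(terms_matches(["java"], wanted))            # False
```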

5. range
Comparable to a range lookup in a database. A simple example:

{
"range": {
"age": {
"gte": 20,
"lt": 30
}
}
}
The operators are:

gt: greater than
gte: greater than or equal to
lt: less than
lte: less than or equal to
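
The four operators map one-to-one onto ordinary comparisons, and a value must satisfy every bound given. A minimal sketch, with `in_range` as a hypothetical helper that evaluates a range clause against a single value:

```python
import operator
from typing import Any, Dict

# Range operator name -> Python comparison.
OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value: Any, bounds: Dict[str, Any]) -> bool:
    """True if value satisfies every bound, e.g. {"gte": 20, "lt": 30}."""
    for name in bounds:
        if name not in OPS:
            raise ValueError(f"unknown range operator: {name}")
    return all(OPS[name](value, bound) for name, bound in bounds.items())


if __name__ == "__main__":
    age_range = {"gte": 20, "lt": 30}
    print(in_range(20, age_range))  # True: 20 >= 20 and 20 < 30
    print(in_range(30, age_range))  # False: 30 is not < 30
```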
