## Version Information

| CORE | VERSION | PORT | ORDER | OTHER |
| --- | --- | --- | --- | --- |
| Elasticsearch | 7.13.3 | 9200, 9300 | 1 | Search engine |
| Kibana | 7.13.3 | 5601 | 2 | Data visualization |
| Canal-Admin | 1.1.15 | 8489, 11110 (Admin), 11111 (tcp), 11112 (metric) | 3 | Canal admin UI |
| Canal-Server | 1.1.15 | | 4 | Canal data sync, MySQL to ES (version 1.1.16 has a bug, so we rolled back to the lower version) |
| Prometheus | 2.37.0 | 9090 | 5 | Metrics collection |
| Grafana | 9.0.4 | 3000 | 6 | Visualization |
| Node_exporter | 1.3.1 | 9100 | 7 | Host metrics collection |
P.S. The servers run Linux. Replace the IP addresses in this article to match your own environment. Every component here is deployed on a single machine, and most are started as systemd services. This is only a quick record of the deployment process, for reference only; do not copy it blindly.
## Preparation

### Required packages

```bash
yum -y install nano tar net-tools
```
### JDK

Version 1.8 or later is required.

```bash
[root@eck ~]# java -version
openjdk version "1.8.0_332"
OpenJDK Runtime Environment (build 1.8.0_332-b09)
OpenJDK 64-Bit Server VM (build 25.332-b09, mixed mode)
```
If it is not installed, refer to the following (replace the JAVA_HOME part with your own path):

```bash
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.342.b07-1.el7_9.x86_64/jre
PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib
export JAVA_HOME CLASSPATH PATH
```
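These exports usually go into `/etc/profile` (an assumption; adjust to wherever your distribution keeps global shell configuration). A quick way to apply and verify them:

```bash
source /etc/profile
java -version   # should print the OpenJDK 1.8 banner shown above
```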
### Disable the firewall

```bash
[root@eck] systemctl disable firewalld
[root@eck] systemctl stop firewalld
[root@eck] systemctl status firewalld
```
If disabling the firewall is not allowed, open the required ports manually:

```bash
[root@eck] firewall-cmd --zone=public --add-port=9200/tcp --permanent
[root@eck] firewall-cmd --zone=public --add-port=9300/tcp --permanent
[root@eck] firewall-cmd --zone=public --add-port=5601/tcp --permanent
[root@eck] firewall-cmd --reload   # permanent rules only take effect after a reload
```
On a cloud server you also need to open these ports in the security group on the provider's console.
## Linux tuning

### Memory map limit

The mmap count available to the elasticsearch user is too small; at least 262144 is required.

```bash
[root@eck] nano /etc/sysctl.conf
```

Set the value to 262144 or higher:

```
vm.max_map_count=262144
```

Apply immediately:

```bash
[root@eck] sysctl -p
vm.max_map_count = 262144
```
### Maximum file descriptors

The file descriptor limit for the elasticsearch user is too small; at least 65535 is required.

```bash
[root@eck] nano /etc/security/limits.conf
```

Set the values to at least the following:

```
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
```
Log out and log back in, then check whether the changes took effect.
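A quick check after re-login (a sketch; `ulimit` reports the limits of the current shell session):

```bash
ulimit -Sn   # soft nofile, expect 65536
ulimit -Hn   # hard nofile, expect 131072
ulimit -Su   # soft nproc, expect 2048
```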
### Port conflict check

```bash
# Elasticsearch
[root@eck ~] netstat -an | grep :9200
[root@eck ~] netstat -an | grep :9300
# Kibana
[root@eck ~] netstat -an | grep :5601
```
## Elasticsearch

### Download

Latest version: Download Elasticsearch | Elastic

Tested version: Elasticsearch 7.13.3 | Elastic
### Install

```bash
[root@eck] tar -zxvf elasticsearch-7.13.3-linux-x86_64.tar.gz -C /usr/local
```
### Pin the JDK

ES is tightly coupled to specific JDK versions, so we explicitly point ES at the JDK it ships with, rather than the one the system environment variables point to. If no JDK is installed on the system, ES uses its bundled JDK by default.

```bash
[root@eck] cd /usr/local
[root@eck] mv elasticsearch-7.13.3/ elasticsearch/
[root@eck] cd /usr/local/elasticsearch/bin
[root@eck] nano elasticsearch
```

Add the following:

```bash
# Point JAVA_HOME at the JDK bundled with ES
export JAVA_HOME=/usr/local/elasticsearch/jdk
export PATH=$JAVA_HOME/bin:$PATH

if [ -x "$JAVA_HOME/bin/java" ]; then
    JAVA="/usr/local/elasticsearch/jdk/bin/java"
else
    JAVA=`which java`
fi
```

Save and exit (`Esc`, then `:wq`).
### Adjust the heap size

Adjust according to your server's resources.

```bash
[root@eck] nano /usr/local/elasticsearch/config/jvm.options
```

Add the heap settings (see the sketch below). The official recommendation is half of system memory: the other half must be left to the OS for the Lucene file cache, and if you give all memory to Elasticsearch and leave nothing for Lucene, full-text search performance will suffer badly. Do not exceed 32 GB on 64-bit systems (less on 32-bit): the JVM relies on compressed object pointers, without which object pointers consume far more memory, and the exact boundary differs between JDK versions, hence the recommendation to stay under 32 GB.
If you have a machine with 128 GB or more, consider running two nodes on it.
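For example, on a host with 8 GB of RAM the jvm.options entries could look like this (an illustrative value, not a recommendation for your hardware):

```
-Xms4g
-Xmx4g
```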
### Create a user

P.S. ES cannot be started as root.

```bash
[root@eck] useradd -s /sbin/nologin -M elastic
```

Assign ownership:

```bash
[root@eck] chown elastic:elastic -R /usr/local/elasticsearch
```
### Relocate logs and data

Create the following directories. P.S. Put them on your mounted data disk, otherwise you will have to expand the volume later when space runs out.

```bash
[root@eck] mkdir -vp /home/elastic/elasticsearch/data
[root@eck] mkdir -vp /home/elastic/elasticsearch/logs
[root@eck] chown elastic:elastic -R /home/elastic/elasticsearch/data
[root@eck] chown elastic:elastic -R /home/elastic/elasticsearch/logs
```

Edit the ES config file:

```bash
[root@eck] nano /usr/local/elasticsearch/config/elasticsearch.yml
```

Add the following:

```yaml
path.data: /home/elastic/elasticsearch/data
path.logs: /home/elastic/elasticsearch/logs
network.host: 192.168.0.88
cluster.name: elasticsearch
node.name: es-node0
cluster.initial_master_nodes: ["es-node0"]
```
### Start

Run it as a systemd service:

```bash
[user-es@eck] nano /usr/lib/systemd/system/elasticsearch.service
```

```ini
[Unit]
Description=elasticsearch
After=network.target

[Service]
Type=forking
User=elastic
Restart=on-failure
RestartSec=15s
ExecStart=/usr/local/elasticsearch/bin/elasticsearch -d
PrivateTmp=true
LimitNOFILE=65535
LimitNPROC=65535
LimitAS=infinity
LimitFSIZE=infinity
TimeoutStopSec=0
KillSignal=SIGTERM
KillMode=process
SendSIGKILL=no
SuccessExitStatus=143

[Install]
WantedBy=multi-user.target
```

```bash
[root@eck elasticsearch] systemctl start elasticsearch
[root@eck elasticsearch] systemctl status elasticsearch
● elasticsearch.service - elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
   Active: active (running) since 二 2022-08-02 17:02:05 CST; 4s ago
  Process: 9969 ExecStart=/usr/local/elasticsearch/bin/elasticsearch -d (code=exited, status=0/SUCCESS)
 Main PID: 10146 (java)
   CGroup: /system.slice/elasticsearch.service
           └─10146 /usr/local/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8...

8月 02 17:02:03 eck systemd[1]: Starting elasticsearch...
8月 02 17:02:05 eck systemd[1]: Started elasticsearch.
[root@eck elasticsearch] systemctl enable elasticsearch
Created symlink from /etc/systemd/system/multi-user.target.wants/elasticsearch.service to /usr/lib/systemd/system/elasticsearch.service.
```
### Verify startup

Open `http://ip:port` in a browser, e.g. http://192.168.1.48:9200/.
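On a headless server the same check works with curl (adjust the address to your host):

```bash
curl http://192.168.1.48:9200/
```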
[Optional] External access requires Nginx. nginx must listen on this port, upgrade the connection to WebSocket, raise the maximum file descriptors, and tune the keep-alive settings.

Generally, ES should not be exposed to the public internet at all, to keep bots from scanning it for vulnerabilities.

```nginx
server {
    listen 9200;
    server_name elastic;
    open_file_cache max=65535 inactive=20s;
    client_max_body_size 256m;
    client_header_buffer_size 32k;
    large_client_header_buffers 4 32k;

    location / {
        proxy_pass http://127.0.0.1:9200;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Real-IP $remote_addr;
    }

    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root html;
    }
}
```
### Set passwords

Since 7.x the security features are free, and the X-Pack plugin is bundled into the open-source Elasticsearch distribution. ES must already be running before you set the passwords.

```bash
[root@eck] nano /usr/local/elasticsearch/config/elasticsearch.yml
```

Append the following:

```yaml
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
```

Restart ES:

```bash
[root@eck] systemctl restart elasticsearch
```

Set the passwords:

```bash
[user-es@eck] sh /usr/local/elasticsearch/bin/elasticsearch-setup-passwords interactive
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,kibana_system,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y

Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana_system]:
Reenter password for [kibana_system]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:
Changed password for user [apm_system]
Changed password for user [kibana_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]
Changed password for user [elastic]
```
Log in again to verify the password: open http://192.168.1.48:9200/ and enter the username `elastic` and your password. If the page loads normally, it worked.
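Or from the shell (the password is a placeholder):

```bash
curl -u elastic:yourpassword http://192.168.1.48:9200/
```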
```json
{
  "name" : "es-node0",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "IIaI-dwFS1Oa7dSEyNb2xA",
  "version" : {
    "number" : "7.13.3",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "5d21bea28db1e89ecc1f66311ebdec9dc3aa7d64",
    "build_date" : "2021-07-02T12:06:10.804015202Z",
    "build_snapshot" : false,
    "lucene_version" : "8.8.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
```
## Backup

### Create a repository

```bash
mkdir /home/elastic/backup
mkdir /home/elastic/backup/esbackup
mkdir /home/elastic/backup/stream-backup
chown -R elastic:elastic /home/elastic/backup
nano /usr/local/elasticsearch/config/elasticsearch.yml
```

```yaml
path:
  repo:
    - /home/elastic/backup/esbackup
    - /home/elastic/backup/stream-backup
```
```
PUT /_snapshot/my_fs_backup
{
  "type": "fs",
  "settings": {
    "location": "/home/elastic/backup/esbackup/My_fs_backup_location",
    "compress": "true"
  }
}
```

`compress`: enables compression.
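The same request can be sent with curl instead of the Kibana Dev Tools console (a sketch; the password and host are placeholders):

```bash
curl -u elastic:yourpassword -X PUT "http://192.168.1.48:9200/_snapshot/my_fs_backup" \
  -H 'Content-Type: application/json' \
  -d '{"type":"fs","settings":{"location":"/home/elastic/backup/esbackup/My_fs_backup_location","compress":"true"}}'
```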
### List repositories

```
GET /_snapshot/_all
{
  "my_fs_backup" : {
    "type" : "fs",
    "settings" : {
      "compress" : "true",
      "location" : "/home/elastic/backup/esbackup/My_fs_backup_location"
    }
  }
}
```
### Verify

Use `verify` to confirm the repository is active on all nodes:

```
POST /_snapshot/my_fs_backup/_verify
{
  "nodes" : {
    "foxFx1TVQmiP3X4C5OgOsg" : {
      "name" : "es-node0"
    }
  }
}
```
### Delete a repository

```
DELETE /_snapshot/my_fs_backup
```
### Create a snapshot

A repository can hold multiple snapshots, and each snapshot name must be unique within the cluster. A snapshot contains only the data present when the snapshot starts; anything written afterwards has to be captured by further incremental snapshots. Create one with a PUT request; by default it backs up all readable indices and data streams, or you can pass parameters to back up only part of them.

```
PUT /_snapshot/my_fs_backup/snapshot_1?wait_for_completion=true
{
  "indices": "hundredsman,index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "Leopold",
    "taken_because": "Leopold init backup"
  }
}
```
### Delete snapshots

```
DELETE /_snapshot/my_fs_backup/snapshot_1
# Several snapshots can be deleted at once, comma-separated or by wildcard
DELETE /_snapshot/my_fs_backup/snapshot_2,snapshot_3
DELETE /_snapshot/my_fs_backup/snap*
```

If a snapshot is still being created, Elasticsearch aborts the task and removes all data belonging to that snapshot. Note that you must never delete backup files from the repository by hand: doing so risks corrupting the data.
### Monitor progress

```
# Status of the currently running snapshot
GET /_snapshot/my_fs_backup/_current

# Inspect specific snapshots
GET /_snapshot/my_fs_backup/snapshot_1
GET /_snapshot/my_fs_backup/snapshot_*

# List all repositories (if you created several)
GET /_snapshot/_all
GET /_snapshot/my_fs_backup,my_hdfs_backup
GET /_snapshot/my*

# Detailed progress of a single snapshot
GET /_snapshot/my_fs_backup/snapshot_1/_status
```

For example:

```
# Inspect a specific snapshot
GET /_snapshot/my_fs_backup/snapshot_1
{
  "snapshots" : [
    {
      "snapshot" : "snapshot_1",
      "uuid" : "9gXttlxpS3ao7N3IZdVr9A",
      "version_id" : 7130399,
      "version" : "7.13.3",
      "indices" : [
        ".kibana_security_session_1",
        ".tasks",
        "pt-platform-logs-2022-08-10",
        ".kibana-event-log-7.13.3-000001",
        ".apm-custom-link",
        ".async-search",
        ".ds-ilm-history-5-2022.08.02-000001",
        ".security-7",
        ".kibana_7.13.3_001",
        ".apm-agent-configuration",
        "i_platform_medical_record_result",
        "i_platform_medical_record",
        ".kibana_task_manager_7.13.3_001"
      ],
      "data_streams" : [ "ilm-history-5" ],
      "include_global_state" : true,
      "state" : "SUCCESS",
      "start_time" : "2022-08-15T02:17:33.167Z",
      "start_time_in_millis" : 1660529853167,
      "end_time" : "2022-08-15T02:21:57.109Z",
      "end_time_in_millis" : 1660530117109,
      "duration_in_millis" : 263942,
      "failures" : [ ],
      "shards" : {
        "total" : 17,
        "failed" : 0,
        "successful" : 17
      },
      "feature_states" : [
        { "feature_name" : "security", "indices" : [ ".security-7" ] },
        { "feature_name" : "async_search", "indices" : [ ".async-search" ] },
        { "feature_name" : "kibana", "indices" : [ ".kibana_task_manager_7.13.3_001", ".kibana_7.13.3_001", ".kibana_security_session_1", ".apm-agent-configuration", ".apm-custom-link" ] },
        { "feature_name" : "tasks", "indices" : [ ".tasks" ] }
      ]
    }
  ]
}
```
### Restore

```
# Without parameters, all indices and streams in the snapshot are restored
POST /_snapshot/my_fs_backup/snapshot_1/_restore

# To restore specific indices or streams, pass them in the body
POST /_snapshot/my_fs_backup/snapshot_1/_restore
{
  "indices": "index*",
  "ignore_unavailable": true,
  # include_global_state defaults to true and controls restoring cluster-wide state
  "include_global_state": false,
  # Pattern matching the source index names, e.g. index_1
  "rename_pattern": "index_(.+)",
  # Replacement producing the new names, e.g. re_index_1
  "rename_replacement": "re_index_$1",
  "include_aliases": false
}

# If an index with the same name already exists, the restore fails and asks you to rename:
{
  "error": {
    "root_cause": [
      {
        "type": "snapshot_restore_exception",
        "reason": "[my_fs_backup:snapshot_1/90A9o4hORUCv732HTQBfRQ] cannot restore index [index_1] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
      }
    ]
  },
  "status": 500
}
```
### Monitor the restore

Once a restore starts, the cluster state turns `yellow`, because the restore is rebuilding the indexes' primary shards. After the primaries finish, Elasticsearch recreates the replicas according to each index's replica settings, and only when all of that completes does the cluster return to `green`. You can also set an index's replica count to 0 first and raise it to the target value after the primaries finish. Restore progress can be monitored through the cluster-level or per-index `Recovery` status.
```
# Cluster-wide recovery status; see the cluster recovery API for more: https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-recovery.html
GET /_cat/recovery/
#! this request accesses system indices: [.apm-agent-configuration, .apm-custom-link, .async-search, .kibana_7.13.3_001, .kibana_task_manager_7.13.3_001, .security-7, .tasks], but in a future major version, direct access to system indices will be prevented by default
.kibana_7.13.3_001               0 452ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 30  0 0 100.0% 2267250    0 0 100.0%
.security-7                      0 423ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 38  0 0 100.0% 195699     0 0 100.0%
.apm-custom-link                 0 39ms  existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 1   0 0 100.0% 208        0 0 100.0%
.kibana-event-log-7.13.3-000001  0 76ms  existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 25  0 0 100.0% 24651      0 0 100.0%
.apm-agent-configuration         0 76ms  existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 1   0 0 100.0% 208        0 0 100.0%
pt-platform-logs-2022-08-10      0 147ms snapshot       done n/a n/a 192.168.0.88 es-node0 my_fs_backup snapshot_1 10 10 100.0% 10  27456 27456 100.0% 27456 0 0 100.0%
.async-search                    0 96ms  existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 4   0 0 100.0% 3481       0 0 100.0%
i_platform_medical_record_result 0 644ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 82  0 0 100.0% 326927787  0 0 100.0%
i_platform_medical_record_result 1 313ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 120 0 0 100.0% 331827725  0 0 100.0%
i_platform_medical_record_result 2 273ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 102 0 0 100.0% 328299966  0 0 100.0%
i_platform_medical_record        0 2.2s  existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 147 0 0 100.0% 3344435623 0 0 100.0%
i_platform_medical_record        1 2.2s  existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 141 0 0 100.0% 3333267188 0 0 100.0%
i_platform_medical_record        2 2.2s  existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 138 0 0 100.0% 3328457583 0 0 100.0%
.kibana_task_manager_7.13.3_001  0 108ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 71  0 0 100.0% 132821     0 0 100.0%
.tasks                           0 346ms existing_store done n/a n/a 192.168.0.88 es-node0 n/a          n/a        0  0  100.0% 24  0 0 100.0% 27283      0 0 100.0%
```

```
# Per-index recovery status; see the index recovery API for more: https:
GET /pt-platform-logs*/_recovery
{
  "pt-platform-logs-2022-08-10" : {
    "shards" : [
      {
        "id" : 0,
        "type" : "SNAPSHOT",
        "stage" : "DONE",
        "primary" : true,
        "start_time_in_millis" : 1660530572451,
        "stop_time_in_millis" : 1660530572598,
        "total_time_in_millis" : 147,
        "source" : {
          "repository" : "my_fs_backup",
          "snapshot" : "snapshot_1",
          "version" : "7.13.3",
          "index" : "pt-platform-logs-2022-08-10",
          "restoreUUID" : "KZA2lOpRRH2_rKHNPPEUvg"
        },
        "target" : {
          "id" : "foxFx1TVQmiP3X4C5OgOsg",
          "host" : "192.168.0.88",
          "transport_address" : "192.168.0.88:9300",
          "ip" : "192.168.0.88",
          "name" : "es-node0"
        },
        "index" : {
          "size" : {
            "total_in_bytes" : 27456,
            "reused_in_bytes" : 0,
            "recovered_in_bytes" : 27456,
            "percent" : "100.0%"
          },
          "files" : {
            "total" : 10,
            "reused" : 0,
            "recovered" : 10,
            "percent" : "100.0%"
          },
          "total_time_in_millis" : 96,
          "source_throttle_time_in_millis" : 0,
          "target_throttle_time_in_millis" : 0
        },
        "translog" : {
          "recovered" : 0,
          "total" : 0,
          "percent" : "100.0%",
          "total_on_start" : 0,
          "total_time_in_millis" : 37
        },
        "verify_index" : {
          "check_index_time_in_millis" : 0,
          "total_time_in_millis" : 0
        }
      }
    ]
  }
}
```
## Other

### Stop ES

```bash
[root@eck] systemctl stop elasticsearch
```

### Ports

9300: TCP transport port, used for communication between ES cluster nodes.

9200: RESTful API over HTTP.

### View ES logs

```bash
[user-es@eck] tail -f -n 300 /home/elastic/elasticsearch/logs/elasticsearch.log
```
### Disable swap

Swapping memory to disk is lethal for server performance: once pages are swapped out, an operation of 100 microseconds can become 10 milliseconds, and when you imagine all those delays adding up, it is easy to see how terrible swapping is for performance.

The best option is to disable swapping completely at the OS level; it can be turned off temporarily with `swapoff`. To disable it permanently you will likely need to edit /etc/fstab, per your OS documentation.

If disabling swap entirely is not feasible, lower the swappiness value, which controls how eagerly the OS swaps memory. This prevents swapping under normal conditions while still allowing the OS to swap in an emergency. On most Linux systems it is configured through sysctl. Setting swappiness to 1 is better than 0, because on some kernel versions swappiness=0 can trigger the OOM killer.
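A minimal sketch of both approaches (assuming root; a permanent full disable still needs the /etc/fstab edit mentioned above):

```bash
# Temporarily disable all swap devices
swapoff -a

# Or keep swap but make it a last resort
echo "vm.swappiness = 1" >> /etc/sysctl.conf
sysctl -p
```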
Finally, if none of the above is possible, turn on the mlockall switch in the config file, which allows the JVM to lock its memory and forbids the OS from swapping it out. In elasticsearch.yml (this setting seems to throw an error here: in later versions it was renamed `bootstrap.memory_lock`):

```yaml
bootstrap.mlockall: true
```
## Migration

On the same machine, just change the file paths and copy the files over.
## Kibana

### Download

Latest version: Download Kibana Free | Get Started Now | Elastic

Tested version: Kibana 7.13.3 | Elastic

Note: the Elasticsearch and Kibana version numbers must match exactly.
### Install

```bash
[root@eck] cd /root/zip
[root@eck] tar -zxvf kibana-7.13.3-linux-x86_64.tar.gz -C /usr/local
[root@eck] mv /usr/local/kibana-7.13.3-linux-x86_64/ /usr/local/kibana/
[root@eck] mkdir -p /var/log/kibana/
[root@eck] chown -R kibana:kibana /var/log/kibana/
```
### Configure

```bash
[root@eck] nano /usr/local/kibana/config/kibana.yml
```

Add the following:

```yaml
server.name: kibana
server.host: "0.0.0.0"
server.port: 5601
elasticsearch.hosts: ["http://127.0.0.1:9200"]
monitoring.ui.container.elasticsearch.enabled: true
i18n.locale: "zh-CN"
elasticsearch.username: "elastic"
elasticsearch.password: "test123!@#"
logging.dest: /var/log/kibana/kibana.log
```
If memory is tight (Kibana typically uses around 1.4 GB on 64-bit), you can shrink the Node.js old generation by adding `--max-old-space-size=200` to the NODE_OPTIONS used by the kibana launcher; adjust the number as needed and restart.

```bash
[root@eck] nano /usr/local/kibana/config/node.options
```

Add the following:

```
NODE_OPTIONS="--no-warnings --max-http-header-size=65536 ${NODE_OPTIONS} --max-old-space-size=200"
```
### Create a user

Kibana cannot be started as root. Create the user:

```bash
[root@eck] useradd -s /sbin/nologin -M kibana
```

Assign ownership:

```bash
[root@eck] chown kibana:kibana -R /usr/local/kibana
[root@eck] chown kibana:kibana -R /var/log/kibana
```
### Start

```bash
[root@eck] nano /usr/lib/systemd/system/kibana.service
```

```ini
[Unit]
Description=kibana
After=network.target

[Service]
Type=simple
User=kibana
Restart=on-failure
RestartSec=15s
ExecStart=/usr/local/kibana/bin/kibana
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```

```bash
systemctl start kibana
systemctl status kibana
systemctl enable kibana
```
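A quick smoke test once the service is up (a sketch; with security enabled, the status API presumably wants the same credentials as the UI):

```bash
curl -s -u elastic:yourpassword http://127.0.0.1:5601/api/status | head -c 200
```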
### Other

Stop:

```bash
systemctl stop kibana
```
## Canal-Admin

### Create a user

```bash
useradd -s /sbin/nologin -M canal
```

Create the directories:

```bash
mkdir /usr/local/canal
mkdir /usr/local/canal/admin
mkdir /usr/local/canal/server
```
### Configure environment variables

```bash
mkdir /home/canal
nano /home/canal/.bashrc
```

Append the following (adjust to the actual location):

```bash
export CANAL_HOME=/usr/local/canal/server
export PATH=$PATH:$CANAL_HOME/bin
```
### Install

Extract:

```bash
tar -zxvf canal.admin-1.1.6.tar.gz -C /usr/local/canal/admin
```

Assign ownership:

```bash
chown canal:canal -R /usr/local/canal/admin
```

Edit the config file:

```bash
nano /usr/local/canal/admin/conf/application.yml
```

Adjust it as needed:

```yaml
server:
  port: 8489
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8

spring.datasource:
  address: 192.168.0.89:3306
  database: canal_manager
  username: canal
  password: canal
  driver-class-name: com.mysql.cj.jdbc.Driver
  url: jdbc:mysql://${spring.datasource.address}/${spring.datasource.database}?useUnicode=true&characterEncoding=UTF-8&useSSL=false&serverTimezone=Asia/Shanghai
  hikari:
    maximum-pool-size: 30
    minimum-idle: 1

canal:
  adminUser: admin
  adminPasswd: admin
```
Upload the matching MySQL driver jar `mysql-connector-java-8.0.22.jar` to the `/usr/local/canal/admin/lib` directory and delete `mysql-connector-java-5.1.48.jar`.
### MySQL

Switch MySQL to ROW binlog mode in my.cnf:

```ini
[mysqld]
log-bin=mysql-bin   # adding this line enables the binlog
binlog-format=ROW   # use ROW mode
server_id=1         # required for MySQL replication; must not clash with canal's slaveId
```

Restart, then check the change:

```sql
SQL> show variables like 'binlog_format%';
```
Create the database user:

```sql
SQL> CREATE USER canal IDENTIFIED BY 'canal';
SQL> GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
SQL> GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER ON *.* TO 'canal';
SQL> FLUSH PRIVILEGES;
ALTER USER 'canal'@'%' IDENTIFIED BY 'canal' PASSWORD EXPIRE NEVER;
ALTER USER 'canal'@'%' IDENTIFIED WITH mysql_native_password BY 'canal';
FLUSH PRIVILEGES;
SELECT DISTINCT CONCAT('User: ''', user, '''@''', host, ''';') AS query FROM mysql.user;
```
Run `/usr/local/canal/admin/conf/canal_manager.sql` against the database:

```sql
CREATE DATABASE /*!32312 IF NOT EXISTS*/ `canal_manager` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_bin */;

USE `canal_manager`;

SET NAMES utf8;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for canal_adapter_config
-- ----------------------------
DROP TABLE IF EXISTS `canal_adapter_config`;
CREATE TABLE `canal_adapter_config` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `category` varchar(45) NOT NULL,
  `name` varchar(45) NOT NULL,
  `status` varchar(45) DEFAULT NULL,
  `content` text NOT NULL,
  `modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for canal_cluster
-- ----------------------------
DROP TABLE IF EXISTS `canal_cluster`;
CREATE TABLE `canal_cluster` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `name` varchar(63) NOT NULL,
  `zk_hosts` varchar(255) NOT NULL,
  `modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for canal_config
-- ----------------------------
DROP TABLE IF EXISTS `canal_config`;
CREATE TABLE `canal_config` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `cluster_id` bigint(20) DEFAULT NULL,
  `server_id` bigint(20) DEFAULT NULL,
  `name` varchar(45) NOT NULL,
  `status` varchar(45) DEFAULT NULL,
  `content` text NOT NULL,
  `content_md5` varchar(128) NOT NULL,
  `modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `sid_UNIQUE` (`server_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for canal_instance_config
-- ----------------------------
DROP TABLE IF EXISTS `canal_instance_config`;
CREATE TABLE `canal_instance_config` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `cluster_id` bigint(20) DEFAULT NULL,
  `server_id` bigint(20) DEFAULT NULL,
  `name` varchar(45) NOT NULL,
  `status` varchar(45) DEFAULT NULL,
  `content` text NOT NULL,
  `content_md5` varchar(128) DEFAULT NULL,
  `modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for canal_node_server
-- ----------------------------
DROP TABLE IF EXISTS `canal_node_server`;
CREATE TABLE `canal_node_server` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `cluster_id` bigint(20) DEFAULT NULL,
  `name` varchar(63) NOT NULL,
  `ip` varchar(63) NOT NULL,
  `admin_port` int(11) DEFAULT NULL,
  `tcp_port` int(11) DEFAULT NULL,
  `metric_port` int(11) DEFAULT NULL,
  `status` varchar(45) DEFAULT NULL,
  `modified_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- ----------------------------
-- Table structure for canal_user
-- ----------------------------
DROP TABLE IF EXISTS `canal_user`;
CREATE TABLE `canal_user` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `username` varchar(31) NOT NULL,
  `password` varchar(128) NOT NULL,
  `name` varchar(31) NOT NULL,
  `roles` varchar(31) NOT NULL,
  `introduction` varchar(255) DEFAULT NULL,
  `avatar` varchar(255) DEFAULT NULL,
  `creation_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

SET FOREIGN_KEY_CHECKS = 1;

-- ----------------------------
-- Records of canal_user
-- ----------------------------
BEGIN;
INSERT INTO `canal_user` VALUES (1, 'admin', '6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9', 'Canal Manager', 'admin', NULL, NULL, '2019-07-14 00:05:28');
COMMIT;

SET FOREIGN_KEY_CHECKS = 1;
```
### Start

Check java:

```bash
which java
# If the JDK is not installed in the default location, add a symlink, e.g.:
ln -s /tar/jdk1.8.0_60/bin/java /usr/bin/java
```

```bash
nano /usr/lib/systemd/system/canal-admin.service
```

```ini
[Unit]
Description=canal-admin
After=network.target

[Service]
Type=forking
User=canal
Restart=on-failure
RestartSec=15s
ExecStart=/usr/local/canal/admin/bin/startup.sh
ExecStop=/usr/local/canal/admin/bin/stop.sh
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```

```bash
systemctl start canal-admin
systemctl status canal-admin
systemctl enable canal-admin
```
Open http://192.168.1.48:8489/.

Default username: admin

Default password: 123456
### Other

#### View logs

```bash
tail -f -n 300 /usr/local/canal/admin/logs/admin.log
```

#### Adjust memory usage

Canal defaults to 3 GB. To change it, edit `/usr/local/canal/admin/bin/startup.sh`:

```bash
if [ -n "$str" ]; then
    # JAVA_OPTS="-server -Xms1024m -Xmx1536m -Xmn512m -XX:SurvivorRatio=2 -XX:PermSize=96m -XX:MaxPermSize=256m -XX:MaxTenuringThreshold=15 -XX:+DisableExplicitGC $JAVA_OPTS"
    # For G1
    JAVA_OPTS="-server -Xms1g -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=250 -XX:+UseGCOverheadLimit -XX:+ExplicitGCInvokesConcurrent $JAVA_OPTS"
else
    JAVA_OPTS="-server -Xms1024m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:MaxPermSize=128m $JAVA_OPTS"
fi
```
## Canal-Server

### Install

Extract:

```bash
tar -zxvf canal.deployer-1.1.6.tar.gz -C /usr/local/canal/server/
```

Assign ownership:

```bash
chown canal:canal -R /usr/local/canal/server/
```
### Connect to Canal-Admin

Edit the config:

```bash
nano /usr/local/canal/server/conf/canal_local.properties
```

```properties
canal.register.ip =
canal.admin.manager = 192.168.1.48:8489
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
canal.admin.register.auto = true
canal.admin.register.cluster =
canal.admin.register.name =
```
### Start

```bash
nano /usr/lib/systemd/system/canal-server.service
```

```ini
[Unit]
Description=canal-server
After=network.target

[Service]
Type=forking
User=canal
Restart=on-failure
RestartSec=15s
ExecStart=/usr/local/canal/server/bin/startup.sh local
ExecStop=/usr/local/canal/server/bin/stop.sh
PrivateTmp=true

[Install]
WantedBy=multi-user.target
```

```bash
systemctl start canal-server
systemctl status canal-server
systemctl enable canal-server
```
### Other

#### View logs

```bash
tail -f -n 300 /usr/local/canal/server/logs/canal/canal.log
```

#### Adjust memory usage

Canal defaults to 3 GB. To change it, edit:

```bash
nano /usr/local/canal/server/bin/startup.sh
```

```bash
if [ -n "$str" ]; then
    JAVA_OPTS="-server -Xms1g -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=250 -XX:+UseGCOverheadLimit -XX:+ExplicitGCInvokesConcurrent $JAVA_OPTS"
else
    JAVA_OPTS="-server -Xms1024m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -XX:MaxPermSize=128m $JAVA_OPTS"
fi
```
## Canal configuration

The canal.properties used for this deployment:

```properties
canal.ip =
canal.register.ip =
canal.port = 11111
canal.metrics.pull.port = 11112
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
canal.zkServers =
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false
canal.serverMode = tcp
canal.file.data.dir = ${canal.conf.dir}
canal.file.flush.period = 1000
canal.instance.memory.buffer.size = 1048576
canal.instance.memory.buffer.memunit = 1024
canal.instance.memory.batch.mode = MEMSIZE
canal.instance.memory.rawEntry = true
canal.instance.detecting.enable = false
canal.instance.detecting.sql = select 1
canal.instance.detecting.interval.time = 3
canal.instance.detecting.retry.threshold = 3
canal.instance.detecting.heartbeatHaEnable = false
canal.instance.transaction.size = 1024
canal.instance.fallbackIntervalInSeconds = 60
canal.instance.network.receiveBufferSize = 16384
canal.instance.network.sendBufferSize = 16384
canal.instance.network.soTimeout = 30
canal.instance.filter.druid.ddl = true
canal.instance.filter.query.dcl = false
canal.instance.filter.query.dml = false
canal.instance.filter.query.ddl = false
canal.instance.filter.table.error = false
canal.instance.filter.rows = false
canal.instance.filter.transaction.entry = false
canal.instance.filter.dml.insert = false
canal.instance.filter.dml.update = false
canal.instance.filter.dml.delete = false
canal.instance.binlog.format = ROW,STATEMENT,MIXED
canal.instance.binlog.image = FULL,MINIMAL,NOBLOB
canal.instance.get.ddl.isolation = false
canal.instance.parser.parallel = true
canal.instance.parser.parallelBufferSize = 256
canal.instance.tsdb.enable = true
canal.instance.tsdb.dir = ${canal.file.data.dir:../conf}/${canal.instance.destination:}
canal.instance.tsdb.url = jdbc:h2:${canal.instance.tsdb.dir}/h2;CACHE_SIZE=1000;MODE=MYSQL;
canal.instance.tsdb.dbUsername = canal
canal.instance.tsdb.dbPassword = canal
canal.instance.tsdb.snapshot.interval = 24
canal.instance.tsdb.snapshot.expire = 360
canal.destinations = zhikong
canal.conf.dir = ../conf
canal.auto.scan = true
canal.auto.scan.interval = 5
canal.auto.reset.latest.pos.mode = false
canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
canal.instance.global.mode = manager
canal.instance.global.lazy = false
canal.instance.global.manager.address = ${canal.admin.manager}
canal.instance.global.spring.xml = classpath:spring/file-instance.xml
canal.aliyun.accessKey =
canal.aliyun.secretKey =
canal.aliyun.uid =
canal.mq.flatMessage = true
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
canal.mq.accessChannel = local
canal.mq.database.hash = true
canal.mq.send.thread.size = 30
canal.mq.build.thread.size = 8
kafka.bootstrap.servers = 127.0.0.1:6667
kafka.acks = all
kafka.compression.type = none
kafka.batch.size = 16384
kafka.linger.ms = 1
kafka.max.request.size = 1048576
kafka.buffer.memory = 33554432
kafka.max.in.flight.requests.per.connection = 1
kafka.retries = 0
kafka.kerberos.enable = false
kafka.kerberos.krb5.file = "../conf/kerberos/krb5.conf"
kafka.kerberos.jaas.file = "../conf/kerberos/jaas.conf"
rocketmq.producer.group = test
rocketmq.enable.message.trace = false
rocketmq.customized.trace.topic =
rocketmq.namespace =
rocketmq.namesrv.addr = 127.0.0.1:9876
rocketmq.retry.times.when.send.failed = 0
rocketmq.vip.channel.enabled = false
rocketmq.tag =
rabbitmq.host =
rabbitmq.virtual.host =
rabbitmq.exchange =
rabbitmq.username =
rabbitmq.password =
rabbitmq.deliveryMode =
```
## Prometheus

Prometheus can express powerful monitoring requirements through its math-based query model, natively supports Kubernetes service discovery, and can follow dynamic container changes. Combined with Grafana it draws beautiful dashboards, with alerting handled by Alertmanager or Grafana. Compared with other monitoring systems, its main advantages are:

- Data is stored as key/value pairs: simple and fast.
- Metric granularity is best-in-class, down to the second. (Precisely because the sampling resolution is so high, disk consumption is heavy and there is a performance ceiling; clustering is not supported, although federation allows scaling out.)
- No dependency on distributed storage: data is kept locally, so no extra database is needed. If you have strong requirements on historical data, pair it with OpenTSDB.
- A rich plugin ecosystem: unless your monitoring requirements are unusually strict, the stock exporters are enough.
- A mathematical computation model with a large library of functions that can express very complex monitoring logic (which is also why the learning curve is steep: the query language takes some mathematical thinking to pick up).
- It can be embedded into many open-source tools to collect metrics from the inside, making the data more trustworthy.

As the saying goes: your services may crash, but Prometheus must not. In the 2021 Bilibili outage, while most services were unreachable, Prometheus kept running and kept reporting system state, which is what made the precise diagnosis and postmortem possible afterwards. It is what keeps operations working during an outage, and the last line of defense when everything else is down.
### Install

Download | Prometheus

### Create a user

```bash
[root@eck] useradd -s /sbin/nologin -M prometheus
```

### Deploy

```bash
[root@eck zip]
[root@eck zip]
[root@eck zip]
[root@eck prometheus-2.37.0.linux-amd64]
[root@eck prometheus-2.37.0.linux-amd64]
[root@eck prometheus]
[root@eck prometheus]
[root@eck prometheus] nano /usr/local/prometheus/prometheus.yml
```
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  scrape_timeout: 30s

alerting:
  alertmanagers:
    - static_configs:
        - targets:

rule_files:

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["192.168.1.48:9090"]
  - job_name: "canal"
    static_configs:
      - targets: ["192.168.1.48:11112"]
```

If you need a different port, update `- targets: ["localhost:9090"]` above and start Prometheus with `./prometheus --web.listen-address="0.0.0.0:9080"` to match.
```bash
[root@eck prometheus]# /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
Checking prometheus.yml
 SUCCESS: prometheus.yml is valid prometheus config file syntax
[root@eck prometheus] chown prometheus:prometheus -R /usr/local/prometheus
[root@eck prometheus] nano /usr/lib/systemd/system/prometheus.service
```

```ini
[Unit]
Description=The Prometheus Server
After=network.target

[Service]
Restart=on-failure
RestartSec=15s
ExecStart=/usr/local/prometheus/prometheus --web.external-url=prometheus --config.file=/usr/local/prometheus/prometheus.yml --log.level "info"

[Install]
WantedBy=multi-user.target
```

```bash
[root@eck system] systemctl daemon-reload
[root@eck system] systemctl start prometheus
[root@eck system] systemctl status prometheus
● prometheus.service - The Prometheus Server
   Loaded: loaded (/usr/lib/systemd/system/prometheus.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-07-28 14:24:08 CST; 6s ago
 Main PID: 40225 (prometheus)
   CGroup: /system.slice/prometheus.service
           └─40225 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml

Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.139Z caller=head.go:536 level=info component=tsdb msg="On-disk memory mappable chunks replay completed" duration=221.285µs
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.139Z caller=head.go:542 level=info component=tsdb msg="Replaying WAL, this may take a while"
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.140Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.140Z caller=head.go:619 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=186.407µs wal_replay_duration=591.84…uration=1.071424ms
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.142Z caller=main.go:993 level=info fs_type=XFS_SUPER_MAGIC
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.142Z caller=main.go:996 level=info msg="TSDB started"
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.142Z caller=main.go:1177 level=info msg="Loading configuration file" filename=/usr/local/prometheus/prometheus.yml
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.144Z caller=main.go:1214 level=info msg="Completed loading of configuration file" filename=/usr/local/prometheus/prometheus.yml totalDuration=2.101397ms db_…µs
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.144Z caller=main.go:957 level=info msg="Server is ready to receive web requests."
Jul 28 14:24:09 eck prometheus[40225]: ts=2022-07-28T06:24:09.144Z caller=manager.go:941 level=info component="rule manager" msg="Starting rule manager..."
Hint: Some lines were ellipsized, use -l to show in full.
[root@eck system] systemctl enable prometheus
Created symlink from /etc/systemd/system/multi-user.target.wants/prometheus.service to /usr/lib/systemd/system/prometheus.service.
```
P.S. The startup flags of the new versions differ from the old ones; many parameters were reworked or removed, and a lot of configuration examples online target the old versions, so check them carefully.

### Stop

```bash
[root@eck prometheus] systemctl stop prometheus
```
## Grafana

### Install

Download Grafana | Grafana Labs

### Create a user

```bash
[root@eck] useradd -s /sbin/nologin -M grafana
```

### Deploy

```bash
[root@eck] mkdir /usr/local/grafana
[root@eck] tar zxvf grafana-enterprise-9.0.5.linux-amd64.tar.gz -C /usr/local/grafana/
[root@eck grafana-9.0.5]
[root@eck grafana-9.0.5]
[root@eck grafana-9.0.5]
[root@eck grafana]
[root@eck grafana]
[root@eck grafana]
[root@eck grafana]
```
Change the following parameters (in conf/defaults.ini):

```ini
[paths]
data = /usr/local/grafana/data
temp_data_lifetime = 24h
logs = /usr/local/grafana/log
plugins = /usr/local/grafana/plugins
provisioning = /usr/local/grafana/conf/provisioning
```
Add the following (as /usr/lib/systemd/system/grafana.service):

```ini
[Unit]
Description=Grafana
After=network.target

[Service]
User=grafana
Group=grafana
Type=notify
ExecStart=/usr/local/grafana/bin/grafana-server -homepath /usr/local/grafana
Restart=on-failure
RestartSec=15s

[Install]
WantedBy=multi-user.target
```
```bash
[root@eck grafana] systemctl daemon-reload
[root@eck grafana] systemctl start grafana
[root@eck grafana] systemctl status grafana
● grafana.service - Grafana
   Loaded: loaded (/usr/lib/systemd/system/grafana.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-07-28 14:04:55 CST; 4s ago
 Main PID: 33239 (grafana-server)
   CGroup: /system.slice/grafana.service
           └─33239 /usr/local/grafana/bin/grafana-server -homepath /usr/local/grafana

Jul 28 14:04:50 eck grafana-server[33239]: logger=query_data t=2022-07-28T14:04:50.010286641+08:00 level=info msg="Query Service initialization"
Jul 28 14:04:50 eck grafana-server[33239]: logger=live.push_http t=2022-07-28T14:04:50.034617786+08:00 level=info msg="Live Push Gateway initialization"
Jul 28 14:04:55 eck grafana-server[33239]: logger=ngalert t=2022-07-28T14:04:55.216538416+08:00 level=warn msg="failed to delete old am configs" org=1 err="database is locked"
Jul 28 14:04:55 eck grafana-server[33239]: logger=infra.usagestats.collector t=2022-07-28T14:04:55.313131685+08:00 level=info msg="registering usage stat providers" usageStatsProvidersLen=2
Jul 28 14:04:55 eck grafana-server[33239]: logger=report t=2022-07-28T14:04:55.314348363+08:00 level=warn msg="Scheduling and sending of reports disabled, SMTP is not configured and enabled. Configure SMTP to enable."
Jul 28 14:04:55 eck grafana-server[33239]: logger=grafanaStorageLogger t=2022-07-28T14:04:55.316123829+08:00 level=info msg="storage starting"
Jul 28 14:04:55 eck grafana-server[33239]: logger=ngalert t=2022-07-28T14:04:55.317403277+08:00 level=info msg="warming cache for startup"
Jul 28 14:04:55 eck grafana-server[33239]: logger=ngalert.multiorg.alertmanager t=2022-07-28T14:04:55.317616335+08:00 level=info msg="starting MultiOrg Alertmanager"
Jul 28 14:04:55 eck systemd[1]: Started Grafana.
Jul 28 14:04:55 eck grafana-server[33239]: logger=http.server t=2022-07-28T14:04:55.321708016+08:00 level=info msg="HTTP Server Listen" address=[::]:3000 protocol=http subUrl= socket=
[root@eck grafana] systemctl enable grafana
Created symlink from /etc/systemd/system/multi-user.target.wants/grafana.service to /usr/lib/systemd/system/grafana.service.
```
### Log in

Open 192.168.1.48:3000.

Username: admin

Password: admin
## Node_exporter

Prometheus aggregates the data that node_exporter collects on each server, and Grafana presents it visually, so node_exporter has to be installed on every target machine you want to monitor.

### Install

Releases · prometheus/node_exporter (github.com)

```bash
[root@eck zip]
[root@eck zip]
[root@eck zip]
```
```ini
[Unit]
Description=The node_exporter Server
After=network.target

[Service]
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
RestartSec=15s
SyslogIdentifier=node_exporter

[Install]
WantedBy=multi-user.target
```
```bash
[root@eck zip] systemctl daemon-reload
[root@eck zip] systemctl start node_exporter
[root@eck zip] systemctl status node_exporter
● node_exporter.service - The node_exporter Server
   Loaded: loaded (/usr/lib/systemd/system/node_exporter.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-07-28 15:01:19 CST; 3s ago
 Main PID: 53582 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─53582 /usr/local/node_exporter/node_exporter

Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=thermal_zone
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=time
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=timex
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=udp_queues
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=uname
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=vmstat
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=xfs
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.364Z caller=node_exporter.go:115 level=info collector=zfs
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.365Z caller=node_exporter.go:199 level=info msg="Listening on" address=:9100
Jul 28 15:01:19 eck node_exporter[53582]: ts=2022-07-28T07:01:19.367Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
[root@eck zip] systemctl enable node_exporter
Created symlink from /etc/systemd/system/multi-user.target.wants/node_exporter.service to /usr/lib/systemd/system/node_exporter.service.
```

Visit: 192.168.1.48:9100
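A quick check that metrics are being served (a sketch; adjust the address):

```bash
curl -s http://192.168.1.48:9100/metrics | head
```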
### Deploy

Add the new scrape job to /usr/local/prometheus/prometheus.yml:

```yaml
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

alerting:
  alertmanagers:
    - static_configs:
        - targets:

rule_files:

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["192.168.1.48:9090"]
  - job_name: "canal"
    static_configs:
      - targets: ["192.168.1.48:11112"]
  - job_name: "group1"
    static_configs:
      - targets: ["192.168.1.48:9100"]
```
```bash
[root@eck ~] /usr/local/prometheus/promtool check config /usr/local/prometheus/prometheus.yml
Checking /usr/local/prometheus/prometheus.yml
 SUCCESS: /usr/local/prometheus/prometheus.yml is valid prometheus config file syntax
[root@eck ~] systemctl restart prometheus
[root@eck ~] systemctl status prometheus
● prometheus.service - The Prometheus Server
   Loaded: loaded (/usr/lib/systemd/system/prometheus.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-07-28 15:09:59 CST; 3s ago
 Main PID: 57357 (prometheus)
   CGroup: /system.slice/prometheus.service
           └─57357 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --log.level info

Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.224Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=0 maxSegment=2
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.238Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=1 maxSegment=2
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.239Z caller=head.go:613 level=info component=tsdb msg="WAL segment loaded" segment=2 maxSegment=2
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.239Z caller=head.go:619 level=info component=tsdb msg="WAL replay completed" checkpoint_replay_duration=41.469µs wal_replay_duration=22.7899…ration=23.179317ms
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.240Z caller=main.go:993 level=info fs_type=XFS_SUPER_MAGIC
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.241Z caller=main.go:996 level=info msg="TSDB started"
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.241Z caller=main.go:1177 level=info msg="Loading configuration file" filename=/usr/local/prometheus/prometheus.yml
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.258Z caller=main.go:1214 level=info msg="Completed loading of configuration file" filename=/usr/local/prometheus/prometheus.yml totalDuration=17.238657ms db…µs
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.258Z caller=main.go:957 level=info msg="Server is ready to receive web requests."
Jul 28 15:09:59 eck prometheus[57357]: ts=2022-07-28T07:09:59.258Z caller=manager.go:941 level=info component="rule manager" msg="Starting rule manager..."
Hint: Some lines were ellipsized, use -l to show in full.
```
## Mysqld-exporter

### Download

Releases · prometheus/mysqld_exporter (github.com)

### Deploy

```bash
[root@mysql zip]
[root@mysql zip]
[root@mysql zip]
[root@mysql mysqld_exporter]
```

The `[client]` block goes into /usr/local/mysqld_exporter/my.cnf (the path referenced by the service unit below):

```ini
[client]
host=192.168.1.49
port=33106
user=mysql_monitor
password=test123
```
```sql
-- Run in the database (remember to change the password)
USE mysql;
CREATE USER 'mysql_monitor' IDENTIFIED BY 'test123' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysql_monitor'@'%';
-- This variant did not work:
-- CREATE USER 'mysql_monitor'@'localhost' IDENTIFIED BY 'test123' WITH MAX_USER_CONNECTIONS 3;
-- GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysql_monitor'@'localhost';
FLUSH PRIVILEGES;
EXIT
```
```ini
[Unit]
Description=The mysqld_exporter Server
After=network.target

[Service]
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/my.cnf --log.level=info
Restart=on-failure
RestartSec=15s
SyslogIdentifier=mysqld_exporter

[Install]
WantedBy=multi-user.target
```
```bash
[root@mysql mysqld_exporter] systemctl start mysqld_exporter
[root@mysql mysqld_exporter] systemctl status mysqld_exporter
● mysqld_exporter.service - The mysqld_exporter Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld_exporter.service; disabled; vendor preset: disabled)
   Active: active (running) since Fri 2022-07-29 09:23:11 CST; 3s ago
 Main PID: 32377 (mysqld_exporter)
    Tasks: 6
   Memory: 8.3M
   CGroup: /system.slice/mysqld_exporter.service
           └─32377 /usr/local/mysqld_exporter/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/my.cnf --log.level=info

Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:277 level=info msg="Starting mysqld_exporter" version="(version=0.14.0, branch=HEAD, revision=ca1b9af82...b1aac73e7b68c)"
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:278 level=info msg="Build context" (gogo1.17.8,userroot@401d370ca42e,date20220304-16:25:15)=(MISSING)
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=global_status
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=global_variables
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=slave_status
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=info_schema.query_response_time
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=info_schema.innodb_cmp
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=info_schema.innodb_cmpmem
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.620Z caller=mysqld_exporter.go:303 level=info msg="Listening on address" address=:9104
Jul 29 09:23:11 mysql mysqld_exporter[32377]: ts=2022-07-29T01:23:11.621Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
Hint: Some lines were ellipsized, use -l to show in full.
[root@mysql mysqld_exporter] systemctl enable mysqld_exporter
Created symlink from /etc/systemd/system/multi-user.target.wants/mysqld_exporter.service to /usr/lib/systemd/system/mysqld_exporter.service.
```
Dashboard: MySQL Overview dashboard for Grafana | Grafana Labs

### Panel reference

| Name | Meaning | Param | Other |
| --- | --- | --- | --- |
| MySQL Uptime | MySQL uptime | | How long the MySQL server has been running since its last restart |
| Current QPS | Queries per second | | Taken from MySQL's `SHOW STATUS`: the number of statements the server executed in the last second. Unlike the Questions variable, it includes statements executed inside stored programs |
| InnoDB Buffer Pool Size | InnoDB buffer pool | | InnoDB maintains a memory area called the buffer pool to cache data and indexes. Understanding how it works and using it to keep frequently accessed data in memory is one of the most important aspects of MySQL tuning; the goal is to keep the working set in memory. In most cases this should be 60%-90% of the available memory on the host. `show status like 'Innodb_buffer_pool_resize%';` |
| MySQL Connections | Connections | 1. Connections: attempts to connect to the MySQL server. 2. Max Connections: maximum number of client connections allowed to be open at the same time. 3. Max Used Connections: largest number of connections that have been open simultaneously so far. 4. Threads Connected: connections currently open | Maximum number of connections used simultaneously since the server started |
| MySQL Client Thread Activity | Active client threads | | Number of threads that are not sleeping |
| MySQL Questions | Statements executed by the server | | Unlike the queries used in the QPS calculation, this counts only statements sent to the server by clients, not statements executed inside stored programs |
| MySQL Thread Cache | Thread cache | 1. Thread Cache Size: maximum number of threads the cache can hold. Threads from disconnected MySQL clients go into this cache and are reused by new connections instead of creating new threads; when a free thread is cached, MySQL can answer connection requests quickly without creating a thread per connection. Each cached thread typically consumes about 256 KB of memory. 2. Threads Created: total number of threads created to handle connections | When a client disconnects, its thread is put into the cache if the cache is not full |
| MySQL Temporary Objects | Temporary tables | 1. Created_tmp_tables: temporary tables created in memory while processing SQL queries; if too high, the only fix is to optimize the queries. 2. Created_tmp_disk_tables: temporary tables created on disk while processing SQL queries; if high, possible causes are (a) queries selecting BLOB or TEXT columns, which forces on-disk temporary tables, and (b) tmp_table_size and max_heap_table_size being too small. 3. Created_tmp_files: number of temporary files the server created | |
| MySQL Select Types | | 1. Select Full Join: multi-table joins completed without using an index; a performance killer, optimize the SQL. 2. Select Full Range Join: joins completed via a range search on a reference table. 3. Select Range: joins completed using a range on the first table. 4. Select Range Check: query plans that re-check the index for every row of a join; very expensive. If this is high or growing, some queries cannot find a good index. 5. Select Scan: joins completed by a full scan of the first table | |
| MySQL Sorts | Sorting | 1. Sort Rows: number of rows sorted. 2. Sort Range: sorts done using ranges. 3. Sort Merge Passes: merge passes caused by filesorts; optimize the SQL or increase sort_buffer_size. 4. Sort Scan: sorts done via full table scans | Shows current use of the sort functionality |
| MySQL Slow Queries | Slow queries | 1. Slow Queries: number of queries whose execution time exceeded long_query_time | Shows current slow-query activity |
| MySQL Aborted Connections | Aborted connections | 1. Aborted Clients: connections dropped because the client did not close them properly. 2. Aborted Connects: failed attempts to connect to the MySQL server | When a given host connects to MySQL and the connection is interrupted midway (for example, due to bad credentials), MySQL keeps that information in a system table |
| MySQL Table Locks | Table-level locks | 1. Table Locks Immediate: table lock requests satisfied immediately. 2. Table Locks Waited: table locks that caused a server-level lock wait (storage-engine locks such as InnoDB row locks do not increment this). If this is high or growing, there is a serious concurrency bottleneck | MySQL needs various locks for many reasons; this chart shows how many table-level locks MySQL requested from the storage engine |
| MySQL Network Traffic | Network traffic | 1. Bytes Sent: bytes sent. 2. Bytes Received: bytes received | How much network traffic MySQL generated: outbound is traffic sent from MySQL, inbound is traffic MySQL received |
| MySQL Network Usage Hourly | Hourly network traffic | | How much network traffic MySQL generates per hour |
| MySQL Internal Memory Overview | Memory overview | 1. Key Buffer Size: size of the key cache | Memory usage of the database |
| Top Command Counters | Command counters | | How many times MySQL executed each command (in the last second) |
| Top Command Counters Hourly | Command counters (hourly) | | |
| MySQL Handlers | Handler request counts | 1. Handler_write: requests to insert a row into a table. 2. Handler_update: requests to update a row. 3. Handler_delete: requests to delete a row. 4. Handler_read_first: requests to read the first entry of an index. 5. Handler_read_key: requests to read a row based on an index value. 6. Handler_read_next: requests to read the next row in index order. 7. Handler_read_prev: requests to read the previous row in index order. 8. Handler_read_rnd: requests to read a row based on its position. 9. Handler_read_rnd_next: requests to read the next row; a high number suggests many statements require full table scans | |
| MySQL Transaction Handlers | | 1. Handler Commit: requests to commit a transaction. 2. Handler Rollback: requests to roll back a transaction. 3. Handler Savepoint: requests to create a transaction savepoint. 4. Handler Savepoint Rollback: requests to roll back to a transaction savepoint | |
| Process States | | | |
| Top Process States Hourly | | | |
| MySQL Query Cache Memory | | | |
| MySQL Query Cache Activity | | | |
| MySQL File Openings | | | |
| MySQL Open Files | Number of files currently open | | If this is close to open_files_limit, raise the limit |
| MySQL Table Open Cache Status | | | MYSQL实践心得:table_open_cache的设置 - 盘思动 - 博客园 (cnblogs.com) |
| MySQL Open Tables | Total number of tables the server has opened (including explicitly defined temporary tables) | | If this is high, consider carefully whether to enlarge the table cache (table_open_cache) |
| MySQL Table Definition Cache | | | Percona监控MySQL模板详解 - 爱码网 (likecs.com) |
## Redis_exporter

### Install

Releases · oliver006/redis_exporter (github.com)

```bash
useradd -s /sbin/nologin -M redisporter
tar -xvf redis_exporter-v1.43.0.linux-386.tar.gz -C /usr/local
cd /usr/local
mv redis_exporter-v1.43.0.linux-386/ redis_exporter
chown redisporter:redisporter -R redis_exporter
nano /usr/lib/systemd/system/redis_exporter.service
```

```ini
[Unit]
Description=The redis_exporter Server
After=network.target

[Service]
User=redisporter
Group=redisporter
ExecStart=/usr/local/redis_exporter/redis_exporter -redis.addr 192.168.0.89:6379 -redis.password test123
Restart=on-failure
RestartSec=15s
SyslogIdentifier=redis_exporter

[Install]
WantedBy=multi-user.target
```
```bash
systemctl daemon-reload
systemctl start redis_exporter
systemctl status redis_exporter
systemctl enable redis_exporter
```
### Prometheus

```bash
nano /usr/local/prometheus/prometheus.yml
```

```yaml
  - job_name: "redis"
    static_configs:
      - targets: ["192.168.0.89:9121"]
```

```bash
systemctl restart prometheus
```

Visit: 127.0.0.1:9121/metrics
## Elasticsearch_exporter

Reference: Prometheus + Grafana(十)系统监控之Elasticsearch - 曹伟雄 - 博客园 (cnblogs.com)

### Install

Releases · prometheus-community/elasticsearch_exporter (github.com)

```bash
useradd -s /sbin/nologin -M esporter
tar zxvf elasticsearch_exporter-1.5.0.linux-386.tar.gz -C /usr/local
mv /usr/local/elasticsearch_exporter-1.5.0.linux-386/ /usr/local/elasticsearch_exporter
chown esporter:esporter -R /usr/local/elasticsearch_exporter
nano /usr/lib/systemd/system/elasticsearch_exporter.service
```
```ini
[Unit]
Description=The elasticsearch_exporter Server
After=network.target

[Service]
User=esporter
Group=esporter
ExecStart=/usr/local/elasticsearch_exporter/elasticsearch_exporter --es.all --es.indices --es.cluster_settings --es.indices_settings --es.shards --es.snapshots --es.uri http://elastic:test123!%40%23@192.168.0.88:9200
Restart=on-failure
RestartSec=15s
SyslogIdentifier=elasticsearch_exporter

[Install]
WantedBy=multi-user.target
```
```bash
systemctl daemon-reload
systemctl start elasticsearch_exporter
systemctl status elasticsearch_exporter
systemctl enable elasticsearch_exporter
```

### Prometheus

```bash
nano /usr/local/prometheus/prometheus.yml
```

```yaml
  - job_name: "elasticsearch"
    static_configs:
      - targets: ["192.168.0.88:9114"]
```

```bash
systemctl restart prometheus
```

Visit: 127.0.0.1:9114/metrics
## Alertmanager

Release 0.24.0 / 2022-03-24 · prometheus/alertmanager (github.com)

Reference: Prometheus监控+Grafana+Alertmanager告警安装使用 (图文详解) - 九卷 - 博客园 (cnblogs.com)

```bash
tar -zxvf alertmanager-0.24.0.linux-386.tar.gz -C /usr/local
mv /usr/local/alertmanager-0.24.0.linux-386 /usr/local/alertmanager
nano /usr/local/prometheus/prometheus.yml
```

```yaml
  - job_name: "alertmanager"
    static_configs:
      - targets: ["eck:9093"]
```

```bash
useradd -s /sbin/nologin -M alertmanager
chown -R alertmanager:alertmanager /usr/local/alertmanager
nano /usr/lib/systemd/system/alertmanager.service
```
```ini
[Unit]
Description=The alertmanager Server
After=network.target

[Service]
User=alertmanager
Group=alertmanager
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml
Restart=on-failure
RestartSec=15s
SyslogIdentifier=alertmanager

[Install]
WantedBy=multi-user.target
```

```bash
systemctl daemon-reload
systemctl start alertmanager
systemctl status alertmanager
systemctl enable alertmanager
systemctl restart prometheus
```
## MySQL scheduled backups

### MysqlDump

```bash
nano /home/bak/mysql/data/backup.sh
```

```bash
#!/bin/bash
# Delete dumps older than 7 days
DATAdelete=`date +%F -d "-7 day"`
rm -rf /home/bak/mysql/data/*_${DATAdelete}.sql.gz

MYSQL_CMD=/usr/local/mysql/bin/mysqldump
MYSQL_USER=root
MYSQL_PWD=test123!@
DATA=`date +%F`

# List databases, skipping the header line and the system schemas
DBname=`mysql -u${MYSQL_USER} -p${MYSQL_PWD} -e "show databases;" | sed '1d' | grep -vE 'information_schema|mysql|performance_schema|sys'`

for DBname in ${DBname}
do
    ${MYSQL_CMD} -u${MYSQL_USER} -p${MYSQL_PWD} --compact -B ${DBname} | gzip > /home/bak/mysql/data/${DBname}_${DATA}.sql.gz
done
```

```bash
chmod +x /home/bak/mysql/data/backup.sh
sh /usr/local/mysqlDataBackup/backup.sh
crontab -e
00 01 * * * /usr/local/mysql/backup.sh
crontab -l
service crond restart
```
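To restore one of the gzipped dumps, a minimal sketch (the filename is a placeholder; since the script dumps with `-B`, the file already contains the CREATE DATABASE and USE statements):

```bash
# Pipe the decompressed dump straight into mysql
gunzip < /home/bak/mysql/data/mydb_2022-08-15.sql.gz | mysql -uroot -p
```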
### Docker MySQL

```bash
mkdir -p /root/docker_workspace/mysql/backup/mysql_backup_file
touch /root/docker_workspace/mysql/backup/backup.sh
touch /root/docker_workspace/mysql/backup/backup_src
```

```bash
nano /root/docker_workspace/mysql/backup/backup.sh
```

```bash
#!/bin/bash
WORKDIR=/root/docker_workspace/mysql/backup
BACKDIR=${WORKDIR}/mysql_backup_file
# Delete dumps older than 7 days
DATAdelete=`date +%F -d "-7 day"`
rm -rf ${BACKDIR}/*_${DATAdelete}.sql.gz
MYSQL_USER=username
MYSQL_PWD=password
DATA=`date +%F`
# Docker container ID; update this variable after recreating the mysql container
ID=0d5b0533fd66
# Enter the backup working directory
cd ${WORKDIR}
# Iterate over backup_src, which lists one database name per line
for line in `cat backup_src`; do
    docker exec -i ${ID} mysqldump -u${MYSQL_USER} -p${MYSQL_PWD} ${line} | gzip > ${BACKDIR}/${line}_${DATA}.sql.gz
done
```

```bash
nano /root/docker_workspace/mysql/backup/backup_src
```

`crontab -e` (backup daily at 23:00):

```
0 23 * * * /root/docker_workspace/mysql/backup/backup.sh
```
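Restoring one of these dumps goes back through the container (a sketch; the container ID, database name, and file are placeholders, and since these are per-database dumps without CREATE DATABASE statements, the target database must already exist):

```bash
gunzip < /root/docker_workspace/mysql/backup/mysql_backup_file/mydb_2022-08-15.sql.gz \
  | docker exec -i 0d5b0533fd66 mysql -uusername -ppassword mydb
```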
## Grafana external access

### nginx

```bash
nano /etc/nginx/sites-available/eck
```

```nginx
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

server {
    listen 3000;
    root /usr/share/nginx/www;
    index index.html index.htm;

    location ^~ /grafana/ {
        proxy_pass http://eck:3000/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Server $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        add_header 'Access-Control-Allow-Origin' $http_origin;
        add_header 'Access-Control-Allow-Credentials' 'true' always;
        add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
        add_header 'Access-Control-Expose-Headers' 'Content-Type,Content-Length,Content-Range';
        add_header 'Access-Control-Allow-Headers' 'Accept, Authorization, Cache-Control, Content-Type, DNT, If-Modified-Since, Keep-Alive, Origin, User-Agent, X-Requested-With' always;
        if ($request_method = 'OPTIONS') {
            return 204;
        }
    }

    location ^~ /grafana/api/live {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_set_header Host $http_host;
        proxy_pass http://eck:3000/;
    }
}

server {
    listen 9090;
    server_name eck;
    location / {
        proxy_pass http://eck:9090;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

server {
    listen 9100;
    server_name eck;
    location / {
        proxy_pass http://eck:9100;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
### Grafana

```bash
nano /usr/local/grafana/conf/defaults.ini
```

```ini
domain = 192.168.0.88
root_url = %(protocol)s://%(domain)s:%(http_port)s/grafana
serve_from_sub_path = true
```