Getting Started with Logstash in Practice (4)


  This article introduces some commonly used Logstash filter plugins. Environment and software versions: CentOS 7.9, Logstash 8.2.2.

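  1. grok filter plugin

  grok matches a line against regular expressions, maps specific parts of the line into dedicated fields, and lets you act on that mapping. Logstash ships with more than 200 built-in patterns for matching words, numbers, dates and so on. If none of them fits, you can define your own pattern. There are also options for matching against multiple patterns, which simplifies writing expressions that capture log data.

  The basic syntax of the Logstash grok filter plugin is: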

  

%{PATTERN:FieldName}
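
  To make the syntax concrete, here is a minimal Python sketch (not Logstash code) of how a grok-style expression could be expanded into a regular expression with named capture groups. The three pattern definitions are simplified stand-ins chosen for illustration, and `grok_to_regex` and `grok_match` are hypothetical helper names; the patterns that ship with Logstash are considerably stricter.

```python
import re

# A tiny, illustrative pattern library (assumption: simplified definitions,
# not the ones bundled with Logstash).
PATTERNS = {
    "INT": r"[+-]?\d+",
    "WORD": r"\w+",
    "IPV4": r"\d{1,3}(?:\.\d{1,3}){3}",
}

# Matches one %{PATTERN:FieldName} token.
TOKEN = re.compile(r"%\{(\w+):(\w+)\}")


def grok_to_regex(expression):
    """Replace every %{PATTERN:FieldName} token with a named capture group."""
    def expand(match):
        pattern, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[pattern]})"
    return re.compile(TOKEN.sub(expand, expression))


def grok_match(expression, line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = grok_to_regex(expression).search(line)
    return match.groupdict() if match else None


result = grok_match("%{IPV4:ip} %{WORD:method} %{INT:status}", "10.49.196.1 GET 404")
print(result)  # {'ip': '10.49.196.1', 'method': 'GET', 'status': '404'}
```

  The text outside the `%{...}` tokens is treated as ordinary regular-expression text, which is why literal brackets and quotes have to be escaped in a real grok expression.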

 

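  grok turns unstructured data into structured data, which makes it well suited to all kinds of system logs. Below, grok is used to process an Nginx access log.

  One line of an Nginx access log: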

  

10.49.196.1 - - [27/Sep/2022:10:16:15 +0800] "GET /favicon.ico HTTP/1.1" 404 555 "http://10.49.196.11:8066/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"

 

  Configure Logstash:

input {
  stdin { }
}

filter {
  grok {
    match => { "message" => "%{IP:ip} - %{USER:remoteUser} \[%{HTTPDATE:accessTimeStr}\] \"%{WORD:method} %{URIPATHPARAM:path} %{WORD:protocal}/%{NUMBER:version}\" %{INT:status} %{INT:bytes} \"%{DATA:referer}\" \"%{DATA:userAgent}\"" }
  }
}

output {
  stdout { }
}

 

  Run Logstash and enter the log line:

  

10.49.196.1 - - [27/Sep/2022:10:16:15 +0800] "GET /favicon.ico HTTP/1.1" 404 555 "http://10.49.196.11:8066/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"


{
    "method" => "GET",
    "host" => {
        "hostname" => "pxc2"
    },
    "accessTimeStr" => "27/Sep/2022:10:16:15 +0800",
    "userAgent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
    "referer" => "http://10.49.196.11:8066/",
    "ip" => "10.49.196.1",
    "message" => "10.49.196.1 - - [27/Sep/2022:10:16:15 +0800] \"GET /favicon.ico HTTP/1.1\" 404 555 \"http://10.49.196.11:8066/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36\"",
    "@timestamp" => 2022-09-27T02:31:13.852428Z,
    "bytes" => "555",
    "remoteUser" => "-",
    "@version" => "1",
    "event" => {
        "original" => "10.49.196.1 - - [27/Sep/2022:10:16:15 +0800] \"GET /favicon.ico HTTP/1.1\" 404 555 \"http://10.49.196.11:8066/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36\""
    },
    "path" => "/favicon.ico",
    "status" => "404",
    "version" => "1.1",
    "protocal" => "HTTP"
}

 

  As the output shows, every field has been parsed out of the log line.
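
  For readers more at home with plain regular expressions, the grok expression used here corresponds roughly to the Python sketch below. The character classes are simplified approximations of the grok patterns (for example `\S+` stands in for IP and URIPATHPARAM), so treat it as an illustration rather than an exact translation.

```python
import re

# Approximate regex equivalent of the grok expression; the field names are
# taken from the article, the character classes are simplifying assumptions.
NGINX_ACCESS = re.compile(
    r'(?P<ip>\S+) - (?P<remoteUser>\S+) \[(?P<accessTimeStr>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>\S+) (?P<protocal>\w+)/(?P<version>[\d.]+)" '
    r'(?P<status>\d+) (?P<bytes>\d+) "(?P<referer>.*?)" "(?P<userAgent>.*?)"'
)

line = (
    '10.49.196.1 - - [27/Sep/2022:10:16:15 +0800] "GET /favicon.ico HTTP/1.1" '
    '404 555 "http://10.49.196.11:8066/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"'
)

fields = NGINX_ACCESS.match(line).groupdict()
for name, value in fields.items():
    print(f"{name} = {value!r}")
```

  Every captured value is a string, which matches the Logstash output, where status and bytes appear as "404" and "555" rather than as numbers.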

  For the detailed definition of each pattern, see https://github.com/logstash-plugins/logstash-patterns-core/tree/main/patterns, for example:

  

IPV6 ((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(\.(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))(%.+)?

IPV4 (?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])

IP (?:%{IPV6}|%{IPV4})
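
  Because these definitions are ordinary regular expressions, they can be tried out outside Logstash. The Python sketch below rebuilds the IPV4 definition from its repeated octet part and checks a few inputs; `OCTET`, `IPV4` and `is_ipv4` are names chosen here for illustration.

```python
import re

# One octet: 0-199 (with optional leading digit), 200-249, or 250-255.
OCTET = r"(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])"

# Four octets separated by dots, not preceded or followed by another digit.
IPV4 = re.compile(r"(?<![0-9])(?:" + r"[.]".join([OCTET] * 4) + r")(?![0-9])")


def is_ipv4(text):
    """True if the whole string is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None


print(is_ipv4("10.49.196.1"))   # True
print(is_ipv4("256.1.1.1"))     # False: 256 is not a valid octet
print(IPV4.search("client 10.49.196.1 connected").group(0))  # 10.49.196.1
```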

 

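  2. dissect filter plugin

  dissect parses data by splitting on delimiters, which avoids the high CPU cost that grok parsing can incur. It uses delimiters to extract fields from unstructured event data. Because the dissect filter does not use regular expressions, it is very fast. However, if the structure of the data varies from line to line, the grok filter is the better fit. dissect therefore has its limits: it is mainly suited to data where every line has a similar format and the delimiters are clear and simple.

  The dissect syntax is fairly simple; a mapping consists of a series of fields and delimiters:

  %{} denotes a field

  whatever appears between two %{} fields is a delimiter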
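
  The field-and-delimiter idea can be sketched in a few lines of Python. This is a simplified illustration, not the plugin's implementation: `dissect` is a hypothetical helper that supports plain `%{field}` names only, and it assumes that fields in the middle of the mapping are always separated by a non-empty delimiter.

```python
import re


def dissect(mapping, line):
    """Extract fields from `line`, using the literal text between %{...} as delimiters."""
    # The regex is only used to parse the mapping itself; the log line is
    # scanned with plain string search. re.split with a capture group
    # yields: delimiter, field, delimiter, field, ..., delimiter.
    parts = re.split(r"%\{(\w+)\}", mapping)
    delimiters, fields = parts[0::2], parts[1::2]

    if not line.startswith(delimiters[0]):
        return None
    pos = len(delimiters[0])
    event = {}
    for field, delimiter in zip(fields, delimiters[1:]):
        if delimiter:
            end = line.find(delimiter, pos)
            if end == -1:
                return None  # the line does not follow the mapping
        else:
            end = len(line)  # a trailing field takes the rest of the line
        event[field] = line[pos:end]
        pos = end + len(delimiter)
    return event


mapping = '%{ip} - %{remoteUser} [%{accessTimeStr}] "%{method} %{path} %{protocal}/%{version}"'
line = '10.49.196.1 - - [27/Sep/2022:10:16:15 +0800] "GET /favicon.ico HTTP/1.1"'
event = dissect(mapping, line)
print(event)
```

  Each field simply runs up to the next occurrence of the following delimiter, which is why this approach is cheap but only works when every line has the same layout.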

 

  Using dissect to process the Nginx access log:

  

input {
  stdin { }
}

filter {
  dissect {
    mapping => { "message" => '%{ip} - %{remoteUser} [%{accessTimeStr} %{+accessTimeStr}] "%{method} %{path} %{protocal}/%{version}" %{status} %{bytes} "%{referer}" "%{userAgent}"' }
  }
}

output {
  stdout { }
}

 

  The result is as follows, consistent with the result of processing the same Nginx access log line with grok:

  

10.49.196.1 - - [27/Sep/2022:10:16:15 +0800] "GET /favicon.ico HTTP/1.1" 404 555 "http://10.49.196.11:8066/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36"


{
    "path" => "/favicon.ico",
    "status" => "404",
    "protocal" => "HTTP",
    "host" => {
        "hostname" => "pxc2"
    },
    "bytes" => "555",
    "userAgent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36",
    "@version" => "1",
    "event" => {
        "original" => "10.49.196.1 - - [27/Sep/2022:10:16:15 +0800] \"GET /favicon.ico HTTP/1.1\" 404 555 \"http://10.49.196.11:8066/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36\""
    },
    "referer" => "http://10.49.196.11:8066/",
    "accessTimeStr" => "27/Sep/2022:10:16:15 +0800",
    "message" => "10.49.196.1 - - [27/Sep/2022:10:16:15 +0800] \"GET /favicon.ico HTTP/1.1\" 404 555 \"http://10.49.196.11:8066/\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36\"",
    "method" => "GET",
    "version" => "1.1",
    "@timestamp" => 2022-09-27T09:28:36.042881Z,
    "remoteUser" => "-",
    "ip" => "10.49.196.1"
}

 

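  Besides locating fields by the literal strings around them, dissect uses a few special symbols to control how fields are extracted:

  %{+key}: the + means the matched value is appended to the key field.

  %{?key}: the ? marks a placeholder only; no captured field is actually stored in the event.

  %{?key} %{&key}: when two captures share the same name, one prefixed with ? and the other with &, they form a key/value pair. The value captured on the ? side becomes the field name and the value captured on the & side becomes its value. Recent versions of the dissect plugin documentation write the key side with * instead, as in %{*key} %{&key}.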
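
  The effect of these symbols can be illustrated with a small Python sketch that takes already-extracted captures and builds the event from them. It follows the rules as described in this article; `apply_captures` is a hypothetical helper, the space used to join appended values is an assumption, and the `name`/`status` pair is made-up sample data.

```python
def apply_captures(captures, joiner=" "):
    """Build an event from ordered (key_spec, value) pairs.

    "+key" appends to an existing field, "?key" is a placeholder that is
    never stored, and "&key" uses the value remembered for "?key" as the
    name of the field to create.
    """
    event = {}
    placeholders = {}
    for key_spec, value in captures:
        if key_spec.startswith("+"):
            key = key_spec[1:]
            event[key] = event[key] + joiner + value if key in event else value
        elif key_spec.startswith("?"):
            placeholders[key_spec[1:]] = value  # remembered, never stored
        elif key_spec.startswith("&"):
            event[placeholders[key_spec[1:]]] = value
        else:
            event[key_spec] = value
    return event


event = apply_captures([
    ("accessTimeStr", "27/Sep/2022:10:16:15"),
    ("+accessTimeStr", "+0800"),   # appended to accessTimeStr
    ("?ignored", "-"),             # placeholder, dropped
    ("?name", "status"),           # key half of a key/value pair
    ("&name", "404"),              # value half of the same pair
])
print(event)  # {'accessTimeStr': '27/Sep/2022:10:16:15 +0800', 'status': '404'}
```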

  
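  3. date filter plugin

  The date filter parses a date string held in a field and stores the result as a timestamp. In this example the value of the accessTimeStr field is converted to a date and written to the accessTime field (when no target is set, the result is written to @timestamp). Run Logstash and enter the test data: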

  

{"accessTimeStr": "27/Sep/2022:10:16:15 +0800"}


{
    "event" => {
        "original" => "{\"accessTimeStr\": \"27/Sep/2022:10:16:15 +0800\"}\n"
    },
    "host" => {
        "hostname" => "pxc2"
    },
    "@timestamp" => 2022-09-27T07:21:00.181981Z,
    "accessTime" => 2022-09-27T02:16:15Z,
    "@version" => "1",
    "accessTimeStr" => "27/Sep/2022:10:16:15 +0800"
}
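
  The conversion itself is easy to check by hand. The Python sketch below (an illustration, not the plugin's code) parses the same string and normalises it to UTC, which is how the accessTime value in the output comes to be eight hours behind the original local time. It assumes an English locale, since `%b` matches month abbreviations such as "Sep" in the active locale.

```python
from datetime import datetime, timezone

access_time_str = "27/Sep/2022:10:16:15 +0800"

# Day/month-name/year, then time of day, then the UTC offset.
parsed = datetime.strptime(access_time_str, "%d/%b/%Y:%H:%M:%S %z")

# Timestamps are displayed in UTC in the output, so convert before formatting.
access_time = parsed.astimezone(timezone.utc)
iso = access_time.strftime("%Y-%m-%dT%H:%M:%SZ")
print(iso)  # 2022-09-27T02:16:15Z
```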

 

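  4. mutate filter plugin

  The mutate plugin can rename, remove, replace and update fields, among other operations:

  convert: type conversion

  gsub: string substitution

  split: splits a string into an array

  join: joins an array into a string

  merge: merges arrays into one array

  rename: renames a field

  update: updates the content of a field; if the field does not exist, nothing happens

  replace: replaces the content of a field; if the field does not exist, the field is added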

  


{"a":"1","b":"A_B_C","c":"X,Y,Z","d":[1,2,3],"e1":[1,2,3],"e2":[4,5,6],"f":"ABC","g":"123"}


{
    "d" => "1,2,3",
    "e1" => [
        [0] 1,
        [1] 2,
        [2] 3,
        [3] 4,
        [4] 5,
        [5] 6
    ],
    "e2" => [
        [0] 4,
        [1] 5,
        [2] 6
    ],
    "c" => [
        [0] "X",
        [1] "Y",
        [2] "Z"
    ],
    "event" => {
        "original" => "{\"a\":\"1\",\"b\":\"A_B_C\",\"c\":\"X,Y,Z\",\"d\":[1,2,3],\"e1\":[1,2,3],\"e2\":[4,5,6],\"f\":\"ABC\",\"g\":\"123\"}\n"
    },
    "ff" => "ABC",
    "@timestamp" => 2022-09-28T02:45:26.305729Z,
    "b" => "ABC",
    "@version" => "1",
    "host" => {
        "hostname" => "pxc2"
    },
    "h" => "new value",
    "g" => "new value",
    "a" => 1
}
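
  The mutate configuration behind this output is not reproduced in the article. As a rough illustration, the Python sketch below applies the equivalent of each listed operation to the same input event. Which operation is applied to which field is inferred from the output and should be read as an assumption, and `update` and `replace` are hypothetical helper names.

```python
import json

event = json.loads(
    '{"a":"1","b":"A_B_C","c":"X,Y,Z","d":[1,2,3],'
    '"e1":[1,2,3],"e2":[4,5,6],"f":"ABC","g":"123"}'
)


def update(event, field, value):
    """update: only touches fields that already exist."""
    if field in event:
        event[field] = value


def replace(event, field, value):
    """replace: overwrites the field, creating it if necessary."""
    event[field] = value


event["a"] = int(event["a"])                        # convert: string -> integer
event["b"] = event["b"].replace("_", "")            # gsub: remove underscores
event["c"] = event["c"].split(",")                  # split: string -> array
event["d"] = ",".join(str(x) for x in event["d"])   # join: array -> string
event["e1"] = event["e1"] + event["e2"]             # merge: e2 merged into e1
event["ff"] = event.pop("f")                        # rename: f -> ff
update(event, "g", "new value")                     # g exists, so it changes
update(event, "missing", "new value")               # no such field: no effect
replace(event, "h", "new value")                    # h did not exist: added

print(json.dumps(event, indent=2))
```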

 

  5. json filter plugin

  The json plugin parses a field whose content is a JSON string into structured JSON data.

  

input {
  stdin { }
}

filter {
  json {
    source => "message"
    target => "result"
  }
}

output {
  stdout { }
}

 

  Start Logstash and enter the test data on the console:

  

{"a":"1","b":"2"}


{
    "message" => "{\"a\":\"1\",\"b\":\"2\"}",
    "event" => {
        "original" => "{\"a\":\"1\",\"b\":\"2\"}"
    },
    "@version" => "1",
    "@timestamp" => 2022-09-28T03:20:35.827102Z,
    "result" => {
        "b" => "2",
        "a" => "1"
    },
    "host" => {
        "hostname" => "pxc2"
    }
}
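
  What source and target do can be sketched in Python as follows. This is a simplified illustration rather than the plugin's implementation: `json_filter` is a hypothetical helper, and the handling of edge cases (a missing source field is ignored, invalid JSON adds the `_jsonparsefailure` tag) is reduced to the bare minimum.

```python
import json


def json_filter(event, source, target=None):
    """Parse the JSON string in event[source] and store the result.

    With a target, the parsed data goes under that field; without one,
    the parsed keys are merged into the top level of the event.
    """
    if source not in event:
        return event
    try:
        parsed = json.loads(event[source])
    except ValueError:
        event.setdefault("tags", []).append("_jsonparsefailure")
        return event
    if target is not None:
        event[target] = parsed
    else:
        event.update(parsed)
    return event


event = json_filter({"message": '{"a":"1","b":"2"}'}, "message", "result")
print(event["result"])  # {'a': '1', 'b': '2'}

bad = json_filter({"message": "not json"}, "message", "result")
print(bad["tags"])      # ['_jsonparsefailure']
```

  The source field is left in place, which is why message still appears next to result in the Logstash output.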

 

  6. ruby filter plugin

  The ruby plugin is the most flexible plugin: it lets you modify the Logstash Event object in any way you like with Ruby code.

  

input {
  stdin {
    codec => "json"
  }
}

filter {
  ruby {
    code => 'a = event.get("a"); event.set("a", a + "abc")'
  }
}

output {
  stdout { }
}

 

  Start Logstash and enter the test data on the console:

  

{"a":"1","b":"2"}


{
    "host" => {
        "hostname" => "pxc2"
    },
    "b" => "2",
    "a" => "1abc",
    "@version" => "1",
    "event" => {
        "original" => "{\"a\":\"1\",\"b\":\"2\"}\n"
    },
    "@timestamp" => 2022-09-28T06:01:35.180939Z
}
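
  The result "1abc" follows from the input value being the string "1", so `+` concatenates. The Python sketch below mimics the get/set pattern with a tiny stand-in `Event` class (hypothetical, modelled only on the two calls used in this example) and shows that the same code fails when the field holds a number. Ruby behaves the same way for an Integer plus a String, in which case Logstash tags the event (by default with _rubyexception) instead of updating the field.

```python
class Event:
    """A tiny stand-in for the Logstash Event object, with get and set only."""

    def __init__(self, data):
        self._data = dict(data)

    def get(self, field):
        return self._data.get(field)

    def set(self, field, value):
        self._data[field] = value


def append_suffix(event):
    """Same logic as the ruby code: read field a, append a suffix, write it back."""
    a = event.get("a")
    event.set("a", a + "abc")


event = Event({"a": "1", "b": "2"})
append_suffix(event)
print(event.get("a"))  # 1abc

numeric = Event({"a": 1})
try:
    append_suffix(numeric)
    failed = False
except TypeError:
    failed = True  # an integer and a string cannot be added
print(failed)  # True
```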

 

  

  The plugins covered in this article are only a small part of the Logstash filter plugins; see the official Logstash documentation for more.
