Paul Czarkowski / paul@paulcz.net / @pczarkowski
A log is a human-readable, machine-parsable representation of an event.
LOG = TIMESTAMP + DATA
Jan 19 13:01:13 paulcz-laptop anacron[7712]: Normal exit (0 jobs run)
120607 14:07:00 InnoDB: Starting an apply batch of log records to the database...
[1225306053] SERVICE ALERT: FTPSERVER;FTP SERVICE;OK;SOFT;2;FTP OK - 0.029 second response time on port 21 [220 ProFTPD 1.3.1 Server ready.]
[Sat Jan 19 01:04:25 2013] [error] [client 78.30.200.81] File does not exist: /opt/www/vhosts/crappywebsite/html/robots.txt
208.115.111.74 - - [13/Jan/2013:04:28:55 -0500] "GET /robots.txt HTTP/1.1"
301 303 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
A human-readable, machine-parsable representation of an event.
But they're machine parsable, right?
208.115.111.74 - - [13/Jan/2013:04:28:55 -0500] "GET /robots.txt HTTP/1.1"
301 303 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
Actual regex to parse Apache logs.
208.115.111.74 - - [13/Jan/2013:04:28:55 -0500] "GET /robots.txt HTTP/1.1"
301 303 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
{
  "client address": "208.115.111.74",
  "user": null,
  "timestamp": "2013-01-13T04:28:55-0500",
  "verb": "GET",
  "path": "/robots.txt",
  "query": null,
  "http version": 1.1,
  "response code": 301,
  "bytes": 303,
  "referrer": null,
  "user agent": "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
}
cat chain.plugins | grep together | sed 's/like/unix/' > pipeline
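That's the whole model: a config wires input, filter, and output plugins together, unix-pipeline style. A minimal sketch (stdin straight to a debug stdout):

input  { stdin { } }
output { stdout { debug => true } }

Run it, type a line on stdin, and the structured event comes out the other end.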
Let's talk briefly about two filters that are essential to making our logs useful: date and grok.
The date filter takes a timestamp and makes it ISO 8601 compliant.
Turns this:
13/Jan/2013:04:28:55 -0500
Into this:
2013-01-13T04:28:55-0500
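A sketch of the date filter that does this, assuming the grokked field is named timestamp (the match syntax shown is the Logstash 1.2-era form):

filter {
  date {
    type  => "apache"
    # field holding the time, followed by its Joda-style format
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}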
The grok filter parses arbitrary text and structures it.
It makes complex regex patterns simple by giving them reusable names:
USERNAME [a-zA-Z0-9_-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
COMBINEDAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth}
\[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}
(?: HTTP/%{NUMBER:httpversion})?|-)" %{NUMBER:response}
(?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent}
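Patterns compose. As an illustrative sketch (the jobs field name is made up), the anacron line from earlier could be grokked like this:

filter {
  grok {
    type    => "syslog"
    # %{SYSLOGBASE} covers "Jan 19 13:01:13 paulcz-laptop anacron[7712]:"
    pattern => "%{SYSLOGBASE} Normal exit \(%{INT:jobs} jobs run\)"
  }
}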
Remember our apache log from earlier?
Define Inputs and Filters.
input {
  file {
    type => "apache"
    path => ["/var/log/httpd/httpd.log"]
  }
}

filter {
  grok {
    type    => "apache"
    pattern => "%{COMBINEDAPACHELOG}"
  }
  date {
    type => "apache"
  }
  geoip {
    type => "apache"
  }
}
Define some outputs.
output {
  statsd {
    type      => "apache"
    # count one hit per event, keyed by response code
    increment => "apache.response.%{response}"
  }
  elasticsearch {
    type => "apache"
  }
}
input {
  twitter {
    type     => "twitter"
    keywords => ["bieber","beiber"]
    user     => "username"
    password => "*******"
  }
}

output {
  elasticsearch {
    type => "twitter"
  }
}
input {
  file {
    type => "syslog"
    path => ["/data/rsyslog/**/*.log"]
  }
}

filter {
  ### a bunch of groks, a date, and other filters
}

output {
  elasticsearch {
    type => "syslog"
  }
}
input {
  tcp {
    type => "syslog"
    port => "514"
  }
  udp {
    type => "syslog"
    port => "514"
  }
}

filter {
  ### a bunch of groks, a date, and other filters
}

output {
  elasticsearch {
    type => "syslog"
  }
}
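For the curious, those elided filter bodies might look roughly like this (a sketch only; %{SYSLOGLINE} is a stock grok pattern, and the date formats cover classic year-less syslog timestamps):

filter {
  grok {
    type    => "syslog"
    pattern => "%{SYSLOGLINE}"
  }
  date {
    type  => "syslog"
    # two formats: double-digit and space-padded single-digit days
    match => [ "timestamp", "MMM dd HH:mm:ss", "MMM  d HH:mm:ss" ]
  }
}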
Use an encrypted transport.
Logstash Agent
input { file { ... } }
output {
  lumberjack {
    hosts => ["logstash-indexer1", "logstash-indexer2"]
    ssl_certificate => "/etc/ssl/logstash.crt"
  }
}
Logstash Indexer
input {
  lumberjack {
    ssl_certificate => "/etc/ssl/logstash.crt"
    ssl_key         => "/etc/ssl/logstash.key"
  }
}
output { elasticsearch {} }
input {
  exec {
    type     => "system-loadavg"
    command  => "cat /proc/loadavg | awk '{print $1,$2,$3}'"
    interval => 30
  }
}

filter {
  grok {
    type    => "system-loadavg"
    pattern => "%{NUMBER:load_avg_1m} %{NUMBER:load_avg_5m} %{NUMBER:load_avg_15m}"
    named_captures_only => true
  }
}

output {
  graphite {
    host => "10.10.10.10"
    port => 2003
    type => "system-loadavg"
    metrics => [ "hosts.%{@source_host}.load_avg.1m",  "%{load_avg_1m}",
                 "hosts.%{@source_host}.load_avg.5m",  "%{load_avg_5m}",
                 "hosts.%{@source_host}.load_avg.15m", "%{load_avg_15m}" ]
  }
}
Write a Logstash module!
You can do powerful things with [ boilerplate + ] a few lines of Ruby.
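As a sketch of that boilerplate (the shout filter and its field option are hypothetical; the class layout follows the Logstash 1.x filter plugin API):

require "logstash/filters/base"
require "logstash/namespace"

# hypothetical filter: upcases one field of every event
class LogStash::Filters::Shout < LogStash::Filters::Base
  config_name "shout"
  milestone 1   # plugin maturity marker in 1.2-era Logstash

  # which field to upcase (illustrative option)
  config :field, :validate => :string, :default => "@message"

  def register
    # one-time setup would go here
  end

  def filter(event)
    return unless filter?(event)
    event[@field] = event[@field].to_s.upcase
    filter_matched(event)
  end
end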
Local log dir on clients = cheap queue
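In other words (the path here is illustrative): have the app or rsyslog write to a local spool directory and let the agent tail it; if the indexers go away, unread lines simply buffer on disk.

input {
  file {
    type => "syslog"
    # the agent tails this dir; unshipped lines just wait on disk
    path => ["/var/spool/logs/**/*.log"]
  }
}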
Paul Czarkowski / paul@paulcz.net / @pczarkowski