Configuration (0.5.0 and earlier)
Attention
This documentation is for versions prior to 0.6.0. For 0.6.0 there are two different documentation sections that replace this section: system configuration and definitions.
Syntax is sectional, with each section having a type and a name, followed by { and ending with }. Key/value pairs follow of the form key = value. Key names are non-whitespace characters before the =. The value goes until end of line and is a string. Multi-line strings are supported using backticks to delimit start and end of string. Comments go from a # to end of line (unless the # appears in a backtick string). Whitespace is trimmed at ends of values and keys. Files are UTF-8 encoded.
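As a rough illustration of the syntax rules above, the following sketch shows a global key, a section with a type and a name, a trailing comment, and a backtick-delimited multi-line value. The template name example, the host mail.example.com, and the message text are made up for illustration; smtpHost, subject, and body are keys that appear elsewhere in this document.

```
# A global key/value pair (outside any section).
smtpHost = mail.example.com:25

# A section of type "template" named "example" (hypothetical name).
template example {
    subject = {{.Alert.Name}} is abnormal   # comment runs to end of line
    body = `<p>First line of a multi-line value.</p>
    <p>A # inside backticks is not treated as a comment.</p>`
}
```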
Variables perform simple text replacement - they are not intelligent. They are any key whose name begins with $, and may also be surrounded by braces ({, }) to disambiguate between shorter keys (ex: ${var}). Before an expression is evaluated, all variables are evaluated in the text. Variables can be defined at any scope, and will shadow other variables with the same name of higher scope.
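A small sketch of substitution and shadowing, assuming a hypothetical alert named example.cpu and a metric named os.cpu. The global $dur is shadowed inside the alert, and ${metric} uses braces so the variable name is clearly delimited from the text that follows it.

```
$dur = "5m"

alert example.cpu {
    $dur = "10m"          # shadows the global $dur inside this alert
    $metric = os.cpu
    $q = avg(q("avg:rate:${metric}{host=*}", $dur, ""))
    crit = $q > 90
}
```

Because replacement is purely textual, the crit expression here would be expected to expand to avg(q("avg:rate:os.cpu{host=*}", "10m", "")) > 90.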
Environment variables may be used similarly to variables, but with env. preceding the name. For example: tsdbHost = ${env.TSDBHOST} (with or without braces). It is an error to specify a non-existent or empty environment variable.
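A short sketch of both forms, with and without braces. TSDBHOST comes from the example above; SMTPHOST is a hypothetical environment variable name used only for illustration. Both must be set and non-empty in Bosun's environment, otherwise loading the configuration is an error.

```
# Braced form.
tsdbHost = ${env.TSDBHOST}

# Unbraced form (SMTPHOST is an assumed variable name).
smtpHost = $env.SMTPHOST
```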
Globals are all key=value pairs not in a section. These are generally placed at the top of the file. Every variable is optional, though you should enable at least 1 backend.
tsdb-host:4242. Defaults to port 4242 if no port is specified. If you use OpenTSDB without relaying the data through Bosun, the following currently won't work (and this isn't something we officially support): avg:metric.name{tag=something-*}. However, single asterisks like tag=* will still work.
With Bosun v0.5.0, Bosun uses Redis as a storage mechanism for its internal state. You can either run a Redis instance to hold this data, or Bosun can use an embedded server (using LedisDB) if you would rather run standalone. Redis is recommended for production use. This gist shows an example Redis config, the tested Redis version, and an example cron job for backing up the Redis data.
Config items:
localhost:6379. Redis 3.0 or greater is required.
0.
ledis_data in working dir if no redis host is provided.
127.0.0.1:9565.
5m
1.
:8070
5.
1048576
3d; currently used by the hosts list on the items page.
bosun.state
timeAndDate = 202,75,179,136 adds Portland, Denver, New York, and London to the datetime links generated in alerts. See timeanddate.com documentation.
These optional fields, if either is specified, will authenticate with the SMTP server.
Macros are sections that can define anything (including variables). It is not an error to reference an unknown variable in a macro. Other sections can reference the macro with macro = name. The macro's data will be expanded with the current variable definitions and inserted at that point in the section. Multiple macros may be thus referenced at any time. Macros may reference other macros. For example:
$default_time = "2m"
macro m1 {
$w = 80
warnNotification = default
}
macro m2 {
macro = m1
$c = 90
}
alert os.high_cpu {
$q = avg(q("avg:rate:os.cpu{host=ny-nexpose01}", $default_time, ""))
macro = m2
warn = $q > $w
crit = $q >= $c
}
Will yield a warn expression for the os.high_cpu alert:
avg(q("avg:rate:os.cpu{host=ny-nexpose01}", "2m", "")) > 80
and set warnNotification = default for that alert.
Templates are the message body for emails that are sent when an alert is triggered. Syntax is that of the Go text/template package. Variable expansion is not performed on templates because $ is used in the template language, but a V() function is provided instead. Email bodies are HTML, subjects are plaintext. Macro support is currently disabled in templates for the same reason, due to implementation details.
Status field (an integer) with a textual string representation; and a Time field. Most recent last. The status fields have identification methods: IsNormal(), IsWarning(), IsCritical(), IsUnknown(), IsError().
$. For example: {{.Alert.Vars.q}} to print $q.
nil tags if none exists, otherwise nil.
DescByValue function may be called on the result of this to sort descending by value: {{(.EvalAll .Alert.Vars.expr).DescByValue}}.
metric and name are strings. tags may be a tag string ("tagk=tagv,tag2=val2") or a tag set (.Group). If name is the empty string, a slice of metadata matching the metric and tag is returned. Otherwise, only the metadata value is returned for the given name, or nil for no match.
expression is a string or an expression and y_label is a string. y_label is an optional argument.
expression is a string.
expression is a string or an expression and y_label is a string. y_label is an optional argument.
keyString because the group (aka tags) of the alert is used.
keyString since it is not scoped to the alert.
template test {
subject = {{.Last.Status}}: {{.Alert.Name}} on {{.Group.host}}
body = `
{{ $filter := (.Eval .Alert.Vars.filter)}}
{{ $index := (.Eval .Alert.Vars.index)}}
{{range $i, $x := .ESQuery $index $filter "5m" "" 10 }}
<p>{{$x.machinename}}</p>
{{end}}
`
}
alert test {
template = test
$index = esls("logstash")
$filter = esand(esregexp("source", ".*"), esregexp("machinename", "ls-dc.*"))
crit = avg(escount($index, "source,machinename", $filter, "2m", "10m", ""))
}
Global template functions:
$ character being used by the Go template syntax.
{{5.1 | pct}} -> 5.10%.
{{short "foo.baz.com"}} -> foo.
$notes = <a href="...">Foo</a> and then in the template you can render it as HTML with {{ html .Alert.Vars.notes }}.
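As a sketch of how the pct and short functions shown above might be used in a template, assuming the alert's group carries a host tag. The template name shortdemo and the literal percentage are placeholders for illustration only.

```
template shortdemo {
    subject = {{.Alert.Name}} on {{short .Group.host}}
    body = `
    <p>Host: {{short .Group.host}}</p>
    <p>Example percentage: {{5.1 | pct}}</p>
    `
}
```

Using the short host name keeps subjects compact when hosts are reported as fully qualified domain names.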
All body templates are associated, and so may be executed from another. Use the name of the other template section for inclusion. Subject templates are similarly associated.
An example:
template name {
body = Name: {{.Alert.Name}}
}
template ex {
body = `Alert definition:
{{template "name" .}}
Crit: {{.Alert.Crit}}
Tags:{{range $k, $v := .Group}}
{{$k}}: {{$v}}{{end}}
`
subject = {{.Alert.Name}}: {{.Alert.Vars.q | .E}} on {{.Group.host}}
}
The unknown template (set by the global option unknownTemplate) acts differently than alert templates. It receives groups of alerts since unknowns tend to happen in groups (i.e., a host stops reporting and all alerts for that host trigger unknown at the same time).
Variables and functions available to the unknown template:
Example:
template ut {
subject = {{.Name}}: {{.Group | len}} unknown alerts
body = `
<p>Time: {{.Time}}
<p>Name: {{.Name}}
<p>Alerts:
{{range .Group}}
<br>{{.}}
{{end}}`
}
unknownTemplate = ut
An alert is an evaluated expression which can trigger actions like emailing or logging. The expression must yield a scalar. The alert triggers if not equal to zero. Alerts act on each tag set returned by the query. It is an error for alerts to specify start or end times. Those will be determined by the various functions and the alerting system.
lookup("table", "key") is an entire critNotification value. See example below.
ignoreUnknown with this setting would be unnecessary.
checkFrequency at which to run this alert. If unspecified, the global defaultRunEvery will be used.
tagk=tagv pairs. tagv is a regex. If the current tag group matches all values, the alert is squelched, and will not trigger as crit or warn. For example, squelch = host=ny-web.*,tier=prod will match any group that has at least that host and tier. Note that the group may have other tags assigned to it, but since all elements of the squelch list were met, it is considered a match. Multiple squelch lines may appear; a tag group matches if any of the squelch lines match.
log = true will make the alert behave as a "log alert". It will never show up on the dashboard, but will execute notifications every check interval where the status is abnormal.
maxLogFrequency = 5m will ensure that notifications only fire once every 5 minutes for any given alert key. Only valid on log alerts.
Example of notification lookups:
notification all {
#...
}
notification n {
#...
}
notification d {
#...
}
lookup l {
entry host=a {
v = n
}
entry host=b* {
v = d
}
}
alert a {
crit = 1
critNotification = all # All alerts have the all notification.
# Other alerts are passed through the l lookup table and may add n or d.
# If the host tag does not match a or b*, no other notification is added.
critNotification = lookup("l", "v")
# Do not evaluate this alert if its host is down.
depends = alert("host.down", "crit")
}
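The squelch, log, and maxLogFrequency settings described earlier in this section could be combined roughly as follows. This is only a sketch: the alert names, metric, tags, and thresholds are hypothetical, and a notification named default is assumed to be defined elsewhere in the file.

```
# Regular alert that is squelched for production ny-web hosts.
alert example.web.cpu {
    $q = avg(q("avg:rate:os.cpu{host=*,tier=*}", "5m", ""))
    crit = $q > 90
    critNotification = default
    # tagv is a regex; a group matching both pairs will not trigger.
    squelch = host=ny-web.*,tier=prod
}

# Log alert: never shown on the dashboard, notifies while abnormal,
# but at most once every 5 minutes per alert key.
alert example.log.cpu {
    crit = avg(q("avg:rate:os.cpu{host=*}", "5m", "")) > 95
    critNotification = default
    log = true
    maxLogFrequency = 5m
}
```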
A notification is a chained action to perform. The chaining continues until the chain ends or the alert is acknowledged. At least one action must be specified. next and timeout are optional. Notifications are independent of each other and executed concurrently (if there are many notifications for an alert, one will not block another).
. variable. The V function is available as in other templates. Additionally, a json function will output JSON-encoded data.
application/x-www-form-urlencoded, you may set the contentType variable.
runOnActions = false.
Person Name <addr@domain.com> and addr@domain.com. Alert template subject and body used for the email.
application/x-www-form-urlencoded by default, but may be overridden by setting the contentType variable for the notification.
print = true
Example:
# HTTP Post to a chatroom, email in 10m if not ack'd
notification chat {
next = email
timeout = 10m
post = http://chat.meta.stackoverflow.com/room/318?key=KEY&message=whatever
}
# email sysadmins and Nick each day until ack'd
notification email {
email = sysadmins@stackoverflow.com, nick@stackoverflow.com
next = email
timeout = 1d
}
# post to a slack.com chatroom via Incoming Webhooks integration
notification slack{
post = https://hooks.slack.com/services/abcdef
body = {"text": {{.|json}}}
}
#post json
notification json{
post = https://someurl.com/submit
body = {"text": {{.|json}}, "apiKey": "2847abc23"}
contentType = application/json
}
Lookups are used when different values are needed based on the group. For example, an alert for high CPU use may have a general setting, but need to be higher for known high-CPU machines. Lookups have subsections for lookup entries. Each entry subsection is named with an OpenTSDB tag group, and supports globbing. Entry subsections have arbitrary key/value pairs.
The lookup function can be used in expressions to query lookup data. It takes two arguments: the name of the lookup table and the key to be extracted. When the function is executed, all possible combinations of tags are fetched from the search service, matched to the correct rule, and returned. The first successful match is used. Unmatched groups are ignored.
For example, to filter based on host:
lookup cpu {
entry host=web-* {
high = 0.5
}
entry host=sql-* {
high = 0.8
}
entry host=* {
high = 0.3
}
}
alert cpu {
crit = avg(q("avg:rate:os.cpu{host=*}", "5m", "")) > lookup("cpu", "high")
}
Multiple groups are supported and separated by commas. For example:
lookup cpu {
entry host=web-*,dc=eu {
high = 0.5
}
entry host=sql-*,dc=us {
high = 0.8
}
entry host=*,dc=us {
high = 0.3
}
entry host=*,dc=* {
high = 0.4
}
}
alert cpu {
crit = avg(q("avg:rate:os.cpu{host=*,dc=*}", "5m", "")) > lookup("cpu", "high")
}
tsdbHost = tsdb01.stackoverflow.com:4242
smtpHost = mail.stackoverflow.com:25
template cpu {
body = `Alert definition:
Name: {{.Alert.Name}}
Crit: {{.Alert.Crit}}
Tags:{{range $k, $v := .Group}}
{{$k}}: {{$v}}{{end}}
`
subject = cpu idle at {{.Alert.Vars.q | .E}} on {{.Group.host}}
}
notification default {
email = someone@domain.com
next = default
timeout = 1h
}
alert cpu {
template = cpu
$q = avg(q("sum:rate:linux.cpu{host=*,type=idle}", "1m", ""))
crit = $q < 40
critNotification = default
}