Channel: Percona Database Performance Blog

MongoDB Integrated Alerting in Percona Monitoring and Management

Percona Monitoring and Management (PMM) recently introduced the Integrated Alerting feature as a technical preview. This was an eagerly awaited feature, as PMM no longer needs to integrate with an external alerting system. We recently blogged about the release of this feature.

PMM includes some built-in alert templates; in this post, I will show you how to add your own alerts.

Enable Integrated Alerting

The first thing to do is to open the PMM Settings: click the gear icon in the left menu and choose Settings.

Next, go to Advanced Settings and, in the “Technical Preview” section, click the slider to enable Integrated Alerting.

While you’re here, you can also set up SMTP or Slack notifications in the new Communications tab, which appears after you click “Apply Changes” to turn on the feature.

For example, you can configure email notifications to be sent through Gmail.

You should now see the Integrated Alerting option in the left menu under Alerting, so let’s go there next.

Configuring Alert Destinations

After clicking the Integrated Alerting option, go to Notification Channels to configure the destination for your alerts. At the time of writing, email (via your SMTP server), Slack, and PagerDuty are supported.

Creating a Custom Alert Template

Alerts are defined using MetricsQL, which is backward compatible with PromQL, the Prometheus query language. As an example, let’s configure an alert that tells us when MongoDB is down.

First, open the Explore option from the left menu. This is where you can experiment with the available metrics and build the expressions for your alerts.
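Explore is the interactive way to try expressions. If you would rather run the same kind of query from a script, a Prometheus-compatible metrics store can be queried through the standard Prometheus HTTP API (/api/v1/query). The sketch below assumes that PMM Server exposes this API under /prometheus/api/v1/query and accepts HTTP basic auth; the host pmm.example.com and the credentials are placeholders, so verify both against your own installation.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumption: PMM Server proxies the Prometheus-compatible query API under /prometheus.
QUERY_PATH = "/prometheus/api/v1/query"


def build_query_url(base_url: str, expr: str) -> str:
    """Build an instant-query URL for a PromQL/MetricsQL expression."""
    params = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}{QUERY_PATH}?{params}"


def run_query(base_url: str, expr: str, user: str, password: str) -> list:
    """Run the query and return the result series (performs a network call)."""
    req = urllib.request.Request(build_query_url(base_url, expr))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        payload = json.load(resp)
    # Standard Prometheus response shape: {"status": "success", "data": {"result": [...]}}
    return payload["data"]["result"]


if __name__ == "__main__":
    # Placeholder host; replace it with your PMM Server address.
    print(build_query_url("https://pmm.example.com", 'up{service_type="mongodb"}'))
```

Keeping URL construction separate from the network call makes it easy to check what will be sent before pointing the script at a real server.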

To detect that MongoDB is down, one option is the up metric, which is 1 when a scrape target is reachable and 0 when it is not. The following expression returns it for every MongoDB service:

up{service_type="mongodb"}

To validate this, I shut down one member of a 3-node replica set and verified that the expression returns 0 for that member while it is down.
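The alert expression in the template that follows appends == 0 to this selector. In PromQL, a comparison without the bool modifier acts as a filter: series for which the comparison is false are dropped from the result instead of being returned with a value of 0, so only the down services remain. The Python snippet below is a toy model of that behavior, not how PMM evaluates queries; the sample series and service names are hypothetical.

```python
from typing import Dict, List, Tuple

# A series is modeled as (labels, current value); a toy model, not PMM internals.
Series = Tuple[Dict[str, str], float]


def select(series: List[Series], **matchers: str) -> List[Series]:
    """Mimic an equality label selector such as up{service_type="mongodb"}."""
    return [s for s in series if all(s[0].get(k) == v for k, v in matchers.items())]


def filter_eq(series: List[Series], value: float) -> List[Series]:
    """Mimic a comparison without 'bool': keep only series where it is true."""
    return [s for s in series if s[1] == value]


# Hypothetical samples: a 3-node replica set with one member stopped, plus a MySQL service.
samples: List[Series] = [
    ({"service_type": "mongodb", "service_name": "mongo1"}, 1.0),
    ({"service_type": "mongodb", "service_name": "mongo2"}, 0.0),
    ({"service_type": "mongodb", "service_name": "mongo3"}, 1.0),
    ({"service_type": "mysql", "service_name": "mysql1"}, 0.0),
]

mongo_up = select(samples, service_type="mongodb")  # three MongoDB series
firing = filter_eq(mongo_up, 0.0)                   # only the stopped member remains
print([lbl["service_name"] for lbl, _ in firing])   # ['mongo2']
```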

The next step is creating a template for this alert. I won’t go into much detail here; see Integrated Alerting Design in Percona Monitoring and Management for more information about how templates are defined.

Navigate to the Integrated Alerting page again, click the Add button, and add the following template:

---
templates:
  - name: MongoDBDown
    version: 1
    summary: MongoDB is down
    expr: |-
      up{service_type="mongodb"} == 0
    severity: critical
    annotations:
      summary: MongoDB is down ({{ $labels.service_name }})
      description: |-
        MongoDB {{ $labels.service_name }} on {{ $labels.node_name }} is down
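The {{ $labels.service_name }} and {{ $labels.node_name }} placeholders in the annotations are Go template expressions, filled in with the labels of the series that triggered the alert. As a rough mental model only, the Python sketch below performs the same substitution for simple {{ $labels.name }} placeholders using a regular expression. Real Go templates support much more (functions, conditionals), and the label values here are hypothetical.

```python
import re
from typing import Dict

# Matches simple placeholders such as {{ $labels.service_name }} (whitespace optional).
PLACEHOLDER = re.compile(r"\{\{\s*\$labels\.(\w+)\s*\}\}")


def render(text: str, labels: Dict[str, str]) -> str:
    """Substitute {{ $labels.X }} with labels['X'].

    Simplified stand-in for Go templating: this sketch renders a missing label
    as an empty string; the real template engine may behave differently.
    """
    return PLACEHOLDER.sub(lambda m: labels.get(m.group(1), ""), text)


# Hypothetical labels of a firing series.
labels = {"service_name": "mongo2", "node_name": "db-node-2"}
print(render("MongoDB is down ({{ $labels.service_name }})", labels))
print(render("MongoDB {{ $labels.service_name }} on {{ $labels.node_name }} is down", labels))
```

A placeholder only produces useful text if the alert expression's result actually carries that label, which is worth checking whenever the expression aggregates with by(...).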

Next, go to Alert Rules and create a new rule based on this template. In the Filters section, you can add comma-separated “key=value” pairs to restrict alerts to specific nodes, services, agents, and so on.

For example: node_id=/node_id/123456, service_name=mongo1, agent_id=/agent_id/123456
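Conceptually, a filter list like the one above is a set of label conditions that must all hold for an alert to match. The Python sketch below parses such a string and checks it against a series' labels. It assumes exact string equality for every pair; that is an assumption made for illustration, so check the PMM documentation for the actual matching semantics (for example, whether regular expressions are supported).

```python
from typing import Dict


def parse_filters(spec: str) -> Dict[str, str]:
    """Parse 'k1=v1, k2=v2' into a dict; whitespace around pairs is ignored."""
    filters: Dict[str, str] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        filters[key.strip()] = value.strip()
    return filters


def matches(filters: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Assumed semantics: every filter key must be present with an identical value."""
    return all(labels.get(k) == v for k, v in filters.items())


spec = "node_id=/node_id/123456, service_name=mongo1, agent_id=/agent_id/123456"
parsed = parse_filters(spec)
print(parsed["service_name"])  # mongo1
print(matches(parsed, {"node_id": "/node_id/123456", "service_name": "mongo1",
                       "agent_id": "/agent_id/123456"}))  # True
print(matches(parsed, {"node_id": "/node_id/123456", "service_name": "mongo2",
                       "agent_id": "/agent_id/123456"}))  # False
```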

When you are done, click Save and go to the Alerts dashboard to check whether the alert is firing.

From this page, you can also silence any firing alerts.

If you configured email as a destination, you should also have received a notification message.

For now, only a single notification is sent; customizing this behavior is planned for a future release.

Creating MongoDB Alerts

In addition to the obvious “MongoDB is down” alert, there are several more things we should monitor. For starters, I’d suggest creating alerts for the following conditions:

  • Replica set member in an unusual state (neither PRIMARY nor SECONDARY)

mongodb_replset_member_state != 1 and mongodb_replset_member_state != 2

  • Connections higher than expected

avg by (service_name) (mongodb_connections{state="current"}) > 5000

  • Cache evictions higher than expected

avg by(service_name, type) (rate(mongodb_mongod_wiredtiger_cache_evicted_total[5m])) > 5000

  • Low WiredTiger tickets

avg by(service_name, type) (max_over_time(mongodb_mongod_wiredtiger_concurrent_transactions_available_tickets[1m])) < 50
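The replica set expression keeps only members whose numeric state is neither 1 (PRIMARY) nor 2 (SECONDARY). To see which situations would fire, here is a small Python lookup of MongoDB's replica set member state codes. Note that an arbiter (state 7) is neither PRIMARY nor SECONDARY, so it would match the expression as well; if your deployment includes arbiters, consider excluding that state too. The function name is hypothetical and exists only for this illustration.

```python
# MongoDB replica set member state codes (as listed in the MongoDB manual).
MEMBER_STATES = {
    0: "STARTUP",
    1: "PRIMARY",
    2: "SECONDARY",
    3: "RECOVERING",
    5: "STARTUP2",
    6: "UNKNOWN",
    7: "ARBITER",
    8: "DOWN",
    9: "ROLLBACK",
    10: "REMOVED",
}


def is_unusual(state: int) -> bool:
    """Mirror of: mongodb_replset_member_state != 1 and mongodb_replset_member_state != 2."""
    return state != 1 and state != 2


# List every state code that would make the alert fire.
for code, name in sorted(MEMBER_STATES.items()):
    if is_unusual(code):
        print(code, name)
```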

The thresholds listed above are for illustrative purposes only; you need to choose values appropriate for your specific environment(s).
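When choosing a threshold for the cache eviction alert, keep in mind that rate(...[5m]) converts an ever-increasing counter into the average per-second increase over the last five minutes, so the 5000 in that expression means evictions per second, not per five minutes. The Python sketch below computes that per-second average from counter samples. It handles counter resets, but it deliberately omits the extrapolation to the window edges that the real rate() function performs, and the sample numbers are made up.

```python
from typing import List, Tuple

# (unix timestamp in seconds, cumulative counter value); hypothetical samples.
Sample = Tuple[float, float]


def simple_rate(samples: List[Sample]) -> float:
    """Average per-second increase across the samples.

    A drop in value is treated as a counter reset (restart from zero).
    Unlike PromQL/MetricsQL rate(), there is no extrapolation to the window edges.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Five minutes of samples, one every 60 s: 1.8M evictions in 300 s.
window = [(0, 1_000_000), (60, 1_300_000), (120, 1_700_000),
          (180, 2_100_000), (240, 2_500_000), (300, 2_800_000)]
print(simple_rate(window))  # 6000.0 evictions/s, above a threshold of 5000
```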

As another example, let’s add an alert template for low WiredTiger tickets:

---
templates:
  - name: MongoDBWiredTigerTicketsLow
    version: 1
    summary: MongoDB WiredTiger tickets low
    expr: |-
      avg by(service_name, type) (max_over_time(mongodb_mongod_wiredtiger_concurrent_transactions_available_tickets[1m])) < 50
    severity: warning
    annotations:
      description: |-
        Available WiredTiger {{ $labels.type }} tickets on {{ $labels.service_name }} are below 50
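The ticket expression works in two steps: max_over_time(...[1m]) takes, for each individual series, the highest value seen during the last minute, and avg by(service_name, type) then averages those per-series maxima within each (service_name, type) group. Only the grouping labels survive the aggregation, so service_name and type are the only labels left for annotations to reference. The Python sketch below reproduces these two steps on made-up data; it illustrates the semantics and is not PMM's query engine.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Each series: (labels, values observed during the last minute); hypothetical data.
Series = Tuple[Dict[str, str], List[float]]

series: List[Series] = [
    ({"service_name": "mongo1", "node_name": "n1", "type": "read"}, [120.0, 90.0, 110.0]),
    ({"service_name": "mongo1", "node_name": "n1", "type": "write"}, [40.0, 30.0, 45.0]),
    ({"service_name": "mongo2", "node_name": "n2", "type": "read"}, [128.0, 128.0, 127.0]),
    ({"service_name": "mongo2", "node_name": "n2", "type": "write"}, [128.0, 126.0, 128.0]),
]


def max_over_time(values: List[float]) -> float:
    """Highest sample value inside the window, per series."""
    return max(values)


def avg_by(rows: List[Series], by: Tuple[str, ...]) -> Dict[Tuple[str, ...], float]:
    """Average the per-series maxima within each group; only the 'by' labels are kept."""
    groups: Dict[Tuple[str, ...], List[float]] = defaultdict(list)
    for labels, values in rows:
        groups[tuple(labels[k] for k in by)].append(max_over_time(values))
    return {key: mean(vals) for key, vals in groups.items()}


result = avg_by(series, ("service_name", "type"))
low = {key: v for key, v in result.items() if v < 50}  # the '< 50' filter
print(low)  # only ('mongo1', 'write') is below 50
```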

Conclusion

Integrated Alerting is a really nice feature to have. Although it is still in technical preview, it already includes a few built-in alerts you can test, and you can also define your own. Make sure to check the official Integrated Alerting documentation for more information about this topic.

Do you have any specific MongoDB alerts you’d like to see? Given that the feature is still in technical preview, any contributions and feedback about its functionality are welcome, as we’re looking to release it as GA very soon!
