Monitoring ENTERPRISE
Monitoring the integrity and performance of Styra DAS is important to deliver a highly available service.
Styra DAS supports pushing system and time-series metrics to a remote server rather than requiring that these metrics be pulled. Supported formats include dogstatsd and signalfx. Metrics exporters can be configured through the UI in Workspace >> Settings under Data Metrics Targets. For more information, see the Monitoring Integrations page.
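When verifying that a push target is receiving data, it can help to decode a received datagram by hand. The following is a minimal sketch of a decoder for the general DogStatsD datagram layout (`name:value|type|@sample_rate|#tags`). The function name and the metric name in the usage note are hypothetical and are not taken from Styra DAS, and the sketch assumes numeric values only.

```python
def parse_dogstatsd(datagram):
    """Decode one DogStatsD metric datagram into a dict.

    Assumes the layout name:value|type[|@sample_rate][|#tag1,tag2]
    and a numeric value.
    """
    name, _, rest = datagram.partition(":")
    fields = rest.split("|")
    metric = {
        "name": name,
        "value": float(fields[0]),
        "type": fields[1],
        "sample_rate": 1.0,  # default when no @rate field is present
        "tags": [],
    }
    # Optional fields are identified by their leading character.
    for extra in fields[2:]:
        if extra.startswith("@"):
            metric["sample_rate"] = float(extra[1:])
        elif extra.startswith("#"):
            metric["tags"] = extra[1:].split(",")
    return metric
```

For example, `parse_dogstatsd("example.requests:1|c|#env:dev")` yields a counter named `example.requests` with a single tag.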
Top Line Metrics
The following are the most important top line metrics for Styra DAS:
- 5xx responses from ingress load balancer to clients.
- Pod restart counts.
- Postgres disk utilization.
These metrics should be collected from your infrastructure because Styra DAS does not have a way to obtain them.
How you collect metrics from the ingress load balancer depends on the particular load balancer you use. Request latency is useful for observing performance, while the 5xx response count indicates that users are currently having problems. Styra recommends alerting when more than five 5xx responses are observed within a ten-minute window.
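The recommended 5xx rule can be sketched as a sliding-window counter. This is an illustrative sketch only: the class name is hypothetical, and it assumes you already have a way to feed it each response's status code and timestamp from your load balancer's logs or metrics.

```python
from collections import deque
from datetime import datetime, timedelta


class FiveXXAlert:
    """Sliding-window counter for 5xx responses (hypothetical helper)."""

    def __init__(self, threshold=5, window=timedelta(minutes=10)):
        self.threshold = threshold
        self.window = window
        self._events = deque()  # timestamps of recent 5xx responses

    def record(self, status, at):
        """Record one response; return True when the alert should fire."""
        if 500 <= status <= 599:
            self._events.append(at)
        # Drop 5xx events that have aged out of the window.
        while self._events and at - self._events[0] > self.window:
            self._events.popleft()
        # Fire only when MORE than `threshold` 5xx responses remain.
        return len(self._events) > self.threshold
```

In practice, most monitoring systems can express the same rule natively as a threshold on a windowed count, which is usually preferable to custom code.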
Pod restart counts are obtained from Kubernetes metrics. Styra DAS pods should not be restarted by Kubernetes. Typically, pods are restarted because they run out of memory or hit unrecoverable internal service errors. Styra recommends alerting when a pod is restarted more than once in a day.
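Kubernetes reports restarts as a cumulative count per container, so the daily rule amounts to comparing samples taken over a 24-hour period. The sketch below assumes you have already collected those samples for one pod; the function names are hypothetical.

```python
def restarts_in_period(samples):
    """Return the number of restarts across ordered cumulative samples."""
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # A drop means the counter reset (for example, the pod was
            # recreated), so count the new value from zero.
            total += cur
    return total


def restart_alert(samples, max_restarts=1):
    """Return True when a pod restarted more than `max_restarts` times."""
    return restarts_in_period(samples) > max_restarts
```

For example, `restart_alert([3, 3, 4])` is `False` (one restart), while `restart_alert([3, 4, 5])` is `True` (two restarts).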
Postgres disk utilization metrics are available from the Postgres storage solution. It is important to allocate more space to Postgres before it fills up its storage, because Postgres stores Styra DAS configuration and records of all decisions made by your OPAs. If Postgres runs out of space, Styra DAS cannot write more decisions, which causes cascading failures in the rest of DAS. Styra recommends alerting when Postgres has used 80% or more of its disk space.
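The 80% rule is a simple ratio check. The sketch below is illustrative: the function names are hypothetical, and `check_local_volume` only applies if the Postgres data volume is mounted on the machine running the check. Managed Postgres services typically expose storage usage through their own metrics instead.

```python
import shutil


def disk_utilization(used_bytes, total_bytes):
    """Return the fraction of disk space in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def postgres_disk_alert(used_bytes, total_bytes, threshold=0.80):
    """Return True when utilization is at or above the threshold."""
    return disk_utilization(used_bytes, total_bytes) >= threshold


def check_local_volume(path):
    """Apply the rule to a locally mounted volume (path is an assumption)."""
    usage = shutil.disk_usage(path)
    return postgres_disk_alert(usage.used, usage.total)
```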
Additional Metrics
Each Styra DAS microservice exposes a Prometheus metrics endpoint located at /v1/system/metrics.
These metrics give a general overview of the health of each microservice and are not critical to monitor.
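To inspect one of these endpoints by hand, you can fetch it and parse the Prometheus text format. The following is a minimal sketch using only the Python standard library. The `base_url` argument is a hypothetical service address, the sketch assumes the endpoint is reachable from where the script runs, and the parser handles only simple sample lines.

```python
import urllib.request


def parse_metrics(text):
    """Parse Prometheus text-format samples into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Keep the label set as part of the series key.
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue  # malformed line with no value
        samples[series] = float(fields[0])  # ignore optional timestamp
    return samples


def fetch_metrics(base_url):
    """Fetch and parse metrics; base_url is a hypothetical service address."""
    url = base_url + "/v1/system/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

For anything beyond ad hoc inspection, a Prometheus server or compatible scraper is the more usual way to consume these endpoints.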