Monitor Styra DAS
Monitoring the integrity and performance of Styra DAS is important to deliver a highly available service.
Styra DAS can also push system and time-series metrics to a remote server. Supported formats include dogstatsd and signalfx. To configure a metrics exporter, open the Styra DAS UI and go to Workspace >> Settings >> Data Metrics Targets.
KPI Metrics
The following are the most important top-line metrics for Styra DAS:
- 5xx responses from ingress load balancer to clients
- Pod restart counts
- Postgres disk utilization
You must collect these metrics from your own infrastructure, because Styra DAS has no way to obtain them itself.
How you collect ingress load balancer metrics depends on the load balancer you use. Request latency is useful for observing performance, and the 5xx response count shows that users are actively experiencing errors. We recommend alerting when more than five 5xx responses are observed within a ten-minute window.
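The 5xx alert rule above can be expressed as a sliding-window counter. This is a minimal sketch, assuming you can feed it each response's timestamp (in seconds, non-decreasing) and HTTP status from your load balancer logs or metrics; the class name `FiveXXAlert` is hypothetical and not part of Styra DAS.

```python
from collections import deque


class FiveXXAlert:
    """Hypothetical sliding-window counter for 5xx responses.

    Fires when more than `threshold` 5xx responses fall within the last
    `window_seconds`, i.e. the "more than five in ten minutes" rule.
    """

    def __init__(self, threshold: int = 5, window_seconds: float = 600.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of 5xx responses, oldest first

    def record(self, timestamp: float, status: int) -> bool:
        """Record one response; return True if the alert condition holds."""
        if 500 <= status <= 599:
            self._events.append(timestamp)
        # Discard 5xx events that are now outside the window.
        while self._events and timestamp - self._events[0] >= self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.threshold
```

In practice, your load balancer or monitoring system usually evaluates this kind of rule for you. The sketch only makes the windowing logic explicit.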
You can obtain pod restart counts from Kubernetes metrics. Under normal operation, Kubernetes should not need to restart Styra DAS pods. Pods are typically restarted because they run out of memory or hit an unrecoverable internal service error. We recommend alerting when a pod is restarted more than once a day.
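Kubernetes reports restarts as a cumulative count per container, so the "more than once a day" rule requires the increase over the last 24 hours. This is a minimal sketch, assuming you already sample one container's restart count as `(timestamp_seconds, count)` pairs sorted by time. The function names are hypothetical. Restarts that occur before the first sample inside the window are not counted, so the result is approximate.

```python
def restarts_in_window(samples, now, window_seconds=86_400):
    """Approximate restarts in the last `window_seconds` from cumulative samples.

    `samples` is a time-sorted list of (timestamp, cumulative_restart_count).
    """
    recent = [(ts, c) for ts, c in samples if now - window_seconds <= ts <= now]
    total = 0
    for (_, prev), (_, cur) in zip(recent, recent[1:]):
        # The counter only grows; a drop is treated as a reset to zero,
        # so the new value itself is the number of restarts since the reset.
        total += cur - prev if cur >= prev else cur
    return total


def should_alert_on_restarts(samples, now, max_per_day=1):
    """True if the container restarted more than `max_per_day` times in 24h."""
    return restarts_in_window(samples, now) > max_per_day
```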
Postgres disk utilization metrics are available from your Postgres storage solution. Postgres stores the Styra DAS configuration and a record of every decision made by every connected OPA, so allocate more space to it before its storage fills up. If Postgres runs out of space, Styra DAS cannot write further decisions, and the resulting write failures cascade throughout the system. We recommend alerting when Postgres has used 80% or more of its disk space.
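The 80% rule can be checked as a simple ratio. This is a minimal sketch, assuming Postgres runs on a volume you can inspect from the same host. The data directory path and function names are hypothetical. With a managed Postgres service, read used and total bytes from your provider's storage metrics instead of calling `shutil.disk_usage`.

```python
import shutil

DISK_ALERT_THRESHOLD = 0.80  # the 80% recommendation above


def disk_utilization(used_bytes: int, total_bytes: int) -> float:
    """Fraction of the volume in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def should_alert_on_disk(used_bytes: int, total_bytes: int,
                         threshold: float = DISK_ALERT_THRESHOLD) -> bool:
    """True once utilization reaches or exceeds the threshold."""
    return disk_utilization(used_bytes, total_bytes) >= threshold


def check_postgres_volume(path: str = "/var/lib/postgresql/data") -> bool:
    """Hypothetical local check: `path` is an assumed Postgres data directory."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return should_alert_on_disk(usage.used, usage.total)
```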
You should also monitor the number of open connections to the database. Once the connection limit is reached, Styra DAS cannot open new connections and therefore cannot save configuration changes or decision logs.
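The query below counts client connections from Postgres's `pg_stat_activity` view and compares the total against the `max_connections` setting. The query uses standard Postgres features (`backend_type` requires Postgres 10 or later). The 80% threshold and the helper name are assumptions, because no specific connection threshold is recommended here. Run the query with any Postgres client and pass the two numbers to the helper.

```python
# Standard Postgres query: client connections versus the configured limit.
CONNECTION_QUERY = """
SELECT count(*) AS open_connections,
       current_setting('max_connections')::int AS max_connections
FROM pg_stat_activity
WHERE backend_type = 'client backend';
"""


def connection_headroom(open_connections: int, max_connections: int,
                        threshold: float = 0.8):
    """Return (utilization, should_alert); the 0.8 threshold is an assumption."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    ratio = open_connections / max_connections
    return ratio, ratio >= threshold
```

Some connection slots are reserved for superusers (`superuser_reserved_connections`), so an application role may hit its limit slightly before `max_connections`.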
Additional Metrics
Each Styra DAS microservice exposes a Prometheus metrics endpoint at /v1/system/metrics.
These metrics give a general overview of the health of each microservice.
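A Prometheus server normally scrapes these endpoints, but a quick manual check can help when you debug a single microservice. This is a minimal sketch, assuming the endpoint is reachable over plain HTTP at a base URL you supply and requires no authentication. The function names are hypothetical. The parser handles only the basic Prometheus text exposition format (comments, `name{labels} value [timestamp]`), and no specific metric names are assumed.

```python
import urllib.request
from typing import Dict


def parse_prometheus_text(text: str) -> Dict[str, float]:
    """Map each series (name plus labels) to its sample value."""
    metrics: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        brace = line.rfind("}")
        if brace != -1:  # series with labels: split after the closing brace
            series, rest = line[: brace + 1], line[brace + 1:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        metrics[series] = float(rest[0])  # an optional timestamp is ignored
    return metrics


def fetch_metrics(base_url: str, timeout: float = 5.0) -> Dict[str, float]:
    """Fetch and parse /v1/system/metrics from one microservice (assumed URL)."""
    url = base_url.rstrip("/") + "/v1/system/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_prometheus_text(resp.read().decode("utf-8"))
```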