
Overview

A Data Source provides an API for reading and writing JSON data that can be imported and used to make policy decisions. A Data Source can be added to Systems so that the data, and the policies that use it, can be shared as appropriate. The data is versioned and stored compactly using delta encoding, which handles large and frequently changing JSON objects efficiently. Each Data Source requires a unique, hierarchical name.

The Data Source name is used in the following two ways:

  • The Data Source name references the data provided by the Data Source from within a policy, similar to how you use a policy's package name to refer to it.

  • The Data Source name is embedded within the Styra DAS API used to read or write JSON data.
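The two uses of the name can be sketched as follows. This is a minimal illustration, not the Styra DAS implementation: it assumes the hierarchical name is a `/`-separated path, that the data is referenced in Rego under the `data` root, and that the API path starts with `/v1/data`. The helper names `split_name`, `rego_reference`, and `data_api_path` are hypothetical.

```python
import json
import re
from urllib.parse import quote

# Segments matching this pattern can use dot notation in a Rego reference.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def split_name(name: str) -> list[str]:
    """Split a hierarchical name into segments, rejecting empty segments."""
    segments = name.split("/")
    if not name or any(not s for s in segments):
        raise ValueError(f"invalid Data Source name: {name!r}")
    return segments


def rego_reference(name: str) -> str:
    """Build the reference a policy would use to read the data."""
    ref = "data"
    for segment in split_name(name):
        if _IDENTIFIER.match(segment):
            ref += "." + segment
        else:
            # Non-identifier segments need bracket notation in Rego.
            ref += "[" + json.dumps(segment) + "]"
    return ref


def data_api_path(name: str, prefix: str = "/v1/data") -> str:
    """Build the API path for the same name (the prefix is an assumption)."""
    return prefix + "/" + "/".join(quote(s, safe="") for s in split_name(name))


print(rego_reference("global/teams/roles"))  # data.global.teams.roles
print(data_api_path("global/teams/roles"))   # /v1/data/global/teams/roles
```

Deriving both forms from one list of segments keeps the policy reference and the API path in sync when a Data Source is renamed.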

Supported Data Source Types

Styra DAS supports the following Data Source types:

  • Amazon S3 for bundle import

    A Data Source that downloads a Rego bundle in .tar.gz archive format from an Amazon S3 bucket.

  • Amazon S3 for data import

    A Data Source that reads files with .json, .yaml, or .xml extensions from an Amazon S3 bucket.

  • GCS for bundle import

    A Data Source that downloads a Rego bundle in .tar.gz archive format from a GCS bucket.

  • GCS for data object import

    A Data Source that reads files with .json, .yaml, or .xml extensions from a GCS bucket.

  • Git for data import

    A Data Source that reads files with .json, .yaml, or .xml extensions from a Git repository.

  • HTTPS

    A Data Source that accesses data from an external server through a URL.

  • JSON

    A Data Source that opens an endpoint for writing data in JSON format.

  • LDAP

    A Data Source that reads data from a configured LDAP service.

  • Okta

    A Data Source that reads users, groups, roles, and applications from Okta services.
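The data import types for Amazon S3, GCS, and Git all select files by extension. The sketch below shows that kind of filter over a list of object keys or file paths. It is only an illustration: the function name `importable_files` is hypothetical, and matching extensions case-insensitively is an assumption rather than documented behavior.

```python
from pathlib import PurePosixPath

# Extensions listed for the S3, GCS, and Git data import types.
SUPPORTED_EXTENSIONS = {".json", ".yaml", ".xml"}


def importable_files(keys):
    """Return the keys whose final extension is one of the supported ones."""
    return [
        key
        for key in keys
        # Lower-casing the suffix is an assumption, not documented behavior.
        if PurePosixPath(key).suffix.lower() in SUPPORTED_EXTENSIONS
    ]


keys = ["data/users.json", "data/roles.yaml", "bundles/policy.tar.gz", "README"]
print(importable_files(keys))  # ['data/users.json', 'data/roles.yaml']
```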

Large Dataset Support

Styra DAS stores bundles similarly to OPA. However, Styra DAS bundles use a different file format, which stores data in a way that allows large datasets to be processed more efficiently.

By default, large datasets are configured with the following options:

  • Minimum polling interval = 10 seconds
  • Manual data source executions = unlimited
  • Maximum data source size = 100 MB
  • Number of data sources = unlimited
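A client could check a planned configuration against these defaults before submitting it. The following is a minimal sketch under stated assumptions: `check_defaults` is a hypothetical helper, and 100 MB is interpreted as 100,000,000 bytes because the unit is not defined more precisely here.

```python
MIN_POLLING_INTERVAL_SECONDS = 10
# Assumption: 100 MB is read as decimal megabytes (100 * 10^6 bytes).
MAX_DATA_SOURCE_BYTES = 100 * 1000 * 1000


def check_defaults(polling_interval_seconds: float, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the defaults are met."""
    problems = []
    if polling_interval_seconds < MIN_POLLING_INTERVAL_SECONDS:
        problems.append(
            f"polling interval {polling_interval_seconds}s is below the "
            f"{MIN_POLLING_INTERVAL_SECONDS}s minimum"
        )
    if size_bytes > MAX_DATA_SOURCE_BYTES:
        problems.append(
            f"data source size {size_bytes} bytes exceeds the "
            f"{MAX_DATA_SOURCE_BYTES} byte maximum"
        )
    return problems


print(check_defaults(30, 5_000_000))  # []
```

Manual executions and the number of data sources are unlimited by default, so the sketch does not check them.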