Securing DataOps: The Case for Satori

by Jamie Lewis, Venture Partner

Few business leaders would argue against the value of making data-driven decisions. Today, data analytics give businesses insights in — or near — real-time, allowing them to respond to changing markets and customer needs. These benefits have driven a rapid evolution of the decades-old concept of the data warehouse. “Big data” and platforms such as Hadoop emerged, followed by “data as a service” and data lakes. Services such as Snowflake, Amazon Redshift Spectrum, and Google BigQuery brought massive scale to data operations by leveraging the cloud’s distributed and elastic nature, creating huge repositories that encompass both data warehouses and lakes.

These technologies evolved in lock-step with the use cases for the data itself. The days of putting in a request for data and waiting hours, much less days or weeks, are long gone. Today, data is a self-serve resource: data scientists can access whatever data they need whenever they need it, which makes data operations an essential capability for many companies. And just as DevOps transformed application development, DataOps has changed how data is stored, accessed, and used, especially in cloud-native environments.

While they are an obvious boon to the business, these self-serve data models create new risks. As companies concentrate data in these huge repositories, the security controls on the original data sources — which don’t fit either cloud technology or the self-serve use case — get left behind. Both security and compliance problems can occur as a result. Without proper controls, people can access personally identifiable information (PII) that regulations, security policies, or privacy policies intended to protect. More concerning still, security teams can quickly lose sight of how data flows in the organization, making it impossible to provide any level of consistent governance.

In the world of self-serve data, then, security leaders must strike a careful balance between the business’s need to move quickly and the security controls necessary to manage these risks. But that’s easier said than done.

These massive repositories contain information from many sources, sometimes in their original formats, with multiple points of ingress and egress. Building a new control layer on each data source that feeds into the repository isn’t just extremely difficult. It’s a patchwork job whose complexity comes close to guaranteeing failure, and one the organization must repeat every time it adopts a new data platform. Architectural inertia is pushing organizations to build new controls that operate across multiple data services and many data types, often at the application layer. And that means the responsibility for building, monitoring, and maintaining these controls is often shared with, if not owned outright by, data operations people. The question for data leaders is how to work with security and privacy leaders to enable controls that manage compliance and security risk while meeting the business’s needs.

Satori’s Approach: Agile Data Governance and Security

Satori co-founders Eldad Chai and Yoav Cohen understood this problem based on their previous experience securing big data systems. So they set out to build a DataOps security and privacy control layer. The Satori Secure Data Access Platform provides data governance and security for cloud data stores, allowing organizations to apply controls consistently across services such as Snowflake, BigQuery, and Redshift, independently of the data store or type.

Satori’s platform delivers these data governance functions:

  • Fine-grained policies: Satori can make access control decisions based on a variety of factors, including user identities, groups, data types, and schema. Teams can manage policies as code via APIs or through the Satori console. The platform comes with policies for implementing the NIST Cybersecurity Framework (CSF), the Payment Card Industry Data Security Standard (PCI DSS), and others.
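To make the idea of fine-grained, policy-as-code access decisions concrete, here is a minimal sketch in Python. It is not Satori’s API or policy format: the `Rule`, `Request`, and `decide` names, the three actions, and the "most restrictive action wins" rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical actions, ordered from least to most restrictive.
SEVERITY = {"allow": 0, "mask": 1, "deny": 2}


@dataclass(frozen=True)
class Rule:
    groups: FrozenSet[str]      # groups the rule applies to
    data_types: FrozenSet[str]  # data classifications the rule covers
    action: str                 # "allow", "mask", or "deny"


@dataclass(frozen=True)
class Request:
    user: str
    groups: FrozenSet[str]
    data_types: FrozenSet[str]  # classifications the query would touch


def decide(request: Request, rules: List[Rule], default: str = "deny") -> str:
    """Return the most restrictive action across the requested data types.

    For each data type, the first matching rule wins; a data type with no
    matching rule falls back to `default`. A request that touches no
    classified data types is allowed.
    """
    overall = "allow"
    for data_type in request.data_types:
        action = default
        for rule in rules:
            if data_type in rule.data_types and request.groups & rule.groups:
                action = rule.action
                break
        if SEVERITY[action] > SEVERITY[overall]:
            overall = action
    return overall


# Example policy set (illustrative values only).
RULES = [
    Rule(frozenset({"finance"}), frozenset({"card_number"}), "mask"),
    Rule(frozenset({"analysts", "finance"}), frozenset({"order_total"}), "allow"),
]

analyst = Request("dana", frozenset({"analysts"}), frozenset({"order_total"}))
print(decide(analyst, RULES))  # the analyst group may read order totals
```

Because the rules are plain data, they can be kept in version control and reviewed like any other code, which is the practical appeal of managing policies as code.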

Architecturally Speaking

Satori is a transparent proxy service that consists of two key components: the Context Engine and the Policy Engine. The Context Engine asynchronously inspects all queries and their results, building a map of how data flows in the environment and how the organization is using it. Depending on the data access context, the Policy Engine applies the policy for accessing a specific type of data.
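The division of labor between the two components can be sketched roughly as follows. This is an illustrative model, not Satori’s implementation: the class and method names, the `QueryEvent` fields, and the use of an in-memory queue to stand in for asynchronous inspection are all assumptions.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class QueryEvent:
    user: str
    data_store: str
    data_types: Tuple[str, ...]  # classifications seen in the query or results


class ContextEngine:
    """Builds a map of which users touch which data types in which stores."""

    def __init__(self) -> None:
        self._pending = deque()
        self._flows: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def submit(self, event: QueryEvent) -> None:
        # Cheap enqueue on the query path; analysis happens later.
        self._pending.append(event)

    def process_pending(self) -> int:
        # Drain the queue off the query path and update the data flow map.
        processed = 0
        while self._pending:
            event = self._pending.popleft()
            for data_type in event.data_types:
                self._flows[(event.data_store, data_type)].add(event.user)
            processed += 1
        return processed

    def consumers(self, data_store: str, data_type: str) -> Set[str]:
        return set(self._flows.get((data_store, data_type), ()))


class PolicyEngine:
    """Looks up the policy that applies to a given type of data."""

    def __init__(self, policies: Dict[str, str], default: str = "allow") -> None:
        self._policies = dict(policies)
        self._default = default

    def policy_for(self, data_type: str) -> str:
        return self._policies.get(data_type, self._default)


context = ContextEngine()
context.submit(QueryEvent("dana", "warehouse", ("email", "order_total")))
context.submit(QueryEvent("lee", "warehouse", ("email",)))
context.process_pending()

policy = PolicyEngine({"email": "mask"})
print(sorted(context.consumers("warehouse", "email")), policy.policy_for("email"))
```

Separating the cheap `submit` call from `process_pending` mirrors the idea that inspection should not sit in the critical path of a query.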

In implementing these functions, Satori prioritized reliability and low latency to ensure the Secure Data Access Platform balances the business’s performance and security needs appropriately.

Satori accomplishes that goal by:

  • Using a proven network proxy for reliability and performance: While they can be effective, application layer proxies require organizations to add another component to their technology stack, increasing both complexity and the potential for higher latency. Instead, Satori based its service on Nginx, a proven and reliable network proxy. (Nginx reportedly served or proxied 25.75 percent of the busiest websites in August 2020 and 36.45 percent of all active sites.) Consequently, Satori doesn’t require organizations to add application proxy components to their technology stacks, and Satori can focus on its query inspection, data mapping, and policy application functions while relying on Nginx for performance and reliability.

Deployment Options and Future Directions

The Secure Data Access Platform is fully containerized and runs on Kubernetes. Satori hosts the cloud service, allowing organizations to start very quickly. Alternatively, organizations can run the service on-prem in their Kubernetes clusters.

The asynchronous architecture enabled by dynamic inlining can also operate in either fail-open or fail-close configurations, allowing organizations to balance their overall performance and security approach. In a fail-open configuration (the default), the Data Access Controller will not interrupt connections between data consumers and data stores if the service is offline. In a fail-close configuration, no queries go through unless the Data Access Controller is inspecting them.
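The difference between the two configurations can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not Satori’s code: `run_query`, `FailureMode`, and the exception names are hypothetical, and inspection is called inline here for brevity, whereas the platform described above inspects asynchronously.

```python
from enum import Enum
from typing import Callable


class FailureMode(Enum):
    FAIL_OPEN = "fail-open"    # keep serving queries if inspection is down
    FAIL_CLOSE = "fail-close"  # block queries that cannot be inspected


class InspectorUnavailable(Exception):
    """Raised by the (hypothetical) inspector when it is offline."""


class QueryBlocked(Exception):
    """Raised when a query is refused because it could not be inspected."""


def run_query(
    query: str,
    execute: Callable[[str], object],
    inspect: Callable[[str], None],
    mode: FailureMode = FailureMode.FAIL_OPEN,
) -> object:
    try:
        inspect(query)
    except InspectorUnavailable:
        if mode is FailureMode.FAIL_CLOSE:
            raise QueryBlocked("inspection unavailable; query refused")
        # Fail-open: fall through and run the query uninspected.
    return execute(query)


def offline_inspector(query: str) -> None:
    raise InspectorUnavailable("controller is offline")


# With the default fail-open mode, the query still reaches the data store.
print(run_query("SELECT 1", lambda q: "rows", offline_inspector))
```

The trade-off is the usual one: fail-open favors availability for data consumers, while fail-close favors the guarantee that nothing runs uninspected.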

Today, Satori is focused on securing data warehouses and data lakes, including services such as Snowflake, Amazon Redshift, and Google BigQuery. But given its role as a medium between data consumers and data stores, Satori brings a level of future-proofing with it. As data operations and technologies continue to evolve, Satori can grow with them, providing its dynamic policy and data mapping capabilities. In the long term, Satori envisions using its architecture to help organizations with other aspects of data operations, such as performance management and troubleshooting.

Conclusion

The ongoing and rapid evolution of data operations technologies and services makes data access governance both more difficult and more necessary. Satori’s Secure Data Access Platform balances the business’s performance needs and security requirements, creating an agile governance layer that works across multiple data stores and types. The company’s focus on reliability and low latency shows in its product architecture — and the results seen by its early customers — yielding an effective solution that can evolve as data operations and services change. And that’s why we invested in the business.

Originally published at https://www.raincapital.vc on September 21, 2020.

Rain Capital is a cybersecurity venture fund based in the San Francisco bay area. A women-led and -managed fund, Rain invests in disruptive security companies.
