Partially Redacted: Data, AI, Security, and Privacy

Partially Redacted brings together leaders in engineering, data, AI, security, and privacy to share knowledge, best practices, and real-world experiences. Each episode provides an in-depth conversation with an industry expert who dives into their background and experience. They'll share practical advice and insights about the techniques, tools, and technologies that every company – and every technology professional – should know about. Learn from an amazing array of founders, engineers, architects, and leaders in the data and AI space. Subscribe to the podcast and join the community at https://skyflow.com/community to stay up to date on the latest trends in data and AI, and to learn what lies ahead.

Listen on:

  • Apple Podcasts
  • Podbean App
  • Spotify
  • Amazon Music
  • iHeartRadio
  • PlayerFM
  • Podchaser

Episodes

Wednesday May 17, 2023

In this episode, Constantine Karbaliotis from nNovation, a certified privacy professional with a wealth of experience in privacy and data protection, joins the show. Constantine has served as a privacy officer for two multinational corporations and now serves multiple organizations as a privacy advisor.
Constantine is well-versed in a range of privacy program management areas, including policy development, implementing PIA/PbD programs, vendor privacy management, breach management and response, addressing notice, consent, and data subject rights issues, as well as contract issues such as data transfer agreements and security/privacy addenda.
During our conversation, we explore the evolution of Canadian data privacy regulations, from their early beginnings to the current landscape shaped by a range of federal and provincial laws. We discuss the primary Canadian privacy regulations that individuals and organizations should be aware of, the differences between federal and provincial privacy laws, and how those differences affect individuals and organizations.
We also delve into how the Canadian government enforces privacy regulations, and the penalties that individuals and organizations can face for non-compliance. Additionally, we examine how recent high-profile data breaches have affected Canadian privacy regulations and the changes made in response.
We explore the challenges posed by emerging technologies, such as artificial intelligence and the Internet of Things, and their impact on Canadian privacy regulations. We also look at how individuals and organizations can stay up-to-date with the latest developments in Canadian privacy regulation and the resources available to help them comply.
Topics:
How has Canadian privacy regulation evolved over the years, and what impact has this had on individuals and organizations?
What are the primary Canadian privacy regulations that individuals and organizations should be aware of?
What are the differences between federal and provincial privacy laws in Canada, and how do they impact individuals and organizations?
How does the Canadian government enforce privacy regulations, and what penalties can individuals and organizations face for non-compliance?
How have recent high-profile data breaches affected Canadian privacy regulations, and what changes have been made in response?
How does Canadian privacy regulation compare to other countries, such as the EU's General Data Protection Regulation (GDPR)?
How can individuals and organizations stay up-to-date with the latest developments in Canadian privacy regulation, and what resources are available to help them comply?
How do emerging technologies, such as artificial intelligence and the Internet of Things, affect Canadian privacy regulations, and what challenges do they pose?
What do you see as the future of Canadian privacy regulation, and how do you think it will continue to evolve in the years to come?
Resources:
nNovation

Wednesday May 10, 2023

In today's digital age, data privacy and security have become critical concerns for companies of all sizes. One way for companies to demonstrate their commitment to protecting customer data is by achieving SOC-2 compliance. But what exactly is SOC-2, and how can companies achieve it?
To answer these questions, Daniel Wong, Head of Security and Compliance at Skyflow, joins the show to share his insights into SOC-2 compliance and the steps companies can take to achieve it.
Throughout the interview, Daniel explains what SOC-2 compliance is, why it's important, and how it differs from other compliance standards. He also walks us through the key steps businesses can take to achieve SOC-2 compliance, including risk assessment, gap analysis, and remediation.
Daniel also highlights the benefits of using Skyflow's platform to achieve SOC-2 compliance, such as its ability to help companies protect sensitive data while still enabling secure data sharing, and discusses the challenges that businesses may face when pursuing SOC-2 compliance and how to overcome them.
Whether you're a business owner or a data privacy professional, this interview with Daniel Wong provides valuable insights into SOC-2 compliance and how to achieve it.
Topics:
Can you explain what SOC-2 compliance is, and why it's important for businesses to achieve it?
What’s the difference between SOC-2 Type 1 and Type 2?
How do these compare to ISO 27001?
How does SOC-2 compliance differ from other compliance standards, such as PCI DSS or HIPAA?
What are some common challenges that businesses face when pursuing SOC-2 compliance, and how can they overcome them?
Can you walk us through the key steps that businesses need to take to achieve SOC-2 compliance?
Skyflow Data Privacy Vault is SOC-2 compliant. How long did that take, and what was involved?
What does that mean for our customers who want to achieve SOC-2 compliance?
What advice would you give to businesses that are just starting their SOC-2 compliance journey?
With something like a car, I can't just manufacture a car in my house and start selling it. There are certain inspections from a safety perspective that I have to pass. Do you think software needs more requirements like this before you can just launch something and start having people use it?
Where do you see standards like SOC-2 going in the future?
Resources:
Common Data Security and Privacy Mistakes with Daniel Wong
Skyflow is Certified SOC 2 Compliant

Wednesday May 03, 2023

Data access control is becoming increasingly important as more and more sensitive data is stored and processed by businesses and organizations. In this episode, Adi Polak, VP of Developer Experience at lakeFS, joins the show to help define data access control and give examples of sensitive data that requires it.
Adi also talks about role-based access control (RBAC), how it differs from traditional access control methods, and the advantages it provides. She walks through the steps involved in implementing RBAC, along with best practices and common challenges, and shares real-world examples of RBAC implementations and the lessons learned from them.
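To make the model concrete, here's a minimal RBAC sketch in Python: users map to roles, roles map to permissions, and every access check resolves through the role layer rather than referencing individual users. The roles and permissions below are illustrative, not lakeFS's actual model.

```python
# Minimal role-based access control (RBAC) sketch: users are assigned
# roles, roles carry permissions, and checks resolve through roles.
ROLE_PERMISSIONS = {
    "analyst": {"read:metrics", "read:reports"},
    "data_engineer": {"read:raw", "write:raw", "read:metrics"},
    "admin": {"read:raw", "write:raw", "read:metrics", "manage:users"},
}

USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"data_engineer", "analyst"},
}

def is_allowed(user: str, permission: str) -> bool:
    """A user is allowed if any of their roles grants the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )

assert is_allowed("alice", "read:metrics")
assert not is_allowed("alice", "write:raw")  # analysts can't write raw data
```

Compared with per-user grants, onboarding, offboarding, and audits operate on a handful of roles instead of thousands of individual permissions.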
We also discuss lakeFS, an open-source platform that provides a Git-like interface for managing data lakes. In particular, we get into the data management controls, the security and privacy features, and the future of the product.
Topics:
What are some common types of data access controls?
Why are these types of controls important?
How can RBAC help organizations better manage and secure their data?
What are some challenges in implementing effective data access controls?
How can organizations balance data security with the need to provide employees with the information they need to do their jobs?
What are some best practices for managing data access control?
How do you ensure that data access controls remain effective over time as your organization grows and changes?
What is lakeFS?
What model of data access management does lakeFS support?
What are some of the other privacy and security features of lakeFS?
What’s next for lakeFS? Anything you can share?
Where do you see data access control going in the next 5-10 years?
Resources:
lakeFS Roadmap
Scaling Machine Learning with Spark: Distributed ML with MLlib, TensorFlow, and PyTorch

Wednesday Apr 26, 2023

Europe has seen a significant evolution in privacy regulation over the past decade, with the introduction of the EU's General Data Protection Regulation (GDPR) in 2018 being a significant milestone. The GDPR establishes a comprehensive framework for protecting personal data and gives individuals greater control over how their data is collected, processed, and used.
The impact of these privacy regulations on businesses has been significant. Companies that operate in the EU or process EU citizens' data must comply with the GDPR's requirements or face significant fines and other penalties. This has required many businesses to implement new processes and technologies to ensure compliance, such as appointing data protection officers, conducting privacy impact assessments, and implementing data subject access request processes.
One particularly tricky situation to navigate for businesses is transatlantic data transfers.
Transatlantic data transfers face numerous challenges, including differing legal frameworks and data protection standards between the European Union (EU) and the United States (US). These differences can create legal uncertainty and potential risks for companies that transfer personal data across the Atlantic. In particular, the invalidation of the EU-US Privacy Shield framework by the European Court of Justice in 2020 has left companies without a clear mechanism for transatlantic data transfers, highlighting the need for a new agreement that meets the requirements of both the EU and the US. Additionally, concerns about government surveillance and data breaches have further complicated the transatlantic data transfer landscape, underscoring the need for strong data protection measures and clear regulatory frameworks.
Privacy and data protection writer and expert Robert Bateman, who has published over 1,500 articles related to privacy, joins the show to break down the evolution of privacy regulations in Europe, the impact that evolution has had on businesses, and the challenges surrounding transatlantic data transfers.
Topics:
How have privacy regulations evolved and what impact have they had for businesses?
Can you discuss some of the history of Meta's regulatory challenges in Europe?
How enforceable are the fines? Do companies actually end up paying the fines?
What are the key concerns around transatlantic data transfers?
How do the cultural differences between the US and EU impact their approach to privacy and what impact has this had?
How do organizations ensure compliance with privacy laws when transferring data between the US and EU?
EU and US data transfers. How do we make progress?
Could someone build Meta from scratch today in a way that complies, or is a business like this something that just can't exist under European privacy laws?
What are your thoughts on the impact that generative AI might have on privacy?
Resources:
Data Protection Newsletter

Wednesday Apr 19, 2023

Zero trust infrastructure is an approach to security that requires all users, devices, and services to be authenticated and authorized before being granted access to resources. Unlike traditional security models that assume everything inside the network is trusted, zero trust assumes that all traffic is untrusted. In today's world, where cyber threats are becoming increasingly sophisticated, zero trust infrastructure is crucial for protecting sensitive data and preventing unauthorized access.
HashiCorp is a company that provides a suite of tools for building and managing secure systems. Their products, such as Vault, Consul, and Boundary, can help organizations implement a zero trust approach to security.
Vault is a tool for securely storing and managing secrets such as passwords, API keys, and certificates. It provides a centralized place to manage access to secrets and has several features to ensure their security, such as encryption, access control, and auditing.
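As a rough illustration of the developer experience, here's a minimal sketch of writing and reading a secret through Vault's KV v2 secrets engine using the hvac Python client; the server address, token, and path are placeholders for a local development setup.

```python
# Minimal sketch: store and retrieve a secret in Vault's KV v2 engine.
# Assumes a reachable Vault server and a token with a suitable policy.
import hvac

client = hvac.Client(url="http://127.0.0.1:8200", token="dev-only-token")

# Write a database credential under secret/data/myapp/db.
client.secrets.kv.v2.create_or_update_secret(
    path="myapp/db",
    secret={"username": "app", "password": "s3cr3t"},
)

# Read it back; Vault enforces access control and audits the access.
resp = client.secrets.kv.v2.read_secret_version(path="myapp/db")
print(resp["data"]["data"]["username"])
```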
Consul is a service discovery and configuration tool that provides a secure way to connect and manage services across different networks. It provides features such as service discovery, health checking, and load balancing, and can be integrated with Vault for secure authentication and authorization.
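For a flavor of how registration works, here's a hedged sketch that registers a service and an HTTP health check with a local Consul agent via its HTTP API; the service name and ports are hypothetical.

```python
# Minimal sketch: register a service plus a health check with a local
# Consul agent, then query for healthy instances of that service.
import requests

registration = {
    "Name": "billing-api",  # hypothetical service name
    "Port": 8080,
    "Check": {"HTTP": "http://localhost:8080/health", "Interval": "10s"},
}
requests.put(
    "http://localhost:8500/v1/agent/service/register",
    json=registration,
).raise_for_status()

# Other services can now discover healthy billing-api instances.
healthy = requests.get(
    "http://localhost:8500/v1/health/service/billing-api",
    params={"passing": "true"},
).json()
print([entry["Service"]["Port"] for entry in healthy])
```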
Boundary is a tool for securing access to infrastructure and applications. It provides a secure way to access resources across different networks and can be integrated with Vault and Consul for secure authentication and authorization.
Rosemary Wang, Developer Advocate at HashiCorp, joins the show to explain zero trust infrastructure and how Vault, Consul, and Boundary help organizations build zero trust into their architecture.
Topics:
Why do you think we need developer tooling for access and authorization at a lower level within someone’s infrastructure?
Can you explain what zero trust is and why it's important for modern security architectures?
How do HashiCorp Vault, Boundary, and Consul fit into a zero trust security model?
What is HashiCorp Vault and what problem does it help a company solve?
What are some common use cases for HashiCorp Vault, and how can it help organizations with their security and compliance requirements?
How does HashiCorp Vault handle secrets rotation and expiration?
What is application-based networking, and how does this concept relate to HashiCorp Consul?
Can you walk us through the process of setting up and configuring HashiCorp Consul for a typical enterprise environment?
What are some common challenges or pitfalls that organizations face when using HashiCorp Consul, and how can they overcome them?
How does Boundary simplify remote access to critical resources in a zero trust environment?
What are some common use cases for HashiCorp Boundary, and how can it help organizations with their security and compliance requirements?
How does HashiCorp approach balancing security with ease of use for its products?
Can you talk about any upcoming features or developments in Vault, Boundary, or Consul that users should be excited about?
Resources:
@joatmon08

Wednesday Apr 12, 2023

The privacy landscape is changing. Consumers are increasingly aware of and concerned about how their personal data is used, and there's an ever-growing list of privacy regulations that companies need to navigate.
Regulations like GDPR, CCPA, and others carry stiff fines for companies that fail to comply with data deletion requests. However, actually deleting someone's information from an existing system is more complicated than you might expect. Large systems have been developed over many years without regard for the impact of PII sprawl. As a consequence, user data is everywhere, and no one actually knows every location where it might exist.
A data privacy vault is an architectural approach to data privacy that helps address data deletion, mapping, and other data privacy challenges. A data privacy vault is an isolated, protected, single source of truth for customer PII.
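Tokenization is what makes this single source of truth workable: downstream systems store opaque tokens while the vault holds the actual values, so deleting one vault record severs every copy of the token at once, wherever it has sprawled. Here's a deliberately simplified sketch of the idea, not any particular vendor's API:

```python
# Simplified data privacy vault sketch: downstream systems only ever
# see tokens; deleting the vault entry makes every stored copy of
# the token meaningless.
import uuid

class PrivacyVault:
    def __init__(self):
        self._store = {}  # token -> PII value

    def tokenize(self, value: str) -> str:
        token = f"tok_{uuid.uuid4().hex}"
        self._store[token] = value
        return token

    def detokenize(self, token: str) -> str | None:
        return self._store.get(token)

    def delete(self, token: str) -> None:
        self._store.pop(token, None)  # honor a deletion request

vault = PrivacyVault()
email_token = vault.tokenize("jane@example.com")
# The warehouse, CRM, and logs all store email_token, never the email.
vault.delete(email_token)
assert vault.detokenize(email_token) is None  # gone everywhere at once
```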
Lisa Nee, Compliance Officer (United States), Data Privacy Legal Expert (North America), and Legal Advisor (Americas) for Atos, and Robert Duffy, Counsel for McDermott Will & Emery with a focus on privacy and cybersecurity, have spent their careers working in privacy. They join the show to discuss why 2023 is the year of privacy, the impact that failing to delete data is having on businesses, and how a data privacy vault along with synthetic data are the keys to addressing these problems.
Topics:
Why is 2023 the year of privacy?
What laws are out there that require deletion?
What is data retention and why is it a risk for businesses?
What is PII sprawl?
What's the cost of violating a requirement to delete someone's data?
How do you fix this problem?
How do you comply?
What is a data vault?
Where did you first learn about this concept?
How does a data vault help address the deletion problem?
What is synthetic data and how does it help with the deletion problem?
What future-looking tools and technologies are you excited about?
Resources:
IEEE Privacy Engineering
Data Mapping, Extracting Data Utility Before Deleting Data Files

Wednesday Apr 05, 2023

Privacy threat modeling is a structured approach to identifying and assessing potential privacy risks associated with a particular system, application, or process. It involves analyzing how personal data flows through a system, identifying potential vulnerabilities or weaknesses, and evaluating the potential consequences of a privacy breach.
The goal of privacy threat modeling is to identify and prioritize potential privacy risks and to develop effective strategies for mitigating those risks. This process involves considering various aspects of the system or process being analyzed, including the data that is collected, how it is stored and processed, who has access to it, and how it is transmitted.
Privacy threat modeling can help organizations better understand their privacy risks and make more informed decisions about how to protect personal data. Implementing privacy measures and conducting regular privacy threat modeling can help organizations minimize the risk of a privacy breach and ultimately save them money in the long run.
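One common way to structure the exercise, as methodologies like LINDDUN do, is to enumerate each data flow in the system and check it against a catalog of privacy threat categories. The sketch below shows that bookkeeping in miniature; the data flows and assessment rules are invented for illustration.

```python
# Toy privacy threat modeling sketch: walk each data flow and record
# which LINDDUN-style threat categories apply, so findings can be
# prioritized and assigned owners.
data_flows = [
    {"name": "app -> analytics warehouse", "fields": ["email", "order_history"]},
    {"name": "app -> third-party mailer", "fields": ["email"]},
]

def assess(flow: dict) -> list[str]:
    findings = []
    if "email" in flow["fields"]:
        # A direct identifier crossing a trust boundary raises at least:
        findings += ["Identifying", "Data disclosure"]
    if len(flow["fields"]) > 1:
        findings.append("Linking")  # combined fields can be correlated
    return findings

for flow in data_flows:
    print(f"{flow['name']}: {assess(flow)}")
```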
Nandita Rao Narla, Head of Technical Privacy & Governance at DoorDash, joins the show to explain privacy threat modeling, the common misconceptions, and how to make a privacy threat model program successful.
Topics:
What is privacy threat modeling?
How do you balance the need to collect and use data with the need to protect privacy, and what role does privacy threat modeling play in this process?
Who typically owns this process in an organization?
What are some of the typical approaches companies follow to privacy threat modeling?
How should companies think about setting up a process to continually iterate and evolve the model?
Once you’ve performed this process, how do you go about fixing the identified issues?
What are some common misconceptions about privacy threat modeling, and how would you address those misconceptions?
How do you determine which threats to prioritize when conducting privacy threat modeling, and what factors do you consider when making these decisions?
How do you involve stakeholders (e.g. customers, employees, regulators) in the privacy threat modeling process, and what benefits do you see in doing so?
What challenges have you encountered when conducting privacy threat modeling, and how have you overcome these challenges?
How does privacy threat modeling differ from other types of risk assessments (e.g. security risk assessments), and what unique challenges does it present?
What advice would you give to other companies looking to implement privacy threat modeling as part of their privacy and security strategy?
How do you see privacy threat modeling evolving in the future?
Resources:
Shostack and Associates Blog
Strategic Privacy by Design
LINDDUN privacy engineering

Wednesday Mar 29, 2023

A data analytics pipeline is important to modern businesses because it allows them to extract valuable insights from the large amounts of data they generate and collect on a daily basis. This leads to better decision making, improved efficiency, and increased ROI.
However, despite your best efforts, sensitive customer data tends to find its way into your analytics pipelines, ending up in your data warehouses and metrics dashboards. Replicating customer PII to downstream services greatly increases your compliance scope and makes maintaining data privacy and security significantly more challenging.
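A common mitigation is to de-identify records at ingestion, before they reach the warehouse, so analytics runs on tokens or masked values rather than raw PII. Here's a minimal sketch of such a transform step; the field names and salted-hash scheme are illustrative, not Skyflow's implementation.

```python
# Minimal sketch of a de-identification step in an ingestion pipeline:
# direct identifiers are replaced before a record lands downstream.
import hashlib

PII_FIELDS = {"email", "phone", "name"}

def de_identify(record: dict, salt: str = "per-dataset-salt") -> dict:
    """Replace PII fields with deterministic, irreversible tokens so
    joins and counts still work while raw values never leave."""
    clean = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            clean[key] = f"tok_{digest[:16]}"
        else:
            clean[key] = value
    return clean

row = {"email": "jane@example.com", "plan": "pro", "mrr": 49}
print(de_identify(row))  # {'email': 'tok_...', 'plan': 'pro', 'mrr': 49}
```

Deterministic hashing preserves joinability across tables at the cost of being guessable for low-entropy fields, which is one reason production systems often prefer random tokens held in a vault.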
In this episode, Piper Keyes, Engineering Lead at Skyflow, joins the show to discuss what goes into building a privacy-aware data pipeline, which tools and technologies you should be using, and how Skyflow addresses this problem.
Topics:
What is a data analytics pipeline?
What does it mean to build a privacy-aware data pipeline?
Can you give some examples of use cases where privacy-aware data pipelines are particularly important?
What does it mean to de-identify data, and how does that work?
What are some common techniques used to preserve privacy in data pipelines?
How does analytics work for de-identified data?
How do you balance the need for data privacy with the need for actually being able to use the data?
What does it take to build a privacy-aware pipeline from scratch?
What are some of the biggest challenges in building privacy-aware data pipelines?
How does something like this work with Skyflow?
Let's say I have customers' transactional data from Visa. How could I ingest that data into my data warehouse while avoiding having to build PCI compliance infrastructure? Walk me through how that works.
Could you build a machine learning model based on the de-identified data?
Once I have the data in my warehouse, let’s say I needed to inform a clinical trial participant about an issue but I also want to maintain their privacy, how could I perform an operation like that?
What other use cases does this product enable?
Resources:
Running Secure Workflows with Sensitive Customer Data
Maximize Privacy while Preserving Utility for Data Analytics

Wednesday Mar 22, 2023

Merit’s verified identity platform brings visibility, liquidity, and trust to people-data, giving organizations the clarity to make better-informed decisions, engage with individuals effectively, and pursue their mission efficiently. Merit works with trusted private, state, and municipal organizations to solve critical real-world problems in sectors such as workforce development, emergency services, licensing, education, and defense readiness.
Merit ingests and processes highly sensitive data from a variety of government agencies. Privacy and security are of the utmost importance, but they must also be balanced against data utility. To support customer and business needs, Merit uses a combination of off-the-shelf data stack tools and technologies along with homegrown techniques for encryption and encryption key management.
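A widely used building block for this kind of key management is envelope encryption: each record is encrypted with its own data key, and the data key is itself wrapped by a master key that can live in a KMS or HSM and be rotated or revoked centrally. The sketch below uses the Python cryptography library to illustrate the pattern; it is not Merit's actual implementation.

```python
# Minimal envelope-encryption sketch: a per-record data key encrypts
# the payload, and a master key wraps the data key. Controlling the
# master key controls access to everything encrypted beneath it.
from cryptography.fernet import Fernet

master = Fernet(Fernet.generate_key())  # in production: key from a KMS/HSM

def encrypt_record(plaintext: bytes) -> tuple[bytes, bytes]:
    data_key = Fernet.generate_key()
    ciphertext = Fernet(data_key).encrypt(plaintext)
    wrapped_key = master.encrypt(data_key)  # stored alongside ciphertext
    return ciphertext, wrapped_key

def decrypt_record(ciphertext: bytes, wrapped_key: bytes) -> bytes:
    data_key = master.decrypt(wrapped_key)
    return Fernet(data_key).decrypt(ciphertext)

ct, wk = encrypt_record(b"ssn=123-45-6789")
assert decrypt_record(ct, wk) == b"ssn=123-45-6789"
```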
Staff engineer and data tech lead Charlie Summers joins the show to break down Merit's data stack, the life of data, the challenges they've faced with protecting sensitive data, and the ways they secure customer data.
Topics:
Can you talk about Merit and your role there?
What kind of data are you typically dealing with at Merit?
What’s your data stack?
Can you take me through the life of a piece of data?
What’s the scale of the data you’re working with? How big is this data set?
What challenges have you faced with securing sensitive data while using this stack?
What tools, technologies, or techniques are you using to protect the data?
How are you balancing the security of the data with the actual utility?
How do you control access to the data?
How does auditing work? Is every time the data is touched logged in some way?
How did you think through build versus buy?
Why are privacy and security priorities for Merit?
What future technologies in this space are you particularly excited about?
Resources:
Careers at Merit

Wednesday Mar 15, 2023

For years, engineers have relied on encryption at rest and in transit to help protect sensitive data. However, data has historically needed to be decrypted in order to use it, which risks exposing the underlying data. Confidential computing is a computing paradigm that aims to protect data in use, not just data in transit or at rest. The goal of confidential computing is to provide a secure computing environment where sensitive data can be processed without the risk of exposure or compromise.
AWS Nitro Enclaves is a service provided by Amazon Web Services (AWS) that enables customers to create isolated compute environments within their Amazon Elastic Compute Cloud (EC2) instances. In a Nitro Enclave, the application code and data are encrypted and processed inside the enclave, ensuring that they are protected from both the hypervisor and the host operating system. This makes Nitro Enclaves ideal for workloads that require a high level of security, such as confidential computing, secure machine learning, and blockchain-based applications.
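An enclave has no network access and no persistent storage; the parent instance communicates with it over a local vsock channel. Here's a minimal sketch of an enclave-side server loop in Python; the port is a placeholder, and a real application would also fetch data keys from KMS using the enclave's attestation document.

```python
# Minimal sketch of a Nitro Enclave server loop. Enclaves talk to the
# parent EC2 instance only over vsock; this code runs inside the
# enclave on Linux with AF_VSOCK support.
import socket

VSOCK_PORT = 5005  # placeholder port agreed with the parent instance

sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
sock.bind((socket.VMADDR_CID_ANY, VSOCK_PORT))
sock.listen(1)

while True:
    conn, _ = sock.accept()
    payload = conn.recv(4096)  # e.g. ciphertext sent by the parent
    # ... process the sensitive payload entirely inside the enclave,
    # typically after unwrapping a data key via KMS with attestation ...
    conn.sendall(b"processed:" + payload[:16])
    conn.close()
```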
Arvind Raghu, Principal Specialist in EC2 and Confidential Computing at AWS, joins the show to explain confidential computing, AWS Nitro Enclaves, and the use cases this technology unlocks.
Topics:
What is confidential computing?
What’s the motivation behind the investment in this technology?
What are some of the problems this approach to privacy and security solves that were previously a potential vulnerability for companies?
How does a hardware-based trusted execution environment prevent a bad actor from executing unauthorized code? How is the memory space protected?
Can you explain how Nitro Enclaves enhance the security of confidential computing on AWS?
What's the process for using Nitro Enclaves versus a standard EC2 instance?
How do I go about using Nitro Enclave for performing an operation on sensitive data? What does the programming process look like to do that?
What are some use cases that you’ve seen that you are particularly excited about?
How can Nitro Enclaves be used to protect sensitive data in specific use cases, such as financial services or healthcare?
Are there any limitations or trade-offs to consider when using Nitro Enclaves for confidential computing?
What innovations or business directions do you think secure enclaves will enable in the future?
What’s next for Nitro Enclaves? Anything you can share?
Where do you see the area of confidential computing going in the next 5-10 years?
Resources:
Introducing Unified ID 2.0 Private Operator Services on AWS Using Nitro Enclaves
