Partially Redacted: Data, AI, Security, and Privacy
Partially Redacted brings together leaders in engineering, data, AI, security, and privacy to share knowledge, best practices, and real-world experiences. Each episode provides an in-depth conversation with an industry expert who dives into their background and experience. They’ll share practical advice and insights about the techniques, tools, and technologies that every company – and every technology professional – should know about. Learn from an amazing array of founders, engineers, architects, and leaders in the data and AI space. Subscribe to the podcast and join the community at https://skyflow.com/community to stay up to date on the latest trends in data and AI, and to learn what lies ahead.
Episodes
Wednesday Sep 11, 2024
In this episode, Sean sat down with Jack Godau to dive deep into the world of pseudonymization. They started by discussing Jack's career working with highly sensitive data and how that experience shapes his engineering mindset. Jack explained how pseudonymization differs from anonymization and why it is valuable for preserving data utility while complying with stringent regulations like GDPR.
Jack also walked us through the challenges and key components of building a pseudonymization engine, including managing re-identification risk, ensuring scalability, and optimizing performance for large datasets. He shared insights on the trade-offs between data protection and usability, and on whether building these systems in-house is worth the investment for startups. Finally, they explored where the field is heading as data privacy concerns continue to grow.
Wednesday Aug 28, 2024
In this episode, Sean sits down with Ben Burkert, Co-founder and CTO of Anchor, to dive into the world of certificate management and internal TLS. We explore how certificates and TLS function, the inherent difficulties in managing internal TLS certificates, and why nearly every engineer has a horror story related to it. Ben also shares insights into how Anchor is addressing these challenges and making internal TLS certificate management simpler and more reliable.
Key Topics:
- Understanding Certificates and TLS
  - Basics of how certificates and TLS work.
  - The role of TLS in securing internal communications.
- The Challenges of Internal TLS Certificate Management
  - Why managing internal TLS certificates is so difficult.
  - Common pitfalls and challenges engineers face.
- Engineer Horror Stories
  - Real-world examples of certificate management gone wrong.
  - The impact of these failures on teams and organizations.
- How Anchor is Fixing the Problem
  - Anchor’s approach to simplifying internal TLS certificate management.
  - Key features and benefits of Anchor’s solution.
If you've ever struggled with internal TLS certificates or are looking for a way to avoid the pain altogether, Ben’s expertise provides a clear path to overcoming the challenges of certificate management with a modern, reliable approach.
Resources:
- https://anchor.dev/
- https://lcl.host/
Wednesday Aug 14, 2024
In this episode, we sit down with Ori Rafael, CEO and Co-founder of Upsolver, to explore the rise of the lakehouse architecture and its significance in modern data management. Ori breaks down the origins of the lakehouse and how it leverages S3 to provide scalable and cost-effective storage. We discuss the critical role of open table formats like Apache Iceberg in unifying data lakes and warehouses, and how ETL processes differ between these environments. Ori also shares his vision for the future, highlighting how Upsolver is positioned to empower organizations as they navigate the rapidly evolving data landscape.
Wednesday Jul 31, 2024
In this episode, Sean Falconer is joined by Aubrey King, solutions architect and community evangelist at F5, to discuss the top 10 security issues for LLM applications. They explore critical threats such as prompt injections, insecure output handling, and training data poisoning, among others. Aubrey provides insights into why these issues arise, the attacks being observed, and the methods used to mitigate these risks. This episode is essential listening for anyone interested in the security of large language models and their applications.
Wednesday Jul 17, 2024
In this episode, host Sean Falconer sits down with Eric Flaningam, a researcher at Felicis Ventures, to explore the fascinating world of data warehouses. They dive into the history, evolution, and future trends of data warehousing, shedding light on its importance. Key topics discussed include an overview of the article "A Primer on Data Warehouses," and the definition and key characteristics of data warehouses. They also cover the historical evolution and major milestones in data warehousing, the shift from batch processing to real-time data, and the convergence of data warehouses and SQL.
Eric and Sean discuss the impact of unstructured and complex data, how advances in technology have shaped data warehouses, and the technical architecture and components of a typical data warehouse. They also cover real-world benefits and use cases, common challenges in implementing and maintaining a warehouse, and future trends, including the influence of AI and machine learning.
For further reading, check out Eric Flaningam’s article, A Primer on Data Warehouses: https://www.generativevalue.com/p/a-primer-on-data-warehouses
Wednesday Jul 10, 2024
Join us as we chat with Tim Jensen, a privacy enthusiast, about personal online security. Tim shares his journey to becoming a privacy advocate and teacher and provides insights into the common mistakes people make with passwords. We discuss why passwords have persisted for over 60 years, the issues with current password creation methods, and the balance between complexity and usability.
We also explore strategies to protect personal information beyond just using better passwords. Finally, Tim shares his thoughts on future approaches to password and identity protection.
Wednesday Jun 19, 2024
In this episode, Sean welcomes Brian Vallelunga, CEO and founder of Doppler, to discuss secrets management. Brian shares the journey of founding Doppler, a company dedicated to securing sensitive data such as API keys and credentials. Sean and Brian discuss the nuances of secrets management, how it differs from password management, and the importance of dedicated services for safeguarding secrets.
The episode also addresses the alarming rise in data breaches, common mistakes companies make, and essential practices for managing secrets effectively. Brian offers expert advice on protecting secrets, the need for secret rotation, and the future of secrets management.
Wednesday Jun 05, 2024
In this episode, Sean is joined by Eric Dodds, Head of Product Marketing at RudderStack, to dive into the world of data management, data pipelines, and common data mistakes. Eric shares his insights on when organizations should transition from basic tools like spreadsheets to a more sophisticated data stack, including data warehouses and modern tooling.
They discuss the challenges businesses face in data management, particularly agreeing on a common set of definitions that the whole organization is aligned around. They also cover how to address these issues and the importance of handling customer data securely.
Eric also provides an overview of RudderStack, its open-source approach, and the value it brings to managing customer data. Eric shares a ton of practical advice on building and optimizing your data infrastructure.
Wednesday May 15, 2024
In this episode, Kirk Marple, CEO and Co-founder of Graphlit, joins the show. Sean and Kirk dive into the world of unstructured data management, discussing the evolution and current challenges in the field.
While structured data has been well-handled since the 1970s, 80-90% of the world’s data remains unstructured, with predictions of 175 billion terabytes by 2025. Despite this vast amount, companies struggle to utilize it effectively due to immature tools and processes. Graphlit was founded to address this gap, providing scalable, maintainable systems with enhanced observability to handle unstructured data efficiently.
Kirk also covers the data security and privacy challenges of building RAG-based applications, including Graphlit's exploration of PII scrubbing and of controlling access to vector embeddings based on a user's role.
Finally, looking forward, Kirk shares insights into the future of Graphlit and their continued focus on enhancing the accessibility and utility of unstructured data for businesses across various industries.
Wednesday May 08, 2024
In this episode, Jake Moshenko, CEO and co-founder of AuthZed, joins the show to explore the world of user permissions at scale. Inspired by Google's Zanzibar, AuthZed aims to tackle the challenges of authorization, which receives far less attention in the tech industry than authentication.
Jake discusses why role-based permission models start out simple but grow complicated as businesses scale and need more nuanced access controls. He walks through Google's Zanzibar paper and the technical challenges of implementing its approach successfully, then explains how AuthZed provides a flexible, maintainable permission system and how companies get started with it.