Architecting an Agentic Data Pipeline - From Data Lake Discovery to Managed Orchestration
Overview
This session explores how to use AI to move beyond manual implementation in data engineering. We walk through a process that positions the AI not as a syntax generator, but as a cognitive partner in the engineering lifecycle. We examine the architectural shift required to transform raw data lake assets into high-performance, orchestrated systems, focusing on the collaboration between human intent and agentic design.
Live presentation link to YouTube
Agenda
- Data Lake Discovery: The strategy of deploying discovery agents to autonomously identify patterns and define the foundation of the data grain.
- Governance & Requirements: Establishing the strategic guardrails and requirements that empower an "Architect" agent to maintain system consistency.
- Logical Design for the Staging Area: A process dive into using AI to propose and build a logical abstraction layer, separating raw sources from core business logic.
- Designing and Implementing the Physical Model: How agents navigate the transition to physical storage, building Dimension and Fact tables while maintaining referential integrity.
- Incremental Update Strategy: Developing a sustainable approach to support continuous data feeds from the data lake using idempotent, self-healing processes.
- Pipeline Design and Orchestration: The coordination of complex tasks to manage the relationship between dimensions and facts, ensuring strict lineage and integrated observability.
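To make the agenda items on the physical model, incremental updates, and orchestration more concrete, here is a minimal sketch of an idempotent incremental load. It is not the session's actual implementation. The table names (`dim_station`, `fact_turnstile`), the column names, the sample station names, and the use of SQLite are all assumptions chosen for illustration. The sketch loads the dimension before the fact so that every fact row resolves to an existing dimension key, and it uses the fact's natural key so that replaying the same batch does not create duplicates.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical star schema: one dimension and one fact table."""
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS dim_station (
            station_id   INTEGER PRIMARY KEY,
            station_name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS fact_turnstile (
            station_id INTEGER NOT NULL REFERENCES dim_station (station_id),
            created_dt TEXT    NOT NULL,
            entries    INTEGER NOT NULL,
            exits      INTEGER NOT NULL,
            PRIMARY KEY (station_id, created_dt)
        );
        """
    )


def load_batch(conn: sqlite3.Connection, rows) -> None:
    """Load (station_name, created_dt, entries, exits) rows idempotently."""
    rows = list(rows)
    with conn:  # one transaction: commit on success, roll back on error
        # Step 1: dimensions first. Stations that already exist are skipped.
        conn.executemany(
            "INSERT OR IGNORE INTO dim_station (station_name) VALUES (?)",
            [(name,) for name, _, _, _ in rows],
        )
        # Step 2: facts second. The dimension key is looked up by name, and
        # a replayed row replaces the earlier row with the same natural key.
        conn.executemany(
            """
            INSERT OR REPLACE INTO fact_turnstile
                (station_id, created_dt, entries, exits)
            SELECT station_id, ?, ?, ?
            FROM dim_station
            WHERE station_name = ?
            """,
            [(dt, entries, exits, name) for name, dt, entries, exits in rows],
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    batch = [
        ("STATION A", "2024-01-01T00:00:00", 100, 80),
        ("STATION B", "2024-01-01T00:00:00", 200, 150),
    ]
    load_batch(conn, batch)
    load_batch(conn, batch)  # replaying the same batch changes nothing
    print(conn.execute("SELECT COUNT(*) FROM fact_turnstile").fetchone()[0])
```

Running the same batch twice leaves the row counts unchanged, which is the property that lets a failed run simply be retried. A late correction for an existing key overwrites the earlier measurement instead of adding a second row.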
Why Attend?
- Elevate Your Role: Learn how to shift your focus from writing repetitive code to defining high-level architectural intent and performing strategic design reviews.
- Master Systemic Reasoning: Understand how to leverage AI to solve complex engineering challenges like referential integrity and dependency management at scale.
- Build for Operations: Move toward a model where system health and observability are built-in byproducts of the design process, not afterthoughts.
Who is this for?
- Data Engineers & Architects: Looking to evolve their workflow from manual scripting to high-level systemic design.
- Engineering Leaders: Interested in the ROI and reliability of integrating autonomous agents into the development lifecycle.
- AI Enthusiasts: Wanting to see a practical, "beyond-the-chatbot" application of agentic reasoning in a production environment.
- Technical Decision Makers: Seeking a strategy for maintaining governance and referential integrity in an AI-augmented organization.
🔗 Having trouble with the video player?
If the embedded livestream does not load, you can join directly on YouTube:
Direct Link: https://www.youtube.com/live/opelf_XJ8Js
Link to the presentation material:
Link to the GitHub Repo:
https://github.com/ozkary/data-engineering-mta-turnstile/tree/main/ai-agents
🙌 Support the project
If you enjoy the session, please consider:
- Joining the YouTube channel to follow future livestreams
- Starring the GitHub repository to support the open‑source work
Your support helps keep these community sessions going.
