Columnar Data Storage: A Deep-Dive into Parquet, Delta, Columnstore, and More
Analytic data storage on the Microsoft data platform has evolved greatly over the past fifteen years, from the early days of PowerPivot and SQL Server Analysis Services, through the advent of columnstore indexes, to the adoption of the Parquet format as the de facto storage standard for analytic data in Azure.
This session is a deep-dive into how columnstore technologies work, including:
• Overview and effectiveness of columnstore storage formats
• Encoding and compression algorithms
• Columnstore indexes in SQL Server
• Parquet file format
• Delta Parquet file format
• Vertipaq (row order) optimization
Understanding how analytic data is stored makes it possible to optimize queries and to make better decisions when architecting data structures. These improvements can decrease data size, speed up analytics performance, and reduce computational overhead, thereby reducing Azure hosting costs.
These technologies will continue to evolve as data grows larger and organizational needs become more complex. Working effectively with these data storage formats will allow for fast querying of large amounts of data, both now and in the future.
Presented By: Edward Pollack
Data Architect | Microsoft Data Platform MVP
Ed Pollack is a Microsoft Data Platform MVP with a passion for learning how the Microsoft Data Platform works and sharing that knowledge with the community. His experiences in data architecture, database design, performance optimization, and data security are motivation for public speaking, writing, coding, and other community activities. Ed has spoken at SQL Saturday events, SQL Bits, PASS Summit, EightKB, and many other regional and international events.
Ed is the organizer of the Capital Area SQL Server Group and SQL Saturday Albany, as well as a co-organizer of SQL Saturday New York City and Future Data Driven. He has published a number of books, including "Dynamic SQL: Applications, Performance, and Security in Microsoft SQL Server", "Expert Performance Indexing in Azure SQL and SQL Server 2022", and "Analytics Optimization with Columnstore Indexes in Microsoft SQL Server: Optimizing OLAP Workloads". Ed is also an active contributor of content to SimpleTalk. In his free time, Ed enjoys video games, traveling, cooking exceptionally spicy foods, and hanging out with his amazing wife and sons.