The O’Reilly Data Show podcast: Joe Hellerstein on data wrangling, distributed systems, and metadata services.

In this episode of the O’Reilly Data Show, I spoke with one of the most popular speakers at Strata+Hadoop World: Joe Hellerstein, professor of Computer Science at UC Berkeley and co-founder/CSO of Trifacta. We talked about his past and current academic research (which spans HCI, databases, and systems), data wrangling, large-scale distributed systems, and his recent work on metadata services.

Data wrangling and preparation
The most interactive tasks that people do with data are essentially data wrangling. You're changing the form of the data, you're changing the content of the data, and at the same time you're trying to evaluate the quality of the data and see if you're making it the way you want it. … It's really actually the most immersive interaction that people do with data and it's very interesting.
… Actually, there's a long tradition of research in the database c