Case Study
Making Sense of Data Prep: ETL, Wrangling, Data Enrichment
Getting Your Data Ready for ML: Data Preparation

Data preparation is an essential, if sometimes overlooked, part of any machine learning (ML) lifecycle. It’s not that data scientists ignore it, but it’s easy to think that sorting data into a database and running a few Python functions will do the trick. You may be right if you’re working with a small dataset, or if your models are simply an academic exercise, but what if you’re dealing with production-ready models or datasets that have hundreds of columns and thousands of rows?

Let’s put it another way. Imagine you’re cooking a meal, and you’ve gone through the trouble of raiding your pantry and going to the store to get all the ingredients you need. Do you simply toss everything into a pot and hope for the best? Probably not,