This site is a personal exploration of how data – or, more importantly, the powerful algorithms and platforms now available to analyze data – can best be leveraged to optimize and automate critical business decisions.
Specifically, I want to explore ‘overengineered’ data solutions: algorithms and processes that overwhelm predictive modeling problems with increasingly inexpensive computing power. The limiting factor in data analysis is usually the creative time invested by the data scientists themselves – exploring relationships, crafting models, adapting techniques, and visualizing results. Therefore, the algorithms, processes, and technologies leveraged by those data scientists must be designed to maximize the impact of that creative investment. This means prioritizing the ability to replicate past experiments, preserve and reuse intermediate results, and automate the data and modeling pipeline. The vision is small teams of analysts armed with cutting-edge algorithms, powerful servers, and a rigorous development process to fully leverage those assets and their own ingenuity. The anticipated result is exhaustive ensemble solutions applying a variety of predictive techniques which, in combination, far outperform more carefully crafted approaches.
In sum, the design philosophy for predictive models is volume over precision, utility over elegance, and CPU over IQ. Thus, the site’s name – Overkill Analytics.
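A toy sketch of the 'volume over precision' idea, using only the Python standard library. It is an illustration rather than code from any real project: the function names, the choice of decision stumps, and the bootstrap resampling are all assumptions made for the example. Each individual model is deliberately crude, the ensemble simply averages many of them, and a fixed seed keeps the experiment replicable.

```python
import random
import statistics


def fit_stump(xs, ys, threshold):
    """Fit a one-split 'stump': predict the mean of y on each side of threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    fallback = statistics.fmean(ys)  # used when one side of the split is empty
    left_mean = statistics.fmean(left) if left else fallback
    right_mean = statistics.fmean(right) if right else fallback
    return lambda x: left_mean if x <= threshold else right_mean


def fit_ensemble(xs, ys, n_models=500, seed=42):
    """Fit many crude stumps, each on a bootstrap resample with a random split."""
    rng = random.Random(seed)  # fixed seed so the run can be replicated exactly
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        models.append(fit_stump(bx, by, rng.choice(bx)))
    return models


def predict(models, x):
    """Average the predictions of every model in the ensemble."""
    return statistics.fmean(m(x) for m in models)


if __name__ == "__main__":
    xs = list(range(20))
    ys = [2.0 * x for x in xs]  # hypothetical data: a simple linear trend
    models = fit_ensemble(xs, ys)
    for x in (0, 10, 19):
        print(x, round(predict(models, x), 2))
```

No single stump can follow the trend, but the average of several hundred of them moves in the right direction, and adding more models costs only compute time.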
This site will explore both my own work, where possible, and the best use cases of this philosophy on real-world modeling problems. I began the site by reporting on my application of overkill analytics to Kaggle competitions – open data science contests on a wide variety of statistical problems. I then fell into a period of blog neglect – it happens – but I hope to reboot the site to discuss broader applications of overkill analytics and data science in general. In addition, I now work for Kaggle, and while I don’t work on the competition side (I’m part of their industry solutions team), I’d love to continue discussing the best solutions from the competition space, the lessons learned, and what they reveal about the potential of the ‘overkill’ philosophy.
Please Help Me!
To my astonishment, I have learned from Kaggle that there are others, like me, for whom a full-time job spelunking through data is just not enough. So, if you are one of these poor souls, please visit this site and help me learn! All polite comments and contributions are appreciated, and I’m always eager to discover the approaches being developed across industries.
Thanks for reading!