This site is my personal exploration of how big data – and the massive amount of processing power now available to analyze it – can be leveraged to optimize and automate business decisions.
More specifically, I hope to discuss on this site the pros and cons of overengineered data solutions: simple algorithms that overwhelm predictive modeling problems with increasingly inexpensive cloud computing power. I think the trend in data analysis is to use a few analysts with a large team of servers to create better (or at least cheaper) predictive modeling solutions than a large team of analysts with one server. The key is simple but exhaustive ensemble solutions: a wide variety of predictive techniques that, in combination, match or outperform more targeted approaches.
In sum, my design philosophy for predictive models is quantity over quality, brute-force over elegance, and CPU over IQ. Thus, the site’s name – Overkill Analytics.
I don’t get to put this philosophy into practice too often. One reason is that I need to gain more experience with the array of modeling techniques I want to throw at my problems. In my industry (or at least my company), 90% of our work revolves around building GLMs. The seeming resistance to using less traditional and less explainable statistical methods has limited my practical knowledge of techniques such as random forests, SVMs, genetic algorithms, and the like.
This site will follow my efforts to gain this experience and practice my philosophy on real-world modeling problems outside my industry. I’m doing this mainly via Kaggle, a very cool ‘crowdsourcing’ company that hosts data science competitions and provides a wide variety of statistical problems to attack. However, since Kaggle is a competition, with prize money involved, my posts will likely be after-the-fact to avoid losing any competitive edge I have. (Sorry, but I like to win.)
Please Help Me!
To my astonishment, I have learned from Kaggle that there are others, like me, for whom a full-time job spelunking through data is just not enough. So, if you are one of these poor souls, please visit this site and help me learn how badly I’m abusing these modeling techniques. All polite comments and contributions are appreciated, and I’d love to learn more about the approaches used by data professionals in other industries.
Thanks for reading!