About Overkill Analytics

This site is my personal exploration of how big data – and the massive amount of processing power now available to analyze it – can be leveraged to optimize and automate business decisions.

My Philosophy

More specifically, I hope to discuss on this site the pros and cons of overengineered data solutions: simple algorithms that overwhelm predictive modeling problems with increasingly inexpensive cloud computing power. I think the trend in data analysis is toward a few analysts with a large team of servers producing better (or at least cheaper) predictive models than a large team of analysts with one server. The key is simple but exhaustive ensemble solutions: a wide variety of predictive techniques that, in combination, match or outperform more targeted approaches.
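
To make the ensemble idea concrete, here is a minimal sketch in plain Python. It is only an illustration, not the method used in any particular competition: the toy data, the choice of four simple models, and the equal-weight averaging are all assumptions made for the example. It fits several crude predictors to the same data and averages their predictions.

```python
import random
from statistics import mean

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k):
    """k-nearest neighbours: average the targets of the k closest points."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)
    return predict

def fit_ensemble(xs, ys):
    """Fit every simple model, then average their predictions equally."""
    models = [fit_mean(xs, ys), fit_line(xs, ys),
              fit_knn(xs, ys, 1), fit_knn(xs, ys, 5)]
    return models, (lambda x: mean(m(x) for m in models))

def rmse(predict, xs, ys):
    """Root mean squared error of a predictor on (xs, ys)."""
    return mean((predict(x) - y) ** 2 for x, y in zip(xs, ys)) ** 0.5

def make_data(n):
    """Hypothetical toy data: a linear trend with a step at x = 5, plus noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [0.5 * x + (3 if x > 5 else 0) + random.gauss(0, 1) for x in xs]
    return xs, ys

random.seed(0)  # fixed seed so repeated runs give the same numbers
train_x, train_y = make_data(200)
test_x, test_y = make_data(200)

models, ensemble = fit_ensemble(train_x, train_y)
for name, model in zip(["mean", "line", "1-nn", "5-nn"], models):
    print(name, round(rmse(model, test_x, test_y), 3))
print("ensemble", round(rmse(ensemble, test_x, test_y), 3))
```

Whether the average beats the best single model depends on the data and on how different the models' errors are, so treat the printed scores as something to experiment with rather than a guarantee. The "overkill" step would be to keep adding model families to the list and let the servers do the fitting.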

In sum, my design philosophy for predictive models is quantity over quality, brute force over elegance, and CPU over IQ. Thus, the site’s name – Overkill Analytics.

My Experiment

I don’t get to put this philosophy into practice too often. One reason is that I need more experience with the array of modeling techniques I want to throw at my problems. In my industry (or at least my company), 90% of our work revolves around building GLMs. The seeming resistance to less traditional and less explainable statistical methods has limited my practical knowledge of techniques such as random forests, SVMs, genetic algorithms, and the like.

This site will follow my efforts to gain this experience and practice my philosophy on real-world modeling problems outside my industry. I’m doing this mainly via Kaggle, a very cool ‘crowdsourcing’ company that hosts data science competitions and provides a wide variety of statistical problems to attack. However, since Kaggle is a competition with prize money involved, my posts will likely come after the fact to avoid losing any competitive edge I have. (Sorry, but I like to win.)

Please Help Me!

To my astonishment, I have learned from Kaggle that there are others, like me, for whom a full-time job spelunking through data is just not enough. So, if you are one of these poor souls, please visit this site and help me learn how badly I’m abusing these modeling techniques. All polite comments and contributions are appreciated, and I’d love to learn more about the approaches used by data professionals in other industries.

Thanks for reading!

10 Comments

  1. [...] he prefers to use simple algorithms and throw as much computing power as possible problems. He calls the technique “overkill analytics,” and it just won him his first contest on Kaggle, defeating more than 80 other competitors in the [...]

  3. [...] data science with cheap servers and cheap tricks [...]

  6. dmfunzone says:

    Just found your site – great idea! And I totally agree with you about the GLM in the actuarial world. In fact I was lamenting about this very thing here:

    http://dmfunzone.wordpress.com/2012/04/20/glm-glm-glm/

    Luckily, my company is starting to see the random forest for the trees, which is something I never would have predicted a few years ago.

  7. [...] Analytics More specifically, I hope to discuss on this site the pros and cons of overengineered data solutions: simple algorithms that overwhelm predictive [...]

  8. [...] Source: Overkill Analytics website [...]

  9. [...] of the Netflix Prize are using ensemble method heavily [1]. The future trend maybe, as the blog Overkill Analytics said, “Quantity over quality, CPU over [...]

