This site is a personal exploration of how data – or, more importantly, the powerful algorithms and platforms now available to analyze data – can best be leveraged to optimize and automate critical business decisions.

### My Philosophy

Specifically, I want to explore ‘overengineered’ data solutions: algorithms and processes that overwhelm predictive modeling problems with increasingly inexpensive computing power. The limiting factor in data analysis is usually the creative time invested by the data scientists themselves – exploring relationships, crafting models, adapting techniques, and visualizing results. Therefore, the algorithms, processes, and technologies those data scientists use must be designed to maximize the impact of that creative investment. This means prioritizing the ability to replicate past experiments, preserve and reuse intermediate results, and automate the data and modeling pipeline. The vision is small teams of analysts armed with cutting-edge algorithms, powerful servers, and a rigorous development process that lets them fully leverage those assets and their own ingenuity. The anticipated result is exhaustive ensemble solutions that apply a variety of predictive techniques which, in combination, far outperform more carefully crafted approaches.

In sum, the design philosophy for predictive models is volume over precision, utility over elegance, and CPU over IQ. Thus, the site’s name – Overkill Analytics.
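As a toy sketch of what this can look like in practice (not a description of any actual pipeline on this site), the snippet below fits two deliberately simple models on many bootstrap resamples, caches each fitted model so it is never refit, and averages all of their predictions. Everything here is an assumption made for illustration: the synthetic data, the helper names (`make_data`, `bootstrap`, `fitted`, `ensemble_predict`), and the choice of a least-squares line plus a nearest-neighbour average as the "variety of techniques".

```python
import random
from functools import lru_cache
from statistics import fmean


def make_data(n=200, seed=0):
    """Hypothetical toy data: y = 2x + 1 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys


XS, YS = make_data()


def bootstrap(seed):
    """Resample the data with replacement; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(XS)) for _ in XS]
    return [XS[i] for i in idx], [YS[i] for i in idx]


def fit_line(xs, ys):
    """Ordinary least-squares line through the sample."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def fit_knn(xs, ys, k=5):
    """Average the targets of the k nearest training points."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return fmean([y for _, y in nearest])

    return predict


FITTERS = {"line": fit_line, "knn": fit_knn}


@lru_cache(maxsize=None)
def fitted(kind, seed):
    """Fit one model on one resample; the cache preserves it for reuse."""
    xs, ys = bootstrap(seed)
    return FITTERS[kind](xs, ys)


def ensemble_predict(x, n_seeds=25):
    """Volume over precision: average every cached model's prediction."""
    preds = [fitted(kind, seed)(x) for kind in FITTERS for seed in range(n_seeds)]
    return fmean(preds)


if __name__ == "__main__":
    print(ensemble_predict(5.0))
```

Because every resample is keyed by a seed, rerunning the script reproduces the same ensemble, and a second call to `ensemble_predict` reuses the cached fits instead of spending CPU on them again.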

### My Site

This site will explore both my own work, where possible, and the best use cases of this philosophy on real-world modeling problems. I began the site by reporting on my application of overkill analytics to Kaggle competitions – open data science contests on a wide variety of statistical problems. I then fell into a period of blog neglect – it happens – but I hope to reboot the site to discuss broader applications of overkill analytics and data science in general. In addition, I now work for Kaggle, and while I don’t work on the competition side (I’m part of their industry solutions team), I’d love to continue discussing the best solutions from the competition space, the lessons learned, and what they might mean for the ‘overkill’ philosophy.

### Please Help Me!

To my astonishment, I have learned from Kaggle there are others, like me, for whom a full-time job spelunking through data is just not enough. So, if you are one of these poor souls, please visit this site and help me learn! All polite comments and contributions are appreciated, and I’m always eager to discover the approaches being developed across industries.

Thanks for reading!

[…] he prefers to use simple algorithms and throw as much computing power as possible problems. He calls the technique “overkill analytics,” and it just won him his first contest on Kaggle, defeating more than 80 other competitors in the […]

[…] data science with cheap servers and cheap tricks […]

Just found your site – great idea! And I totally agree with you about the GLM in the actuarial world. In fact I was lamenting about this very thing here:

http://dmfunzone.wordpress.com/2012/04/20/glm-glm-glm/

Luckily, my company is starting to see the random forest for the trees, which is something I never would have predicted a few years ago.

[…] Analytics More specifically, I hope to discuss on this site the pros and cons of overengineered data solutions: simple algorithms that overwhelm predictive […]

[…] Source: Overkill Analytics website […]

[…] of the Netflix Prize are using ensemble method heavily [1]. The future trend maybe, as the blog Overkill Analytics said, “Quantity over quality, CPU over […]