My name is Carter Sibley, and I am a data scientist at Kaggle. If you want a shorter, ‘just the facts’ description of this site and the (developing) philosophy of “Overkill Analytics”, please go here. This page is just an excuse to tell a long, boring story about myself.
How I Became a Data Scientist
Based on my childhood, it would not be surprising to anyone that I went into a data science career. I was the youngest kid at the Commodore 64 user group meetings (after graduating from the Timex Sinclair). I went to MathCounts. I spent summers building neural networks in a physics lab. I was, in short, a certified math and computer geek.
I then made the terrible, terrible mistake of reading a novel by an obscure author – John Grisham – called The Firm. In case you haven’t read it, The Firm is the story of a bright young man named Mitch McDeere who graduates from Harvard Law School, gets a law firm job making a boatload of money, works 16 hours a day, becomes miserable, finds out he actually works for the mafia, tries to leave, and gets hunted by trained assassins. In the end, Mitch escapes with a hard-won lesson that ‘easy money’ is never truly easy (and, paradoxically, a stolen Swiss bank account of laundered mob money).
For most, the novel is a harrowing morality tale about using one’s talents to pursue profit over purpose. As a teenager, my thought was, “Hey, lawyers make a boatload of money as soon as they graduate!”
So just like Mitch McDeere, I went to Harvard Law School, got a high-paying law firm job, worked 16 hours a day, and became miserable. I never worked for the Five Families, and I certainly didn’t look like Tom Cruise, but I definitely needed to escape. I began using my rare off-hours to pursue frustrated technical impulses, secretly dabbling in genetic algorithms and neural network ensembles. I tried to justify the time by claiming it was for sports betting or improving my poker game, but really I just wanted to play with data.
In the end, I had to stop living the lie. I sat my wife down and told her that I was not the cocky lawyer I pretended to be, but a code-writing, graph-plotting, algorithm-designing, equation-loving, statistical-journal-reading data geek. (I think she had known all along.) I bit the bullet and left law for the glitz and glamour of insurance pricing models.
It wasn’t a beach in the Caribbean with a fortune in laundered mob money, but I had escaped – just like Mitch.
My Data Science Career
For the ten years since, I have thoroughly enjoyed my work. As a statistical research director in the insurance industry, I was able to attack some fascinating analytical problems. I later had the opportunity to work as a data science consultant for Oracle’s Big Data team, where I learned about the current state and the potential of predictive modeling across a range of industries. Throughout, I’ve found that I enjoy nearly every aspect of my new career: munging the data, building learning algorithms, developing a workflow, improving analytical platforms, and particularly finding and nurturing great analytical talent. I even enjoy reducing the fruits of my labor into terse but colorful slides for senior management.
That being said, I have observed some pain points in the practice of analytics at traditional enterprise companies. The rapid explosion of data now available requires analytics that are robust, iterable, and scalable: i.e., a full-scale development process rather than a few scripts and a PowerPoint deck. In most enterprises, however, analytics are (rightfully) owned by domain experts rather than development teams – making it difficult organizationally to implement this type of process. Moreover, the tools being built to bridge this gap – tools to allow business users to ‘easily’ implement complex machine learning – are often much too generalized to provide utility. Data science problems are highly domain-specific – and producing answers will require platforms and tools developed specifically for each domain.
There are two options for bridging this gap. One alternative is a massive infusion of ‘data scientists’ (i.e., hybrid statistician / developers) across a host of enterprises and industries. The shortage of talent makes this difficult, however – especially if companies want to produce superior machine learning that provides real marginal advantages. The other alternative, therefore, is the one I believe will succeed – centralizing data science talent to attack critical machine learning problems industry by industry.
That’s why I now work at Kaggle. I believe Kaggle is uniquely positioned to build domain-specific data science products that actually meet enterprise needs. The key is Kaggle’s experience in designing and hosting data science competitions to address problems across a huge range of domains. Kaggle has identified how to use a generalized process for high-performing analytic solutions while leveraging highly talented pools of data scientists for the time-intensive, problem-specific work of designing those solutions. Kaggle is now applying both the best practices learned from the Kaggle competition process and the best talent identified in those competitions to build more comprehensive machine learning solutions, initially in the energy industry. To me, this seems like the best recipe to bring scalable data science to the enterprise, and I’m extremely excited to be a part of the Kaggle team.
Thanks For Reading
There it is, as promised – a long, boring story about me. If you indulged me and read this self-centered narrative, I encourage you to indulge me further and read the actual content of the blog. Tell me what makes sense to you, and more importantly let me know what I don’t know.