
Charting the Boundaries of Data Science

By Rob Mitchum // May 22, 2015

When DJ Patil called data scientist “the sexiest job of the 21st century” in 2012, it caused quite a stir…in part because very few people at the time knew what “data science” actually meant. Despite detractors who claimed “all science is data science!” or joked that data science was just “statistics on a MacBook,” the term has picked up steam, from universities to corporations to government (where Patil was recently named Chief Data Scientist of the U.S. Office of Science and Technology Policy). But even for self-described data scientists, no consensus definition of the job — or how best to train for such a job — has emerged.

These persistent questions inspired The Data Science Handbook, a new ebook published by the CI’s Carl Shan with collaborators Henry Wang, William Chen, and Max Song. To better establish the parameters and career path of this young profession, the authors interviewed 25 well-known data scientists, including Patil, Hilary Mason of Fast Forward Labs, Mike Dewar of the New York Times, and data scientists from Uber, MailChimp, LinkedIn, and more.

“We all felt like there was a level of mystique and mystery, sometimes self-perpetuated by the field of data science, and it wasn’t completely clear what it was, what it entailed, or what it wasn’t,” Shan said. “So we said, why don’t we talk to people who worked in it for a while, and who were pioneers in the field. We thought that they would probably have some strong opinions about what it is.”

For Shan, a 2014 Data Science for Social Good fellow and research scientist with the Center for Data Science and Public Policy, the book had personal relevance as he charts his own future. While many online resources provide advice about what programming and statistical skills an aspiring data scientist should learn, few offer guidance about the “soft skills” and challenges of developing a career in the field. Shan and his co-writers chose the interview format to fill this gap, asking interviewees to tell their unique stories of how they fell into the role of data scientist, often years before it was known by that name.

“The value of stories, at least for me, is they stick with you,” Shan said. “It’s so much easier for the human mind to remember stories, and the really nice thing about stories is that they provide inspiration. We didn’t just ask them things like: what are the tools you should use, or what are the programming languages you should learn; you can find that anywhere online. What’s missing are the sort of stories that are much more powerful than the specific piece of advice of ‘learn R or Python.’”

The book’s interviews reveal the multitude of paths that led to careers in data science. Patil talks about how he worked hard and learned voraciously to escape the academic “box” of habits and practices that don’t translate to the business world and other data-driven pursuits, as he transitioned from meteorology research to positions at eBay and LinkedIn. Clare Corthell, a data scientist at Mattermark, developed an entire curriculum for herself to acquire technical skills, now available to all as the Open Source Data Science Masters. And Michelangelo D’Agostino of Civis Analytics recalls how joining the Obama campaign as a data analyst paved his way from particle physics to data science (and a spell in 2013 as a Data Science for Social Good mentor).

Despite the very different stories, the authors noticed certain common threads that may be helpful for young data scientists, even if some of them were counterintuitive. For instance, a majority of the interviewees stressed the important role of communication in their workplace: learning how to translate advanced statistical methods and complex results into language that non-experts will understand.

“Sure, it’s important to know the statistical techniques behind data science,” Shan said. “But what’s less obvious is you should also be really great at being a great communicator, especially in a field that’s so new and so vague and not that well defined.”

Overall, the interviews illustrate the growing clarity about what “data scientist” really means beneath the hype: a modernized update of classical statistics, bolstered with programming skill and business intuition, as Shan put it. And to make sure the field continues to grow, and that as many people as possible can access the collective wisdom they assembled, the Data Science Handbook authors chose a pay-what-you-want model, allowing readers to pay $0 (or $1000, if they were really excited about it).

“It’s very much in the spirit of DSSG to make things open source and publicly available, so that anybody can spin off and take what you’ve done and make it better,” Shan said. “Hopefully, our book can do the same.”

You can get the book for any price you’d like (including free) at www.thedatasciencehandbook.com.