How to become a Data Scientist

A new role is emerging in our brave new world: the Data Scientist!

What skills are needed, and which of them do you have to improve if you are, or want to be, one?

A question has been on my mind these days: can I consider myself a Data Scientist? To find an answer, I looked at the 2019 data from the Python Developers Survey published by JetBrains: 46217 people participated, and one of the questions was:

Do you consider yourself a Data Scientist?

So I analyzed the dataset, trying to understand:

  • Do Data Scientists need a lot of experience?
  • Do Data Scientists work in particular fields?
  • Which skills and technologies do Data Scientists use?
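
All three questions come down to splitting the respondents by their answer and comparing the groups. Here is a minimal sketch with pandas of how that comparison could look for coding experience. The column names (`is_data_scientist`, `coding_experience`) and the six rows are made up for illustration; they are not the real survey export.

```python
import pandas as pd

# Made-up sample for illustration only: the real survey export uses
# different column names and has tens of thousands of rows.
responses = pd.DataFrame(
    {
        "is_data_scientist": ["Yes", "No", "Yes", "No", "No", "Yes"],
        "coding_experience": [
            "1-2 years", "6-10 years", "3-5 years",
            "1-2 years", "11+ years", "1-2 years",
        ],
    }
)


def experience_share(df: pd.DataFrame) -> pd.DataFrame:
    """For each group, return the fraction of respondents per experience band."""
    # normalize="index" makes every row (group) sum to 1.
    return pd.crosstab(
        df["is_data_scientist"], df["coding_experience"], normalize="index"
    )


shares = experience_share(responses)
print(shares.round(2))
```

Normalizing by row rather than looking at raw counts matters here, because the two groups have very different sizes in a real survey.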

Looking at the data, it’s clear that Data Scientists don’t need a lot of experience: as in every role, you can be a junior or a senior profile, and there is no big difference between the two groups. Above all, Data Scientists have less coding experience than other programmers.

So the answer is: you don’t need a lot of experience to start working as a Data Scientist, though this is partly because the role is growing so fast!

OK, you have got some experience, but what are you going to do as a Data Scientist?

The answer is obvious: Machine learning (> 35%) and Data analysis (> 32%)!
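
Percentages like these can be computed by keeping only the self-declared Data Scientists and counting their answers. This is a rough sketch, assuming a hypothetical single-choice column called `main_use`; the rows are invented, so the numbers it prints are not the survey results.

```python
import pandas as pd

# Invented answers; "main_use" is a hypothetical column name.
answers = pd.DataFrame(
    {
        "is_data_scientist": [True, True, True, True, False, False],
        "main_use": [
            "Machine learning", "Data analysis", "Machine learning",
            "Web development", "Web development", "DevOps",
        ],
    }
)

# Keep only Data Scientists, then turn answer counts into percentages.
main_use_share = (
    answers.loc[answers["is_data_scientist"], "main_use"]
    .value_counts(normalize=True)
    .mul(100)
)
print(main_use_share)
```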

OK, but let’s have a look together at which business sectors require your skills:

The winner is surely Information Technology / Software development.

But the following slice, which focuses only on the sectors where Data Scientists are more present than other roles, shows where their unique skillset can help the business:

Part III: Data Scientist & skills

The most interesting part: what do you need to know?

I think that for this role too there are many different possibilities, and I suggest you have a look at my GitHub repository, or at the raw data, in order to search for your specific answers.

But here are some hints about what I’ll work on in the near future.

Let’s start with Python libraries:

  • ML libraries, like Keras, TensorFlow, PyTorch and scikit-learn
  • pandas, NumPy and Matplotlib are standard for every Python user!
  • Seaborn could be a discovery, if you are not using it!
  • GUI libraries, like PyQt and Tkinter
  • Web crawlers like Scrapy
  • Pillow for images

Cloud is also a skill not to forget: in fact, Data Scientists are the group with the lowest share of non-users!

AWS is the market leader, but there are also Microsoft Azure and Google Cloud Platform.
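
The share of cloud non-users per group can be checked with a simple group-by. The sketch below assumes a hypothetical `cloud_platform` column where non-users answered "No cloud"; both the column name and the eight rows are invented for illustration.

```python
import pandas as pd

# Invented rows; "cloud_platform" and the "No cloud" label are assumptions.
cloud = pd.DataFrame(
    {
        "is_data_scientist": ["Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"],
        "cloud_platform": [
            "AWS", "No cloud", "Google Cloud Platform", "AWS",
            "No cloud", "No cloud", "AWS", "Microsoft Azure",
        ],
    }
)

# The mean of a boolean Series is the fraction of True values,
# so this gives the share of non-users inside each group.
non_user_share = (
    cloud["cloud_platform"]
    .eq("No cloud")
    .groupby(cloud["is_data_scientist"])
    .mean()
)
print(non_user_share)
```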

Finally, databases and big data tools:

  • SQL is winning, but MongoDB could also help in some projects
  • From Hadoop to Apache Spark or Dask, Data Scientists use these tools to access and process large amounts of data!

In this article we had a quick look at what a Data Scientist is:

  1. You don’t need a lot of experience to work in this role;
  2. The main fields you will work in are Machine learning and Data analysis, but the business sector can vary a lot, and the role will probably spread further in the near future;
  3. The skillset is large and varied, but that means everyone can focus on what they prefer, and there is always something new that can help us improve!

I think some other interesting points could arise from looking at how the situation evolved over the years (from 2017 to 2019).

In conclusion, the big question now is:

Do you consider yourself a Data Scientist?

I am a biomedical engineer. I like technology and software, from data science to web development, but also comics and board games, walking, swimming, and friends!