Original article was published on Artificial Intelligence on Medium
What is the difference between a data scientist and a data engineer?
A skills-based comparison between data engineers and data scientists.
The right way to think about the difference between data engineers and data scientists is from a skills perspective. Different companies use different titles for data professionals. Companies started with data scientist and data engineer roles. Later on, they created more roles such as ML engineer, full-stack data scientist, ML scientist, big data engineer and so on.
The way each organization deals with titles may continue to be fluid; however, the skills required for these roles will not change.
Data Engineering Skills
Data engineers are software engineers who build automated systems and solutions around massive data lakes. A data engineer has advanced programming and system engineering skills.
They build automated pipelines and data structures that allow data to be efficiently processed, stored and consumed. The infrastructure they build is leveraged by data scientists for their jobs. At scale, these pipelines process billions of messages per day. Each data engineer needs the skills to pick the right tools and technologies to help the team and the organization sustain that scale.
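As a rough illustration of what "processed, stored and consumed" can mean in code, here is a minimal sketch of a pipeline built from small composable stages. Everything in it is an assumption made for illustration: the message format, the stage names (`parse`, `clean`, `load`) and the in-memory list standing in for a data store are hypothetical, and a production pipeline would use a real queue and storage layer.

```python
import json
from typing import Iterable, Iterator

# Hypothetical raw messages; in practice these would arrive from a queue or log.
RAW_MESSAGES = [
    '{"user": "a", "amount": "12.5"}',
    'not valid json',
    '{"user": "b", "amount": "7"}',
]


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Decode each message, skipping ones that are not valid JSON."""
    for line in lines:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def clean(records: Iterable[dict]) -> Iterator[dict]:
    """Normalize types so downstream consumers get consistent rows."""
    for record in records:
        yield {"user": record["user"], "amount": float(record["amount"])}


def load(records: Iterable[dict], sink: list) -> int:
    """Append records to the sink (a list standing in for a data store)."""
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count


def run_pipeline(lines: Iterable[str], sink: list) -> int:
    """Chain the stages; generators keep only one record in flight at a time."""
    return load(clean(parse(lines)), sink)
```

Calling `run_pipeline(RAW_MESSAGES, sink)` with an empty list as `sink` stores the two valid records and drops the malformed one. Because each stage is a generator, the same structure can process a stream without holding it all in memory.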
Secondly, data engineers are amazing system architecture designers. They help design systems which will not just process the data but also help productize and consume models. They could use immersive, real-world or standalone model architecture patterns to make these models available. They can also help choose among the different ways to deploy the models, and they work closely with DevOps folks on MLOps-related work.
Finally, data engineers also build amazing data engineering processes. They leverage their engineering skills to help the data team come up with a strategy for data science version control. Data products, like software products, require quality assurance and testing. Data engineers help architect and build the processes which enable QA for data science. When ML models and data products break, data engineers use their root cause analysis skills and work with DevOps teams to find incidents and fix them.
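One small example of QA for data is a validation step that runs before a batch is handed to downstream consumers. The sketch below is only an illustration under assumed rules: the function name `validate_rows`, the field names and the two checks (required fields present, selected fields non-negative) are hypothetical and not taken from any particular framework.

```python
def validate_rows(rows, required_fields, non_negative_fields=()):
    """Return a list of human-readable problems; an empty list means the batch passed."""
    problems = []
    for index, row in enumerate(rows):
        # Check that every required field is present and not null.
        for field in required_fields:
            if row.get(field) is None:
                problems.append(f"row {index}: missing {field}")
        # Check that numeric fields which must not be negative are valid.
        for field in non_negative_fields:
            value = row.get(field)
            if value is not None and value < 0:
                problems.append(f"row {index}: negative {field}")
    return problems
```

A pipeline could call this on each batch and refuse to publish the batch, or raise an alert, whenever the returned list is not empty.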
Data Scientist Skills
Data scientists analyze large amounts of structured and unstructured data, often using big data and data mining techniques, with the goal of extracting knowledge and insights to be used for crucial business decisions. A data scientist is someone who augments their math and statistics background with programming to analyze data and create applied mathematical models.
Data scientists use their understanding of the product, the business and the data to provide insights and models which can be used to improve products or build new data products. They come up with models which could be used to gain an edge over competitors or to serve existing customers better.
Secondly, they use their skills to help select model interpretation frameworks. From machine learning models to deep learning models, it is important for teams to adopt model interpretation frameworks which aid in the inspection, explanation and refinement of models.
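To give a feel for what such a framework does, here is a minimal sketch of one common interpretation technique, permutation importance: shuffle one feature column and measure how much the model's error grows. The article does not name this technique, so treat it as an assumed example. The toy model, the data and the function names are all hypothetical, and real frameworks offer far more than this.

```python
import random


def mean_abs_error(model, rows, targets):
    """Average absolute difference between predictions and targets."""
    return sum(abs(model(row) - target) for row, target in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature_index, seed=0):
    """Increase in error after shuffling one feature column across the rows."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    baseline = mean_abs_error(model, rows, targets)
    column = [row[feature_index] for row in rows]
    rng.shuffle(column)
    shuffled_rows = [
        row[:feature_index] + (value,) + row[feature_index + 1:]
        for row, value in zip(rows, column)
    ]
    return mean_abs_error(model, shuffled_rows, targets) - baseline


def toy_model(row):
    """Hypothetical model that uses only the first feature."""
    return 2.0 * row[0]


# Hypothetical data: each row is a tuple of (feature_0, feature_1).
ROWS = [(1.0, 5.0), (2.0, 3.0), (3.0, 9.0), (4.0, 1.0)]
TARGETS = [2.0, 4.0, 6.0, 8.0]
```

Shuffling the second feature leaves the error unchanged because `toy_model` ignores it, while shuffling the first feature can only make the error worse. That difference is the signal a team would use to see which inputs a model actually relies on.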
Finally, data scientists communicate effectively. They translate the meaning of incoming data and tell compelling stories that explain the implications of their findings to key stakeholders. This includes convincing internal team members, business stakeholders, executives and other non-technical folks.
As the diagram above shows, some skills overlap, analysis and programming in particular. However, there is a difference in how these skills are applied. Data scientists program for model building, while data engineers leverage their skills for large-scale model deployments. It is important for data engineers and data scientists to leverage each other's strengths, and for teams to hire and build for the strengths they lack.