Source: Deep Learning on Medium
The data product manager has emerged as a new role, following the rapidly growing demand for data scientists. As businesses begin to recognize the unique challenges of building and running data teams, they are hiring data-focused product managers to tackle the strategic and tactical decision-making that comes with building new data products.
These newly minted data product managers face unique challenges. How can businesses properly anticipate, plan and overcome these challenges? If you are a data product manager or want to become one — what challenges can you expect?
The responsibilities of a data product manager are largely the same as any software product manager. 90% of day-to-day responsibilities will still be prioritization, communications, stakeholder management, design collaboration and creating specifications. You are still expected to build business cases, manage a backlog, present release plans, and act as an interface with internal and external stakeholders. Fundamentally, you are still responsible for the proper prioritization of R&D investment in the short and long-term to optimize business results.
The challenge comes with the added work of prioritizing and specifying not only workflow-focused solutions but also data-driven ones. Delivering data-driven solutions demands longer-term planning and often more capital-intensive investment. Processes change too, with more complex development cycles and different demands for ongoing maintenance.
Data Business Cases with Long Planning Cycles
Whether you are building features on an established product or a net-new product altogether, choosing the highest-priority problem to solve remains your top priority. The title "data product manager" may bias you toward a particular delivery method for your feature: data science or analytics of some flavor. As a PM, you need to resist the urge to throw a data solution at every investable problem. Your investors will thank you for choosing between data and workflow solutions wisely.
So how do you work out your solution: workflow only, or data-driven workflow?
- Does the value come from a data solution that drives unique and valuable insights, or from the efficiency play of a workflow?
- Is your cost of data acquisition and/or data labeling financially feasible within your business case? Don't underestimate this cost: many projects require a fresh capital infusion just to create labeled training data.
- Are you well positioned to outpace the competition with data assets, vision, and talent?
If any of these questions leaves you feeling "no" or "sounds like I should just build a workflow feature," then you have your answer and should work to solve those challenges first. If instead you say "Yes! Let's do data science" to all of them, there are a few additional things to anticipate.
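The data-acquisition question above can be made concrete with back-of-envelope arithmetic. The sketch below is a minimal feasibility check; every number and function name in it is a hypothetical placeholder, not a figure from any real project.

```python
# Back-of-envelope check for the labeling-cost question above.
# All numbers are hypothetical placeholders; substitute your own estimates.

def labeling_cost(n_examples: int, cost_per_label: float,
                  labels_per_example: int = 1) -> float:
    """Estimated cost to create a labeled training set."""
    return n_examples * labels_per_example * cost_per_label

def is_feasible(expected_annual_value: float, data_cost: float,
                build_cost: float, annual_maintenance: float,
                horizon_years: int = 3) -> bool:
    """Rough go/no-go: does value over the horizon exceed total cost?"""
    total_cost = data_cost + build_cost + annual_maintenance * horizon_years
    return expected_annual_value * horizon_years > total_cost

# 100k examples at $0.08 per label
data_cost = labeling_cost(n_examples=100_000, cost_per_label=0.08)
print(data_cost)  # 8000.0
print(is_feasible(expected_annual_value=50_000, data_cost=data_cost,
                  build_cost=120_000, annual_maintenance=30_000))  # False
```

Even this crude model makes the point: labeling is rarely the largest line item, but it is the one most often left out of the initial business case.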
More Complex Development Cycles
Your development cycles are about to change. They will get longer and more complex as you add resources and the unique processes that come with data science development. Data science teams are tasked with data collection/sourcing, discovery, cleaning/processing, training, and sometimes deployment. How you integrate these new data scientists with your existing development teams is extremely important. Many data scientists and executives with new data science teams are shocked at the time spent on the first three steps: collection, discovery, and cleaning. Upwards of 90% of a data scientist's time is spent on these tasks rather than on model creation, testing, and deployment. So what is one to do?
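To see why cleaning eats so much of the cycle, here is a minimal sketch of the kind of work it involves. The records, field names, and validation rules are all made up for illustration; real pipelines face the same issues (duplicates, missing values, out-of-range values) at far larger scale.

```python
# Minimal sketch of the "discovery and cleaning" steps that dominate
# a data scientist's time. All records and field names are invented.

raw_records = [
    {"user_id": "1", "age": "34", "signup": "2019-01-03"},
    {"user_id": "2", "age": "",   "signup": "2019-02-10"},  # missing age
    {"user_id": "2", "age": "41", "signup": "2019-02-10"},  # repeated id
    {"user_id": "3", "age": "-5", "signup": "2019-03-22"},  # invalid age
]

def clean(records):
    seen, cleaned = set(), []
    for r in records:
        if r["user_id"] in seen:                 # drop duplicate ids
            continue
        if not r["age"].lstrip("-").isdigit():   # drop unparseable ages
            continue
        age = int(r["age"])
        if not 0 <= age <= 120:                  # drop out-of-range ages
            continue
        seen.add(r["user_id"])
        cleaned.append({"user_id": r["user_id"], "age": age,
                        "signup": r["signup"]})
    return cleaned

print(clean(raw_records))  # users 1 and 2 (the valid row) survive
```

Every one of these rules is a decision someone has to discover, justify, and maintain, which is where the time goes.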
If you are stretched for data scientists, employ data analysts and enable them with the tools they will ask for, such as Alteryx, DataRobot, or Knime. Good data analysts can use these solutions to experiment with creating data pipelines, discovery, cleaning, and testing generic models. This early work by your analysts can greatly accelerate your build time and is a budget-friendly alternative to having data scientists spend mountains of time on these tasks before model creation and testing.
Plan for near-immediate model degradation and therefore higher-than-usual ongoing maintenance costs. A machine learning model evolves in relation to the world it touches; it needs maintenance to stay in tip-top shape. As with new features for non-data projects, account for higher-than-usual upfront maintenance plus a long tail of ongoing work. If you have to manually label training data or run a manual collection process, don't forget that this will be part of maintenance too, as you will likely need to update your models with the new information.
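One common way to catch that degradation early is a drift check on incoming data. The sketch below is an illustrative example, not a production monitor: it flags an alert when the mean of a feature in live traffic moves too many standard errors from the training baseline. The data, threshold, and function name are all assumptions.

```python
# A minimal drift check, one way to spot the model degradation
# described above. Data and threshold are illustrative only.

from statistics import mean, stdev

def drift_alert(train_values, live_values, z_threshold=3.0):
    """Flag drift when the live mean sits more than z_threshold
    standard errors from the training mean."""
    mu, sigma = mean(train_values), stdev(train_values)
    se = sigma / (len(live_values) ** 0.5)
    z = abs(mean(live_values) - mu) / se
    return z > z_threshold

train = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 10.1, 10.4]
stable = [10.1, 9.9, 10.3, 10.0]
shifted = [14.8, 15.2, 15.0, 14.9]

print(drift_alert(train, stable))   # False
print(drift_alert(train, shifted))  # True
```

In practice you would run a check like this per feature on a schedule, and treat alerts as a trigger for the retraining and relabeling work described above.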
Getting a handle on the complexities of data science
If you don’t have a data science background — that’s ok. My top recommendation is to immerse yourself in the world of data science, even if for a short time. A few paths I recommend:
- Take MOOCs — brush up on your statistics first before diving into machine learning.
- If you don’t have a software development background — use tools that help you accelerate — Alteryx, Knime, RapidMiner, and DataRobot are all awesome tools at varying price points (Knime has the most usable free option).
- Do a Kaggle or DrivenData competition using one of the tools above. These can take as little as two hours and are a great way to learn by doing. If you want to see the power of deep learning in one of these competitions, try H2O's Driverless AI solution; it is incredible.