Thoughts on the beta Machine Learning certification

Source: Deep Learning on Medium

We recently announced a new certification, the AWS Certified Machine Learning Specialty. Right now, the exam is still in beta. I took it this morning at re:Invent and although I won’t go into specific details, here are some remarks that may help you prepare.

Winter is coming and I’m still catching up on SageMaker launches…

The exam has 70 questions and lasts for 3 hours (I was done in 90 minutes). Here are the topics listed in the exam guide (PDF):

Domain 1: Data Engineering (20%)
Create data repositories for machine learning.
Identify and implement a data-ingestion solution.
Identify and implement a data-transformation solution.

Opinion: IMHO this domain should be reduced to 15% or even 10%. I found the questions pretty repetitive, and they were about Big Data, not about Machine Learning. If you’ve already passed the Big Data Specialty certification, you’ll be fine. If not, make sure you’re very familiar with Kinesis and its different flavours, or you’ll have a miserable time.

Domain 2: Exploratory Data Analysis (24%)
Sanitize and prepare data for modeling.
Perform feature engineering.
Analyze and visualize data for machine learning.

Opinion: typical Data Science stuff, not really tied to any particular AWS service. Cleaning data, handling missing values, performing basic feature engineering. If you have hands-on ML experience, this won’t be a problem at all. Questions don’t go very deep. I was surprised to get a few questions on data viz, most of them pretty vague and awkward to answer without looking at any actual data. IMHO they should be dropped and replaced with more questions on feature engineering.
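To give an idea of the level, here is a minimal sketch of the kind of cleaning and basic feature engineering mentioned above: filling missing values with the median, and one-hot encoding a categorical column. It is only an illustration, not an exam question. The column names (ages, plans) and the two helper functions are made up for this example, and it uses plain Python rather than any AWS service.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def one_hot(categories):
    """Return the sorted label list and one 0/1 row per input category."""
    labels = sorted(set(categories))
    rows = [[1 if c == label else 0 for label in labels] for c in categories]
    return labels, rows


# Hypothetical toy columns, invented for this illustration
ages = [34, None, 51, 29, None]
plans = ["basic", "pro", "basic", "free", "pro"]

filled_ages = impute_median(ages)
labels, encoded_plans = one_hot(plans)

print(filled_ages)
print(labels)
print(encoded_plans)
```

Median imputation is just one option: depending on the data, dropping rows or using the mean or the most frequent value may be more appropriate.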

Domain 3: Modeling (36%)
Frame business problems as machine learning problems.
Select the appropriate model(s) for a given machine learning problem.
Train machine learning models.
Perform hyperparameter optimization.
Evaluate machine learning models.

Opinion: a reasonable mix of high-level questions on framing business problems (algo selection, etc.), SageMaker-related questions (built-in algos, HPO, etc.) and Deep Learning questions (CNN, LSTM, regularization, etc.). Again, if you do this for a living and if you’ve spent some time with SageMaker, you should be fine. I didn’t get any complex algorithm questions, and none on specific Deep Learning frameworks (TensorFlow, etc.). IMHO, this could be a little more challenging than it is :)
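As a refresher on what hyperparameter optimization means in its simplest form, here is a rough sketch of random search over two hyperparameters. This is not the SageMaker tuning API and not taken from the exam. The validation_loss function is a made-up stand-in for a real train-and-evaluate step, and the search ranges are arbitrary assumptions.

```python
import random


def validation_loss(learning_rate, l2_penalty):
    # Hypothetical stand-in for training a model and scoring it on a
    # validation set; the minimum is placed at (0.1, 0.01) on purpose.
    return (learning_rate - 0.1) ** 2 + (l2_penalty - 0.01) ** 2


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best trial."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    best = None
    for _ in range(n_trials):
        # Sample on a log scale, a common choice for these parameters
        params = {
            "learning_rate": 10 ** rng.uniform(-4, 0),
            "l2_penalty": 10 ** rng.uniform(-5, -1),
        }
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


best_loss, best_params = random_search(n_trials=50)
print(best_loss, best_params)
```

The l2_penalty parameter stands for the strength of an L2 regularization term, one of the regularization techniques mentioned above. Managed tuning services typically offer smarter strategies than pure random search, but the loop (sample, train, evaluate, keep the best) is the same idea.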

Domain 4: Machine Learning Implementation and Operations (20%)
Build machine learning solutions for performance, availability, scalability, resiliency, and fault tolerance.
Recommend and implement the appropriate machine learning services and features for a given problem.
Apply basic AWS security practices to machine learning solutions.
Deploy and operationalize machine learning solutions.

Opinion: the Ops section. I got quite a few questions on security (VPC, IAM, KMS, etc.) with respect to S3 and SageMaker, so make sure you know that stuff (it’s super important anyway!). Model deployment is important too (endpoint, scaling, etc.).

Overall, I think that the beta exam hits the target, i.e. checking for applied Machine Learning skills in an AWS context. I would simply tone down the Big Data stuff and add deeper ML/DL questions, but that’s just me :)

Did you also take the exam? Please share your comments here, but refrain from sharing detailed information, as I’d have to delete it.