Practice Exam Advanced
This practice exam contains 30 questions covering all four DP-100 domains, weighted proportionally to the real exam. Try to complete it in 60 minutes (the real exam gives 120 minutes for 40-60 questions). Click "Show Answer" after each question to see the explanation.
Domain 1: Design and Prepare ML Solutions (Questions 1-10)
A. Managed identity
B. Private endpoint
C. Service endpoint
D. Customer-managed keys
Show Answer
B. Private endpoint. Private endpoints create a private IP address within your VNet for the workspace, ensuring all traffic stays on the Microsoft backbone network. Service endpoints route traffic through the Azure backbone but still use public IPs. Managed identity handles authentication, not networking. Customer-managed keys handle encryption.
A. Compute Instance with GPU
B. Compute Cluster with min_instances=2, max_instances=8, Low Priority VMs
C. Compute Cluster with min_instances=0, max_instances=8, Low Priority VMs
D. Serverless Compute with GPU
Show Answer
C. Compute Cluster with min_instances=0, max_instances=8, Low Priority VMs. Setting min_instances=0 means the cluster scales to zero and incurs no compute cost when the pipeline is not running (6 days/week). Low Priority VMs cost up to 80% less than dedicated VMs. max_instances=8 allows parallel processing of the large dataset. Option B wastes money by keeping 2 instances running 24/7.
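For reference, this configuration can be expressed as a CLI v2 compute definition. This is a sketch; the cluster name and VM size are placeholders:

```yaml
# compute-cluster.yml — create with: az ml compute create --file compute-cluster.yml
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cheap-cluster
type: amlcompute
size: Standard_DS3_v2
tier: low_priority            # spot pricing, up to ~80% cheaper than dedicated
min_instances: 0              # scale to zero when idle — no cost between runs
max_instances: 8              # parallelize across up to 8 nodes
idle_time_before_scale_down: 120
```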
A. The original data at registration time
B. The updated data (data assets are references, not copies)
C. An error because the data has changed
D. A merged dataset with versioning metadata
Show Answer
B. The updated data (data assets are references, not copies). URI-type data assets (URI File, URI Folder) are references to storage locations, not snapshots. If the underlying file changes, the data asset points to the updated version. To ensure reproducibility, create a new data asset version when data changes, or use immutable storage.
A. Install it with pip in the training script's first line
B. Create a custom environment with a conda.yml that includes the package
C. Add it to the workspace's global Python path
D. Use a compute instance with the package pre-installed
Show Answer
B. Create a custom environment with a conda.yml that includes the package. Custom environments are Docker images with specific dependencies defined in conda.yml or requirements.txt. They ensure reproducibility across runs and deployments. Installing packages in the script is fragile and slow. There is no global Python path for workspaces.
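A minimal conda.yml for such an environment might look like the sketch below; the package names and versions are hypothetical placeholders:

```yaml
# conda.yml — dependency spec referenced by a custom environment definition
name: train-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - scikit-learn==1.3.0
      - my-internal-package==0.4.1   # hypothetical proprietary package
```

You would then register an environment whose YAML points at a base Docker image and this file via `conda_file: conda.yml`, and reference that environment from your jobs.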
A. Set explicit depends_on properties for each component
B. Pass the output of each component as input to the next (implicit dependency)
C. Use a sequential execution mode in the pipeline decorator
D. Set priority levels for each component
Show Answer
B. Pass the output of each component as input to the next (implicit dependency). In Azure ML SDK v2, when you connect component outputs to the next component's inputs (e.g., train_step.inputs.data = prep_step.outputs.prepared_data), the pipeline automatically infers execution order. There is no need for explicit depends_on or priority settings.
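The same implicit-dependency wiring can be written in a CLI v2 pipeline YAML. This is a sketch; the component names (prep_data, train_model) are placeholders:

```yaml
# pipeline.yml — execution order is inferred from the data wiring, not declared
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
jobs:
  prep_step:
    type: command
    component: azureml:prep_data@latest
  train_step:
    type: command
    component: azureml:train_model@latest
    inputs:
      # Consuming prep_step's output creates the dependency: train_step
      # will not start until prep_step completes.
      data: ${{parent.jobs.prep_step.outputs.prepared_data}}
```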
A. Azure Storage Account
B. Azure Key Vault
C. Azure Container Registry
D. Application Insights
Show Answer
C. Azure Container Registry. ACR is created on-demand the first time you build a custom environment or deploy a model. Storage Account, Key Vault, and Application Insights are all created automatically with the workspace.
A. Create three separate workspaces
B. Use Azure RBAC with custom roles scoped to resource groups
C. Create separate storage accounts for each team
D. Use workspace tags to separate team resources
Show Answer
A. Create three separate workspaces. Azure ML workspaces are the security boundary for access control. While RBAC exists, fine-grained access control within a single workspace (e.g., limiting access to specific experiments) is limited. The recommended practice for team isolation is separate workspaces, potentially in the same resource group for shared governance.
A. JSON
B. YAML
C. Python script
D. Bicep template
Show Answer
B. YAML. Azure ML CLI v2 uses YAML files to define jobs, pipelines, environments, endpoints, and other resources. You submit jobs with az ml job create --file job.yml. While Python SDK v2 uses Python scripts, the CLI v2 is YAML-based.
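A minimal command-job YAML illustrates the pattern; the script, environment, and compute names are placeholders:

```yaml
# job.yml — submit with: az ml job create --file job.yml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
type: command
code: ./src                              # local folder uploaded with the job
command: python train.py --epochs 10
environment: azureml:train-env@latest    # registered custom environment
compute: azureml:cheap-cluster
```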
A. Remove timestamp columns entirely
B. Convert all timestamps to UTC before training
C. Use timestamps as string features
D. Let AutoML handle timezone conversion automatically
Show Answer
B. Convert all timestamps to UTC before training. Standardizing timezones ensures consistent temporal features. Removing timestamps loses potentially valuable temporal patterns. String encoding of timestamps is inefficient and loses ordering. AutoML handles many data issues but standardizing timezones in preprocessing is a best practice.
A. Create two separate pipelines: one scheduled, one manual
B. Create one pipeline with a cron schedule, and invoke it on-demand via SDK when needed
C. Use Azure Data Factory for scheduling and Azure ML for manual runs
D. Create a Logic App that triggers the pipeline on both conditions
Show Answer
B. Create one pipeline with a cron schedule, and invoke it on-demand via SDK when needed. Azure ML pipelines support both scheduled execution (cron-based) and on-demand invocation via SDK/CLI. There is no need to duplicate the pipeline or use external orchestration services for this simple scenario.
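A cron schedule can be attached to the pipeline with a CLI v2 schedule definition, while the same pipeline YAML remains available for on-demand submission. This is a sketch with placeholder names:

```yaml
# schedule.yml — create with: az ml schedule create --file schedule.yml
$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: nightly-retrain
trigger:
  type: cron
  expression: "0 2 * * *"     # every day at 02:00 UTC
create_job: ./pipeline.yml    # the same pipeline you can also run on demand
```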
Domain 2: Explore Data and Train Models (Questions 11-19)
A. StandardScaler
B. MinMaxScaler
C. Log transformation
D. One-hot encoding
Show Answer
C. Log transformation. Log transformation (e.g., np.log1p()) compresses the right tail of skewed distributions, making them more normally distributed. This helps many ML algorithms perform better. StandardScaler and MinMaxScaler normalize scale but do not fix skewness. One-hot encoding is for categorical features.
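A minimal NumPy sketch of the effect, using a made-up right-skewed sample:

```python
import numpy as np

# Right-skewed sample: one large value dominates the scale.
x = np.array([1.0, 2.0, 3.0, 5.0, 8.0, 1000.0])

# log1p = log(1 + x): defined at x == 0, compresses the right tail.
x_log = np.log1p(x)

# Before: the max is 250x the median (1000 / 4).
# After: the max is only about 4x the median, so the tail no longer dominates.
ratio_before = x.max() / np.median(x)
ratio_after = x_log.max() / np.median(x_log)
print(ratio_before, ratio_after)
```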
You set n_cross_validations=5 in an AutoML job configuration. What does this mean?
A. The experiment runs 5 separate training jobs
B. The data is split into 5 folds; each model is trained 5 times using different validation sets
C. The top 5 models are cross-validated against each other
D. The experiment uses 5 different random seeds
Show Answer
B. The data is split into 5 folds; each model is trained 5 times using different validation sets. K-fold cross-validation splits data into K folds, trains on K-1 folds, and validates on the remaining fold, rotating K times. This gives a more robust estimate of model performance than a single train/test split. Each candidate model in AutoML is evaluated using all 5 folds.
A. print(f"accuracy: {accuracy}")
B. mlflow.log_metric("accuracy", accuracy)
C. workspace.log("accuracy", accuracy)
D. run.record("accuracy", accuracy)
Show Answer
B. mlflow.log_metric("accuracy", accuracy). MLflow is the natively integrated experiment tracking framework in Azure ML SDK v2. The mlflow.log_metric() function logs scalar metrics that appear in the Azure ML Studio experiment view. Print statements are not captured as metrics. The other options use deprecated or non-existent APIs.
A. Grid
B. Random
C. Bayesian
D. Sequential
Show Answer
C. Bayesian. Bayesian optimization uses a probabilistic model to select the most promising hyperparameter combinations based on previous results. With only 50 trials out of 1000 possibilities, Bayesian sampling is the most efficient because it learns from prior trials. Grid search cannot cover 1000 combinations in 50 trials. Random is better than grid but less efficient than Bayesian.
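A sweep-job YAML with Bayesian sampling might look like the sketch below; the search space, metric name, and resource names are placeholders:

```yaml
# sweep-job.yml — 50 trials over a ~1000-combination space
$schema: https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json
type: sweep
sampling_algorithm: bayesian       # learns from prior trials
search_space:
  learning_rate:
    type: uniform
    min_value: 0.001
    max_value: 0.1
  n_estimators:
    type: choice
    values: [100, 200, 400, 800]
limits:
  max_total_trials: 50
objective:
  goal: maximize
  primary_metric: accuracy
trial:
  code: ./src
  command: >-
    python train.py
    --learning_rate ${{search_space.learning_rate}}
    --n_estimators ${{search_space.n_estimators}}
  environment: azureml:train-env@latest
```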
A. SMOTE (Synthetic Minority Over-sampling Technique)
B. Class weight adjustment in the model
C. Undersampling the majority class
D. Removing all features correlated with the minority class
Show Answer
D. Removing all features correlated with the minority class. Removing features correlated with the target class would make the model worse, not better. SMOTE creates synthetic positive samples. Class weights penalize misclassification of the minority class. Undersampling reduces majority class size. All three are valid imbalance techniques.
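To see what class weighting does concretely, here is scikit-learn's documented class_weight="balanced" heuristic (weight = n_samples / (n_classes * count)) computed by hand with NumPy on a hypothetical 2%-positive dataset:

```python
import numpy as np

# Labels for a 2% positive-rate dataset: 98 negatives, 2 positives.
y = np.array([0] * 98 + [1] * 2)

# "balanced" heuristic: weight_c = n_samples / (n_classes * count_c)
classes, counts = np.unique(y, return_counts=True)
weights = len(y) / (len(classes) * counts)

# The minority class receives a 49x larger weight, so each misclassified
# positive costs the loss function far more than a misclassified negative.
print(dict(zip(classes.tolist(), weights.tolist())))
```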
What does enable_early_termination=True do?
A. Stops the entire AutoML experiment if no improvement is seen
B. Terminates individual model training runs that show poor performance early
C. Limits each model to train for only 5 epochs
D. Stops the experiment after the first successful model
Show Answer
B. Terminates individual model training runs that show poor performance early. Early termination in AutoML monitors each candidate model during training. If a model's performance is consistently worse than other candidates at the same point in training, it is terminated to save compute. This does not stop the overall experiment — it just prunes unpromising individual models.
A. target_column_name
B. time_column_name
C. n_cross_validations
D. primary_metric
Show Answer
B. time_column_name. Forecasting requires specifying which column contains the timestamps so AutoML can respect temporal ordering, generate time-based features (day of week, lag values, rolling windows), and use proper time-series cross-validation. The other parameters are also required but are common to all AutoML task types.
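The lag and rolling-window features mentioned above can be sketched by hand with NumPy on a hypothetical daily series (AutoML generates these internally; this is only an illustration of the idea):

```python
import numpy as np

# Hypothetical daily sales series, in time order.
sales = np.array([10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 20.0])

# Lag-1 feature: yesterday's value (undefined on the first day).
lag1 = np.concatenate(([np.nan], sales[:-1]))

# 3-day trailing rolling mean, aligned to the current day.
window = 3
means = np.convolve(sales, np.ones(window) / window, mode="valid")
# means[i] averages sales[i : i + 3]; padding the first window-1 days
# with NaN aligns each mean with the last day of its window.
rolling_mean = np.concatenate(([np.nan] * (window - 1), means))

print(lag1)
print(rolling_mean)
```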
You call mlflow.sklearn.autolog() in your training script. What does it automatically log?
A. Only the model artifact
B. Model parameters, metrics, and model artifact
C. Only metrics computed during training
D. A screenshot of the training notebook
Show Answer
B. Model parameters, metrics, and model artifact. MLflow autologging for scikit-learn automatically captures: model hyperparameters (e.g., n_estimators, max_depth), training metrics (e.g., training score), the serialized model artifact, and the model signature (input/output schema). This eliminates the need for manual mlflow.log_param() and mlflow.log_metric() calls.
A. One-hot encoding
B. PCA (Principal Component Analysis) or feature selection (VIF)
C. Log transformation on all features
D. Increase the number of training epochs
Show Answer
B. PCA (Principal Component Analysis) or feature selection (VIF). Multicollinearity (highly correlated features) causes instability in linear regression coefficients. PCA creates uncorrelated components. VIF (Variance Inflation Factor) identifies and removes highly correlated features. One-hot encoding is for categorical data. Log transformation addresses skewness, not multicollinearity.
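VIF itself is simple to compute: regress each feature on the others and take VIF_j = 1 / (1 - R²_j). A minimal NumPy sketch on synthetic data (the conventional rule of thumb flags VIF above roughly 5-10):

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X (n_samples, n_features)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress feature j on the remaining features plus an intercept.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        r2 = 1.0 - (y - others @ beta).var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.05 * rng.normal(size=200)   # b is nearly a duplicate of a
c = rng.normal(size=200)              # c is independent
X = np.column_stack([a, b, c])
print(vif(X))  # a and b get very large VIFs; c stays near 1
```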
Domain 3: Deploy and Optimize Models (Questions 20-25)
A. Yes, always
B. No, MLflow models support no-code deployment
C. Only if the model uses a custom framework
D. Only for batch endpoints
Show Answer
B. No, MLflow models support no-code deployment. When a model is registered in MLflow format with a valid model signature, Azure ML automatically generates the scoring infrastructure. No scoring script or custom environment is needed. This is one of the key benefits of using MLflow format for model packaging.
A. Delete the green deployment
B. Update traffic to {"blue": 100, "green": 0}
C. Scale green deployment instances to 0
D. Disable the endpoint
Show Answer
B. Update traffic to {"blue": 100, "green": 0}. Updating traffic rules is nearly instantaneous and routes all requests to the blue deployment. Deleting the green deployment takes minutes and is irreversible. Scaling to 0 instances is not supported for online deployments. Disabling the endpoint would affect all users.
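In CLI v2 the traffic split lives on the endpoint definition; a sketch with a placeholder endpoint name:

```yaml
# endpoint.yml — apply with: az ml online-endpoint update --file endpoint.yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: fraud-scoring
auth_mode: key
traffic:
  blue: 100    # route all requests back to the stable deployment
  green: 0     # faulty deployment kept deployed for debugging
```

The same change can be made inline with az ml online-endpoint update and the --traffic argument, which is why rollback is nearly instantaneous.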
A. Managed online endpoint with a GPU instance type (e.g., Standard_NC6s_v3)
B. Managed batch endpoint with CPU instances
C. Azure Functions with HTTP trigger
D. Managed online endpoint with Standard_DS3_v2 (CPU)
Show Answer
A. Managed online endpoint with a GPU instance type (e.g., Standard_NC6s_v3). Image processing models (especially deep learning) require GPU for efficient inference. Managed online endpoints support GPU instance types from the NC, ND, and NV series. CPU instances would be too slow for large images. Azure Functions do not support GPU. Batch endpoints are for offline processing, not real-time.
A. instance_count (number of compute nodes)
B. auth_mode
C. max_concurrency_per_instance
D. output_file_name
E. endpoint description
Show Answer
A (instance_count) and C (max_concurrency_per_instance). instance_count controls horizontal scaling (more nodes processing in parallel); max_concurrency_per_instance controls how many mini-batches each node processes concurrently. Together they determine total parallelism. auth_mode, output_file_name, and the endpoint description do not affect throughput.
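Both settings appear in the batch deployment definition; a sketch with placeholder names:

```yaml
# batch-deployment.yml — the two throughput knobs from the answer
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: scorer
endpoint_name: nightly-batch
model: azureml:my-model@latest
compute: azureml:cheap-cluster
resources:
  instance_count: 4                # horizontal scaling: 4 nodes in parallel
max_concurrency_per_instance: 2    # mini-batches processed concurrently per node
mini_batch_size: 10                # files per mini-batch
```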
A. ml_client.models.promote()
B. client.transition_model_version_stage()
C. mlflow.deploy_model()
D. model.set_stage("Production")
Show Answer
B. client.transition_model_version_stage(). Using the MlflowClient, you call transition_model_version_stage(name, version, stage) to move a model between stages (None, Staging, Production, Archived). The other methods do not exist in the MLflow or Azure ML SDK.
A. Azure Monitor
B. Application Insights
C. Azure Event Hub
D. Azure Log Analytics
Show Answer
B. Application Insights. Managed online endpoints automatically integrate with Application Insights (created with the workspace). You can enable request/response logging to capture input/output data, latency, errors, and custom telemetry from the scoring script. Azure Monitor is the broader platform, but Application Insights is the specific service.
Domain 4: Responsible AI (Questions 26-30)
A. Reliability & Safety
B. Fairness
C. Transparency
D. Privacy & Security
Show Answer
B. Fairness. Fairlearn is specifically designed to assess and mitigate fairness issues in machine learning models. It provides metrics to evaluate whether models perform equitably across different demographic groups and mitigation algorithms to reduce disparities.
A. Error Analysis tree map
B. Counterfactual What-If analysis
C. Causal inference chart
D. Global feature importance bar chart
Show Answer
B. Counterfactual What-If analysis. Counterfactuals provide the most intuitive explanation: "The application was denied, but if income were $5,000 higher and credit score were 20 points higher, it would be approved." This is actionable and understandable by non-technical stakeholders. Error Analysis shows where the model fails but does not explain individual decisions.
A. Report 95% accuracy and deploy the model
B. Collect more training data for the underperforming cohort and retrain
C. Remove age and income features to eliminate the bias
D. Lower the confidence threshold for all predictions
Show Answer
B. Collect more training data for the underperforming cohort and retrain. The cohort likely has insufficient representation in the training data. Collecting more examples of customers in this demographic will help the model learn better patterns. Removing features loses predictive signal. Reporting only overall accuracy hides the disparity. Lowering thresholds globally does not fix the underlying issue.
A. Data anonymization (removing patient IDs)
B. Differential privacy
C. Data encryption at rest
D. Role-based access control
Show Answer
B. Differential privacy. Differential privacy provides a mathematical guarantee (controlled by the epsilon parameter) that the output of any analysis does not significantly change based on the presence or absence of any single individual. Simple anonymization does not prevent re-identification through quasi-identifiers. Encryption protects data storage, not analysis outputs.
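The core mechanism is easy to sketch. For a counting query (sensitivity 1, since one person changes the count by at most 1), the Laplace mechanism adds noise with scale sensitivity / epsilon; the true count below is made up:

```python
import numpy as np

def laplace_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy (Laplace mechanism)."""
    sensitivity = 1.0                 # a counting query changes by at most 1
    scale = sensitivity / epsilon     # smaller epsilon => more noise
    return true_count + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(42)
# Stronger privacy (smaller epsilon) means a noisier released count.
print(laplace_count(120, epsilon=1.0, rng=rng))    # close to 120
print(laplace_count(120, epsilon=0.01, rng=rng))   # may be far from 120
```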
A. Manually create each component in Azure ML Studio
B. Submit a Responsible AI Insights pipeline job with the desired components
C. Install a VS Code extension
D. Use the Azure CLI to generate a static HTML report
Show Answer
B. Submit a Responsible AI Insights pipeline job with the desired components. The Responsible AI dashboard is generated by running a pipeline job that computes the requested insights (error analysis, explanations, counterfactuals, causal analysis). The results are stored and viewable in Azure ML Studio. This is done programmatically via SDK v2 or CLI v2, not manually in the Studio UI.
Score Yourself
| Score Range | Assessment | Recommendation |
|---|---|---|
| 27-30 (90-100%) | Exam Ready | Schedule your exam with confidence. Do a quick review of any missed questions. |
| 21-26 (70-86%) | Nearly Ready | Review the domains where you missed questions. Take another practice exam in a few days. |
| 15-20 (50-66%) | More Study Needed | Re-read the lessons for your weakest domains. Practice hands-on labs in Azure ML Studio. |
| Below 15 (<50%) | Continue Studying | Work through the entire course again. Focus on hands-on practice with a free Azure account. |
Lilly Tech Systems