Practice Exam Advanced

This practice exam contains 30 questions covering all four DP-100 domains, weighted proportionally to the real exam. Try to complete it in 60 minutes (the real exam gives 120 minutes for 40-60 questions). Click "Show Answer" after each question to see the explanation.

How to Use This Practice Exam: Answer each question before revealing the answer. Track your score: 70%+ (21/30) is a good indicator of exam readiness. Review every wrong answer thoroughly — understanding why wrong answers are wrong is as valuable as knowing the right answer.

Domain 1: Design and Prepare ML Solutions (Questions 1-10)

Q1. You need to create an Azure ML workspace that restricts all data traffic to a private virtual network. Which feature should you enable?

A. Managed identity
B. Private endpoint
C. Service endpoint
D. Customer-managed keys

Show Answer

B. Private endpoint. Private endpoints create a private IP address within your VNet for the workspace, ensuring all traffic stays on the Microsoft backbone network. Service endpoints route traffic through the Azure backbone but still use public IPs. Managed identity handles authentication, not networking. Customer-managed keys handle encryption.

Q2. You are designing a training pipeline that processes 500GB of image data. The pipeline runs weekly. You need to minimize cost while maintaining fast training. Which compute configuration should you use?

A. Compute Instance with GPU
B. Compute Cluster with min_instances=2, max_instances=8, Low Priority VMs
C. Compute Cluster with min_instances=0, max_instances=8, Low Priority VMs
D. Serverless Compute with GPU

Show Answer

C. Compute Cluster with min_instances=0, max_instances=8, Low Priority VMs. Setting min_instances=0 lets the cluster scale to zero, so it incurs no compute cost when the pipeline is not running (6 days/week). Low Priority VMs cost up to 80% less than dedicated VMs. max_instances=8 allows parallel processing of the large dataset. Option B wastes money by keeping 2 instances running 24/7.
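
This configuration can be sketched as a CLI v2 compute definition, created with az ml compute create --file compute.yml. The cluster name and VM size below are illustrative placeholders:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: weekly-train-cluster        # illustrative name
type: amlcompute
size: Standard_NC6s_v3            # pick a size that fits your workload
min_instances: 0                  # scale to zero between weekly runs
max_instances: 8                  # parallelize across the 500GB dataset
tier: low_priority                # preemptible VMs at a steep discount
idle_time_before_scale_down: 120  # seconds of idleness before deallocating
```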

Q3. You register a data asset pointing to a CSV file in Azure Blob Storage. A colleague updates the CSV with new rows. When you reference the same data asset version, what data do you get?

A. The original data at registration time
B. The updated data (data assets are references, not copies)
C. An error because the data has changed
D. A merged dataset with versioning metadata

Show Answer

B. The updated data (data assets are references, not copies). URI-type data assets (URI File, URI Folder) are references to storage locations, not snapshots. If the underlying file changes, the data asset points to the updated version. To ensure reproducibility, create a new data asset version when data changes, or use immutable storage.

Q4. You need to use a Python package that is not in any curated Azure ML environment. What should you do?

A. Install it with pip in the training script's first line
B. Create a custom environment with a conda.yml that includes the package
C. Add it to the workspace's global Python path
D. Use a compute instance with the package pre-installed

Show Answer

B. Create a custom environment with a conda.yml that includes the package. Custom environments are Docker images with specific dependencies defined in conda.yml or requirements.txt. They ensure reproducibility across runs and deployments. Installing packages in the script is fragile and slow. There is no global Python path for workspaces.

Q5. You have a pipeline with three components: data_prep, train, evaluate. The train component depends on data_prep output, and evaluate depends on train output. How should you define the dependencies?

A. Set explicit depends_on properties for each component
B. Pass the output of each component as input to the next (implicit dependency)
C. Use a sequential execution mode in the pipeline decorator
D. Set priority levels for each component

Show Answer

B. Pass the output of each component as input to the next (implicit dependency). In Azure ML SDK v2, when you connect component outputs to the next component's inputs (e.g., train_step = train_component(data=prep_step.outputs.prepared_data)), the pipeline automatically infers execution order. There is no need for explicit depends_on or priority settings.
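
The same implicit wiring appears in a CLI v2 pipeline YAML, where the ${{parent.jobs.<name>.outputs.<output>}} binding expresses the dependency. Component file names and port names below are illustrative:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
jobs:
  data_prep:
    component: ./data_prep.yml
  train:
    component: ./train.yml
    inputs:
      # Binding to data_prep's output makes train run after data_prep
      training_data: ${{parent.jobs.data_prep.outputs.prepared_data}}
  evaluate:
    component: ./evaluate.yml
    inputs:
      # Likewise, evaluate waits for train to finish
      model_input: ${{parent.jobs.train.outputs.model_output}}
```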

Q6. Which of the following is NOT automatically created when you create an Azure ML workspace?

A. Azure Storage Account
B. Azure Key Vault
C. Azure Container Registry
D. Application Insights

Show Answer

C. Azure Container Registry. ACR is created on-demand the first time you build a custom environment or deploy a model. Storage Account, Key Vault, and Application Insights are all created automatically with the workspace.

Q7. You need to share a workspace across three teams while ensuring each team can only access their own experiments and models. What should you configure?

A. Create three separate workspaces
B. Use Azure RBAC with custom roles scoped to resource groups
C. Create separate storage accounts for each team
D. Use workspace tags to separate team resources

Show Answer

A. Create three separate workspaces. Azure ML workspaces are the security boundary for access control. While RBAC exists, fine-grained access control within a single workspace (e.g., limiting access to specific experiments) is limited. The recommended practice for team isolation is separate workspaces, potentially in the same resource group for shared governance.

Q8. You want to run a training job using the Azure CLI v2. Which file format defines the job configuration?

A. JSON
B. YAML
C. Python script
D. Bicep template

Show Answer

B. YAML. Azure ML CLI v2 uses YAML files to define jobs, pipelines, environments, endpoints, and other resources. You submit jobs with az ml job create --file job.yml. While Python SDK v2 uses Python scripts, the CLI v2 is YAML-based.
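
A minimal command job YAML looks like the sketch below (the curated environment name, cluster name, and script are assumptions for illustration). You would submit it with az ml job create --file job.yml:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py --learning-rate ${{inputs.learning_rate}}
code: ./src                       # local folder uploaded with the job
inputs:
  learning_rate: 0.01
environment: azureml:my-sklearn-env@latest   # illustrative environment name
compute: azureml:cpu-cluster                 # illustrative cluster name
experiment_name: practice-train
```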

Q9. Your training data contains timestamps with timezone information that varies across records. What is the recommended preprocessing step?

A. Remove timestamp columns entirely
B. Convert all timestamps to UTC before training
C. Use timestamps as string features
D. Let AutoML handle timezone conversion automatically

Show Answer

B. Convert all timestamps to UTC before training. Standardizing timezones ensures consistent temporal features. Removing timestamps loses potentially valuable temporal patterns. String encoding of timestamps is inefficient and loses ordering. AutoML handles many data issues but standardizing timezones in preprocessing is a best practice.
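
With pandas, a single utc=True flag handles mixed offsets, as in this small sketch:

```python
import pandas as pd

# Timestamps with varying timezone offsets, as they might appear in raw records
raw = pd.Series([
    "2024-03-01 09:00:00+05:30",
    "2024-03-01 09:00:00-08:00",
    "2024-03-01 09:00:00+00:00",
])

# utc=True parses each offset and converts everything to one UTC-aware dtype
timestamps_utc = pd.to_datetime(raw, utc=True)
print(timestamps_utc.dt.tz)  # UTC
```

From here you can derive consistent temporal features (hour, day of week, lags) without offset artifacts.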

Q10. You need to schedule a pipeline to run every day at midnight UTC and also trigger it manually when new data arrives. How should you configure this?

A. Create two separate pipelines: one scheduled, one manual
B. Create one pipeline with a cron schedule, and invoke it on-demand via SDK when needed
C. Use Azure Data Factory for scheduling and Azure ML for manual runs
D. Create a Logic App that triggers the pipeline on both conditions

Show Answer

B. Create one pipeline with a cron schedule, and invoke it on-demand via SDK when needed. Azure ML pipelines support both scheduled execution (cron-based) and on-demand invocation via SDK/CLI. There is no need to duplicate the pipeline or use external orchestration services for this simple scenario.
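
The scheduled half can be sketched as a CLI v2 schedule YAML (names and paths are illustrative); the manual half is just az ml job create --file pipeline-job.yml whenever new data arrives:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/schedule.schema.json
name: nightly-pipeline            # illustrative name
trigger:
  type: cron
  expression: "0 0 * * *"         # every day at midnight
  time_zone: UTC
create_job: ./pipeline-job.yml    # the same pipeline definition used for manual runs
```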

Domain 2: Explore Data and Train Models (Questions 11-19)

Q11. You are performing EDA on a dataset and discover that the feature "income" has a right-skewed distribution. Which transformation is most appropriate?

A. StandardScaler
B. MinMaxScaler
C. Log transformation
D. One-hot encoding

Show Answer

C. Log transformation. Log transformation (e.g., np.log1p()) compresses the right tail of skewed distributions, making them more normally distributed. This helps many ML algorithms perform better. StandardScaler and MinMaxScaler normalize scale but do not fix skewness. One-hot encoding is for categorical features.
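
A quick numerical check of this effect, using a synthetic right-skewed income sample:

```python
import numpy as np

rng = np.random.default_rng(0)
# A right-skewed "income" sample (lognormal distribution)
income = rng.lognormal(mean=10, sigma=1.0, size=10_000)

def skewness(x):
    # Fisher-Pearson coefficient: E[(x - mu)^3] / sigma^3
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

log_income = np.log1p(income)  # log(1 + x), safe when zeros are present
print(skewness(income), skewness(log_income))  # strongly skewed vs. near zero
```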

Q12. You configure an AutoML classification experiment with n_cross_validations=5. What does this mean?

A. The experiment runs 5 separate training jobs
B. The data is split into 5 folds; each model is trained 5 times using different validation sets
C. The top 5 models are cross-validated against each other
D. The experiment uses 5 different random seeds

Show Answer

B. The data is split into 5 folds; each model is trained 5 times using different validation sets. K-fold cross-validation splits data into K folds, trains on K-1 folds, and validates on the remaining fold, rotating K times. This gives a more robust estimate of model performance than a single train/test split. Each candidate model in AutoML is evaluated using all 5 folds.
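
The mechanics can be verified with scikit-learn's KFold: every sample lands in a validation fold exactly once across the 5 rotations.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(100).reshape(50, 2)  # 50 samples, 2 features
kf = KFold(n_splits=5, shuffle=True, random_state=42)

val_indices = []
for train_idx, val_idx in kf.split(X):
    # Each fold trains on 40 samples and validates on the held-out 10
    val_indices.extend(val_idx)
```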

Q13. You need to track experiments and log metrics during custom training. Which statement correctly logs a metric?

A. print(f"accuracy: {accuracy}")
B. mlflow.log_metric("accuracy", accuracy)
C. workspace.log("accuracy", accuracy)
D. run.record("accuracy", accuracy)

Show Answer

B. mlflow.log_metric("accuracy", accuracy). MLflow is the natively integrated experiment tracking framework in Azure ML SDK v2. The mlflow.log_metric() function logs scalar metrics that appear in the Azure ML Studio experiment view. Print statements are not captured as metrics. The other options use deprecated or non-existent APIs.

Q14. You are configuring a hyperparameter sweep with 1000 possible combinations. Your compute budget allows only 50 trials. Which sampling algorithm is most efficient?

A. Grid
B. Random
C. Bayesian
D. Sequential

Show Answer

C. Bayesian. Bayesian optimization uses a probabilistic model to select the most promising hyperparameter combinations based on previous results. With only 50 trials out of 1000 possibilities, Bayesian sampling is the most efficient because it learns from prior trials. Grid search cannot cover 1000 combinations in 50 trials. Random is better than grid but less efficient than Bayesian.
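
In CLI v2, this choice is one line in a sweep job YAML. The sketch below (metric name, script, environment, and cluster are illustrative) caps the sweep at 50 trials:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/sweepJob.schema.json
type: sweep
sampling_algorithm: bayesian      # learns from prior trials
search_space:
  learning_rate:
    type: uniform
    min_value: 0.001
    max_value: 0.1
  batch_size:
    type: choice
    values: [16, 32, 64, 128]
objective:
  goal: maximize
  primary_metric: accuracy        # must match a metric logged by the trial
limits:
  max_total_trials: 50            # the compute budget from the question
  max_concurrent_trials: 4
trial:
  command: python train.py --lr ${{search_space.learning_rate}} --batch-size ${{search_space.batch_size}}
  code: ./src
  environment: azureml:my-env@latest
compute: azureml:cpu-cluster
```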

Q15. Your classification dataset has 95% negative samples and 5% positive samples. Which technique is NOT appropriate for handling this imbalance?

A. SMOTE (Synthetic Minority Over-sampling Technique)
B. Class weight adjustment in the model
C. Undersampling the majority class
D. Removing all features correlated with the minority class

Show Answer

D. Removing all features correlated with the minority class. Removing features correlated with the target class would make the model worse, not better. SMOTE creates synthetic positive samples. Class weights penalize misclassification of the minority class. Undersampling reduces majority class size. All three are valid imbalance techniques.
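
To see what class weight adjustment does for the 95/5 split in the question, scikit-learn's "balanced" heuristic (n_samples / (n_classes * bincount(y))) can be computed directly:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 95% negative / 5% positive, as in the question
y = np.array([0] * 950 + [1] * 50)

weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
# weights[0] ~ 0.53, weights[1] = 10.0: minority-class errors cost ~19x more,
# the same values a model receives via class_weight="balanced"
print(weights)
```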

Q16. In AutoML, what does enable_early_termination=True do?

A. Stops the entire AutoML experiment if no improvement is seen
B. Terminates individual model training runs that show poor performance early
C. Limits each model to train for only 5 epochs
D. Stops the experiment after the first successful model

Show Answer

B. Terminates individual model training runs that show poor performance early. Early termination in AutoML monitors each candidate model during training. If a model's performance is consistently worse than other candidates at the same point in training, it is terminated to save compute. This does not stop the overall experiment — it just prunes unpromising individual models.

Q17. You need to train a time-series forecasting model with AutoML. Which additional parameter must you specify compared to a standard regression task?

A. target_column_name
B. time_column_name
C. n_cross_validations
D. primary_metric

Show Answer

B. time_column_name. Forecasting requires specifying which column contains the timestamps so AutoML can respect temporal ordering, generate time-based features (day of week, lag values, rolling windows), and use proper time-series cross-validation. The other parameters apply across all AutoML task types, so none of them distinguishes forecasting.

Q18. You want to use mlflow.sklearn.autolog() in your training script. What does it automatically log?

A. Only the model artifact
B. Model parameters, metrics, and model artifact
C. Only metrics computed during training
D. A screenshot of the training notebook

Show Answer

B. Model parameters, metrics, and model artifact. MLflow autologging for scikit-learn automatically captures: model hyperparameters (e.g., n_estimators, max_depth), training metrics (e.g., training score), the serialized model artifact, and the model signature (input/output schema). This eliminates the need for manual mlflow.log_param() and mlflow.log_metric() calls.

Q19. You have a dataset with 50 numerical features. Many features are highly correlated (multicollinearity). Which technique should you apply before training a linear regression model?

A. One-hot encoding
B. PCA (Principal Component Analysis) or feature selection (VIF)
C. Log transformation on all features
D. Increase the number of training epochs

Show Answer

B. PCA (Principal Component Analysis) or feature selection (VIF). Multicollinearity (highly correlated features) causes instability in linear regression coefficients. PCA creates uncorrelated components. VIF (Variance Inflation Factor) identifies and removes highly correlated features. One-hot encoding is for categorical data. Log transformation addresses skewness, not multicollinearity.
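
A small demonstration of the PCA route: three near-duplicate features go in, and the principal components come out essentially uncorrelated.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
base = rng.normal(size=(500, 1))
# Three features that are near-copies of each other (strong multicollinearity)
X = np.hstack([base + 0.05 * rng.normal(size=(500, 1)) for _ in range(3)])

pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)

# Principal components are uncorrelated by construction
corr = np.corrcoef(X_pca, rowvar=False)
```

A linear regression fit on X_pca (or on a VIF-pruned subset of X) will have stable coefficients, unlike one fit on the raw correlated features.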

Domain 3: Deploy and Optimize Models (Questions 20-25)

Q20. You deploy an MLflow model to a managed online endpoint. Do you need to provide a scoring script?

A. Yes, always
B. No, MLflow models support no-code deployment
C. Only if the model uses a custom framework
D. Only for batch endpoints

Show Answer

B. No, MLflow models support no-code deployment. When a model is registered in MLflow format with a valid model signature, Azure ML automatically generates the scoring infrastructure. No scoring script or custom environment is needed. This is one of the key benefits of using MLflow format for model packaging.
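
A no-code deployment YAML can therefore be as short as the sketch below (model and endpoint names are illustrative); note the absence of code_configuration and environment:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: my-endpoint
model: azureml:my-mlflow-model@latest   # must be registered in MLflow format
instance_type: Standard_DS3_v2
instance_count: 1
# No code_configuration (scoring script) or environment:
# both are inferred from the MLflow model's signature and dependencies
```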

Q21. You have a managed online endpoint with two deployments: "blue" (current model) and "green" (new model). You set traffic to {"blue": 80, "green": 20}. A critical bug is found in the green deployment. What is the fastest way to route all traffic away from green?

A. Delete the green deployment
B. Update traffic to {"blue": 100, "green": 0}
C. Scale green deployment instances to 0
D. Disable the endpoint

Show Answer

B. Update traffic to {"blue": 100, "green": 0}. Updating traffic rules is nearly instantaneous and routes all requests to the blue deployment. Deleting the green deployment takes minutes and is irreversible. Scaling to 0 instances is not supported for online deployments. Disabling the endpoint would affect all users.
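
The rollback is just an update to the endpoint's traffic map, for example via an endpoint YAML applied with az ml online-endpoint update --file endpoint.yml (endpoint name is illustrative):

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: my-endpoint
auth_mode: key
traffic:
  blue: 100    # all traffic back to the known-good deployment
  green: 0     # green keeps running but receives no requests
```

Because green stays deployed, you can debug it and shift traffic back gradually once it is fixed.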

Q22. You need to deploy a model that processes large images and requires a GPU for inference. Which endpoint type and configuration should you use?

A. Managed online endpoint with a GPU instance type (e.g., Standard_NC6s_v3)
B. Managed batch endpoint with CPU instances
C. Azure Functions with HTTP trigger
D. Managed online endpoint with Standard_DS3_v2 (CPU)

Show Answer

A. Managed online endpoint with a GPU instance type (e.g., Standard_NC6s_v3). Image processing models (especially deep learning) require GPU for efficient inference. Managed online endpoints support GPU instance types from the NC, ND, and NV series. CPU instances would be too slow for large images. Azure Functions do not support GPU. Batch endpoints are for offline processing, not real-time.

Q23. Your batch endpoint processes 1 million records. You want to maximize throughput. Which two settings should you tune? (Select two.)

A. instance_count (number of compute nodes)
B. auth_mode
C. max_concurrency_per_instance
D. output_file_name
E. endpoint description

Show Answer

A (instance_count) and C (max_concurrency_per_instance). The instance_count setting controls horizontal scaling (more nodes processing in parallel), while max_concurrency_per_instance controls how many mini-batches each node processes concurrently. Together they determine total parallelism. auth_mode, output_file_name, and the endpoint description do not affect throughput.
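
Both settings live in the batch deployment YAML, sketched below with illustrative names and values:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
name: high-throughput
endpoint_name: my-batch-endpoint
model: azureml:my-model@latest
compute: azureml:batch-cluster
resources:
  instance_count: 8              # horizontal scaling: 8 nodes working in parallel
max_concurrency_per_instance: 4  # each node runs 4 mini-batches concurrently
mini_batch_size: 100             # records handed to each scoring call
```

With these values, up to 8 * 4 = 32 mini-batches are in flight at once.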

Q24. You want to transition a model from "Staging" to "Production" in the MLflow model registry. Which method do you use?

A. ml_client.models.promote()
B. client.transition_model_version_stage()
C. mlflow.deploy_model()
D. model.set_stage("Production")

Show Answer

B. client.transition_model_version_stage(). Using the MlflowClient, you call transition_model_version_stage(name, version, stage) to move a model between stages (None, Staging, Production, Archived). The other methods do not exist in the MLflow or Azure ML SDK.

Q25. After deploying a model, you want to log request/response data for debugging. Which Azure service provides this capability out-of-the-box with managed online endpoints?

A. Azure Monitor
B. Application Insights
C. Azure Event Hub
D. Azure Log Analytics

Show Answer

B. Application Insights. Managed online endpoints automatically integrate with Application Insights (created with the workspace). You can enable request/response logging to capture input/output data, latency, errors, and custom telemetry from the scoring script. Azure Monitor is the broader platform, but Application Insights is the specific service.

Domain 4: Responsible AI (Questions 26-30)

Q26. Which Microsoft Responsible AI principle is most directly addressed by the Fairlearn library?

A. Reliability & Safety
B. Fairness
C. Transparency
D. Privacy & Security

Show Answer

B. Fairness. Fairlearn is specifically designed to assess and mitigate fairness issues in machine learning models. It provides metrics to evaluate whether models perform equitably across different demographic groups and mitigation algorithms to reduce disparities.

Q27. You need to explain to a non-technical business stakeholder why your model denied a specific customer's application. Which Responsible AI dashboard component is most appropriate?

A. Error Analysis tree map
B. Counterfactual What-If analysis
C. Causal inference chart
D. Global feature importance bar chart

Show Answer

B. Counterfactual What-If analysis. Counterfactuals provide the most intuitive explanation: "The application was denied, but if income were $5,000 higher and credit score were 20 points higher, it would be approved." This is actionable and understandable by non-technical stakeholders. Error Analysis shows where the model fails but does not explain individual decisions.

Q28. Your model achieves 95% overall accuracy, but the Error Analysis dashboard reveals a cohort with only 60% accuracy. This cohort represents customers over age 65 with low income. What should you do?

A. Report 95% accuracy and deploy the model
B. Collect more training data for the underperforming cohort and retrain
C. Remove age and income features to eliminate the bias
D. Lower the confidence threshold for all predictions

Show Answer

B. Collect more training data for the underperforming cohort and retrain. The cohort likely has insufficient representation in the training data. Collecting more examples of customers in this demographic will help the model learn better patterns. Removing features loses predictive signal. Reporting only overall accuracy hides the disparity. Lowering thresholds globally does not fix the underlying issue.

Q29. A healthcare organization requires that aggregate statistics about patient outcomes cannot be used to infer information about any individual patient. Which privacy technique provides this guarantee?

A. Data anonymization (removing patient IDs)
B. Differential privacy
C. Data encryption at rest
D. Role-based access control

Show Answer

B. Differential privacy. Differential privacy provides a mathematical guarantee (controlled by the epsilon parameter) that the output of any analysis does not significantly change based on the presence or absence of any single individual. Simple anonymization does not prevent re-identification through quasi-identifiers. Encryption protects data storage, not analysis outputs.
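
As a toy illustration of the idea (not a production mechanism), the classic Laplace mechanism adds noise scaled to sensitivity/epsilon before releasing a count:

```python
import numpy as np

def laplace_count(true_count, epsilon, rng):
    # Counting queries have sensitivity 1: adding or removing one patient
    # changes the count by at most 1, so the noise scale is 1/epsilon
    scale = 1.0 / epsilon
    return true_count + rng.laplace(loc=0.0, scale=scale)

rng = np.random.default_rng(7)
true_count = 128  # e.g., patients with a given outcome
noisy = laplace_count(true_count, epsilon=0.5, rng=rng)
```

Smaller epsilon means more noise and stronger privacy; the released value stays useful in aggregate while masking any individual's contribution.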

Q30. You want to generate a Responsible AI dashboard that includes error analysis, model explanations, and counterfactual analysis. How do you create this in Azure ML?

A. Manually create each component in Azure ML Studio
B. Submit a Responsible AI Insights pipeline job with the desired components
C. Install a VS Code extension
D. Use the Azure CLI to generate a static HTML report

Show Answer

B. Submit a Responsible AI Insights pipeline job with the desired components. The Responsible AI dashboard is generated by running a pipeline job that computes the requested insights (error analysis, explanations, counterfactuals, causal analysis). The results are stored and viewable in Azure ML Studio. This is done programmatically via SDK v2 or CLI v2, not manually in the Studio UI.

Score Yourself

Score Range - Assessment - Recommendation
27-30 (90-100%) - Exam Ready - Schedule your exam with confidence. Do a quick review of any missed questions.
21-26 (70-86%) - Nearly Ready - Review the domains where you missed questions. Take another practice exam in a few days.
15-20 (50-66%) - More Study Needed - Re-read the lessons for your weakest domains. Practice hands-on labs in Azure ML Studio.
Below 15 (<50%) - Continue Studying - Work through the entire course again. Focus on hands-on practice with a free Azure account.