Practice Assessment
25 exam-style questions covering all four certification domains. Click each question to reveal the answer with a detailed explanation. Aim to answer at least 22 correctly before taking the real exam.
Domain 1: Transformers Library (Questions 1-7)
Q1: What is the correct way to load a BERT tokenizer and model for sentiment classification?
Answer: Use AutoTokenizer.from_pretrained("bert-base-uncased") and AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2). The num_labels parameter tells the newly initialized classification head how many output classes to produce; that head is randomly initialized and must be fine-tuned. Using AutoModel instead would give you only hidden states, with no classification head.
Q2: What pipeline task should you use for zero-shot topic classification without fine-tuning?
Answer: Use pipeline("zero-shot-classification"). This pipeline uses NLI (Natural Language Inference) models to classify text into arbitrary categories without task-specific fine-tuning. You provide candidate_labels and the model determines which label best matches the input. Common model: facebook/bart-large-mnli.
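The mechanics can be sketched without downloading a model: the pipeline turns each candidate label into an NLI hypothesis and scores it against the input. The helper below (build_nli_hypotheses is an illustrative name, not a library function) mirrors that step; the commented lines show typical pipeline usage.

```python
from typing import List

def build_nli_hypotheses(labels: List[str],
                         template: str = "This example is {}.") -> List[str]:
    """Turn candidate labels into NLI hypotheses, as the zero-shot
    pipeline does internally (the template is configurable via the
    pipeline's hypothesis_template argument)."""
    return [template.format(label) for label in labels]

# Typical pipeline usage (requires a model download, shown for reference):
# from transformers import pipeline
# clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
# clf("The new phone has a great camera",
#     candidate_labels=["technology", "sports", "politics"])
```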
Q3: What does tokenizer(..., return_tensors="pt", padding=True, truncation=True) do?
Answer: This tokenizes the input and returns PyTorch tensors (return_tensors="pt"), pads shorter sequences to match the longest in the batch (padding=True), and truncates sequences exceeding the model's max length (truncation=True). The output dictionary contains input_ids, attention_mask, and optionally token_type_ids.
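As a rough sketch (pure Python, no transformers needed), the padding and masking behaviour looks like this; pad_batch is a hypothetical helper, not the library API:

```python
def pad_batch(sequences, pad_id=0, max_length=None):
    """Illustrative re-implementation of what padding=True does:
    pad every sequence of token ids to the batch's longest length
    and build the matching attention_mask (1 = real token, 0 = pad).
    max_length mimics the effect of truncation=True."""
    target = max(len(s) for s in sequences)
    if max_length is not None:
        target = min(target, max_length)
    input_ids, attention_mask = [], []
    for seq in sequences:
        seq = seq[:target]                                  # truncate
        pad = target - len(seq)
        input_ids.append(seq + [pad_id] * pad)              # pad ids
        attention_mask.append([1] * len(seq) + [0] * pad)   # mask pads
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```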
Q4: How do you convert model output logits to probabilities?
Answer: Apply torch.softmax(logits, dim=-1) (or torch.nn.functional.softmax(logits, dim=-1)) for multi-class classification. For binary classification with a single logit, use torch.sigmoid(logits). The pipeline API handles this automatically, but for manual inference you must apply the activation function yourself.
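A minimal stdlib sketch of the two activation functions (in practice you would call torch.softmax / torch.sigmoid on tensors):

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logit):
    """For binary classification with a single output logit."""
    return 1.0 / (1.0 + math.exp(-logit))
```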
Q5: Name three differences between encoder models (BERT) and decoder models (GPT).
Answer: (1) Attention: Encoders use bidirectional (full) attention; decoders use causal (left-to-right) attention. (2) Best tasks: Encoders excel at understanding tasks (classification, NER, QA); decoders excel at generation tasks (text completion, chat). (3) Input processing: Encoders encode the whole input in one pass and produce a representation for every token; decoders generate autoregressively, emitting one token at a time, with each position attending only to earlier tokens.
Q6: What is the purpose of the [CLS] and [SEP] tokens in BERT?
Answer: [CLS] (Classification) is prepended to every input. Its final hidden state is used as the aggregate sequence representation for classification tasks. [SEP] (Separator) separates two segments in sentence-pair tasks (e.g., NLI, QA). For single sentences, it marks the end of the input. These are special tokens added automatically by the tokenizer.
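The assembly rule can be sketched in plain Python; build_bert_inputs is an illustrative helper, not part of transformers (the real tokenizer does this when you pass one or two text arguments):

```python
def build_bert_inputs(tokens_a, tokens_b=None, cls="[CLS]", sep="[SEP]"):
    """Sketch of how a BERT tokenizer assembles special tokens:
    single sentence -> [CLS] A [SEP]          (token_type_ids all 0)
    sentence pair   -> [CLS] A [SEP] B [SEP]  (segment B gets type 1)"""
    tokens = [cls] + tokens_a + [sep]
    type_ids = [0] * len(tokens)
    if tokens_b is not None:
        tokens += tokens_b + [sep]
        type_ids += [1] * (len(tokens_b) + 1)
    return tokens, type_ids
```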
Q7: What is the difference between pipeline("fill-mask") and pipeline("text-generation")?
Answer: fill-mask predicts a single masked token within existing text (e.g., "Paris is the [MASK] of France" → "capital"). It uses encoder models (BERT). text-generation generates continuation text from a prompt (e.g., "Once upon a" → "time there was..."). It uses decoder models (GPT-2). Different model architectures, different use cases.
Domain 2: Fine-Tuning (Questions 8-13)
Q8: What is the role of compute_metrics in the Trainer API?
Answer: compute_metrics is a function passed to the Trainer that calculates evaluation metrics after each evaluation step. It receives an EvalPrediction object with predictions (logits) and label_ids (ground truth). You must convert logits to class predictions with np.argmax() and compute metrics like accuracy, F1, etc. Without it, the Trainer only reports loss.
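A minimal sketch of such a function, written without numpy for clarity (real Trainer code would unpack an EvalPrediction and use np.argmax plus the evaluate library):

```python
def compute_accuracy(logits, labels):
    """compute_metrics-style function: argmax over each row of
    logits to get class predictions, then compare to the labels."""
    preds = [row.index(max(row)) for row in logits]
    correct = sum(p == y for p, y in zip(preds, labels))
    return {"accuracy": correct / len(labels)}
```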
Q9: What does evaluation_strategy="epoch" do in TrainingArguments?
Answer: It runs evaluation on the eval dataset after every training epoch. Other options are "steps" (evaluate every N steps, set with eval_steps) and "no" (never evaluate during training). When combined with load_best_model_at_end=True and save_strategy="epoch", it enables automatic selection of the best checkpoint. Note that recent transformers releases rename this argument to eval_strategy.
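A hypothetical TrainingArguments configuration combining these pieces (output_dir, metric_for_best_model, and the epoch count are illustrative choices, not requirements):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",      # evaluate after every epoch
    save_strategy="epoch",            # must match for best-model selection
    load_best_model_at_end=True,
    metric_for_best_model="accuracy", # which eval metric defines "best"
    num_train_epochs=3,
)
```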
Q10: How does LoRA reduce memory usage compared to full fine-tuning?
Answer: LoRA freezes all original model parameters and adds small trainable low-rank matrices (A and B) to specific layers (typically attention query and value projections). Instead of updating millions of parameters, only the small rank-decomposition matrices are trained (typically 0.1-1% of total parameters). This reduces GPU memory for gradients and optimizer states by 90%+.
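The saving is easy to verify with back-of-the-envelope arithmetic; the helper below is purely illustrative:

```python
def lora_param_counts(d_in, d_out, rank):
    """Compare trainable parameters for one adapted weight matrix:
    full fine-tuning trains the whole d_in x d_out matrix, while LoRA
    trains only the low-rank factors A (d_in x r) and B (r x d_out)."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

# A BERT-base-sized 768x768 projection with rank 8:
full, lora = lora_param_counts(768, 768, 8)
# full = 589824, lora = 12288 -> about 2% per adapted matrix; since most
# layers are not adapted at all, the model-wide trainable fraction is
# far smaller still
```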
Q11: What does dataset.map(fn, batched=True, remove_columns=["text"]) do?
Answer: It applies function fn to the dataset in batches for efficiency (batched=True), and removes the "text" column from the output (remove_columns). This is commonly used for tokenization: the function tokenizes the text, and the original raw text column is removed since the model only needs input_ids and attention_mask.
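What batched=True means in practice — the function receives a dict of lists (one entry per column), not single rows — can be emulated in pure Python. batched_map and the whitespace "tokenizer" below are stand-ins, not the datasets API:

```python
def batched_map(fn, rows, batch_size=2, remove_columns=()):
    """Pure-Python emulation of dataset.map(fn, batched=True, ...):
    rows are grouped into a dict of lists per batch, fn returns new
    columns, and remove_columns are dropped from the result."""
    out = []
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        columns = {k: [r[k] for r in batch] for k in batch[0]}
        new_cols = fn(columns)             # fn sees the whole batch at once
        merged = {**columns, **new_cols}
        for j in range(len(batch)):
            out.append({k: v[j] for k, v in merged.items()
                        if k not in remove_columns})
    return out

# Stand-in "tokenizer" that just splits on whitespace:
fake_tokenize = lambda batch: {"input_ids": [t.split() for t in batch["text"]]}
```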
Q12: What metric is best for evaluating a translation model?
Answer: BLEU (Bilingual Evaluation Understudy) is the standard metric for machine translation. It measures n-gram precision between the generated translation and reference translations. Use sacrebleu for a standardized implementation. For more nuanced evaluation, chrF (character F-score) and COMET (neural metric) are also used.
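A toy, stdlib-only illustration of the clipped n-gram precision at BLEU's core (not a replacement for sacrebleu, which combines n=1..4 precisions and applies a brevity penalty):

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision: each candidate n-gram only counts as
    many times as it appears in the reference (the clipping that stops
    'the the the' from scoring perfectly)."""
    def ngrams(text):
        tokens = text.split()
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())   # per-n-gram min of the counts
    total = sum(cand.values())
    return overlap / total if total else 0.0
```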
Q13: What is the difference between fp16=True and bf16=True in TrainingArguments?
Answer: Both enable mixed-precision training to reduce memory and speed up training. fp16 (float16) is supported on all modern GPUs and halves memory usage. bf16 (bfloat16) has a larger exponent range, reducing overflow/underflow issues, but requires Ampere+ GPUs (A100, RTX 3090+). Use bf16 when available as it is more numerically stable.
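The range difference follows directly from the exponent widths; a quick arithmetic check:

```python
# Largest finite values for each 16-bit format, from their bit layouts:
# float16:  5 exponent bits, 10 mantissa bits
# bfloat16: 8 exponent bits (same as float32), 7 mantissa bits
fp16_max = (2 - 2**-10) * 2.0**15    # 65504.0
bf16_max = (2 - 2**-7) * 2.0**127    # ~3.4e38, same order as float32

# A gradient of 1e5 overflows in fp16 but is comfortably in range for bf16:
overflows_fp16 = 1e5 > fp16_max      # True
overflows_bf16 = 1e5 > bf16_max      # False
```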
Domain 3: NLP Tasks (Questions 14-19)
Q14: What is the BIO tagging scheme used in NER?
Answer: BIO stands for Begin, Inside, Outside. B- marks the first token of an entity (e.g., B-PER), I- marks subsequent tokens of the same entity (e.g., I-PER), and O marks tokens that are not part of any entity. For example, "New York City" would be tagged as: New(B-LOC) York(I-LOC) City(I-LOC). This scheme handles multi-token entities and adjacent entities of the same type.
Q15: How does extractive QA handle contexts longer than the model's max length?
Answer: Long contexts are split into overlapping chunks using a sliding window (controlled by doc_stride). Each chunk is processed independently, and the answer with the highest confidence score across all chunks is selected. The overlap ensures that answers spanning chunk boundaries are not missed. Typical settings: max_seq_len=384, doc_stride=128.
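The chunking itself is simple to sketch in pure Python (sliding_window is a hypothetical helper; the QA pipeline handles this internally):

```python
def sliding_window(tokens, max_len=384, doc_stride=128):
    """Split a long token list into overlapping chunks: each window
    advances by (max_len - doc_stride) tokens, so consecutive chunks
    share doc_stride tokens of overlap."""
    chunks, start = [], 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += max_len - doc_stride
    return chunks
```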
Q16: What model should you use for abstractive summarization of news articles?
Answer: facebook/bart-large-cnn is the best choice. It is a BART model fine-tuned specifically on the CNN/DailyMail news dataset. For extreme summarization (one sentence), use google/pegasus-xsum. For dialogue summarization, use philschmid/bart-large-cnn-samsum. All are encoder-decoder (seq2seq) models.
Q17: What is aggregation_strategy in the NER pipeline?
Answer: It controls how subword-level predictions are aggregated into word-level entities. Options: "none" returns raw token predictions, "simple" groups adjacent tokens with the same entity type and averages scores, "first" uses the first subword score, "average" averages all subword scores, "max" uses the maximum. Default is "none"; use "simple" for clean results.
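A rough pure-Python emulation of the "simple" strategy's grouping-and-averaging step (the real pipeline additionally handles B-/I- prefixes and subword offsets):

```python
def aggregate_simple(token_preds):
    """Merge runs of adjacent tokens that share an entity type and
    average their scores. token_preds: list of (token, type, score)."""
    groups = []
    for token, ent, score in token_preds:
        if groups and groups[-1]["entity_group"] == ent:
            groups[-1]["tokens"].append(token)
            groups[-1]["scores"].append(score)
        else:
            groups.append({"entity_group": ent,
                           "tokens": [token], "scores": [score]})
    return [{"entity_group": g["entity_group"],
             "word": " ".join(g["tokens"]),
             "score": sum(g["scores"]) / len(g["scores"])}
            for g in groups]
```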
Q18: What is the difference between AutoModelForSeq2SeqLM and AutoModelForCausalLM?
Answer: AutoModelForSeq2SeqLM loads encoder-decoder models (T5, BART) for sequence-to-sequence tasks (translation, summarization). It takes an input and generates a different output. AutoModelForCausalLM loads decoder-only models (GPT-2, LLaMA) for text generation. It continues from a prompt. The key difference is architecture: encoder-decoder vs. decoder-only.
Q19: How do you handle multilingual translation with a single model?
Answer: Use mBART or NLLB models. For mBART-50, set tokenizer.src_lang to the source language code (e.g., "en_XX") and pass forced_bos_token_id=tokenizer.lang_code_to_id["fr_XX"] to model.generate() to force the target language. NLLB works the same way but uses its own language codes (e.g., "eng_Latn", "fra_Latn"). This allows a single model to translate between any supported language pair without needing separate models.
Domain 4: Model Sharing (Questions 20-25)
Q20: What files are uploaded when you call model.push_to_hub()?
Answer: The model weights (model.safetensors or pytorch_model.bin), the model configuration (config.json), and the tokenizer files (tokenizer.json, tokenizer_config.json, vocab.txt or equivalent). You should also call tokenizer.push_to_hub() separately to ensure tokenizer files are included. A README.md (model card) should be added manually.
Q21: What YAML metadata should a model card include?
Answer: The YAML header should include: language (e.g., "en"), license (e.g., "mit"), tags (task-related tags), datasets (training datasets), metrics (evaluation metrics used), and optionally model-index with results for automatic leaderboard integration. This metadata makes models discoverable and filterable on the Hub.
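For example, a sentiment model's README.md might begin with a header like this (all values illustrative):

```yaml
---
language: en
license: mit
tags:
  - text-classification
  - sentiment-analysis
datasets:
  - imdb
metrics:
  - accuracy
  - f1
---
```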
Q22: How do you create a Gradio interface with multiple inputs?
Answer: Pass a list of input components: gr.Interface(fn=my_func, inputs=[gr.Textbox(label="Question"), gr.Textbox(label="Context")], outputs=gr.Textbox(label="Answer")). The function my_func receives multiple arguments matching the inputs list. For more complex layouts, use gr.Blocks with explicit wiring via .click().
Q23: What is the difference between a Hub model repo and a Space repo?
Answer: A model repo (repo_type="model") stores model weights, config, and tokenizer files. Users download from it with from_pretrained(). A Space repo (repo_type="space") contains an application (Gradio, Streamlit, or Docker) that is built and deployed as a web app. Spaces have compute resources and a public URL; model repos are just file storage.
Q24: How do you authenticate with the Hugging Face Hub programmatically?
Answer: Three methods: (1) huggingface_hub.login(token="hf_...") in Python code, (2) run huggingface-cli login in the terminal and paste your token, (3) set the HF_TOKEN environment variable. Tokens are generated at huggingface.co/settings/tokens. Use "write" tokens for pushing models and "read" tokens for private model access.
Q25: Why should you include a "Limitations and Biases" section in your model card?
Answer: This section promotes responsible AI by: (1) warning users about known failure cases (e.g., "does not work well on informal text"), (2) documenting potential biases in training data that may affect predictions, (3) defining the scope of intended use to prevent misuse, (4) enabling informed decision-making by downstream users. It is a requirement for ethical AI deployment and many organizations mandate it.
Score Guide
- 22-25 correct: You are ready for the certification exam
- 18-21 correct: Almost ready — review the topics you missed
- 14-17 correct: Need more study — re-read lessons 2-5 and retry
- Below 14: Go through the entire course again before attempting the exam
Lilly Tech Systems