Skip to content

alexian-norvik/LLMs-Interview-questions

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 

Repository files navigation

LLMs-Interview-questions

1. What is a Large Language Model (LLM)?

A Large Language Model is a type of deep learning model designed to understand, generate, and manipulate human language. LLMs are typically based on Transformer architectures and are trained on massive corpora to learn linguistic patterns, world knowledge, and reasoning abilities. They underpin many applications, from chatbots to code generation tools.

2. How are LLMs trained?

LLMs undergo pretraining on vase, diverse datasets using self-supervised learning. The most common objective is next-token prediction: given a sequence of tokens, the model predicts what comes next. This allows LLMs to learn grammar, semantics, and even factual knowledge without labeled datasets. Training requires significant computational resources and optimization techniques like AdamW and learning rate scheduling.

3. What is a token in LLMs?

In LLMs, a token is the smallest unit of text the model processes. Tokens can be full words, subwords, or even characters depending on the tokenizer (e.g., Byte-Pair Encoding or SentencePiece). For instance, the word "engineering" might be split into "engine" and "eering". Tokenization ensures that the model can handle rare or compound words efficiently.

4. What is a context window and why does it matter?

The context window defines how many tokens the model can attend to at one time. A model like GPT-3 has a context limit of 2,048 tokens, while newer models like GPT-4 and Claude 3 go up to 100,000+. A large context window enables better handling of long documents, persistent dialogue, and multi-part reasoning. Context limitations directly affect model memory and coherence.

5. What are embedding layers and why are they important?

Embedding layers are the entry point of LLMs. They map discrete tokens into high-dimensional continuous vectors, allowing the model to capture syntactic and semantic relationships. Words with similar meanings, like "car", and "vehicle", will have embeddings close in vector space, which improves the model's ability to generalize language.

6. How do LLMs handle out-of-vocabulary (OOV) words?

Unlike earlier NLP systems with fixed vocabularies, modern LLMs mitigate OOV issues via subword tokenization. Unknown words are decomposed into known sub-units. For example, "bioinformatics" could be tokenized as "bio", "inform", "atics". This strategy ensures that the model can interpret new or rare words contextually.

7. What are the most common real-world use cases for LLMs?

LLMs power a broad spectrum of AI applications, including:

  • Conversational AI (chatbots, voice assistants)
  • Content creation (blogs, ads, summaries)
  • Code generation & completion (GitHub Copilot)
  • Information retrieval and Q&A
  • Sentiment analysis and moderation
  • Language Transalation and localization

8. What is fine-tuning and why is it used?

Fine-tuning is the process of continuing training on a smaller, domain-specific dataset after general pretraining. It allows LLMs to specialize in a particular industry (e.g., legal, healthcare, finance) or task (e.g., summarization, classification). It improves performance, reduce hallucination, and enables alignment with brand or organizational voice.

9. What is prompt engineering?

Prompt engineering involves crafting precise input prompts that guide the model toward a desired output. It's critical when using models in a zero-shot or few-shot setting. A well-engineered prompt can significantly improve model performance without requiring retraining-making it a high-leverage skill for AI practitioners.

10. What is zero-shot learning in the context of LLMs?

Zero-shot learning refers to the model's ability to perform tasks without prior examples. For instance, asking in LLM, "Translate 'good morning' into Japanese" assumes the model can infer the instruction and complete the task without training examples. This showcases the model's inherent generalization capability.

11. What is few-shot learning and how does it differ from zero-shot?

Few-shot learning involves giving the model a small numer of task-specific examples within the prompt. This primes the model for the expected structure and output style. For instance, few-shot prompting is particularly effecitve in classification, summarization, or role-based dialogue simulations.

12. What are the main challenges of deploying LLMs in production?

key challenges include:

  • Cost: Serving large language models requires GPUs or TPUs.
  • Latency: Response time can be high for large inputs.
  • Hallucination: LLMs may confidently generate incorrect information.
  • Bias: Models may reflect societal or dataset biases.
  • Data privacy: Sensitive inputs need to be managed carefully.

13. What is a hallucination in LLMs?

Hallucination refers to the generation of plausible but incorrect or fictional information. For example, the model might invent citations, fake statistics, or historical facts. It's one of the most studied failure modes and is especially problematic in high-stakes applications like mediciine or law.

14. Are LLMs deterministic in their responses?

Not inherently. LLMs use probabilistic sampling methods (e.g., top-k, nucleus sampling) when generating text. This means responses can vary across runs with the same input. However, by setting temperature to 0 and using greedy decoding, you can force deterministic outputs.

15. What is the temperature setting in LLMs?

Temperature controls randomness in output generation:

  • Low temperature (e.g., 0.2): More focused, deterministic outputs.
  • High temperature (e.g., 0.9): More diverse and creative, but less predictable.

It's a trade-off between accuracy and creativity, useful in both business and creative contexts.

16. How does transfer learning apply to LLMs?

LLMs use transfer learning by leveraging generalized pretraining knowledge and adapting it to specific downstream tasks through fine-tuning or prompt-based learning. This significantly reduces the data and compute needed for new applications.

17. What is pretraining and how does it differ from fine-tuning?

Pretraining is the large-scale unsupervised learning phase where the model learns general language structure and world knowledge. Fine-tuning comes afterward and is task-specific. Together, they form the two-stage pipeline that powers most state-of-the-art LLMs today.

18. What is the role of attention in LLMs?

Attention mechanisms allow the model to weigh the importance of each word relative to others in a sentence. For example, in the phrase "The thropy didn't fit in the suitcase because it was too small", attention helps resolve the reference of "it". This is central to understanding context and reasoning.

19. How do LLMs differ from traditional NLP models?

Traditional NLP approaches were task-specific and relied heavily on feature engineering and labeled data. LLMs are end-to-end models that learn representations directly from raw text, offering much better generalization and fewer domain constraints. They're also far more scalable and flexible across tasks.

20. Can LLMs understand non-textual data like images or audio?

Base LLMs are text-only. However, multimodal models like GPT-4o, Gemini, and Claude 3 Opus integrate vision, audio, and even video. These models can process PDFs, screenshots, diagrams, or spoken language-expanding the scope of LLM applications beyond traditional NLP.

21. What is the Transformer architecture and why is it foundational to LLMs?

The Transformer is a neural network architecture introduced in 2017 through the "Attention is All You Need" paper. It replaced recurrence with self-attention mechanisms, enabling parallel computation and long-range dependency modeling. LLMs like GPT, BERT, and T5 are built upon Transformer blocks, making it the backbone of modern NLP.

22. What is self-attention in LLMs?

Self-attention allows the model to focus on different words in a sentence based on their relevance. It computes a weighted representation for each word by considering all others in the sequence, enabling nuanced understanding of grammar, co-reference, and context, key for tasks like translation and summarization.

23. How do positional encodings work in Transformers?

Since Transformers lack recurrence, they use positional encodings to inject information about word order. These encodings are added to token embeddings and can be sinusoidal or learned. This lets the model distinguish between sequences like "cat sat on mat" vs. "mat sat on cat".

24. What is the difference between encoder-only, decoder-only, and encoder-decoder models?

  • Encoder-only (e.g., BERT): Used for understanding tasks like classification.
  • Decoder-only (e.g., GPT): Suited for generative tasks like text completion.
  • Encoder-decoder (e.g., T5, BART): ideal for sequence-to-sequence tasks like translation or summarization.

25. What is instruction tuning?

Instruction tuning involves fine-tuning LLMs using prompts framed as instruction paired with ideal responses. This helps models follow human commands better in zero-shot settings and improves alignment with real-world user intents, crucial for LLM-as-a-service platforms.

26. What is RLHF (Reinforcement Learning with Human Feedback)?

RLHF is a post-training technique where human preferences guide model behavior. It uses a reward model trained on human-labeled responses and fine-tunes the LLM via reinforcement learning. It's critical in models like ChatGPT to align outputs with human expectations and ethics.

27. What are safety layers in LLM deployments?

Safety layers are mechanisms built around LLMs to prevent harmful or inappropriate outputs. This include moderation filters, guardrails, rejection sampling, and constitutional AI techniques. They're essential in regulated environments like finance or healthcare.

28. What is model alignment in the context of LLMs?

Model alignment refers to the process of ensuring that an LLM behaves in accordance with human values, legal standards, and organizational goals. Techniques include fine-tuning, RLHF, and prompt design. Alignment is vital for trustworthiness and safe AI deployment.

29. How do retrieval-augmented generation (RAG) systems work?

RAG systems combine LLMs with external knowledge retrieval. First, a search component fetches relevant documents, then, the LLM uses them to generate responses. This improves factual accuracy, reduces hallucination, and enables real-time knowledge access.

30. What is a vector database and how is it used with LLMs?

Vector databases (like Qdrant, Pinecone, FAISS) store text embeddings as vectors and allow fast similarity search. LLM can generate embeddings for queries and match them against stored vectors, enabling semantic search, recommendation, and contextual grounding.

31. What is chain-of-thought prompting?

Chain-of-thought (CoT) prompting encourages the LLM to break down reasoning into steps, improving performance in logic-heavy tasks like arithmetic or multi-hop questions. For example, asking "Let's think step by step" can significantly boost reasoning accuracy.

32. How does an LLM handle ambiguous inputs?

LLMs infer meaning based on context but may struggle with ambiguous prompts. Techniques like clarification questions, few-shot prompting, or disambiguation through instruction fine-tuning help models respond more accurately.

33. What is a system prompt in LLMs?

A system prompt is a hidden instruction provided to the model to shape its behavior throughout a session. It defines tone, role, or constraints (e.g., "You are a helpful medical assistant"). System prompts are crucial for controlling model output in multi-turn interactions.

34. How are embeddings used for personalization?

LLM-generated embeddings can capture user preferences, query history, or content interactions. These vectors are used to personalize responses or recommendations, making AI assistants more context-aware and user-centric.

35. What are hallucination mitigation techniques in LLMs?

Mitigation strategies include:

  • RAG or grounding with verified knowledge bases
  • Confidence scoring
  • Few-shot or CoT prompting
  • Post-hoc fact-checking using external tools

These reduce false outputs in mission-critical applications.

36. What is model quantization and why is it useful?

Quantization reduces model size and speeds up inference by converting weights from 32-bit floating point to 8-bit or lower. While it may introduce minor accuracy loss, it enables LLM deployment on edge devices and improves scalability.

37. What is LoRA (Low-Rank Adaptation) in LLM fine-tuning?

LoRA is a parameter-efficient fine-tuning technique that injects trainable low-rank matrices into transformer layers, avoiding the need to update the entire model. It drastically reduces compute cost and memory usage during task-specific adaptation.

38. What is a multi-modal LLM?

A multi-modal LLM can process and generate across text, image, audio, and video inputs. Models like GPT-4o or Gemini combine vision and language understanding, enabling tasks like image captioning, diagram Q&A, or even speech-to-text reasoning.

39. How do LLMs support enterprise search?

LLMs enhance enterprise search by understanding semantic intent and retrieving relevant documents using embeddings and RAG. They also summarize, rank, and answer questions over internal content, transforming knowledge management and decision support.

40. What is synthetic data generation using LLMs?

LLMs can create labeled examples to augment datasets for training smaller models or testing NLP pipelines. For instance, generating fake customer support chats or legal clauses accelerates AI development without requiring expensive human labeling.

41. What is the difference between autoregressive and autoencoding models?

Autoregressive models, such as GPT, are designed to generate text by predicting the next token based on the previous ones. This means they operate in a unidirectional fashion, left to right making them ideal for generative tasks like text completion or chatbot responses.
On the other hand, autoencoding models like BERT are trained to reconstruct masked tokens by learning context from both left to right directions (bidirectional). This makes them suitable for understanding tasks such as sentiment analysis, text classification, and question answering. The key distinction lies in how they learn and apply context, and each is optimized for different types of downstream NLP tasks.

42. What is the role of layer normalization in LLMs?

Layer normalization is a stabilization technique used within transformer layers of LLMs to normalize inputs across the feature dimension. It ensures that each neuron's output distribution remains consistent, which speeds up training and improves convergence. Without normalization, deep models often face exploding or vanishing gradients, making training unstable. Layer normalization helps maintain gradient flow and reduces internal covariate shift, which is critical in training large-scale LLMs with billions of parameters.

About

LLMs interview questions

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors