AI‑900 Ultra‑Deep Dive: AI Workloads + How They Really Work (Generative AI, NLP, Vision, Speech) & Responsible AI
This article goes beyond recognition—into mechanics. We’ll cover how Generative AI, NLP, Vision, and Speech actually work (tokens, semantics, attention, OCR pipelines, STT/TTS), while staying within AI‑900 scope. We’ll also map everything to Azure services and Responsible AI principles used on the exam.
- AI Workloads Overview
- Generative AI (Deep Dive: tokens, context, attention, sampling, RAG)
- NLP (Deep Dive: semantics, PII, summarization, intents)
- Computer Vision (Deep Dive: image analysis, OCR, faces, video)
- Speech (Deep Dive: STT/TTS/Translation, diarization)
- Document Intelligence (deep)
- Responsible AI (fairness, transparency, privacy, inclusiveness, accountability)
- Service Chooser (scenario → workload → service)
- New‑Learner Q&A
- References
AI Workloads Overview (in Azure)
| Workload | What it solves | Azure services to recognize |
|---|---|---|
| Generative AI | Create new content (text, images, audio, code); augment apps with LLMs + RAG. | Azure OpenAI Service (GPT‑family, DALL·E image models, embeddings). |
| NLP | Understand text: sentiment, entities/PII, summarization, intents/Q&A. | Azure Language (NER/PII, Sentiment, Summarization, CLU, Question Answering). |
| Vision | Understand images/video: tags, objects, captions, OCR, faces, video topics. | Azure Vision (Image Analysis/OCR/Face/Spatial), Video Indexer, Vision Studio. |
| Speech | Speech↔Text, translation, diarization, neural TTS, speaker recognition. | Azure Speech (real‑time/fast/batch STT, TTS, Translation, Speaker Recognition). |
| Document Intelligence | Extract fields/tables from PDFs/images; classify/split docs. | Azure AI Document Intelligence (prebuilt/custom; containers; studio). |
Generative AI — How It Works (Tokens, Attention, Context, Sampling, RAG)
What “generative” means in practice
Generative models learn statistical patterns over tokens (sub‑word units) and predict the next token given the previous context, producing coherent text, code, or even image descriptions. In Azure, this is delivered via Azure OpenAI Service with enterprise controls for security, regions, and throughput (standard, provisioned, batch).
Key technical terms (with context)
- Token — the atomic unit models process (≈ sub‑word pieces like "inter", "national"). Tokenization (e.g., BPE) converts text to token IDs; models output a probability distribution (logits) over the next token.
- Embedding — dense vector representation of a token/word/sentence capturing semantics used for search, clustering, and RAG retrieval.
- Context window — maximum tokens the model attends to (prompt + response). Longer windows support longer conversations & documents, but still require retrieval for large corpora.
- Attention — mechanism (Transformer) that lets each token weigh others to compute the next‑token distribution; enables long‑range dependencies vs. older RNNs/CNNs.
- Sampling — converts logits to text. Temperature scales randomness; top‑p (nucleus) restricts to the smallest set of tokens whose cumulative probability ≥ p; top‑k limits to k highest probability tokens.
- System/User messages — structured prompts that set behavior/instructions vs. user inputs; supported by Azure OpenAI chat APIs.
- Hallucination — plausible‑sounding but incorrect output; mitigated with grounding via RAG and content filtering.
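The sampling terms above (temperature, top‑p) can be made concrete with a toy sketch. Everything here is illustrative: real models emit logits over vocabularies of ~100k tokens, and the token names and values below are invented.

```python
# Toy next-token sampling: temperature scaling + nucleus (top-p) filtering.
# Logits and token names are made up for illustration.
import math
import random

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

def sample(logits, temperature=1.0, top_p=1.0, seed=None):
    """Scale logits by temperature, keep the nucleus, then draw one token."""
    rng = random.Random(seed)
    scaled = {t: v / temperature for t, v in logits.items()}
    probs = softmax(scaled)
    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    kept, cum = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[tok] = p
        cum += p
        if cum >= top_p:
            break
    # Renormalize over the kept tokens and sample.
    total = sum(kept.values())
    r, acc = rng.random() * total, 0.0
    for tok, p in kept.items():
        acc += p
        if r <= acc:
            return tok
    return tok

logits = {"cat": 3.0, "dog": 2.5, "car": 0.5}  # hypothetical logits
print(sample(logits, temperature=0.2, top_p=0.9, seed=0))
```

Note how a low temperature sharpens the distribution so much that the nucleus collapses to a single token, making output effectively deterministic; a high temperature flattens it, letting unlikely tokens through.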
RAG (Retrieval‑Augmented Generation): why it matters
RAG combines a search layer (vector or hybrid) with the LLM: retrieve relevant chunks (via embeddings) and inject them into the prompt so answers are grounded in your data. This boosts factuality and reduces hallucinations—recommended in Microsoft’s Azure architecture guidance for enterprise apps.
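The retrieve-then-inject flow above can be sketched end to end. This uses a toy bag-of-words "embedding" and cosine similarity; a real system would use a learned embeddings model (e.g., via Azure OpenAI) and a vector index such as Azure AI Search.

```python
# Minimal RAG sketch: embed chunks, rank by similarity to the query,
# inject the best chunk into the prompt. The "embedding" here is a toy.
import math
from collections import Counter

def embed(text):
    """Toy embedding: a word-count vector (real embeddings are dense floats)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    """Rank document chunks by similarity to the query; keep the top k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Refund requests must be filed within 30 days of purchase.",
    "Our offices are closed on public holidays.",
]
context = retrieve("how long do I have to request a refund", chunks)
prompt = f"Answer using only this context:\n{context[0]}\nQuestion: ..."
print(context[0])
```

The grounded prompt constrains the model to the retrieved text, which is what "reduces hallucinations" means in practice.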
New‑learner: “Is token = word? Why does context window matter?”
Not exactly. Tokens are sub‑word pieces; “internationalization” may be several tokens. The model can only “see” up to its context limit (prompt+response). Long policies or PDFs require chunking + RAG to remain within limits while staying factual.
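The chunking step mentioned above can be sketched as follows. One "token" is approximated by one whitespace word here; production code would count tokens with the model's real tokenizer (e.g., tiktoken) instead.

```python
# Hedged sketch: split a long document into overlapping chunks that fit
# a token budget. Word count stands in for the real tokenizer.
def chunk(text, max_tokens=50, overlap=10):
    """Split text into overlapping word-count chunks for retrieval."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break
        start += max_tokens - overlap  # overlap keeps context across edges
    return chunks

doc = " ".join(f"word{i}" for i in range(120))  # stand-in for a long policy
parts = chunk(doc, max_tokens=50, overlap=10)
print(len(parts), len(parts[0].split()))
```

The overlap matters: a sentence cut at a chunk boundary would otherwise be unretrievable as a whole.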
NLP — Semantics, Summarization, PII, Intents (CLU)
Azure Language unifies traditional text analytics (sentiment, key phrases, entities/PII, language detection), summarization (documents & conversations), question answering, and Conversational Language Understanding (CLU) for intents/entities—all accessible via REST/SDK and Language Studio.
How NLP features work conceptually
- Semantics via embeddings: Text is mapped to vectors so “refund” is near “return,” enabling grouping, retrieval, and clustering. Prebuilt features hide the math; you use high‑level APIs.
- Sentiment & opinion mining: Classifies per‑document/sentence polarity and associates sentiments with targets (e.g., “shipping was slow”). Useful for VoC analytics.
- Named Entity Recognition (NER) & PII redaction: Extract entities (Person, Org, Date, etc.) and redact sensitive data before storage or search—key for privacy.
- Summarization: Extractive picks key sentences; abstractive generates concise prose. Azure Language offers both for documents and conversations.
- CLU: Train intents/entities with labeled utterances; great for routing and slot‑filling in bots.
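The extractive idea from the list above ("pick key sentences") can be shown with a toy frequency-based scorer. Azure Language's extractive summarization is far more sophisticated; this only illustrates the concept, and the sample text is invented.

```python
# Toy extractive summarizer: score sentences by how many frequent words
# they contain, keep the top-scoring ones in original order.
from collections import Counter

def extractive_summary(text, n=1):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(w.lower() for s in sentences for w in s.split())
    scored = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w.lower()] for w in sentences[i].split()),
        reverse=True,
    )
    keep = sorted(scored[:n])  # restore original document order
    return ". ".join(sentences[i] for i in keep) + "."

text = (
    "Refunds are processed within five days. "
    "Refunds require a receipt. "
    "Our mascot is a llama."
)
print(extractive_summary(text, n=1))
```

An abstractive summarizer, by contrast, would generate a new sentence such as "Refunds need a receipt and take five days", which is exactly why it needs a generative model.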
New‑learner: “Language vs. Azure OpenAI for summarization?”
Use Language when you want task‑specific, predictable summarization with minimal tuning. Use Azure OpenAI when you need flexible generative summaries and are ready to add guardrails/grounding. Many production apps mix both.
Computer Vision — Image Analysis, OCR, Face, Video
Azure’s Vision family provides Image Analysis (tags/objects/captions), OCR for text extraction, Face for face detection/analysis, and Video Indexer for higher‑level video insights. Vision Studio gives you a no‑code way to try features before coding.
How core pieces work
- Image analysis: Modern vision backbones (CNN/Transformers) output tags, object boxes, and captions; the API returns JSON with confidence scores.
- OCR: The pipeline detects text regions → lines → words → outputs text + coordinates; works for printed and many handwritten cases.
- Face: Detects faces and can return attributes (e.g., landmarks). Identity features are gated under Responsible AI approvals.
- Video Indexer: Combines speech transcription, face/object detection, and topic extraction across time.
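Since the vision APIs above return JSON with confidence scores, a common client-side step is filtering by a threshold. The response shape below is a simplified stand-in, not the exact Azure Image Analysis schema; real code would use the Azure SDK.

```python
# Hedged sketch: filter an image-analysis-style response by confidence.
# The JSON is a simplified, invented example of the general shape.
import json

response = json.loads("""
{
  "caption": {"text": "a dog on a beach", "confidence": 0.91},
  "tags": [
    {"name": "dog", "confidence": 0.98},
    {"name": "beach", "confidence": 0.87},
    {"name": "frisbee", "confidence": 0.42}
  ]
}
""")

def confident_tags(resp, threshold=0.8):
    """Keep only tags the model is reasonably sure about."""
    return [t["name"] for t in resp["tags"] if t["confidence"] >= threshold]

print(confident_tags(response))  # low-confidence "frisbee" is dropped
```

Choosing the threshold is an application decision: higher values trade recall for precision.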
Speech — STT/TTS/Translation, Diarization, Customization
Azure Speech supports real‑time, fast, and batch transcription; natural‑sounding neural TTS (including custom neural voice with governance); translation; speaker recognition; and avatar/voice‑live features—deployable in cloud or containers.
How STT works (conceptually)
- Acoustic model turns waveform → phonetic probabilities; language model resolves words from likely sequences; diarization separates speakers in meetings/calls.
- Modes: real‑time for streaming captions; fast for synchronous file transcription; batch for archives/large volumes.
- Customization: domain lexicons and custom speech models improve accuracy in noisy/jargon contexts.
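The acoustic-model/language-model split above can be illustrated with a toy re-ranker: the acoustic model proposes phonetically similar hypotheses, and a simple language model (here, invented word-pair counts) picks the one that reads like real language.

```python
# Conceptual sketch of LM re-ranking in STT. Hypotheses and counts are
# invented; real systems use neural language models, not bigram tables.
def lm_score(sentence, bigram_counts):
    """Score a sentence by how common its adjacent word pairs are."""
    words = sentence.lower().split()
    return sum(bigram_counts.get((a, b), 0) for a, b in zip(words, words[1:]))

bigram_counts = {("recognize", "speech"): 12, ("wreck", "a"): 1}
hypotheses = ["recognize speech", "wreck a nice beach"]  # classic example
best = max(hypotheses, key=lambda h: lm_score(h, bigram_counts))
print(best)
```

This is also why domain lexicons help: adding your jargon to the language model raises the score of the correct, otherwise-unlikely transcription.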
New‑learner: “Is Whisper available on Azure?”
Yes—Azure Speech/Azure OpenAI offer Whisper‑based transcription scenarios (availability evolves; check current docs).
Document Intelligence — Field/Tables Extraction, Classification & Splitting
Azure AI Document Intelligence (formerly Form Recognizer) extracts key‑value pairs and tables from documents using prebuilt models (invoice, receipt, ID, business card) or custom models trained on your layouts. It also supports document classification and splitting for multi‑doc files. Deploy in cloud or containers and integrate with Power Automate/Logic Apps.
New‑learner: “Why not just Vision OCR for invoices?”
OCR gives you text. Document Intelligence gives you semantics (Vendor, Invoice #, Due Date, line items) and optional custom models tuned to your layout—much more useful downstream.
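The text-vs-semantics distinction above can be made concrete: raw OCR output is just lines of text, and turning it into named fields takes extra logic. The regex sketch below is a deliberately crude stand-in; Document Intelligence's prebuilt Invoice model does this (and tables, line items, layout variation) with trained models, not regexes.

```python
# Hedged sketch: raw OCR text -> structured invoice fields.
# The invoice text is invented for illustration.
import re

ocr_text = """ACME Corp
Invoice #: INV-1042
Due Date: 2025-08-01
Total: $1,250.00"""

def extract_invoice_fields(text):
    """Pull a couple of named fields out of flat OCR text."""
    fields = {}
    m = re.search(r"Invoice #:\s*(\S+)", text)
    if m:
        fields["invoice_number"] = m.group(1)
    m = re.search(r"Total:\s*\$([\d,.]+)", text)
    if m:
        fields["total"] = float(m.group(1).replace(",", ""))
    return fields

print(extract_invoice_fields(ocr_text))
```

The fragility is the point: a vendor whose invoices say "Inv. No." instead of "Invoice #" breaks the regex, whereas a trained custom model generalizes across layouts.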
Responsible AI — Fairness, Transparency, Privacy, Inclusiveness, Accountability
Microsoft implements Responsible AI via six principles: fairness, reliability & safety, privacy & security, inclusiveness, transparency, and accountability—with tools/governance across Azure services. AI‑900 expects you to map scenarios to these principles and suggest safeguards.
| Principle | What to remember | Exam‑style safeguard |
|---|---|---|
| Fairness | Comparable performance across groups; reduce biased outcomes. | Run cohort error analysis; rebalance data; monitor drift. |
| Transparency | Explain behavior/limits; documentation, transparency notes. | Provide explanation summaries and usage disclosures. |
| Privacy | Minimize, protect, and govern personal data (PII). | Use Language PII redaction; encrypt at rest/in transit; RBAC. |
| Inclusiveness | Design for diverse users/abilities; accessibility. | Captions, TTS, accent support; UX testing across abilities. |
| Accountability | Humans own outcomes; auditable; escalation paths. | Human‑in‑the‑loop for consequential decisions; audit logs. |
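The "cohort error analysis" safeguard in the fairness row can be sketched in a few lines: compute accuracy per group and look for gaps. Data and group labels below are invented for illustration.

```python
# Minimal cohort error analysis: per-group accuracy from labeled predictions.
from collections import defaultdict

def accuracy_by_group(records):
    """records: (group, predicted, actual) tuples -> {group: accuracy}."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, pred, actual in records:
        total[group] += 1
        correct[group] += int(pred == actual)
    return {g: correct[g] / total[g] for g in total}

records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 1, 0),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]
print(accuracy_by_group(records))  # a large gap here signals a fairness issue
```

Exam scenarios phrase this as "the model performs worse for one demographic": the remedy is to measure per cohort (as above), then rebalance data or retrain, and keep monitoring for drift.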
Service Chooser — Scenario → Workload → Azure Service
| Scenario | Workload | Service |
|---|---|---|
| Generate concise FAQs from a long policy and chat over it. | Generative + NLP | Azure OpenAI + RAG; optional Language (PII redaction). |
| Tag product images, detect objects, and caption them. | Vision | Azure Vision Image Analysis; try in Vision Studio first. |
| Extract vendor, totals, and line items from invoices. | Document Intelligence | Prebuilt Invoice or Custom model in Document Intelligence. |
| Real‑time multilingual captions for webinars. | Speech | Speech‑to‑Text + Translation (real‑time). |
| Detect sentiment and redact PII from customer reviews. | NLP | Language (Sentiment/Opinion + PII detection). |
New‑Learner Q&A (short, exam‑aligned)
Is AI‑900 about coding?
No. It’s about recognizing workloads, mapping to Azure services, and applying Responsible AI in scenario questions. Hands‑on via studios is encouraged but not required.
How do I try features without writing code?
Use Vision Studio, Language Studio, and Speech Studio to experiment with images, text, and audio—great for intuition and demos.
What’s the difference between extractive and abstractive summaries?
Extractive picks existing sentences; abstractive writes new sentences. Azure Language supports both.
When do I need approvals?
Sensitive face identification and custom neural voice scenarios require use‑case review under Microsoft’s Responsible AI processes.
References (Official Docs & Study)
- Official AI‑900 study guide (skills & weights): Microsoft Learn.
- Responsible AI concepts & principles: Responsible AI in Azure ML, Microsoft Principles & Approach.
- Vision: Vision Studio, Azure Vision, Vision learning path.
- Language: Language overview, Language learning path.
- Document Intelligence: Connector overview, Renaming & updates.
- Speech: Speech overview, STT modes.
- Generative AI (Azure OpenAI): Developer resources hub, Pricing & deployment models.
- Architecture patterns for AI & RAG: Azure Architecture Center.