AI‑900 Masterclass (Part 2, C3 Rewrite): Natural Language Processing (NLP) & Speech Intelligence
This part of the AI‑900 series focuses on how AI systems understand text and speech — two of the most important workload families in modern artificial intelligence.
1. What Is Natural Language Processing?
Natural Language Processing (NLP) is the field of AI that enables machines to analyze, understand, and generate human language. It allows computers to work with text not as raw strings of characters, but as meaningful expressions filled with concepts, emotions, relationships, and intentions.
NLP powers capabilities such as:
- Sentiment analysis
- Entity extraction
- PII detection
- Summarization
- Language detection
- Intent detection (Conversational AI)
- Key phrase extraction
- Document-level insights
2. How Do Machines Understand Text?
2.1 Step 1 — Tokenization
Before AI can interpret text, the input must be broken into pieces called tokens. Tokens are usually sub-word units (like "inter", "nal", "ization") or sometimes full words, punctuation, or special symbols.
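The idea can be sketched with a toy tokenizer. This is purely illustrative: production models use learned subword vocabularies (such as BPE or WordPiece), not a hand-written rule like this.

```python
import re

def tokenize(text: str) -> list[str]:
    # A toy tokenizer: splits on whitespace and keeps punctuation as
    # separate tokens. Real NLP models use learned subword vocabularies
    # (e.g. BPE or WordPiece); this only shows the general concept.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("NLP turns text into tokens, then into numbers."))
# → ['NLP', 'turns', 'text', 'into', 'tokens', ',', 'then', 'into', 'numbers', '.']
```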
2.2 Step 2 — Embeddings (Meaning as Numbers)
Each token is mapped to a vector of numbers — an embedding. Embeddings allow the model to understand:
- semantic similarity (doctor ↔ nurse)
- relationships (king → queen)
- contextual meaning (bank → riverbank vs. bank → finance)
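Semantic similarity between embeddings is typically measured with cosine similarity. The 3-dimensional vectors below are hand-made for illustration (real embeddings have hundreds or thousands of learned dimensions), but the comparison works the same way.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine of the angle between two vectors: values near 1.0 mean
    # the vectors point in nearly the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Hand-made "embeddings" for illustration only.
doctor = [0.9, 0.8, 0.1]
nurse  = [0.8, 0.9, 0.2]
banana = [0.1, 0.2, 0.9]

print(cosine_similarity(doctor, nurse))   # high: related concepts
print(cosine_similarity(doctor, banana))  # low: unrelated concepts
```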
2.3 Step 3 — Transformer-Based Understanding
Modern NLP relies heavily on transformer models. These models use self-attention to allow each token to “look at” every other token in the input to determine relevance, context, and meaning.
This allows systems like Azure Language to detect complex patterns such as:
- sentiment tied to specific topics
- entities embedded in long phrases
- summaries from multi-paragraph text
- customer intentions in chat logs
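Scaled dot-product attention, the core operation of self-attention, can be shown in miniature. The vectors below are hand-made 2-D examples; real transformers learn separate query, key, and value projections and operate on much larger dimensions.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(queries, keys, values):
    # Scaled dot-product attention: each token's query scores every
    # token's key; softmax turns scores into weights; the output is a
    # weighted average of the value vectors. This is how each token
    # "looks at" every other token.
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

# Three tokens with hand-made 2-D vectors (illustrative only).
x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
for row in self_attention(x, x, x):
    print([round(v, 3) for v in row])
```

Note how the output for each token mixes in information from the tokens most similar to it, which is exactly the "relevance weighting" the prose above describes.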
3. Azure Language — Unified NLP Platform
Azure Language provides a modern, unified set of NLP capabilities used throughout enterprise applications. The key advantage of Azure Language is that developers can perform advanced NLP tasks without building or training their own models.
3.1 Core Capabilities
- Named Entity Recognition (NER): Extracts people, locations, organizations, dates, products.
- PII Detection: Identifies and can redact sensitive information such as phone numbers and email addresses.
- Sentiment & Opinion Mining: Determines emotion and associates opinions with specific topics.
- Key Phrase Extraction: Identifies the most important topics in text.
- Summarization: Creates concise summaries using extractive or generative approaches.
- Language Detection: Identifies language and dialect.
- Conversational Language Understanding (CLU): Extracts intents and entities from chat messages.
- Question Answering: Answers questions directly using knowledge bases or documents.
4. Practical NLP Examples
4.1 Example: Sentiment Analysis
Input: “The delivery was late and the support team was unhelpful.”
Output:
- Overall Sentiment: Negative
- Aspects Detected:
  - “delivery” → negative
  - “support team” → negative
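A toy lexicon lookup can mimic the shape of this output. Azure Language uses trained models rather than word lists; this sketch only illustrates what "overall sentiment plus per-aspect sentiment" looks like as structured data.

```python
# Illustrative word lists (not how Azure Language actually works).
NEGATIVE_WORDS = {"late", "unhelpful", "broken", "slow"}
POSITIVE_WORDS = {"fast", "friendly", "great", "helpful"}

def aspect_sentiment(aspects_with_opinions: dict[str, str]) -> dict:
    # Classify each aspect by its opinion word, then roll the aspect
    # results up into an overall sentiment.
    results = {}
    for aspect, opinion in aspects_with_opinions.items():
        if opinion in NEGATIVE_WORDS:
            results[aspect] = "negative"
        elif opinion in POSITIVE_WORDS:
            results[aspect] = "positive"
        else:
            results[aspect] = "neutral"
    overall = ("negative" if "negative" in results.values() else
               "positive" if "positive" in results.values() else "neutral")
    return {"overall": overall, "aspects": results}

print(aspect_sentiment({"delivery": "late", "support team": "unhelpful"}))
# → overall negative, with both aspects negative
```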
4.2 Example: Entity Extraction
Input: “Meet Sarah at 5 PM at Contoso HQ on Friday.”
Output:
- Person: Sarah
- Time: 5 PM
- Date: Friday
- Location: Contoso HQ
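For intuition, a pattern-based extractor can reproduce this one example. Real NER uses trained models that generalize far beyond fixed patterns; these regexes only illustrate the entity categories involved.

```python
import re

# Toy patterns for the example sentence only (illustrative, not how
# a trained NER model works).
PATTERNS = {
    "Person":   r"\bMeet (\w+)",
    "Time":     r"\b(\d{1,2} ?(?:AM|PM))\b",
    "Date":     r"\b(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b",
    "Location": r"\bat ([A-Z]\w+(?: HQ)?)\b",
}

def extract_entities(text: str) -> dict[str, str]:
    found = {}
    for label, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            found[label] = match.group(1)
    return found

print(extract_entities("Meet Sarah at 5 PM at Contoso HQ on Friday."))
# → {'Person': 'Sarah', 'Time': '5 PM', 'Date': 'Friday', 'Location': 'Contoso HQ'}
```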
4.3 Example: Summarization
Azure Language can produce both short summaries and structured meeting summaries with sections, timestamps, topics, and action items.
5. Conversational Language Understanding (CLU)
CLU is used for virtual agents, chatbots, and conversational applications. It extracts:
- Intents — what the user wants
- Entities — key details needed to fulfill the task
Example:
User: “Book a flight from New York to Dallas next Monday.”
- Intent: BookFlight
- Entities:
  - origin = New York
  - destination = Dallas
  - date = next Monday
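The example above can be mirrored with a minimal keyword-and-regex matcher. A real CLU model is trained on labeled utterances and generalizes to phrasings it has never seen; this sketch only shows the intent-plus-entities output shape.

```python
import re

def understand(utterance: str) -> dict:
    # Toy intent detection: keyword match (a trained model would score
    # the utterance against every intent instead).
    intent = ("BookFlight"
              if re.search(r"\bbook\b.*\bflight\b", utterance, re.I)
              else "None")
    entities = {}
    m = re.search(r"from ([A-Z][\w ]*?) to ([A-Z][\w ]*?)(?= next| on|\.|$)",
                  utterance)
    if m:
        entities["origin"], entities["destination"] = m.group(1), m.group(2)
    d = re.search(r"\b(next \w+)\b", utterance)
    if d:
        entities["date"] = d.group(1)
    return {"intent": intent, "entities": entities}

print(understand("Book a flight from New York to Dallas next Monday."))
# → intent BookFlight with origin, destination, and date entities
```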
SECTION B — SPEECH AI
Speech workloads allow applications to process audio in natural ways: Speech-to-Text (STT), Text-to-Speech (TTS), translation, voice customization, and speaker recognition.
6. Speech-to-Text (STT)
STT converts spoken audio into written text using three conceptual stages:
- Feature extraction: the raw audio signal is converted into numeric acoustic features
- Acoustic model: maps those features to phonemes (the basic units of sound)
- Language model: converts phoneme sequences into the most probable words and sentences
Azure Speech supports:
- Real-time transcription
- Fast synchronous transcription
- Batch transcription for large audio files
7. Text-to-Speech (TTS)
TTS converts written text into natural-sounding speech. Azure Speech provides:
- Hundreds of neural voices
- Support for many languages and dialects
- Custom Neural Voice (with ethical safeguards)
- SSML for controlling pitch, rate, style, and emotion
Use Cases:
- Voice assistants
- Audiobook generation
- Accessibility tools
- Announcements and automated messaging
8. Speech Translation
Speech translation combines several steps:
- Speech → Text (source)
- Text → Text (translation)
- Optional: Text → Speech (target language)
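The pipeline is a straightforward composition of those three steps. The stage functions below are stand-in stubs (a real application would call a speech service such as Azure Speech for each stage); only the pipeline shape reflects the steps above.

```python
def speech_to_text(audio: bytes) -> str:
    # Stub: pretend this is the STT result for the input audio.
    return "hello, how are you?"

def translate_text(text: str, target: str) -> str:
    # Stub: a one-entry "dictionary" standing in for a translation model.
    fake_translations = {("hello, how are you?", "es"): "hola, ¿cómo estás?"}
    return fake_translations.get((text, target), text)

def text_to_speech(text: str) -> bytes:
    # Stub: pretend the encoded text is synthesized audio.
    return text.encode("utf-8")

def translate_speech(audio: bytes, target_language: str) -> bytes:
    source_text = speech_to_text(audio)                        # Speech → Text
    translated = translate_text(source_text, target_language)  # Text → Text
    return text_to_speech(translated)                          # Text → Speech

print(translate_speech(b"<audio>", "es").decode("utf-8"))
# → hola, ¿cómo estás?
```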
This enables real-time conversations between speakers of different languages.
9. Speaker Recognition
- Speaker Verification: “Is this the person they claim to be?”
- Speaker Identification: “Which known speaker is talking?”
Uses voice biometrics such as cadence, pitch, and vocal tract characteristics.
10. AI‑900 Workload Mapping (Memorize This!)
- Analyze text → Azure Language
- Understand intent → CLU
- Detect sentiment or extract entities → Azure Language
- Convert audio to text → Speech-to-Text
- Translate audio → Speech Translation
- Generate speech → Text-to-Speech
- Customize vocabulary → Custom Speech