Full Curriculum · vExpertAI Academy

M 01 · 12h

Python for Network Engineers

When the keyboard isn't enough

What you'll build

50-device config-backup tool with retry logic + Git integration
Multi-vendor CLI parser (Cisco IOS/NX-OS, Junos, Arista EOS) using only string methods
Streaming syslog histogram from a 100 MB file, without ever loading it all into memory
Resilient BGP-neighbor state extractor across 20 routers
Subnet calculator toolkit on top of Python's `ipaddress` library
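
As a rough illustration of the streaming syslog item above, the sketch below counts messages per severity while reading one line at a time. It assumes each line starts with an RFC 3164-style `<PRI>` field, where severity is the PRI value modulo 8. The function names and the `unparsed` bucket are hypothetical, not the course's actual code.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: every well-formed line begins with "<PRI>", e.g. "<189>".
PRI_RE = re.compile(r"^<(\d{1,3})>")

# Syslog severity names, indexed by severity code 0..7.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def severity_histogram(lines: Iterable[str]) -> Counter:
    """Count messages per severity, consuming the input one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        match = PRI_RE.match(line)
        if match is None:
            # Keep track of lines that do not fit the assumed format.
            counts["unparsed"] += 1
            continue
        pri = int(match.group(1))
        counts[SEVERITIES[pri % 8]] += 1
    return counts


def histogram_from_file(path: str) -> Counter:
    """Stream a log file through severity_histogram without reading it whole."""
    # Iterating over the file object yields lines lazily, so only the
    # counters (not the file contents) accumulate in memory.
    with open(path, "r", errors="replace") as handle:
        return severity_histogram(handle)
```

Because `severity_histogram` accepts any iterable of strings, the same function can be pointed at a file, a socket reader, or a short list in a unit test.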

Pain we solve

"I can't SSH into 50 boxes one at a time anymore." Bash loops break on the 14th device. This module replaces them with proper Python automation that handles vendor mix, timeouts, retries, and version control.

M 02 · 9h

Data Wrangling & Exploration

When the counter is the truth

What you'll build

BGP flap detector — 30 days of state-change logs, 1-hour rolling windows, ranked offenders
NetFlow analysis: top-N talkers, per-protocol traffic matrix, flow-count histograms
Cleaning pipeline that handles 5% missing, 3% duplicate, 2% counter-rollover rows
Multi-panel latency report — per-region, with incident annotations, pod-to-pod heatmap
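
The counter-rollover rows in the cleaning pipeline item come from fixed-width counters wrapping back to zero. A minimal sketch of the usual correction, assuming a 32-bit counter and at most one wrap between two polls (the function name is illustrative):

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit counter wraps after 4294967295


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Return the increase between two samples of a wrapping counter.

    Assumes the counter wrapped at most once between the two samples.
    """
    # Python's % always returns a non-negative result for a positive modulus,
    # so a wrapped reading (current < previous) is corrected automatically.
    return (current - previous) % modulus
```

For example, `counter_delta(4294967000, 500)` gives 796 rather than a large negative number. This arithmetic alone cannot tell a wrap from a device reboot that reset the counter, so a real pipeline would likely cross-check uptime or flag implausibly large deltas.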

Pain we solve

"My 32-bit interface counter rolled over and the dashboard lied." Real operational data is dirty, has gaps, and contains rare-but-real edge cases. This module teaches you to model it correctly and tell the truth.

M 03 · 11h

ML Foundations (Classical)

When the model has to defend itself

What you'll build

Logistic regression trained by hand — log-loss computed from scratch, matched to sklearn within 0.001
XGBoost severity classifier on 50K incidents with isotonic calibration
HDBSCAN clustering on 200K cleaned NetFlow rows — identifying DDoS-style outliers
SHAP + LIME + PDP explainers on 3 P1 incidents — defensible feature attribution
Causal analysis: Did WAN optimization actually reduce latency? Difference-in-differences on 200 sites × 6 months
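
For the hand-trained logistic regression item, binary log-loss is the mean of -(y·ln p + (1 - y)·ln(1 - p)) over all samples. The sketch below is one way to compute it in plain Python. Clipping probabilities by `eps` is an assumption added here to avoid taking the log of zero, and the function name is illustrative.

```python
import math
from typing import Sequence


def binary_log_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy for labels in {0, 1} and predicted P(y = 1)."""
    if len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip so that log(0) never occurs for overconfident predictions.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

A coin-flip prediction of 0.5 on every sample yields ln 2 ≈ 0.693, which is a handy sanity check before comparing against a library implementation.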

Pain we solve

"The model says P1 with 95% confidence and it's wrong half the time." This module teaches calibration, interpretability, and causal reasoning — so the model defends its own outputs to the change-management board.

M 04 · 14h

Deep Learning & NLP/LLMs

When the model needs the runbook

What you'll build

3-layer PyTorch classifier on 200K flows — beats your XGBoost baseline
Graph Attention Network for link-failure prediction on a 500-node ISP topology
SetFit fine-tune on 200 hand-labeled incident summaries — better than zero-shot Llama-3
NetOps RAG assistant over 500 runbooks + 6 months of postmortems — MRR + LLM-as-judge eval
GraphRAG for multi-hop root-cause analysis — beats vector RAG on 10 hand-crafted RCA queries
Federated learning across 5 simulated regions — non-IID data
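
The MRR metric named in the RAG assistant item is the mean reciprocal rank: for each query, take 1/rank of the first relevant document retrieved, then average over queries. A minimal sketch, assuming exactly one relevant runbook per query and scoring a miss as 0 (the document IDs and function name are made up for illustration):

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    ranked_results: Sequence[Sequence[Hashable]],
    relevant: Sequence[Hashable],
) -> float:
    """MRR over queries, with one relevant document ID per query."""
    if len(ranked_results) != len(relevant):
        raise ValueError("need one relevant ID per ranked result list")
    if not relevant:
        raise ValueError("at least one query is required")

    total = 0.0
    for results, target in zip(ranked_results, relevant):
        for rank, doc_id in enumerate(results, start=1):
            if doc_id == target:
                total += 1.0 / rank
                break  # only the first hit counts
        # If the target never appears, this query contributes 0.
    return total / len(relevant)
```

With three queries whose relevant runbook appears at rank 1, rank 2, and not at all, the score is (1 + 0.5 + 0) / 3 = 0.5.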

Pain we solve

"The LLM doesn't know my topology and won't admit it." This module turns generic LLMs into ones that understand your runbooks, your topology, and your incident history.

M 05 · 16h

Agentic AI & Frameworks

When the LLM has to act

What you'll build

3-agent NetOps team in CrewAI — Analyzer + Planner + Executor — solving 10 incident scenarios
ReAct troubleshooting agent from scratch — no framework — then ported to CrewAI and LangGraph
Multi-agent guardrail layer: dry-run, confirmation token, HITL on state-changing actions
MCP integration: 3-agent crew calls tools through a Model Context Protocol server
A2A (Agent-to-Agent) protocol for cross-framework agent communication
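
The guardrail item above combines three ideas: a dry run, a confirmation token, and a human approver. One possible shape for that pattern is sketched below without any agent framework. The class, method names, and the example command string are hypothetical. A real deployment would also need authentication, token expiry, and audit logging.

```python
import secrets
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class GuardedExecutor:
    """Holds state-changing commands until a human confirms them with a token."""

    apply_change: Callable[[str], str]  # the only code path that touches a device
    _pending: Dict[str, str] = field(default_factory=dict)

    def propose(self, command: str) -> str:
        """Dry run: record the command and return a one-time token. Nothing executes."""
        token = secrets.token_hex(8)
        self._pending[token] = command
        return token

    def confirm(self, token: str, approved_by: str) -> str:
        """Execute a previously proposed command once a named human approves it."""
        if not approved_by.strip():
            raise PermissionError("a named human approver is required")
        # pop() makes the token single-use: a replayed token finds nothing.
        command = self._pending.pop(token, None)
        if command is None:
            raise PermissionError("unknown or already used confirmation token")
        return self.apply_change(command)
```

An agent would only ever be handed `propose`, while `confirm` sits behind whatever approval interface the operators use, so no state-changing command runs on the agent's say-so alone.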

Pain we solve

"I want an agent to fix BGP without rebooting the wrong router." This module teaches multi-agent architecture with explicit safety patterns — no autonomous production actions without human-in-the-loop gates.

M 06 · 10h

Production Deployment & MLOps

When the demo meets 100 operators

What you'll build

FastAPI gateway in front of the M 05 agent crew: auth, rate limiting, tiered routing
Redis prompt cache + PostgresSaver for conversation state across restarts
100-concurrent-operator load test — measure p50/p95/p99 + cost per query
Drift detection on a 1000-query stream — catch shifts within 50 requests
Production-grade evaluation: LLM-as-judge + golden context dataset + offline benchmarks
Observability: Opik traces from gateway → agent → MCP → backend
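
The p50/p95/p99 and cost-per-query figures from the load-test item reduce to simple arithmetic once latencies are collected. The sketch below uses the nearest-rank percentile definition, which is one of several conventions and may differ slightly from what a given load-testing tool reports. The function names and the flat total-cost input are assumptions.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def load_test_summary(latencies_ms: Sequence[float], total_cost_usd: float) -> Dict[str, float]:
    """Summarize a load test as tail latencies plus average cost per query."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "cost_per_query": total_cost_usd / len(latencies_ms),
    }
```

Reporting p95 and p99 alongside the median matters because a small fraction of slow agent runs can dominate operator experience while barely moving the average.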

Pain we solve

"The notebook worked. The 100-user load test didn't." This module is the bridge from "I built it" to "I run it" — cost discipline, observability, eval gates, and incident response built in.
