Start with rows, columns, events, labels, units, and missing values using everyday examples. You build the mental model needed to recognize what a dataset can and cannot say.
Work with simple tables in a spreadsheet: sorting, filtering, formulas, pivots, charts, and common cleanup mistakes. This gives you a low-friction way to practice data thinking before coding.
Trace how data science grew from statistics, databases, business intelligence, scientific computing, the web, and machine learning. The chapter shows why today’s tools, job roles, and expectations look the way they do.
Handle ratios, percentages, logs, growth rates, linear relationships, and orders of magnitude. These ideas show up constantly in metrics, models, and business decisions.
Use mean, median, spread, percentiles, distributions, and outliers to describe data honestly. You practice choosing summaries that match the shape of the data.
Reason about chance, randomness, conditional probability, independence, and expected value. These concepts prepare you for uncertainty, testing, and model evaluation.
Set up Python, Jupyter notebooks, files, packages, and reproducible working folders. You write small scripts that load data, calculate results, and save outputs.
Use NumPy and pandas to select, filter, join, reshape, group, and summarize tables. By the end, you can replace many manual spreadsheet steps with repeatable code.
Read data from CSV, Excel, JSON, APIs, databases, and messy text files. You check encodings, data types, headers, duplicates, and import errors before analysis begins.
Clean names, dates, categories, numbers, missing values, duplicated rows, and inconsistent records. The focus is on creating clear, documented choices instead of silently changing the evidence.
Use SQL to select, filter, group, join, and aggregate data from relational databases. You practice writing queries that answer real questions without moving all data into Python first.
Model entities, keys, relationships, indexes, transactions, and normal forms well enough to read and design practical schemas. This helps you avoid wrong joins and broken metrics.
Create line charts, bar charts, histograms, scatterplots, heatmaps, and small multiples that match the question being asked. You also spot misleading scales, clutter, and chart choices that distort meaning.
Use exploratory data analysis to inspect shape, quality, relationships, anomalies, and surprising patterns. You build a repeatable checklist before making claims or training models.
Turn vague requests into measurable questions, target variables, success criteria, and decision points. This chapter teaches how to keep analysis connected to the action someone will take.
Use sampling, confidence intervals, standard errors, bias, and margin of error to reason from part of a population to the whole. You learn when a dataset is large but still not representative.
Use hypothesis tests, p-values, statistical power, effect sizes, and multiple-testing safeguards. The emphasis is on what tests can support, what they cannot prove, and how to report results responsibly.
Design A/B tests, choose metrics, randomize users, monitor experiments, and read results without overreacting to noise. You handle practical issues like sample size, guardrail metrics, and peeking.
Build simple regression models, read coefficients, check assumptions, and diagnose residuals. Regression becomes a practical tool for prediction, explanation, and adjustment.
Create useful predictors from dates, text, categories, locations, counts, and domain rules. You also prevent data leakage by making sure features would be available at prediction time.
Train, validate, and test models using splits, cross-validation, baselines, and error analysis. You practice judging whether a model is genuinely better than a simple rule.
Use logistic regression, decision trees, random forests, gradient boosting, and common metrics such as precision, recall, ROC-AUC, and calibration. You learn how classification choices affect real people and operations.
Predict numbers with linear models, tree ensembles, regularization, and error metrics such as MAE, RMSE, and MAPE. You connect model error to practical costs and tolerances.
Use clustering, dimensionality reduction, anomaly detection, and association patterns when labels are not available. The chapter shows how to validate results that do not have a simple right answer.
Handle trends, seasonality, lag features, forecasting horizons, backtesting, and forecast error. You build forecasts that respect time order instead of treating history like shuffled rows.
Work with latitude, longitude, distance, regions, maps, spatial joins, and geographic bias. You use location data for service areas, movement, risk, and regional comparisons.
Prepare text, tokenize, count terms, classify documents, find topics, and measure similarity. This gives you a bridge from messy language to structured analysis.
Build recommendation systems using popularity, collaborative filtering, content features, ranking metrics, and feedback loops. You see how personalization changes both user experience and data collection.
Separate correlation from causation with confounding, directed graphs, matching, regression adjustment, difference-in-differences, and instrumental variables. You practice deciding when data can support a cause-and-effect claim.
Connect data science work to business, public service, health, education, science, and operations metrics. You design metric definitions that resist gaming and match the real goal.
Write clear notebooks, reports, dashboards, and executive summaries that show evidence, uncertainty, and recommended action. You practice explaining methods without hiding important caveats.
Use Git, project folders, environments, code review, and reproducible notebooks to make work shareable. These habits reduce confusion when projects last longer than a single session.
Build pipelines that ingest, validate, transform, and store data on a schedule. You work with batch jobs, orchestration, lineage, and failure handling so analysis can run reliably.
Use data warehouses, columnar storage, partitions, ELT, semantic layers, and analytics engineering practices. This chapter shows how modern teams create trusted tables for many analysts and products.
Work with large data using distributed files, Spark-style processing, cloud storage, and lakehouse table formats. You learn when big-data tools are worth the added complexity.
Use cloud notebooks, managed databases, object storage, serverless jobs, and permissions in a practical data science setup. The focus is on cost, security, collaboration, and avoiding fragile local-only workflows.
Train neural networks with embeddings, backpropagation, optimization, regularization, and GPUs at a practical level. You learn where deep learning outperforms classical methods and where it is unnecessary.
Use transformers, pretrained models, embeddings, and fine-tuning for language, images, and tabular-adjacent tasks. The chapter explains how foundation models changed what data scientists can build from limited labeled data.
Use language models to draft code, inspect data, summarize documents, generate features, and speed up analysis while checking every result. You build habits for verification, privacy, and prompt design.
Build retrieval-augmented systems with embeddings, vector search, chunking, ranking, evaluation, and source citation. This supports question-answering over private documents without blindly trusting a model’s memory.
Package models, serve predictions, schedule batch scoring, track experiments, version data, and manage feature consistency. You connect a trained model to the system where it creates value.
Monitor data drift, model performance, latency, cost, fairness, and incidents after launch. You set alerts and retraining plans so models remain useful as the world changes.
Handle privacy, consent, anonymization, access control, retention, bias, fairness, explainability, and audit trails. You practice decisions that protect people while still allowing useful analysis.
Apply data quality checks, data contracts, cataloging, ownership, and observability to prevent broken dashboards and bad models. This chapter treats trustworthy data as an operational practice, not a one-time cleanup.
Follow one realistic project from stakeholder request through data access, cleaning, analysis, modeling, validation, communication, deployment, and monitoring. You make the tradeoffs a working data scientist faces across the full lifecycle.
Map common roles such as data analyst, data scientist, machine learning engineer, analytics engineer, and data engineer. You plan portfolio projects, interview preparation, certifications when useful, communities, and habits for staying current.