Learn Data Science | Zoonk

Data Science

Data science turns raw data into clear insights that support better decisions. This course covers finding patterns, building predictions, and communicating results in ways that matter to businesses, nonprofits, and research teams. It opens paths to roles in analytics, data science, and business intelligence across tech companies, consulting firms, and many other organizations.

How data science turns data into decisions

Data science turns messy facts—clicks, measurements, prices, photos, survey answers, sensor readings—into useful evidence. This chapter shows how data scientists ask better questions, find patterns, test guesses, and build tools that help people make decisions in business, health, sports, science, government, and everyday life.

Data, variables, and datasets

Defines observations, variables, records, tables, labels, features, targets, and datasets. Learners practice reading a small dataset and describing what each column and row means.

Measurement, uncertainty, and data quality

Covers measurement error, uncertainty, bias, validity, reliability, missing values, and data quality checks. Learners build the habit of asking where data came from and what it can safely support.

Spreadsheets, CSV files, and tidy tables

Shows how spreadsheets, CSV files, delimiters, encodings, and tidy table structure work. Learners clean a small spreadsheet and turn it into an analysis-ready table.
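
As a rough illustration of turning a spreadsheet export into a tidy table, the sketch below reads a small semicolon-delimited CSV and reshapes it so that each row holds one observation. The cities, years, and the `visits` column name are invented for this example and are not taken from the course.

```python
import csv
import io

# Hypothetical "wide" export: one column per year, semicolon as delimiter.
raw = "city;2022;2023\nLisbon;10;12\nPorto;7;9\n"

reader = csv.DictReader(io.StringIO(raw), delimiter=";")

# Tidy form: one row per (city, year) pair, with a single value column.
tidy = []
for row in reader:
    for year in ("2022", "2023"):
        tidy.append(
            {"city": row["city"], "year": int(year), "visits": int(row[year])}
        )

for record in tidy:
    print(record)
```

The wide layout is easier to read in a spreadsheet, while the tidy layout is usually easier to filter, group, and plot.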

Computational thinking for data work

Builds the mental model behind variables, conditions, loops, functions, and reproducible steps. Learners write simple procedures for counting, filtering, and transforming data.
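
A minimal sketch of what such procedures might look like, using plain Python with no libraries. The `orders` records, field names, and helper names are hypothetical and exist only for illustration.

```python
# Hypothetical records; the field names and values are made up.
orders = [
    {"customer": "ana", "amount_usd": 12.50},
    {"customer": "ben", "amount_usd": 40.00},
    {"customer": "ana", "amount_usd": 7.25},
]

def count_by(rows, key):
    """Count how many rows share each value of `key`."""
    counts = {}
    for row in rows:
        counts[row[key]] = counts.get(row[key], 0) + 1
    return counts

def filter_rows(rows, predicate):
    """Keep only the rows for which `predicate` returns True."""
    return [row for row in rows if predicate(row)]

def add_cents(rows):
    """Return new rows with an extra `amount_cents` field (inputs are not modified)."""
    return [{**row, "amount_cents": round(row["amount_usd"] * 100)} for row in rows]

counts = count_by(orders, "customer")
large = filter_rows(orders, lambda row: row["amount_usd"] >= 10)
with_cents = add_cents(orders)
print(counts, len(large), with_cents[0])
```

Each step is a small function with explicit inputs and outputs, so the same procedure can be rerun on new data and produce the same kind of result.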

Python programming for data science

Covers Python syntax, data types, functions, errors, files, and simple scripts for data tasks. Learners write small programs that load data, process rows, and produce results.

The Python data stack

Uses NumPy, pandas, and common plotting libraries for real data analysis. Learners filter, group, join, summarize, and visualize datasets in Python.

Notebooks, environments, and the command line

Covers Jupyter notebooks, virtual environments, package installation, file paths, and basic command-line use. Learners set up a working data science project folder that can run again later.

Version control with Git

Shows how Git tracks changes, supports collaboration, and protects project history. Learners commit notebooks, scripts, data notes, and reports in a clean repository.

History and evolution of data science

Traces data science from statistics, databases, business intelligence, scientific computing, and machine learning into today’s cloud and AI toolchains. The chapter explains why the field combines evidence, computation, products, and decisions.

Analytical questions and problem framing

Covers how to turn a vague request into a clear analytical question, success measure, scope, and deliverable. Learners write project briefs that separate what can be answered from what needs more data.

Metrics, domain knowledge, and decision context

Shows how metrics connect data work to real decisions, incentives, tradeoffs, and domain knowledge. Learners design useful metrics and spot cases where a metric can mislead people.

Data ethics and consent

Covers consent, harm, representation, surveillance, sensitive attributes, and respectful data use. Learners review a data project for ethical risks before any modeling begins.

Descriptive statistics

Covers center, spread, ranks, percentiles, correlation, and summary tables. Learners describe a dataset accurately without jumping to unsupported conclusions.

Probability

Builds probability with events, conditional probability, independence, Bayes’ rule, and expected value. Learners use probability to reason about uncertainty in everyday and data science settings.
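
To make Bayes' rule concrete, here is a small sketch for a screening-test scenario. The prevalence, sensitivity, and false positive rate are assumed numbers chosen for illustration, and the function name is hypothetical.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes' rule."""
    # Total probability of a positive test: true positives plus false positives.
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

# Assumed inputs: 1% prevalence, 95% sensitivity, 5% false positive rate.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")
```

Under these assumptions the posterior is about 0.16, far below the test's 95% sensitivity, because the condition is rare and false positives outnumber true positives.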

Sampling and study design

Covers populations, samples, sampling bias, random assignment, surveys, and observational studies. Learners judge whether a dataset can represent the group it claims to describe.

Probability distributions

Covers common distributions, random variables, variance, standard error, and simulation. Learners use distributions to model real variation and check whether results are surprising.

Statistical inference

Covers estimators, confidence intervals, standard errors, bootstrap methods, and the logic of inference. Learners quantify uncertainty instead of reporting single numbers as if they were exact.
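
One way to report a range instead of a single number is a percentile bootstrap interval for the mean. The sketch below uses only the standard library; the measurements, the resample count, and the function name `bootstrap_mean_ci` are assumptions made for illustration.

```python
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    tail = (1 - confidence) / 2
    lower = means[int(tail * n_resamples)]
    upper = means[int((1 - tail) * n_resamples) - 1]
    return lower, upper

# Hypothetical measurements.
data = [4.1, 5.0, 5.6, 3.9, 6.2, 4.8, 5.3, 4.4]
sample_mean = statistics.fmean(data)
lower, upper = bootstrap_mean_ci(data)
print(f"mean = {sample_mean:.2f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

With only eight observations the interval is wide, which is the point: the single mean hides how much the estimate could move with a different sample.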

Hypothesis testing

Covers null hypotheses, p-values, statistical power, multiple comparisons, and practical significance. Learners run tests responsibly and explain what the results do and do not prove.

Bayesian data analysis

Covers priors, likelihoods, posteriors, credible intervals, and Bayesian updating. Learners solve small uncertainty problems where prior knowledge and new evidence must be combined.

Linear algebra for data science

Covers vectors, matrices, dot products, projections, matrix factorization, and high-dimensional data. Learners connect these ideas to regression, embeddings, recommender systems, and deep learning.

Calculus and optimization for data science

Covers rates of change, gradients, loss functions, and numerical optimization. Learners see how models improve by minimizing error rather than by magic.
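
The idea of improving a model by minimizing error can be sketched with plain gradient descent on a mean squared error loss. The data points, learning rate, and step count below are assumptions chosen so the example converges; they are not course material.

```python
# Hypothetical data generated from y = 2x + 1, with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def mse(w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, b):
    """Partial derivatives of the MSE with respect to w and b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

w, b = 0.0, 0.0
initial_loss = mse(w, b)
learning_rate = 0.05  # assumed step size, small enough to be stable here
for _ in range(5000):
    dw, db = gradient(w, b)
    # Step against the gradient to reduce the loss.
    w -= learning_rate * dw
    b -= learning_rate * db
final_loss = mse(w, b)
print(f"w = {w:.4f}, b = {b:.4f}, loss {initial_loss:.2f} -> {final_loss:.2e}")
```

Each update moves the parameters a small distance in the direction that lowers the loss, so the fitted line approaches the slope and intercept used to generate the data.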

Experimental design

Covers control groups, randomization, blocking, confounding, power planning, and measurement choices. Learners design experiments that can support stronger claims about cause and effect.

A/B testing and online experiments

Covers online experiments, treatment assignment, guardrail metrics, sample size, peeking risks, and rollout decisions. Learners plan and analyze an A/B test for a product or service change.

Causal inference

Covers causal graphs, confounders, interventions, matching, difference-in-differences, instrumental variables, and causal assumptions. Learners separate prediction questions from cause-and-effect questions.

Data acquisition and APIs

Covers data collection from files, databases, forms, sensors, public datasets, and APIs. Learners request, document, and store data while keeping provenance clear.

Web scraping and responsible collection

Covers HTML structure, scraping tools, rate limits, robots.txt, terms of service, and respectful collection practices. Learners collect simple web data without treating the public internet as a free-for-all.

SQL for data science

Covers SELECT queries, filtering, sorting, grouping, joins, subqueries, window functions, and query debugging. Learners answer real analytical questions directly from relational data.
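
As a small illustration of a join combined with grouping and sorting, the sketch below runs a query against an in-memory SQLite database through Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and rows, created only for this example.
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 2, 35.0), (3, 3, 15.0), (4, 1, 5.0);
    """
)

# Question: how many orders and how much revenue came from each region?
rows = conn.execute(
    """
    SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.region
    ORDER BY revenue DESC
    """
).fetchall()
conn.close()

for region, n_orders, revenue in rows:
    print(region, n_orders, revenue)
```

The join attaches each order to its customer's region, and the grouping then collapses the orders into one summary row per region.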

Relational database design for analytics

Covers primary keys, foreign keys, normalization, fact tables, dimension tables, and star schemas. Learners design database structures that make analysis reliable and efficient.

Nonrelational data stores

Covers document stores, key-value stores, graph databases, and when nonrelational storage fits data science work. Learners choose storage patterns based on access needs rather than trends.

Data cleaning and normalization

Covers type conversion, duplicate records, inconsistent categories, units, dates, text cleanup, and reproducible cleaning scripts. Learners turn messy raw data into a trustworthy working dataset.

Missing data and imputation

Covers missingness patterns, deletion, simple imputation, model-based imputation, and the risks of hiding missing data. Learners decide how to handle missing values without inventing false certainty.

Outliers and robust analysis

Covers outlier detection, heavy-tailed data, robust summaries, winsorization, and sensitivity checks. Learners decide when an unusual value is an error, a rare event, or the most important part of the story.

Exploratory data analysis

Covers the practical cycle of summarizing, visualizing, slicing, questioning, and checking assumptions. Learners produce an exploratory notebook that records both findings and doubts.

Data visualization

Covers visual encodings, chart choice, scales, color, uncertainty, and accessibility. Learners create charts that reveal patterns without distorting the data.

Dashboarding and business intelligence

Covers dashboards, filters, drilldowns, refresh schedules, stakeholder needs, and common business intelligence patterns. Learners build a dashboard that supports repeated decisions rather than one-time curiosity.

Data storytelling and reporting

Covers narrative structure, evidence, caveats, executive summaries, and audience-specific explanations. Learners turn analysis into a report or presentation that helps people act responsibly.

End-to-end analytical workflow

Follows a complete analysis from question framing through data acquisition, cleaning, exploration, statistics, visualization, and final recommendation. Learners practice the full rhythm of a professional data analysis project.

Supervised learning

Covers the idea of learning from labeled examples, training data, test data, features, targets, loss, and generalization. Learners train their first predictive models and compare them to simple baselines.

Regression modeling

Covers linear regression, regularization, diagnostics, residuals, interactions, and prediction intervals. Learners build regression models for numerical outcomes and explain their limits.

Classification modeling

Covers logistic regression, decision trees, random forests, gradient boosting, class imbalance, and probability calibration. Learners build models that predict categories while respecting real-world error costs.

Model evaluation and validation

Covers train-test splits, cross-validation, leakage, confusion matrices, ROC curves, precision, recall, and model selection. Learners judge models by the decision they must support, not just by a single score.
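
To show how a confusion matrix leads to precision and recall, the sketch below counts the four outcome types by hand for a binary classifier. The label lists are invented, and the function name is hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
accuracy = (tp + tn) / len(y_true)
print(tp, fp, fn, tn, precision, recall, accuracy)
```

Which of these numbers matters most depends on the decision: a false positive and a false negative rarely cost the same, so a single score such as accuracy can hide the error that hurts more.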

Feature engineering

Covers encoding categories, scaling, transformations, date features, text features, aggregations, and feature leakage. Learners create features that improve models while staying faithful to what would be known at prediction time.

Model interpretability and explainability

Covers permutation importance, partial dependence, SHAP-style explanations, surrogate models, and explanation limits. Learners communicate why a model made predictions and where those explanations may fail.

Unsupervised learning foundations

Covers pattern discovery without labels, distance measures, similarity, representation, and evaluation challenges. Learners use unsupervised methods carefully when there is no clear target variable.

Clustering

Covers k-means, hierarchical clustering, density-based clustering, cluster validation, and cluster interpretation. Learners group customers, documents, or observations while avoiding false stories about arbitrary clusters.

Dimensionality reduction and embeddings

Covers principal component analysis, manifold methods, embeddings, visualization of high-dimensional data, and information loss. Learners compress complex data while keeping the purpose of the compression clear.

Anomaly detection

Covers statistical rules, isolation forests, reconstruction error, rare-event evaluation, and alert fatigue. Learners build anomaly detection systems for fraud, quality control, operations, or security use cases.

Time series analysis and forecasting

Covers timestamps, seasonality, trend, autocorrelation, lag features, backtesting, and forecasting horizons. Learners forecast demand, traffic, revenue, or sensor readings with honest uncertainty.

Recommender systems

Covers collaborative filtering, matrix factorization, ranking metrics, cold-start problems, and feedback loops. Learners build recommendation models and evaluate whether they improve user experience or only chase clicks.

Graph data science

Covers nodes, edges, centrality, communities, graph features, and graph-based prediction. Learners analyze networks such as transactions, social links, citations, or supply chains.

Geospatial data science

Covers coordinates, projections, spatial joins, spatial aggregation, maps, and geographic bias. Learners analyze location data without producing misleading maps or incorrect distance calculations.

Natural language processing

Covers tokenization, text cleaning, bag-of-words, TF-IDF, topic models, sentiment analysis, and text classification. Learners build useful text models before moving into modern neural language systems.

Deep learning

Covers neural networks, backpropagation, activation functions, regularization, optimizers, and training curves. Learners train small neural models and diagnose underfitting, overfitting, and unstable training.

Computer vision

Covers image data, convolutional networks, image classification, object detection ideas, augmentation, and evaluation. Learners build computer vision models while noticing dataset bias and labeling quality.

Reinforcement learning

Covers agents, environments, rewards, policies, value functions, exploration, and offline evaluation risks. Learners see when reinforcement learning fits decision problems and when simpler methods are safer.

GPU computing for data science

Covers GPU hardware concepts, batching, memory limits, acceleration libraries, and cost-aware training. Learners decide when GPUs are useful and how to avoid wasting compute.

Transformers

Covers attention, transformer architecture, embeddings, pretraining, transfer learning, and transformer-based prediction tasks. Learners connect modern language, vision, and multimodal systems to their statistical roots.

Large language models and foundation models

Covers foundation models, large language models, tokens, context windows, instruction tuning, hallucination, and evaluation. Learners use LLMs as powerful but fallible tools for data and text work.

Prompt engineering and AI-assisted data work

Covers prompt structure, tool use, code generation, notebook assistance, synthetic examples, and verification habits. Learners use AI assistants to speed up work without outsourcing judgment.

Retrieval-augmented generation

Covers document chunking, embeddings, vector search, retrieval quality, grounding, citations, and failure modes. Learners build a retrieval-augmented generation system that answers from a controlled knowledge base.

Fine-tuning and model adaptation

Covers supervised fine-tuning, parameter-efficient methods, instruction data, evaluation sets, and cost tradeoffs. Learners decide when adapting a model is better than prompting or retrieval alone.

Diffusion models and synthetic data

Covers diffusion model ideas, image generation, tabular synthetic data, data augmentation, and privacy risks. Learners judge when generated data helps and when it creates new bias or leakage.

Software engineering for data science

Covers modular code, functions, packages, logging, configuration, code review, and maintainable project structure. Learners move from one-off notebooks toward reusable data science software.

Data validation and testing

Covers unit tests, data tests, schema checks, expectation suites, and regression tests for analysis results. Learners catch broken assumptions before they reach dashboards, models, or decision makers.

Containers and reproducible workflows

Covers containers, images, Dockerfiles, dependency pinning, reproducible builds, and portable execution. Learners package a data project so it can run on another machine or in the cloud.

Kubernetes for data and ML workloads

Covers pods, jobs, services, resource requests, scaling, and when Kubernetes is worth the complexity. Learners see how data and machine learning workloads run in production clusters.

Data pipelines and orchestration

Covers scheduled jobs, task dependencies, retries, backfills, orchestration tools, and operational handoffs. Learners build pipelines that run reliably after the original notebook is closed.

Distributed data processing

Covers partitioning, parallel processing, Spark-style dataframes, shuffle costs, and cluster execution. Learners process datasets that are too large or too slow for a single machine workflow.

Cloud data warehouses

Covers managed analytical databases, columnar storage, query optimization, workload management, and cost control. Learners use cloud warehouses for scalable SQL analytics without treating them like ordinary spreadsheets.

Data lakes

Covers object storage, raw zones, curated zones, open file formats, and governance problems in large shared data stores. Learners organize data lakes so they remain useful instead of becoming dumping grounds.

Lakehouse architecture

Covers lakehouse tables, ACID transactions on object storage, schema evolution, time travel, and mixed SQL and machine learning workloads. Learners see why lakehouse architecture became a major pattern for modern data platforms.

Streaming data and real-time analytics

Covers events, message queues, stream processing, windows, late data, and real-time metrics. Learners design analytics that respond to data as it arrives instead of waiting for batch reports.

Analytics engineering and semantic layers

Covers transformation projects, tested SQL models, metric definitions, semantic layers, and collaboration between analysts and engineers. Learners build trusted business datasets that many teams can reuse.

Metadata, lineage, and data catalogs

Covers metadata, lineage, ownership, data catalogs, discovery, and impact analysis. Learners trace where data came from, who owns it, and what will break if it changes.

Model deployment and serving

Covers batch scoring, real-time APIs, model packaging, latency, throughput, fallback behavior, and rollback plans. Learners turn trained models into services or jobs that other systems can use.

MLOps and experiment tracking

Covers experiment tracking, model registries, reproducible training runs, approval stages, and release management. Learners manage machine learning work as a controlled lifecycle rather than a pile of files.

Model monitoring and observability

Covers data drift, concept drift, performance tracking, alerting, logging, observability, and incident response. Learners keep models useful after deployment and know when to retrain, pause, or retire them.

Data security for analytics

Covers access control, secrets, encryption, secure sharing, dependency risk, and attacks on data workflows. Learners protect analytical systems from leaks, tampering, and unsafe shortcuts.

Data privacy and compliance

Covers personal data, de-identification, retention, consent records, cross-border transfer, GDPR-style rights, HIPAA-style duties, and sector rules. Learners plan data work that respects legal and organizational obligations.

Privacy-preserving data science

Covers aggregation, masking, differential privacy, federated learning, secure enclaves, and privacy-utility tradeoffs. Learners choose privacy-preserving methods when ordinary access controls are not enough.

Fairness and bias testing

Covers measurement bias, representation gaps, subgroup performance, fairness metrics, proxy variables, and mitigation tradeoffs. Learners test whether data products work differently for different groups.

Responsible AI governance

Covers model cards, data sheets, approval processes, audit trails, human oversight, risk tiers, and AI policy expectations. Learners document and govern data science systems before they affect people at scale.

End-to-end data product workflow

Follows a data product from request intake through data contracts, pipeline design, modeling, validation, deployment, monitoring, user feedback, and iteration. Learners practice the full workflow used by teams that operate data products over time.

Capstone portfolio project

Guides learners through a complete portfolio project with a clear question, documented data, tested code, analysis, model or dashboard, written findings, and deployment or reproducible handoff. The final artifact shows practical skill rather than only course completion.

Data science careers and field navigation

Covers roles such as analyst, data scientist, machine learning engineer, analytics engineer, data engineer, and applied AI practitioner. Learners plan entry paths, portfolio proof, interview preparation, certifications when useful, communities, journals, conferences, and habits for staying current.