Turn an everyday thing, like a grocery trip or a weather reading, into recorded values. Notice what gets captured, what gets simplified, and what disappears when reality becomes data.
Use what you learned in the previous lesson to solve real-world problems.
Read a simple table as a set of claims: each row is an observation, each column is a variable, and each cell is one recorded value. Use the tidy-data pattern of one observation per row, one variable per column, and one value per cell.
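As a rough illustration of the tidy-data pattern, the sketch below reshapes a small "wide" table into one row per observation. The cities, column names, and temperatures are made up for the example, and `to_tidy` is a hypothetical helper, not part of any library.

```python
# Hypothetical "wide" table: one row per city, one column per day.
# All names and values here are invented for illustration.
wide = [
    {"city": "Oslo", "mon_temp_c": 3.0, "tue_temp_c": 5.5},
    {"city": "Lima", "mon_temp_c": 19.0, "tue_temp_c": 20.5},
]


def to_tidy(rows):
    """Return one row per (city, day) observation with a single temp_c value."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == "city":
                continue
            day = column.split("_")[0]  # "mon_temp_c" -> "mon"
            tidy.append({"city": row["city"], "day": day, "temp_c": value})
    return tidy


tidy_rows = to_tidy(wide)
for r in tidy_rows:
    print(r)
```

In the tidy version each row is one city on one day, so "day" becomes a variable you can filter or group by instead of being hidden in column names.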
Check what you understood with a short quiz.
Decide what each row is actually about: a person, an order, a visit, a product, or a day. See how changing the row unit changes which questions the dataset can answer.
Trace an event record such as a click, purchase, login, or delivery scan. Identify the actor, action, timestamp, and context that make an event different from a static description.
Separate values that identify something, like customer IDs or order numbers, from values that describe or measure it. Recognize why IDs are useful for tracking records but usually should not be treated like quantities.
Read labels as categories assigned to observations, such as spam/not spam, product type, or pass/fail. Consider whether a label came from a human judgment, a rule, or a later outcome.
Classify values as numbers, categories, dates, times, true/false values, or text. Match each type to sensible questions, such as counting categories, comparing dates, or averaging real quantities.
Attach units and reference frames to every number that needs them: dollars, pounds, meters, minutes, percentages, time zones, or temperature scales. Avoid comparisons that look valid but mix incompatible units.
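A minimal sketch of why units matter, assuming two made-up weather stations that report temperature on different scales. The station names, readings, and the `to_celsius` helper are hypothetical; only the Fahrenheit-to-Celsius formula is standard.

```python
def fahrenheit_to_celsius(temp_f):
    """Standard conversion: subtract 32, then multiply by 5/9."""
    return (temp_f - 32) * 5 / 9


# Invented readings: the unit travels with the number.
readings = [
    {"station": "A", "temp": 68.0, "unit": "F"},
    {"station": "B", "temp": 25.0, "unit": "C"},
]


def to_celsius(reading):
    """Put every reading on the Celsius scale before comparing."""
    if reading["unit"] == "C":
        return reading["temp"]
    if reading["unit"] == "F":
        return fahrenheit_to_celsius(reading["temp"])
    raise ValueError(f"unknown unit: {reading['unit']}")


# Comparing raw numbers looks valid but mixes incompatible units.
naive_warmest = max(readings, key=lambda r: r["temp"])["station"]

# Converting first gives a comparison that actually means something.
warmest = max(readings, key=to_celsius)["station"]

print(naive_warmest, warmest)
```

The raw comparison picks station A because 68 is larger than 25, while the converted comparison picks station B, since 68 °F is only 20 °C.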
Turn fuzzy ideas like “active user,” “late delivery,” or “high income” into exact recording rules. See how an operational definition can make data usable while also narrowing what the measurement really means.
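One way to picture an operational definition is as a small rule in code. The sketch below assumes a hypothetical rule for "late delivery": a delivery is late if it arrives more than 15 minutes after the promised time. The 15-minute grace period and the `is_late` function are illustrative choices, not a standard.

```python
from datetime import datetime, timedelta

# Assumed threshold for this example; a real team would choose and document its own.
GRACE_PERIOD = timedelta(minutes=15)


def is_late(promised_at, delivered_at, grace=GRACE_PERIOD):
    """Record a delivery as late only if it misses the promise by more than `grace`."""
    return delivered_at - promised_at > grace


promised = datetime(2024, 3, 1, 12, 0)
ten_minutes_after = datetime(2024, 3, 1, 12, 10)
forty_minutes_after = datetime(2024, 3, 1, 12, 40)

print(is_late(promised, ten_minutes_after))
print(is_late(promised, forty_minutes_after))

# The same delivery flips to "late" under a stricter rule with no grace period.
print(is_late(promised, ten_minutes_after, grace=timedelta(0)))
```

The rule makes "late" countable, but it also narrows the idea: a delivery that arrives 10 minutes after the promise is recorded as on time, even if the customer would disagree.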
Recognize when a column stands in for something harder to measure, such as using clicks for interest or ratings for satisfaction. Reason about what the proxy captures well and what it may miss.
Identify common ways recorded values can be wrong: typos, sensor limits, rounding, outdated records, self-reporting errors, or inconsistent rules. Treat data quality as part of the measurement process, not an afterthought.
Interpret blanks and special codes such as NA, null, unknown, none, or -999. Distinguish “not collected,” “not applicable,” “unknown,” and “zero,” because they mean different things.
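The sketch below shows one possible way to keep special codes separate from real values. The `CODEBOOK` mapping is an assumption made up for this example; in a real dataset the meaning of each code has to come from its documentation, not from a guess.

```python
# Hypothetical codebook: what each special code means in this made-up dataset.
CODEBOOK = {
    "": "not collected",
    "-999": "not collected",
    "NA": "unknown",
    "none": "not applicable",
}


def interpret(raw):
    """Return (value, status): a number if one was recorded, otherwise None plus the reason."""
    text = raw.strip()
    if text in CODEBOOK:
        return None, CODEBOOK[text]
    return float(text), "recorded"


# Invented rainfall column in millimeters, as raw text.
raw_rainfall_mm = ["0", "12.5", "", "NA", "-999"]

parsed = [interpret(v) for v in raw_rainfall_mm]
recorded = [value for value, status in parsed if status == "recorded"]
mean_recorded = sum(recorded) / len(recorded)

for raw, (value, status) in zip(raw_rainfall_mm, parsed):
    print(repr(raw), value, status)
print(mean_recorded)
```

Here "0" stays a real measurement of no rain, while the blank, NA, and -999 entries are left out of the average instead of being treated as zero or as a very negative number.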
Ask why values are missing instead of assuming blanks are random. Compare examples where missing data is harmless, informative, or likely to distort conclusions.
Compare the dataset’s records with the real-world group or process you care about. Notice which people, things, places, and time periods are included or excluded before generalizing from the data.
Judge whether a dataset can support a description, comparison, prediction, or causal claim. Use its variables, row unit, timing, coverage, and measurement limits to decide what it can and cannot honestly say.
Review this chapter with practice based on your mistakes.