3Blue1Brown Essence of Calculus Notes: All 12 Chapters Explained

Grant Sanderson's Essence of Calculus series on 3Blue1Brown is one of the most-watched mathematics resources on the internet. Its twelve chapters cover the core ideas of single-variable calculus in a way most calculus textbooks never achieve: with genuine geometric intuition. Many students who struggled through a semester of calculus report that this series is where the subject finally made sense.

These 3Blue1Brown Essence of Calculus notes cover all twelve chapters, from the opening area-of-a-circle argument through derivatives, the chain rule, eˣ, implicit differentiation, limits, integrals, and Taylor series. They are designed for students who have watched (or are about to watch) the series and want a compact reference organized by chapter.

The full series is on 3blue1brown.com and YouTube. These notes are a companion to the videos, not a replacement. The animations are part of the understanding.

Chapter 1: The Essence of Calculus — What Is This Series?

The opening video is a preview and motivation. Sanderson does something unusual: he derives the formula for the area of a circle using the idea of integration — before defining integration formally. The goal is to give you the feeling of calculus before giving you the machinery.

The area of a circle is πr². How do you know that? You can memorize it, but Sanderson shows you can derive it. Slice the circle into concentric rings. Each ring has circumference 2πr and thickness dr. "Unroll" it and you get an approximate rectangle of area 2πr · dr. Lined up side by side from r = 0 to r = R, these thin rectangles fill out a triangle with base R and height 2πR, whose area is ½ · R · 2πR = πR². In the language of calculus, that is the integral of 2πr from 0 to R. This is integration.
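A quick numerical check of the ring argument, as a Python sketch: `ring_area_sum` (a name chosen here for illustration) unrolls n rings of thickness dr = R/n, treats each one as a rectangle of length 2πr using its inner radius, and adds them up.

```python
import math


def ring_area_sum(R: float, n: int) -> float:
    """Approximate the area of a circle of radius R with n unrolled rings.

    Ring i has inner radius r = i * dr and is treated as a rectangle of
    length 2*pi*r and width dr.
    """
    dr = R / n
    return sum(2 * math.pi * (i * dr) * dr for i in range(n))


R = 3.0
for n in (10, 100, 10_000):
    print(f"n={n:>6}  ring sum={ring_area_sum(R, n):.6f}  pi*R^2={math.pi * R**2:.6f}")
```

Because each ring uses its inner radius, the sum works out to exactly πR²·(n - 1)/n, so the shortfall shrinks in proportion to 1/n as the rings get thinner.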

The central theme of the series:

Calculus is the study of continuous change. Derivatives answer: "how fast is something changing right now?" Integrals answer: "what is the accumulated total of change?" The fundamental theorem of calculus connects these two ideas — and it is, as Sanderson argues, genuinely surprising that they are inverses of each other.

Every chapter returns to this theme. The machinery (rules, formulas, techniques) exists to serve the geometric intuition, not the other way around.

Chapter 2: The Paradox of the Derivative

The derivative is a rate of change at a single instant. But that is immediately paradoxical — how can something "change" at a single point? Change requires comparing two different moments. At a single point, there is nothing to compare.

Sanderson resolves this by being precise about what "instantaneous rate of change" actually means. It is a limit — a value that the average rate of change approaches as the time interval shrinks toward zero.

Average rate of change over interval [t, t+dt]:

[f(t + dt) - f(t)] / dt

This is the slope of a secant line through two points on the curve. As dt → 0, the secant line approaches the tangent line at t. The derivative f'(t) is the slope of that tangent line.

For f(t) = t³:

[(t+dt)³ - t³] / dt
= [t³ + 3t²dt + 3t(dt)² + (dt)³ - t³] / dt
= 3t² + 3t·dt + (dt)²

As dt → 0, the terms containing dt vanish, leaving the derivative 3t². Expanding (t + dt)ⁿ the same way works for any positive integer n, and that algebra is what sits behind the power rule.
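The same computation can be watched numerically. This Python sketch evaluates the secant slope [f(t + dt) - f(t)] / dt for f(t) = t³ at t = 2 and prints it next to the expanded form 3t² + 3t·dt + (dt)²; both approach 3t² = 12 as dt shrinks. The helper names are illustrative.

```python
def difference_quotient(f, t: float, dt: float) -> float:
    """Average rate of change of f over [t, t + dt]: the secant slope."""
    return (f(t + dt) - f(t)) / dt


def cube(t: float) -> float:
    return t**3


t = 2.0
for dt in (0.1, 0.01, 0.001):
    secant = difference_quotient(cube, t, dt)
    expanded = 3 * t**2 + 3 * t * dt + dt**2  # the algebraic form derived above
    print(f"dt={dt:<6} secant={secant:.6f} expanded={expanded:.6f} 3t^2={3 * t**2}")
```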

Key insight from Chapter 2: The derivative is not a ratio of infinitesimals — it is a limit of ratios. This distinction matters for understanding limits properly in Chapter 7. The phrase "as dt approaches zero" does not mean "when dt equals zero" — it means the value the expression approaches as dt gets arbitrarily small.

Notation: derivatives appear both as f'(t) (Lagrange notation) and as df/dt (Leibniz notation, common in physics and engineering). Sanderson leans on Leibniz notation, treating dt as a concrete tiny nudge. It makes the chain rule look obvious: if y = f(u) and u = g(x), then dy/dx = (dy/du)(du/dx). The du's appear to cancel, which is a useful mnemonic rather than a proof.

Chapter 3: Derivative Formulas Through Geometry

Chapter 3 derives the core differentiation rules from geometric first principles — not from formulas dropped from the sky.

Power rule: d/dx[xⁿ] = nxⁿ⁻¹

Visualize x² as a square with side length x. Increase x by dx. The area increases by two thin rectangles (each exactly x·dx) plus a tiny corner square (dx)². The total increase is 2x·dx + (dx)². Divide by dx: 2x + dx. As dx → 0: 2x. This is the derivative of x².

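Generalize: for x³, picture a cube with side x. Growing each side by dx adds three thin slabs of volume x²·dx, plus smaller pieces proportional to (dx)² and (dx)³, so the rate of change is 3x². For xⁿ, write (x + dx)ⁿ as a product of n factors (x + dx). The terms that take dx from exactly one factor contribute n·xⁿ⁻¹·dx; every other new term contains (dx)² or a higher power and vanishes after dividing by dx and letting dx → 0. Rate of change: nxⁿ⁻¹.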

Sum rule: d/dx[f + g] = f' + g'

Nudge x by dx: f changes by some small amount df and g by some small amount dg, so their sum changes by df + dg. Divide by dx and the rates simply add.

Product rule: d/dx[f · g] = f'g + fg'

Visualize f(x)·g(x) as the area of a rectangle with sides f and g. Increase x by dx. The area change is a right strip (f'dx · g), a top strip (f · g'dx), and a tiny corner (f'dx · g'dx). Divide by dx and take the limit: f'g + fg'.
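To see that the corner really is negligible, this Python sketch splits the change in f(x)·g(x) into the three pieces described above. The choices f = sin and g(x) = x² are arbitrary examples, and `product_rule_pieces` is an illustrative name.

```python
import math


def product_rule_pieces(f, g, x: float, dx: float):
    """Split the change in the rectangle area f(x) * g(x) into three pieces."""
    df = f(x + dx) - f(x)
    dg = g(x + dx) - g(x)
    right_strip = df * g(x)  # from widening the rectangle
    top_strip = f(x) * dg    # from raising the rectangle
    corner = df * dg         # tiny corner, roughly proportional to dx**2
    return right_strip, top_strip, corner


def f(x: float) -> float:
    return math.sin(x)


def g(x: float) -> float:
    return x**2


x, dx = 1.0, 1e-4
right, top, corner = product_rule_pieces(f, g, x, dx)
total_change = f(x + dx) * g(x + dx) - f(x) * g(x)

print(total_change, right + top + corner)  # the three pieces account for everything
print(corner / dx)                         # shrinks to 0 along with dx
print((right + top) / dx)                  # approaches f'(x)g(x) + f(x)g'(x)
print(math.cos(x) * g(x) + f(x) * 2 * x)   # product rule, evaluated exactly
```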

This geometric derivation makes the product rule unforgettable. You no longer need to memorize it; you can re-derive it in seconds.

Derivative of sin(x):

Sanderson proves d/dx[sin(x)] = cos(x) using a unit circle argument. The key identity needed is: lim(dθ→0) sin(dθ)/dθ = 1. This comes from the fact that for small angles, sin(θ) ≈ θ (in radians). This is why calculus always uses radians — the derivative of sine is only cos(x) in radians.
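A small Python sketch of why radians matter: sin(h)/h approaches 1 when h is in radians, while a sine that takes degrees picks up a factor of π/180 in its derivative (the chain rule at work). `sin_deg` is a hypothetical helper, and the slope is estimated with a central difference.

```python
import math

# In radians, sin(h)/h -> 1 as h -> 0.
for h in (0.1, 0.01, 0.001):
    print(f"h={h}: sin(h)/h = {math.sin(h) / h:.10f}")


def sin_deg(x_degrees: float) -> float:
    """Sine of an angle measured in degrees."""
    return math.sin(math.radians(x_degrees))


# Estimate the slope of sin_deg at 30 degrees with a central difference.
x, h = 30.0, 1e-6
slope = (sin_deg(x + h) - sin_deg(x - h)) / (2 * h)
print(slope)                                      # not cos(30 degrees) ...
print(math.cos(math.radians(x)) * math.pi / 180)  # ... but cos(30 degrees) * pi/180
```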

Chapter 4: Visualizing the Chain Rule and Product Rule

The chain rule handles composite functions: if y = f(g(x)), then dy/dx = f'(g(x)) · g'(x).

Intuition: Think of g as a first machine: nudging x by dx nudges g(x) by dg ≈ g'(x)·dx. Think of f as a second machine that turns that nudge into a change in its output, df ≈ f'(g(x))·dg. Substituting gives df ≈ f'(g(x))·g'(x)·dx. Divide by dx and let dx → 0: f'(g(x))·g'(x).

Example: d/dx[sin(x²)]

  • Outer function f(u) = sin(u), f'(u) = cos(u)
  • Inner function g(x) = x², g'(x) = 2x
  • Chain rule: cos(x²) · 2x = 2x·cos(x²)
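A quick numerical sanity check of this example in Python: a central-difference estimate of the slope of sin(x²) is compared with the chain-rule result 2x·cos(x²) at a few points. The function names are illustrative.

```python
import math


def numeric_derivative(f, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def sin_of_square(x: float) -> float:
    return math.sin(x**2)


def chain_rule_result(x: float) -> float:
    # f'(g(x)) * g'(x) with f(u) = sin(u) and g(x) = x^2
    return math.cos(x**2) * 2 * x


for x in (0.5, 1.0, 2.0):
    print(f"x={x}: numeric={numeric_derivative(sin_of_square, x):.8f}  "
          f"chain rule={chain_rule_result(x):.8f}")
```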

Composition depth: The chain rule extends to any depth. For f(g(h(x))):

dy/dx = f'(g(h(x))) · g'(h(x)) · h'(x)

In neural networks, the backpropagation algorithm is exactly the chain rule applied to a deep composition of functions. The gradient of the loss with respect to early-layer weights is computed by chaining derivatives from the output back through every layer. This connection — Essence of Calculus as foundation for deep learning — is worth noting explicitly. If you're heading toward neural networks, see learn machine learning on YouTube.
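A toy illustration of that idea in Python, under simplifying assumptions: each "layer" is a scalar function paired with its derivative (the three layers here are made up for the example), the forward pass records each layer's input, and the backward pass multiplies local derivatives from the output back toward x. Real frameworks do this on tensors over a computation graph, but the chain-rule bookkeeping is the same idea.

```python
import math

# A made-up three-layer composition y = f(g(h(x))).
# Each layer is a (function, derivative) pair acting on a single number.
layers = [
    (lambda x: 3 * x, lambda x: 3.0),   # h(x) = 3x,     h'(x) = 3
    (lambda u: u**2, lambda u: 2 * u),  # g(u) = u^2,    g'(u) = 2u
    (math.sin, math.cos),               # f(v) = sin(v), f'(v) = cos(v)
]


def forward_backward(x: float):
    """Return (y, dy/dx) for the composition, computed like backpropagation."""
    # Forward pass: record the input each layer receives.
    layer_inputs = []
    value = x
    for fn, _ in layers:
        layer_inputs.append(value)
        value = fn(value)

    # Backward pass: multiply local derivatives from the output back to x.
    grad = 1.0
    for (_, dfn), layer_input in zip(reversed(layers), reversed(layer_inputs)):
        grad *= dfn(layer_input)
    return value, grad


y, dy_dx = forward_backward(0.5)
# Closed form for comparison: d/dx sin(9x^2) = 18x * cos(9x^2)
print(y, dy_dx, 18 * 0.5 * math.cos(9 * 0.5**2))
```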

Chapter 5: What's the Derivative of eˣ?

Exponential functions are special because their rate of change is proportional to their value. A population that doubles every hour grows faster when it is larger. This self-referential quality gives the exponential its unique derivative.

d/dx[eˣ] = eˣ

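The exponential function eˣ is its own derivative. This is not a coincidence: every exponential aˣ has a derivative proportional to itself, and e (approximately 2.71828...) is the one base for which that proportionality constant is exactly 1. One common way to define e is precisely by this property.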

For the general exponential a^x:

d/dx[aˣ] = aˣ · ln(a)

When a = e, ln(e) = 1, so the factor disappears.
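A numerical look at that proportionality constant, as a Python sketch: for each base a, the ratio (slope of aˣ)/(value of aˣ) is estimated with a small forward step dt. It comes out the same at every x, close to ln(a), and equal to 1 only when a = e. The step dt = 1e-7 is an arbitrary choice that balances truncation and rounding error.

```python
import math


def slope_over_value(a: float, x: float, dt: float = 1e-7) -> float:
    """Estimate (d/dx a^x) / a^x using a forward difference."""
    slope = (a ** (x + dt) - a ** x) / dt
    return slope / a ** x


for a in (2.0, math.e, 3.0):
    ratios = [slope_over_value(a, x) for x in (0.0, 1.0, 2.0)]
    print(f"a={a:.5f}  ratios={[round(r, 5) for r in ratios]}  ln(a)={math.log(a):.5f}")
```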

Natural logarithm:

Since e^x and ln(x) are inverse functions: if y = ln(x), then x = e^y. Differentiating implicitly:

1 = e^y · dy/dx  →  dy/dx = 1/e^y = 1/x

So d/dx[ln(x)] = 1/x. This is one of the most-used derivatives in applied mathematics, statistics (log-likelihood functions), and information theory.

Compound interest and e:

The number e arises naturally from the question: what is the limit of (1 + 1/n)ⁿ as n → ∞? This models continuous compounding. Start with one unit of money at an annual interest rate of 100%, compounded n times per year at equal intervals; after one year you have (1 + 1/n)ⁿ units. As n → ∞, this approaches e ≈ 2.718.
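A Python sketch of the compounding limit; `compounded_growth` is an illustrative name. The growth factor climbs toward e as the compounding gets more frequent.

```python
import math


def compounded_growth(n: int) -> float:
    """Growth factor after one year at 100% annual interest, compounded n times."""
    return (1 + 1 / n) ** n


for n in (1, 2, 12, 365, 1_000_000):
    print(f"n={n:>9}: {compounded_growth(n):.6f}")
print(f"e          : {math.e:.6f}")
```

Convergence is slow: for large n the gap to e is roughly e/(2n).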

Chapter 6: Implicit Differentiation

Not all functions are written as y = f(x). Sometimes x and y are related by an equation like x² + y² = 1 (a circle). Implicit differentiation allows computing dy/dx without solving for y explicitly.

Technique: Differentiate both sides with respect to x, treating y as a function of x and applying the chain rule to any y terms.

For x² + y² = 1:

2x + 2y · (dy/dx) = 0
dy/dx = -x/y

This is the slope of the tangent to the circle at any point (x, y). Geometrically: at point (1, 0), dy/dx = -1/0 — undefined, which makes sense because the tangent is vertical.
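To check the implicit result against an explicit one, this Python sketch solves for the upper half of the circle, y = √(1 - x²), estimates its slope with a central difference, and compares it with -x/y at the point (0.6, 0.8). Implicit differentiation handles both halves at once; the explicit route has to pick one.

```python
import math


def implicit_slope(x: float, y: float) -> float:
    """dy/dx on the circle x^2 + y^2 = 1, from implicit differentiation."""
    return -x / y


def upper_semicircle(x: float) -> float:
    """Explicit formula for the top half of the circle."""
    return math.sqrt(1 - x**2)


x = 0.6
y = upper_semicircle(x)  # the point (0.6, 0.8)
h = 1e-6
explicit_slope = (upper_semicircle(x + h) - upper_semicircle(x - h)) / (2 * h)
print(implicit_slope(x, y), explicit_slope)
```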

Derivative of ln(x) revisited:

The Chapter 5 derivation of d/dx[ln(x)] = 1/x (let y = ln(x), so x = e^y, then differentiate both sides with respect to x) is itself an implicit differentiation. It reuses the known derivative of eˣ rather than working from the definition of ln(x) directly, which is the typical pattern for finding the derivative of an inverse function.

Inverse function derivatives:

The same technique gives the derivatives of arcsin, arccos, arctan. For y = arcsin(x):

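x = sin(y)  →  1 = cos(y) · dy/dx  →  dy/dx = 1/cos(y) = 1/√(1-x²)

The last step uses cos(y) = √(1 - sin²(y)) = √(1 - x²), taking the positive root because arcsin returns y in [-π/2, π/2], where cos(y) ≥ 0.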
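A numerical spot check of the arcsin formula in Python, using `math.asin` and a central difference; the sample points are arbitrary points inside (-1, 1).

```python
import math


def central_difference(f, x: float, h: float = 1e-6) -> float:
    """Symmetric estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


for x in (-0.5, 0.0, 0.3, 0.9):
    numeric = central_difference(math.asin, x)
    formula = 1 / math.sqrt(1 - x * x)
    print(f"x={x:5}: numeric={numeric:.8f}  1/sqrt(1-x^2)={formula:.8f}")
```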

Chapter 7: Limits

Chapter 7 steps back to formalize the concept of limit that has been used intuitively throughout the series. This is the ε-δ definition that analysis courses build on.

Informal definition: lim(x→a) f(x) = L means f(x) can be made arbitrarily close to L by making x sufficiently close (but not equal) to a.

Formal (ε-δ) definition: For every ε > 0, there exists δ > 0 such that if 0 < |x - a| < δ, then |f(x) - L| < ε.

Sanderson is characteristically honest: this definition is hard to parse. His explanation: ε is a tolerance on the output side ("I want the output to be within ε of L"). δ is a promise on the input side ("if you keep x within δ of a, I guarantee the output stays within ε of L"). The definition says: for any output tolerance you name, I can find an input tolerance that achieves it.
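To make the ε-δ game concrete, here is a Python sketch for one specific limit, lim(x→2) x² = 4. The rule δ = min(1, ε/5) comes from the short argument in the docstring; `spot_check` then samples inputs with 0 < |x - 2| < δ. Sampling is only evidence, not a proof, and both function names are illustrative.

```python
def delta_for(epsilon: float) -> float:
    """A delta that works for lim_{x->2} x^2 = 4, given a tolerance epsilon.

    If |x - 2| < 1, then |x + 2| < 5, so |x^2 - 4| = |x - 2| * |x + 2| < 5 * |x - 2|.
    Keeping |x - 2| < min(1, epsilon / 5) therefore keeps |x^2 - 4| < epsilon.
    """
    return min(1.0, epsilon / 5)


def spot_check(epsilon: float, samples: int = 10_000) -> bool:
    """Test inputs with 0 < |x - 2| < delta. Evidence only, not a proof."""
    delta = delta_for(epsilon)
    for i in range(1, samples):
        offset = delta * i / samples  # strictly between 0 and delta
        for x in (2 - offset, 2 + offset):
            if abs(x * x - 4) >= epsilon:
                return False
    return True


for eps in (1.0, 0.1, 1e-3):
    print(f"epsilon={eps}: delta={delta_for(eps)}, all samples within tolerance: {spot_check(eps)}")
```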

L'Hôpital's rule:

For limits of the form 0/0 or ∞/∞:

lim(x→a) f(x)/g(x) = lim(x→a) f'(x)/g'(x)

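(provided the limit on the right exists, and f and g are differentiable near a with g'(x) ≠ 0 there). For example, it gives lim(x→0) sin(x)/x = cos(0)/1 = 1 immediately. Be careful with that particular example, though: proving d/dx[sin(x)] = cos(x) already requires this limit, so using L'Hôpital's rule to evaluate it is circular.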
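A Python sketch of a 0/0 limit where L'Hôpital's rule is genuinely useful: applying it twice to (1 - cos x)/x² gives sin(x)/(2x) and then cos(x)/2, so the limit is 1/2. The loop checks this numerically. At very small x, computing 1 - cos(x) directly loses most or all of its precision in floating point, so the sketch also evaluates the algebraically equivalent form 2·sin²(x/2)/x², which stays accurate. Both function names are illustrative.

```python
import math


def naive_ratio(x: float) -> float:
    return (1 - math.cos(x)) / x**2


def stable_ratio(x: float) -> float:
    # 1 - cos(x) = 2 sin^2(x/2), which avoids subtracting two nearly equal numbers
    return 2 * math.sin(x / 2) ** 2 / x**2


for x in (0.1, 0.01, 0.001, 1e-8):
    print(f"x={x:<6}  naive={naive_ratio(x):.10f}  stable={stable_ratio(x):.10f}")
```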

ε-δ and its connection to practical computing: Floating-point arithmetic inherently involves small errors. The approximation f(x+ε) ≈ f(x) + f'(x)·ε for small ε is the foundation of numerical (finite-difference) differentiation. The automatic differentiation used in deep learning frameworks works differently: it applies the chain rule exactly to each elementary operation, so it avoids the truncation error that finite differences introduce.
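A small Python experiment with finite differences: it estimates the derivative of sin at x = 1 with the forward difference [f(x+h) - f(x)]/h and with the symmetric (central) difference [f(x+h) - f(x-h)]/(2h), and prints each error against the exact answer cos(1). The central difference's truncation error shrinks like h² rather than h; for very small h, floating-point rounding typically makes both estimates worse again.

```python
import math


def forward_difference(f, x: float, h: float) -> float:
    return (f(x + h) - f(x)) / h


def central_difference(f, x: float, h: float) -> float:
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.0
exact = math.cos(x)  # d/dx sin(x) at x = 1
for h in (1e-1, 1e-3, 1e-5, 1e-8, 1e-12):
    fwd_err = abs(forward_difference(math.sin, x, h) - exact)
    cen_err = abs(central_difference(math.sin, x, h) - exact)
    print(f"h={h:.0e}  forward error={fwd_err:.2e}  central error={cen_err:.2e}")
```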

Chapter 8: Integration and the Fundamental Theorem of Calculus

What is an integral? This chapter is the conceptual climax of the series. In 3Blue1Brown's framing, the definite integral ∫ₐᵇ f(x)dx is the area under the curve f(x) between x = a and x = b. Precisely, it is the limit of the total area of rectangles under the curve as the rectangles get thinner and thinner: the limit of Riemann sums.

Riemann sums: Partition [a,b] into n equal intervals of width dx = (b-a)/n. Over each interval, draw a rectangle of height f(xᵢ), where xᵢ is a sample point in the i-th interval (for example, its left endpoint). Sum the rectangle areas. As n → ∞ (and dx → 0), this sum converges to the integral for any continuous f.
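A minimal Python sketch of Riemann sums with equal-width intervals: `riemann_sum` (an illustrative name) measures each rectangle's height at the left endpoint, midpoint, or right endpoint of its interval. For ∫₀¹ x² dx, whose exact value is 1/3, the left sums approach 1/3 roughly like 1/n, while the midpoint sums converge much faster.

```python
def riemann_sum(f, a: float, b: float, n: int, sample: str = "left") -> float:
    """Sum of n rectangles of width (b - a)/n under f on [a, b].

    `sample` picks where each rectangle's height is measured:
    "left", "midpoint", or "right" end of its interval.
    """
    dx = (b - a) / n
    offset = {"left": 0.0, "midpoint": 0.5, "right": 1.0}[sample]
    return sum(f(a + (i + offset) * dx) for i in range(n)) * dx


def square(x: float) -> float:
    return x * x


for n in (10, 100, 1000):
    left = riemann_sum(square, 0.0, 1.0, n, "left")
    mid = riemann_sum(square, 0.0, 1.0, n, "midpoint")
    print(f"n={n:>5}  left={left:.6f}  midpoint={mid:.6f}  exact={1/3:.6f}")
```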

The antiderivative:

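Define A(x) as the area under f from a to x. Increase x by dx. The extra sliver of area, A(x+dx) - A(x), is approximately a rectangle of height f(x) and width dx, so A(x+dx) - A(x) ≈ f(x)·dx. For continuous f, the error in that approximation shrinks faster than dx, so dividing by dx and letting dx → 0 gives:

[A(x+dx) - A(x)] / dx  →  f(x), that is, A'(x) = f(x)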

A(x) is an antiderivative of f(x). This is the fundamental theorem.

Fundamental Theorem of Calculus:

If F is any antiderivative of f (F'(x) = f(x)), then:

∫ₐᵇ f(x)dx = F(b) - F(a)
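A Python sketch that checks both halves of the story for f(x) = cos(x), whose antiderivative is F(x) = sin(x): a midpoint-rule estimate of the accumulated area A(x) from 0 to b matches F(b) - F(a), and nudging x by dx changes A by about f(x)·dx. The helper `area_so_far` and the specific numbers are illustrative.

```python
import math


def area_so_far(f, a: float, x: float, n: int = 10_000) -> float:
    """A(x): midpoint-rule estimate of the area under f from a to x."""
    dx = (x - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


f = math.cos  # the curve
F = math.sin  # an antiderivative: F'(x) = cos(x)
a, b = 0.0, 1.2

# Accumulated area vs. F(b) - F(a)
print(area_so_far(f, a, b), F(b) - F(a))

# Nudging x by dx changes A by about f(x) * dx, so A'(x) is close to f(x).
x, dx = 0.7, 1e-3
slope_of_A = (area_so_far(f, a, x + dx) - area_so_far(f, a, x)) / dx
print(slope_of_A, f(x))
```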

Why is this remarkable? Derivatives and integrals are defined by completely different geometric processes (slopes of tangent lines vs. areas under curves). The theorem says they are inverse operations, and Sanderson stresses how surprising it is that such a relationship should hold.

Chapters 9–12: Deeper Theorems and Taylor Series

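Chapter 9 — What Does Area Have to Do with Slope?

The average value of a continuous function f on [a, b] is its integral divided by the width of the interval: (1/(b-a))∫ₐᵇ f(x)dx. If F is an antiderivative of f, this equals [F(b) - F(a)]/(b - a), the slope of the line through two points on the graph of F. So the average height of f and the average slope of F are the same number. For example, the average value of sin(x) on [0, π] is [-cos(π) + cos(0)]/π = 2/π ≈ 0.637.

Chapter 10 — Higher Order Derivatives:

The second derivative measures the rate of change of the rate of change: acceleration for a moving object, or how a graph bends. If f'(x) > 0 and f''(x) > 0, the function is increasing and the rate of increase is growing (concave up). The sign of f''(x) determines whether the curve "opens up" or "opens down."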
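To make the second derivative concrete, this Python sketch estimates f''(x) with the second difference [f(x+h) - 2f(x) + f(x-h)]/h², which is literally the change in the change over two small steps. For f(x) = x³, where f''(x) = 6x, the sign flips at x = 0, where the curve switches from concave down to concave up. The step h = 1e-4 is an arbitrary choice.

```python
def second_difference(f, x: float, h: float = 1e-4) -> float:
    """Estimate f''(x) as the change in the change, divided by h twice."""
    forward_change = f(x + h) - f(x)
    backward_change = f(x) - f(x - h)
    return (forward_change - backward_change) / h**2


def cube(x: float) -> float:
    return x**3


for x in (-1.0, 0.5, 2.0):
    estimate = second_difference(cube, x)
    shape = "concave up" if estimate > 0 else "concave down"
    print(f"x={x:5}: estimate={estimate:.4f}  exact 6x={6 * x}  ({shape})")
```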

Chapter 11 — Taylor Series:

A Taylor series approximates a function as an infinite polynomial centered at a point a:

f(x) = f(a) + f'(a)(x-a) + f''(a)(x-a)²/2! + f'''(a)(x-a)³/3! + ...

Why? A polynomial is the simplest possible function — easy to compute, easy to differentiate. The Taylor series asks: what polynomial matches f at point a through all its derivatives?

Classic examples:

eˣ = 1 + x + x²/2! + x³/3! + ...  (converges everywhere)
sin(x) = x - x³/3! + x⁵/5! - ...  (converges everywhere)
cos(x) = 1 - x²/2! + x⁴/4! - ...  (converges everywhere)
ln(1+x) = x - x²/2 + x³/3 - ...   (converges for -1 < x ≤ 1)
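A Python sketch of how these partial sums behave (the helper names are illustrative): at x = 2 the eˣ and sin(x) series close in on `math.exp` and `math.sin` as terms are added, the ln(1+x) series creeps slowly toward ln 2 at x = 1, and at x = 1.5, outside its interval of convergence, its partial sums blow up.

```python
import math


def taylor_exp(x: float, terms: int) -> float:
    """1 + x + x^2/2! + ... using the first `terms` terms."""
    return sum(x**k / math.factorial(k) for k in range(terms))


def taylor_sin(x: float, terms: int) -> float:
    """x - x^3/3! + x^5/5! - ... using the first `terms` nonzero terms."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(terms))


def taylor_log1p(x: float, terms: int) -> float:
    """x - x^2/2 + x^3/3 - ... using the first `terms` terms."""
    return sum((-1) ** (k + 1) * x**k / k for k in range(1, terms + 1))


for n in (2, 4, 8, 16):
    print(f"{n:>2} terms: e^2~{taylor_exp(2.0, n):.6f}  sin(2)~{taylor_sin(2.0, n):.6f}")
print(f"exact   : e^2={math.exp(2.0):.6f}  sin(2)={math.sin(2.0):.6f}")

print(taylor_log1p(1.0, 1000), math.log(2.0))  # slow convergence at x = 1
print(taylor_log1p(1.5, 60))                   # diverges outside the interval
```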

Applications:

  • Small angle approximation: sin(x) ≈ x for small x (first term only)
  • Euler's formula: e^(ix) = cos(x) + i·sin(x) (combine the series)
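  • Calculator and math-library algorithms: computing sin, cos, and exp with polynomial approximations, of which truncated Taylor series are the simplest example
  • Machine learning: understanding how smooth activation functions behave near the origin, e.g. tanh(x) ≈ x and sigmoid(x) ≈ 1/2 + x/4 for small x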
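A short Python check of Euler's formula, built directly from the eˣ series: summing zᵏ/k! for z = ix with complex arithmetic should land on cos(x) + i·sin(x). The helper `exp_series` is illustrative; Python's `cmath.exp` is printed only as a cross-check.

```python
import cmath
import math


def exp_series(z: complex, terms: int = 30) -> complex:
    """Partial sum of z^k / k! for k = 0 .. terms-1, for a complex z."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)  # turn z^k/k! into z^(k+1)/(k+1)!
    return total


x = 1.0
print(exp_series(1j * x))                   # series for e^(ix)
print(complex(math.cos(x), math.sin(x)))    # cos(x) + i sin(x)
print(cmath.exp(1j * x))                    # library cross-check
```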

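Chapter 12 — The Other Way to Visualize Derivatives:

The closing chapter reframes the derivative. Instead of reading it as the slope of a graph, think of f as a transformation that moves each point x of the number line to f(x). The derivative f'(x) then measures how much a tiny neighborhood of x gets stretched (|f'(x)| > 1) or squished (|f'(x)| < 1); a negative derivative means the neighborhood is also flipped. Sanderson uses this view to analyze what happens when you apply f(x) = 1 + 1/x over and over. Its fixed points are the golden ratio φ ≈ 1.618 and -1/φ ≈ -0.618. Because f'(x) = -1/x², neighborhoods of φ get squished (|f'(φ)| = 1/φ² ≈ 0.38), pulling nearby points in, while neighborhoods of -1/φ get stretched (|f'(-1/φ)| = φ² ≈ 2.62), pushing points away. That is why repeated application settles on φ from typical starting points. This stretching-and-squishing picture is also the one that carries over to multivariable calculus.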

How to Study the Essence of Calculus Series Most Effectively

The series is dense: twelve videos, roughly three and a half hours in total. Watching passively, letting the animations wash over you, does not build the retention you want.

Recommended approach:

  1. Before each video, write down what you expect to learn (this primes your attention)
  2. Watch with a notebook open; pause when you see something you want to capture
  3. After the video, close the notebook and try to reproduce the main idea in your own words without looking
  4. Check your reproduction against your notes; fix gaps

The pause-and-reproduce step is where most learning happens. It is uncomfortable because you realize how much did not stick — which is exactly the information you need.

For calculus specifically, connecting the series to practice problems is important. The series builds intuition; you also need procedural fluency (actually computing derivatives and integrals). The learn calculus from YouTube guide covers practice resources that complement 3B1B's conceptual approach.

For the linear algebra that multivariable calculus and machine learning build on, the MIT linear algebra notes are a natural next step. For seeing where all of this leads in applied mathematics, see best YouTube channels for math: several channels continue where 3B1B leaves off.

The learn statistics from YouTube guide connects calculus to probability and statistics, which is the other mathematical pillar of data science.

Generating Your Own 3B1B Study Notes

Every Essence of Calculus chapter is on YouTube. Pasting a chapter URL into an AI note-taking tool gives you a structured summary with the key equations, intuitions, and examples extracted before you sit down to work through the chapter carefully. This pre-reading can help you absorb more from the animations.

The workflow: generate the notes, skim them to know what to expect, watch the video with full attention, then review the notes again to check your understanding against them.


3Blue1Brown's Essence of Calculus is the visual intuition layer that most calculus courses never provide. These notes give you the reference structure; the animations give you the understanding. Use both.

Ready to turn any 3Blue1Brown chapter into structured study notes? Try Notiq free at notiq.study — paste the YouTube URL and get a complete chapter summary automatically.
