Locating Peaks and Valleys with Derivatives (Explaining Calculus #9)

We have gone through a couple of discussions now on how to actually calculate the value of a derivative. This is great; after all, what good is a shiny new tool if you don’t know how to operate it? But we are done with that now. Now that we’ve gone through a tutorial with derivatives, we can dive into the depth of what derivatives are capable of telling us. Since the initial motivation behind derivatives is all about change, you might think I’m about to write about calculations of how things change over time. That is partly right, but more accurately I’m choosing to show how we can go beyond “just” talking about changes over time by using the idea of change along the way.

What I plan to discuss here is a broad idea called optimization, and we will explore how derivatives play a key role in this theory.

What are “Peaks” and “Valleys”?

The type of math called optimization talks about finding maximum and minimum values. For example, if you are a business owner and found a function that would predict your profits, you would want to maximize that function. Or perhaps you want to build a fence around some grass for your animals, but don’t want to use any more fencing than you have to. You want to minimize the amount of fencing material you use. These ideas of maximizing and minimizing – making a value as large or as small as possible – are what we want to talk about.

But why the terms peak and valley? Well, the maximum height you find on a mountain is the peak of the mountain. The lowest point in a landscape is a valley. So, if we translate the very numerical ideas of maximizing and minimizing into visual language, we are learning how to find the peaks and valleys in our mathematical landscape. And we can make use of numerical values of derivatives to help us.

Connecting Peaks/Valleys with Derivatives

Imagine you draw a graph in a U shape. In fact, you can just look at the letter U and pretend it is a graph. Let’s think about finding the minimal point on U – that is, the lowest point on the letter. Now, imagine drawing a tangent line. It should look like you just underlined the letter U. In other words, it should be a horizontal line! Now, remember what derivatives are – slopes of tangent lines. What is the slope of a horizontal line? Well, that would be zero. So, we’ve arrived at a rather interesting observation through the letter U: the minimum value of U happened at a place where the tangent line had a slope of zero.

Now, let us instead use the letter V instead of U. What about the minimum point on V? Well, again, we can ask questions about the tangent line. But this time, we run into a hitch. The lowest point on the letter V is a sharp corner, and sharp corners don’t have tangent lines. So there isn’t any way we could even try to define a derivative at that point. This is our second interesting observation: the minimum value of V happened at a place with no tangent lines.
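If you would like to see the sharp corner numerically, here is a small Python sketch. It uses f(x) = |x| as a stand-in for the letter V (that choice is mine, as is the helper name secant_slope) and compares the slopes of lines drawn from the corner to nearby points on the right and on the left.

```python
def secant_slope(f, a, h):
    """Slope of the line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h

# Approach the corner of the V at x = 0 from the right (h > 0) and left (h < 0).
right_slopes = [secant_slope(abs, 0.0, h) for h in (0.1, 0.01, 0.001)]
left_slopes = [secant_slope(abs, 0.0, -h) for h in (0.1, 0.01, 0.001)]

print(right_slopes)  # every entry is 1.0
print(left_slopes)   # every entry is -1.0
```

The two sides never settle on a single slope, which is exactly what it means for the corner to have no tangent line.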

Now, here is a big, big point. Our observations about the letters U and V are actually comprehensive statements about max and min values of all functions! To make this statement more precise, I need to write down a couple of definitions in more “mathy” language.

We begin by defining minimum and maximum more precisely. In fact, we will give two subtly different definitions – these two types we call absolute and relative. In intuitive terms, an absolute max value really is at least as big as every other value of the function, and an absolute min value really is at least as small. On the other hand, a relative max/min is one that “looks like” an absolute max/min if you “zoom in” far enough. To see what I mean here, imagine a mountain range with a variety of peaks. Every single peak is a relative maximum value, because if you took binoculars and narrowed your focus on just that peak, it would look to you like the highest point on the mountain. On the other hand, the absolute maximum of the mountain range is whichever peak is actually higher than all the other peaks. Notice that you can have many, many relative maximum values, but only one absolute maximum value (well, there could be ties, like two different mountain peaks that are equally tall, but there is still only one largest height that is shared by two peaks).

Now, let’s do some defining. We begin first with absolute max/min:

Definition: Let f(x) be a function. The function f has an absolute maximum at the point x if f(x) \geq f(y) for every value y. If x is an absolute maximum point, then f(x) is the absolute maximum value of f. Similarly, f(x) has an absolute minimum at the point x if for every value y, f(x) \leq f(y), and f(x) is the absolute minimum value of f.

Notice that the definition makes its claim about every possible value of y. Also notice that “points” refer to inputs and “values” refer to outputs – perhaps the visual f(\text{point}) = \text{value} is of use to show what I mean here. On the other hand, consider the following definition of relative max/min values, and note the key differences.

Definition: Let f(x) be a function. Then f(x) has a relative maximum at the point x if, for every point y close enough to x, f(x) \geq f(y). Similarly, f(x) has a relative minimum at the point x if for all y close enough to x, f(x) \leq f(y).

Notice the presence of “close enough” in this definition. That is the key difference. As is suggested by the language, absolute is a stronger term than relative, as I can show explicitly by showing that absolute maxima/minima are also relative maxima/minima.

Fact: If f(x) has an absolute maximum at x, then f(x) also has a relative maximum at x.

Proof: Since x is an absolute maximum of f, f(x) \geq f(y) for every value y for which f makes any sense. This, of course, means that f(x) \geq f(y) for all y close to x. So, f(x) has a relative maximum at x.

The same kind of fact works for absolute and relative minima, and so a proof is not needed. You can fill in the blanks yourself. Before we can finalize the big claim I made about “U and V” I first need to take a detour into a related topic.
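The difference between relative and absolute maxima can also be seen with a quick experiment. The sketch below is only an illustration: the function height is a made-up “mountain range” with two peaks, and “close enough” is approximated by comparing each sampled point with its two neighbours on a grid.

```python
def height(x):
    # A made-up "mountain range" with two peaks of different heights.
    return -(x**2 - 1) ** 2 + 0.5 * x

# Sample the landscape at evenly spaced points between -2 and 2.
xs = [i / 100 for i in range(-200, 201)]
ys = [height(x) for x in xs]

# A sampled point is a relative maximum if it is higher than both neighbours.
relative_maxima = [xs[i] for i in range(1, len(xs) - 1)
                   if ys[i] > ys[i - 1] and ys[i] > ys[i + 1]]

# The absolute maximum is the highest sampled point overall.
absolute_maximum = xs[ys.index(max(ys))]

print(relative_maxima)   # two peaks, near x = -0.93 and x = 1.06
print(absolute_maximum)  # the taller of the two peaks, near x = 1.06
```

Both peaks show up as relative maxima, but only the taller one is the absolute maximum, and it appears in both lists, just as the Fact says it should.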

Increase and Decrease

We now move into a section more directly related to the idea of “change over time” that originally led us to the idea of a derivative in the first place. The idea here is fairly straightforward. As time progresses, a quantity could go up, it could go down, or it could stay the same. This is actually the complete idea. In mathematics, we say increasing instead of going up, decreasing instead of going down, and constant for staying the same.

How can we frame these ideas in equation-language? It is fairly easy if we use an analogy. Imagine you are walking on a mountain, that x represents how long you’ve been walking for, and f(x) represents how high you are. Intuitively, the function f(x) (your height) is increasing if after each step you take, you are higher than you were before. The progression of time can be captured pretty easily – time y is after time x if y > x. So, in order for each step to leave us higher than we started, we need f(y) > f(x) to be true whenever y > x is also true.

But, remember, we are talking about calculus too. The idea of “zooming in” will almost always be important. So, while the way we described the idea of increasing is completely right, we want to be a bit more specific. We want to be able to say where f(x) is increasing and where it is not. To do this, we allow our where to be some interval of time, say [a,b]. Now, we can be more specific in our definition.

Definition: Let f(x) be a function defined on the interval [a,b]. Then f(x) is increasing on the interval [a,b] if for any values of x,y between a and b, if y > x then f(y) > f(x).

It is pretty easy using the same intuition to come up with definitions for decreasing on an interval and constant on an interval. If we are going down a mountain, then with each step (time going forward) our height decreases (function value going down). Just as before, time going forward tells us that we want to think about y > x, and height going down whenever time goes forward tells us that instead of f(y) > f(x), we need f(y) < f(x). So, we have an easy definition.

Definition: Let f(x) be a function defined on the interval [a,b]. Then f(x) is decreasing on the interval [a,b] if for any values of x,y between a and b, if y > x then f(y) < f(x).

Notice these two definitions are verbatim the same except for the word ‘decreasing’ and the change from > to < at the very end. Similar considerations to these lead us to a definition for constant on an interval.

Definition: Let f(x) be a function defined on the interval [a,b]. Then f(x) is constant on the interval [a,b] if for any values of x,y between a and b, f(x) = f(y).

Notice the similarity of all three definitions, and notice where they are different. I find it is always helpful to think of hiking up and down a mountain as the function.
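The three definitions translate almost word for word into code. Here is a rough Python sketch, with one big caveat: a computer can only test finitely many points, so these helpers (the names sample_points, is_increasing, is_decreasing and is_constant are my own) check the inequalities on a grid of samples rather than proving them for every x and y in the interval.

```python
def sample_points(a, b, n=1000):
    """Return n + 1 evenly spaced points from a to b."""
    return [a + (b - a) * i / n for i in range(n + 1)]

def is_increasing(f, a, b):
    # Consecutive samples satisfy x < y, so we need f(x) < f(y) each time.
    xs = sample_points(a, b)
    return all(f(x) < f(y) for x, y in zip(xs, xs[1:]))

def is_decreasing(f, a, b):
    # Same check with the inequality flipped.
    xs = sample_points(a, b)
    return all(f(x) > f(y) for x, y in zip(xs, xs[1:]))

def is_constant(f, a, b):
    # Every sampled value must equal the first one.
    xs = sample_points(a, b)
    return all(f(x) == f(xs[0]) for x in xs)

def square(x):
    return x * x

print(is_increasing(square, 0, 3))     # True
print(is_decreasing(square, -3, 0))    # True
print(is_increasing(square, -1, 1))    # False: goes down, then up
print(is_constant(lambda x: 5, 0, 3))  # True
```

Checking only neighbouring samples is enough here, because if each step goes up then any later sample is higher than any earlier one.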

Connecting Increasing/Decreasing with Derivatives

By using derivatives, we can zoom in even further. So far, we’ve only defined increase/decrease/constant in terms of intervals of numbers. But what about points? Those are even smaller than intervals. Can we somehow extend these ideas to minuscule points?

Actually, yes. We can. The easiest place to begin is the idea of being constant at a point. In the mountain analogy, this is the peak of the mountain. Everything levels out when you reach such a point. If you think in terms of tangent lines, the tangent line is totally flat there. That just is what it looks like to be constant, or flat, at a singular point. Imagining a rounded-off top to a mountain or hill is a good visual aid here. So, in terms of derivatives, we have a definition for being constant at a point.

Definition: A function f(x) is constant at the point a if f^\prime(a) = 0.

We can move forward now and say that increasing on an interval means increasing at every point inside that interval. Makes sense. Every point on a red wall is also red. It’s analogous to that. More specifically, f(x) is increasing at x if x is inside some interval [a,b] (that is, a < x < b) on which the old definition tells us that f(x) is increasing. But now, look at the definition of the derivative,

f^\prime(x) = \lim\limits_{h \to 0} \dfrac{f(x+h) - f(x)}{h}.

Let’s take this new definition of increasing at a point and play with it. Let’s pick an interval [a,b] on which f(x) is increasing. Since the variable h is part of a limiting process, we can make it as small as we want. So, we can make it small enough that x+h is always inside of [a,b]. This means that the rules of increasing apply here. Since f(y) > f(x) if, and only if, y > x, f(x+h) > f(x) if, and only if, x+h > x. Simplified, f(x+h) > f(x) is equivalent to h > 0. In exactly the same way, f(y) < f(x) is equivalent to y < x, and so f(x+h) < f(x) is equivalent to h < 0.

This may seem a bit weird, but pause and think about what we’ve done. Notice that f(x+h) - f(x) must be positive whenever h > 0, and it must be negative whenever h < 0. In both situations, \dfrac{f(x+h) - f(x)}{h} must always be positive. In fact, these values always being positive is going to force the limit of them, which is just f^\prime(x), to definitely not be negative. So, f^\prime(x) \geq 0. But… we already have a definition for f^\prime(x) = 0. So we won’t use that. When we keep the rest, we get a new definition for increasing functions.

Definition: The function f(x) is increasing at the point a if f^\prime(a) > 0.

You might be able to guess the definition of decreasing at a point now. If not, take some time to think about it before reading on.

Definition: The function f(x) is decreasing at the point a if f^\prime(a) < 0.

This discussion has now linked our ideas about increase and decrease to this new calculus language. You might not think we’ve made all that much progress, at least not yet, but I think we have. Notice how much shorter our new definitions are than the old ones! I would consider that a win. They are also easier to read, assuming of course you know what a derivative is. But that isn’t exactly the point. The point is that by cleverly using the concept of a derivative, we made the idea of increasing and decreasing easier to put into a legitimate definition. And that is always a sign of mathematical progress. Simpler ways of saying the same thing are almost always better.
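To see how short the new definitions really are, here is a hedged Python sketch of them. It estimates f^\prime(a) with a central difference rather than an exact limit, so the step size h and the tolerance tol are assumptions I picked for illustration, and the helper names are my own.

```python
def derivative(f, a, h=1e-6):
    """Central-difference estimate of the derivative of f at a."""
    return (f(a + h) - f(a - h)) / (2 * h)

def behaviour_at(f, a, tol=1e-6):
    """Classify f at the point a by the sign of the estimated derivative."""
    slope = derivative(f, a)
    if slope > tol:
        return "increasing"
    if slope < -tol:
        return "decreasing"
    return "constant"  # estimated slope is (numerically) zero

def square(x):
    return x * x

print(behaviour_at(square, -1))  # decreasing
print(behaviour_at(square, 0))   # constant
print(behaviour_at(square, 2))   # increasing
```

The tolerance is there because a computed slope is almost never exactly zero; anything within tol of zero is treated as flat.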

With all of the previous discussion in mind, we can now move on to the main point of the article, which puts all these ideas together.

Finding Peaks and Valleys with Math

We can move forward now with the big idea we mentioned before with the letters U and V. I made a big, big claim that the letters U and V essentially describe every possible type of maximum or minimum value you could ever find. I will now explain this.

Theorem: Let f(x) be a function whose input can be any real number and whose outputs are real numbers (I sometimes use the shorthand f : \mathbb{R} \to \mathbb{R} for this, where \mathbb{R} is a stand-in symbol that represents all the real numbers). Suppose that f(x) has a relative maximum value at the point x = a. Then either f^\prime(a) = 0 or f^\prime(a) does not exist.

Proof: Our claim is that there are only two possibilities. Another way to say this is that, if we are not in one of the situations, we must be in the other (since no third option exists). Viewing the problem in this manner is a common approach to either-or proofs in mathematics, and this is the approach I use here. I will assume that f^\prime(a) definitely exists. Our goal, then, is to discover that f^\prime(a) = 0 has to be true.

We already know that f(x) has a relative maximum value at x = a. Let’s remember briefly the definition of the derivative of f at a.

f^\prime(a) = \lim\limits_{h \to 0} \dfrac{f(a+h) - f(a)}{h}.

Now, since we are taking a limit, we can assume that h is actually small enough that a+h is close enough to a so that f(a+h) \leq f(a). Then for every small enough value of h, we know that f(a+h) - f(a) \leq 0. Now, the number h itself can be either small and positive or small and negative. If it is small and positive, then \dfrac{f(a+h) - f(a)}{h} \leq 0. If it is small and negative, then \dfrac{f(a+h) - f(a)}{h} \geq 0.

This is a rather curious fact. Remember that the derivative value f^\prime(a) definitely exists. Since the limit must exist, the approaches from negative and positive values of h cannot contradict each other, since if they did, the derivative just wouldn’t exist. This means that both of the inequalities we just figured out have to be true about f^\prime(a): approaching with positive h forces f^\prime(a) \leq 0, and approaching with negative h forces f^\prime(a) \geq 0. That is, we have figured out that

0 \leq f^\prime(a) \leq 0.

It should strike us pretty quickly that the only number between zero and zero is zero. Therefore, f^\prime(a) = 0. This is exactly what I set out to discover, and so we are done proving what we set out to prove.
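The squeeze in this proof can be watched happening with actual numbers. The Python sketch below uses f(x) = 4x - x^2, a hypothetical example of mine with a relative maximum at a = 2, and computes the fraction from the definition of the derivative for small positive and small negative h.

```python
def f(x):
    # Hypothetical example with a relative maximum at a = 2.
    return 4 * x - x * x

def difference_quotient(a, h):
    """The fraction (f(a + h) - f(a)) / h from the derivative definition."""
    return (f(a + h) - f(a)) / h

a = 2.0
steps = (0.1, 0.01, 0.001)
from_right = [difference_quotient(a, h) for h in steps]   # h > 0
from_left = [difference_quotient(a, -h) for h in steps]   # h < 0

print(from_right)  # all <= 0, shrinking toward 0
print(from_left)   # all >= 0, shrinking toward 0
```

The fractions from the right are never positive, the ones from the left are never negative, and both lists close in on zero, which is the only value left for f^\prime(2).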

How to Find Relative Max and Min Values

We can make use of this super-important theorem to actually locate these special peaks and valleys in graphs. And, by using the increasing-decreasing ideas, we can pick out which of the located special points are peaks, which are valleys, and which are masquerading.

Here is the idea. Our big theorem tells us that the special points we are looking for – relative maximum and minimum points – always have either a zero derivative or a non-existent derivative. So, if we have a function f(x), we can locate all of these points by finding all solutions to f^\prime(x) = 0 and all points where the expression f^\prime(x) doesn’t make any sense but f(x) does make sense. We can then list all of these out, as there are usually only a handful of these special points.
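As a concrete case of this first test, take a cubic f(x) = ax^3 + bx^2 + cx + d. Its derivative is the quadratic 3ax^2 + 2bx + c, so the solutions of f^\prime(x) = 0 come straight from the quadratic formula. The Python sketch below does only that; the function name is my own, and since polynomials always have derivatives there are no “derivative does not exist” points to hunt for in this case.

```python
import math

def cubic_critical_points(a, b, c, d):
    """Solve f'(x) = 0 for f(x) = a x^3 + b x^2 + c x + d, with a != 0."""
    # f'(x) = 3a x^2 + 2b x + c; the constant d does not affect the derivative.
    A, B, C = 3 * a, 2 * b, c
    disc = B * B - 4 * A * C
    if disc < 0:
        return []                 # f'(x) is never zero
    if disc == 0:
        return [-B / (2 * A)]     # one repeated solution
    root = math.sqrt(disc)
    return sorted([(-B - root) / (2 * A), (-B + root) / (2 * A)])

print(cubic_critical_points(1, 0, -3, 0))  # f(x) = x^3 - 3x  ->  [-1.0, 1.0]
print(cubic_critical_points(1, 0, 0, 0))   # f(x) = x^3       ->  [0.0]
```

This only produces the short list of candidates. Deciding which candidates are peaks, valleys, or neither is a separate step.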

Just by the last paragraph, we now have a short list of possible places where we might find maximum or minimum values. But how do we tell which are which? This is where the increasing/decreasing ideas come into play. Let’s start with minimum values – which have the shape of either U or V. Notice that to the left of the minimum point the curve or line slopes down, and to the right of the minimum point it slopes up. In our discussion earlier, we already pointed out that down-sloping means decreasing, and up-sloping means increasing. We also pointed out that decreasing means f^\prime(x) < 0 and increasing means f^\prime(x) > 0. What does this mean then? Well, if a is one of our special points, and if f^\prime(x) < 0 to the left of a and f^\prime(x) > 0 to the right of a, then the graph of f(x) must be shaped either like a U or a V at the special point a. This means that f(x) definitely has a minimum point at a! In exactly the same way, if the graph is increasing on the left side of a special point and decreasing on the right side, that must mean our graph is either an inverted-U or an inverted-V, which tells us that we have found a maximum point there!

But what if neither of those is true? What if the graph is increasing on both sides of the special point? Well, then, we’ve found a point masquerading as a max or min point, even though it isn’t either. It passed the first test by having a derivative that was 0 or didn’t exist, but it didn’t pass the second test. A good example here would be the letter D. The very right-most part of the letter D has a derivative that does not exist (to be clear about this part, since infinity isn’t a number, a line that goes straight up is said to not have a derivative). So, this point on D passes the first test. But if you zoom in near that point, it definitely is never the highest or lowest point in that zoomed-in window. So it can’t be a relative maximum or minimum point. And so it fails the second test. If you graph the function y = x^3 on a calculator or on the internet, you’ll see the point x = 0 has a zero slope (passes the first test) but isn’t a max or min (and so fails the second test). Again, this will be because the graph is going up on both sides of the key point (that is, f^\prime(x) > 0 on both sides).
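The second test is easy to try out on a computer. The following Python sketch is a rough version of it: it estimates the slope a little to the left and a little to the right of a special point and compares the signs. The names slope and classify, and the step sizes, are assumptions of mine, and the sketch quietly assumes there is no other special point within step of a.

```python
def slope(f, x, h=1e-6):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def classify(f, a, step=1e-3):
    """Second test: compare the sign of f' just left and just right of a."""
    left = slope(f, a - step)
    right = slope(f, a + step)
    if left < 0 < right:
        return "relative minimum"   # down then up: U or V shape
    if left > 0 > right:
        return "relative maximum"   # up then down: inverted U or V
    return "neither"                # same direction on both sides

print(classify(lambda x: x**3 - 3*x, -1))  # relative maximum
print(classify(lambda x: x**3 - 3*x, 1))   # relative minimum
print(classify(lambda x: x**3, 0))         # neither
print(classify(abs, 0))                    # relative minimum (V shape)
```

Notice that the test never looks at the special point itself, only at its two sides, which is why it works just as well at the corner of a V where the derivative does not exist.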

Conclusion

This is one of the most immediately useful things about calculus. It has enabled us to maximize and minimize basically anything we want. The real-world implications here are pretty obvious. But to really lay out how you actually pull this sort of thing off, the next post will go through some examples of how to actually maximize and minimize things when someone hands you an equation.

Homework Problems

Problem 1: The big theorem I proved assumes that the function f(x) can take on any input we want. Think about what would happen if we constrain the acceptable input values. In particular, notice that a graph can’t have tangent lines at endpoints (because all endpoints are ‘sharp’ and derivatives are designed to find locations that are not ‘sharp’). Can a graph have its absolute maximum value at an endpoint?

Problem 2: In the appendix, I go through the method of how to find the location of the peak or valley in a parabola. Convince yourself of a method for figuring out whether the point you find is going to be a maximum or minimum. Then, convince yourself that this is not just a relative maximum or minimum, but is an absolute maximum or minimum.

Problem 3: Find all the relative maximum and minimum points of the graph of f(x) = x^3 - x + 2 without using a graph. Then, check your answer by using a graph.

Problem 4: I claimed towards the end of the article that y = x^3 has a point that pretends to be a max/min point but actually isn’t. Use the second test (looking at the increasing/decreasing behavior of the graph through the sign of \dfrac{dy}{dx}) to understand why y = x^3 doesn’t have any max or min points. What is the special property of the graph that prevents it from having max and min points? Can you come up with other graphs that have this same special property?

Challenge Problem: For those looking for a challenge, find similarities between the two tests for max/min points from this post and the Intermediate Value Theorem that was used in Explaining Calculus #4 to prove that equations have solutions.

Appendix

What I want to do here is show how you can use this idea to find the x-value of the peak/valley point on any parabola.

The equation of a parabola is always given by a quadratic equation f(x) = ax^2 + bx + c for some constant numbers a,b,c. We’ve already mentioned that polynomials always have derivatives, and so in the “big theorem” the case where f^\prime(x) does not exist cannot rear its head here. So, to find the peak/valley of the graph, the only thing we need to do is solve the equation f^\prime(x) = 0.

To do this, let’s first find the derivative of f(x). Using the standard method for polynomials we discovered earlier, we can quickly discover that

f^\prime(x) = a \dfrac{d}{dx}[x^2] + b \dfrac{d}{dx}[x] + \dfrac{d}{dx}[c] = a \cdot (2x) + b \cdot (1) + 0 = 2ax + b.

Then to solve the equation f^\prime(x) = 0, we solve 2ax + b = 0. Subtracting b on both sides, 2ax = -b. Dividing both sides by 2a (which is allowed because a is not zero for a parabola), we get x = \dfrac{-b}{2a}. Therefore, if we want to find the peak/valley of some parabola, we can just check that value of x.
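The formula is short enough to turn into a one-line function. Here is a minimal Python sketch; the function name and the example parabola f(x) = 2x^2 - 8x + 1 are made up for illustration.

```python
def parabola_vertex_x(a, b):
    """x-value where f(x) = a x^2 + b x + c has its peak or valley (a != 0)."""
    return -b / (2 * a)

# Example: f(x) = 2x^2 - 8x + 1, so a = 2 and b = -8.
x_star = parabola_vertex_x(2, -8)
print(x_star)  # 2.0

# Sanity check: the derivative 2ax + b should be zero at that x-value.
print(2 * 2 * x_star + (-8))  # 0.0
```

Notice that c never appears in the function: shifting a parabola up or down does not move its peak or valley left or right.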
