It’s fall, and it’s officially leaf-raking season. As I engaged in this tedious task, I realized it is basically one big optimization problem.
When raking my leaves, I made four piles: one on either side of the tree, one near the front, and one on the far side of the yard to catch the sparsely distributed leaves that the wind had caught.
As I worked, I began to wonder: how far from optimal was this? Would fewer piles be faster because I’d bag everything at once? Or more piles, so I wouldn’t have to rake as far? Maybe a single back-to-front pass would minimize total time?
Desperate to minimize the time I spent raking, my brain stopped raking and started modeling.
Optimization is a powerful yet often overlooked tool in data science circles now dominated by machine learning. Don’t get me wrong, machine learning is great, but sometimes I think it can be easy to miss the efficient solution (optimization algorithms) and jump right to the fun one (ML techniques).
Hopefully, after reading this, you’ll think that optimization can be fun, too. I know that when I first learned it, I started to see it everywhere. It literally changed the way I perceived the world around me.
In this article, I’ll show how a simple yard chore can be modeled as a linear programming problem. Along the way, we’ll:
- define the objective, decision variables, and constraints;
- estimate the parameters that describe the yard, the leaves, and the raker;
- formulate and solve the model as an integer linear program in Python;
- compare the optimized plan against simple heuristics.
In defining any optimization problem, there are a few key elements:
Objective function: Determining the objective function is simply determining the goal of the problem. The objective function is always designed to maximize or minimize some value. In this problem, we are minimizing the time spent raking and bagging leaves.
Decision Variables: The decision variables are the parts of the problem that you can control. For our leaf raking problem, the decision variables are the number of piles that the raker decides to make, where those piles are located, and which leaves are raked into each of those piles.
Constraints: Some factors are out of our control. When we determine constraints, we are asking, “What factors are beyond my control?” or “What are the rules?” Usually, there are a lot of these. When it comes to raking leaves, there are rules we can’t change if the job is to be done well. Every leaf needs to be raked into a pile, and leaves can only be raked to a location where a pile has already been started. We also can’t open an unlimited number of piles (maybe we could, but that defeats the purpose of consolidating leaves into piles before bagging). Lastly, we are constrained by how many leaves we can fit into each bag, since bags have a limited capacity.
And just like that, we have the basic elements of a linear program.
These elements are present in every linear program formulation, but it is also useful to define sets and parameters at this stage, which we will do in the next sections.
Because the model employs binary variables to select open piles and integer variables to represent the number of bags (with linear constraints and costs), we formulate it as an integer linear program (ILP). If the assignment of one cell to one pile is relaxed so that a cell may be split between several piles, it becomes a mixed integer linear program (MILP).
At first I thought I’d just need my raking speed and bagging time.
However, that quickly expanded into a short list, including raking speed, bagging time, leaf distribution, wind direction, and yard slope. I also needed a way to represent each location in the yard.
If I were to calibrate this, I’d rake a 100-ft section, mark every 10 ft, and record split times to estimate how raking speed changes with density and distance. My hunch is that as I rake leaves a farther distance, I slow down. Perhaps I can rake a pound of leaves one foot in less than half the time that I can rake a pound of leaves two feet.
Likewise, I could time bagging for piles of different sizes: ¼ bag, ½ bag, ¾ bag, and a full bag-sized pile. Then, I could fit a function to show how bagging time grows with leaf volume. I suspect this is roughly linear.
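As a sketch of what that calibration might look like, the snippet below fits a power law to raking split times and a line to bagging times. Every number here is invented for illustration, and the names (alpha, beta, b0, b1) simply anticipate the coefficients used in the model later:

```python
import numpy as np

# Hypothetical calibration data; none of these numbers are real measurements.
dist_ft = np.array([10.0, 20.0, 30.0, 40.0])  # distance a pound of leaves is raked (ft)
rake_s = np.array([12.0, 30.0, 52.0, 80.0])   # seconds per pound at that distance

# Fit t = alpha * d**beta by fitting a line in log-log space
beta, log_alpha = np.polyfit(np.log(dist_ft), np.log(rake_s), 1)
alpha = np.exp(log_alpha)

bag_frac = np.array([0.25, 0.5, 0.75, 1.0])   # pile size in bag-equivalents
bag_s = np.array([35.0, 60.0, 90.0, 115.0])   # seconds to bag the whole pile

# Roughly linear: intercept ~ per-bag setup time b0, slope ~ per-volume rate b1
b1, b0 = np.polyfit(bag_frac, bag_s, 1)
```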
I didn’t actually time myself doing these tasks. Instead, I made rough estimates. The goal isn’t to achieve perfect numbers, but to demonstrate the structure and approach of linear programming.
All data in this article is simulated or estimated from my own yard. No external datasets involved.
To account for leaf density per square foot and which leaves to rake (or assign) to which pile, we discretize the yard into a grid of cells.
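As a minimal sketch (using the yard dimensions from Table 1 below), the grid of cell centers might be built like this:

```python
import numpy as np

# 60 ft x 40 ft yard, 1 ft x 1 ft cells (Table 1); each cell is indexed by its center
L_yd, W_yd, s = 60.0, 40.0, 1.0
xs = np.arange(s / 2, L_yd, s)  # cell-center x coordinates
ys = np.arange(s / 2, W_yd, s)  # cell-center y coordinates
cells = np.array([(x, y) for x in xs for y in ys])  # shape (2400, 2)
```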
Sets:
- I: the set of yard grid cells, indexed by i (each cell holds some mass of leaves)
- J: the set of candidate pile locations, indexed by j
We also define other parameters, such as yard length, yard width, the size of a side of the grid square, the center of the tree trunk (from which the leaf distribution is derived), the radius of the tree trunk, the wind direction, ellipticity of the leaf distribution, the spread of the leaves, the base density of the leaves, an accumulation parameter, and decay power parameter. All of these factors contribute to determining where the leaves are set at the very beginning of the problem and can be observed with their default values in Table 1.
Table 1 – Parameters and their estimated values
| Parameter | Description | Value |
|---|---|---|
| L | Yard length | 60 ft |
| W | Yard width | 40 ft |
| s | Grid cell size | 1.0 ft |
| tree_center | Tree center coordinates (x, y) | (15, 20) ft |
| r_trunk | Tree trunk radius | 1.5 ft |
| φ | Wind direction | 90° |
| axis_ratio | Ellipticity of leaf distribution | 1.5 |
| σ | Spread (std. dev.) of leaf density | 10 |
| ρ₀ | Base leaf density | 0.03 |
| A_amp | Wind accumulation amplitude | 0.28 |
| p_pow | Decay power in leaf density | 2.0 |
From these parameters and the leaf-density model, we obtain the mass of leaves mᵢ in each cell i and the distance dᵢⱼ from each cell i to each candidate pile location j.
These are computed numerically (in code) and treated as given constants in the optimization model.
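A rough sketch of that numerical step follows; the density function here is a simplified stand-in (a Gaussian blob around the tree, with no ellipticity or wind skew), and `pile_sites` is an assumed name for the candidate pile coordinates. The exact form lives in the repository:

```python
import numpy as np

tree = np.array([15.0, 20.0])  # tree center from Table 1

def leaf_density(x, y, sigma=10.0, rho0=0.03):
    # Simplified stand-in for the real wind-skewed elliptical density model
    r2 = (x - tree[0]) ** 2 + (y - tree[1]) ** 2
    return rho0 * np.exp(-r2 / (2 * sigma**2))

# m_i: pounds of leaves per cell = density * cell area s**2
masses = np.array([leaf_density(x, y) * s**2 for (x, y) in cells])

# Candidate pile sites (here, an arbitrary subsample of cells for illustration)
pile_sites = cells[::100]

# d_ij: distance from each cell center to each candidate pile site
dist = np.linalg.norm(cells[:, None, :] - pile_sites[None, :, :], axis=2)
```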
We also define the derived pile mass mⱼ as the total mass of leaves assigned to pile j:
mⱼ = ∑ᵢ mᵢ xᵢⱼ
Now, we have all of the parameters that we need to create the representation of leaves distributed across a yard, and the parameters defined that govern the mechanics of how those leaves can be raked and bagged. Let’s move on to our decision variables: raking (assigning) leaves to piles, opening piles, and the number of bags used.
Assignment variables
To determine which leaves will be raked to which piles, we define binary assignment variables:
xᵢⱼ ∈ {0, 1}
for all i ∈ I, j ∈ J: xᵢⱼ = 1 if cell i is raked to pile j, else 0.
Pile-open variables
To determine where to rake leaves to, we define a pile-opening decision variable:
yⱼ ∈ {0, 1}
for all j ∈ J: yⱼ = 1 if pile j is opened (used), else 0.
Bag-count variables
Finally, to ensure that we have enough bags to hold the leaves at each pile, we have a bag count variable for each pile, defined as:
nⱼ ∈ ℕ₀
for all j ∈ J: integer number of bags used at pile j.
Every cell’s leaves must be fully assigned to exactly one pile (via the xᵢⱼ’s), and bag counts nⱼ must be sufficient to hold the mass assigned to each open pile.
Next, we declare the variable domains explicitly. This is set notation (also used above), but don’t let it intimidate you. All it says is that each grid cell’s leaves are either assigned to pile j or not, each pile is either opened or not, and the number of bags used at pile j must be a nonnegative integer.
Remember, i represents the leaves in the yard and j represents the location of the piles.
xᵢⱼ ∈ {0, 1}, for all i ∈ I, j ∈ J
yⱼ ∈ {0, 1}, for all j ∈ J
nⱼ ∈ ℕ₀, for all j ∈ J
In Python, we specify these as:
```python
import pulp

# Decision variables (n = number of cells, m = number of candidate pile locations)
x = [[pulp.LpVariable(f"x_{i}_{j}", cat="Binary") for j in range(m)] for i in range(n)]  # x_ij: cell i -> pile j
y = [pulp.LpVariable(f"y_{j}", cat="Binary") for j in range(m)]  # y_j: pile j opened
B = [pulp.LpVariable(f"B_{j}", lowBound=0, cat="Integer") for j in range(m)]  # n_j (named B in code)
```
Raking time depends on mass and distance; bagging time depends on pile mass and number of bags.
We minimize total time:
minimize over x, y, n:
∑ᵢ∈ᴵ ∑ⱼ∈ᴶ xᵢⱼ · α · mᵢ · dᵢⱼ^β + ∑ⱼ∈ᴶ ( b₀ · nⱼ + b₁ · mⱼ )
The first term approximates raking effort: mass × distance^β, scaled by α. The second term adds bag setup time: b₀ times the number of bags nⱼ and bagging time proportional to pile mass mⱼ via b₁.
In code, this is implemented as:
∑ᵢ,ⱼ xᵢⱼ ( α mᵢ dᵢⱼ^β + b₁ mᵢ ) + b₀ ∑ⱼ nⱼ,
which is algebraically identical, since
∑ᵢ,ⱼ xᵢⱼ b₁ mᵢ = b₁ ∑ⱼ ∑ᵢ mᵢ xᵢⱼ = b₁ ∑ⱼ mⱼ.
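A sketch of how this grouped form might be written with PuLP (variable names match the earlier code; alpha, beta, b0, b1, masses, and dist are the calibrated and derived constants from above):

```python
prob = pulp.LpProblem("leaf_raking", pulp.LpMinimize)

# Raking time plus per-mass bagging time, grouped per assignment,
# plus per-bag setup time; note dist[i][j] ** beta is data, so this stays linear
prob += (
    pulp.lpSum(
        x[i][j] * (alpha * masses[i] * dist[i][j] ** beta + b1 * masses[i])
        for i in range(n)
        for j in range(m)
    )
    + b0 * pulp.lpSum(B[j] for j in range(m))
)
```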
Now recall the constraints we discussed earlier. All we are doing here is taking those same constraints and defining them with formal math so that we can formulate and solve the full problem.
(C1) Assignment
Each cell’s leaves must be assigned to exactly one pile:
∑ⱼ∈ᴶ xᵢⱼ = 1, for all i ∈ I.
```python
# (C1) Every cell's leaves go to exactly one pile
for i in range(n):
    prob += pulp.lpSum(x[i][j] for j in range(m)) == 1
```
(C2) Linking: use a pile only if it is opened
The leaves in a cell cannot be assigned to a location without an opened pile. We define this constraint as:
xᵢⱼ ≤ yⱼ, for all i ∈ I, j ∈ J.
```python
# (C2) A cell may only be assigned to a pile that has been opened
for i in range(n):
    for j in range(m):
        prob += x[i][j] <= y[j]
```
(C3) Bag capacity
Total mass assigned to pile j must not exceed the capacity of its bags:
∑ᵢ∈ᴵ mᵢ xᵢⱼ = mⱼ ≤ C nⱼ, for all j ∈ J.
```python
# (C3) Bag capacity; B[j] plays the role of n_j from the math
for j in range(m):
    prob += pulp.lpSum(masses[i] * x[i][j] for i in range(n)) <= bag_capacity_lb * B[j]
```
(C4) Maximum number of piles
We limit the total number of piles that can be opened:
∑ⱼ∈ᴶ yⱼ ≤ Kₘₐₓ.
```python
# (C4) Open at most K_max piles
prob += pulp.lpSum(y[j] for j in range(m)) <= K_max
```
Putting it all together, we get:
minimize over x, y, n:
∑ᵢ∈ᴵ ∑ⱼ∈ᴶ xᵢⱼ · α · mᵢ · dᵢⱼ^β + ∑ⱼ∈ᴶ ( b₀ · nⱼ + b₁ · mⱼ )
subject to:
∑ⱼ∈ᴶ xᵢⱼ = 1, for all i ∈ I.
xᵢⱼ ≤ yⱼ, for all i ∈ I, j ∈ J.
∑ᵢ∈ᴵ mᵢ xᵢⱼ = mⱼ ≤ C nⱼ, for all j ∈ J.
∑ⱼ∈ᴶ yⱼ ≤ Kₘₐₓ.
This completes the model specification.
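With the model assembled, solving it might look like the sketch below. The gapRel option tells PuLP’s bundled CBC solver to stop once it is within 5% of the lower bound, which is how I capped the run mentioned later:

```python
# Solve with CBC, stopping at a 5% relative optimality gap
solver = pulp.PULP_CBC_CMD(msg=True, gapRel=0.05)
prob.solve(solver)

print("Status:", pulp.LpStatus[prob.status])
print("Total time (s):", pulp.value(prob.objective))
print("Piles opened:", sum(int(v.value()) for v in y))
```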
For the full implementation (calibration, solver setup, and plots) see the project repository: Leaf-Raking Optimization Code.
A heuristic is a “rule of thumb” approach to solving a problem. When most people rake their yards, they use a simple heuristic, whether they know it or not.
To check the value added by optimization, I compared the ILP solution against three simple heuristics, among them a front-sweep pass and the “micro-piles” rule sketched below.
Each heuristic returns a set of pile centers based on simple rules. Once these piles are made, we assume that the raker will rake each cell of leaves to the nearest pile. Note that under the optimization, the raker can rake leaves to a pile even if that pile is not the closest.
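That nearest-pile rule is easy to state in code. A minimal sketch, assuming cells and centers are (N, 2) and (K, 2) coordinate arrays:

```python
import numpy as np

# Distance from every cell to every heuristic pile center
d = np.linalg.norm(cells[:, None, :] - centers[None, :, :], axis=2)

# Each cell's leaves go to its nearest pile (unlike the ILP, which may do better)
assignment = d.argmin(axis=1)
```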
Formulating the problem as a linear program prior to creating formal heuristics is essential to ensure that the values returned by heuristics are feasible and align with the optimization objective.
Even when solving an optimization problem to optimality is impractical, formulating one is often very helpful.
Below is an example snippet showing how the “micro-piles” heuristic initializes and refines centers based on leaf density:
Example: Micro-piles heuristic
```python
import numpy as np

def centers_micro(cells, masses, bag_capacity_lb, seed=42):
    rng = np.random.default_rng(seed)
    M_total = masses.sum()
    # Aim for roughly one pile per two bags' worth of leaves
    k = max(1, int(round(M_total / (2 * bag_capacity_lb))))
    # Sample initial centers with probability proportional to leaf mass
    probs = masses / (M_total + 1e-12)
    centers = cells[rng.choice(len(masses), size=k, replace=False, p=probs)]
    # Iteratively split dense clusters
    for _ in range(8):
        ...
    return centers
```
Full implementations for all heuristics are available in the repository under src/core/centers.py and src/core/front_sweep.py.
Figure 1: Total time taken to rake the yard by each strategy

The solution found by the optimization identifies 5 piles that are mostly centered around the tree. Its improvement over simple heuristics is notable but not dramatic.
Notice that the optimality gap between the micro-piles method and the optimized plan is not massive. This illustrates the power of heuristic methods, especially when they can be benchmarked against an optimal solution.
Figure 2: Heatmap of heuristic vs optimal raking (image by author). Rendered GIF visible on GitHub.

In many situations, solving the full linear program to optimality is too computationally expensive, so we must use a heuristic that is “close enough” to optimal. Even for a simple problem like this, I had to cut the solver off once it reached an optimality gap of 5% from the lower bound. In activities as commonplace and trivial as raking leaves, we use heuristics all the time. Perhaps we lose around 2.5 minutes by raking suboptimally, but we save hours by not having to formulate and code a linear program.
In many other applications, however, a few hours (or days) of math and code can save several weeks, or millions of dollars. Think, for example, of the time and money saved by improving the routing of planes to various airports around the world, or trucks around the country.
These types of problems are all around us, even if we do not solve them with explicit optimization methods. Optimization can serve as an effective methodology for translating everyday problems into a robust decision-making framework.
To summarize the process, you must:
1. Identify the discrete and continuous decisions you control.
2. Determine the parameters of the problem.
3. Define the objective (what you minimize or maximize) clearly.
4. State the constraints explicitly (capacity, assignment, etc.).
5. Formulate and solve the problem.
Once you internalize this process, you’ll spot optimization opportunities everywhere.
[1] Leaf-raking optimization code and data simulation (2025). GitHub repository: https://github.com/Josiah-DeValois/Optimize-Leaf-Raking
If you enjoyed this, I write about analytical reasoning, decision science, optimization, and data science.