TL;DR: Profiling a Terraform provider revealed a performance bottleneck caused by repeated d.Get("task") calls inside a DiffSuppressFunc. The fix, implemented in PR #44543, introduces a singleTask flag to avoid repeated expensive calls, reducing plan time from ~15 minutes to seconds.
A colleague told me she had issues deploying aws_appflow_flow, a Terraform resource for AWS. Every time she ran a plan, Terraform indicated that all task blocks were being modified, even though nothing had changed. She shared a truncated Terraform plan, but running a full plan would take ~15 minutes, which was clearly unusual.
I tried to replicate the problem with a minimal resource. Even without a pre-existing resource, terraform plan was extremely slow when there were hundreds of tasks.
Since Terraform runs providers in separate processes, we can confirm the provider is causing the slowdown:
```
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1687211 matthew   20   0 2625140 338656 219444 S 113.3   5.6   1:02.02 /home/matthew/.terraform.d/plugins/terraform-provider-aws
```
A standard provider resource has four main functions: create, read, update, and delete.
During a plain plan without a pre-existing resource, none of these should have been called, yet the plan was still extremely slow.
d.Get() in DiffSuppressFunc

Looking at the resource:
```golang
"source_fields": {
	Type:     schema.TypeList,
	Optional: true,
	Computed: true,
	Elem: &schema.Schema{
		Type:         schema.TypeString,
		ValidateFunc: validation.StringLenBetween(0, 2048),
	},
	DiffSuppressFunc: func(k, oldValue, newValue string, d *schema.ResourceData) bool {
		// Every invocation re-reads the full task set.
		if v, ok := d.Get("task").(*schema.Set); ok && v.Len() == 1 {
			if tl, ok := v.List()[0].(map[string]any); ok && len(tl) > 0 {
				if sf, ok := tl["source_fields"].([]any); ok && len(sf) == 1 {
					if sf[0] == "" {
						return oldValue == "0" && newValue == "1"
					}
				}
			}
		}
		return false
	},
},
```
Each call to d.Get("task") reconstructs the entire task set. With hundreds of tasks, calling this repeatedly inside DiffSuppressFunc creates an O(n²) performance problem.
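To make the quadratic growth concrete, here is a minimal sketch in plain Go. It does not use the Terraform SDK: `resourceData`, `get`, `suppressPerField`, and `planWork` are hypothetical names, and the toy accessor only assumes that each read rebuilds the whole collection, as described above. It counts how many elements get rebuilt when a per-task callback re-reads every task.

```go
package main

import "fmt"

// resourceData is a toy stand-in for the provider's state reader, not the
// SDK's schema.ResourceData. All names here are hypothetical.
type resourceData struct {
	tasks   []string
	rebuilt int // elements copied across all get calls
}

// get mimics an accessor that rebuilds the whole collection on every call.
func (d *resourceData) get() []string {
	out := make([]string, len(d.tasks))
	copy(out, d.tasks)
	d.rebuilt += len(d.tasks)
	return out
}

// suppressPerField models a per-field callback that re-reads every task.
func suppressPerField(d *resourceData) bool {
	return len(d.get()) == 1
}

// planWork runs the callback once per task and returns the number of
// elements rebuilt in total.
func planWork(n int) int {
	d := &resourceData{tasks: make([]string, n)}
	for range d.tasks {
		suppressPerField(d)
	}
	return d.rebuilt
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("tasks=%d elements rebuilt=%d\n", n, planWork(n))
	}
}
```

In this model the rebuilt count is n × n, so going from 100 to 1000 tasks multiplies the work by 100 rather than 10.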
I added a small HTTP server for Go’s pprof:
```diff
+import (
+	"net/http"
+	_ "net/http/pprof"
+)
+
+go func() {
+	log.Println(http.ListenAndServe("0.0.0.0:6060", nil))
+}()
```
Running:
```
go tool pprof -http=:8081 "http://localhost:6060/debug/pprof/profile?seconds=30"
```
confirmed that d.Get() inside DiffSuppressFunc was dominating CPU usage.
singleTask flag (PR #44543)

The PR introduced a simple flag to ensure the expensive operation only happens once per plan evaluation.
Before: repeated d.Get() for each task
```golang
DiffSuppressFunc: func(k, oldValue, newValue string, d *schema.ResourceData) bool {
	if v, ok := d.Get("task").(*schema.Set); ok && v.Len() == 1 {
		if tl, ok := v.List()[0].(map[string]any); ok && len(tl) > 0 {
			if sf, ok := tl["source_fields"].([]any); ok && len(sf) == 1 {
				if sf[0] == "" {
					return oldValue == "0" && newValue == "1"
				}
			}
		}
	}
	return false
}
```
After: use the singleTask flag
I added a flag that is calculated once per resource; the field's diff suppression then simply references it:

```golang
DiffSuppressFunc: func(k, oldValue, newValue string, d *schema.ResourceData) bool {
	// Cheap lookup of the precomputed flag instead of re-reading every task.
	if !d.Get("single_task_flag").(bool) {
		return false
	}

	return oldValue == "0" && newValue == "1"
}
```
With this change, execution time dropped dramatically from ~15 minutes to seconds for hundreds of tasks.
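The compute-once idea can be sketched outside the SDK as well. The following is an illustrative toy in plain Go, not the code from the PR: `resourceData`, `cacheSingleTask`, `suppress`, and `planWork` are hypothetical names. The expensive read happens once, and the per-field callback only consults the cached boolean.

```go
package main

import "fmt"

// resourceData is a toy stand-in for the provider's state reader, not the
// SDK's schema.ResourceData. All names here are hypothetical.
type resourceData struct {
	tasks      []string
	rebuilt    int  // elements copied across all get calls
	singleTask bool // cached answer to "is there exactly one task?"
}

// get mimics an accessor that rebuilds the whole collection on every call.
func (d *resourceData) get() []string {
	out := make([]string, len(d.tasks))
	copy(out, d.tasks)
	d.rebuilt += len(d.tasks)
	return out
}

// cacheSingleTask performs the expensive read once and stores the result.
func (d *resourceData) cacheSingleTask() {
	d.singleTask = len(d.get()) == 1
}

// suppress is the per-field callback; it only consults the cached flag.
func suppress(d *resourceData, oldValue, newValue string) bool {
	if !d.singleTask {
		return false
	}
	return oldValue == "0" && newValue == "1"
}

// planWork caches the flag once, runs the callback per task, and returns
// the number of elements rebuilt in total.
func planWork(n int) int {
	d := &resourceData{tasks: make([]string, n)}
	d.cacheSingleTask()
	for range d.tasks {
		suppress(d, "0", "1")
	}
	return d.rebuilt
}

func main() {
	for _, n := range []int{10, 100, 1000} {
		fmt.Printf("tasks=%d elements rebuilt=%d\n", n, planWork(n))
	}
}
```

In this model the rebuilt count grows linearly with the number of tasks, because the collection is read only once no matter how many times the callback runs.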
- Avoid repeated d.Get() calls for large nested resources.
- pprof is simple and powerful.
- Slowdowns can show up even during a plan with no resource changes.
- A simple flag (singleTask) to cache or guard expensive operations can make a huge difference.

Go makes it easy to profile and optimize Terraform providers. With minimal setup, you can identify expensive calls, implement small optimizations, and see massive performance improvements, as demonstrated in PR #44543.
For further reading: