📄Python Notebook
🍯Introduction
🔍Example ABC Agent Search Progress
⏳Agent Lifecycle in Swarm Optimization
🐝The 3 Bee Agent Roles
🪻Iris Dataset
❄ Clustering – No labels? No problem!
🏋️Fitness Model for Clustering
🤔Confusion Matrix as a Diagnostic Tool
🏃Running the Agentic AI Loop
📊Reporting Results
💬Designing Agent Prompts for Gemini
⚠️Gemini Agentic AI Issues
⚔️Agentic AI Competitive Landscape towards 2026
✨Conclusion and Future Work
Explore my interactive notebook on Google Colab — and feel free to connect with me on LinkedIn for any questions or feedback.
With the incredible innovation going on around Agentic AI, I wanted to get hands‑on with a project that integrates LLM prompts into a Data Science workflow. The Artificial Bee Colony (ABC) algorithm is inspired by honey bees’ foraging behavior, a strategy that works remarkably well in nature. It belongs to the family of swarm intelligence algorithms, designed for decentralized decision‑making processes whereby “bee agents” pursue their individual goals autonomously, while collectively improving the quality of the overall solution (the “honeypot”).
This popular technique has been widely applied to many fields, in particular: scheduling, routing, energy optimization, resource allocation and anomaly detection. Researchers often combine ABC with neural networks in a hybrid approach, for example, using ABC to tune hyperparameters or optimize model weights. The algorithm is particularly relevant when data is scarce or when the problem is combinatorial – when the solution space grows exponentially (or even factorially) with the number of features.
In this project, my approach has been to mimic Swarm Optimization for an Adaptive Grid Search. The creative twist is that I applied Google’s new Agentic AI tools to implement the bee agents. In the ABC algorithm, there are three types of autonomous bee agents, and I defined their roles using text prompts powered by the latest Gemini LLMs.
Each foraging cycle (algorithm iteration) proceeds as follows:

Source: Author
The ABC algorithm was first proposed by Derviş Karaboğa in 2005. In my modernized meta‑heuristic adaptation, I focused on the goal of improving clustering performance for an unsupervised dataset.
Below are the Python classes I implemented:
The agents determine parameter values and ranges through natural language prompts provided to the Gemini generative AI model. All three agents inherit from the BeeAgent base class, which handles shared setup and candidate tracking. Part of each prompt is informed by the WebResearcher, which summarizes scikit-learn clustering algorithms and their key hyperparameters to ensure accuracy and relevance. Here’s how each agent works:
In essence, the Python code defines the task goal, parameters, constraints, and return values as text within the prompts. The generative AI model (Gemini) then “reads” and “understands” these instructions to produce or modify the actual numerical and categorical parameter values for the clustering algorithms. Different LLMs may respond differently to subtle changes in the input text, so it is important to experiment with the wording of prompts for the three agent classes. To refine the wording further, you can always consult your preferred LLM.
A natural choice for this study is Sir Ronald Fisher’s classic Iris flower dataset, introduced in his 1936 paper. In the subsequent sections, this dataset is utilized as a small, well‑defined demonstration case to illustrate how the proposed ABC optimization method can be applied within the context of a clustering problem.
The Iris dataset (License : CC0 1.0) comprises 150 labeled samples, each belonging to one of 3 Iris classes: Iris Setosa, Iris Versicolor, Iris Virginica. Each flower sample is associated with 4 numeric features: Sepal length, Sepal width, Petal length, Petal width.
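For reference, the dataset ships with scikit-learn and can be loaded in a couple of lines (a minimal sketch; the notebook may load it differently):

```python
from sklearn.datasets import load_iris

# Load the Iris dataset: 150 samples, 4 numeric features, 3 classes
iris = load_iris()
X, y = iris.data, iris.target      # feature matrix (150, 4) and integer labels (150,)
print(iris.feature_names)          # sepal/petal length and width
print(iris.target_names)           # ['setosa' 'versicolor' 'virginica']
```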

Source: Flores de Íris by Dcbmariano via Wikimedia Commons, licensed under CC BY‑SA 4.0.

Source: Author (see Google Colab notebook)

Source: Author (see Google Colab notebook)
As shown in both the pairwise relationship plots and the mutual information feature‑importance plots, petal length and petal width are by far the most informative features when measured against the target labels of the Iris dataset.
Mutual Information (MI) is computed feature‑wise with respect to the labels, whereas the Adjusted Rand Index (ARI), used in this project for fitness evaluation, measures the agreement between two partitions (predicted cluster labels versus true labels). Note that even if feature selection is applied, since Iris Versicolor and Iris Virginica share similar petal lengths and widths, their clusters overlap in feature space. As a result, the ARI can be strong but cannot reach a perfect score of 1.0.
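To make the distinction concrete, here is a small sketch of how both quantities can be computed with scikit-learn (a minimal illustration; the notebook’s exact code may differ):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

# Mutual Information: one score per feature, measured against the true labels
mi = mutual_info_classif(X, y, random_state=42)
print(dict(zip(load_iris().feature_names, mi.round(3))))

# ARI: compares an entire predicted partition against the true labels
pred = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print("ARI:", adjusted_rand_score(y, pred))
```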
Clustering algorithms are a cornerstone of unsupervised learning and so I chose to focus on the goal of blindly determining the flower classes based solely on their features. In other words, the model was not trained on the flower labels; those labels were used only to validate performance metrics. Traditional clustering algorithms such as KMeans or DBSCAN often struggle with parameter sensitivity and dataset variability. Therefore, a meta-heuristic like ABC, which balances exploration vs exploitation, appears promising.
Note that in clustering algorithms, parameters should technically be referred to as hyperparameters, because they’re not learned from the data during training (as weights in a neural network or regression coefficients are) but they are set externally. Nevertheless, for brevity, they are often referred to as parameters.
Here’s a concise visual comparison of different clustering algorithms applied to several toy datasets; the different colors represent the clusters each algorithm found in these 2D representations:

Source: Image from the scikit‑learn documentation (BSD 3‑Clause License)
In the classic Iris dataset, the two most similar species — versicolor and virginica — often pose a challenge for clustering algorithms. Many methods mistakenly group them into a single cluster, treating them as one continuous dense region. In contrast, the more distinct setosa species is consistently identified as a separate cluster.
Table comparing several popular clustering algorithms available in the scikit‑learn library:
| Algorithm | Summary | Key Hyperparameters | Efficiency | Accuracy |
| --- | --- | --- | --- | --- |
| KMeans | Centroid-based, partitions data into k spherical clusters; simple and fast. | n_clusters, init, n_init, max_iter, random_state, tol | Fast on medium–large datasets; scales well; benefits from multiple restarts. | Strong for well-separated, convex clusters; poor on non-convex or varying-density shapes. |
| DBSCAN | Density-based, finds arbitrarily shaped clusters and marks noise without needing k. | eps, min_samples, metric, leaf_size | Moderate; slower in high dimensions; efficient with spatial indexing. | Excellent for irregular shapes and noise; sensitive to eps and density differences. |
| Agglomerative (Hierarchical) | Builds a dendrogram by iteratively merging clusters; no fixed k until cut. | n_clusters, affinity, linkage, distance_threshold | Slower (often O(n²)); memory-heavy for large n. | Good structural discovery; linkage choice impacts results; handles non-spherical clusters. |
| Gaussian Mixture Models (GMM) | Probabilistic mixture of Gaussians using EM (Expectation Maximization); soft assignments. | n_components, covariance_type, tol, max_iter, n_init, random_state | Moderate; EM can be costly with full covariance. | High when data is near-Gaussian; flexible shapes; risk of overfitting without constraints. |
| Spectral clustering | Graph-based; embeds data via eigenvectors before clustering (often KMeans). | n_clusters, assign_labels, n_neighbors, random_state, affinity | Slow on large n due to eigen-decomposition; best for small–medium sets. | Strong for manifold/complex structures; quality hinges on graph construction and affinity. |
| MeanShift | Mode-seeking via kernel density; no need to predefine k. | bandwidth, cluster_all, max_iter, n_jobs | Slow; expensive with many points/features. | Good for discovering cluster modes; performance highly dependent on bandwidth choice. |
Source: Table by author, generated with GPT-5
K‑Means is among the most widely used clustering algorithms, valued for its simplicity and efficiency. Because of its prevalence, I will outline it here in more detail as a representative example of how clustering is commonly performed. Despite its popularity, it does have limitations; a key drawback is that the number of clusters k must be specified in advance.
Initialize Centroids:
Select k starting centroids, either randomly or with smarter strategies like K‑Means++, which spreads them out to improve clustering quality.
Assign Points to Clusters:
Represent each data point as an n-dimensional vector, where each component corresponds to one feature. Assign points to the nearest centroid using a distance metric (commonly Euclidean). In high‑dimensional spaces, this step is complicated by the Curse of Dimensionality, where distances lose discriminative power.
Update Centroids & Repeat:
Recompute each centroid as the mean of all points in its cluster, then reassign points to the nearest centroid. Repeat until assignments stabilize — this is convergence.
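Here is a minimal sketch of these three steps using scikit-learn’s KMeans on the Iris features; the parameter values are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X, _ = load_iris(return_X_y=True)

# k-means++ spreads out the initial centroids (Step 1);
# fit_predict() then alternates assignment and centroid updates until convergence (Steps 2-3)
km = KMeans(n_clusters=3, init="k-means++", n_init=10, max_iter=300, random_state=42)
labels = km.fit_predict(X)

print("Iterations until convergence:", km.n_iter_)
print("Final centroids (one row per cluster):")
print(km.cluster_centers_.round(2))
```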
The FitnessModel evaluates candidate clustering solutions on a dataset. The goal of a good clustering algorithm is to produce clusters that map closely to the true classes, though in practice the match is rarely perfect. ARI (Adjusted Rand Index) is used to measure the similarity between two clusterings (predicted vs. ground truth); it is a widely used metric for evaluating clustering performance because it corrects for chance agreement, works across different clustering algorithms, and provides a clear scale from −1 to +1 that is easy to interpret.
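As a rough sketch, a fitness evaluation along these lines could look as follows, assuming candidate dictionaries in the format shown later in the solution pool and the 1 – ARI conversion described below (the notebook’s FitnessModel class is more elaborate, and the model-name mapping here is hypothetical):

```python
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

# Hypothetical mapping from a candidate's "model" string to a scikit-learn estimator;
# the names must match whatever the agents emit.
MODELS = {"KMeans": KMeans, "DBSCAN": DBSCAN,
          "Agglomerative": AgglomerativeClustering, "GMM": GaussianMixture}

def evaluate_fitness(candidate, X, y_true):
    """Fit the candidate's model with its parameters and return (fitness, ARI)."""
    model = MODELS[candidate["model"]](**candidate["params"])
    labels = model.fit_predict(X)
    ari = adjusted_rand_score(y_true, labels)
    return 1.0 - ari, ari          # Fitness = 1 - ARI, so lower is better
```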
| ARI Range | Meaning | Typical Edge Case Scenario |
| --- | --- | --- |
| +1.0 | Perfect agreement | Predicted clustering exactly matches ground truth labels |
| ≈ 0.0 | Random clustering (chance level) | Assignments are random; all points forced into one cluster (unless ground truth is also one cluster) |
| < 0.0 | Worse than random | Systematic disagreement (clusters consistently mismatched or flipped); each point in its own cluster when ground truth differs |
| Low/negative (close to −1) | Strong disagreement | Extreme imbalance or mislabeling across clusters |
Source: Table by author, generated with GPT-5
Fitness = 1 – ARI, so lower fitness is better. This allows ABC to directly optimize clustering quality. Shown below is an example run of the initial iterations of the ABC with Gemini agents that I developed, including a preview of the raw LLM response texts. Note how the GMM (Gaussian Mixture Models) candidate steadily improves as new candidates are selected on each iteration by the different bee agents. Refer to the Google Colab notebook for logs from additional iterations.
Starting ABC run with Fitness Model for dataset: Iris
Features: 4, Classes: 3
Baseline Models (ARI): {'DBSCAN': 0.6309344087637648, 'KMeans': 0.6201351808870379, 'Agglomerative': 0.6153229932145449, 'GMM': 0.5164585360868599, 'Spectral': 0.6451422031981431, 'MeanShift': 0.5681159420289855}
Runner: Initiating Scout Agent for initial solutions...
Scout Generating initial candidate solutions...
Scout : Sending prompt to Gemini model... n_candidates=12
Scout : Received response from Gemini model.
Scout : Raw response text: ```json[{"model":"KMeans","params":{"n_clusters":3,"init":"k-means++","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":4,"init":"random","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":5,"init":"k-mean...
Scout : Initial candidates generated.
Runner: Scout Agent returned 12 initial solutions.
Runner: Starting iteration 1/8...
Runner: Agents completed actions for iteration 1.
--- Iteration 1 Details ---
GMM Candidate 1 (Origin: Scout-10010) : Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
KMeans Candidate 2 (Origin: Scout-10000): Best previous ARI=0.620, Current ARI=0.620, Params: {'n_clusters': 3, 'init': 'k-means++', 'n_init': 10, 'random_state': 42}
DBSCAN Candidate 3 (Origin: Scout-10004): Best previous ARI=0.550, Current ARI=0.550, Params: {'eps': 0.7, 'min_samples': 4}
GMM Candidate 4 (Origin: Scout-10009) : Best previous ARI=0.820, Current ARI=0.516, Params: {'n_components': 3, 'covariance_type': 'full', 'max_iter': 100, 'random_state': 42}
KMeans Candidate 5 (Origin: Scout-10001): Best previous ARI=0.620, Current ARI=0.462, Params: {'n_clusters': 4, 'init': 'random', 'n_init': 10, 'random_state': 42}
DBSCAN Candidate 6 (Origin: Scout-10003): Best previous ARI=0.550, Current ARI=0.442, Params: {'eps': 0.5, 'min_samples': 5}
KMeans Candidate 7 (Origin: Scout-10002): Best previous ARI=0.620, Current ARI=0.435, Params: {'n_clusters': 5, 'init': 'k-means++', 'n_init': 5, 'random_state': 42}
DBSCAN Candidate 8 (Origin: Scout-10005): Best previous ARI=0.550, Current ARI=0.234, Params: {'eps': 0.4, 'min_samples': 6}
*** Global Best so far: ARI=0.820, Candidate={'model': 'GMM', 'params': {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}, 'origin_agent': 'Scout-10010', 'current_ari_for_display': 0.8202989638185834}
-----------------------------
Runner: Starting iteration 2/8...
Scout Generating initial candidate solutions...
Scout : Sending prompt to Gemini model... n_candidates=12
Employed Refining current solutions...
Employed : Sending prompt to Gemini model... n_variants=12
Onlooker Evaluating candidates and selecting promising ones...
Onlooker : Sending prompt to Gemini model... top_k=5
Scout : Received response from Gemini model.
Scout : Raw response text: ```json[{"model":"KMeans","params":{"n_clusters":3,"init":"k-means++","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":4,"init":"random","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":5,"init":"k-mean...
Scout : Initial candidates generated.
Employed : Received response from Gemini model.
Employed : Raw response text: ```json[{"model":"GMM","params":{"n_components":5,"covariance_type":"tied","max_iter":100,"random_state":42}},{"model":"GMM","params":{"n_components":3,"covariance_type":"full","max_iter":100,"random_state":42}},{"model":"KMeans","params":{"n_cluster...
Employed : Solutions refined.
Onlooker : Received response from Gemini model.
Onlooker : Raw response text: ```json[{"model":"GMM","params":{"n_components":4,"covariance_type":"tied","max_iter":100,"random_state":42}},{"model":"KMeans","params":{"n_clusters":3,"init":"k-means++","n_init":10,"random_state":42}},{"model":"DBSCAN","params":{"eps":0.7,"min_sam...
Onlooker : Promising candidates selected.
Runner: Agents completed actions for iteration 2.
--- Iteration 2 Details ---
GMM Candidate 1 (Origin: Scout-10022) : Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
GMM Candidate 2 (Origin: Scout-10010) : Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
GMM Candidate 3 (Origin: Onlooker-30000): Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
GMM Candidate 4 (Origin: Employed-20007): Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 80, 'random_state': 42}
GMM Candidate 5 (Origin: Employed-20006): Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 120, 'random_state': 42}
GMM Candidate 6 (Origin: Employed-20000): Best previous ARI=0.820, Current ARI=0.693, Params: {'n_components': 5, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
KMeans Candidate 7 (Origin: Scout-10012): Best previous ARI=0.620, Current ARI=0.620, Params: {'n_clusters': 3, 'init': 'k-means++', 'n_init': 10, 'random_state': 42}
KMeans Candidate 8 (Origin: Scout-10000): Best previous ARI=0.620, Current ARI=0.620, Params: {'n_clusters': 3, 'init': 'k-means++', 'n_init': 10, 'random_state': 42}
*** Global Best so far: ARI=0.820, Candidate={'model': 'GMM', 'params': {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}, 'origin_agent': 'Scout-10010', 'current_ari_for_display': 0.8202989638185834}

Source: Author (see Google Colab notebook)
While the Adjusted Rand Index (ARI) provides a single score for clustering quality, the Confusion Matrix reveals where misclassifications occur by showing how true classes are distributed across predicted clusters.
In the Iris dataset, scikit‑learn encodes the species in a fixed order:
0 = Setosa, 1 = Versicolor, 2 = Virginica.
Even though there are only three true species, the algorithm below mistakenly produced four clusters. The matrix illustrates this mismatch:
[[ 0  6 44  0]
 [ 2  0  0 48]
 [49  0  0  1]
 [ 0  0  0  0]]
⚠️ Note: The order of the columns (clusters) does not necessarily correspond to the order of the rows (true classes). Cluster IDs are arbitrary labels assigned by the algorithm, and they don’t carry any inherent meaning.
📌The confusion matrix shows that Setosa is distinct (its clusters don’t overlap with the other species), while Versicolor and Virginica are not separated cleanly – both are spread across the same two clusters (columns 0 and 3). This overlap highlights the algorithm’s difficulty in distinguishing between them. The confusion matrix makes these misclassifications visible in a way that a single ARI score cannot.
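For reference, such a matrix can be produced with scikit-learn as sketched below; the 4-component GMM here is only an illustrative stand-in, so the exact counts will depend on the fitted model:

```python
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.metrics import confusion_matrix

X, y_true = load_iris(return_X_y=True)

# Example: a 4-component GMM, so the 3 true classes spread across 4 predicted clusters
pred = GaussianMixture(n_components=4, covariance_type="tied",
                       random_state=42).fit_predict(X)

# Rows = true classes (0=Setosa, 1=Versicolor, 2=Virginica; the last row is padding),
# columns = arbitrary cluster IDs assigned by the algorithm
print(confusion_matrix(y_true, pred))
```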
The Runner orchestrates iterations:
In the Runner class and throughout the Artificial Bee Colony (ABC) algorithm, a candidate refers to a specific clustering model together with its defined parameters. In the example solution pool shown below, two candidates are returned.
Candidates are orchestrated using Python’s concurrent.futures.ThreadPoolExecutor, which enables parallel execution. As a result, the ScoutAgent, EmployedBeeAgent, and OnlookerBeeAgent run asynchronously in separate threads during each iteration of the algorithm.
The runner.run() method returns two objects:
solution_pool: This is a list of the pool_size most promising candidates (each being a dictionary containing a model and its parameters) found across all iterations. This list is sorted by fitness (ARI), so the very first element, solution_pool[0], will represent the best-fitting model and its specific parameters that the ABC algorithm discovered.
best_history: This is a list that tracks the best Adjusted Rand Index found so far at each iteration, together with the model and parameters that achieved it.
For example:
solution_pool = [
    {
        "model": "KMeans",
        "params": {"n_clusters": 3, "init": "k-means++"},
        "origin_agent": "Employed",
        "current_ari_for_display": 0.742
    },
    {
        "model": "AgglomerativeClustering",
        "params": {"n_clusters": 3, "linkage": "ward"},
        "origin_agent": "Onlooker",
        "current_ari_for_display": 0.715
    }
]
best_history = [
    {"ari": 0.642, "model": "KMeans", "params": {"n_clusters": 3, "init": "random"}},
    {"ari": 0.742, "model": "KMeans", "params": {"n_clusters": 3, "init": "k-means++"}}
]
ThreadPoolExecutor(): Initializes a pool of worker threads that can execute tasks concurrently.
ex.submit(…): Submits each agent’s act method as a separate task to the thread pool.
from concurrent.futures import ThreadPoolExecutor
import copy
# ... inside Runner.run() ...
for it in range(iterations):
    print(f"Runner: Starting iteration {it+1}/{iterations}...")
    if it == 0:
        results = []
    else:
        # Use threads instead of processes
        with ThreadPoolExecutor() as ex:
            futures = [
                ex.submit(self.scout.act),
                ex.submit(self.employed.act, solution_pool),
                ex.submit(self.onlooker.act, solution_pool)
            ]
            results = [f.result() for f in futures]
    print(f"Runner: Agents completed actions for iteration {it+1}.")
    # ... rest of the loop unchanged ...
Each agent’s act method is dispatched to the thread pool, allowing them to run in parallel. The call to f.result() ensures that the Runner waits for all tasks to finish before moving forward.
This design achieves two things: the three agents execute concurrently within each iteration, and the Runner still synchronizes on all of their results before proceeding.
From the Runner’s perspective, iterations still appear sequential, but internally each iteration benefits from concurrent execution of agent tasks.
While ThreadPoolExecutor provides concurrency through threads, it can be seamlessly replaced with ProcessPoolExecutor to achieve true parallel CPU execution.
With ProcessPoolExecutor, each agent runs in its own separate process, which bypasses Python’s GIL (Global Interpreter Lock). The GIL is a mutex (mutual exclusion lock) that ensures only one thread executes Python bytecode at a time, even on multi‑core systems. By using processes instead of threads, heavy numerical workloads can fully leverage multiple CPU cores, enabling genuine parallelism and improved performance for compute‑intensive tasks.
from concurrent.futures import ProcessPoolExecutor
import copy
# ... inside Runner.run() ...
for it in range(iterations):
    print(f"Runner: Starting iteration {it+1}/{iterations}...")
    if it == 0:
        results = []
    else:
        # Use processes instead of threads
        with ProcessPoolExecutor() as ex:
            futures = [
                ex.submit(self.scout.act),
                ex.submit(self.employed.act, solution_pool),
                ex.submit(self.onlooker.act, solution_pool)
            ]
            results = [f.result() for f in futures]
    print(f"Runner: Agents completed actions for iteration {it+1}.")
    # ... rest of the loop unchanged ...
📌Key Takeaway:
✅ Use ProcessPoolExecutor if your agents do heavy computation (matrix ops, clustering, ML training).
❌ Stick with ThreadPoolExecutor if your agents are mostly I/O‑bound (waiting for data, network, disk).
The repetition of candidate parameter values across iterations is a natural outcome of how the Artificial Bee Colony algorithm works and how the agents interact:
Scout Bee Agent’s Exploration: The ScoutBeeAgent is tasked with generating new and diverse candidate solutions. While it aims for diversity, given a limited parameter space or if the generative model finds certain parameter combinations consistently effective, it might suggest similar solutions in different iterations.
Employed Bee Agent’s Exploitation: The EmployedBeeAgent refines existing promising solutions. If a solution is already very good or close to an optimal configuration, the “local neighborhood” exploration (e.g., adjusting parameters by ±10-20%) might lead back to the same or very similar parameter values, especially after rounding or if the parameter adjustments are small.
Onlooker Bee Agent’s Selection: The OnlookerBeeAgent selects the top_k most promising solutions from a larger set of candidates (which includes newly scouted, refined by employed, and previously promising solutions). If the algorithm is converging, or if several distinct solutions yield very similar high-fitness scores, the OnlookerBeeAgent might repeatedly select parameter sets that are effectively identical from one iteration to the next.
Solution Pool Management: The Runner maintains a solution_pool of a fixed pool_size. It sorts this pool by fitness and keeps the best ones. If the top solutions remain consistently the same, or if new good solutions are identical to previous ones, those parameter sets will persist and thus be “repeated” in the iteration details.
Convergence: As the ABC algorithm progresses, it’s expected to converge towards optimal or near-optimal solutions. This convergence often means that the search space narrows, and agents repeatedly find the same high-performing parameter configurations unless some kind of pruning method (like deduplication) is applied.
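If this repetition becomes wasteful, a simple deduplication step can prune identical candidates before they are re-evaluated. A minimal sketch (not necessarily part of the notebook):

```python
import json

def deduplicate(candidates):
    """Keep only the first occurrence of each (model, params) combination."""
    seen, unique = set(), []
    for cand in candidates:
        # JSON with sorted keys gives a stable, hashable fingerprint of the candidate
        key = (cand["model"], json.dumps(cand["params"], sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(cand)
    return unique
```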
Before applying ABC, it is useful to establish a baseline by evaluating the performance of standard clustering methods. I ran a comparison benchmark using default configurations for the following algorithms: KMeans, DBSCAN, Agglomerative clustering, GMM, Spectral clustering, and MeanShift (see the baseline ARI values in the run log above).
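A sketch of how such a baseline comparison could be computed; the estimator settings (for example, fixing three clusters or components where a count is required) are illustrative and may differ from the notebook:

```python
from sklearn.datasets import load_iris
from sklearn.cluster import (KMeans, DBSCAN, AgglomerativeClustering,
                             SpectralClustering, MeanShift)
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

X, y_true = load_iris(return_X_y=True)

# One estimator per baseline algorithm, mostly default settings
baselines = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=42),
    "DBSCAN": DBSCAN(),
    "Agglomerative": AgglomerativeClustering(n_clusters=3),
    "GMM": GaussianMixture(n_components=3, random_state=42),
    "Spectral": SpectralClustering(n_clusters=3, random_state=42),
    "MeanShift": MeanShift(),
}

scores = {name: adjusted_rand_score(y_true, model.fit_predict(X))
          for name, model in baselines.items()}
print(scores)
```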
As shown in the Google Colab notebook, the ABC agents discovered parameter sets that significantly improved the Adjusted Rand Index (ARI), reducing misclassifications between the closely related classes Versicolor and Virginica.
The Reporter class is responsible for generating final evaluation outputs after running the Artificial Bee Colony (ABC) optimization. It provides three main functions:
I decided to design each agent’s prompt with the following template for a structured approach:
• Task Goal: What the agent must achieve.
• Parameters: Inputs like dataset name, number of candidates for the agent type, allowed algorithms and the hyperparameter input dictionary returned by the WebResearcher via its LLM prompt.
• Constraints: Ensure each candidate is unique, maintain balanced distribution across algorithms, require hyperparameters to stay within valid ranges.
• Return Values: JSON list of candidate solutions.
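As an illustration, a Scout-style prompt following this template could be assembled as below. The function name and wording are hypothetical, not the exact prompts used in the notebook:

```python
# Hypothetical prompt assembly for the Scout agent; the real prompts differ in wording
def build_scout_prompt(dataset_name, n_candidates, allowed_models, hyperparam_info):
    return f"""
Task Goal: Propose {n_candidates} initial candidate clustering solutions for the
'{dataset_name}' dataset.

Parameters:
- Allowed algorithms: {", ".join(allowed_models)}
- Known hyperparameters and valid ranges: {hyperparam_info}

Constraints:
- Each candidate must be unique.
- Keep a balanced distribution across the allowed algorithms.
- All hyperparameters must stay within their valid ranges.

Return Values: Respond ONLY with a JSON list, where each item has the keys
"model" and "params".
"""

prompt = build_scout_prompt(
    dataset_name="Iris",
    n_candidates=12,
    allowed_models=["KMeans", "DBSCAN", "GMM"],
    hyperparam_info={"KMeans": {"n_clusters": [2, 10]}},  # e.g. from the WebResearcher
)
```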
To ensure deterministic LLM behavior, I used this generation_config. In particular, note that specifying a temperature of zero removes sampling randomness, leaving the model no room for creativity: an identical prompt simply reproduces the previous response.
generation_config = {
    "temperature": 0.0,
    "top_p": 1.0,
    "top_k": 1,
    "max_output_tokens": 4096
}
res = genai_model.generate_content(prompt, generation_config=generation_config)
While developing new code, as in this project, it is important to ensure that the same input always produces the same output.
I ran into a common limitation for the “Lite” models:
LLMs don’t reliably obey instructions like “always include these parameters” just because you put them in the prompt. As of today, models often revert to defaults or minimal sets unless structure is enforced after generation. Why the explicit prompt still failed: models can prioritize other parts of the prompt or simplify outputs despite emphasis on required items.
📌Key Takeaway: Prompts alone won’t guarantee compliance. You need prompt + schema enforcement to ensure outputs consistently include required parameters.
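One lightweight form of post-generation enforcement is to validate each parsed candidate against a required-parameter schema and back-fill anything the model omitted. A sketch, with hypothetical defaults rather than the notebook’s exact schema:

```python
# Hypothetical required parameters and defaults per algorithm
REQUIRED_PARAMS = {
    "KMeans": {"n_clusters": 3, "init": "k-means++", "n_init": 10, "random_state": 42},
    "GMM": {"n_components": 3, "covariance_type": "full", "max_iter": 100, "random_state": 42},
    "DBSCAN": {"eps": 0.5, "min_samples": 5},
}

def enforce_schema(candidates):
    """Drop unknown models and back-fill any required parameters the LLM omitted."""
    valid = []
    for cand in candidates:
        model = cand.get("model")
        if model not in REQUIRED_PARAMS:
            continue                      # guardrail: ignore models we don't support
        params = {**REQUIRED_PARAMS[model], **cand.get("params", {})}
        valid.append({"model": model, "params": params})
    return valid
```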
On occasion, I got this response:
>> ScoutAgent: Error during API call (Attempt 1/3): Invalid operation: The response.text quick accessor requires the response to contain a valid Part, but none were returned. The candidate’s finish_reason is 2.
That error message means the model didn’t actually return any usable content in its response, so when my code tried to access response.text, there was no valid “Part” to read. The key clue is finish_reason = 2, which in Google’s API corresponds to MAX_TOKENS: the model stopped generating (it hit its output token limit) before returning a usable text Part.
Why it happens:
How to handle it:
📌 Key Takeaway: This isn’t a network error — it’s the model signaling that it stopped without generating text. You can find the full list of FinishReason values and guidance on interpreting them in Google’s documentation: Generate Content API – FinishReason.
On occasion, the Gemini API call failed with:
📌 Key Takeaway: This is a network error and occurred without code changes, indicating transient network or service issues. Add retries with exponential backoff, timeouts, and robust logging to capture context (request size, rate limits, finish_reason) and recover gracefully.
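One way to implement that advice is a retry wrapper with exponential backoff around the generate_content call. A sketch; the error handling and limits shown are illustrative:

```python
import time

def generate_with_retries(genai_model, prompt, generation_config, max_attempts=3):
    """Call Gemini with retries, exponential backoff, and logging of the finish reason."""
    for attempt in range(1, max_attempts + 1):
        try:
            res = genai_model.generate_content(prompt, generation_config=generation_config)
            if res.candidates and res.candidates[0].content.parts:
                return res.text           # only access .text when a valid Part exists
            finish = res.candidates[0].finish_reason if res.candidates else None
            print(f"Attempt {attempt}: empty response, finish_reason={finish}")
        except Exception as exc:          # e.g. transient network / service errors
            print(f"Attempt {attempt}: API call failed: {exc}")
        time.sleep(2 ** attempt)          # exponential backoff: 2s, 4s, 8s...
    raise RuntimeError("Gemini call failed after all retry attempts")
```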
One more thing to pay attention to, especially if you are deploying agents in a corporate setting: security is mission-critical!
⚠️Provide strict guardrails between Agents and the LLM. Actively prevent agents from deleting critical files, taking off‑topic actions, making unauthorized external API calls, etc.
📌 Key takeaway: Apply the Principle of Least Privilege
This table outlines how the agentic AI market is expected to develop in the near future. It highlights the main companies, emerging competitors, and the trends that will shape the space as we move towards 2026. Presented here as a non‑exhaustive list of direct competitors to Gemini, the aim is to give readers a clear picture of the strategic environment in which agentic AI is evolving.
| Provider | Core Focus | Strengths | Notes |
| --- | --- | --- | --- |
| Google Gemini API | Multimodal LLM service (text, vision, code, etc.) | High‑quality generative outputs; Google Cloud integration; strong multimodal capabilities | Primarily a model API, Gemini 3 explicitly designed to support orchestration of agentic workflows |
| OpenAI GPT APIs | Text + code generation | Widely adopted; strong ecosystem; fine‑tuning options | Limited multimodal support compared to Gemini |
| Anthropic Claude | Safety‑focused text LLMs | Strong alignment and safety features; long context handling | Less multimodal capability |
| Mistral AI | Open and enterprise models | Flexible deployment; community driven; customizable | Requires infrastructure setup |
| Meta LLaMA | Open‑weight research models | Open source; strong research backing; customizable | Needs infra and ops for production |
| Cohere | Enterprise NLP and embeddings | Enterprise features; embeddings; privacy options | Narrower scope than general LLMs |
Source: Table by author, generated with GPT-5
This table examines the management and orchestration aspects of agentic AI. It highlights how different frameworks handle coordination, reliability, and integration to enable scalable agent systems.
| Framework | Core Focus | Strengths | Notes |
| --- | --- | --- | --- |
| LangGraph | Graph‑based orchestration | Models workflows as nodes/edges; strong memory; multi‑agent collaboration | Requires developer setup; orchestration only |
| LangChain | Agent/workflow orchestration | Rich ecosystem; tool integration; memory/state handling | Can increase token usage and complexity |
| CrewAI | Role‑based crew orchestration | Role specialization; collaboration patterns; good for teamwork scenarios | Depends on external LLMs |
| OpenAI Swarm | Lightweight multi‑agent orchestration | Simple handoffs; ergonomic routines | Good for running experiments |
| AutoGen (Microsoft) | Multi‑agent framework | Research + production focus; extensible | Still evolving; requires Microsoft ecosystem |
| AutoGPT | Autonomous agent prototype | Fast prototyping; community driven | Varying production readiness |
Source: Table by author, generated with GPT-5
This project was my first experiment with Gemini’s agentic AI, adapting the Artificial Bee Colony algorithm to an optimization task. Even on a small dataset, it demonstrated how LLMs can take on bee‑like roles in a meta‑heuristic process, while also revealing both the promise and the practical challenges of this approach. Feel free to copy and adapt the Google Colab notebook for your own projects.
Future Work