that your social media feed may know you too well.
When you browse social media, you notice a very typical behavior: you watch one video, and suddenly your timeline is flooded with more of the same. Five years ago, it felt a bit like magic. But today, we talk about “the algorithm” as if it were a mysterious entity pulling strings in some Silicon Valley basement. The truth is much less dramatic, and much more interesting.
The algorithm isn’t inherently evil, and it doesn’t sit there plotting your radicalisation. It’s just a chunk of code running cosine similarities and weighted averages, trying to predict what you’ll click on next. The trouble is that whatever we interact with counts as engagement, and engagement is what it optimises for. And the surest way to keep humans engaged turns out to be the worst way to keep them informed (rage-bait, fake news, or worse).
This post is about how recommendation engines work, why they tilt us toward echo chambers, and, because reading about a thing is never the same as seeing it, we’ll build one from scratch, point it at real news data, and watch the bubble form.
A social media algorithm is, at its heart, a curator. Its job is to sift through millions of posts and serve you the ones you’re most likely to engage with: click, watch, like, share, rage-comment on. It does this based on one word: data.
Every action you take is a clue:
Using machine learning, the algorithm spots patterns in this firehose of behaviour. It’s constantly asking the same question: what keeps this person on the platform longer? Remember that this is the main goal of any social media company: keeping you on the platform for as long as possible.
Two classic techniques sit underneath most recommender systems:
Real platforms blend these methods with hundreds of other signals. But the core idea is the same: learn from your behaviour, predict what else might grab you.
The algorithm doesn’t intend to show you bad or false content. It optimises for engagement. And one of the surest ways to keep humans engaged is to tap into our emotions, especially the strong, negative ones. Or videos of cats.
Let’s stop talking about this abstractly and build one. We will use real anonymised click logs from Microsoft News. The dataset is called MIND (Microsoft News Dataset), published for academic research by Microsoft Research. This sample contains 50,000 users, over 51,000 English news articles across 17 categories (news, sports, finance, lifestyle, health, travel, and more), and 156,000+ real impression sessions, each recording what a user was shown and what they clicked on. The whole thing fits in about 30 lines of Python, although you don’t really need to know this in detail:
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Build a sparse user × article matrix (1 = clicked, 0 = didn't)
matrix = csr_matrix((np.ones(len(clicks)), (user_rows, article_cols)),
                    shape=(n_users, n_articles))

def recommend(user_id, matrix, top_n=15, n_neighbors=50):
    """Find 50 most similar users and rank the articles
    they clicked that our user hasn't seen yet."""
    u = user_idx[user_id]
    # Cosine similarity between this user and everyone else
    sims = cosine_similarity(matrix[u], matrix).flatten()
    sims[u] = 0  # don't recommend to yourself
    # Take the top 50 most similar users
    top_neighbors = np.argsort(sims)[-n_neighbors:][::-1]
    weights = sims[top_neighbors]
    # Score articles by weighted sum of neighbour clicks
    scores = np.asarray(matrix[top_neighbors].T.dot(weights)).flatten()
    # Zero out articles the user already clicked
    scores[matrix[u].toarray().flatten() > 0] = 0
    # Return the top-scoring articles
    top_articles = np.argsort(scores)[-top_n:][::-1]
    return top_articles
Cosine similarity finds your fifty closest neighbours, people who click on the same kinds of articles you do. We take the articles they clicked, weight them by how similar each neighbour is to you, and serve the top fifteen. This is the base of what powers a billion-dollar industry.
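The snippet above leans on a few variables it never defines (`clicks`, `user_rows`, `article_cols`, `user_idx`). If you want to run the idea end to end, here is a minimal sketch on a made-up click log. The three users, their clicks, and the `recommend_toy` name are all hypothetical (this is not MIND data), and the neighbour count is shrunk to 2 because there are only three users.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy click log (NOT MIND data): one row per (user, article) click
clicks = pd.DataFrame({
    "user":    ["ana", "ana", "ben", "ben", "ben", "cy", "cy"],
    "article": ["nfl", "nba", "nfl", "nba", "mlb", "election", "senate"],
})

# Turn user and article ids into row / column positions
user_rows, users = pd.factorize(clicks["user"])
article_cols, articles = pd.factorize(clicks["article"])
user_idx = {name: i for i, name in enumerate(users)}

# Sparse user x article matrix (1 = clicked)
matrix = csr_matrix(
    (np.ones(len(clicks)), (user_rows, article_cols)),
    shape=(len(users), len(articles)),
)

def recommend_toy(user_id, top_n=1, n_neighbors=2):
    """Same logic as recommend() above, returning article names."""
    u = user_idx[user_id]
    sims = cosine_similarity(matrix[u], matrix).flatten()
    sims[u] = 0  # ignore the user's similarity to themselves
    top_neighbors = np.argsort(sims)[-n_neighbors:][::-1]
    weights = sims[top_neighbors]
    # Weighted sum of what the neighbours clicked
    scores = np.asarray(matrix[top_neighbors].T.dot(weights)).flatten()
    # Drop articles the user has already clicked
    scores[matrix[u].toarray().flatten() > 0] = 0
    return [articles[i] for i in np.argsort(scores)[-top_n:][::-1]]

recommended = recommend_toy("ana")
print(recommended)
```

In this toy log, ana and ben share two sports clicks, so ben is her closest neighbour, and the one article he clicked that she hasn’t (`mlb`) comes out on top. Cy’s politics clicks get a weight of zero because cy shares nothing with ana.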
Cosine similarity might sound like something out of a math textbook, but bear with me, it’s easier than it looks. To show you how it works, let’s take a quick detour.
Imagine the following data points scattered across two axes, mechanical vs. biological, and cuteness:

Emoji Similarity Mapper — Cat and Dog – Image by Author
Cosine similarity measures the angle between two arrows, each one starting from the origin (0,0) and pointing toward one of our data points. The smaller the angle between them, the more similar the two items are.
Think of it this way: if two arrows are almost pointing in the same direction, the items they represent share similar characteristics. Take cats and dogs as an example. Both score high on ‘biological’ and high on ‘cuteness’, so their arrows point in nearly the same direction and cosine similarity returns a value close to 1 (its maximum).
But if we compare cats with teddy bears, although they are similar on the cute dimension, they are different on the biological axis:

Emoji Similarity Mapper — Cat and Teddy Bear – Image by Author
A cat is fully biological, while a teddy bear scores zero on that axis.
This pulls their arrows apart. The angle between them widens, and cosine similarity returns a lower value, reflecting that despite sharing one trait, these two objects occupy very different regions of our space.
And, of course, comparing cats to cars gives almost no similarity, as the two arrows point in very different directions:

Emoji Similarity Mapper — Cat and Car – Image by Author
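If you’d rather see the numbers, here is a minimal sketch of the same comparison. The formula is the dot product of the two arrows divided by the product of their lengths. The coordinates are made up for illustration (they are not read off the charts), and I’m assuming the first axis runs from -1 (mechanical) to +1 (biological) and the second from 0 to 1 (cuteness).

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two arrows starting at the origin."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical coordinates: (mechanical-to-biological, cuteness)
cat = (1.0, 0.9)
dog = (1.0, 0.8)
teddy = (0.0, 0.9)   # zero on the biological axis, still cute
car = (-1.0, 0.2)    # mechanical, not very cute

cat_dog = cosine(cat, dog)
cat_teddy = cosine(cat, teddy)
cat_car = cosine(cat, car)

print(f"cat vs dog:   {cat_dog:.2f}")
print(f"cat vs teddy: {cat_teddy:.2f}")
print(f"cat vs car:   {cat_car:.2f}")
```

With these made-up points the ordering matches the pictures: cat and dog score close to 1, cat and teddy bear land somewhere in the middle, and cat and car come out lowest. Cosine similarity ranges from -1 to 1, so a negative score simply means the arrows point in opposing directions.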
AI models use this kind of information to recommend content that is likely to trigger a similar response in you. Imagine a two-dimensional space where one axis captures how a video makes you feel (calm, entertained, outraged) and the other captures its topic. Every video gets plotted somewhere in that space.
Say you click on a political video that makes you angry, and you watch it all the way through. The platform registers both dimensions: the topic and the emotional response. Using cosine similarity, it finds other videos whose ‘arrow’ points in the same direction (rage-baiting political videos) and serves them to you next. The more you engage, the more confidently the algorithm learns which corner of that space keeps you watching.
I picked a user from the MIND dataset whose reading history is pure sports: NFL power rankings, NBA trade rumours, MLB bans. They read twenty-five articles, all sport.
Let’s ask the recommender what to serve them:

Joe’s recommended reading list – Image by Author
The category breakdown:
The algorithm recognises this person’s sports habit and feeds it back, but it also serves a reasonably varied diet. There’s politics, entertainment, lifestyle, finance. Not bad, right?
Now watch what happens.
I simulated something far more common than a massive rabbit-hole binge: a moment of idle curiosity.
Our sports fan didn’t spend hours reading politics. Looking at their initial feed, they simply clicked on three items that caught their eye:
Just three clicks in less than ten minutes of reading and watching. Three tiny breadcrumbs left for the algorithm and Joe goes on with his life during the rest of the day.
Now, if we ran those clicks through the basic 30 lines of Python code we wrote earlier, nothing much would happen. Mathematically, 25 historical sports clicks would still overpower 3 new political clicks. The algorithm would still see a user who is 89% interested in sports, and the feed would barely budge.
But here is the very important secret sauce of modern social media: Recency Weighting (or Time Decay).
Real algorithms don’t treat all your clicks equally. A click you made three years ago is practically ancient history; a click you made three minutes ago is gold. To keep you hooked in the current session, platforms apply a heavy multiplier to whatever you have done most recently.
A single line of code implements this in the algorithm we saw earlier. If we decide that the most recent clicks should carry up to 100 times more weight than older ones, we could write something like this:
# is_historical() is a placeholder for however you flag a click as old
time_decay_weights = np.array([0.1 if is_historical(click) else 10.0 for click in user_history])
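To see why this matters, here is a small sketch of the arithmetic using the 0.1 and 10.0 weights from the line above. It only measures how the user’s interest profile looks to the algorithm, not the final feed, and the click history is a simplified stand-in for Joe’s (25 older sports clicks followed by 3 recent politics clicks). The `category_share` helper is my own naming.

```python
import numpy as np

# Simplified stand-in history: 25 older sports clicks, then 3 recent politics clicks
categories = np.array(["sports"] * 25 + ["politics"] * 3)
is_recent = np.array([False] * 25 + [True] * 3)

# 0.1 for older clicks, 10.0 for recent ones (a 100x ratio)
weights = np.where(is_recent, 10.0, 0.1)

def category_share(categories, weights):
    """Share of total click weight that falls in each category."""
    totals = {}
    for cat, w in zip(categories, weights):
        totals[str(cat)] = totals.get(str(cat), 0.0) + float(w)
    grand_total = sum(totals.values())
    return {cat: total / grand_total for cat, total in totals.items()}

unweighted = category_share(categories, np.ones(len(categories)))
weighted = category_share(categories, weights)

print({k: round(v, 2) for k, v in unweighted.items()})
print({k: round(v, 2) for k, v in weighted.items()})
```

Without weighting, sports holds 25 of 28 clicks, which is the 89% mentioned earlier. With the weights applied, sports carries a total weight of 2.5 against 30 for politics, so the same history now looks like roughly 92% politics.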
If we do this, let’s run the recommendations again:

Joe’s new reading recommendation list – Image by Author
Here’s the damage that just 3 clicks have done to our Time Weighting recommendation system:

Feed Category Before – Image by Author

Feed Category After – Image by Author
Political news went from 13% to 40% of the feed. A 3x increase. From one evening of clicking and reading three pieces of news. Sports (the thing this person has read for years) can drop from the dominant category to second place. The algorithm didn’t pause to think “hold on, this person has 25 sports articles in their history, and one evening of politics doesn’t define them.”
It doesn’t think. It just recalculates time-weighted similarities, finds a new set of neighbours, and serves what other users who clicked on the same things may enjoy.
Two things jump out:
Note: real platforms don’t publish their decay constants, so this is illustrative, not a measurement, but the mechanism is real and the direction is what matters. My 100x example is possibly an exaggeration of the recency bias.
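A two-level weight (old vs. recent) is the bluntest version of time decay. A smoother and very common way to express the same idea is exponential decay, where a click loses half its weight every fixed number of days. The sketch below is only an illustration: the `decay_weights` name and the 7-day half-life are assumptions of mine, not values taken from any platform.

```python
import numpy as np

def decay_weights(ages_in_days, half_life_days=7.0):
    """Exponential time decay: a click's weight halves every half_life_days.

    half_life_days=7.0 is an arbitrary illustrative value.
    """
    ages = np.asarray(ages_in_days, dtype=float)
    return 0.5 ** (ages / half_life_days)

# Clicks made today, a week ago, two weeks ago, and a year ago
w = decay_weights([0, 7, 14, 365])
print(np.round(w, 4))
```

A click from today keeps its full weight of 1, a week-old click counts for half, a two-week-old click for a quarter, and a year-old click is effectively zero. Shortening the half-life makes the system chase your latest clicks more aggressively, which is the direction the experiment above exaggerates.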
You now know how your clicks feed the math behind what the algorithm shows you next.
But this gets worse: content that makes us angry, fearful, or shocked glues us to the screen far better than content that makes us feel good or informed. Social media companies didn’t engineer this consciously. Their algorithms simply discovered it.
A massive 2025 study analyzing the digital trace data of 25,000 SmartNews users found that humans possess a trait-level “negativity bias” when selecting news. Evolutionarily, we are hardwired to pay attention to threats, because avoiding danger was critical to our ancestors’ survival. What happens when this ancient instinct meets modern machine learning? The study confirmed that personalized recommendation feeds take our inherent negativity bias and actively augment it.
Furthermore, data from researchers analyzing hundreds of millions of posts on platforms like Facebook and X (formerly Twitter) reveals that social media users are roughly 1.91 times more likely to share negative news links than positive ones. Negativity equals virality, and the outrage loop is born.
The impact of these algorithmic loops isn’t just about the type of content we consume; it’s about how it fundamentally alters our brains. A 2025 systematic review of short-form video feeds (like TikTok, Instagram Reels, and YouTube Shorts), covering 71 studies and 98,299 participants, found profound cognitive consequences.
Increased engagement with these endless-scroll platforms is associated with poorer cognitive performance, specifically impacting our sustained attention and inhibitory control.
Psychologists point to a dual process of habituation and sensitization to explain this phenomenon. The rapid, high-stimulation nature of short videos desensitizes us to slower, more effortful tasks like reading a book or deep problem-solving. At the same time, the algorithm’s instant delivery of curated content sensitizes our brain’s reward system, reinforcing impulsive engagement patterns and encouraging the habitual seeking of instant gratification.
Heavy users of these platforms exhibit reduced electrophysiological activity during attention-demanding tasks. Some researchers even point to structural differences in key cognitive control regions, including the prefrontal cortex and striatal reward circuits, linked to this constant bombardment of highly rewarding algorithmic stimuli.
Thanks to all these mathematical matrices, each of us lives in our own personalised bubble of information.
In the short term, it’s annoying for most people. But zoom out and the picture darkens. When algorithms feed us content that confirms what we already believe, we experience confirmation bias on steroids.
These filter bubbles deepen the social divide we have today. We will continue to be extremely divided and there is no end in sight for this chasm.
Misinformation thrives in closed loops because false stories don’t get exposed to scrutiny outside the bubble. By the time a fact-check goes out, the original lie has done a lap around the platform and built a small army of believers.
And democracy, which depends on a shared baseline of reality and some willingness to argue in public, takes a hit when citizens occupy entirely different reality bubbles.
You’re not powerless here. The algorithm is responsive, and there are a couple of things you can do that, although annoying, may take you out of your bubble.
The same mechanism that built your bubble can be used to widen it. Some practical moves:
I hope this blog post showed you how these recommendation system bubbles work. We built a recommender on real news data, and it took three clicks to push politics from 13% to 40% of a sports fan’s feed.
The first step to breaking free is simply being aware. Next time you find yourself in an online frenzy, take a breath and ask: Why am I seeing this? Who benefits from me reacting this way? The answer usually traces back to an algorithm doing its job, and that job is rarely “informing you.”
Stay informed, stay open-minded,
— Ivo