Let's be honest. Writing code in 2025 is much easier than it was ten, or even five, years ago.
We moved from Fortran to C to Python, each step lowering the effort needed to get something working. Now tools like Cursor and GitHub Copilot can write boilerplate, refactor functions, and scaffold entire coding pipelines from a few lines of natural language.
At the same time, more people than ever are getting into AI, data science and machine learning. Product managers, analysts, biologists, economists, you name it, are learning how to code, understand how AI models work, and interpret data efficiently.
All of this to say:
The real difference between a Senior and a Junior Data Scientist is not the coding level anymore.
Do not get me wrong. The difference is still technical. It still depends on understanding data, statistics and modeling. But it is no longer about being the person who can invert a binary tree on a whiteboard or solve an algorithm in O(n).
Throughout my career, I have worked with some outstanding data scientists across different fields. Over time, I started to notice a pattern in how the senior professionals approached problems, and it was not about the specific models they adopted or their coding abilities: it was about the structured, organized workflow they used to turn a non-existing product into a robust data-driven solution.
In this article, I will describe the six-stage workflow that Senior Data Scientists use when developing a DS product or feature.
Throughout the article, I will expand on each of these points. My goal is that, by the end, you will be able to apply these six stages on your own and think like a Senior Data Scientist in your day-to-day work.
Let’s get started!
I get it, data professionals like us fall in love with the “data science core” of a product. We enjoy tuning models, trying different loss functions, playing with the number of layers, or testing new data augmentation tricks. After all, that is also how most of us were trained. At university, the focus is on the technique, not the environment where that technique will live.
However, Senior Data Scientists know that in real products, the model is only one piece of a larger system. Around it there is an entire ecosystem where the product needs to be integrated. If you ignore this context, you can easily build something clever that does not actually matter.
Understanding this ecosystem starts with asking questions like: Which product and system will host this feature? What is the feature supposed to improve? What do we expect to gain once it ships?
In a few words, before doing any coding or system design, it is crucial to understand what the product brings to the table.

Image made by author
Your answer from this step will sound something like this:
[My data product] aims to improve feature [A] for product [X] in system [Y]. The data science product will improve [Z]. We expect to gain [Q], improve [R], and decrease [T].
Ok, now that we have a clear understanding of the ecosystem, we can start thinking about the data product.
This is an exercise in switching chairs with the actual user. If we were the user of this product, what would our experience with the product look like?
To do that, we need to answer questions like:

Image made by author
As you may notice, we are getting into the realm of system design, but we are not quite there yet. This is more of a preliminary phase where we determine all the constraints, limits, and functionality of the system.
Ok, now we have a clear picture of the ecosystem and of the product's constraints, limits, and functionality. So we have everything we need to start the System Design* phase.
In a nutshell, we are using everything we have discovered earlier to determine how data enters the system, which model sits at its core, and how its output is served back to the product.
Tools you can use to brainstorm this part are Figma and Excalidraw. For reference, the image below shows the modelling piece of such a System Design, sketched in Excalidraw.

System Design made by author using Excalidraw
Now this is where the real skills of a Senior Data Scientist emerge. All the information you have accumulated so far must converge into your system. Do you have a small budget? Then training a 70B-parameter model is probably not a good idea. Do you need low latency? Then batch processing is not an option. Do you need a complex NLP application where context matters, and you only have a limited dataset? Maybe LLMs are an option.
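To make this reasoning explicit, here is a minimal sketch of how those constraints can be written down before any modelling starts. It assumes hypothetical fields such as `latency_ms_budget` and `training_budget_usd`, and the thresholds are placeholders, not universal rules:

```python
from dataclasses import dataclass


@dataclass
class SystemConstraints:
    """Hypothetical constraints gathered during the System Design phase."""
    latency_ms_budget: int    # max acceptable response time
    training_budget_usd: int  # how much we can spend on training
    labeled_examples: int     # size of the available dataset
    needs_realtime: bool      # online serving vs. batch scoring


def candidate_approaches(c: SystemConstraints) -> list:
    """Rule-of-thumb filtering of modelling options based on constraints.

    Mirrors the reasoning above: small budget -> no 70B-parameter training,
    low latency -> no batch processing, little labeled data for a complex
    NLP task -> lean on pre-trained / LLM-based approaches.
    """
    options = []
    if c.training_budget_usd < 10_000:
        options.append("small or pre-trained models (no 70B-parameter training)")
    if c.needs_realtime and c.latency_ms_budget < 200:
        options.append("lightweight online model (batch processing is out)")
    if c.labeled_examples < 5_000:
        options.append("pre-trained / LLM-based approach instead of training from scratch")
    return options


print(candidate_approaches(
    SystemConstraints(latency_ms_budget=150, training_budget_usd=5_000,
                      labeled_examples=2_000, needs_realtime=True)
))
```

The point is not the specific numbers: it is that every design choice can be traced back to a constraint you wrote down, instead of a personal preference.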
Keep in mind that this is still only “pen and paper”: no code is written just yet. However, at this point, we have a clear understanding of what we need to build and how. NOW, and only now, we can start coding.
*System Design is a huge topic per se, and to treat it in less than 10 minutes is basically impossible. If you want to expand on this, a course I highly recommend is this one by ByteByteGo.
When a Senior Data Scientist works on the modelling, the fanciest, most powerful, and sophisticated Machine Learning models are usually the last ones they try.
The usual workflow follows these steps:
1. Start from a reasonably simple model as a baseline.
2. Evaluate it against the product requirements.
3. Build your way up, adding complexity only if the simpler model falls short.
When I say “build your way up”, this is what I mean:

Image made by author
In a few words: we only increase the complexity when necessary. Remember: we are not trying to impress anyone with the latest technology, we are trying to build a robust and functional data-driven product.
When I say “reasonably simple” I mean that, for certain complex problems, some very basic Machine Learning algorithms might already be out of the picture. For example, if you have to build a complex NLP application, you will probably never use Logistic Regression, and it is safe to start from a more complex, pre-trained architecture from Hugging Face (e.g. BERT).
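As a sketch of what “start simple and build your way up” can look like in code, assuming a generic tabular classification task with scikit-learn (the toy dataset and the 0.80 target are placeholders, not real product requirements):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score

# Toy data standing in for the real product dataset
X, y = make_classification(n_samples=2_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Step 1: a reasonably simple baseline
baseline = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
baseline_f1 = f1_score(y_test, baseline.predict(X_test))
print(f"Baseline F1: {baseline_f1:.3f}")

# Step 2: add complexity only if the baseline falls short of the target
TARGET_F1 = 0.80  # placeholder: the real target comes from product requirements
if baseline_f1 < TARGET_F1:
    stronger = GradientBoostingClassifier().fit(X_train, y_train)
    print(f"Boosted F1: {f1_score(y_test, stronger.predict(X_test)):.3f}")
```

If the baseline already meets the target, you stop there: the simpler model is cheaper to run, easier to explain, and easier to maintain.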
One of the key differences between a senior figure and a more junior professional is the way they look at the model output.
Usually, Senior Data Scientists spend a lot of time manually reviewing the output. This is because manual evaluation is one of the first things that Product Managers (the people Senior Data Scientists will share their work with) do when they want to get a grasp of the model performance. For this reason, it is important that the model output looks “convincing” from a manual-evaluation standpoint. Moreover, by reviewing hundreds or thousands of cases manually, you might spot the cases where your algorithm fails. This gives you a starting point to improve your model if necessary.
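One simple way to make that manual review systematic is to pull out the cases where the model disagrees with the ground truth. This is only a sketch, assuming a pandas DataFrame with hypothetical `text`, `label`, and `prediction` columns:

```python
import pandas as pd

# Hypothetical evaluation set: true labels vs. model predictions
df = pd.DataFrame({
    "text":       ["great service", "refund not received", "app keeps crashing", "love it"],
    "label":      ["positive", "negative", "negative", "positive"],
    "prediction": ["positive", "positive", "negative", "positive"],
})

# Keep only the misclassified cases for manual inspection
errors = df[df["label"] != df["prediction"]]
print(errors)

# In practice you would sample a few hundred rows and review them, e.g.:
# errors.sample(n=min(300, len(errors)), random_state=0).to_csv("to_review.csv")
```

Reading through that error file is often where you discover the patterns (a specific class, a specific data source, a specific edge case) that drive the next iteration of the model.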
Of course, that is just the beginning. The next important step is to choose the most appropriate metrics for a quantitative evaluation. For example, do we want our model to properly represent all the classes/choices in the dataset? Then recall is very important. Do we want our model to be extremely precise when it makes a classification, even at the cost of sacrificing some coverage? Then we are prioritizing precision. Do we want both? AUC and F1 scores are our best bet.
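As a quick illustration of those trade-offs, here is a sketch using scikit-learn on made-up labels and scores (the numbers are purely illustrative):

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]                        # ground truth
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]                        # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.95, 0.2]   # predicted probabilities

print("Recall   :", recall_score(y_true, y_pred))     # how much of the positive class we cover
print("Precision:", precision_score(y_true, y_pred))  # how trustworthy a positive call is
print("F1       :", f1_score(y_true, y_pred))         # balance of the two
print("ROC AUC  :", roc_auc_score(y_true, y_score))   # ranking quality across thresholds
```

The code is trivial; the hard part is deciding, before you run it, which of these numbers actually matters for the product.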
In a few words: the best data scientists know exactly which metrics to use and why. Those metrics are the ones that will be communicated internally and/or to clients. Not only that, they will also be the benchmark for the next iteration: if someone wants to improve your model (for the same task), they have to improve on that metric.
Let’s recap where we are: we have understood the ecosystem, defined the product experience and its constraints, designed the system, built the model starting from a simple baseline, and evaluated it with the right metrics.
Now it is finally time to present our work. This is crucial: the quality of your work is only as high as your ability to communicate it. The first thing we have to understand is:
Who are we showing this to?
Whether we are showing this to a Staff Data Scientist for model evaluation, to a Software Engineer who will implement our model in production, or to a Product Manager who needs to report the work to higher-level decision makers, we will need different kinds of deliverables.
This is the rule of thumb: the more technical the audience (Staff Data Scientist, Software Engineer), the deeper you go into code, design choices, and evaluation metrics; the closer the audience is to the business (Product Manager, stakeholders), the more you focus on impact, gains, and the metrics they actually care about.

In 2025, writing code is not what distinguishes Senior from Junior Data Scientists. Senior Data Scientists are not “better” because they know the TensorFlow documentation off the top of their heads. They are better because they have a specific workflow that they adopt when they build a data-powered product.
In this article, we walked through the standard Senior Data Scientist workflow as a six-stage process:
1. Understand the ecosystem the product will live in.
2. Define the product from the user’s perspective, with its constraints and functionality.
3. Design the system, on pen and paper, before writing any code.
4. Model, starting simple and building your way up.
5. Evaluate, both manually and with the right quantitative metrics.
6. Present the work in the format your audience needs.
Thank you again for your time. It means a lot ❤️
My name is Piero Paialunga, and I’m this guy here:

Image made by author
I’m originally from Italy, hold a Ph.D. from the University of Cincinnati, and work as a Data Scientist at The Trade Desk in New York City. I write about AI, Machine Learning, and the evolving role of data scientists both here on TDS and on LinkedIn. If you liked the article and want to know more about machine learning and follow my studies, you can:
A. Follow me on Linkedin, where I publish all my stories
B. Follow me on GitHub, where you can see all my code
C. For questions, you can send me an email at [email protected]