Whether you’re a manager, a data scientist, an engineer, or a product owner, you’ve almost certainly been in at least one meeting where the discussion revolved around “putting a model in production.”
But seriously, what does production even mean?
As you may know, I’m an AI engineer. I started my first data science job in 2015, in a large French company in the energy sector. At the time, we were among the first players building AI applications for energy management and production (nuclear, hydraulic, and renewable). And if there’s one domain where putting AI into production is heavily regulated, it’s energy, especially nuclear. This is closely tied to the nature of the data and to the fact that you can’t easily push machine learning models into an existing environment.
Thanks to this experience, I learned very early that creating a model in a notebook is just the tip of the iceberg. I also started talking about production very quickly, without really knowing what it meant. For these reasons, I want to share with you the clearer view I’ve developed over the years when it comes to pushing machine learning projects into production.
But let’s pause for a moment and think about our main question.
What does production actually mean?
Sometimes, what’s behind this buzzword, “production,” can be hard to pin down. There are countless YouTube videos and articles about it, but very few that translate into something you can actually apply in real projects.
If you try to answer it, our views will likely converge by the end of this article, even if the methods we use to reach production can differ from one context to another.
In the context of machine learning, production means that your model’s outputs directly affect a user or a product.
That impact can take many forms, such as educating someone, helping them make a decision, or enabling something they couldn’t do before; it can also mean adding a feature to a shopping app’s recommendation system.
Any program containing a machine learning algorithm that is used by an end user, or by another product or application, can be considered a model in production.
Beyond having impact, production also comes with a layer of accountability. What I mean is that if no person or system is responsible for correcting the model when it is wrong, then your model may be deployed, but it is not in production.
There’s a common idea that 87% of ML projects fail to reach the final stage of production. I don’t know if that is strictly true, but my interpretation is simple: many ML models never reach the point where they actually have an impact on a user or a product. And even when they do, there is often no system in place to make them reliable over time, so they are just deployed and accessible.
So if we agree that production means having an ML project that is impactful and accountable, how do we get there?
To answer that, we need to accept that production has many faces. The model is only one component inside a larger ETL pipeline.
This point is crucial.
We often imagine a model as a black box: data goes in, math magic happens, and a prediction comes out. In reality, that’s a big oversimplification. In production, models are usually part of a broader data flow, often closer to a data transformation than an isolated decision engine.
Also, not all “production” looks the same; a lot depends on how much authority the model has in the final system.
Sometimes the model supports a decision, like a score, a recommendation, an alert, or a dashboard.
Sometimes it makes a decision, such as automatic actions, real-time blocking, or triggering workflows.
The difference matters a lot. When your system acts automatically, the cost of a mistake is not the same, and the engineering requirements usually increase very fast.
From my experience, most production systems can be broken down into:
→ The data storage layer: all data is stored in file systems or databases that are safely hosted in production environments (cloud or on-premise).
→ The data acquisition layer: a system or workflow that connects to the production databases and retrieves the data used as input for the model. These workflows can also include the data preparation steps (a minimal sketch follows this list).
→ The machine learning component: this is the part that interests us. The model is already trained, and we need a system that allows it to run in the same environment as the other components.
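To make the second component more tangible, here is a minimal sketch of a data acquisition step, assuming a SQL database; the table and column names are purely illustrative.

```python
# A minimal, illustrative data acquisition step (component 2).
# The database path, table, and column names are hypothetical.
import sqlite3
import pandas as pd

def fetch_model_inputs(db_path: str = "production.db") -> pd.DataFrame:
    """Connect to a production database, pull raw records, and prepare features."""
    with sqlite3.connect(db_path) as conn:
        raw = pd.read_sql(
            "SELECT customer_id, consumption_kwh, temperature FROM readings", conn
        )
    # Basic preparation: drop incomplete rows so the model never sees missing values.
    return raw.dropna()
```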
These three parts show us clearly that ML in production is not about the machine learning model itself, it’s about everything around it.
But let’s focus only on component 3, “pushing the ML into production,” because the other steps are often handled by different teams in a company.
If I had a junior data scientist to whom I needed to explain how to work on this component, I would separate it as follows:
You start with a trained model. The first thing you need is a function, some code that loads the model, receives input data, performs the prediction, and returns an output.
At this stage, everything works locally. It’s exciting the first time you see predictions appear, but we don’t want to stop there.
A practical detail that matters early: don’t only think “does it predict?”, also think “does it fail cleanly?” In production, your function will eventually receive weird inputs, missing values, unexpected categories, corrupted files, or out-of-range signals. Your future self will thank you for basic validation and clear error messages.
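As a minimal sketch of what such a function can look like, assuming a scikit-learn model saved with joblib (the file name and feature count are illustrative):

```python
# A minimal prediction function: load the model once, validate inputs, predict.
# "model.joblib" and the expected number of features are illustrative assumptions.
import joblib
import numpy as np

model = joblib.load("model.joblib")
N_FEATURES = 4  # whatever your model was trained on

def predict(features: list[float]) -> float:
    # Fail cleanly on malformed inputs instead of letting the model crash later.
    if len(features) != N_FEATURES:
        raise ValueError(f"Expected {N_FEATURES} features, got {len(features)}")
    x = np.asarray(features, dtype=float).reshape(1, -1)
    if np.isnan(x).any():
        raise ValueError("Input contains missing values")
    return float(model.predict(x)[0])
```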
To make this function usable by others (without asking them to run your code), you need an interface, most often an API.
Once deployed, this API receives standardized requests containing input data, passes them to your prediction function, and returns the output. This is what allows other systems, applications, or users to interact with your model.
And here is a production reality: the interface is not only a technical thing, it’s a contract. If another system expects /predict and you expose something else, friction is guaranteed. The same applies if you change the schema every two weeks. When teams say “the model is in production,” many times what they really mean is “we created a contract that other people depend on.”
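Here is a hedged sketch of such a contract with FastAPI: a /predict route with an explicit request and response schema. The model file and field names are assumptions for illustration, not a prescription.

```python
# A minimal FastAPI interface around the prediction step.
# The /predict route and the request schema are the "contract" other systems rely on.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.joblib")  # illustrative file name

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest) -> PredictionResponse:
    try:
        value = float(model.predict([request.features])[0])
    except ValueError as exc:
        # Bad inputs become an explicit 400 instead of a silent crash.
        raise HTTPException(status_code=400, detail=str(exc))
    return PredictionResponse(prediction=value)
```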
Now we need portability. That means packaging the environment, the code, the API, and all dependencies so the system can run elsewhere without modification.
If you’ve followed the steps so far, you’ve built a model, wrapped it in a function, and exposed it through an API. But none of that matters if everything stays locked in your local environment.
This is where things become more professional: reproducibility, versioning, and traceability. Not necessarily fancy, just enough so that if you deploy v1.2 today, you can explain in three months what changed and why.
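One lightweight way to get some of that traceability, as a sketch: write a small metadata file next to every model artifact. The fields below are only an example, not a standard.

```python
# A lightweight traceability sketch: store metadata next to every model artifact.
# Field names and values are illustrative.
import json
import hashlib
from datetime import datetime, timezone

def save_model_metadata(model_path: str, version: str, training_data_path: str) -> None:
    with open(training_data_path, "rb") as f:
        data_hash = hashlib.sha256(f.read()).hexdigest()
    metadata = {
        "model_path": model_path,
        "version": version,                 # e.g. "1.2"
        "training_data_sha256": data_hash,  # which data produced this model
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    with open(f"{model_path}.metadata.json", "w") as f:
        json.dump(metadata, f, indent=2)
```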
The final step is hosting everything somewhere users or applications can actually access it.
In practice, this often means the cloud, but it can also be internal company servers or edge infrastructure. The key point is that what you built must be reachable, stable, and usable where it’s needed.
And this is where many teams learn a hard lesson. In production, the “best model” is often not the one with the best metric in a notebook. It’s the one that fits real constraints: latency, cost, security, regulation, monitoring, maintainability, and sometimes simply, “can we operate this with the team we have?”
You can have the cleanest API and the nicest infrastructure, and still fail in production because you don’t see problems early.
A model in production that isn’t monitored is basically broken already; you just don’t know it yet.
Monitoring doesn’t have to be complicated. At minimum, you want to know:
→ Is the service up and answering requests?
→ Do the incoming inputs still look like the data the model was trained on?
→ Do the outputs still make sense for the user or the business?
In many real-world projects, performance doesn’t collapse with a big crash. It decays quietly.
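As an illustration, a monitoring hook does not need much: log every prediction and compare incoming features against statistics recorded at training time. The reference values and the threshold below are illustrative assumptions.

```python
# A minimal monitoring sketch: log predictions and flag quiet input drift.
# The reference statistics and threshold are illustrative assumptions.
import logging
import numpy as np

logger = logging.getLogger("model_monitoring")

# Feature means/stds recorded at training time (illustrative values).
TRAIN_MEAN = np.array([12.3, 0.45, 78.0])
TRAIN_STD = np.array([3.1, 0.12, 15.0])

def log_and_check(features: np.ndarray, prediction: float, threshold: float = 3.0) -> None:
    """Log every prediction and warn when inputs drift far from the training distribution."""
    logger.info("prediction=%s features=%s", prediction, features.tolist())
    z_scores = np.abs((features - TRAIN_MEAN) / TRAIN_STD)
    if (z_scores > threshold).any():
        logger.warning("Possible input drift: z-scores=%s", z_scores.round(2).tolist())
```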
Having all these components in place is what turns a model into something useful and impactful. Based on experience, here are a few practical guidelines.
For Step 1 (The Function), stick to tools you know (scikit-learn, PyTorch, TensorFlow), but think about portability early. Formats like ONNX can make future automation much easier. If you develop your own packages, you need to be sure, whether you are a manager or a data scientist, that the required software engineering or data engineering skills are present, because building internal libraries is a very different story from using off-the-shelf tools.
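For example, a scikit-learn model can be exported to ONNX with the skl2onnx package; the snippet below is only a sketch with a placeholder dataset and model.

```python
# A sketch of exporting a scikit-learn model to ONNX with skl2onnx (assumed installed).
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Declare the input signature: batches of 4 float features.
onnx_model = convert_sklearn(model, initial_types=[("input", FloatTensorType([None, 4]))])
with open("model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```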
For Step 2 (The Interface), frameworks like FastAPI work very well, but always think about the consumer: as mentioned earlier, exposing a different route or schema than what the other system expects guarantees friction. You need to be aligned with your stakeholders, and all technical points about where the machine learning output goes should be very clear.
For Step 3 (The Environment), this is where Docker comes in. You don’t need to master everything immediately, but you should understand the basics. Think of Docker as putting everything you built into a box that can run almost anywhere. If you already have good data engineering skills, this should be fine. If not, you either need to build them or rely on someone in the team who has them.
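As a sketch, a container image for the API described above could look like this; file names and versions are illustrative assumptions.

```dockerfile
# Illustrative Dockerfile: package the API, the model, and the dependencies together.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY main.py model.joblib ./
# Serve the FastAPI app defined in main.py (assumes uvicorn is listed in requirements.txt).
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```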
For Step 4 (The Infrastructure), constraints dictate choices: Lambda functions, microservices, edge devices, and of course GPUs. ML workloads often need specialized infrastructure, sometimes via managed services like SageMaker.
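To make the Lambda option concrete, here is a hedged sketch of a handler wrapping the prediction step; it assumes an API Gateway proxy integration and an illustrative model file packaged with the function.

```python
# A sketch of serving the model from AWS Lambda (API Gateway proxy integration assumed).
import json
import joblib

model = joblib.load("model.joblib")  # packaged with the function; name is illustrative

def lambda_handler(event, context):
    features = json.loads(event["body"])["features"]
    prediction = float(model.predict([features])[0])
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```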
Across all steps, one rule that saves lives: always keep a simple way to roll back. Production is not only about deploying, it’s also about recovering when reality hits.
Don’t consider this step of your data science project as a single milestone. It’s a sequence of steps and a shift of mindset. In a company, nobody is waiting for you to push the most complicated model; what matters is a model that answers business questions or adds a feature expected by a specific product. We need this model to reach the product or the user, and to be monitored so that people keep trusting and using it.
Understanding your environment is very important. The tools I mentioned before can differ from one team to another, but the methodology is the same. I’m sharing them only to give you a concrete idea.
You can build a great model, but if no one uses it, it doesn’t matter.
And if people use it, then it becomes real, it needs ownership, monitoring, constraints, and a system around it.
Don’t let your work stay in the 87%.
Note: Some parts of this article were initially written in French and translated into English with the assistance of Gemini.
🤝 Stay Connected
If you enjoyed this article, feel free to follow me on LinkedIn for more honest insights about AI, Data Science, and careers.
👉 LinkedIn: Sabrine Bendimerad
👉 Medium: https://medium.com/@sabrine.bendimerad1
👉 Instagram: https://tinyurl.com/datailearn