At Fleetondemand, our commitment to world-class customer service relies on rock-solid technology infrastructure. Behind the scenes, our DevOps team ensures our platform runs smoothly 24/7, enabling our customer-facing teams to deliver the exceptional service that earned us a customer loyalty Net Promoter Score (NPS) of 75 (where 70+ is considered world-class).
We sat down with Stefan Oliwa, our DevOps Lead, to understand how his team's work in infrastructure, security, and automation directly impacts the reliability and quality of service our customers experience every day.
You've been in your DevOps Lead role with FOD for almost a year now. Tell us about what your role involves and the current projects you're involved in.
As infrastructure engineers, we're operational people, which means we have to cover every situation. When an issue arises or something is damaged, the main principle in our work (what I always call the "raining scenario") is: if something needs fixing and we go back to square one, how can our infrastructure be restored automatically without wasting time? Pragmatic solutions are mission-critical.
We need to oversee backup policies and security from every angle. When we use the word 'infrastructure', we mean all those logical bits and pieces working together, and we oversee how each component works with the others. We're also DevOps (Development and Operations), which means we're in the middle of everything.
If you make a diagram for DevOps, you'll see two key aspects. First, we continuously improve what we've built; we never truly finish a job. Second, we're a bridge between teams with different aims. For example, the development team always wants new features; that's their main goal.
As infrastructure engineers, we care about security and stability above everything. On top of that, we have stakeholders who want to plan new initiatives, and we're in the middle managing this feedback, trying to create a solution that satisfies all of those requirements.
You were recently nominated for our "Be Disruptive" value on HiBob for your work in AWS. What did that involve, and how did you transform our infrastructure?
When I joined FOD in November last year, I could see opportunities to modernise our infrastructure and adopt cloud-native best practices. There was discussion about migrating to a more modern architecture, and when I reviewed the AWS DevOps plan, I worked closely with Matt Heald and Dan Metcalfe to develop a clear roadmap with specific metrics and defined best practices.
One of the first opportunities I identified was implementing comprehensive observability. Previously, troubleshooting meant logging on to servers manually and reading through files, which was time-consuming. I knew we could do better. The right approach is being able to access dashboards, quickly navigate systems, and understand what's happening in real time.
I built the first prototype for observability, which we now use with our monitoring systems daily. During Christmas, I started shaping our migration plan. By January, we concluded we needed to move forward with dedicated focus rather than an ad-hoc approach.
Over the next 2-3 months, I created a comprehensive migration plan, developed our database continuity strategy, and worked closely with our Information Security Director Dan to ensure we documented everything properly and got it right.
We completed the migration in eight weeks. That was ambitious, but we achieved it. It involved carefully updating legacy systems and addressing gaps in our infrastructure.
We're now moving to our V2 architecture at the beginning of next year, and GT Suite production will be fully established on that new architecture.
What's the benefit of this new infrastructure for our teams?
You just log into the system, where you have dashboard visuals, and you can explore your logs, which are returned in a matter of seconds. One person told me this improvement has changed a lot for the better, because they can now read and preview tests, run experiments, and check errors easily.
When I showed this to Dave in back-end development, he said it's going to be a game changer for him and his team. That's why I consider myself disruptive. It was something everyone talked about and only dreamed of; I came to the office and got it done the way it needed to be done.
How does the functionality of our technology connect from a customer perspective? How does the work you do connect to the level of service our customers receive?
From the customer perspective, first things first: if we encounter an issue with the platform, we can fix it in under 30 minutes, not days or weeks. Secondly, if our clients have data storage mandates for their regions, we're flexible. We can recreate the current infrastructure in half an hour, create all the essential components, and have everything ready in an hour. That's huge for our customers' flexibility.
On top of that, there's resilience. We've had a few situations where something needed urgent attention, but the system healed itself. Our containers run in two different data centres that communicate 24/7. When the system recognised a problem in one of them, it automatically moved our resources to the other data centre in London. This happened in the background the whole time, and our customers experienced no disruption.
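To make the self-healing behaviour described above concrete, here is a minimal Python sketch of the general idea: when one site is marked unhealthy, the desired number of containers is redistributed across the sites that remain healthy. The names `Site` and `rebalance`, the site labels, and the container counts are hypothetical illustrations, not Fleetondemand's actual tooling; in practice a managed container scheduler would typically handle this.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Site:
    """A data centre that can host containers (hypothetical model)."""
    name: str
    healthy: bool


def rebalance(sites: List[Site], desired: int) -> Dict[str, int]:
    """Spread `desired` containers as evenly as possible over healthy sites.

    Unhealthy sites are assigned zero containers, so their workload
    moves to the remaining healthy site(s).
    """
    healthy = [s for s in sites if s.healthy]
    if not healthy:
        raise RuntimeError("no healthy site available")

    share, extra = divmod(desired, len(healthy))
    plan = {s.name: 0 for s in sites}
    for i, site in enumerate(healthy):
        # The first `extra` healthy sites take one additional container.
        plan[site.name] = share + (1 if i < extra else 0)
    return plan


# Both sites healthy: the load is split between them.
normal = rebalance([Site("dc-a", True), Site("dc-b", True)], desired=4)

# One site fails: everything moves to the other site.
failover = rebalance([Site("dc-a", False), Site("dc-b", True)], desired=4)
```

The point of the sketch is that failover is just a recomputation of the same plan with different health inputs, which is why it can run continuously in the background without anyone intervening.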
Could you explain what modularity means in relation to our infrastructure? I know it's an important aspect of how you've built things.
It's like building with Lego blocks. When we describe infrastructure now, we actually write it down as code rather than configuring it by hand. This is called 'infrastructure-as-code'. Everything is driven by variables: how powerful the computing instance should be, how much storage it has, which IP addresses can communicate with each other. That's the type of modularity we're talking about.
It also works like templating, which allows you to change one part of a template or extend it separately from other components. You can change something in one area without it affecting everything else. This makes our infrastructure incredibly flexible and maintainable.
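As a rough illustration of the variable-driven, template-like modularity described above, the following Python sketch models an environment as a small set of typed variables and derives a variant by changing one of them. This is only an analogy for how infrastructure-as-code tools work; the class names, the field names, and the example values (including the region and address range) are assumptions made for the example, not Fleetondemand's real configuration.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ComputeModule:
    """One reusable 'block': compute settings driven by variables."""
    instance_size: str               # how powerful the instance should be
    storage_gb: int                  # how much storage it gets
    allowed_cidrs: Tuple[str, ...]   # which address ranges may talk to it


@dataclass(frozen=True)
class Environment:
    """A full environment assembled from modules."""
    name: str
    region: str
    compute: ComputeModule


def render(env: Environment) -> dict:
    """Turn the typed description into a plain, tool-agnostic dictionary."""
    return {
        "name": env.name,
        "region": env.region,
        "compute": {
            "instance_size": env.compute.instance_size,
            "storage_gb": env.compute.storage_gb,
            "allowed_cidrs": list(env.compute.allowed_cidrs),
        },
    }


# A base template (all values are hypothetical).
base = Environment(
    name="staging",
    region="eu-west-2",
    compute=ComputeModule("small", 50, ("10.0.0.0/16",)),
)

# Change one variable in one module; everything else is untouched.
bigger = replace(base, compute=replace(base.compute, instance_size="large"))
```

Because the description is immutable data, `bigger` differs from `base` only in the one variable that was changed, which mirrors the idea of altering one area without affecting the rest.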
Fleet data is highly confidential. How do we ensure the security of data for our clients?
I learned important lessons on security in my previous role working with an NHS manager. I asked him about data security between GPs, and he confirmed what I'd always believed: the only truly secure things are those that stay offline and aren't accessible via the internet.
That shaped my thinking about our security approach. If you need access to something that allows you to make changes, that access should follow the principle of least privilege: granted to specific people, for a limited time.
Those permissions require robust authentication, similar to how you get two-factor authentication when logging into email. We've moved away from the outmoded way of connecting to servers, and we have implemented proper authentication protocols instead.
Secondly, we trust AWS for security: it does great things and is trusted by tech giants, financial institutions and the armed forces. But as with all security solutions, you have to continuously monitor and evaluate its performance.
So, we encrypt everything we store within AWS. Every component should be encrypted. We also create isolation as far as we can between different systems and data.
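The least-privilege, time-limited access described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `AccessGrant` type, the `grant` and `is_allowed` functions, and the example user and action strings are all hypothetical, and a real deployment would rely on the cloud provider's identity and access management rather than hand-written checks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    """Permission for one user to perform one action until a deadline."""
    user: str
    action: str
    expires_at: datetime
    mfa_verified: bool


def grant(user: str, action: str, minutes: int,
          mfa_verified: bool, now: datetime) -> AccessGrant:
    """Issue a grant that expires `minutes` after `now`."""
    return AccessGrant(user, action, now + timedelta(minutes=minutes), mfa_verified)


def is_allowed(g: AccessGrant, user: str, action: str, now: datetime) -> bool:
    """Allow only the named user, the named action, with MFA, before expiry."""
    return (
        g.user == user
        and g.action == action
        and g.mfa_verified
        and now < g.expires_at
    )


# Hypothetical example: a 30-minute grant for a single action.
issued_at = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
g = grant("alice", "db:restore", minutes=30, mfa_verified=True, now=issued_at)

within_window = is_allowed(g, "alice", "db:restore", issued_at + timedelta(minutes=10))
after_expiry = is_allowed(g, "alice", "db:restore", issued_at + timedelta(minutes=31))
```

Each check fails closed: a different user, a different action, a missing second factor, or an expired window all result in access being denied.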
Do you have any plans for the next 12 months and upcoming projects?
My main aim in DevOps (and I support this in other companies as well) is to avoid situations where a client comes to us and says something is down. Ensuring this doesn't happen means we're achieving one of our most important goals as a company: delivering a reliable and consistent service.
Netflix is my inspiration here. I remember calling them once with a problem. They picked up in less than two minutes and said, "Good afternoon, Stefan, how can I help you?" I wanted to explain my TV problem, but they said, "We see you're having an issue with loading time." They knew the exact problem and recommended a solution right away.
My point is, I love situations when a client calls and we can say, "We know you're experiencing this issue, and our engineer is already working on it." That's a huge difference. My aim is to create tools for Technical Support where they can see the client ID and any errors on the endpoint, so they're totally prepared for what someone is struggling with and how to solve it.
I'm even exploring merging this with AI so our tech support team can automatically generate emails that are easy to understand for non-technical people, tailored to whoever is involved.
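The support tooling Stefan describes, where Technical Support can look up a client ID and see any errors on the endpoint, could look something like the Python sketch below. It is a hedged illustration only: the `ErrorEvent` type, the `index_by_client` and `summarise` functions, and the sample client IDs and endpoints are hypothetical, and a real tool would read events from the monitoring system rather than an in-memory list.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ErrorEvent:
    """One error observed for a client on an endpoint (hypothetical shape)."""
    client_id: str
    endpoint: str
    message: str


def index_by_client(events: List[ErrorEvent]) -> Dict[str, List[ErrorEvent]]:
    """Group error events by client ID for fast lookup."""
    index: Dict[str, List[ErrorEvent]] = defaultdict(list)
    for event in events:
        index[event.client_id].append(event)
    return dict(index)


def summarise(index: Dict[str, List[ErrorEvent]], client_id: str) -> str:
    """Produce a one-line, plain-language summary for a support agent."""
    events = index.get(client_id, [])
    if not events:
        return f"No recent errors recorded for client {client_id}."
    endpoints = sorted({e.endpoint for e in events})
    return (
        f"Client {client_id}: {len(events)} recent error(s) "
        f"on {', '.join(endpoints)}."
    )


# Sample data (entirely made up for the example).
events = [
    ErrorEvent("c-101", "/bookings", "timeout"),
    ErrorEvent("c-101", "/invoices", "500 Internal Server Error"),
    ErrorEvent("c-202", "/bookings", "timeout"),
]
index = index_by_client(events)
summary = summarise(index, "c-101")
```

With a lookup like this in front of them, a support agent can open the conversation already knowing which endpoints are affected for the caller.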
Have you used AI much in your current role or previously?
AI is quite trendy now, but I've been involved with it for years. I actually built a machine learning model that upscaled images. I'm a big fan of the technology, and since NVIDIA started accelerating this space, I've bought accelerators and broadened my knowledge of AI in my spare time.
I remember in high school, my IT professor asked why I didn’t attend his lessons. I told him he was teaching a language with no future. At the end of the year, he asked me to bring something to class to demonstrate my knowledge. People thought I wasn't going to pass, but I received the highest score possible. My professor told the class, "If anyone questions Stefan's score, I will question all of your scores," because he knew I'd built the right project.
I say the same thing to my daughter, who's six years old, about IT and traditional languages. I'm going to show her what to do with large language models and their implementations. I want her to be able to learn cutting-edge technology rather than what I see as ancient technology that won't be applicable in ten years if she wants to be an IT specialist.
Building for the Future
Stefan's work demonstrates how modern infrastructure and DevOps practices directly enable the exceptional customer service that defines Fleetondemand. By building resilient, secure, and automated systems, his team ensures our platform delivers the reliability our customers depend on, contributing directly to the world-class NPS score that we're proud to have achieved.
When infrastructure runs smoothly in the background, our customer-facing teams can focus entirely on delivering outstanding service. That's the power of having the right technical foundation in place.





