
Engagement Overview
Product and Delivery Lead in a crucial cloud transformation programme focused on modernising and enhancing the infrastructure used by vital law enforcement agencies across the UK. Delivering key solutions within a multi-tenanted cloud hosting platform, with an emphasis on scalability, security, resilience, and cost optimisation, ultimately supporting improved policing outcomes.
FinOps
Problem
The platform’s cloud spend was accelerating with minimal visibility, an outdated charging model, and no clear ownership of financial decisions across teams. The result was poor assessment ratings and a pressing need to regain control, stabilise costs, implement effective governance and embed a FinOps-by-Design operating model.
Outcome
Accelerated the programme’s FinOps capability by strengthening cost controls, governance and oversight of cloud spend, progressing the programme from ‘Pre-crawl’ to ‘Walk’ maturity in assessments and achieving the best-in-class role-model rating for AWS Landing Zones within the public sector.
Achieved platform savings of £150,000 per month through the adoption of Compute Optimiser recommendations, Instance Scheduling, migrating to Graviton instances, migrating logs from Splunk to the Unified Lake and moving to a shared Home Office Jira account. Increased cost visibility and transparency through the implementation of QuickSuite dashboards and stakeholder buy-in, whilst also identifying a further £200,000 in potential savings across resources, architecture and licences.
Implemented a usage-based recharging model for tenants, ensuring fairer, more transparent cost allocation aligned with actual consumption, which reduced programme costs and achieved a cost-neutral platform. This replaced the previous t-shirt-sizing approach, under which tenants did not understand their spend and the platform absorbed the shortfall.
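As an illustration of the principle behind usage-based recharging, the sketch below allocates a shared platform cost to tenants in proportion to their metered consumption. The tenant names and figures are hypothetical, not the programme's actual charging data:

```python
# Hypothetical sketch of a usage-based recharging model: each tenant pays
# its own metered usage plus a proportional share of common platform costs,
# replacing fixed t-shirt-size charges.

def recharge(tenant_usage: dict[str, float], shared_cost: float) -> dict[str, float]:
    """Return each tenant's charge: metered usage plus a share of
    shared platform cost proportional to that usage."""
    total_usage = sum(tenant_usage.values())
    charges = {}
    for tenant, usage in tenant_usage.items():
        share = usage / total_usage if total_usage else 0.0
        charges[tenant] = round(usage + share * shared_cost, 2)
    return charges

# Illustrative figures (GBP per month): tenant-a drives 80% of usage,
# so it carries 80% of the shared cost.
print(recharge({"tenant-a": 40_000.0, "tenant-b": 10_000.0}, shared_cost=5_000.0))
```

Because every pound of shared cost is distributed across tenants, the platform absorbs nothing, which is what makes the model cost-neutral.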
Managed the implementation of a new tagging policy, driving platform-wide compliance through pipeline enforcement and change-process stage gates so that new resources adhered to the policy, and holding teams accountable for updating existing resources via tagging compliance dashboards and regular reviews. This raised tagging compliance from under 20% to 80%, giving the platform a clearer view of cost allocation and security posture.
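The kind of check a pipeline stage gate can run is sketched below: resources missing any required tag fail the gate, and the same logic can feed a compliance dashboard. The required tag keys are illustrative, not the programme's actual policy:

```python
# Hypothetical tag-compliance check for a pipeline stage gate.
# REQUIRED_TAGS is an illustrative policy, not the real one.

REQUIRED_TAGS = {"CostCentre", "Owner", "Environment"}

def missing_tags(resource_tags: dict[str, str]) -> set[str]:
    """Return the required tag keys that are absent or empty on a resource."""
    present = {key for key, value in resource_tags.items() if value}
    return REQUIRED_TAGS - present

def compliance_rate(resources: list[dict[str, str]]) -> float:
    """Fraction of resources carrying every required tag (dashboard metric)."""
    if not resources:
        return 0.0
    compliant = sum(1 for tags in resources if not missing_tags(tags))
    return compliant / len(resources)
```

In a pipeline, a non-empty `missing_tags` result on any new resource would fail the build; `compliance_rate` over the existing estate is the figure tracked from under 20% to 80%.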
In addition to improving cost visibility, allocation and efficiency, implemented cost governance and control measures such as anomaly management processes and budget controls. Embedded FinOps processes across the entire delivery lifecycle, including the early design stages, to ensure a FinOps-by-Design approach.
Observability
Problem
Observability across the platform was fragmented. With no unified solution in place, individual teams were building their own bespoke monitoring and alerting setups instead of adopting a shared, scalable approach. Poor Splunk performance drove a mean time to triage of 45 minutes, monitoring-as-code practices were neither understood nor adopted, and there was no obsolescence tracking or resilience checking. The team’s scope was too narrow to provide meaningful end-to-end visibility or governance.
Outcome
Defined a 24-month roadmap to address the platform’s critical observability and monitoring gaps. Consolidated Operational Monitoring and Protective Monitoring into a single, cohesive team to improve scope, governance and visibility. Led delivery of the modernised observability roadmap, addressing gaps in existing monitoring, the absence of shared solutions and automation, feature development and technical debt.
Delivered a shared Alert Manager capability that ingested alerts from disparate data sources, standardised and enriched them, and routed them into ServiceNow. This replaced the fragmented, team-built solutions that were unreliable, inconsistent and difficult to scale. By moving to a single platform service, teams and tenants could onboard quickly, reduce operational risk, and ensure that critical alerts were captured and acted on consistently.
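The normalise-enrich-route pattern behind such a capability can be sketched as follows. All field names, severity mappings and team names here are hypothetical, and the final hand-off to ServiceNow is only indicated, not implemented:

```python
# Hypothetical sketch of a shared alert pipeline: source-specific payloads
# are normalised onto one schema, enriched with ownership metadata, and
# then handed to a downstream ticketing system such as ServiceNow.

from dataclasses import dataclass

@dataclass
class Alert:
    source: str
    severity: str          # normalised: "critical" | "warning" | "info"
    summary: str
    assignment_group: str = ""

# Illustrative mapping from source-specific severities to the shared scheme.
SEVERITY_MAP = {"P1": "critical", "crit": "critical", "warn": "warning"}

def normalise(raw: dict, source: str) -> Alert:
    """Map a source-specific payload onto the shared alert schema."""
    severity = SEVERITY_MAP.get(str(raw.get("severity", "")), "info")
    return Alert(source=source, severity=severity, summary=str(raw.get("message", "")))

def enrich(alert: Alert, ownership: dict[str, str]) -> Alert:
    """Attach the owning team so the resulting ticket routes to the right queue."""
    alert.assignment_group = ownership.get(alert.source, "platform-ops")
    return alert  # a real pipeline would now create a ServiceNow incident
```

Onboarding a new tenant then reduces to registering a severity mapping and an ownership entry, rather than building a bespoke monitoring stack.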
Migrated from the legacy monitoring solution, which relied on monitoring-as-code with Prometheus and Grafana, to a cloud-native approach: a service wrapper with templates that empowered teams to define their own alarms and dashboards.
Implemented a Unified Data Lake to reduce dependency on Splunk, improving data availability and lowering ingestion costs; this cut Mean Time to Triage (MTTT) from 45 minutes to eight and reduced the Splunk bill by 40%. Led the rearchitecture of the Splunk implementation, including removing the resource-intensive Heavy Forwarders and automating the Splunk onboarding process.