Generative AI
Capco identifies compelling use cases and unlocks game-changing Gen AI solutions – safely, securely and responsibly.
Chinese technology start-up DeepSeek has become an overnight sensation with the launch of its new open-source AI model, upending the received wisdom around the scalability and costs of Large Language Models (LLMs) and calling into question the dominance of US-based AI firms.
DeepSeek’s R1 “pure reinforcement learning” model has been hailed for delivering equivalent or better results than existing LLMs such as ChatGPT, using less data and at a significantly lower cost. Opening up new opportunities for AI democratization through the orchestration of smaller models*, R1 not only addresses the cost and scalability concerns that have dogged established LLMs but also creates opportunities to explore and fine-tune models more effectively to meet specific needs (*not to be confused with ‘small language models’, a separate and interesting topic).
As a challenger to existing offerings from established AI players such as OpenAI, Anthropic and Google, the R1 model seems to live up to DeepSeek’s claim of “unlock[ing] new levels of intelligence in artificial systems, paving the way for more autonomous and adaptive models in the future”.1 DeepSeek has achieved this by automating the ‘post-training’ stage that follows pre-training, a step which until now has relied heavily on human feedback for validation and/or refinement.
“This… is not only an “aha moment” for the model but also for the researchers observing its behavior. It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies.” – DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, January 2025.1
What, then, does the arrival of the R1 model mean for the financial services industry? Two headline takeaways stand out. First, following the seismic and rapid impact of ChatGPT upon its launch in late 2022, the pace of change in this field is still accelerating. Second, the expectation that smaller models were coming, and would be a compelling option for organizations, has been borne out.
As DeepSeek notes in its paper: “We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through [reinforcement learning] on small models.” For organizations that previously felt large LLMs were the only game in town, the R1 model now offers a real alternative for addressing their problem statements.
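To make the distillation idea concrete, the sketch below shows one common form of it: a larger ‘teacher’ model generates worked responses, and a smaller ‘student’ model is then fine-tuned on those outputs. The function names and prompts here are illustrative placeholders, not DeepSeek’s actual pipeline.

```python
# Sketch of response-based distillation: a large "teacher" model generates
# worked answers, which become supervised fine-tuning data for a smaller
# "student" model. call_teacher is a placeholder, not a real API.
import json

PROMPTS = [
    "Summarize the key credit risks in this loan application narrative.",
    "Explain, step by step, how a failed trade settlement is resolved.",
]

def call_teacher(prompt: str) -> str:
    """Placeholder for a call to the large teacher model (e.g. an internal endpoint)."""
    return "<teacher-generated reasoning and answer>"

def build_distillation_set(prompts: list[str]) -> list[dict]:
    # Pair each prompt with the teacher's full response so the student
    # learns to imitate the teacher's reasoning style.
    return [{"prompt": p, "completion": call_teacher(p)} for p in prompts]

if __name__ == "__main__":
    with open("distill_train.jsonl", "w") as f:
        for record in build_distillation_set(PROMPTS):
            f.write(json.dumps(record) + "\n")
    # The resulting file can then drive standard supervised fine-tuning
    # of a smaller open model.
```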
For example, the lower data requirements of smaller language models give firms the opportunity to curate and leverage their own training datasets. Using proprietary training data rather than relying on hyperscaler datasets can enable key benefits, notably around data security, domain relevance and reduced dependence on external vendors.
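As a minimal illustration of that curation step, the sketch below filters out short fragments and exact duplicates from a set of internal documents before fine-tuning; the length threshold and record format are assumptions made for the example.

```python
# Sketch: curate an in-house fine-tuning corpus by dropping short fragments
# and exact duplicates. MIN_CHARS is an illustrative threshold.
import hashlib

MIN_CHARS = 200  # fragments shorter than this rarely carry useful signal

def curate(documents: list[str]) -> list[str]:
    seen_hashes: set[str] = set()
    curated: list[str] = []
    for text in documents:
        text = text.strip()
        if len(text) < MIN_CHARS:
            continue  # too short to be useful training text
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate of an earlier document
        seen_hashes.add(digest)
        curated.append(text)
    return curated

# Raw internal documents in, a smaller curated training set out.
corpus = curate(["An internal policy memo. " * 20, "Too short."])
print(f"Kept {len(corpus)} document(s)")
```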
There has already been plenty of discussion around the benefits of building AI capability in an agnostic way – that is, avoiding vendor lock-in to ensure firms have sufficient flexibility to adapt to market changes and benefit from ongoing AI innovation. The R1 model underlines the importance of this agnostic perspective – and it will only be a matter of time before we see a response from the hyperscalers, the emergence of other AI startups, and the evolution of SaaS providers that could look to integrate multiple AI models to offer more flexible AI-powered solutions. Indeed, OpenAI has already moved rapidly to regain the initiative (and reclaim the headlines) with the announcement of 'deep research' for ChatGPT, a new agent "that uses reasoning to synthesize large amounts of online information and complete multi-step research tasks".
The follow-on question is how organizations should build enterprise solutions, and the models that sit behind them, in a flexible fashion. In line with the solutions we have been implementing in our own AI Labs, our advice to clients is to focus on scalable enterprise solutions that allow easy model swaps, providing flexibility while also minimizing transition costs. By embedding modularity and interoperability into the solution architecture early on, organizations can future-proof their AI investments against rapid advancements in the field.
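A minimal sketch of that model-swap pattern follows; the class and method names are illustrative rather than any particular vendor's SDK. Application code depends only on a narrow interface, so changing providers becomes a configuration decision rather than a rewrite.

```python
# Sketch of a provider-agnostic model interface: business logic targets
# ChatModel, and concrete providers plug in behind it. Names are illustrative.
from abc import ABC, abstractmethod

class ChatModel(ABC):
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class HostedModel(ChatModel):
    def complete(self, prompt: str) -> str:
        # A real implementation would call a cloud provider's SDK here.
        return f"[hosted] response to: {prompt}"

class LocalDistilledModel(ChatModel):
    def complete(self, prompt: str) -> str:
        # A real implementation would invoke an on-prem distilled model here.
        return f"[local] response to: {prompt}"

def summarize_risks(model: ChatModel, filing_text: str) -> str:
    # The application never references a concrete provider, so swapping
    # models is a one-line change at the call site or in configuration.
    return model.complete(f"Summarize the key risks in: {filing_text}")

print(summarize_risks(LocalDistilledModel(), "excerpt from an annual report"))
```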
While DeepSeek’s claim that it spent a mere $6 million on training its new model is already the focus of debate (it does not appear to take account of the substantially higher costs associated with, for example, data collection or GPU hardware), it does reflect a continued downward trend in the cost of AI training (speaking in late 2024, AWS CEO Matt Garman described these costs as having decreased “two or three times over the last 18 months”).2 These compounded reductions are now making AI at scale increasingly viable, including for token- or subscription-based offerings like GPT. As a result, we are seeing stronger business cases and improved ROI.
Data security is another key consideration for financial services firms, which typically prefer to use trusted cloud partners such as Azure, GCP and AWS for their AI needs. While there is interest in open-source solutions, they often raise additional questions about governance and compliance – and we do not expect an immediate shift in this attitude. It is worth noting that, as this article was being finalized, Microsoft announced that – following “rigorous red teaming and safety evaluations” – it had added DeepSeek R1 to Azure AI Foundry and GitHub, and plans to bring distilled DeepSeek R1 models to Copilot+ PCs.3
Considering all of this, organizations should not rush to pivot to DeepSeek. They should instead recognize that they may no longer need to be so heavily tied to cloud-hosted GenAI, and can explore on-premises solutions that rely on lower levels of compute power.
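As a rough sketch of what such a lower-compute deployment can look like, the snippet below loads one of the distilled R1 checkpoints DeepSeek published via the Hugging Face transformers library; the model identifier, licensing position and hardware fit should all be verified before any real use.

```python
# Sketch: run a small distilled R1 variant locally with Hugging Face
# transformers. Requires the transformers (and, for device_map, accelerate)
# packages; verify the checkpoint name and license before production use.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # small distilled checkpoint
    device_map="auto",  # use a local GPU if available, otherwise CPU
)

prompt = "List three operational risks in a manual trade reconciliation process."
result = generator(prompt, max_new_tokens=256, do_sample=False)
print(result[0]["generated_text"])
```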
Building AI capability in a way that recognizes the pace of change is likewise key. Building for adaptability at that pace means firms cannot afford to be locked into a single vendor, which points to a ‘model of models’ approach as the optimal path forward, with a focus on orchestration rather than ‘one model to rule them all’.
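The sketch below illustrates the orchestration idea at its simplest: a routing layer sends each request to whichever registered model suits the task, with a general-purpose fallback. The routing rules and model stubs are assumptions for illustration.

```python
# Sketch of a 'model of models' router: each task type maps to the model
# best suited to it, with a general-purpose fallback. Lambdas stand in for
# real model calls.
from typing import Callable

MODEL_REGISTRY: dict[str, Callable[[str], str]] = {
    "reasoning": lambda p: f"[distilled-r1] {p}",      # multi-step analysis
    "extraction": lambda p: f"[small-finetuned] {p}",  # structured data pulls
    "general": lambda p: f"[hosted-llm] {p}",          # everything else
}

def route(task_type: str, prompt: str) -> str:
    # Fall back to the general model when no specialist is registered.
    model = MODEL_REGISTRY.get(task_type, MODEL_REGISTRY["general"])
    return model(prompt)

print(route("reasoning", "Walk through the margin call calculation step by step."))
```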
This further highlights the importance of robust model benchmarking. We recommend that organizations undertake benchmarking exercises to analyze thoroughly which AI models best suit each use case, selecting the options that maximize performance and return on investment.
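A benchmarking exercise can start as simply as the harness sketched below: every candidate model answers the same use-case-specific evaluation set, and accuracy and latency are recorded side by side. The scoring here is a crude substring check purely for illustration; real exercises would use task-appropriate metrics and human review.

```python
# Sketch of a per-use-case benchmarking harness. EVAL_SET and the scoring
# rule are deliberately simplistic illustrations.
import time

EVAL_SET = [
    {"prompt": "What is 17% of 2,400?", "expected": "408"},
    {"prompt": "Spell out the acronym KYC.", "expected": "Know Your Customer"},
]

def score(output: str, expected: str) -> float:
    return 1.0 if expected in output else 0.0

def benchmark(models: dict) -> None:
    for name, model in models.items():
        start = time.perf_counter()
        hits = sum(score(model(c["prompt"]), c["expected"]) for c in EVAL_SET)
        elapsed = time.perf_counter() - start
        print(f"{name}: accuracy={hits / len(EVAL_SET):.2f}, latency={elapsed:.3f}s")

# Compare candidate models (stubbed here) on the same evaluation set.
benchmark({
    "model-a": lambda p: "The answer is 408." if "17%" in p else "Know Your Customer",
    "model-b": lambda p: "roughly 400",
})
```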
Speed of future change is also worth reiterating. Firms would be well advised to think less about today’s news around DeepSeek and more about all the news that is still inevitably to come. The R1 model has been described as AI’s ‘Sputnik moment’ – and of course Sputnik’s impact was to trigger a massive acceleration in change. We will see the same now for AI: the trick will be to keep pace.
References
1 https://arxiv.org/abs/2501.12948
2 https://valorinternational.globo.com/business/news/2024/11/21/high-costs-limit-ai-adoption-but-prices-are-dropping.ghtml
3 https://www.theregister.com/2025/01/30/microsoft_deepseek_azure_github/