- Aug 7, 2024
- 5 minutes
- Gábor Kiss
The competition in the AI market is undeniably fierce, yet many enterprises hesitate to embrace deeper AI integration. A significant number of large language model (LLM) adopters rely on a few major vendors, using similar tools in a bid to outpace their rivals. This strategy often results in a paradox where competitors mirror each other’s efforts, leaving a wealth of untapped potential unexplored. By failing to pursue unique AI solutions, these companies leave substantial value unrealized and miss opportunities that could set them apart from the competition.
Meanwhile, in an era where data breaches and cyber threats are ever-present, safeguarding sensitive information is paramount. On-premise LLMs provide a significant advantage in terms of data security, privacy and performance. By keeping data within the company’s own infrastructure, organizations can better control access and ensure that sensitive information does not leave their secure environment. This is particularly critical for industries handling highly confidential data, such as finance, healthcare, and government.
In this GenAI mini-series, we will shed light on the potential drawbacks of sticking with Software-as-a-Service LLM providers, explore alternative options available to organizations, and discuss reasons for moving to on-premise solutions and customising LLMs. Additionally, we will showcase some intriguing projects we have undertaken at Unit8 to advance LLM adoption.
Compliance and Regulatory Requirements
Certain industries are subject to stringent regulatory requirements that dictate how and where data can be processed and stored. For example, financial institutions and healthcare providers must adhere to regulations that often mandate keeping data on-premises to ensure compliance and protect sensitive information.
While applications like a RAG chatbot are easy to verify from an enterprise data-security perspective, since teams control and filter what data is ingested into the chatbot’s knowledge base (see the sketch below), more open use cases, such as an AI development IDE copilot, can prove difficult, as operations teams lose control over what data is fed to the LLM.
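To make that contrast concrete, here is a minimal, hypothetical sketch of the kind of ingestion gate a RAG team controls: documents are screened before they ever reach the embedding and indexing step, whereas a copilot streams whatever the developer happens to have open. The patterns and function names are illustrative placeholders, not a production data-loss-prevention policy.

```python
import re

# Illustrative ingestion gate: only documents that pass a sensitivity check
# are allowed into the chatbot's knowledge base. The patterns below are
# placeholders, not a complete DLP rule set.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),  # card-like numbers
    re.compile(r"confidential", re.IGNORECASE),
]

def is_safe_to_ingest(text: str) -> bool:
    """Return True only if no sensitive pattern matches the document."""
    return not any(p.search(text) for p in SENSITIVE_PATTERNS)

def ingest(documents: list[str]) -> list[str]:
    """Filter documents before they reach the embedding / indexing step."""
    approved = [doc for doc in documents if is_safe_to_ingest(doc)]
    # ...pass `approved` on to the embedding model and vector store here...
    return approved

if __name__ == "__main__":
    docs = [
        "Public product FAQ: how to reset your password.",
        "CONFIDENTIAL: Q3 board minutes, do not distribute.",
    ]
    print(ingest(docs))  # only the first document is approved
```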
Using on-premise LLM architectures helps organizations meet these regulatory obligations more easily. By maintaining control over their data and processing environments, companies can ensure that they remain compliant with industry-specific laws and standards when applicable. This not only mitigates the risk of legal penalties but also builds trust with clients and stakeholders who are increasingly concerned about data privacy and security.
Customization and Control
Cloud vendors often offer standardized solutions that may not fully align with an organization’s unique requirements. On-premise deployment allows companies to fine-tune and optimise LLMs for their particular use cases. Purposefully training or fine-tuning smaller models for specific tasks on proprietary data sets yields more relevant outputs at a much better cost-per-output-token ratio (for high-volume usage patterns), with quality comparable to models one size larger.
Foundation models, like ChatGPT or Claude, don’t have a specific mission; they aim to be the best at everything: mathematics, reasoning, multiple languages, and so on. For most use cases, some of these capabilities are completely unnecessary; we don’t need advanced mathematical skills when we just want to extract information from a long piece of text. Why would anyone pay for all these unnecessary capabilities, and accept subpar performance, when it is possible to build a model that does exactly what you need, runs significantly faster (even 700 tokens/sec from a single cluster of GPUs [1]) for lower latencies, and, when deployed on-premise, operates on a flat cost profile? A minimal fine-tuning sketch follows below.
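As an illustration of what “fine-tuning a smaller model to one task” can look like in practice, here is a minimal sketch using Hugging Face transformers and peft with LoRA adapters. The base model name, dataset path and hyperparameters are assumptions chosen for the example; a real project would pick them per use case and hardware budget.

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Assumed small base model and an in-house dataset (placeholders).
BASE_MODEL = "mistralai/Mistral-7B-v0.1"
DATA_PATH = "data/internal_extraction_tasks.jsonl"  # one {"text": "..."} per line

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA keeps the trainable parameter count small, so the job fits on a
# single on-premise GPU node rather than a large training cluster.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

dataset = load_dataset("json", data_files=DATA_PATH, split="train")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-extractor",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("finetuned-extractor")  # only the small adapter is saved
```

The resulting adapter is a few hundred megabytes on top of the base model, which keeps iteration cheap when the task or the proprietary data evolves.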
Operational costs
In such a rapidly moving market, research and development must be fast to maintain a competitive edge. Training and testing LLM-based applications are exceptionally resource-heavy; fine-tuning can be expensive for large data sets, and a well-tested application should go through thousands of test cases in each iteration.
While most cloud services’ pay-per-use model keeps costs low in the early phase of adoption, the aforementioned customisation and testing phases can rack up a significant bill with the LLM or cloud provider, especially when the use case is exceptionally heavy, e.g. summarising large datasets. A well-timed investment in LLM-capable hardware can pay off in as little as a year (a rough break-even sketch follows below), securing modern platform capabilities for the ongoing race for AI supremacy by giving developers and scientists cheap access to AI hardware, letting them explore, experiment and develop without worrying about costs or legal repercussions.
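To make the “pay off within a year” claim tangible, here is a back-of-the-envelope break-even estimate. Every figure below (token price, monthly volume, hardware and operating costs) is an illustrative assumption, not a quote from any vendor; plug in your own numbers.

```python
# Back-of-the-envelope break-even: hosted pay-per-token vs. on-premise hardware.
# All figures are assumed placeholders.

API_PRICE_PER_1M_TOKENS = 10.0      # USD, blended input/output price (assumed)
MONTHLY_TOKENS = 3_000_000_000      # tokens/month at organisation-wide scale (assumed)

HARDWARE_COST = 250_000.0           # USD, one-off GPU server purchase (assumed)
ONPREM_MONTHLY_OPEX = 8_000.0       # USD/month, power, cooling, maintenance (assumed)

api_monthly = MONTHLY_TOKENS / 1_000_000 * API_PRICE_PER_1M_TOKENS
savings_per_month = api_monthly - ONPREM_MONTHLY_OPEX

if savings_per_month > 0:
    breakeven_months = HARDWARE_COST / savings_per_month
    print(f"Hosted API cost:  ${api_monthly:,.0f}/month")
    print(f"On-prem opex:     ${ONPREM_MONTHLY_OPEX:,.0f}/month")
    print(f"Break-even after: {breakeven_months:.1f} months")
else:
    print("At this volume the hosted API is still cheaper.")
```

With these assumed figures the hardware amortises in roughly eleven months; at lower volumes the pay-per-use model remains the cheaper option, which is why the timing of the investment matters.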
Performance and Latency
Performance and latency are critical factors in many applications, especially those requiring real-time responses, like a chatbot. Cloud-based LLM providers struggle to keep up with today’s rapidly rising AI demands, which they try to remedy by sacrificing latency for throughput, heavily impacting users’ perceived quality of service. On-premise LLMs eliminate this dependency on external services, providing customisable latency and accuracy trade-offs, guaranteed processing speeds, and always-ready, on-demand performance. A simple way to measure this is sketched below.
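One practical way to quantify perceived quality of service is to measure time-to-first-token and streaming throughput against your own deployment. The sketch below assumes an OpenAI-compatible endpoint served locally (for example by vLLM); the URL, API key and model name are placeholders.

```python
import time
from openai import OpenAI

# Assumed: a locally hosted, OpenAI-compatible endpoint (e.g. served by vLLM).
# Base URL, API key and model name are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

start = time.perf_counter()
first_token_at = None
chunks_with_content = 0

stream = client.chat.completions.create(
    model="local-finetuned-model",
    messages=[{"role": "user", "content": "Summarise our refund policy."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    if delta and first_token_at is None:
        first_token_at = time.perf_counter()  # time-to-first-token
    if delta:
        chunks_with_content += 1

elapsed = time.perf_counter() - start
if first_token_at is not None:
    print(f"Time to first token: {(first_token_at - start) * 1000:.0f} ms")
print(f"Approx. throughput:  {chunks_with_content / elapsed:.1f} chunks/s")
```

Running the same script against a hosted provider and against the on-premise deployment gives a like-for-like comparison of the latency the end user actually experiences.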
Closing thoughts
One thing is certain: the software-as-a-service (SaaS) model offered by current large language model (LLM) vendors is highly appealing to organizations in the early stages of LLM adoption. This appeal comes from its pay-as-you-go cost structure and no-cost updates. However, as with all *-as-a-Service models, the sharply rising costs become increasingly difficult to accept as platform usage scales with organization-wide adoption. Additionally, the challenge of gaining a market-differentiating edge with unified solutions often drives enterprises to explore fine-tuning and other deeper AI adoption techniques. In the next article, I will outline a possible journey to fully integrate LLMs and unlock their potential. Through a demonstration scenario, we will explore the key areas an organisation should focus on at each stage of adoption to stay prepared and facilitate the next steps in its AI journey.