Artificial intelligence is revolutionizing the way we interact with technology, but local implementation offers a unique mix of opportunities and challenges. Let's explore together the advantages and disadvantages of bringing AI directly into our operating environment.

Hardware and Software Requirements for Local AI

Running an artificial intelligence model on a local machine is no small feat. To be clear, we need a high-performance machine with an AI-optimized CPU that integrates GPU/iGPU and perhaps even an NPU to optimize performance. Furthermore, it is essential to have an adequate amount of (fast) RAM and sufficient storage space to manage the data and models.

A modern server room with high-performance AI hardware including CPUs, GPUs, and NPUs visible, brigh

The hardware configuration must be accompanied by appropriate software, including specific libraries and frameworks for AI, such as TensorFlow or PyTorch, or the use of LM Studio, which automatically handles model downloads from Hugging Face and simplifies the installation of necessary dependencies.

Why use a local AI model?

The answer is always the same: cost optimization or, in my case regarding my company, revenue optimization.

As always, using and integrating artificial intelligence within a project implies using cloud AI models: Google, OpenAI, Meta, etc. Using these models comes with a price per million tokens. Therefore, having a local AI model allows for a significant reduction in long-term operating costs, as you are no longer tied to recurring payments for access to cloud services.

If, as in my case, you integrate AI solutions for clients, the company that integrated them only sees money passing through from the client's account directly to OpenAI's account, for example. It is a flow that does not stop within the company. Having a local server powerful enough to provide AI models to your clients means earning from that processing as well—provided, of course, that you carefully evaluate whether the volume of requests is compatible with the purchased hardware.

In essence, by integrating an AI Server in your own offices, client applications will no longer point to Google or OpenAI's cloud, but will instead point to the local server.

Privacy and Data Security

There is another reason why a local AI model is preferable to a cloud-based model: data privacy.

With cloud models, we are sending data—including sensitive information—to the cloud; do we really know what happens to that data? In local solutions, the data stays put, never leaving the boundaries of the client company and the provider. This aspect is crucial, especially for companies operating in regulated sectors or those handling sensitive information. Maintaining control over data also means being able to guarantee compliance with data protection regulations, such as GDPR. Furthermore, having a local server allows for the implementation of customized security measures tailored to the specific needs of the company.

An office environment showing a compact Mini PC setup (BOSGAME M5) on a desk with multiple monitors

How to implement a local AI server

First, you need to equip yourself with solutions suitable for the type of traffic you expect to handle.
Currently, my clients collectively make no more than 300 requests per day. A Mini PC equipped with appropriate hardware is perfectly sufficient.

In my case, we decided to go with a BOSGAME M5 featuring an AMD Ryzen AI Max+ Chip with 96GB of LPDDR5X RAM and a 2TB NVMe SSD. This is a solid machine that can easily handle workloads in the range of 2,000 to 3,000 calls per day. If we consider a fixed cost applied to the client regardless of the number of tokens—€0.015 (one and a half cents) per call—with 300 requests, you would earn €4.50 per day. With 3,000 calls per day, the revenue would rise to €45.00 per day. In one month, that would be €1,350.00 in turnover, minus about 20/25 euros in electricity costs. Naturally, earnings could improve significantly if you decided to estimate based on tokens consumed or if you wanted to apply costs simply based on the time elapsed between the call and the response. This mini AI server costs approximately 2,500 euros. Essentially, by utilizing it to its full potential—say, with 2,000 calls per day—it would pay for itself in three months!

Using Mac Studio M3 Ultra for AI

Here we are talking about truly monstrous Hardware, and daily calls could easily rise to reach 10,000 per day. Keeping the aforementioned costs unchanged at 0.015 euros per call, we would have 150 euros per day and a monthly turnover of approximately 4,500 euros. Even considering operating costs, the net profit would be significant, allowing the initial investment to be paid back in less than two months. Furthermore, with such powerful hardware, further applications and services could be explored, expanding the offering and increasing the customer base.

A high-tech workspace featuring a Mac Studio M3 Ultra setup with multiple screens, showcasing AI pro

Hybrid Solution for AI Servers

This is exactly the approach I have in mind. Integrating two AI servers, one based on BOSGAME M5 and one based on Mac Studio, so that simple queries can be handled by the BOSGAME and complex queries by the Mac Studio. Perhaps letting the BOSGAME decide when to call one model or the other based on complexity.

Naturally, latency and workload management must also be taken into account to ensure the system remains responsive and high-performing. Implementing an intelligent routing algorithm could further optimize resources, ensuring that each server is utilized to its full capacity.

Disadvantages of Local AI

However, there are some disadvantages to consider. The complexity of a hybrid architecture may require more maintenance and monitoring, increasing operational costs. Additionally, synchronization between the two servers could introduce latency. Hardware costs must also be taken into account; the investment for purchasing and configuring both servers can be significant, especially when opting for high-quality components. Finally, network stability must be considered. If you are providing a service to customers, it is essential to integrate software-level fallbacks or cloud redundancies in case the local server becomes unreachable for any reason, in order to ensure service continuity.