Hey everyone, Alex here. Welcome back to Coding with Alex on sysseder.com.
If you've been following the global race for AI sovereignty, you probably saw the recent headlines: the city of Rio de Janeiro proudly announced its own "homegrown" Large Language Model (LLM), tailored specifically for municipal administration. It sounded like a massive win for localized, state-funded AI development. But within days, the open-source community did what it does best—they looked under the hood. The verdict? Rio’s model isn't a ground-up breakthrough; it appears to be a merge of existing open-source models.
While the mainstream media is framing this as a political scandal or a "copy-paste" job, we developers need to look at this through an engineering lens. The truth is, model merging is one of the most exciting, cost-effective, and powerful techniques in modern machine learning. It’s not "cheating"; it’s pragmatic software engineering. However, how you license, attribute, and market these merges is where things get tricky.
Today, we're going to dive deep into the mechanics of LLM merging. We’ll look at why developers are doing it, the math behind how it works, how you can merge your own models using tools like Mergekit, and the architectural implications of this "Frankenstein" approach to AI.
The Anatomy of an LLM Merge: Why Train When You Can Blend?
To understand why Rio's engineers (or any developer) would merge models, we have to look at the economics of AI. Training an LLM from scratch is catastrophically expensive. You need millions of dollars, thousands of H100 GPUs, massive datasets, and weeks of training time. Fine-tuning is cheaper, but it can still lead to "catastrophic forgetting"—where the model gains a new skill (like Portuguese administrative law) but loses its general reasoning or coding capabilities.
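Enter Model Merging. This is a technique that combines two or more pre-trained LLMs into a single model without any additional training. The usual requirement is that the models share an architecture, and most methods work best when they descend from the same base model. For 8B-class models a merge typically runs in minutes on consumer hardware, costs next to nothing, and can produce a model that matches or even beats its parents on some tasks, though that outcome is not guaranteed and is worth benchmarking.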
Imagine merging a model that is incredibly good at writing clean Python code with another model that excels at conversational Portuguese. Instead of spending $50k fine-tuning a base Llama model on both datasets, you can merge them mathematically in under ten minutes.
How the Math Works (In Plain English)
At its core, an LLM is a massive collection of weights (floating-point numbers). When we merge models, we are executing algebraic operations on these weight tensors. There are several popular algorithms for doing this:
- Linear Averaging: The simplest method. You literally take the weighted average of the parameters of Model A and Model B. While simple, this can degrade performance: a network's output is a highly non-linear function of its weights, so the average of two good weight sets is not guaranteed to be good itself, especially when the models have drifted far apart.
- SLERP (Spherical Linear Interpolation): Instead of interpolating along a straight line, SLERP treats each pair of weight tensors as vectors and interpolates along the arc between them. This preserves the magnitude of the weights, which straight-line averaging tends to shrink, and keeps the merged model from losing its mind.
- DARE (Drop And REscale): This method works on the differences (deltas) between a fine-tuned model and its base. It randomly zeroes out a fraction p of those deltas and rescales the survivors by 1/(1 - p), so the expected update stays the same while much of the redundant fine-tuning noise is dropped.
- Frankenmerging: This involves stacking layers from different models sequentially (e.g., layers 1-16 from Model A, and layers 17-32 from Model B). It’s highly experimental but can result in bizarrely capable models.
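To make the first three methods concrete, here is a minimal pure-Python sketch that treats each "model" as a flat list of floats. The function names (lerp, slerp, dare) and the toy vectors are my own illustrations, not Mergekit's API, and real tools apply this tensor by tensor with extra edge-case handling.

```python
import math
import random

def lerp(a, b, t):
    """Linear interpolation: (1 - t) * a + t * b, element by element."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

def slerp(a, b, t, eps=1e-8):
    """Spherical interpolation between two weight vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return lerp(a, b, t)
    # Angle between the two vectors, computed from their directions.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if math.sin(omega) < eps:
        # Nearly parallel vectors: a straight line is good enough.
        return lerp(a, b, t)
    weight_a = math.sin((1.0 - t) * omega) / math.sin(omega)
    weight_b = math.sin(t * omega) / math.sin(omega)
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]

def dare(base, finetuned, drop_rate, rng):
    """Randomly drop deltas from the base, rescale survivors by 1 / (1 - drop_rate)."""
    keep_scale = 1.0 / (1.0 - drop_rate)
    merged = []
    for w_base, w_ft in zip(base, finetuned):
        delta = w_ft - w_base
        if rng.random() < drop_rate:
            delta = 0.0          # dropped
        else:
            delta *= keep_scale  # kept and rescaled
        merged.append(w_base + delta)
    return merged

# Two toy "models" pointing in orthogonal directions, both with length 1.
a = [1.0, 0.0]
b = [0.0, 1.0]
midpoint_lerp = lerp(a, b, 0.5)
midpoint_slerp = slerp(a, b, 0.5)
print("lerp length: ", math.hypot(*midpoint_lerp))
print("slerp length:", math.hypot(*midpoint_slerp))
```

The two printed lengths show the difference: the straight-line midpoint is shorter than either parent, while the SLERP midpoint keeps the parents' length.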
Hands-On: How to Merge Your Own LLMs with Mergekit
If you want to understand how Rio's model was likely built, the best way is to build one yourself. The gold standard tool for this in the open-source community is mergekit. It’s a Python-based utility that lets you define merges using a simple YAML configuration file.
Let's walk through a practical example. We will use the SLERP method to merge two popular open-source 8B models: one optimized for roleplay/conversation, and one optimized for instruct tasks.
Step 1: Install Mergekit
First, we need to set up our environment. Mergekit can run on CPU, but if you have a GPU with CUDA support, it will run much faster.
# Create a virtual environment and activate it
python3 -m venv merge-env
source merge-env/bin/activate
# Install mergekit directly from source for the latest features
pip install git+https://github.com/arcee-ai/mergekit.git
Step 2: Define the Merge Configuration
The magic of Mergekit lies in its YAML configuration. Create a file named config.yml. In this configuration, we use the SLERP method to merge two fine-tuned variants of Llama 3 8B. SLERP in Mergekit works on exactly two models, and one of them must be named as base_model: t = 0 returns the base model's weights and t = 1 returns the other model's. Here we use the Instruct model as the base and define how to interpolate each group of tensors.
slices:
  - sources:
      - model: cognitivecomputations/dolphin-2.9-llama3-8b
        layer_range: [0, 32]
      - model: meta-llama/Meta-Llama-3-8B-Instruct
        layer_range: [0, 32]
merge_method: slerp
base_model: meta-llama/Meta-Llama-3-8B-Instruct
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
Let's break down what's happening in this config:
- slices: We are taking all 32 layers of both models.
- merge_method: slerp: We are using Spherical Linear Interpolation to keep the vector space geometry intact.
- parameters.t: This controls the interpolation gradient. We can apply different weights (value) to different parts of the neural network (like the self-attention mechanism vs. the multi-layer perceptron). The final, unfiltered value of 0.5 is the fallback for every other tensor.
- dtype: bfloat16: We save the merged weights in Brain Floating Point 16-bit format, which keeps float32's exponent range at half the storage, so file sizes stay manageable.
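The five-element lists under t are easier to reason about once you see them stretched across all 32 layers. The sketch below assumes the list is treated as evenly spaced anchor points with piecewise-linear interpolation in between. That is my reading of how the gradient syntax behaves, not a copy of Mergekit's internals, and expand_gradient is a hypothetical helper name.

```python
def expand_gradient(anchors, num_layers):
    """Piecewise-linear interpolation of anchor values across layer indices."""
    if len(anchors) == 1 or num_layers == 1:
        return [float(anchors[0])] * num_layers
    values = []
    for layer in range(num_layers):
        # Position of this layer along the anchor list, from 0 to len(anchors) - 1.
        pos = layer / (num_layers - 1) * (len(anchors) - 1)
        lo = min(int(pos), len(anchors) - 2)
        frac = pos - lo
        values.append((1.0 - frac) * anchors[lo] + frac * anchors[lo + 1])
    return values

# The two gradients from config.yml, expanded over 32 layers.
self_attn_t = expand_gradient([0, 0.5, 0.3, 0.7, 1], 32)
mlp_t = expand_gradient([1, 0.5, 0.7, 0.3, 0], 32)

for layer in (0, 8, 16, 24, 31):
    print(f"layer {layer:2d}: self_attn t={self_attn_t[layer]:.2f}, mlp t={mlp_t[layer]:.2f}")
```

Read this way, the config leans on one parent for attention in the early layers and on the other in the late layers, with the MLP blocks doing the opposite.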
Step 3: Run the Merge
Now, run the merge command. Mergekit will download the models from Hugging Face (you may need to log in via huggingface-cli login if you are using gated models like Llama), load them into memory, perform the tensor math, and output the new model to your local disk.
# Run the merge on CPU (for a GPU, use --device cuda; some Mergekit releases document --cuda instead, so check mergekit-yaml --help)
mergekit-yaml config.yml ./my-merged-llm --device cpu
Just like that, you have a custom, hybrid LLM. You didn't spend a dime on renting cluster time, yet you have a model that inherits the specialized behavior of both parent models.
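Before loading the result into anything heavy, a quick sanity check on the output folder can save some head-scratching. This sketch assumes the merge wrote a Hugging Face style directory containing a config.json and one or more .safetensors files. The summarize_merge_output helper is hypothetical and uses only the standard library.

```python
import json
from pathlib import Path

def summarize_merge_output(output_dir):
    """Return basic facts about a merged model directory, or raise if it looks incomplete."""
    root = Path(output_dir)
    config_path = root / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"no config.json in {root}")
    config = json.loads(config_path.read_text(encoding="utf-8"))

    # Weight files may be split into several shards.
    shards = sorted(root.glob("*.safetensors"))
    if not shards:
        raise FileNotFoundError(f"no .safetensors weight files in {root}")
    total_bytes = sum(p.stat().st_size for p in shards)

    return {
        "architectures": config.get("architectures", []),
        "num_hidden_layers": config.get("num_hidden_layers"),
        "shard_count": len(shards),
        "weights_gib": round(total_bytes / 2**30, 2),
    }

if __name__ == "__main__":
    target = Path("./my-merged-llm")
    if target.is_dir():
        print(summarize_merge_output(target))
```

If the layer count or architecture differs from the parents, something in the config went wrong before you ever get to prompting the model.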
The Technical and Ethical Dilemma of the "Homegrown" Claim
Now that we know how easy and powerful merging is, let’s go back to the Rio de Janeiro story. Why did the community react so negatively? If merging is a legitimate engineering pattern, why the uproar?
The issue boils down to two things: attribution and transparency.
When an organization—especially a government entity—claims to have developed a "homegrown, state-of-the-art AI," it implies that they did the heavy lifting of dataset curation, pre-training, or at least extensive reinforcement learning (RLHF). Claiming a merged model is "completely homegrown" is the software equivalent of taking an upstream Linux distribution (like Debian), changing the wallpaper, installing a custom package manager, and claiming you wrote a brand-new operating system from scratch.
The Architecture of Attribution: How Merges are Detected
You might be wondering: how did the open-source community find out so quickly? LLMs aren't black boxes to the developers who build them. There are highly effective forensic methods to identify merged models:
- Weight Fingerprinting: Every independently trained model ends up with its own distribution of weights. If you merge Model A and Model B, the resulting tensors correlate strongly with both parents, to a degree that is vanishingly unlikely to arise from independent training.
- Tokenizer Commonalities: Tokenizers are the translators that turn text into numbers. If your "homegrown" Portuguese model uses the exact, highly customized tokenizer vocabulary of a proprietary or specific open-source English model (complete with specific formatting tokens), the jig is up.
- Hallucination Signatures: LLMs tend to hallucinate in highly specific, reproducible ways based on their training data. If your new model reproduces the exact niche errors and biases of an existing model, it’s a dead giveaway.
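The first two checks can be sketched in a few lines. This is a toy illustration with random vectors standing in for flattened weight tensors, and it says nothing about how the Rio model was actually analyzed. In practice, fine-tunes of the same base are already highly similar, so investigators tend to compare differences from the shared base rather than raw weights. The names cosine_similarity and vocab_overlap are my own.

```python
import math
import random

def cosine_similarity(u, v):
    """Cosine of the angle between two flattened weight vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def vocab_overlap(vocab_a, vocab_b):
    """Jaccard overlap between two tokenizer vocabularies (collections of token strings)."""
    a, b = set(vocab_a), set(vocab_b)
    return len(a & b) / len(a | b)

rng = random.Random(0)
size = 2000
parent_a = [rng.gauss(0.0, 1.0) for _ in range(size)]
parent_b = [rng.gauss(0.0, 1.0) for _ in range(size)]
unrelated = [rng.gauss(0.0, 1.0) for _ in range(size)]  # independently "trained"
suspect = [0.5 * x + 0.5 * y for x, y in zip(parent_a, parent_b)]  # a 50/50 merge

sim_suspect_a = cosine_similarity(suspect, parent_a)
sim_suspect_b = cosine_similarity(suspect, parent_b)
sim_unrelated_a = cosine_similarity(unrelated, parent_a)

print(f"suspect vs parent A:   {sim_suspect_a:.3f}")
print(f"suspect vs parent B:   {sim_suspect_b:.3f}")
print(f"unrelated vs parent A: {sim_unrelated_a:.3f}")
```

The merged vector stays strongly aligned with both parents while the independent one hovers near zero, and a vocabulary overlap close to 1.0 tells a similar story on the tokenizer side.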
The Verdict: Embrace the Merge, but Be Transparent
As software engineers, our job is to solve problems efficiently. If you can deliver a highly functional, specialized AI system to your users by merging two open-source models for $0 instead of fine-tuning for $10,000, you should absolutely do it. It is smart, resource-efficient engineering.
But as members of the open-source ecosystem, we have to play by the rules of respect and open collaboration. If you merge models, put it in your model card on Hugging Face. Document your config.yml. Attribute the creators of the parent models. Give credit to the research teams who spent millions of dollars training the base architectures that allowed you to build your "franken-model."
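Here is one way that attribution could look in practice: a small, hypothetical build_model_card helper that writes README.md text listing the parents under the base_model metadata key, tagging the model as a merge, and embedding the config. The model name my-merged-llm and the shortened config string are placeholders, and your parents' licenses may require more than this.

```python
def build_model_card(merged_name, merge_method, parent_models, config_yaml):
    """Return README.md text that credits every parent model and embeds the merge config."""
    fence = "`" * 3  # built at runtime so the card can contain its own code fence
    lines = ["---", "base_model:"]
    lines += [f"- {name}" for name in parent_models]
    lines += ["tags:", "- mergekit", "- merge", "---", ""]
    lines += [f"# {merged_name}", ""]
    lines += [f"This model is a {merge_method} merge of the following models:", ""]
    lines += [f"- [{name}](https://huggingface.co/{name})" for name in parent_models]
    lines += ["", "## Merge configuration", ""]
    lines += [f"{fence}yaml", config_yaml.rstrip(), fence, ""]
    return "\n".join(lines)

card = build_model_card(
    merged_name="my-merged-llm",  # placeholder name
    merge_method="SLERP",
    parent_models=[
        "cognitivecomputations/dolphin-2.9-llama3-8b",
        "meta-llama/Meta-Llama-3-8B-Instruct",
    ],
    config_yaml="merge_method: slerp\ndtype: bfloat16\n",  # paste your full config.yml here
)
print(card)
```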
Rio de Janeiro’s engineering team likely built a highly effective tool for their city. Their mistake wasn't the technology they chose—it was the marketing narrative they wrapped it in. Let's learn from this: build smart, leverage the power of tensor math, and always, always credit your upstream sources.
Have you experimented with model merging or Mergekit? What are your thoughts on the ethics of rebranding merged models? Let’s chat in the comments below!
Until next time, keep coding.
— Alex