Introduction to Model Merging

How can we make LLMs more versatile without extensive training? Model merging allows us to combine the strengths of multiple AI models into a single, more powerful model.

In this blog post, we delve into an exciting discussion between Bernardo, our AI Lead, and Aaro, our CEO & co-founder, on the emerging technique of model merging in the realm of large language models (LLMs). This comprehensive overview will cover what model merging is, its advantages, applications, methods, and real-world examples.

Understanding Model Merging

What is Model Merging?

Model merging is a technique where different models are combined into a single one, retaining the abilities of each model involved in the merge. Unlike model ensembling, which combines the outputs of different models to achieve a more accurate prediction, model merging integrates the weights of various models into a single architecture. This integration allows a single model to perform multiple tasks without needing to maintain and deploy several models.

Differences Between Model Merging and Model Ensembling

In model ensembling, multiple models are kept, and their outputs are combined to synthesize a response. This approach requires maintaining all models involved, which increases complexity and resource usage. On the other hand, model merging creates one unified model by merging the weights of different models, which simplifies deployment and reduces the resources needed for inference.
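To make the contrast concrete, here is a minimal PyTorch-style sketch (assuming all models share the same architecture and return logits): ensembling keeps every model loaded and averages their outputs, while merging averages the weights themselves so only one model needs to be served.

```python
import torch

def ensemble_predict(models, inputs):
    # Ensembling: every model stays in memory; their output logits are averaged.
    with torch.no_grad():
        return torch.stack([m(inputs) for m in models]).mean(dim=0)

def merge_state_dicts(state_dicts):
    # Merging: the parameters themselves are averaged into one state dict,
    # so only a single model is deployed at inference time.
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }
```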

Applications of Model Merging

Multitask Models

One of the primary use cases of model merging is converting single-task models into multitask models. For instance, merging a model that excels at question-answering with another that excels at summarization results in a single model proficient at both tasks. Because the merge only combines existing weights, it requires no additional training data and only minimal compute, making it a cost-effective solution.

Domain Adaptation

Model merging is also beneficial for domain adaptation. A general-purpose model can be merged with a domain-specific model to enhance its performance in specialized tasks. Conversely, a domain-specific model can be endowed with general capabilities, such as chat functions, by merging it with a model fine-tuned for such purposes. An example of this is the BioMistral model, where the Mistral 7B model was adapted to the biomedical domain and merged with a chat-capable model, resulting in improved performance on medical benchmarks.

For more on domain adaptation and similar strategies, check out our article on open LLMs.

Resource Efficiency

Merging models can significantly reduce the resources required for developing better models. By leveraging existing models and merging their capabilities, researchers can create enhanced models without the need for extensive computational power or large datasets. This approach reutilizes previous work, similar to transfer learning, and can be done with minimal hardware, such as a CPU with 8GB of RAM.

Techniques for Model Merging

Linear Method or Model Soups

The linear method, also known as model soups, involves averaging the weights of different models, either uniformly or with a weighted average. This technique was initially used to accelerate training by merging different checkpoints of a model.
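As an illustration, a weighted model soup can be sketched in a few lines of PyTorch. This assumes the models share an architecture, and the weighting coefficients are chosen by the user (they should sum to 1 for a proper average).

```python
import torch

def linear_merge(state_dicts, coefficients=None):
    # Model soup: a (weighted) average of same-architecture model parameters.
    if coefficients is None:
        coefficients = [1.0 / len(state_dicts)] * len(state_dicts)  # simple average
    return {
        key: sum(c * sd[key].float() for c, sd in zip(coefficients, state_dicts))
        for key in state_dicts[0]
    }

# e.g. a 70/30 blend of two fine-tunes:
# merged = linear_merge([sd_a, sd_b], coefficients=[0.7, 0.3])
```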

Spherical Linear Interpolation (SLERP)

SLERP merges two models by interpolating along the arc between their weight vectors rather than along a straight line, which preserves the geometric properties of the weights better than simple averaging. It is limited to two models at a time, but in practice it gives a smooth blend of their capabilities.
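A rough sketch of SLERP applied tensor-by-tensor is shown below; it treats each pair of weight tensors as flat vectors and interpolates along the arc between them, falling back to linear interpolation when the vectors are nearly parallel. Real implementations add details such as per-layer interpolation factors and dtype handling.

```python
import torch

def slerp(t, v0, v1, eps=1e-8):
    # Spherical linear interpolation between two weight tensors, treated as flat vectors.
    a, b = v0.flatten().float(), v1.flatten().float()
    dot = torch.clamp(torch.dot(a / (a.norm() + eps), b / (b.norm() + eps)), -1.0, 1.0)
    theta = torch.acos(dot)
    if theta < eps:  # nearly parallel: fall back to linear interpolation
        return ((1 - t) * a + t * b).reshape(v0.shape)
    w0 = torch.sin((1 - t) * theta) / torch.sin(theta)
    w1 = torch.sin(t * theta) / torch.sin(theta)
    return (w0 * a + w1 * b).reshape(v0.shape)

def slerp_merge(sd_a, sd_b, t=0.5):
    return {key: slerp(t, sd_a[key], sd_b[key]) for key in sd_a}
```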

Task Vectors

Task vectors represent the difference in weights between a fine-tuned model and its pre-trained base. By computing these vectors and performing arithmetic operations on them, new models can be created that combine the abilities of the original models. Interference between the parameter changes of different tasks can be reduced with techniques such as TIES-merging, which trims low-magnitude changes and resolves sign conflicts, and DARE, which randomly drops and rescales them.
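Here is a minimal sketch of task arithmetic, the idea underlying these methods (not the full TIES or DARE procedures): compute each task vector as the delta from the base model, then add a scaled sum of the deltas back onto the base.

```python
import torch

def task_vector(base_sd, finetuned_sd):
    # The parameter changes introduced by fine-tuning on one task.
    return {k: finetuned_sd[k].float() - base_sd[k].float() for k in base_sd}

def apply_task_vectors(base_sd, task_vectors, scaling=1.0):
    # Task arithmetic: add the (scaled) sum of task vectors back onto the base model.
    return {
        k: base_sd[k].float() + scaling * sum(tv[k] for tv in task_vectors)
        for k in base_sd
    }

# e.g. merge a QA fine-tune and a summarization fine-tune onto one base model:
# merged = apply_task_vectors(base, [task_vector(base, qa), task_vector(base, summ)], 0.5)
```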

Layer Stacking or Frankenmerging

Layer stacking involves concatenating layers from different models to create a new architecture. This experimental method can result in models with unconventional sizes, such as 160 billion parameters, and has shown promising results in creating powerful models like Goliath-120B.
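As a rough illustration only (assuming Llama-style checkpoints from Hugging Face Transformers, with hypothetical model names), a frankenmerge can be built by slicing decoder layers out of two compatible checkpoints and stacking them into one scaffold:

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Hypothetical same-family checkpoints; a real frankenmerge needs compatible architectures.
model_a = AutoModelForCausalLM.from_pretrained("org/base-model-a")
model_b = AutoModelForCausalLM.from_pretrained("org/base-model-b")

# Keep layers 0-19 from model A and layers 10-29 from model B, stacked in order.
stacked = nn.ModuleList(list(model_a.model.layers[:20]) + list(model_b.model.layers[10:30]))

model_a.model.layers = stacked                    # reuse model A as the scaffold
model_a.config.num_hidden_layers = len(stacked)   # keep the config consistent
model_a.save_pretrained("frankenmerge-sketch")
```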

Evolutionary Model Merging

A novel approach by Sakana AI applies evolutionary algorithms to model merging. This technique automatically explores hundreds of combinations of layers and merging methods, automating part of foundation model development and potentially leading to significant advances in model performance.
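Sakana AI's full method searches over both parameter-space and layer-space merges; as a toy illustration only, here is a simple evolutionary loop over merge coefficients, where the user-supplied `evaluate` function is assumed to merge the models with the candidate coefficients and return a benchmark score.

```python
import random

def evolve_merge_coefficients(evaluate, n_models, population=16, generations=20, sigma=0.1):
    # Toy evolutionary search: keep the best candidates, breed and mutate the rest.
    pop = [[random.random() for _ in range(n_models)] for _ in range(population)]
    for _ in range(generations):
        ranked = sorted(pop, key=evaluate, reverse=True)
        parents = ranked[: population // 4]
        children = []
        while len(parents) + len(children) < population:
            a, b = random.sample(parents, 2)
            children.append([(x + y) / 2 + random.gauss(0, sigma) for x, y in zip(a, b)])
        pop = parents + children
    return max(pop, key=evaluate)
```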

Real-World Examples and Future Directions

Successful Model Merges

The AI community has seen several successful implementations of model merging. The 120B Llama model created by Maxime Labonne combined layers from different Llama models to produce a highly capable model. Additionally, many merged models have topped the Open LLM Leaderboard, demonstrating their effectiveness. For a comprehensive view of how LLM systems are evaluated, refer to our previous posts.

Limitations and Future Potential

Despite its advantages, model merging has limitations. Finding the right models for specific tasks can be challenging, and not all tasks have readily available models to merge. However, as the number of specialized models increases, model merging is expected to become more prevalent and powerful.

flow-merge: An Open-Source Library

To facilitate model merging, our team has developed an open-source library called flow-merge. This Python library supports various popular merging methods and is open to community contributions. It aims to make model merging accessible and straightforward for researchers and developers.

Conclusion

Model merging presents a promising frontier in the development of large language models. By combining the strengths of different models into a single architecture, it offers a resource-efficient way to create versatile and powerful models. As research in this area progresses, we can expect even more innovative and impactful applications of this technique in the AI community.

June 11, 2024

Bernardo García del Río

Aaro Isosaari