Before the inception of Flow AI, a small team of AI engineers faced a significant problem: how to effectively evaluate the outputs of their LLM-powered product. Understanding how the outputs changed across multiple model and system updates was proving difficult.
Initially, the team relied on manual evaluation, scoring each output against a set of criteria.
However, this approach quickly revealed major drawbacks:
- Scalability: Manual evaluation consumed too much time and effort, slowing down iteration cycles.
- Subjectivity: Human evaluators often gave inconsistent scores, introducing potential bias.
In search of a better solution, the team considered using LLMs to evaluate their own system.
They explored the tools available on the market, but none met their specific needs: the options were too generic, and the evaluators lacked meta-evaluation, resulting in misalignment between human and LM judgments.
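To make the idea concrete, here is a minimal, illustrative sketch of what meta-evaluation means in this context: an LM judge scores a set of outputs, and those scores are compared against human ratings to measure how well the two align. The judge, data, and scores below are hypothetical placeholders, not any particular tool's implementation.

```python
from statistics import correlation  # Python 3.10+

def meta_evaluate(judge_scores, human_scores):
    """Measure how closely an LM judge's scores track human judgments."""
    exact = sum(j == h for j, h in zip(judge_scores, human_scores)) / len(judge_scores)
    pearson = correlation(judge_scores, human_scores)
    return {"exact_agreement": exact, "pearson": pearson}

# Hypothetical 1-5 ratings for the same five outputs.
human_scores = [5, 3, 4, 2, 5]
judge_scores = [5, 2, 4, 2, 4]  # e.g. produced by prompting an LM with a scoring rubric

print(meta_evaluate(judge_scores, human_scores))
# -> {'exact_agreement': 0.6, 'pearson': 0.91...}
```

Without a step like this, there is no way to know whether an off-the-shelf evaluator actually agrees with a team's own quality standards.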
The team also wanted to switch from closed-source models to custom open models to address privacy concerns and achieve cheaper, faster inference. However, finding a model as capable as GPT-4 proved challenging. Fine-tuning a model would have required a significant investment of resources (half of the team's capacity for six months), a huge risk for a startup.
Exploring experimental community LMs revealed further problems: many models lacked a clear lineage and transparency about how they had been modified.
In response to these challenges, the team built a solution that combined their two needs: automated evaluation and specialized model development. That solution became Flow AI.
Flow AI
At Flow AI, our mission is to empower modern AI teams with advanced tools for evaluating generative AI products across various domains and use cases. Our approach offers a controllable, transparent, and cost-effective LM-as-a-judge alternative to labor-intensive human evaluations and proprietary model-based evaluations.
Additionally, we aim to revolutionize the development of generative products by promoting the use of smaller, specialized LMs over large, general-purpose proprietary models. We know that the current process of selecting and refining these smaller models is complex and time-consuming, which is a barrier to widespread adoption.
We seek to remove these barriers by automating the selection and enhancement of specialized models, using rapid, cost-effective, and human-aligned evaluation techniques together with model merging to develop new LMs. This makes open LMs more accessible to companies with limited engineering resources and budgets.
Flow AI was born from a blend of necessity and ingenuity. We are here to redefine the standards of AI evaluation and model development, paving the way for a new era of generative AI.