# Flow Judge — The language model for evaluations

Flow Judge v0.1 is an open-source, lightweight (3.8B parameter) language model optimized for LLM system evaluations. It is crafted for accuracy, speed, and customization.

## A holistic approach to LLM evaluation

Most LLM-as-a-judge solutions rely on prompting massive general-purpose models. Flow Judge is purpose-built and small enough to run locally, while delivering evaluation quality competitive with much larger commercial judges.

## Customizable for diverse evaluations

Define evaluation criteria and rubrics in natural language. Flow Judge supports custom score scales, multi-criteria evaluations, and domain-specific judging across HR, biomedical, finance, and other verticals.
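As a rough illustration of what a natural-language rubric with a custom score scale might look like in code, here is a minimal sketch. The `Metric` and `RubricItem` types, the `render_prompt` helper, and the prompt layout are all hypothetical names invented for this example. They are not the flow-judge library's API or the model's actual prompt template, so consult the GitHub repository for the real interface.

```python
from dataclasses import dataclass, field


@dataclass
class RubricItem:
    """One level of a score scale (hypothetical type, not the flow-judge API)."""
    score: int
    description: str


@dataclass
class Metric:
    """A named evaluation criterion plus its scoring rubric (hypothetical)."""
    name: str
    criteria: str
    rubric: list[RubricItem] = field(default_factory=list)

    def score_range(self) -> tuple[int, int]:
        """Return the lowest and highest score defined by the rubric."""
        scores = [item.score for item in self.rubric]
        return min(scores), max(scores)


def render_prompt(metric: Metric, query: str, response: str) -> str:
    """Format the criterion, rubric, and sample into one judge prompt.

    The section layout here is an assumption made for illustration only.
    """
    rubric_lines = "\n".join(
        f"- Score {item.score}: {item.description}"
        for item in sorted(metric.rubric, key=lambda i: i.score)
    )
    return (
        f"# Evaluation criteria\n{metric.criteria}\n\n"
        f"# Scoring rubric\n{rubric_lines}\n\n"
        f"# Query\n{query}\n\n"
        f"# Response\n{response}\n"
    )


# A domain-specific (HR) criterion on a custom 3-point scale.
policy_accuracy = Metric(
    name="hr_policy_accuracy",
    criteria="Does the response state the company leave policy correctly?",
    rubric=[
        RubricItem(1, "Contradicts the policy or invents entitlements."),
        RubricItem(2, "Partly correct but omits a key condition."),
        RubricItem(3, "Fully consistent with the policy."),
    ],
)

prompt = render_prompt(
    policy_accuracy,
    query="How many days of parental leave do I get?",
    response="You are entitled to 16 weeks of paid parental leave.",
)
print(prompt)
```

Keeping the criterion and rubric as plain data makes it straightforward to define several metrics side by side and run a multi-criteria evaluation by rendering one prompt per metric.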

## Smaller and faster than the leaders

At 3.8B parameters, Flow Judge runs on a single consumer GPU. It delivers substantially lower latency and infrastructure cost than judging through frontier APIs, while keeping evaluations reproducible and inspectable.
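A back-of-the-envelope calculation shows why a 3.8B-parameter model fits on consumer hardware. The sketch below estimates the memory needed for the weights alone at common numeric precisions. It makes no claim about which precision Flow Judge actually ships in, and it ignores the KV cache, activations, and runtime overhead, so real usage will be higher.

```python
PARAMS = 3.8e9  # parameter count stated above

# Bytes per parameter for common weight formats (standard sizes).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {fmt: weight_memory_gb(PARAMS, b) for fmt, b in BYTES_PER_PARAM.items()}
for fmt, gb in estimates.items():
    print(f"{fmt:>9}: {gb:.1f} GB")
```

Under these assumptions the weights take about 7.6 GB at 16-bit precision and about 1.9 GB at 4-bit.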

## Accuracy comparable to commercial judges

On internal and public benchmarks, Flow Judge achieves accuracy comparable to GPT-4o-class judges and higher than open models of similar size. See the technical report for full benchmark results.

## Resources

- [Technical report (blog)](https://flow-ai.com/blog/flow-judge)
- [Hugging Face collection](https://huggingface.co/collections/flowaicom/flow-judge-v01)
- [GitHub repository](https://github.com/flowaicom/flow-judge)
- [Apache 2.0 License](https://github.com/flowaicom/flow-judge/blob/main/LICENSE)

## License

Flow Judge is released under the Apache 2.0 license. Use, modification, and redistribution are permitted, provided that the license text and existing copyright and attribution notices are retained.