Blueprint for tool-augmented LLMs: Learning from Salesforce’s APIGen approach

We explore how standard JSON, combined with function-call generation and verification, helps AI engineers build more reliable, verifiable, and varied function-calling datasets.

Introduction

Function-calling agents represent an exciting evolution in how large language models interact with the world. While LLMs excel at generating text, many real-world tasks require them to execute external tools—fetching live data, running computations, or accessing other specialized services. This capability extends their utility far beyond conversation alone.

However, fine-tuning an LLM for function-calling demands large-scale training data that isn’t just text-based but also demonstrates how to call these external tools. Existing datasets often remain static, leading to models that struggle when faced with new, unseen APIs.

APIGen is an automated pipeline for generating verifiable and diverse function-calling datasets. Proposed by Salesforce AI Research, APIGen aims to address the gap by producing high-quality, varied data for real-world API usage.

Its main contribution is a novel data generation pipeline that emphasizes:

  1. Verifiability via multi-stage checks,
  2. Scalability using structured, modular design, and
  3. Diversity of function calls, thanks to a unified data format that easily adapts to different types of APIs.

These innovations show how function-calling agents can be trained on data that’s both realistically complex and thoroughly verified, bridging the gap between a model’s theoretical ability and practical real-world performance.
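The multi-stage idea can be sketched as a simple filtering pipeline. The check functions below are simplified stand-ins for the format, execution, and semantic checkers described in the paper, not the actual implementation:

```python
import json

def format_check(answers: str) -> bool:
    """Stage 1: the output must parse as a JSON list of calls with a name."""
    try:
        calls = json.loads(answers)
    except json.JSONDecodeError:
        return False
    return isinstance(calls, list) and all("name" in c for c in calls)

def execution_check(answers: str) -> bool:
    """Stage 2: stand-in for actually executing the calls against real APIs."""
    return True  # placeholder

def semantic_check(answers: str, query: str) -> bool:
    """Stage 3: stand-in for judging whether results actually answer the query."""
    return True  # placeholder

def keep(record: dict) -> bool:
    """A record survives only if it passes every stage."""
    a, q = record["answers"], record["query"]
    return format_check(a) and execution_check(a) and semantic_check(a, q)

record = {"query": "T3MA for ETH/BTC?", "answers": '[{"name": "t3ma", "arguments": {}}]'}
print(keep(record))  # True
```

Records that fail any stage are simply dropped, which is how the pipeline trades raw volume for verified quality.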

Why a unified standard JSON format matters

Implementing function-calling agents in production systems brings its own set of challenges:

  • How do we ensure consistent interaction between LLMs and various tools?
  • How do we validate function calls before execution?
  • How do we scale our system as we add more tools?

The standardization of tool interactions through a unified standard JSON format offers several significant advantages:

  1. Structural verification: JSON provides a reliable framework for verifying that outputs contain all necessary fields. This verification process allows us to automatically identify and filter out malformed responses, ensuring data quality.
  2. Function call validation: The structured nature of JSON enables efficient validation of generated function calls, including parameter checking and argument verification. This helps prevent the execution of invalid or hallucinated functions, a common challenge when working with LLMs.
  3. Scalability and integration: As your system grows, you'll likely need to incorporate various tools - from simple Python functions to complex REST APIs. A standardized JSON format makes this integration seamless through format converters, eliminating the need to modify core components.
  4. Data quality assurance: The format checker ensures all generated data adheres to specified requirements, helping maintain consistency across your entire system. This is particularly important when working with LLMs, which may sometimes generate creative but invalid responses.
  5. Efficient data processing: A consistent JSON structure makes extracting and processing data more straightforward and cost-effective, enabling rapid iteration and improvement of your function-calling system.
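To make the last point concrete, here is a minimal sketch, using made-up records that mimic the dataset format shown later, of how a consistent structure lets a single code path process every record:

```python
import json

# Two records in the unified format (answers stored as JSON strings, as in
# the APIGen dataset); because the structure is consistent, one expression
# extracts every function name across the whole dataset.
records = [
    {"answers": '[{"name": "t3ma", "arguments": {"symbol": "ETH/BTC"}}]'},
    {"answers": '[{"name": "get_weather", "arguments": {"city": "Paris"}}]'},
]

names = [call["name"] for r in records for call in json.loads(r["answers"])]
print(names)  # ['t3ma', 'get_weather']
```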

Practical implementation with Pydantic

The example below demonstrates how to:

  • Implement the APIGen models with Pydantic
  • Validate function calls generated by LLMs against these specifications
  • Convert Python functions into the standardized APIGen format to demonstrate scalability
  • Check function call validity against an API library

Let's start by defining the APIGen core models using Pydantic. These models will serve as the foundation for our JSON validation system. We'll create models for API parameters, API specifications, and function calls generated by LLMs:

# APIGen models with Pydantic
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, RootModel, Field, ValidationError

class APIParameter(BaseModel):
    """Model for an API parameter in APIGen format."""
    type: str = Field(..., description="The data type of the parameter")
    description: str = Field(..., description="A brief description of the parameter")
    required: bool = Field(True, description="Whether the parameter is required")
    default: Optional[Any] = Field(None, description="Default value for the parameter")

class API(BaseModel):
    """Model for an API specification in APIGen format."""
    name: str = Field(..., description="The name of the API function")
    description: str = Field(..., description="Description of what the API function does")
    parameters: Dict[str, APIParameter] = Field(
        ...,
        description="Parameters of the API function where key is the parameter/argument name"
    )
    
class Tools(RootModel[List[API]]):
    """Model for a list of APIs in APIGen format."""
    def get_apis(self) -> List[API]:
        return self.root

class FunctionCall(BaseModel):
    name: str
    arguments: Dict[str, Any]

class FunctionCallList(RootModel[List[FunctionCall]]):
    """Model for a list of function calls in APIGen format."""
    
    def get_fn_calls(self) -> List[FunctionCall]:
        return self.root


To demonstrate this in action, we can use the original APIGen dataset from HuggingFace. The dataset contains queries, tools, and their corresponding function calls, and was used for fine-tuning the function-calling models presented in the paper.

from datasets import load_dataset, Dataset
from pprint import pprint

DATASET_NAME = "Salesforce/xlam-function-calling-60k"
apigen = load_dataset(DATASET_NAME, split="train", streaming=True).take(10)
apigen = Dataset.from_list([record for record in apigen])

pprint(apigen[2])


When we inspect a sample from the dataset, we can see the structured format of the tools and their corresponding function calls:

{'answers': '[{"name": "t3ma", "arguments": {"symbol": "ETH/BTC", "interval": '
            '"1h", "time_period": 14}}]',
 'id': 2,
 'query': "What is the T3MA for 'ETH/BTC' using a 1h interval and a time "
          'period of 14?',
 'tools': '[{"name": "t3ma", "description": "Fetches the Triple Exponential '
          'Moving Average (T3MA) for a given financial instrument.", '
          '"parameters": {"symbol": {"description": "Instrument symbol, which '
          'can be any equity, index, ETF, forex, or cryptocurrency (e.g., '
          '\'AAPL\', \'EUR/USD\', \'ETH/BTC\').", "type": "str", "default": '
          '"AAPL"}, "interval": {"description": "Interval between two '
          'consecutive points in the time series. Supported intervals include '
          "'1min', '5min', '15min', '30min', '45min', '1h', '2h', '4h', "
          '\'1day\', \'1week\', and \'1month\'.", "type": "str", "default": '
          '"1min"}, "format": {"description": "Format of the response data, '
          'either \'CSV\' or \'JSON\'. Default is \'json\'.", "type": "str, '
          'optional", "default": "json"}, "v_factor": {"description": "Volume '
          'factor used in the calculation of the T3MA.", "type": "int, '
          'optional", "default": 0.7}, "series_type": {"description": "Type of '
          "series to use in the calculation. Supported values are 'open', "
          '\'high\', \'low\', and \'close\'. Default is \'close\'.", "type": '
          '"str, optional", "default": "close"}, "outputsize": {"description": '
          '"Number of data points to return. Default is 30.", "type": "int, '
          'optional", "default": 30}, "time_period": {"description": "Number '
          'of periods over which to calculate the T3MA. Default is 9.", '
          '"type": "int, optional", "default": 9}}}, {"name": '
          '"stock_v2_get_profile", "description": "Retrieves the company '
          'profile information for a given performance ID using the RapidAPI '
          'Morning Star service.", "parameters": {"performanceid": '
          '{"description": "The performance ID of the stock, obtained from '
          'endpoints such as /auto-complete, /get-summary, or /get-movers.", '
          '"type": "str", "default": "0P0000OQN8"}}}]'}

Validating generated function calls

One of the key advantages of using a standard JSON format and Pydantic models is that we can verify the structure of the generated function calls - e.g., that they contain all the necessary fields.

# LLM output
pprint(apigen[2]['answers'])

('[{"name": "t3ma", "arguments": {"symbol": "ETH/BTC", "interval": "1h", '
 '"time_period": 14}}]')


We now use FunctionCallList to validate the JSON string:

function_calls = FunctionCallList.model_validate_json(apigen[2]['answers'])
function_calls.get_fn_calls()

[FunctionCall(name='t3ma', arguments={'symbol': 'ETH/BTC', 'interval': '1h', 'time_period': 14})]


The validation process using Pydantic makes it straightforward to ensure all the generated calls match our expected format.
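The flip side is just as useful: a malformed response is rejected outright instead of slipping into the dataset. A minimal sketch of the failure path (the models are re-declared here so the snippet runs on its own):

```python
from typing import Any, Dict, List
from pydantic import BaseModel, RootModel, ValidationError

# Same models as defined earlier, repeated so this snippet is standalone.
class FunctionCall(BaseModel):
    name: str
    arguments: Dict[str, Any]

class FunctionCallList(RootModel[List[FunctionCall]]):
    pass

# "arguments" is missing, so Pydantic raises instead of accepting the call.
bad_answer = '[{"name": "t3ma"}]'
try:
    FunctionCallList.model_validate_json(bad_answer)
    rejected = False
except ValidationError as e:
    rejected = True
    print(f"Rejected with {e.error_count()} validation error(s)")
```

In a data generation pipeline, records that raise `ValidationError` would simply be filtered out.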

Another advantage is that we can check the validity of the function call against our API library.

The following function checks multiple aspects of a function call:

  1. Existence of the function in our API library,
  2. Correct argument types, and
  3. Presence of required parameters.

This comprehensive validation ensures the generated function calls are not just well-formatted, but also practically valid.

def check_function_call_validity(function_call: str, api_library: List[API]) -> bool:
    """
    Checks if the JSON-encoded function call is valid against a given API library.
    This includes checking whether the function exists, if the arguments match the API specification,
    and if the argument types are correct.
    
    Args:
        function_call: JSON string containing the function call
        api_library: List of API specifications as API objects
    """
    try:
        # Validate function call format using Pydantic
        call = FunctionCall.model_validate_json(function_call)
        
        # Find matching API in library
        matching_api = next((api for api in api_library if api.name == call.name), None)
        if not matching_api:
            print(f"Function '{call.name}' not found in API library.")
            return False
            
        # Check all required parameters are present and validate types
        for param_name, param_spec in matching_api.parameters.items():
            if param_spec.required and param_name not in call.arguments:
                print(f"Required argument '{param_name}' missing for function '{call.name}'.")
                return False
            
            if param_name in call.arguments:
                arg_value = call.arguments[param_name]
                # Basic type checking
                python_type = {
                    "string": str,
                    "str": str,
                    "integer": int,
                    "int": int,
                    "number": (int, float),
                    "float": float,
                    "boolean": bool,
                    "bool": bool,
                    "array": list,
                    "object": dict
                }.get(param_spec.type.lower())
                
                if python_type and not isinstance(arg_value, python_type):
                    print(f"Type mismatch for argument '{param_name}': expected {param_spec.type}, got {type(arg_value).__name__}")
                    return False
                
        # Check no unexpected arguments are present
        for arg_name in call.arguments:
            if arg_name not in matching_api.parameters:
                print(f"Unexpected argument '{arg_name}' for function '{call.name}'.")
                return False
        
        print("Function call is valid.")
        return True
        
    except ValidationError as e:
        print(f"Pydantic validation error: {e}")
        return False


Let's create a simple API library and test the function in various scenarios.

We'll start with a simple weather API example:

weather_api = API(
    name="weather_api.get_current_weather",
    description="Retrieves the current weather conditions for a specified location.",
    parameters={
        "location": APIParameter(
            type="string",
            description="The location to get weather for",
            required=True
        ),
        "units": APIParameter(
            type="string",
            description="Temperature units (Celsius/Fahrenheit)",
            required=False,
            default="Celsius"
        )
    }
)

api_library = [weather_api]


Now, we can test the validator with different types of function calls, valid and invalid ones:

# Test valid function call
function_call = '{"name": "weather_api.get_current_weather", "arguments": {"location": "Palo Alto", "units": "Celsius"}}'
is_valid_call = check_function_call_validity(function_call, api_library)
print(f"Is valid function call: {is_valid_call}\n")

Function call is valid.
Is valid function call: True

# Test invalid function call (wrong function name)
function_call = '{"name": "weather_api.get_current_temperature", "arguments": {"location": "Palo Alto", "units": "Celsius"}}'
is_valid_call = check_function_call_validity(function_call, api_library)
print(f"Is valid function call: {is_valid_call}\n")

Function 'weather_api.get_current_temperature' not found in API library.
Is valid function call: False

# Test invalid function call (wrong type)
function_call = '{"name": "weather_api.get_current_weather", "arguments": {"location": ["Palo Alto"], "units": "Celsius"}}'
is_valid_call = check_function_call_validity(function_call, api_library)
print(f"Is valid function call: {is_valid_call}\n")

Type mismatch for argument 'location': expected string, got list
Is valid function call: False

Scalability

One of the most powerful aspects of this standardized approach is its extensibility.

The standard JSON format makes incorporating tools from different sources scalable: we develop converters that adapt each source into the basic JSON elements, without modifying the core components. This is key because, as our agents evolve and grow in complexity, we will use a combination of Python functions, REST APIs, and other services as tools.
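As an illustration, a converter for a REST-style tool might look like the sketch below. The `rest_param` layout (loosely OpenAPI-shaped) and the `rest_to_apigen` helper are assumptions made for this example, not part of APIGen:

```python
# Hypothetical REST/OpenAPI-style parameter spec for a query parameter.
rest_param = {
    "name": "location",
    "in": "query",
    "required": True,
    "schema": {"type": "string", "description": "City name"},
}

def rest_to_apigen(p: dict) -> dict:
    """Map one REST-style parameter to the flat APIGen parameter shape."""
    return {
        p["name"]: {
            "type": p["schema"]["type"],
            "description": p["schema"].get("description", f"Parameter: {p['name']}"),
            "required": p.get("required", False),
            "default": p["schema"].get("default"),
        }
    }

print(rest_to_apigen(rest_param))
```

The output dict can then be fed directly into the `APIParameter` model, so REST endpoints and Python functions end up in the same library.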

Let's look at how we can automatically convert Python functions into our API format. This converter will allow us to seamlessly integrate existing Python functions into our tool ecosystem:

from inspect import signature, Parameter
import docstring_parser
from typing import Any, Callable, Dict, List

def python_fn_to_api(func: Callable) -> API:
    """
    Converts a Python function into an API specification.
    
    Args:
        func: Python function to convert
    
    Returns:
        API: API specification object
    """
    # Get function signature
    sig = signature(func)
    
    # Parse docstring
    parsed_doc = docstring_parser.parse(func.__doc__ or "")
    
    # Map Python types to API types
    type_mapping = {
        str: "string",
        int: "integer",
        float: "number",
        bool: "boolean",
        list: "array",
        dict: "object",
        List: "array",
        Dict: "object"
    }
    
    # Convert parameters
    parameters: Dict[str, APIParameter] = {}
    for name, param in sig.parameters.items():
        # Get type annotation or default to Any
        param_type = param.annotation if param.annotation != Parameter.empty else Any
        
        # Find parameter description in docstring
        param_desc = next(
            (param.description for param in parsed_doc.params if param.arg_name == name),
            f"Parameter: {name}"  # Default description if none found
        )
        
        # Determine if parameter is required
        is_required = param.default == Parameter.empty
        default_value = None if is_required else param.default
        
        # Get API type from mapping
        api_type = type_mapping.get(param_type, "string")
        
        # Create APIParameter
        parameters[name] = APIParameter(
            type=api_type,
            description=param_desc,
            required=is_required,
            default=default_value
        )
    
    # Create API specification
    return API(
        name=f"{func.__name__}",
        description=parsed_doc.short_description or "No description available",
        parameters=parameters
    )


To demonstrate the practical application of our converter, we can create a weather analysis function, convert it to our standardized format and add it to our API library:

def get_forecast_score(temperature: float, humidity: float, wind_speed: float, units: str = "Celsius") -> Dict[str, Any]:
    """
    Calculates a weather score and provides activity recommendations based on weather conditions.
    
    Args:
        temperature: Current temperature value
        humidity: Humidity percentage (0-100)
        wind_speed: Wind speed in km/h or mph
        units: Temperature units, either 'Celsius' or 'Fahrenheit' (default: 'Celsius')
        
    Returns:
        Dict containing weather score, comfort level, and recommendations
    """
    # Normalize temperature to Celsius if needed
    if units.lower() == "fahrenheit":
        temperature = (temperature - 32) * 5/9
    
    # Calculate comfort score (0-100)
    # Optimal conditions: temp=21°C, humidity=45%, wind=10km/h
    temp_score = max(0, 100 - abs(temperature - 21) * 3)
    humidity_score = max(0, 100 - abs(humidity - 45) * 1.5)
    wind_score = max(0, 100 - abs(wind_speed - 10) * 2)
    
    # Weighted average for final score
    weather_score = (temp_score * 0.5 + humidity_score * 0.3 + wind_score * 0.2)
    
    # Determine comfort level
    if weather_score >= 80:
        comfort = "Excellent"
    elif weather_score >= 60:
        comfort = "Good"
    elif weather_score >= 40:
        comfort = "Fair"
    else:
        comfort = "Poor"
    
    # Generate recommendations
    recommendations = []
    if temperature > 25:
        recommendations.append("Stay hydrated")
        if humidity > 70:
            recommendations.append("Avoid strenuous outdoor activities")
    elif temperature < 10:
        recommendations.append("Dress warmly")
    
    if wind_speed > 20:
        recommendations.append("Be cautious of strong winds")
    
    if humidity > 80:
        recommendations.append("High humidity - indoor activities recommended")
    
    return {
        "weather_score": round(weather_score, 1),
        "comfort_level": comfort,
        "recommendations": recommendations,
        "conditions": {
            "temperature": f"{temperature:.1f}°C",
            "humidity": f"{humidity}%",
            "wind_speed": f"{wind_speed} km/h"
        }
    }

api_spec = python_fn_to_api(get_forecast_score)
pprint(api_spec)

API(name='get_forecast_score', description='Calculates a weather score and provides activity recommendations based on weather conditions.', parameters={'temperature': APIParameter(type='number', description='Current temperature value', required=True, default=None), 'humidity': APIParameter(type='number', description='Humidity percentage (0-100)', required=True, default=None), 'wind_speed': APIParameter(type='number', description='Wind speed in km/h or mph', required=True, default=None), 'units': APIParameter(type='string', description="Temperature units, either 'Celsius' or 'Fahrenheit' (default: 'Celsius')", required=False, default='Celsius')})


We can now easily add the new tool:

api_library.append(api_spec)
pprint(api_library)

[API(name='weather_api.get_current_weather', description='Retrieves the current weather conditions for a specified location.', parameters={'location': APIParameter(type='string', description='The location to get weather for', required=True, default=None), 'units': APIParameter(type='string', description='Temperature units (Celsius/Fahrenheit)', required=False, default='Celsius')}),
 API(name='get_forecast_score', description='Calculates a weather score and provides activity recommendations based on weather conditions.', parameters={'temperature': APIParameter(type='number', description='Current temperature value', required=True, default=None), 'humidity': APIParameter(type='number', description='Humidity percentage (0-100)', required=True, default=None), 'wind_speed': APIParameter(type='number', description='Wind speed in km/h or mph', required=True, default=None), 'units': APIParameter(type='string', description="Temperature units, either 'Celsius' or 'Fahrenheit' (default: 'Celsius')", required=False, default='Celsius')})]

Conclusion

The standardized JSON format provides a robust foundation not only for building scalable tool-augmented LLMs but also for custom data generation pipelines. By implementing proper validation and conversion utilities, you can create a system that's both powerful and maintainable.

This approach becomes particularly valuable as your system grows and incorporates more tools. Whether you're building a simple chatbot or a complex AI system, having a standardized format for tool interactions will save you time and prevent potential issues down the line.

February 11, 2025

Bernardo García del Río