Skip to content

Agent Walkthrough

File: examples/agent_demo.py

This example demonstrates using TRex to manage Iceberg tables via natural language commands.

Full Source

"""TRex agent demo.

Demonstrates using TRex, the DRLS AI agent, to manage Iceberg tables
via natural language commands. Works with any LLM provider (cloud
or local via Ollama/vLLM).

Run with Ollama (air-gap):
    pip install drls[agent]
    ollama serve &
    ollama pull llama3:8b
    python examples/agent_demo.py --provider ollama --model llama3:8b

Run with OpenAI:
    export OPENAI_API_KEY=sk-...
    python examples/agent_demo.py --provider openai --model gpt-4o

Run with Anthropic:
    export ANTHROPIC_API_KEY=sk-ant-...
    python examples/agent_demo.py --provider anthropic --model claude-sonnet-4-20250514
"""
import argparse
import json
import ray
import drls
from drls.iceberg import compact_table, get_table_health

ray.init()

spark = drls.init_spark(
    app_name="agent-demo",
    num_executors=1,
    executor_cores=1,
    executor_memory="512M",
    iceberg_catalog="hadoop",
    iceberg_warehouse="/tmp/drls-warehouse",
)

# Setup: create a table with some data
spark.sql("CREATE DATABASE IF NOT EXISTS drls.agent_demo")
spark.sql("DROP TABLE IF EXISTS drls.agent_demo.events")
spark.sql("""
    CREATE TABLE drls.agent_demo.events (
        id BIGINT,
        event_type STRING,
        payload STRING
    ) USING iceberg
""")

# Insert some data in multiple small batches to create fragmentation
for i in range(5):
    spark.sql(f"""
        INSERT INTO drls.agent_demo.events VALUES
        ({i*10 + 1}, 'click', '{{"page": "home"}}'),
        ({i*10 + 2}, 'view', '{{"page": "product"}}'),
        ({i*10 + 3}, 'purchase', '{{"amount": {i * 10 + 5}}}')
    """)

# Check health manually first
print("--- Manual Health Check ---")
health = get_table_health(spark, "drls.agent_demo.events")
print(json.dumps(health, indent=2))

# Parse arguments for LLM provider
parser = argparse.ArgumentParser()
parser.add_argument("--provider", default="ollama", help="LLM provider")
parser.add_argument("--model", default="llama3:8b", help="Model name")
parser.add_argument("--api-base", default=None, help="API base URL")
args = parser.parse_args()

print(f"\n--- Agent Demo (provider={args.provider}, model={args.model}) ---")

try:
    from drls.agentic import TRex

    agent = TRex(
        spark,
        provider=args.provider,
        model=args.model,
        api_base=args.api_base,
    )

    # Ask the agent about table health
    prompts = [
        "How healthy is the events table in the agent_demo database?",
        "How many tables are in the catalog?",
        "What's the schema of agent_demo.events?",
    ]

    for prompt in prompts:
        print(f"\nUser: {prompt}")
        result = agent.run(prompt)
        if result.get("success"):
            print(f"TRex: {result['response']}")
            if result.get("tool_calls"):
                print(f"  (used {len(result['tool_calls'])} tool calls)")
        else:
            print(f"Error: {result.get('error')}")

except ImportError:
    print("Agent dependencies not installed. Run: pip install drls[agent]")
except Exception as e:
    print(f"Agent error (expected if no LLM available): {e}")

# Cleanup
drls.stop_spark()
ray.shutdown()
print("\nAgent demo complete!")

Step-by-Step

Setup Test Data

# Create a table with fragmented data
for i in range(5):
    spark.sql(f"""
        INSERT INTO drls.agent_demo.events VALUES
        ({i*10 + 1}, 'click', '{{"page": "home"}}'),
        ...
    """)

Multiple small inserts create fragmentation — exactly the kind of problem the agent can diagnose and fix.

Manual Health Check

from drls.iceberg import get_table_health

health = get_table_health(spark, "drls.agent_demo.events")

This is the programmatic equivalent of what the agent does internally when you ask about table health.

Initialize the Agent

from drls.agentic import TRex

agent = TRex(
    spark,
    provider=args.provider,  # ollama, openai, anthropic, etc.
    model=args.model,
    api_base=args.api_base,
)

The agent wraps any LLM provider via litellm and equips it with 13 lakehouse tools.

Run Prompts

prompts = [
    "How healthy is the events table in the agent_demo database?",
    "How many tables are in the catalog?",
    "What's the schema of agent_demo.events?",
]

for prompt in prompts:
    result = agent.run(prompt)
    print(f"Agent: {result['response']}")
    if result.get("tool_calls"):
        print(f"  (used {len(result['tool_calls'])} tool calls)")

The agent:

  1. Receives the prompt + 13 tool definitions
  2. Asks the LLM which tools to call
  3. Executes the tool calls against Spark/Iceberg
  4. Returns the LLM's final response

Running the Example

# Local with Ollama
ollama serve &
ollama pull llama3:8b
python examples/agent_demo.py --provider ollama --model llama3:8b

# With OpenAI
OPENAI_API_KEY=sk-... python examples/agent_demo.py --provider openai --model gpt-4o

# With Anthropic
ANTHROPIC_API_KEY=sk-ant-... python examples/agent_demo.py --provider anthropic --model claude-sonnet-4-20250514