About Langfuse

Langfuse is an open-source LLM engineering platform that helps teams collaboratively debug, analyze, and iterate on their LLM applications.

Use Atla With Langfuse

Evaluate your LLM application and track evaluations in your Langfuse observability pipeline.

1. Install Atla and Langfuse

pip install atla langfuse
2. Set up authentication

export ATLA_API_KEY=pk-...
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_HOST=https://cloud.langfuse.com  # or https://us.cloud.langfuse.com for US data region
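
Both clients read these variables from the environment when they are constructed. If you are working in a notebook and prefer to set them in code, a minimal sketch using os.environ (assuming the same variable names as the shell exports above):

import os

# Set credentials programmatically instead of exporting them in the shell.
# Replace the placeholder values with your own keys.
os.environ["ATLA_API_KEY"] = "pk-..."
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-..."
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-..."
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"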
3. Create clients

from atla import Atla
from langfuse import Langfuse

client = Atla()
langfuse = Langfuse()
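
With both clients created, you can optionally confirm that your Langfuse credentials are valid before logging any scores (auth_check is part of the Langfuse Python SDK; the assertion message here is just illustrative):

# Optional: verify Langfuse credentials before logging scores
assert langfuse.auth_check(), "Langfuse authentication failed; check your keys and host"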

Score Individual Traces

Add Atla evaluation scores and critiques to your Langfuse traces to track the quality of your LLM interactions. Each evaluation will include both a numerical score and a detailed critique explaining the rating.

from langfuse.decorators import langfuse_context, observe

# Sample data
sample_inputs = {
    "question": "What is water purification?",
    "context": "[2] Water purification in treatment plants typically involves several stages: coagulation, sedimentation, filtration, and disinfection. Coagulation adds chemicals that bind with impurities, forming larger particles. Sedimentation allows these particles to settle out of the water. Filtration removes smaller particles by passing water through sand, gravel, or charcoal. Finally, disinfection kills any remaining microorganisms, often using chlorine or ultraviolet light.",
}

# Generate a trace
@observe(as_type="generation")
def mock_generate(inputs: dict[str, str]) -> dict[str, str]:
    sample_output = "Water is purified through filtration and chemical treatment."
    return {
        "response": sample_output,
        "trace_id": langfuse_context.get_current_trace_id(),
    }

sample_output = mock_generate(sample_inputs)

# Evaluate with Atla
evaluation = client.evaluation.create(
    model_id="atla-selene",
    model_input=sample_inputs["question"],
    model_output=sample_output["response"],
    model_context=sample_inputs["context"],
    metric_name="atla_default_faithfulness",
).result.evaluation

# Log evaluation results to Langfuse
langfuse.score(
    name="faithfulness",
    value=int(evaluation.score),
    comment=evaluation.critique,
    trace_id=sample_output["trace_id"],
)
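
Langfuse sends events to its backend asynchronously in the background. In short-lived scripts, it can help to flush the client before the process exits so the score is not lost (a minimal precaution, assuming the SDK's default batching behavior):

# Ensure queued events (including the score above) are sent before exiting
langfuse.flush()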

Score Batches

Use Atla's async client to evaluate batches of Langfuse traces in parallel.

import asyncio
from atla import AsyncAtla

async_client = AsyncAtla()

async def evaluate_traces():
    observations = langfuse.fetch_observations(name="mock_generate").data
    tasks = []

    # Evaluate each trace asynchronously
    for observation in observations:
        tasks.append(async_client.evaluation.create(
            model_id="atla-selene",
            model_input=observation.input["args"][0]["question"],
            model_output=observation.output["response"],
            model_context=observation.input["args"][0]["context"],
            metric_name="atla_default_faithfulness"
        ))

    evaluations = await asyncio.gather(*tasks)

    # Log evaluation results to Langfuse
    for observation, evaluation in zip(observations, evaluations):
        langfuse.score(
            name="faithfulness",
            value=int(evaluation.result.evaluation.score),
            comment=evaluation.result.evaluation.critique,
            trace_id=observation.output["trace_id"],
        )

# Top-level await works in notebooks; in a script, use asyncio.run(evaluate_traces())
await evaluate_traces()
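
For larger batches, you may want to cap how many evaluation requests run concurrently. A minimal sketch using asyncio.Semaphore (the helper name and the limit of 5 are illustrative, not part of the Atla or Langfuse APIs):

# Limit concurrent evaluation requests to avoid overwhelming the API
semaphore = asyncio.Semaphore(5)

async def evaluate_with_limit(observation):
    async with semaphore:
        return await async_client.evaluation.create(
            model_id="atla-selene",
            model_input=observation.input["args"][0]["question"],
            model_output=observation.output["response"],
            model_context=observation.input["args"][0]["context"],
            metric_name="atla_default_faithfulness",
        )

Building the task list with evaluate_with_limit(observation) instead of calling async_client.evaluation.create directly keeps the rest of evaluate_traces unchanged.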

Monitor your LLM application’s performance over time in the Langfuse UI.