Multi-Implementation Patch Selection

When developing patches for code implementation tasks, a language model can produce hallucinated or suboptimal code. This example shows how to generate several candidate patches for a single task and then have a separate evaluator LLM call, prompted to reason through its comparison before answering, select the best one.
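Stripped of the LLM calls, the pattern is best-of-N selection with a judge that sees every candidate at once. The sketch below is plain Python with stand-in callables; `best_of_n`, `generate`, `judge` and `shortest` are hypothetical names standing in for the patch-writing agent and the evaluator agent, not part of deep_next.

```python
from typing import Callable, Sequence


def best_of_n(
    generate: Callable[[int], str],
    judge: Callable[[Sequence[str]], int],
    n: int = 3,
) -> str:
    """Generate n candidates, then let a judge pick one by 0-based index."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(i) for i in range(n)]
    if n == 1:
        # Nothing to compare, so the judge is not called at all.
        return candidates[0]
    index = judge(candidates)
    # Guard against an out-of-range answer from the judge.
    if not 0 <= index < len(candidates):
        index = 0
    return candidates[index]


def shortest(candidates: Sequence[str]) -> int:
    """Toy judge: prefer the shortest candidate."""
    return min(range(len(candidates)), key=lambda k: len(candidates[k]))


drafts = ["patch A (long)", "patch B", "patch C (longest one)"]
chosen = best_of_n(lambda i: drafts[i], shortest, n=3)
print(chosen)  # patch B
```

In the real workflow the judge is an LLM rather than a deterministic rule, which is why the index check matters.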

Implementation

Below are the specific changes needed to implement multi-implementation patch selection:

Step 1: Add required imports

Location: libs/core/deep_next/core/steps/implement/develop_patch.py

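# Add these imports to the top of the file
import random

from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field

# ChatPromptTemplate, Step, logger, read_txt, _create_llm and _create_llm_agent are
# assumed to already be imported or defined in this module; add them if they are not.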

Step 2: Create the evaluation model

Location: libs/core/deep_next/core/steps/implement/develop_patch.py

class PatchEvaluation(BaseModel):
    """Model for the LLM to evaluate and select the best patch implementation."""
    reasoning: str = Field(description="Detailed reasoning for the selection")
    selected_patch_index: int = Field(description="Index of the selected patch (0-based)")
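In the module, PydanticOutputParser turns the evaluator's JSON reply into a PatchEvaluation. The stdlib-only sketch below mirrors that step with a dataclass so it runs without pydantic or langchain; `EvaluationResult` and `parse_evaluation` are hypothetical names. It also checks the returned index against the number of candidates, which the PatchEvaluation model on its own does not do.

```python
import json
from dataclasses import dataclass


@dataclass
class EvaluationResult:
    """Stand-in for PatchEvaluation: same two fields, no pydantic needed."""
    reasoning: str
    selected_patch_index: int


def parse_evaluation(raw_json: str, num_patches: int) -> EvaluationResult:
    """Parse the evaluator's JSON reply and validate the selected index."""
    data = json.loads(raw_json)
    result = EvaluationResult(
        reasoning=str(data["reasoning"]),
        selected_patch_index=int(data["selected_patch_index"]),
    )
    if not 0 <= result.selected_patch_index < num_patches:
        raise ValueError(
            f"selected_patch_index {result.selected_patch_index} "
            f"is out of range for {num_patches} patches"
        )
    return result


reply = '{"reasoning": "Implementation 1 handles the empty file.", "selected_patch_index": 1}'
print(parse_evaluation(reply, num_patches=3).selected_patch_index)  # 1
```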

Step 3: Implement the evaluator agent

Location: libs/core/deep_next/core/steps/implement/develop_patch.py

def _create_evaluator_agent():
    """Creates LLM agent for evaluating and selecting the best patch implementation."""
    evaluation_prompt_template = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                """You are an expert software engineer tasked with evaluating multiple code
                implementation patches generated by LLMs and selecting the best one.

                Evaluate each implementation based on:
                1. Correctness - Does it correctly address the requirements?
                2. Completeness - Does it cover all aspects of the task?
                3. Code quality - Is the code well-structured, readable, and maintainable?
                4. Best practices - Does it follow Python conventions and best practices?
                5. Integration - Would it integrate well with the existing codebase?

                Provide thorough reasoning for your selection, comparing the strengths and
                weaknesses of each implementation.""",
            ),
            (
                "human",
                """I need you to evaluate {num_patches} different patch implementations for the same task
                and select the best one.

                Task information:
                - File path: {file_path}
                - High-level description: {high_level_description}
                - Detailed description: {description}
                - Issue statement: {issue_statement}

                Here are the patch implementations to evaluate:

                {patches}

                Compare these implementations and select the best one. Provide detailed reasoning
                for your selection, analyzing the strengths and weaknesses of each approach.

                Return the index (0-based) of the best implementation along with your reasoning.

                EXAMPLE OUTPUT:
                --------------------
                {example_patch_evaluation}
                --------------------
                """
            ),
        ]
    )

    parser = PydanticOutputParser(pydantic_object=PatchEvaluation)

    return evaluation_prompt_template | _create_llm() | parser
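ChatPromptTemplate uses f-string-style placeholders by default, so the human message behaves much like Python's `str.format`: each `{name}` is filled once, and braces inside the substituted values (the patch code, the example JSON) are inserted literally instead of being parsed as further placeholders. That is why `example_patch_evaluation` is passed in as a variable rather than pasted into the template, where its braces would have to be doubled. The snippet below illustrates this with plain `str.format` on a trimmed-down, hypothetical version of the template (`HUMAN_TEMPLATE` is not the real prompt).

```python
import json

# Trimmed-down stand-in for the human message in _create_evaluator_agent.
HUMAN_TEMPLATE = (
    "Here are the patch implementations to evaluate:\n\n"
    "{patches}\n\n"
    "EXAMPLE OUTPUT:\n{example_patch_evaluation}"
)

example_json = json.dumps({"reasoning": "...", "selected_patch_index": 0})
patches = "### Implementation 0\ndef f(): return {'a': 1}\n"

# Values containing braces are substituted verbatim; format() does not re-parse them.
rendered = HUMAN_TEMPLATE.format(patches=patches, example_patch_evaluation=example_json)
print(rendered)

# Embedding the JSON directly in the template text fails instead,
# because its literal braces are read as a replacement field.
try:
    ("EXAMPLE OUTPUT:\n" + example_json).format()
except (KeyError, ValueError, IndexError) as exc:
    print(f"direct embedding fails: {exc!r}")
```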

Step 4: Implement the multi-implementation function

Location: libs/core/deep_next/core/steps/implement/develop_patch.py

def develop_multiple_file_patches(
    step: Step,
    issue_statement: str,
    num_implementations: int = 3
) -> str:
    """Generate multiple patch implementations and select the best one.

    Args:
        step: The step containing file and description information
        issue_statement: The issue statement for context
        num_implementations: Number of implementations to generate (default: 3)

    Returns:
        The raw patch text of the best implementation
    """
    if not step.target_file.exists():
        logger.warning(f"Creating new file: '{step.target_file}'")

        with open(step.target_file, "w") as f:
            f.write("# Comment added at creation time to indicate empty file.\n")

    logger.info(f"Generating {num_implementations} patch implementations for {step.target_file}...")

    # Generate several independent implementations. Note: the seed below is only
    # logged; it is not passed to the model, so variety between implementations
    # depends on the LLM sampling with a non-zero temperature.
    raw_patches = []
    for i in range(num_implementations):
        seed = random.randint(1, 10000)
        logger.info(f"Generating implementation {i+1}/{num_implementations} with seed {seed}...")

        llm_agent = _create_llm_agent()
        raw_patch = llm_agent.invoke(
            {
                "path": step.target_file,
                "code_context": read_txt(step.target_file),
                "high_level_description": step.title,
                "description": step.description,
                "issue_statement": issue_statement,
            }
        )
        raw_patches.append(raw_patch)

    if num_implementations == 1:
        return raw_patches[0]

    # Format the patches for evaluation
    formatted_patches = []
    for i, patch in enumerate(raw_patches):
        formatted_patches.append(f"### Implementation {i}\n```\n{patch}\n```\n")

    # Evaluate and select the best implementation
    logger.info("Evaluating patch implementations to select the best one...")
    evaluator = _create_evaluator_agent()
    evaluation = evaluator.invoke(
        {
            "num_patches": num_implementations,
            "file_path": step.target_file,
            "high_level_description": step.title,
            "description": step.description,
            "issue_statement": issue_statement,
            "patches": "\n\n".join(formatted_patches),
            "example_patch_evaluation": example_patch_evaluation.model_dump_json(),
        }
    )

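    selected_index = evaluation.selected_patch_index
    if not 0 <= selected_index < len(raw_patches):
        # The LLM may return an out-of-range index; fall back to the first patch.
        logger.warning(
            f"Evaluator returned invalid index {selected_index}; using implementation 0."
        )
        selected_index = 0
    logger.info(
        f"Selected implementation {selected_index} as the best. Reasoning: {evaluation.reasoning}"  # noqa: E501
    )

    return raw_patches[selected_index]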


example_patch_evaluation = PatchEvaluation(
    reasoning=(
        "The first implementation is the best because it correctly "
        "implements the required functions and follows best practices."
    ),
    selected_patch_index=0,
)
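The evaluator can only return a meaningful index if the headings it reads line up with positions in `raw_patches`. The sketch below reproduces the formatting loop from develop_multiple_file_patches as a standalone function (`format_patches` is a hypothetical name) and checks that the block headed "Implementation i" holds exactly `raw_patches[i]`.

```python
FENCE = "`" * 3  # a Markdown code fence, built indirectly so this example stays readable


def format_patches(raw_patches: list[str]) -> str:
    """Label each candidate with its 0-based index, as in develop_multiple_file_patches."""
    formatted = [
        f"### Implementation {i}\n{FENCE}\n{patch}\n{FENCE}\n"
        for i, patch in enumerate(raw_patches)
    ]
    return "\n\n".join(formatted)


raw_patches = ["<patch from run 0>", "<patch from run 1>", "<patch from run 2>"]
prompt_section = format_patches(raw_patches)
print(prompt_section)

# An evaluator answer of 2 refers to the block headed "### Implementation 2".
selected_index = 2
assert f"### Implementation {selected_index}\n{FENCE}\n{raw_patches[selected_index]}" in prompt_section
```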

Step 5: Update the graph node

Location: libs/core/deep_next/core/steps/implement/graph.py

# Add the import
from deep_next.core.steps.implement.develop_patch import develop_multiple_file_patches

    # Update the code_development method
    @staticmethod
    @tenacity.retry(
        stop=tenacity.stop_after_attempt(5),
        retry=tenacity.retry_if_exception_type((ApplyPatchError, ParsePatchesError)),
        reraise=True,
    )
    def code_development(
        state: _State,
    ) -> _State:
        # Use develop_multiple_file_patches instead of develop_single_file_patches
        # to generate multiple implementations and select the best one
        raw_patches = develop_multiple_file_patches(
            step=state.selected_step,
            issue_statement=state.issue_statement,
            num_implementations=3  # Generate 3 implementations
        )

        patches: list[CodePatch] = parse_patches(raw_patches)
        patches = [patch for patch in patches if patch.before != patch.after]

        for patch in patches:
            apply_patch(patch)

        return state
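Because the tenacity decorator wraps the whole node, a ParsePatchesError or ApplyPatchError discards all candidates and regenerates them from scratch. A worst-case count of LLM calls per node follows from the code: `num_implementations` generation calls plus one evaluator call (skipped when `num_implementations` is 1), times the 5 attempts allowed by `stop_after_attempt(5)`. The helper below is a hypothetical back-of-the-envelope calculation, not part of deep_next.

```python
def max_llm_calls(num_implementations: int, max_attempts: int = 5) -> int:
    """Worst-case LLM calls for one code_development node under the retry policy."""
    # The evaluator is only called when there is more than one candidate.
    per_attempt = num_implementations + (1 if num_implementations > 1 else 0)
    return per_attempt * max_attempts


print(max_llm_calls(1))  # 5  (single implementation, no evaluator)
print(max_llm_calls(3))  # 20 (3 generations + 1 evaluation, 5 attempts)
```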

Benefits

Improved Reliability: Generating several implementations and selecting one lowers the chance that the applied patch is one containing hallucinations or other flaws.
Quality Control: The evaluation step acts as a quality gate, assessing each implementation for correctness, completeness, code quality, adherence to best practices, and integration with the existing codebase.
Fewer Retries: If the selected patch is more likely to parse and apply cleanly, fewer tenacity retries should be needed; note, however, that each attempt now makes num_implementations + 1 LLM calls.
Cross-Validation: Because the evaluator sees all implementations side by side, it can compare the approaches and favor the most robust one.

Usage Example

The function is designed to be a drop-in replacement for the original develop_single_file_patches function:

from pathlib import Path

from deep_next.core.steps.action_plan.data_model import Step
from deep_next.core.steps.implement.develop_patch import develop_multiple_file_patches

step = Step(
    target_file=Path("./libs/core/tests/_resources/example_project/src/hello_world.py"),
    title="Add type hints",
    description="Add type hints to the add_integers function"
)

raw_patch = develop_multiple_file_patches(
    step=step,
    issue_statement="Add type hints to the add_integers function",
    num_implementations=3  # Generate 3 different implementations
)

print(raw_patch)
