Multi-Implementation Patch Selection
When developing patches for code implementation tasks, language models can sometimes produce hallucinations or suboptimal solutions. This example demonstrates how to generate multiple patch implementations for a single task and use another LLM with chain-of-thought reasoning to select the best implementation.
Implementation
Below are the specific changes needed to implement multi-implementation patch selection:
Step 1: Add required imports
Location: libs/core/deep_next/core/steps/implement/develop_patch.py
# Add these imports to the top of the file
import random

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import ChatPromptTemplate  # skip if the file already imports it
from pydantic import BaseModel, Field
Step 2: Create the evaluation model
Location: libs/core/deep_next/core/steps/implement/develop_patch.py
class PatchEvaluation(BaseModel):
    """Model for the LLM to evaluate and select the best patch implementation."""

    reasoning: str = Field(description="Detailed reasoning for the selection")
    selected_patch_index: int = Field(description="Index of the selected patch (0-based)")
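The evaluator's reply is expected to be a JSON object with the two fields above. The following is a minimal, stdlib-only sketch of the checks worth applying to such a reply; the dataclass is a stand-in for the Pydantic model, and the helper name parse_evaluation is hypothetical rather than part of the codebase.

```python
import json
from dataclasses import dataclass


@dataclass
class PatchEvaluationSketch:
    """Stand-in for the Pydantic PatchEvaluation model (illustration only)."""

    reasoning: str
    selected_patch_index: int


def parse_evaluation(raw_json: str, num_patches: int) -> PatchEvaluationSketch:
    """Parse an evaluator reply and reject indices outside [0, num_patches)."""
    data = json.loads(raw_json)
    index = data["selected_patch_index"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(index, bool) or not isinstance(index, int):
        raise ValueError(f"selected_patch_index must be an integer, got {index!r}")
    if not 0 <= index < num_patches:
        raise ValueError(f"selected_patch_index {index} is outside 0..{num_patches - 1}")
    return PatchEvaluationSketch(
        reasoning=str(data["reasoning"]),
        selected_patch_index=index,
    )


reply = '{"reasoning": "Implementation 1 covers the empty-file case.", "selected_patch_index": 1}'
evaluation = parse_evaluation(reply, num_patches=3)
print(evaluation.selected_patch_index)
```

The range check matters because a negative index such as -1 would otherwise silently select the last patch when used to index a Python list.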
Step 3: Implement the evaluator agent
Location: libs/core/deep_next/core/steps/implement/develop_patch.py
def _create_evaluator_agent():
    """Creates LLM agent for evaluating and selecting the best patch implementation."""
    evaluation_prompt_template = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                """You are an expert software engineer tasked with evaluating multiple code
implementation patches generated by LLMs and selecting the best one.
Evaluate each implementation based on:
1. Correctness - Does it correctly address the requirements?
2. Completeness - Does it cover all aspects of the task?
3. Code quality - Is the code well-structured, readable, and maintainable?
4. Best practices - Does it follow Python conventions and best practices?
5. Integration - Would it integrate well with the existing codebase?
Provide thorough reasoning for your selection, comparing the strengths and
weaknesses of each implementation.""",
            ),
            (
                "human",
                """I need you to evaluate {num_patches} different patch implementations for the same task
and select the best one.
Task information:
- File path: {file_path}
- High-level description: {high_level_description}
- Detailed description: {description}
- Issue statement: {issue_statement}
Here are the patch implementations to evaluate:
{patches}
Compare these implementations and select the best one. Provide detailed reasoning
for your selection, analyzing the strengths and weaknesses of each approach.
Return the index (0-based) of the best implementation along with your reasoning.
EXAMPLE OUTPUT:
--------------------
{example_patch_evaluation}
--------------------
""",
            ),
        ]
    )

    parser = PydanticOutputParser(pydantic_object=PatchEvaluation)

    return evaluation_prompt_template | _create_llm() | parser
Step 4: Implement the multi-implementation function
Location: libs/core/deep_next/core/steps/implement/develop_patch.py
def develop_multiple_file_patches(
    step: Step,
    issue_statement: str,
    num_implementations: int = 3,
) -> str:
    """Generate multiple patch implementations and select the best one.

    Args:
        step: The step containing file and description information
        issue_statement: The issue statement for context
        num_implementations: Number of implementations to generate (default: 3)

    Returns:
        The raw patch text of the best implementation
    """
    if num_implementations < 1:
        raise ValueError("num_implementations must be at least 1")

    if not step.target_file.exists():
        logger.warning(f"Creating new file: '{step.target_file}'")
        with open(step.target_file, "w") as f:
            f.write("# Comment added at creation time to indicate empty file.\n")

    logger.info(f"Generating {num_implementations} patch implementations for {step.target_file}...")

    # Generate multiple implementations.
    # NOTE: the seed below is only logged; it is not passed to the model, so
    # diversity relies on the LLM sampling non-deterministically. Forward the
    # seed to the LLM configuration if your provider supports one.
    raw_patches = []
    for i in range(num_implementations):
        seed = random.randint(1, 10000)
        logger.info(f"Generating implementation {i+1}/{num_implementations} with seed {seed}...")

        llm_agent = _create_llm_agent()
        raw_patch = llm_agent.invoke(
            {
                "path": step.target_file,
                "code_context": read_txt(step.target_file),
                "high_level_description": step.title,
                "description": step.description,
                "issue_statement": issue_statement,
            }
        )
        raw_patches.append(raw_patch)

    # Nothing to compare, so skip the evaluator call
    if num_implementations == 1:
        return raw_patches[0]

    # Format the patches for evaluation
    formatted_patches = []
    for i, patch in enumerate(raw_patches):
        formatted_patches.append(f"### Implementation {i}\n```\n{patch}\n```\n")

    # Evaluate and select the best implementation
    logger.info("Evaluating patch implementations to select the best one...")
    evaluator = _create_evaluator_agent()
    evaluation = evaluator.invoke(
        {
            "num_patches": num_implementations,
            "file_path": step.target_file,
            "high_level_description": step.title,
            "description": step.description,
            "issue_statement": issue_statement,
            "patches": "\n\n".join(formatted_patches),
            "example_patch_evaluation": example_patch_evaluation.model_dump_json(),
        }
    )

    selected_index = evaluation.selected_patch_index
    # The index comes from the LLM, so guard against out-of-range values
    # (a negative index would otherwise silently pick from the end of the list).
    if not 0 <= selected_index < len(raw_patches):
        logger.warning(f"Evaluator returned invalid index {selected_index}; using 0")
        selected_index = 0

    logger.info(
        f"Selected implementation {selected_index} as the best. Reasoning: {evaluation.reasoning}"  # noqa: E501
    )

    return raw_patches[selected_index]
example_patch_evaluation = PatchEvaluation(
    reasoning=(
        "The first implementation is the best because it correctly "
        "implements the required functions and follows best "
        "practices."
    ),
    selected_patch_index=0,
)
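The generate-then-select control flow in Step 4 can be exercised without any LLM by injecting plain callables. The sketch below is an illustration under that assumption, not the project's implementation: select_best_patch, format_patches, generate, and evaluate are hypothetical names, and falling back to index 0 on an invalid reply is one possible policy among several.

```python
from typing import Callable


def format_patches(raw_patches: list[str]) -> str:
    """Number each candidate (0-based) the way the evaluator prompt expects."""
    fence = "`" * 3  # built at runtime to avoid nesting code fences in this example
    return "\n\n".join(
        f"### Implementation {i}\n{fence}\n{patch}\n{fence}\n"
        for i, patch in enumerate(raw_patches)
    )


def select_best_patch(
    generate: Callable[[], str],
    evaluate: Callable[[str], int],
    num_implementations: int = 3,
) -> str:
    """Generate candidates, ask the evaluator for an index, return the winner."""
    if num_implementations < 1:
        raise ValueError("num_implementations must be at least 1")
    raw_patches = [generate() for _ in range(num_implementations)]
    if num_implementations == 1:
        return raw_patches[0]  # nothing to compare, skip the evaluator
    index = evaluate(format_patches(raw_patches))
    if not 0 <= index < len(raw_patches):
        index = 0  # assumed fallback policy for an invalid evaluator reply
    return raw_patches[index]


# Stub generator and evaluator standing in for the two LLM chains.
candidates = iter(["patch A", "patch B", "patch C"])
best = select_best_patch(lambda: next(candidates), lambda listing: 2)
print(best)  # patch C
```

Keeping the generator and evaluator injectable makes it possible to unit-test the selection logic, including the invalid-index path, without network calls.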
Step 5: Update the graph node
Location: libs/core/deep_next/core/steps/implement/graph.py
# Add the imports
from deep_next.core.steps.implement.develop_patch import develop_multiple_file_patches

# Update the code_development method
# (shown dedented here; keep the class-level indentation used in graph.py)
@staticmethod
@tenacity.retry(
    stop=tenacity.stop_after_attempt(5),
    retry=tenacity.retry_if_exception_type((ApplyPatchError, ParsePatchesError)),
    reraise=True,
)
def code_development(
    state: _State,
) -> _State:
    # Use develop_multiple_file_patches instead of develop_single_file_patches
    # to generate multiple implementations and select the best one
    raw_patches = develop_multiple_file_patches(
        step=state.selected_step,
        issue_statement=state.issue_statement,
        num_implementations=3,  # Generate 3 implementations
    )

    patches: list[CodePatch] = parse_patches(raw_patches)
    patches = [patch for patch in patches if patch.before != patch.after]

    for patch in patches:
        apply_patch(patch)

    return state
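Because the retry decorator wraps the whole node, every retry regenerates all candidates and runs the evaluator again. A rough upper bound on LLM calls can be estimated as follows, assuming one call per candidate and one call per evaluation; the function name worst_case_llm_calls is hypothetical.

```python
def worst_case_llm_calls(num_implementations: int, max_attempts: int = 5) -> int:
    """Upper bound on LLM calls if every attempt fails to parse or apply."""
    # The evaluator is skipped when there is only one candidate.
    evaluator_calls = 1 if num_implementations > 1 else 0
    return max_attempts * (num_implementations + evaluator_calls)


print(worst_case_llm_calls(3))  # 20
print(worst_case_llm_calls(1))  # 5
```

With the settings shown in this step (3 implementations, 5 attempts), the worst case is 20 calls, compared with 5 for a single implementation, which is worth weighing when choosing num_implementations.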
Benefits
- Improved Reliability: Generating several implementations and selecting the best one reduces the chance of applying a patch that contains hallucinations or other flaws.
- Quality Control: The evaluation step acts as a quality gate that judges each implementation against explicit criteria: correctness, completeness, code quality, best practices, and integration.
- Reduced Failure Rate: A patch that has passed evaluation is less likely to be flawed, which should reduce the number of retry attempts needed.
- Cross-Validation: Comparing multiple implementations side by side helps identify the most robust approach.
Usage Example
The function is designed as a drop-in replacement for the original develop_single_file_patches function:
from pathlib import Path

from deep_next.core.steps.action_plan.data_model import Step
from deep_next.core.steps.implement.develop_patch import develop_multiple_file_patches

step = Step(
    target_file=Path("./libs/core/tests/_resources/example_project/src/hello_world.py"),
    title="Add type hints",
    description="Add type hints to the add_integers function",
)

raw_patch = develop_multiple_file_patches(
    step=step,
    issue_statement="Add type hints to the add_integers function",
    num_implementations=3,  # Generate 3 different implementations
)

print(raw_patch)