LLM-based Project Map Filtering
In large codebases, the project map can grow long enough to strain the context window of the LLM calls that consume it. This example shows how to add an LLM filter to project map generation so that only the files and directories most relevant to a specific task are retained.
Implementation
Below are the specific changes needed to add LLM-based filtering to the project_map.py module:
Step 1: Add required imports
Location: libs/core/deep_next/core/steps/gather_project_knowledge/project_map.py
# Add these imports to the top of the file
import textwrap
from typing import List
from loguru import logger
from pydantic import BaseModel
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import ChatPromptTemplate
# Import LLM utilities
from deep_next.core.steps.gather_project_knowledge.common import _create_llm
Step 2: Create a structured output model
Location: libs/core/deep_next/core/steps/gather_project_knowledge/project_map.py
# Add this class after the existing _is_valid_file function
class ProjectMapFilter(BaseModel):
    """Model for the LLM to populate with important directories and files."""

    reasoning: str
    important_repo_map: str


class Prompt:
    task_description = textwrap.dedent(
        """
        You are an expert software engineer tasked with identifying the most important
        directories and files for solving a specific task in a codebase.

        Task description:
        {task_description}

        Below is the full project map:
        {project_map}

        Please analyze the project structure and identify which directories and files
        are most likely to be relevant for this task. Many projects have large directories
        with hundreds of files that aren't relevant to most tasks.

        Your goal is to retain only the important parts of the project structure to reduce
        context length while preserving critical information.

        Respond with a list of the most important directories and files to keep, along with
        brief reasoning for your selections.

        EXAMPLE OUTPUT:
        --------------------
        {example_output_project_map}
        --------------------
        """
    )
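The prompt above has three placeholders, and the example output passed into it is JSON, which itself contains braces. The sketch below uses plain `str.format` as a stand-in for `ChatPromptTemplate` (an assumption made so the snippet runs without LangChain) to show that braces inside a substituted value are not re-interpreted as placeholders. The shortened template and the sample values are illustrative only.

```python
import json
import textwrap

# Shortened stand-in for Prompt.task_description (illustrative, not the full prompt).
template = textwrap.dedent(
    """\
    Task description:
    {task_description}

    Below is the full project map:
    {project_map}

    EXAMPLE OUTPUT:
    {example_output_project_map}
    """
)

# JSON with the same two fields as ProjectMapFilter; the values are made up.
example = json.dumps(
    {"reasoning": "Only src matters.", "important_repo_map": "src\n  main.py"}
)

# Only the template's own braces are treated as placeholders;
# the braces inside `example` are inserted verbatim.
prompt_text = template.format(
    task_description="Add type hints to the database connection module",
    project_map="src\n  main.py\ntests\n  test_main.py",
    example_output_project_map=example,
)
print(prompt_text)
```

This is why the example output can be supplied as a variable at invocation time rather than pasted into the template text, where its braces would need escaping.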
Step 3: Implement the filtering function
Location: libs/core/deep_next/core/steps/gather_project_knowledge/project_map.py
def _create_llm_agent():
    """Creates LLM agent for filtering project map."""
    parser = PydanticOutputParser(pydantic_object=ProjectMapFilter)
    project_map_prompt_template = ChatPromptTemplate.from_messages(
        [
            ("human", Prompt.task_description),
        ]
    )
    return project_map_prompt_template | _create_llm() | parser


# Add this function before the tree function
def filter_project_map(project_map: str, task_description: str) -> str:
    """
    Use an LLM to filter the project map, keeping only relevant components for the task.

    Args:
        project_map: The full project map as a string
        task_description: Description of the task being performed

    Returns:
        A filtered version of the project map
    """
    # Get LLM response with structured output
    response = _create_llm_agent().invoke(
        {
            "task_description": task_description,
            "project_map": project_map,
            "example_output_project_map": example_output_project_map.model_dump_json(),
        }
    )
    return response.important_repo_map


example_output_project_map = ProjectMapFilter(
    reasoning="The src directory contains the main codebase, while tests are not relevant.",
    important_repo_map=textwrap.dedent(
        """\
        📁 src
        ├── 📁 module1
        │   ├── 📄 __init__.py
        │   ├── 📄 file1.py
        │   └── 📄 file2.py
        ├── 📁 module2
        │   └── 📄 file3.py
        └── 📄 main.py
        """
    ),
)
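The chain in Step 3 depends on the model returning output the parser can read, and on the API call succeeding. One way to keep the pipeline running when either fails is to fall back to the unfiltered map. The helper below, `filter_with_fallback`, is hypothetical and not part of the original module; it takes the filtering function as a parameter so the sketch runs without an LLM.

```python
from typing import Callable


def filter_with_fallback(
    project_map: str,
    task_description: str,
    filter_fn: Callable[[str, str], str],
) -> str:
    """Return the filtered map, or the full map if filtering fails or is empty."""
    try:
        filtered = filter_fn(project_map, task_description)
    except Exception:
        # Broad on purpose: parsing errors and API errors are handled the same way.
        return project_map
    # An empty result would leave downstream steps with no structure at all.
    return filtered if filtered.strip() else project_map
```

In the module itself, `filter_project_map` would be the natural argument for `filter_fn`. Whether a silent fallback is preferable to raising depends on how much the downstream steps rely on the shorter map.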
Step 4: Update the tree function signature and implementation
Location: libs/core/deep_next/core/steps/gather_project_knowledge/project_map.py
# Modify the tree function signature to add the task_description parameter
def tree(
    path: Path | str,
    root_name: str | None = None,
    task_description: str | None = None,
) -> str:
    """
    Generates a visual directory tree structure for a given path.

    If a tree node consists of more than {N_BIG_BRANCH_ITEMS} items, it is cut off
    and marked with: "📁 <found X items>".

    Args:
        path (Path | str): The root directory to generate the tree for.
        root_name (str | None): The name of the root node of the tree.
        task_description (str | None): Optional task description to filter the tree.

    Returns:
        str: A string representation of the directory tree, formatted using
            the `rich` library for visual clarity.
    """
Step 4.1: Modify the end of the tree function to apply filtering
Location: libs/core/deep_next/core/steps/gather_project_knowledge/project_map.py
    # ... existing code ...

    # Replace the end of the tree function with this code
    # Apply LLM filtering if task description is provided
    if task_description:
        logger.info(f"Filtering project map for task: {task_description[:100]}...")
        return filter_project_map(flat_tree, task_description)

    return flat_tree
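Because the model rewrites the map as free text, the filtered result may contain entries that were never in the original tree. A rough sanity check, sketched below with hypothetical helpers `entry_name` and `drop_unknown_entries` (neither exists in the original module), drops any line whose entry name does not appear in the full map. It compares names only, not full paths, so it is a coarse guard rather than a strict validation.

```python
import re

# Leading tree-drawing glyphs, icons, and whitespace before the entry name.
# Names may start with a word character, a dot (".gitignore"), or "<" ("<found X items>").
_PREFIX = re.compile(r"^[^\w.<]+")


def entry_name(line: str) -> str:
    """Strip tree decoration from one line of the map and return the bare name."""
    return _PREFIX.sub("", line).strip()


def drop_unknown_entries(full_map: str, filtered_map: str) -> str:
    """Keep only filtered lines whose entry name also occurs in the full map."""
    known = {entry_name(line) for line in full_map.splitlines()}
    kept = [line for line in filtered_map.splitlines() if entry_name(line) in known]
    return "\n".join(kept)


full = "📁 src\n├── 📄 a.py\n└── 📄 b.py"
filtered = "📁 src\n├── 📄 a.py\n└── 📄 ghost.py"
print(drop_unknown_entries(full, filtered))
```

Here `ghost.py` is removed because no entry with that name exists in the full map.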
Benefits
- Reduced context length: Filters out directories and files that aren't relevant to the current task
- Improved focus: Helps subsequent LLMs focus on the most important parts of the codebase
- Task-specific customization: The filtering adapts based on the specific task description
- Preserves structure: Maintains the hierarchical structure of the project while reducing size
- Better LLM performance: With less irrelevant information in the prompt, downstream LLMs are less likely to be distracted and may make more accurate decisions
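The size reduction listed above can be checked directly on a given repository. The sketch below uses line and character counts as a rough proxy for context length; actual token counts depend on the tokenizer of the model in use. `reduction_report` is a hypothetical helper, not part of the original module.

```python
def reduction_report(full_map: str, filtered_map: str) -> dict[str, float]:
    """Compare the full and filtered maps by line count and character count."""
    full_chars = len(full_map)
    kept_chars = len(filtered_map)
    # Guard against an empty full map to avoid dividing by zero.
    saved_pct = round(100 * (1 - kept_chars / full_chars), 1) if full_chars else 0.0
    return {
        "full_lines": len(full_map.splitlines()),
        "kept_lines": len(filtered_map.splitlines()),
        "chars_saved_pct": saved_pct,
    }
```

Logging this report next to the filtering call makes it easier to spot tasks where the filter removes almost nothing, or almost everything.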
Usage Example
from pathlib import Path
from deep_next.core.steps.gather_project_knowledge.project_map import tree
# Generate complete project map
full_map = tree(Path("/path/to/project"))
# Generate filtered project map for a specific task
task = "Add type hints to the database connection module"
filtered_map = tree(
    Path("/path/to/project"),
    task_description=task,
)