LlamaIndex Best Practices and Coding Standards
6/30/2025
このドキュメントは、LlamaIndexを使用した高品質なアプリケーション開発に関する包括的なガイドを提供します。コードの組織化、パフォーマンスの最適化、セキュリティ対策、テスト戦略など、開発の様々な側面をカバーしています。ディレクトリ構造、ファイル命名規則、最適化技術、入力検証や認証のベストプラクティスなどが詳述されています。
# LlamaIndex Best Practices and Coding Standards
This document provides comprehensive guidance on developing high-quality applications using LlamaIndex. It covers various aspects of development, including code organization, performance optimization, security considerations, and testing strategies.
## 1. Code Organization and Structure
* **Directory Structure Best Practices:**
* `data/`: Store data sources (e.g., documents, PDFs) used by LlamaIndex.
* `indices/`: Contains index definitions and configurations.
* `queries/`: Defines query engines and query logic.
* `models/`: Place custom LLM or embedding model configurations.
* `utils/`: Utility functions and helper classes.
* `tests/`: Unit, integration, and end-to-end tests.
* `scripts/`: Scripts for data ingestion, index building, or other automation tasks.
* **File Naming Conventions:**
* Data loaders: `*_loader.py` (e.g., `pdf_loader.py`)
* Index definitions: `*_index.py` (e.g., `vector_index.py`)
* Query engines: `*_query_engine.py` (e.g., `knowledge_graph_query_engine.py`)
* Models: `*_model.py` (e.g., `custom_llm_model.py`)
* Utilities: `*_utils.py` (e.g., `text_processing_utils.py`)
* Tests: `test_*.py` (e.g., `test_vector_index.py`)
* **Module Organization:**
* Group related functionalities into modules (e.g., `data_ingestion`, `indexing`, `querying`).
* Use clear and descriptive module names.
* Minimize dependencies between modules to improve maintainability.
* **Component Architecture:**
* **Data Connectors:** Abstract data loading logic into reusable connectors.
* **Index Structures:** Use appropriate index structures (e.g., `VectorStoreIndex`, `KnowledgeGraphIndex`) based on data characteristics and query requirements.
* **Query Engines:** Decouple query logic from index structures.
* **LLM Abstraction:** Abstract LLM calls using interfaces for flexibility and testability.
* **Code Splitting:**
* Break down large functions into smaller, well-defined functions.
* Use classes to encapsulate related data and behavior.
* Extract reusable code into separate modules or packages.
## 2. Common Patterns and Anti-patterns
* **Design Patterns:**
* **Factory Pattern:** For creating different types of indexes or query engines.
* **Strategy Pattern:** For choosing different retrieval or ranking algorithms.
* **Decorator Pattern:** For adding pre-processing or post-processing steps to queries.
* **Recommended Approaches:**
* **Data Ingestion:** Use `SimpleDirectoryReader` or custom data connectors to load data.
* **Indexing:** Choose the appropriate index type based on your data and query needs. Consider `VectorStoreIndex` for semantic search, `KnowledgeGraphIndex` for knowledge graph-based queries, and `ComposableGraph` for combining multiple indexes.
* **Querying:** Use `as_query_engine()` to create a query engine from an index. Customize the query engine with different retrieval and response synthesis modules.
* **Evaluation:** Use LlamaIndex's evaluation modules to measure the performance of your LLM application (e.g., retrieval and LLM response quality).
* **Anti-patterns and Code Smells:**
* **Tight Coupling:** Avoid tight coupling between components. Use interfaces and dependency injection to promote loose coupling.
* **God Classes:** Avoid creating large classes that do too much. Break them down into smaller, more focused classes.
* **Code Duplication:** Avoid duplicating code. Extract common code into reusable functions or classes.
* **Ignoring Errors:** Don't ignore errors. Handle them gracefully or raise exceptions.
* **State Management:**
* Use LlamaIndex's `StorageContext` to persist indexes to disk.
* Consider using a database to store application state.
* **Error Handling:**
* Use `try-except` blocks to handle exceptions.
* Log errors for debugging purposes.
* Provide informative error messages to the user.
## 3. Performance Considerations
* **Optimization Techniques:**
* **Indexing:** Optimize index construction by using appropriate chunk sizes and overlap.
* **Querying:** Optimize query performance by using appropriate retrieval and ranking algorithms.
* **Caching:** Cache query results to improve performance.
* **Parallelization:** Parallelize data loading and indexing tasks.
* **Memory Management:**
* Use generators to process large datasets in chunks.
* Release memory when it is no longer needed.
* **Bundle Size Optimization:** (Not directly applicable to LlamaIndex as it is a backend library, but relevant if building a web UI on top)
* Remove unused code.
* Use code splitting to load only the code that is needed.
* **Lazy Loading:**
* Load data and models only when they are needed.
* Use lazy initialization to defer the creation of objects until they are first used.
## 4. Security Best Practices
* **Common Vulnerabilities:**
* **Prompt Injection:** Prevent prompt injection attacks by carefully sanitizing user input and using appropriate prompt engineering techniques.
* **Data Leaks:** Protect sensitive data by using appropriate access control and encryption.
* **API Key Exposure:** Avoid exposing API keys in your code. Use environment variables or a secure configuration management system to store API keys.
* **Input Validation:**
* Validate all user input to prevent injection attacks.
* Sanitize input to remove potentially harmful characters.
* **Authentication and Authorization:**
* Implement authentication and authorization to control access to your application.
* Use strong passwords and multi-factor authentication.
* **Data Protection:**
* Encrypt sensitive data at rest and in transit.
* Use appropriate access control to protect data.
* **Secure API Communication:**
* Use HTTPS to encrypt communication between your application and the LlamaIndex API.
* Validate the server certificate to prevent man-in-the-middle attacks.
## 5. Testing Approaches
* **Unit Testing:**
* Write unit tests for all core components, including data connectors, index structures, and query engines.
* Use mocking and stubbing to isolate components during testing.
* **Integration Testing:**
* Write integration tests to verify that different components work together correctly.
* Test the integration between LlamaIndex and other libraries or frameworks.
* **End-to-end Testing:**
* Write end-to-end tests to verify that the entire application works as expected.
* Test the application with real data and user scenarios.
* **Test Organization:**
* Organize tests into separate directories for unit, integration, and end-to-end tests.
* Use clear and descriptive test names.
* **Mocking and Stubbing:**
* Use mocking and stubbing to isolate components during testing.
* Use a mocking framework such as `unittest.mock` or `pytest-mock`.
## 6. Common Pitfalls and Gotchas
* **Frequent Mistakes:**
* Using the wrong index type for the data.
* Not optimizing query performance.
* Not handling errors gracefully.
* Exposing API keys.
* Not validating user input.
* **Edge Cases:**
* Handling large documents.
* Handling noisy or incomplete data.
* Handling complex queries.
* **Version-Specific Issues:**
* Be aware of breaking changes in LlamaIndex releases.
* Refer to the LlamaIndex documentation for version-specific information.
* **Compatibility Concerns:**
* Ensure that LlamaIndex is compatible with the other libraries and frameworks that you are using.
* Test your application thoroughly to identify any compatibility issues.
* **Debugging Strategies:**
* Use logging to track the execution of your application.
* Use a debugger to step through your code and inspect variables.
* Use LlamaIndex's debugging tools to diagnose issues.
## 7. Tooling and Environment
* **Recommended Development Tools:**
* **IDE:** VS Code, PyCharm
* **Package Manager:** Poetry, pip
* **Testing Framework:** pytest
* **Linting and Formatting:** flake8, black
* **Build Configuration:**
* Use a build system such as `poetry` to manage dependencies.
* Create a `requirements.txt` file to list dependencies.
* **Linting and Formatting:**
* Use a linter such as `flake8` to enforce code style.
* Use a formatter such as `black` to automatically format code.
* **Deployment Best Practices:**
* Use a containerization technology such as Docker to package your application.
* Use a cloud platform such as AWS, Azure, or GCP to deploy your application.
* **CI/CD Integration:**
* Use a CI/CD system such as GitHub Actions or Jenkins to automate the build, test, and deployment process.