Files
FLAML/.github/copilot-instructions.md
Copilot fc4efe3510 Fix sklearn 1.7+ compatibility: BaseEstimator type detection for ensemble (#1512)
* Initial plan

* Fix ExtraTreesEstimator regression ensemble error with sklearn 1.7+

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback: improve __sklearn_tags__ implementation

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix format error

* Emphasize pre-commit

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-23 10:20:59 +08:00

244 lines
7.3 KiB
Markdown

# GitHub Copilot Instructions for FLAML
## Project Overview
FLAML (Fast Library for Automated Machine Learning & Tuning) is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflow based on large language models, machine learning models, etc. and optimizes their performance.
**Key Components:**
- `flaml/automl/`: AutoML functionality for classification and regression
- `flaml/tune/`: Generic hyperparameter tuning
- `flaml/default/`: Zero-shot AutoML with default configurations
- `flaml/autogen/`: Legacy autogen code (note: AutoGen has moved to a separate repository)
- `flaml/fabric/`: Microsoft Fabric integration
- `test/`: Comprehensive test suite
## Build and Test Commands
### Installation
```bash
# Basic installation
pip install -e .
# Install with test dependencies
pip install -e .[test]
# Install with automl dependencies
pip install -e .[automl]
# Install with forecast dependencies (Linux only)
pip install -e .[forecast]
```
### Running Tests
```bash
# Run all tests (excluding autogen)
pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10
# Run tests with coverage
coverage run -a -m pytest test --ignore=test/autogen --reruns 2 --reruns-delay 10
coverage xml
# Check dependencies
python test/check_dependency.py
```
### Linting and Formatting
```bash
# Run pre-commit hooks
pre-commit run --all-files
# Format with black (line length: 120)
black . --line-length 120
# Run ruff for linting and auto-fix
ruff check . --fix
```
## Code Style and Formatting
### Python Style
- **Line length:** 120 characters (configured in both Black and Ruff)
- **Formatter:** Black (v23.3.0+)
- **Linter:** Ruff with Pyflakes and pycodestyle rules
- **Import sorting:** Use isort (via Ruff)
- **Python version:** Supports Python >= 3.10 (full support for 3.10, 3.11, 3.12 and 3.13)
### Code Quality Rules
- Follow Black formatting conventions
- Keep imports sorted and organized
- Avoid unused imports (F401) - these are flagged but not auto-fixed
- Avoid wildcard imports (F403) where possible
- Complexity: Max McCabe complexity of 10
- Use type hints where appropriate
- Write clear docstrings for public APIs
### Pre-commit Hooks
The repository uses pre-commit hooks for:
- Checking for large files, AST syntax, YAML/TOML/JSON validity
- Detecting merge conflicts and private keys
- Trailing whitespace and end-of-file fixes
- pyupgrade for Python 3.8+ syntax
- Black formatting
- Markdown formatting (mdformat with GFM and frontmatter support)
- Ruff linting with auto-fix
## Testing Strategy
### Test Organization
- Tests are in the `test/` directory, organized by module
- `test/automl/`: AutoML feature tests
- `test/tune/`: Hyperparameter tuning tests
- `test/default/`: Zero-shot AutoML tests
- `test/nlp/`: NLP-related tests
- `test/spark/`: Spark integration tests
### Test Requirements
- Write tests for new functionality
- Ensure tests pass on multiple Python versions (3.10, 3.11, 3.12 and 3.13)
- Tests should work on both Ubuntu and Windows
- Use pytest markers for platform-specific tests (e.g., `@pytest.mark.spark`)
- Tests should be idempotent and not depend on external state
- Use `--reruns 2 --reruns-delay 10` for flaky tests
### Coverage
- Aim for good test coverage on new code
- Coverage reports are generated for Python 3.11 builds
- Coverage reports are uploaded to Codecov
## Git Workflow and Best Practices
### Branching
- Main branch: `main`
- Create feature branches from `main`
- PR reviews are required before merging
### Commit Messages
- Use clear, descriptive commit messages
- Reference issue numbers when applicable
- ALWAYS run `pre-commit run --all-files` before each commit to avoid formatting issues
### Pull Requests
- Ensure all tests pass before requesting review
- Update documentation if adding new features
- Follow the PR template in `.github/PULL_REQUEST_TEMPLATE.md`
- ALWAYS run `pre-commit run --all-files` before each commit to avoid formatting issues
## Project Structure
```
flaml/
├── automl/ # AutoML functionality
├── tune/ # Hyperparameter tuning
├── default/ # Zero-shot AutoML
├── autogen/ # Legacy autogen (deprecated, moved to separate repo)
├── fabric/ # Microsoft Fabric integration
├── onlineml/ # Online learning
└── version.py # Version information
test/ # Test suite
├── automl/
├── tune/
├── default/
├── nlp/
└── spark/
notebook/ # Example notebooks
website/ # Documentation website
```
## Dependencies and Package Management
### Core Dependencies
- NumPy >= 1.17
- Python >= 3.10 (officially supported: 3.10, 3.11, 3.12 and 3.13)
### Optional Dependencies
- `[automl]`: lightgbm, xgboost, scipy, pandas, scikit-learn
- `[test]`: Full test suite dependencies
- `[spark]`: PySpark and joblib dependencies
- `[forecast]`: holidays, prophet, statsmodels, hcrystalball, pytorch-forecasting, pytorch-lightning, tensorboardX
- `[hf]`: Hugging Face transformers and datasets
- See `setup.py` for complete list
### Version Constraints
- Be mindful of Python version-specific dependencies (check setup.py)
- XGBoost versions differ based on Python version
- NumPy 2.0+ only for Python >= 3.13
- Some features (like vowpalwabbit) only work with older Python versions
## Boundaries and Restrictions
### Do NOT Modify
- `.git/` directory and Git configuration
- `LICENSE` file
- Version information in `flaml/version.py` (unless explicitly updating version)
- GitHub Actions workflows without careful consideration
- Existing test files unless fixing bugs or adding coverage
### Be Cautious With
- `setup.py`: Changes to dependencies should be carefully reviewed
- `pyproject.toml`: Linting and testing configuration
- `.pre-commit-config.yaml`: Pre-commit hook configuration
- Backward compatibility: FLAML is a library with external users
### Security Considerations
- Never commit secrets or API keys
- Be careful with external data sources in tests
- Validate user inputs in public APIs
- Follow secure coding practices for ML operations
## Special Notes
### AutoGen Migration
- AutoGen has moved to a separate repository: https://github.com/microsoft/autogen
- The `flaml/autogen/` directory contains legacy code
- Tests in `test/autogen/` are ignored in the main test suite
- Direct users to the new AutoGen repository for AutoGen-related issues
### Platform-Specific Considerations
- Some tests only run on Linux (e.g., forecast tests with prophet)
- Windows and Ubuntu are the primary supported platforms
- macOS support exists but requires special libomp setup for lgbm/xgboost
### Performance
- FLAML focuses on efficient automation and tuning
- Consider computational cost when adding new features
- Optimize for low resource usage where possible
## Documentation
- Main documentation: https://microsoft.github.io/FLAML/
- Update documentation when adding new features
- Provide clear examples in docstrings
- Add notebook examples for significant new features
## Contributing
- Follow the contributing guide: https://microsoft.github.io/FLAML/docs/Contribute
- Sign the Microsoft CLA when making your first contribution
- Be respectful and follow the Microsoft Open Source Code of Conduct
- Join the Discord community for discussions: https://discord.gg/Cppx2vSPVP