# GitHub Copilot Instructions for FLAML

## Project Overview

FLAML (Fast Library for Automated Machine Learning & Tuning) is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflows based on large language models, machine learning models, etc. and optimizes their performance.

**Key Components:**

- `flaml/automl/`: AutoML functionality for classification and regression
- `flaml/tune/`: Generic hyperparameter tuning
- `flaml/default/`: Zero-shot AutoML with default configurations
- `flaml/autogen/`: Legacy autogen code (note: AutoGen has moved to a separate repository)
- `flaml/fabric/`: Microsoft Fabric integration
- `test/`: Comprehensive test suite

## Build and Test Commands

### Installation

```bash
# Basic installation
pip install -e .

# Install with test dependencies
# (the quotes keep shells such as zsh from treating the brackets as a glob)
pip install -e ".[test]"

# Install with automl dependencies
pip install -e ".[automl]"

# Install with forecast dependencies (Linux only)
pip install -e ".[forecast]"
```

### Running Tests

```bash
# Run all tests (excluding autogen)
pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10

# Run tests with coverage
coverage run -a -m pytest test --ignore=test/autogen --reruns 2 --reruns-delay 10
coverage xml

# Check dependencies
python test/check_dependency.py
```

### Linting and Formatting

```bash
# Run pre-commit hooks
pre-commit run --all-files

# Format with black (line length: 120)
black . --line-length 120

# Run ruff for linting and auto-fix
ruff check . --fix
```

## Code Style and Formatting

### Python Style

- **Line length:** 120 characters (configured in both Black and Ruff)
- **Formatter:** Black (v23.3.0+)
- **Linter:** Ruff with Pyflakes and pycodestyle rules
- **Import sorting:** Use isort (via Ruff)
- **Python version:** Supports Python >= 3.10 (full support for 3.10, 3.11, 3.12 and 3.13)

### Code Quality Rules

- Follow Black formatting conventions
- Keep imports sorted and organized
- Avoid unused imports (F401) - these are flagged but not auto-fixed
- Avoid wildcard imports (F403) where possible
- Complexity: Max McCabe complexity of 10
- Use type hints where appropriate
- Write clear docstrings for public APIs
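
As a rough illustration of the last two rules, the sketch below shows one way a small public function might combine type hints with a docstring. The function `normalize_scores` is hypothetical and not part of FLAML; the Google-style docstring layout is an assumption, so match whatever style the surrounding module already uses.

```python
from __future__ import annotations

from collections.abc import Sequence


def normalize_scores(scores: Sequence[float], lower: float = 0.0, upper: float = 1.0) -> list[float]:
    """Rescale scores linearly into the range [lower, upper].

    Hypothetical helper used only to illustrate the style rules.

    Args:
        scores: Raw metric values; must be non-empty.
        lower: Smallest value of the output range.
        upper: Largest value of the output range; must exceed ``lower``.

    Returns:
        A new list with the minimum mapped to ``lower`` and the maximum to ``upper``.
        If all scores are equal, every entry is set to ``lower``.

    Raises:
        ValueError: If ``scores`` is empty or ``upper <= lower``.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate input: avoid dividing by zero.
        return [lower for _ in scores]
    scale = (upper - lower) / (hi - lo)
    return [lower + (s - lo) * scale for s in scores]
```

The single early-return structure also keeps the branch count low, which helps stay under the complexity limit.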

### Pre-commit Hooks

The repository uses pre-commit hooks for:

- Checking for large files, AST syntax, YAML/TOML/JSON validity
- Detecting merge conflicts and private keys
- Trailing whitespace and end-of-file fixes
- pyupgrade for Python 3.8+ syntax
- Black formatting
- Markdown formatting (mdformat with GFM and frontmatter support)
- Ruff linting with auto-fix

## Testing Strategy

### Test Organization

- Tests are in the `test/` directory, organized by module
- `test/automl/`: AutoML feature tests
- `test/tune/`: Hyperparameter tuning tests
- `test/default/`: Zero-shot AutoML tests
- `test/nlp/`: NLP-related tests
- `test/spark/`: Spark integration tests

### Test Requirements

- Write tests for new functionality
- Ensure tests pass on multiple Python versions (3.10, 3.11, 3.12 and 3.13)
- Tests should work on both Ubuntu and Windows
- Use pytest markers for platform-specific tests (e.g., `@pytest.mark.spark`)
- Tests should be idempotent and not depend on external state
- Use `--reruns 2 --reruns-delay 10` for flaky tests (these options come from the `pytest-rerunfailures` plugin)
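
The idempotency requirement above can be sketched as follows, using only the standard library so the example runs anywhere. Both `make_dataset` and `test_dataset_round_trip` are hypothetical names, not existing FLAML tests; the point is the pattern of a local seeded generator plus a throwaway directory.

```python
import json
import random
import tempfile
from pathlib import Path


def make_dataset(seed: int, n_rows: int = 5) -> list[dict]:
    """Build a tiny synthetic dataset from a fixed seed (no downloads, no global RNG)."""
    rng = random.Random(seed)  # local generator, so other tests cannot perturb it
    return [{"x": rng.random(), "y": rng.randint(0, 1)} for _ in range(n_rows)]


def test_dataset_round_trip() -> None:
    """Write and re-read a dataset inside a throwaway directory."""
    data = make_dataset(seed=0)
    with tempfile.TemporaryDirectory() as tmp:  # removed on exit, leaves no residue
        path = Path(tmp) / "data.json"
        path.write_text(json.dumps(data))
        assert json.loads(path.read_text()) == data
    # Same seed, same rows: the result does not depend on execution order.
    assert make_dataset(seed=0) == data
```

Because the test owns all of its inputs and cleans up after itself, it can be rerun (for example by `--reruns`) without a previous attempt changing the outcome.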

### Coverage

- Aim for good test coverage on new code
- Coverage reports are generated for Python 3.11 builds
- Coverage reports are uploaded to Codecov

## Git Workflow and Best Practices

### Branching

- Main branch: `main`
- Create feature branches from `main`
- PR reviews are required before merging

### Commit Messages

- Use clear, descriptive commit messages
- Reference issue numbers when applicable
- ALWAYS run `pre-commit run --all-files` before each commit to avoid formatting issues

### Pull Requests

- Ensure all tests pass before requesting review
- Update documentation if adding new features
- Follow the PR template in `.github/PULL_REQUEST_TEMPLATE.md`
- ALWAYS run `pre-commit run --all-files` before each commit to avoid formatting issues

## Project Structure

```
flaml/
├── automl/      # AutoML functionality
├── tune/        # Hyperparameter tuning
├── default/     # Zero-shot AutoML
├── autogen/     # Legacy autogen (deprecated, moved to separate repo)
├── fabric/      # Microsoft Fabric integration
├── onlineml/    # Online learning
└── version.py   # Version information

test/            # Test suite
├── automl/
├── tune/
├── default/
├── nlp/
└── spark/

notebook/        # Example notebooks
website/         # Documentation website
```

## Dependencies and Package Management

### Core Dependencies

- NumPy >= 1.17
- Python >= 3.10 (officially supported: 3.10, 3.11, 3.12 and 3.13)

### Optional Dependencies

- `[automl]`: lightgbm, xgboost, scipy, pandas, scikit-learn
- `[test]`: Full test suite dependencies
- `[spark]`: PySpark and joblib dependencies
- `[forecast]`: holidays, prophet, statsmodels, hcrystalball, pytorch-forecasting, pytorch-lightning, tensorboardX
- `[hf]`: Hugging Face transformers and datasets
- See `setup.py` for complete list
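
Code that relies on one of these extras usually has to cope with the package being absent. The sketch below shows one possible guard; `require_optional` is a hypothetical helper written for this example, not an existing FLAML function.

```python
import importlib


def require_optional(module_name: str, extra: str):
    """Import ``module_name`` or raise an ImportError naming the extra to install.

    Hypothetical helper: ``extra`` is the name used in ``pip install "flaml[extra]"``.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f'{module_name} is required for this feature. Install it with: pip install "flaml[{extra}]"'
        ) from exc
```

A call such as `require_optional("lightgbm", "automl")` placed inside the function that needs the package, rather than at module import time, keeps the core installation importable without the optional dependency.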

### Version Constraints

- Be mindful of Python version-specific dependencies (check setup.py)
- XGBoost versions differ based on Python version
- NumPy 2.0+ only for Python >= 3.13
- Some features (like vowpalwabbit) only work with older Python versions
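
To make the idea of version-specific dependencies concrete, here is a minimal sketch of selecting requirement strings by interpreter version. The package names and version bounds are placeholders invented for the example and do not reflect the actual contents of `setup.py`; only the branching pattern is the point.

```python
import sys


def requirements_for(version_info=sys.version_info) -> list[str]:
    """Return requirement strings for the given interpreter version.

    All package names and bounds here are placeholders, not FLAML's real pins.
    """
    reqs = ["examplepkg-core>=1.0"]
    if tuple(version_info[:2]) >= (3, 13):
        # Newer interpreters get the newer major release.
        reqs.append("examplepkg-array>=2.0")
    else:
        reqs.append("examplepkg-array>=1.0,<2.0")
    return reqs
```

The same condition can often be expressed declaratively with a PEP 508 environment marker, for example `'examplepkg-array>=2.0; python_version >= "3.13"'`, which lets the installer pick the right pin without any runtime branching.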

## Boundaries and Restrictions

### Do NOT Modify

- `.git/` directory and Git configuration
- `LICENSE` file
- Version information in `flaml/version.py` (unless explicitly updating version)
- GitHub Actions workflows without careful consideration
- Existing test files unless fixing bugs or adding coverage

### Be Cautious With

- `setup.py`: Changes to dependencies should be carefully reviewed
- `pyproject.toml`: Linting and testing configuration
- `.pre-commit-config.yaml`: Pre-commit hook configuration
- Backward compatibility: FLAML is a library with external users
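
One common way to preserve backward compatibility when renaming a parameter is to keep accepting the old name for a while and emit a `DeprecationWarning`. The sketch below assumes a hypothetical function `run_search` whose keyword `max_seconds` was renamed to `budget_seconds`; neither name is taken from FLAML.

```python
import warnings


def run_search(data, budget_seconds=None, *, max_seconds=None):
    """Hypothetical API whose parameter was renamed from max_seconds to budget_seconds."""
    if max_seconds is not None:
        warnings.warn(
            "max_seconds is deprecated; use budget_seconds instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not to this function
        )
        if budget_seconds is None:
            budget_seconds = max_seconds
    if budget_seconds is None:
        budget_seconds = 60  # illustrative default only
    return {"n_rows": len(data), "budget_seconds": budget_seconds}
```

Existing callers that still pass `max_seconds=` keep working and see a warning, while new code uses the new name; the old keyword can then be removed in a later release.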

### Security Considerations

- Never commit secrets or API keys
- Be careful with external data sources in tests
- Validate user inputs in public APIs
- Follow secure coding practices for ML operations
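
As an example of input validation at a public API boundary, the sketch below checks a numeric setting before it is used. `check_positive_seconds` is a hypothetical helper, and the exact rules (rejecting booleans, NaN and infinity) are assumptions chosen for illustration.

```python
import math
from numbers import Real


def check_positive_seconds(value) -> float:
    """Validate a user-supplied duration and return it as a float.

    Hypothetical helper: accepts finite, strictly positive real numbers only.
    """
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, Real):
        raise TypeError(f"expected a real number of seconds, got {type(value).__name__}")
    seconds = float(value)
    if not math.isfinite(seconds) or seconds <= 0:
        raise ValueError(f"seconds must be finite and positive, got {value!r}")
    return seconds
```

Failing early with a specific exception type gives callers a clear message at the API boundary instead of an obscure error deep inside a training loop.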

## Special Notes

### AutoGen Migration

- AutoGen has moved to a separate repository: https://github.com/microsoft/autogen
- The `flaml/autogen/` directory contains legacy code
- Tests in `test/autogen/` are ignored in the main test suite
- Direct users to the new AutoGen repository for AutoGen-related issues

### Platform-Specific Considerations

- Some tests only run on Linux (e.g., forecast tests with prophet)
- Windows and Ubuntu are the primary supported platforms
- macOS support exists but requires special libomp setup for lgbm/xgboost

### Performance

- FLAML focuses on efficient automation and tuning
- Consider computational cost when adding new features
- Optimize for low resource usage where possible

## Documentation

- Main documentation: https://microsoft.github.io/FLAML/
- Update documentation when adding new features
- Provide clear examples in docstrings
- Add notebook examples for significant new features

## Contributing

- Follow the contributing guide: https://microsoft.github.io/FLAML/docs/Contribute
- Sign the Microsoft CLA when making your first contribution
- Be respectful and follow the Microsoft Open Source Code of Conduct
- Join the Discord community for discussions: https://discord.gg/Cppx2vSPVP