mirror of
https://github.com/microsoft/FLAML.git
synced 2026-02-08 17:59:17 +08:00
* Initial plan * Fix ExtraTreesEstimator regression ensemble error with sklearn 1.7+ Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com> * Address code review feedback: improve __sklearn_tags__ implementation Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com> * Fix format error * Emphasize pre-commit --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com> Co-authored-by: Li Jiang <lijiang1@microsoft.com>
7.3 KiB
7.3 KiB
GitHub Copilot Instructions for FLAML
Project Overview
FLAML (Fast Library for Automated Machine Learning & Tuning) is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflow based on large language models, machine learning models, etc. and optimizes their performance.
Key Components:
flaml/automl/: AutoML functionality for classification and regressionflaml/tune/: Generic hyperparameter tuningflaml/default/: Zero-shot AutoML with default configurationsflaml/autogen/: Legacy autogen code (note: AutoGen has moved to a separate repository)flaml/fabric/: Microsoft Fabric integrationtest/: Comprehensive test suite
Build and Test Commands
Installation
# Basic installation
pip install -e .
# Install with test dependencies
pip install -e .[test]
# Install with automl dependencies
pip install -e .[automl]
# Install with forecast dependencies (Linux only)
pip install -e .[forecast]
Running Tests
# Run all tests (excluding autogen)
pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10
# Run tests with coverage
coverage run -a -m pytest test --ignore=test/autogen --reruns 2 --reruns-delay 10
coverage xml
# Check dependencies
python test/check_dependency.py
Linting and Formatting
# Run pre-commit hooks
pre-commit run --all-files
# Format with black (line length: 120)
black . --line-length 120
# Run ruff for linting and auto-fix
ruff check . --fix
Code Style and Formatting
Python Style
- Line length: 120 characters (configured in both Black and Ruff)
- Formatter: Black (v23.3.0+)
- Linter: Ruff with Pyflakes and pycodestyle rules
- Import sorting: Use isort (via Ruff)
- Python version: Supports Python >= 3.10 (full support for 3.10, 3.11, 3.12 and 3.13)
Code Quality Rules
- Follow Black formatting conventions
- Keep imports sorted and organized
- Avoid unused imports (F401) - these are flagged but not auto-fixed
- Avoid wildcard imports (F403) where possible
- Complexity: Max McCabe complexity of 10
- Use type hints where appropriate
- Write clear docstrings for public APIs
Pre-commit Hooks
The repository uses pre-commit hooks for:
- Checking for large files, AST syntax, YAML/TOML/JSON validity
- Detecting merge conflicts and private keys
- Trailing whitespace and end-of-file fixes
- pyupgrade for Python 3.8+ syntax
- Black formatting
- Markdown formatting (mdformat with GFM and frontmatter support)
- Ruff linting with auto-fix
Testing Strategy
Test Organization
- Tests are in the
test/directory, organized by module test/automl/: AutoML feature teststest/tune/: Hyperparameter tuning teststest/default/: Zero-shot AutoML teststest/nlp/: NLP-related teststest/spark/: Spark integration tests
Test Requirements
- Write tests for new functionality
- Ensure tests pass on multiple Python versions (3.10, 3.11, 3.12 and 3.13)
- Tests should work on both Ubuntu and Windows
- Use pytest markers for platform-specific tests (e.g.,
@pytest.mark.spark) - Tests should be idempotent and not depend on external state
- Use
--reruns 2 --reruns-delay 10for flaky tests
Coverage
- Aim for good test coverage on new code
- Coverage reports are generated for Python 3.11 builds
- Coverage reports are uploaded to Codecov
Git Workflow and Best Practices
Branching
- Main branch:
main - Create feature branches from
main - PR reviews are required before merging
Commit Messages
- Use clear, descriptive commit messages
- Reference issue numbers when applicable
- ALWAYS run
pre-commit run --all-filesbefore each commit to avoid formatting issues
Pull Requests
- Ensure all tests pass before requesting review
- Update documentation if adding new features
- Follow the PR template in
.github/PULL_REQUEST_TEMPLATE.md - ALWAYS run
pre-commit run --all-filesbefore each commit to avoid formatting issues
Project Structure
flaml/
├── automl/ # AutoML functionality
├── tune/ # Hyperparameter tuning
├── default/ # Zero-shot AutoML
├── autogen/ # Legacy autogen (deprecated, moved to separate repo)
├── fabric/ # Microsoft Fabric integration
├── onlineml/ # Online learning
└── version.py # Version information
test/ # Test suite
├── automl/
├── tune/
├── default/
├── nlp/
└── spark/
notebook/ # Example notebooks
website/ # Documentation website
Dependencies and Package Management
Core Dependencies
- NumPy >= 1.17
- Python >= 3.10 (officially supported: 3.10, 3.11, 3.12 and 3.13)
Optional Dependencies
[automl]: lightgbm, xgboost, scipy, pandas, scikit-learn[test]: Full test suite dependencies[spark]: PySpark and joblib dependencies[forecast]: holidays, prophet, statsmodels, hcrystalball, pytorch-forecasting, pytorch-lightning, tensorboardX[hf]: Hugging Face transformers and datasets- See
setup.pyfor complete list
Version Constraints
- Be mindful of Python version-specific dependencies (check setup.py)
- XGBoost versions differ based on Python version
- NumPy 2.0+ only for Python >= 3.13
- Some features (like vowpalwabbit) only work with older Python versions
Boundaries and Restrictions
Do NOT Modify
.git/directory and Git configurationLICENSEfile- Version information in
flaml/version.py(unless explicitly updating version) - GitHub Actions workflows without careful consideration
- Existing test files unless fixing bugs or adding coverage
Be Cautious With
setup.py: Changes to dependencies should be carefully reviewedpyproject.toml: Linting and testing configuration.pre-commit-config.yaml: Pre-commit hook configuration- Backward compatibility: FLAML is a library with external users
Security Considerations
- Never commit secrets or API keys
- Be careful with external data sources in tests
- Validate user inputs in public APIs
- Follow secure coding practices for ML operations
Special Notes
AutoGen Migration
- AutoGen has moved to a separate repository: https://github.com/microsoft/autogen
- The
flaml/autogen/directory contains legacy code - Tests in
test/autogen/are ignored in the main test suite - Direct users to the new AutoGen repository for AutoGen-related issues
Platform-Specific Considerations
- Some tests only run on Linux (e.g., forecast tests with prophet)
- Windows and Ubuntu are the primary supported platforms
- macOS support exists but requires special libomp setup for lgbm/xgboost
Performance
- FLAML focuses on efficient automation and tuning
- Consider computational cost when adding new features
- Optimize for low resource usage where possible
Documentation
- Main documentation: https://microsoft.github.io/FLAML/
- Update documentation when adding new features
- Provide clear examples in docstrings
- Add notebook examples for significant new features
Contributing
- Follow the contributing guide: https://microsoft.github.io/FLAML/docs/Contribute
- Sign the Microsoft CLA when making your first contribution
- Be respectful and follow the Microsoft Open Source Code of Conduct
- Join the Discord community for discussions: https://discord.gg/Cppx2vSPVP