mirror of https://github.com/microsoft/FLAML.git synced 2026-02-08 17:59:17 +08:00

Files

Copilot fc4efe3510 Fix sklearn 1.7+ compatibility: BaseEstimator type detection for ensemble (#1512 )

* Initial plan

* Fix ExtraTreesEstimator regression ensemble error with sklearn 1.7+

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback: improve __sklearn_tags__ implementation

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix format error

* Emphasize pre-commit

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>

2026-01-23 10:20:59 +08:00

7.3 KiB

Raw Permalink Blame History

GitHub Copilot Instructions for FLAML

Project Overview

FLAML (Fast Library for Automated Machine Learning & Tuning) is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflow based on large language models, machine learning models, etc. and optimizes their performance.

Key Components:

flaml/automl/: AutoML functionality for classification and regression
flaml/tune/: Generic hyperparameter tuning
flaml/default/: Zero-shot AutoML with default configurations
flaml/autogen/: Legacy autogen code (note: AutoGen has moved to a separate repository)
flaml/fabric/: Microsoft Fabric integration
test/: Comprehensive test suite

Build and Test Commands

Installation

# Basic installation
pip install -e .

# Install with test dependencies
pip install -e .[test]

# Install with automl dependencies
pip install -e .[automl]

# Install with forecast dependencies (Linux only)
pip install -e .[forecast]

Running Tests

# Run all tests (excluding autogen)
pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10

# Run tests with coverage
coverage run -a -m pytest test --ignore=test/autogen --reruns 2 --reruns-delay 10
coverage xml

# Check dependencies
python test/check_dependency.py

Linting and Formatting

# Run pre-commit hooks
pre-commit run --all-files

# Format with black (line length: 120)
black . --line-length 120

# Run ruff for linting and auto-fix
ruff check . --fix

Code Style and Formatting

Python Style

Line length: 120 characters (configured in both Black and Ruff)
Formatter: Black (v23.3.0+)
Linter: Ruff with Pyflakes and pycodestyle rules
Import sorting: Use isort (via Ruff)
Python version: Supports Python >= 3.10 (full support for 3.10, 3.11, 3.12 and 3.13)

Code Quality Rules

Follow Black formatting conventions
Keep imports sorted and organized
Avoid unused imports (F401) - these are flagged but not auto-fixed
Avoid wildcard imports (F403) where possible
Complexity: Max McCabe complexity of 10
Use type hints where appropriate
Write clear docstrings for public APIs

Pre-commit Hooks

The repository uses pre-commit hooks for:

Checking for large files, AST syntax, YAML/TOML/JSON validity
Detecting merge conflicts and private keys
Trailing whitespace and end-of-file fixes
pyupgrade for Python 3.8+ syntax
Black formatting
Markdown formatting (mdformat with GFM and frontmatter support)
Ruff linting with auto-fix

Testing Strategy

Test Organization

Tests are in the test/ directory, organized by module
test/automl/: AutoML feature tests
test/tune/: Hyperparameter tuning tests
test/default/: Zero-shot AutoML tests
test/nlp/: NLP-related tests
test/spark/: Spark integration tests

Test Requirements

Write tests for new functionality
Ensure tests pass on multiple Python versions (3.10, 3.11, 3.12 and 3.13)
Tests should work on both Ubuntu and Windows
Use pytest markers for platform-specific tests (e.g., @pytest.mark.spark)
Tests should be idempotent and not depend on external state
Use --reruns 2 --reruns-delay 10 for flaky tests

Coverage

Aim for good test coverage on new code
Coverage reports are generated for Python 3.11 builds
Coverage reports are uploaded to Codecov

Git Workflow and Best Practices

Branching

Main branch: main
Create feature branches from main
PR reviews are required before merging

Commit Messages

Use clear, descriptive commit messages
Reference issue numbers when applicable
ALWAYS run pre-commit run --all-files before each commit to avoid formatting issues

Pull Requests

Ensure all tests pass before requesting review
Update documentation if adding new features
Follow the PR template in .github/PULL_REQUEST_TEMPLATE.md
ALWAYS run pre-commit run --all-files before each commit to avoid formatting issues

Project Structure

flaml/
├── automl/         # AutoML functionality
├── tune/           # Hyperparameter tuning
├── default/        # Zero-shot AutoML
├── autogen/        # Legacy autogen (deprecated, moved to separate repo)
├── fabric/         # Microsoft Fabric integration
├── onlineml/       # Online learning
└── version.py      # Version information

test/               # Test suite
├── automl/
├── tune/
├── default/
├── nlp/
└── spark/

notebook/           # Example notebooks
website/            # Documentation website

Dependencies and Package Management

Core Dependencies

NumPy >= 1.17
Python >= 3.10 (officially supported: 3.10, 3.11, 3.12 and 3.13)

Optional Dependencies

[automl]: lightgbm, xgboost, scipy, pandas, scikit-learn
[test]: Full test suite dependencies
[spark]: PySpark and joblib dependencies
[forecast]: holidays, prophet, statsmodels, hcrystalball, pytorch-forecasting, pytorch-lightning, tensorboardX
[hf]: Hugging Face transformers and datasets
See setup.py for complete list

Version Constraints

Be mindful of Python version-specific dependencies (check setup.py)
XGBoost versions differ based on Python version
NumPy 2.0+ only for Python >= 3.13
Some features (like vowpalwabbit) only work with older Python versions

Boundaries and Restrictions

Do NOT Modify

.git/ directory and Git configuration
LICENSE file
Version information in flaml/version.py (unless explicitly updating version)
GitHub Actions workflows without careful consideration
Existing test files unless fixing bugs or adding coverage

Be Cautious With

setup.py: Changes to dependencies should be carefully reviewed
pyproject.toml: Linting and testing configuration
.pre-commit-config.yaml: Pre-commit hook configuration
Backward compatibility: FLAML is a library with external users

Security Considerations

Never commit secrets or API keys
Be careful with external data sources in tests
Validate user inputs in public APIs
Follow secure coding practices for ML operations

Special Notes

AutoGen Migration

AutoGen has moved to a separate repository: https://github.com/microsoft/autogen
The flaml/autogen/ directory contains legacy code
Tests in test/autogen/ are ignored in the main test suite
Direct users to the new AutoGen repository for AutoGen-related issues

Platform-Specific Considerations

Some tests only run on Linux (e.g., forecast tests with prophet)
Windows and Ubuntu are the primary supported platforms
macOS support exists but requires special libomp setup for lgbm/xgboost

Performance

FLAML focuses on efficient automation and tuning
Consider computational cost when adding new features
Optimize for low resource usage where possible

Documentation

Main documentation: https://microsoft.github.io/FLAML/
Update documentation when adding new features
Provide clear examples in docstrings
Add notebook examples for significant new features

Contributing

Follow the contributing guide: https://microsoft.github.io/FLAML/docs/Contribute
Sign the Microsoft CLA when making your first contribution
Be respectful and follow the Microsoft Open Source Code of Conduct
Join the Discord community for discussions: https://discord.gg/Cppx2vSPVP

7.3 KiB Raw Permalink Blame History