Compare commits

...

32 Commits
v2.4.0...main

Author SHA1 Message Date
dependabot[bot]
bc1e4dc5ea Bump webpack from 5.94.0 to 5.105.0 in /website (#1515) 2026-02-08 16:29:18 +08:00
Copilot
158ff7d99e Fix transformers API compatibility: support v4.26+ and v5.0+ with version-aware parameter selection (#1514)
* Initial plan

* Fix transformers API compatibility issues

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add backward compatibility for transformers v4.26+ by version check

Support both tokenizer (v4.26-4.43) and processing_class (v4.44+) parameters based on installed transformers version. Fallback to tokenizer if version check fails.
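A minimal sketch of the version-aware selection this commit describes (the helper name is illustrative, not FLAML's actual code; the v4.44 cutoff and fallback behavior follow the commit message):

```python
def trainer_tokenizer_kwarg(tokenizer):
    """Pick the Trainer keyword argument for the tokenizer based on the
    installed transformers version: v4.44+ expects `processing_class`,
    v4.26-4.43 expect `tokenizer`. Falls back to `tokenizer` if the
    version check fails for any reason."""
    try:
        import transformers
        from packaging.version import Version

        if Version(transformers.__version__) >= Version("4.44.0"):
            return {"processing_class": tokenizer}
    except (ImportError, AttributeError, ValueError):
        pass  # version check failed: fall back to the older parameter name
    return {"tokenizer": tokenizer}
```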

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Improve exception handling specificity

Use specific exception types (ImportError, AttributeError, ValueError) instead of broad Exception catch for better error handling.

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Run pre-commit formatting on all files

Applied black formatting to fix code style across the repository.

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
2026-01-28 09:00:21 +08:00
Li Jiang
a5021152d2 ci: skip pre-commit workflow on main (#1513)
* ci: skip pre-commit workflow on main

* ci: run pre-commit only on pull requests
2026-01-25 21:10:05 +08:00
Copilot
fc4efe3510 Fix sklearn 1.7+ compatibility: BaseEstimator type detection for ensemble (#1512)
* Initial plan

* Fix ExtraTreesEstimator regression ensemble error with sklearn 1.7+

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback: improve __sklearn_tags__ implementation
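The general shape of such a fix is delegating tag queries to the wrapped sklearn estimator class, so ensemble utilities detect the correct estimator type. A toy sketch with a stand-in base class (real sklearn 1.7 returns a `Tags` dataclass, not a dict, and FLAML's actual implementation may differ):

```python
class _SklearnLikeBase:
    # Stand-in for an sklearn estimator exposing __sklearn_tags__ (sklearn 1.6+).
    def __sklearn_tags__(self):
        return {"estimator_type": "regressor"}


class WrappedEstimator:
    """Wrapper that forwards tag resolution to the underlying estimator
    class, so regressor-vs-classifier detection works in ensembles."""

    estimator_class = _SklearnLikeBase

    def __sklearn_tags__(self):
        return self.estimator_class().__sklearn_tags__()
```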

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix format error

* Emphasize pre-commit

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-23 10:20:59 +08:00
Li Jiang
cd0e9fb0d2 Only run save dependencies on main branch (#1510) 2026-01-22 11:07:40 +08:00
dependabot[bot]
a9c0a9e30a Bump lodash from 4.17.21 to 4.17.23 in /website (#1509)
Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.21...4.17.23)

---
updated-dependencies:
- dependency-name: lodash
  dependency-version: 4.17.23
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-01-22 08:47:33 +08:00
Li Jiang
a05b669de3 Update Python version support and pre-commit in documentation (#1505) 2026-01-21 16:39:54 +08:00
Copilot
6e59103e86 Add hierarchical search space documentation (#1496)
* Initial plan

* Add hierarchical search space documentation to Tune-User-Defined-Function.md

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add clarifying comments to hierarchical search space examples

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix formatting issues with pre-commit

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-21 14:40:56 +08:00
Copilot
d9e74031e0 Expose task-level and estimator-level preprocessors as public API (#1497)
* Initial plan

* Add public preprocess() API methods for AutoML and estimators

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add documentation for preprocess() API methods

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add example script demonstrating preprocess() API usage

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback - fix type hints and simplify test logic

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix formatting issues with pre-commit hooks

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Remove example.py, make tests faster

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-21 14:38:25 +08:00
Copilot
7ec1414e9b Clarify period parameter and automatic label lagging in time series forecasting (#1495)
* Initial plan

* Add comprehensive documentation for period parameter and automatic label lagging

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback on docstring clarity

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Clarify period vs prediction output length per @thinkall's feedback

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Refine terminology per code review feedback

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Run pre-commit formatting fixes

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-21 14:19:23 +08:00
Copilot
9233a52736 Add configurable label overlap handling for classification holdout strategy (#1491)
* Initial plan

* Fix training/test set overlap in holdout classification by only adding missing labels when needed

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback: add bounds checking and fix edge cases

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix bounds checking: use correct comparison operator for array indexing

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix potential ValueError with max() on empty lists and simplify test assertions

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add extra bounds checking for label_matches indices

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix pandas_on_spark compatibility by using iloc_pandas_on_spark util method

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Run pre-commit to fix formatting issues

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Improve missing label handling to avoid overlap when possible

For classes with multiple instances that end up in one set, properly
re-split them between train and val instead of duplicating. Only add
to both sets when the class has exactly 1 instance (unavoidable overlap).

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix: ensure at least 1 instance remains in original set when re-splitting

Addresses comments on lines 580 and 724 - prevents moving ALL instances
from validation to training or vice versa by using min(num_instances - 1, ...).

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Refactor: Extract missing label handling into two methods (fast/slow)

- Created _handle_missing_labels_fast(): Fast version that adds first instance to both sets (may overlap)
- Created _handle_missing_labels_no_overlap(): Precise version that avoids overlap when possible
- Added allow_label_overlap parameter to AutoML.fit() (default=True for fast version)
- Updated documentation with new parameter
- Both versions maintain label completeness while offering speed/precision tradeoff
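A toy illustration of the speed/precision tradeoff described above, operating on plain label lists (names and logic are simplified assumptions, not FLAML's implementation):

```python
def fix_missing_labels(train, val, allow_overlap=True):
    """Ensure every label in `train` also appears in `val`.

    Fast mode copies the first matching instance into `val` (may create
    overlap); precise mode moves an instance instead, unless the class
    has only one instance, where overlap is unavoidable."""
    for label in sorted(set(train) - set(val)):
        idxs = [i for i, y in enumerate(train) if y == label]
        if allow_overlap or len(idxs) == 1:
            val.append(train[idxs[0]])       # duplicate into val (overlap)
        else:
            val.append(train.pop(idxs[-1]))  # move an instance: no overlap
    return train, val
```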

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add usage example for allow_label_overlap to Best-Practices.md

- Added comprehensive documentation in Classification section
- Included code examples showing both fast and precise versions
- Explained tradeoffs between speed and precision
- Noted that parameter only affects holdout evaluation

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback: update documentation and tests

- Updated docstrings to clarify fast version only adds instances to missing sets
- Fixed documentation to reflect actual behavior (not "both sets" but "set with missing label")
- Completely rewrote test_no_overlap.py to test both allow_label_overlap modes
- Added tests with sample_weights for better code coverage
- Added test for single-instance class handling
- All 5 tests passing

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix AttributeError: initialize _allow_label_overlap in settings and retrain_from_log

- Added allow_label_overlap to settings initialization with default=True
- Added parameter defaulting in fit() method to use settings value if not provided
- Added _allow_label_overlap initialization in retrain_from_log method
- Fixes test failures in test_multiclass, test_regression, and spark tests

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add docstring to fit()

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-21 14:03:48 +08:00
Copilot
7ac076d544 Use scientific notation for best error in logger output (#1498)
* Initial plan

* Change best error format from .4f to .4e for scientific notation
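For reference, the difference between the two format specifiers in plain Python — small errors lose all precision under fixed-point formatting:

```python
best_error = 1.23456e-04
print(f"best error: {best_error:.4f}")  # fixed-point → best error: 0.0001
print(f"best error: {best_error:.4e}")  # scientific  → best error: 1.2346e-04
```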

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-21 09:06:19 +08:00
Copilot
3d489f1aaa Add validation and clear error messages for custom_metric parameter (#1500)
* Initial plan

* Add validation and documentation for custom_metric parameter
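The kind of check such validation performs can be sketched as follows (function name and message wording are illustrative, not FLAML's actual code):

```python
def validate_custom_metric(metric):
    """Accept a built-in metric name or a metric function; raise a clear
    error when a non-callable (e.g. the *result* of calling the metric
    function) is passed instead."""
    if isinstance(metric, str):
        return metric  # built-in metric name
    if not callable(metric):
        raise ValueError(
            f"custom metric must be a callable, got {type(metric).__name__}. "
            "Did you pass custom_metric(...) instead of custom_metric?"
        )
    return metric
```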

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Refactor validation into reusable method and improve error handling

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Apply pre-commit formatting fixes

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-21 08:58:11 +08:00
Copilot
c64eeb5e8d Document that final_estimator parameters in ensemble are not auto-tuned (#1499)
* Initial plan

* Document final_estimator parameter behavior in ensemble configuration

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback: fix syntax in examples and use float comparison

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Run pre-commit to fix formatting issues

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-20 21:59:31 +08:00
Copilot
bf35f98a24 Document missing value handling behavior for AutoML estimators (#1473)
* Initial plan

* Add comprehensive documentation on missing value handling in FAQ

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Apply mdformat to FAQ.md

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Correct FAQ: FLAML does preprocess missing values with SimpleImputer

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-20 21:53:10 +08:00
Copilot
1687ca9a94 Fix eval_set preprocessing for XGBoost estimators with categorical features (#1470)
* Initial plan

* Initial analysis - reproduced eval_set preprocessing bug

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix eval_set preprocessing for XGBoost estimators with categorical features
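The essence of the fix — applying the same preprocessing to `eval_set` that is applied to the training data, so categorical encodings match — can be sketched like this (names are illustrative, not FLAML's internals):

```python
def fit_with_eval_set(estimator, preprocess, X, y, eval_set):
    """Fit `estimator` after transforming both the training features and
    every (X, y) pair in `eval_set` with the same preprocessing step."""
    X = preprocess(X)
    eval_set = [(preprocess(X_val), y_val) for X_val, y_val in eval_set]
    return estimator.fit(X, y, eval_set=eval_set)
```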

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add eval_set tests to test_xgboost function

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix linting issues with ruff and black

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-20 20:41:21 +08:00
Copilot
7a597adcc9 Add GitHub Copilot instructions for FLAML repository (#1502)
* Initial plan

* Add comprehensive Copilot instructions for FLAML repository

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Update forecast dependencies list to be complete

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Clarify Python version support details

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
2026-01-20 18:06:47 +08:00
Copilot
4ea9650f99 Fix nested dictionary merge in SearchThread losing sampled hyperparameters (#1494)
* Initial plan

* Add recursive dict update to fix nested config merge
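The idea behind the recursive merge, sketched (the function name is illustrative; FLAML's actual helper may differ): a plain `dict.update()` replaces a nested sub-config wholesale, dropping previously sampled hyperparameters.

```python
def deep_update(base, overrides):
    """Merge `overrides` into `base`, recursing into nested dicts so
    sibling keys of a nested config survive the merge."""
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_update(base[key], value)
        else:
            base[key] = value
    return base


config = {"model": {"lr": 0.1, "max_depth": 6}}
deep_update(config, {"model": {"lr": 0.01}})
# "max_depth" is preserved; a plain dict.update() would have dropped it
```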

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-20 15:50:18 +08:00
Li Jiang
fa1a32afb6 Fix indents (#1493) 2026-01-20 11:18:58 +08:00
Copilot
5eb7d623b0 Expand docs to include all flamlized estimators (#1472)
* Initial plan

* Add documentation for all flamlized estimators (RandomForest, ExtraTrees, LGBMClassifier, XGBRegressor)

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix markdown formatting per pre-commit

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-20 10:59:48 +08:00
Copilot
22dcfcd3c0 Add comprehensive metric documentation and URL reference to AutoML docstrings (#1471)
* Initial plan

* Update AutoML metric documentation with full list and documentation link

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Apply black and mdformat formatting to code and documentation

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Apply pre-commit formatting fixes

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-20 10:34:54 +08:00
Li Jiang
d7208b32d0 Bump version to 2.5.0 (#1492) 2026-01-20 10:30:39 +08:00
Copilot
5f1aa2dda8 Fix: Preserve FLAML_sample_size in best_config_per_estimator (#1475)
* Initial plan

* Fix: Preserve FLAML_sample_size in best_config_per_estimator

Modified best_config_per_estimator property to keep FLAML_sample_size when returning best configurations. Previously, AutoMLState.sanitize() was removing this key, which caused the sample size information to be lost when using starting_points from a previous run.
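The gist of the fix as a toy sketch — only `FLAML_sample_size` is named in the commit; the other internal keys dropped here are hypothetical:

```python
def sanitize(config, keep_sample_size=True):
    """Strip internal bookkeeping keys from a best-config dict, optionally
    preserving FLAML_sample_size so warm starts keep the sample size."""
    internal = {"learner", "_internal_"}  # hypothetical internal keys
    if not keep_sample_size:
        internal.add("FLAML_sample_size")
    return {k: v for k, v in config.items() if k not in internal}


best = {"n_estimators": 8, "FLAML_sample_size": 10000, "learner": "lgbm"}
# Preserving the key lets starting_points resume with the right sample size.
assert "FLAML_sample_size" in sanitize(best)
```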

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add a test to verify the improvement of starting_points

* Update documentation to reflect FLAML_sample_size preservation

Updated Task-Oriented-AutoML.md to document that best_config_per_estimator now preserves FLAML_sample_size:
- Added note in "Warm start" section explaining that FLAML_sample_size is preserved for effective warm-starting
- Added note in "Get best configuration" section with example showing FLAML_sample_size in output
- Explains importance of sample size preservation for continuing optimization with correct sample sizes

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix unintended code change

* Improve docstrings and docs

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-20 07:42:31 +08:00
Copilot
67bdcde4d5 Fix BlendSearch OptunaSearch warning for non-hierarchical spaces with Ray Tune domains (#1477)
* Initial plan

* Fix BlendSearch OptunaSearch warning for non-hierarchical spaces

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Clean up test file

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add regression test for BlendSearch UDF mode warning fix

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Improve the fix and tests

* Fix Define-by-run function passed in  argument is not yet supported when using

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-20 00:01:41 +08:00
Copilot
46a406edd4 Add objective parameter to LGBMEstimator search space (#1474)
* Initial plan

* Add objective parameter to LGBMEstimator search_space

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add test for LGBMEstimator objective parameter

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix format error

* Remove changes, just add a test to verify the current supported usage

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-19 21:10:21 +08:00
Li Jiang
f1817ea7b1 Add support to python 3.13 (#1486) 2026-01-19 18:31:43 +08:00
Li Jiang
f6a5163e6a Fix isinstance usage issues (#1488)
* Fix isinstance usage issues

* Pin python version to 3.12 for pre-commit

* Update mdformat to 0.7.22
2026-01-19 15:19:05 +08:00
Li Jiang
e64b486528 Fix Best Practices not shown (#1483)
* Simplify automl.fit calls in Best Practices

Removed 'retrain_full' and 'eval_method' parameters from automl.fit calls.

* Fix best practices not shown
2026-01-13 14:25:28 +08:00
Li Jiang
a74354f7a9 Update documents, Bump version to 2.4.1, Sync Fabric till 088cfb98 (#1482)
* Add best practices

* Update docs to reflect on the recent changes

* Improve model persisting best practices

* Bump version to 2.4.1

* List all estimators

* Remove autogen

* Update dependencies
2026-01-13 12:49:36 +08:00
Li Jiang
ced1d6f331 Support pickling the whole AutoML instance, Sync Fabric till 0d4ab16f (#1481) 2026-01-12 23:04:38 +08:00
Li Jiang
bb213e7ebd Add timeout for tests and remove macos test envs (#1479) 2026-01-10 22:48:54 +08:00
Li Jiang
d241e8de90 Update readme, enable all python versions for macos tests (#1478)
* Fix macOS hang with running coverage

* Run coverage only in ubuntu

* Fix syntax error

* Fix run tests logic

* Update readme

* Don't test python 3.10 on macos as it's stuck

* Enable all python versions for macos
2026-01-10 20:03:24 +08:00
60 changed files with 3956 additions and 490 deletions

.github/copilot-instructions.md (new file, +243 lines)

@@ -0,0 +1,243 @@
# GitHub Copilot Instructions for FLAML
## Project Overview
FLAML (Fast Library for Automated Machine Learning & Tuning) is a lightweight Python library for efficient automation of machine learning and AI operations. It automates workflows based on large language models, machine learning models, etc., and optimizes their performance.
**Key Components:**
- `flaml/automl/`: AutoML functionality for classification and regression
- `flaml/tune/`: Generic hyperparameter tuning
- `flaml/default/`: Zero-shot AutoML with default configurations
- `flaml/autogen/`: Legacy autogen code (note: AutoGen has moved to a separate repository)
- `flaml/fabric/`: Microsoft Fabric integration
- `test/`: Comprehensive test suite
## Build and Test Commands
### Installation
```bash
# Basic installation
pip install -e .
# Install with test dependencies
pip install -e .[test]
# Install with automl dependencies
pip install -e .[automl]
# Install with forecast dependencies (Linux only)
pip install -e .[forecast]
```
### Running Tests
```bash
# Run all tests (excluding autogen)
pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10
# Run tests with coverage
coverage run -a -m pytest test --ignore=test/autogen --reruns 2 --reruns-delay 10
coverage xml
# Check dependencies
python test/check_dependency.py
```
### Linting and Formatting
```bash
# Run pre-commit hooks
pre-commit run --all-files
# Format with black (line length: 120)
black . --line-length 120
# Run ruff for linting and auto-fix
ruff check . --fix
```
## Code Style and Formatting
### Python Style
- **Line length:** 120 characters (configured in both Black and Ruff)
- **Formatter:** Black (v23.3.0+)
- **Linter:** Ruff with Pyflakes and pycodestyle rules
- **Import sorting:** Use isort (via Ruff)
- **Python version:** Supports Python >= 3.10 (full support for 3.10, 3.11, 3.12 and 3.13)
### Code Quality Rules
- Follow Black formatting conventions
- Keep imports sorted and organized
- Avoid unused imports (F401) - these are flagged but not auto-fixed
- Avoid wildcard imports (F403) where possible
- Complexity: Max McCabe complexity of 10
- Use type hints where appropriate
- Write clear docstrings for public APIs
### Pre-commit Hooks
The repository uses pre-commit hooks for:
- Checking for large files, AST syntax, YAML/TOML/JSON validity
- Detecting merge conflicts and private keys
- Trailing whitespace and end-of-file fixes
- pyupgrade for Python 3.8+ syntax
- Black formatting
- Markdown formatting (mdformat with GFM and frontmatter support)
- Ruff linting with auto-fix
## Testing Strategy
### Test Organization
- Tests are in the `test/` directory, organized by module
- `test/automl/`: AutoML feature tests
- `test/tune/`: Hyperparameter tuning tests
- `test/default/`: Zero-shot AutoML tests
- `test/nlp/`: NLP-related tests
- `test/spark/`: Spark integration tests
### Test Requirements
- Write tests for new functionality
- Ensure tests pass on multiple Python versions (3.10, 3.11, 3.12 and 3.13)
- Tests should work on both Ubuntu and Windows
- Use pytest markers for platform-specific tests (e.g., `@pytest.mark.spark`)
- Tests should be idempotent and not depend on external state
- Use `--reruns 2 --reruns-delay 10` for flaky tests
### Coverage
- Aim for good test coverage on new code
- Coverage reports are generated for Python 3.11 builds
- Coverage reports are uploaded to Codecov
## Git Workflow and Best Practices
### Branching
- Main branch: `main`
- Create feature branches from `main`
- PR reviews are required before merging
### Commit Messages
- Use clear, descriptive commit messages
- Reference issue numbers when applicable
- ALWAYS run `pre-commit run --all-files` before each commit to avoid formatting issues
### Pull Requests
- Ensure all tests pass before requesting review
- Update documentation if adding new features
- Follow the PR template in `.github/PULL_REQUEST_TEMPLATE.md`
- ALWAYS run `pre-commit run --all-files` before each commit to avoid formatting issues
## Project Structure
```
flaml/
├── automl/ # AutoML functionality
├── tune/ # Hyperparameter tuning
├── default/ # Zero-shot AutoML
├── autogen/ # Legacy autogen (deprecated, moved to separate repo)
├── fabric/ # Microsoft Fabric integration
├── onlineml/ # Online learning
└── version.py # Version information
test/ # Test suite
├── automl/
├── tune/
├── default/
├── nlp/
└── spark/
notebook/ # Example notebooks
website/ # Documentation website
```
## Dependencies and Package Management
### Core Dependencies
- NumPy >= 1.17
- Python >= 3.10 (officially supported: 3.10, 3.11, 3.12 and 3.13)
### Optional Dependencies
- `[automl]`: lightgbm, xgboost, scipy, pandas, scikit-learn
- `[test]`: Full test suite dependencies
- `[spark]`: PySpark and joblib dependencies
- `[forecast]`: holidays, prophet, statsmodels, hcrystalball, pytorch-forecasting, pytorch-lightning, tensorboardX
- `[hf]`: Hugging Face transformers and datasets
- See `setup.py` for complete list
### Version Constraints
- Be mindful of Python version-specific dependencies (check setup.py)
- XGBoost versions differ based on Python version
- NumPy 2.0+ only for Python >= 3.13
- Some features (like vowpalwabbit) only work with older Python versions
## Boundaries and Restrictions
### Do NOT Modify
- `.git/` directory and Git configuration
- `LICENSE` file
- Version information in `flaml/version.py` (unless explicitly updating version)
- GitHub Actions workflows without careful consideration
- Existing test files unless fixing bugs or adding coverage
### Be Cautious With
- `setup.py`: Changes to dependencies should be carefully reviewed
- `pyproject.toml`: Linting and testing configuration
- `.pre-commit-config.yaml`: Pre-commit hook configuration
- Backward compatibility: FLAML is a library with external users
### Security Considerations
- Never commit secrets or API keys
- Be careful with external data sources in tests
- Validate user inputs in public APIs
- Follow secure coding practices for ML operations
## Special Notes
### AutoGen Migration
- AutoGen has moved to a separate repository: https://github.com/microsoft/autogen
- The `flaml/autogen/` directory contains legacy code
- Tests in `test/autogen/` are ignored in the main test suite
- Direct users to the new AutoGen repository for AutoGen-related issues
### Platform-Specific Considerations
- Some tests only run on Linux (e.g., forecast tests with prophet)
- Windows and Ubuntu are the primary supported platforms
- macOS support exists but requires special libomp setup for lgbm/xgboost
### Performance
- FLAML focuses on efficient automation and tuning
- Consider computational cost when adding new features
- Optimize for low resource usage where possible
## Documentation
- Main documentation: https://microsoft.github.io/FLAML/
- Update documentation when adding new features
- Provide clear examples in docstrings
- Add notebook examples for significant new features
## Contributing
- Follow the contributing guide: https://microsoft.github.io/FLAML/docs/Contribute
- Sign the Microsoft CLA when making your first contribution
- Be respectful and follow the Microsoft Open Source Code of Conduct
- Join the Discord community for discussions: https://discord.gg/Cppx2vSPVP


@@ -1,9 +1,7 @@
 name: Code formatting
 # see: https://help.github.com/en/actions/reference/events-that-trigger-workflows
-on: # Trigger the workflow on push or pull request, but only for the main branch
-  push:
-    branches: [main]
+on:
   pull_request: {}
 defaults:


@@ -39,13 +39,8 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        os: [ubuntu-latest, macos-latest, windows-latest]
-        python-version: ["3.10", "3.11", "3.12"]
-        exclude:
-          - os: macos-latest
-            python-version: "3.10" # macOS runners will hang on python 3.10 for unknown reasons
-          - os: macos-latest
-            python-version: "3.12" # macOS runners will hang on python 3.12 for unknown reasons
+        os: [ubuntu-latest, windows-latest]
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
     steps:
       - uses: actions/checkout@v4
       - name: Set up Python ${{ matrix.python-version }}
@@ -79,6 +74,11 @@ jobs:
         run: |
           pip install pyspark==4.0.1
           pip list | grep "pyspark"
+      - name: On Ubuntu python 3.13, install pyspark 4.1.0
+        if: matrix.python-version == '3.13' && matrix.os == 'ubuntu-latest'
+        run: |
+          pip install pyspark==4.1.0
+          pip list | grep "pyspark"
       # # TODO: support ray
       # - name: If linux and python<3.11, install ray 2
       #   if: matrix.os == 'ubuntu-latest' && matrix.python-version < '3.11'
@@ -103,10 +103,12 @@ jobs:
         run: |
           pip cache purge
       - name: Test with pytest
+        timeout-minutes: 120
         if: matrix.python-version != '3.11'
         run: |
           pytest test/ --ignore=test/autogen --reruns 2 --reruns-delay 10
       - name: Coverage
+        timeout-minutes: 120
         if: matrix.python-version == '3.11'
         run: |
           pip install coverage
@@ -119,6 +121,7 @@ jobs:
           file: ./coverage.xml
           flags: unittests
       - name: Save dependencies
+        if: github.ref == 'refs/heads/main'
         shell: bash
         run: |
           git config --global user.name 'github-actions[bot]'


@@ -36,7 +36,7 @@ repos:
       - id: black
   - repo: https://github.com/executablebooks/mdformat
-    rev: 0.7.17
+    rev: 0.7.22
     hooks:
       - id: mdformat
         additional_dependencies:


@@ -4,8 +4,8 @@ This repository incorporates material as listed below or described in the code.
 ## Component. Ray.
-Code in tune/\[analysis.py, sample.py, trial.py, result.py\],
-searcher/\[suggestion.py, variant_generator.py\], and scheduler/trial_scheduler.py is adapted from
+Code in tune/[analysis.py, sample.py, trial.py, result.py],
+searcher/[suggestion.py, variant_generator.py], and scheduler/trial_scheduler.py is adapted from
 https://github.com/ray-project/ray/blob/master/python/ray/tune/
 ## Open Source License/Copyright Notice.


@@ -34,7 +34,7 @@ FLAML has a .NET implementation in [ML.NET](http://dot.net/ml), an open-source,
 ## Installation
-FLAML requires **Python version >= 3.9**. It can be installed from pip:
+The latest version of FLAML requires **Python >= 3.10 and < 3.14**. While other Python versions may work for core components, full model support is not guaranteed. FLAML can be installed via `pip`:
 ```bash
 pip install flaml


@@ -12,7 +12,7 @@ If you believe you have found a security vulnerability in any Microsoft-owned re
 Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report).
-If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).
+If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc).
 You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc).


@@ -118,6 +118,8 @@ class AutoML(BaseEstimator):
e.g., 'accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_weighted',
'roc_auc_ovo_weighted', 'roc_auc_ovr_weighted', 'f1', 'micro_f1', 'macro_f1',
'log_loss', 'mae', 'mse', 'r2', 'mape'. Default is 'auto'.
For a full list of supported built-in metrics, please refer to
https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#optimization-metric
If passing a customized metric function, the function needs to
have the following input arguments:
@@ -154,6 +156,10 @@ class AutoML(BaseEstimator):
"pred_time": pred_time,
}
```
**Note:** When passing a custom metric function, pass the function itself
(e.g., `metric=custom_metric`), not the result of calling it
(e.g., `metric=custom_metric(...)`). FLAML will call your function
internally during the training process.
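The sketch below illustrates that usage: define the custom metric as a function and pass the function object itself to `fit`. The argument list here is abridged for illustration; consult the surrounding documentation for the full signature.

```python
def custom_metric(X_val, y_val, estimator, labels, X_train, y_train, *args, **kwargs):
    """Toy metric: validation error rate (signature abridged for illustration)."""
    import numpy as np

    y_pred = estimator.predict(X_val)
    loss = float(np.mean(np.asarray(y_pred) != np.asarray(y_val)))
    # First element is the loss to minimize; second is a dict of metrics to log.
    return loss, {"val_error": loss}

# Correct: pass the function object, not the result of calling it.
# automl.fit(X, y, task="classification", metric=custom_metric)
```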
task: A string of the task type, e.g.,
'classification', 'regression', 'ts_forecast', 'rank',
'seq-classification', 'seq-regression', 'summarization',
@@ -174,6 +180,11 @@ class AutoML(BaseEstimator):
and 'final_estimator' to specify the passthrough and
final_estimator in the stacker. The dict can also contain
'n_jobs' as the key to specify the number of jobs for the stacker.
Note: The hyperparameters of a custom 'final_estimator' are NOT
automatically tuned. If you provide an estimator instance (e.g.,
CatBoostClassifier()), it will use the parameters you specified
or their defaults. If 'final_estimator' is not provided, the best
model found during the search will be used as the final estimator.
eval_method: A string of resampling strategy, one of
['auto', 'cv', 'holdout'].
split_ratio: A float of the validation data percentage for holdout.
@@ -332,6 +343,12 @@ class AutoML(BaseEstimator):
}
```
skip_transform: boolean, default=False | Whether to pre-process data prior to modeling.
allow_label_overlap: boolean, default=True | For classification tasks with holdout evaluation,
whether to allow label overlap between train and validation sets. When True (default),
uses a fast strategy that adds the first instance of missing labels to the set that is
missing them, which may create some overlap. When False, uses a precise but slower
strategy that intelligently re-splits instances to avoid overlap when possible.
Only affects classification tasks with holdout evaluation method.
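As a rough sketch of the fast (overlap-allowing) strategy described above, one can append the first training instance of each label that is missing from the validation split. The function name and details are illustrative, not FLAML's internal API:

```python
import numpy as np

def patch_missing_labels(X_train, y_train, X_val, y_val):
    # For each class present in training but absent from validation,
    # copy its first training row into the validation set. This is fast
    # but duplicates that instance across the two splits (label overlap).
    for label in np.setdiff1d(np.unique(y_train), np.unique(y_val)):
        i = int(np.flatnonzero(y_train == label)[0])
        X_val = np.vstack([X_val, X_train[i : i + 1]])
        y_val = np.append(y_val, y_train[i])
    return X_val, y_val
```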
fit_kwargs_by_estimator: dict, default=None | The user specified keywords arguments, grouped by estimator name.
e.g.,
@@ -362,7 +379,10 @@ class AutoML(BaseEstimator):
settings["split_ratio"] = settings.get("split_ratio", SPLIT_RATIO)
settings["n_splits"] = settings.get("n_splits", N_SPLITS)
settings["auto_augment"] = settings.get("auto_augment", True)
settings["allow_label_overlap"] = settings.get("allow_label_overlap", True)
settings["metric"] = settings.get("metric", "auto")
# Validate that custom metric is callable if not a string
self._validate_metric_parameter(settings["metric"], allow_auto=True)
settings["estimator_list"] = settings.get("estimator_list", "auto")
settings["log_file_name"] = settings.get("log_file_name", "")
settings["max_iter"] = settings.get("max_iter") # no budget by default
@@ -413,13 +433,69 @@ class AutoML(BaseEstimator):
"""
state = self.__dict__.copy()
state.pop("mlflow_integration", None)
# Keep mlflow_integration for post-load visualization (e.g., infos), but
# strip non-picklable runtime-only members (thread futures, clients).
mlflow_integration = state.get("mlflow_integration", None)
if mlflow_integration is not None:
import copy
mi = copy.copy(mlflow_integration)
# These are runtime-only and often contain locks / threads.
if hasattr(mi, "futures"):
mi.futures = {}
if hasattr(mi, "futures_log_model"):
mi.futures_log_model = {}
if hasattr(mi, "train_func"):
mi.train_func = None
if hasattr(mi, "mlflow_client"):
mi.mlflow_client = None
state["mlflow_integration"] = mi
# MLflow signature objects may hold references to Spark/pandas-on-Spark
# inputs and can indirectly capture SparkContext, which is not picklable.
state.pop("estimator_signature", None)
state.pop("pipeline_signature", None)
return state
def __setstate__(self, state):
self.__dict__.update(state)
# Ensure the attribute exists post-unpickle without clobbering a restored integration.
if "mlflow_integration" not in self.__dict__:
self.mlflow_integration = None
# Ensure mlflow_integration runtime members exist post-unpickle.
mi = getattr(self, "mlflow_integration", None)
if mi is not None:
if not hasattr(mi, "futures") or mi.futures is None:
mi.futures = {}
if not hasattr(mi, "futures_log_model") or mi.futures_log_model is None:
mi.futures_log_model = {}
if not hasattr(mi, "train_func"):
mi.train_func = None
if not hasattr(mi, "mlflow_client") or mi.mlflow_client is None:
try:
import mlflow as _mlflow
mi.mlflow_client = _mlflow.tracking.MlflowClient()
except Exception:
mi.mlflow_client = None
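The same strip-on-dump / rebuild-on-load pattern works for any object that holds non-picklable runtime members. A minimal self-contained sketch (not FLAML code):

```python
import pickle
import threading

class Tracker:
    def __init__(self):
        self.runs = []
        self._lock = threading.Lock()  # locks are not picklable

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_lock", None)  # strip the runtime-only member
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._lock = threading.Lock()  # rebuild after unpickle

t = Tracker()
t.runs.append("run-1")
clone = pickle.loads(pickle.dumps(t))
```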
@staticmethod
def _validate_metric_parameter(metric, allow_auto=True):
"""Validate that the metric parameter is either a string or a callable function.
Args:
metric: The metric parameter to validate.
allow_auto: Whether to allow "auto" as a valid string value.
Raises:
ValueError: If metric is not a string or callable function.
"""
if allow_auto and metric == "auto":
return
if not isinstance(metric, str) and not callable(metric):
raise ValueError(
f"The 'metric' parameter must be either a string or a callable function, "
f"but got {type(metric).__name__}. "
f"If you defined a custom_metric function, make sure to pass the function itself "
f"(e.g., metric=custom_metric) and not the result of calling it "
f"(e.g., metric=custom_metric(...))."
)
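A quick check of the validation logic above (standalone copy for illustration): strings and callables pass, while the *result* of calling a metric function is rejected.

```python
def validate_metric(metric, allow_auto=True):
    # Standalone copy of the check performed by _validate_metric_parameter.
    if allow_auto and metric == "auto":
        return
    if not isinstance(metric, str) and not callable(metric):
        raise ValueError(
            "The 'metric' parameter must be either a string or a callable function, "
            f"but got {type(metric).__name__}."
        )

validate_metric("accuracy")           # built-in metric name: OK
validate_metric(lambda *a, **k: 0.0)  # custom metric function: OK
try:
    validate_metric((0.5, {}))        # result of calling a metric: rejected
    raised = False
except ValueError:
    raised = True
```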
def get_params(self, deep: bool = False) -> dict:
return self._settings.copy()
@@ -469,18 +545,135 @@ class AutoML(BaseEstimator):
@property
def best_config(self):
"""A dictionary of the best configuration."""
"""A dictionary of the best configuration.
The returned config dictionary can be used to:
1. Pass as `starting_points` to a new AutoML run.
2. Initialize the corresponding FLAML estimator directly.
3. Initialize the original model (e.g., LightGBM, XGBoost) after converting
FLAML-specific parameters.
Note:
The config contains FLAML's search space parameters, which may differ from
the original model's parameters. For example, FLAML uses `log_max_bin` for
LightGBM instead of `max_bin`. Use the FLAML estimator's `config2params()`
method to convert to the original model's parameters.
Example:
```python
from flaml import AutoML
from flaml.automl.model import LGBMEstimator
from lightgbm import LGBMClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
# Train with AutoML
automl = AutoML()
automl.fit(X, y, task="classification", time_budget=10)
# Get the best config
best_config = automl.best_config
print("Best config:", best_config)
# Example output: {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20,
# 'learning_rate': 0.1, 'log_max_bin': 8, ...}
# Option 1: Use FLAML estimator directly (handles parameter conversion internally)
flaml_estimator = LGBMEstimator(task="classification", **best_config)
flaml_estimator.fit(X, y)
# Option 2: Convert to original model parameters using config2params()
# This converts FLAML-specific params (e.g., log_max_bin -> max_bin)
original_params = flaml_estimator.params # or use flaml_estimator.config2params(best_config)
print("Original model params:", original_params)
# Example output: {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20,
# 'learning_rate': 0.1, 'max_bin': 255, ...} # log_max_bin converted to max_bin
# Now use with original LightGBM
lgbm_model = LGBMClassifier(**original_params)
lgbm_model.fit(X, y)
```
"""
state = self._search_states.get(self._best_estimator)
config = state and getattr(state, "best_config", None)
return config and AutoMLState.sanitize(config)
@property
def best_config_per_estimator(self):
"""A dictionary of all estimators' best configuration."""
return {
e: e_search_state.best_config and AutoMLState.sanitize(e_search_state.best_config)
for e, e_search_state in self._search_states.items()
}
"""A dictionary of all estimators' best configuration.
Returns a dictionary where keys are estimator names (e.g., 'lgbm', 'xgboost')
and values are the best hyperparameter configurations found for each estimator.
The config may include `FLAML_sample_size` which indicates the sample size used
during training.
This is useful for:
1. Passing as `starting_points` to a new AutoML run for warm-starting.
2. Comparing the best configurations across different estimators.
3. Initializing the original models after converting FLAML-specific parameters.
Note:
The configs contain FLAML's search space parameters, which may differ from
the original models' parameters. Use each estimator's `config2params()` method
to convert to the original model's parameters.
Example:
```python
from flaml import AutoML
from flaml.automl.model import LGBMEstimator, XGBoostEstimator
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
# Train with AutoML
automl = AutoML()
automl.fit(X, y, task="classification", time_budget=30,
estimator_list=['lgbm', 'xgboost'])
# Get best configs for all estimators
configs = automl.best_config_per_estimator
print(configs)
# Example output: {'lgbm': {'n_estimators': 4, 'num_leaves': 4, 'log_max_bin': 8, ...},
# 'xgboost': {'n_estimators': 4, 'max_leaves': 4, ...}}
# Use as starting points for a new AutoML run (warm start)
new_automl = AutoML()
new_automl.fit(X, y, task="classification", time_budget=30,
starting_points=configs)
# Or convert to original model parameters for direct use
if configs.get('lgbm'):
lgbm_config = configs['lgbm'].copy()
lgbm_config.pop('FLAML_sample_size', None) # Remove FLAML internal param
flaml_lgbm = LGBMEstimator(task="classification", **lgbm_config)
original_lgbm_params = flaml_lgbm.params # Converted params (log_max_bin -> max_bin), or use flaml_lgbm.config2params(lgbm_config)
lgbm_model = LGBMClassifier(**original_lgbm_params)
lgbm_model.fit(X, y)
if configs.get('xgboost'):
xgb_config = configs['xgboost'].copy()
xgb_config.pop('FLAML_sample_size', None) # Remove FLAML internal param
flaml_xgb = XGBoostEstimator(task="classification", **xgb_config)
original_xgb_params = flaml_xgb.params # Converted params
xgb_model = XGBClassifier(**original_xgb_params)
xgb_model.fit(X, y)
```
"""
result = {}
for e, e_search_state in self._search_states.items():
if e_search_state.best_config:
config = e_search_state.best_config.get("ml", e_search_state.best_config).copy()
# Remove internal keys that are not needed for starting_points, but keep FLAML_sample_size
config.pop("learner", None)
config.pop("_choice_", None)
result[e] = config
else:
result[e] = None
return result
@property
def best_loss_per_estimator(self):
@@ -596,7 +789,7 @@ class AutoML(BaseEstimator):
def predict(
self,
X: np.array | DataFrame | list[str] | list[list[str]] | psDataFrame,
X: np.ndarray | DataFrame | list[str] | list[list[str]] | psDataFrame,
**pred_kwargs,
):
"""Predict label from features.
@@ -662,6 +855,50 @@ class AutoML(BaseEstimator):
proba = self._trained_estimator.predict_proba(X, **pred_kwargs)
return proba
def preprocess(
self,
X: np.ndarray | DataFrame | list[str] | list[list[str]] | psDataFrame,
):
"""Preprocess data using task-level preprocessing.
This method applies task-level preprocessing transformations to the input data,
including handling of data types, sparse matrices, and feature transformations
that were learned during the fit phase. This should be called before any
estimator-level preprocessing.
Args:
X: A numpy array or pandas dataframe or pyspark.pandas dataframe
of featurized instances, shape n * m,
or for time series forecast tasks:
a pandas dataframe with the first column containing
timestamp values (datetime type) or an integer n for
the predict steps (only valid when the estimator is
arima or sarimax). Other columns in the dataframe
are assumed to be exogenous variables (categorical
or numeric).
Returns:
Preprocessed data in the same format as input (numpy array, DataFrame, etc.).
Raises:
AttributeError: If the model has not been fitted yet.
Example:
```python
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
# Apply task-level preprocessing to new data
X_test_preprocessed = automl.preprocess(X_test)
```
"""
if not hasattr(self, "_state") or self._state is None:
raise AttributeError("AutoML instance has not been fitted yet. Please call fit() first.")
if not hasattr(self, "_transformer"):
raise AttributeError("Transformer not initialized. Please call fit() first.")
return self._state.task.preprocess(X, self._transformer)
def add_learner(self, learner_name, learner_class):
"""Add a customized learner.
@@ -820,6 +1057,14 @@ class AutoML(BaseEstimator):
the searched learners, such as sample_weight. Below are a few examples of
estimator-specific parameters:
period: int | forecast horizon for all time series forecast tasks.
This is the number of time steps ahead to forecast (e.g., period=12 means
forecasting 12 steps into the future). This represents the forecast horizon
used during model training. Note: during prediction, the output length
equals the length of X_test. FLAML automatically handles feature
engineering for you - sklearn-based models (lgbm, rf, xgboost, etc.) will have
lagged features created automatically, while time series native models (prophet,
arima, sarimax) use their built-in forecasting capabilities. You do NOT need
to manually create lagged features of the target variable.
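For intuition, the automatic lagged-feature construction for sklearn-style forecasters can be sketched as follows (illustrative only; FLAML's internal feature engineering is more involved):

```python
import numpy as np

def make_lag_features(y, n_lags):
    # Each row holds the n_lags previous target values; the aligned
    # target is the value immediately after that window.
    y = np.asarray(y)
    X = np.column_stack([y[i : len(y) - n_lags + i] for i in range(n_lags)])
    target = y[n_lags:]
    return X, target
```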
gpu_per_trial: float, default = 0 | A float of the number of gpus per trial,
only used by TransformersEstimator, XGBoostSklearnEstimator, and
TemporalFusionTransformerEstimator.
@@ -927,6 +1172,7 @@ class AutoML(BaseEstimator):
eval_method = self._decide_eval_method(eval_method, time_budget)
self.modelcount = 0
self._auto_augment = auto_augment
self._allow_label_overlap = self._settings.get("allow_label_overlap", True)
self._prepare_data(eval_method, split_ratio, n_splits)
self._state.time_budget = -1
self._state.free_mem_ratio = 0
@@ -1114,17 +1360,344 @@ class AutoML(BaseEstimator):
return self._state.data_size[0] if self._sample else None
def pickle(self, output_file_name):
"""Serialize the AutoML instance to a pickle file.
Notes:
When the trained estimator(s) are Spark-based, they may hold references
to SparkContext/SparkSession via Spark ML objects. Such objects are not
safely picklable and can cause pickling/broadcast errors.
This method externalizes Spark ML models into an adjacent artifact
directory and stores only lightweight metadata in the pickle.
"""
import os
import pickle
import re
def _safe_name(name: str) -> str:
return re.sub(r"[^A-Za-z0-9_.-]+", "_", name)
def _iter_trained_estimators():
trained = getattr(self, "_trained_estimator", None)
if trained is not None:
yield "_trained_estimator", trained
for est_name in getattr(self, "estimator_list", []) or []:
ss = getattr(self, "_search_states", {}).get(est_name)
te = ss and getattr(ss, "trained_estimator", None)
if te is not None:
yield f"_search_states.{est_name}.trained_estimator", te
def _scrub_pyspark_refs(root_obj):
"""Best-effort removal of pyspark objects prior to pickling.
SparkContext/SparkSession and Spark DataFrame objects are not picklable.
This function finds such objects within common containers and instance
attributes and replaces them with None, returning a restore mapping.
"""
try:
import pyspark
from pyspark.broadcast import Broadcast
from pyspark.sql import DataFrame as SparkDataFrame
from pyspark.sql import SparkSession
try:
import pyspark.pandas as ps
psDataFrameType = getattr(ps, "DataFrame", None)
psSeriesType = getattr(ps, "Series", None)
except Exception:
psDataFrameType = None
psSeriesType = None
bad_types = [
pyspark.SparkContext,
SparkSession,
SparkDataFrame,
Broadcast,
]
if psDataFrameType is not None:
bad_types.append(psDataFrameType)
if psSeriesType is not None:
bad_types.append(psSeriesType)
bad_types = tuple(t for t in bad_types if t is not None)
except Exception:
return {}
restore = {}
visited = set()
def _mark(parent, key, value, path):
restore[(id(parent), key)] = (parent, key, value)
try:
if isinstance(parent, dict):
parent[key] = None
elif isinstance(parent, list):
parent[key] = None
elif isinstance(parent, tuple):
# tuples are immutable; we can't modify in-place
pass
else:
setattr(parent, key, None)
except Exception:
# Best-effort.
pass
def _walk(obj, depth, parent=None, key=None, path="self"):
if obj is None:
return
oid = id(obj)
if oid in visited:
return
visited.add(oid)
if isinstance(obj, bad_types):
if parent is not None:
_mark(parent, key, obj, path)
return
if depth <= 0:
return
if isinstance(obj, dict):
for k, v in list(obj.items()):
_walk(v, depth - 1, parent=obj, key=k, path=f"{path}[{k!r}]")
return
if isinstance(obj, list):
for i, v in enumerate(list(obj)):
_walk(v, depth - 1, parent=obj, key=i, path=f"{path}[{i}]")
return
if isinstance(obj, tuple):
# Can't scrub inside tuples safely; but still inspect for diagnostics.
for i, v in enumerate(obj):
_walk(v, depth - 1, parent=None, key=None, path=f"{path}[{i}]")
return
if isinstance(obj, set):
for v in list(obj):
_walk(v, depth - 1, parent=None, key=None, path=f"{path}{{...}}")
return
d = getattr(obj, "__dict__", None)
if isinstance(d, dict):
for attr, v in list(d.items()):
_walk(v, depth - 1, parent=obj, key=attr, path=f"{path}.{attr}")
_walk(root_obj, depth=6)
return restore
# Temporarily remove non-picklable pieces (e.g., SparkContext-backed objects)
# and externalize spark models.
estimator_to_training_function = {}
spark_restore = []
artifact_dir = None
state_restore = {}
automl_restore = {}
scrub_restore = {}
try:
# Signatures are only used for MLflow logging; they are not required
# for inference and can capture SparkContext via pyspark objects.
for attr in ("estimator_signature", "pipeline_signature"):
if hasattr(self, attr):
automl_restore[attr] = getattr(self, attr)
setattr(self, attr, None)
for estimator in self.estimator_list:
search_state = self._search_states[estimator]
if hasattr(search_state, "training_function"):
estimator_to_training_function[estimator] = search_state.training_function
del search_state.training_function
# AutoMLState may keep Spark / pandas-on-Spark dataframes which are not picklable.
# They are not required for inference, so strip them for serialization.
state = getattr(self, "_state", None)
if state is not None:
for attr in (
"X_train",
"y_train",
"X_train_all",
"y_train_all",
"X_val",
"y_val",
"weight_val",
"groups_val",
"sample_weight_all",
"groups",
"groups_all",
"kf",
):
if hasattr(state, attr):
state_restore[attr] = getattr(state, attr)
setattr(state, attr, None)
for key, est in _iter_trained_estimators():
if getattr(est, "estimator_baseclass", None) != "spark":
continue
# Drop training data reference (Spark DataFrame / pandas-on-Spark).
old_df_train = getattr(est, "df_train", None)
old_model = getattr(est, "_model", None)
model_meta = None
if old_model is not None:
if artifact_dir is None:
artifact_dir = output_file_name + ".flaml_artifacts"
os.makedirs(artifact_dir, exist_ok=True)
# store relative dirname so the pickle+folder can be moved together
self._flaml_pickle_artifacts_dirname = os.path.basename(artifact_dir)
model_dir = os.path.join(artifact_dir, _safe_name(key))
# Spark ML models are saved as directories.
try:
writer = old_model.write()
writer.overwrite().save(model_dir)
except Exception as e:
raise RuntimeError(
"Failed to externalize Spark model for pickling. "
"Please ensure the Spark ML model supports write().overwrite().save(path)."
) from e
model_meta = {
"path": os.path.relpath(model_dir, os.path.dirname(output_file_name) or "."),
"class": old_model.__class__.__module__ + "." + old_model.__class__.__name__,
}
# Replace in-memory Spark model with metadata only.
est._model = None
est._flaml_spark_model_meta = model_meta
est.df_train = None
spark_restore.append((est, old_model, old_df_train, model_meta))
with open(output_file_name, "wb") as f:
try:
pickle.dump(self, f, pickle.HIGHEST_PROTOCOL)
except Exception:
# Some pyspark objects can still be captured indirectly.
scrub_restore = _scrub_pyspark_refs(self)
if scrub_restore:
f.seek(0)
f.truncate()
pickle.dump(self, f, pickle.HIGHEST_PROTOCOL)
else:
raise
finally:
# Restore training_function and Spark models so current object remains usable.
for estimator, tf in estimator_to_training_function.items():
self._search_states[estimator].training_function = tf
for attr, val in automl_restore.items():
setattr(self, attr, val)
state = getattr(self, "_state", None)
if state is not None and state_restore:
for attr, val in state_restore.items():
setattr(state, attr, val)
for est, old_model, old_df_train, model_meta in spark_restore:
est._model = old_model
est.df_train = old_df_train
if model_meta is not None and hasattr(est, "_flaml_spark_model_meta"):
delattr(est, "_flaml_spark_model_meta")
if scrub_restore:
for _, (parent, key, value) in scrub_restore.items():
try:
if isinstance(parent, dict):
parent[key] = value
elif isinstance(parent, list):
parent[key] = value
else:
setattr(parent, key, value)
except Exception:
pass
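Stripped of the Spark specifics, the externalize-then-restore pattern used above is: save the heavy member to a sidecar artifact, pickle the object with metadata only, and restore the member in a `finally` block so the live object stays usable. A minimal sketch under those assumptions:

```python
import json
import os
import pickle
import tempfile

class Holder:
    def __init__(self, model):
        self.model = model  # stand-in for a heavy, hard-to-pickle member

def save(holder, path):
    artifact = path + ".artifact.json"
    old_model, meta = holder.model, {"path": os.path.basename(artifact)}
    try:
        with open(artifact, "w") as f:
            json.dump(old_model, f)       # externalize the heavy member
        holder.model, holder.model_meta = None, meta
        with open(path, "wb") as f:
            pickle.dump(holder, f)        # pickle carries metadata only
    finally:
        holder.model = old_model          # live object stays usable

def load(path):
    with open(path, "rb") as f:
        holder = pickle.load(f)
    artifact = os.path.join(os.path.dirname(path), holder.model_meta["path"])
    with open(artifact) as f:
        holder.model = json.load(f)       # restore from the sidecar file
    return holder

p = os.path.join(tempfile.mkdtemp(), "holder.pkl")
save(Holder({"weights": [1, 2, 3]}), p)
restored = load(p)
```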
@classmethod
def load_pickle(cls, input_file_name: str, load_spark_models: bool = True):
"""Load an AutoML instance saved by :meth:`pickle`.
Args:
input_file_name: Path to the pickle file created by :meth:`pickle`.
load_spark_models: Whether to load externalized Spark ML models back
into the estimator objects. If False, Spark estimators will remain
without their underlying Spark model and cannot be used for predict.
Returns:
The deserialized AutoML instance.
"""
import importlib
import os
import pickle
with open(input_file_name, "rb") as f:
automl = pickle.load(f)
# Recreate per-estimator training_function if it was removed for pickling.
try:
for est_name, ss in getattr(automl, "_search_states", {}).items():
if not hasattr(ss, "training_function"):
ss.training_function = partial(
AutoMLState._compute_with_config_base,
state=automl._state,
estimator=est_name,
)
except Exception:
# Best-effort; training_function is only needed for re-searching.
pass
if not load_spark_models:
return automl
base_dir = os.path.dirname(input_file_name) or "."
def _iter_trained_estimators_loaded():
trained = getattr(automl, "_trained_estimator", None)
if trained is not None:
yield trained
for ss in getattr(automl, "_search_states", {}).values():
te = ss and getattr(ss, "trained_estimator", None)
if te is not None:
yield te
for est in _iter_trained_estimators_loaded():
meta = getattr(est, "_flaml_spark_model_meta", None)
if not meta:
continue
model_path = meta.get("path")
model_class = meta.get("class")
if not model_path or not model_class:
continue
abs_model_path = os.path.join(base_dir, model_path)
module_name, _, class_name = model_class.rpartition(".")
try:
module = importlib.import_module(module_name)
model_cls = getattr(module, class_name)
except Exception as e:
raise RuntimeError(f"Failed to import Spark model class '{model_class}'") from e
# Most Spark ML models support either Class.load(path) or Class.read().load(path).
if hasattr(model_cls, "load"):
est._model = model_cls.load(abs_model_path)
elif hasattr(model_cls, "read"):
est._model = model_cls.read().load(abs_model_path)
else:
try:
from pyspark.ml.pipeline import PipelineModel
loaded_model = PipelineModel.load(abs_model_path)
if not isinstance(loaded_model, model_cls):
raise RuntimeError(
f"Loaded model type '{type(loaded_model).__name__}' does not match expected type '{model_class}'."
)
est._model = loaded_model
except Exception as e:
raise RuntimeError(
f"Spark model class '{model_class}' does not support load/read(). "
"Unable to restore Spark model from artifacts."
) from e
return automl
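The dynamic import in `load_pickle` follows a common dotted-path pattern; in isolation it looks like this (sketch):

```python
import importlib

def import_class(dotted_path):
    # Split "pkg.module.Class" into module path and class name, then resolve.
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```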
@property
def trainable(self) -> Callable[[dict], float | None]:
@@ -1203,6 +1776,7 @@ class AutoML(BaseEstimator):
n_splits,
self._df,
self._sample_weight_full,
self._allow_label_overlap,
)
self.data_size_full = self._state.data_size_full
@@ -1259,6 +1833,7 @@ class AutoML(BaseEstimator):
time_col=None,
cv_score_agg_func=None,
skip_transform=None,
allow_label_overlap=True,
mlflow_logging=None,
fit_kwargs_by_estimator=None,
mlflow_exp_name=None,
@@ -1287,6 +1862,8 @@ class AutoML(BaseEstimator):
e.g., 'accuracy', 'roc_auc', 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_weighted',
'roc_auc_ovo_weighted', 'roc_auc_ovr_weighted', 'f1', 'micro_f1', 'macro_f1',
'log_loss', 'mae', 'mse', 'r2', 'mape'. Default is 'auto'.
For a full list of supported built-in metrics, please refer to
https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#optimization-metric
If passing a customized metric function, the function needs to
have the following input arguments:
@@ -1323,6 +1900,10 @@ class AutoML(BaseEstimator):
"pred_time": pred_time,
}
```
**Note:** When passing a custom metric function, pass the function itself
(e.g., `metric=custom_metric`), not the result of calling it
(e.g., `metric=custom_metric(...)`). FLAML will call your function
internally during the training process.
task: A string of the task type, e.g.,
'classification', 'regression', 'ts_forecast_regression',
'ts_forecast_classification', 'rank', 'seq-classification',
@@ -1345,6 +1926,11 @@ class AutoML(BaseEstimator):
and 'final_estimator' to specify the passthrough and
final_estimator in the stacker. The dict can also contain
'n_jobs' as the key to specify the number of jobs for the stacker.
Note: The hyperparameters of a custom 'final_estimator' are NOT
automatically tuned. If you provide an estimator instance (e.g.,
CatBoostClassifier()), it will use the parameters you specified
or their defaults. If 'final_estimator' is not provided, the best
model found during the search will be used as the final estimator.
eval_method: A string of resampling strategy, one of
['auto', 'cv', 'holdout'].
split_ratio: A float of the validation data percentage for holdout.
@@ -1534,6 +2120,12 @@ class AutoML(BaseEstimator):
```
skip_transform: boolean, default=False | Whether to pre-process data prior to modeling.
allow_label_overlap: boolean, default=True | For classification tasks with holdout evaluation,
whether to allow label overlap between train and validation sets. When True (default),
uses a fast strategy that adds the first instance of missing labels to the set that is
missing them, which may create some overlap. When False, uses a precise but slower
strategy that intelligently re-splits instances to avoid overlap when possible.
Only affects classification tasks with holdout evaluation method.
mlflow_logging: boolean, default=None | Whether to log the training results to mlflow.
Default value is None, which means the logging decision is made based on
AutoML.__init__'s mlflow_logging argument. Not valid if mlflow is not installed.
@@ -1567,6 +2159,14 @@ class AutoML(BaseEstimator):
the searched learners, such as sample_weight. Below are a few examples of
estimator-specific parameters:
period: int | forecast horizon for all time series forecast tasks.
This is the number of time steps ahead to forecast (e.g., period=12 means
forecasting 12 steps into the future). This represents the forecast horizon
used during model training. Note: during prediction, the output length
equals the length of X_test. FLAML automatically handles feature
engineering for you - sklearn-based models (lgbm, rf, xgboost, etc.) will have
lagged features created automatically, while time series native models (prophet,
arima, sarimax) use their built-in forecasting capabilities. You do NOT need
to manually create lagged features of the target variable.
gpu_per_trial: float, default = 0 | A float of the number of gpus per trial,
only used by TransformersEstimator, XGBoostSklearnEstimator, and
TemporalFusionTransformerEstimator.
@@ -1603,7 +2203,10 @@ class AutoML(BaseEstimator):
split_ratio = split_ratio or self._settings.get("split_ratio")
n_splits = n_splits or self._settings.get("n_splits")
auto_augment = self._settings.get("auto_augment") if auto_augment is None else auto_augment
metric = metric or self._settings.get("metric")
allow_label_overlap = (
self._settings.get("allow_label_overlap") if allow_label_overlap is None else allow_label_overlap
)
metric = self._settings.get("metric") if metric is None else metric
estimator_list = estimator_list or self._settings.get("estimator_list")
log_file_name = self._settings.get("log_file_name") if log_file_name is None else log_file_name
max_iter = self._settings.get("max_iter") if max_iter is None else max_iter
@@ -1785,6 +2388,7 @@ class AutoML(BaseEstimator):
self._retrain_in_budget = retrain_full == "budget" and (eval_method == "holdout" and self._state.X_val is None)
self._auto_augment = auto_augment
self._allow_label_overlap = allow_label_overlap
_sample_size_from_starting_points = {}
if isinstance(starting_points, dict):
@@ -1842,6 +2446,9 @@ class AutoML(BaseEstimator):
and (self._min_sample_size * SAMPLE_MULTIPLY_FACTOR < self._state.data_size[0])
)
# Validate metric parameter before processing
self._validate_metric_parameter(metric, allow_auto=True)
metric = task.default_metric(metric)
self._state.metric = metric
@@ -2488,7 +3095,7 @@ class AutoML(BaseEstimator):
)
logger.info(
" at {:.1f}s,\testimator {}'s best error={:.4f},\tbest estimator {}'s best error={:.4f}".format(
" at {:.1f}s,\testimator {}'s best error={:.4e},\tbest estimator {}'s best error={:.4e}".format(
self._state.time_from_start,
estimator,
search_state.best_loss,
@@ -2665,6 +3272,10 @@ class AutoML(BaseEstimator):
# the total degree of parallelization = parallelization degree per estimator * parallelization degree of ensemble
)
if isinstance(self._ensemble, dict):
# Note: If a custom final_estimator is provided, it is used as-is without
# hyperparameter tuning. The user is responsible for setting appropriate
# parameters or using defaults. If not provided, the best model found
# during the search (self._trained_estimator) is used.
final_estimator = self._ensemble.get("final_estimator", self._trained_estimator)
passthrough = self._ensemble.get("passthrough", True)
ensemble_n_jobs = self._ensemble.get("n_jobs", ensemble_n_jobs)


@@ -311,14 +311,14 @@ def get_y_pred(estimator, X, eval_metric, task: Task):
else:
y_pred = estimator.predict(X)
if isinstance(y_pred, Series) or isinstance(y_pred, DataFrame):
if isinstance(y_pred, (Series, DataFrame)):
y_pred = y_pred.values
return y_pred
def to_numpy(x):
if isinstance(x, Series or isinstance(x, DataFrame)):
if isinstance(x, (Series, DataFrame)):
x = x.values
else:
x = np.asarray(x)
@@ -586,7 +586,7 @@ def _eval_estimator(
# TODO: why are integer labels being cast to str in the first place?
if isinstance(val_pred_y, Series) or isinstance(val_pred_y, DataFrame) or isinstance(val_pred_y, np.ndarray):
if isinstance(val_pred_y, (Series, DataFrame, np.ndarray)):
test = val_pred_y if isinstance(val_pred_y, np.ndarray) else val_pred_y.values
if not np.issubdtype(test.dtype, np.number):
# some NLP models return a list
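The refactor above also fixes a subtle bug: `isinstance(x, Series or isinstance(x, DataFrame))` evaluates `Series or ...` first, so only `Series` is ever checked. A minimal demonstration with stand-in classes:

```python
class Series: pass
class DataFrame: pass

x = DataFrame()
# Buggy form: `Series or isinstance(x, DataFrame)` short-circuits to Series
# (a class object is truthy), so a DataFrame is never matched.
buggy = isinstance(x, Series or isinstance(x, DataFrame))
# Correct form: pass a tuple of types.
fixed = isinstance(x, (Series, DataFrame))
```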


@@ -26,6 +26,13 @@ from sklearn.preprocessing import Normalizer
from sklearn.svm import LinearSVC
from xgboost import __version__ as xgboost_version
try:
from sklearn.utils._tags import ClassifierTags, RegressorTags
SKLEARN_TAGS_AVAILABLE = True
except ImportError:
SKLEARN_TAGS_AVAILABLE = False
from flaml import tune
from flaml.automl.data import group_counts
from flaml.automl.spark import ERROR as SPARK_ERROR
@@ -135,6 +142,7 @@ class BaseEstimator(sklearn.base.ClassifierMixin, sklearn.base.BaseEstimator):
self._task = task if isinstance(task, Task) else task_factory(task, None, None)
self.params = self.config2params(config)
self.estimator_class = self._model = None
self.estimator_baseclass = "sklearn"
if "_estimator_type" in self.params:
self._estimator_type = self.params.pop("_estimator_type")
else:
@@ -147,6 +155,25 @@ class BaseEstimator(sklearn.base.ClassifierMixin, sklearn.base.BaseEstimator):
params["_estimator_type"] = self._estimator_type
return params
def __sklearn_tags__(self):
"""Override sklearn tags to respect the _estimator_type attribute.
This is needed for sklearn 1.7+ which uses get_tags() instead of
checking _estimator_type directly. Since BaseEstimator inherits from
ClassifierMixin, it would otherwise always be tagged as a classifier.
"""
tags = super().__sklearn_tags__()
if hasattr(self, "_estimator_type") and SKLEARN_TAGS_AVAILABLE:
if self._estimator_type == "regressor":
tags.estimator_type = "regressor"
tags.regressor_tags = RegressorTags()
tags.classifier_tags = None
elif self._estimator_type == "classifier":
tags.estimator_type = "classifier"
tags.classifier_tags = ClassifierTags()
tags.regressor_tags = None
return tags
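The problem this override solves can be modeled without sklearn: a mixin that tags everything as a classifier, plus tags-based dispatch (the approach sklearn 1.7+'s `get_tags()` takes) that would then misroute wrapped regressors. A minimal sketch; `Tags`, `sklearn_tags`, and `is_regressor` below are illustrative stand-ins, not sklearn's real API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tags:
    estimator_type: Optional[str] = None


class ClassifierMixinLike:
    # Analogue of sklearn's ClassifierMixin: tags itself as a classifier.
    def sklearn_tags(self) -> Tags:
        return Tags(estimator_type="classifier")


class WrappedEstimator(ClassifierMixinLike):
    # Analogue of FLAML's BaseEstimator: may actually wrap a regressor.
    def __init__(self, estimator_type: str):
        self._estimator_type = estimator_type

    def sklearn_tags(self) -> Tags:
        # Override so the inherited "classifier" tag does not always win.
        tags = super().sklearn_tags()
        tags.estimator_type = self._estimator_type
        return tags


def is_regressor(est) -> bool:
    # Tags-based dispatch: without the override, this would be False
    # for every subclass of the classifier mixin.
    return est.sklearn_tags().estimator_type == "regressor"
```

Without the override, `WrappedEstimator("regressor")` would inherit the classifier tag and ensemble code that dispatches on tags would pick the wrong path.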
@property
def classes_(self):
return self._model.classes_
@@ -294,6 +321,35 @@ class BaseEstimator(sklearn.base.ClassifierMixin, sklearn.base.BaseEstimator):
train_time = self._fit(X_train, y_train, **kwargs)
return train_time
def preprocess(self, X):
"""Preprocess data using estimator-level preprocessing.
This method applies estimator-specific preprocessing transformations to the input data.
This is the second level of preprocessing that should be applied after task-level
preprocessing (automl.preprocess()). Different estimator types may apply different
preprocessing steps (e.g., sparse matrix conversion, dataframe handling).
Args:
X: A numpy array or a dataframe of featurized instances, shape n*m.
Returns:
Preprocessed data ready for the estimator's predict/fit methods.
Example:
```python
automl = AutoML()
automl.fit(X_train, y_train, task="classification")
# First apply task-level preprocessing
X_test_task = automl.preprocess(X_test)
# Then apply estimator-level preprocessing
estimator = automl.model
X_test_estimator = estimator.preprocess(X_test_task)
```
"""
return self._preprocess(X)
def predict(self, X, **kwargs):
"""Predict label from features.
@@ -439,6 +495,7 @@ class SparkEstimator(BaseEstimator):
raise SPARK_ERROR
super().__init__(task, **config)
self.df_train = None
self.estimator_baseclass = "spark"
def _preprocess(
self,
@@ -974,7 +1031,7 @@ class TransformersEstimator(BaseEstimator):
from .nlp.huggingface.utils import tokenize_text
from .nlp.utils import is_a_list_of_str
-        is_str = str(X.dtypes[0]) in ("string", "str")
+        is_str = str(X.dtypes.iloc[0]) in ("string", "str")
is_list_of_str = is_a_list_of_str(X[list(X.keys())[0]].to_list()[0])
if is_str or is_list_of_str:
@@ -1139,16 +1196,31 @@ class TransformersEstimator(BaseEstimator):
control.should_save = True
control.should_evaluate = True
-        self._trainer = TrainerForAuto(
-            args=self._training_args,
-            model_init=self._model_init,
-            train_dataset=train_dataset,
-            eval_dataset=eval_dataset,
-            tokenizer=self.tokenizer,
-            data_collator=self.data_collator,
-            compute_metrics=self._compute_metrics_by_dataset_name,
-            callbacks=[EarlyStoppingCallbackForAuto],
-        )
+        # Use processing_class for transformers >= 4.44.0, tokenizer for older versions
+        trainer_kwargs = {
+            "args": self._training_args,
+            "model_init": self._model_init,
+            "train_dataset": train_dataset,
+            "eval_dataset": eval_dataset,
+            "data_collator": self.data_collator,
+            "compute_metrics": self._compute_metrics_by_dataset_name,
+            "callbacks": [EarlyStoppingCallbackForAuto],
+        }
+        # Check if processing_class parameter is supported (transformers >= 4.44.0)
+        try:
+            import transformers
+            from packaging import version
+
+            if version.parse(transformers.__version__) >= version.parse("4.44.0"):
+                trainer_kwargs["processing_class"] = self.tokenizer
+            else:
+                trainer_kwargs["tokenizer"] = self.tokenizer
+        except (ImportError, AttributeError, ValueError):
+            # Fallback to tokenizer if version check fails
+            trainer_kwargs["tokenizer"] = self.tokenizer
+        self._trainer = TrainerForAuto(**trainer_kwargs)
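The version gate above can be sketched independently of transformers. Per the commit, `Trainer` takes `tokenizer` in v4.26-4.43 and `processing_class` from v4.44 on; `parse_version` below is a simplified stand-in for `packaging.version.parse` (numeric dotted releases only), and `build_trainer_kwargs` is a hypothetical helper:

```python
def parse_version(v: str) -> tuple:
    # Simplified stand-in for packaging.version.parse:
    # handles plain numeric dotted releases like "4.44.0".
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())


def build_trainer_kwargs(transformers_version: str, tokenizer) -> dict:
    """Pick the tokenizer-passing keyword based on the installed version.

    transformers >= 4.44.0 accepts `processing_class`; older versions
    (4.26-4.43) expect `tokenizer`. Selecting the keyword at runtime keeps
    one code path compatible with both, with a conservative fallback.
    """
    kwargs = {}
    try:
        if parse_version(transformers_version) >= (4, 44, 0):
            kwargs["processing_class"] = tokenizer
        else:
            kwargs["tokenizer"] = tokenizer
    except (ValueError, TypeError):
        # Fall back to the legacy keyword if the version is unparseable.
        kwargs["tokenizer"] = tokenizer
    return kwargs
```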
if self._task in NLG_TASKS:
setattr(self._trainer, "_is_seq2seq", True)


@@ -5,7 +5,7 @@ from typing import List, Optional
from flaml.automl.task.task import NLG_TASKS
try:
-    from transformers import TrainingArguments
+    from transformers import Seq2SeqTrainingArguments as TrainingArguments
except ImportError:
TrainingArguments = object


@@ -211,29 +211,28 @@ def tokenize_onedataframe(
hf_args=None,
prefix_str=None,
):
-    with tokenizer.as_target_tokenizer():
-        _, tokenized_column_names = tokenize_row(
-            dict(X.iloc[0]),
-            tokenizer,
-            prefix=(prefix_str,) if task is SUMMARIZATION else None,
-            task=task,
-            hf_args=hf_args,
-            return_column_name=True,
-        )
-        d = X.apply(
-            lambda x: tokenize_row(
-                x,
-                tokenizer,
-                prefix=(prefix_str,) if task is SUMMARIZATION else None,
-                task=task,
-                hf_args=hf_args,
-            ),
-            axis=1,
-            result_type="expand",
-        )
-        X_tokenized = pd.DataFrame(columns=tokenized_column_names)
-        X_tokenized[tokenized_column_names] = d
-        return X_tokenized
+    _, tokenized_column_names = tokenize_row(
+        dict(X.iloc[0]),
+        tokenizer,
+        prefix=(prefix_str,) if task is SUMMARIZATION else None,
+        task=task,
+        hf_args=hf_args,
+        return_column_name=True,
+    )
+    d = X.apply(
+        lambda x: tokenize_row(
+            x,
+            tokenizer,
+            prefix=(prefix_str,) if task is SUMMARIZATION else None,
+            task=task,
+            hf_args=hf_args,
+        ),
+        axis=1,
+        result_type="expand",
+    )
+    X_tokenized = pd.DataFrame(columns=tokenized_column_names)
+    X_tokenized[tokenized_column_names] = d
+    return X_tokenized
def tokenize_row(
@@ -396,7 +395,7 @@ def load_model(checkpoint_path, task, num_labels=None):
if task in (SEQCLASSIFICATION, SEQREGRESSION):
return AutoModelForSequenceClassification.from_pretrained(
-            checkpoint_path, config=model_config, ignore_mismatched_sizes=True
+            checkpoint_path, config=model_config, ignore_mismatched_sizes=True, trust_remote_code=True
)
elif task == TOKENCLASSIFICATION:
return AutoModelForTokenClassification.from_pretrained(checkpoint_path, config=model_config)


@@ -25,9 +25,7 @@ def load_default_huggingface_metric_for_task(task):
def is_a_list_of_str(this_obj):
-    return (isinstance(this_obj, list) or isinstance(this_obj, np.ndarray)) and all(
-        isinstance(x, str) for x in this_obj
-    )
+    return isinstance(this_obj, (list, np.ndarray)) and all(isinstance(x, str) for x in this_obj)
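The behavior of the condensed helper can be checked with a list-only variant (the original additionally accepts numpy arrays via the `(list, np.ndarray)` tuple); this standalone sketch avoids the numpy dependency:

```python
def is_a_list_of_str(obj) -> bool:
    # List-only variant of the helper above; note that all() over an
    # empty list is vacuously True, matching the original's behavior.
    return isinstance(obj, list) and all(isinstance(x, str) for x in obj)
```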
def _clean_value(value: Any) -> str:


@@ -37,10 +37,9 @@ class SearchState:
if isinstance(domain_one_dim, sample.Domain):
renamed_type = list(inspect.signature(domain_one_dim.is_valid).parameters.values())[0].annotation
type_match = (
-                    renamed_type is Any
+                    renamed_type is Any
                     or isinstance(value_one_dim, renamed_type)
-                    or isinstance(value_one_dim, int)
-                    and renamed_type is float
+                    or (renamed_type is float and isinstance(value_one_dim, int))
)
if not (type_match and domain_one_dim.is_valid(value_one_dim)):
return False
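The refactor above does not change the truth table: `and` binds tighter than `or`, so the original last clause already grouped as `(isinstance(v, int) and renamed_type is float)`; the new form just spells the grouping out. A standalone check of the equivalence (function names here are illustrative, not FLAML API):

```python
from typing import Any


def type_match_implicit(renamed_type, value) -> bool:
    # Relies on `and` binding tighter than `or`: the last clause groups
    # as (isinstance(value, int) and renamed_type is float).
    return (
        renamed_type is Any
        or isinstance(value, renamed_type)
        or isinstance(value, int) and renamed_type is float
    )


def type_match_explicit(renamed_type, value) -> bool:
    # Identical truth table, with the grouping made explicit.
    return (
        renamed_type is Any
        or isinstance(value, renamed_type)
        or (renamed_type is float and isinstance(value, int))
    )
```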


@@ -365,6 +365,465 @@ class GenericTask(Task):
X_train, X_val, y_train, y_val = GenericTask._split_pyspark(state, X, y, split_ratio, stratify)
return X_train, X_val, y_train, y_val
def _handle_missing_labels_fast(
self,
state,
X_train,
X_val,
y_train,
y_val,
X_train_all,
y_train_all,
is_spark_dataframe,
data_is_df,
):
"""Handle missing labels by adding first instance to the set with missing label.
This is the faster version that may create some overlap but ensures all labels
are present in both sets. If a label is missing from train, it adds the first
instance to train. If a label is missing from val, it adds the first instance to val.
If no labels are missing, no instances are duplicated.
Args:
state: The state object containing fit parameters
X_train, X_val: Training and validation features
y_train, y_val: Training and validation labels
X_train_all, y_train_all: Complete dataset
is_spark_dataframe: Whether data is pandas_on_spark
data_is_df: Whether data is DataFrame/Series
Returns:
Tuple of (X_train, X_val, y_train, y_val) with missing labels added
"""
# Check which labels are present in train and val sets
if is_spark_dataframe:
label_set_train, _ = unique_pandas_on_spark(y_train)
label_set_val, _ = unique_pandas_on_spark(y_val)
label_set_all, first = unique_value_first_index(y_train_all)
else:
label_set_all, first = unique_value_first_index(y_train_all)
label_set_train = np.unique(y_train)
label_set_val = np.unique(y_val)
# Find missing labels
missing_in_train = np.setdiff1d(label_set_all, label_set_train)
missing_in_val = np.setdiff1d(label_set_all, label_set_val)
# Add first instance of missing labels to train set
if len(missing_in_train) > 0:
missing_train_indices = []
for label in missing_in_train:
label_matches = np.where(label_set_all == label)[0]
if len(label_matches) > 0 and label_matches[0] < len(first):
missing_train_indices.append(first[label_matches[0]])
if len(missing_train_indices) > 0:
X_missing_train = (
iloc_pandas_on_spark(X_train_all, missing_train_indices)
if is_spark_dataframe
else X_train_all.iloc[missing_train_indices]
if data_is_df
else X_train_all[missing_train_indices]
)
y_missing_train = (
iloc_pandas_on_spark(y_train_all, missing_train_indices)
if is_spark_dataframe
else y_train_all.iloc[missing_train_indices]
if isinstance(y_train_all, (pd.Series, psSeries))
else y_train_all[missing_train_indices]
)
X_train = concat(X_missing_train, X_train)
y_train = concat(y_missing_train, y_train) if data_is_df else np.concatenate([y_missing_train, y_train])
# Handle sample_weight if present
if "sample_weight" in state.fit_kwargs:
sample_weight_source = (
state.sample_weight_all
if hasattr(state, "sample_weight_all")
else state.fit_kwargs.get("sample_weight")
)
if sample_weight_source is not None and max(missing_train_indices) < len(sample_weight_source):
missing_weights = (
sample_weight_source[missing_train_indices]
if isinstance(sample_weight_source, np.ndarray)
else sample_weight_source.iloc[missing_train_indices]
)
state.fit_kwargs["sample_weight"] = concat(missing_weights, state.fit_kwargs["sample_weight"])
# Add first instance of missing labels to val set
if len(missing_in_val) > 0:
missing_val_indices = []
for label in missing_in_val:
label_matches = np.where(label_set_all == label)[0]
if len(label_matches) > 0 and label_matches[0] < len(first):
missing_val_indices.append(first[label_matches[0]])
if len(missing_val_indices) > 0:
X_missing_val = (
iloc_pandas_on_spark(X_train_all, missing_val_indices)
if is_spark_dataframe
else X_train_all.iloc[missing_val_indices]
if data_is_df
else X_train_all[missing_val_indices]
)
y_missing_val = (
iloc_pandas_on_spark(y_train_all, missing_val_indices)
if is_spark_dataframe
else y_train_all.iloc[missing_val_indices]
if isinstance(y_train_all, (pd.Series, psSeries))
else y_train_all[missing_val_indices]
)
X_val = concat(X_missing_val, X_val)
y_val = concat(y_missing_val, y_val) if data_is_df else np.concatenate([y_missing_val, y_val])
# Handle sample_weight if present
if (
"sample_weight" in state.fit_kwargs
and hasattr(state, "weight_val")
and state.weight_val is not None
):
sample_weight_source = (
state.sample_weight_all
if hasattr(state, "sample_weight_all")
else state.fit_kwargs.get("sample_weight")
)
if sample_weight_source is not None and max(missing_val_indices) < len(sample_weight_source):
missing_weights = (
sample_weight_source[missing_val_indices]
if isinstance(sample_weight_source, np.ndarray)
else sample_weight_source.iloc[missing_val_indices]
)
state.weight_val = concat(missing_weights, state.weight_val)
return X_train, X_val, y_train, y_val
def _handle_missing_labels_no_overlap(
self,
state,
X_train,
X_val,
y_train,
y_val,
X_train_all,
y_train_all,
is_spark_dataframe,
data_is_df,
split_ratio,
):
"""Handle missing labels intelligently to avoid overlap when possible.
This is the slower but more precise version that:
- For single-instance classes: Adds to both sets (unavoidable overlap)
- For multi-instance classes: Re-splits them properly to avoid overlap
Args:
state: The state object containing fit parameters
X_train, X_val: Training and validation features
y_train, y_val: Training and validation labels
X_train_all, y_train_all: Complete dataset
is_spark_dataframe: Whether data is pandas_on_spark
data_is_df: Whether data is DataFrame/Series
split_ratio: The ratio for splitting
Returns:
Tuple of (X_train, X_val, y_train, y_val) with missing labels handled
"""
# Check which labels are present in train and val sets
if is_spark_dataframe:
label_set_train, _ = unique_pandas_on_spark(y_train)
label_set_val, _ = unique_pandas_on_spark(y_val)
label_set_all, first = unique_value_first_index(y_train_all)
else:
label_set_all, first = unique_value_first_index(y_train_all)
label_set_train = np.unique(y_train)
label_set_val = np.unique(y_val)
# Find missing labels
missing_in_train = np.setdiff1d(label_set_all, label_set_train)
missing_in_val = np.setdiff1d(label_set_all, label_set_val)
# Handle missing labels intelligently
# For classes with only 1 instance: add to both sets (unavoidable overlap)
# For classes with multiple instances: move/split them properly to avoid overlap
if len(missing_in_train) > 0:
# Process missing labels in training set
for label in missing_in_train:
# Find all indices for this label in the original data
if is_spark_dataframe:
label_indices = np.where(y_train_all.to_numpy() == label)[0].tolist()
else:
label_indices = np.where(np.asarray(y_train_all) == label)[0].tolist()
num_instances = len(label_indices)
if num_instances == 1:
# Single instance: must add to both train and val (unavoidable overlap)
X_single = (
iloc_pandas_on_spark(X_train_all, label_indices)
if is_spark_dataframe
else X_train_all.iloc[label_indices]
if data_is_df
else X_train_all[label_indices]
)
y_single = (
iloc_pandas_on_spark(y_train_all, label_indices)
if is_spark_dataframe
else y_train_all.iloc[label_indices]
if isinstance(y_train_all, (pd.Series, psSeries))
else y_train_all[label_indices]
)
X_train = concat(X_single, X_train)
y_train = concat(y_single, y_train) if data_is_df else np.concatenate([y_single, y_train])
# Handle sample_weight
if "sample_weight" in state.fit_kwargs:
sample_weight_source = (
state.sample_weight_all
if hasattr(state, "sample_weight_all")
else state.fit_kwargs.get("sample_weight")
)
if sample_weight_source is not None and label_indices[0] < len(sample_weight_source):
single_weight = (
sample_weight_source[label_indices]
if isinstance(sample_weight_source, np.ndarray)
else sample_weight_source.iloc[label_indices]
)
state.fit_kwargs["sample_weight"] = concat(single_weight, state.fit_kwargs["sample_weight"])
else:
# Multiple instances: move some from val to train (no overlap needed)
# Calculate how many to move to train (leave at least 1 in val)
num_to_train = max(1, min(num_instances - 1, int(num_instances * (1 - split_ratio))))
indices_to_move = label_indices[:num_to_train]
X_to_move = (
iloc_pandas_on_spark(X_train_all, indices_to_move)
if is_spark_dataframe
else X_train_all.iloc[indices_to_move]
if data_is_df
else X_train_all[indices_to_move]
)
y_to_move = (
iloc_pandas_on_spark(y_train_all, indices_to_move)
if is_spark_dataframe
else y_train_all.iloc[indices_to_move]
if isinstance(y_train_all, (pd.Series, psSeries))
else y_train_all[indices_to_move]
)
# Add to train
X_train = concat(X_to_move, X_train)
y_train = concat(y_to_move, y_train) if data_is_df else np.concatenate([y_to_move, y_train])
# Remove from val (they are currently all in val)
if is_spark_dataframe:
val_mask = ~y_val.isin([label])
X_val = X_val[val_mask]
y_val = y_val[val_mask]
else:
val_mask = np.asarray(y_val) != label
if data_is_df:
X_val = X_val[val_mask]
y_val = y_val[val_mask]
else:
X_val = X_val[val_mask]
y_val = y_val[val_mask]
# Add remaining instances back to val
remaining_indices = label_indices[num_to_train:]
if len(remaining_indices) > 0:
X_remaining = (
iloc_pandas_on_spark(X_train_all, remaining_indices)
if is_spark_dataframe
else X_train_all.iloc[remaining_indices]
if data_is_df
else X_train_all[remaining_indices]
)
y_remaining = (
iloc_pandas_on_spark(y_train_all, remaining_indices)
if is_spark_dataframe
else y_train_all.iloc[remaining_indices]
if isinstance(y_train_all, (pd.Series, psSeries))
else y_train_all[remaining_indices]
)
X_val = concat(X_remaining, X_val)
y_val = concat(y_remaining, y_val) if data_is_df else np.concatenate([y_remaining, y_val])
# Handle sample_weight
if "sample_weight" in state.fit_kwargs:
sample_weight_source = (
state.sample_weight_all
if hasattr(state, "sample_weight_all")
else state.fit_kwargs.get("sample_weight")
)
if sample_weight_source is not None and max(indices_to_move) < len(sample_weight_source):
weights_to_move = (
sample_weight_source[indices_to_move]
if isinstance(sample_weight_source, np.ndarray)
else sample_weight_source.iloc[indices_to_move]
)
state.fit_kwargs["sample_weight"] = concat(
weights_to_move, state.fit_kwargs["sample_weight"]
)
if (
len(remaining_indices) > 0
and hasattr(state, "weight_val")
and state.weight_val is not None
):
# Remove and re-add weights for val
if isinstance(state.weight_val, np.ndarray):
state.weight_val = state.weight_val[val_mask]
else:
state.weight_val = state.weight_val[val_mask]
if max(remaining_indices) < len(sample_weight_source):
remaining_weights = (
sample_weight_source[remaining_indices]
if isinstance(sample_weight_source, np.ndarray)
else sample_weight_source.iloc[remaining_indices]
)
state.weight_val = concat(remaining_weights, state.weight_val)
if len(missing_in_val) > 0:
# Process missing labels in validation set
for label in missing_in_val:
# Find all indices for this label in the original data
if is_spark_dataframe:
label_indices = np.where(y_train_all.to_numpy() == label)[0].tolist()
else:
label_indices = np.where(np.asarray(y_train_all) == label)[0].tolist()
num_instances = len(label_indices)
if num_instances == 1:
# Single instance: must add to both train and val (unavoidable overlap)
X_single = (
iloc_pandas_on_spark(X_train_all, label_indices)
if is_spark_dataframe
else X_train_all.iloc[label_indices]
if data_is_df
else X_train_all[label_indices]
)
y_single = (
iloc_pandas_on_spark(y_train_all, label_indices)
if is_spark_dataframe
else y_train_all.iloc[label_indices]
if isinstance(y_train_all, (pd.Series, psSeries))
else y_train_all[label_indices]
)
X_val = concat(X_single, X_val)
y_val = concat(y_single, y_val) if data_is_df else np.concatenate([y_single, y_val])
# Handle sample_weight
if "sample_weight" in state.fit_kwargs and hasattr(state, "weight_val"):
sample_weight_source = (
state.sample_weight_all
if hasattr(state, "sample_weight_all")
else state.fit_kwargs.get("sample_weight")
)
if sample_weight_source is not None and label_indices[0] < len(sample_weight_source):
single_weight = (
sample_weight_source[label_indices]
if isinstance(sample_weight_source, np.ndarray)
else sample_weight_source.iloc[label_indices]
)
if state.weight_val is not None:
state.weight_val = concat(single_weight, state.weight_val)
else:
# Multiple instances: move some from train to val (no overlap needed)
# Calculate how many to move to val (leave at least 1 in train)
num_to_val = max(1, min(num_instances - 1, int(num_instances * split_ratio)))
indices_to_move = label_indices[:num_to_val]
X_to_move = (
iloc_pandas_on_spark(X_train_all, indices_to_move)
if is_spark_dataframe
else X_train_all.iloc[indices_to_move]
if data_is_df
else X_train_all[indices_to_move]
)
y_to_move = (
iloc_pandas_on_spark(y_train_all, indices_to_move)
if is_spark_dataframe
else y_train_all.iloc[indices_to_move]
if isinstance(y_train_all, (pd.Series, psSeries))
else y_train_all[indices_to_move]
)
# Add to val
X_val = concat(X_to_move, X_val)
y_val = concat(y_to_move, y_val) if data_is_df else np.concatenate([y_to_move, y_val])
# Remove from train (they are currently all in train)
if is_spark_dataframe:
train_mask = ~y_train.isin([label])
X_train = X_train[train_mask]
y_train = y_train[train_mask]
else:
train_mask = np.asarray(y_train) != label
if data_is_df:
X_train = X_train[train_mask]
y_train = y_train[train_mask]
else:
X_train = X_train[train_mask]
y_train = y_train[train_mask]
# Add remaining instances back to train
remaining_indices = label_indices[num_to_val:]
if len(remaining_indices) > 0:
X_remaining = (
iloc_pandas_on_spark(X_train_all, remaining_indices)
if is_spark_dataframe
else X_train_all.iloc[remaining_indices]
if data_is_df
else X_train_all[remaining_indices]
)
y_remaining = (
iloc_pandas_on_spark(y_train_all, remaining_indices)
if is_spark_dataframe
else y_train_all.iloc[remaining_indices]
if isinstance(y_train_all, (pd.Series, psSeries))
else y_train_all[remaining_indices]
)
X_train = concat(X_remaining, X_train)
y_train = concat(y_remaining, y_train) if data_is_df else np.concatenate([y_remaining, y_train])
# Handle sample_weight
if "sample_weight" in state.fit_kwargs:
sample_weight_source = (
state.sample_weight_all
if hasattr(state, "sample_weight_all")
else state.fit_kwargs.get("sample_weight")
)
if sample_weight_source is not None and max(indices_to_move) < len(sample_weight_source):
weights_to_move = (
sample_weight_source[indices_to_move]
if isinstance(sample_weight_source, np.ndarray)
else sample_weight_source.iloc[indices_to_move]
)
if hasattr(state, "weight_val") and state.weight_val is not None:
state.weight_val = concat(weights_to_move, state.weight_val)
if len(remaining_indices) > 0:
# Remove and re-add weights for train
if isinstance(state.fit_kwargs["sample_weight"], np.ndarray):
state.fit_kwargs["sample_weight"] = state.fit_kwargs["sample_weight"][train_mask]
else:
state.fit_kwargs["sample_weight"] = state.fit_kwargs["sample_weight"][train_mask]
if max(remaining_indices) < len(sample_weight_source):
remaining_weights = (
sample_weight_source[remaining_indices]
if isinstance(sample_weight_source, np.ndarray)
else sample_weight_source.iloc[remaining_indices]
)
state.fit_kwargs["sample_weight"] = concat(
remaining_weights, state.fit_kwargs["sample_weight"]
)
return X_train, X_val, y_train, y_val
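The "fast" strategy above can be sketched on plain index lists: for every label missing from a split, append the index of its first occurrence to that split, accepting a small overlap. `add_missing_labels_fast` below is a hypothetical helper illustrating the idea, not FLAML API:

```python
def add_missing_labels_fast(y_all, train_idx, val_idx):
    """For every label in y_all absent from a split, prepend the index of
    its first occurrence to that split. This may duplicate a few rows
    across splits, but guarantees both splits cover every label; if no
    labels are missing, nothing is duplicated."""
    first_occurrence = {}
    for i, label in enumerate(y_all):
        first_occurrence.setdefault(label, i)
    for idx in (train_idx, val_idx):
        present = {y_all[i] for i in idx}
        for label, first in first_occurrence.items():
            if label not in present:
                idx.insert(0, first)
    return train_idx, val_idx
```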
def prepare_data(
self,
state,
@@ -377,6 +836,7 @@ class GenericTask(Task):
n_splits,
data_is_df,
sample_weight_full,
allow_label_overlap=True,
) -> int:
X_val, y_val = state.X_val, state.y_val
if issparse(X_val):
@@ -505,59 +965,46 @@ class GenericTask(Task):
elif self.is_classification():
# for classification, make sure the labels are complete in both
# training and validation data
-            label_set, first = unique_value_first_index(y_train_all)
-            rest = []
-            last = 0
-            first.sort()
-            for i in range(len(first)):
-                rest.extend(range(last, first[i]))
-                last = first[i] + 1
-            rest.extend(range(last, len(y_train_all)))
-            X_first = X_train_all.iloc[first] if data_is_df else X_train_all[first]
-            if len(first) < len(y_train_all) / 2:
-                # Get X_rest and y_rest with drop, sparse matrix can't apply np.delete
-                X_rest = (
-                    np.delete(X_train_all, first, axis=0)
-                    if isinstance(X_train_all, np.ndarray)
-                    else X_train_all.drop(first.tolist())
-                    if data_is_df
-                    else X_train_all[rest]
-                )
-                y_rest = (
-                    np.delete(y_train_all, first, axis=0)
-                    if isinstance(y_train_all, np.ndarray)
-                    else y_train_all.drop(first.tolist())
-                    if data_is_df
-                    else y_train_all[rest]
-                )
-            else:
-                X_rest = (
-                    iloc_pandas_on_spark(X_train_all, rest)
-                    if is_spark_dataframe
-                    else X_train_all.iloc[rest]
-                    if data_is_df
-                    else X_train_all[rest]
-                )
-                y_rest = (
-                    iloc_pandas_on_spark(y_train_all, rest)
-                    if is_spark_dataframe
-                    else y_train_all.iloc[rest]
-                    if data_is_df
-                    else y_train_all[rest]
-                )
-            stratify = y_rest if split_type == "stratified" else None
-            X_train, X_val, y_train, y_val = self._train_test_split(
-                state, X_rest, y_rest, first, rest, split_ratio, stratify
-            )
-            X_train = concat(X_first, X_train)
-            y_train = concat(label_set, y_train) if data_is_df else np.concatenate([label_set, y_train])
-            X_val = concat(X_first, X_val)
-            y_val = concat(label_set, y_val) if data_is_df else np.concatenate([label_set, y_val])
+            stratify = y_train_all if split_type == "stratified" else None
+            X_train, X_val, y_train, y_val = self._train_test_split(
+                state, X_train_all, y_train_all, split_ratio=split_ratio, stratify=stratify
+            )
+            # Handle missing labels using the appropriate strategy
+            if allow_label_overlap:
+                # Fast version: adds first instance to set with missing label (may create overlap)
+                X_train, X_val, y_train, y_val = self._handle_missing_labels_fast(
+                    state,
+                    X_train,
+                    X_val,
+                    y_train,
+                    y_val,
+                    X_train_all,
+                    y_train_all,
+                    is_spark_dataframe,
+                    data_is_df,
+                )
+            else:
+                # Precise version: avoids overlap when possible (slower)
+                X_train, X_val, y_train, y_val = self._handle_missing_labels_no_overlap(
+                    state,
+                    X_train,
+                    X_val,
+                    y_train,
+                    y_val,
+                    X_train_all,
+                    y_train_all,
+                    is_spark_dataframe,
+                    data_is_df,
+                    split_ratio,
+                )
             if isinstance(y_train, (psDataFrame, pd.DataFrame)) and y_train.shape[1] == 1:
                 y_train = y_train[y_train.columns[0]]
                 y_val = y_val[y_val.columns[0]]
-                y_train.name = y_val.name = y_rest.name
+                # Only set name if y_train_all is a Series (not a DataFrame)
+                if isinstance(y_train_all, (pd.Series, psSeries)):
+                    y_train.name = y_val.name = y_train_all.name
elif self.is_regression():
X_train, X_val, y_train, y_val = self._train_test_split(


@@ -151,7 +151,7 @@ class TimeSeriesTask(Task):
raise ValueError("Must supply either X_train_all and y_train_all, or dataframe and label")
try:
-        dataframe[self.time_col] = pd.to_datetime(dataframe[self.time_col])
+        dataframe.loc[:, self.time_col] = pd.to_datetime(dataframe[self.time_col])
except Exception:
raise ValueError(
f"For '{TS_FORECAST}' task, time column {self.time_col} must contain timestamp values."
@@ -386,9 +386,8 @@ class TimeSeriesTask(Task):
return X
def preprocess(self, X, transformer=None):
-        if isinstance(X, pd.DataFrame) or isinstance(X, np.ndarray) or isinstance(X, pd.Series):
-            X = X.copy()
-            X = normalize_ts_data(X, self.target_names, self.time_col)
+        if isinstance(X, (pd.DataFrame, np.ndarray, pd.Series)):
+            X = normalize_ts_data(X.copy(), self.target_names, self.time_col)
return self._preprocess(X, transformer)
elif isinstance(X, int):
return X


@@ -17,24 +17,30 @@ from sklearn.preprocessing import StandardScaler
def make_lag_features(X: pd.DataFrame, y: pd.Series, lags: int):
-    """Transform input data X, y into autoregressive form - shift
-    them appropriately based on horizon and create `lags` columns.
+    """Transform input data X, y into autoregressive form by creating `lags` columns.
+
+    This function is called automatically by FLAML during the training process
+    to convert time series data into a format suitable for sklearn-based regression
+    models (e.g., lgbm, rf, xgboost). Users do NOT need to manually call this function
+    or create lagged features themselves.

     Parameters
     ----------
     X : pandas.DataFrame
-        Input features.
+        Input feature DataFrame, which may contain temporal features and/or exogenous variables.
     y : array_like, (1d)
-        Target vector.
-    horizon : int
-        length of X for `predict` method
+        Target vector (time series values to forecast).
     lags : int
+        Number of lagged time steps to use as features.

     Returns
     -------
     pandas.DataFrame
-        shifted dataframe with `lags` columns
+        Shifted dataframe with `lags` columns for each original feature.
+        The target variable y is also lagged to prevent data leakage
+        (i.e., we use y(t-1), y(t-2), ..., y(t-lags) to predict y(t)).
     """
lag_features = []
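The lagging scheme the docstring describes can be sketched without pandas: each training row holds the previous `lags` target values, so y(t) never appears in its own feature row. `make_lag_rows` is an illustrative stand-in for the DataFrame-based implementation:

```python
def make_lag_rows(y, lags):
    """Build autoregressive rows: features are y(t-1)..y(t-lags), the
    target is y(t). The first `lags` observations have no complete
    history and are dropped, preventing target leakage."""
    rows, targets = [], []
    for t in range(lags, len(y)):
        rows.append([y[t - k] for k in range(1, lags + 1)])
        targets.append(y[t])
    return rows, targets
```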
@@ -55,6 +61,17 @@ def make_lag_features(X: pd.DataFrame, y: pd.Series, lags: int):
class SklearnWrapper:
"""Wrapper class for using sklearn-based models for time series forecasting.
This wrapper automatically handles the transformation of time series data into
a supervised learning format by creating lagged features. It trains separate
models for each step in the forecast horizon.
Users typically don't interact with this class directly - it's used internally
by FLAML when sklearn-based estimators (lgbm, rf, xgboost, etc.) are selected
for time series forecasting tasks.
"""
def __init__(
self,
model_class: type,
@@ -76,6 +93,8 @@ class SklearnWrapper:
self.pca = None
def fit(self, X: pd.DataFrame, y: pd.Series, **kwargs):
if "is_retrain" in kwargs:
kwargs.pop("is_retrain")
self._X = X
self._y = y
@@ -92,7 +111,14 @@ class SklearnWrapper:
for i, model in enumerate(self.models):
offset = i + self.lags
-            model.fit(X_trans[: len(X) - offset], y[offset:], **fit_params)
+            if len(X) - offset > 2:
+                # series of length 2 hit "All features are either constant or ignored."
+                # TODO: see why the non-constant features are ignored. Selector?
+                model.fit(X_trans[: len(X) - offset], y[offset:], **fit_params)
+            elif len(X) > offset and "catboost" not in str(model).lower():
+                model.fit(X_trans[: len(X) - offset], y[offset:], **fit_params)
+            else:
+                print("[INFO]: Length of data should be longer than period + lags.")
return self
def predict(self, X, X_train=None, y_train=None):


@@ -121,7 +121,12 @@ class TimeSeriesDataset:
@property
def X_all(self) -> pd.DataFrame:
-        return pd.concat([self.X_train, self.X_val], axis=0)
+        # Remove empty or all-NA columns before concatenation
+        X_train_filtered = self.X_train.dropna(axis=1, how="all")
+        X_val_filtered = self.X_val.dropna(axis=1, how="all")
+        # Concatenate the filtered DataFrames
+        return pd.concat([X_train_filtered, X_val_filtered], axis=0)
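The idea behind filtering before concatenation (recent pandas deprecates concatenating empty or all-NA entries) can be shown on plain column dicts; `drop_all_none_columns` is an illustrative helper, not pandas API:

```python
def drop_all_none_columns(table):
    """Drop columns whose values are entirely None (the dict analogue of
    DataFrame.dropna(axis=1, how="all")), so that a later concatenation
    does not resurrect dead columns from either part."""
    return {c: v for c, v in table.items() if any(x is not None for x in v)}
```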
@property
def y_train(self) -> pd.DataFrame:
@@ -472,7 +477,7 @@ class DataTransformerTS:
if "__NAN__" not in X[col].cat.categories:
X[col] = X[col].cat.add_categories("__NAN__").fillna("__NAN__")
else:
-                X[col] = X[col].fillna("__NAN__")
+                X[col] = X[col].fillna("__NAN__").infer_objects(copy=False)
X[col] = X[col].astype("category")
for column in self.num_columns:
@@ -541,14 +546,12 @@ def normalize_ts_data(X_train_all, target_names, time_col, y_train_all=None):
def validate_data_basic(X_train_all, y_train_all):
-    assert isinstance(X_train_all, np.ndarray) or issparse(X_train_all) or isinstance(X_train_all, pd.DataFrame), (
-        "X_train_all must be a numpy array, a pandas dataframe, " "or Scipy sparse matrix."
-    )
+    assert isinstance(X_train_all, (np.ndarray, DataFrame)) or issparse(
+        X_train_all
+    ), "X_train_all must be a numpy array, a pandas dataframe, or Scipy sparse matrix."
-    assert (
-        isinstance(y_train_all, np.ndarray)
-        or isinstance(y_train_all, pd.Series)
-        or isinstance(y_train_all, pd.DataFrame)
+    assert isinstance(
+        y_train_all, (np.ndarray, pd.Series, pd.DataFrame)
     ), "y_train_all must be a numpy array or a pandas series or DataFrame."
assert X_train_all.size != 0 and y_train_all.size != 0, "Input data must not be empty, use None if no data"


@@ -95,6 +95,27 @@ def flamlize_estimator(super_class, name: str, task: str, alternatives=None):
def fit(self, X, y, *args, **params):
hyperparams, estimator_name, X, y_transformed = self.suggest_hyperparams(X, y)
self.set_params(**hyperparams)
# Transform eval_set if present
if "eval_set" in params and params["eval_set"] is not None:
transformed_eval_set = []
for eval_X, eval_y in params["eval_set"]:
# Transform features
eval_X_transformed = self._feature_transformer.transform(eval_X)
# Transform labels if applicable
if self._label_transformer and estimator_name in [
"rf",
"extra_tree",
"xgboost",
"xgb_limitdepth",
"choose_xgb",
]:
eval_y_transformed = self._label_transformer.transform(eval_y)
transformed_eval_set.append((eval_X_transformed, eval_y_transformed))
else:
transformed_eval_set.append((eval_X_transformed, eval_y))
params["eval_set"] = transformed_eval_set
if self._label_transformer and estimator_name in [
"rf",
"extra_tree",


@@ -1,6 +1,6 @@
# ChaCha for Online AutoML
FLAML includes *ChaCha* which is an automatic hyperparameter tuning solution for online machine learning. Online machine learning has the following properties: (1) data comes in sequential order; and (2) the performance of the machine learning model is evaluated online, i.e., at every iteration. *ChaCha* performs online AutoML respecting the aforementioned properties of online learning, and at the same time respecting the following constraints: (1) only a small constant number of 'live' models are allowed to perform online learning at the same time; and (2) no model persistence or offline training is allowed, which means that once we decide to replace a 'live' model with a new one, the replaced model can no longer be retrieved.
For more technical details about *ChaCha*, please check our paper.


@@ -217,7 +217,24 @@ class BlendSearch(Searcher):
if global_search_alg is not None:
self._gs = global_search_alg
elif getattr(self, "__name__", None) != "CFO":
if space and self._ls.hierarchical:
# Use define-by-run for OptunaSearch when needed:
# - Hierarchical/conditional spaces are best supported via define-by-run.
# - Ray Tune domain/grid specs can trigger an "unresolved search space" warning
# unless we switch to define-by-run.
use_define_by_run = bool(getattr(self._ls, "hierarchical", False))
if (not use_define_by_run) and isinstance(space, dict) and space:
try:
from .variant_generator import parse_spec_vars
_, domain_vars, grid_vars = parse_spec_vars(space)
use_define_by_run = bool(domain_vars or grid_vars)
except Exception:
# Be conservative: if we can't determine whether the space is
# unresolved, fall back to the original behavior.
use_define_by_run = False
self._use_define_by_run = use_define_by_run
if use_define_by_run:
from functools import partial
gs_space = partial(define_by_run_func, space=space)
@@ -487,7 +504,7 @@ class BlendSearch(Searcher):
self._ls_bound_max,
self._subspace.get(trial_id, self._ls.space),
)
if self._gs is not None and self._experimental and (not self._ls.hierarchical):
if self._gs is not None and self._experimental and (not getattr(self, "_use_define_by_run", False)):
self._gs.add_evaluated_point(flatten_dict(config), objective)
# TODO: recover when supported
# converted = convert_key(config, self._gs.space)
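The define-by-run decision above can be distilled into a small standalone sketch. Here `Domain` is a hypothetical stand-in for a Ray Tune sample domain (e.g. `tune.uniform`), and `should_use_define_by_run` is an illustrative name, not FLAML's API:

```python
class Domain:
    """Hypothetical stand-in for a Ray Tune sample domain (e.g. tune.uniform)."""


def should_use_define_by_run(space, hierarchical):
    # Hierarchical/conditional spaces are best served by define-by-run.
    if hierarchical:
        return True
    # Otherwise switch only if the dict space still contains unresolved
    # domain/grid specs (mirroring the parse_spec_vars check in the diff).
    if isinstance(space, dict) and space:
        return any(
            isinstance(v, Domain) or (isinstance(v, dict) and "grid_search" in v)
            for v in space.values()
        )
    return False


assert should_use_define_by_run({"lr": Domain()}, hierarchical=False)
assert should_use_define_by_run({}, hierarchical=True)
assert not should_use_define_by_run({"lr": 0.1}, hierarchical=False)
```

Note the conservative fallback in the diff: if the unresolved-space check itself fails, the original (non-define-by-run) behavior is kept.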


@@ -641,8 +641,10 @@ class FLOW2(Searcher):
else:
# key must be in space
domain = space[key]
if self.hierarchical and not (
domain is None or type(domain) in (str, int, float) or isinstance(domain, sample.Domain)
if (
self.hierarchical
and domain is not None
and not isinstance(domain, (str, int, float, sample.Domain))
):
# not domain or hashable
# get rid of list type for hierarchical search space.
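One subtle behavioral difference in the rewritten condition: `type(domain) in (str, int, float)` rejects subclasses, while `isinstance(domain, (str, int, float, ...))` accepts them, including `bool`, which is a subclass of `int`. A stdlib-only illustration:

```python
class MyStr(str):
    """Toy subclass to show the difference between type() and isinstance()."""


x = MyStr("leaf")
assert type(x) not in (str, int, float)   # old check: subclass treated as non-primitive
assert isinstance(x, (str, int, float))   # new check: subclass treated as primitive
assert isinstance(True, int)              # bools also count as int under isinstance
```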


@@ -207,7 +207,7 @@ class ChampionFrontierSearcher(BaseSearcher):
hyperparameter_config_groups.append(partial_new_configs)
# does not have searcher_trial_ids
searcher_trial_ids_groups.append([])
elif isinstance(config_domain, Float) or isinstance(config_domain, Categorical):
elif isinstance(config_domain, (Float, Categorical)):
# otherwise we need to deal with them in group
nonpoly_config[k] = v
if k not in self._space_of_nonpoly_hp:


@@ -25,6 +25,31 @@ from .flow2 import FLOW2
logger = logging.getLogger(__name__)
def _recursive_dict_update(target: Dict, source: Dict) -> None:
"""Recursively update target dictionary with source dictionary.
Unlike dict.update(), this function merges nested dictionaries instead of
replacing them entirely. This is crucial for configurations with nested
structures (e.g., XGBoost params).
Args:
target: The dictionary to be updated (modified in place).
source: The dictionary containing values to merge into target.
Example:
>>> target = {'params': {'eta': 0.1, 'max_depth': 3}}
>>> source = {'params': {'verbosity': 0}}
>>> _recursive_dict_update(target, source)
>>> target
{'params': {'eta': 0.1, 'max_depth': 3, 'verbosity': 0}}
"""
for key, value in source.items():
if isinstance(value, dict) and key in target and isinstance(target[key], dict):
_recursive_dict_update(target[key], value)
else:
target[key] = value
class SearchThread:
"""Class of global or local search thread."""
@@ -65,7 +90,7 @@ class SearchThread:
try:
config = self._search_alg.suggest(trial_id)
if isinstance(self._search_alg._space, dict):
config.update(self._const)
_recursive_dict_update(config, self._const)
else:
# define by run
config, self.space = unflatten_hierarchical(config, self._space)
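The merge semantics of `_recursive_dict_update` above can be checked standalone; the sketch below reimplements it verbatim and contrasts it with plain `dict.update`, which would clobber the whole nested `params` dict:

```python
def recursive_dict_update(target, source):
    """Standalone copy of the merge in the diff above."""
    for key, value in source.items():
        if isinstance(value, dict) and key in target and isinstance(target[key], dict):
            recursive_dict_update(target[key], value)
        else:
            target[key] = value


config = {"params": {"eta": 0.1, "max_depth": 3}}
const = {"params": {"verbosity": 0}}

# Plain dict.update replaces the nested dict entirely, losing eta/max_depth.
shallow = {"params": dict(config["params"])}
shallow.update(const)
assert shallow["params"] == {"verbosity": 0}

# The recursive merge preserves sibling keys inside "params".
recursive_dict_update(config, const)
assert config["params"] == {"eta": 0.1, "max_depth": 3, "verbosity": 0}
```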


@@ -1 +1 @@
__version__ = "2.4.0"
__version__ = "2.5.0"


@@ -52,8 +52,8 @@ setuptools.setup(
],
"test": [
"numpy>=1.17,<2.0.0; python_version<'3.13'",
"numpy>2.0.0; python_version>='3.13'",
"jupyter; python_version<'3.13'",
"numpy>=1.17; python_version>='3.13'",
"jupyter",
"lightgbm>=2.3.1",
"xgboost>=0.90,<2.0.0; python_version<'3.11'",
"xgboost>=2.0.0; python_version>='3.11'",
@@ -68,10 +68,10 @@ setuptools.setup(
"pre-commit",
"torch",
"torchvision",
"catboost>=0.26; python_version<'3.13'",
"catboost>=0.26",
"rgf-python",
"optuna>=2.8.0,<=3.6.1",
"openml; python_version<'3.13'",
"openml",
"statsmodels>=0.12.2",
"psutil",
"dataclasses",
@@ -82,7 +82,7 @@ setuptools.setup(
"rouge_score",
"hcrystalball",
"seqeval",
"pytorch-forecasting; python_version<'3.13'",
"pytorch-forecasting",
"mlflow-skinny<=2.22.1", # Refer to https://mvnrepository.com/artifact/org.mlflow/mlflow-spark
"joblibspark>=0.5.0",
"joblib<=1.3.2",
@@ -116,14 +116,14 @@ setuptools.setup(
"scikit-learn",
],
"hf": [
"transformers[torch]==4.26",
"transformers[torch]>=4.26",
"datasets",
"nltk<=3.8.1",
"rouge_score",
"seqeval",
],
"nlp": [ # for backward compatibility; hf is the new option name
"transformers[torch]==4.26",
"transformers[torch]>=4.26",
"datasets",
"nltk<=3.8.1",
"rouge_score",
@@ -140,7 +140,7 @@ setuptools.setup(
"prophet>=1.1.5",
"statsmodels>=0.12.2",
"hcrystalball>=0.1.10",
"pytorch-forecasting>=0.10.4; python_version<'3.13'",
"pytorch-forecasting>=0.10.4",
"pytorch-lightning>=1.9.0",
"tensorboardX>=2.6",
],
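The `python_version` selectors above are standard PEP 508 environment markers, evaluated by pip at install time. Assuming the post-diff state of the numpy pins, the effect corresponds roughly to this runtime sketch:

```python
import sys

# Rough runtime equivalent of:
#   "numpy>=1.17,<2.0.0; python_version<'3.13'"
#   "numpy>=1.17;        python_version>='3.13'"
if sys.version_info < (3, 13):
    numpy_requirement = "numpy>=1.17,<2.0.0"
else:
    numpy_requirement = "numpy>=1.17"

assert numpy_requirement.startswith("numpy>=1.17")
```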


@@ -72,5 +72,39 @@ def test_custom_hp():
print(automl.best_config_per_estimator)
def test_lgbm_objective():
"""Test that objective parameter can be set via custom_hp for LGBMEstimator"""
import numpy as np
# Create a simple regression dataset
np.random.seed(42)
X_train = np.random.rand(100, 5)
y_train = np.random.rand(100) * 100 # Scale to avoid division issues with MAPE
automl = AutoML()
settings = {
"time_budget": 3,
"metric": "mape",
"task": "regression",
"estimator_list": ["lgbm"],
"verbose": 0,
"custom_hp": {"lgbm": {"objective": {"domain": "mape"}}}, # Fixed value, not tuned
}
automl.fit(X_train, y_train, **settings)
# Verify that objective was set correctly
assert "objective" in automl.best_config, "objective should be in best_config"
assert automl.best_config["objective"] == "mape", "objective should be 'mape'"
# Verify the model has the correct objective
if hasattr(automl.model, "estimator") and hasattr(automl.model.estimator, "get_params"):
model_params = automl.model.estimator.get_params()
assert model_params.get("objective") == "mape", "Model should use 'mape' objective"
print("Test passed: objective parameter works correctly with LGBMEstimator")
if __name__ == "__main__":
test_custom_hp()
test_lgbm_objective()


@@ -188,7 +188,11 @@ def _test_sparse_matrix_classification(estimator):
"n_jobs": 1,
"model_history": True,
}
X_train = scipy.sparse.random(1554, 21, dtype=int)
# NOTE: Avoid `dtype=int` here. On some NumPy/SciPy combinations (notably
# Windows + Python 3.13), `scipy.sparse.random(..., dtype=int)` may trigger
# integer sampling paths which raise "low is out of bounds for int32".
# A float sparse matrix is sufficient to validate sparse-input support.
X_train = scipy.sparse.random(1554, 21, dtype=np.float32)
y_train = np.random.randint(3, size=1554)
automl_experiment.fit(X_train=X_train, y_train=y_train, **automl_settings)


@@ -181,6 +181,49 @@ class TestMultiClass(unittest.TestCase):
}
automl.fit(X_train=X_train, y_train=y_train, **settings)
def test_ensemble_final_estimator_params_not_tuned(self):
"""Test that final_estimator parameters in ensemble are not automatically tuned.
This test verifies that when a custom final_estimator is provided with specific
parameters, those parameters are used as-is without any hyperparameter tuning.
"""
from sklearn.linear_model import LogisticRegression
automl = AutoML()
X_train, y_train = load_wine(return_X_y=True)
# Create a LogisticRegression with specific non-default parameters
custom_params = {
"C": 0.5, # Non-default value
"max_iter": 50, # Non-default value
"random_state": 42,
}
final_est = LogisticRegression(**custom_params)
settings = {
"time_budget": 5,
"estimator_list": ["rf", "lgbm"],
"task": "classification",
"ensemble": {
"final_estimator": final_est,
"passthrough": False,
},
"n_jobs": 1,
}
automl.fit(X_train=X_train, y_train=y_train, **settings)
# Verify that the final estimator in the stacker uses the exact parameters we specified
if hasattr(automl.model, "final_estimator_"):
# The model is a StackingClassifier
fitted_final_estimator = automl.model.final_estimator_
assert (
abs(fitted_final_estimator.C - custom_params["C"]) < 1e-9
), f"Expected C={custom_params['C']}, but got {fitted_final_estimator.C}"
assert (
fitted_final_estimator.max_iter == custom_params["max_iter"]
), f"Expected max_iter={custom_params['max_iter']}, but got {fitted_final_estimator.max_iter}"
print("✓ Final estimator parameters were preserved (not tuned)")
def test_dataframe(self):
self.test_classification(True)
@@ -235,6 +278,34 @@ class TestMultiClass(unittest.TestCase):
except ImportError:
pass
def test_invalid_custom_metric(self):
"""Test that a proper error is raised when a non-callable metric is passed, e.g. the result of calling a custom metric instead of the function itself."""
from sklearn.datasets import load_iris
X_train, y_train = load_iris(return_X_y=True)
# Test with non-callable metric in __init__
with self.assertRaises(ValueError) as context:
automl = AutoML(metric=123) # passing an int instead of function
self.assertIn("must be either a string or a callable function", str(context.exception))
self.assertIn("but got int", str(context.exception))
# Test with non-callable metric in fit
automl = AutoML()
with self.assertRaises(ValueError) as context:
automl.fit(X_train=X_train, y_train=y_train, metric=[], task="classification", time_budget=1)
self.assertIn("must be either a string or a callable function", str(context.exception))
self.assertIn("but got list", str(context.exception))
# Test with tuple (simulating result of calling a function that returns tuple)
with self.assertRaises(ValueError) as context:
automl = AutoML()
automl.fit(
X_train=X_train, y_train=y_train, metric=(0.5, {"loss": 0.5}), task="classification", time_budget=1
)
self.assertIn("must be either a string or a callable function", str(context.exception))
self.assertIn("but got tuple", str(context.exception))
def test_classification(self, as_frame=False):
automl_experiment = AutoML()
automl_settings = {
@@ -368,7 +439,11 @@ class TestMultiClass(unittest.TestCase):
"n_jobs": 1,
"model_history": True,
}
X_train = scipy.sparse.random(1554, 21, dtype=int)
# NOTE: Avoid `dtype=int` here. On some NumPy/SciPy combinations (notably
# Windows + Python 3.13), `scipy.sparse.random(..., dtype=int)` may trigger
# integer sampling paths which raise "low is out of bounds for int32".
# A float sparse matrix is sufficient to validate sparse-input support.
X_train = scipy.sparse.random(1554, 21, dtype=np.float32)
y_train = np.random.randint(3, size=1554)
automl_experiment.fit(X_train=X_train, y_train=y_train, **automl_settings)
print(automl_experiment.classes_)
@@ -531,6 +606,32 @@ class TestMultiClass(unittest.TestCase):
print(f"Best accuracy on validation data: {new_automl_val_accuracy:.4g}")
# print('Training duration of best run: {0:.4g} s'.format(new_automl_experiment.best_config_train_time))
def test_starting_points_should_improve_performance(self):
N = 10000 # a large N is needed to see the improvement
X_train, y_train = load_iris(return_X_y=True)
X_train = np.concatenate([X_train + 0.1 * i for i in range(N)], axis=0)
y_train = np.concatenate([y_train] * N, axis=0)
am1 = AutoML()
am1.fit(X_train, y_train, estimator_list=["lgbm"], time_budget=3, seed=11)
am2 = AutoML()
am2.fit(
X_train,
y_train,
estimator_list=["lgbm"],
time_budget=2,
seed=11,
starting_points=am1.best_config_per_estimator,
)
print(f"am1.best_loss: {am1.best_loss:.4f}")
print(f"am2.best_loss: {am2.best_loss:.4f}")
assert np.round(am2.best_loss, 4) <= np.round(
am1.best_loss, 4
), "Starting points should help improve the performance!"
if __name__ == "__main__":
unittest.main()


@@ -0,0 +1,272 @@
"""Test to ensure correct label overlap handling for classification tasks"""
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris, make_classification
from flaml import AutoML
def test_allow_label_overlap_true():
"""Test with allow_label_overlap=True (fast mode, default)"""
# Load iris dataset
dic_data = load_iris(as_frame=True)
iris_data = dic_data["frame"]
# Prepare data
x_train = iris_data[["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"]].to_numpy()
y_train = iris_data["target"]
# Train with fast mode (default)
automl = AutoML()
automl_settings = {
"max_iter": 5,
"metric": "accuracy",
"task": "classification",
"estimator_list": ["lgbm"],
"eval_method": "holdout",
"split_type": "stratified",
"keep_search_state": True,
"retrain_full": False,
"auto_augment": False,
"verbose": 0,
"allow_label_overlap": True, # Fast mode
}
automl.fit(x_train, y_train, **automl_settings)
# Check results
input_size = len(x_train)
train_size = len(automl._state.X_train)
val_size = len(automl._state.X_val)
# With stratified split on balanced data, fast mode may have no overlap
assert (
train_size + val_size >= input_size
), f"Inconsistent sizes. Input: {input_size}, Train: {train_size}, Val: {val_size}"
# Verify all classes are represented in both sets
train_labels = set(np.unique(automl._state.y_train))
val_labels = set(np.unique(automl._state.y_val))
all_labels = set(np.unique(y_train))
assert train_labels == all_labels, f"Not all labels in train. All: {all_labels}, Train: {train_labels}"
assert val_labels == all_labels, f"Not all labels in val. All: {all_labels}, Val: {val_labels}"
print(
f"✓ Test passed (fast mode): Input: {input_size}, Train: {train_size}, Val: {val_size}, "
f"Overlap: {train_size + val_size - input_size}"
)
def test_allow_label_overlap_false():
"""Test with allow_label_overlap=False (precise mode)"""
# Load iris dataset
dic_data = load_iris(as_frame=True)
iris_data = dic_data["frame"]
# Prepare data
x_train = iris_data[["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"]].to_numpy()
y_train = iris_data["target"]
# Train with precise mode
automl = AutoML()
automl_settings = {
"max_iter": 5,
"metric": "accuracy",
"task": "classification",
"estimator_list": ["lgbm"],
"eval_method": "holdout",
"split_type": "stratified",
"keep_search_state": True,
"retrain_full": False,
"auto_augment": False,
"verbose": 0,
"allow_label_overlap": False, # Precise mode
}
automl.fit(x_train, y_train, **automl_settings)
# Check that there's no overlap (or minimal overlap for single-instance classes)
input_size = len(x_train)
train_size = len(automl._state.X_train)
val_size = len(automl._state.X_val)
# Verify all classes are represented
all_labels = set(np.unique(y_train))
# Should have no overlap or minimal overlap
overlap = train_size + val_size - input_size
assert overlap <= len(all_labels), f"Excessive overlap: {overlap}"
# Verify all classes are represented
train_labels = set(np.unique(automl._state.y_train))
val_labels = set(np.unique(automl._state.y_val))
combined_labels = train_labels.union(val_labels)
assert combined_labels == all_labels, f"Not all labels present. All: {all_labels}, Combined: {combined_labels}"
print(
f"✓ Test passed (precise mode): Input: {input_size}, Train: {train_size}, Val: {val_size}, "
f"Overlap: {overlap}"
)
def test_uniform_split_with_overlap_control():
"""Test with uniform split and both overlap modes"""
# Load iris dataset
dic_data = load_iris(as_frame=True)
iris_data = dic_data["frame"]
# Prepare data
x_train = iris_data[["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"]].to_numpy()
y_train = iris_data["target"]
# Test precise mode with uniform split
automl = AutoML()
automl_settings = {
"max_iter": 5,
"metric": "accuracy",
"task": "classification",
"estimator_list": ["lgbm"],
"eval_method": "holdout",
"split_type": "uniform",
"keep_search_state": True,
"retrain_full": False,
"auto_augment": False,
"verbose": 0,
"allow_label_overlap": False, # Precise mode
}
automl.fit(x_train, y_train, **automl_settings)
input_size = len(x_train)
train_size = len(automl._state.X_train)
val_size = len(automl._state.X_val)
# Verify all classes are represented
train_labels = set(np.unique(automl._state.y_train))
val_labels = set(np.unique(automl._state.y_val))
all_labels = set(np.unique(y_train))
combined_labels = train_labels.union(val_labels)
assert combined_labels == all_labels, "Not all labels present with uniform split"
print(f"✓ Test passed (uniform split): Input: {input_size}, Train: {train_size}, Val: {val_size}")
def test_with_sample_weights():
"""Test label overlap handling with sample weights"""
# Create a simple dataset
X, y = make_classification(
n_samples=200,
n_features=10,
n_informative=5,
n_redundant=2,
n_classes=3,
n_clusters_per_class=1,
random_state=42,
)
# Create sample weights (giving more weight to some samples)
sample_weight = np.random.uniform(0.5, 2.0, size=len(y))
# Test fast mode with sample weights
automl_fast = AutoML()
automl_fast.fit(
X,
y,
task="classification",
metric="accuracy",
estimator_list=["lgbm"],
eval_method="holdout",
split_type="stratified",
max_iter=3,
keep_search_state=True,
retrain_full=False,
auto_augment=False,
verbose=0,
allow_label_overlap=True, # Fast mode
sample_weight=sample_weight,
)
# Verify all labels present
train_labels_fast = set(np.unique(automl_fast._state.y_train))
val_labels_fast = set(np.unique(automl_fast._state.y_val))
all_labels = set(np.unique(y))
assert train_labels_fast == all_labels, "Not all labels in train (fast mode with weights)"
assert val_labels_fast == all_labels, "Not all labels in val (fast mode with weights)"
# Test precise mode with sample weights
automl_precise = AutoML()
automl_precise.fit(
X,
y,
task="classification",
metric="accuracy",
estimator_list=["lgbm"],
eval_method="holdout",
split_type="stratified",
max_iter=3,
keep_search_state=True,
retrain_full=False,
auto_augment=False,
verbose=0,
allow_label_overlap=False, # Precise mode
sample_weight=sample_weight,
)
# Verify all labels present
train_labels_precise = set(np.unique(automl_precise._state.y_train))
val_labels_precise = set(np.unique(automl_precise._state.y_val))
combined_labels = train_labels_precise.union(val_labels_precise)
assert combined_labels == all_labels, "Not all labels present (precise mode with weights)"
print("✓ Test passed with sample weights (fast and precise modes)")
def test_single_instance_class():
"""Test handling of single-instance classes"""
# Create imbalanced dataset where one class has only 1 instance
X = np.random.randn(50, 4)
y = np.array([0] * 40 + [1] * 9 + [2] * 1) # Class 2 has only 1 instance
# Test precise mode - should add single instance to both sets
automl = AutoML()
automl.fit(
X,
y,
task="classification",
metric="accuracy",
estimator_list=["lgbm"],
eval_method="holdout",
split_type="uniform",
max_iter=3,
keep_search_state=True,
retrain_full=False,
auto_augment=False,
verbose=0,
allow_label_overlap=False, # Precise mode
)
# Verify all labels present
train_labels = set(np.unique(automl._state.y_train))
val_labels = set(np.unique(automl._state.y_val))
all_labels = set(np.unique(y))
# Single-instance class should be in both sets
combined_labels = train_labels.union(val_labels)
assert combined_labels == all_labels, "Not all labels present with single-instance class"
# Check that single-instance class (label 2) is in both sets
assert 2 in train_labels, "Single-instance class not in train"
assert 2 in val_labels, "Single-instance class not in val"
print("✓ Test passed with single-instance class")
if __name__ == "__main__":
test_allow_label_overlap_true()
test_allow_label_overlap_false()
test_uniform_split_with_overlap_control()
test_with_sample_weights()
test_single_instance_class()
print("\n✓ All tests passed!")
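The invariants these tests assert (every label present on both sides, overlap bounded by the number of classes, single-instance classes duplicated into both sets) can be illustrated with a stdlib-only splitter. All names here are illustrative, not FLAML's implementation:

```python
import random


def holdout_with_label_coverage(y, val_frac=0.2, seed=0):
    """Sketch: split indices so every label appears in both train and val.

    A label missing from one side is covered by duplicating one of its
    indices (overlap); each label adds at most one duplicate, which mirrors
    the `overlap <= len(all_labels)` assertion in the tests above.
    """
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    n_val = max(1, int(len(y) * val_frac))
    val, train = idx[:n_val], idx[n_val:]
    for side, other in ((train, val), (val, train)):
        labels_here = {y[i] for i in side}
        for i in other:
            if y[i] not in labels_here:
                side.append(i)          # duplicate index -> overlap
                labels_here.add(y[i])
    return train, val


y = [0] * 40 + [1] * 9 + [2] * 1        # class 2 has a single instance
train, val = holdout_with_label_coverage(y)
overlap = len(train) + len(val) - len(y)
assert overlap <= len(set(y))
assert {y[i] for i in train} == {y[i] for i in val} == {0, 1, 2}
```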


@@ -0,0 +1,236 @@
"""Tests for the public preprocessor APIs."""
import unittest
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer, load_diabetes
from flaml import AutoML
class TestPreprocessAPI(unittest.TestCase):
"""Test cases for the public preprocess() API methods."""
def test_automl_preprocess_before_fit(self):
"""Test that calling preprocess before fit raises an error."""
automl = AutoML()
X_test = np.array([[1, 2, 3], [4, 5, 6]])
with self.assertRaises(AttributeError) as context:
automl.preprocess(X_test)
# Check that an error is raised about not being fitted
self.assertIn("fit()", str(context.exception))
def test_automl_preprocess_classification(self):
"""Test task-level preprocessing for classification."""
# Load dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, y_train = X[:400], y[:400]
X_test = X[400:450]
# Train AutoML
automl = AutoML()
automl_settings = {
"max_iter": 5,
"task": "classification",
"metric": "accuracy",
"estimator_list": ["lgbm"],
"verbose": 0,
}
automl.fit(X_train, y_train, **automl_settings)
# Test task-level preprocessing
X_preprocessed = automl.preprocess(X_test)
# Verify the output is not None and has the right shape
self.assertIsNotNone(X_preprocessed)
self.assertEqual(X_preprocessed.shape[0], X_test.shape[0])
def test_automl_preprocess_regression(self):
"""Test task-level preprocessing for regression."""
# Load dataset
X, y = load_diabetes(return_X_y=True)
X_train, y_train = X[:300], y[:300]
X_test = X[300:350]
# Train AutoML
automl = AutoML()
automl_settings = {
"max_iter": 5,
"task": "regression",
"metric": "r2",
"estimator_list": ["lgbm"],
"verbose": 0,
}
automl.fit(X_train, y_train, **automl_settings)
# Test task-level preprocessing
X_preprocessed = automl.preprocess(X_test)
# Verify the output
self.assertIsNotNone(X_preprocessed)
self.assertEqual(X_preprocessed.shape[0], X_test.shape[0])
def test_automl_preprocess_with_dataframe(self):
"""Test task-level preprocessing with pandas DataFrame."""
# Create a simple dataset
X_train = pd.DataFrame(
{
"feature1": [1, 2, 3, 4, 5] * 20,
"feature2": [5, 4, 3, 2, 1] * 20,
"category": ["a", "b", "a", "b", "a"] * 20,
}
)
y_train = pd.Series([0, 1, 0, 1, 0] * 20)
X_test = pd.DataFrame(
{
"feature1": [6, 7, 8],
"feature2": [1, 2, 3],
"category": ["a", "b", "a"],
}
)
# Train AutoML
automl = AutoML()
automl_settings = {
"max_iter": 5,
"task": "classification",
"metric": "accuracy",
"estimator_list": ["lgbm"],
"verbose": 0,
}
automl.fit(X_train, y_train, **automl_settings)
# Test preprocessing
X_preprocessed = automl.preprocess(X_test)
# Verify the output - check the number of rows matches
self.assertIsNotNone(X_preprocessed)
preprocessed_len = len(X_preprocessed) if hasattr(X_preprocessed, "__len__") else X_preprocessed.shape[0]
self.assertEqual(preprocessed_len, len(X_test))
def test_estimator_preprocess(self):
"""Test estimator-level preprocessing."""
# Load dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, y_train = X[:400], y[:400]
X_test = X[400:450]
# Train AutoML
automl = AutoML()
automl_settings = {
"max_iter": 5,
"task": "classification",
"metric": "accuracy",
"estimator_list": ["lgbm"],
"verbose": 0,
}
automl.fit(X_train, y_train, **automl_settings)
# Get the trained estimator
estimator = automl.model
self.assertIsNotNone(estimator)
# First apply task-level preprocessing
X_task_preprocessed = automl.preprocess(X_test)
# Then apply estimator-level preprocessing
X_estimator_preprocessed = estimator.preprocess(X_task_preprocessed)
# Verify the output
self.assertIsNotNone(X_estimator_preprocessed)
self.assertEqual(X_estimator_preprocessed.shape[0], X_test.shape[0])
def test_preprocess_pipeline(self):
"""Test the complete preprocessing pipeline (task-level then estimator-level)."""
# Load dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, y_train = X[:400], y[:400]
X_test = X[400:450]
# Train AutoML
automl = AutoML()
automl_settings = {
"max_iter": 5,
"task": "classification",
"metric": "accuracy",
"estimator_list": ["lgbm"],
"verbose": 0,
}
automl.fit(X_train, y_train, **automl_settings)
# Apply the complete preprocessing pipeline
X_task_preprocessed = automl.preprocess(X_test)
X_final = automl.model.preprocess(X_task_preprocessed)
# Verify predictions work with preprocessed data
# The internal predict already does this preprocessing,
# but we verify our manual preprocessing gives consistent results
y_pred_manual = automl.model._model.predict(X_final)
y_pred_auto = automl.predict(X_test)
# Both should give the same predictions
np.testing.assert_array_equal(y_pred_manual, y_pred_auto)
def test_preprocess_with_mixed_types(self):
"""Test preprocessing with mixed data types."""
# Create dataset with mixed types
X_train = pd.DataFrame(
{
"numeric1": np.random.rand(100),
"numeric2": np.random.randint(0, 100, 100),
"categorical": np.random.choice(["cat", "dog", "bird"], 100),
"boolean": np.random.choice([True, False], 100),
}
)
y_train = pd.Series(np.random.randint(0, 2, 100))
X_test = pd.DataFrame(
{
"numeric1": np.random.rand(10),
"numeric2": np.random.randint(0, 100, 10),
"categorical": np.random.choice(["cat", "dog", "bird"], 10),
"boolean": np.random.choice([True, False], 10),
}
)
# Train AutoML
automl = AutoML()
automl_settings = {
"max_iter": 5,
"task": "classification",
"metric": "accuracy",
"estimator_list": ["lgbm"],
"verbose": 0,
}
automl.fit(X_train, y_train, **automl_settings)
# Test preprocessing
X_preprocessed = automl.preprocess(X_test)
# Verify the output
self.assertIsNotNone(X_preprocessed)
def test_estimator_preprocess_without_automl(self):
"""Test that estimator.preprocess() can be used independently."""
from flaml.automl.model import LGBMEstimator
# Create a simple estimator
X_train = np.random.rand(100, 5)
y_train = np.random.randint(0, 2, 100)
estimator = LGBMEstimator(task="classification")
estimator.fit(X_train, y_train)
# Test preprocessing
X_test = np.random.rand(10, 5)
X_preprocessed = estimator.preprocess(X_test)
# Verify the output
self.assertIsNotNone(X_preprocessed)
self.assertEqual(X_preprocessed.shape, X_test.shape)
if __name__ == "__main__":
unittest.main()


@@ -130,7 +130,7 @@ class TestRegression(unittest.TestCase):
)
automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **settings)
def test_parallel(self, hpo_method=None):
def test_parallel_and_pickle(self, hpo_method=None):
automl_experiment = AutoML()
automl_settings = {
"time_budget": 10,
@@ -153,6 +153,18 @@ class TestRegression(unittest.TestCase):
except ImportError:
return
# test pickle and load_pickle, should work for prediction
automl_experiment.pickle("automl_xgboost_spark.pkl")
automl_loaded = AutoML().load_pickle("automl_xgboost_spark.pkl")
assert automl_loaded.best_estimator == automl_experiment.best_estimator
assert automl_loaded.best_loss == automl_experiment.best_loss
automl_loaded.predict(X_train)
import shutil
shutil.rmtree("automl_xgboost_spark.pkl", ignore_errors=True)
shutil.rmtree("automl_xgboost_spark.pkl.flaml_artifacts", ignore_errors=True)
def test_sparse_matrix_regression_holdout(self):
X_train = scipy.sparse.random(8, 100)
y_train = np.random.uniform(size=8)


@@ -0,0 +1,89 @@
"""Test sklearn 1.7+ compatibility for estimator type detection.
This test ensures that FLAML estimators are properly recognized as
regressors or classifiers by sklearn's is_regressor() and is_classifier()
functions, which is required for sklearn 1.7+ ensemble methods.
"""
import pytest
from sklearn.base import is_classifier, is_regressor
from flaml.automl.model import (
ExtraTreesEstimator,
LGBMEstimator,
RandomForestEstimator,
XGBoostSklearnEstimator,
)
def test_extra_trees_regressor_type():
"""Test that ExtraTreesEstimator with regression task is recognized as regressor."""
est = ExtraTreesEstimator(task="regression")
assert is_regressor(est), "ExtraTreesEstimator(task='regression') should be recognized as a regressor"
assert not is_classifier(est), "ExtraTreesEstimator(task='regression') should not be recognized as a classifier"
def test_extra_trees_classifier_type():
"""Test that ExtraTreesEstimator with classification task is recognized as classifier."""
est = ExtraTreesEstimator(task="binary")
assert is_classifier(est), "ExtraTreesEstimator(task='binary') should be recognized as a classifier"
assert not is_regressor(est), "ExtraTreesEstimator(task='binary') should not be recognized as a regressor"
est = ExtraTreesEstimator(task="multiclass")
assert is_classifier(est), "ExtraTreesEstimator(task='multiclass') should be recognized as a classifier"
assert not is_regressor(est), "ExtraTreesEstimator(task='multiclass') should not be recognized as a regressor"
def test_random_forest_regressor_type():
"""Test that RandomForestEstimator with regression task is recognized as regressor."""
est = RandomForestEstimator(task="regression")
assert is_regressor(est), "RandomForestEstimator(task='regression') should be recognized as a regressor"
assert not is_classifier(est), "RandomForestEstimator(task='regression') should not be recognized as a classifier"
def test_random_forest_classifier_type():
"""Test that RandomForestEstimator with classification task is recognized as classifier."""
est = RandomForestEstimator(task="binary")
assert is_classifier(est), "RandomForestEstimator(task='binary') should be recognized as a classifier"
assert not is_regressor(est), "RandomForestEstimator(task='binary') should not be recognized as a regressor"
def test_lgbm_regressor_type():
"""Test that LGBMEstimator with regression task is recognized as regressor."""
est = LGBMEstimator(task="regression")
assert is_regressor(est), "LGBMEstimator(task='regression') should be recognized as a regressor"
assert not is_classifier(est), "LGBMEstimator(task='regression') should not be recognized as a classifier"
def test_lgbm_classifier_type():
"""Test that LGBMEstimator with classification task is recognized as classifier."""
est = LGBMEstimator(task="binary")
assert is_classifier(est), "LGBMEstimator(task='binary') should be recognized as a classifier"
assert not is_regressor(est), "LGBMEstimator(task='binary') should not be recognized as a regressor"
def test_xgboost_regressor_type():
"""Test that XGBoostSklearnEstimator with regression task is recognized as regressor."""
est = XGBoostSklearnEstimator(task="regression")
assert is_regressor(est), "XGBoostSklearnEstimator(task='regression') should be recognized as a regressor"
assert not is_classifier(est), "XGBoostSklearnEstimator(task='regression') should not be recognized as a classifier"
def test_xgboost_classifier_type():
"""Test that XGBoostSklearnEstimator with classification task is recognized as classifier."""
est = XGBoostSklearnEstimator(task="binary")
assert is_classifier(est), "XGBoostSklearnEstimator(task='binary') should be recognized as a classifier"
assert not is_regressor(est), "XGBoostSklearnEstimator(task='binary') should not be recognized as a regressor"
if __name__ == "__main__":
# Run all tests
test_extra_trees_regressor_type()
test_extra_trees_classifier_type()
test_random_forest_regressor_type()
test_random_forest_classifier_type()
test_lgbm_regressor_type()
test_lgbm_classifier_type()
test_xgboost_regressor_type()
test_xgboost_classifier_type()
print("All sklearn 1.7+ compatibility tests passed!")
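Background for these tests: `is_regressor`/`is_classifier` historically duck-typed on the `_estimator_type` attribute, and sklearn 1.6+ moved to the `__sklearn_tags__` protocol, which the fix in this release supplies. A stdlib-only sketch of the older attribute-based convention (`_estimator_type` is the real sklearn name; `is_regressor_compat` and `TaskEstimator` are hypothetical):

```python
def is_regressor_compat(est):
    """Hypothetical re-creation of the pre-1.6 attribute-based check."""
    return getattr(est, "_estimator_type", None) == "regressor"


class TaskEstimator:
    """Toy estimator whose sklearn type follows its task, like FLAML's wrappers."""

    def __init__(self, task):
        self._estimator_type = "regressor" if task == "regression" else "classifier"


assert is_regressor_compat(TaskEstimator("regression"))
assert not is_regressor_compat(TaskEstimator("binary"))
```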


@@ -183,6 +183,8 @@ def test_lgbm():
def test_xgboost():
import numpy as np
from flaml.default import XGBClassifier, XGBRegressor
X_train, y_train = load_breast_cancer(return_X_y=True, as_frame=True)
@@ -200,6 +202,65 @@ def test_xgboost():
regressor.predict(X_train)
print(regressor)
# Test eval_set with categorical features (Issue: eval_set not preprocessed)
np.random.seed(42)
n = 500
df = pd.DataFrame(
{
"num1": np.random.randn(n),
"num2": np.random.rand(n) * 10,
"cat1": np.random.choice(["A", "B", "C"], size=n),
"cat2": np.random.choice(["X", "Y"], size=n),
"target": np.random.choice([0, 1], size=n),
}
)
X = df.drop(columns="target")
y = df["target"]
X_train_cat, X_valid_cat, y_train_cat, y_valid_cat = train_test_split(X, y, test_size=0.2, random_state=0)
# Convert categorical columns to pandas 'category' dtype
for col in X_train_cat.select_dtypes(include="object").columns:
X_train_cat[col] = X_train_cat[col].astype("category")
X_valid_cat[col] = X_valid_cat[col].astype("category")
# Test XGBClassifier with eval_set
classifier_eval = XGBClassifier(
tree_method="hist",
enable_categorical=True,
eval_metric="logloss",
use_label_encoder=False,
early_stopping_rounds=10,
random_state=0,
n_estimators=10,
)
classifier_eval.fit(X_train_cat, y_train_cat, eval_set=[(X_valid_cat, y_valid_cat)], verbose=False)
y_pred = classifier_eval.predict(X_valid_cat)
assert len(y_pred) == len(y_valid_cat)
# Test XGBRegressor with eval_set
y_reg = df["num1"] # Use num1 as target for regression
X_reg = df.drop(columns=["num1", "target"])
X_train_reg, X_valid_reg, y_train_reg, y_valid_reg = train_test_split(X_reg, y_reg, test_size=0.2, random_state=0)
for col in X_train_reg.select_dtypes(include="object").columns:
X_train_reg[col] = X_train_reg[col].astype("category")
X_valid_reg[col] = X_valid_reg[col].astype("category")
regressor_eval = XGBRegressor(
tree_method="hist",
enable_categorical=True,
eval_metric="rmse",
early_stopping_rounds=10,
random_state=0,
n_estimators=10,
)
regressor_eval.fit(X_train_reg, y_train_reg, eval_set=[(X_valid_reg, y_valid_reg)], verbose=False)
y_pred = regressor_eval.predict(X_valid_reg)
assert len(y_pred) == len(y_valid_reg)
def test_nobudget():
X_train, y_train = load_breast_cancer(return_X_y=True, as_frame=True)


@@ -165,7 +165,7 @@ def test_spark_synapseml_rank():
_test_spark_synapseml_lightgbm(spark, "rank")
def test_spark_input_df():
def test_spark_input_df_and_pickle():
import pandas as pd
file_url = "https://mmlspark.blob.core.windows.net/publicwasb/company_bankruptcy_prediction_data.csv"
@@ -201,6 +201,19 @@ def test_spark_input_df():
**settings,
)
# test pickle and load_pickle, should work for prediction
automl.pickle("automl_spark.pkl")
automl_loaded = AutoML().load_pickle("automl_spark.pkl")
assert automl_loaded.best_estimator == automl.best_estimator
assert automl_loaded.best_loss == automl.best_loss
automl_loaded.predict(df)
automl_loaded.model.estimator.transform(test_data)
import shutil
shutil.rmtree("automl_spark.pkl", ignore_errors=True)
shutil.rmtree("automl_spark.pkl.flaml_artifacts", ignore_errors=True)
if estimator_list == ["rf_spark"]:
return
@@ -393,13 +406,13 @@ def test_auto_convert_dtypes_spark():
if __name__ == "__main__":
test_spark_synapseml_classification()
test_spark_synapseml_regression()
test_spark_synapseml_rank()
test_spark_input_df()
test_get_random_dataframe()
test_auto_convert_dtypes_pandas()
test_auto_convert_dtypes_spark()
# test_spark_synapseml_classification()
# test_spark_synapseml_regression()
# test_spark_synapseml_rank()
test_spark_input_df_and_pickle()
# test_get_random_dataframe()
# test_auto_convert_dtypes_pandas()
# test_auto_convert_dtypes_spark()
# import cProfile
# import pstats


@@ -28,10 +28,10 @@ skip_spark = not spark_available
pytestmark = [pytest.mark.skipif(skip_spark, reason="Spark is not installed. Skip all spark tests."), pytest.mark.spark]
def test_parallel_xgboost(hpo_method=None, data_size=1000):
def test_parallel_xgboost_and_pickle(hpo_method=None, data_size=1000):
automl_experiment = AutoML()
automl_settings = {
"time_budget": 10,
"time_budget": 30,
"metric": "ap",
"task": "classification",
"log_file_name": "test/sparse_classification.log",
@@ -53,15 +53,27 @@ def test_parallel_xgboost(hpo_method=None, data_size=1000):
print(automl_experiment.best_iteration)
print(automl_experiment.best_estimator)
# test pickle and load_pickle, should work for prediction
automl_experiment.pickle("automl_xgboost_spark.pkl")
automl_loaded = AutoML().load_pickle("automl_xgboost_spark.pkl")
assert automl_loaded.best_estimator == automl_experiment.best_estimator
assert automl_loaded.best_loss == automl_experiment.best_loss
automl_loaded.predict(X_train)
import shutil
shutil.rmtree("automl_xgboost_spark.pkl", ignore_errors=True)
shutil.rmtree("automl_xgboost_spark.pkl.flaml_artifacts", ignore_errors=True)
def test_parallel_xgboost_others():
# use random search as the hpo_method
test_parallel_xgboost(hpo_method="random")
test_parallel_xgboost_and_pickle(hpo_method="random")
@pytest.mark.skip(reason="currently not supporting too large data, will support spark dataframe in the future")
def test_large_dataset():
test_parallel_xgboost(data_size=90000000)
test_parallel_xgboost_and_pickle(data_size=90000000)
@pytest.mark.skipif(
@@ -95,10 +107,10 @@ def test_custom_learner(data_size=1000):
if __name__ == "__main__":
test_parallel_xgboost()
test_parallel_xgboost_others()
# test_large_dataset()
if skip_my_learner:
print("please run pytest in the root directory of FLAML, i.e., the directory that contains the setup.py file")
else:
test_custom_learner()
test_parallel_xgboost_and_pickle()
# test_parallel_xgboost_others()
# # test_large_dataset()
# if skip_my_learner:
# print("please run pytest in the root directory of FLAML, i.e., the directory that contains the setup.py file")
# else:
# test_custom_learner()


@@ -262,7 +262,11 @@ class TestMultiClass(unittest.TestCase):
"n_concurrent_trials": 2,
"use_spark": True,
}
X_train = scipy.sparse.random(1554, 21, dtype=int)
# NOTE: Avoid `dtype=int` here. On some NumPy/SciPy combinations (notably
# Windows + Python 3.13), `scipy.sparse.random(..., dtype=int)` may trigger
# integer sampling paths which raise "low is out of bounds for int32".
# A float sparse matrix is sufficient to validate sparse-input support.
X_train = scipy.sparse.random(1554, 21, dtype=np.float32)
y_train = np.random.randint(3, size=1554)
automl_experiment.fit(X_train=X_train, y_train=y_train, **automl_settings)
print(automl_experiment.classes_)


@@ -0,0 +1,99 @@
"""Tests for SearchThread nested dictionary update fix."""
import pytest
from flaml.tune.searcher.search_thread import _recursive_dict_update
def test_recursive_dict_update_simple():
"""Test simple non-nested dictionary update."""
target = {"a": 1, "b": 2}
source = {"c": 3}
_recursive_dict_update(target, source)
assert target == {"a": 1, "b": 2, "c": 3}
def test_recursive_dict_update_override():
"""Test that source values override target values for non-dict values."""
target = {"a": 1, "b": 2}
source = {"b": 3}
_recursive_dict_update(target, source)
assert target == {"a": 1, "b": 3}
def test_recursive_dict_update_nested():
"""Test nested dictionary merge (the main use case for XGBoost params)."""
target = {
"num_boost_round": 10,
"params": {
"max_depth": 12,
"eta": 0.020168455186106736,
"min_child_weight": 1.4504723523894132,
"scale_pos_weight": 3.794258636185337,
"gamma": 0.4985070123025904,
},
}
source = {
"params": {
"verbosity": 3,
"booster": "gbtree",
"eval_metric": "auc",
"tree_method": "hist",
"objective": "binary:logistic",
}
}
_recursive_dict_update(target, source)
# Check that sampled params are preserved
assert target["params"]["max_depth"] == 12
assert target["params"]["eta"] == 0.020168455186106736
assert target["params"]["min_child_weight"] == 1.4504723523894132
assert target["params"]["scale_pos_weight"] == 3.794258636185337
assert target["params"]["gamma"] == 0.4985070123025904
# Check that const params are added
assert target["params"]["verbosity"] == 3
assert target["params"]["booster"] == "gbtree"
assert target["params"]["eval_metric"] == "auc"
assert target["params"]["tree_method"] == "hist"
assert target["params"]["objective"] == "binary:logistic"
# Check top-level param is preserved
assert target["num_boost_round"] == 10
def test_recursive_dict_update_deeply_nested():
"""Test deeply nested dictionary merge."""
target = {"a": {"b": {"c": 1, "d": 2}}}
source = {"a": {"b": {"e": 3}}}
_recursive_dict_update(target, source)
assert target == {"a": {"b": {"c": 1, "d": 2, "e": 3}}}
def test_recursive_dict_update_mixed_types():
"""Test that non-dict values in source replace dict values in target."""
target = {"a": {"b": 1}}
source = {"a": 2}
_recursive_dict_update(target, source)
assert target == {"a": 2}
def test_recursive_dict_update_empty_dicts():
"""Test with empty dictionaries."""
target = {}
source = {"a": 1}
_recursive_dict_update(target, source)
assert target == {"a": 1}
target = {"a": 1}
source = {}
_recursive_dict_update(target, source)
assert target == {"a": 1}
def test_recursive_dict_update_none_values():
"""Test that None values are properly handled."""
target = {"a": 1, "b": None}
source = {"b": 2, "c": None}
_recursive_dict_update(target, source)
assert target == {"a": 1, "b": 2, "c": None}


@@ -324,3 +324,26 @@ def test_no_optuna():
import flaml.tune.searcher.suggestion
subprocess.check_call([sys.executable, "-m", "pip", "install", "optuna==2.8.0"])
def test_unresolved_search_space(caplog):
import logging
from flaml import tune
from flaml.tune.searcher.blendsearch import BlendSearch
if caplog is not None:
caplog.set_level(logging.INFO)
BlendSearch(metric="loss", mode="min", space={"lr": tune.uniform(0.001, 0.1), "depth": tune.randint(1, 10)})
try:
text = caplog.text
except AttributeError:
text = ""
assert (
"unresolved search space" not in text and text
), "BlendSearch should not produce warning about unresolved search space"
if __name__ == "__main__":
test_unresolved_search_space(None)


@@ -4,7 +4,7 @@
**Date and Time**: 09.09.2024, 15:30-17:00
Location: Sorbonne University, 4 place Jussieu, 75005 Paris
Duration: 1.5 hours


@@ -4,7 +4,7 @@
**Date and Time**: 04-26, 09:00-10:30 PT.
Location: Microsoft Conference Center, Seattle, WA.
Duration: 1.5 hours


@@ -0,0 +1,159 @@
# Best Practices
This page collects practical guidance for using FLAML effectively across common tasks.
## General tips
- Start simple: set `task`, `time_budget`, and keep `metric="auto"` unless you have a strong reason to override.
- Prefer correct splits: ensure your evaluation strategy matches your data (time series vs i.i.d., grouped data, etc.).
- Keep estimator lists explicit when debugging: start with a small `estimator_list` and expand.
- Use built-in discovery helpers to avoid stale hardcoded lists:
```python
from flaml import AutoML
from flaml.automl.task.factory import task_factory
automl = AutoML()
print("Built-in sklearn metrics:", sorted(automl.supported_metrics[0]))
print(
"classification estimators:",
sorted(task_factory("classification").estimators.keys()),
)
```
## Classification
- **Metric**: for binary classification, `metric="roc_auc"` is common; for multiclass, `metric="log_loss"` is often robust.
- **Imbalanced data**:
- pass `sample_weight` to `AutoML.fit()`;
- consider setting class weights via `custom_hp` / `fit_kwargs_by_estimator` for specific estimators (see [FAQ](FAQ)).
- **Probability vs label metrics**: use `roc_auc` / `log_loss` when you care about calibrated probabilities.
- **Label overlap control** (holdout evaluation only):
- By default, FLAML uses a fast strategy (`allow_label_overlap=True`) that ensures all labels are present in both training and validation sets by adding missing labels' first instances to both sets. This is efficient but may create minor overlap.
- For strict no-overlap validation, use `allow_label_overlap=False`. This slower but more precise strategy intelligently re-splits multi-instance classes to avoid overlap while maintaining label completeness.
```python
from flaml import AutoML
# Fast version (default): allows overlap for efficiency
automl_fast = AutoML()
automl_fast.fit(
X_train,
y_train,
task="classification",
eval_method="holdout",
allow_label_overlap=True,
) # default
# Precise version: avoids overlap when possible
automl_precise = AutoML()
automl_precise.fit(
X_train,
y_train,
task="classification",
eval_method="holdout",
allow_label_overlap=False,
) # slower but more precise
```
Note: This only affects holdout evaluation. CV and custom validation sets are unaffected.
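For the imbalanced-data tip above, inverse-frequency sample weights can be computed in plain Python before passing them to `AutoML.fit(..., sample_weight=...)` (a minimal sketch; `sklearn.utils.class_weight.compute_sample_weight("balanced", y)` is an equivalent shortcut):

```python
from collections import Counter


def balanced_sample_weight(y):
    """Weight each sample by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return [n / (k * counts[label]) for label in y]


y = [0, 0, 0, 1]  # 3:1 imbalance: minority samples get 3x the weight
print(balanced_sample_weight(y))  # [0.666..., 0.666..., 0.666..., 2.0]
```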
## Regression
- **Default metric**: `metric="r2"` (minimizes `1 - r2`).
- If your target scale matters (e.g., dollar error), consider `mae`/`rmse`.
## Learning to rank
- Use `task="rank"` with group information (`groups` / `groups_val`) so metrics like `ndcg` and `ndcg@k` are meaningful.
- If you pass `metric="ndcg@10"`, also pass `groups` so FLAML can compute group-aware NDCG.
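To see why `groups` matter, here is what a group-aware `ndcg@k` computes for a single query group (a plain-Python sketch of the standard formula, not FLAML's internal implementation):

```python
import math


def ndcg_at_k(relevances, k):
    """NDCG@k for one query: DCG of the predicted ranking over the ideal DCG."""

    def dcg(rels):
        # Standard discounted cumulative gain with log2 position discount.
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Items listed in predicted order; values are true relevance labels.
print(ndcg_at_k([3, 2, 0, 1], k=4))
```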
## Time series forecasting
- Use time-aware splitting. For holdout validation, set `eval_method="holdout"` and use a time-ordered dataset.
- Prefer supplying a DataFrame with a clear time column when possible.
- Some time-series estimators require optional dependencies. To list what is available in your environment:
```python
from flaml.automl.task.factory import task_factory
print("forecast:", sorted(task_factory("forecast").estimators.keys()))
```
## NLP (Transformers)
- Install the optional dependency: `pip install "flaml[hf]"`.
- When you provide a custom metric, ensure it returns `(metric_to_minimize, metrics_to_log)` with stable keys.
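A custom metric following that contract might look like this (a sketch only; the exact arguments FLAML passes depend on the task, so treat the parameter names as illustrative):

```python
def custom_metric(
    X_val, y_val, estimator, labels,
    X_train, y_train, weight_val=None, weight_train=None,
    *args, **kwargs,
):
    """Return (metric_to_minimize, metrics_to_log) with stable keys."""
    # Toy loss for illustration: fraction of misclassified validation rows.
    y_pred = estimator.predict(X_val)
    error = sum(p != y for p, y in zip(y_pred, y_val)) / len(y_val)
    # First element is minimized; the dict is logged under stable keys.
    return error, {"val_error": error, "n_val": len(y_val)}
```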
## Speed, stability, and tricky settings
- **Time budget vs convergence**: if you see warnings about not all estimators converging, increase `time_budget` or reduce `estimator_list`.
- **Memory pressure / OOM**:
- set `free_mem_ratio` (e.g., `0.2`) to keep free memory above a threshold;
- set `model_history=False` to reduce stored artifacts;
- **Reproducibility**: set `seed` and keep `n_jobs` fixed; expect some runtime variance.
## Persisting models
FLAML supports **both** MLflow logging and pickle-based persistence. For production deployment, MLflow logging is typically the most important option because it plugs into the MLflow ecosystem (tracking, model registry, serving, governance). For quick local reuse, persisting the whole `AutoML` object via pickle is often the most convenient.
### Option 1: MLflow logging (recommended for production)
When you run `AutoML.fit()` inside an MLflow run, FLAML can log metrics/params automatically (disable via `mlflow_logging=False` if needed). To persist the trained `AutoML` object as a model artifact and reuse MLflow tooling end-to-end:
```python
import mlflow
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from flaml import AutoML
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
automl = AutoML()
mlflow.set_experiment("flaml")
with mlflow.start_run(run_name="flaml_run") as run:
automl.fit(X_train, y_train, task="classification", time_budget=3)
run_id = run.info.run_id
# Later (or in a different process)
automl2 = mlflow.sklearn.load_model(f"runs:/{run_id}/model")
assert np.array_equal(automl2.predict(X_test), automl.predict(X_test))
```
### Option 2: Pickle the full `AutoML` instance (convenient)
Pickling stores the *entire* `AutoML` instance (not just the best estimator). This is useful when you prefer not to rely on MLflow or when you want to reuse additional attributes of the AutoML object without retraining.
In Microsoft Fabric scenarios, these additional attributes are particularly important for re-plotting visualization figures without retraining the model.
```python
import mlflow
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from flaml import AutoML
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)
automl = AutoML()
mlflow.set_experiment("flaml")
with mlflow.start_run(run_name="flaml_run") as run:
automl.fit(X_train, y_train, task="classification", time_budget=3)
automl.pickle("automl.pkl")
automl2 = AutoML.load_pickle("automl.pkl")
assert np.array_equal(automl2.predict(X_test), automl.predict(X_test))
assert automl.best_config == automl2.best_config
assert automl.best_loss == automl2.best_loss
assert automl.mlflow_integration.infos == automl2.mlflow_integration.infos
```
See also: [Task-Oriented AutoML](Use-Cases/Task-Oriented-AutoML) and [FAQ](FAQ).


@@ -49,7 +49,7 @@ print(flaml.__version__)
```
- Please ensure all **code snippets and error messages are formatted in
appropriate code blocks**. See [Creating and highlighting code blocks](https://help.github.com/articles/creating-and-highlighting-code-blocks)
for more details.
## Becoming a Reviewer
@@ -62,10 +62,10 @@ There is currently no formal reviewer solicitation process. Current reviewers id
```bash
git clone https://github.com/microsoft/FLAML.git
pip install -e FLAML[notebook,autogen]
pip install -e ".[notebook]"
```
In case the `pip install` command fails, try escaping the brackets such as `pip install -e FLAML\[notebook,autogen\]`.
In case the `pip install` command fails, try escaping the brackets such as `pip install -e .\[notebook\]`.
### Docker
@@ -88,7 +88,7 @@ Run `pre-commit install` to install pre-commit into your git hooks. Before you c
### Coverage
Any code you commit should not decrease coverage. To run all unit tests, install the \[test\] option under FLAML/:
Any code you commit should not decrease coverage. To run all unit tests, install the [test] option under FLAML/:
```bash
pip install -e ".[test]"


@@ -2,7 +2,7 @@
### Prerequisites
Install the \[automl\] option.
Install the [automl] option.
```bash
pip install "flaml[automl]"


@@ -2,7 +2,7 @@
### Requirements
This example requires GPU. Install the \[automl,hf\] option:
This example requires GPU. Install the [automl,hf] option:
```bash
pip install "flaml[automl,hf]"


@@ -2,7 +2,7 @@
### Prerequisites
Install the \[automl\] option.
Install the [automl] option.
```bash
pip install "flaml[automl]"


@@ -2,7 +2,7 @@
### Prerequisites
Install the \[automl\] option.
Install the [automl] option.
```bash
pip install "flaml[automl]"


@@ -2,12 +2,31 @@
### Prerequisites
Install the \[automl,ts_forecast\] option.
Install the [automl,ts_forecast] option.
```bash
pip install "flaml[automl,ts_forecast]"
```
### Understanding the `period` Parameter
The `period` parameter (also called **horizon** in the code) specifies the **forecast horizon**: the number of future time steps the model is trained to predict. For example:
- `period=12` means you want to forecast 12 time steps ahead (e.g., 12 months, 12 days)
- `period=7` means you want to forecast 7 time steps ahead
**Important Note on Prediction**: During the prediction stage, the output length equals the length of `X_test`. This means you can generate predictions for any number of time steps by providing the corresponding timestamps in `X_test`, regardless of the `period` value used during training.
#### Automatic Feature Engineering
**Important**: You do NOT need to manually lag the target variable before training. FLAML handles this automatically:
- **For sklearn-based models** (lgbm, rf, xgboost, extra_tree, catboost): FLAML automatically creates lagged features of both the target variable and any exogenous variables. This transforms the time series forecasting problem into a supervised learning regression problem.
- **For time series native models** (prophet, arima, sarimax, holt-winters): These models have built-in time series forecasting capabilities and handle temporal dependencies natively.
The automatic lagging is implemented internally when you call `automl.fit()` with `task="ts_forecast"` or `task="ts_forecast_classification"`, so you can focus on providing clean input data without worrying about feature engineering.
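The automatic lagging described above is conceptually simple; a hand-rolled version for a univariate series might look like this (a pure-Python sketch of the idea, not FLAML's internal code):

```python
def make_lag_features(series, n_lags):
    """Turn a series into (X, y): X[i] holds the previous n_lags values, y[i] the next one."""
    X, y = [], []
    for i in range(n_lags, len(series)):
        X.append(series[i - n_lags : i])
        y.append(series[i])
    return X, y


series = [10, 11, 12, 13, 14]
X, y = make_lag_features(series, n_lags=2)
print(X)  # [[10, 11], [11, 12], [12, 13]]
print(y)  # [12, 13, 14]
```

Any regression learner can then be fit on `(X, y)`, which is exactly how the sklearn-based estimators are applied to forecasting.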
### Simple NumPy Example
```python


@@ -2,7 +2,7 @@
### Prerequisites for this example
Install the \[automl\] option.
Install the [automl] option.
```bash
pip install "flaml[automl] matplotlib openml"


@@ -2,7 +2,7 @@
### Prerequisites for this example
Install the \[automl\] option.
Install the [automl] option.
```bash
pip install "flaml[automl] matplotlib openml"


@@ -6,7 +6,7 @@ Flamlized estimators automatically use data-dependent default hyperparameter con
### Prerequisites
This example requires the \[autozero\] option.
This example requires the [autozero] option.
```bash
pip install flaml[autozero] lightgbm openml
@@ -67,6 +67,82 @@ X_test.shape: (5160, 8), y_test.shape: (5160,)
[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/zeroshot_lightgbm.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/zeroshot_lightgbm.ipynb)
## Flamlized LGBMClassifier
### Prerequisites
This example requires the [autozero] option.
```bash
pip install flaml[autozero] lightgbm openml
```
### Zero-shot AutoML
```python
from flaml.automl.data import load_openml_dataset
from flaml.default import LGBMClassifier
from flaml.automl.ml import sklearn_metric_loss_score
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir="./")
lgbm = LGBMClassifier()
lgbm.fit(X_train, y_train)
y_pred = lgbm.predict(X_test)
print(
"flamlized lgbm accuracy",
"=",
1 - sklearn_metric_loss_score("accuracy", y_pred, y_test),
)
print(lgbm)
```
#### Sample output
```
load dataset from ./openml_ds1169.pkl
Dataset name: airlines
X_train.shape: (404537, 7), y_train.shape: (404537,);
X_test.shape: (134846, 7), y_test.shape: (134846,)
flamlized lgbm accuracy = 0.6745
LGBMClassifier(colsample_bytree=0.85, learning_rate=0.05, max_bin=255,
min_child_samples=20, n_estimators=500, num_leaves=31,
reg_alpha=0.01, reg_lambda=0.1, verbose=-1)
```
## Flamlized XGBRegressor
### Prerequisites
This example requires xgboost, sklearn, openml==0.10.2.
### Zero-shot AutoML
```python
from flaml.automl.data import load_openml_dataset
from flaml.default import XGBRegressor
from flaml.automl.ml import sklearn_metric_loss_score
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir="./")
xgb = XGBRegressor()
xgb.fit(X_train, y_train)
y_pred = xgb.predict(X_test)
print("flamlized xgb r2", "=", 1 - sklearn_metric_loss_score("r2", y_pred, y_test))
print(xgb)
```
#### Sample output
```
load dataset from ./openml_ds537.pkl
Dataset name: houses
X_train.shape: (15480, 8), y_train.shape: (15480,);
X_test.shape: (5160, 8), y_test.shape: (5160,)
flamlized xgb r2 = 0.8542
XGBRegressor(colsample_bylevel=1, colsample_bytree=0.85, learning_rate=0.05,
max_depth=6, n_estimators=500, reg_alpha=0.01, reg_lambda=1.0,
subsample=0.9)
```
## Flamlized XGBClassifier
### Prerequisites
@@ -112,3 +188,159 @@ XGBClassifier(base_score=0.5, booster='gbtree',
scale_pos_weight=1, subsample=1.0, tree_method='hist',
use_label_encoder=False, validate_parameters=1, verbosity=0)
```
## Flamlized RandomForestRegressor
### Prerequisites
This example requires the [autozero] option.
```bash
pip install flaml[autozero] scikit-learn openml
```
### Zero-shot AutoML
```python
from flaml.automl.data import load_openml_dataset
from flaml.default import RandomForestRegressor
from flaml.automl.ml import sklearn_metric_loss_score
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir="./")
rf = RandomForestRegressor()
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print("flamlized rf r2", "=", 1 - sklearn_metric_loss_score("r2", y_pred, y_test))
print(rf)
```
#### Sample output
```
load dataset from ./openml_ds537.pkl
Dataset name: houses
X_train.shape: (15480, 8), y_train.shape: (15480,);
X_test.shape: (5160, 8), y_test.shape: (5160,)
flamlized rf r2 = 0.8521
RandomForestRegressor(max_features=0.8, min_samples_leaf=2, min_samples_split=5,
n_estimators=500)
```
## Flamlized RandomForestClassifier
### Prerequisites
This example requires the [autozero] option.
```bash
pip install flaml[autozero] scikit-learn openml
```
### Zero-shot AutoML
```python
from flaml.automl.data import load_openml_dataset
from flaml.default import RandomForestClassifier
from flaml.automl.ml import sklearn_metric_loss_score
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir="./")
rf = RandomForestClassifier()
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(
"flamlized rf accuracy",
"=",
1 - sklearn_metric_loss_score("accuracy", y_pred, y_test),
)
print(rf)
```
#### Sample output
```
load dataset from ./openml_ds1169.pkl
Dataset name: airlines
X_train.shape: (404537, 7), y_train.shape: (404537,);
X_test.shape: (134846, 7), y_test.shape: (134846,)
flamlized rf accuracy = 0.6701
RandomForestClassifier(max_features=0.7, min_samples_leaf=3, min_samples_split=5,
n_estimators=500)
```
## Flamlized ExtraTreesRegressor
### Prerequisites
This example requires the [autozero] option.
```bash
pip install flaml[autozero] scikit-learn openml
```
### Zero-shot AutoML
```python
from flaml.automl.data import load_openml_dataset
from flaml.default import ExtraTreesRegressor
from flaml.automl.ml import sklearn_metric_loss_score
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=537, data_dir="./")
et = ExtraTreesRegressor()
et.fit(X_train, y_train)
y_pred = et.predict(X_test)
print("flamlized et r2", "=", 1 - sklearn_metric_loss_score("r2", y_pred, y_test))
print(et)
```
#### Sample output
```
load dataset from ./openml_ds537.pkl
Dataset name: houses
X_train.shape: (15480, 8), y_train.shape: (15480,);
X_test.shape: (5160, 8), y_test.shape: (5160,)
flamlized et r2 = 0.8534
ExtraTreesRegressor(max_features=0.75, min_samples_leaf=2, min_samples_split=5,
n_estimators=500)
```
## Flamlized ExtraTreesClassifier
### Prerequisites
This example requires the [autozero] option.
```bash
pip install flaml[autozero] scikit-learn openml
```
### Zero-shot AutoML
```python
from flaml.automl.data import load_openml_dataset
from flaml.default import ExtraTreesClassifier
from flaml.automl.ml import sklearn_metric_loss_score
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir="./")
et = ExtraTreesClassifier()
et.fit(X_train, y_train)
y_pred = et.predict(X_test)
print(
"flamlized et accuracy",
"=",
1 - sklearn_metric_loss_score("accuracy", y_pred, y_test),
)
print(et)
```
#### Sample output
```
load dataset from ./openml_ds1169.pkl
Dataset name: airlines
X_train.shape: (404537, 7), y_train.shape: (404537,);
X_test.shape: (134846, 7), y_test.shape: (134846,)
flamlized et accuracy = 0.6698
ExtraTreesClassifier(max_features=0.7, min_samples_leaf=3, min_samples_split=5,
n_estimators=500)
```


@@ -2,7 +2,7 @@ FLAML can be used together with AzureML. On top of that, using mlflow and ray is
### Prerequisites
Install the \[automl,azureml\] option.
Install the [automl,azureml] option.
```bash
pip install "flaml[automl,azureml]"


@@ -2,7 +2,7 @@ As FLAML's AutoML module can be used as a transformer in the sklearn pipeline, we
### Prerequisites
Install the \[automl\] option.
Install the [automl] option.
```bash
pip install "flaml[automl] openml"


@@ -8,13 +8,114 @@
### About `low_cost_partial_config` in `tune`.
- Definition and purpose: The `low_cost_partial_config` is a dictionary of subset of the hyperparameter coordinates whose value corresponds to a configuration with known low-cost (i.e., low computation cost for training the corresponding model). The concept of low/high-cost is meaningful in the case where a subset of the hyperparameters to tune directly affects the computation cost for training the model. For example, `n_estimators` and `max_leaves` are known to affect the training cost of tree-based learners. We call this subset of hyperparameters, *cost-related hyperparameters*. In such scenarios, if you are aware of low-cost configurations for the cost-related hyperparameters, you are recommended to set them as the `low_cost_partial_config`. Using the tree-based method example again, since we know that small `n_estimators` and `max_leaves` generally correspond to simpler models and thus lower cost, we set `{'n_estimators': 4, 'max_leaves': 4}` as the `low_cost_partial_config` by default (note that `4` is the lower bound of search space for these two hyperparameters), e.g., in [LGBM](https://github.com/microsoft/FLAML/blob/main/flaml/model.py#L215). Configuring `low_cost_partial_config` helps the search algorithms make more cost-efficient choices.
In AutoML, the `low_cost_init_value` in the `search_space()` function of each estimator serves the same role.
- Usage in practice: It is recommended to configure it if there are cost-related hyperparameters in your tuning task and you happen to know low-cost values for them, but it is not required (it is fine to leave it at its default value, i.e., `None`).
- How does it work: If configured, `low_cost_partial_config` is used as an initial point of the search and also affects the search trajectory. For more details about how it plays a role in the search algorithms, please refer to the papers about the search algorithms used: Section 2 of [Frugal Optimization for Cost-related Hyperparameters (CFO)](https://arxiv.org/pdf/2005.01571.pdf) and Section 3 of [Economical Hyperparameter Optimization with Blended Search Strategy (BlendSearch)](https://openreview.net/pdf?id=VbLH04pRA3).
### How does FLAML handle missing values?
FLAML automatically preprocesses missing values in the input data through its `DataTransformer` class (for classification/regression tasks) and `DataTransformerTS` class (for time series tasks). The preprocessing behavior differs based on the column type:
**Automatic Missing Value Preprocessing:**
FLAML performs the following preprocessing automatically when you call `AutoML.fit()`:
1. **Numerical/Continuous Columns**: Missing values (NaN) in numerical columns are imputed using `sklearn.impute.SimpleImputer` with the **median strategy**. This preprocessing is applied in the `DataTransformer.fit_transform()` method (see `flaml/automl/data.py` lines 357-369 and `flaml/automl/time_series/ts_data.py` lines 429-440).
2. **Categorical Columns**: Missing values in categorical columns (object, category, or string dtypes) are filled with a special placeholder value `"__NAN__"`, which is treated as a distinct category.
**Example of automatic preprocessing:**
```python
from flaml import AutoML
import pandas as pd
import numpy as np
# Data with missing values
X_train = pd.DataFrame(
{
"num_feature": [1.0, 2.0, np.nan, 4.0, 5.0],
"cat_feature": ["A", "B", None, "A", "B"],
}
)
y_train = [0, 1, 0, 1, 0]
# FLAML automatically handles missing values
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)
# Numerical NaNs are imputed with median, categorical None becomes "__NAN__"
```
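The two strategies can be reproduced outside FLAML for inspection (a plain-Python sketch mirroring the described behavior; FLAML itself uses `SimpleImputer` and pandas internally):

```python
def impute_numeric_median(values):
    """Replace missing numeric values (None) with the median of observed values."""
    observed = sorted(v for v in values if v is not None)
    mid = len(observed) // 2
    median = observed[mid] if len(observed) % 2 else (observed[mid - 1] + observed[mid]) / 2
    return [median if v is None else v for v in values]


def fill_categorical(values, placeholder="__NAN__"):
    """Treat missing categories as their own distinct category."""
    return [placeholder if v is None else v for v in values]


print(impute_numeric_median([1.0, 2.0, None, 4.0, 5.0]))  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(fill_categorical(["A", "B", None]))  # ['A', 'B', '__NAN__']
```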
**Estimator-Specific Native Handling:**
After FLAML's preprocessing, some estimators have additional native missing value handling capabilities:
- **`lgbm`** (LightGBM): After preprocessing, can still handle any remaining NaN values natively by learning optimal split directions.
- **`xgboost`** (XGBoost): After preprocessing, can handle remaining NaN values by learning the best direction during training.
- **`xgb_limitdepth`** (XGBoost with depth limit): Same as `xgboost`.
- **`catboost`** (CatBoost): After preprocessing, has additional sophisticated missing value handling strategies. See [CatBoost documentation](https://catboost.ai/en/docs/concepts/algorithm-missing-values-processing).
- **`histgb`** (HistGradientBoosting): After preprocessing, can still handle NaN values natively.
**Estimators that rely on preprocessing:**
These estimators rely on FLAML's automatic preprocessing since they cannot handle missing values directly:
- **`rf`** (RandomForest): Requires preprocessing (automatically done by FLAML).
- **`extra_tree`** (ExtraTrees): Requires preprocessing (automatically done by FLAML).
- **`lrl1`**, **`lrl2`** (LogisticRegression): Require preprocessing (automatically done by FLAML).
- **`kneighbor`** (KNeighbors): Requires preprocessing (automatically done by FLAML).
- **`sgd`** (SGDClassifier/Regressor): Require preprocessing (automatically done by FLAML).
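For intuition, the two default rules above can be reproduced without FLAML at all. The helper below is a hypothetical sketch that mirrors what `DataTransformer` does by default (median imputation for numerical values, the `"__NAN__"` placeholder for categorical values):

```python
from statistics import median


def impute_like_flaml(num_col, cat_col):
    """Sketch of FLAML's DataTransformer defaults: median imputation for
    numerical values, a "__NAN__" placeholder for categorical values."""
    med = median(v for v in num_col if v is not None)
    num_filled = [med if v is None else v for v in num_col]
    cat_filled = ["__NAN__" if v is None else v for v in cat_col]
    return num_filled, cat_filled


print(impute_like_flaml([1.0, 2.0, None, 4.0, 5.0], ["A", "B", None, "A", "B"]))
# ([1.0, 2.0, 3.0, 4.0, 5.0], ['A', 'B', '__NAN__', 'A', 'B'])
```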
**Advanced: Customizing Missing Value Handling**
In most cases, FLAML's automatic preprocessing (median imputation for numerical, "__NAN__" for categorical) works well. However, if you need custom preprocessing:
1. **Skip automatic preprocessing** using the `skip_transform` parameter:
```python
from flaml import AutoML
from sklearn.impute import SimpleImputer
import numpy as np
# Custom preprocessing with different strategy
imputer = SimpleImputer(strategy="mean") # Use mean instead of median
X_train_preprocessed = imputer.fit_transform(X_train)
X_test_preprocessed = imputer.transform(X_test)
# Skip FLAML's automatic preprocessing
automl = AutoML()
automl.fit(
    X_train_preprocessed,
    y_train,
    task="classification",
    time_budget=60,
    skip_transform=True,  # Skip automatic preprocessing
)
```
2. **Use sklearn Pipeline** for integrated custom preprocessing:
```python
from flaml import AutoML
from sklearn.pipeline import Pipeline
from sklearn.impute import KNNImputer

# Custom pipeline with KNN imputation
pipeline = Pipeline(
    [
        ("imputer", KNNImputer(n_neighbors=5)),  # Custom imputation strategy
        ("automl", AutoML()),
    ]
)
# Route AutoML.fit() arguments through the sklearn step-name prefix
pipeline.fit(X_train, y_train, automl__task="classification", automl__time_budget=60)
```
**Note on time series forecasting**: For time series tasks (`ts_forecast`, `ts_forecast_panel`), the `DataTransformerTS` class applies the same preprocessing approach (median imputation for numerical columns, "__NAN__" for categorical). Missing values handling in the time dimension may require additional consideration depending on your specific forecasting model.
### How does FLAML handle imbalanced data (unequal distribution of target classes in classification task)?
Currently FLAML does several things for imbalanced data.
Optimization history can be checked from the [log](Use-Cases/Task-Oriented-AutoML).
### How to get the best config of an estimator and use it to train the original model outside FLAML?
When you have finished training an AutoML estimator, you may want to use it in other code w/o depending on FLAML. The `automl.best_config` contains FLAML's search space parameters, which may differ from the original model's parameters (e.g., FLAML uses `log_max_bin` for LightGBM instead of `max_bin`). You need to convert them using the `config2params()` method.
**Method 1: Using the trained model instance**
```python
from flaml import AutoML
automl = AutoML()
automl.fit(X, y)
print(f"{automl.best_estimator=}")
print(f"{automl.best_config=}")
print(f"params for best estimator: {automl.model.config2params(automl.best_config)}")
# Example: {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20,
# 'learning_rate': 0.1, 'log_max_bin': 8, ...}
# Convert to original model parameters
best_params = automl.model.config2params(automl.best_config)
print(f"params for best estimator: {best_params}")
# Example: {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20,
# 'learning_rate': 0.1, 'max_bin': 255, ...} # log_max_bin -> max_bin
```
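The `log_max_bin` -> `max_bin` conversion shown in the comment above is simple to state explicitly. A small sketch (the helper name is illustrative; FLAML documents `log_max_bin` as the base-2 logarithm of `max_bin + 1`):

```python
def log_max_bin_to_max_bin(log_max_bin: int) -> int:
    # FLAML searches log_max_bin = log2(max_bin + 1), so invert it for LightGBM
    return 2**log_max_bin - 1


print(log_max_bin_to_max_bin(8))  # 255
```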
**Method 2: Using FLAML estimator classes directly**
If the automl instance is not accessible and you only have the `best_config`, you can convert it with the code below:
```python
from flaml.automl.model import LGBMEstimator
best_config = {
    "n_estimators": 4,
    "num_leaves": 4,
    "min_child_samples": 20,
    "learning_rate": 0.1,
    "log_max_bin": 8,  # FLAML-specific parameter
    "colsample_bytree": 1.0,
    "reg_alpha": 0.0009765625,
    "reg_lambda": 1.0,
}
# Create FLAML estimator - this automatically converts parameters
flaml_estimator = LGBMEstimator(task="classification", **best_config)
best_params = flaml_estimator.params # Converted params ready for original model
print(f"Converted params: {best_params}")
# Example: {'n_estimators': 4, 'num_leaves': 4, 'min_child_samples': 20,
# 'learning_rate': 0.1, 'max_bin': 255, 'verbose': -1, ...}
```
**Method 3: Using task_factory (for any estimator type)**
```python
from flaml.automl.task.factory import task_factory
task = "classification"
best_estimator = automl.best_estimator  # e.g., "lgbm"
best_config = automl.best_config

model_class = task_factory(task).estimator_class_from_str(best_estimator)(task=task)
best_params = model_class.config2params(best_config)
```
Then you can use it to train the sklearn/lightgbm/xgboost estimators directly:
```python
from lightgbm import LGBMClassifier

# Using LightGBM directly with converted parameters
model = LGBMClassifier(**best_params)
model.fit(X, y)
```
**Using best_config_per_estimator for multiple estimators**
```python
from flaml import AutoML
from flaml.automl.model import LGBMEstimator, XGBoostEstimator
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
automl = AutoML()
automl.fit(
    X, y, task="classification", time_budget=30, estimator_list=["lgbm", "xgboost"]
)

# Get configs for all estimators
configs = automl.best_config_per_estimator
# Example: {'lgbm': {'n_estimators': 4, 'log_max_bin': 8, ...},
#           'xgboost': {'n_estimators': 4, 'max_leaves': 4, ...}}

# Convert and use LightGBM config
if configs.get("lgbm"):
    lgbm_config = configs["lgbm"].copy()
    lgbm_config.pop("FLAML_sample_size", None)  # Remove FLAML internal param if present
    flaml_lgbm = LGBMEstimator(task="classification", **lgbm_config)
    lgbm_model = LGBMClassifier(**flaml_lgbm.params)
    lgbm_model.fit(X, y)

# Convert and use XGBoost config
if configs.get("xgboost"):
    xgb_config = configs["xgboost"].copy()
    xgb_config.pop("FLAML_sample_size", None)  # Remove FLAML internal param if present
    flaml_xgb = XGBoostEstimator(task="classification", **xgb_config)
    xgb_model = XGBClassifier(**flaml_xgb.params)
    xgb_model.fit(X, y)
```
### How to save and load an AutoML object? (`pickle` / `load_pickle`)
FLAML provides `AutoML.pickle()` / `AutoML.load_pickle()` as a convenient and robust way to persist an AutoML run.
```python
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)
# Save
automl.pickle("automl.pkl")
# Load
automl_loaded = AutoML.load_pickle("automl.pkl")
pred = automl_loaded.predict(X_test)
```
Notes:
- If you used Spark estimators, `AutoML.pickle()` externalizes Spark ML models into an adjacent artifact folder and keeps
the pickle itself lightweight.
- If you want to skip re-loading externalized Spark models (e.g., in an environment without Spark), use:
```python
automl_loaded = AutoML.load_pickle("automl.pkl", load_spark_models=False)
```
### How to list all available estimators for a task?
The available estimator set is task-dependent and can vary with optional dependencies. You can list the estimator keys
that FLAML currently has registered in your environment:
```python
from flaml.automl.task.factory import task_factory
print(sorted(task_factory("classification").estimators.keys()))
print(sorted(task_factory("regression").estimators.keys()))
print(sorted(task_factory("forecast").estimators.keys()))
print(sorted(task_factory("rank").estimators.keys()))
```
### How to list supported built-in metrics?
```python
from flaml import AutoML
automl = AutoML()
sklearn_metrics, hf_metrics, spark_metrics = automl.supported_metrics
print(sorted(sklearn_metrics))
print(sorted(hf_metrics))
print(spark_metrics)
```
and optimizes their performance.
### Main Features
- For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources. It is easy to customize or extend.
- It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping.
FLAML is powered by a series of [research studies](/docs/Research) from Microsoft Research.
### Quickstart
Install FLAML from pip: `pip install flaml` (**requires Python >= 3.10**). Find more options in [Installation](/docs/Installation).
There are several ways of using flaml:
#### [Task-oriented AutoML](/docs/Use-Cases/task-oriented-automl)
With three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.
Then, you can use it just like you use the original `LGBMClassifier`. Your other code can remain unchanged.
### Where to Go Next?
- Understand the use cases for [Task-oriented AutoML](/docs/Use-Cases/Task-Oriented-Automl), [Tune user-defined function](/docs/Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](/docs/Use-Cases/Zero-Shot-AutoML).
- Find code examples under "Examples": from [AutoML - Classification](/docs/Examples/AutoML-Classification) to [Tune - PyTorch](/docs/Examples/Tune-PyTorch).
- Learn about [research](/docs/Research) around FLAML and check [blogposts](/blog).
- Apply practical guidance in [Best Practices](/docs/Best-Practices).
- Chat on [Discord](https://discord.gg/Cppx2vSPVP).
If you like our project, please give it a [star](https://github.com/microsoft/FLAML/stargazers) on GitHub. If you are interested in contributing, please read [Contributor's Guide](/docs/Contribute).
## Python
FLAML requires **Python version >= 3.10**. It can be installed from pip:
```bash
pip install flaml
```

Or from conda-forge:

```bash
conda install flaml -c conda-forge
```
### Optional Dependencies
#### [Task-oriented AutoML](Use-Cases/Task-Oriented-AutoML)
```bash
pip install "flaml[automl]"
```

```bash
pip install "flaml[hf]"
```
#### Notebook
To run the [notebook examples](https://github.com/microsoft/FLAML/tree/main/notebook),
install flaml with the [notebook] option:
```bash
pip install "flaml[notebook]"
```
```python
from flaml import AutoML
automl = AutoML()
automl.fit(X_train, y_train, task="regression", time_budget=60, **other_settings)
# Save the model
with open("automl.pkl", "wb") as f:
pickle.dump(automl, f, pickle.HIGHEST_PROTOCOL)
automl.pickle("automl.pkl")
# At prediction time
with open("automl.pkl", "rb") as f:
automl = pickle.load(f)
automl = AutoML.load_pickle("automl.pkl")
pred = automl.predict(X_test)
```
FLAML also supports plain `pickle.dump()` / `pickle.load()`, but `automl.pickle()` / `AutoML.load_pickle()` is recommended,
especially when Spark estimators are involved.
If users provide the minimal inputs only, `AutoML` uses the default settings for optimization metric, estimator list etc.
## Customize AutoML.fit()
The optimization metric is specified via the `metric` argument. It can be either a string which refers to a built-in metric, or a user-defined function.
- Built-in metric.
- 'accuracy': 1 - accuracy as the corresponding metric to minimize.
- 'log_loss': default metric for multiclass classification.
- 'r2': 1 - r2_score as the corresponding metric to minimize. Default metric for regression.
- 'ap': minimize 1 - average_precision_score.
- 'ndcg': minimize 1 - ndcg_score.
- 'ndcg@k': minimize 1 - ndcg_score@k. k is an integer.
- 'pr_auc': minimize 1 - precision-recall AUC score. (Spark-specific)
- 'var': minimize variance. (Spark-specific)
- Built-in HuggingFace metrics (for NLP tasks).
- 'accuracy': minimize 1 - accuracy.
- 'bertscore': minimize 1 - BERTScore.
- 'bleu': minimize 1 - BLEU score.
- 'bleurt': minimize 1 - BLEURT score.
- 'cer': minimize character error rate.
- 'chrf': minimize 1 - ChrF score.
- 'code_eval': minimize 1 - code evaluation score.
- 'comet': minimize 1 - COMET score.
- 'competition_math': minimize 1 - competition math score.
- 'coval': minimize 1 - CoVal score.
- 'cuad': minimize 1 - CUAD score.
- 'f1': minimize 1 - F1 score.
- 'gleu': minimize 1 - GLEU score.
- 'google_bleu': minimize 1 - Google BLEU score.
- 'matthews_correlation': minimize 1 - Matthews correlation coefficient.
- 'meteor': minimize 1 - METEOR score.
- 'pearsonr': minimize 1 - Pearson correlation coefficient.
- 'precision': minimize 1 - precision.
- 'recall': minimize 1 - recall.
- 'rouge': minimize 1 - ROUGE score.
- 'rouge1': minimize 1 - ROUGE-1 score.
- 'rouge2': minimize 1 - ROUGE-2 score.
- 'sacrebleu': minimize 1 - SacreBLEU score.
- 'sari': minimize 1 - SARI score.
- 'seqeval': minimize 1 - SeqEval score.
- 'spearmanr': minimize 1 - Spearman correlation coefficient.
- 'ter': minimize translation error rate.
- 'wer': minimize word error rate.
- User-defined function.
A customized metric function that requires the following (input) signature, and returns the input config's value in terms of the metric you want to minimize, and a dictionary of auxiliary information at your choice:
```python
def custom_metric(
    X_val,
    y_val,
    estimator,
    labels,
    X_train,
    y_train,
    weight_val=None,
    weight_train=None,
    config=None,
    groups_val=None,
    groups_train=None,
):
    return metric_to_minimize, metrics_to_log
```
It returns the validation loss penalized by the gap between validation and training loss as the metric to minimize, and three metrics to log: val_loss, train_loss and pred_time. The arguments `config`, `groups_val` and `groups_train` are not used in the function.
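The penalty described above is plain arithmetic. As a standalone sketch (the function name and the `alpha` weight are illustrative, not part of FLAML's API):

```python
def penalized_val_loss(val_loss, train_loss, alpha=0.5):
    # val_loss + alpha * (val_loss - train_loss): a larger gap between
    # validation and training loss makes the reported metric worse
    return val_loss * (1 + alpha) - alpha * train_loss


print(penalized_val_loss(0.5, 0.25))  # 0.625
```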
You can also inspect what FLAML recognizes as built-in metrics at runtime:
```python
from flaml import AutoML
automl = AutoML()
sklearn_metrics, hf_metrics, spark_metrics = automl.supported_metrics
print(sorted(sklearn_metrics))
print(sorted(hf_metrics))
print(spark_metrics)
```
### Estimator and search space
The estimator list can contain one or more estimator names, each corresponding to a built-in estimator or a custom estimator. Each estimator has a search space for hyperparameter configurations. FLAML supports both classical machine learning models and deep neural networks.
- Built-in estimator.
- 'lgbm': LGBMEstimator for task "classification", "regression", "rank", "ts_forecast" and "ts_forecast_classification". Hyperparameters: n_estimators, num_leaves, min_child_samples, learning_rate, log_max_bin (logarithm of (max_bin + 1) with base 2), colsample_bytree, reg_alpha, reg_lambda.
- 'xgboost': XGBoostSkLearnEstimator for task "classification", "regression", "rank", "ts_forecast" and "ts_forecast_classification". Hyperparameters: n_estimators, max_leaves, min_child_weight, learning_rate, subsample, colsample_bylevel, colsample_bytree, reg_alpha, reg_lambda.
- 'xgb_limitdepth': XGBoostLimitDepthEstimator for task "classification", "regression", "rank", "ts_forecast" and "ts_forecast_classification". Hyperparameters: n_estimators, max_depth, min_child_weight, learning_rate, subsample, colsample_bylevel, colsample_bytree, reg_alpha, reg_lambda.
- 'rf': RandomForestEstimator for task "classification", "regression", "ts_forecast" and "ts_forecast_classification". Hyperparameters: n_estimators, max_features, max_leaves, criterion (for classification only). Starting from v1.1.0,
it uses a fixed random_state by default.
- 'extra_tree': ExtraTreesEstimator for task "classification", "regression", "ts_forecast" and "ts_forecast_classification". Hyperparameters: n_estimators, max_features, max_leaves, criterion (for classification only). Starting from v1.1.0,
- 'sarimax': SARIMAX for task "ts_forecast". Hyperparameters: p, d, q, P, D, Q, s.
- 'holt-winters': Holt-Winters (triple exponential smoothing) model for task "ts_forecast". Hyperparameters: seasonal_periods, seasonal, use_boxcox, trend, damped_trend.
- 'transformer': Huggingface transformer models for task "seq-classification", "seq-regression", "multichoice-classification", "token-classification" and "summarization". Hyperparameters: learning_rate, num_train_epochs, per_device_train_batch_size, warmup_ratio, weight_decay, adam_epsilon, seed.
- 'tft': TemporalFusionTransformerEstimator for task "ts_forecast_panel". Hyperparameters: gradient_clip_val, hidden_size, hidden_continuous_size, attention_head_size, dropout, learning_rate.
- 'tcn': Temporal Convolutional Network (TCN) estimator for task "ts_forecast" (requires optional deep learning dependencies, e.g., `torch` and `pytorch_lightning`).
- Spark estimators (for Spark / pandas-on-Spark DataFrames; the exact set depends on your Spark runtime and installed packages):
- 'lgbm_spark': Spark LightGBM models via [SynapseML](https://microsoft.github.io/SynapseML/docs/features/lightgbm/about/).
- 'rf_spark': Spark MLlib RandomForestClassifier/Regressor.
- 'gbt_spark': Spark MLlib GBTClassifier/GBTRegressor.
- 'lr_spark': Spark MLlib LinearRegression.
- 'glr_spark': Spark MLlib GeneralizedLinearRegression.
- 'svc_spark': Spark MLlib LinearSVC (binary classification only).
- 'nb_spark': Spark MLlib NaiveBayes (classification only).
- 'aft_spark': Spark MLlib AFTSurvivalRegression.
- Custom estimator. Use custom estimator for:
- tuning an estimator that is not built-in;
- customizing search space for a built-in estimator.
#### List all available estimators (recommended)
The exact set of available estimators depends on the `task` and optional dependencies (e.g., Prophet/Orbit/PyTorch).
To list the estimator keys available in your environment:
```python
from flaml.automl.task.factory import task_factory
print("classification:", sorted(task_factory("classification").estimators.keys()))
print("regression:", sorted(task_factory("regression").estimators.keys()))
print("forecast:", sorted(task_factory("forecast").estimators.keys()))
print("rank:", sorted(task_factory("rank").estimators.keys()))
```
For reference, the built-in estimator keys included in the codebase are:
- Tabular / ranking / NLP tasks (GenericTask):
`['aft_spark', 'catboost', 'enet', 'extra_tree', 'gbt_spark', 'glr_spark', 'histgb', 'kneighbor', 'lassolars', 'lgbm', 'lgbm_spark', 'lr_spark', 'lrl1', 'lrl2', 'nb_spark', 'rf', 'rf_spark', 'sgd', 'svc', 'svc_spark', 'transformer', 'transformer_ms', 'xgb_limitdepth', 'xgboost']`
- Time series tasks (TimeSeriesTask):
`['arima', 'avg', 'catboost', 'extra_tree', 'holt-winters', 'lassolars', 'lgbm', 'naive', 'prophet', 'rf', 'sarimax', 'savg', 'snaive', 'tcn', 'tft', 'xgb_limitdepth', 'xgboost', 'orbit']`
Some of the time series estimators (e.g., `prophet`, `orbit`, `tcn`, `tft`) are only available when the corresponding
optional dependencies are installed.
#### Guidelines on tuning a custom estimator
To tune a custom estimator that is not built-in, you need to:
```python
from flaml.automl.model import SKLearnEstimator
# SKLearnEstimator is derived from BaseEstimator
from flaml import tune

import rgf  # requires the `rgf_python` package
class MyRegularizedGreedyForest(SKLearnEstimator):
    def __init__(self, task="binary", **config):
        super().__init__(task, **config)

        if isinstance(task, str):
            from flaml.automl.task.factory import task_factory

            task = task_factory(task)
        if task.is_classification():
            from rgf.sklearn import RGFClassifier

            self.estimator_class = RGFClassifier
        else:
            from rgf.sklearn import RGFRegressor

            self.estimator_class = RGFRegressor
    @classmethod
    def search_space(cls, data_size, task):
        space = {
            "max_leaf": {
                "domain": tune.lograndint(lower=4, upper=data_size[0]),
                "init_value": 4,
            },
            "n_iter": {
                "domain": tune.lograndint(lower=1, upper=data_size[0]),
                "init_value": 1,
            },
            "n_tree_search": {
                "domain": tune.lograndint(lower=1, upper=32768),
                "init_value": 1,
            },
            "opt_interval": {
                "domain": tune.lograndint(lower=1, upper=10000),
                "init_value": 100,
            },
            "learning_rate": {"domain": tune.loguniform(lower=0.01, upper=20.0)},
            "min_samples_leaf": {
                "domain": tune.lograndint(lower=1, upper=20),
                "init_value": 20,
            },
        }
        return space
```
To use stacked ensemble after the model search, set `ensemble=True` or a dict. The dict can contain the following keys:
- "final_estimator": an instance of the final estimator in the stacker.
- "passthrough": True (default) or False, whether to pass the original features to the stacker.
**Important Note:** The hyperparameters of a custom `final_estimator` are **NOT automatically tuned**. If you provide an estimator instance (e.g., `CatBoostClassifier()`), it will use the parameters you specified or their defaults. To use specific hyperparameters, you must set them when creating the estimator instance. If `final_estimator` is not provided, the best model found during the search will be used as the final estimator (recommended for best performance).
For example,
```python
automl.fit(
    X_train,
    y_train,
    task="classification",
    ensemble={
        "final_estimator": LogisticRegression(),  # Uses default LogisticRegression parameters
        "passthrough": False,
    },
)
```
Or with custom parameters:
```python
from catboost import CatBoostClassifier
automl.fit(
    X_train,
    y_train,
    task="classification",
    ensemble={
        "final_estimator": CatBoostClassifier(iterations=100, depth=6, learning_rate=0.1),
        "passthrough": True,
    },
)
```
### Resampling strategy
By default, flaml decides the resampling automatically according to the data size and the time budget. If you would like to enforce a certain resampling strategy, you can set `eval_method` to be "holdout" or "cv" for holdout or cross-validation.
For both classification and regression tasks, more advanced split configurations are possible.
More in general, `split_type` can also be set as a custom splitter object, when `eval_method="cv"`. It needs to be an instance of a derived class of scikit-learn
[KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html#sklearn.model_selection.KFold)
and have `split` and `get_n_splits` methods with the same signatures. To disable shuffling, the splitter instance must contain the attribute `shuffle=False`.
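As an illustration of that interface (in practice the splitter should derive from scikit-learn's `KFold`; the class below is a hypothetical sketch showing only the required `split`/`get_n_splits`/`shuffle` surface):

```python
class SequentialSplitter:
    """Illustrative splitter exposing the interface FLAML checks:
    split(), get_n_splits(), and a shuffle attribute set to False."""

    def __init__(self, n_splits=3):
        self.n_splits = n_splits
        self.shuffle = False  # required to disable shuffling

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        # consecutive, non-shuffled folds over the row indices
        fold = len(X) // (self.n_splits + 1)
        for i in range(1, self.n_splits + 1):
            yield list(range(0, i * fold)), list(range(i * fold, (i + 1) * fold))


splitter = SequentialSplitter(n_splits=3)
train_idx, val_idx = next(splitter.split(list(range(8))))
print(train_idx, val_idx)  # [0, 1] [2, 3]
```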
### Parallel tuning
`starting_points` is a dictionary or a str to specify the starting hyperparameter config. (1) When it is a dictionary, the keys are the estimator names. If you do not need to specify starting points for an estimator, exclude its name from the dictionary. The value for each key can be either a dictionary of a list of dictionaries, corresponding to one hyperparameter configuration, or multiple hyperparameter configurations, respectively. (2) When it is a str: if "data", use data-dependent defaults; if "data:path", use data-dependent defaults which are stored at path; if "static", use data-independent defaults. Please find more details about data-dependent defaults in [zero shot AutoML](Zero-Shot-AutoML#combine-zero-shot-automl-and-hyperparameter-tuning).
**Note on sample size preservation**: When using `best_config_per_estimator` as starting points, the configurations now preserve `FLAML_sample_size` (if subsampling was used during the search). This ensures that the warm-started run continues optimization with the same sample sizes that produced the best results in the previous run, leading to more effective warm-starting.
### Log the trials
The trials are logged in a file if a `log_file_name` is passed.
```python
plt.barh(automl.feature_names_in_, automl.feature_importances_)
```
![png](images/feature_importance.png)
### Preprocess data
FLAML provides two levels of preprocessing that can be accessed as public APIs:
1. **Task-level preprocessing** (`automl.preprocess()`): This applies transformations that are specific to the task type, such as handling data types, sparse matrices, and feature transformations learned during training.
1. **Estimator-level preprocessing** (`estimator.preprocess()`): This applies transformations specific to the estimator type (e.g., LightGBM, XGBoost).
The task-level preprocessing should be applied before the estimator-level preprocessing.
#### Task-level preprocessing
```python
from flaml import AutoML
import numpy as np
# Train the model
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)
# Apply task-level preprocessing to new data
X_test_preprocessed = automl.preprocess(X_test)
# Now you can use this with the estimator
predictions = automl.model.predict(X_test_preprocessed)
```
#### Estimator-level preprocessing
```python
# Get the trained estimator
estimator = automl.model
# Apply task-level preprocessing first
X_test_task = automl.preprocess(X_test)
# Then apply estimator-level preprocessing
X_test_estimator = estimator.preprocess(X_test_task)
# Use the fully preprocessed data with the underlying model
predictions = estimator._model.predict(X_test_estimator)
```
#### Complete preprocessing pipeline
For most use cases, the `predict()` method already handles both levels of preprocessing internally. However, if you need to apply preprocessing separately (e.g., for custom inference pipelines or debugging), you can use:
```python
# Complete preprocessing pipeline
X_task_preprocessed = automl.preprocess(X_test)
X_final = automl.model.preprocess(X_task_preprocessed)
# This is equivalent to what happens internally in:
predictions = automl.predict(X_test)
```
**Note**: The `preprocess()` methods can only be called after `fit()` has been executed, as they rely on the transformations learned during training.
### Get best configuration
We can find the best estimator's name and best configuration by:
```python
print(automl.best_estimator)
print(automl.best_config)
# {'n_estimators': 148, 'num_leaves': 18, 'min_child_samples': 3, 'learning_rate': 0.17402065726724145, 'log_max_bin': 8, 'colsample_bytree': 0.6649148062238498, 'reg_alpha': 0.0009765625, 'reg_lambda': 0.0067613624509965}
```
**Note**: The config contains FLAML's search space parameters, which may differ from the original model's parameters. For example, FLAML uses `log_max_bin` for LightGBM instead of `max_bin`. To convert to the original model's parameters, use the `config2params()` method:
```python
from flaml.automl.model import LGBMEstimator
# Convert FLAML config to original model parameters
flaml_estimator = LGBMEstimator(task="classification", **automl.best_config)
original_params = flaml_estimator.params
print(original_params)
# {'n_estimators': 148, 'num_leaves': 18, 'min_child_samples': 3, 'learning_rate': 0.17402065726724145, 'max_bin': 255, ...}
# Note: 'log_max_bin': 8 is converted to 'max_bin': 255 (2^8 - 1)
# Now you can use original LightGBM directly
from lightgbm import LGBMClassifier
lgbm_model = LGBMClassifier(**original_params)
lgbm_model.fit(X_train, y_train)
```
We can also find the best configuration per estimator.
```python
print(automl.best_config_per_estimator)
```
The `None` value corresponds to the estimators which have not been tried.
**Converting configs for all estimators to original model parameters:**
```python
from flaml.automl.model import LGBMEstimator, XGBoostEstimator
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
configs = automl.best_config_per_estimator
# Convert and use LightGBM config
if configs.get("lgbm"):
    lgbm_config = configs["lgbm"].copy()
    lgbm_config.pop("FLAML_sample_size", None)  # Remove FLAML internal param if present
    flaml_lgbm = LGBMEstimator(task="classification", **lgbm_config)
    lgbm_model = LGBMClassifier(**flaml_lgbm.params)
    lgbm_model.fit(X_train, y_train)

# Convert and use XGBoost config
if configs.get("xgboost"):
    xgb_config = configs["xgboost"].copy()
    xgb_config.pop("FLAML_sample_size", None)  # Remove FLAML internal param if present
    flaml_xgb = XGBoostEstimator(task="classification", **xgb_config)
    xgb_model = XGBClassifier(**flaml_xgb.params)
    xgb_model.fit(X_train, y_train)
```
**Note**: When subsampling is used during the search (e.g., with large datasets), the configurations may also include `FLAML_sample_size` to indicate the sample size used. For example:
```python
# {'lgbm': {'n_estimators': 729, 'num_leaves': 21, ..., 'FLAML_sample_size': 45000}, ...}
```
This information is preserved in `best_config_per_estimator` and is important for warm-starting subsequent runs with the correct sample sizes.
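As a sketch of that warm-start flow (the config values below are made up for illustration; a dictionary of this shape can be passed to a subsequent `fit` call via its `starting_points` argument):

```python
# Hypothetical values for illustration: best configs from a previous run,
# as returned by automl.best_config_per_estimator.
previous_bests = {
    "lgbm": {"n_estimators": 729, "num_leaves": 21, "FLAML_sample_size": 45000},
    "xgboost": None,  # estimator was never tried in the previous run
}

# Keep only estimators that were actually tried. FLAML_sample_size is kept
# on purpose so the next run can resume with the same sample size.
starting_points = {name: cfg for name, cfg in previous_bests.items() if cfg is not None}

# A new search can then be warm-started, e.g.:
# automl.fit(X_train, y_train, task="classification",
#            time_budget=60, starting_points=starting_points)
print(starting_points)
```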
Other useful information:
```python
@@ -693,7 +924,7 @@ If you want to get a sense of how much time is needed to find the best model, yo
> INFO - Estimated sufficient time budget=145194s. Estimated necessary time budget=2118s.
> INFO - at 2.6s, estimator lgbm's best error=0.4459, best estimator lgbm's best error=0.4459
> INFO - at 2.6s, estimator lgbm's best error=0.4459, best estimator lgbm's best error=0.4459
You will see that the time to finish the first and cheapest trial is 2.6 seconds. The estimated necessary time budget is 2118 seconds, and the estimated sufficient time budget is 145194 seconds. Note that this is only an estimated range to help you decide your budget.


@@ -23,13 +23,13 @@ Related arguments:
- `evaluation_function`: A user-defined evaluation function.
- `metric`: A string of the metric name to optimize for.
- `mode`: A string in ['min', 'max'] to specify the objective as minimization or maximization.
The first step is to specify your tuning objective.
To do so, first specify your evaluation procedure (e.g., a machine learning model training and validation run) with respect to the hyperparameters in a user-defined function `evaluation_function`.
The function takes a hyperparameter configuration as input, and can either return a scalar metric value or return a dictionary of metric name and metric value pairs.
In the following code, we define an evaluation function with respect to two hyperparameters named `x` and `y` according to $obj := (x-85000)^2 - x/y$. Note that we use this toy example here for more accessible demonstration purposes. In real use cases, the evaluation function usually cannot be written in this closed form, but instead involves a black-box and expensive evaluation procedure. Please check out [Tune HuggingFace](/docs/Examples/Tune-HuggingFace), [Tune PyTorch](/docs/Examples/Tune-PyTorch) and [Tune LightGBM](/docs/Getting-Started#tune-user-defined-function) for real examples of tuning tasks.
```python
import time
@@ -72,7 +72,7 @@ Related arguments:
The second step is to specify a search space of the hyperparameters through the argument `config`. In the search space, you need to specify valid values for your hyperparameters and can specify how these values are sampled (e.g., from a uniform distribution or a log-uniform distribution).
In the following code example, we include a search space for the two hyperparameters `x` and `y` as introduced above. The valid values for both are integers in the range of [1, 100000]. The values for `x` are sampled uniformly in the specified range (using `tune.randint(lower=1, upper=100000)`), and the values for `y` are sampled uniformly in logarithmic space of the specified range (using `tune.lograndint(lower=1, upper=100000)`).
```python
from flaml import tune
@@ -181,15 +181,171 @@ config = {
<!-- Please refer to [ray.tune](https://docs.ray.io/en/latest/tune/api_docs/search_space.html#overview) for a more comprehensive introduction about possible choices of the domain. -->
#### Hierarchical search space
A hierarchical (or conditional) search space allows you to define hyperparameters that depend on the value of other hyperparameters. This is useful when different choices for a categorical hyperparameter require different sets of hyperparameters.
For example, if you're tuning a machine learning pipeline where different models require different hyperparameters, or when the choice of an optimizer determines which optimizer-specific hyperparameters are relevant.
**Syntax**: To create a hierarchical search space, use `tune.choice()` with a list where some elements are dictionaries containing nested hyperparameter definitions.
**Example 1: Model selection with model-specific hyperparameters**
In this example, we have two model types (linear and tree-based), each with their own specific hyperparameters:
```python
from flaml import tune
search_space = {
"model": tune.choice(
[
{
"model_type": "linear",
"learning_rate": tune.loguniform(1e-4, 1e-1),
"regularization": tune.uniform(0, 1),
},
{
"model_type": "tree",
"n_estimators": tune.randint(10, 100),
"max_depth": tune.randint(3, 10),
},
]
),
# Common hyperparameters for all models
"batch_size": tune.choice([32, 64, 128]),
}
def evaluate_config(config):
model_config = config["model"]
if model_config["model_type"] == "linear":
# Use learning_rate and regularization
# train_linear_model() is a placeholder for your actual training code
score = train_linear_model(
lr=model_config["learning_rate"],
reg=model_config["regularization"],
batch_size=config["batch_size"],
)
else: # tree
# Use n_estimators and max_depth
# train_tree_model() is a placeholder for your actual training code
score = train_tree_model(
n_est=model_config["n_estimators"],
depth=model_config["max_depth"],
batch_size=config["batch_size"],
)
return {"score": score}
# Run tuning
analysis = tune.run(
evaluate_config,
config=search_space,
metric="score",
mode="min",
num_samples=20,
)
```
**Example 2: Mixed choices with constants and nested spaces**
You can also mix constant values with nested hyperparameter spaces in `tune.choice()`:
```python
search_space = {
"optimizer": tune.choice(
[
"sgd", # constant value
{
"optimizer_type": "adam",
"beta1": tune.uniform(0.8, 0.99),
"beta2": tune.uniform(0.9, 0.999),
},
{
"optimizer_type": "rmsprop",
"decay": tune.loguniform(1e-3, 1e-1),
"momentum": tune.uniform(0, 0.99),
},
]
),
"learning_rate": tune.loguniform(1e-5, 1e-1),
}
def evaluate_config(config):
optimizer_config = config["optimizer"]
if optimizer_config == "sgd":
optimizer = create_sgd_optimizer(lr=config["learning_rate"])
elif optimizer_config["optimizer_type"] == "adam":
optimizer = create_adam_optimizer(
lr=config["learning_rate"],
beta1=optimizer_config["beta1"],
beta2=optimizer_config["beta2"],
)
else: # rmsprop
optimizer = create_rmsprop_optimizer(
lr=config["learning_rate"],
decay=optimizer_config["decay"],
momentum=optimizer_config["momentum"],
)
# train_model() is a placeholder for your actual training code
return train_model(optimizer)
```
**Example 3: Nested hierarchical spaces**
You can also nest dictionaries within the search space for organizing related hyperparameters:
```python
search_space = {
"preprocessing": {
"normalize": tune.choice([True, False]),
"feature_selection": tune.choice(["none", "pca", "lda"]),
},
"model": tune.choice(
[
{
"type": "neural_net",
"layers": tune.randint(1, 5),
"units_per_layer": tune.randint(32, 256),
},
{
"type": "ensemble",
"n_models": tune.randint(3, 10),
},
]
),
}
def evaluate_config(config):
# Access nested hyperparameters
normalize = config["preprocessing"]["normalize"]
feature_selection = config["preprocessing"]["feature_selection"]
model_config = config["model"]
# Use the hyperparameters accordingly
# train_with_config() is a placeholder for your actual training code
score = train_with_config(normalize, feature_selection, model_config)
return {"score": score}
```
**Notes:**
- When a configuration is sampled, only the selected branch of the hierarchical space will be active.
- The evaluation function should check which choice was selected and use the corresponding nested hyperparameters.
- Hierarchical search spaces work with all FLAML search algorithms (CFO, BlendSearch).
- You can specify `low_cost_partial_config` for hierarchical spaces as well by providing the path to the nested parameters.
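For instance, a low-cost starting point for the hierarchical search space of Example 1 might look like the following sketch (the values are illustrative assumptions, picking the cheap end of each range):

```python
# Illustrative only: a known-cheap configuration for the hierarchical
# "model" choice from Example 1 -- the tree branch with minimal sizes.
low_cost_partial_config = {
    "model": {
        "model_type": "tree",
        "n_estimators": 10,  # lower bound of tune.randint(10, 100)
        "max_depth": 3,      # lower bound of tune.randint(3, 10)
    },
    "batch_size": 32,
}
# It would then be passed to tune.run alongside the search space, e.g.:
# analysis = tune.run(evaluate_config, config=search_space,
#                     low_cost_partial_config=low_cost_partial_config,
#                     metric="score", mode="min", num_samples=20)
```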
#### Cost-related hyperparameters
Cost-related hyperparameters are a subset of the hyperparameters which directly affect the computation cost incurred in the evaluation of any hyperparameter configuration. For example, the number of estimators (`n_estimators`) and the maximum number of leaves (`max_leaves`) are known to affect the training cost of tree-based learners. So they are cost-related hyperparameters for tree-based learners.
When cost-related hyperparameters exist, the evaluation cost in the search space is heterogeneous.
In this case, designing a search space with proper ranges of the hyperparameter values is highly non-trivial. Classical tuning algorithms such as Bayesian optimization and random search are typically sensitive to such ranges. It may take them a very high cost to find a good choice if the ranges are too large. And if the ranges are too small, the optimal choice(s) may not be included and thus not possible to be found. With our method, you can use a search space with larger ranges in the case of heterogeneous cost.
Our search algorithms are designed to finish the tuning process at a low total cost when the evaluation cost in the search space is heterogeneous.
So in such scenarios, if you are aware of low-cost configurations for the cost-related hyperparameters, you are encouraged to set them as the `low_cost_partial_config`, which is a dictionary of a subset of the hyperparameter coordinates whose value corresponds to a configuration with known low cost. Using the example of the tree-based methods again, since we know that small `n_estimators` and `max_leaves` generally correspond to simpler models and thus lower cost, we set `{'n_estimators': 4, 'max_leaves': 4}` as the `low_cost_partial_config` by default (note that 4 is the lower bound of search space for these two hyperparameters), e.g., in LGBM. Please find more details on how the algorithm works [here](#cfo-frugal-optimization-for-cost-related-hyperparameters).
In addition, if you are aware of the cost relationship between different categorical hyperparameter choices, you are encouraged to provide this information through `cat_hp_cost`. It also helps the search algorithm to reduce the total cost.
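As a sketch of how these two hints fit together (the values below are assumptions for illustration, not FLAML defaults beyond the tree-learner example already mentioned):

```python
# low_cost_partial_config: a subset of hyperparameters pinned to values
# known to be cheap (here the lower bounds used for tree-based learners).
low_cost_partial_config = {"n_estimators": 4, "max_leaves": 4}

# cat_hp_cost: relative evaluation cost of each choice of a categorical
# hyperparameter, listed in the same order as the choices. The 1:8 ratio
# below is a made-up example for a hypothetical "boosting_type" choice.
cat_hp_cost = {"boosting_type": [1, 8]}

# Both are passed as keyword arguments of flaml.tune.run, e.g.:
# analysis = tune.run(evaluation_function, config=config, metric="score",
#                     mode="min", num_samples=20,
#                     low_cost_partial_config=low_cost_partial_config,
#                     cat_hp_cost=cat_hp_cost)
```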
@@ -202,7 +358,7 @@ Related arguments:
- `config_constraints` (optional): A list of config constraints to be satisfied.
- `metric_constraints` (optional): A list of metric constraints to be satisfied. e.g., `['precision', '>=', 0.9]`.
The third step is to specify constraints of the tuning task. One notable property of `flaml.tune` is that it is able to finish the tuning process (obtaining good results) within a required resource constraint. A user can either provide the resource constraint in terms of wall-clock time (in seconds) through the argument `time_budget_s`, or in terms of the number of trials through the argument `num_samples`. The following example shows three use cases:
```python
# Set a resource constraint of 60 seconds wall-clock time for the tuning.
@@ -295,8 +451,8 @@ Related arguments:
Details about parallel tuning with Spark could be found [here](/docs/Examples/Integrate%20-%20Spark#parallel-spark-jobs).
You can perform parallel tuning by specifying `use_ray=True` (requiring flaml[ray] option installed) or `use_spark=True`
(requiring flaml[spark] option installed). You can also limit the amount of resources allocated per trial by specifying `resources_per_trial`,
e.g., `resources_per_trial={'cpu': 2}` when `use_ray=True`.
```python
@@ -409,11 +565,11 @@ analysis = tune.run(
You can find more details about this scheduler in [this paper](https://arxiv.org/pdf/1911.04706.pdf).
#### 2. A scheduler of the [`TrialScheduler`](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-schedulers) class from `ray.tune`.
A handful of schedulers of this type are implemented in `ray.tune`, for example, [ASHA](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#asha-tune-schedulers-ashascheduler), [HyperBand](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-original-hyperband), [BOHB](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#tune-scheduler-bohb), etc.
To use this type of scheduler you can either (1) set `scheduler='asha'`, which will automatically create an [ASHAScheduler](https://docs.ray.io/en/latest/tune/api_docs/schedulers.html#asha-tune-schedulers-ashascheduler) instance using the provided inputs (`resource_attr`, `min_resource`, `max_resource`, and `reduction_factor`); or (2) create an instance yourself and provide it via `scheduler`, as shown in the following code example,
```python
# require: pip install flaml[ray]
@@ -589,7 +745,7 @@ NOTE:
## Hyperparameter Optimization Algorithm
To tune the hyperparameters toward your objective, you will want to use a hyperparameter optimization algorithm which can help suggest hyperparameters with better performance (regarding your objective). `flaml` offers two HPO methods: CFO and BlendSearch. `flaml.tune` uses BlendSearch by default when the option [blendsearch] is installed.
<!-- ![png](images/CFO.png) | ![png](images/BlendSearch.png)
:---:|:---: -->


@@ -15,6 +15,7 @@
'Installation',
{'Use Cases': [{type: 'autogenerated', dirName: 'Use-Cases'}]},
{'Examples': [{type: 'autogenerated', dirName: 'Examples'}]},
'Best-Practices',
'Contribute',
'Research',
],


@@ -1843,7 +1843,7 @@
"@jridgewell/resolve-uri" "^3.1.0"
"@jridgewell/sourcemap-codec" "^1.4.14"
"@jridgewell/trace-mapping@^0.3.20", "@jridgewell/trace-mapping@^0.3.24", "@jridgewell/trace-mapping@^0.3.25":
"@jridgewell/trace-mapping@^0.3.24", "@jridgewell/trace-mapping@^0.3.25":
version "0.3.25"
resolved "https://registry.yarnpkg.com/@jridgewell/trace-mapping/-/trace-mapping-0.3.25.tgz#15f190e98895f3fc23276ee14bc76b675c2e50f0"
integrity sha512-vNk6aEwybGtawWmy/PzwnGDOjCkLWSD2wqvjGGAgOAwCGWySYXfYoxt00IJkTF+8Lb57DwOb3Aa0o9CApepiYQ==
@@ -2218,10 +2218,26 @@
dependencies:
"@types/node" "*"
"@types/estree@^1.0.5":
version "1.0.5"
resolved "https://registry.yarnpkg.com/@types/estree/-/estree-1.0.5.tgz#a6ce3e556e00fd9895dd872dd172ad0d4bd687f4"
integrity sha512-/kYRxGDLWzHOB7q+wtSUQlFrtcdUccpfy+X+9iMBpHK8QLLhx2wIPYuS5DYtR9Wa/YlZAbIovy7qVdB1Aq6Lyw==
"@types/eslint-scope@^3.7.7":
version "3.7.7"
resolved "https://registry.yarnpkg.com/@types/eslint-scope/-/eslint-scope-3.7.7.tgz#3108bd5f18b0cdb277c867b3dd449c9ed7079ac5"
integrity sha512-MzMFlSLBqNF2gcHWO0G1vP/YQyfvrxZ0bF+u7mzUdZ1/xK4A4sru+nraZz5i3iEIk1l1uyicaDVTB4QbbEkAYg==
dependencies:
"@types/eslint" "*"
"@types/estree" "*"
"@types/eslint@*":
version "9.6.1"
resolved "https://registry.yarnpkg.com/@types/eslint/-/eslint-9.6.1.tgz#d5795ad732ce81715f27f75da913004a56751584"
integrity sha512-FXx2pKgId/WyYo2jXw63kk7/+TY7u7AziEJxJAnSFzHlqTAS3Ync6SvgYAN/k4/PQpnnVuzoMuVnByKK2qp0ag==
dependencies:
"@types/estree" "*"
"@types/json-schema" "*"
"@types/estree@*", "@types/estree@^1.0.8":
version "1.0.8"
resolved "https://registry.yarnpkg.com/@types/estree/-/estree-1.0.8.tgz#958b91c991b1867ced318bedea0e215ee050726e"
integrity sha512-dWHzHa2WqEXI/O1E9OjrocMTKJl2mSrEolh1Iomrv6U+JuNwaHXsXx9bLu5gG7BUWFIN0skIQJQ/L1rIex4X6w==
"@types/express-serve-static-core@*", "@types/express-serve-static-core@^4.17.18":
version "4.17.31"
@@ -2271,6 +2287,11 @@
dependencies:
"@types/node" "*"
"@types/json-schema@*", "@types/json-schema@^7.0.15":
version "7.0.15"
resolved "https://registry.yarnpkg.com/@types/json-schema/-/json-schema-7.0.15.tgz#596a1747233694d50f6ad8a7869fcb6f56cf5841"
integrity sha512-5+fP8P8MFNC+AyZCDxrB2pkZFPGzqQWUzpSeuuVLvm8VMcorNYavBqoFcxK8bQz4Qsbn4oUEEem4wDLfcysGHA==
"@types/json-schema@^7.0.4", "@types/json-schema@^7.0.5", "@types/json-schema@^7.0.8", "@types/json-schema@^7.0.9":
version "7.0.11"
resolved "https://registry.npmmirror.com/@types/json-schema/-/json-schema-7.0.11.tgz#d421b6c527a3037f7c84433fd2c4229e016863d3"
@@ -2407,125 +2428,125 @@
dependencies:
"@types/node" "*"
"@webassemblyjs/ast@1.12.1", "@webassemblyjs/ast@^1.12.1":
version "1.12.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/ast/-/ast-1.12.1.tgz#bb16a0e8b1914f979f45864c23819cc3e3f0d4bb"
integrity sha512-EKfMUOPRRUTy5UII4qJDGPpqfwjOmZ5jeGFwid9mnoqIFK+e0vqoi1qH56JpmZSzEL53jKnNzScdmftJyG5xWg==
"@webassemblyjs/ast@1.14.1", "@webassemblyjs/ast@^1.14.1":
version "1.14.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/ast/-/ast-1.14.1.tgz#a9f6a07f2b03c95c8d38c4536a1fdfb521ff55b6"
integrity sha512-nuBEDgQfm1ccRp/8bCQrx1frohyufl4JlbMMZ4P1wpeOfDhF6FQkxZJ1b/e+PLwr6X1Nhw6OLme5usuBWYBvuQ==
dependencies:
"@webassemblyjs/helper-numbers" "1.11.6"
"@webassemblyjs/helper-wasm-bytecode" "1.11.6"
"@webassemblyjs/helper-numbers" "1.13.2"
"@webassemblyjs/helper-wasm-bytecode" "1.13.2"
"@webassemblyjs/floating-point-hex-parser@1.11.6":
version "1.11.6"
resolved "https://registry.yarnpkg.com/@webassemblyjs/floating-point-hex-parser/-/floating-point-hex-parser-1.11.6.tgz#dacbcb95aff135c8260f77fa3b4c5fea600a6431"
integrity sha512-ejAj9hfRJ2XMsNHk/v6Fu2dGS+i4UaXBXGemOfQ/JfQ6mdQg/WXtwleQRLLS4OvfDhv8rYnVwH27YJLMyYsxhw==
"@webassemblyjs/floating-point-hex-parser@1.13.2":
version "1.13.2"
resolved "https://registry.yarnpkg.com/@webassemblyjs/floating-point-hex-parser/-/floating-point-hex-parser-1.13.2.tgz#fcca1eeddb1cc4e7b6eed4fc7956d6813b21b9fb"
integrity sha512-6oXyTOzbKxGH4steLbLNOu71Oj+C8Lg34n6CqRvqfS2O71BxY6ByfMDRhBytzknj9yGUPVJ1qIKhRlAwO1AovA==
"@webassemblyjs/helper-api-error@1.11.6":
version "1.11.6"
resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-api-error/-/helper-api-error-1.11.6.tgz#6132f68c4acd59dcd141c44b18cbebbd9f2fa768"
integrity sha512-o0YkoP4pVu4rN8aTJgAyj9hC2Sv5UlkzCHhxqWj8butaLvnpdc2jOwh4ewE6CX0txSfLn/UYaV/pheS2Txg//Q==
"@webassemblyjs/helper-api-error@1.13.2":
version "1.13.2"
resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-api-error/-/helper-api-error-1.13.2.tgz#e0a16152248bc38daee76dd7e21f15c5ef3ab1e7"
integrity sha512-U56GMYxy4ZQCbDZd6JuvvNV/WFildOjsaWD3Tzzvmw/mas3cXzRJPMjP83JqEsgSbyrmaGjBfDtV7KDXV9UzFQ==
"@webassemblyjs/helper-buffer@1.12.1":
version "1.12.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-buffer/-/helper-buffer-1.12.1.tgz#6df20d272ea5439bf20ab3492b7fb70e9bfcb3f6"
integrity sha512-nzJwQw99DNDKr9BVCOZcLuJJUlqkJh+kVzVl6Fmq/tI5ZtEyWT1KZMyOXltXLZJmDtvLCDgwsyrkohEtopTXCw==
"@webassemblyjs/helper-buffer@1.14.1":
version "1.14.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-buffer/-/helper-buffer-1.14.1.tgz#822a9bc603166531f7d5df84e67b5bf99b72b96b"
integrity sha512-jyH7wtcHiKssDtFPRB+iQdxlDf96m0E39yb0k5uJVhFGleZFoNw1c4aeIcVUPPbXUVJ94wwnMOAqUHyzoEPVMA==
"@webassemblyjs/helper-numbers@1.11.6":
version "1.11.6"
resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-numbers/-/helper-numbers-1.11.6.tgz#cbce5e7e0c1bd32cf4905ae444ef64cea919f1b5"
integrity sha512-vUIhZ8LZoIWHBohiEObxVm6hwP034jwmc9kuq5GdHZH0wiLVLIPcMCdpJzG4C11cHoQ25TFIQj9kaVADVX7N3g==
"@webassemblyjs/helper-numbers@1.13.2":
version "1.13.2"
resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-numbers/-/helper-numbers-1.13.2.tgz#dbd932548e7119f4b8a7877fd5a8d20e63490b2d"
integrity sha512-FE8aCmS5Q6eQYcV3gI35O4J789wlQA+7JrqTTpJqn5emA4U2hvwJmvFRC0HODS+3Ye6WioDklgd6scJ3+PLnEA==
dependencies:
"@webassemblyjs/floating-point-hex-parser" "1.11.6"
"@webassemblyjs/helper-api-error" "1.11.6"
"@webassemblyjs/floating-point-hex-parser" "1.13.2"
"@webassemblyjs/helper-api-error" "1.13.2"
"@xtuc/long" "4.2.2"
"@webassemblyjs/helper-wasm-bytecode@1.11.6":
version "1.11.6"
resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-wasm-bytecode/-/helper-wasm-bytecode-1.11.6.tgz#bb2ebdb3b83aa26d9baad4c46d4315283acd51e9"
integrity sha512-sFFHKwcmBprO9e7Icf0+gddyWYDViL8bpPjJJl0WHxCdETktXdmtWLGVzoHbqUcY4Be1LkNfwTmXOJUFZYSJdA==
"@webassemblyjs/helper-wasm-bytecode@1.13.2":
version "1.13.2"
resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-wasm-bytecode/-/helper-wasm-bytecode-1.13.2.tgz#e556108758f448aae84c850e593ce18a0eb31e0b"
integrity sha512-3QbLKy93F0EAIXLh0ogEVR6rOubA9AoZ+WRYhNbFyuB70j3dRdwH9g+qXhLAO0kiYGlg3TxDV+I4rQTr/YNXkA==
"@webassemblyjs/helper-wasm-section@1.12.1":
version "1.12.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-wasm-section/-/helper-wasm-section-1.12.1.tgz#3da623233ae1a60409b509a52ade9bc22a37f7bf"
integrity sha512-Jif4vfB6FJlUlSbgEMHUyk1j234GTNG9dBJ4XJdOySoj518Xj0oGsNi59cUQF4RRMS9ouBUxDDdyBVfPTypa5g==
"@webassemblyjs/helper-wasm-section@1.14.1":
version "1.14.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/helper-wasm-section/-/helper-wasm-section-1.14.1.tgz#9629dda9c4430eab54b591053d6dc6f3ba050348"
integrity sha512-ds5mXEqTJ6oxRoqjhWDU83OgzAYjwsCV8Lo/N+oRsNDmx/ZDpqalmrtgOMkHwxsG0iI//3BwWAErYRHtgn0dZw==
dependencies:
"@webassemblyjs/ast" "1.12.1"
"@webassemblyjs/helper-buffer" "1.12.1"
"@webassemblyjs/helper-wasm-bytecode" "1.11.6"
"@webassemblyjs/wasm-gen" "1.12.1"
"@webassemblyjs/ast" "1.14.1"
"@webassemblyjs/helper-buffer" "1.14.1"
"@webassemblyjs/helper-wasm-bytecode" "1.13.2"
"@webassemblyjs/wasm-gen" "1.14.1"
"@webassemblyjs/ieee754@1.11.6":
version "1.11.6"
resolved "https://registry.yarnpkg.com/@webassemblyjs/ieee754/-/ieee754-1.11.6.tgz#bb665c91d0b14fffceb0e38298c329af043c6e3a"
integrity sha512-LM4p2csPNvbij6U1f19v6WR56QZ8JcHg3QIJTlSwzFcmx6WSORicYj6I63f9yU1kEUtrpG+kjkiIAkevHpDXrg==
"@webassemblyjs/ieee754@1.13.2":
version "1.13.2"
resolved "https://registry.yarnpkg.com/@webassemblyjs/ieee754/-/ieee754-1.13.2.tgz#1c5eaace1d606ada2c7fd7045ea9356c59ee0dba"
integrity sha512-4LtOzh58S/5lX4ITKxnAK2USuNEvpdVV9AlgGQb8rJDHaLeHciwG4zlGr0j/SNWlr7x3vO1lDEsuePvtcDNCkw==
dependencies:
"@xtuc/ieee754" "^1.2.0"
"@webassemblyjs/leb128@1.11.6":
version "1.11.6"
resolved "https://registry.yarnpkg.com/@webassemblyjs/leb128/-/leb128-1.11.6.tgz#70e60e5e82f9ac81118bc25381a0b283893240d7"
integrity sha512-m7a0FhE67DQXgouf1tbN5XQcdWoNgaAuoULHIfGFIEVKA6tu/edls6XnIlkmS6FrXAquJRPni3ZZKjw6FSPjPQ==
"@webassemblyjs/leb128@1.13.2":
version "1.13.2"
resolved "https://registry.yarnpkg.com/@webassemblyjs/leb128/-/leb128-1.13.2.tgz#57c5c3deb0105d02ce25fa3fd74f4ebc9fd0bbb0"
integrity sha512-Lde1oNoIdzVzdkNEAWZ1dZ5orIbff80YPdHx20mrHwHrVNNTjNr8E3xz9BdpcGqRQbAEa+fkrCb+fRFTl/6sQw==
dependencies:
"@xtuc/long" "4.2.2"
"@webassemblyjs/utf8@1.11.6":
version "1.11.6"
resolved "https://registry.yarnpkg.com/@webassemblyjs/utf8/-/utf8-1.11.6.tgz#90f8bc34c561595fe156603be7253cdbcd0fab5a"
integrity sha512-vtXf2wTQ3+up9Zsg8sa2yWiQpzSsMyXj0qViVP6xKGCUT8p8YJ6HqI7l5eCnWx1T/FYdsv07HQs2wTFbbof/RA==
"@webassemblyjs/utf8@1.13.2":
version "1.13.2"
resolved "https://registry.yarnpkg.com/@webassemblyjs/utf8/-/utf8-1.13.2.tgz#917a20e93f71ad5602966c2d685ae0c6c21f60f1"
integrity sha512-3NQWGjKTASY1xV5m7Hr0iPeXD9+RDobLll3T9d2AO+g3my8xy5peVyjSag4I50mR1bBSN/Ct12lo+R9tJk0NZQ==
"@webassemblyjs/wasm-edit@^1.12.1":
version "1.12.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-edit/-/wasm-edit-1.12.1.tgz#9f9f3ff52a14c980939be0ef9d5df9ebc678ae3b"
integrity sha512-1DuwbVvADvS5mGnXbE+c9NfA8QRcZ6iKquqjjmR10k6o+zzsRVesil54DKexiowcFCPdr/Q0qaMgB01+SQ1u6g==
"@webassemblyjs/wasm-edit@^1.14.1":
version "1.14.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-edit/-/wasm-edit-1.14.1.tgz#ac6689f502219b59198ddec42dcd496b1004d597"
integrity sha512-RNJUIQH/J8iA/1NzlE4N7KtyZNHi3w7at7hDjvRNm5rcUXa00z1vRz3glZoULfJ5mpvYhLybmVcwcjGrC1pRrQ==
dependencies:
"@webassemblyjs/ast" "1.12.1"
"@webassemblyjs/helper-buffer" "1.12.1"
"@webassemblyjs/helper-wasm-bytecode" "1.11.6"
"@webassemblyjs/helper-wasm-section" "1.12.1"
"@webassemblyjs/wasm-gen" "1.12.1"
"@webassemblyjs/wasm-opt" "1.12.1"
"@webassemblyjs/wasm-parser" "1.12.1"
"@webassemblyjs/wast-printer" "1.12.1"
"@webassemblyjs/ast" "1.14.1"
"@webassemblyjs/helper-buffer" "1.14.1"
"@webassemblyjs/helper-wasm-bytecode" "1.13.2"
"@webassemblyjs/helper-wasm-section" "1.14.1"
"@webassemblyjs/wasm-gen" "1.14.1"
"@webassemblyjs/wasm-opt" "1.14.1"
"@webassemblyjs/wasm-parser" "1.14.1"
"@webassemblyjs/wast-printer" "1.14.1"
"@webassemblyjs/wasm-gen@1.12.1":
version "1.12.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-gen/-/wasm-gen-1.12.1.tgz#a6520601da1b5700448273666a71ad0a45d78547"
integrity sha512-TDq4Ojh9fcohAw6OIMXqiIcTq5KUXTGRkVxbSo1hQnSy6lAM5GSdfwWeSxpAo0YzgsgF182E/U0mDNhuA0tW7w==
"@webassemblyjs/wasm-gen@1.14.1":
version "1.14.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-gen/-/wasm-gen-1.14.1.tgz#991e7f0c090cb0bb62bbac882076e3d219da9570"
integrity sha512-AmomSIjP8ZbfGQhumkNvgC33AY7qtMCXnN6bL2u2Js4gVCg8fp735aEiMSBbDR7UQIj90n4wKAFUSEd0QN2Ukg==
dependencies:
"@webassemblyjs/ast" "1.12.1"
"@webassemblyjs/helper-wasm-bytecode" "1.11.6"
"@webassemblyjs/ieee754" "1.11.6"
"@webassemblyjs/leb128" "1.11.6"
"@webassemblyjs/utf8" "1.11.6"
"@webassemblyjs/ast" "1.14.1"
"@webassemblyjs/helper-wasm-bytecode" "1.13.2"
"@webassemblyjs/ieee754" "1.13.2"
"@webassemblyjs/leb128" "1.13.2"
"@webassemblyjs/utf8" "1.13.2"
"@webassemblyjs/wasm-opt@1.12.1":
version "1.12.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-opt/-/wasm-opt-1.12.1.tgz#9e6e81475dfcfb62dab574ac2dda38226c232bc5"
integrity sha512-Jg99j/2gG2iaz3hijw857AVYekZe2SAskcqlWIZXjji5WStnOpVoat3gQfT/Q5tb2djnCjBtMocY/Su1GfxPBg==
"@webassemblyjs/wasm-opt@1.14.1":
version "1.14.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-opt/-/wasm-opt-1.14.1.tgz#e6f71ed7ccae46781c206017d3c14c50efa8106b"
integrity sha512-PTcKLUNvBqnY2U6E5bdOQcSM+oVP/PmrDY9NzowJjislEjwP/C4an2303MCVS2Mg9d3AJpIGdUFIQQWbPds0Sw==
dependencies:
"@webassemblyjs/ast" "1.12.1"
"@webassemblyjs/helper-buffer" "1.12.1"
"@webassemblyjs/wasm-gen" "1.12.1"
"@webassemblyjs/wasm-parser" "1.12.1"
"@webassemblyjs/ast" "1.14.1"
"@webassemblyjs/helper-buffer" "1.14.1"
"@webassemblyjs/wasm-gen" "1.14.1"
"@webassemblyjs/wasm-parser" "1.14.1"
"@webassemblyjs/wasm-parser@1.12.1", "@webassemblyjs/wasm-parser@^1.12.1":
version "1.12.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-parser/-/wasm-parser-1.12.1.tgz#c47acb90e6f083391e3fa61d113650eea1e95937"
integrity sha512-xikIi7c2FHXysxXe3COrVUPSheuBtpcfhbpFj4gmu7KRLYOzANztwUU0IbsqvMqzuNK2+glRGWCEqZo1WCLyAQ==
"@webassemblyjs/wasm-parser@1.14.1", "@webassemblyjs/wasm-parser@^1.14.1":
version "1.14.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/wasm-parser/-/wasm-parser-1.14.1.tgz#b3e13f1893605ca78b52c68e54cf6a865f90b9fb"
integrity sha512-JLBl+KZ0R5qB7mCnud/yyX08jWFw5MsoalJ1pQ4EdFlgj9VdXKGuENGsiCIjegI1W7p91rUlcB/LB5yRJKNTcQ==
dependencies:
"@webassemblyjs/ast" "1.12.1"
"@webassemblyjs/helper-api-error" "1.11.6"
"@webassemblyjs/helper-wasm-bytecode" "1.11.6"
"@webassemblyjs/ieee754" "1.11.6"
"@webassemblyjs/leb128" "1.11.6"
"@webassemblyjs/utf8" "1.11.6"
"@webassemblyjs/ast" "1.14.1"
"@webassemblyjs/helper-api-error" "1.13.2"
"@webassemblyjs/helper-wasm-bytecode" "1.13.2"
"@webassemblyjs/ieee754" "1.13.2"
"@webassemblyjs/leb128" "1.13.2"
"@webassemblyjs/utf8" "1.13.2"
"@webassemblyjs/wast-printer@1.12.1":
version "1.12.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/wast-printer/-/wast-printer-1.12.1.tgz#bcecf661d7d1abdaf989d8341a4833e33e2b31ac"
integrity sha512-+X4WAlOisVWQMikjbcvY2e0rwPsKQ9F688lksZhBcPycBBuii3O7m8FACbDMWDojpAqvjIncrG8J0XHKyQfVeA==
"@webassemblyjs/wast-printer@1.14.1":
version "1.14.1"
resolved "https://registry.yarnpkg.com/@webassemblyjs/wast-printer/-/wast-printer-1.14.1.tgz#3bb3e9638a8ae5fdaf9610e7a06b4d9f9aa6fe07"
integrity sha512-kPSSXE6De1XOR820C90RIo2ogvZG+c3KiHzqUoO/F34Y2shGzesfqv7o57xrxovZJH/MetF5UjroJ/R/3isoiw==
dependencies:
"@webassemblyjs/ast" "1.12.1"
"@webassemblyjs/ast" "1.14.1"
"@xtuc/long" "4.2.2"
"@xtuc/ieee754@^1.2.0":
@@ -2551,10 +2572,10 @@ acorn-dynamic-import@^4.0.0:
   resolved "https://registry.npmmirror.com/acorn-dynamic-import/-/acorn-dynamic-import-4.0.0.tgz#482210140582a36b83c3e342e1cfebcaa9240948"
   integrity sha512-d3OEjQV4ROpoflsnUA8HozoIR504TFxNivYEUi6uwz0IYhBkTDXGuWlNdMtybRt3nqVx/L6XqMt0FxkXuWKZhw==
 
-acorn-import-attributes@^1.9.5:
-  version "1.9.5"
-  resolved "https://registry.yarnpkg.com/acorn-import-attributes/-/acorn-import-attributes-1.9.5.tgz#7eb1557b1ba05ef18b5ed0ec67591bfab04688ef"
-  integrity sha512-n02Vykv5uA3eHGM/Z2dQrcD56kL8TyDb2p1+0P83PClMnC/nc+anbQRhIOWnSq4Ke/KvDPrY3C9hDtC/A3eHnQ==
+acorn-import-phases@^1.0.3:
+  version "1.0.4"
+  resolved "https://registry.yarnpkg.com/acorn-import-phases/-/acorn-import-phases-1.0.4.tgz#16eb850ba99a056cb7cbfe872ffb8972e18c8bd7"
+  integrity sha512-wKmbr/DDiIXzEOiWrTTUcDm24kQ2vGfZQvM2fwg2vXqR5uW6aapr7ObPtj1th32b9u90/Pf4AItvdTh42fBmVQ==
 
 acorn-jsx@^5.0.1:
   version "5.3.2"
@@ -2571,15 +2592,15 @@ acorn@^6.1.1:
   resolved "https://registry.npmmirror.com/acorn/-/acorn-6.4.2.tgz#35866fd710528e92de10cf06016498e47e39e1e6"
   integrity sha512-XtGIhXwF8YM8bJhGxG5kXgjkEuNGLTkoYqVE+KMR+aspr4KGYmKYg7yUe3KghyQ9yheNwLnjmzh/7+gfDBmHCQ==
 
-acorn@^8.0.4, acorn@^8.5.0, acorn@^8.7.1:
+acorn@^8.0.4, acorn@^8.5.0:
   version "8.8.1"
   resolved "https://registry.npmmirror.com/acorn/-/acorn-8.8.1.tgz#0a3f9cbecc4ec3bea6f0a80b66ae8dd2da250b73"
   integrity sha512-7zFpHzhnqYKrkYdUjF1HI1bzd0VygEGX8lFk4k5zVMqHEoES+P+7TKI+EvLO9WVMJ8eekdO0aDEK044xTXwPPA==
 
-acorn@^8.8.2:
-  version "8.12.1"
-  resolved "https://registry.yarnpkg.com/acorn/-/acorn-8.12.1.tgz#71616bdccbe25e27a54439e0046e89ca76df2248"
-  integrity sha512-tcpGyI9zbizT9JbV6oYE477V6mTlXvvi0T0G3SNIYE2apm/G5huBa1+K89VGeovbg+jycCrfhl3ADxErOuO6Jg==
+acorn@^8.15.0:
+  version "8.15.0"
+  resolved "https://registry.yarnpkg.com/acorn/-/acorn-8.15.0.tgz#a360898bc415edaac46c8241f6383975b930b816"
+  integrity sha512-NZyJarBfL7nWwIq+FDL6Zp/yHEhePMNnnJ0y3qfieCrmNvYct8uvtiV41UvlSe6apAfk0fY1FbWx+NwfmpvtTg==
 
 address@^1.0.1, address@^1.1.2:
   version "1.2.1"
@@ -2606,7 +2627,7 @@ ajv-keywords@^3.4.1, ajv-keywords@^3.5.2:
   resolved "https://registry.npmmirror.com/ajv-keywords/-/ajv-keywords-3.5.2.tgz#31f29da5ab6e00d1c2d329acf7b5929614d5014d"
   integrity sha512-5p6WTN0DdTGVQk6VjcEju19IgaHudalcfabD7yhDGeA6bcQnmL+CpveLJq/3hvfwd1aof6L386Ougkx6RfyMIQ==
 
-ajv-keywords@^5.0.0:
+ajv-keywords@^5.0.0, ajv-keywords@^5.1.0:
   version "5.1.0"
   resolved "https://registry.npmmirror.com/ajv-keywords/-/ajv-keywords-5.1.0.tgz#69d4d385a4733cdbeab44964a1170a88f87f0e16"
   integrity sha512-YCS/JNFAUyr5vAuhk1DWm1CBxRHW9LbJ2ozWeemrIqpbsqKjHVxYPyi5GC0rjZIT5JxJ3virVTS8wk4i/Z+krw==
@@ -2633,6 +2654,16 @@ ajv@^8.0.0, ajv@^8.8.0:
     require-from-string "^2.0.2"
     uri-js "^4.2.2"
 
+ajv@^8.9.0:
+  version "8.17.1"
+  resolved "https://registry.yarnpkg.com/ajv/-/ajv-8.17.1.tgz#37d9a5c776af6bc92d7f4f9510eba4c0a60d11a6"
+  integrity sha512-B/gBuNg5SiMTrPkC+A2+cW0RszwxYmn6VYxB/inlBStS5nx6xHIt/ehKRhIMhqusl7a8LjQoZnjCs5vhwxOQ1g==
+  dependencies:
+    fast-deep-equal "^3.1.3"
+    fast-uri "^3.0.1"
+    json-schema-traverse "^1.0.0"
+    require-from-string "^2.0.2"
+
 algoliasearch-helper@^3.5.5:
   version "3.26.0"
   resolved "https://registry.yarnpkg.com/algoliasearch-helper/-/algoliasearch-helper-3.26.0.tgz#d6e283396a9fc5bf944f365dc3b712570314363f"
@@ -2842,6 +2873,11 @@ base16@^1.0.0:
   resolved "https://registry.npmmirror.com/base16/-/base16-1.0.0.tgz#e297f60d7ec1014a7a971a39ebc8a98c0b681e70"
   integrity sha512-pNdYkNPiJUnEhnfXV56+sQy8+AaPcG3POZAUnwr4EeqCUZFz4u2PePbo3e5Gj4ziYPCWGUZT9RHisvJKnwFuBQ==
 
+baseline-browser-mapping@^2.9.0:
+  version "2.9.19"
+  resolved "https://registry.yarnpkg.com/baseline-browser-mapping/-/baseline-browser-mapping-2.9.19.tgz#3e508c43c46d961eb4d7d2e5b8d1dd0f9ee4f488"
+  integrity sha512-ipDqC8FrAl/76p2SSWKSI+H9tFwm7vYqXQrItCuiVPt26Km0jS+NzSsBWAaBusvSbQcfJG+JitdMm+wZAgTYqg==
+
 batch@0.6.1:
   version "0.6.1"
   resolved "https://registry.npmmirror.com/batch/-/batch-0.6.1.tgz#dc34314f4e679318093fc760272525f94bf25c16"
@@ -2929,15 +2965,16 @@ browserslist@^4.0.0, browserslist@^4.16.5, browserslist@^4.16.6, browserslist@^4
     node-releases "^2.0.6"
     update-browserslist-db "^1.0.9"
 
-browserslist@^4.21.10:
-  version "4.23.3"
-  resolved "https://registry.yarnpkg.com/browserslist/-/browserslist-4.23.3.tgz#debb029d3c93ebc97ffbc8d9cbb03403e227c800"
-  integrity sha512-btwCFJVjI4YWDNfau8RhZ+B1Q/VLoUITrm3RlP6y1tYGWIOa+InuYiRGXUBXo8nA1qKmHMyLB/iVQg5TT4eFoA==
+browserslist@^4.28.1:
+  version "4.28.1"
+  resolved "https://registry.yarnpkg.com/browserslist/-/browserslist-4.28.1.tgz#7f534594628c53c63101079e27e40de490456a95"
+  integrity sha512-ZC5Bd0LgJXgwGqUknZY/vkUQ04r8NXnJZ3yYi4vDmSiZmC/pdSN0NbNRPxZpbtO4uAfDUAFffO8IZoM3Gj8IkA==
   dependencies:
-    caniuse-lite "^1.0.30001646"
-    electron-to-chromium "^1.5.4"
-    node-releases "^2.0.18"
-    update-browserslist-db "^1.1.0"
+    baseline-browser-mapping "^2.9.0"
+    caniuse-lite "^1.0.30001759"
+    electron-to-chromium "^1.5.263"
+    node-releases "^2.0.27"
+    update-browserslist-db "^1.2.0"
 
 buble-jsx-only@^0.19.8:
   version "0.19.8"
@@ -3037,11 +3074,16 @@ caniuse-api@^3.0.0:
     lodash.memoize "^4.1.2"
     lodash.uniq "^4.5.0"
 
-caniuse-lite@^1.0.0, caniuse-lite@^1.0.30001400, caniuse-lite@^1.0.30001426, caniuse-lite@^1.0.30001646:
+caniuse-lite@^1.0.0, caniuse-lite@^1.0.30001400, caniuse-lite@^1.0.30001426:
   version "1.0.30001718"
   resolved "https://registry.npmjs.org/caniuse-lite/-/caniuse-lite-1.0.30001718.tgz"
   integrity sha512-AflseV1ahcSunK53NfEs9gFWgOEmzr0f+kaMFA4xiLZlr9Hzt7HxcSpIFcnNCUkz6R6dWKa54rUz3HUmI3nVcw==
 
+caniuse-lite@^1.0.30001759:
+  version "1.0.30001769"
+  resolved "https://registry.yarnpkg.com/caniuse-lite/-/caniuse-lite-1.0.30001769.tgz#1ad91594fad7dc233777c2781879ab5409f7d9c2"
+  integrity sha512-BCfFL1sHijQlBGWBMuJyhZUhzo7wer5sVj9hqekB/7xn0Ypy+pER/edCYQm4exbXj4WiySGp40P8UuTh6w1srg==
+
 ccount@^1.0.0, ccount@^1.0.3:
   version "1.1.0"
   resolved "https://registry.npmmirror.com/ccount/-/ccount-1.1.0.tgz#246687debb6014735131be8abab2d93898f8d043"
@@ -3849,10 +3891,10 @@ electron-to-chromium@^1.4.251:
   resolved "https://registry.npmmirror.com/electron-to-chromium/-/electron-to-chromium-1.4.284.tgz#61046d1e4cab3a25238f6bf7413795270f125592"
   integrity sha512-M8WEXFuKXMYMVr45fo8mq0wUrrJHheiKZf6BArTKk9ZBYCKJEOU5H8cdWgDT+qCVZf7Na4lVUaZsA+h6uA9+PA==
 
-electron-to-chromium@^1.5.4:
-  version "1.5.14"
-  resolved "https://registry.yarnpkg.com/electron-to-chromium/-/electron-to-chromium-1.5.14.tgz#8de5fd941f4deede999f90503c4b5923fbe1962b"
-  integrity sha512-bEfPECb3fJ15eaDnu9LEJ2vPGD6W1vt7vZleSVyFhYuMIKm3vz/g9lt7IvEzgdwj58RjbPKUF2rXTCN/UW47tQ==
+electron-to-chromium@^1.5.263:
+  version "1.5.286"
+  resolved "https://registry.yarnpkg.com/electron-to-chromium/-/electron-to-chromium-1.5.286.tgz#142be1ab5e1cd5044954db0e5898f60a4960384e"
+  integrity sha512-9tfDXhJ4RKFNerfjdCcZfufu49vg620741MNs26a9+bhLThdB+plgMeou98CAaHu/WATj2iHOOHTp1hWtABj2A==
 
 emoji-regex@^8.0.0:
   version "8.0.0"
@@ -3886,13 +3928,13 @@ end-of-stream@^1.1.0:
   dependencies:
     once "^1.4.0"
 
-enhanced-resolve@^5.17.1:
-  version "5.17.1"
-  resolved "https://registry.yarnpkg.com/enhanced-resolve/-/enhanced-resolve-5.17.1.tgz#67bfbbcc2f81d511be77d686a90267ef7f898a15"
-  integrity sha512-LMHl3dXhTcfv8gM4kEzIUeTQ+7fpdA0l2tUf34BddXPkz2A5xJ5L/Pchd5BL6rdccM9QGvu0sWZzK1Z1t4wwyg==
+enhanced-resolve@^5.19.0:
+  version "5.19.0"
+  resolved "https://registry.yarnpkg.com/enhanced-resolve/-/enhanced-resolve-5.19.0.tgz#6687446a15e969eaa63c2fa2694510e17ae6d97c"
+  integrity sha512-phv3E1Xl4tQOShqSte26C7Fl84EwUdZsyOuSSk9qtAGyyQs2s3jJzComh+Abf4g187lUUAvH+H26omrqia2aGg==
   dependencies:
     graceful-fs "^4.2.4"
-    tapable "^2.2.0"
+    tapable "^2.3.0"
 
 entities@^2.0.0:
   version "2.2.0"
@@ -3958,10 +4000,10 @@ es-errors@^1.3.0:
   resolved "https://registry.yarnpkg.com/es-errors/-/es-errors-1.3.0.tgz#05f75a25dab98e4fb1dcd5e1472c0546d5057c8f"
   integrity sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==
 
-es-module-lexer@^1.2.1:
-  version "1.5.4"
-  resolved "https://registry.yarnpkg.com/es-module-lexer/-/es-module-lexer-1.5.4.tgz#a8efec3a3da991e60efa6b633a7cad6ab8d26b78"
-  integrity sha512-MVNK56NiMrOwitFB7cqDwq0CQutbw+0BvLshJSse0MUNU+y1FC3bUS/AQg7oUng+/wKrrki7JfmwtVHkVfPLlw==
+es-module-lexer@^2.0.0:
+  version "2.0.0"
+  resolved "https://registry.yarnpkg.com/es-module-lexer/-/es-module-lexer-2.0.0.tgz#f657cd7a9448dcdda9c070a3cb75e5dc1e85f5b1"
+  integrity sha512-5POEcUuZybH7IdmGsD8wlf0AI55wMecM9rVBTI/qEAy2c1kTOm3DjFYjrBdI2K3BaJjJYfYFeRtM0t9ssnRuxw==
 
 es-to-primitive@^1.2.1:
   version "1.2.1"
@@ -3977,7 +4019,7 @@ escalade@^3.1.1:
   resolved "https://registry.npmmirror.com/escalade/-/escalade-3.1.1.tgz#d8cfdc7000965c5a0174b4a82eaa5c0552742e40"
   integrity sha512-k0er2gUkLf8O0zKJiAhmkTnJlTvINGv7ygDNPbeIsX/TJjGJZHuh9B2UxbsaEkmlEo9MfhrSzmhIlhRlI2GXnw==
 
-escalade@^3.1.2:
+escalade@^3.2.0:
   version "3.2.0"
   resolved "https://registry.yarnpkg.com/escalade/-/escalade-3.2.0.tgz#011a3f69856ba189dffa7dc8fcce99d2a87903e5"
   integrity sha512-WUj2qlxaQtO4g6Pq5c29GTcWGDyd8itL8zTlipgECz3JesAiiOKotd8JU6otB3PACgG6xkJUyVhboMS+bje/jA==
@@ -4155,6 +4197,11 @@ fast-json-stable-stringify@^2.0.0:
   resolved "https://registry.npmmirror.com/fast-json-stable-stringify/-/fast-json-stable-stringify-2.1.0.tgz#874bf69c6f404c2b5d99c481341399fd55892633"
   integrity sha512-lhd/wF+Lk98HZoTCtlVraHtfh5XYijIjalXck7saUtuanSDyLMxnHhSXEDJqHxD7msR8D0uCmqlkwjCV8xvwHw==
 
+fast-uri@^3.0.1:
+  version "3.1.0"
+  resolved "https://registry.yarnpkg.com/fast-uri/-/fast-uri-3.1.0.tgz#66eecff6c764c0df9b762e62ca7edcfb53b4edfa"
+  integrity sha512-iPeeDKJSWf4IEOasVVrknXpaBV0IApz/gp7S2bb7Z4Lljbl2MGJRqInZiUrQwV16cpzw/D3S5j5Julj/gT52AA==
+
 fast-url-parser@1.1.3:
   version "1.1.3"
   resolved "https://registry.npmmirror.com/fast-url-parser/-/fast-url-parser-1.1.3.tgz#f4af3ea9f34d8a271cf58ad2b3759f431f0b318d"
@@ -5426,10 +5473,10 @@ lines-and-columns@^1.1.6:
   resolved "https://registry.npmmirror.com/lines-and-columns/-/lines-and-columns-1.2.4.tgz#eca284f75d2965079309dc0ad9255abb2ebc1632"
   integrity sha512-7ylylesZQ/PV29jhEDl3Ufjo6ZX7gCqJr5F7PKrqc93v7fzSymt1BpwEU8nAUXs8qzzvqhbjhK5QZg6Mt/HkBg==
 
-loader-runner@^4.2.0:
-  version "4.3.0"
-  resolved "https://registry.npmmirror.com/loader-runner/-/loader-runner-4.3.0.tgz#c1b4a163b99f614830353b16755e7149ac2314e1"
-  integrity sha512-3R/1M+yS3j5ou80Me59j7F9IMs4PXs3VqRrm0TU3AbKPxlmpoY1TNscJV/oGJXo8qCatFGTfDbY6W6ipGOYXfg==
+loader-runner@^4.3.1:
+  version "4.3.1"
+  resolved "https://registry.yarnpkg.com/loader-runner/-/loader-runner-4.3.1.tgz#6c76ed29b0ccce9af379208299f07f876de737e3"
+  integrity sha512-IWqP2SCPhyVFTBtRcgMHdzlf9ul25NwaFx4wCEH/KjAXuuHY4yNjvPXsBokp8jCB936PyWRaPKUNh8NvylLp2Q==
 
 loader-utils@2.0.4, loader-utils@^2.0.0:
   version "2.0.4"
@@ -5488,9 +5535,9 @@ lodash.uniq@4.5.0, lodash.uniq@^4.5.0:
   integrity sha512-xfBaXQd9ryd9dlSDvnvI0lvxfLJlYAZzXomUYzLKtUeOQvOP5piqAWuGtrhWeqaXK9hhoM/iyJc5AV+XfsX3HQ==
 
 lodash@^4.17.19, lodash@^4.17.20, lodash@^4.17.21:
-  version "4.17.21"
-  resolved "https://registry.npmmirror.com/lodash/-/lodash-4.17.21.tgz#679591c564c3bffaae8454cf0b3df370c3d6911c"
-  integrity sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==
+  version "4.17.23"
+  resolved "https://registry.yarnpkg.com/lodash/-/lodash-4.17.23.tgz#f113b0378386103be4f6893388c73d0bde7f2c5a"
+  integrity sha512-LgVTMpQtIopCi79SJeDiP0TfWi5CNEc/L/aRdTh3yIvmZXTnheWpKjSZhnvMl8iXbC1tFg9gdHHDMLoV7CnG+w==
 
 loose-envify@^1.0.0, loose-envify@^1.1.0, loose-envify@^1.2.0, loose-envify@^1.3.1, loose-envify@^1.4.0:
   version "1.4.0"
@@ -5787,10 +5834,10 @@ node-forge@1.3.0, node-forge@^1:
   resolved "https://registry.npmmirror.com/node-forge/-/node-forge-1.3.0.tgz#37a874ea723855f37db091e6c186e5b67a01d4b2"
   integrity sha512-08ARB91bUi6zNKzVmaj3QO7cr397uiDT2nJ63cHjyNtCTWIgvS47j3eT0WfzUwS9+6Z5YshRaoasFkXCKrIYbA==
 
-node-releases@^2.0.18:
-  version "2.0.18"
-  resolved "https://registry.yarnpkg.com/node-releases/-/node-releases-2.0.18.tgz#f010e8d35e2fe8d6b2944f03f70213ecedc4ca3f"
-  integrity sha512-d9VeXT4SJ7ZeOqGX6R5EM022wpL+eWPooLI+5UpWn2jCT1aosUQEhQP214x33Wkwx3JQMvIm+tIoVOdodFS40g==
+node-releases@^2.0.27:
+  version "2.0.27"
+  resolved "https://registry.yarnpkg.com/node-releases/-/node-releases-2.0.27.tgz#eedca519205cf20f650f61d56b070db111231e4e"
+  integrity sha512-nmh3lCkYZ3grZvqcCH+fjmQ7X+H0OeZgP40OierEaAptX4XofMh5kwNbWh7lBduUzCcV/8kZ+NDLCwm2iorIlA==
 
 node-releases@^2.0.6:
   version "2.0.6"
@@ -6140,10 +6187,10 @@ picocolors@^1.0.0:
   resolved "https://registry.npmmirror.com/picocolors/-/picocolors-1.0.0.tgz#cb5bdc74ff3f51892236eaf79d68bc44564ab81c"
   integrity sha512-1fygroTLlHu66zi26VoTDv8yRgm0Fccecssto+MhsZ0D/DGW2sm8E8AjW7NU5VVTRt5GxbeZ5qBuJr+HyLYkjQ==
 
-picocolors@^1.0.1:
-  version "1.1.0"
-  resolved "https://registry.yarnpkg.com/picocolors/-/picocolors-1.1.0.tgz#5358b76a78cde483ba5cef6a9dc9671440b27d59"
-  integrity sha512-TQ92mBOW0l3LeMeyLV6mzy/kWr8lkd/hp3mTg7wYK7zJhuBStmGMBG0BdeDZS/dZx1IukaX6Bk11zcln25o1Aw==
+picocolors@^1.1.1:
+  version "1.1.1"
+  resolved "https://registry.yarnpkg.com/picocolors/-/picocolors-1.1.1.tgz#3d321af3eab939b083c8f929a1d12cda81c26b6b"
+  integrity sha512-xceH2snhtb5M9liqDsmEw56le376mTZkEX/jEb/RxNFyegNul7eNslCXP9FDj/Lcu0X8KEyMceP2ntpaHrDEVA==
 
 picomatch@^2.0.4, picomatch@^2.2.1, picomatch@^2.3.1:
   version "2.3.1"
@@ -7227,15 +7274,6 @@ schema-utils@^3.0.0, schema-utils@^3.1.1:
     ajv "^6.12.5"
     ajv-keywords "^3.5.2"
 
-schema-utils@^3.2.0:
-  version "3.3.0"
-  resolved "https://registry.yarnpkg.com/schema-utils/-/schema-utils-3.3.0.tgz#f50a88877c3c01652a15b622ae9e9795df7a60fe"
-  integrity sha512-pN/yOAvcC+5rQ5nERGuwrjLlYvLTbCibnZ1I7B1LaiAz9BRBlE9GMgE/eqV30P7aJQUf7Ddimy/RsbYO/GrVGg==
-  dependencies:
-    "@types/json-schema" "^7.0.8"
-    ajv "^6.12.5"
-    ajv-keywords "^3.5.2"
-
 schema-utils@^4.0.0:
   version "4.0.0"
   resolved "https://registry.npmmirror.com/schema-utils/-/schema-utils-4.0.0.tgz#60331e9e3ae78ec5d16353c467c34b3a0a1d3df7"
@@ -7246,6 +7284,16 @@ schema-utils@^4.0.0:
     ajv-formats "^2.1.1"
     ajv-keywords "^5.0.0"
 
+schema-utils@^4.3.0, schema-utils@^4.3.3:
+  version "4.3.3"
+  resolved "https://registry.yarnpkg.com/schema-utils/-/schema-utils-4.3.3.tgz#5b1850912fa31df90716963d45d9121fdfc09f46"
+  integrity sha512-eflK8wEtyOE6+hsaRVPxvUKYCpRgzLqDTb8krvAsRIwOGlHoSgYLgBXoubGgLd2fT41/OUYdb48v4k4WWHQurA==
+  dependencies:
+    "@types/json-schema" "^7.0.9"
+    ajv "^8.9.0"
+    ajv-formats "^2.1.1"
+    ajv-keywords "^5.1.0"
+
 section-matter@^1.0.0:
   version "1.0.0"
   resolved "https://registry.npmmirror.com/section-matter/-/section-matter-1.0.0.tgz#e9041953506780ec01d59f292a19c7b850b84167"
@@ -7309,7 +7357,7 @@ send@0.19.0:
     range-parser "~1.2.1"
     statuses "2.0.1"
 
-serialize-javascript@^6.0.0, serialize-javascript@^6.0.1:
+serialize-javascript@^6.0.0, serialize-javascript@^6.0.2:
   version "6.0.2"
   resolved "https://registry.yarnpkg.com/serialize-javascript/-/serialize-javascript-6.0.2.tgz#defa1e055c83bf6d59ea805d8da862254eb6a6c2"
   integrity sha512-Saa1xPByTTq2gdeFZYLLo+RFE35NHZkAbqZeWNd3BpzppeVisAqpDjcp8dyf6uIvEqJRd46jemmyA4iFIeVk8g==
@@ -7738,11 +7786,16 @@ tapable@^1.0.0:
   resolved "https://registry.npmmirror.com/tapable/-/tapable-1.1.3.tgz#a1fccc06b58db61fd7a45da2da44f5f3a3e67ba2"
   integrity sha512-4WK/bYZmj8xLr+HUCODHGF1ZFzsYffasLUgEiMBY4fgtltdO6B4WJtlSbPaDTLpYTcGVwM2qLnFTICEcNxs3kA==
 
-tapable@^2.0.0, tapable@^2.1.1, tapable@^2.2.0:
+tapable@^2.0.0:
   version "2.2.1"
   resolved "https://registry.npmmirror.com/tapable/-/tapable-2.2.1.tgz#1967a73ef4060a82f12ab96af86d52fdb76eeca0"
   integrity sha512-GNzQvQTOIP6RyTfE2Qxb8ZVlNmw0n88vp1szwWRimP02mnTsx3Wtn5qRdqY9w2XduFNUgvOwhNnQsjwCp+kqaQ==
 
+tapable@^2.3.0:
+  version "2.3.0"
+  resolved "https://registry.yarnpkg.com/tapable/-/tapable-2.3.0.tgz#7e3ea6d5ca31ba8e078b560f0d83ce9a14aa8be6"
+  integrity sha512-g9ljZiwki/LfxmQADO3dEY1CbpmXT5Hm2fJ+QaGKwSXUylMybePR7/67YW7jOrrvjEgL1Fmz5kzyAjWVWLlucg==
+
 terser-webpack-plugin@^5.2.4:
   version "5.3.6"
   resolved "https://registry.npmmirror.com/terser-webpack-plugin/-/terser-webpack-plugin-5.3.6.tgz#5590aec31aa3c6f771ce1b1acca60639eab3195c"
@@ -7754,16 +7807,16 @@ terser-webpack-plugin@^5.2.4:
     serialize-javascript "^6.0.0"
     terser "^5.14.1"
 
-terser-webpack-plugin@^5.3.10:
-  version "5.3.10"
-  resolved "https://registry.yarnpkg.com/terser-webpack-plugin/-/terser-webpack-plugin-5.3.10.tgz#904f4c9193c6fd2a03f693a2150c62a92f40d199"
-  integrity sha512-BKFPWlPDndPs+NGGCr1U59t0XScL5317Y0UReNrHaw9/FwhPENlq6bfgs+4yPfyP51vqC1bQ4rp1EfXW5ZSH9w==
+terser-webpack-plugin@^5.3.16:
+  version "5.3.16"
+  resolved "https://registry.yarnpkg.com/terser-webpack-plugin/-/terser-webpack-plugin-5.3.16.tgz#741e448cc3f93d8026ebe4f7ef9e4afacfd56330"
+  integrity sha512-h9oBFCWrq78NyWWVcSwZarJkZ01c2AyGrzs1crmHZO3QUg9D61Wu4NPjBy69n7JqylFF5y+CsUZYmYEIZ3mR+Q==
   dependencies:
-    "@jridgewell/trace-mapping" "^0.3.20"
+    "@jridgewell/trace-mapping" "^0.3.25"
     jest-worker "^27.4.5"
-    schema-utils "^3.1.1"
-    serialize-javascript "^6.0.1"
-    terser "^5.26.0"
+    schema-utils "^4.3.0"
+    serialize-javascript "^6.0.2"
+    terser "^5.31.1"
 
 terser@^5.10.0, terser@^5.14.1:
   version "5.15.1"
@@ -7775,13 +7828,13 @@ terser@^5.10.0, terser@^5.14.1:
     commander "^2.20.0"
     source-map-support "~0.5.20"
 
-terser@^5.26.0:
-  version "5.31.6"
-  resolved "https://registry.yarnpkg.com/terser/-/terser-5.31.6.tgz#c63858a0f0703988d0266a82fcbf2d7ba76422b1"
-  integrity sha512-PQ4DAriWzKj+qgehQ7LK5bQqCFNMmlhjR2PFFLuqGCpuCAauxemVBWwWOxo3UIwWQx8+Pr61Df++r76wDmkQBg==
+terser@^5.31.1:
+  version "5.46.0"
+  resolved "https://registry.yarnpkg.com/terser/-/terser-5.46.0.tgz#1b81e560d584bbdd74a8ede87b4d9477b0ff9695"
+  integrity sha512-jTwoImyr/QbOWFFso3YoU3ik0jBBDJ6JTOQiy/J2YxVJdZCc+5u7skhNwiOR3FQIygFqVUPHl7qbbxtjW2K3Qg==
   dependencies:
     "@jridgewell/source-map" "^0.3.3"
-    acorn "^8.8.2"
+    acorn "^8.15.0"
     commander "^2.20.0"
     source-map-support "~0.5.20"
@@ -8055,13 +8108,13 @@ update-browserslist-db@^1.0.9:
     escalade "^3.1.1"
     picocolors "^1.0.0"
 
-update-browserslist-db@^1.1.0:
-  version "1.1.0"
-  resolved "https://registry.yarnpkg.com/update-browserslist-db/-/update-browserslist-db-1.1.0.tgz#7ca61c0d8650766090728046e416a8cde682859e"
-  integrity sha512-EdRAaAyk2cUE1wOf2DkEhzxqOQvFOoRJFNS6NeyJ01Gp2beMRpBAINjM2iDXE3KCuKhwnvHIQCJm6ThL2Z+HzQ==
+update-browserslist-db@^1.2.0:
+  version "1.2.3"
+  resolved "https://registry.yarnpkg.com/update-browserslist-db/-/update-browserslist-db-1.2.3.tgz#64d76db58713136acbeb4c49114366cc6cc2e80d"
+  integrity sha512-Js0m9cx+qOgDxo0eMiFGEueWztz+d4+M3rGlmKPT+T4IS/jP4ylw3Nwpu6cpTTP8R1MAC1kF4VbdLt3ARf209w==
   dependencies:
-    escalade "^3.1.2"
-    picocolors "^1.0.1"
+    escalade "^3.2.0"
+    picocolors "^1.1.1"
 
 update-notifier@^5.1.0:
   version "5.1.0"
@@ -8195,10 +8248,10 @@ wait-on@^6.0.0:
     minimist "^1.2.5"
     rxjs "^7.5.4"
 
-watchpack@^2.4.1:
-  version "2.4.2"
-  resolved "https://registry.yarnpkg.com/watchpack/-/watchpack-2.4.2.tgz#2feeaed67412e7c33184e5a79ca738fbd38564da"
-  integrity sha512-TnbFSbcOCcDgjZ4piURLCbJ3nJhznVh9kw6F6iokjiFPl8ONxe9A6nMDVXDiNbrSfLILs6vB07F7wLBrwPYzJw==
+watchpack@^2.5.1:
+  version "2.5.1"
+  resolved "https://registry.yarnpkg.com/watchpack/-/watchpack-2.5.1.tgz#dd38b601f669e0cbf567cb802e75cead82cde102"
+  integrity sha512-Zn5uXdcFNIA1+1Ei5McRd+iRzfhENPCe7LeABkJtNulSxjma+l7ltNx55BWZkRlwRnpOgHqxnjyaDgJnNXnqzg==
   dependencies:
     glob-to-regexp "^0.4.1"
     graceful-fs "^4.1.2"
@@ -8297,39 +8350,46 @@ webpack-sources@^1.1.0:
     source-list-map "^2.0.0"
     source-map "~0.6.1"
 
-webpack-sources@^3.2.2, webpack-sources@^3.2.3:
+webpack-sources@^3.2.2:
   version "3.2.3"
   resolved "https://registry.npmmirror.com/webpack-sources/-/webpack-sources-3.2.3.tgz#2d4daab8451fd4b240cc27055ff6a0c2ccea0cde"
   integrity sha512-/DyMEOrDgLKKIG0fmvtz+4dUX/3Ghozwgm6iPp8KRhvn+eQf9+Q7GWxVNMk3+uCPWfdXYC4ExGBckIXdFEfH1w==
 
+webpack-sources@^3.3.3:
+  version "3.3.3"
+  resolved "https://registry.yarnpkg.com/webpack-sources/-/webpack-sources-3.3.3.tgz#d4bf7f9909675d7a070ff14d0ef2a4f3c982c723"
+  integrity sha512-yd1RBzSGanHkitROoPFd6qsrxt+oFhg/129YzheDGqeustzX0vTZJZsSsQjVQC4yzBQ56K55XU8gaNCtIzOnTg==
+
 webpack@^5.61.0, webpack@^5.73.0:
-  version "5.94.0"
-  resolved "https://registry.yarnpkg.com/webpack/-/webpack-5.94.0.tgz#77a6089c716e7ab90c1c67574a28da518a20970f"
-  integrity sha512-KcsGn50VT+06JH/iunZJedYGUJS5FGjow8wb9c0v5n1Om8O1g4L6LjtfxwlXIATopoQu+vOXXa7gYisWxCoPyg==
+  version "5.105.0"
+  resolved "https://registry.yarnpkg.com/webpack/-/webpack-5.105.0.tgz#38b5e6c5db8cbe81debbd16e089335ada05ea23a"
+  integrity sha512-gX/dMkRQc7QOMzgTe6KsYFM7DxeIONQSui1s0n/0xht36HvrgbxtM1xBlgx596NbpHuQU8P7QpKwrZYwUX48nw==
   dependencies:
-    "@types/estree" "^1.0.5"
-    "@webassemblyjs/ast" "^1.12.1"
-    "@webassemblyjs/wasm-edit" "^1.12.1"
-    "@webassemblyjs/wasm-parser" "^1.12.1"
-    acorn "^8.7.1"
-    acorn-import-attributes "^1.9.5"
-    browserslist "^4.21.10"
+    "@types/eslint-scope" "^3.7.7"
+    "@types/estree" "^1.0.8"
+    "@types/json-schema" "^7.0.15"
+    "@webassemblyjs/ast" "^1.14.1"
+    "@webassemblyjs/wasm-edit" "^1.14.1"
+    "@webassemblyjs/wasm-parser" "^1.14.1"
+    acorn "^8.15.0"
+    acorn-import-phases "^1.0.3"
+    browserslist "^4.28.1"
     chrome-trace-event "^1.0.2"
-    enhanced-resolve "^5.17.1"
-    es-module-lexer "^1.2.1"
+    enhanced-resolve "^5.19.0"
+    es-module-lexer "^2.0.0"
     eslint-scope "5.1.1"
     events "^3.2.0"
     glob-to-regexp "^0.4.1"
     graceful-fs "^4.2.11"
     json-parse-even-better-errors "^2.3.1"
-    loader-runner "^4.2.0"
+    loader-runner "^4.3.1"
     mime-types "^2.1.27"
     neo-async "^2.6.2"
-    schema-utils "^3.2.0"
-    tapable "^2.1.1"
-    terser-webpack-plugin "^5.3.10"
-    watchpack "^2.4.1"
-    webpack-sources "^3.2.3"
+    schema-utils "^4.3.3"
+    tapable "^2.3.0"
+    terser-webpack-plugin "^5.3.16"
+    watchpack "^2.5.1"
+    webpack-sources "^3.3.3"
 
 webpackbar@^5.0.0-3:
   version "5.0.2"