489 Commits

Author SHA1 Message Date
Copilot
158ff7d99e Fix transformers API compatibility: support v4.26+ and v5.0+ with version-aware parameter selection (#1514)
* Initial plan

* Fix transformers API compatibility issues

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add backward compatibility for transformers v4.26+ by version check

Support both tokenizer (v4.26-4.43) and processing_class (v4.44+) parameters based on installed transformers version. Fallback to tokenizer if version check fails.

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Improve exception handling specificity

Use specific exception types (ImportError, AttributeError, ValueError) instead of broad Exception catch for better error handling.

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Run pre-commit formatting on all files

Applied black formatting to fix code style across the repository.

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
2026-01-28 09:00:21 +08:00
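The version check described in #1514 can be sketched roughly as follows. This is an illustration only, not the code from the commit: the helper name `tokenizer_kwarg_name` is hypothetical, and the 4.44 cutoff and the fallback to `tokenizer` are taken from the commit message above.

```python
def tokenizer_kwarg_name(version: str) -> str:
    """Return the keyword name to use for a given transformers version string.

    Hypothetical helper: 'tokenizer' below 4.44, 'processing_class' from 4.44 on.
    Falls back to 'tokenizer' when the version string cannot be parsed.
    """
    try:
        # Only major and minor matter here; suffixes such as ".dev0" are ignored.
        major, minor = (int(part) for part in version.split(".")[:2])
    except (ValueError, AttributeError):
        # Specific exception types rather than a broad `except Exception`.
        return "tokenizer"
    return "processing_class" if (major, minor) >= (4, 44) else "tokenizer"


# Build the keyword arguments for a trainer-like constructor (placeholder value).
trainer_kwargs = {tokenizer_kwarg_name("4.30.2"): "my_tokenizer"}
```

In real code the version string would come from `transformers.__version__`; it is passed in explicitly here so the sketch runs without the library installed.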
Copilot
fc4efe3510 Fix sklearn 1.7+ compatibility: BaseEstimator type detection for ensemble (#1512)
* Initial plan

* Fix ExtraTreesEstimator regression ensemble error with sklearn 1.7+

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback: improve __sklearn_tags__ implementation

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix format error

* Emphasize pre-commit

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-23 10:20:59 +08:00
Copilot
d9e74031e0 Expose task-level and estimator-level preprocessors as public API (#1497)
* Initial plan

* Add public preprocess() API methods for AutoML and estimators

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add documentation for preprocess() API methods

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add example script demonstrating preprocess() API usage

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback - fix type hints and simplify test logic

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix formatting issues with pre-commit hooks

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Remove example.py, make tests faster

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-21 14:38:25 +08:00
Copilot
7ec1414e9b Clarify period parameter and automatic label lagging in time series forecasting (#1495)
* Initial plan

* Add comprehensive documentation for period parameter and automatic label lagging

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback on docstring clarity

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Clarify period vs prediction output length per @thinkall's feedback

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Refine terminology per code review feedback

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Run pre-commit formatting fixes

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-21 14:19:23 +08:00
Copilot
9233a52736 Add configurable label overlap handling for classification holdout strategy (#1491)
* Initial plan

* Fix training/test set overlap in holdout classification by only adding missing labels when needed

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback: add bounds checking and fix edge cases

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix bounds checking: use correct comparison operator for array indexing

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix potential ValueError with max() on empty lists and simplify test assertions

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add extra bounds checking for label_matches indices

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix pandas_on_spark compatibility by using iloc_pandas_on_spark util method

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Run pre-commit to fix formatting issues

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Improve missing label handling to avoid overlap when possible

For classes with multiple instances that end up in one set, properly
re-split them between train and val instead of duplicating. Only add
to both sets when the class has exactly 1 instance (unavoidable overlap).

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix: ensure at least 1 instance remains in original set when re-splitting

Addresses comments on lines 580 and 724 - prevents moving ALL instances
from validation to training or vice versa by using min(num_instances - 1, ...).

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Refactor: Extract missing label handling into two methods (fast/slow)

- Created _handle_missing_labels_fast(): Fast version that adds first instance to both sets (may overlap)
- Created _handle_missing_labels_no_overlap(): Precise version that avoids overlap when possible
- Added allow_label_overlap parameter to AutoML.fit() (default=True for fast version)
- Updated documentation with new parameter
- Both versions maintain label completeness while offering speed/precision tradeoff

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add usage example for allow_label_overlap to Best-Practices.md

- Added comprehensive documentation in Classification section
- Included code examples showing both fast and precise versions
- Explained tradeoffs between speed and precision
- Noted that parameter only affects holdout evaluation

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback: update documentation and tests

- Updated docstrings to clarify fast version only adds instances to missing sets
- Fixed documentation to reflect actual behavior (not "both sets" but "set with missing label")
- Completely rewrote test_no_overlap.py to test both allow_label_overlap modes
- Added tests with sample_weights for better code coverage
- Added test for single-instance class handling
- All 5 tests passing

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix AttributeError: initialize _allow_label_overlap in settings and retrain_from_log

- Added allow_label_overlap to settings initialization with default=True
- Added parameter defaulting in fit() method to use settings value if not provided
- Added _allow_label_overlap initialization in retrain_from_log method
- Fixes test failures in test_multiclass, test_regression, and spark tests

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add docstring to fit()

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-21 14:03:48 +08:00
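The "fast" strategy described in #1491 (copy an instance of a label into whichever split is missing it, accepting possible overlap) can be sketched on plain index lists. The function name `add_missing_labels_fast` and the list-based data layout are assumptions made for illustration; FLAML's `_handle_missing_labels_fast()` operates on its own data structures and may differ in detail.

```python
def add_missing_labels_fast(labels, train_idx, val_idx):
    """Ensure every label appears in both splits (hypothetical sketch).

    For each label absent from a split, the first instance of that label is
    appended to that split only, so train and validation may then overlap.
    """
    train_idx, val_idx = list(train_idx), list(val_idx)

    # Index of the first occurrence of each label.
    first_seen = {}
    for i, label in enumerate(labels):
        first_seen.setdefault(label, i)

    train_labels = {labels[i] for i in train_idx}
    val_labels = {labels[i] for i in val_idx}

    for label, i in first_seen.items():
        if label not in train_labels:
            train_idx.append(i)
        if label not in val_labels:
            val_idx.append(i)
    return train_idx, val_idx


labels = ["a", "a", "b", "c", "c"]
# Label "b" has a single instance, which landed in the training split.
train, val = add_missing_labels_fast(labels, train_idx=[0, 2, 3], val_idx=[1, 4])
```

Here index 2 ends up in both splits, which is the unavoidable overlap for a single-instance class that the commit message mentions.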
Copilot
7ac076d544 Use scientific notation for best error in logger output (#1498)
* Initial plan

* Change best error format from .4f to .4e for scientific notation

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-21 09:06:19 +08:00
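The motivation for the `.4f` to `.4e` change in #1498 is easy to see with a very small error value. The variable names below are illustrative and not taken from the FLAML logger code.

```python
best_error = 3.2e-07  # illustrative value, far below 1e-4

fixed = f"{best_error:.4f}"       # fixed-point: small values collapse to zeros
scientific = f"{best_error:.4e}"  # scientific: four significant decimals kept

print(fixed)       # 0.0000
print(scientific)  # 3.2000e-07
```

With fixed-point formatting, any best error below 0.00005 prints as `0.0000`, so successive improvements become indistinguishable in the log.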
Copilot
3d489f1aaa Add validation and clear error messages for custom_metric parameter (#1500)
* Initial plan

* Add validation and documentation for custom_metric parameter

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Refactor validation into reusable method and improve error handling

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Apply pre-commit formatting fixes

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-21 08:58:11 +08:00
Copilot
c64eeb5e8d Document that final_estimator parameters in ensemble are not auto-tuned (#1499)
* Initial plan

* Document final_estimator parameter behavior in ensemble configuration

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback: fix syntax in examples and use float comparison

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Run pre-commit to fix formatting issues

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-20 21:59:31 +08:00
Copilot
1687ca9a94 Fix eval_set preprocessing for XGBoost estimators with categorical features (#1470)
* Initial plan

* Initial analysis - reproduced eval_set preprocessing bug

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix eval_set preprocessing for XGBoost estimators with categorical features

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add eval_set tests to test_xgboost function

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix linting issues with ruff and black

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-20 20:41:21 +08:00
Copilot
4ea9650f99 Fix nested dictionary merge in SearchThread losing sampled hyperparameters (#1494)
* Initial plan

* Add recursive dict update to fix nested config merge

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-20 15:50:18 +08:00
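The kind of problem #1494 addresses can be reproduced with plain dictionaries: `dict.update` replaces a nested dictionary wholesale, whereas a recursive update merges it key by key. The function name `recursive_update` and the sample keys are assumptions for illustration, not the code added to `SearchThread`.

```python
import copy


def recursive_update(base: dict, new: dict) -> dict:
    """Merge `new` into `base` in place, descending into nested dicts."""
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            recursive_update(base[key], value)
        else:
            base[key] = value
    return base


sampled = {"ml": {"learner": "lgbm", "n_estimators": 4}}
override = {"ml": {"n_estimators": 8}}

# Shallow merge: the whole "ml" dict is replaced, so "learner" is lost.
shallow = {**sampled}
shallow.update(override)

# Recursive merge: only the overlapping leaf is overwritten.
merged = recursive_update(copy.deepcopy(sampled), override)
```

The deep copy keeps the original `sampled` configuration untouched, which matters when the same sampled config is reused elsewhere.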
Copilot
22dcfcd3c0 Add comprehensive metric documentation and URL reference to AutoML docstrings (#1471)
* Initial plan

* Update AutoML metric documentation with full list and documentation link

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Apply black and mdformat formatting to code and documentation

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Apply pre-commit formatting fixes

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2026-01-20 10:34:54 +08:00
Li Jiang
d7208b32d0 Bump version to 2.5.0 (#1492) 2026-01-20 10:30:39 +08:00
Copilot
5f1aa2dda8 Fix: Preserve FLAML_sample_size in best_config_per_estimator (#1475)
* Initial plan

* Fix: Preserve FLAML_sample_size in best_config_per_estimator

Modified best_config_per_estimator property to keep FLAML_sample_size when returning best configurations. Previously, AutoMLState.sanitize() was removing this key, which caused the sample size information to be lost when using starting_points from a previous run.

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add a test to verify the improvement of starting_points

* Update documentation to reflect FLAML_sample_size preservation

Updated Task-Oriented-AutoML.md to document that best_config_per_estimator now preserves FLAML_sample_size:
- Added note in "Warm start" section explaining that FLAML_sample_size is preserved for effective warm-starting
- Added note in "Get best configuration" section with example showing FLAML_sample_size in output
- Explains importance of sample size preservation for continuing optimization with correct sample sizes

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix unintended code change

* Improve docstrings and docs

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-20 07:42:31 +08:00
Copilot
67bdcde4d5 Fix BlendSearch OptunaSearch warning for non-hierarchical spaces with Ray Tune domains (#1477)
* Initial plan

* Fix BlendSearch OptunaSearch warning for non-hierarchical spaces

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Clean up test file

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add regression test for BlendSearch UDF mode warning fix

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Improve the fix and tests

* Fix Define-by-run function passed in  argument is not yet supported when using

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-20 00:01:41 +08:00
Li Jiang
f1817ea7b1 Add support to python 3.13 (#1486) 2026-01-19 18:31:43 +08:00
Li Jiang
f6a5163e6a Fix isinstance usage issues (#1488)
* Fix isinstance usage issues

* Pin python version to 3.12 for pre-commit

* Update mdformat to 0.7.22
2026-01-19 15:19:05 +08:00
Li Jiang
a74354f7a9 Update documents, Bump version to 2.4.1, Sync Fabric till 088cfb98 (#1482)
* Add best practices

* Update docs to reflect on the recent changes

* Improve model persisting best practices

* Bump version to 2.4.1

* List all estimators

* Remove autogen

* Update dependencies
2026-01-13 12:49:36 +08:00
Li Jiang
ced1d6f331 Support pickling the whole AutoML instance, Sync Fabric till 0d4ab16f (#1481) 2026-01-12 23:04:38 +08:00
Copilot
0b138d9193 Fix log_training_metric causing IndexError for time series models (#1469)
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-10 18:07:17 +08:00
Li Jiang
1c9835dc0a Add support to Python 3.12, Sync Fabric till dc382961 (#1467)
* Merged PR 1686010: Bump version to 2.3.5.post2, Distribute source and wheel, Fix license-file, Only log better models

- Fix license-file
- Bump version to 2.3.5.post2
- Distribute source and wheel
- Log better models only
- Add artifact_path to register_automl_pipeline
- Improve logging of _automl_user_configurations

----
This pull request fixes the project’s configuration by updating the license metadata for compliance with FLAML OSS 2.3.5.

The changes in `/pyproject.toml` update the project’s license and readme metadata by replacing deprecated keys with the new structured fields.
- `/pyproject.toml`: Replaced `license_file` with `license = { text = "MIT" }`.
- `/pyproject.toml`: Replaced `description-file` with `readme = "README.md"`.

Related work items: #4252053

* Merged PR 1688479: Handle feature_importances_ is None, Catch RuntimeError and wait for spark cluster to recover

- Add warning message when feature_importances_ is None (#3982120)
- Catch RuntimeError and wait for spark cluster to recover (#3982133)

----
Bug fix.

This pull request prevents an AttributeError in the feature importance plotting function by adding a check for a `None` value with an informative warning message.
- `flaml/fabric/visualization.py`: Checks if `result.feature_importances_` is `None`, logs a warning with possible reasons, and returns early.
- `flaml/fabric/visualization.py`: Imports `logger` from `flaml.automl.logger` to support the warning message.

Related work items: #3982120, #3982133

* Removed deprecated metadata section

* Fix log_params, log_artifact doesn't support run_id in mlflow 2.6.0

* Remove autogen

* Remove autogen

* Remove autogen

* Merged PR 1776547: Fix flaky test test_automl

Don't throw error when time budget is not enough

----
#### AI description  (iteration 1)
#### PR Classification
Bug fix addressing a failing test in the AutoML notebook example.

#### PR Summary
This PR fixes a flaky test by adding a conditional check in the AutoML test that prints a message and exits early if no best estimator is set, thereby preventing unpredictable test failures.
- `test/automl/test_notebook_example.py`: Introduced a check to print "Training budget is not sufficient" and return if `automl.best_estimator` is not found.

Related work items: #4573514

* Merged PR 1777952: Fix unrecognized or malformed field 'license-file' when uploading wheel to feed

Try to fix InvalidDistribution: Invalid distribution metadata: unrecognized or malformed field 'license-file'

----
Bug fix addressing package metadata configuration.

This pull request fixes the error with unrecognized or malformed license file fields during wheel uploads by updating the setup configuration.
- In `setup.py`, added `license="MIT"` and `license_files=["LICENSE"]` to provide proper license metadata.

Related work items: #4560034

* Cherry-pick Merged PR 1879296: Add support to python 3.12 and spark 4.0

* Cherry-pick Merged PR 1890869: Improve time_budget estimation for mlflow logging

* Cherry-pick Merged PR 1879296: Add support to python 3.12 and spark 4.0

* Disable openai workflow

* Add python 3.12 to test envs

* Manually trigger openai

* Support markdown files with underscore-prefixed file names

* Improve save dependencies

* SynapseML is not installed

* Fix syntax error: Module !flaml/autogen was never imported

* macos 3.12 also hangs

* fix syntax error

* Update python version in actions

* Install setuptools for using pkg_resources

* Fix test_automl_performance in Github actions

* Fix test_nested_run
2026-01-10 12:17:21 +08:00
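The `None` guard described for merged PR 1688479 (warn and return early instead of raising an `AttributeError`) follows a common pattern, sketched below. The function `top_feature_importances` and its arguments are hypothetical and are not the plotting function in `flaml/fabric/visualization.py`.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def top_feature_importances(result, feature_names, k=5):
    """Return the k most important (name, importance) pairs, or None.

    Hypothetical helper: logs a warning and returns early when the fitted
    object exposes no feature importances.
    """
    importances = getattr(result, "feature_importances_", None)
    if importances is None:
        logger.warning(
            "feature_importances_ is None; the estimator may not support "
            "feature importances or may not have been fitted."
        )
        return None
    ranked = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)
    return ranked[:k]


# Stand-ins for fitted results, used only to exercise both branches.
without_importances = SimpleNamespace(feature_importances_=None)
with_importances = SimpleNamespace(feature_importances_=[0.1, 0.7, 0.2])
```

Returning `None` after the warning lets callers skip plotting without wrapping the call in a `try` block.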
Li Jiang
1285700d7a Update readme, bump version to 2.4.0, fix CI errors (#1466)
* Update gitignore

* Bump version to 2.4.0

* Update readme

* Pre-download california housing data

* Use pre-downloaded california housing data

* Pin lightning<=2.5.6

* Fix typo in find and replace

* Fix estimators has no attribute __sklearn_tags__

* Pin torch to 2.2.2 in tests

* Fix conflict

* Update pytorch-forecasting

* Update pytorch-forecasting

* Update pytorch-forecasting

* Use numpy<2 for testing

* Update scikit-learn

* Run Build and UT every other day

* Pin pip<24.1

* Pin pip<24.1 in pipeline

* Loosen pip, install pytorch_forecasting only in py311

* Add support to new versions of nlp dependencies

* Fix formats

* Remove redefinition

* Update mlflow versions

* Fix mlflow version syntax

* Update gitignore

* Clean up cache to free space

* Remove clean up action cache

* Fix blendsearch

* Update test workflow

* Update setup.py

* Fix catboost version

* Update workflow

* Prepare for python 3.14

* Support no catboost

* Fix tests

* Fix python_requires

* Update test workflow

* Fix vw tests

* Remove python 3.9

* Fix nlp tests

* Fix prophet

* Print pip freeze for better debugging

* Fix Optuna search does not support parameters of type Float with samplers of type Quantized

* Save dependencies for later inspection

* Fix coverage.xml not exists

* Fix github action permission

* Handle python 3.13

* Address openml is not installed

* Check dependencies before run tests

* Update dependencies

* Fix syntax error

* Use bash

* Update dependencies

* Fix git error

* Loosen mlflow constraints

* Add rerun, use mlflow-skinny

* Fix git error

* Remove ray tests

* Update xgboost versions

* Fix automl pickle error

* Don't test python 3.10 on macos as it's stuck

* Rebase before push

* Reduce number of branches
2026-01-09 13:40:52 +08:00
Keita Onabuta
e19107407b update loc second args - column (#1458)
Set the second argument of the `loc` call to `time_col` instead of the dataframe `X`.
2025-08-30 11:07:19 +08:00
Li Jiang
f5d6693253 Bump version to 2.3.7 (#1457) 2025-08-26 14:59:32 +08:00
Azamatkhan Arifkhanov
d4e43c50a2 Fix OSError: [Errno 24] Too many open files: 'nul' (#1455)
* Update model.py

Added closing of save_fds.

* Updated model.py for pre-commit requirements
2025-08-26 12:50:22 +08:00
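The descriptor leak behind #1455 can be illustrated with a small context manager that silences output at the file-descriptor level. Each use opens two descriptors on `os.devnull` and duplicates descriptors 1 and 2; if the duplicates are never closed, repeated use eventually hits the open-file limit. The name `suppress_stdout_stderr` and this structure are assumptions for illustration, not the code in `model.py`.

```python
import os
from contextlib import contextmanager


@contextmanager
def suppress_stdout_stderr():
    """Redirect fds 1 and 2 to os.devnull, then restore and close everything."""
    null_fds = [os.open(os.devnull, os.O_RDWR) for _ in range(2)]
    save_fds = [os.dup(1), os.dup(2)]  # copies of the original stdout/stderr
    try:
        os.dup2(null_fds[0], 1)
        os.dup2(null_fds[1], 2)
        yield save_fds
    finally:
        # Restore the original descriptors.
        os.dup2(save_fds[0], 1)
        os.dup2(save_fds[1], 2)
        # Close the null descriptors AND the saved copies; skipping the
        # saved copies leaks two descriptors per use.
        for fd in null_fds + save_fds:
            os.close(fd)


with suppress_stdout_stderr():
    os.write(1, b"this line is discarded\n")
```

Closing inside `finally` ensures the descriptors are released even when the wrapped code raises.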
Li Jiang
bb16dcde93 Bump version to 2.3.6 (#1451) 2025-08-05 14:29:36 +08:00
Li Jiang
be81a76da9 Fix TypeError of customized kfold method which needs 'y' (#1450) 2025-08-02 08:05:50 +08:00
Li Jiang
dec92e5b02 Upgrade python 3.8 to 3.10 in github actions (#1440) 2025-05-27 21:34:21 +08:00
Li Jiang
22911ea1ef Merged PR 1685054: Add more logs and function wait_futures for easier post analysis (#1438)
- Add function wait_futures for easier post analysis
- Use logger instead of print

----
#### AI description  (iteration 1)
#### PR Classification
A code enhancement for debugging asynchronous mlflow logging and improving post-run analysis.

#### PR Summary
This PR adds detailed debug logging to the mlflow integration and introduces a new `wait_futures` function to streamline the collection of asynchronous task results for improved analysis.
- `flaml/fabric/mlflow.py`: Added debug log statements around starting and ending mlflow runs to trace run IDs and execution flow.
- `flaml/automl/automl.py`: Implemented the `wait_futures` function to handle asynchronous task results and replaced a print call with `logger.info` for consistent logging.

Related work items: #4029592
2025-05-27 15:32:56 +08:00
murunlin
12183e5f73 Add the detailed info for parameter 'verbose' (#1435)
* explain-verbose-parameter

* concise-verbose-docstring

* explain-verbose-parameter

* explain-verbose-parameter

* test-ignore

* test-ignore

* sklearn-version-califonia

* submit-0526

---------

Co-authored-by: Runlin Mu (FESCO Adecco Human Resources) <v-runlinmu@microsoft.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2025-05-27 10:01:01 +08:00
Li Jiang
c2b25310fc Sync Fabric till 2cd1c3da (#1433)
* Sync Fabric till 2cd1c3da

* Remove synapseml from tag names

* Fix 'NoneType' object has no attribute 'DataFrame'

* Deprecated 3.8 support

* Fix 'NoneType' object has no attribute 'DataFrame'

* Still use python 3.8 for pydoc

* Don't run tests in parallel

* Remove autofe and lowcode
2025-05-23 10:19:31 +08:00
murunlin
0f9420590d fix: best_model_for_estimator returns inconsistent feature_importances_ compared to automl.model (#1429)
* mrl-issue1422-0513

* fix version dependency

* fix datasets version

* test completion

---------

Co-authored-by: Runlin Mu (FESCO Adecco Human Resources) <v-runlinmu@microsoft.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2025-05-15 09:37:34 +08:00
hexiang-x
5107c506b4 fix: When use_spark = True and mlflow_logging = True are set, an error is reported when logging the best model: 'NoneType' object has no attribute 'save' (#1432) 2025-05-14 19:34:06 +08:00
Stickic-cyber
468bc62d27 Fix issue with "list index out of range" when max_iter=1 (#1419) 2025-04-09 21:54:17 +08:00
SkBlaz
7157af44e0 Improved error handling in case no scikit present (#1402)
* Improved error handling in case no scikit present

Currently there is no description for when this error is thrown. Being explicit seems of value.

* Update histgb.py

---------

Co-authored-by: Li Jiang <bnujli@gmail.com>
2025-03-03 15:39:43 +08:00
Li Jiang
dd26263330 Bump version to 2.3.5 (#1409) 2025-02-17 22:26:59 +08:00
Li Jiang
2ba5f8bed1 Fix params pop error (#1408) 2025-02-17 15:06:05 +08:00
Daniel Grindrod
d0a11958a5 fix: Fixed bug where group folds and sample weights couldn't be used in the same automl instance (#1405) 2025-02-15 10:41:27 +08:00
Will Charles
840f76e5e5 Changed tune.report import for ray>=2 (#1392)
* Changed tune.report import for ray>=2

* env: Changed pydantic restriction in env

* Reverted Pydantic install conditions

* Reverted Pydantic install conditions

* test: Check if GPU is available

* tests: uncommented a line

* tests: Better fix for Ray GPU checking

* tests: Added timeout to dataset loading

* tests: Deleted _test_hf_data()

* test: Reduce lrl2 dataset size

* bug: timeout error

* bug: timeout error

* fix: Added threading check for timeout issue

* Undo old commits

* Timeout fix from #1406

---------

Co-authored-by: Daniel Grindrod <dannycg1996@gmail.com>
2025-02-14 09:38:33 +08:00
Li Jiang
d8b7d25b80 Fix test hang issue (#1406)
* Add try except to resource.setrlimit

* Set time limit only in main thread

* Check only test model

* Pytest debug

* Test separately

* Move test_model.py to automl folder
2025-02-13 19:50:35 +08:00
Li Jiang
6d53929803 Bump version to 2.3.4 (#1389) 2024-12-18 12:49:59 +08:00
Daniel Grindrod
c038fbca07 fix: KeyError no longer occurs when using groupfolds for regression tasks. (#1385)
* fix: Now resetting indexes for regression datasets when using group folds

* refactor: Simplified if statement to include all fold types

* docs: Updated docs to make it clear that group folds can be used for regression tasks

---------

Co-authored-by: Daniel Grindrod <daniel.grindrod@evotec.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2024-12-18 10:06:58 +08:00
Daniel Grindrod
42d1dcfa0e fix: Fixed bug with catboost and groups (#1383)
Co-authored-by: Daniel Grindrod <daniel.grindrod@evotec.com>
2024-12-17 13:54:49 +08:00
EgorKraevTransferwise
b83c8a7d3b Pass cost_attr and cost_budget from flaml.tune.run() to the search algo (#1382) 2024-12-04 20:50:15 +08:00
Li Jiang
9a1f6b0291 Bump version to 2.3.3 (#1378) 2024-11-13 11:44:34 +08:00
kernelmethod
07f4413aae Fix logging nuisances that can arise when importing flaml (#1377) 2024-11-13 07:49:55 +08:00
Daniel Grindrod
5a74227bc3 Flaml: fix lgbm reproducibility (#1369)
* fix: Fixed bug where every underlying LGBMRegressor or LGBMClassifier had n_estimators = 1

* test: Added test showing case where FLAMLised CatBoostModel result isn't reproducible

* fix: Fixing issue where callbacks cause LGBM results to not be reproducible

* Update test/automl/test_regression.py

Co-authored-by: Li Jiang <bnujli@gmail.com>

* fix: Adding back the LGBM EarlyStopping

* refactor: Fix tweaked to ensure other models aren't likely to be affected

* test: Fixed test to allow reproduced results to be better than the FLAML results, when LGBM earlystopping is involved

---------

Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2024-11-01 10:06:15 +08:00
Ranuga
7644958e21 Add documentation for automl.model.estimator usage (#1311)
* Added documentation for automl.model.estimator usage

Updated documentation across various examples and the model.py file to include information about automl.model.estimator. This addition enhances the clarity and usability of FLAML by providing users with clear guidance on how to utilize this feature in their AutoML workflows. These changes aim to improve the overall user experience and facilitate easier understanding of FLAML's capabilities.

* fix: Ran pre-commit hook on docs

---------

Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Daniel Grindrod <dannycg1996@gmail.com>
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
2024-10-31 20:53:54 +08:00
Daniel Grindrod
a316f84fe1 fix: LinearSVC results now reproducible (#1376)
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
2024-10-31 14:02:16 +08:00
Daniel Grindrod
72881d3a2b fix: Fixing the random state of ElasticNetClassifier by default, to ensure reproducibility. Also included elasticnet in reproducibility tests (#1374)
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2024-10-29 14:21:43 +08:00
Li Jiang
69da685d1e Fix data transform issue, spark log_loss metric compute error and json dumps TypeError (Sync Fabric till 3c545e67) (#1371)
* Merged PR 1444697: Fix json dumps TypeError

Fix json dumps TypeError

----
Bug fix to address a `TypeError` in `json.dumps`.

This pull request fixes a `TypeError` encountered when using `json.dumps` on `automl._automl_user_configurations` by introducing a safe JSON serialization function.
- Added `safe_json_dumps` function in `flaml/fabric/mlflow.py` to handle non-serializable objects.
- Updated `MLflowIntegration` class in `flaml/fabric/mlflow.py` to use `safe_json_dumps` for JSON serialization.
- Modified `test/automl/test_multiclass.py` to test the new `safe_json_dumps` function.

Related work items: #3439408

* Fix data transform issue and spark log_loss metric compute error
2024-10-29 11:58:40 +08:00
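One common way to avoid the `TypeError` from `json.dumps` mentioned in merged PR 1444697 is to supply a `default` fallback for values the encoder cannot handle. The sketch below assumes that approach; the name `safe_json_dumps_sketch` is hypothetical, and the actual `safe_json_dumps` in `flaml/fabric/mlflow.py` may be implemented differently.

```python
import json


def safe_json_dumps_sketch(obj, **kwargs) -> str:
    """Serialize obj to JSON, converting unsupported values with str().

    Note: `default` only applies to values; unsupported dictionary keys
    (for example tuples) would still raise a TypeError.
    """
    return json.dumps(obj, default=str, **kwargs)


class CustomMetric:
    """Stand-in for a user-supplied, non-serializable object."""

    def __str__(self) -> str:
        return "CustomMetric()"


user_config = {"time_budget": 10, "metric": CustomMetric()}
serialized = safe_json_dumps_sketch(user_config)
```

The string form is lossy, so this suits logging user configurations rather than round-tripping them.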