Commit Graph

37 Commits

Author SHA1 Message Date
Copilot
fc4efe3510 Fix sklearn 1.7+ compatibility: BaseEstimator type detection for ensemble (#1512)
* Initial plan

* Fix ExtraTreesEstimator regression ensemble error with sklearn 1.7+

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback: improve __sklearn_tags__ implementation

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix format error

* Emphasize pre-commit

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-23 10:20:59 +08:00
Copilot
d9e74031e0 Expose task-level and estimator-level preprocessors as public API (#1497)
* Initial plan

* Add public preprocess() API methods for AutoML and estimators

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add documentation for preprocess() API methods

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Add example script demonstrating preprocess() API usage

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback - fix type hints and simplify test logic

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Fix formatting issues with pre-commit hooks

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Remove example.py, make tests faster

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2026-01-21 14:38:25 +08:00
Li Jiang
ced1d6f331 Support pickling the whole AutoML instance, Sync Fabric till 0d4ab16f (#1481) 2026-01-12 23:04:38 +08:00
Li Jiang
1c9835dc0a Add support to Python 3.12, Sync Fabric till dc382961 (#1467)
* Merged PR 1686010: Bump version to 2.3.5.post2, Distribute source and wheel, Fix license-file, Only log better models

- Fix license-file
- Bump version to 2.3.5.post2
- Distribute source and wheel
- Log better models only
- Add artifact_path to register_automl_pipeline
- Improve logging of _automl_user_configurations

----
This pull request fixes the project’s configuration by updating the license metadata for compliance with FLAML OSS 2.3.5.

The changes in `/pyproject.toml` update the project’s license and readme metadata by replacing deprecated keys with the new structured fields.
- `/pyproject.toml`: Replaced `license_file` with `license = { text = "MIT" }`.
- `/pyproject.toml`: Replaced `description-file` with `readme = "README.md"`.
<!-- GitOpsUserAgent=GitOps.Apps.Server.pullrequestcopilot -->

Related work items: #4252053

* Merged PR 1688479: Handle feature_importances_ is None, Catch RuntimeError and wait for spark cluster to recover

- Add warning message when feature_importances_ is None (#3982120)
- Catch RuntimeError and wait for spark cluster to recover (#3982133)

----
Bug fix.

This pull request prevents an AttributeError in the feature importance plotting function by adding a check for a `None` value with an informative warning message.
- `flaml/fabric/visualization.py`: Checks if `result.feature_importances_` is `None`, logs a warning with possible reasons, and returns early.
- `flaml/fabric/visualization.py`: Imports `logger` from `flaml.automl.logger` to support the warning message.
<!-- GitOpsUserAgent=GitOps.Apps.Server.pullrequestcopilot -->

Related work items: #3982120, #3982133

* Removed deprecated metadata section

* Fix log_params, log_artifact doesn't support run_id in mlflow 2.6.0

* Remove autogen

* Remove autogen

* Remove autogen

* Merged PR 1776547: Fix flaky test test_automl

Don't throw error when time budget is not enough

----
#### AI description  (iteration 1)
#### PR Classification
Bug fix addressing a failing test in the AutoML notebook example.

#### PR Summary
This PR fixes a flaky test by adding a conditional check in the AutoML test that prints a message and exits early if no best estimator is set, thereby preventing unpredictable test failures.
- `test/automl/test_notebook_example.py`: Introduced a check to print "Training budget is not sufficient" and return if `automl.best_estimator` is not found.
<!-- GitOpsUserAgent=GitOps.Apps.Server.pullrequestcopilot -->

Related work items: #4573514

* Merged PR 1777952: Fix unrecognized or malformed field 'license-file' when uploading wheel to feed

Try to fix InvalidDistribution: Invalid distribution metadata: unrecognized or malformed field 'license-file'

----
Bug fix addressing package metadata configuration.

This pull request fixes the error with unrecognized or malformed license file fields during wheel uploads by updating the setup configuration.
- In `setup.py`, added `license="MIT"` and `license_files=["LICENSE"]` to provide proper license metadata.
<!-- GitOpsUserAgent=GitOps.Apps.Server.pullrequestcopilot -->

Related work items: #4560034

* Cherry-pick Merged PR 1879296: Add support to python 3.12 and spark 4.0

* Cherry-pick Merged PR 1890869: Improve time_budget estimation for mlflow logging

* Cherry-pick Merged PR 1879296: Add support to python 3.12 and spark 4.0

* Disable openai workflow

* Add python 3.12 to test envs

* Manually trigger openai

* Support markdown files with underscore-prefixed file names

* Improve save dependencies

* SynapseML is not installed

* Fix syntax error:Module !flaml/autogen was never imported

* macos 3.12 also hangs

* fix syntax error

* Update python version in actions

* Install setuptools for using pkg_resources

* Fix test_automl_performance in Github actions

* Fix test_nested_run
2026-01-10 12:17:21 +08:00
Li Jiang
1285700d7a Update readme, bump version to 2.4.0, fix CI errors (#1466)
* Update gitignore

* Bump version to 2.4.0

* Update readme

* Pre-download california housing data

* Use pre-downloaded california housing data

* Pin lightning<=2.5.6

* Fix typo in find and replace

* Fix estimators has no attribute __sklearn_tags__

* Pin torch to 2.2.2 in tests

* Fix conflict

* Update pytorch-forecasting

* Update pytorch-forecasting

* Update pytorch-forecasting

* Use numpy<2 for testing

* Update scikit-learn

* Run Build and UT every other day

* Pin pip<24.1

* Pin pip<24.1 in pipeline

* Loosen pip, install pytorch_forecasting only in py311

* Add support to new versions of nlp dependecies

* Fix formats

* Remove redefinition

* Update mlflow versions

* Fix mlflow version syntax

* Update gitignore

* Clean up cache to free space

* Remove clean up action cache

* Fix blendsearch

* Update test workflow

* Update setup.py

* Fix catboost version

* Update workflow

* Prepare for python 3.14

* Support no catboost

* Fix tests

* Fix python_requires

* Update test workflow

* Fix vw tests

* Remove python 3.9

* Fix nlp tests

* Fix prophet

* Print pip freeze for better debugging

* Fix Optuna search does not support parameters of type Float with samplers of type Quantized

* Save dependencies for later inspection

* Fix coverage.xml not exists

* Fix github action permission

* Handle python 3.13

* Address openml is not installed

* Check dependencies before run tests

* Update dependencies

* Fix syntax error

* Use bash

* Update dependencies

* Fix git error

* Loose mlflow constraints

* Add rerun, use mlflow-skinny

* Fix git error

* Remove ray tests

* Update xgboost versions

* Fix automl pickle error

* Don't test python 3.10 on macos as it's stuck

* Rebase before push

* Reduce number of branches
2026-01-09 13:40:52 +08:00
Azamatkhan Arifkhanov
d4e43c50a2 Fix OSError: [Errno 24] Too many open files: 'nul' (#1455)
* Update model.py

Added closing of save_fds.

* Updated model.py for pre-commit requirements
2025-08-26 12:50:22 +08:00
Li Jiang
2ba5f8bed1 Fix params pop error (#1408) 2025-02-17 15:06:05 +08:00
Will Charles
840f76e5e5 Changed tune.report import for ray>=2 (#1392)
* Changed tune.report import for ray>=2

* env: Changed pydantic restriction in env

* Reverted Pydantic install conditions

* Reverted Pydantic install conditions

* test: Check if GPU is available

* tests: uncommented a line

* tests: Better fix for Ray GPU checking

* tests: Added timeout to dataset loading

* tests: Deleted _test_hf_data()

* test: Reduce lrl2 dataset size

* bug: timeout error

* bug: timeout error

* fix: Added threading check for timout issue

* Undo old commits

* Timeout fix from #1406

---------

Co-authored-by: Daniel Grindrod <dannycg1996@gmail.com>
2025-02-14 09:38:33 +08:00
Li Jiang
d8b7d25b80 Fix test hang issue (#1406)
* Add try except to resource.setrlimit

* Set time limit only in main thread

* Check only test model

* Pytest debug

* Test separately

* Move test_model.py to automl folder
2025-02-13 19:50:35 +08:00
Daniel Grindrod
42d1dcfa0e fix: Fixed bug with catboost and groups (#1383)
Co-authored-by: Daniel Grindrod <daniel.grindrod@evotec.com>
2024-12-17 13:54:49 +08:00
Daniel Grindrod
5a74227bc3 Flaml: fix lgbm reproducibility (#1369)
* fix: Fixed bug where every underlying LGBMRegressor or LGBMClassifier had n_estimators = 1

* test: Added test showing case where FLAMLised CatBoostModel result isn't reproducible

* fix: Fixing issue where callbacks cause LGBM results to not be reproducible

* Update test/automl/test_regression.py

Co-authored-by: Li Jiang <bnujli@gmail.com>

* fix: Adding back the LGBM EarlyStopping

* refactor: Fix tweaked to ensure other models aren't likely to be affected

* test: Fixed test to allow reproduced results to be better than the FLAML results, when LGBM earlystopping is involved

---------

Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2024-11-01 10:06:15 +08:00
Ranuga
7644958e21 Add documentation for automl.model.estimator usage (#1311)
* Added documentation for automl.model.estimator usage

Updated documentation across various examples and the model.py file to include information about automl.model.estimator. This addition enhances the clarity and usability of FLAML by providing users with clear guidance on how to utilize this feature in their AutoML workflows. These changes aim to improve the overall user experience and facilitate easier understanding of FLAML's capabilities.

* fix: Ran pre-commit hook on docs

---------

Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Daniel Grindrod <dannycg1996@gmail.com>
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
2024-10-31 20:53:54 +08:00
Daniel Grindrod
a316f84fe1 fix: LinearSVC results now reproducible (#1376)
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
2024-10-31 14:02:16 +08:00
Daniel Grindrod
72881d3a2b fix: Fixing the random state of ElasticNetClassifier by default, to ensure reproduciblity. Also included elasticnet in reproducibility tests (#1374)
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2024-10-29 14:21:43 +08:00
Daniel Grindrod
d224218ecf fix: FLAML catboost metrics arent reproducible (#1364)
* fix: CatBoostRegressors metrics are now reproducible

* test: Made tests live, which ensure the reproducibility of catboost models

* fix: Added defunct line of code as a comment

* fix: Re-adding removed if statement, and test to show one issue that if statement can cause

* fix: Stopped ending CatBoost training early when time budget is running out

---------

Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
2024-10-23 13:51:23 +08:00
Li Jiang
5bfa0b1cd3 Improve mlflow integration and add more models (#1331)
* Add more spark models and improved mlflow integration

* Update test_extra_models, setup and gitignore

* Remove autofe

* Remove autofe

* Remove autofe

* Sync changes in internal

* Fix test for env without pyspark

* Fix import errors

* Fix tests

* Fix typos

* Fix pytorch-forecasting version

* Remove internal funcs, rename _mlflow.py

* Fix import error

* Fix dependency

* Fix experiment name setting

* Fix dependency

* Update pandas version

* Update pytorch-forecasting version

* Add warning message for not has_automl

* Fix test errors with nltk 3.8.2

* Don't enable mlflow logging w/o an active run

* Fix pytorch-forecasting can't be pickled issue

* Update pyspark tests condition

* Update synapseml

* Update synapseml

* No parent run, no logging for OSS

* Log when autolog is enabled

* upgrade code

* Enable autolog for tune

* Increase time budget for test

* End run before start a new run

* Update parent run

* Fix import error

* clean up

* skip macos and win

* Update notes

* Update default value of model_history
2024-08-13 07:53:47 +00:00
Jirka Borovec
b348cb1136 configure & apply pyupgrade with py3.8+ (#1333)
* configure pyupgrade with `py3.8+`

* apply update

---------

Co-authored-by: Li Jiang <bnujli@gmail.com>
2024-08-12 02:54:18 +00:00
Yang, Bo
8e63dd417b Don't pass callbacks=None to XGBoostSklearnEstimator._fit (#1322)
* Don't pass `callbacks=None` to `XGBoostSklearnEstimator._fit`

The original implmentation would pass `callbacks=None` to `XGBoostSklearnEstimator._fit` and eventually lead to a `TypeError` of `XGBModel.fit() got an unexpected keyword argument 'callbacks'`. This PR instead does not pass the `callbacks=None` parameter to avoid the error.

* Update setup.py to allow for xgboost 2.x

---------

Co-authored-by: Li Jiang <bnujli@gmail.com>
2024-08-06 09:24:11 +00:00
Li Jiang
d8129b9211 Fix typos, upgrade yarn packages, add some improvements (#1290)
* Fix typos, upgrade yarn packages, add some improvements

* Fix joblib 1.4.0 breaks joblib-spark

* Fix xgboost test error

* Pin xgboost<2.0.0

* Try update prophet to 1.5.1

* Update github workflow

* Revert prophet version

* Update github workflow

* Update install libomp

* Fix test errors

* Fix test errors

* Add retry to test and coverage

* Revert "Add retry to test and coverage"

This reverts commit ce13097cd5.

* Increase test budget

* Add more data to test_models, try fixing ValueError: Found array with 0 sample(s) (shape=(0, 252)) while a minimum of 1 is required.
2024-07-19 13:40:04 +00:00
Gleb Levitski
6b93c2e394 [ENH] Add support for sklearn HistGradientBoostingEstimator (#1230)
* Update model.py

HistGradientBoosting support

* Create __init__.py

* Update model.py

* Create histgb.py

* Update __init__.py

* Update test_model.py

* added histgb to estimator list

* Update Task-Oriented-AutoML.md

added docs

* lint

* fixed bugs

---------

Co-authored-by: Gleb <gleb@Glebs-MacBook-Pro.local>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-10-31 14:45:23 +00:00
Chi Wang
868e7dd1ca support xgboost 2.0 (#1219)
* support xgboost 2.0

* try classes_

* test version

* quote

* use_label_encoder

* Fix xgboost test error

* remove deprecated files

* remove deprecated files

* remove deprecated import

* replace deprecated import in integrate_spark.ipynb

* replace deprecated import in automl_lightgbm.ipynb

* formatted integrate_spark.ipynb

* replace deprecated import

* try fix driver python path

* Update python-package.yml

* replace deprecated reference

* move spark python env var to other section

* Update setup.py, install xgb<2 for MacOS

* Fix typo

* assert

* Try assert xgboost version

* Fail fast

* Keep all test/spark to try fail fast

* No need to skip spark test in Mac or Win

* Remove assert xgb version

* Remove fail fast

* Found root cause, fix test_sparse_matrix_xgboost

* Revert "No need to skip spark test in Mac or Win"

This reverts commit a09034817f.

* remove assertion

---------

Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: levscaut <57213911+levscaut@users.noreply.github.com>
Co-authored-by: levscaut <lwd2010530@qq.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2023-09-22 06:55:00 +00:00
minghao
da92238ffe Commenting use_label_encoder - xgboost (#1122)
* Commenting use_label_encoder - xgboost

* format change

* moving the import xgboost version to the head

* Shfit params for use_label to outside maxdept

* Keep the original logic

---------

Co-authored-by: Shaokun <shaokunzhang529@gmail.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-08-01 01:58:30 +00:00
Chi Wang
7ddb171cd9 autogen.agent -> autogen.agentchat (#1148)
* autogen.agent -> autogen.agentchat

* bug fix in portfolio

* notebook

* timeout

* timeout

* infer lang; close #1150

* timeout

* message context

* context handling

* add sender to generate_reply

* clean up the receive function

* move mathchat to contrib

* contrib

* last_message
2023-07-29 04:17:51 +00:00
levscaut
5eece5c748 Enhance Integration with Spark (#1097)
* add doc for spark

* labelCol equals to label by default

* change title and reformat

* reference about default index type

* fix doc build

* Update website/docs/Examples/Integrate - Spark.md

* update doc

* Added more references

* remove exception case when `y_train.name` is None

* fix broken link

---------

Co-authored-by: Wendong Li <v-wendongli@microsoft.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-07-10 04:44:01 +00:00
EgorKraevTransferwise
5245efbd2c Factor out time series-related functionality into a time series Task object (#989)
* Refactor into automl subpackage

Moved some of the packages into an automl subpackage to tidy before the
task-based refactor. This is in response to discussions with the group
and a comment on the first task-based PR.

Only changes here are moving subpackages and modules into the new
automl, fixing imports to work with this structure and fixing some
dependencies in setup.py.

* Fix doc building post automl subpackage refactor

* Fix broken links in website post automl subpackage refactor

* Fix broken links in website post automl subpackage refactor

* Remove vw from test deps as this is breaking the build

* Move default back to the top-level

I'd moved this to automl as that's where it's used internally, but had
missed that this is actually part of the public interface so makes sense
to live where it was.

* Re-add top level modules with deprecation warnings

flaml.data, flaml.ml and flaml.model are re-added to the top level,
being re-exported from flaml.automl for backwards compatability. Adding
a deprecation warning so that we can have a planned removal later.

* Fix model.py line-endings

* WIP

* WIP - Notes below

Got to the point where the methods from AutoML are pulled to
GenericTask. Started removing private markers and removing the passing
of automl to these methods. Done with decide_split_type, started on
prepare_data. Need to do the others after

* Re-add generic_task

* Most of the merge done, test_forecast_automl fit succeeds, fails at predict()

* Remaining fixes - test_forecast.py passes

* Comment out holidays-related code as it's not currently used

* Further holidays cleanup

* Fix imports in a test

* tidy up validate_data in time series task

* Test fixes

* Fix tests: add Task.__str__

* Fix tests: test for ray.ObjectRef

* Hotwire TS_Sklearn wrapper to fix test fail

* Attempt at test fix

* Fix test where val_pred_y is a list

* Attempt to fix remaining tests

* Push to retrigger tests

* Push to retrigger tests

* Push to retrigger tests

* Push to retrigger tests

* Remove plots from automl/test_forecast

* Remove unused data size field from Task

* Fix import for CLASSIFICATION in notebook

* Monkey patch TFT to avoid plotting, to fix tests on MacOS

* Monkey patch TFT to avoid plotting v2, to fix tests on MacOS

* Monkey patch TFT to avoid plotting v2, to fix tests on MacOS

* Fix circular import

* remove redundant code in task.py post-merge

* Fix test: set svd_solver="full" in PCA

* Update flaml/automl/data.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Fix review comments

* Fix task -> str in custom learner constructor

* Remove unused CLASSIFICATION imports

* Hotwire TS_Sklearn wrapper to fix test fail by setting
optimizer_for_horizon == False

* Revert changes to the automl_classification and pin FLAML version

* Fix imports in reverted notebook

* Fix FLAML version in automl notebooks

* Fix ml.py line endings

* Fix CLASSIFICATION task import in automl_classification notebook

* Uncomment pip install in notebook and revert import

Not convinced this will work because of installing an older version of
the package into the environment in which we're running the tests, but
let's see.

* Revert c6a5dd1a0

* Fix get_classification_objective import in suggest.py

* Remove hcrystallball docs reference in TS_Sklearn

* Merge markharley:extract-task-class-from-automl into this

* Fix import, remove smooth.py

* Fix dependencies to fix TFT fail on Windows Python 3.8 and 3.9

* Add tensorboardX dependency to fix TFT fail on Windows Python 3.8 and 3.9

* Set pytorch-lightning==1.9.0 to fix  TFT fail on Windows Python 3.8 and 3.9

* Set pytorch-lightning==1.9.0 to fix  TFT fail on Windows Python 3.8 and 3.9

* Disable PCA reduction of lagged features for now, to fix svd convervence fail

* Merge flaml/main into time_series_task

* Attempt to fix formatting

* Attempt to fix formatting

* tentatively implement holt-winters-no covariates

* fix forecast method, clean class

* checking external regressors too

* update test forecast

* remove duplicated test file, re-add sarimax, search space cleanup

* Update flaml/automl/model.py

removed links. Most important one probably was: https://robjhyndman.com/hyndsight/ets-regressors/

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* prevent short series

* add docs

* First attempt at merging Holt-Winters

* Linter fix

* Add holt-winters to TimeSeriesTask.estimators

* Fix spark test fail

* Attempt to fix another spark test fail

* Attempt to fix another spark test fail

* Change Black max line length to 127

* Change Black max line length to 120

* Add logging for ARIMA params, clean up time series models inheritance

* Add more logging for missing ARIMA params

* Remove a meaningless test causing a fail, add stricter check on ARIMA params

* Fix a bug in HoltWinters

* A pointless change to hopefully trigger the on and off KeyError in ARIMA.fit()

* Fix formatting

* Attempt to fix formatting

* Attempt to fix formatting

* Attempt to fix formatting

* Attempt to fix formatting

* Add type annotations to _train_with_config() in state.py

* Add type annotations to prepare_sample_train_data() in state.py

* Add docstring for time_col argument of AutoML.fit()

* Address @sonichi's comments on PR

* Fix formatting

* Fix formatting

* Reduce test time budget

* Reduce test time budget

* Increase time budget for the test to pass

* Remove redundant imports

* Remove more redundant imports

* Minor fixes of points raised by Qingyun

* Try to fix pandas import fail

* Try to fix pandas import fail, again

* Try to fix pandas import fail, again

* Try to fix pandas import fail, again

* Try to fix pandas import fail, again

* Try to fix pandas import fail, again

* Try to fix pandas import fail, again

* Try to fix pandas import fail, again

* Try to fix pandas import fail, again

* Try to fix pandas import fail, again

* Try to fix pandas import fail, again

* Formatting fixes

* More formatting fixes

* Added test that loops over TS models to ensure coverage

* Fix formatting issues

* Fix more formatting issues

* Fix random fail in check

* Put back in tests for ARIMA predict without fit

* Put back in tests for lgbm

* Update test/test_model.py

cover dedup

* Match target length to X length in missing test

---------

Co-authored-by: Mark Harley <mark.harley@transferwise.com>
Co-authored-by: Mark Harley <mharley.code@gmail.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Andrea W <a.ruggerini@ammagamma.com>
Co-authored-by: Andrea Ruggerini <nescio.adv@gmail.com>
Co-authored-by: Egor Kraev <Egor.Kraev@tw.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-06-19 11:20:32 +00:00
Chi Wang
a0b318b12e create an automl option to remove unnecessary dependency for autogen and tune (#1007)
* version update post release v1.2.2

* automl option

* import pandas

* remove automl.utils

* default

* test

* type hint and version update

* dependency update

* link to open in colab

* use packging.version to close #725

---------

Co-authored-by: Li Jiang <lijiang1@microsoft.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-05-24 23:55:04 +00:00
Susan Xueqing Liu
00c30a398e fix NLP zero division error (#1009)
* fix NLP zero division error

* set predictions to None

* set predictions to None

* set predictions to None

* refactor

* refactor

---------

Co-authored-by: Li Jiang <lijiang1@microsoft.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-05-03 05:50:28 +00:00
Jirka Borovec
73bb6e7667 pyproject.toml & switch to Ruff (#976)
* unify config to pyproject.toml
replace flake8 with Ruff

* drop configs

* update

* fixing

* Apply suggestions from code review

Co-authored-by: Zvi Baratz <z.baratz@gmail.com>

* setup

* ci

* pr template

* reword

---------

Co-authored-by: Zvi Baratz <z.baratz@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2023-04-28 01:54:55 +00:00
Susan Xueqing Liu
7114b8f742 fix zerodivision (#1000)
* fix zerodivision

* update

* remove final

---------

Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2023-04-23 03:55:51 +00:00
Jirka Borovec
a701cd82f8 set black with 120 line length (#975)
* set black with 120 line length

* apply pre-commit

* apply black
2023-04-10 19:50:40 +00:00
Susan Xueqing Liu
ef5a17cd83 handling nlp divide by zero (#926)
* handling nlp divide by zero

* catching zerodivisionerror

* catching zerodivisionerror

* catching zerodivisionerror

* addressing comments

* addressing comments

* updating test case

* update

* add blank to last line

* update nlp notebook

* rerun

* rerun

* sync with main

* add model selection for nlg

* addressing keyerror

* add raise exception

* update

* fix bug

* revert

* updating automl_nlp

* Update flaml/automl/model.py

Co-authored-by: Zvi Baratz <z.baratz@gmail.com>

* address comments

* address comments

---------

Co-authored-by: Li Jiang <lijiang1@microsoft.com>
Co-authored-by: Zvi Baratz <z.baratz@gmail.com>
2023-04-09 16:53:30 +00:00
Andrea Ruggerini
7f9402b8fd Add Holt-Winters exponential smoothing (#962)
* tentatively implement holt-winters-no covariates

* fix forecast method, clean class

* checking external regressors too

* update test forecast

* remove duplicated test file, re-add sarimax, search space cleanup

* Update flaml/automl/model.py

removed links. Most important one probably was: https://robjhyndman.com/hyndsight/ets-regressors/

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* prevent short series

* add docs

---------

Co-authored-by: Andrea W <a.ruggerini@ammagamma.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2023-04-04 17:29:54 +00:00
Li Jiang
50334f2c52 Support spark dataframe as input dataset and spark models as estimators (#934)
* add basic support to Spark dataframe

add support to SynapseML LightGBM model

update to pyspark>=3.2.0 to leverage pandas_on_Spark API

* clean code, add TODOs

* add sample_train_data for pyspark.pandas dataframe, fix bugs

* improve some functions, fix bugs

* fix dict change size during iteration

* update model predict

* update LightGBM model, update test

* update SynapseML LightGBM params

* update synapseML and tests

* update TODOs

* Added support to roc_auc for spark models

* Added support to score of spark estimator

* Added test for automl score of spark estimator

* Added cv support to pyspark.pandas dataframe

* Update test, fix bugs

* Added tests

* Updated docs, tests, added a notebook

* Fix bugs in non-spark env

* Fix bugs and improve tests

* Fix uninstall pyspark

* Fix tests error

* Fix java.lang.OutOfMemoryError: Java heap space

* Fix test_performance

* Update test_sparkml to test_0sparkml to use the expected spark conf

* Remove unnecessary widgets in notebook

* Fix iloc java.lang.StackOverflowError

* fix pre-commit

* Added params check for spark dataframes

* Refactor code for train_test_split to a function

* Update train_test_split_pyspark

* Refactor if-else, remove unnecessary code

* Remove y from predict, remove mem control from n_iter compute

* Update workflow

* Improve _split_pyspark

* Fix test failure of too short training time

* Fix typos, improve docstrings

* Fix index errors of pandas_on_spark, add spark loss metric

* Fix typo of ndcgAtK

* Update NDCG metrics and tests

* Remove unuseful logger

* Use cache and count to ensure consistent indexes

* refactor for merge maain

* fix errors of refactor

* Updated SparkLightGBMEstimator and cache

* Updated config2params

* Remove unused import

* Fix unknown parameters

* Update default_estimator_list

* Add unit tests for spark metrics
2023-03-25 19:59:46 +00:00
Susan Xueqing Liu
a3e770eac5 fix delete (#950) 2023-03-14 03:19:58 +00:00
Mark Harley
27b2712016 Extract task class from automl (#857)
* Refactor into automl subpackage

Moved some of the packages into an automl subpackage to tidy before the
task-based refactor. This is in response to discussions with the group
and a comment on the first task-based PR.

Only changes here are moving subpackages and modules into the new
automl, fixing imports to work with this structure and fixing some
dependencies in setup.py.

* Fix doc building post automl subpackage refactor

* Fix broken links in website post automl subpackage refactor

* Fix broken links in website post automl subpackage refactor

* Remove vw from test deps as this is breaking the build

* Move default back to the top-level

I'd moved this to automl as that's where it's used internally, but had
missed that this is actually part of the public interface so makes sense
to live where it was.

* Re-add top level modules with deprecation warnings

flaml.data, flaml.ml and flaml.model are re-added to the top level,
being re-exported from flaml.automl for backwards compatability. Adding
a deprecation warning so that we can have a planned removal later.

* Fix model.py line-endings

* WIP

* WIP - Notes below

Got to the point where the methods from AutoML are pulled to
GenericTask. Started removing private markers and removing the passing
of automl to these methods. Done with decide_split_type, started on
prepare_data. Need to do the others after

* Re-add generic_task

* Fix tests: add Task.__str__

* Fix tests: test for ray.ObjectRef

* Hotwire TS_Sklearn wrapper to fix test fail

* Remove unused data size field from Task

* Fix import for CLASSIFICATION in notebook

* Update flaml/automl/data.py

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Fix review comments

* Fix task -> str in custom learner constructor

* Remove unused CLASSIFICATION imports

* Hotwire TS_Sklearn wrapper to fix test fail by setting
optimizer_for_horizon == False

* Revert changes to the automl_classification and pin FLAML version

* Fix imports in reverted notebook

* Fix FLAML version in automl notebooks

* Fix ml.py line endings

* Fix CLASSIFICATION task import in automl_classification notebook

* Uncomment pip install in notebook and revert import

Not convinced this will work because of installing an older version of
the package into the environment in which we're running the tests, but
let's see.

* Revert c6a5dd1a0

* Revert "Revert c6a5dd1a0"

This reverts commit e55e35adea.

* Black format model.py

* Bump version to 1.1.2 in automl_xgboost

* Add docstrings to the Task ABC

* Fix import in custom_learner

* fix 'optimize_for_horizon' for ts_sklearn

* remove debugging print statements

* Check for is_forecast() before is_classification() in decide_split_type

* Attempt to fix formatting fail

* Another attempt to fix formatting fail

* And another attempt to fix formatting fail

* Add type annotations for task arg in signatures and docstrings

* Fix formatting

* Fix linting

---------

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: EgorKraevTransferwise <egor.kraev@transferwise.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Kevin Chen <chenkevin.8787@gmail.com>
2023-03-11 02:39:08 +00:00
Jirka Borovec
6aa1d16ebc pre-commit: update config (#925)
* update config

* apply precommit
2023-02-22 00:49:38 +00:00
Mark Harley
44ddf9e104 Refactor into automl subpackage (#809)
* Refactor into automl subpackage

Moved some of the packages into an automl subpackage to tidy before the
task-based refactor. This is in response to discussions with the group
and a comment on the first task-based PR.

Only changes here are moving subpackages and modules into the new
automl, fixing imports to work with this structure and fixing some
dependencies in setup.py.

* Fix doc building post automl subpackage refactor

* Fix broken links in website post automl subpackage refactor

* Fix broken links in website post automl subpackage refactor

* Remove vw from test deps as this is breaking the build

* Move default back to the top-level

I'd moved this to automl as that's where it's used internally, but had
missed that this is actually part of the public interface so makes sense
to live where it was.

* Re-add top level modules with deprecation warnings

flaml.data, flaml.ml and flaml.model are re-added to the top level,
being re-exported from flaml.automl for backwards compatability. Adding
a deprecation warning so that we can have a planned removal later.

* Fix model.py line-endings

* Pin pytorch-lightning to less than 1.8.0

We're seeing strange lightning related bugs from pytorch-forecasting
since the release of lightning 1.8.0. Going to try constraining this to
see if we have a fix.

* Fix the lightning version pin

Was optimistic with setting it in the 1.7.x range, but that isn't
compatible with python 3.6

* Remove lightning version pin

* Revert dependency version changes

* Minor change to retrigger the build

* Fix line endings in ml.py and model.py

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: EgorKraevTransferwise <egor.kraev@transferwise.com>
2022-12-06 15:46:08 -05:00