* Initial plan
* Fix training/test set overlap in holdout classification by only adding missing labels when needed
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Address code review feedback: add bounds checking and fix edge cases
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix bounds checking: use correct comparison operator for array indexing
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix potential ValueError with max() on empty lists and simplify test assertions
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Add extra bounds checking for label_matches indices
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix pandas_on_spark compatibility by using iloc_pandas_on_spark util method
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Run pre-commit to fix formatting issues
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Improve missing label handling to avoid overlap when possible
For classes with multiple instances that end up in one set, properly
re-split them between train and val instead of duplicating. Only add
to both sets when the class has exactly 1 instance (unavoidable overlap).
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix: ensure at least 1 instance remains in original set when re-splitting
Addresses comments on lines 580 and 724 - prevents moving ALL instances
from validation to training or vice versa by using min(num_instances - 1, ...).
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
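A minimal sketch of the re-split idea described in the two commits above, assuming plain numpy arrays. The helper name and the move-half heuristic are illustrative; only the `min(num_instances - 1, ...)` guard and the single-instance duplication come from the commit messages, and this is not FLAML's actual implementation (the symmetric case of labels missing from training is handled analogously).

```python
import numpy as np


def resplit_missing_labels(X_train, y_train, X_val, y_val):
    """Illustrative sketch: if a class appears only in the training set,
    move some of its rows to validation while keeping at least one row in
    training; a single-instance class is duplicated (unavoidable overlap)."""
    for label in np.setdiff1d(np.unique(y_train), np.unique(y_val)):
        idx = np.where(y_train == label)[0]
        if len(idx) == 1:
            # exactly one instance: add it to both sets
            X_val = np.vstack([X_val, X_train[idx]])
            y_val = np.concatenate([y_val, y_train[idx]])
        else:
            # move roughly half, but always leave at least one instance behind
            n_move = min(len(idx) - 1, max(1, len(idx) // 2))
            move = idx[:n_move]
            X_val = np.vstack([X_val, X_train[move]])
            y_val = np.concatenate([y_val, y_train[move]])
            X_train = np.delete(X_train, move, axis=0)
            y_train = np.delete(y_train, move)
    return X_train, y_train, X_val, y_val
```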
* Refactor: Extract missing label handling into two methods (fast/slow)
- Created _handle_missing_labels_fast(): Fast version that adds first instance to both sets (may overlap)
- Created _handle_missing_labels_no_overlap(): Precise version that avoids overlap when possible
- Added allow_label_overlap parameter to AutoML.fit() (default=True for fast version)
- Updated documentation with new parameter
- Both versions maintain label completeness while offering speed/precision tradeoff
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
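A short usage sketch of the `allow_label_overlap` switch introduced above, using a toy dataset. The parameter and its default come from this PR; the remaining `fit()` arguments are ordinary FLAML usage.

```python
from flaml import AutoML
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_classes=3, n_informative=5, random_state=0)

automl = AutoML()
# Fast default: if the holdout split leaves a label missing from one set,
# an instance of that label is simply added to it (may create overlap).
automl.fit(X, y, task="classification", eval_method="holdout",
           time_budget=10, allow_label_overlap=True)

# Precise mode: re-split multi-instance classes instead, avoiding overlap
# whenever possible (slower when rare labels are present).
automl.fit(X, y, task="classification", eval_method="holdout",
           time_budget=10, allow_label_overlap=False)
```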
* Add usage example for allow_label_overlap to Best-Practices.md
- Added comprehensive documentation in Classification section
- Included code examples showing both fast and precise versions
- Explained tradeoffs between speed and precision
- Noted that parameter only affects holdout evaluation
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Address code review feedback: update documentation and tests
- Updated docstrings to clarify fast version only adds instances to missing sets
- Fixed documentation to reflect actual behavior (not "both sets" but "set with missing label")
- Completely rewrote test_no_overlap.py to test both allow_label_overlap modes
- Added tests with sample_weights for better code coverage
- Added test for single-instance class handling
- All 5 tests passing
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix AttributeError: initialize _allow_label_overlap in settings and retrain_from_log
- Added allow_label_overlap to settings initialization with default=True
- Added parameter defaulting in fit() method to use settings value if not provided
- Added _allow_label_overlap initialization in retrain_from_log method
- Fixes test failures in test_multiclass, test_regression, and spark tests
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Add docstring to fit()
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
* Initial plan
* Fix: Preserve FLAML_sample_size in best_config_per_estimator
Modified best_config_per_estimator property to keep FLAML_sample_size when returning best configurations. Previously, AutoMLState.sanitize() was removing this key, which caused the sample size information to be lost when using starting_points from a previous run.
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Add a test to verify the improvement of starting_points
* Update documentation to reflect FLAML_sample_size preservation
Updated Task-Oriented-AutoML.md to document that best_config_per_estimator now preserves FLAML_sample_size:
- Added note in "Warm start" section explaining that FLAML_sample_size is preserved for effective warm-starting
- Added note in "Get best configuration" section with example showing FLAML_sample_size in output
- Explains importance of sample size preservation for continuing optimization with correct sample sizes
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
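A small warm-start sketch of the behavior documented above, assuming a toy dataset. With this change, the configs returned by `best_config_per_estimator` keep their `FLAML_sample_size` entries, so passing them as `starting_points` resumes the search at the sample sizes the first run had already reached.

```python
from flaml import AutoML
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, random_state=0)

automl1 = AutoML()
automl1.fit(X, y, task="classification", time_budget=20)

# Each per-estimator config now retains its FLAML_sample_size key.
starting_points = automl1.best_config_per_estimator

automl2 = AutoML()
automl2.fit(X, y, task="classification", time_budget=60,
            starting_points=starting_points)
```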
* Fix unintended code change
* Improve docstrings and docs
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
* Merged PR 1686010: Bump version to 2.3.5.post2, Distribute source and wheel, Fix license-file, Only log better models
- Fix license-file
- Bump version to 2.3.5.post2
- Distribute source and wheel
- Log better models only
- Add artifact_path to register_automl_pipeline
- Improve logging of _automl_user_configurations
----
This pull request fixes the project’s configuration by updating the license metadata for compliance with FLAML OSS 2.3.5.
The changes in `/pyproject.toml` update the project’s license and readme metadata by replacing deprecated keys with the new structured fields.
- `/pyproject.toml`: Replaced `license_file` with `license = { text = "MIT" }`.
- `/pyproject.toml`: Replaced `description-file` with `readme = "README.md"`.
Related work items: #4252053
* Merged PR 1688479: Handle feature_importances_ is None, Catch RuntimeError and wait for spark cluster to recover
- Add warning message when feature_importances_ is None (#3982120)
- Catch RuntimeError and wait for spark cluster to recover (#3982133)
----
Bug fix.
This pull request prevents an AttributeError in the feature importance plotting function by adding a check for a `None` value with an informative warning message.
- `flaml/fabric/visualization.py`: Checks if `result.feature_importances_` is `None`, logs a warning with possible reasons, and returns early.
- `flaml/fabric/visualization.py`: Imports `logger` from `flaml.automl.logger` to support the warning message.
Related work items: #3982120, #3982133
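A sketch of the guard described in the two bullets above (the logger import and the early return); the function name and warning text are illustrative, not the exact code in `flaml/fabric/visualization.py`.

```python
from flaml.automl.logger import logger


def plot_feature_importance(result):
    # Guard against estimators that do not expose feature importances.
    if getattr(result, "feature_importances_", None) is None:
        logger.warning(
            "feature_importances_ is None; the underlying estimator may not "
            "support feature importances, or the model was not trained."
        )
        return
    # ... plotting logic would follow here ...
```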
* Removed deprecated metadata section
* Fix log_params, log_artifact doesn't support run_id in mlflow 2.6.0
* Remove autogen
* Remove autogen
* Remove autogen
* Merged PR 1776547: Fix flaky test test_automl
Don't throw error when time budget is not enough
----
#### AI description (iteration 1)
#### PR Classification
Bug fix addressing a failing test in the AutoML notebook example.
#### PR Summary
This PR fixes a flaky test by adding a conditional check in the AutoML test that prints a message and exits early if no best estimator is set, thereby preventing unpredictable test failures.
- `test/automl/test_notebook_example.py`: Introduced a check to print "Training budget is not sufficient" and return if `automl.best_estimator` is not found.
Related work items: #4573514
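A simplified sketch of the check described above; the test name and dataset are placeholders, and only the guard on `automl.best_estimator` and the printed message come from the PR summary.

```python
from flaml import AutoML
from sklearn.datasets import load_iris


def test_automl_tiny_budget():
    X, y = load_iris(return_X_y=True)
    automl = AutoML()
    automl.fit(X, y, task="classification", time_budget=1)
    if not automl.best_estimator:
        # With a tiny budget no trial may finish; bail out instead of
        # letting downstream assertions fail flakily.
        print("Training budget is not sufficient")
        return
    assert automl.best_config is not None
```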
* Merged PR 1777952: Fix unrecognized or malformed field 'license-file' when uploading wheel to feed
Try to fix InvalidDistribution: Invalid distribution metadata: unrecognized or malformed field 'license-file'
----
Bug fix addressing package metadata configuration.
This pull request fixes the error with unrecognized or malformed license file fields during wheel uploads by updating the setup configuration.
- In `setup.py`, added `license="MIT"` and `license_files=["LICENSE"]` to provide proper license metadata.
Related work items: #4560034
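Roughly what the `setup.py` change above amounts to, reconstructed from the bullet (an excerpt, not the full file):

```python
# setup.py (excerpt)
from setuptools import setup

setup(
    name="FLAML",
    license="MIT",              # explicit license string
    license_files=["LICENSE"],  # include the license file in the wheel metadata
    # ... remaining arguments unchanged ...
)
```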
* Cherry-pick Merged PR 1879296: Add support to python 3.12 and spark 4.0
* Cherry-pick Merged PR 1890869: Improve time_budget estimation for mlflow logging
* Cherry-pick Merged PR 1879296: Add support to python 3.12 and spark 4.0
* Disable openai workflow
* Add python 3.12 to test envs
* Manually trigger openai
* Support markdown files with underscore-prefixed file names
* Improve save dependencies
* SynapseML is not installed
* Fix syntax error: Module !flaml/autogen was never imported
* macOS with Python 3.12 also hangs
* fix syntax error
* Update python version in actions
* Install setuptools for using pkg_resources
* Fix test_automl_performance in Github actions
* Fix test_nested_run
* Update gitignore
* Bump version to 2.4.0
* Update readme
* Pre-download california housing data
* Use pre-downloaded california housing data
* Pin lightning<=2.5.6
* Fix typo in find and replace
* Fix estimators having no attribute __sklearn_tags__
* Pin torch to 2.2.2 in tests
* Fix conflict
* Update pytorch-forecasting
* Update pytorch-forecasting
* Update pytorch-forecasting
* Use numpy<2 for testing
* Update scikit-learn
* Run Build and UT every other day
* Pin pip<24.1
* Pin pip<24.1 in pipeline
* Loosen pip, install pytorch_forecasting only in py311
* Add support for new versions of NLP dependencies
* Fix formats
* Remove redefinition
* Update mlflow versions
* Fix mlflow version syntax
* Update gitignore
* Clean up cache to free space
* Remove clean up action cache
* Fix blendsearch
* Update test workflow
* Update setup.py
* Fix catboost version
* Update workflow
* Prepare for python 3.14
* Support no catboost
* Fix tests
* Fix python_requires
* Update test workflow
* Fix vw tests
* Remove python 3.9
* Fix nlp tests
* Fix prophet
* Print pip freeze for better debugging
* Fix Optuna search does not support parameters of type Float with samplers of type Quantized
* Save dependencies for later inspection
* Fix error when coverage.xml does not exist
* Fix github action permission
* Handle python 3.13
* Handle the case where openml is not installed
* Check dependencies before run tests
* Update dependencies
* Fix syntax error
* Use bash
* Update dependencies
* Fix git error
* Loosen mlflow constraints
* Add rerun, use mlflow-skinny
* Fix git error
* Remove ray tests
* Update xgboost versions
* Fix automl pickle error
* Don't test python 3.10 on macos as it's stuck
* Rebase before push
* Reduce number of branches
* Sync Fabric up to 2cd1c3da
* Remove synapseml from tag names
* Fix 'NoneType' object has no attribute 'DataFrame'
* Deprecate Python 3.8 support
* Fix 'NoneType' object has no attribute 'DataFrame'
* Still use python 3.8 for pydoc
* Don't run tests in parallel
* Remove autofe and lowcode
* Add try except to resource.setrlimit
* Set time limit only in main thread
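A hedged sketch combining the two commits above: guarding `resource.setrlimit` with try/except and only setting the limit from the main thread. The function name and the choice of `RLIMIT_CPU` are illustrative, and the `resource` module is Unix-only.

```python
import resource
import threading


def try_set_time_limit(seconds):
    # Resource limits (and signal-based timeouts) are only safe to set up
    # from the main thread.
    if threading.current_thread() is not threading.main_thread():
        return
    try:
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        resource.setrlimit(resource.RLIMIT_CPU, (seconds, hard))
    except (ValueError, OSError):
        # e.g. the new soft limit exceeds the hard limit, or the platform
        # refuses the change; proceed without a hard time limit.
        pass
```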
* Check only test model
* Pytest debug
* Test separately
* Move test_model.py to automl folder
* fix: Now resetting indexes for regression datasets when using group folds
* refactor: Simplified if statement to include all fold types
* docs: Updated docs to make it clear that group folds can be used for regression tasks
---------
Co-authored-by: Daniel Grindrod <daniel.grindrod@evotec.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
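A usage sketch of group folds for a regression task, the scenario the three commits above fix and document. The synthetic data and group ids are placeholders; `split_type="group"` and `groups` are FLAML's existing cross-validation options.

```python
import numpy as np
from flaml import AutoML
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=300, n_features=8, random_state=0)
groups = np.random.default_rng(0).integers(0, 10, size=len(y))  # e.g. subject/site ids

automl = AutoML()
# Group folds work for regression as well: rows sharing a group id are
# kept together when the data is split for cross-validation.
automl.fit(X, y, task="regression", eval_method="cv",
           split_type="group", groups=groups, time_budget=20)
```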
* fix: Fixed bug where every underlying LGBMRegressor or LGBMClassifier had n_estimators = 1
* test: Added test showing case where FLAMLised CatBoostModel result isn't reproducible
* fix: Fixing issue where callbacks cause LGBM results to not be reproducible
* Update test/automl/test_regression.py
Co-authored-by: Li Jiang <bnujli@gmail.com>
* fix: Adding back the LGBM EarlyStopping
* refactor: Tweaked the fix to ensure other models are unlikely to be affected
* test: Fixed the test to allow reproduced results to be better than the FLAML results when LGBM early stopping is involved
---------
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
* Merged PR 1444697: Fix json dumps TypeError
Fix json dumps TypeError
----
Bug fix to address a `TypeError` in `json.dumps`.
This pull request fixes a `TypeError` encountered when using `json.dumps` on `automl._automl_user_configurations` by introducing a safe JSON serialization function.
- Added `safe_json_dumps` function in `flaml/fabric/mlflow.py` to handle non-serializable objects.
- Updated `MLflowIntegration` class in `flaml/fabric/mlflow.py` to use `safe_json_dumps` for JSON serialization.
- Modified `test/automl/test_multiclass.py` to test the new `safe_json_dumps` function.
Related work items: #3439408
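The idea behind `safe_json_dumps`, assuming the simplest possible fallback (the real helper in `flaml/fabric/mlflow.py` may be more elaborate):

```python
import json


def safe_json_dumps(obj):
    # Fall back to str() for anything json.dumps cannot serialize,
    # e.g. estimator objects inside automl._automl_user_configurations.
    return json.dumps(obj, default=str)
```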
* Fix data transform issue and spark log_loss metric compute error
* fix: CatBoostRegressor metrics are now reproducible
* test: Made tests live, which ensure the reproducibility of CatBoost models
* fix: Added defunct line of code as a comment
* fix: Re-added the removed if statement, plus a test showing one issue that the if statement can cause
* fix: Stopped ending CatBoost training early when time budget is running out
---------
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
* Remove temporary pickle files
* Update version to 2.3.1
* Use TemporaryDirectory for pickle and log_artifact
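A sketch of the pattern behind the commit above, assuming a pickled model logged as an MLflow artifact; the function and file names are illustrative.

```python
import os
import pickle
import tempfile

import mlflow


def log_model_pickle(model, artifact_path="model_pickle"):
    # Write the pickle inside a TemporaryDirectory so nothing is left on
    # disk after mlflow.log_artifact copies it into the active run.
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = os.path.join(tmpdir, "model.pkl")
        with open(local_path, "wb") as f:
            pickle.dump(model, f)
        mlflow.log_artifact(local_path, artifact_path=artifact_path)
```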
* Fix 'CatBoostClassifier' object has no attribute '_get_param_names'
* Add more spark models and improved mlflow integration
* Update test_extra_models, setup and gitignore
* Remove autofe
* Remove autofe
* Remove autofe
* Sync changes in internal
* Fix test for env without pyspark
* Fix import errors
* Fix tests
* Fix typos
* Fix pytorch-forecasting version
* Remove internal funcs, rename _mlflow.py
* Fix import error
* Fix dependency
* Fix experiment name setting
* Fix dependency
* Update pandas version
* Update pytorch-forecasting version
* Add warning message for not has_automl
* Fix test errors with nltk 3.8.2
* Don't enable mlflow logging w/o an active run
* Fix issue where pytorch-forecasting can't be pickled
* Update pyspark tests condition
* Update synapseml
* Update synapseml
* No parent run, no logging for OSS
* Log when autolog is enabled
* upgrade code
* Enable autolog for tune
* Increase time budget for test
* End run before starting a new one
* Update parent run
* Fix import error
* clean up
* skip macos and win
* Update notes
* Update default value of model_history
* Fix typos, upgrade yarn packages, add some improvements
* Fix joblib 1.4.0 breaking joblib-spark
* Fix xgboost test error
* Pin xgboost<2.0.0
* Try update prophet to 1.5.1
* Update github workflow
* Revert prophet version
* Update github workflow
* Update install libomp
* Fix test errors
* Fix test errors
* Add retry to test and coverage
* Revert "Add retry to test and coverage"
This reverts commit ce13097cd5.
* Increase test budget
* Add more data to test_models, try fixing ValueError: Found array with 0 sample(s) (shape=(0, 252)) while a minimum of 1 is required.
* support xgboost 2.0
* try classes_
* test version
* quote
* use_label_encoder
* Fix xgboost test error
* remove deprecated files
* remove deprecated files
* remove deprecated import
* replace deprecated import in integrate_spark.ipynb
* replace deprecated import in automl_lightgbm.ipynb
* formatted integrate_spark.ipynb
* replace deprecated import
* try fix driver python path
* Update python-package.yml
* replace deprecated reference
* move spark python env var to other section
* Update setup.py, install xgb<2 for MacOS
* Fix typo
* assert
* Try assert xgboost version
* Fail fast
* Keep all test/spark to try fail fast
* No need to skip spark test in Mac or Win
* Remove assert xgb version
* Remove fail fast
* Found root cause, fix test_sparse_matrix_xgboost
* Revert "No need to skip spark test in Mac or Win"
This reverts commit a09034817f.
* remove assertion
---------
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: levscaut <57213911+levscaut@users.noreply.github.com>
Co-authored-by: levscaut <lwd2010530@qq.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
* fix generate_reply
* code format
* add test case
* update
* update
* Update test/autogen/agentchat/test_responsive_agent.py
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* Update test/autogen/agentchat/test_responsive_agent.py
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* Update flaml/autogen/agentchat/responsive_agent.py
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
---------
Co-authored-by: Chi Wang <wang.chi@microsoft.com>