* Initial plan
* Fix training/test set overlap in holdout classification by only adding missing labels when needed
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Address code review feedback: add bounds checking and fix edge cases
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix bounds checking: use correct comparison operator for array indexing
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix potential ValueError with max() on empty lists and simplify test assertions
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
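A minimal sketch of the empty-list guard mentioned above. It assumes the fix amounts to giving `max()` a fallback; the helper name `safe_max` and the fallback value are hypothetical and not taken from the FLAML code.

```python
def safe_max(values, fallback=0):
    """Return the largest value, or `fallback` when `values` is empty.

    A bare max([]) raises ValueError; the `default` keyword avoids that.
    """
    return max(values, default=fallback)
```

With this shape, callers no longer need a separate `if values:` check before taking the maximum.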
* Add extra bounds checking for label_matches indices
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix pandas_on_spark compatibility by using iloc_pandas_on_spark util method
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Run pre-commit to fix formatting issues
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Improve missing label handling to avoid overlap when possible
For classes with multiple instances that end up in one set, properly
re-split them between train and val instead of duplicating. Only add
to both sets when the class has exactly 1 instance (unavoidable overlap).
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix: ensure at least 1 instance remains in original set when re-splitting
Addresses comments on lines 580 and 724 - prevents moving ALL instances
from validation to training or vice versa by using min(num_instances - 1, ...).
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
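The re-splitting rule from the two commits above can be sketched roughly as follows. This is an illustration under stated assumptions, not the FLAML implementation: the function name, the index-list representation, and the "move about half" choice for the second argument of `min(num_instances - 1, ...)` are all hypothetical.

```python
import random


def resplit_missing_labels(train_idx, val_idx, labels, seed=0):
    """Make every label appear in both index lists, avoiding overlap when possible."""
    rng = random.Random(seed)
    train_idx, val_idx = list(train_idx), list(val_idx)
    for label in sorted(set(labels)):
        in_train = [i for i in train_idx if labels[i] == label]
        in_val = [i for i in val_idx if labels[i] == label]
        if in_train and in_val:
            continue  # label already present on both sides
        if in_train:
            source, target, members = train_idx, val_idx, in_train
        else:
            source, target, members = val_idx, train_idx, in_val
        if len(members) == 1:
            # Only one instance exists: duplicating it is the unavoidable overlap.
            target.append(members[0])
            continue
        # Move at least one instance, but always leave one in the original set.
        # "Half of the instances" is an assumed choice for this sketch.
        n_move = max(1, min(len(members) - 1, len(members) // 2))
        for i in rng.sample(members, n_move):
            source.remove(i)
            target.append(i)
    return train_idx, val_idx
```

The `len(members) - 1` cap is what keeps the original set from being emptied of that class.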
* Refactor: Extract missing label handling into two methods (fast/slow)
- Created _handle_missing_labels_fast(): Fast version that adds first instance to both sets (may overlap)
- Created _handle_missing_labels_no_overlap(): Precise version that avoids overlap when possible
- Added allow_label_overlap parameter to AutoML.fit() (default=True for fast version)
- Updated documentation with new parameter
- Both versions maintain label completeness while offering speed/precision tradeoff
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
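For contrast, the fast mode can be sketched as below. This assumes the behavior described later in this log (an instance is added only to the set that is missing the label), and every name here is hypothetical rather than the body of `_handle_missing_labels_fast()`.

```python
def add_missing_labels_fast(train_idx, val_idx, labels):
    """Copy the first instance of each missing label into the set that lacks it."""
    train_idx, val_idx = list(train_idx), list(val_idx)
    # Remember the first index seen for every label across both sets.
    first_seen = {}
    for i in train_idx + val_idx:
        first_seen.setdefault(labels[i], i)
    train_labels = {labels[i] for i in train_idx}
    val_labels = {labels[i] for i in val_idx}
    for label, i in first_seen.items():
        if label not in train_labels:
            train_idx.append(i)  # duplicated index: train and val now overlap
        if label not in val_labels:
            val_idx.append(i)
    return train_idx, val_idx
```

This does a single pass with no re-splitting, which is why it is cheaper, at the cost of letting the copied instances appear in both sets.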
* Add usage example for allow_label_overlap to Best-Practices.md
- Added comprehensive documentation in Classification section
- Included code examples showing both fast and precise versions
- Explained tradeoffs between speed and precision
- Noted that parameter only affects holdout evaluation
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Address code review feedback: update documentation and tests
- Updated docstrings to clarify fast version only adds instances to missing sets
- Fixed documentation to reflect actual behavior (not "both sets" but "set with missing label")
- Completely rewrote test_no_overlap.py to test both allow_label_overlap modes
- Added tests with sample_weights for better code coverage
- Added test for single-instance class handling
- All 5 tests passing
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix AttributeError: initialize _allow_label_overlap in settings and retrain_from_log
- Added allow_label_overlap to settings initialization with default=True
- Added parameter defaulting in fit() method to use settings value if not provided
- Added _allow_label_overlap initialization in retrain_from_log method
- Fixes test failures in test_multiclass, test_regression, and spark tests
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Add docstring to fit()
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
* Initial plan
* Fix: Preserve FLAML_sample_size in best_config_per_estimator
Modified best_config_per_estimator property to keep FLAML_sample_size when returning best configurations. Previously, AutoMLState.sanitize() was removing this key, which caused the sample size information to be lost when using starting_points from a previous run.
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
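The effect of this change can be illustrated with a simplified stand-in. The `sanitize` function and its `keep_sample_size` flag below are hypothetical and do not mirror the signature of `AutoMLState.sanitize()`; only the `FLAML_sample_size` key comes from the commit message.

```python
SAMPLE_SIZE_KEY = "FLAML_sample_size"


def sanitize(config, keep_sample_size=False):
    """Return a copy of `config`, optionally stripping the sample-size key."""
    cleaned = dict(config)
    if not keep_sample_size:
        cleaned.pop(SAMPLE_SIZE_KEY, None)
    return cleaned


# Made-up configuration used only for illustration.
best = {"n_estimators": 4, "FLAML_sample_size": 10000}
before_fix = sanitize(best)                        # sample size is lost
after_fix = sanitize(best, keep_sample_size=True)  # sample size survives
```

Keeping the key means a later run that receives this dictionary as a starting point can resume at the same sample size instead of starting over from a smaller one.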
* Add a test to verify the improvement of starting_points
* Update documentation to reflect FLAML_sample_size preservation
Updated Task-Oriented-AutoML.md to document that best_config_per_estimator now preserves FLAML_sample_size:
- Added note in "Warm start" section explaining that FLAML_sample_size is preserved for effective warm-starting
- Added note in "Get best configuration" section with example showing FLAML_sample_size in output
- Explains importance of sample size preservation for continuing optimization with correct sample sizes
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix unintended code change
* Improve docstrings and docs
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
* Simplify automl.fit calls in Best Practices
Removed 'retrain_full' and 'eval_method' parameters from automl.fit calls.
* Fix best practices not shown
* Add best practices
* Update docs to reflect the recent changes
* Improve model persisting best practices
* Bump version to 2.4.1
* List all estimators
* Remove autogen
* Update dependencies
* Fix macOS hang with running coverage
* Run coverage only in ubuntu
* Fix syntax error
* Fix run tests logic
* Update readme
* Don't test python 3.10 on macos as it's stuck
* Enable all python versions for macos
* Merged PR 1686010: Bump version to 2.3.5.post2, Distribute source and wheel, Fix license-file, Only log better models
- Fix license-file
- Bump version to 2.3.5.post2
- Distribute source and wheel
- Log better models only
- Add artifact_path to register_automl_pipeline
- Improve logging of _automl_user_configurations
----
This pull request fixes the project’s configuration by updating the license metadata for compliance with FLAML OSS 2.3.5.
The changes in `/pyproject.toml` update the project’s license and readme metadata by replacing deprecated keys with the new structured fields.
- `/pyproject.toml`: Replaced `license_file` with `license = { text = "MIT" }`.
- `/pyproject.toml`: Replaced `description-file` with `readme = "README.md"`.
Related work items: #4252053
* Merged PR 1688479: Handle feature_importances_ is None, Catch RuntimeError and wait for spark cluster to recover
- Add warning message when feature_importances_ is None (#3982120)
- Catch RuntimeError and wait for spark cluster to recover (#3982133)
----
Bug fix.
This pull request prevents an AttributeError in the feature importance plotting function by adding a check for a `None` value with an informative warning message.
- `flaml/fabric/visualization.py`: Checks if `result.feature_importances_` is `None`, logs a warning with possible reasons, and returns early.
- `flaml/fabric/visualization.py`: Imports `logger` from `flaml.automl.logger` to support the warning message.
Related work items: #3982120, #3982133
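A rough sketch of the `None` guard described in this PR. The function name, logger name, and ranking logic are hypothetical; only the attribute name `feature_importances_` and the warn-then-return-early behavior come from the description above.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("feature_importance_sketch")


def top_features(result, names, k=3):
    """Return the k most important (name, importance) pairs, or None if unavailable."""
    importances = getattr(result, "feature_importances_", None)
    if importances is None:
        # Warn and return early instead of failing later on a None value.
        logger.warning(
            "feature_importances_ is None; the model may not expose importances."
        )
        return None
    ranked = sorted(zip(names, importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Stand-in result objects for illustration only.
missing = SimpleNamespace(feature_importances_=None)
present = SimpleNamespace(feature_importances_=[0.1, 0.9])
```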
* Removed deprecated metadata section
* Fix log_params and log_artifact not supporting run_id in mlflow 2.6.0
* Remove autogen
* Remove autogen
* Remove autogen
* Merged PR 1776547: Fix flaky test test_automl
Don't throw error when time budget is not enough
----
#### AI description (iteration 1)
#### PR Classification
Bug fix addressing a failing test in the AutoML notebook example.
#### PR Summary
This PR fixes a flaky test by adding a conditional check in the AutoML test that prints a message and exits early if no best estimator is set, thereby preventing unpredictable test failures.
- `test/automl/test_notebook_example.py`: Introduced a check to print "Training budget is not sufficient" and return if `automl.best_estimator` is not found.
Related work items: #4573514
* Merged PR 1777952: Fix unrecognized or malformed field 'license-file' when uploading wheel to feed
Try to fix InvalidDistribution: Invalid distribution metadata: unrecognized or malformed field 'license-file'
----
Bug fix addressing package metadata configuration.
This pull request fixes the error with unrecognized or malformed license file fields during wheel uploads by updating the setup configuration.
- In `setup.py`, added `license="MIT"` and `license_files=["LICENSE"]` to provide proper license metadata.
Related work items: #4560034
* Cherry-pick Merged PR 1879296: Add support to python 3.12 and spark 4.0
* Cherry-pick Merged PR 1890869: Improve time_budget estimation for mlflow logging
* Cherry-pick Merged PR 1879296: Add support to python 3.12 and spark 4.0
* Disable openai workflow
* Add python 3.12 to test envs
* Manually trigger openai
* Support markdown files with underscore-prefixed file names
* Improve save dependencies
* SynapseML is not installed
* Fix syntax error: Module !flaml/autogen was never imported
* macos 3.12 also hangs
* Fix syntax error
* Update python version in actions
* Install setuptools for using pkg_resources
* Fix test_automl_performance in Github actions
* Fix test_nested_run
* Update gitignore
* Bump version to 2.4.0
* Update readme
* Pre-download california housing data
* Use pre-downloaded california housing data
* Pin lightning<=2.5.6
* Fix typo in find and replace
* Fix estimators has no attribute __sklearn_tags__
* Pin torch to 2.2.2 in tests
* Fix conflict
* Update pytorch-forecasting
* Update pytorch-forecasting
* Update pytorch-forecasting
* Use numpy<2 for testing
* Update scikit-learn
* Run Build and UT every other day
* Pin pip<24.1
* Pin pip<24.1 in pipeline
* Loosen pip, install pytorch_forecasting only in py311
* Add support for new versions of nlp dependencies
* Fix formats
* Remove redefinition
* Update mlflow versions
* Fix mlflow version syntax
* Update gitignore
* Clean up cache to free space
* Remove clean up action cache
* Fix blendsearch
* Update test workflow
* Update setup.py
* Fix catboost version
* Update workflow
* Prepare for python 3.14
* Support no catboost
* Fix tests
* Fix python_requires
* Update test workflow
* Fix vw tests
* Remove python 3.9
* Fix nlp tests
* Fix prophet
* Print pip freeze for better debugging
* Fix Optuna search does not support parameters of type Float with samplers of type Quantized
* Save dependencies for later inspection
* Fix coverage.xml not existing
* Fix github action permission
* Handle python 3.13
* Address openml is not installed
* Check dependencies before run tests
* Update dependencies
* Fix syntax error
* Use bash
* Update dependencies
* Fix git error
* Loosen mlflow constraints
* Add rerun, use mlflow-skinny
* Fix git error
* Remove ray tests
* Update xgboost versions
* Fix automl pickle error
* Don't test python 3.10 on macos as it's stuck
* Rebase before push
* Reduce number of branches
- Add function wait_futures for easier post analysis
- Use logger instead of print
----
#### AI description (iteration 1)
#### PR Classification
A code enhancement for debugging asynchronous mlflow logging and improving post-run analysis.
#### PR Summary
This PR adds detailed debug logging to the mlflow integration and introduces a new `wait_futures` function to streamline the collection of asynchronous task results for improved analysis.
- `flaml/fabric/mlflow.py`: Added debug log statements around starting and ending mlflow runs to trace run IDs and execution flow.
- `flaml/automl/automl.py`: Implemented the `wait_futures` function to handle asynchronous task results and replaced a print call with `logger.info` for consistent logging.
Related work items: #4029592
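One way a helper like `wait_futures` could collect asynchronous results is sketched below. The signature, return shape, and logger name are assumptions made for illustration; the actual function in `flaml/automl/automl.py` may differ.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, wait

logger = logging.getLogger("wait_futures_sketch")


def wait_futures(futures, timeout=None):
    """Block until the futures finish (or time out) and split the outcomes.

    Returns (results, errors, pending_count), keeping submission order.
    """
    done, not_done = wait(futures, timeout=timeout)
    results, errors = [], []
    for future in futures:
        if future in not_done or future.cancelled():
            continue
        exc = future.exception()
        if exc is not None:
            logger.info("Async task failed: %r", exc)
            errors.append(exc)
        else:
            results.append(future.result())
    return results, errors, len(not_done)


def _boom():
    # Deliberately failing task used only for the demonstration below.
    raise ValueError("simulated logging failure")


with ThreadPoolExecutor(max_workers=2) as pool:
    submitted = [pool.submit(pow, 2, 3), pool.submit(_boom)]
    results, errors, pending = wait_futures(submitted)
```

Returning failures alongside results, rather than raising on the first one, keeps every outcome available for post-run analysis.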
* Sync Fabric till 2cd1c3da
* Remove synapseml from tag names
* Fix 'NoneType' object has no attribute 'DataFrame'
* Deprecate python 3.8 support
* Fix 'NoneType' object has no attribute 'DataFrame'
* Still use python 3.8 for pydoc
* Don't run tests in parallel
* Remove autofe and lowcode