- Add function wait_futures for easier post analysis
- Use logger instead of print
----
#### AI description (iteration 1)
#### PR Classification
A code enhancement for debugging asynchronous MLflow logging and improving post-run analysis.
#### PR Summary
This PR adds detailed debug logging to the MLflow integration and introduces a new `wait_futures` function to streamline the collection of asynchronous task results for improved analysis.
- `flaml/fabric/mlflow.py`: Added debug log statements around starting and ending MLflow runs to trace run IDs and execution flow.
- `flaml/automl/automl.py`: Implemented the `wait_futures` function to handle asynchronous task results and replaced a print call with `logger.info` for consistent logging.
Related work items: #4029592
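
For readers unfamiliar with the pattern, here is a minimal sketch of what a helper like `wait_futures` might look like. The actual signature and behavior in `flaml/automl/automl.py` are not shown in this message, so the name `wait_futures_sketch`, the dict-of-futures input, and the returned dict are all assumptions made for illustration.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, wait

logger = logging.getLogger(__name__)


def wait_futures_sketch(futures, timeout=None):
    """Hypothetical helper (not FLAML's implementation).

    Blocks until the given futures finish (or the timeout expires) and
    returns a dict mapping each label to its result, or to the exception
    the task raised.
    """
    done, not_done = wait(list(futures.values()), timeout=timeout)
    results = {}
    for label, future in futures.items():
        if future in not_done:
            # The task is still running; leave it out of the results.
            logger.warning("Task %s did not finish in time", label)
            continue
        error = future.exception()
        if error is not None:
            # Use the logger rather than print, as this PR does.
            logger.info("Task %s failed: %r", label, error)
            results[label] = error
        else:
            results[label] = future.result()
    return results


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        tasks = {"ok": pool.submit(pow, 2, 5), "bad": pool.submit(int, "x")}
        print(wait_futures_sketch(tasks))
```

Returning exceptions alongside results, instead of re-raising the first one, keeps every task's outcome available for post-run analysis.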
* Sync Fabric till 2cd1c3da
* Remove synapseml from tag names
* Fix 'NoneType' object has no attribute 'DataFrame'
* Deprecate Python 3.8 support
* Still use python 3.8 for pydoc
* Don't run tests in parallel
* Remove autofe and lowcode
* mrl-issue1422-0513
* fix version dependency
* fix datasets version
* test completion
---------
Co-authored-by: Runlin Mu (FESCO Adecco Human Resources) <v-runlinmu@microsoft.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
* Improved error handling when scikit-learn is not installed
Currently nothing describes when this error is raised, so stating it explicitly is worthwhile.
* Update histgb.py
---------
Co-authored-by: Li Jiang <bnujli@gmail.com>
* Add try except to resource.setrlimit
* Set time limit only in main thread
* Check only test model
* Pytest debug
* Test separately
* Move test_model.py to automl folder
* fix: Now resetting indexes for regression datasets when using group folds
* refactor: Simplified if statement to include all fold types
* docs: Updated docs to make it clear that group folds can be used for regression tasks
---------
Co-authored-by: Daniel Grindrod <daniel.grindrod@evotec.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
* fix: Fixed bug where every underlying LGBMRegressor or LGBMClassifier had n_estimators = 1
* test: Added test showing case where FLAMLised CatBoostModel result isn't reproducible
* fix: Fixing issue where callbacks cause LGBM results to not be reproducible
* Update test/automl/test_regression.py
Co-authored-by: Li Jiang <bnujli@gmail.com>
* fix: Adding back the LGBM EarlyStopping
* refactor: Fix tweaked to ensure other models aren't likely to be affected
* test: Fixed test to allow reproduced results to be better than the FLAML results when LGBM early stopping is involved
---------
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
* Added documentation for automl.model.estimator usage
Updated the documentation in several examples and in model.py to describe automl.model.estimator, giving users clear guidance on how to use it in their AutoML workflows.
* fix: Ran pre-commit hook on docs
---------
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Daniel Grindrod <dannycg1996@gmail.com>
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
* Merged PR 1444697: Fix json dumps TypeError
----
Bug fix to address a `TypeError` in `json.dumps`.
This pull request fixes a `TypeError` encountered when using `json.dumps` on `automl._automl_user_configurations` by introducing a safe JSON serialization function.
- Added `safe_json_dumps` function in `flaml/fabric/mlflow.py` to handle non-serializable objects.
- Updated `MLflowIntegration` class in `flaml/fabric/mlflow.py` to use `safe_json_dumps` for JSON serialization.
- Modified `test/automl/test_multiclass.py` to test the new `safe_json_dumps` function.
Related work items: #3439408
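
As an illustration of the idea behind `safe_json_dumps`, the sketch below falls back to `str()` for values that `json.dumps` cannot serialize. The real implementation in `flaml/fabric/mlflow.py` is not shown here and may differ; the name `safe_json_dumps_sketch` and the `str()` fallback are assumptions.

```python
import json


def safe_json_dumps_sketch(obj, **kwargs):
    """Hypothetical stand-in for safe_json_dumps (not FLAML's code).

    Serializes obj to JSON and converts any value json cannot handle
    natively into its str() representation instead of raising TypeError.
    """
    return json.dumps(obj, default=str, **kwargs)


if __name__ == "__main__":
    config = {"metric": "accuracy", "value": complex(1, 2)}
    try:
        json.dumps(config)
    except TypeError as error:
        print("plain json.dumps fails:", error)
    print(safe_json_dumps_sketch(config))
```

The trade-off is that non-serializable values are stored as strings, so the output is meant for logging and inspection rather than for round-tripping the original objects.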
* Fix data transform issue and Spark log_loss metric computation error
* fix: CatBoostRegressors metrics are now reproducible
* test: Made tests live, which ensure the reproducibility of catboost models
* fix: Added defunct line of code as a comment
* fix: Re-added the removed if statement, plus a test showing one issue that if statement can cause
* fix: Stopped ending CatBoost training early when time budget is running out
---------
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
* Remove temporary pickle files
* Update version to 2.3.1
* Use TemporaryDirectory for pickle and log_artifact
* Fix 'CatBoostClassifier' object has no attribute '_get_param_names'
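
The "Use TemporaryDirectory for pickle and log_artifact" change above can be pictured with the following sketch. It is not the code from this commit: the function name `log_pickled_artifact_sketch` is hypothetical, and `log_artifact` is any callable that accepts a local file path (for example `mlflow.log_artifact`).

```python
import os
import pickle
import tempfile


def log_pickled_artifact_sketch(obj, log_artifact, filename="model.pkl"):
    """Hypothetical helper (not FLAML's implementation).

    Pickles obj into a temporary directory, hands the file path to
    log_artifact, and lets the directory be removed on exit so that no
    temporary pickle files are left behind.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, filename)
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        # The file only exists inside this block, so upload it here.
        log_artifact(path)
    # Returned only so callers can see the path was cleaned up.
    return path
```

Because the context manager deletes the directory even when `log_artifact` raises, this also covers the cleanup that "Remove temporary pickle files" is concerned with.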