* Initial plan
* Fix training/test set overlap in holdout classification by only adding missing labels when needed
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Address code review feedback: add bounds checking and fix edge cases
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix bounds checking: use correct comparison operator for array indexing
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix potential ValueError with max() on empty lists and simplify test assertions
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Add extra bounds checking for label_matches indices
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix pandas_on_spark compatibility by using iloc_pandas_on_spark util method
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Run pre-commit to fix formatting issues
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Improve missing label handling to avoid overlap when possible
For classes with multiple instances that end up in one set, properly
re-split them between train and val instead of duplicating. Only add
to both sets when the class has exactly 1 instance (unavoidable overlap).
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix: ensure at least 1 instance remains in original set when re-splitting
Addresses comments on lines 580 and 724 - prevents moving ALL instances
from validation to training or vice versa by using min(num_instances - 1, ...).
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Refactor: Extract missing label handling into two methods (fast/slow)
- Created _handle_missing_labels_fast(): Fast version that adds first instance to both sets (may overlap)
- Created _handle_missing_labels_no_overlap(): Precise version that avoids overlap when possible
- Added allow_label_overlap parameter to AutoML.fit() (default=True for fast version)
- Updated documentation with new parameter
- Both versions maintain label completeness while offering speed/precision tradeoff
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Add usage example for allow_label_overlap to Best-Practices.md
- Added comprehensive documentation in Classification section
- Included code examples showing both fast and precise versions
- Explained tradeoffs between speed and precision
- Noted that parameter only affects holdout evaluation
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Address code review feedback: update documentation and tests
- Updated docstrings to clarify fast version only adds instances to missing sets
- Fixed documentation to reflect actual behavior (not "both sets" but "set with missing label")
- Completely rewrote test_no_overlap.py to test both allow_label_overlap modes
- Added tests with sample_weights for better code coverage
- Added test for single-instance class handling
- All 5 tests passing
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix AttributeError: initialize _allow_label_overlap in settings and retrain_from_log
- Added allow_label_overlap to settings initialization with default=True
- Added parameter defaulting in fit() method to use settings value if not provided
- Added _allow_label_overlap initialization in retrain_from_log method
- Fixes test failures in test_multiclass, test_regression, and spark tests
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Add docstring to fit()
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
* Initial plan
* Fix: Preserve FLAML_sample_size in best_config_per_estimator
Modified best_config_per_estimator property to keep FLAML_sample_size when returning best configurations. Previously, AutoMLState.sanitize() was removing this key, which caused the sample size information to be lost when using starting_points from a previous run.
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Add a test to verify the improvement of starting_points
* Update documentation to reflect FLAML_sample_size preservation
Updated Task-Oriented-AutoML.md to document that best_config_per_estimator now preserves FLAML_sample_size:
- Added note in "Warm start" section explaining that FLAML_sample_size is preserved for effective warm-starting
- Added note in "Get best configuration" section with example showing FLAML_sample_size in output
- Explains importance of sample size preservation for continuing optimization with correct sample sizes
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
* Fix unintended code change
* Improve docstrings and docs
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
* Simplify automl.fit calls in Best Practices
Removed 'retrain_full' and 'eval_method' parameters from automl.fit calls.
* Fix best practices not shown
* Add best practices
* Update docs to reflect on the recent changes
* Improve model persisting best practices
* Bump version to 2.4.1
* List all estimators
* Remove autogen
* Update dependencies
* Added documentation for automl.model.estimator usage
Updated documentation across various examples and the model.py file to include information about automl.model.estimator. This addition enhances the clarity and usability of FLAML by providing users with clear guidance on how to utilize this feature in their AutoML workflows. These changes aim to improve the overall user experience and facilitate easier understanding of FLAML's capabilities.
* fix: Ran pre-commit hook on docs
---------
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: Daniel Grindrod <dannycg1996@gmail.com>
Co-authored-by: Daniel Grindrod <Daniel.Grindrod@evotec.com>
* support xgboost 2.0
* try classes_
* test version
* quote
* use_label_encoder
* Fix xgboost test error
* remove deprecated files
* remove deprecated files
* remove deprecated import
* replace deprecated import in integrate_spark.ipynb
* replace deprecated import in automl_lightgbm.ipynb
* formatted integrate_spark.ipynb
* replace deprecated import
* try fix driver python path
* Update python-package.yml
* replace deprecated reference
* move spark python env var to other section
* Update setup.py, install xgb<2 for MacOS
* Fix typo
* assert
* Try assert xgboost version
* Fail fast
* Keep all test/spark to try fail fast
* No need to skip spark test in Mac or Win
* Remove assert xgb version
* Remove fail fast
* Found root cause, fix test_sparse_matrix_xgboost
* Revert "No need to skip spark test in Mac or Win"
This reverts commit a09034817f.
* remove assertion
---------
Co-authored-by: Li Jiang <bnujli@gmail.com>
Co-authored-by: levscaut <57213911+levscaut@users.noreply.github.com>
Co-authored-by: levscaut <lwd2010530@qq.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
* group chat for visualization
* show figure
* webpage update
* link update
* example 2
* example 2
---------
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
* Update readme and AutoGen docs
* Update Autogen#notebook-examples, Add link to AutoGen arxiv
* Update website/docs/Use-Cases/Autogen.md
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* Update link
---------
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
* max consecutive auto reply
* chess notebook
* link to notebook
* clear history
* filter
* **context -> context
* format str template
* groupchat
* register class specific reply
* groupchat notebook
* move human reply into generate_reply
* arg in config
* colab link
* remove room
* rename
* Update docstring for oai.completion.
Specify the details of how **config is used in Completion.
* Update docs about how to interact with local LLMs
* Update docs about how to interact with local LLMs
* Reformat file.
* Fix issues.
* Update website/blog/2023-07-14-Local-LLMs/index.mdx
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* Update website/blog/2023-07-14-Local-LLMs/index.mdx
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* Update website/docs/Use-Cases/Auto-Generation.md
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* Add documents about multiple workers.
* Update user instructions.
* Label big fix as optional
* Update website/blog/2023-07-14-Local-LLMs/index.mdx
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
---------
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* update colab link
* typo
* upload file instruction
* update system message and notebooks
* update notebooks
* notebook test
* aoai api version and exclusion
* gpt-3.5-turbo
* dict check
* change model for test
* endpoints, cache_path and func description update
* model list
* gitter -> discord
* add funccall example and doc
* revise to comments
* Update website/docs/Use-Cases/Auto-Generation.md
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* revise
* update
* minor update
* add test notebook
* update
---------
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* add doc for spark
* labelCol equals to label by default
* change title and reformat
* reference about default index type
* fix doc build
* Update website/docs/Examples/Integrate - Spark.md
* update doc
* Added more references
* remove exception case when `y_train.name` is None
* fix broken link
---------
Co-authored-by: Wendong Li <v-wendongli@microsoft.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
* update funccall
* code format
* update to comments
* update notebook
* remove test for py3.7
* allow funccall to class functions
* add test and clean up notebook
* revise notebook and test
* update
* update mathagent
* Update flaml/autogen/agent/agent.py
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* Update flaml/autogen/agent/user_proxy_agent.py
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* revise to comments
* revise function call design, notebook and test. add doc
* code format
* ad message_to_dict function
* update mathproxyagent
* revise docstr
* update
* Update flaml/autogen/agent/math_user_proxy_agent.py
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* Update flaml/autogen/agent/math_user_proxy_agent.py
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
* Update flaml/autogen/agent/user_proxy_agent.py
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
* simply funccall in userproxyagent, rewind auto-gen.md, revise to comments
* code format
* update
* remove notebook for another pr
* revise oai_conversation part in agent, revise function exec in user_proxy_agent
* update test_funccall
* update
* update
* fix pydantic version
* Update test/autogen/test_agent.py
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
* fix bug
* fix bug
* update
* update is_termination_msg to accept dict
---------
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
Co-authored-by: Li Jiang <bnujli@gmail.com>
* update openai model support
* new gpt3.5
* docstr
* function_call and content may co-exist
* test function call
---------
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>