Compare commits

12 Commits

Author SHA1 Message Date
Gleb Levitski
3de0dc667e Add ruff sort to pre-commit and sort imports in the library (#1259)
* lint

* bump ver

* bump ver

* fixed circular import

---------

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-03-12 21:28:57 +00:00
dependabot[bot]
6840dc2b09 Bump follow-redirects from 1.15.2 to 1.15.4 in /website (#1266)
Bumps [follow-redirects](https://github.com/follow-redirects/follow-redirects) from 1.15.2 to 1.15.4.
- [Release notes](https://github.com/follow-redirects/follow-redirects/releases)
- [Commits](https://github.com/follow-redirects/follow-redirects/compare/v1.15.2...v1.15.4)

---
updated-dependencies:
- dependency-name: follow-redirects
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
2024-03-12 16:50:01 +00:00
Chi Wang
1a9fa3ac23 Np.inf (#1289)
* np.Inf -> np.inf

* bump version to 2.1.2
2024-03-12 16:27:05 +00:00
Jack Gerrits
325baa40a5 Don't specify a pre-release in the numpy dependency (#1286) 2024-03-12 14:43:49 +00:00
Dhruv Thakur
550d1cfe9b Update AutoML-NLP.md (#1239)
* Update AutoML-NLP.md

#834

* more space

---------

Co-authored-by: Jirka Borovec <6035284+Borda@users.noreply.github.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2024-02-10 07:32:57 +00:00
Jirka Borovec
249f0f1708 docs: fix link to reference (#1263)
* docs: fix link to reference

* Apply suggestions from code review

Co-authored-by: Li Jiang <bnujli@gmail.com>

---------

Co-authored-by: Li Jiang <bnujli@gmail.com>
2024-02-09 16:48:51 +00:00
Li Jiang
b645da3ea7 Fix spark errors (#1274)
* Fix mlflow not found error

* Fix joblib>1.2.0 force cancel error

* Remove joblib version constraint

* Update log

* Improve joblib exception catch

* Added permissions
2024-02-09 01:08:24 +00:00
ScottzCodez
0415638dd1 Update Installation.md (#1258)
Typo Fixed.
2023-11-29 01:39:20 +00:00
Gleb Levitski
6b93c2e394 [ENH] Add support for sklearn HistGradientBoostingEstimator (#1230)
* Update model.py

HistGradientBoosting support

* Create __init__.py

* Update model.py

* Create histgb.py

* Update __init__.py

* Update test_model.py

* added histgb to estimator list

* Update Task-Oriented-AutoML.md

added docs

* lint

* fixed bugs

---------

Co-authored-by: Gleb <gleb@Glebs-MacBook-Pro.local>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-10-31 14:45:23 +00:00
dependabot[bot]
a93bf39720 Bump @babel/traverse from 7.20.1 to 7.23.2 in /website (#1248)
Bumps [@babel/traverse](https://github.com/babel/babel/tree/HEAD/packages/babel-traverse) from 7.20.1 to 7.23.2.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.23.2/packages/babel-traverse)

---
updated-dependencies:
- dependency-name: "@babel/traverse"
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-21 14:48:46 +00:00
dependabot[bot]
dc8060a21b Bump postcss from 8.4.18 to 8.4.31 in /website (#1238)
Bumps [postcss](https://github.com/postcss/postcss) from 8.4.18 to 8.4.31.
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/postcss/postcss/compare/8.4.18...8.4.31)

---
updated-dependencies:
- dependency-name: postcss
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
2023-10-12 07:56:29 +00:00
Aindree Chatterjee
30db685cee Update README.md with autogen links (#1235)
* Update README.md

Added links to the Discord server, website, and GitHub repo for AutoGen to the first news item in README.md.
Related to issue #1231.

* Update README.md
2023-10-09 15:32:39 +00:00
164 changed files with 1100 additions and 686 deletions

View File

@@ -17,6 +17,9 @@ on:
merge_group:
types: [checks_requested]
permissions:
contents: write
jobs:
checks:
if: github.event_name != 'push'

View File

@@ -13,6 +13,8 @@ on:
- 'notebook/autogen_chatgpt_gpt4.ipynb'
- '.github/workflows/openai.yml'
permissions: {}
jobs:
test:
strategy:

View File

@@ -10,6 +10,7 @@ defaults:
run:
shell: bash
permissions: {}
jobs:
pre-commit-check:

View File

@@ -17,6 +17,7 @@ on:
merge_group:
types: [checks_requested]
permissions: {}
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}-${{ github.head_ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}

View File

@@ -14,7 +14,7 @@
<br>
</p>
-:fire: Heads-up: We're preparing to migrate [AutoGen](https://microsoft.github.io/autogen/) into a dedicated github repository. Alongside this move, we'll also launch a dedicated Discord server and a website for comprehensive documentation.
+:fire: Heads-up: We have migrated [AutoGen](https://microsoft.github.io/autogen/) into a dedicated [github repository](https://github.com/microsoft/autogen). Alongside this move, we have also launched a dedicated [Discord](https://discord.gg/pAbnFJrkgZ) server and a [website](https://microsoft.github.io/autogen/) for comprehensive documentation.
:fire: The automated multi-agent chat framework in [AutoGen](https://microsoft.github.io/autogen/) is in preview from v2.0.0.

View File

@@ -1,9 +1,9 @@
import logging
from flaml.automl import AutoML, logger_formatter
from flaml.tune.searcher import CFO, BlendSearch, FLOW2, BlendSearchTuner, RandomSearch
from flaml.onlineml.autovw import AutoVW
from flaml.version import __version__
from flaml.automl import AutoML, logger_formatter
from flaml.onlineml.autovw import AutoVW
from flaml.tune.searcher import CFO, FLOW2, BlendSearch, BlendSearchTuner, RandomSearch
from flaml.version import __version__
# Set the root logger.
logger = logging.getLogger(__name__)

View File

@@ -1,3 +1,3 @@
from .oai import *
from .agentchat import *
from .code_utils import DEFAULT_MODEL, FAST_MODEL
from .oai import *

View File

@@ -1,8 +1,8 @@
from .agent import Agent
from .conversable_agent import ConversableAgent
from .assistant_agent import AssistantAgent
from .user_proxy_agent import UserProxyAgent
from .conversable_agent import ConversableAgent
from .groupchat import GroupChat, GroupChatManager
from .user_proxy_agent import UserProxyAgent
__all__ = [
"Agent",

View File

@@ -1,6 +1,7 @@
from .conversable_agent import ConversableAgent
from typing import Callable, Dict, Optional, Union
from .conversable_agent import ConversableAgent
class AssistantAgent(ConversableAgent):
"""(In preview) Assistant agent, designed to solve a task with LLM.

View File

@@ -1,14 +1,14 @@
import re
import os
from pydantic import BaseModel, Extra, root_validator
from typing import Any, Callable, Dict, List, Optional, Union
import re
from time import sleep
from typing import Any, Callable, Dict, List, Optional, Union
from pydantic import BaseModel, Extra, root_validator
from flaml.autogen.agentchat import Agent, UserProxyAgent
from flaml.autogen.code_utils import UNKNOWN, extract_code, execute_code, infer_lang
from flaml.autogen.code_utils import UNKNOWN, execute_code, extract_code, infer_lang
from flaml.autogen.math_utils import get_answer
PROMPTS = {
# default
"default": """Let's use Python to solve a math problem.

View File

@@ -1,6 +1,7 @@
from typing import Any, Callable, Dict, List, Optional, Tuple, Union
from flaml.autogen.agentchat.agent import Agent
from flaml.autogen.agentchat.assistant_agent import AssistantAgent
from typing import Callable, Dict, Optional, Union, List, Tuple, Any
class RetrieveAssistantAgent(AssistantAgent):

View File

@@ -1,12 +1,13 @@
import chromadb
from flaml.autogen.agentchat.agent import Agent
from flaml.autogen.agentchat import UserProxyAgent
from flaml.autogen.retrieve_utils import create_vector_db_from_dir, query_vector_db, num_tokens_from_text
from flaml.autogen.code_utils import extract_code
from typing import Any, Callable, Dict, List, Optional, Tuple, Union
from typing import Callable, Dict, Optional, Union, List, Tuple, Any
import chromadb
from IPython import get_ipython
from flaml.autogen.agentchat import UserProxyAgent
from flaml.autogen.agentchat.agent import Agent
from flaml.autogen.code_utils import extract_code
from flaml.autogen.retrieve_utils import create_vector_db_from_dir, num_tokens_from_text, query_vector_db
try:
from termcolor import colored
except ImportError:

View File

@@ -1,10 +1,10 @@
import asyncio
from collections import defaultdict
import copy
import json
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple, Type, Union
from flaml.autogen import oai
from .agent import Agent
from flaml.autogen.code_utils import (
DEFAULT_MODEL,
UNKNOWN,
@@ -13,6 +13,8 @@ from flaml.autogen.code_utils import (
infer_lang,
)
from .agent import Agent
try:
from termcolor import colored
except ImportError:

View File

@@ -1,6 +1,7 @@
from dataclasses import dataclass
import sys
from dataclasses import dataclass
from typing import Dict, List, Optional, Union
from .agent import Agent
from .conversable_agent import ConversableAgent

View File

@@ -1,6 +1,7 @@
from .conversable_agent import ConversableAgent
from typing import Callable, Dict, Optional, Union
from .conversable_agent import ConversableAgent
class UserProxyAgent(ConversableAgent):
"""(In preview) A proxy agent for the user, that can execute code and provide feedback to the other agents.

View File

@@ -1,13 +1,14 @@
import logging
import os
import pathlib
import re
import signal
import subprocess
import sys
import os
import pathlib
from typing import List, Dict, Tuple, Optional, Union, Callable
import re
import time
from hashlib import md5
import logging
from typing import Callable, Dict, List, Optional, Tuple, Union
from flaml.autogen import oai
try:

View File

@@ -1,5 +1,6 @@
from typing import Optional
from flaml.autogen import oai, DEFAULT_MODEL
from flaml.autogen import DEFAULT_MODEL, oai
_MATH_PROMPT = "{problem} Solve the problem carefully. Simplify your answer as much as possible. Put the final answer in \\boxed{{}}."
_MATH_CONFIG = {

View File

@@ -1,10 +1,10 @@
from flaml.autogen.oai.completion import Completion, ChatCompletion
from flaml.autogen.oai.completion import ChatCompletion, Completion
from flaml.autogen.oai.openai_utils import (
get_config_list,
config_list_from_json,
config_list_from_models,
config_list_gpt4_gpt35,
config_list_openai_aoai,
config_list_from_models,
config_list_from_json,
get_config_list,
)
__all__ = [

View File

@@ -1,28 +1,31 @@
from time import sleep
import logging
import time
from typing import List, Optional, Dict, Callable, Union
import sys
import shutil
import sys
import time
from time import sleep
from typing import Callable, Dict, List, Optional, Union
import numpy as np
from flaml import tune, BlendSearch
from flaml.tune.space import is_constant
from flaml import BlendSearch, tune
from flaml.automl.logger import logger_formatter
from flaml.tune.space import is_constant
from .openai_utils import get_key
try:
import openai
from openai.error import (
ServiceUnavailableError,
RateLimitError,
APIError,
InvalidRequestError,
APIConnectionError,
Timeout,
AuthenticationError,
)
from openai import Completion as openai_Completion
import diskcache
import openai
from openai import Completion as openai_Completion
from openai.error import (
APIConnectionError,
APIError,
AuthenticationError,
InvalidRequestError,
RateLimitError,
ServiceUnavailableError,
Timeout,
)
ERROR = None
except ImportError:

View File

@@ -1,7 +1,7 @@
import os
import json
from typing import List, Optional, Dict, Set, Union
import logging
import os
from typing import Dict, List, Optional, Set, Union
NON_CACHE_KEY = ["api_key", "api_base", "api_type", "api_version"]

View File

@@ -1,13 +1,14 @@
from typing import List, Union, Dict, Tuple
import os
import requests
from urllib.parse import urlparse
import glob
import tiktoken
import chromadb
from chromadb.api import API
import chromadb.utils.embedding_functions as ef
import logging
import os
from typing import Dict, List, Tuple, Union
from urllib.parse import urlparse
import chromadb
import chromadb.utils.embedding_functions as ef
import requests
import tiktoken
from chromadb.api import API
logger = logging.getLogger(__name__)
TEXT_FORMATS = ["txt", "json", "csv", "tsv", "md", "html", "htm", "rtf", "rst", "jsonl", "log", "xml", "yaml", "yml"]

View File

@@ -1,5 +1,5 @@
from flaml.automl.automl import AutoML, size
from flaml.automl.logger import logger_formatter
from flaml.automl.state import SearchState, AutoMLState
from flaml.automl.state import AutoMLState, SearchState
__all__ = ["AutoML", "AutoMLState", "SearchState", "logger_formatter", "size"]

View File

@@ -3,40 +3,41 @@
# * Licensed under the MIT License. See LICENSE file in the
# * project root for license information.
from __future__ import annotations
import time
import json
import logging
import os
import sys
from typing import Callable, List, Union, Optional
import time
from functools import partial
from typing import Callable, List, Optional, Union
import numpy as np
import logging
import json
from flaml.automl.state import SearchState, AutoMLState
from flaml import tune
from flaml.automl.logger import logger, logger_formatter
from flaml.automl.ml import train_estimator
from flaml.automl.time_series import TimeSeriesDataset
from flaml.config import (
MIN_SAMPLE_TRAIN,
MEM_THRES,
RANDOM_SEED,
SMALL_LARGE_THRES,
CV_HOLDOUT_THRESHOLD,
SPLIT_RATIO,
N_SPLITS,
SAMPLE_MULTIPLY_FACTOR,
)
from flaml.automl.spark import DataFrame, Series, psDataFrame, psSeries
from flaml.automl.state import AutoMLState, SearchState
from flaml.automl.task.factory import task_factory
# TODO check to see when we can remove these
from flaml.automl.task.task import CLASSIFICATION, Task
from flaml.automl.task.factory import task_factory
from flaml import tune
from flaml.automl.logger import logger, logger_formatter
from flaml.automl.time_series import TimeSeriesDataset
from flaml.automl.training_log import training_log_reader, training_log_writer
from flaml.config import (
CV_HOLDOUT_THRESHOLD,
MEM_THRES,
MIN_SAMPLE_TRAIN,
N_SPLITS,
RANDOM_SEED,
SAMPLE_MULTIPLY_FACTOR,
SMALL_LARGE_THRES,
SPLIT_RATIO,
)
from flaml.default import suggest_learner
from flaml.version import __version__ as flaml_version
from flaml.automl.spark import psDataFrame, psSeries, DataFrame, Series
from flaml.tune.spark.utils import check_spark, get_broadcast_data
from flaml.version import __version__ as flaml_version
ERROR = (
DataFrame is None and ImportError("please install flaml[automl] option to use the flaml.automl package.") or None
@@ -2647,7 +2648,7 @@ class AutoML(BaseEstimator):
if self._estimator_index == len(estimator_list):
self._estimator_index = 0
return estimator_list[self._estimator_index]
-min_estimated_cost, selected = np.Inf, None
+min_estimated_cost, selected = np.inf, None
inv = []
untried_exists = False
for i, estimator in enumerate(estimator_list):
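
The `np.Inf` -> `np.inf` change in the hunk above matters because NumPy 2.0 removed the capitalized alias, while the lowercase `np.inf` works on both old and new releases. Below is a minimal sketch of the same "start at infinity, keep the smallest" idiom. `pick_cheapest` and the cost dictionary are hypothetical and not FLAML code; the `math.inf` fallback is an assumption added only so the sketch runs without NumPy installed.

```python
try:
    import numpy as np

    INF = np.inf  # lowercase spelling; the `np.Inf` alias was removed in NumPy 2.0
except ImportError:  # assumption: fall back so the sketch runs without NumPy
    import math

    INF = math.inf


def pick_cheapest(estimated_costs):
    """Return the name with the lowest estimated cost, or None if empty.

    Hypothetical helper illustrating the `min_cost, selected = inf, None` idiom.
    """
    min_estimated_cost, selected = INF, None
    for name, cost in estimated_costs.items():
        if cost < min_estimated_cost:
            min_estimated_cost, selected = cost, name
    return selected
```

Since `np.inf` is an ordinary Python float, any finite cost compares lower than the initial value, so the first candidate always replaces the `None` placeholder.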

View File

@@ -0,0 +1 @@
from .histgb import HistGradientBoostingEstimator

View File

@@ -0,0 +1,75 @@
try:
    from sklearn.ensemble import HistGradientBoostingClassifier, HistGradientBoostingRegressor
except ImportError:
    pass

from flaml import tune
from flaml.automl.model import SKLearnEstimator
from flaml.automl.task import Task


class HistGradientBoostingEstimator(SKLearnEstimator):
    """The class for tuning Histogram Gradient Boosting."""

    ITER_HP = "max_iter"
    HAS_CALLBACK = False
    DEFAULT_ITER = 100

    @classmethod
    def search_space(cls, data_size: int, task, **params) -> dict:
        upper = max(5, min(32768, int(data_size[0])))  # upper must be larger than lower
        return {
            "n_estimators": {
                "domain": tune.lograndint(lower=4, upper=upper),
                "init_value": 4,
                "low_cost_init_value": 4,
            },
            "max_leaves": {
                "domain": tune.lograndint(lower=4, upper=upper),
                "init_value": 4,
                "low_cost_init_value": 4,
            },
            "min_samples_leaf": {
                "domain": tune.lograndint(lower=2, upper=2**7 + 1),
                "init_value": 20,
            },
            "learning_rate": {
                "domain": tune.loguniform(lower=1 / 1024, upper=1.0),
                "init_value": 0.1,
            },
            "log_max_bin": {  # log transformed with base 2, <= 256
                "domain": tune.lograndint(lower=3, upper=9),
                "init_value": 8,
            },
            "l2_regularization": {
                "domain": tune.loguniform(lower=1 / 1024, upper=1024),
                "init_value": 1.0,
            },
        }

    def config2params(self, config: dict) -> dict:
        params = super().config2params(config)
        if "log_max_bin" in params:
            params["max_bins"] = (1 << params.pop("log_max_bin")) - 1
        if "max_leaves" in params:
            params["max_leaf_nodes"] = params.get("max_leaf_nodes", params.pop("max_leaves"))
        if "n_estimators" in params:
            params["max_iter"] = params.get("max_iter", params.pop("n_estimators"))
        if "random_state" not in params:
            params["random_state"] = 24092023
        if "n_jobs" in params:
            params.pop("n_jobs")
        return params

    def __init__(
        self,
        task: Task,
        **config,
    ):
        super().__init__(task, **config)
        self.params["verbose"] = 0

        if self._task.is_classification():
            self.estimator_class = HistGradientBoostingClassifier
        else:
            self.estimator_class = HistGradientBoostingRegressor
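
The `config2params` method in this file renames the tuned hyperparameters to the argument names used by scikit-learn's histogram gradient boosting estimators. Below is a standalone sketch of just that renaming step, with no FLAML or scikit-learn dependency. `to_sklearn_params` is a hypothetical name, and the call to `super().config2params` from the original is deliberately omitted.

```python
def to_sklearn_params(config: dict) -> dict:
    """Rename tuned hyperparameters to HistGradientBoosting argument names."""
    params = dict(config)  # copy so the caller's config is not mutated
    if "log_max_bin" in params:
        # log2-encoded bin count: 8 -> (1 << 8) - 1 = 255 bins
        params["max_bins"] = (1 << params.pop("log_max_bin")) - 1
    if "max_leaves" in params:
        leaves = params.pop("max_leaves")
        params.setdefault("max_leaf_nodes", leaves)  # keep an explicit value if present
    if "n_estimators" in params:
        iters = params.pop("n_estimators")
        params.setdefault("max_iter", iters)  # keep an explicit value if present
    params.setdefault("random_state", 24092023)  # same default seed as the diff
    params.pop("n_jobs", None)  # removed from the params, as in the diff
    return params


example = to_sklearn_params(
    {"log_max_bin": 8, "max_leaves": 31, "n_estimators": 100, "n_jobs": -1}
)
```

With the default `log_max_bin` of 8 from the search space, the resulting `max_bins` is 255.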

View File

@@ -2,15 +2,17 @@
# * Copyright (c) Microsoft Corporation. All rights reserved.
# * Licensed under the MIT License. See LICENSE file in the
# * project root for license information.
import numpy as np
import os
from datetime import datetime
from typing import TYPE_CHECKING, Union
import os
import numpy as np
from flaml.automl.spark import DataFrame, Series, pd, ps, psDataFrame, psSeries
from flaml.automl.training_log import training_log_reader
from flaml.automl.spark import ps, psDataFrame, psSeries, DataFrame, Series, pd
try:
from scipy.sparse import vstack, issparse
from scipy.sparse import issparse, vstack
except ImportError:
pass
@@ -41,8 +43,9 @@ def load_openml_dataset(dataset_id, data_dir=None, random_state=0, dataset_forma
y_train: A series or array of labels for training data.
y_test: A series or array of labels for test data.
"""
import openml
import pickle
import openml
from sklearn.model_selection import train_test_split
filename = "openml_ds" + str(dataset_id) + ".pkl"
@@ -93,9 +96,10 @@ def load_openml_task(task_id, data_dir):
y_train: A series of labels for training data.
y_test: A series of labels for test data.
"""
import openml
import pickle
import openml
task = openml.tasks.get_task(task_id)
filename = "openml_task" + str(task_id) + ".pkl"
filepath = os.path.join(data_dir, filename)
@@ -341,8 +345,8 @@ class DataTransformer:
drop = True
else:
drop = False
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
self.transformer = ColumnTransformer(
[

View File

@@ -2,30 +2,30 @@
# * Copyright (c) FLAML authors. All rights reserved.
# * Licensed under the MIT License. See LICENSE file in the
# * project root for license information.
import time
from typing import Union, Callable, TypeVar, Optional, Tuple
import logging
import time
from typing import Callable, Optional, Tuple, TypeVar, Union
import numpy as np
from flaml.automl.data import group_counts
from flaml.automl.task.task import Task
from flaml.automl.model import BaseEstimator, TransformersEstimator
from flaml.automl.spark import psDataFrame, psSeries, ERROR as SPARK_ERROR, Series, DataFrame
from flaml.automl.spark import ERROR as SPARK_ERROR
from flaml.automl.spark import DataFrame, Series, psDataFrame, psSeries
from flaml.automl.task.task import Task
try:
from sklearn.metrics import (
mean_squared_error,
r2_score,
roc_auc_score,
accuracy_score,
mean_absolute_error,
log_loss,
average_precision_score,
f1_score,
log_loss,
mean_absolute_error,
mean_absolute_percentage_error,
mean_squared_error,
ndcg_score,
r2_score,
roc_auc_score,
)
except ImportError:
pass
@@ -323,7 +323,7 @@ def compute_estimator(
estimator_name: str,
eval_method: str,
eval_metric: Union[str, Callable],
-best_val_loss=np.Inf,
+best_val_loss=np.inf,
n_jobs: Optional[int] = 1, # some estimators of EstimatorSubclass don't accept n_jobs. Should be None in that case.
estimator_class: Optional[EstimatorSubclass] = None,
cv_score_agg_func: Optional[callable] = None,

View File

@@ -2,36 +2,42 @@
# * Copyright (c) FLAML authors. All rights reserved.
# * Licensed under the MIT License. See LICENSE file in the
# * project root for license information.
import logging
import math
import os
import shutil
import signal
import sys
import time
from contextlib import contextmanager
from functools import partial
import signal
import os
from typing import Callable, List, Union
import numpy as np
import time
import logging
import shutil
import sys
import math
from flaml import tune
from flaml.automl.data import (
group_counts,
)
from flaml.automl.task.factory import task_factory
from flaml.automl.task.task import (
Task,
NLG_TASKS,
SEQCLASSIFICATION,
SEQREGRESSION,
TOKENCLASSIFICATION,
SUMMARIZATION,
NLG_TASKS,
TOKENCLASSIFICATION,
Task,
)
from flaml.automl.task.factory import task_factory
try:
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.ensemble import ExtraTreesRegressor, ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.ensemble import (
ExtraTreesClassifier,
ExtraTreesRegressor,
RandomForestClassifier,
RandomForestRegressor,
)
from sklearn.linear_model import LogisticRegression
from xgboost import __version__ as xgboost_version
except ImportError:
pass
@@ -41,13 +47,14 @@ try:
except ImportError:
pass
from flaml.automl.spark import psDataFrame, sparkDataFrame, psSeries, ERROR as SPARK_ERROR, DataFrame, Series
from flaml.automl.spark.utils import len_labels, to_pandas_on_spark
from flaml.automl.spark import ERROR as SPARK_ERROR
from flaml.automl.spark import DataFrame, Series, psDataFrame, psSeries, sparkDataFrame
from flaml.automl.spark.configs import (
ParamList_LightGBM_Classifier,
ParamList_LightGBM_Regressor,
ParamList_LightGBM_Ranker,
ParamList_LightGBM_Regressor,
)
from flaml.automl.spark.utils import len_labels, to_pandas_on_spark
if DataFrame is not None:
from pandas import to_datetime
@@ -62,7 +69,7 @@ except ImportError:
resource = None
try:
from lightgbm import LGBMClassifier, LGBMRegressor, LGBMRanker
from lightgbm import LGBMClassifier, LGBMRanker, LGBMRegressor
except ImportError:
LGBMClassifier = LGBMRegressor = LGBMRanker = None
@@ -320,8 +327,7 @@ class BaseEstimator:
Returns:
The evaluation score on the validation dataset.
"""
from .ml import metric_loss_score
from .ml import is_min_metric
from .ml import is_min_metric, metric_loss_score
if self._model is not None:
if self._task == "rank":
@@ -759,7 +765,7 @@ class TransformersEstimator(BaseEstimator):
return not self._kwargs.get("gpu_per_trial")
def _set_training_args(self, **kwargs):
from .nlp.utils import date_str, Counter
from .nlp.utils import Counter, date_str
for key, val in kwargs.items():
assert key not in self.params, (
@@ -873,10 +879,10 @@ class TransformersEstimator(BaseEstimator):
@property
def data_collator(self):
from flaml.automl.task.task import Task
from flaml.automl.nlp.huggingface.data_collator import (
task_to_datacollator_class,
)
from flaml.automl.task.task import Task
data_collator_class = task_to_datacollator_class.get(
self._task.name if isinstance(self._task, Task) else self._task
@@ -917,6 +923,7 @@ class TransformersEstimator(BaseEstimator):
from transformers import TrainerCallback
from transformers.trainer_utils import set_seed
from .nlp.huggingface.trainer import TrainerForAuto
try:
@@ -1146,6 +1153,7 @@ class TransformersEstimator(BaseEstimator):
def predict(self, X, **pred_kwargs):
import transformers
from datasets import Dataset
from .nlp.huggingface.utils import postprocess_prediction_and_true
transformers.logging.set_verbosity_error()

View File

@@ -1,17 +1,18 @@
from dataclasses import dataclass
from transformers.data.data_collator import (
DataCollatorWithPadding,
DataCollatorForTokenClassification,
DataCollatorForSeq2Seq,
)
from collections import OrderedDict
from dataclasses import dataclass
from transformers.data.data_collator import (
DataCollatorForSeq2Seq,
DataCollatorForTokenClassification,
DataCollatorWithPadding,
)
from flaml.automl.task.task import (
TOKENCLASSIFICATION,
MULTICHOICECLASSIFICATION,
SUMMARIZATION,
SEQCLASSIFICATION,
SEQREGRESSION,
SUMMARIZATION,
TOKENCLASSIFICATION,
)
@@ -19,6 +20,7 @@ from flaml.automl.task.task import (
class DataCollatorForMultipleChoiceClassification(DataCollatorWithPadding):
def __call__(self, features):
from itertools import chain
import torch
label_name = "label" if "label" in features[0].keys() else "labels"

View File

@@ -1,6 +1,7 @@
import argparse
from dataclasses import dataclass, field
from typing import Optional, List
from typing import List, Optional
from flaml.automl.task.task import NLG_TASKS
try:

View File

@@ -1,14 +1,16 @@
from itertools import chain
import numpy as np
from flaml.automl.task.task import (
SUMMARIZATION,
SEQREGRESSION,
SEQCLASSIFICATION,
MULTICHOICECLASSIFICATION,
TOKENCLASSIFICATION,
NLG_TASKS,
)
from flaml.automl.data import pd
from flaml.automl.task.task import (
MULTICHOICECLASSIFICATION,
NLG_TASKS,
SEQCLASSIFICATION,
SEQREGRESSION,
SUMMARIZATION,
TOKENCLASSIFICATION,
)
def todf(X, Y, column_name):
@@ -377,6 +379,7 @@ def load_model(checkpoint_path, task, num_labels=None):
transformers.logging.set_verbosity_error()
from transformers import AutoConfig
from flaml.automl.task.task import (
SEQCLASSIFICATION,
SEQREGRESSION,
@@ -384,10 +387,12 @@ def load_model(checkpoint_path, task, num_labels=None):
)
def get_this_model(checkpoint_path, task, model_config):
from transformers import AutoModelForSequenceClassification
from transformers import AutoModelForSeq2SeqLM
from transformers import AutoModelForMultipleChoice
from transformers import AutoModelForTokenClassification
from transformers import (
AutoModelForMultipleChoice,
AutoModelForSeq2SeqLM,
AutoModelForSequenceClassification,
AutoModelForTokenClassification,
)
if task in (SEQCLASSIFICATION, SEQREGRESSION):
return AutoModelForSequenceClassification.from_pretrained(

View File

@@ -1,11 +1,12 @@
from typing import Dict, Any
from typing import Any, Dict
import numpy as np
from flaml.automl.task.task import (
SUMMARIZATION,
SEQREGRESSION,
SEQCLASSIFICATION,
MULTICHOICECLASSIFICATION,
SEQCLASSIFICATION,
SEQREGRESSION,
SUMMARIZATION,
TOKENCLASSIFICATION,
)

View File

@@ -6,8 +6,10 @@ try:
import pyspark.pandas as ps
import pyspark.sql.functions as F
import pyspark.sql.types as T
from pyspark.pandas import DataFrame as psDataFrame
from pyspark.pandas import Series as psSeries
from pyspark.pandas import set_option
from pyspark.sql import DataFrame as sparkDataFrame
from pyspark.pandas import DataFrame as psDataFrame, Series as psSeries, set_option
from pyspark.util import VersionUtils
except ImportError:

View File

@@ -1,14 +1,16 @@
import numpy as np
from typing import Union
from flaml.automl.spark import psSeries, F
import numpy as np
from pyspark.ml.evaluation import (
BinaryClassificationEvaluator,
RegressionEvaluator,
MulticlassClassificationEvaluator,
MultilabelClassificationEvaluator,
RankingEvaluator,
RegressionEvaluator,
)
from flaml.automl.spark import F, psSeries
def ps_group_counts(groups: Union[psSeries, np.ndarray]) -> np.ndarray:
if isinstance(groups, np.ndarray):

View File

@@ -1,17 +1,19 @@
import logging
from typing import Union, List, Optional, Tuple
from typing import List, Optional, Tuple, Union
import numpy as np
from flaml.automl.spark import (
sparkDataFrame,
ps,
DataFrame,
F,
Series,
T,
_spark_major_minor_version,
ps,
psDataFrame,
psSeries,
_spark_major_minor_version,
DataFrame,
Series,
set_option,
sparkDataFrame,
)
logger = logging.getLogger(__name__)

View File

@@ -1,13 +1,15 @@
import inspect
import copy
import inspect
import time
from typing import Any, Optional
import numpy as np
from flaml import tune
from flaml.automl.logger import logger
from flaml.automl.ml import compute_estimator, train_estimator
from flaml.automl.spark import DataFrame, Series, psDataFrame, psSeries
from flaml.automl.time_series.ts_data import TimeSeriesDataset
from flaml.automl.spark import psDataFrame, psSeries, DataFrame, Series
class SearchState:

View File

@@ -1,8 +1,9 @@
from typing import Optional, Union
import numpy as np
from flaml.automl.data import DataFrame, Series
from flaml.automl.task.task import Task, TS_FORECAST
from flaml.automl.task.task import TS_FORECAST, Task
def task_factory(

View File

@@ -1,43 +1,44 @@
import logging
import time
from typing import List, Optional
import numpy as np
from flaml.automl.data import TS_TIMESTAMP_COL, concat
from flaml.automl.ml import EstimatorSubclass, get_val_loss, default_cv_score_agg_func
from flaml.automl.task.task import (
Task,
get_classification_objective,
TS_FORECAST,
TS_FORECASTPANEL,
)
from flaml.config import RANDOM_SEED
from flaml.automl.spark import ps, psDataFrame, psSeries, pd
import numpy as np
from flaml.automl.data import TS_TIMESTAMP_COL, concat
from flaml.automl.ml import EstimatorSubclass, default_cv_score_agg_func, get_val_loss
from flaml.automl.spark import pd, ps, psDataFrame, psSeries
from flaml.automl.spark.utils import (
iloc_pandas_on_spark,
len_labels,
set_option,
spark_kFold,
train_test_split_pyspark,
unique_pandas_on_spark,
unique_value_first_index,
len_labels,
set_option,
)
from flaml.automl.task.task import (
TS_FORECAST,
TS_FORECASTPANEL,
Task,
get_classification_objective,
)
from flaml.config import RANDOM_SEED
try:
from scipy.sparse import issparse
except ImportError:
pass
try:
from sklearn.utils import shuffle
from sklearn.model_selection import (
train_test_split,
RepeatedStratifiedKFold,
RepeatedKFold,
GroupKFold,
TimeSeriesSplit,
GroupShuffleSplit,
RepeatedKFold,
RepeatedStratifiedKFold,
StratifiedGroupKFold,
TimeSeriesSplit,
train_test_split,
)
from sklearn.utils import shuffle
except ImportError:
pass
@@ -49,19 +50,20 @@ class GenericTask(Task):
def estimators(self):
if self._estimators is None:
# put this into a function to avoid circular dependency
from flaml.automl.contrib.histgb import HistGradientBoostingEstimator
from flaml.automl.model import (
XGBoostSklearnEstimator,
XGBoostLimitDepthEstimator,
RandomForestEstimator,
LGBMEstimator,
LRL1Classifier,
LRL2Classifier,
CatBoostEstimator,
ExtraTreesEstimator,
KNeighborsEstimator,
LGBMEstimator,
LRL1Classifier,
LRL2Classifier,
RandomForestEstimator,
SparkLGBMEstimator,
TransformersEstimator,
TransformersEstimatorModelSelection,
SparkLGBMEstimator,
XGBoostLimitDepthEstimator,
XGBoostSklearnEstimator,
)
self._estimators = {
@@ -77,6 +79,7 @@ class GenericTask(Task):
"kneighbor": KNeighborsEstimator,
"transformer": TransformersEstimator,
"transformer_ms": TransformersEstimatorModelSelection,
"histgb": HistGradientBoostingEstimator,
}
return self._estimators
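
The comment "put this into a function to avoid circular dependency" in this hunk describes a deferred import: the estimator classes are imported when the `estimators` property is first read, not when the module is loaded. Below is a minimal sketch of that pattern, using standard-library classes as stand-ins. `LazyRegistry` and its keys are hypothetical and not part of FLAML.

```python
class LazyRegistry:
    """Builds its name -> class mapping on first access and caches it."""

    def __init__(self):
        self._estimators = None

    @property
    def estimators(self):
        if self._estimators is None:
            # Importing here, not at module top level, lets this module be
            # imported before (or by) the modules that define the classes.
            from decimal import Decimal  # stand-in for an estimator class
            from fractions import Fraction  # stand-in for another estimator class

            self._estimators = {"decimal": Decimal, "fraction": Fraction}
        return self._estimators


registry = LazyRegistry()
first = registry.estimators
second = registry.estimators  # cached: the mapping is not rebuilt
```

The trade-off is that an import failure only surfaces on first access, so a missing optional dependency is reported later than it would be with a top-level import.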

View File

@@ -1,6 +1,8 @@
from abc import ABC, abstractmethod
from typing import TYPE_CHECKING, List, Optional, Tuple, Union
import numpy as np
from flaml.automl.data import DataFrame, Series, psDataFrame, psSeries
if TYPE_CHECKING:

View File

@@ -2,26 +2,25 @@ import logging
import time
from typing import List
import pandas as pd
import numpy as np
import pandas as pd
from scipy.sparse import issparse
from sklearn.model_selection import (
GroupKFold,
TimeSeriesSplit,
)
from flaml.automl.ml import get_val_loss, default_cv_score_agg_func
from flaml.automl.time_series.ts_data import (
TimeSeriesDataset,
DataTransformerTS,
normalize_ts_data,
)
from flaml.automl.ml import default_cv_score_agg_func, get_val_loss
from flaml.automl.task.task import (
Task,
get_classification_objective,
TS_FORECAST,
TS_FORECASTPANEL,
Task,
get_classification_objective,
)
from flaml.automl.time_series.ts_data import (
DataTransformerTS,
TimeSeriesDataset,
normalize_ts_data,
)
logger = logging.getLogger(__name__)
@@ -33,18 +32,18 @@ class TimeSeriesTask(Task):
if self._estimators is None:
# put this into a function to avoid circular dependency
from flaml.automl.time_series import (
ARIMA,
LGBM_TS,
RF_TS,
SARIMAX,
CatBoost_TS,
ExtraTrees_TS,
HoltWinters,
Orbit,
Prophet,
TemporalFusionTransformerEstimator,
XGBoost_TS,
XGBoostLimitDepth_TS,
RF_TS,
LGBM_TS,
ExtraTrees_TS,
CatBoost_TS,
Prophet,
Orbit,
ARIMA,
SARIMAX,
TemporalFusionTransformerEstimator,
HoltWinters,
)
self._estimators = {

View File

@@ -1,17 +1,16 @@
from .ts_model import (
Prophet,
Orbit,
ARIMA,
SARIMAX,
HoltWinters,
LGBM_TS,
XGBoost_TS,
RF_TS,
ExtraTrees_TS,
XGBoostLimitDepth_TS,
CatBoost_TS,
TimeSeriesEstimator,
)
from .tft import TemporalFusionTransformerEstimator
from .ts_data import TimeSeriesDataset
from .ts_model import (
ARIMA,
LGBM_TS,
RF_TS,
SARIMAX,
CatBoost_TS,
ExtraTrees_TS,
HoltWinters,
Orbit,
Prophet,
TimeSeriesEstimator,
XGBoost_TS,
XGBoostLimitDepth_TS,
)

View File

@@ -1,5 +1,5 @@
import math
import datetime
import math
from functools import lru_cache
import pandas as pd

View File

@@ -12,8 +12,8 @@ except ImportError:
DataFrame = Series = None
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
def make_lag_features(X: pd.DataFrame, y: pd.Series, lags: int):

View File

@@ -105,6 +105,7 @@ class TemporalFusionTransformerEstimator(TimeSeriesEstimator):
def fit(self, X_train, y_train, budget=None, **kwargs):
import warnings
import pytorch_lightning as pl
import torch
from pytorch_forecasting import TemporalFusionTransformer

View File

@@ -2,7 +2,7 @@ import copy
import datetime
import math
from dataclasses import dataclass, field
from typing import List, Optional, Callable, Dict, Generator, Union
from typing import Callable, Dict, Generator, List, Optional, Union
import numpy as np
@@ -10,9 +10,9 @@ try:
import pandas as pd
from pandas import DataFrame, Series, to_datetime
from scipy.sparse import issparse
from sklearn.preprocessing import LabelEncoder
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder
from .feature import monthly_fourier_features
except ImportError:

View File

@@ -1,8 +1,8 @@
import time
import logging
import os
from datetime import datetime
import math
import os
import time
from datetime import datetime
from typing import List, Optional, Union
try:
@@ -22,26 +22,26 @@ except ImportError:
import numpy as np
from flaml import tune
from flaml.automl.model import (
suppress_stdout_stderr,
SKLearnEstimator,
logger,
LGBMEstimator,
XGBoostSklearnEstimator,
RandomForestEstimator,
ExtraTreesEstimator,
XGBoostLimitDepthEstimator,
CatBoostEstimator,
)
from flaml.automl.data import TS_TIMESTAMP_COL, TS_VALUE_COL
from flaml.automl.time_series.ts_data import (
TimeSeriesDataset,
enrich_dataset,
enrich_dataframe,
normalize_ts_data,
create_forward_frame,
from flaml.automl.model import (
CatBoostEstimator,
ExtraTreesEstimator,
LGBMEstimator,
RandomForestEstimator,
SKLearnEstimator,
XGBoostLimitDepthEstimator,
XGBoostSklearnEstimator,
logger,
suppress_stdout_stderr,
)
from flaml.automl.task import Task
from flaml.automl.time_series.ts_data import (
TimeSeriesDataset,
create_forward_frame,
enrich_dataframe,
enrich_dataset,
normalize_ts_data,
)
class TimeSeriesEstimator(SKLearnEstimator):
@@ -143,6 +143,7 @@ class TimeSeriesEstimator(SKLearnEstimator):
def score(self, X_val: DataFrame, y_val: Series, **kwargs):
from sklearn.metrics import r2_score
from ..ml import metric_loss_score
y_pred = self.predict(X_val, **kwargs)

View File

@@ -4,9 +4,9 @@
"""
import json
from typing import IO
from contextlib import contextmanager
import logging
from contextlib import contextmanager
from typing import IO
logger = logging.getLogger("flaml.automl")

View File

@@ -1,18 +1,18 @@
from .suggest import (
suggest_config,
suggest_learner,
suggest_hyperparams,
preprocess_and_suggest_hyperparams,
meta_feature,
)
from .estimator import (
flamlize_estimator,
LGBMClassifier,
LGBMRegressor,
XGBClassifier,
XGBRegressor,
RandomForestClassifier,
RandomForestRegressor,
ExtraTreesClassifier,
ExtraTreesRegressor,
LGBMClassifier,
LGBMRegressor,
RandomForestClassifier,
RandomForestRegressor,
XGBClassifier,
XGBRegressor,
flamlize_estimator,
)
from .suggest import (
meta_feature,
preprocess_and_suggest_hyperparams,
suggest_config,
suggest_hyperparams,
suggest_learner,
)

View File

@@ -1,5 +1,7 @@
from functools import wraps
from flaml.automl.task.task import CLASSIFICATION
from .suggest import preprocess_and_suggest_hyperparams
DEFAULT_LOCATION = "default_location"

View File

@@ -1,7 +1,7 @@
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler
from sklearn.metrics import pairwise_distances
from sklearn.preprocessing import RobustScaler
def _augment(row):
@@ -12,7 +12,7 @@ def _augment(row):
def construct_portfolio(regret_matrix, meta_features, regret_bound):
"""The portfolio construction algorithm.
(Reference)[https://arxiv.org/abs/2202.09927].
Reference: [Mining Robust Default Configurations for Resource-constrained AutoML](https://arxiv.org/abs/2202.09927).
Args:
regret_matrix: A dataframe of regret matrix.

View File

@@ -1,11 +1,13 @@
import pandas as pd
import numpy as np
import argparse
from pathlib import Path
import json
from pathlib import Path
import numpy as np
import pandas as pd
from sklearn.preprocessing import RobustScaler
from flaml.default import greedy
from flaml.default.regret import load_result, build_regret
from flaml.default.regret import build_regret, load_result
from flaml.version import __version__
regret_bound = 0.01

View File

@@ -1,5 +1,6 @@
import argparse
from os import path
import pandas as pd

View File

@@ -1,11 +1,13 @@
import numpy as np
import json
import logging
import pathlib
import json
import numpy as np
from flaml.automl.data import DataTransformer
from flaml.automl.task.task import CLASSIFICATION, get_classification_objective
from flaml.automl.task.generic_task import len_labels
from flaml.automl.task.factory import task_factory
from flaml.automl.task.generic_task import len_labels
from flaml.automl.task.task import CLASSIFICATION, get_classification_objective
from flaml.version import __version__
try:

View File

@@ -2,7 +2,6 @@ import warnings
from flaml.automl.ml import *
warnings.warn(
"Importing from `flaml.ml` is deprecated. Please use `flaml.automl.ml`.",
DeprecationWarning,

View File

@@ -1,16 +1,17 @@
from typing import Optional, Union
import logging
from typing import Optional, Union
from flaml.onlineml import OnlineTrialRunner
from flaml.onlineml.trial import get_ns_feature_dim_from_vw_example
from flaml.tune import (
Trial,
Categorical,
Float,
PolynomialExpansionSet,
Trial,
polynomial_expansion_set,
)
from flaml.onlineml import OnlineTrialRunner
from flaml.tune.scheduler import ChaChaScheduler
from flaml.tune.searcher import ChampionFrontierSearcher
from flaml.onlineml.trial import get_ns_feature_dim_from_vw_example
logger = logging.getLogger(__name__)
@@ -140,7 +141,7 @@ class AutoVW:
max_live_model_num=self._max_live_model_num,
searcher=searcher,
scheduler=scheduler,
**self._automl_runner_args
**self._automl_runner_args,
)
def predict(self, data_sample):

View File

@@ -1,14 +1,16 @@
import numpy as np
import logging
import time
import math
import copy
import collections
import copy
import logging
import math
import time
from typing import Optional, Union
import numpy as np
from flaml.tune import Trial
try:
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.metrics import mean_absolute_error, mean_squared_error
except ImportError:
pass

View File

@@ -1,10 +1,11 @@
import numpy as np
import logging
import math
import numpy as np
from flaml.tune import Trial
from flaml.tune.scheduler import TrialScheduler
import logging
logger = logging.getLogger(__name__)

View File

@@ -3,16 +3,16 @@ try:
assert ray_version >= "1.10.0"
from ray.tune import (
uniform,
lograndint,
loguniform,
qlograndint,
qloguniform,
qrandint,
qrandn,
quniform,
randint,
qrandint,
randn,
qrandn,
loguniform,
qloguniform,
lograndint,
qlograndint,
uniform,
)
if ray_version.startswith("1."):
@@ -20,21 +20,20 @@ try:
else:
from ray.tune.search import sample
except (ImportError, AssertionError):
from . import sample
from .sample import (
uniform,
lograndint,
loguniform,
qlograndint,
qloguniform,
qrandint,
qrandn,
quniform,
randint,
qrandint,
randn,
qrandn,
loguniform,
qloguniform,
lograndint,
qlograndint,
uniform,
)
from . import sample
from .tune import run, report, INCUMBENT_RESULT
from .sample import polynomial_expansion_set
from .sample import PolynomialExpansionSet, Categorical, Float
from .sample import Categorical, Float, PolynomialExpansionSet, polynomial_expansion_set
from .trial import Trial
from .tune import INCUMBENT_RESULT, report, run
from .utils import choice

View File

@@ -15,10 +15,12 @@
# This source file is adapted here because ray does not fully support Windows.
# Copyright (c) Microsoft Corporation.
from typing import Dict, Optional
import numpy as np
from .trial import Trial
import logging
from typing import Dict, Optional
import numpy as np
from .trial import Trial
logger = logging.getLogger(__name__)

View File

@@ -19,6 +19,7 @@ import logging
from copy import copy
from math import isclose
from typing import Any, Dict, List, Optional, Sequence, Union
import numpy as np
# Backwards compatibility

View File

@@ -1,6 +1,6 @@
from .trial_scheduler import TrialScheduler
from .online_scheduler import (
ChaChaScheduler,
OnlineScheduler,
OnlineSuccessiveDoublingScheduler,
ChaChaScheduler,
)
from .trial_scheduler import TrialScheduler

View File

@@ -1,9 +1,12 @@
import numpy as np
import logging
from typing import Dict
from flaml.tune.scheduler import TrialScheduler
import numpy as np
from flaml.tune import Trial
from .trial_scheduler import TrialScheduler
logger = logging.getLogger(__name__)

View File

@@ -2,10 +2,11 @@
# * Copyright (c) Microsoft Corporation. All rights reserved.
# * Licensed under the MIT License. See LICENSE file in the
# * project root for license information.
from typing import Dict, Optional, List, Tuple, Callable, Union
import numpy as np
import time
import pickle
import time
from typing import Callable, Dict, List, Optional, Tuple, Union
import numpy as np
try:
from ray import __version__ as ray_version
@@ -18,17 +19,17 @@ try:
from ray.tune.search import Searcher
from ray.tune.search.optuna import OptunaSearch as GlobalSearch
except (ImportError, AssertionError):
from .suggestion import Searcher
from .suggestion import OptunaSearch as GlobalSearch
from ..trial import unflatten_dict, flatten_dict
from .. import INCUMBENT_RESULT
from .search_thread import SearchThread
from .flow2 import FLOW2
from ..space import add_cost_to_space, indexof, normalize, define_by_run_func
from ..result import TIME_TOTAL_S
from .suggestion import Searcher
import logging
from .. import INCUMBENT_RESULT
from ..result import TIME_TOTAL_S
from ..space import add_cost_to_space, define_by_run_func, indexof, normalize
from ..trial import flatten_dict, unflatten_dict
from .flow2 import FLOW2
from .search_thread import SearchThread
SEARCH_THREAD_EPS = 1.0
PENALTY = 1e10 # penalty term for constraints
logger = logging.getLogger(__name__)
@@ -931,27 +932,27 @@ try:
assert ray_version >= "1.10.0"
from ray.tune import (
uniform,
quniform,
choice,
randint,
qrandint,
randn,
qrandn,
loguniform,
qloguniform,
qrandint,
qrandn,
quniform,
randint,
randn,
uniform,
)
except (ImportError, AssertionError):
from ..sample import (
uniform,
quniform,
choice,
randint,
qrandint,
randn,
qrandn,
loguniform,
qloguniform,
qrandint,
qrandn,
quniform,
randint,
randn,
uniform,
)
try:
@@ -978,7 +979,7 @@ class BlendSearchTuner(BlendSearch, NNITuner):
result = {
"config": parameters,
self._metric: extract_scalar_reward(value),
self.cost_attr: 1 if isinstance(value, float) else value.get(self.cost_attr, value.get("sequence", 1))
self.cost_attr: 1 if isinstance(value, float) else value.get(self.cost_attr, value.get("sequence", 1)),
# if nni does not report training cost,
# using sequence as an approximation.
# if no sequence, using a constant 1

View File

@@ -2,8 +2,8 @@
# * Copyright (c) Microsoft Corporation. All rights reserved.
# * Licensed under the MIT License. See LICENSE file in the
# * project root for license information.
from .flow2 import FLOW2
from .blendsearch import CFO
from .flow2 import FLOW2
class FLOW2Cat(FLOW2):

View File

@@ -2,31 +2,34 @@
# * Copyright (c) Microsoft Corporation. All rights reserved.
# * Licensed under the MIT License. See LICENSE file in the
# * project root for license information.
from typing import Dict, Optional, Tuple
import numpy as np
import logging
from collections import defaultdict
from typing import Dict, Optional, Tuple
import numpy as np
try:
from ray import __version__ as ray_version
assert ray_version >= "1.0.0"
if ray_version.startswith("1."):
from ray.tune.suggest import Searcher
from ray.tune import sample
from ray.tune.suggest import Searcher
else:
from ray.tune.search import Searcher, sample
from ray.tune.utils.util import flatten_dict, unflatten_dict
except (ImportError, AssertionError):
from .suggestion import Searcher
from flaml.tune import sample
from ..trial import flatten_dict, unflatten_dict
from .suggestion import Searcher
from flaml.config import SAMPLE_MULTIPLY_FACTOR
from ..space import (
complete_config,
denormalize,
normalize,
generate_variants_compatible,
normalize,
)
logger = logging.getLogger(__name__)
@@ -135,7 +138,7 @@ class FLOW2(Searcher):
self.max_resource = max_resource
self._resource = None
self._f_best = None # only used for lexico_compare. It represents the best value achieved by lexico_flow.
self._step_lb = np.Inf
self._step_lb = np.inf
self._histories = None # only used for lexico_compare. It records the results of historical configurations.
if space is not None:
self._init_search()
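
The hunk above swaps the capitalized alias np.Inf for np.inf. NumPy 2.0 removed the capitalized alias, while the lowercase name works in both old and new releases. A minimal sketch of the lowercase spelling used as an "unbounded" sentinel follows; the variable names mirror the diff, but the values are illustrative only and are not taken from FLAML.

```python
import numpy as np

# Start with an unbounded lower limit on the step size (illustrative value).
step_lb = np.inf

# Any finite candidate tightens the bound, because inf compares greater
# than every finite float.
step_lb = min(step_lb, 0.25)

# The same constant, negated, serves as an unbounded lower limit.
up, low = np.inf, -np.inf
print(step_lb, up, low)
```

np.inf is an ordinary Python float, so it can be compared and passed around like any other number.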

View File

@@ -1,9 +1,11 @@
import numpy as np
import logging
import itertools
from typing import Dict, Optional, List
from flaml.tune import Categorical, Float, PolynomialExpansionSet, Trial
import logging
from typing import Dict, List, Optional
import numpy as np
from flaml.onlineml import VowpalWabbitTrial
from flaml.tune import Categorical, Float, PolynomialExpansionSet, Trial
from flaml.tune.searcher import CFO
logger = logging.getLogger(__name__)

View File

@@ -3,6 +3,7 @@
# * Licensed under the MIT License. See LICENSE file in the
# * project root for license information.
from typing import Dict, Optional
import numpy as np
try:
@@ -15,11 +16,12 @@ try:
from ray.tune.search import Searcher
except (ImportError, AssertionError):
from .suggestion import Searcher
from .flow2 import FLOW2
from ..space import add_cost_to_space, unflatten_hierarchical
from ..result import TIME_TOTAL_S
import logging
from ..result import TIME_TOTAL_S
from ..space import add_cost_to_space, unflatten_hierarchical
from .flow2 import FLOW2
logger = logging.getLogger(__name__)

View File

@@ -15,15 +15,17 @@
# This source file is adapted here because ray does not fully support Windows.
# Copyright (c) Microsoft Corporation.
import time
import functools
import warnings
import copy
import numpy as np
import functools
import logging
from typing import Any, Dict, Optional, Union, List, Tuple, Callable
import pickle
from .variant_generator import parse_spec_vars
import time
import warnings
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Tuple, Union
import numpy as np
from ..sample import (
Categorical,
Domain,
@@ -34,7 +36,7 @@ from ..sample import (
Uniform,
)
from ..trial import flatten_dict, unflatten_dict
from collections import defaultdict
from .variant_generator import parse_spec_vars
logger = logging.getLogger(__name__)
@@ -183,7 +185,7 @@ class ConcurrencyLimiter(Searcher):
"""
def __init__(self, searcher: Searcher, max_concurrent: int, batch: bool = False):
assert type(max_concurrent) is int and max_concurrent > 0
assert isinstance(max_concurrent, int) and max_concurrent > 0
self.searcher = searcher
self.max_concurrent = max_concurrent
self.batch = batch
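
The assertion change above replaces an exact type check with isinstance. The two are not equivalent: isinstance also accepts subclasses of int, including bool. The sketch below illustrates the difference with hypothetical helper names (strict_check, lenient_check, Count) that do not appear in FLAML.

```python
class Count(int):
    """Hypothetical int subclass, used only for illustration."""


def strict_check(value) -> bool:
    # Old form: passes only when the value's type is exactly int.
    return type(value) is int and value > 0


def lenient_check(value) -> bool:
    # New form: passes for int and any subclass of int.
    return isinstance(value, int) and value > 0


for candidate in (3, Count(3), True, 2.0):
    print(repr(candidate), strict_check(candidate), lenient_check(candidate))
```

One consequence worth noting: True passes the isinstance form, because bool is a subclass of int and True > 0 holds.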
@@ -252,8 +254,8 @@ try:
import optuna as ot
from optuna.distributions import BaseDistribution as OptunaDistribution
from optuna.samplers import BaseSampler
from optuna.trial import TrialState as OptunaTrialState
from optuna.trial import Trial as OptunaTrial
from optuna.trial import TrialState as OptunaTrialState
except ImportError:
ot = None
OptunaDistribution = None

View File

@@ -17,9 +17,11 @@
# Copyright (c) Microsoft Corporation.
import copy
import logging
from typing import Any, Dict, Generator, List, Tuple
import numpy
import random
from typing import Any, Dict, Generator, List, Tuple
import numpy
from ..sample import Categorical, Domain, RandomState
try:

View File

@@ -11,9 +11,10 @@ try:
except (ImportError, AssertionError):
from . import sample
from .searcher.variant_generator import generate_variants
from typing import Dict, Optional, Any, Tuple, Generator, List, Union
import numpy as np
import logging
from typing import Any, Dict, Generator, List, Optional, Tuple, Union
import numpy as np
logger = logging.getLogger(__name__)
@@ -489,7 +490,7 @@ def complete_config(
elif domain.bounded:
up, low, gauss_std = 1, 0, 1.0
else:
up, low, gauss_std = np.Inf, -np.Inf, 1.0
up, low, gauss_std = np.inf, -np.inf, 1.0
if domain.bounded:
if isinstance(up, list):
up[-1] = min(up[-1], 1)

View File

@@ -1,8 +1,8 @@
from flaml.tune.spark.utils import (
broadcast_code,
check_spark,
get_n_cpus,
with_parameters,
broadcast_code,
)
__all__ = ["check_spark", "get_n_cpus", "with_parameters", "broadcast_code"]

View File

@@ -5,7 +5,6 @@ import threading
import time
from functools import lru_cache, partial
logger = logging.getLogger(__name__)
logger_formatter = logging.Formatter(
"[%(name)s: %(asctime)s] {%(lineno)d} %(levelname)s - %(message)s", "%m-%d %H:%M:%S"
@@ -13,10 +12,10 @@ logger_formatter = logging.Formatter(
logger.propagate = False
os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
try:
import py4j
import pyspark
from pyspark.sql import SparkSession
from pyspark.util import VersionUtils
import py4j
except ImportError:
_have_spark = False
py4j = None
@@ -286,6 +285,7 @@ class PySparkOvertimeMonitor:
def __exit__(self, exc_type, exc_value, exc_traceback):
"""Exit the context manager.
This will wait for the monitor thread to nicely exit."""
logger.debug(f"monitor exited: {exc_type}, {exc_value}, {exc_traceback}")
if self._force_cancel and _have_spark:
self._finished_flag = True
self._monitor_daemon.join()
@@ -296,6 +296,11 @@ class PySparkOvertimeMonitor:
if not exc_type:
return True
elif exc_type == py4j.protocol.Py4JJavaError:
logger.debug("Py4JJavaError Exception: %s", exc_value)
return True
elif exc_type == TypeError:
# When force cancel, joblib>1.2.0 will raise joblib.externals.loky.process_executor._ExceptionWithTraceback
logger.debug("TypeError Exception: %s", exc_value)
return True
else:
return False
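
The __exit__ change above relies on a general Python rule: when __exit__ returns a true value, the exception raised inside the with block is suppressed; when it returns a false value, the exception propagates. The sketch below shows that mechanism in isolation. The class name SuppressSelected and its attributes are hypothetical, and it is not the PySparkOvertimeMonitor implementation.

```python
class SuppressSelected:
    """Hypothetical context manager that swallows selected exception types."""

    def __init__(self, suppressed=(TypeError,)):
        self.suppressed = suppressed
        self.seen = None  # last suppressed exception, kept for inspection

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if exc_type is None:
            # No exception was raised inside the block.
            return True
        if exc_type in self.suppressed:
            # Exact-type match, as in the diff: record and suppress.
            self.seen = exc_value
            return True
        # Anything else propagates to the caller.
        return False


with SuppressSelected() as monitor:
    raise TypeError("raised during a forced cancel")
print("suppressed:", monitor.seen)
```

Because the comparison is on the exact type, subclasses of a listed exception would still propagate in this sketch.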

View File

@@ -15,10 +15,10 @@
# This source file is adapted here because ray does not fully support Windows.
# Copyright (c) Microsoft Corporation.
import uuid
import time
from numbers import Number
import uuid
from collections import deque
from numbers import Number
def flatten_dict(dt, delimiter="/", prevent_delimiter=False):

View File

@@ -2,6 +2,7 @@
# * Copyright (c) Microsoft Corporation. All rights reserved.
# * Licensed under the MIT License. See LICENSE file in the
# * project root for license information.
import logging
from typing import Optional
# try:
@@ -10,7 +11,6 @@ from typing import Optional
# from ray.tune.trial import Trial
# except (ImportError, AssertionError):
from .trial import Trial
import logging
logger = logging.getLogger(__name__)

View File

@@ -2,13 +2,14 @@
# * Copyright (c) FLAML authors. All rights reserved.
# * Licensed under the MIT License. See LICENSE file in the
# * project root for license information.
from typing import Optional, Union, List, Callable, Tuple, Dict
import numpy as np
import datetime
import time
import os
import sys
import time
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple, Union
import numpy as np
try:
from ray import __version__ as ray_version
@@ -21,11 +22,13 @@ except (ImportError, AssertionError):
else:
ray_available = True
from .trial import Trial
from .result import DEFAULT_METRIC
import logging
from flaml.tune.spark.utils import PySparkOvertimeMonitor, check_spark
from .result import DEFAULT_METRIC
from .trial import Trial
logger = logging.getLogger(__name__)
logger.propagate = False
_use_ray = True
@@ -92,10 +95,12 @@ class ExperimentAnalysis(EA):
feasible_index_filter = np.where(
feasible_value
<= max(
f_best[k_metric] + self.lexico_objectives["tolerances"][k_metric]
if not isinstance(self.lexico_objectives["tolerances"][k_metric], str)
else f_best[k_metric]
* (1 + 0.01 * float(self.lexico_objectives["tolerances"][k_metric].replace("%", ""))),
(
f_best[k_metric] + self.lexico_objectives["tolerances"][k_metric]
if not isinstance(self.lexico_objectives["tolerances"][k_metric], str)
else f_best[k_metric]
* (1 + 0.01 * float(self.lexico_objectives["tolerances"][k_metric].replace("%", "")))
),
k_target,
)
)[0]
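
The hunk above only adds parentheses, but the expression is dense. It builds a threshold from the best value and a tolerance that is either an absolute number or a percentage string such as "10%", then takes the maximum of that threshold and the target. A sketch of the same arithmetic as a standalone function follows; the function name tolerance_bound and its parameters are assumptions made for illustration, not part of FLAML.

```python
from typing import Union


def tolerance_bound(f_best: float, tolerance: Union[float, str], target: float) -> float:
    """Return max(f_best adjusted by the tolerance, target)."""
    if isinstance(tolerance, str):
        # Percentage form: "10%" widens the bound to f_best * 1.10.
        bound = f_best * (1 + 0.01 * float(tolerance.replace("%", "")))
    else:
        # Absolute form: the tolerance is added directly.
        bound = f_best + tolerance
    return max(bound, target)


print(tolerance_bound(0.5, 0.1, 0.0))
print(tolerance_bound(2.0, "10%", 0.0))
print(tolerance_bound(1.0, 0.0, 5.0))
```

Pulling the conditional out of the max() call, as sketched here, is one way to make the two tolerance forms easier to read than the inline ternary.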
@@ -481,7 +486,7 @@ def run(
else:
logger.setLevel(logging.CRITICAL)
from .searcher.blendsearch import BlendSearch, CFO, RandomSearch
from .searcher.blendsearch import CFO, BlendSearch, RandomSearch
if lexico_objectives is not None:
if "modes" not in lexico_objectives.keys():
@@ -650,12 +655,13 @@ def run(
if not spark_available:
raise spark_error_msg
try:
from pyspark.sql import SparkSession
from joblib import Parallel, delayed, parallel_backend
from joblibspark import register_spark
from pyspark.sql import SparkSession
except ImportError as e:
raise ImportError(f"{e}. Try pip install flaml[spark] or set use_spark=False.")
from flaml.tune.searcher.suggestion import ConcurrencyLimiter
from .trial_runner import SparkTrialRunner
register_spark()

View File

@@ -1 +1 @@
__version__ = "2.1.1"
__version__ = "2.1.2"

View File

@@ -24,6 +24,7 @@ select = [
# "D", # see: https://pypi.org/project/pydocstyle
# "N", # see: https://pypi.org/project/pep8-naming
# "S", # see: https://pypi.org/project/flake8-bandit
"I", # see: https://pypi.org/project/isort/
]
ignore = [
"E501",

View File

@@ -1,6 +1,7 @@
import setuptools
import os
import setuptools
here = os.path.abspath(os.path.dirname(__file__))
with open("README.md", "r", encoding="UTF-8") as fh:
@@ -14,7 +15,7 @@ with open(os.path.join(here, "flaml/version.py")) as fp:
__version__ = version["__version__"]
install_requires = [
"NumPy>=1.17.0rc1",
"NumPy>=1.17",
]
@@ -47,7 +48,6 @@ setuptools.setup(
"spark": [
"pyspark>=3.2.0",
"joblibspark>=0.5.0",
"joblib<1.3.0", # temp solution for joblib 1.3.0 issue, no need once https://github.com/joblib/joblib-spark/pull/48 is merged
],
"test": [
"lightgbm>=2.3.1",
@@ -88,7 +88,6 @@ setuptools.setup(
"pydantic==1.10.9",
"sympy",
"wolframalpha",
"joblib<1.3.0", # temp solution for joblib 1.3.0 issue, no need once https://github.com/joblib/joblib-spark/pull/48 is merged
],
"catboost": ["catboost>=0.26"],
"blendsearch": [
@@ -153,7 +152,6 @@ setuptools.setup(
"joblibspark>=0.5.0",
"optuna==2.8.0",
"pyspark>=3.2.0",
"joblib<1.3.0", # temp solution for joblib 1.3.0 issue, no need once https://github.com/joblib/joblib-spark/pull/48 is merged
],
"autozero": ["scikit-learn", "pandas", "packaging"],
},

View File

@@ -1,6 +1,8 @@
import os
import sys
import pytest
from flaml import autogen
from flaml.autogen.agentchat import AssistantAgent, UserProxyAgent

View File

@@ -1,7 +1,9 @@
import asyncio
from flaml import autogen
from test_assistant_agent import KEY_LOC, OAI_CONFIG_LIST
from flaml import autogen
def get_market_news(ind, ind_upper):
data = {

View File

@@ -1,4 +1,5 @@
import pytest
from flaml.autogen.agentchat import ConversableAgent

View File

@@ -1,12 +1,14 @@
import pytest
import sys
import pytest
from test_assistant_agent import KEY_LOC, OAI_CONFIG_LIST
from flaml import autogen
from flaml.autogen.agentchat.contrib.math_user_proxy_agent import (
MathUserProxyAgent,
_remove_print,
_add_print_to_last_line,
_remove_print,
)
from test_assistant_agent import KEY_LOC, OAI_CONFIG_LIST
@pytest.mark.skipif(

View File

@@ -1,9 +1,13 @@
import pytest
import sys
from flaml import autogen
import pytest
from test_assistant_agent import KEY_LOC, OAI_CONFIG_LIST
from flaml import autogen
try:
import chromadb
from flaml.autogen.agentchat.contrib.retrieve_assistant_agent import (
RetrieveAssistantAgent,
)
@@ -11,7 +15,6 @@ try:
RetrieveUserProxyAgent,
)
from flaml.autogen.retrieve_utils import create_vector_db_from_dir, query_vector_db
import chromadb
skip_test = False
except ImportError:

View File

@@ -1,16 +1,18 @@
import datasets
import json
import os
import sys
from functools import partial
import datasets
import numpy as np
import pytest
from functools import partial
import os
import json
from flaml import autogen
from flaml.autogen.code_utils import (
eval_function_completions,
generate_assertions,
implement,
generate_code,
implement,
)
from flaml.autogen.math_utils import eval_math_responses, solve_problem
@@ -117,8 +119,8 @@ def test_multi_model():
def test_nocontext():
try:
import openai
import diskcache
import openai
except ImportError as exc:
print(exc)
return
@@ -206,8 +208,8 @@ def test_humaneval(num_samples=1):
autogen.Completion.clear_cache(cache_path_root="{here}/cache")
autogen.Completion.set_cache(seed)
try:
import openai
import diskcache
import openai
except ImportError as exc:
print(exc)
return
@@ -325,8 +327,8 @@ def test_humaneval(num_samples=1):
def test_math(num_samples=-1):
try:
import openai
import diskcache
import openai
except ImportError as exc:
print(exc)
return

View File

@@ -1,8 +1,10 @@
import json
import os
from flaml import autogen
from test_completion import KEY_LOC, OAI_CONFIG_LIST
from flaml import autogen
def test_config_list_from_json():
config_list = autogen.config_list_gpt4_gpt35(key_file_path=KEY_LOC)

View File

@@ -1,14 +1,16 @@
import sys
import os
import sys
import pytest
from flaml import autogen
from flaml.autogen.code_utils import (
UNKNOWN,
extract_code,
execute_code,
infer_lang,
extract_code,
improve_code,
improve_function,
infer_lang,
)
KEY_LOC = "notebook"

View File

@@ -2,11 +2,13 @@ try:
import openai
except ImportError:
openai = None
import pytest
import json
import pytest
from test_code import KEY_LOC
from flaml import autogen
from flaml.autogen.math_utils import eval_math_responses
from test_code import KEY_LOC
@pytest.mark.skipif(openai is None, reason="openai not installed")

View File

@@ -1,5 +1,6 @@
import sys
import os
import sys
import pytest
try:
@@ -15,8 +16,7 @@ here = os.path.abspath(os.path.dirname(__file__))
def run_notebook(input_nb, output_nb="executed_openai_notebook.ipynb", save=False):
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor
from nbconvert.preprocessors import CellExecutionError
from nbconvert.preprocessors import CellExecutionError, ExecutePreprocessor
try:
nb_loc = os.path.join(here, os.pardir, os.pardir, "notebook")

View File

@@ -1,13 +1,14 @@
import unittest
from datetime import datetime
import numpy as np
import pandas as pd
import scipy.sparse
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
import pandas as pd
from datetime import datetime
from flaml import AutoML
from flaml import AutoML, tune
from flaml.automl.model import LGBMEstimator
from flaml import tune
class MyLargeLGBM(LGBMEstimator):
@@ -194,6 +195,22 @@ class TestClassification(unittest.TestCase):
automl.fit(X, y, **automl_settings)
del automl
automl = AutoML()
automl_settings = {
"time_budget": 3,
"task": "classification",
"n_jobs": 1,
"estimator_list": ["histgb"],
"eval_method": "cv",
"n_splits": 3,
"metric": "accuracy",
"log_training_metric": True,
# "verbose": 4,
"ensemble": True,
}
automl.fit(X, y, **automl_settings)
del automl
def test_binary(self):
automl_experiment = AutoML()
automl_settings = {

View File

@@ -1,10 +1,12 @@
from urllib.error import URLError
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.externals._arff import ArffException
from functools import partial
from flaml.automl import AutoML, size
from urllib.error import URLError
from sklearn.datasets import fetch_openml
from sklearn.externals._arff import ArffException
from sklearn.model_selection import train_test_split
from flaml import tune
from flaml.automl import AutoML, size
dataset = "credit-g"
@@ -71,9 +73,10 @@ def custom_metric(
weight_train,
*args,
):
from sklearn.metrics import log_loss
import time
from sklearn.metrics import log_loss
start = time.time()
y_pred = estimator.predict_proba(X_val)
pred_time = (time.time() - start) / len(X_val)

View File

@@ -1,11 +1,13 @@
import sys
import pytest
from flaml import AutoML, tune
@pytest.mark.skipif(sys.platform == "darwin", reason="do not run on mac os")
def test_custom_hp_nlp():
from test.nlp.utils import get_toy_data_seqclassification, get_automl_settings
from test.nlp.utils import get_automl_settings, get_toy_data_seqclassification
X_train, y_train, X_val, y_val, X_test = get_toy_data_seqclassification()

View File

@@ -4,7 +4,6 @@ import numpy as np
import pandas as pd
from flaml import AutoML
from flaml.automl.task.time_series_task import TimeSeriesTask
@@ -153,6 +152,7 @@ def test_numpy():
def test_numpy_large():
import numpy as np
import pandas as pd
from flaml import AutoML
X_train = pd.date_range("2017-01-01", periods=70000, freq="T")

View File

@@ -1,8 +1,9 @@
import mlflow
import mlflow.entities
import pytest
from pandas import DataFrame
from sklearn.datasets import load_iris
import mlflow
import mlflow.entities
from flaml import AutoML

View File

@@ -1,11 +1,12 @@
import unittest
import numpy as np
import scipy.sparse
from sklearn.datasets import load_iris, load_wine
from flaml import AutoML
from flaml import AutoML, tune
from flaml.automl.data import get_output_from_log
from flaml.automl.model import LGBMEstimator, XGBoostSklearnEstimator, SKLearnEstimator
from flaml import tune
from flaml.automl.model import LGBMEstimator, SKLearnEstimator, XGBoostSklearnEstimator
from flaml.automl.training_log import training_log_reader
@@ -112,9 +113,10 @@ def custom_metric(
groups_val=None,
groups_train=None,
):
from sklearn.metrics import log_loss
import time
from sklearn.metrics import log_loss
start = time.time()
y_pred = estimator.predict_proba(X_val)
pred_time = (time.time() - start) / len(X_val)
@@ -289,10 +291,10 @@ class TestMultiClass(unittest.TestCase):
estimator = automl_experiment_macro.model
y_pred = estimator.predict(X_train)
y_pred_proba = estimator.predict_proba(X_train)
from flaml.automl.ml import norm_confusion_matrix, multi_class_curves
from flaml.automl.ml import multi_class_curves, norm_confusion_matrix
print(norm_confusion_matrix(y_train, y_pred))
from sklearn.metrics import roc_curve, precision_recall_curve
from sklearn.metrics import precision_recall_curve, roc_curve
print(multi_class_curves(y_train, y_pred_proba, roc_curve))
print(multi_class_curves(y_train, y_pred_proba, precision_recall_curve))

View File

@@ -1,10 +1,9 @@
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor
from nbconvert.preprocessors import CellExecutionError
import os
import sys
import pytest
import nbformat
import pytest
from nbconvert.preprocessors import CellExecutionError, ExecutePreprocessor
here = os.path.abspath(os.path.dirname(__file__))

View File

@@ -1,13 +1,15 @@
import sys
+from minio.error import ServerError
from openml.exceptions import OpenMLServerException
from requests.exceptions import ChunkedEncodingError, SSLError
-from minio.error import ServerError
def test_automl(budget=5, dataset_format="dataframe", hpo_method=None):
-from flaml.automl.data import load_openml_dataset
import urllib3
+from flaml.automl.data import load_openml_dataset
performance_check_budget = 600
if (
sys.platform == "darwin"
@@ -118,6 +120,7 @@ def _test_nobudget():
def test_mlflow():
# subprocess.check_call([sys.executable, "-m", "pip", "install", "mlflow"])
import mlflow
from flaml.automl.data import load_openml_task
try:
@@ -159,8 +162,9 @@ def test_mlflow():
def test_mlflow_iris():
-from sklearn.datasets import load_iris
import mlflow
+from sklearn.datasets import load_iris
from flaml import AutoML
with mlflow.start_run():

@@ -1,11 +1,13 @@
-from flaml.tune.space import unflatten_hierarchical
-from flaml import AutoML
-from sklearn.datasets import fetch_california_housing
-import os
-import unittest
-import logging
-import tempfile
import io
+import logging
+import os
+import tempfile
+import unittest
+from sklearn.datasets import fetch_california_housing
+from flaml import AutoML
+from flaml.tune.space import unflatten_hierarchical
class TestLogging(unittest.TestCase):
@@ -49,7 +51,7 @@ class TestLogging(unittest.TestCase):
import optuna as ot
study = ot.create_study()
-from flaml.tune.space import define_by_run_func, add_cost_to_space
+from flaml.tune.space import add_cost_to_space, define_by_run_func
sample = define_by_run_func(study.ask(), automl.search_space)
logger.info(sample)
@@ -60,10 +62,11 @@ class TestLogging(unittest.TestCase):
config = automl.best_config.copy()
config["learner"] = automl.best_estimator
automl.trainable({"ml": config})
-from flaml import tune, BlendSearch
-from flaml.automl import size
from functools import partial
+from flaml import BlendSearch, tune
+from flaml.automl import size
low_cost_partial_config = automl.low_cost_partial_config
search_alg = BlendSearch(
metric="val_loss",

@@ -1,4 +1,5 @@
import unittest
import numpy as np
import scipy.sparse
from sklearn.datasets import (

@@ -1,7 +1,8 @@
-from flaml import AutoML
import pandas as pd
from sklearn.datasets import fetch_california_housing, fetch_openml
+from flaml import AutoML
class TestScore:
def test_forecast(self, budget=5):

@@ -1,8 +1,8 @@
from sklearn.datasets import fetch_openml
-from flaml.automl import AutoML
-from sklearn.model_selection import GroupKFold, train_test_split, KFold
from sklearn.metrics import accuracy_score
+from sklearn.model_selection import GroupKFold, KFold, train_test_split
+from flaml.automl import AutoML
dataset = "credit-g"
@@ -89,8 +89,9 @@ def test_groups():
def test_stratified_groupkfold():
-from sklearn.model_selection import StratifiedGroupKFold
from minio.error import ServerError
+from sklearn.model_selection import StratifiedGroupKFold
from flaml.automl.data import load_openml_dataset
try:

Some files were not shown because too many files have changed in this diff.