mirror of
https://github.com/microsoft/FLAML.git
synced 2026-02-18 14:42:24 +08:00
Compare commits
17 Commits
| Author | SHA1 | Date |
|---|---|---|
| | b3fba9734e | |
| | 8b2411b219 | |
| | fd1f36597b | |
| | 00c30a398e | |
| | 31864d2d77 | |
| | 19aee67f55 | |
| | 39b9a9a417 | |
| | 6d7fb3d786 | |
| | 06cd3f52e5 | |
| | 73bb6e7667 | |
| | a8752b6aa0 | |
| | e9cd6a058c | |
| | f097c20f86 | |
| | fa5ccea862 | |
| | 7114b8f742 | |
| | da0d8c05e1 | |
| | 99bb0a8425 | |
5
.flake8
@@ -1,5 +0,0 @@
|
||||
[flake8]
|
||||
ignore = E203, E266, E501, W503, F403, F401, C901
|
||||
max-line-length = 127
|
||||
max-complexity = 10
|
||||
select = B,C,E,F,W,T4,B9
|
||||
2
.github/PULL_REQUEST_TEMPLATE.md
vendored
@@ -12,7 +12,7 @@
|
||||
|
||||
## Checks
|
||||
|
||||
- [ ] I've used [pre-commit](https://microsoft.github.io/FLAML/docs/Contribute#pre-commit) to lint the changes in this PR, or I've made sure [lint with flake8](https://github.com/microsoft/FLAML/blob/816a82a1155b4de4705b21a615ccdff67c6da379/.github/workflows/python-package.yml#L54-L59) output is two 0s.
|
||||
- I've used [pre-commit](https://microsoft.github.io/FLAML/docs/Contribute#pre-commit) to lint the changes in this PR (note the same is integrated in our CI checks).
|
||||
- [ ] I've included any doc changes needed for https://microsoft.github.io/FLAML/. See https://microsoft.github.io/FLAML/docs/Contribute#documentation to build and test documentation locally.
|
||||
- [ ] I've added tests (if relevant) corresponding to the changes introduced in this PR.
|
||||
- [ ] I've made sure all auto checks have passed.
|
||||
|
||||
4
.github/workflows/openai.yml
vendored
@@ -29,10 +29,10 @@ jobs:
|
||||
python-version: ${{ matrix.python-version }}
|
||||
- name: Install packages and dependencies
|
||||
run: |
|
||||
docker --version
|
||||
python -m pip install --upgrade pip wheel
|
||||
pip install -e .
|
||||
pip install -e .[autogen,blendsearch]
|
||||
python -c "import flaml"
|
||||
pip install -e .[openai]
|
||||
- name: Coverage
|
||||
env:
|
||||
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
|
||||
|
||||
6
.github/workflows/python-package.yml
vendored
@@ -82,12 +82,6 @@ jobs:
|
||||
run: |
|
||||
# Uninstall pyspark to test env without pyspark
|
||||
pip uninstall -y pyspark
|
||||
- name: Lint with flake8
|
||||
run: |
|
||||
# stop the build if there are Python syntax errors or undefined names
|
||||
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
|
||||
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
|
||||
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
|
||||
- name: Test with pytest
|
||||
if: (matrix.python-version != '3.7' || matrix.os == 'macos-latest') && matrix.python-version != '3.10'
|
||||
run: |
|
||||
|
||||
@@ -7,15 +7,6 @@ ci:
|
||||
autoupdate_schedule: 'quarterly'
|
||||
|
||||
repos:
|
||||
- repo: https://github.com/psf/black
|
||||
rev: 23.1.0
|
||||
hooks:
|
||||
- id: black
|
||||
args: ["--line-length=120"]
|
||||
- repo: https://github.com/pycqa/flake8
|
||||
rev: 6.0.0
|
||||
hooks:
|
||||
- id: flake8
|
||||
- repo: https://github.com/pre-commit/pre-commit-hooks
|
||||
rev: v4.4.0
|
||||
hooks:
|
||||
@@ -31,3 +22,12 @@ repos:
|
||||
- id: trailing-whitespace
|
||||
- id: end-of-file-fixer
|
||||
- id: no-commit-to-branch
|
||||
- repo: https://github.com/psf/black
|
||||
rev: 23.3.0
|
||||
hooks:
|
||||
- id: black
|
||||
- repo: https://github.com/charliermarsh/ruff-pre-commit
|
||||
rev: v0.0.261
|
||||
hooks:
|
||||
- id: ruff
|
||||
args: ["--fix"]
|
||||
|
||||
57
README.md
@@ -3,8 +3,8 @@
|
||||
[](https://github.com/microsoft/FLAML/actions/workflows/python-package.yml)
|
||||

|
||||
[](https://pepy.tech/project/flaml)
|
||||
[](https://gitter.im/FLAMLer/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
|
||||
[](https://discord.gg/Cppx2vSPVP)
|
||||
<!-- [](https://gitter.im/FLAMLer/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) -->
|
||||
|
||||
|
||||
# A Fast Library for Automated Machine Learning & Tuning
|
||||
@@ -16,18 +16,16 @@
|
||||
|
||||
:fire: v1.2.0 is released with support for ChatGPT and GPT-4.
|
||||
|
||||
:fire: A [lab forum](https://github.com/microsoft/FLAML/tree/tutorial-aaai23/tutorial) on FLAML at AAAI 2023.
|
||||
|
||||
:fire: A [hands-on tutorial](https://github.com/microsoft/FLAML/tree/tutorial/tutorial) on FLAML presented at KDD 2022
|
||||
|
||||
## What is FLAML
|
||||
FLAML is a lightweight Python library that finds accurate machine
|
||||
learning models automatically, efficiently and economically. It frees users from selecting
|
||||
models and hyperparameters for each model. It can also be used to tune generic hyperparameters for foundation models, MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations and so on.
|
||||
FLAML is a lightweight Python library for efficient automation of machine
|
||||
learning, including selection of
|
||||
models, hyperparameters, and other tunable choices of an application (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations).
|
||||
|
||||
1. For common machine learning or AI tasks like classification, regression, and generation, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks, including foundation models such as the GPT series.
|
||||
1. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code).
|
||||
1. It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a new, [cost-effective
|
||||
* For foundation models like the GPT series, it automates the experimentation and optimization of their inference performance to maximize the effectiveness for downstream applications and minimize the inference cost.
|
||||
* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources.
|
||||
* It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training/inference/evaluation code).
|
||||
* It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective
|
||||
hyperparameter optimization](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function/#hyperparameter-optimization-algorithm)
|
||||
and model selection method invented by Microsoft Research, and many followup [research studies](https://microsoft.github.io/FLAML/docs/Research).
|
||||
|
||||
@@ -61,6 +59,25 @@ Use the following guides to get started with FLAML in .NET:
|
||||
|
||||
## Quickstart
|
||||
|
||||
* (New) You can optimize [generations](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.
|
||||
|
||||
```python
|
||||
from flaml import oai
|
||||
|
||||
config, analysis = oai.Completion.tune(
|
||||
data=tune_data,
|
||||
metric="success",
|
||||
mode="max",
|
||||
eval_func=eval_func,
|
||||
inference_budget=0.05,
|
||||
optimization_budget=3,
|
||||
num_samples=-1,
|
||||
)
|
||||
```
|
||||
|
||||
The automated experimentation and optimization can help you maximize the utility out of these expensive models.
|
||||
A suite of utilities such as caching and templating are offered to accelerate the experimentation and application development.
|
||||
|
||||
* With three lines of code, you can start using this economical and fast
|
||||
AutoML engine as a [scikit-learn style estimator](https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML).
|
||||
|
||||
@@ -95,33 +112,15 @@ estimator = LGBMRegressor()
|
||||
estimator.fit(X_train, y_train)
|
||||
```
|
||||
|
||||
* (New) You can optimize [generations](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.
|
||||
|
||||
```python
|
||||
from flaml import oai
|
||||
|
||||
config, analysis = oai.Completion.tune(
|
||||
data=tune_data,
|
||||
metric="success",
|
||||
mode="max",
|
||||
eval_func=eval_func,
|
||||
inference_budget=0.05,
|
||||
optimization_budget=3,
|
||||
num_samples=-1,
|
||||
)
|
||||
```
|
||||
|
||||
## Documentation
|
||||
|
||||
You can find detailed documentation about FLAML [here](https://microsoft.github.io/FLAML/), including the API documentation, use cases and examples.
|
||||
|
||||
In addition, you can find:
|
||||
|
||||
- [Talks](https://www.youtube.com/channel/UCfU0zfFXHXdAd5x-WvFBk5A) and [tutorials](https://github.com/microsoft/FLAML/tree/tutorial/tutorial) about FLAML.
|
||||
|
||||
- Research around FLAML [here](https://microsoft.github.io/FLAML/docs/Research).
|
||||
|
||||
- FAQ [here](https://microsoft.github.io/FLAML/docs/FAQ).
|
||||
- Discord [here](https://discord.gg/Cppx2vSPVP).
|
||||
|
||||
- Contributing guide [here](https://microsoft.github.io/FLAML/docs/Contribute).
|
||||
|
||||
|
||||
@@ -0,0 +1,2 @@
|
||||
DEFAULT_MODEL = "gpt-4"
|
||||
FAST_MODEL = "gpt-3.5-turbo"
|
||||
|
||||
43
flaml/autogen/agent/agent.py
Normal file
@@ -0,0 +1,43 @@
|
||||
from collections import defaultdict
|
||||
|
||||
|
||||
class Agent:
|
||||
"""(Experimental) An abstract class for AI agent.
|
||||
An agent can communicate with other agents and humans, and perform actions.
|
||||
Different agents can differ in how and with whom they communicate, and in what actions they can perform. For example, an autonomous agent can communicate with humans and other agents, and perform actions by creating agents and sending messages to other agents. A planning agent can communicate with other agents to make a plan and keep track of tasks. An execution agent can only communicate with other agents, and perform actions such as executing a command or code.
|
||||
"""
|
||||
|
||||
def __init__(self, name, system_message=""):
|
||||
# empty memory
|
||||
self._memory = []
|
||||
# a dictionary of conversations, default value is list
|
||||
self._conversations = defaultdict(list)
|
||||
self._name = name
|
||||
self._system_message = system_message
|
||||
|
||||
@property
|
||||
def name(self):
|
||||
"""Get the name of the agent."""
|
||||
return self._name
|
||||
|
||||
def _remember(self, memory):
|
||||
"""Remember something."""
|
||||
self._memory.append(memory)
|
||||
|
||||
def _send(self, message, recipient):
|
||||
"""Send a message to another agent."""
|
||||
self._conversations[recipient.name].append({"content": message, "role": "assistant"})
|
||||
recipient.receive(message, self)
|
||||
|
||||
def _receive(self, message, sender):
|
||||
"""Receive a message from another agent."""
|
||||
# print(self.name, "received message from", sender.name, ":", message)
|
||||
self._conversations[sender.name].append({"content": message, "role": "user"})
|
||||
|
||||
def receive(self, message, sender):
|
||||
"""Receive a message from another agent.
|
||||
This method is called by the sender.
|
||||
It needs to be overridden by the subclass to perform followup actions.
|
||||
"""
|
||||
self._receive(message, sender)
|
||||
# perform actions based on the message
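A minimal sketch of how a subclass might override `receive`, as the docstring above asks. The `Agent` class below is a trimmed stand-in that mirrors the methods in this diff so the snippet runs without FLAML installed; `EchoAgent` is a hypothetical subclass used only for illustration and is not part of the change.

```python
from collections import defaultdict


class Agent:
    """Trimmed stand-in mirroring the Agent class added in this diff."""

    def __init__(self, name, system_message=""):
        self._conversations = defaultdict(list)
        self._name = name
        self._system_message = system_message

    @property
    def name(self):
        return self._name

    def _send(self, message, recipient):
        # Record the outgoing message, then hand it to the recipient.
        self._conversations[recipient.name].append({"content": message, "role": "assistant"})
        recipient.receive(message, self)

    def _receive(self, message, sender):
        # Record the incoming message under the sender's name.
        self._conversations[sender.name].append({"content": message, "role": "user"})

    def receive(self, message, sender):
        self._receive(message, sender)


class EchoAgent(Agent):
    """Hypothetical subclass: replies once to each message it receives."""

    def receive(self, message, sender):
        super().receive(message, sender)
        # Do not reply to another EchoAgent, to avoid an endless exchange.
        if not isinstance(sender, EchoAgent):
            self._send(f"echo: {message}", sender)


user = Agent("user")
echo = EchoAgent("echo")
user._send("hello", echo)
print(user._conversations["echo"])
```

After the call, each agent holds its own view of the exchange: the sender stores its message with role `assistant` and the reply with role `user`, and the receiver stores the mirror image.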
|
||||
53
flaml/autogen/agent/coding_agent.py
Normal file
@@ -0,0 +1,53 @@
|
||||
from .agent import Agent
|
||||
from .execution_agent import ExecutionAgent
|
||||
from flaml.autogen.code_utils import generate_code, DEFAULT_MODEL
|
||||
from flaml import oai
|
||||
|
||||
|
||||
class PythonAgent(Agent):
|
||||
"""(Experimental) Suggest code blocks."""
|
||||
|
||||
DEFAULT_SYSTEM_MESSAGE = """You are a coding agent. You suggest python code for a user to execute for a given task. Don't suggest shell command. Output the code in a coding block. Check the execution result. If the result indicates there is an error, fix the error and output the code again.
|
||||
"""
|
||||
|
||||
DEFAULT_CONFIG = {
|
||||
"model": DEFAULT_MODEL,
|
||||
}
|
||||
EXECUTION_AGENT_PREFIX = "execution_agent4"
|
||||
SUCCESS_EXIT_CODE = "exitcode: 0\n"
|
||||
|
||||
def __init__(self, name, system_message=DEFAULT_SYSTEM_MESSAGE, work_dir=None, **config):
|
||||
super().__init__(name, system_message)
|
||||
self._work_dir = work_dir
|
||||
self._config = self.DEFAULT_CONFIG.copy()
|
||||
self._config.update(config)
|
||||
self._sender_dict = {}
|
||||
|
||||
def receive(self, message, sender):
|
||||
if sender.name not in self._sender_dict:
|
||||
self._sender_dict[sender.name] = sender
|
||||
self._conversations[sender.name] = [{"content": self._system_message, "role": "system"}]
|
||||
super().receive(message, sender)
|
||||
if sender.name.startswith(self.EXECUTION_AGENT_PREFIX) and message.startswith(self.SUCCESS_EXIT_CODE):
|
||||
# the code is correct, respond to the original sender
|
||||
name = sender.name[len(self.EXECUTION_AGENT_PREFIX) :]
|
||||
original_sender = self._sender_dict[name]
|
||||
output = message[len(self.SUCCESS_EXIT_CODE) :]
|
||||
if output:
|
||||
self._send(f"{output}", original_sender)
|
||||
else:
|
||||
self._send("Done. No output.", original_sender)
|
||||
return
|
||||
responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config)
|
||||
# cost = oai.ChatCompletion.cost(responses)
|
||||
response = oai.ChatCompletion.extract_text(responses)[0]
|
||||
if sender.name.startswith(self.EXECUTION_AGENT_PREFIX):
|
||||
execution_agent = sender
|
||||
else:
|
||||
# create an execution agent
|
||||
execution_agent = ExecutionAgent(f"{self.EXECUTION_AGENT_PREFIX}{sender.name}", work_dir=self._work_dir)
|
||||
# initialize the conversation
|
||||
self._conversations[execution_agent.name] = self._conversations[sender.name].copy()
|
||||
self._sender_dict[execution_agent.name] = execution_agent
|
||||
# send the response to the execution agent
|
||||
self._send(response, execution_agent)
|
||||
24
flaml/autogen/agent/execution_agent.py
Normal file
@@ -0,0 +1,24 @@
|
||||
from .agent import Agent
|
||||
from flaml.autogen.code_utils import execute_code, extract_code
|
||||
|
||||
|
||||
class ExecutionAgent(Agent):
|
||||
"""(Experimental) Perform actions based on instructions from other agents.
|
||||
An execution agent can only communicate with other agents, and perform actions such as executing a command or code.
|
||||
"""
|
||||
|
||||
def __init__(self, name, system_message="", work_dir=None):
|
||||
super().__init__(name, system_message)
|
||||
self._word_dir = work_dir
|
||||
|
||||
def receive(self, message, sender):
|
||||
super().receive(message, sender)
|
||||
# extract code
|
||||
code, lang = extract_code(message)
|
||||
if lang == "bash":
|
||||
assert code.startswith("python ")
|
||||
file_name = code[len("python ") :]
|
||||
exitcode, logs = execute_code(filename=file_name, work_dir=self._word_dir)
|
||||
else:
|
||||
exitcode, logs = execute_code(code, work_dir=self._word_dir)
|
||||
self._send(f"exitcode: {exitcode}\n{logs.decode('utf-8')}", sender)
|
||||
@@ -1,56 +1,287 @@
|
||||
import signal
|
||||
import subprocess
|
||||
import sys
|
||||
import os
|
||||
import pathlib
|
||||
from typing import List, Dict, Tuple, Optional, Union, Callable
|
||||
from flaml import oai
|
||||
import re
|
||||
import time
|
||||
from hashlib import md5
|
||||
from flaml.autogen import oai, DEFAULT_MODEL, FAST_MODEL
|
||||
|
||||
# Regular expression for finding a code block
|
||||
CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n```"
|
||||
WORKING_DIR = os.path.join(os.path.dirname(os.path.realpath(__file__)), "extensions")
|
||||
|
||||
|
||||
def extract_code(text: str, pattern: str = CODE_BLOCK_PATTERN) -> Tuple[str, str]:
|
||||
# Use a regular expression to find the code block
|
||||
match = re.search(pattern, text, flags=re.DOTALL)
|
||||
# If a match is found, return the code
|
||||
if match:
|
||||
return match.group(2), match.group(1)
|
||||
# If no code block is found, return the whole text
|
||||
return text, "unknown"
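A short usage sketch of `extract_code`, re-declared here so it runs on its own. The pattern is assembled from a `FENCE` helper (an illustrative name, not in the diff) purely to avoid writing a literal fence inside this example; it is the same regular expression as `CODE_BLOCK_PATTERN` above. The sample `reply` text is made up.

```python
import re
from typing import Tuple

FENCE = "`" * 3  # three backticks, built indirectly for this example only
CODE_BLOCK_PATTERN = FENCE + r"(\w*)\n(.*?)\n" + FENCE


def extract_code(text: str, pattern: str = CODE_BLOCK_PATTERN) -> Tuple[str, str]:
    # re.DOTALL lets "." span newlines, so multi-line code blocks match.
    match = re.search(pattern, text, flags=re.DOTALL)
    if match:
        # group(2) is the code body, group(1) the language tag (may be empty).
        return match.group(2), match.group(1)
    # No fenced block: treat the whole text as code of unknown language.
    return text, "unknown"


reply = f"Here is the fix:\n{FENCE}python\nprint('hi')\n{FENCE}\nDone."
code, lang = extract_code(reply)
print(lang, "->", code)
```

Only the first fenced block is returned, since `re.search` stops at the first match.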
|
||||
|
||||
|
||||
def generate_code(pattern: str = CODE_BLOCK_PATTERN, **config) -> Tuple[str, float]:
|
||||
"""Generate code.
|
||||
|
||||
Args:
|
||||
pattern (Optional, str): The regular expression pattern for finding the code block.
|
||||
The default pattern is for finding a code block in a markdown file.
|
||||
config (Optional, dict): The configuration for the API call.
|
||||
|
||||
Returns:
|
||||
str: The generated code.
|
||||
float: The cost of the generation.
|
||||
"""
|
||||
response = oai.Completion.create(**config)
|
||||
cost = oai.Completion.cost(response)
|
||||
return extract_code(oai.Completion.extract_text(response)[0], pattern), cost
|
||||
|
||||
|
||||
_IMPROVE_FUNCTION_CONFIG = {
|
||||
"prompt": """Improve the function '{func_name}' to achieve the objective '{objective}'.
|
||||
The current implementation of the function is as follows:
|
||||
{file_string}""",
|
||||
"model": DEFAULT_MODEL,
|
||||
"request_timeout": 300,
|
||||
}
|
||||
|
||||
|
||||
def improve_function(file_name, func_name, objective, **config):
|
||||
"""(work in progress) Improve the function to achieve the objective."""
|
||||
params = {**_IMPROVE_FUNCTION_CONFIG, **config}
|
||||
# read the entire file into a str
|
||||
with open(file_name, "r") as f:
|
||||
file_string = f.read()
|
||||
response = oai.Completion.create(
|
||||
{"func_name": func_name, "objective": objective, "file_string": file_string}, **params
|
||||
)
|
||||
cost = oai.Completion.cost(response)
|
||||
return oai.Completion.extract_text(response)[0], cost
|
||||
|
||||
|
||||
_IMPROVE_CODE_CONFIG = {
|
||||
"prompt": """Analyze the code in the following files and return a list of suggestions for improvement{followup}, to achieve the objective of '{objective}'.
|
||||
{code}
|
||||
""",
|
||||
"model": DEFAULT_MODEL,
|
||||
"request_timeout": 900,
|
||||
}
|
||||
|
||||
|
||||
def improve_code(files, objective, suggest_only=True, **config):
|
||||
"""Improve the code to achieve a given objective.
|
||||
|
||||
Args:
|
||||
files (list): A list of file names containing the source code.
|
||||
objective (str): The objective to achieve.
|
||||
suggest_only (bool): Whether to return only the suggestions or the improved code.
|
||||
config (Optional, dict): The configuration for the API call.
|
||||
|
||||
Returns:
|
||||
str: The improved code if suggest_only=False; a list of suggestions if suggest_only=True (default).
|
||||
float: The cost of the generation.
|
||||
"""
|
||||
code = ""
|
||||
for file_name in files:
|
||||
# read the entire file into a string
|
||||
with open(file_name, "r") as f:
|
||||
file_string = f.read()
|
||||
code += f"""{file_name}:
|
||||
{file_string}
|
||||
|
||||
"""
|
||||
params = {**_IMPROVE_CODE_CONFIG, **config}
|
||||
followup = "" if suggest_only else " followed by the improved code"
|
||||
response = oai.Completion.create({"objective": objective, "code": code, "followup": followup}, **params)
|
||||
cost = oai.Completion.cost(response)
|
||||
return oai.Completion.extract_text(response)[0], cost
|
||||
|
||||
|
||||
def timeout_handler(signum, frame):
|
||||
raise TimeoutError("Timed out!")
|
||||
|
||||
|
||||
def execute_code(code: str, max_exec_time: Optional[int] = 3):
|
||||
signal.signal(signal.SIGALRM, timeout_handler)
|
||||
code = code.strip()
|
||||
with open("codetest.py", "w") as fout:
|
||||
fout.write(code)
|
||||
try:
|
||||
signal.alarm(max_exec_time)
|
||||
result = subprocess.run(
|
||||
[sys.executable, "codetest.py"],
|
||||
stdout=subprocess.DEVNULL,
|
||||
stderr=subprocess.PIPE,
|
||||
)
|
||||
signal.alarm(0)
|
||||
except TimeoutError:
|
||||
return 0
|
||||
return int(result.returncode == 0)
|
||||
def execute_code(
|
||||
code: Optional[str] = None,
|
||||
timeout: Optional[int] = 600,
|
||||
filename: Optional[str] = None,
|
||||
work_dir: Optional[str] = None,
|
||||
use_docker: Optional[bool] = True,
|
||||
) -> Tuple[int, bytes]:
|
||||
"""Execute code in a docker container.
|
||||
This function is not tested on macOS.
|
||||
|
||||
Args:
|
||||
code (Optional, str): The code to execute.
|
||||
If None, the code from the file specified by filename will be executed.
|
||||
Either code or filename must be provided.
|
||||
timeout (Optional, int): The maximum execution time in seconds.
|
||||
filename (Optional, str): The file name to save the code or where the code is stored when `code` is None.
|
||||
If None, a file with a randomly generated name will be created.
|
||||
The randomly generated file will be deleted after execution.
|
||||
The file name must be a relative path. Relative paths are relative to the working directory.
|
||||
work_dir (Optional, str): The working directory for the code execution.
|
||||
If None, a default working directory will be used.
|
||||
The default working directory is the "extensions" directory under
|
||||
"xxx/flaml/autogen", where "xxx" is the path to the flaml package.
|
||||
use_docker (Optional, bool): Whether to use a docker container for code execution.
|
||||
If True, the code will be executed in a docker container.
|
||||
If False, the code will be executed in the current environment.
|
||||
Default is True. If the code is executed in the current environment,
|
||||
the code must be trusted.
|
||||
|
||||
Returns:
|
||||
int: 0 if the code executes successfully.
|
||||
bytes: The error message if the code fails to execute; the stdout otherwise.
|
||||
"""
|
||||
assert code is not None or filename is not None, "Either code or filename must be provided."
|
||||
|
||||
original_filename = filename
|
||||
if filename is None:
|
||||
code_hash = md5(code.encode()).hexdigest()
|
||||
# create a file with an automatically generated name
|
||||
filename = f"tmp_code_{code_hash}.py"
|
||||
if work_dir is None:
|
||||
work_dir = WORKING_DIR
|
||||
filepath = os.path.join(work_dir, filename)
|
||||
file_dir = os.path.dirname(filepath)
|
||||
os.makedirs(file_dir, exist_ok=True)
|
||||
|
||||
if code is not None:
|
||||
with open(filepath, "w") as fout:
|
||||
fout.write(code)
|
||||
# check if already running in a docker container
|
||||
in_docker_container = os.path.exists("/.dockerenv")
|
||||
if not use_docker or in_docker_container:
|
||||
# docker is disabled or we are already inside a docker container: run directly
|
||||
signal.signal(signal.SIGALRM, timeout_handler)
|
||||
try:
|
||||
signal.alarm(timeout)
|
||||
# run the code in a subprocess in the current environment, in the working directory
|
||||
result = subprocess.run(
|
||||
[sys.executable, filename],
|
||||
cwd=work_dir,
|
||||
capture_output=True,
|
||||
)
|
||||
signal.alarm(0)
|
||||
except TimeoutError:
|
||||
if original_filename is None:
|
||||
os.remove(filepath)
|
||||
return 1, b"Timeout"
|
||||
if original_filename is None:
|
||||
os.remove(filepath)
|
||||
return result.returncode, result.stderr if result.returncode else result.stdout
|
||||
|
||||
import docker
|
||||
from requests.exceptions import ReadTimeout, ConnectionError
|
||||
|
||||
# create a docker client
|
||||
client = docker.from_env()
|
||||
image_list = ["python:3-alpine", "python:3", "python:3-windowsservercore"]
|
||||
for image in image_list:
|
||||
# check if the image exists
|
||||
try:
|
||||
client.images.get(image)
|
||||
break
|
||||
except docker.errors.ImageNotFound:
|
||||
# pull the image
|
||||
print("Pulling image", image)
|
||||
try:
|
||||
client.images.pull(image)
|
||||
break
|
||||
except docker.errors.DockerException:
|
||||
print("Failed to pull image", image)
|
||||
# get a randomized str based on current time to wrap the exit code
|
||||
exit_code_str = f"exitcode{time.time()}"
|
||||
abs_path = pathlib.Path(work_dir).absolute()
|
||||
# if sys.platform == "win32":
|
||||
# abs_path = str(abs_path).replace("\\", "/")
|
||||
# abs_path = f"/{abs_path[0].lower()}{abs_path[2:]}"
|
||||
# create a docker container
|
||||
container = client.containers.run(
|
||||
image,
|
||||
command=[
|
||||
"sh",
|
||||
"-c",
|
||||
f"python {filename}; exit_code=$?; echo -n {exit_code_str}; echo -n $exit_code; echo {exit_code_str}",
|
||||
],
|
||||
working_dir="/workspace",
|
||||
detach=True,
|
||||
# get absolute path to the working directory
|
||||
volumes={abs_path: {"bind": "/workspace", "mode": "rw"}},
|
||||
)
|
||||
start_time = time.time()
|
||||
while container.status != "exited" and time.time() - start_time < timeout:
|
||||
# Reload the container object
|
||||
container.reload()
|
||||
if container.status != "exited":
|
||||
container.stop()
|
||||
container.remove()
|
||||
if original_filename is None:
|
||||
os.remove(filepath)
|
||||
return 1, b"Timeout"
|
||||
# try:
|
||||
# container.wait(timeout=timeout)
|
||||
# except (ReadTimeout, ConnectionError):
|
||||
# container.stop()
|
||||
# container.remove()
|
||||
# if original_filename is None:
|
||||
# os.remove(filepath)
|
||||
# return 1, "Timeout"
|
||||
# get the container logs
|
||||
logs = container.logs().decode("utf-8").rstrip()
|
||||
# remove the container
|
||||
container.remove()
|
||||
# check if the code executed successfully
|
||||
exit_code = container.attrs["State"]["ExitCode"]
|
||||
if exit_code == 0:
|
||||
# extract the exit code from the logs
|
||||
pattern = re.compile(f"{exit_code_str}(\\d+){exit_code_str}")
|
||||
match = pattern.search(logs)
|
||||
exit_code = int(match.group(1))
|
||||
# remove the exit code from the logs
|
||||
logs = pattern.sub("", logs)
|
||||
|
||||
logs = bytes(logs, "utf-8")
|
||||
if original_filename is None:
|
||||
os.remove(filepath)
|
||||
# return the exit code and logs
|
||||
return exit_code, logs
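The exit-code handling in the docker branch can be illustrated in isolation. This is a sketch under assumptions: the `logs` string below is simulated rather than read from a container, and `re.escape` is added around the marker, which the diff does not do (there the "." in the timestamp acts as a wildcard, which still matches).

```python
import re
import time

# Marker derived from the current time, as in the diff, so it is unlikely
# to collide with anything the executed program prints.
exit_code_str = f"exitcode{time.time()}"

# Simulated container logs: program output, then the wrapped exit code,
# in the shape produced by the `sh -c` command above.
logs = f"hello\n{exit_code_str}3{exit_code_str}"

# Find "<marker><digits><marker>" and pull out the digits.
pattern = re.compile(f"{re.escape(exit_code_str)}(\\d+){re.escape(exit_code_str)}")
match = pattern.search(logs)
exit_code = int(match.group(1))

# Strip the wrapped exit code so only the program's own output remains.
clean_logs = pattern.sub("", logs).rstrip()
print(exit_code, repr(clean_logs))
```

Wrapping the code in a marker is needed because the shell command itself exits with 0 once the final `echo` succeeds, so the container's exit status no longer reflects the Python script's status.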
|
||||
|
||||
|
||||
def generate_assertions(definition: str, model: Optional[str] = "gpt-3.5-turbo") -> Tuple[str, float]:
|
||||
_GENERATE_ASSERTIONS_CONFIG = {
|
||||
"prompt": """Given the signature and docstring, write the exactly same number of assertion(s) for the provided example(s) in the docstring, without assertion messages.
|
||||
|
||||
func signature:
|
||||
{definition}
|
||||
assertions:""",
|
||||
"model": FAST_MODEL,
|
||||
"max_tokens": 256,
|
||||
"stop": "\n\n",
|
||||
}
|
||||
|
||||
|
||||
def generate_assertions(definition: str, **config) -> Tuple[str, float]:
|
||||
"""Generate assertions for a function.
|
||||
|
||||
Args:
|
||||
definition (str): The function definition, including the signature and docstr.
|
||||
model (str): The model used for generation.
|
||||
config (Optional, dict): The configuration for the API call.
|
||||
|
||||
Returns:
|
||||
str: The generated assertions.
|
||||
float: The cost of the generation.
|
||||
"""
|
||||
prompt = """Given the signature and docstring, write the exactly same number of assertion(s) for the provided example(s) in the docstring, without assertion messages.
|
||||
|
||||
func signature:
|
||||
{definition}
|
||||
assertions:"""
|
||||
params = {**_GENERATE_ASSERTIONS_CONFIG, **config}
|
||||
response = oai.Completion.create(
|
||||
{"definition": definition},
|
||||
model=model,
|
||||
prompt=prompt,
|
||||
max_tokens=256,
|
||||
stop="\n\n",
|
||||
**params,
|
||||
)
|
||||
cost = oai.Completion.cost(model, response)
|
||||
cost = oai.Completion.cost(response)
|
||||
assertions = oai.Completion.extract_text(response)[0]
|
||||
return assertions, cost
|
||||
|
||||
@@ -70,6 +301,8 @@ def eval_function_completions(
|
||||
test: Optional[str] = None,
|
||||
entry_point: Optional[str] = None,
|
||||
assertions: Optional[Union[str, Callable[[str], Tuple[str, float]]]] = None,
|
||||
timeout: Optional[float] = 3,
|
||||
use_docker: Optional[bool] = True,
|
||||
) -> Dict:
|
||||
"""Select a response from a list of responses for the function completion task (using generated assertions), and/or evaluate if the task is successful using a gold test.
|
||||
|
||||
@@ -80,6 +313,7 @@ def eval_function_completions(
|
||||
entry_point (Optional, str): The name of the function.
|
||||
assertions (Optional, str or Callable): The assertion code which serves as a filter of the responses, or an assertion generator.
|
||||
When provided, only the responses that pass the assertions will be considered for the actual test (if provided).
|
||||
timeout (Optional, float): The timeout for executing the code.
|
||||
|
||||
Returns:
|
||||
dict: The success metrics.
|
||||
@@ -95,7 +329,7 @@ def eval_function_completions(
|
||||
if response.startswith("def")
|
||||
else f"{definition}{response}\n{test}\ncheck({entry_point})"
|
||||
)
|
||||
success = execute_code(code)
|
||||
success = execute_code(code, timeout=timeout, use_docker=use_docker)[0] == 0
|
||||
success_list.append(success)
|
||||
return {
|
||||
"expected_success": 1 - pow(1 - sum(success_list) / n, n),
|
||||
@@ -112,7 +346,7 @@ def eval_function_completions(
|
||||
code = (
|
||||
f"{response}\n{assertions}" if response.startswith("def") else f"{definition}{response}\n{assertions}"
|
||||
)
|
||||
succeed_assertions = execute_code(code)
|
||||
succeed_assertions = execute_code(code, timeout=timeout, use_docker=use_docker)[0] == 0
|
||||
if succeed_assertions:
|
||||
break
|
||||
else:
|
||||
@@ -132,7 +366,7 @@ def eval_function_completions(
|
||||
if response.startswith("def")
|
||||
else f"{definition}{response}\n{test}\ncheck({entry_point})"
|
||||
)
|
||||
success = execute_code(code_test)
|
||||
success = execute_code(code_test, timeout=timeout, use_docker=use_docker)[0] == 0
|
||||
return {
|
||||
"index_selected": i,
|
||||
"succeed_assertions": succeed_assertions,
|
||||
@@ -142,9 +376,20 @@ def eval_function_completions(
|
||||
}
|
||||
|
||||
|
||||
_FUNC_COMPLETION_PROMPT = "# Python 3{definition}"
|
||||
_FUNC_COMPLETION_STOP = ["\nclass", "\ndef", "\nif", "\nprint"]
|
||||
_IMPLEMENT_CONFIGS = [
|
||||
{"model": FAST_MODEL, "prompt": _FUNC_COMPLETION_PROMPT, "temperature": 0, "seed": 0},
|
||||
{"model": FAST_MODEL, "prompt": _FUNC_COMPLETION_PROMPT, "stop": _FUNC_COMPLETION_STOP, "n": 7, "seed": 0},
|
||||
{"model": DEFAULT_MODEL, "prompt": _FUNC_COMPLETION_PROMPT, "temperature": 0, "seed": 1},
|
||||
{"model": DEFAULT_MODEL, "prompt": _FUNC_COMPLETION_PROMPT, "stop": _FUNC_COMPLETION_STOP, "n": 2, "seed": 2},
|
||||
{"model": DEFAULT_MODEL, "prompt": _FUNC_COMPLETION_PROMPT, "stop": _FUNC_COMPLETION_STOP, "n": 1, "seed": 2},
|
||||
]
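One plausible reading of the ordered list above is a cascade: cheaper configurations are tried first and stronger ones only if the earlier responses fail a check. The sketch below illustrates that idea with stubs and makes no API calls; `cascade`, `fake_generate`, `passes`, and the model names and costs are all hypothetical, and the actual selection logic of `implement` is only partly visible in this diff.

```python
from typing import Callable, Dict, List, Tuple


def cascade(
    configs: List[Dict],
    generate: Callable[[Dict], Tuple[List[str], float]],
    passes: Callable[[str], bool],
) -> Tuple[str, float, int]:
    """Try configs in order; return (response, total cost, index of config used)."""
    if not configs:
        raise ValueError("at least one config is required")
    cost = 0.0
    for i, config in enumerate(configs):
        responses, step_cost = generate(config)
        cost += step_cost
        for response in responses:
            if passes(response):
                return response, cost, i
    # Nothing passed: fall back to the first response of the last config.
    return responses[0], cost, i


def fake_generate(config: Dict) -> Tuple[List[str], float]:
    # Stub standing in for a model call: the "fast" model gets it wrong.
    if config["model"] == "fast":
        return ["def f(x): return x"], 0.001
    return ["def f(x): return x + 1"], 0.03


def passes(response: str) -> bool:
    # Stub standing in for running generated assertions.
    return "x + 1" in response


configs = [{"model": "fast"}, {"model": "strong"}]
result, total_cost, index = cascade(configs, fake_generate, passes)
print(index, result)
```

The accumulated cost includes the failed cheap attempt, which is the trade-off a cascade makes: a small extra cost on hard inputs in exchange for avoiding the expensive model on easy ones.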
|
||||
|
||||
|
||||
def implement(
|
||||
definition: str,
|
||||
configs: List[Dict],
|
||||
configs: Optional[List[Dict]] = None,
|
||||
assertions: Optional[Union[str, Callable[[str], Tuple[str, float]]]] = generate_assertions,
|
||||
) -> Tuple[str, float]:
|
||||
"""Implement a function from a definition.
|
||||
@@ -160,11 +405,12 @@ def implement(
|
||||
int: The index of the configuration which generates the implementation.
|
||||
"""
|
||||
cost = 0
|
||||
configs = configs or _IMPLEMENT_CONFIGS
|
||||
if len(configs) > 1 and callable(assertions):
|
||||
assertions, cost = assertions(definition)
|
||||
for i, config in enumerate(configs):
|
||||
response = oai.Completion.create({"definition": definition}, **config)
|
||||
cost += oai.Completion.cost(config["model"], response)
|
||||
cost += oai.Completion.cost(response)
|
||||
responses = oai.Completion.extract_text(response)
|
||||
metrics = eval_function_completions(responses, definition, assertions=assertions)
|
||||
assertions = metrics["assertions"]
|
||||
|
||||
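Reviewer note: the `implement` hunk above walks a list of configurations in order and accumulates cost as it goes. A minimal sketch of that cascade pattern follows, assuming the loop stops at the first configuration whose output passes the checks and otherwise returns the last attempt. `implement_with_cascade`, `generate`, and `passes` are hypothetical names for illustration, not part of `flaml.autogen.code_utils`.

```python
from typing import Callable, Dict, List, Optional, Tuple


def implement_with_cascade(
    definition: str,
    configs: List[Dict],
    generate: Callable[[str, Dict], Tuple[str, float]],
    passes: Callable[[str], bool],
) -> Tuple[Optional[str], float, int]:
    """Try configs in order (cheapest first) and stop at the first passing result.

    `generate` and `passes` are hypothetical stand-ins for the completion call
    and the assertion-based evaluation shown in the diff.
    """
    cost = 0.0
    last = len(configs) - 1
    for i, config in enumerate(configs):
        code, call_cost = generate(definition, config)
        cost += call_cost
        # Return on success, or fall back to the last config's output.
        if passes(code) or i == last:
            return code, cost, i
    # Only reached when `configs` is empty.
    return None, cost, -1
```

Ordering the list from cheapest to most expensive keeps the expected cost low when the cheap model is often good enough.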
0
flaml/autogen/extensions/__init__.py
Normal file
@@ -1,4 +1,28 @@
|
||||
from typing import Optional
|
||||
from flaml.autogen import oai, DEFAULT_MODEL
|
||||
|
||||
_MATH_PROMPT = "{problem} Solve the problem carefully. Simplify your answer as much as possible. Put the final answer in \\boxed{{}}."
|
||||
_MATH_CONFIG = {
|
||||
"model": DEFAULT_MODEL,
|
||||
"prompt": _MATH_PROMPT,
|
||||
}
|
||||
|
||||
|
||||
def solve_problem(problem: str, **config) -> str:
|
||||
"""(work in progress) Solve the math problem.
|
||||
|
||||
Args:
|
||||
problem (str): The problem statement.
|
||||
config (Optional, dict): The configuration for the API call.
|
||||
|
||||
Returns:
|
||||
str: The solution to the problem.
|
||||
"""
|
||||
params = {**_MATH_CONFIG, **config}
|
||||
response = oai.Completion.create({"problem": problem}, **params)
|
||||
cost = oai.Completion.cost(response)
|
||||
results = eval_math_responses(oai.Completion.extract_text(response))
|
||||
return results.get("voted_answer"), cost
|
||||
|
||||
|
||||
def remove_boxed(string: str) -> Optional[str]:
|
||||
|
||||
@@ -4,6 +4,7 @@ import numpy as np
|
||||
import time
|
||||
from typing import List, Optional, Dict
|
||||
import sys
|
||||
import json
|
||||
from flaml import tune, BlendSearch
|
||||
from flaml.automl.logger import logger_formatter
|
||||
|
||||
@@ -17,11 +18,13 @@ try:
|
||||
APIConnectionError,
|
||||
Timeout,
|
||||
)
|
||||
from openai import Completion as openai_Completion
|
||||
import diskcache
|
||||
|
||||
ERROR = None
|
||||
except ImportError:
|
||||
ERROR = ImportError("please install flaml[openai] option to use the flaml.oai subpackage.")
|
||||
openai_Completion = object
|
||||
logger = logging.getLogger(__name__)
|
||||
if not logger.handlers:
|
||||
# Add the console handler.
|
||||
@@ -39,14 +42,15 @@ def get_key(config):
|
||||
Returns:
|
||||
tuple: A unique identifier which can be used as a key for a dict.
|
||||
"""
|
||||
if isinstance(config, dict):
|
||||
return tuple(get_key(x) for x in sorted(config.items()))
|
||||
if isinstance(config, list):
|
||||
return tuple(get_key(x) for x in config)
|
||||
return config
|
||||
# if isinstance(config, dict):
|
||||
# return tuple(get_key(x) for x in sorted(config.items()))
|
||||
# if isinstance(config, list):
|
||||
# return tuple(get_key(x) for x in config)
|
||||
# return config
|
||||
return json.dumps(config, sort_keys=True)
|
||||
|
||||
|
||||
class Completion:
|
||||
class Completion(openai_Completion):
|
||||
"""A class for OpenAI completion API.
|
||||
|
||||
It also supports: ChatCompletion, Azure OpenAI API.
|
||||
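Reviewer note: the `get_key` hunk above replaces the recursive tuple construction with `json.dumps(config, sort_keys=True)`. A small sketch of why that works as a cache key, assuming the config holds only JSON-serializable values:

```python
import json


def get_key(config):
    """Serialize a config into a string that is stable under dict key order."""
    return json.dumps(config, sort_keys=True)


# Same content, different key order at both nesting levels.
a = get_key({"model": "gpt-4", "messages": [{"role": "user", "content": "Hi"}]})
b = get_key({"messages": [{"content": "Hi", "role": "user"}], "model": "gpt-4"})

# List order is still significant, which matters for chat messages.
c = get_key([1, 2])
d = get_key([2, 1])
```

Here `a` and `b` are identical strings because `sort_keys` applies to nested dictionaries as well, while `c` and `d` differ. The resulting string is hashable, so it can be used directly as a dictionary or disk-cache key.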
@@ -115,6 +119,8 @@ class Completion:
|
||||
_total_cost = 0
|
||||
optimization_budget = None
|
||||
|
||||
_history_dict = _count_create = None
|
||||
|
||||
@classmethod
|
||||
def set_cache(cls, seed=41, cache_path=".cache"):
|
||||
"""Set cache path.
|
||||
@@ -129,62 +135,113 @@ class Completion:
|
||||
cls.cache_path = f"{cache_path}/{seed}"
|
||||
|
||||
@classmethod
|
||||
def _get_response(cls, config: dict, eval_only=False, use_cache=True):
|
||||
def _book_keeping(cls, config: Dict, response):
|
||||
"""Book keeping for the created completions."""
|
||||
if cls._history_dict is None:
|
||||
return
|
||||
if cls._history_compact:
|
||||
value = {
|
||||
"created_at": [],
|
||||
"cost": [],
|
||||
}
|
||||
if "messages" in config:
|
||||
messages = config["messages"]
|
||||
if len(messages) > 1 and messages[-1]["role"] != "assistant":
|
||||
existing_key = get_key(messages[:-1])
|
||||
value = cls._history_dict.pop(existing_key, value)
|
||||
key = get_key(messages + [choice["message"] for choice in response["choices"]])
|
||||
else:
|
||||
key = get_key([config["prompt"]] + [choice.get("text") for choice in response["choices"]])
|
||||
value["created_at"].append(cls._count_create)
|
||||
value["cost"].append(cls.cost(response))
|
||||
cls._history_dict[key] = value
|
||||
cls._count_create += 1
|
||||
return
|
||||
cls._history_dict[cls._count_create] = {
|
||||
"request": config,
|
||||
"response": response.to_dict_recursive(),
|
||||
}
|
||||
cls._count_create += 1
|
||||
|
||||
@classmethod
|
||||
def _get_response(cls, config: Dict, eval_only=False, use_cache=True):
|
||||
"""Get the response from the openai api call.
|
||||
|
||||
Try cache first. If not found, call the openai api. If the api call fails, retry after retry_time.
|
||||
"""
|
||||
config = config.copy()
|
||||
openai.api_key = config.pop("api_key", openai.api_key)
|
||||
openai.api_base = config.pop("api_base", openai.api_base)
|
||||
openai.api_key_path = config.pop("api_key_path", openai.api_key_path)
|
||||
openai.api_type = config.pop("api_type", openai.api_type)
|
||||
openai.api_version = config.pop("api_version", openai.api_version)
|
||||
key = get_key(config)
|
||||
if use_cache:
|
||||
response = cls._cache.get(key, None)
|
||||
if response is not None and (response != -1 or not eval_only):
|
||||
# print("using cached response")
|
||||
cls._book_keeping(config, response)
|
||||
return response
|
||||
openai_completion = openai.ChatCompletion if config["model"] in cls.chat_models else openai.Completion
|
||||
start_time = time.time()
|
||||
request_timeout = cls.request_timeout
|
||||
while True:
|
||||
try:
|
||||
response = openai_completion.create(request_timeout=request_timeout, **config)
|
||||
cls._cache.set(key, response)
|
||||
return response
|
||||
if "request_timeout" in config:
|
||||
response = openai_completion.create(**config)
|
||||
else:
|
||||
response = openai_completion.create(request_timeout=request_timeout, **config)
|
||||
except (
|
||||
ServiceUnavailableError,
|
||||
APIError,
|
||||
APIConnectionError,
|
||||
):
|
||||
# transient error
|
||||
logger.warning(f"retrying in {cls.retry_time} seconds...", exc_info=1)
|
||||
sleep(cls.retry_time)
|
||||
except (RateLimitError, Timeout) as e:
|
||||
except APIError as err:
|
||||
error_code = err and err.json_body and err.json_body.get("error")
|
||||
error_code = error_code and error_code.get("code")
|
||||
if error_code == "content_filter":
|
||||
raise
|
||||
# transient error
|
||||
logger.warning(f"retrying in {cls.retry_time} seconds...", exc_info=1)
|
||||
sleep(cls.retry_time)
|
||||
except (RateLimitError, Timeout) as err:
|
||||
time_left = cls.retry_timeout - (time.time() - start_time + cls.retry_time)
|
||||
if (
|
||||
time_left > 0
|
||||
and isinstance(e, RateLimitError)
|
||||
and isinstance(err, RateLimitError)
|
||||
or time_left > request_timeout
|
||||
and isinstance(e, Timeout)
|
||||
and isinstance(err, Timeout)
|
||||
):
|
||||
logger.info(f"retrying in {cls.retry_time} seconds...", exc_info=1)
|
||||
elif eval_only:
|
||||
raise
|
||||
else:
|
||||
break
|
||||
if isinstance(e, Timeout):
|
||||
if isinstance(err, Timeout):
|
||||
if "request_timeout" in config:
|
||||
raise
|
||||
request_timeout <<= 1
|
||||
request_timeout = min(request_timeout, time_left)
|
||||
sleep(cls.retry_time)
|
||||
except InvalidRequestError:
|
||||
if "azure" == openai.api_type and "model" in config:
|
||||
# azure api uses "engine" instead of "model"
|
||||
config = config.copy()
|
||||
config["engine"] = config.pop("model").replace("gpt-3.5-turbo", "gpt-35-turbo")
|
||||
else:
|
||||
raise
|
||||
else:
|
||||
if use_cache:
|
||||
cls._cache.set(key, response)
|
||||
cls._book_keeping(config, response)
|
||||
return response
|
||||
logger.warning(
|
||||
f"Failed to get response from openai api due to getting RateLimitError or Timeout for {cls.retry_timeout} seconds."
|
||||
)
|
||||
response = -1
|
||||
cls._cache.set(key, response)
|
||||
if use_cache:
|
||||
cls._cache.set(key, response)
|
||||
return response
|
||||
|
||||
@classmethod
|
||||
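Reviewer note: the `_get_response` hunk above combines `try`/`except`/`else` inside a `while True` loop, retrying transient errors unconditionally and rate-limit errors only while a time budget remains. The sketch below isolates that control flow under simplifying assumptions: `TransientError` and `RateLimited` are hypothetical stand-ins for the OpenAI error classes, and the request-timeout doubling and caching are left out.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable server-side error."""


class RateLimited(Exception):
    """Hypothetical stand-in for a rate-limit error."""


def call_with_retry(fn, retry_time=0.0, retry_timeout=1.0):
    """Call `fn` until it succeeds or the retry budget is exhausted.

    Returns the result of `fn`, or -1 when the budget runs out
    (mirroring the sentinel used in the diff).
    """
    start = time.time()
    while True:
        try:
            result = fn()
        except TransientError:
            # Always retry transient errors after a pause.
            time.sleep(retry_time)
        except RateLimited:
            # Retry only if there is still time left after the next pause.
            time_left = retry_timeout - (time.time() - start + retry_time)
            if time_left <= 0:
                return -1
            time.sleep(retry_time)
        else:
            # Runs only when no exception was raised.
            return result
```

Putting the `return` in the `else` branch keeps the success path separate from the handlers, which is the same structure the new version of the hunk adopts.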
@@ -619,17 +676,56 @@ class Completion:
|
||||
return params, analysis
|
||||
|
||||
@classmethod
|
||||
def create(cls, context: Optional[Dict] = None, use_cache: Optional[bool] = True, **config):
|
||||
def create(
|
||||
cls,
|
||||
context: Optional[Dict] = None,
|
||||
use_cache: Optional[bool] = True,
|
||||
config_list: Optional[List] = None,
|
||||
**config,
|
||||
):
|
||||
"""Make a completion for a given context.
|
||||
|
||||
Args:
|
||||
context (dict, Optional): The context to instantiate the prompt.
|
||||
context (Dict, Optional): The context to instantiate the prompt.
|
||||
It needs to contain keys that are used by the prompt template.
|
||||
E.g., `prompt="Complete the following sentence: {prefix}"`.
|
||||
`context={"prefix": "Today I feel"}`.
|
||||
The actual prompt sent to OpenAI will be:
|
||||
E.g., `prompt="Complete the following sentence: {prefix}"`, `context={"prefix": "Today I feel"}`.
|
||||
The actual prompt will be:
|
||||
"Complete the following sentence: Today I feel".
|
||||
More examples can be found at [templating](/docs/Use-Cases/Auto-Generation#templating).
|
||||
use_cache (bool, Optional): Whether to use cached responses.
|
||||
config_list (List, Optional): List of configurations for the completion to try.
|
||||
The first one that does not raise an error will be used.
|
||||
Only the differences from the default config need to be provided.
|
||||
E.g.,
|
||||
|
||||
```python
|
||||
response = oai.Completion.create(
|
||||
config_list=[
|
||||
{
|
||||
"model": "gpt-4",
|
||||
"api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
|
||||
"api_type": "azure",
|
||||
"api_base": os.environ.get("AZURE_OPENAI_API_BASE"),
|
||||
"api_version": "2023-03-15-preview",
|
||||
},
|
||||
{
|
||||
"model": "gpt-3.5-turbo",
|
||||
"api_key": os.environ.get("OPENAI_API_KEY"),
|
||||
"api_type": "open_ai",
|
||||
"api_base": "https://api.openai.com/v1",
|
||||
"api_version": None,
|
||||
},
|
||||
{
|
||||
"model": "llama-7B",
|
||||
"api_base": "http://127.0.0.1:8080",
|
||||
"api_type": "open_ai",
|
||||
"api_version": None,
|
||||
}
|
||||
],
|
||||
prompt="Hi",
|
||||
)
|
||||
```
|
||||
|
||||
**config: Configuration for the completion.
|
||||
Besides the parameters for the openai API call, it can also contain a seed (int) for the cache.
|
||||
This is useful when implementing "controlled randomness" for the completion.
|
||||
@@ -640,6 +736,21 @@ class Completion:
|
||||
"""
|
||||
if ERROR:
|
||||
raise ERROR
|
||||
if config_list:
|
||||
retry_timeout = cls.retry_timeout
|
||||
for i, each_config in enumerate(config_list):
|
||||
base_config = config.copy()
|
||||
base_config.update(each_config)
|
||||
try:
|
||||
cls.retry_timeout = 0 if i < len(config_list) - 1 else retry_timeout
|
||||
# retry_timeout = 0 to avoid retrying
|
||||
return cls.create(context, use_cache, **base_config)
|
||||
except (RateLimitError, Timeout):
|
||||
logger.info(f"failed with config {i}", exc_info=1)
|
||||
if i == len(config_list) - 1:
|
||||
raise
|
||||
finally:
|
||||
cls.retry_timeout = retry_timeout
|
||||
params = cls._construct_params(context, config)
|
||||
if not use_cache:
|
||||
return cls._get_response(params, eval_only=True, use_cache=False)
|
||||
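Reviewer note: the `config_list` branch above overlays each entry on the shared keyword arguments and moves on to the next entry when a rate limit or timeout occurs, re-raising only for the last one. A minimal sketch of that fallback pattern, where `call` and `RateLimited` are hypothetical placeholders for the real API call and error types:

```python
class RateLimited(Exception):
    """Hypothetical stand-in for a rate-limit or timeout error."""


def create_with_fallback(config_list, call, **base_config):
    """Try each config in order and return the first successful result."""
    last = len(config_list) - 1
    for i, each_config in enumerate(config_list):
        # Entries only need to list what differs from the shared config.
        merged = {**base_config, **each_config}
        try:
            return call(**merged)
        except RateLimited:
            if i == last:
                # Nothing left to fall back to.
                raise
```

Because the merge builds a fresh dictionary per attempt, one entry's overrides never leak into the next attempt.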
@@ -764,13 +875,12 @@ class Completion:
|
||||
result_agg, responses_list, result_list = {}, [], []
|
||||
metric_keys = None
|
||||
cost = 0
|
||||
model = config["model"]
|
||||
old_level = logger.getEffectiveLevel()
|
||||
logger.setLevel(logging_level)
|
||||
for i, data_i in enumerate(data):
|
||||
logger.info(f"evaluating data instance {i}")
|
||||
response = cls.create(data_i, use_cache, **config)
|
||||
cost += cls.cost(model, response)
|
||||
cost += cls.cost(response)
|
||||
# evaluate the quality of the responses
|
||||
responses = cls.extract_text(response)
|
||||
if eval_func is not None:
|
||||
@@ -829,16 +939,16 @@ class Completion:
|
||||
return result_agg
|
||||
|
||||
@classmethod
|
||||
def cost(cls, model: str, response: dict):
|
||||
def cost(cls, response: dict):
|
||||
"""Compute the cost of an API call.
|
||||
|
||||
Args:
|
||||
model (str): The model name.
|
||||
response (dict): The response from OpenAI API.
|
||||
|
||||
Returns:
|
||||
The cost in USD.
|
||||
"""
|
||||
model = response["model"]
|
||||
if model not in cls.price1K:
|
||||
raise ValueError(f"Unknown model: {model}")
|
||||
usage = response["usage"]
|
||||
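Reviewer note: after this change `cost` reads the model name from the response itself, so callers no longer pass it. The sketch below illustrates that idea under stated assumptions: the price table values and model names are made up, and the per-token arithmetic is a plausible reconstruction rather than the code elided from the hunk.

```python
# Hypothetical prices in USD per 1K tokens; a tuple means (prompt, completion).
PRICE_PER_1K = {"model-a": 0.002, "model-b": (0.03, 0.06)}


def cost(response):
    """Compute the cost of one API call from the response alone."""
    model = response["model"]
    if model not in PRICE_PER_1K:
        raise ValueError(f"Unknown model: {model}")
    usage = response["usage"]
    n_input = usage["prompt_tokens"]
    n_output = usage.get("completion_tokens", 0)
    price = PRICE_PER_1K[model]
    if isinstance(price, tuple):
        return (price[0] * n_input + price[1] * n_output) / 1000
    return price * (n_input + n_output) / 1000
```

Deriving the model from the response also keeps the cost correct when a fallback configuration, rather than the requested one, produced the answer.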
@@ -864,6 +974,68 @@ class Completion:
|
||||
return [choice["text"] for choice in choices]
|
||||
return [choice["message"].get("content", "") for choice in choices]
|
||||
|
||||
@classmethod
|
||||
@property
|
||||
def logged_history(cls) -> Dict:
|
||||
"""Return the book keeping dictionary."""
|
||||
return cls._history_dict
|
||||
|
||||
@classmethod
|
||||
def start_logging(
|
||||
cls, history_dict: Optional[Dict] = None, compact: Optional[bool] = True, reset_counter: Optional[bool] = True
|
||||
):
|
||||
"""Start book keeping.
|
||||
|
||||
Args:
|
||||
history_dict (Dict): A dictionary for book keeping.
|
||||
If not provided, a new one will be created.
|
||||
compact (bool): Whether to keep the history dictionary compact.
|
||||
Compact history contains one key per conversation, and the value is a dictionary
|
||||
like:
|
||||
```python
|
||||
{
|
||||
"created_at": [0, 1],
|
||||
"cost": [0.1, 0.2],
|
||||
}
|
||||
```
|
||||
where "created_at" is the index of API calls indicating the order of all the calls,
|
||||
and "cost" is the cost of each call. This example shows that the conversation is based
|
||||
on two API calls. The compact format is useful for condensing the history of a conversation.
|
||||
If compact is False, the history dictionary will contain all the API calls: the key
|
||||
is the index of the API call, and the value is a dictionary like:
|
||||
```python
|
||||
{
|
||||
"request": request_dict,
|
||||
"response": response_dict,
|
||||
}
|
||||
```
|
||||
where request_dict is the request sent to OpenAI API, and response_dict is the response.
|
||||
For a conversation containing two API calls, the non-compact history dictionary will be like:
|
||||
```python
|
||||
{
|
||||
0: {
|
||||
"request": request_dict_0,
|
||||
"response": response_dict_0,
|
||||
},
|
||||
1: {
|
||||
"request": request_dict_1,
|
||||
"response": response_dict_1,
|
||||
},
|
||||
}
|
||||
```
|
||||
The first request's messages plus the response is equal to the second request's messages.
|
||||
For a conversation with many turns, the non-compact history dictionary has a quadratic size
|
||||
while the compact history dict has a linear size.
|
||||
reset_counter (bool): whether to reset the counter of the number of API calls.
|
||||
"""
|
||||
cls._history_dict = {} if history_dict is None else history_dict
|
||||
cls._history_compact = compact
|
||||
cls._count_create = 0 if reset_counter or cls._count_create is None else cls._count_create
|
||||
|
||||
@classmethod
|
||||
def stop_logging(cls):
|
||||
"""End book keeping."""
|
||||
cls._history_dict = cls._count_create = None
|
||||
|
||||
|
||||
class ChatCompletion(Completion):
|
||||
"""A class for OpenAI API ChatCompletion."""
|
||||
|
||||
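Reviewer note: the compact history format described in the `start_logging` docstring keeps one entry per conversation by replacing the previous entry's key whenever a call extends it. The sketch below reproduces that merging step with plain dictionaries. `record_compact` is a hypothetical helper that simplifies `_book_keeping` to a single reply per call.

```python
import json


def get_key(obj):
    return json.dumps(obj, sort_keys=True)


def record_compact(history, count, messages, reply, cost):
    """Record one chat call in compact form and return the next call index."""
    value = {"created_at": [], "cost": []}
    if len(messages) > 1 and messages[-1]["role"] != "assistant":
        # The conversation so far (without the newest message) may already be logged.
        value = history.pop(get_key(messages[:-1]), value)
    value["created_at"].append(count)
    value["cost"].append(cost)
    history[get_key(messages + [reply])] = value
    return count + 1


history = {}
first = [{"role": "user", "content": "hi"}]
reply_1 = {"role": "assistant", "content": "hello"}
count = record_compact(history, 0, first, reply_1, 0.1)

second = first + [reply_1, {"role": "user", "content": "more"}]
reply_2 = {"role": "assistant", "content": "sure"}
count = record_compact(history, count, second, reply_2, 0.2)
```

After both calls `history` holds a single key, the full conversation, whose value lists both call indices and both costs. This is why the compact form grows linearly with the number of turns.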
@@ -341,6 +341,9 @@ class AutoML(BaseEstimator):
|
||||
}
|
||||
}
|
||||
```
|
||||
mlflow_logging: boolean, default=True | Whether to log the training results to mlflow.
|
||||
This requires mlflow to be installed and to have an active mlflow run.
|
||||
FLAML will create nested runs.
|
||||
|
||||
"""
|
||||
self._track_iter = 0
|
||||
@@ -390,6 +393,7 @@ class AutoML(BaseEstimator):
|
||||
settings["fit_kwargs_by_estimator"] = settings.get("fit_kwargs_by_estimator", {})
|
||||
settings["custom_hp"] = settings.get("custom_hp", {})
|
||||
settings["skip_transform"] = settings.get("skip_transform", False)
|
||||
settings["mlflow_logging"] = settings.get("mlflow_logging", True)
|
||||
|
||||
self._estimator_type = "classifier" if settings["task"] in CLASSIFICATION else "regressor"
|
||||
|
||||
@@ -1213,6 +1217,7 @@ class AutoML(BaseEstimator):
|
||||
custom_hp=None,
|
||||
cv_score_agg_func=None,
|
||||
skip_transform=None,
|
||||
mlflow_logging=None,
|
||||
fit_kwargs_by_estimator=None,
|
||||
**fit_kwargs,
|
||||
):
|
||||
@@ -1474,6 +1479,11 @@ class AutoML(BaseEstimator):
|
||||
```
|
||||
|
||||
skip_transform: boolean, default=False | Whether to pre-process data prior to modeling.
|
||||
mlflow_logging: boolean, default=None | Whether to log the training results to mlflow.
|
||||
Default value is None, which means the logging decision is made based on
|
||||
AutoML.__init__'s mlflow_logging argument.
|
||||
This requires mlflow to be installed and to have an active mlflow run.
|
||||
FLAML will create nested runs.
|
||||
fit_kwargs_by_estimator: dict, default=None | The user specified keywords arguments, grouped by estimator name.
|
||||
For TransformersEstimator, available fit_kwargs can be found from
|
||||
[TrainingArgumentsForAuto](nlp/huggingface/training_args).
|
||||
@@ -1659,6 +1669,7 @@ class AutoML(BaseEstimator):
|
||||
self._state.fit_kwargs = fit_kwargs
|
||||
custom_hp = custom_hp or self._settings.get("custom_hp")
|
||||
self._skip_transform = self._settings.get("skip_transform") if skip_transform is None else skip_transform
|
||||
self._mlflow_logging = self._settings.get("mlflow_logging") if mlflow_logging is None else mlflow_logging
|
||||
fit_kwargs_by_estimator = fit_kwargs_by_estimator or self._settings.get("fit_kwargs_by_estimator")
|
||||
self._state.fit_kwargs_by_estimator = fit_kwargs_by_estimator.copy() # shallow copy of fit_kwargs_by_estimator
|
||||
self._state.weight_val = sample_weight_val
|
||||
@@ -2139,7 +2150,7 @@ class AutoML(BaseEstimator):
|
||||
estimator,
|
||||
search_state.sample_size,
|
||||
)
|
||||
if mlflow is not None and mlflow.active_run():
|
||||
if self._mlflow_logging and mlflow is not None and mlflow.active_run():
|
||||
with mlflow.start_run(nested=True):
|
||||
mlflow.log_metric("iter_counter", self._track_iter)
|
||||
if (search_state.metric_for_logging is not None) and (
|
||||
|
||||
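Reviewer note: `mlflow_logging` is resolved with an explicit `is None` check rather than `or`, so that passing `False` to `fit` overrides a `True` set in the constructor. A small sketch of the difference, where `resolve_flag` is a hypothetical helper name:

```python
def resolve_flag(fit_value, settings, name="mlflow_logging"):
    """Use the per-call value unless it is None, then fall back to the settings."""
    return settings.get(name) if fit_value is None else fit_value


settings = {"mlflow_logging": True}

inherited = resolve_flag(None, settings)    # falls back to the constructor setting
disabled = resolve_flag(False, settings)    # explicit False is respected
with_or = False or settings["mlflow_logging"]  # the `or` idiom would ignore False
```

The `or` idiom is fine for values such as dictionaries, where an empty value and `None` can be treated alike, but it silently discards an explicit `False`.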
@@ -1135,9 +1135,8 @@ class TransformersEstimator(BaseEstimator):
|
||||
predictions = new_trainer.predict(test_dataset).predictions
|
||||
except ZeroDivisionError:
|
||||
logger.warning("Zero division error appeared in HuggingFace Transformers.")
|
||||
predictions = np.array([-0.05] * len(test_dataset))
|
||||
else:
|
||||
return predictions
|
||||
predictions = None
|
||||
return predictions
|
||||
|
||||
def score(self, X_val: DataFrame, y_val: Series, **kwargs):
|
||||
import transformers
|
||||
@@ -1169,14 +1168,13 @@ class TransformersEstimator(BaseEstimator):
|
||||
|
||||
kwargs = {} if self._task not in NLG_TASKS else {"metric_key_prefix": "predict"}
|
||||
try:
|
||||
predictions = new_trainer.predict(test_dataset, **kwargs)
|
||||
predictions = new_trainer.predict(test_dataset, **kwargs).predictions
|
||||
except ZeroDivisionError:
|
||||
logger.warning("Zero division error appeared in HuggingFace Transformers.")
|
||||
predictions = np.array([0] * len(test_dataset))
|
||||
|
||||
predictions = None
|
||||
post_y_pred, _ = postprocess_prediction_and_true(
|
||||
task=self._task,
|
||||
y_pred=predictions.predictions,
|
||||
y_pred=predictions,
|
||||
tokenizer=self.tokenizer,
|
||||
hf_args=self._training_args,
|
||||
X=X,
|
||||
@@ -2326,10 +2324,7 @@ class HoltWinters(ARIMA):
|
||||
if self.params["trend"] == "mul" and (train_df.y == 0).sum() > 0:
|
||||
self.params["trend"] = "add"
|
||||
|
||||
if not self.params["seasonal"] or not self.params["trend"] in [
|
||||
"mul",
|
||||
"add",
|
||||
]:
|
||||
if not self.params["seasonal"] or self.params["trend"] not in ["mul", "add"]:
|
||||
self.params["damped_trend"] = False
|
||||
|
||||
model = HWExponentialSmoothing(
|
||||
|
||||
@@ -311,6 +311,8 @@ def tokenize_swag(this_row, tokenizer, hf_args=None, return_column_name=False):
|
||||
|
||||
def postprocess_prediction_and_true(task, y_pred, tokenizer, hf_args, y_true=None, X=None):
|
||||
# postprocess the matrix prediction y_pred and ground truth y_true into user readable format, e.g., for summarization, decode into text
|
||||
if y_pred is None:
|
||||
return np.array([0.0] * len(X)), y_true
|
||||
if task == SEQCLASSIFICATION:
|
||||
return np.argmax(y_pred, axis=1), y_true
|
||||
elif task == SEQREGRESSION:
|
||||
|
||||
@@ -647,14 +647,10 @@ def run(
|
||||
time_start = time.time()
|
||||
try:
|
||||
FLAML_MAX_CONCURRENT = int(os.getenv("FLAML_MAX_CONCURRENT", 0))
|
||||
num_executors = max(num_executors, FLAML_MAX_CONCURRENT, 1)
|
||||
except ValueError:
|
||||
FLAML_MAX_CONCURRENT = 0
|
||||
max_spark_parallelism = (
|
||||
min(spark.sparkContext.defaultParallelism, FLAML_MAX_CONCURRENT)
|
||||
if FLAML_MAX_CONCURRENT > 0
|
||||
else spark.sparkContext.defaultParallelism
|
||||
)
|
||||
num_executors = max(num_executors, FLAML_MAX_CONCURRENT, 1)
|
||||
max_spark_parallelism = max(spark.sparkContext.defaultParallelism, FLAML_MAX_CONCURRENT)
|
||||
if scheduler:
|
||||
scheduler.set_search_properties(metric=metric, mode=mode)
|
||||
if isinstance(search_alg, ConcurrencyLimiter):
|
||||
|
||||
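Reviewer note: the hunk above reads `FLAML_MAX_CONCURRENT` from the environment and guards the integer conversion. A minimal sketch of that parsing step follows, assuming an unset or malformed value should behave like 0. `max_concurrent_from_env` is a hypothetical helper, not a FLAML function.

```python
import os


def max_concurrent_from_env(name="FLAML_MAX_CONCURRENT"):
    """Return the integer value of the environment variable, or 0 if unusable."""
    try:
        return int(os.getenv(name, 0))
    except ValueError:
        # A non-numeric value is treated the same as an unset variable.
        return 0


def effective_executors(num_executors, name="FLAML_MAX_CONCURRENT"):
    """Never go below one executor, and honor a larger value from the environment."""
    return max(num_executors, max_concurrent_from_env(name), 1)
```

Keeping the conversion in its own function makes the fallback testable without touching Spark.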
@@ -1 +1 @@
|
||||
__version__ = "1.2.1"
|
||||
__version__ = "1.2.3"
|
||||
|
||||
@@ -21,9 +21,9 @@
|
||||
"\n",
|
||||
"## Requirements\n",
|
||||
"\n",
|
||||
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [openai] option:\n",
|
||||
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [openai,blendsearch] option:\n",
|
||||
"```bash\n",
|
||||
"pip install flaml[openai]==1.2.0\n",
|
||||
"pip install flaml[openai,blendsearch]==1.2.2\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
@@ -40,7 +40,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# %pip install flaml[openai]==1.2.0 datasets"
|
||||
"# %pip install flaml[openai,blendsearch]==1.2.2 datasets"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
@@ -21,9 +21,9 @@
|
||||
"\n",
|
||||
"## Requirements\n",
|
||||
"\n",
|
||||
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [openai] option:\n",
|
||||
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [autogen,blendsearch] option:\n",
|
||||
"```bash\n",
|
||||
"pip install flaml[openai]==1.2.0\n",
|
||||
"pip install flaml[autogen,blendsearch]==1.2.2\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
@@ -40,7 +40,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# %pip install flaml[openai]==1.2.0 datasets"
|
||||
"# %pip install flaml[autogen,blendsearch]==1.2.2 datasets"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -297,7 +297,13 @@
|
||||
"from functools import partial\n",
|
||||
"from flaml.autogen.code_utils import eval_function_completions, generate_assertions\n",
|
||||
"\n",
|
||||
"eval_with_generated_assertions = partial(eval_function_completions, assertions=generate_assertions)"
|
||||
"eval_with_generated_assertions = partial(\n",
|
||||
" eval_function_completions,\n",
|
||||
" assertions=generate_assertions,\n",
|
||||
" use_docker=False,\n",
|
||||
" # Please set use_docker=True if you have docker available to run the generated code.\n",
|
||||
" # Using docker is safer than running the generated code directly.\n",
|
||||
")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -19,9 +19,9 @@
|
||||
"\n",
|
||||
"## Requirements\n",
|
||||
"\n",
|
||||
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [openai] option:\n",
|
||||
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [autogen] option:\n",
|
||||
"```bash\n",
|
||||
"pip install flaml[openai]==1.2.0\n",
|
||||
"pip install flaml[autogen]==1.2.2\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
@@ -38,7 +38,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# %pip install flaml[openai]==1.2.0 datasets"
|
||||
"# %pip install flaml[autogen]==1.2.2 datasets"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -381,7 +381,7 @@
|
||||
"success = 0\n",
|
||||
"for i, d in enumerate(data):\n",
|
||||
" response, cost_i, j = implement(d[\"definition\"], configs)\n",
|
||||
" metrics = eval_function_completions(responses=[response], **d)\n",
|
||||
" metrics = eval_function_completions(responses=[response], use_docker=False, **d)\n",
|
||||
" success += metrics[\"success\"]\n",
|
||||
" cost += cost_i\n",
|
||||
" print(f\"Example {i}, config {j}, success {success}\")\n",
|
||||
|
||||
@@ -21,7 +21,7 @@
|
||||
"\n",
|
||||
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [openai] option:\n",
|
||||
"```bash\n",
|
||||
"pip install flaml[openai]==1.2.0\n",
|
||||
"pip install flaml[openai]==1.2.2\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
@@ -38,7 +38,7 @@
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# %pip install flaml[openai]==1.2.0 datasets"
|
||||
"# %pip install flaml[openai]==1.2.2 datasets"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
51
pyproject.toml
Normal file
@@ -0,0 +1,51 @@
|
||||
[metadata]
|
||||
license_file = "LICENSE"
|
||||
description-file = "README.md"
|
||||
|
||||
|
||||
[tool.pytest.ini_options]
|
||||
addopts = '-m "not conda"'
|
||||
markers = [
|
||||
"conda: test related to conda forge distribution"
|
||||
]
|
||||
|
||||
[tool.black]
|
||||
# https://github.com/psf/black
|
||||
line-length = 120
|
||||
exclude = "(.eggs|.git|.hg|.mypy_cache|.venv|_build|buck-out|build|dist)"
|
||||
|
||||
|
||||
[tool.ruff]
|
||||
line-length = 120
|
||||
# Enable pycodestyle (`E`, `W`) and Pyflakes (`F`) codes.
|
||||
select = [
|
||||
"E", "W", # see: https://pypi.org/project/pycodestyle
|
||||
"F", # see: https://pypi.org/project/pyflakes
|
||||
# "D", # see: https://pypi.org/project/pydocstyle
|
||||
# "N", # see: https://pypi.org/project/pep8-naming
|
||||
# "S", # see: https://pypi.org/project/flake8-bandit
|
||||
]
|
||||
ignore = [
|
||||
"E501",
|
||||
"F401",
|
||||
"F403",
|
||||
"C901",
|
||||
]
|
||||
# Exclude a variety of commonly ignored directories.
|
||||
exclude = [
|
||||
".eggs",
|
||||
".git",
|
||||
".mypy_cache",
|
||||
".ruff_cache",
|
||||
"__pypackages__",
|
||||
"_build",
|
||||
"build",
|
||||
"dist",
|
||||
"docs"
|
||||
]
|
||||
ignore-init-module-imports = true
|
||||
unfixable = ["F401"]
|
||||
|
||||
[tool.ruff.mccabe]
|
||||
# Unlike Flake8, default to a complexity level of 10.
|
||||
max-complexity = 10
|
||||
@@ -1,4 +0,0 @@
|
||||
[pytest]
|
||||
addopts = -m "not conda"
|
||||
markers =
|
||||
conda: test related to conda forge distribution
|
||||
7
setup.py
@@ -49,14 +49,13 @@ setuptools.setup(
|
||||
"joblibspark>=0.5.0",
|
||||
],
|
||||
"test": [
|
||||
"flake8>=3.8.4",
|
||||
"thop",
|
||||
"pytest>=6.1.1",
|
||||
"coverage>=5.3",
|
||||
"pre-commit",
|
||||
"torch",
|
||||
"torchvision",
|
||||
"catboost>=0.26",
|
||||
"catboost>=0.26,<1.2",
|
||||
"rgf-python",
|
||||
"optuna==2.8.0",
|
||||
"openml==0.10.2",
|
||||
@@ -77,6 +76,7 @@ setuptools.setup(
|
||||
"nbformat",
|
||||
"ipykernel",
|
||||
"pytorch-lightning<1.9.1", # test_forecast_panel
|
||||
"requests<2.29.0", # https://github.com/docker/docker-py/issues/3113
|
||||
],
|
||||
"catboost": ["catboost>=0.26"],
|
||||
"blendsearch": ["optuna==2.8.0"],
|
||||
@@ -120,7 +120,8 @@ setuptools.setup(
|
||||
"pytorch-forecasting>=0.9.0",
|
||||
],
|
||||
"benchmark": ["catboost>=0.26", "psutil==5.8.0", "xgboost==1.3.3"],
|
||||
"openai": ["openai==0.27.4", "diskcache", "optuna==2.8.0"],
|
||||
"openai": ["openai==0.27.4", "diskcache"],
|
||||
"autogen": ["openai==0.27.4", "diskcache", "docker"],
|
||||
"synapse": ["joblibspark>=0.5.0", "optuna==2.8.0", "pyspark>=3.2.0"],
|
||||
},
|
||||
classifiers=[
|
||||
|
||||
0
test/autogen/extensions/__init__.py
Normal file
77
test/autogen/extensions/tsp.py
Normal file
@@ -0,0 +1,77 @@
|
||||
"""Solve a non-symmetric TSP problem.
|
||||
|
||||
The triangle inequality is not required in this problem.
|
||||
"""
|
||||
import math
|
||||
import pdb
|
||||
import random
|
||||
import sys
|
||||
from itertools import combinations, permutations
|
||||
|
||||
|
||||
def solve_tsp(dists: dict) -> float:
|
||||
"""Solve the TSP problem
|
||||
|
||||
Args:
|
||||
dists (dict): the pairwise distance matrix. Each item in the
|
||||
dict maps a pair (node A, node B) to the distance from A to B.
|
||||
|
||||
Returns:
|
||||
float: the optimal cost
|
||||
"""
|
||||
# Get the unique nodes from the distance matrix
|
||||
nodes = set()
|
||||
for pair in dists.keys():
|
||||
nodes.add(pair[0])
|
||||
nodes.add(pair[1])
|
||||
|
||||
# Generate all possible routes (permutations of nodes)
|
||||
routes = permutations(nodes)
|
||||
|
||||
# Initialize the optimal cost as infinite
|
||||
optimal_cost = float("inf")
|
||||
optimal_route = None
|
||||
|
||||
# Iterate through all possible routes
|
||||
for route in routes:
|
||||
cost = 0
|
||||
# Calculate the cost of the current route
|
||||
for i in range(len(route)):
|
||||
current_node = route[i]
|
||||
next_node = route[(i + 1) % len(route)]
|
||||
cost += dists[(current_node, next_node)]
|
||||
|
||||
# Update the optimal cost if the current cost is smaller
|
||||
if cost < optimal_cost:
|
||||
optimal_cost = cost
|
||||
optimal_route = route
|
||||
|
||||
print("Cost:", optimal_cost, "with route", optimal_route)
|
||||
return optimal_cost
|
||||
|
||||
|
||||
def tsp_data(n: int, seed: int = 2022) -> dict:
|
||||
"""Generate some sample data for the non-symmetric TSP problem.
|
||||
|
||||
Args:
|
||||
n (int): number of nodes in the problem
|
||||
seed (int): the random seed.
|
||||
|
||||
Returns:
|
||||
dict: the pairwise distance matrix.
|
||||
"""
|
||||
# Initialize the random seed
|
||||
random.seed(seed)
|
||||
|
||||
# Initialize the distance matrix
|
||||
dist_matrix = {}
|
||||
|
||||
# Generate distances for each pair of nodes
|
||||
for i in range(n):
|
||||
for j in range(n):
|
||||
if i != j:
|
||||
# Generate a random distance between nodes i and j
|
||||
distance = round(random.uniform(1, 100), 2)
|
||||
dist_matrix[(i, j)] = distance
|
||||
|
||||
return dist_matrix
|
||||
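Reviewer note: `solve_tsp` above enumerates every permutation of the nodes, so each cyclic tour is evaluated once per rotation. The sketch below is a variant, not the file's implementation: it fixes the first node to avoid those repeats while still handling asymmetric distances. `tour_cost` and `solve_tsp_fixed_start` are hypothetical names, and the input is assumed to contain at least two nodes with a distance for every ordered pair.

```python
from itertools import permutations


def tour_cost(route, dists):
    """Sum the distances along the route, including the edge back to the start."""
    n = len(route)
    return sum(dists[(route[i], route[(i + 1) % n])] for i in range(n))


def solve_tsp_fixed_start(dists):
    """Brute-force the optimal tour cost, fixing the first node to skip rotations."""
    nodes = sorted({node for pair in dists for node in pair})
    first, rest = nodes[0], nodes[1:]
    return min(tour_cost((first,) + perm, dists) for perm in permutations(rest))


# Asymmetric 3-node instance: the clockwise cycle is cheap, the reverse is not.
example = {
    (0, 1): 1, (1, 2): 2, (2, 0): 3,
    (1, 0): 10, (2, 1): 10, (0, 2): 10,
}
best = solve_tsp_fixed_start(example)
```

Fixing the start reduces the search from n! to (n-1)! routes. Both directions of each cycle are still checked, which is required when the distance from A to B differs from B to A.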
35
test/autogen/extensions/tsp_api.py
Normal file
@@ -0,0 +1,35 @@
|
||||
from .tsp import tsp_data
|
||||
|
||||
|
||||
def change_dist(dist: dict, i: int, j: int, new_cost: float) -> float:
|
||||
"""Change the distance between two points.
|
||||
|
||||
Args:
|
||||
dist (dict): distance matrix, where the key is a pair and value is
|
||||
the cost (aka, distance).
|
||||
i (int): the source node
|
||||
j (int): the destination node
|
||||
new_cost (float): the new cost for the distance
|
||||
|
||||
Returns:
|
||||
float: the previous cost
|
||||
"""
|
||||
prev_cost = dist[i, j]
|
||||
dist[i, j] = new_cost
|
||||
return prev_cost
|
||||
|
||||
|
||||
def compare_costs(prev_cost, new_cost) -> float:
|
||||
"""Compare the previous cost and the new cost.
|
||||
|
||||
Args:
|
||||
prev_cost (float): the previous cost
|
||||
new_cost (float): the updated cost
|
||||
|
||||
Returns:
|
||||
float: the ratio between these two costs
|
||||
"""
|
||||
return (new_cost - prev_cost) / prev_cost
|
||||
|
||||
|
||||
dists = tsp_data(5, seed=1)
|
||||
72
test/autogen/test_agent.py
Normal file
@@ -0,0 +1,72 @@
|
||||
from flaml.autogen.code_utils import extract_code
|
||||
from flaml import oai
|
||||
|
||||
|
||||
def test_extract_code():
|
||||
print(extract_code("```bash\npython temp.py\n```"))
|
||||
|
||||
|
||||
def test_coding_agent():
|
||||
try:
|
||||
import openai
|
||||
except ImportError:
|
||||
return
|
||||
from flaml.autogen.agent.coding_agent import PythonAgent
|
||||
from flaml.autogen.agent.agent import Agent
|
||||
|
||||
conversations = {}
|
||||
oai.ChatCompletion.start_logging(conversations)
|
||||
agent = PythonAgent("coding_agent")
|
||||
user = Agent("user")
|
||||
agent.receive(
|
||||
"""Create a temp.py file with the following content:
|
||||
```
|
||||
print('Hello world!')
|
||||
```""",
|
||||
user,
|
||||
)
|
||||
print(conversations)
|
||||
oai.ChatCompletion.start_logging(compact=False)
|
||||
agent.receive("""Execute temp.py""", user)
|
||||
print(oai.ChatCompletion.logged_history)
|
||||
oai.ChatCompletion.stop_logging()
|
||||
|
||||
|
||||
def test_tsp():
|
||||
try:
|
||||
import openai
|
||||
except ImportError:
|
||||
return
|
||||
from flaml.autogen.agent.coding_agent import PythonAgent
|
||||
from flaml.autogen.agent.agent import Agent
|
||||
|
||||
hard_questions = [
|
||||
"What if we must go from node 1 to node 2?",
|
||||
"Can we double all distances?",
|
||||
"Can we add a new point to the graph? It's distance should be randomly between 0 - 5 to each of the existing points.",
|
||||
]
|
||||
|
||||
oai.ChatCompletion.start_logging()
|
||||
agent = PythonAgent("coding_agent", work_dir="test/autogen", temperature=0)
|
||||
user = Agent("user")
|
||||
with open("test/autogen/tsp_prompt.txt", "r") as f:
|
||||
prompt = f.read()
|
||||
# agent.receive(prompt.format(question=hard_questions[0]), user)
|
||||
# agent.receive(prompt.format(question=hard_questions[1]), user)
|
||||
agent.receive(prompt.format(question=hard_questions[2]), user)
|
||||
print(oai.ChatCompletion.logged_history)
|
||||
oai.ChatCompletion.stop_logging()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
import openai
|
||||
|
||||
openai.api_key_path = "test/openai/key.txt"
|
||||
# if you use Azure OpenAI, comment the above line and uncomment the following lines
|
||||
# openai.api_type = "azure"
|
||||
# openai.api_base = "https://<your_endpoint>.openai.azure.com/"
|
||||
# openai.api_version = "2023-03-15-preview" # change if necessary
|
||||
# openai.api_key = "<your_api_key>"
|
||||
# test_extract_code()
|
||||
test_coding_agent()
|
||||
test_tsp()
|
||||
115
test/autogen/tsp_prompt.txt
Normal file
@@ -0,0 +1,115 @@
|
||||
|
||||
Now, we have a system to solve TSP problems. Let's try to solve a problem.
|
||||
|
||||
Given a distance dictionary `dists`, where the key is a pair of nodes and the
|
||||
value is the distance between them. For example, `dists[(1, 2)]` is the distance
|
||||
between node 1 and node 2. We want to find the optimal cost for the TSP problem.
|
||||
|
||||
The users might have some questions regarding the solution. So, you are
|
||||
responsible for writing code to answer their questions. Note that you usually
|
||||
would need to run `solve_tsp` and `compare_costs` to compare the costs before
|
||||
and after the change.
|
||||
|
||||
Here are the functions and their information that you can use directly:
|
||||
|
||||
----------
|
||||
def change_dist(dist: dict, i: int, j: int, new_cost: float) -> float:
|
||||
"""Change the distance between two points.
|
||||
|
||||
Args:
|
||||
dist (dict): distance matrix, where the key is a pair and value is
|
||||
the cost (aka, distance).
|
||||
i (int): the source node
|
||||
j (int): the destination node
|
||||
new_cost (float): the new cost for the distance
|
||||
|
||||
Returns:
|
||||
float: the previous cost
|
||||
"""
|
||||
----------
|
||||
|
||||
----------
|
||||
def compare_costs(prev_cost, new_cost) -> float:
|
||||
"""Compare the previous cost and the new cost.
|
||||
|
||||
Args:
|
||||
prev_cost (float): the previous cost
|
||||
new_cost (float): the updated cost
|
||||
|
||||
Returns:
|
||||
float: the ratio between these two costs
|
||||
"""
|
||||
----------
|
||||
|
||||
----------
|
||||
def solve_tsp(dists: dict) -> float:
|
||||
"""Solve the TSP problem
|
||||
|
||||
Args:
|
||||
dists (dict): the distance matrix between each nodes. Each item in the
|
||||
dict is a pair (node A, node B) to the distance from A to B.
|
||||
|
||||
Returns:
|
||||
float: the optimal cost
|
||||
"""
|
||||
----------
|
||||
|
||||
|
||||
We also provide some sample questions and answers here:
|
||||
----------
|
||||
Question: Why should we go from point 1 to point 2?
|
||||
Code:
|
||||
```
|
||||
from extensions.tsp import solve_tsp
|
||||
from extensions.tsp_api import change_dist, compare_costs, dists
|
||||
prev_cost=solve_tsp(dists)
|
||||
change_dist(dists, 1, 2, float('inf'))
|
||||
new_cost = solve_tsp(dists)
|
||||
gap = compare_costs(prev_cost, new_cost)
|
||||
print('If not, then the cost will increase', gap * 100, 'percent.')
|
||||
```
|
||||
|
||||
----------
|
||||
Question: Can we double the distance between point 4 and 2?
|
||||
Code:
|
||||
```
|
||||
from extensions.tsp import solve_tsp
|
||||
from extensions.tsp_api import change_dist, compare_costs, dists
|
||||
prev_cost=solve_tsp(dists)
|
||||
change_dist(dists, 4, 2, dists[(4, 2)] * 2)
|
||||
new_cost = solve_tsp(dists)
|
||||
gap = compare_costs(prev_cost, new_cost)
|
||||
print('If we double the distance between 4 and 2, then the cost will decrease', - gap * 100, 'percent.')
|
||||
```
|
||||
|
||||
----------
|
||||
Question: what would happen if we remove point 2?
|
||||
Code:
|
||||
```
|
||||
from extensions.tsp import solve_tsp
|
||||
from extensions.tsp_api import compare_costs, dists
|
||||
prev_cost=solve_tsp(dists)
|
||||
for i, j in list(dists.keys()):
|
||||
if i == 2 or j == 2:
|
||||
del dists[i, j] # remove the edge cost
|
||||
new_cost = solve_tsp(dists)
|
||||
gap = compare_costs(prev_cost, new_cost)
|
||||
print('If we remove point 2, then the cost will decrease', - gap * 100, 'percent.')
|
||||
```
|
||||
|
||||
----------
|
||||
Question: What if the edge between point 2 to 3 is removed?
|
||||
Code:
|
||||
```
|
||||
from extensions.tsp import solve_tsp
|
||||
from extensions.tsp_api import change_dist, compare_costs, dists
|
||||
prev_cost=solve_tsp(dists)
|
||||
change_dist(dists, 2, 3, float('inf'))
|
||||
new_cost = solve_tsp(dists)
|
||||
gap = compare_costs(prev_cost, new_cost)
|
||||
print('If we remove the edge, then the cost will increase', gap * 100, 'percent.')
|
||||
```
|
||||
|
||||
Now, answer the questions by using Python code:
|
||||
Question: {question}
|
||||
Code:
|
||||
64
test/automl/test_mlflow.py
Normal file
@@ -0,0 +1,64 @@
|
||||
import pytest
|
||||
from pandas import DataFrame
|
||||
from sklearn.datasets import load_iris
|
||||
import mlflow
|
||||
import mlflow.entities
|
||||
from flaml import AutoML
|
||||
|
||||
|
||||
class TestMLFlowLoggingParam:
|
||||
def test_should_start_new_run_by_default(self, automl_settings):
|
||||
with mlflow.start_run():
|
||||
parent = mlflow.last_active_run()
|
||||
automl = AutoML()
|
||||
X_train, y_train = load_iris(return_X_y=True)
|
||||
automl.fit(X_train=X_train, y_train=y_train, **automl_settings)
|
||||
|
||||
children = self._get_child_runs(parent)
|
||||
assert len(children) >= 1, "Expected at least 1 child run, got {}".format(len(children))
|
||||
|
||||
def test_should_not_start_new_run_when_mlflow_logging_set_to_false_in_init(self, automl_settings):
|
||||
with mlflow.start_run():
|
||||
parent = mlflow.last_active_run()
|
||||
automl = AutoML(mlflow_logging=False)
|
||||
X_train, y_train = load_iris(return_X_y=True)
|
||||
automl.fit(X_train=X_train, y_train=y_train, **automl_settings)
|
||||
|
||||
children = self._get_child_runs(parent)
|
||||
assert len(children) == 0, "Expected 0 child runs, got {}".format(len(children))
|
||||
|
||||
def test_should_not_start_new_run_when_mlflow_logging_set_to_false_in_fit(self, automl_settings):
|
||||
with mlflow.start_run():
|
||||
parent = mlflow.last_active_run()
|
||||
automl = AutoML()
|
||||
X_train, y_train = load_iris(return_X_y=True)
|
||||
automl.fit(X_train=X_train, y_train=y_train, mlflow_logging=False, **automl_settings)
|
||||
|
||||
children = self._get_child_runs(parent)
|
||||
assert len(children) == 0, "Expected 0 child runs, got {}".format(len(children))
|
||||
|
||||
def test_should_start_new_run_when_mlflow_logging_set_to_true_in_fit(self, automl_settings):
|
||||
with mlflow.start_run():
|
||||
parent = mlflow.last_active_run()
|
||||
automl = AutoML(mlflow_logging=False)
|
||||
X_train, y_train = load_iris(return_X_y=True)
|
||||
automl.fit(X_train=X_train, y_train=y_train, mlflow_logging=True, **automl_settings)
|
||||
|
||||
children = self._get_child_runs(parent)
|
||||
assert len(children) >= 1, "Expected at least 1 child run, got {}".format(len(children))
|
||||
|
||||
@staticmethod
|
||||
def _get_child_runs(parent_run: mlflow.entities.Run) -> DataFrame:
|
||||
experiment_id = parent_run.info.experiment_id
|
||||
return mlflow.search_runs(
|
||||
[experiment_id], filter_string="tags.mlflow.parentRunId = '{}'".format(parent_run.info.run_id)
|
||||
)
|
||||
|
||||
@pytest.fixture(scope="class")
|
||||
def automl_settings(self):
|
||||
return {
|
||||
"time_budget": 2, # in seconds
|
||||
"metric": "accuracy",
|
||||
"task": "classification",
|
||||
"log_file_name": "iris.log",
|
||||
}
|
||||
@@ -3,13 +3,118 @@ import sys
|
||||
import numpy as np
|
||||
import pytest
|
||||
from functools import partial
|
||||
import os
|
||||
from flaml import oai
|
||||
from flaml.autogen.code_utils import (
|
||||
eval_function_completions,
|
||||
generate_assertions,
|
||||
implement,
|
||||
generate_code,
|
||||
extract_code,
|
||||
improve_function,
|
||||
improve_code,
|
||||
execute_code,
|
||||
)
|
||||
from flaml.autogen.math_utils import eval_math_responses
|
||||
from flaml.autogen.math_utils import eval_math_responses, solve_problem
|
||||
|
||||
|
||||
def test_multi_model():
|
||||
try:
|
||||
import openai
|
||||
except ImportError as exc:
|
||||
print(exc)
|
||||
return
|
||||
response = oai.Completion.create(
|
||||
config_list=[
|
||||
{
|
||||
"model": "gpt-4",
|
||||
"api_key": os.environ.get("OPENAI_API_KEY"),
|
||||
"api_type": "open_ai",
|
||||
"api_base": "https://api.openai.com/v1",
|
||||
"api_version": None,
|
||||
},
|
||||
{
|
||||
"model": "gpt-4",
|
||||
"api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
|
||||
"api_type": "azure",
|
||||
"api_base": os.environ.get("AZURE_OPENAI_API_BASE"),
|
||||
"api_version": "2023-03-15-preview",
|
||||
},
|
||||
{
|
||||
"model": "gpt-3.5-turbo",
|
||||
"api_key": os.environ.get("OPENAI_API_KEY"),
|
||||
"api_type": "open_ai",
|
||||
"api_base": "https://api.openai.com/v1",
|
||||
"api_version": None,
|
||||
},
|
||||
{
|
||||
"model": "gpt-3.5-turbo",
|
||||
"api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
|
||||
"api_type": "azure",
|
||||
"api_base": os.environ.get("AZURE_OPENAI_API_BASE"),
|
||||
"api_version": "2023-03-15-preview",
|
||||
},
|
||||
],
|
||||
prompt="Hi",
|
||||
)
|
||||
print(response)
|
||||
|
||||
|
||||
@pytest.mark.skipif(
|
||||
sys.platform in ["darwin", "win32"],
|
||||
reason="do not run on MacOS or windows",
|
||||
)
|
||||
def test_execute_code():
|
||||
try:
|
||||
import docker
|
||||
except ImportError as exc:
|
||||
print(exc)
|
||||
return
|
||||
exitcode, msg = execute_code("print('hello world')", filename="tmp/codetest.py")
|
||||
assert exitcode == 0 and msg == b"hello world\n", msg
|
||||
# read a file
|
||||
print(execute_code("with open('tmp/codetest.py', 'r') as f: a=f.read()"))
|
||||
# create a file
|
||||
print(execute_code("with open('tmp/codetest.py', 'w') as f: f.write('b=1')", work_dir="test/openai/my_tmp"))
|
||||
# execute code in a file
|
||||
print(execute_code(filename="tmp/codetest.py"))
|
||||
# execute code for assertion error
|
||||
exit_code, msg = execute_code("assert 1==2")
|
||||
assert exit_code, msg
|
||||
# execute code which takes a long time
|
||||
exit_code, error = execute_code("import time; time.sleep(2)", timeout=1)
|
||||
assert exit_code and error == "Timeout"
|
||||
exit_code, error = execute_code("import time; time.sleep(2)", timeout=1, use_docker=False)
|
||||
assert exit_code and error == "Timeout"
|
||||
|
||||
|
||||
def test_improve():
|
||||
try:
|
||||
import openai
|
||||
import diskcache
|
||||
except ImportError as exc:
|
||||
print(exc)
|
||||
return
|
||||
improved, _ = improve_function(
|
||||
"flaml/autogen/math_utils.py",
|
||||
"solve_problem",
|
||||
"Solve math problems accurately, by avoiding calculation errors and reduce reasoning errors.",
|
||||
)
|
||||
with open("test/openai/math_utils.py.improved", "w") as f:
|
||||
f.write(improved)
|
||||
suggestion, _ = improve_code(
|
||||
["flaml/autogen/code_utils.py", "flaml/autogen/math_utils.py"],
|
||||
"leverage generative AI smartly and cost-effectively",
|
||||
)
|
||||
print(suggestion)
|
||||
improvement, cost = improve_code(
|
||||
["flaml/autogen/code_utils.py", "flaml/autogen/math_utils.py"],
|
||||
"leverage generative AI smartly and cost-effectively",
|
||||
suggest_only=False,
|
||||
)
|
||||
print(cost)
|
||||
with open("test/openai/suggested_improvement.txt", "w") as f:
|
||||
f.write(improvement)
|
||||
|
||||
|
||||
def test_nocontext():
|
||||
@@ -19,8 +124,59 @@ def test_nocontext():
|
||||
except ImportError as exc:
|
||||
print(exc)
|
||||
return
|
||||
response = oai.Completion.create(model="text-ada-001", prompt="1+1=", max_tokens=1)
|
||||
response = oai.Completion.create(
|
||||
model="text-ada-001", prompt="1+1=", max_tokens=1, use_cache=False, request_timeout=10
|
||||
)
|
||||
print(response)
|
||||
code, _ = generate_code(
|
||||
model="gpt-3.5-turbo",
|
||||
messages=[
|
||||
{
|
||||
"role": "system",
|
||||
"content": "You want to become a better assistant by learning new skills and improving your existing ones.",
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Write reusable code to use web scraping to get information from websites.",
|
||||
},
|
||||
],
|
||||
)
|
||||
print(code)
|
||||
# test extract_code from markdown
|
||||
code, _ = extract_code(
|
||||
"""
|
||||
Example:
|
||||
```
|
||||
print("hello extract code")
|
||||
```
|
||||
"""
|
||||
)
|
||||
print(code)
|
||||
|
||||
code, _ = extract_code(
|
||||
"""
|
||||
Example:
|
||||
```python
|
||||
def scrape(url):
|
||||
import requests
|
||||
from bs4 import BeautifulSoup
|
||||
response = requests.get(url)
|
||||
soup = BeautifulSoup(response.text, "html.parser")
|
||||
title = soup.find("title").text
|
||||
text = soup.find("div", {"id": "bodyContent"}).text
|
||||
return title, text
|
||||
```
|
||||
Test:
|
||||
```python
|
||||
url = "https://en.wikipedia.org/wiki/Web_scraping"
|
||||
title, text = scrape(url)
|
||||
print(f"Title: {title}")
|
||||
print(f"Text: {text}")
|
||||
"""
|
||||
)
|
||||
print(code)
|
||||
solution, cost = solve_problem("1+1=")
|
||||
print(solution, cost)
|
||||
|
||||
|
||||
@pytest.mark.skipif(
|
||||
@@ -102,6 +258,7 @@ def test_humaneval(num_samples=1):
|
||||
inference_budget=0.002,
|
||||
optimization_budget=2,
|
||||
num_samples=num_samples,
|
||||
# logging_level=logging.INFO,
|
||||
prompt=[
|
||||
"{definition}",
|
||||
"# Python 3{definition}",
|
||||
@@ -125,6 +282,13 @@ def test_humaneval(num_samples=1):
|
||||
|
||||
|
||||
def test_math(num_samples=-1):
|
||||
try:
|
||||
import openai
|
||||
import diskcache
|
||||
except ImportError as exc:
|
||||
print(exc)
|
||||
return
|
||||
|
||||
seed = 41
|
||||
data = datasets.load_dataset("competition_math")
|
||||
train_data = data["train"].shuffle(seed=seed)
|
||||
@@ -157,13 +321,6 @@ def test_math(num_samples=-1):
|
||||
% data["problem"]
|
||||
]
|
||||
|
||||
try:
|
||||
import openai
|
||||
import diskcache
|
||||
except ImportError as exc:
|
||||
print(exc)
|
||||
return
|
||||
|
||||
oai.ChatCompletion.set_cache(seed)
|
||||
vanilla_config = {
|
||||
"model": "gpt-3.5-turbo",
|
||||
@@ -175,12 +332,10 @@ def test_math(num_samples=-1):
|
||||
}
|
||||
test_data_sample = test_data[0:3]
|
||||
result = oai.ChatCompletion.test(test_data_sample, vanilla_config, eval_math_responses)
|
||||
test_data_sample = test_data[3:6]
|
||||
result = oai.ChatCompletion.test(
|
||||
test_data_sample,
|
||||
vanilla_config,
|
||||
eval_math_responses,
|
||||
use_cache=False,
|
||||
agg_method="median",
|
||||
)
|
||||
|
||||
@@ -194,14 +349,12 @@ def test_math(num_samples=-1):
|
||||
test_data_sample,
|
||||
vanilla_config,
|
||||
eval_math_responses,
|
||||
use_cache=False,
|
||||
agg_method=my_median,
|
||||
)
|
||||
result = oai.ChatCompletion.test(
|
||||
test_data_sample,
|
||||
vanilla_config,
|
||||
eval_math_responses,
|
||||
use_cache=False,
|
||||
agg_method={
|
||||
"expected_success": my_median,
|
||||
"success": my_average,
|
||||
@@ -233,7 +386,12 @@ def test_math(num_samples=-1):
|
||||
if __name__ == "__main__":
|
||||
import openai
|
||||
|
||||
openai.api_key_path = "test/openai/key.txt"
|
||||
test_nocontext()
|
||||
test_humaneval(1)
|
||||
test_math(1)
|
||||
openai.api_key = os.environ["OPENAI_API_KEY"] = open("test/openai/key.txt").read().strip()
|
||||
os.environ["AZURE_OPENAI_API_KEY"] = open("test/openai/key_azure.txt").read().strip()
|
||||
os.environ["AZURE_OPENAI_API_BASE"] = open("test/openai/base_azure.txt").read().strip()
|
||||
# test_multi_model()
|
||||
# test_execute_code()
|
||||
test_improve()
|
||||
# test_nocontext()
|
||||
# test_humaneval(1)
|
||||
# test_math(1)
|
||||
|
||||
@@ -2,6 +2,7 @@ import os
|
||||
import sys
|
||||
import warnings
|
||||
import pytest
|
||||
import mlflow
|
||||
import sklearn.datasets as skds
|
||||
from flaml import AutoML
|
||||
from flaml.tune.spark.utils import check_spark
|
||||
@@ -18,10 +19,15 @@ else:
|
||||
|
||||
spark = (
|
||||
pyspark.sql.SparkSession.builder.appName("MyApp")
|
||||
.master("local[1]")
|
||||
.master("local[2]")
|
||||
.config(
|
||||
"spark.jars.packages",
|
||||
"com.microsoft.azure:synapseml_2.12:0.10.2,org.apache.hadoop:hadoop-azure:3.3.5,com.microsoft.azure:azure-storage:8.6.6",
|
||||
(
|
||||
"com.microsoft.azure:synapseml_2.12:0.10.2,"
|
||||
"org.apache.hadoop:hadoop-azure:3.3.5,"
|
||||
"com.microsoft.azure:azure-storage:8.6.6,"
|
||||
f"org.mlflow:mlflow-spark:{mlflow.__version__}"
|
||||
),
|
||||
)
|
||||
.config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
|
||||
.config("spark.sql.debug.maxToStringFields", "100")
|
||||
@@ -29,6 +35,10 @@ else:
|
||||
.config("spark.executor.extraJavaOptions", "-Xss1m")
|
||||
.getOrCreate()
|
||||
)
|
||||
spark.sparkContext._conf.set(
|
||||
"spark.mlflow.pysparkml.autolog.logModelAllowlistFile",
|
||||
"https://mmlspark.blob.core.windows.net/publicwasb/log_model_allowlist.txt",
|
||||
)
|
||||
# spark.sparkContext.setLogLevel("ERROR")
|
||||
spark_available, _ = check_spark()
|
||||
skip_spark = not spark_available
|
||||
|
||||
@@ -187,14 +187,10 @@ def test_n_current_trials():
|
||||
def get_n_current_trials(n_concurrent_trials=0, num_executors=num_executors):
|
||||
try:
|
||||
FLAML_MAX_CONCURRENT = int(os.getenv("FLAML_MAX_CONCURRENT", 0))
|
||||
num_executors = max(num_executors, FLAML_MAX_CONCURRENT, 1)
|
||||
except ValueError:
|
||||
FLAML_MAX_CONCURRENT = 0
|
||||
max_spark_parallelism = (
|
||||
min(spark.sparkContext.defaultParallelism, FLAML_MAX_CONCURRENT)
|
||||
if FLAML_MAX_CONCURRENT > 0
|
||||
else spark.sparkContext.defaultParallelism
|
||||
)
|
||||
num_executors = max(num_executors, FLAML_MAX_CONCURRENT, 1)
|
||||
max_spark_parallelism = max(spark.sparkContext.defaultParallelism, FLAML_MAX_CONCURRENT)
|
||||
max_concurrent = max(1, max_spark_parallelism)
|
||||
n_concurrent_trials = min(
|
||||
n_concurrent_trials if n_concurrent_trials > 0 else num_executors,
|
||||
@@ -204,16 +200,21 @@ def test_n_current_trials():
|
||||
return n_concurrent_trials
|
||||
|
||||
os.environ["FLAML_MAX_CONCURRENT"] = "invlaid"
|
||||
assert get_n_current_trials() == num_executors
|
||||
assert get_n_current_trials() == max(num_executors, 1)
|
||||
tmp_max = spark.sparkContext.defaultParallelism
|
||||
assert get_n_current_trials(1) == 1
|
||||
assert get_n_current_trials(2) == min(2, tmp_max)
|
||||
assert get_n_current_trials(50) == min(50, tmp_max)
|
||||
assert get_n_current_trials(200) == min(200, tmp_max)
|
||||
os.environ["FLAML_MAX_CONCURRENT"] = "0"
|
||||
assert get_n_current_trials() == max(num_executors, 1)
|
||||
os.environ["FLAML_MAX_CONCURRENT"] = "4"
|
||||
tmp_max = min(4, spark.sparkContext.defaultParallelism)
|
||||
assert get_n_current_trials() == tmp_max
|
||||
tmp_max = max(4, spark.sparkContext.defaultParallelism)
|
||||
assert get_n_current_trials() == min(4, tmp_max)
|
||||
os.environ["FLAML_MAX_CONCURRENT"] = "9999999"
|
||||
assert get_n_current_trials() == spark.sparkContext.defaultParallelism
|
||||
assert get_n_current_trials() == 9999999
|
||||
os.environ["FLAML_MAX_CONCURRENT"] = "100"
|
||||
tmp_max = min(100, spark.sparkContext.defaultParallelism)
|
||||
tmp_max = max(100, spark.sparkContext.defaultParallelism)
|
||||
assert get_n_current_trials(1) == 1
|
||||
assert get_n_current_trials(2) == min(2, tmp_max)
|
||||
assert get_n_current_trials(50) == min(50, tmp_max)
|
||||
@@ -410,7 +411,7 @@ if __name__ == "__main__":
|
||||
# test_broadcast_code()
|
||||
# test_get_broadcast_data()
|
||||
# test_train_test_split_pyspark()
|
||||
# test_n_current_trials()
|
||||
test_n_current_trials()
|
||||
# test_len_labels()
|
||||
# test_iloc_pandas_on_spark()
|
||||
test_spark_metric_loss_score()
|
||||
|
||||
BIN
website/blog/2023-04-21-LLM-tuning-math/img/level2algebra.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 39 KiB |
BIN
website/blog/2023-04-21-LLM-tuning-math/img/level3algebra.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 40 KiB |
BIN
website/blog/2023-04-21-LLM-tuning-math/img/level4algebra.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 36 KiB |
BIN
website/blog/2023-04-21-LLM-tuning-math/img/level5algebra.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 34 KiB |
74
website/blog/2023-04-21-LLM-tuning-math/index.mdx
Normal file
@@ -0,0 +1,74 @@
|
||||
---
|
||||
title: Does Model and Inference Parameter Matter in LLM Applications? - A Case Study for MATH
|
||||
authors: sonichi
|
||||
tags: [LLM, GPT, research]
|
||||
---
|
||||
|
||||

|
||||
|
||||
**TL;DR:**
|
||||
* **A case study using the MATH benchmark shows that model selection and inference parameters do matter in Large Language Model (LLM) applications.**
|
||||
* **The tuned gpt-3.5-turbo model vastly outperformed untuned gpt-4 in accuracy for easier problems, while gpt-4 was a better choice for the most difficult problems.**
|
||||
* **FLAML can help with model selection, parameter tuning, and cost-saving in LLM applications.**
|
||||
|
||||
|
||||
Large language models (LLMs) are powerful tools that can generate natural language texts for various applications, such as chatbots, summarization, translation, and more. GPT-4 is currently the state-of-the-art LLM. Does that make model selection irrelevant? What about inference parameters?
|
||||
|
||||
In this blog post, we will explore how the model and inference parameters matter in LLM applications, using a case study for [MATH](https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/be83ab3ecd0db773eb2dc1b0a17836a1-Abstract-round2.html), a benchmark for evaluating LLMs on advanced mathematical problem solving. MATH consists of 12K math competition problems from AMC-10, AMC-12 and AIME. Each problem is accompanied by a step-by-step solution.
|
||||
|
||||
We will use the new subpackage [`flaml.autogen`](docs/Use-Cases/Auto-Generation) to automatically find the best model and inference parameters for LLMs on a given task and dataset under a given inference budget, using a novel low-cost search & pruning strategy. FLAML currently supports all the LLMs from OpenAI, such as GPT-3.5 and GPT-4.
|
||||
|
||||
We will use FLAML to perform model selection and inference parameter tuning. Then we compare the performance and inference cost on solving algebra problems with the untuned gpt-4. We will also analyze how different difficulty levels affect the results.
|
||||
|
||||
## Experiment Setup
|
||||
|
||||
We use FLAML to select between the following models with a target inference budget $0.02 per instance:
|
||||
- gpt-3.5-turbo, a relatively cheap model that powers the popular ChatGPT app
|
||||
- gpt-4, the state-of-the-art LLM that costs more than 100 times as much as gpt-3.5-turbo
|
||||
|
||||
We adapt the models using 20 examples in the train set, using the problem statement as the input and generating the solution as the output. We use the following inference parameters:
|
||||
|
||||
- temperature: The parameter that controls the randomness of the output text. A higher temperature means more diversity but less coherence. We search for the optimal temperature in the range of [0, 1].
|
||||
- top_p: The parameter that controls the probability mass of the output tokens. Only tokens with a cumulative probability less than or equal to top-p are considered. A lower top-p means less diversity but more coherence. We search for the optimal top-p in the range of [0, 1].
|
||||
- max_tokens: The maximum number of tokens that can be generated for each output. We search for the optimal max length in the range of [50, 1000].
|
||||
- n: The number of responses to generate. We search for the optimal n in the range of [1, 100].
|
||||
- prompt: We use the template: "{problem} Solve the problem carefully. Simplify your answer as much as possible. Put the final answer in \\boxed{{}}." where {problem} will be replaced by the math problem instance.
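
To make the effect of `top_p` concrete, here is a minimal sketch of nucleus (top-p) filtering over a toy token distribution. It is an illustration under stated assumptions, not OpenAI's implementation: the function name `top_p_filter` and the example probabilities are hypothetical, and real samplers may treat the boundary token differently. This version keeps the smallest set of highest-probability tokens whose cumulative mass reaches `top_p`, then renormalizes.

```python
def top_p_filter(probs: dict, top_p: float) -> dict:
    """Keep the most likely tokens until their cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, mass = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {token: p / mass for token, p in kept}


# Hypothetical next-token distribution.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}

narrow = top_p_filter(probs, 0.5)  # only the single most likely token survives
wide = top_p_filter(probs, 0.9)    # three tokens survive
print(sorted(narrow), sorted(wide))
```

A smaller `top_p` leaves fewer candidate tokens to sample from, so the outputs vary less from one response to the next.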
|
||||
|
||||
In this experiment, when n > 1, we find the answer with the most votes among all the responses and select it as the final answer to compare with the ground truth. For example, if n = 5 and 3 of the responses contain a final answer 301 while 2 of the responses contain a final answer 159, we choose 301 as the final answer. This can help with resolving potential errors due to randomness. We use the average accuracy and average inference cost as the metrics to evaluate the performance over a dataset. The inference cost of a particular instance is measured by the price per 1K tokens and the number of tokens consumed.
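
The voting rule and the cost measure described above can be sketched in a few lines. This is an illustration only, not the code used in the experiment: the helper names `vote` and `inference_cost`, the token count, and the price are all assumptions chosen for the example.

```python
from collections import Counter


def vote(answers):
    """Return the most frequent answer; ties go to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]


def inference_cost(n_tokens: int, price_per_1k: float) -> float:
    """Cost of one instance: tokens consumed times the price per 1K tokens."""
    return n_tokens / 1000 * price_per_1k


# n = 5 responses: three say 301, two say 159, as in the example above.
answers = ["301", "159", "301", "301", "159"]
final_answer = vote(answers)

# Hypothetical numbers: 1500 tokens at 0.002 dollars per 1K tokens.
cost = inference_cost(1500, 0.002)
print(final_answer, cost)
```

Since every one of the n responses consumes tokens, a larger n improves the odds that the voted answer is right but raises the cost per instance, which is why n is tuned against the inference budget.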
|
||||
|
||||
## Experiment Results
|
||||
|
||||
The first figure in this blog post shows the average accuracy and average inference cost of each configuration on the level 2 Algebra test set.
|
||||
|
||||
Surprisingly, the tuned gpt-3.5-turbo model is selected as the better model, and it vastly outperforms untuned gpt-4 in accuracy (92% vs. 70%) with an equal or 2.5 times higher inference budget.
|
||||
The same observation can be obtained on the level 3 Algebra test set.
|
||||
|
||||

|
||||
|
||||
However, the selected model changes on level 4 Algebra.
|
||||
|
||||

|
||||
|
||||
This time gpt-4 is selected as the best model. The tuned gpt-4 achieves much higher accuracy (56% vs. 44%) and lower cost than the untuned gpt-4.
|
||||
On level 5 the result is similar.
|
||||
|
||||

|
||||
|
||||
We can see that FLAML has found different optimal model and inference parameters for each subset of a particular level, which shows that these parameters matter in cost-sensitive LLM applications and need to be carefully tuned or adapted.
|
||||
|
||||
An example notebook to run these experiments can be found at: https://github.com/microsoft/FLAML/blob/v1.2.1/notebook/autogen_chatgpt.ipynb
|
||||
|
||||
## Analysis and Discussion
|
||||
|
||||
While gpt-3.5-turbo demonstrates competitive accuracy with voted answers in relatively easy algebra problems under the same inference budget, gpt-4 is a better choice for the most difficult problems. In general, through parameter tuning and model selection, we can identify the opportunity to save the expensive model for more challenging tasks, and improve the overall effectiveness of a budget-constrained system.
|
||||
|
||||
There are many other alternative ways of solving math problems, which we have not covered in this blog post. When there are choices beyond the inference parameters, they can be generally tuned via [`flaml.tune`](docs/Use-Cases/Tune-User-Defined-Function).
|
||||
|
||||
The need for model selection, parameter tuning and cost saving is not specific to math problems. The [Auto-GPT](https://github.com/Significant-Gravitas/Auto-GPT) project is an example where high cost can easily prevent a generic complex task from being accomplished, as it needs many LLM inference calls.
|
||||
|
||||
## For Further Reading
|
||||
|
||||
* [Research paper about the tuning technique](https://arxiv.org/abs/2303.04673)
|
||||
* [Documentation about `flaml.autogen`](docs/Use-Cases/Auto-Generation)
|
||||
|
||||
*Do you have any experience to share about LLM applications? Would you like to see more support or research on LLM optimization or automation? Please join our [Discord](https://discord.gg/Cppx2vSPVP) server for discussion.*
|
||||
5
website/blog/authors.yml
Normal file
@@ -0,0 +1,5 @@
|
||||
sonichi:
|
||||
name: Chi Wang
|
||||
title: Principal Researcher at Microsoft Research
|
||||
url: https://www.linkedin.com/in/chi-wang-49b15b16/
|
||||
image_url: https://github.com/sonichi.png
|
||||
@@ -5,9 +5,9 @@ In this example, we will tune several hyperparameters for the OpenAI's completio
|
||||
|
||||
### Prerequisites
|
||||
|
||||
Install the [openai] option. The OpenAI integration is in preview.
|
||||
Install the [autogen,blendsearch] option.
|
||||
```bash
|
||||
pip install "flaml[openai]==1.2.0"
|
||||
pip install "flaml[autogen,blendsearch]==1.2.2" datasets
|
||||
```
|
||||
|
||||
Setup your OpenAI key:
|
||||
@@ -64,7 +64,9 @@ Before starting tuning, you need to define the metric for the optimization. For
|
||||
from functools import partial
|
||||
from flaml.autogen.code_utils import eval_function_completions, generate_assertions
|
||||
|
||||
eval_with_generated_assertions = partial(eval_function_completions, assertions=generate_assertions)
|
||||
eval_with_generated_assertions = partial(
|
||||
eval_function_completions, assertions=generate_assertions,
|
||||
)
|
||||
```
|
||||
|
||||
This function will first generate assertion statements for each problem. Then, it uses the assertions to select the generated responses.
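
The idea of using assertions to pick among generated responses can be sketched as follows. This is a simplified illustration and not the `eval_function_completions` implementation: the helpers `passes` and `select_by_assertions` are hypothetical, and the sketch runs code with a bare `exec`, which is unsafe for untrusted model output (a real setup would isolate execution and apply a timeout).

```python
def passes(candidate_src: str, assertions: str) -> bool:
    """Run a candidate followed by the assertions in a scratch namespace."""
    namespace = {}
    try:
        exec(candidate_src + "\n" + assertions, namespace)
    except Exception:
        # Any failure (AssertionError, SyntaxError, ...) rejects the candidate.
        return False
    return True


def select_by_assertions(candidates, assertions):
    """Return the first candidate that passes, else fall back to the first one."""
    for src in candidates:
        if passes(src, assertions):
            return src
    return candidates[0] if candidates else None


# Two hypothetical completions for the same function definition.
candidates = [
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    return a + b",
]
assertions = "assert add(1, 2) == 3"

chosen = select_by_assertions(candidates, assertions)
print(chosen)
```

The fallback to the first candidate is one possible design choice; it guarantees that a response is always returned even when no candidate passes.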
|
||||
|
||||
@@ -2,14 +2,16 @@
|
||||
|
||||
<!-- ### Welcome to FLAML, a Fast Library for Automated Machine Learning & Tuning! -->
|
||||
|
||||
FLAML is a lightweight Python library that finds accurate machine
|
||||
learning models automatically, efficiently and economically. It frees users from selecting models and hyperparameters for each model.
|
||||
FLAML is a lightweight Python library for efficient automation of machine
|
||||
learning, including selection of
|
||||
models, hyperparameters, and other tunable choices of an application.
|
||||
|
||||
### Main Features
|
||||
|
||||
1. For common machine learning or AI tasks like classification, regression, and generation, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks, including foundation models such as the GPT series.
|
||||
2. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code). Users can customize only when and what they need to, and leave the rest to the library.
|
||||
3. It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping. FLAML is powered by a new, [cost-effective
|
||||
* For foundation models like the GPT series, it automates the experimentation and optimization of their inference performance to maximize the effectiveness for downstream applications and minimize the inference cost.
|
||||
* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources.
|
||||
* It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training/inference/evaluation code). Users can customize only when and what they need to, and leave the rest to the library.
|
||||
* It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective
|
||||
hyperparameter optimization](Use-Cases/Tune-User-Defined-Function#hyperparameter-optimization-algorithm)
|
||||
and model selection method invented by Microsoft Research, and many followup [research studies](Research).
|
||||
|
||||
@@ -19,6 +21,27 @@ Install FLAML from pip: `pip install flaml`. Find more options in [Installation]
|
||||
|
||||
There are several ways of using flaml:
|
||||
|
||||
#### (New) [Auto Generation](Use-Cases/Auto-Generation)
|
||||
|
||||
For example, you can optimize generations by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.
|
||||
|
||||
```python
|
||||
from flaml import oai
|
||||
|
||||
config, analysis = oai.Completion.tune(
|
||||
data=tune_data,
|
||||
metric="success",
|
||||
mode="max",
|
||||
eval_func=eval_func,
|
||||
inference_budget=0.05,
|
||||
optimization_budget=3,
|
||||
num_samples=-1,
|
||||
)
|
||||
```
|
||||
|
||||
The automated experimentation and optimization can help you maximize the utility out of these expensive models.
|
||||
A suite of utilities such as caching and templating are offered to accelerate the experimentation and application development.
|
||||
|
||||
#### [Task-oriented AutoML](Use-Cases/task-oriented-automl)
|
||||
|
||||
For example, with three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.
|
||||
@@ -86,33 +109,12 @@ from flaml.default import LGBMClassifier
|
||||
|
||||
Then, you can use it just like you use the original `LGBMClassifier`. Your other code can remain unchanged. When you call the `fit()` function from `flaml.default.LGBMClassifier`, it will automatically instantiate a good data-dependent hyperparameter configuration for your dataset, which is expected to work better than the default configuration.
|
||||
|
||||
#### (New) [Auto Generation](Use-Cases/Auto-Generation)
|
||||
|
||||
You can optimize generations by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.
|
||||
|
||||
```python
|
||||
from flaml import oai
|
||||
|
||||
config, analysis = oai.Completion.tune(
|
||||
data=tune_data,
|
||||
metric="success",
|
||||
mode="max",
|
||||
eval_func=eval_func,
|
||||
inference_budget=0.05,
|
||||
optimization_budget=3,
|
||||
num_samples=-1,
|
||||
)
|
||||
```
|
||||
|
||||
The optimization can help you maximize the utility out of these expensive models.
|
||||
|
||||
### Where to Go Next?
|
||||
|
||||
* Understand the use cases for [Task-oriented AutoML](Use-Cases/task-oriented-automl), [Tune user-defined function](Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](Use-Cases/Zero-Shot-AutoML).
|
||||
* Find code examples under "Examples": from [AutoML - Classification](Examples/AutoML-Classification) to [Tune - PyTorch](Examples/Tune-PyTorch).
|
||||
* Find [talks](https://www.youtube.com/channel/UCfU0zfFXHXdAd5x-WvFBk5A) and [tutorials](https://github.com/microsoft/FLAML/tree/tutorial/tutorial) about FLAML.
|
||||
* Understand the use cases for [Auto Generation](Use-Cases/Auto-Generation), [Task-oriented AutoML](Use-Cases/Task-Oriented-Automl), [Tune user-defined function](Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](Use-Cases/Zero-Shot-AutoML).
|
||||
* Find code examples under "Examples": from [AutoGen - OpenAI](Examples/AutoGen-OpenAI) to [Tune - PyTorch](Examples/Tune-PyTorch).
|
||||
* Learn about [research](Research) around FLAML.
|
||||
* Refer to [SDK](reference/automl/automl) and [FAQ](FAQ).
|
||||
* Chat on [Discord](https://discord.gg/Cppx2vSPVP).
|
||||
|
||||
If you like our project, please give it a [star](https://github.com/microsoft/FLAML/stargazers) on GitHub. If you are interested in contributing, please read [Contributor's Guide](Contribute).
|
||||
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
# Auto Generation
|
||||
|
||||
`flaml.autogen` is a subpackage for automating generation tasks. It uses [`flaml.tune`](../reference/tune/tune) to find good hyperparameter configurations under budget constraints.
|
||||
`flaml.autogen` is a package for automating generation tasks (in preview). It uses [`flaml.tune`](../reference/tune/tune) to find good hyperparameter configurations under budget constraints.
|
||||
Such optimization has several benefits:
|
||||
* Maximize the utility out of using expensive foundation models.
|
||||
* Reduce the inference cost by using cheaper models or configurations which achieve equal or better performance.
|
||||
@@ -98,17 +98,19 @@ config, analysis = oai.Completion.tune(
|
||||
`num_samples` is the number of configurations to sample. -1 means unlimited (until optimization budget is exhausted).
|
||||
The returned `config` contains the optimized configuration and `analysis` contains an [ExperimentAnalysis](../reference/tune/analysis#experimentanalysis-objects) object for all the tried configurations and results.
|
||||
|
||||
## Perform inference with the tuned config
|
||||
The tuend config can be used to perform inference.
|
||||
|
||||
One can use [`flaml.oai.Completion.create`](../reference/autogen/oai/completion#create) to performance inference.
|
||||
## Perform Inference
|
||||
|
||||
One can use [`flaml.oai.Completion.create`](../reference/autogen/oai/completion#create) to perform inference.
|
||||
There are a number of benefits of using `flaml.oai.Completion.create` to perform inference.
|
||||
|
||||
A template is either a format str, or a function which produces a str from several input fields.
|
||||
|
||||
### API unification
|
||||
|
||||
`flaml.oai.Completion.create` is compatible with both `openai.Completion.create` and `openai.ChatCompletion.create`, and both OpenAI API and Azure OpenAI API. So models such as "text-davinci-003", "gpt-3.5-turbo" and "gpt-4" can share a common API. When only tuning the chat-based models, `flaml.oai.ChatCompletion` can be used.
|
||||
|
||||
For local LLMs, one can spin up an endpoint using a package like [simple_ai_server](https://github.com/lhenault/simpleAI), and then use the same API to send a request.
|
||||
|
||||
### Caching
|
||||
|
||||
API call results are cached locally and reused when the same request is issued. This is useful when repeating or continuing experiments for reproducibility and cost saving. It still allows controlled randomness by setting the "seed", using [`set_cache`](../reference/autogen/oai/completion#set_cache) or specifying in `create()`.
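To make the role of the seed concrete, here is a minimal in-memory sketch of the idea, not FLAML's actual cache. It assumes a cache key built from the seed plus the serialized request; `SeededCache` and `fake_api` are hypothetical names introduced only for illustration.

```python
import json


class SeededCache:
    """Toy cache keyed by (seed, request); a stand-in, not FLAML's cache."""

    def __init__(self, seed=0):
        self.seed = seed
        self._store = {}
        self.misses = 0  # number of times the underlying API was really called

    def get_or_call(self, request, call):
        # Serialize with sorted keys so equal requests map to the same key.
        key = (self.seed, json.dumps(request, sort_keys=True))
        if key not in self._store:
            self.misses += 1
            self._store[key] = call(request)
        return self._store[key]


def fake_api(request):
    # Hypothetical stand-in for a paid API call.
    return f"response to {request['prompt']}"


cache = SeededCache(seed=0)
first = cache.get_or_call({"prompt": "Hi"}, fake_api)
second = cache.get_or_call({"prompt": "Hi"}, fake_api)  # served from the cache
cache.seed = 1
third = cache.get_or_call({"prompt": "Hi"}, fake_api)  # new seed, so a new call
```

Repeating a request under the same seed reuses the stored result, while changing the seed forces a fresh call. That is the sense in which randomness stays controlled.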
|
||||
@@ -117,21 +119,241 @@ API call results are cached locally and reused when the same request is issued.
|
||||
|
||||
It is easy to hit errors when calling OpenAI APIs, due to connection issues, rate limits, or timeouts. Some of the errors are transient. `flaml.oai.Completion.create` deals with the transient errors and retries automatically. The initial request timeout, retry timeout and retry time interval can be configured via `flaml.oai.request_timeout`, `flaml.oai.retry_timeout` and `flaml.oai.retry_time`.
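The retry behavior can be pictured with the following sketch. It is not the library's implementation: `TransientError`, `call_with_retry` and `flaky` are hypothetical names, and the parameters `retry_time` and `retry_timeout` only borrow their meaning from the settings named above. The sleep function is injectable so that the example runs without waiting.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a rate-limit, connection, or timeout error."""


def call_with_retry(call, retry_time=10, retry_timeout=120,
                    sleep=time.sleep, clock=time.monotonic):
    """Retry `call` every `retry_time` seconds until `retry_timeout` would be exceeded."""
    start = clock()
    while True:
        try:
            return call()
        except TransientError:
            # Give up if the next wait would exceed the total retry budget.
            if clock() - start + retry_time > retry_timeout:
                raise
            sleep(retry_time)


attempts = {"n": 0}


def flaky():
    # Fails twice with a transient error, then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("rate limit")
    return "ok"


waits = []  # records the requested waits instead of actually sleeping
result = call_with_retry(flaky, retry_time=10, retry_timeout=120, sleep=waits.append)
```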
|
||||
|
||||
Moreover, one can pass a list of configurations of different models/endpoints to mitigate the rate limits. For example,
|
||||
|
||||
```python
|
||||
response = oai.Completion.create(
|
||||
config_list=[
|
||||
{
|
||||
"model": "gpt-4",
|
||||
"api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
|
||||
"api_type": "azure",
|
||||
"api_base": os.environ.get("AZURE_OPENAI_API_BASE"),
|
||||
"api_version": "2023-03-15-preview",
|
||||
},
|
||||
{
|
||||
"model": "gpt-3.5-turbo",
|
||||
"api_key": os.environ.get("OPENAI_API_KEY"),
|
||||
"api_type": "open_ai",
|
||||
"api_base": "https://api.openai.com/v1",
|
||||
"api_version": None,
|
||||
},
|
||||
{
|
||||
"model": "llama-7B",
|
||||
"api_base": "http://127.0.0.1:8080",
|
||||
"api_type": "open_ai",
|
||||
"api_version": None,
|
||||
}
|
||||
],
|
||||
prompt="Hi",
|
||||
)
|
||||
```
|
||||
|
||||
It will try querying Azure OpenAI gpt-4, OpenAI gpt-3.5-turbo, and llama-7B one by one, until a valid result is returned. This can speed up the development process where the rate limit is a bottleneck.
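The one-by-one fallback described above can be sketched as follows. This is an illustration under stated assumptions rather than FLAML's code: `create_with_fallback`, `RateLimited` and `fake_send` are hypothetical, and the sketch treats only a rate-limit error as a reason to move on to the next configuration.

```python
class RateLimited(Exception):
    """Hypothetical error raised when an endpoint refuses a request."""


def create_with_fallback(config_list, send, **request):
    """Try each configuration in order and return the first valid result."""
    last_error = None
    for config in config_list:
        try:
            # Merge the endpoint-specific config with the shared request fields.
            return send({**config, **request})
        except RateLimited as exc:
            last_error = exc  # remember the failure and try the next config
    raise RuntimeError("all configurations failed") from last_error


def fake_send(params):
    # Hypothetical backend: the first model is rate limited, the others respond.
    if params["model"] == "gpt-4":
        raise RateLimited("gpt-4 is rate limited")
    return {"model": params["model"], "text": f"echo: {params['prompt']}"}


response = create_with_fallback(
    [{"model": "gpt-4"}, {"model": "gpt-3.5-turbo"}, {"model": "llama-7B"}],
    fake_send,
    prompt="Hi",
)
```

Here the second configuration answers, so the third is never queried.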
|
||||
|
||||
### Templating
|
||||
|
||||
If the provided prompt or message is a template, it will be automatically materialized with a given context. For example,
|
||||
|
||||
```python
|
||||
response = oai.Completion.create(problme=problem, prompt="{problem} Solve the problem carefully.", **config)
|
||||
response = oai.Completion.create(
|
||||
context={"problem": "How many positive integers, not exceeding 100, are multiples of 2 or 3 but not 4?"},
|
||||
prompt="{problem} Solve the problem carefully.",
|
||||
**config
|
||||
)
|
||||
```
|
||||
|
||||
## Other utilities
|
||||
`flaml.oai.Completion` also offers some additional utilities, such as:
|
||||
A template is either a format str, like the example above, or a function which produces a str from several input fields, like the example below.
|
||||
|
||||
```python
|
||||
def content(turn, **context):
|
||||
return "\n".join(
|
||||
[
|
||||
context[f"user_message_{turn}"],
|
||||
context[f"external_info_{turn}"]
|
||||
]
|
||||
)
|
||||
|
||||
messages = [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "You are a teaching assistant of math.",
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": partial(content, turn=0),
|
||||
},
|
||||
]
|
||||
context = {
|
||||
"user_message_0": "Could you explain the solution to Problem 1?",
|
||||
"external_info_0": "Problem 1: ...",
|
||||
}
|
||||
|
||||
response = oai.ChatCompletion.create(context, messages=messages, **config)
|
||||
messages.append(
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": oai.ChatCompletion.extract_text(response)[0]
|
||||
}
|
||||
)
|
||||
messages.append(
|
||||
{
|
||||
"role": "user",
|
||||
"content": partial(content, turn=1),
|
||||
},
|
||||
)
|
||||
context.update(
|
||||
{
|
||||
"user_message_1": "Why can't we apply Theorem 1 to Equation (2)?",
|
||||
"external_info_1": "Theorem 1: ...",
|
||||
}
|
||||
)
|
||||
response = oai.ChatCompletion.create(context, messages=messages, **config)
|
||||
```
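The two kinds of templates can be materialized by one small helper. The sketch below is an assumption about how such a helper might look, not the library's own code; `instantiate` is a hypothetical name, and the `content` function mirrors the one in the example above.

```python
from functools import partial


def instantiate(template, context):
    """Turn a template into a str: format a str, or call a function with the context."""
    if isinstance(template, str):
        # Extra keys in the context are ignored by str.format.
        return template.format(**context)
    return template(**context)


def content(turn, **context):
    # Join the user message and the external info for the given turn.
    return "\n".join(
        [context[f"user_message_{turn}"], context[f"external_info_{turn}"]]
    )


context = {
    "problem": "What is 1 + 1?",
    "user_message_0": "Could you explain the solution to Problem 1?",
    "external_info_0": "Problem 1: ...",
}

prompt = instantiate("{problem} Solve the problem carefully.", context)
message = instantiate(partial(content, turn=0), context)
```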
|
||||
|
||||
### Logging (Experimental)
|
||||
|
||||
When debugging or diagnosing an LLM-based system, it is often convenient to log the API calls and analyze them. `flaml.oai.Completion` and `flaml.oai.ChatCompletion` offer an easy way to collect the API call histories. For example, to log the chat histories, simply run:
|
||||
```python
|
||||
flaml.oai.ChatCompletion.start_logging()
|
||||
```
|
||||
The API calls made after this will be automatically logged. They can be retrieved at any time by:
|
||||
```python
|
||||
flaml.oai.ChatCompletion.logged_history
|
||||
```
|
||||
To stop logging, use
|
||||
```python
|
||||
flaml.oai.ChatCompletion.stop_logging()
|
||||
```
|
||||
If one would like to append the history to an existing dict, pass the dict like:
|
||||
```python
|
||||
flaml.oai.ChatCompletion.start_logging(history_dict=existing_history_dict)
|
||||
```
|
||||
By default, the counter of API calls will be reset at `start_logging()`. If no reset is desired, set `reset_counter=False`.
|
||||
|
||||
There are two types of logging formats: compact logging and individual API call logging. The default format is compact.
|
||||
Set `compact=False` in `start_logging()` to switch.
|
||||
|
||||
* Example of a history dict with compact logging.
|
||||
```python
|
||||
{
|
||||
"""
|
||||
[
|
||||
{
|
||||
'role': 'system',
|
||||
'content': system_message,
|
||||
},
|
||||
{
|
||||
'role': 'user',
|
||||
'content': user_message_1,
|
||||
},
|
||||
{
|
||||
'role': 'assistant',
|
||||
'content': assistant_message_1,
|
||||
},
|
||||
{
|
||||
'role': 'user',
|
||||
'content': user_message_2,
|
||||
},
|
||||
{
|
||||
'role': 'assistant',
|
||||
'content': assistant_message_2,
|
||||
},
|
||||
]""": {
|
||||
"created_at": [0, 1],
|
||||
"cost": [0.1, 0.2],
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
* Example of a history dict with individual API call logging.
|
||||
```python
|
||||
{
|
||||
0: {
|
||||
"request": {
|
||||
"messages": [
|
||||
{
|
||||
"role": "system",
|
||||
"content": system_message,
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": user_message_1,
|
||||
}
|
||||
],
|
||||
... # other parameters in the request
|
||||
},
|
||||
"response": {
|
||||
"choices": [
|
||||
"messages": {
|
||||
"role": "assistant",
|
||||
"content": assistant_message_1,
|
||||
},
|
||||
],
|
||||
... # other fields in the response
|
||||
}
|
||||
},
|
||||
1: {
|
||||
"request": {
|
||||
"messages": [
|
||||
{
|
||||
"role": "system",
|
||||
"content": system_message,
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": user_message_1,
|
||||
},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": assistant_message_1,
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": user_message_2,
|
||||
},
|
||||
],
|
||||
... # other parameters in the request
|
||||
},
|
||||
"response": {
|
||||
"choices": [
|
||||
"messages": {
|
||||
"role": "assistant",
|
||||
"content": assistant_message_2,
|
||||
},
|
||||
],
|
||||
... # other fields in the response
|
||||
}
|
||||
},
|
||||
}
|
||||
```
|
||||
It can be seen that the individual API call history contains redundant information about the conversation. For a long conversation, the degree of redundancy is high.
|
||||
The compact history is more efficient and the individual API call history contains more details.
|
||||
|
||||
## Other Utilities
|
||||
|
||||
### Completion
|
||||
|
||||
[`flaml.oai.Completion`](../reference/autogen/oai/completion) also offers some additional utilities, such as:
|
||||
- a [`cost`](../reference/autogen/oai/completion#cost) function to calculate the cost of an API call.
|
||||
- a [`test`](../reference/autogen/oai/completion#test) function to conveniently evaluate the configuration over test data.
|
||||
- a [`extract_text`](../reference/autogen/oai/completion#extract_text) function to extract the text from a completion or chat response.
|
||||
- a [`set_cache`](../reference/autogen/oai/completion#set_cache) function to set the seed and cache path. The caching is introduced in the section above, with the benefit of cost saving, reproducibility, and controlled randomness.
|
||||
|
||||
Interested in trying it yourself? Please check the following notebook examples:
|
||||
### Code
|
||||
|
||||
[`flaml.autogen.code_utils`](../reference/autogen/code_utils) offers code-related utilities, such as:
|
||||
- an [`improve_code`](../reference/autogen/code_utils#improve_code) function to improve code for a given objective.
|
||||
- a [`generate_assertions`](../reference/autogen/code_utils#generate_assertions) function to generate assertion statements from a function signature and docstring.
|
||||
- an [`implement`](../reference/autogen/code_utils#implement) function to implement a function from a definition.
|
||||
- an [`eval_function_completions`](../reference/autogen/code_utils#eval_function_completions) function to evaluate the success of a function completion task, or select a response from a list of responses using generated assertions.
|
||||
|
||||
### Math
|
||||
|
||||
[`flaml.autogen.math_utils`](../reference/autogen/math_utils) offers utilities for math problems, such as:
|
||||
- an [`eval_math_responses`](../reference/autogen/math_utils#eval_math_responses) function to select a response using voting, and check if the final answer is correct if the canonical solution is provided.
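Selection by voting can be illustrated with a plain majority vote over final answers. This sketch is not the implementation of the function listed above; `select_by_vote` is a hypothetical helper, and it assumes the final answers have already been extracted from the responses as strings.

```python
from collections import Counter


def select_by_vote(answers, solution=None):
    """Pick the most frequent answer; optionally compare it with a known solution."""
    answer, votes = Counter(answers).most_common(1)[0]
    # Correctness is only defined when a canonical solution is provided.
    is_correct = None if solution is None else answer == solution
    return {"answer": answer, "votes": votes, "is_correct": is_correct}


outcome = select_by_vote(["4", "5", "4"], solution="4")
no_solution = select_by_vote(["4", "5", "4"])
```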
|
||||
|
||||
|
||||
*Interested in trying it yourself? Please check the following notebook examples:*
|
||||
* [Optimize for Code Gen](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_openai.ipynb)
|
||||
* [Optimize for Math](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_chatgpt.ipynb)
|
||||
|
||||
@@ -477,6 +477,18 @@ with mlflow.start_run():
|
||||
automl.fit(X_train=X_train, y_train=y_train, **settings)
|
||||
```
|
||||
|
||||
To disable mlflow logging pre-configured in FLAML, set `mlflow_logging=False`:
|
||||
```python
|
||||
automl = AutoML(mlflow_logging=False)
|
||||
```
|
||||
or
|
||||
```python
|
||||
automl.fit(X_train=X_train, y_train=y_train, mlflow_logging=False, **settings)
|
||||
```
|
||||
|
||||
Setting `mlflow_logging=False` in the constructor will disable mlflow logging for all the `fit()` calls.
|
||||
Setting `mlflow_logging=False` in `fit()` will disable mlflow logging for that `fit()` call only.
|
||||
|
||||
### Extra fit arguments
|
||||
|
||||
Extra fit arguments that are needed by the estimators can be passed to `AutoML.fit()`. For example, if there is a weight associated with each training example, the weights can be passed via `sample_weight`. As another example, `period` can be passed for time series forecasters. Any extra keyword argument passed to `AutoML.fit()` that is not explicitly listed in the function signature is passed to the underlying estimators' `fit()` as is. As another example, you can set the number of GPUs used by each trial with the `gpu_per_trial` argument, which is only used by TransformersEstimator and XGBoostSklearnEstimator.
|
||||
@@ -503,7 +515,7 @@ automl_settings = {
|
||||
automl.fit(X_train=X_train, y_train=y_train, **automl_settings)
|
||||
```
|
||||
|
||||
## Retrieve and analyze the outcomes of AutoML.fit()
|
||||
## Retrieve the Outcomes
|
||||
|
||||
### Get best model
|
||||
|
||||
|
||||
@@ -32,6 +32,7 @@ module.exports = {
|
||||
position: 'left',
|
||||
label: 'SDK',
|
||||
},
|
||||
{to: 'blog', label: 'Blog', position: 'left'},
|
||||
{
|
||||
type: 'doc',
|
||||
docId: 'FAQ',
|
||||
@@ -57,23 +58,23 @@ module.exports = {
|
||||
// },
|
||||
// ],
|
||||
// },
|
||||
// {
|
||||
// title: 'Community',
|
||||
// items: [
|
||||
{
|
||||
title: 'Community',
|
||||
items: [
|
||||
// // {
|
||||
// // label: 'Stack Overflow',
|
||||
// // href: 'https://stackoverflow.com/questions/tagged/pymarlin',
|
||||
// // },
|
||||
// // {
|
||||
// // label: 'Discord',
|
||||
// // href: 'https://discordapp.com/invite/docusaurus',
|
||||
// // },
|
||||
{
|
||||
label: 'Discord',
|
||||
href: 'https://discord.gg/Cppx2vSPVP',
|
||||
},
|
||||
// // {
|
||||
// // label: 'Twitter',
|
||||
// // href: 'https://twitter.com/docusaurus',
|
||||
// // },
|
||||
// ],
|
||||
// },
|
||||
],
|
||||
},
|
||||
// {
|
||||
// title: 'More',
|
||||
// items: [
|
||||
|
||||
@@ -8,9 +8,9 @@ const FeatureList = [
|
||||
Svg: require('../../static/img/auto.svg').default,
|
||||
description: (
|
||||
<>
|
||||
FLAML finds accurate ML models with low computational resources
|
||||
for common ML tasks.
|
||||
It frees users from selecting learners and hyperparameters.
|
||||
FLAML finds accurate models or configurations with low computational resources
|
||||
for common ML/AI tasks.
|
||||
It frees users from selecting models and hyperparameters for training or inference.
|
||||
{/* It is fast and economical. */}
|
||||
</>
|
||||
),
|
||||
|
||||