Compare commits

..

26 Commits

Author SHA1 Message Date
Chi Wang
e463146cb8 response filter (#1039)
* response filter

* rewrite implement based on the filter

* multi responses

* abs path

* code handling

* option to not use docker

* context

* eval_only -> raise_error

* notebook

* utils

* utils

* separate tests

* test

* test

* test

* test

* test

* test

* test

* test

* **config in test()

* test

* test

* filename
2023-05-21 22:22:29 +00:00
Li Jiang
7de4eb347d Fix PULL_REQUEST_TEMPLATE and improve test by removing unnecessary environment variable (#1043)
* Improve test by removing unnecessary environment variable

* Fix PULL_REQUEST_TEMPLATE

* Hide pre-commit check

* remove the checkbox for pre-commit

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2023-05-19 20:05:14 +00:00
Pratyay Roy
683f6befd2 updated search space (#1044)
Co-authored-by: Pratyay Roy <63900765+pratyay-roy@users.noreply.github.com>
2023-05-17 22:36:41 +00:00
Qingyun Wu
a1f51d1d23 Blogpost (#1026)
* add 1m milestone blogpost

* format issues

* update subsection title

* acknowledgement

* Update website/blog/2023-05-07-1M-milestone/index.mdx

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update website/blog/2023-05-07-1M-milestone/index.mdx

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* update blogpost

* collaborators

* wording

* Azure Data to Azure Synapse

* name

* Azure Synapse Analytics

* tasks and search space

* Update website/blog/2023-05-07-1M-milestone/index.mdx

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-05-17 03:49:19 +00:00
Chi Wang
0e2dbd5378 fix of website link (#1042) 2023-05-16 06:18:33 +00:00
Qingyun Wu
2e43509690 Human agent (#1025)
* add human agent and chat agent

* feedback msg

* clean print

* remove redundant import

* make coding agent work

* import check

* terminate condition

* rename

* add docstr

* exitcode to str

* print

* save and execute code

* add max_turn_num

* add max_turn_num in test_agent.py

* reduce max_turn_num in the test

* change max_turn_num to max_consecutive_auto_reply

* update human proxy agent

* remove execution agent and dated docstr

* clean doc

* add back work_dir

* add is_termination_msg when mode is NEVER

* revise stop condition

* remove work_dir in coding agent

* human_proxy_agent docstr

* auto_reply

* clean auto_reply

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2023-05-16 00:37:38 +00:00
Susan Xueqing Liu
f01acb67f6 update model of text summarization (#1030) 2023-05-10 00:48:22 +00:00
Chi Wang
59e882e5cc chat completion check (#1024)
* chat completion check

* add test

* doc

* timeout

* bump version to 1.2.4
2023-05-09 20:39:46 +00:00
Beibin Li
51c8768bcf Catch AuthenticationError trying different configs (#1023)
* Catch AuthenticationError trying different configs
While trying different openai `config_list`, some
configs might be outdated (e.g., an API key is expired).
In these cases, we don't want the program to crash.
Instead, we might want to try other configs.

* Lint whitespace
2023-05-06 11:16:50 +00:00
Chi Wang
b3fba9734e Mark experimental classes; doc; multi-config trial (#1021)
* Mark experimental classes

* template

* multi model

* test

* multi-config doc

* doc

* doc

* test

---------

Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-05-05 02:48:31 +00:00
Li Jiang
8b2411b219 update spark session in spark tests (#1006)
* add mlflow and spark integration tests

* remove unused params

* remove mlflow tests
2023-05-03 09:59:29 +00:00
Li Jiang
fd1f36597b update max_spark_parallelism to fit in auto-scale spark cluster (#1008)
* update max_spark_parallelism to fit in auto-scale spark cluster

* update test
2023-05-03 09:16:32 +00:00
Susan Xueqing Liu
00c30a398e fix NLP zero division error (#1009)
* fix NLP zero division error

* set predictions to None

* set predictions to None

* set predictions to None

* refactor

* refactor

---------

Co-authored-by: Li Jiang <lijiang1@microsoft.com>
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
2023-05-03 05:50:28 +00:00
garar
31864d2d77 Add mlflow_logging param (#1015)
Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2023-05-03 03:09:04 +00:00
Chi Wang
19aee67f55 coding agent; logging (#1011)
* coding agent

* tsp

* tsp

* aoai

* logging

* compact

* Handle Import Error

* cost function

* reset counter; doc

* reset_counter

* home page update

* use case

* catboost in linux

* catboost

* catboost

* catboost

* doc

* intro

* catboost
2023-05-02 20:38:23 +00:00
Li Jiang
39b9a9a417 Fix catboost failure in mac-os python<3.9 (#1020) 2023-05-02 14:19:56 +00:00
Chi Wang
6d7fb3d786 raise content_filter error (#1018)
* raise content_filter error

* import error handling
2023-04-29 18:46:28 +00:00
Qingyun Wu
06cd3f52e5 update readme (#1014)
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2023-04-28 06:38:09 +00:00
Jirka Borovec
73bb6e7667 pyproject.toml & switch to Ruff (#976)
* unify config to pyproject.toml
replace flake8 with Ruff

* drop configs

* update

* fixing

* Apply suggestions from code review

Co-authored-by: Zvi Baratz <z.baratz@gmail.com>

* setup

* ci

* pr template

* reword

---------

Co-authored-by: Zvi Baratz <z.baratz@gmail.com>
Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2023-04-28 01:54:55 +00:00
Anupam
a8752b6aa0 fixed sentence misplace #998 (#1010) 2023-04-26 15:07:33 +00:00
Sayan Roy
e9cd6a058c fixing the typo #990 (#994)
* fixing the typo #990

* Update website/docs/Use-Cases/Auto-Generation.md

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* removing extra space : Update website/docs/Use-Cases/Auto-Generation.md

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update website/docs/Use-Cases/Auto-Generation.md

Co-authored-by: Chi Wang <wang.chi@microsoft.com>

* Update website/docs/Use-Cases/Auto-Generation.md

---------

Co-authored-by: Chi Wang <wang.chi@microsoft.com>
2023-04-26 05:48:09 +00:00
Chi Wang
f097c20f86 version update post release v1.2.2 (#1005) 2023-04-25 04:48:17 +00:00
Chi Wang
fa5ccea862 extract code from text; solve_problem; request_timeout in config; improve code (#999)
* extract code from text

* solve_problem; request_timeout in config

* improve

* move import statement

* improve code

* generate assertions

* constant

* configs for implement; voting

* doc

* execute code in docker

* success indicator of code execution in docker

* success indicator

* execute code

* strip n

* add cost in generate_code

* add docstr

* filename

* bytes

* check docker version

* print log

* python test

* remove api key address

* rename exit code

* success exit code

* datasets

* exit code

* recover openai tests

* cache and pattern match

* wait

* wait

* cache and test

* timeout test

* python image name and skip macos

* windows image

* docker images

* volume path and yaml

* win path -> posix

* extensions

* path

* path

* path

* path

* path

* path

* path

* path

* path

* path

* path

* skip windows

* path

* timeout in windows

* use_docker

* use_docker

* hot fix from #1000

---------

Co-authored-by: Qingyun Wu <qingyun.wu@psu.edu>
2023-04-23 11:50:29 +00:00
Susan Xueqing Liu
7114b8f742 fix zerodivision (#1000)
* fix zerodivision

* update

* remove final

---------

Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2023-04-23 03:55:51 +00:00
Chi Wang
da0d8c05e1 Blog post for LLM tuning (#986)
* outline

* revision

* eval function signature

* first draft

* link

* format

* example

* cleanup

* average

* move figure

* tldr

* bold

* bold

* tag
2023-04-22 04:41:16 +00:00
Susan Xueqing Liu
99bb0a8425 update nlp notebook (#940)
* update nlp notebook

* rerun

* rerun

* removing redundant in notebook

* remove redundant content in nlp notebook

* update notebook

* update plot

* update plot

* update plot

---------

Co-authored-by: Li Jiang <lijiang1@microsoft.com>
2023-04-17 17:29:36 +00:00
60 changed files with 9502 additions and 9720 deletions


@@ -1,5 +0,0 @@
[flake8]
ignore = E203, E266, E501, W503, F403, F401, C901
max-line-length = 127
max-complexity = 10
select = B,C,E,F,W,T4,B9


@@ -12,7 +12,7 @@
## Checks
- [ ] I've used [pre-commit](https://microsoft.github.io/FLAML/docs/Contribute#pre-commit) to lint the changes in this PR, or I've made sure [lint with flake8](https://github.com/microsoft/FLAML/blob/816a82a1155b4de4705b21a615ccdff67c6da379/.github/workflows/python-package.yml#L54-L59) output is two 0s.
<!-- - I've used [pre-commit](https://microsoft.github.io/FLAML/docs/Contribute#pre-commit) to lint the changes in this PR (note the same is integrated in our CI checks). -->
- [ ] I've included any doc changes needed for https://microsoft.github.io/FLAML/. See https://microsoft.github.io/FLAML/docs/Contribute#documentation to build and test documentation locally.
- [ ] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [ ] I've made sure all auto checks have passed.


@@ -7,10 +7,10 @@ on:
pull_request:
branches: ['main']
paths:
- 'flaml/integrations/oai/**'
- 'test/openai/**'
- 'notebook/integrate_openai.ipynb'
- 'notebook/integrate_chatgpt_math.ipynb'
- 'flaml/autogen/**'
- 'test/autogen/**'
- 'notebook/autogen_openai_completion.ipynb'
- 'notebook/autogen_chatgpt_gpt4.ipynb'
- '.github/workflows/openai.yml'
jobs:
@@ -18,7 +18,7 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.9]
python-version: ["3.9", "3.10", "3.11"]
runs-on: ${{ matrix.os }}
environment: openai
steps:
@@ -29,18 +29,31 @@ jobs:
python-version: ${{ matrix.python-version }}
- name: Install packages and dependencies
run: |
docker --version
python -m pip install --upgrade pip wheel
pip install -e .
pip install -e .[autogen,blendsearch]
python -c "import flaml"
pip install -e .[openai]
pip install coverage pytest datasets
- name: Coverage
if: matrix.python-version == '3.9'
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
run: |
pip install coverage pytest datasets nbconvert nbformat ipykernel
coverage run -a -m pytest test/openai
coverage run -a -m pytest test/autogen
coverage xml
cat "$(pwd)/test/openai/executed_openai_notebook_output.txt"
- name: Coverage and check notebook outputs
if: matrix.python-version != '3.9'
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
AZURE_OPENAI_API_KEY: ${{ secrets.AZURE_OPENAI_API_KEY }}
AZURE_OPENAI_API_BASE: ${{ secrets.AZURE_OPENAI_API_BASE }}
run: |
pip install nbconvert nbformat ipykernel
coverage run -a -m pytest test/autogen/oai/test_notebook.py
coverage xml
cat "$(pwd)/test/autogen/oai/executed_openai_notebook_output.txt"
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
with:


@@ -82,12 +82,6 @@ jobs:
run: |
# Uninstall pyspark to test env without pyspark
pip uninstall -y pyspark
- name: Lint with flake8
run: |
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Test with pytest
if: (matrix.python-version != '3.7' || matrix.os == 'macos-latest') && matrix.python-version != '3.10'
run: |


@@ -7,15 +7,6 @@ ci:
autoupdate_schedule: 'quarterly'
repos:
- repo: https://github.com/psf/black
rev: 23.1.0
hooks:
- id: black
args: ["--line-length=120"]
- repo: https://github.com/pycqa/flake8
rev: 6.0.0
hooks:
- id: flake8
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
@@ -31,3 +22,12 @@ repos:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: no-commit-to-branch
- repo: https://github.com/psf/black
rev: 23.3.0
hooks:
- id: black
- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.0.261
hooks:
- id: ruff
args: ["--fix"]


@@ -3,8 +3,8 @@
[![Build](https://github.com/microsoft/FLAML/actions/workflows/python-package.yml/badge.svg)](https://github.com/microsoft/FLAML/actions/workflows/python-package.yml)
![Python Version](https://img.shields.io/badge/3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10-blue)
[![Downloads](https://pepy.tech/badge/flaml)](https://pepy.tech/project/flaml)
[![Join the chat at https://gitter.im/FLAMLer/community](https://badges.gitter.im/FLAMLer/community.svg)](https://gitter.im/FLAMLer/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![](https://img.shields.io/discord/1025786666260111483?logo=discord&style=flat)](https://discord.gg/Cppx2vSPVP)
<!-- [![Join the chat at https://gitter.im/FLAMLer/community](https://badges.gitter.im/FLAMLer/community.svg)](https://gitter.im/FLAMLer/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) -->
# A Fast Library for Automated Machine Learning & Tuning
@@ -16,18 +16,16 @@
:fire: v1.2.0 is released with support for ChatGPT and GPT-4.
:fire: A [lab forum](https://github.com/microsoft/FLAML/tree/tutorial-aaai23/tutorial) on FLAML at AAAI 2023.
:fire: A [hands-on tutorial](https://github.com/microsoft/FLAML/tree/tutorial/tutorial) on FLAML presented at KDD 2022
## What is FLAML
FLAML is a lightweight Python library that finds accurate machine
learning models automatically, efficiently and economically. It frees users from selecting
models and hyperparameters for each model. It can also be used to tune generic hyperparameters for foundation models, MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations and so on.
FLAML is a lightweight Python library for efficient automation of machine
learning, including selection of
models, hyperparameters, and other tunable choices of an application (e.g., inference hyperparameters for foundation models, configurations in MLOps/LMOps workflows, pipelines, mathematical/statistical models, algorithms, computing experiments, software configurations).
1. For common machine learning or AI tasks like classification, regression, and generation, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks, including foundation models such as the GPT series.
1. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code).
1. It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a new, [cost-effective
* For foundation models like the GPT series, it automates the experimentation and optimization of their inference performance to maximize the effectiveness for downstream applications and minimize the inference cost.
* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources.
* It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training/inference/evaluation code).
* It supports fast automatic tuning, capable of handling complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective
hyperparameter optimization](https://microsoft.github.io/FLAML/docs/Use-Cases/Tune-User-Defined-Function/#hyperparameter-optimization-algorithm)
and model selection method invented by Microsoft Research, and many followup [research studies](https://microsoft.github.io/FLAML/docs/Research).
@@ -61,6 +59,25 @@ Use the following guides to get started with FLAML in .NET:
## Quickstart
* (New) You can optimize [generations](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.
```python
from flaml import oai
config, analysis = oai.Completion.tune(
data=tune_data,
metric="success",
mode="max",
eval_func=eval_func,
inference_budget=0.05,
optimization_budget=3,
num_samples=-1,
)
```
The automated experimentation and optimization can help you maximize the utility out of these expensive models.
A suite of utilities such as caching and templating are offered to accelerate the experimentation and application development.
* With three lines of code, you can start using this economical and fast
AutoML engine as a [scikit-learn style estimator](https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML).
@@ -95,33 +112,15 @@ estimator = LGBMRegressor()
estimator.fit(X_train, y_train)
```
* (New) You can optimize [generations](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation) by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.
```python
from flaml import oai
config, analysis = oai.Completion.tune(
data=tune_data,
metric="success",
mode="max",
eval_func=eval_func,
inference_budget=0.05,
optimization_budget=3,
num_samples=-1,
)
```
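`oai.Completion.tune` scores candidate configurations with the user-supplied `eval_func`. As we understand the FLAML documentation, it receives the list of responses generated for one tuning example plus that example's fields as keyword arguments, and returns a dict of metrics whose keys include the `metric` passed to `tune` (here `"success"`). A minimal sketch, assuming each example in `tune_data` carries a `"solution"` field and that an exact string match counts as success (both assumptions, not FLAML requirements):

```python
def eval_func(responses, **data):
    """Score the candidate responses for one tuning example (hypothetical metric).

    Assumes the example provides the expected answer under a "solution" key.
    """
    solution = data["solution"].strip()
    # A response counts as a success if it matches the expected answer exactly.
    successes = [response.strip() == solution for response in responses]
    # The "success" key must match metric="success" in the tune() call.
    return {"success": any(successes)}
```

A richer metric (for example, a fraction of successful responses) can be returned alongside `"success"` in the same dict.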
## Documentation
You can find detailed documentation about FLAML [here](https://microsoft.github.io/FLAML/), including the API documentation, use cases and examples.
In addition, you can find:
- [Talks](https://www.youtube.com/channel/UCfU0zfFXHXdAd5x-WvFBk5A) and [tutorials](https://github.com/microsoft/FLAML/tree/tutorial/tutorial) about FLAML.
- Research around FLAML [here](https://microsoft.github.io/FLAML/docs/Research).
- FAQ [here](https://microsoft.github.io/FLAML/docs/FAQ).
- Discord [here](https://discord.gg/Cppx2vSPVP).
- Contributing guide [here](https://microsoft.github.io/FLAML/docs/Contribute).


@@ -0,0 +1,2 @@
DEFAULT_MODEL = "gpt-4"
FAST_MODEL = "gpt-3.5-turbo"


@@ -0,0 +1,50 @@
from collections import defaultdict
class Agent:
"""(Experimental) An abstract class for AI agent.
An agent can communicate with other agents and perform actions.
Different agents can differ in what actions they perform in the `receive` method.
"""
def __init__(self, name, system_message=""):
"""
Args:
name (str): name of the agent
system_message (str): system message to be sent to the agent
"""
# empty memory
self._memory = []
# a dictionary of conversations, default value is list
self._conversations = defaultdict(list)
self._name = name
self._system_message = system_message
@property
def name(self):
"""Get the name of the agent."""
return self._name
def _remember(self, memory):
"""Remember something."""
self._memory.append(memory)
def _send(self, message, recipient):
"""Send a message to another agent."""
self._conversations[recipient.name].append({"content": message, "role": "assistant"})
recipient.receive(message, self)
def _receive(self, message, sender):
"""Receive a message from another agent."""
print("\n****", self.name, "received message from", sender.name, "****\n")
print(message)
self._conversations[sender.name].append({"content": message, "role": "user"})
def receive(self, message, sender):
"""Receive a message from another agent.
This method is called by the sender.
It needs to be overridden by subclasses to perform follow-up actions.
"""
self._receive(message, sender)
# perform actions based on the message
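In `Agent`, `_send` records a message under the recipient's name with role `"assistant"` and then calls `recipient.receive` directly, which records the same message with role `"user"`. A conversation is therefore a chain of nested calls that only unwinds when some agent stops replying. The toy sketch below mirrors that pattern without depending on FLAML; `ToyAgent` and its `max_replies` stop condition are hypothetical.

```python
from collections import defaultdict


class ToyAgent:
    """Illustrative stand-in that mirrors the Agent send/receive pattern."""

    def __init__(self, name, max_replies=1):
        self.name = name
        self._conversations = defaultdict(list)
        self._replies_left = max_replies  # hypothetical stop condition that ends the recursion

    def _send(self, message, recipient):
        # From the sender's point of view, its own messages have role "assistant".
        self._conversations[recipient.name].append({"content": message, "role": "assistant"})
        recipient.receive(message, self)

    def receive(self, message, sender):
        # From the receiver's point of view, incoming messages have role "user".
        self._conversations[sender.name].append({"content": message, "role": "user"})
        if self._replies_left > 0:
            self._replies_left -= 1
            self._send(f"{self.name} ack: {message}", sender)


alice, bob = ToyAgent("alice"), ToyAgent("bob")
alice._send("hello", bob)
print([m["role"] for m in alice._conversations["bob"]])  # ['assistant', 'user', 'assistant']
print([m["role"] for m in bob._conversations["alice"]])  # ['user', 'assistant', 'user']
```

Each side keeps its own role-labelled history, which is the message format a chat-completion call expects; the recursion depth grows with every turn, which is one reason a cap such as `MAX_CONSECUTIVE_AUTO_REPLY` matters.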


@@ -0,0 +1,35 @@
from .agent import Agent
from flaml.autogen.code_utils import DEFAULT_MODEL
from flaml import oai
class ChatAgent(Agent):
"""(Experimental) Chat."""
DEFAULT_SYSTEM_MESSAGE = """You are a chat agent.
"""
DEFAULT_CONFIG = {
"model": DEFAULT_MODEL,
}
def __init__(self, name, system_message=DEFAULT_SYSTEM_MESSAGE, work_dir=None, **config):
"""
Args:
name (str): agent name
system_message (str): system message to be sent to the agent
work_dir (str): working directory for the agent to execute code
config (dict): other configurations.
"""
super().__init__(name, system_message)
self._work_dir = work_dir
self._config = self.DEFAULT_CONFIG.copy()
self._config.update(config)
self._sender_dict = {}
def receive(self, message, sender):
super().receive(message, sender)
responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config)
# cost = oai.ChatCompletion.cost(responses)
response = oai.ChatCompletion.extract_text(responses)[0]
self._send(response, sender)


@@ -0,0 +1,41 @@
from .agent import Agent
from flaml.autogen.code_utils import DEFAULT_MODEL
from flaml import oai
class PythonAgent(Agent):
"""(Experimental) Suggest code blocks."""
DEFAULT_SYSTEM_MESSAGE = """You suggest python code (in a python coding block) for a user to execute for a given task. If you want the user to save the code in a file before executing it, put # filename: <filename> inside the code block as the first line. Finish the task smartly. Don't suggest shell command. Don't include multiple code blocks in one response. Use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again.
Reply "TERMINATE" in the end when the task is done.
"""
DEFAULT_CONFIG = {
"model": DEFAULT_MODEL,
}
def __init__(self, name, system_message=DEFAULT_SYSTEM_MESSAGE, **config):
"""
Args:
name (str): agent name
system_message (str): system message to be sent to the agent
config (dict): other configurations.
"""
super().__init__(name, system_message)
self._config = self.DEFAULT_CONFIG.copy()
self._config.update(config)
self._sender_dict = {}
def receive(self, message, sender):
if sender.name not in self._sender_dict:
self._sender_dict[sender.name] = sender
self._conversations[sender.name] = [{"content": self._system_message, "role": "system"}]
super().receive(message, sender)
responses = oai.ChatCompletion.create(messages=self._conversations[sender.name], **self._config)
response = oai.ChatCompletion.extract_text(responses)[0]
self._send(response, sender)
def reset(self):
self._sender_dict.clear()
self._conversations.clear()
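`PythonAgent.receive` seeds each new sender's history with the system message before the first user turn, sends the whole history to the model, and records the reply (via `_send`) as an `"assistant"` turn. The self-contained sketch below reproduces that bookkeeping; `fake_create` is a hypothetical stub standing in for `oai.ChatCompletion.create`, and senders are keyed by name rather than by agent object.

```python
from collections import defaultdict


def fake_create(messages, **config):
    """Hypothetical stub for the chat-completion call: upper-cases the last user message."""
    return messages[-1]["content"].upper()


class ToyPythonAgent:
    """Illustrative copy of PythonAgent's per-sender history handling."""

    def __init__(self, system_message):
        self._system_message = system_message
        self._conversations = defaultdict(list)
        self._sender_dict = {}

    def receive(self, message, sender_name):
        if sender_name not in self._sender_dict:
            # First contact: start this sender's history with the system message.
            self._sender_dict[sender_name] = sender_name
            self._conversations[sender_name] = [{"content": self._system_message, "role": "system"}]
        self._conversations[sender_name].append({"content": message, "role": "user"})
        reply = fake_create(self._conversations[sender_name])
        # The reply goes back to the sender and is remembered as an assistant turn.
        self._conversations[sender_name].append({"content": reply, "role": "assistant"})
        return reply

    def reset(self):
        self._sender_dict.clear()
        self._conversations.clear()


agent = ToyPythonAgent("You suggest python code.")
agent.receive("print 1", "user")
agent.receive("again", "user")
print([m["role"] for m in agent._conversations["user"]])
```

After `reset()`, the next message from any sender starts a fresh history that again begins with the system message.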


@@ -0,0 +1,122 @@
from .agent import Agent
from flaml.autogen.code_utils import extract_code, execute_code
from collections import defaultdict
class UserProxyAgent(Agent):
"""(Experimental) A proxy agent for the user, that can execute code and provide feedback to the other agents."""
MAX_CONSECUTIVE_AUTO_REPLY = 100 # maximum number of consecutive auto replies (subject to future change)
def __init__(
self,
name,
system_message="",
work_dir=None,
human_input_mode="ALWAYS",
max_consecutive_auto_reply=None,
is_termination_msg=None,
use_docker=True,
**config,
):
"""
Args:
name (str): name of the agent
system_message (str): system message to be sent to the agent
work_dir (str): working directory for the agent
human_input_mode (str): whether to ask for human input when a message is received.
Possible values are "ALWAYS", "TERMINATE", "NEVER".
(1) When "ALWAYS", the agent prompts for human input every time a message is received.
Under this mode, the conversation stops when the human input is "exit",
or when is_termination_msg is True and there is no human input.
(2) When "TERMINATE", the agent prompts for human input only when a termination message is received or
the number of auto replies reaches the max_consecutive_auto_reply.
(3) When "NEVER", the agent never prompts for human input. Under this mode, the conversation stops
when the number of auto replies reaches the max_consecutive_auto_reply or when is_termination_msg is True.
max_consecutive_auto_reply (int): the maximum number of consecutive auto replies.
default: None (no limit provided, class attribute MAX_CONSECUTIVE_AUTO_REPLY will be used as the limit in this case).
The limit only plays a role when human_input_mode is not "ALWAYS".
is_termination_msg (function): a function that takes a message and returns a boolean value.
This function is used to determine if a received message is a termination message.
use_docker (bool): whether to execute code in a docker container (passed to execute_code).
config (dict): other configurations.
"""
super().__init__(name, system_message)
self._work_dir = work_dir
self._human_input_mode = human_input_mode
self._is_termination_msg = (
is_termination_msg if is_termination_msg is not None else (lambda x: x == "TERMINATE")
)
self._config = config
self._max_consecutive_auto_reply = (
max_consecutive_auto_reply if max_consecutive_auto_reply is not None else self.MAX_CONSECUTIVE_AUTO_REPLY
)
self._consecutive_auto_reply_counter = defaultdict(int)
self._use_docker = use_docker
def _execute_code(self, code, lang):
"""Execute the code and return the result."""
if lang in ["bash", "shell"]:
if not code.startswith("python "):
return 1, f"please do not suggest bash or shell commands like {code}"
file_name = code[len("python ") :]
exitcode, logs = execute_code(filename=file_name, work_dir=self._work_dir, use_docker=self._use_docker)
logs = logs.decode("utf-8")
elif lang == "python":
if code.startswith("# filename: "):
filename = code[11 : code.find("\n")].strip()
else:
filename = None
exitcode, logs = execute_code(code, work_dir=self._work_dir, filename=filename, use_docker=self._use_docker)
logs = logs.decode("utf-8")
else:
# TODO: could this happen?
exitcode, logs = 1, f"unknown language {lang}"
# raise NotImplementedError
return exitcode, logs
def auto_reply(self, message, sender, default_reply=""):
"""Generate an auto reply."""
code, lang = extract_code(message)
if lang == "unknown":
# no code block is found, lang should be "unknown"
self._send(default_reply, sender)
else:
# try to execute the code
exitcode, logs = self._execute_code(code, lang)
exitcode2str = "execution succeeded" if exitcode == 0 else "execution failed"
self._send(f"exitcode: {exitcode} ({exitcode2str})\nCode output: {logs}", sender)
def receive(self, message, sender):
"""Receive a message from the sender agent.
Once a message is received, this function sends a reply to the sender or simply stops.
The reply can be generated automatically or entered manually by a human.
"""
super().receive(message, sender)
# default reply is empty (i.e., no reply, in this case we will try to generate auto reply)
reply = ""
if self._human_input_mode == "ALWAYS":
reply = input(
"Provide feedback to the sender. Press enter to skip and use auto-reply, or type 'exit' to end the conversation: "
)
elif self._consecutive_auto_reply_counter[
sender.name
] >= self._max_consecutive_auto_reply or self._is_termination_msg(message):
if self._human_input_mode == "TERMINATE":
reply = input(
"Please give feedback to the sender. (Press enter or type 'exit' to stop the conversation): "
)
reply = reply if reply else "exit"
else:
# this corresponds to the case when self._human_input_mode == "NEVER"
reply = "exit"
if reply == "exit" or (self._is_termination_msg(message) and not reply):
return
elif reply:
# reset the consecutive_auto_reply_counter
self._consecutive_auto_reply_counter[sender.name] = 0
self._send(reply, sender)
return
self._consecutive_auto_reply_counter[sender.name] += 1
self.auto_reply(message, sender, default_reply=reply)
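The branching in `UserProxyAgent.receive` chooses between stopping, sending a human reply, and auto-replying. The sketch below restates that logic as a pure function so each `human_input_mode` can be checked without calling `input()`; `decide_reply` and its `human_input` parameter (standing in for whatever `input()` would return) are hypothetical.

```python
def decide_reply(mode, auto_reply_count, max_auto_reply, is_termination, human_input=""):
    """Restate UserProxyAgent.receive's decision as a pure function (illustrative only).

    Returns ("stop", None), ("send", reply) or ("auto", "").
    """
    reply = ""
    if mode == "ALWAYS":
        reply = human_input
    elif auto_reply_count >= max_auto_reply or is_termination:
        if mode == "TERMINATE":
            # An empty human input means "stop" in this mode.
            reply = human_input if human_input else "exit"
        else:  # mode == "NEVER"
            reply = "exit"
    if reply == "exit" or (is_termination and not reply):
        return ("stop", None)
    if reply:
        # The agent resets its consecutive auto-reply counter before sending.
        return ("send", reply)
    # The agent increments its counter, then calls auto_reply.
    return ("auto", "")
```

In "ALWAYS" mode the counter is never consulted, so an empty human input always falls through to an auto reply unless the incoming message is a termination message.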


@@ -1,58 +1,285 @@
import signal
import subprocess
import sys
import os
import pathlib
from typing import List, Dict, Tuple, Optional, Union, Callable
from flaml import oai
import re
import time
from hashlib import md5
from flaml.autogen import oai, DEFAULT_MODEL, FAST_MODEL
# Regular expression for finding a code block
CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n```"
WORKING_DIR = os.path.join(os.path.dirname(os.path.realpath(__file__)), "extensions")
def extract_code(text: str, pattern: str = CODE_BLOCK_PATTERN) -> Tuple[str, str]:
# Use a regular expression to find the code block
match = re.search(pattern, text, flags=re.DOTALL)
# If a match is found, return the code
if match:
return match.group(2), match.group(1)
# If no code block is found, return the whole text
return text, "unknown"
def generate_code(pattern: str = CODE_BLOCK_PATTERN, **config) -> Tuple[str, float]:
"""Generate code.
Args:
pattern (Optional, str): The regular expression pattern for finding the code block.
The default pattern is for finding a code block in a markdown file.
config (Optional, dict): The configuration for the API call.
Returns:
str: The generated code.
float: The cost of the generation.
"""
response = oai.Completion.create(**config)
return extract_code(oai.Completion.extract_text(response)[0], pattern), response["cost"]
_IMPROVE_FUNCTION_CONFIG = {
"prompt": """Improve the function '{func_name}' to achieve the objective '{objective}'.
The current implementation of the function is as follows:
{file_string}""",
"model": DEFAULT_MODEL,
"request_timeout": 600,
}
def improve_function(file_name, func_name, objective, **config):
"""(work in progress) Improve the function to achieve the objective."""
params = {**_IMPROVE_FUNCTION_CONFIG, **config}
# read the entire file into a str
with open(file_name, "r") as f:
file_string = f.read()
response = oai.Completion.create(
{"func_name": func_name, "objective": objective, "file_string": file_string}, **params
)
return oai.Completion.extract_text(response)[0], response["cost"]
_IMPROVE_CODE_CONFIG = {
"prompt": """Analyze the code in the following files and return a list of suggestions for improvement{followup}, to achieve the objective of '{objective}'.
{code}
""",
"model": DEFAULT_MODEL,
"request_timeout": 900,
}
def improve_code(files, objective, suggest_only=True, **config):
"""Improve the code to achieve a given objective.
Args:
files (list): A list of file names containing the source code.
objective (str): The objective to achieve.
suggest_only (bool): Whether to return only the suggestions or the improved code.
config (Optional, dict): The configuration for the API call.
Returns:
str: The improved code if suggest_only=False; a list of suggestions if suggest_only=True (default).
float: The cost of the generation.
"""
code = ""
for file_name in files:
# read the entire file into a string
with open(file_name, "r") as f:
file_string = f.read()
code += f"""{file_name}:
{file_string}
"""
params = {**_IMPROVE_CODE_CONFIG, **config}
followup = "" if suggest_only else " followed by the improved code"
response = oai.Completion.create({"objective": objective, "code": code, "followup": followup}, **params)
return oai.Completion.extract_text(response)[0], response["cost"]
def timeout_handler(signum, frame):
raise TimeoutError("Timed out!")
def execute_code(code: str, max_exec_time: Optional[int] = 3):
signal.signal(signal.SIGALRM, timeout_handler)
code = code.strip()
with open("codetest.py", "w") as fout:
fout.write(code)
try:
signal.alarm(max_exec_time)
result = subprocess.run(
[sys.executable, "codetest.py"],
stdout=subprocess.DEVNULL,
stderr=subprocess.PIPE,
)
signal.alarm(0)
except TimeoutError:
return 0
return int(result.returncode == 0)
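Both versions of `execute_code` enforce the timeout with `signal.SIGALRM`, which exists only on Unix and can only be installed from the main thread. A portable alternative, swapped in here as an assumption rather than what FLAML does, is the `timeout` argument of `subprocess.run`, which kills the child and raises `subprocess.TimeoutExpired`. The sketch keeps `execute_code`'s return convention (stderr on failure, stdout on success); `run_python_code` is a hypothetical name.

```python
import os
import subprocess
import sys
import tempfile


def run_python_code(code, timeout=600):
    """Run `code` as a script in a child process and return (exitcode, output bytes).

    Sketch only: the timeout is enforced by subprocess.run rather than signal.SIGALRM.
    """
    # Throwaway working directory; it is removed, with the script, when the block exits.
    with tempfile.TemporaryDirectory() as work_dir:
        filepath = os.path.join(work_dir, "tmp_code.py")
        with open(filepath, "w") as fout:
            fout.write(code)
        try:
            result = subprocess.run(
                [sys.executable, filepath],
                cwd=work_dir,
                capture_output=True,
                timeout=timeout,  # on expiry the child is killed and TimeoutExpired is raised
            )
        except subprocess.TimeoutExpired:
            # Bytes, like the normal path, so callers can always .decode() the logs.
            return 1, b"Timeout"
    # Same convention as execute_code: stderr on failure, stdout on success.
    return result.returncode, result.stderr if result.returncode else result.stdout
```

Returning bytes on timeout matters because `UserProxyAgent._execute_code` calls `logs.decode("utf-8")` on whatever comes back.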
def execute_code(
code: Optional[str] = None,
timeout: Optional[int] = 600,
filename: Optional[str] = None,
work_dir: Optional[str] = None,
use_docker: Optional[bool] = True,
) -> Tuple[int, bytes]:
"""Execute code in a docker container.
This function is not tested on MacOS.
Args:
code (Optional, str): The code to execute.
If None, the code from the file specified by filename will be executed.
Either code or filename must be provided.
timeout (Optional, int): The maximum execution time in seconds.
filename (Optional, str): The file name to save the code or where the code is stored when `code` is None.
If None, a file named after the MD5 hash of the code will be created.
The generated file will be deleted after execution.
The file name must be a relative path. Relative paths are relative to the working directory.
work_dir (Optional, str): The working directory for the code execution.
If None, a default working directory will be used.
The default working directory is the "extensions" directory under
"xxx/flaml/autogen", where "xxx" is the path to the flaml package.
use_docker (Optional, bool): Whether to use a docker container for code execution.
If True, the code will be executed in a docker container.
If False, the code will be executed in the current environment.
Default is True. If the code is executed in the current environment,
the code must be trusted.
Returns:
int: 0 if the code executes successfully; otherwise a nonzero exit code (1 on timeout).
bytes: The error message if the code fails to execute; the stdout otherwise.
"""
assert code is not None or filename is not None, "Either code or filename must be provided."
original_filename = filename
if filename is None:
code_hash = md5(code.encode()).hexdigest()
# create a file with an automatically generated name
filename = f"tmp_code_{code_hash}.py"
if work_dir is None:
work_dir = WORKING_DIR
filepath = os.path.join(work_dir, filename)
file_dir = os.path.dirname(filepath)
os.makedirs(file_dir, exist_ok=True)
if code is not None:
with open(filepath, "w") as fout:
fout.write(code)
# check if already running in a docker container
in_docker_container = os.path.exists("/.dockerenv")
if not use_docker or in_docker_container:
# docker not requested, or already running in a docker container: run the code in the current environment
signal.signal(signal.SIGALRM, timeout_handler)
try:
signal.alarm(timeout)
# run the code in a subprocess of the current environment, in the working directory
result = subprocess.run(
[sys.executable, filename],
cwd=work_dir,
capture_output=True,
)
signal.alarm(0)
except TimeoutError:
if original_filename is None:
os.remove(filepath)
return 1, "Timeout"
if original_filename is None:
os.remove(filepath)
return result.returncode, result.stderr if result.returncode else result.stdout
import docker
from requests.exceptions import ReadTimeout, ConnectionError
# create a docker client
client = docker.from_env()
image_list = ["python:3-alpine", "python:3", "python:3-windowsservercore"]
for image in image_list:
# check if the image exists
try:
client.images.get(image)
break
except docker.errors.ImageNotFound:
# pull the image
print("Pulling image", image)
try:
client.images.pull(image)
break
except docker.errors.DockerException:
print("Failed to pull image", image)
# get a randomized str based on current time to wrap the exit code
exit_code_str = f"exitcode{time.time()}"
abs_path = pathlib.Path(work_dir).absolute()
# if sys.platform == "win32":
# abs_path = str(abs_path).replace("\\", "/")
# abs_path = f"/{abs_path[0].lower()}{abs_path[2:]}"
# create a docker container
container = client.containers.run(
image,
command=[
"sh",
"-c",
f"python {filename}; exit_code=$?; echo -n {exit_code_str}; echo -n $exit_code; echo {exit_code_str}",
],
working_dir="/workspace",
detach=True,
# get absolute path to the working directory
volumes={abs_path: {"bind": "/workspace", "mode": "rw"}},
)
start_time = time.time()
while container.status != "exited" and time.time() - start_time < timeout:
# Reload the container object
container.reload()
if container.status != "exited":
container.stop()
container.remove()
if original_filename is None:
os.remove(filepath)
return 1, "Timeout"
# try:
# container.wait(timeout=timeout)
# except (ReadTimeout, ConnectionError):
# container.stop()
# container.remove()
# if original_filename is None:
# os.remove(filepath)
# return 1, "Timeout"
# get the container logs
logs = container.logs().decode("utf-8").rstrip()
# remove the container
container.remove()
# check if the code executed successfully
exit_code = container.attrs["State"]["ExitCode"]
if exit_code == 0:
# extract the exit code from the logs
pattern = re.compile(f"{exit_code_str}(\\d+){exit_code_str}")
match = pattern.search(logs)
exit_code = int(match.group(1))
# remove the exit code from the logs
logs = pattern.sub("", logs)
logs = bytes(logs, "utf-8")
if original_filename is None:
os.remove(filepath)
# return the exit code and logs
return exit_code, logs
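In the Docker branch of `execute_code`, the container's own exit code is that of the trailing `echo` in the `sh -c` command (normally 0), so the script's status is echoed between two copies of a time-based marker and parsed back out of the logs. A minimal sketch of just that parse step follows; `wrap_command`, `split_exit_code`, and the sample log text are illustrative, not part of FLAML. Unlike the code above, it passes the marker through `re.escape`, since `time.time()` puts a `.` into it.

```python
import re
import time
from typing import Optional, Tuple


def wrap_command(filename: str, marker: str) -> str:
    # Same shape as the `sh -c` command in execute_code: run the script, then
    # print marker + exit code + marker (the final echo adds a newline).
    return f"python {filename}; exit_code=$?; echo -n {marker}; echo -n $exit_code; echo {marker}"


def split_exit_code(logs: str, marker: str) -> Tuple[Optional[int], str]:
    """Return (exit_code, logs_without_marker); exit_code is None if no marker is found."""
    pattern = re.compile(f"{re.escape(marker)}(\\d+){re.escape(marker)}")
    match = pattern.search(logs)
    if match is None:
        # The marker never reached the logs, e.g. the shell itself failed.
        return None, logs
    return int(match.group(1)), pattern.sub("", logs)


marker = f"exitcode{time.time()}"
sample_logs = f"Traceback (most recent call last):\nZeroDivisionError: division by zero\n{marker}1{marker}"
code, cleaned = split_exit_code(sample_logs, marker)
print(code)  # 1
```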
def generate_assertions(definition: str, model: Optional[str] = "gpt-3.5-turbo") -> Tuple[str, float]:
_GENERATE_ASSERTIONS_CONFIG = {
"prompt": """Given the signature and docstring, write the exactly same number of assertion(s) for the provided example(s) in the docstring, without assertion messages.
func signature:
{definition}
assertions:""",
"model": FAST_MODEL,
"max_tokens": 256,
"stop": "\n\n",
}
def generate_assertions(definition: str, **config) -> Tuple[str, float]:
"""Generate assertions for a function.
Args:
definition (str): The function definition, including the signature and docstr.
model (str): The model used for generation.
config (Optional, dict): The configuration for the API call.
Returns:
str: The generated assertions.
float: The cost of the generation.
"""
prompt = """Given the signature and docstring, write the exactly same number of assertion(s) for the provided example(s) in the docstring, without assertion messages.
func signature:
{definition}
assertions:"""
params = {**_GENERATE_ASSERTIONS_CONFIG, **config}
response = oai.Completion.create(
{"definition": definition},
model=model,
prompt=prompt,
max_tokens=256,
stop="\n\n",
**params,
)
cost = oai.Completion.cost(model, response)
assertions = oai.Completion.extract_text(response)[0]
return assertions, cost
return assertions, response["cost"]
def _remove_check(response):
@@ -70,6 +297,8 @@ def eval_function_completions(
test: Optional[str] = None,
entry_point: Optional[str] = None,
assertions: Optional[Union[str, Callable[[str], Tuple[str, float]]]] = None,
timeout: Optional[float] = 3,
use_docker: Optional[bool] = True,
) -> Dict:
"""Select a response from a list of responses for the function completion task (using generated assertions), and/or evaluate if the task is successful using a gold test.
@@ -80,6 +309,7 @@ def eval_function_completions(
entry_point (Optional, str): The name of the function.
assertions (Optional, str or Callable): The assertion code which serves as a filter of the responses, or an assertion generator.
When provided, only the responses that pass the assertions will be considered for the actual test (if provided).
timeout (Optional, float): The timeout for executing the code.
use_docker (Optional, bool): Whether to execute the code in a docker container. Default is True.
Returns:
dict: The success metrics.
@@ -95,7 +325,7 @@ def eval_function_completions(
if response.startswith("def")
else f"{definition}{response}\n{test}\ncheck({entry_point})"
)
success = execute_code(code)
success = execute_code(code, timeout=timeout, use_docker=use_docker)[0] == 0
success_list.append(success)
return {
"expected_success": 1 - pow(1 - sum(success_list) / n, n),
@@ -112,7 +342,7 @@ def eval_function_completions(
code = (
f"{response}\n{assertions}" if response.startswith("def") else f"{definition}{response}\n{assertions}"
)
succeed_assertions = execute_code(code)
succeed_assertions = execute_code(code, timeout=timeout, use_docker=use_docker)[0] == 0
if succeed_assertions:
break
else:
@@ -132,7 +362,7 @@ def eval_function_completions(
if response.startswith("def")
else f"{definition}{response}\n{test}\ncheck({entry_point})"
)
success = execute_code(code_test)
success = execute_code(code_test, timeout=timeout, use_docker=use_docker)[0] == 0
return {
"index_selected": i,
"succeed_assertions": succeed_assertions,
@@ -142,9 +372,37 @@ def eval_function_completions(
}
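`expected_success` above is 1 - (1 - k/n)^n, where k of the n responses pass the gold test. One way to read it: the chance that at least one of n responses drawn independently, with replacement, from the observed pool passes. A small worked example of the same expression:

```python
from typing import List


def expected_success(success_list: List[bool]) -> float:
    """1 - (1 - k/n)**n for k passing responses out of n."""
    n = len(success_list)
    return 1 - pow(1 - sum(success_list) / n, n)


# 1 of 4 responses passes: 1 - 0.75**4 = 1 - 0.31640625
print(expected_success([True, False, False, False]))  # 0.68359375
```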
_FUNC_COMPLETION_PROMPT = "# Python 3{definition}"
_FUNC_COMPLETION_STOP = ["\nclass", "\ndef", "\nif", "\nprint"]
_IMPLEMENT_CONFIGS = [
{"model": FAST_MODEL, "prompt": _FUNC_COMPLETION_PROMPT, "temperature": 0, "seed": 0},
{"model": FAST_MODEL, "prompt": _FUNC_COMPLETION_PROMPT, "stop": _FUNC_COMPLETION_STOP, "n": 7, "seed": 0},
{"model": DEFAULT_MODEL, "prompt": _FUNC_COMPLETION_PROMPT, "temperature": 0, "seed": 1},
{"model": DEFAULT_MODEL, "prompt": _FUNC_COMPLETION_PROMPT, "stop": _FUNC_COMPLETION_STOP, "n": 2, "seed": 2},
{"model": DEFAULT_MODEL, "prompt": _FUNC_COMPLETION_PROMPT, "stop": _FUNC_COMPLETION_STOP, "n": 1, "seed": 2},
]
class PassAssertionFilter:
def __init__(self, assertions):
self._assertions = assertions
self.cost = 0
self.metrics = self.responses = None
def pass_assertions(self, context, response, **_):
"""Check if the response passes the assertions."""
responses = oai.Completion.extract_text(response)
metrics = eval_function_completions(responses, context["definition"], assertions=self._assertions)
self._assertions = metrics["assertions"]
self.cost += metrics["gen_cost"]
self.metrics = metrics
self.responses = responses
return metrics["succeed_assertions"]
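`PassAssertionFilter` delegates to `eval_function_completions`, which concatenates each response with the definition and the assertions and treats exit code 0 from `execute_code` as a pass. The sketch below reproduces that check for the non-Docker case. It swaps the `signal.alarm` timer for the `timeout` argument of `subprocess.run`, which also works on Windows, where `SIGALRM` does not exist; `run_candidate` is a hypothetical helper, not a FLAML function.

```python
import os
import subprocess
import sys
import tempfile


def run_candidate(definition: str, response: str, assertions: str, timeout: float = 3) -> bool:
    """Return True if definition+response passes the assertions within `timeout` seconds."""
    # Same concatenation as eval_function_completions: a response starting with
    # "def" is a full function, otherwise it completes `definition`.
    body = response if response.startswith("def") else f"{definition}{response}"
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(f"{body}\n{assertions}")
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=timeout)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising TimeoutExpired.
        return False
    finally:
        os.remove(path)


definition = 'def add(a, b):\n    """Return a + b."""\n'
print(run_candidate(definition, "    return a + b\n", "assert add(1, 2) == 3"))  # True
print(run_candidate(definition, "    return a - b\n", "assert add(1, 2) == 3"))  # False
```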
def implement(
definition: str,
configs: List[Dict],
configs: Optional[List[Dict]] = None,
assertions: Optional[Union[str, Callable[[str], Tuple[str, float]]]] = generate_assertions,
) -> Tuple[str, float]:
"""Implement a function from a definition.
@@ -160,14 +418,22 @@ def implement(
int: The index of the configuration which generates the implementation.
"""
cost = 0
configs = configs or _IMPLEMENT_CONFIGS
if len(configs) > 1 and callable(assertions):
assertions, cost = assertions(definition)
for i, config in enumerate(configs):
response = oai.Completion.create({"definition": definition}, **config)
cost += oai.Completion.cost(config["model"], response)
responses = oai.Completion.extract_text(response)
metrics = eval_function_completions(responses, definition, assertions=assertions)
assertions = metrics["assertions"]
cost += metrics["gen_cost"]
if metrics["succeed_assertions"] or i == len(configs) - 1:
return responses[metrics["index_selected"]], cost, i
assertion_filter = PassAssertionFilter(assertions)
response = oai.Completion.create(
{"definition": definition}, config_list=configs, filter_func=assertion_filter.pass_assertions
)
cost += assertion_filter.cost + response["cost"]
return assertion_filter.responses[assertion_filter.metrics["index_selected"]], cost, response["config_id"]
# for i, config in enumerate(configs):
# response = oai.Completion.create({"definition": definition}, **config)
# cost += oai.Completion.cost(response)
# responses = oai.Completion.extract_text(response)
# metrics = eval_function_completions(responses, definition, assertions=assertions)
# assertions = metrics["assertions"]
# cost += metrics["gen_cost"]
# if metrics["succeed_assertions"] or i == len(configs) - 1:
# return responses[metrics["index_selected"]], cost, i
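`implement` now hands the whole config cascade to `oai.Completion.create(config_list=..., filter_func=...)`. As the `config_list` loop in `create` shows, configs are tried in order, the cost of rejected responses is accumulated, and the first response that passes the filter (or the last one) is returned with `config_id` and `pass_filter` set. The offline sketch below mirrors that control flow with a fake API; `cascade`, `fake_create`, and the response fields are hypothetical, and the error handling (moving on after `AuthenticationError`, `RateLimitError` or `Timeout`) is left out.

```python
from typing import Callable, Dict, List, Optional


def cascade(
    create: Callable[[Dict], Dict],
    config_list: List[Dict],
    filter_func: Optional[Callable[[Dict], bool]] = None,
) -> Dict:
    """Try each config in order and return the first response that passes filter_func.

    The last config's response is returned even if it fails the filter, and the
    cost of every response tried so far is folded into the returned "cost".
    """
    cost = 0.0
    last = len(config_list) - 1
    for i, config in enumerate(config_list):
        response = create(config)
        passed = filter_func is None or filter_func(response)
        if passed or i == last:
            return {**response, "cost": cost + response["cost"], "config_id": i, "pass_filter": passed}
        cost += response["cost"]
    raise ValueError("config_list must not be empty")


def fake_create(config: Dict) -> Dict:
    # Stand-in for an API call: the cheap model answers wrongly, the big one correctly.
    text = "4" if config["model"] == "big" else "5"
    return {"text": text, "cost": config["price"]}


configs = [{"model": "small", "price": 0.001}, {"model": "big", "price": 0.03}]
result = cascade(fake_create, configs, filter_func=lambda r: r["text"] == "4")
print(result["config_id"], result["text"], round(result["cost"], 3))  # 1 4 0.031
```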

@@ -1,4 +1,27 @@
from typing import Optional
from flaml.autogen import oai, DEFAULT_MODEL
_MATH_PROMPT = "{problem} Solve the problem carefully. Simplify your answer as much as possible. Put the final answer in \\boxed{{}}."
_MATH_CONFIG = {
"model": DEFAULT_MODEL,
"prompt": _MATH_PROMPT,
}
def solve_problem(problem: str, **config) -> str:
"""(work in progress) Solve the math problem.
Args:
problem (str): The problem statement.
config (Optional, dict): The configuration for the API call.
Returns:
str: The solution to the problem.
"""
params = {**_MATH_CONFIG, **config}
response = oai.Completion.create({"problem": problem}, **params)
results = eval_math_responses(oai.Completion.extract_text(response))
return results.get("voted_answer"), response["cost"]
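`solve_problem` asks for the final answer inside `\boxed{}` and returns `voted_answer` from `eval_math_responses`, whose body is not part of this diff. A plausible sketch of the two steps involved, assuming the answer is the contents of the last `\boxed{...}` and the vote is a plain majority: `last_boxed` and `vote` are hypothetical helpers, and unlike a real grader they do not treat equivalent forms such as `0.5` and `\frac{1}{2}` as equal.

```python
from collections import Counter
from typing import List, Optional


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, honoring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces


def vote(responses: List[str]) -> Optional[str]:
    """Majority vote over boxed answers; ties go to the answer seen first."""
    answers = [a for a in (last_boxed(r) for r in responses) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


responses = [
    "... so the answer is \\boxed{\\frac{1}{2}}.",
    "The result is \\boxed{0.5}.",
    "Therefore \\boxed{\\frac{1}{2}}",
]
print(vote(responses))  # \frac{1}{2}
```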
def remove_boxed(string: str) -> Optional[str]:

@@ -1,3 +1,4 @@
from flaml.autogen.oai.completion import Completion, ChatCompletion
from flaml.autogen.oai.openai_utils import get_config_list, config_list_gpt4_gpt35, config_list_openai_aoai
__all__ = ["Completion", "ChatCompletion"]
__all__ = ["Completion", "ChatCompletion", "get_config_list", "config_list_gpt4_gpt35", "config_list_openai_aoai"]

@@ -2,10 +2,13 @@ from time import sleep
import logging
import numpy as np
import time
from typing import List, Optional, Dict
from typing import List, Optional, Dict, Callable, Any
import sys
import shutil
from flaml import tune, BlendSearch
from flaml.tune.space import is_constant
from flaml.automl.logger import logger_formatter
from .openai_utils import get_key
try:
import openai
@@ -16,12 +19,15 @@ try:
InvalidRequestError,
APIConnectionError,
Timeout,
AuthenticationError,
)
from openai import Completion as openai_Completion
import diskcache
ERROR = None
except ImportError:
ERROR = ImportError("please install flaml[openai] option to use the flaml.oai subpackage.")
openai_Completion = object
logger = logging.getLogger(__name__)
if not logger.handlers:
# Add the console handler.
@@ -30,23 +36,7 @@ if not logger.handlers:
logger.addHandler(_ch)
def get_key(config):
"""Get a unique identifier of a configuration.
Args:
config (dict or list): A configuration.
Returns:
tuple: A unique identifier which can be used as a key for a dict.
"""
if isinstance(config, dict):
return tuple(get_key(x) for x in sorted(config.items()))
if isinstance(config, list):
return tuple(get_key(x) for x in config)
return config
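`get_key` (now imported from `openai_utils`) turns a config into a tuple that can serve as a cache key, sorting dict items so that key order does not matter. The version above passes each `(key, value)` pair through unchanged, so a dict nested inside a value stays a dict. The sketch below is a variant that also recurses into values, which keeps the result hashable and order-independent at every level; it is an illustration, not the function FLAML ships.

```python
from typing import Any, Hashable


def get_key(config: Any) -> Hashable:
    """Convert a (possibly nested) dict/list config into a hashable tuple."""
    if isinstance(config, dict):
        # Sort by key so {"a": 1, "b": 2} and {"b": 2, "a": 1} give the same key;
        # recurse into values so nested dicts/lists become tuples as well.
        return tuple((k, get_key(v)) for k, v in sorted(config.items()))
    if isinstance(config, list):
        return tuple(get_key(x) for x in config)
    return config


a = {"model": "gpt-4", "messages": [{"role": "user", "content": "hi"}]}
b = {"messages": [{"content": "hi", "role": "user"}], "model": "gpt-4"}
print(get_key(a) == get_key(b))  # True
cache = {get_key(a): "cached response"}
print(cache[get_key(b)])  # cached response
```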
class Completion:
class Completion(openai_Completion):
"""A class for OpenAI completion API.
It also supports: ChatCompletion, Azure OpenAI API.
@@ -93,7 +83,7 @@ class Completion:
),
"temperature_or_top_p": tune.choice(
[
{"temperature": tune.uniform(0, 1)},
{"temperature": tune.uniform(0, 2)},
{"top_p": tune.uniform(0, 1)},
]
),
@@ -115,8 +105,10 @@ class Completion:
_total_cost = 0
optimization_budget = None
_history_dict = _count_create = None
@classmethod
def set_cache(cls, seed=41, cache_path=".cache"):
def set_cache(cls, seed: Optional[int] = 41, cache_path_root: Optional[str] = ".cache"):
"""Set cache path.
Args:
@@ -126,66 +118,134 @@ class Completion:
The complete cache path will be {cache_path_root}/{seed}.
"""
cls.seed = seed
cls.cache_path = f"{cache_path}/{seed}"
cls.cache_path = f"{cache_path_root}/{seed}"
@classmethod
def _get_response(cls, config: dict, eval_only=False, use_cache=True):
def clear_cache(cls, seed: Optional[int] = None, cache_path_root: Optional[str] = ".cache"):
"""Clear cache.
Args:
seed (int, Optional): The integer identifier for the pseudo seed.
If omitted, all caches under cache_path_root will be cleared.
cache_path_root (str, Optional): The root path for the cache.
The complete cache path will be {cache_path_root}/{seed}.
"""
if seed is None:
shutil.rmtree(cache_path_root, ignore_errors=True)
return
with diskcache.Cache(f"{cache_path_root}/{seed}") as cache:
cache.clear()
@classmethod
def _book_keeping(cls, config: Dict, response):
"""Book keeping for the created completions."""
if response != -1 and "cost" not in response:
response["cost"] = cls.cost(response)
if cls._history_dict is None:
return
if cls._history_compact:
value = {
"created_at": [],
"cost": [],
}
if "messages" in config:
messages = config["messages"]
if len(messages) > 1 and messages[-1]["role"] != "assistant":
existing_key = get_key(messages[:-1])
value = cls._history_dict.pop(existing_key, value)
key = get_key(messages + [choice["message"] for choice in response["choices"]])
else:
key = get_key([config["prompt"]] + [choice.get("text") for choice in response["choices"]])
value["created_at"].append(cls._count_create)
value["cost"].append(response["cost"])
cls._history_dict[key] = value
cls._count_create += 1
return
cls._history_dict[cls._count_create] = {
"request": config,
"response": response.to_dict_recursive(),
}
cls._count_create += 1
@classmethod
def _get_response(cls, config: Dict, raise_error=False, use_cache=True):
"""Get the response from the openai api call.
Try cache first. If not found, call the openai api. If the api call fails, retry after retry_time.
"""
config = config.copy()
openai.api_key_path = config.pop("api_key_path", openai.api_key_path)
key = get_key(config)
if use_cache:
response = cls._cache.get(key, None)
if response is not None and (response != -1 or not eval_only):
if response is not None and (response != -1 or not raise_error):
# print("using cached response")
cls._book_keeping(config, response)
return response
openai_completion = openai.ChatCompletion if config["model"] in cls.chat_models else openai.Completion
openai_completion = (
openai.ChatCompletion
if config["model"] in cls.chat_models or issubclass(cls, ChatCompletion)
else openai.Completion
)
start_time = time.time()
request_timeout = cls.request_timeout
while True:
try:
response = openai_completion.create(request_timeout=request_timeout, **config)
cls._cache.set(key, response)
return response
if "request_timeout" in config:
response = openai_completion.create(**config)
else:
response = openai_completion.create(request_timeout=request_timeout, **config)
except (
ServiceUnavailableError,
APIError,
APIConnectionError,
):
# transient error
logger.warning(f"retrying in {cls.retry_time} seconds...", exc_info=1)
logger.info(f"retrying in {cls.retry_time} seconds...", exc_info=1)
sleep(cls.retry_time)
except (RateLimitError, Timeout) as e:
except APIError as err:
error_code = err and err.json_body and err.json_body.get("error")
error_code = error_code and error_code.get("code")
if error_code == "content_filter":
raise
# transient error
logger.info(f"retrying in {cls.retry_time} seconds...", exc_info=1)
sleep(cls.retry_time)
except (RateLimitError, Timeout) as err:
time_left = cls.retry_timeout - (time.time() - start_time + cls.retry_time)
if (
time_left > 0
and isinstance(e, RateLimitError)
and isinstance(err, RateLimitError)
or time_left > request_timeout
and isinstance(e, Timeout)
and isinstance(err, Timeout)
):
logger.info(f"retrying in {cls.retry_time} seconds...", exc_info=1)
elif eval_only:
elif raise_error:
raise
else:
break
if isinstance(e, Timeout):
response = -1
if use_cache and isinstance(err, Timeout):
cls._cache.set(key, response)
logger.warning(
f"Failed to get response from openai api due to getting RateLimitError or Timeout for {cls.retry_timeout} seconds."
)
return response
if isinstance(err, Timeout):
if "request_timeout" in config:
raise
request_timeout <<= 1
request_timeout = min(request_timeout, time_left)
sleep(cls.retry_time)
except InvalidRequestError:
if "azure" == openai.api_type and "model" in config:
if "azure" == config.get("api_type", openai.api_type) and "model" in config:
# azure api uses "engine" instead of "model"
config = config.copy()
config["engine"] = config.pop("model").replace("gpt-3.5-turbo", "gpt-35-turbo")
else:
raise
logger.warning(
f"Failed to get response from openai api due to getting RateLimitError or Timeout for {cls.retry_timeout} seconds."
)
response = -1
cls._cache.set(key, response)
return response
else:
if use_cache:
cls._cache.set(key, response)
cls._book_keeping(config, response)
return response
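On a `Timeout` (when the caller did not pass its own `request_timeout`), `_get_response` doubles the per-request timeout and caps it at the remaining `retry_timeout` budget, and it stops retrying once that budget no longer exceeds the current timeout. The simulation below assumes every request times out and uses a virtual clock, so it shows the resulting schedule without waiting; `timeout_schedule` is a hypothetical helper.

```python
from typing import List


def timeout_schedule(request_timeout: float, retry_timeout: float, retry_time: float) -> List[float]:
    """Per-attempt timeouts when every request times out (virtual clock, no real waits)."""
    elapsed = 0.0
    attempts = []
    while True:
        attempts.append(request_timeout)
        elapsed += request_timeout  # the request waited until it timed out
        time_left = retry_timeout - (elapsed + retry_time)
        if time_left <= request_timeout:
            return attempts  # the budget is too small for another attempt
        # Double the timeout, but never beyond the remaining budget.
        request_timeout = min(request_timeout * 2, time_left)
        elapsed += retry_time  # sleep(retry_time) before the next attempt


print(timeout_schedule(20, 300, 10))  # [20, 40, 80, 130.0]
```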
@classmethod
def _get_max_valid_n(cls, key, max_tokens):
@@ -208,6 +268,7 @@ class Completion:
@classmethod
def _get_region_key(cls, config):
# get a key for the valid/invalid region corresponding to the given config
config = cls._pop_subspace(config, always_copy=False)
return (
config["model"],
config.get("prompt", config.get("messages")),
@@ -224,31 +285,28 @@ class Completion:
invalid_n[max_tokens] = min(num_completions, invalid_n.get(max_tokens, np.inf))
@classmethod
def _pop_subspace(cls, config):
def _pop_subspace(cls, config, always_copy=True):
if "subspace" in config:
config = config.copy()
config.update(config.pop("subspace"))
return config
return config.copy() if always_copy else config
@classmethod
def _get_prompt_messages_from_config(cls, model, config):
prompt, messages = None, None
if model in cls.chat_models:
# either "prompt" should be in config (for being compatible with non-chat models)
# or "messages" should be in config (for tuning chat models only)
prompt = config.get("prompt")
messages = config.get("messages")
# either prompt or messages should be in config, but not both
assert (prompt is None) != (
messages is None
), "Either prompt or messages should be in config for chat models."
if prompt is None:
messages = cls._messages[messages]
else:
prompt = cls._prompts[prompt]
def _get_params_for_create(cls, config: Dict) -> Dict:
"""Get the params for the openai api call from a config in the search space."""
params = cls._pop_subspace(config)
if cls._prompts:
params["prompt"] = cls._prompts[config["prompt"]]
else:
prompt = cls._prompts[config["prompt"]]
return prompt, messages
params["messages"] = cls._messages[config["messages"]]
if "stop" in params:
params["stop"] = cls._stops and cls._stops[params["stop"]]
temperature_or_top_p = params.pop("temperature_or_top_p", None)
if temperature_or_top_p:
params.update(temperature_or_top_p)
if cls._config_list and "config_list" not in params:
params["config_list"] = cls._config_list
return params
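`_get_params_for_create` turns a sampled search-space config into arguments for the API call: it merges the model `subspace`, maps the tuned `prompt`/`messages`/`stop` indices back to the candidate values, and flattens the exclusive `temperature_or_top_p` choice. Below is a standalone sketch with the candidate lists passed in explicitly (instead of the class attributes `_prompts`, `_messages`, `_stops`) and without the `config_list` handling; `params_for_create` is a hypothetical name.

```python
from typing import Dict, List, Optional


def params_for_create(
    config: Dict,
    prompts: Optional[List[str]] = None,
    messages: Optional[List[List[Dict]]] = None,
    stops: Optional[List[List[str]]] = None,
) -> Dict:
    """Turn a sampled search-space config into keyword arguments for an API call."""
    params = dict(config)
    # Merge the model-specific subspace (if any) into the top level.
    params.update(params.pop("subspace", {}))
    # Prompts/messages/stops are tuned as indices into lists of candidates.
    if prompts:
        params["prompt"] = prompts[params["prompt"]]
    else:
        params["messages"] = messages[params["messages"]]
    if "stop" in params:
        params["stop"] = stops and stops[params["stop"]]
    # temperature and top_p are sampled as one mutually exclusive choice.
    params.update(params.pop("temperature_or_top_p", None) or {})
    return params


config = {"model": "gpt-3.5-turbo", "prompt": 1, "stop": 0, "n": 3,
          "temperature_or_top_p": {"temperature": 0.7}}
params = params_for_create(config, prompts=["{q}", "Answer briefly: {q}"], stops=[["\n\n"]])
print(params["prompt"], params["temperature"])  # Answer briefly: {q} 0.7
```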
@classmethod
def _eval(cls, config: dict, prune=True, eval_only=False):
@@ -257,7 +315,8 @@ class Completion:
Args:
config (dict): Hyperparameter setting for the openai api call.
prune (bool, optional): Whether to enable pruning. Defaults to True.
eval_only (bool, optional): Whether to evaluate only (ignore the inference budget and no timeout).
eval_only (bool, optional): Whether to evaluate only
(ignore the inference budget and raise an error when a request fails).
Defaults to False.
Returns:
@@ -265,18 +324,18 @@ class Completion:
"""
cost = 0
data = cls.data
config = cls._pop_subspace(config)
model = config["model"]
params = cls._get_params_for_create(config)
model = params["model"]
data_length = len(data)
price = cls.price1K.get(model)
price_input, price_output = price if isinstance(price, tuple) else (price, price)
inference_budget = getattr(cls, "inference_budget", None)
prune_hp = getattr(cls, "_prune_hp", "n")
metric = cls._metric
config_n = config.get(prune_hp, 1) # default value in OpenAI is 1
max_tokens = config.get("max_tokens", np.inf if model in cls.chat_models else 16)
prompt, messages = cls._get_prompt_messages_from_config(model, config)
stop = cls._stops and cls._stops[config["stop"]]
config_n = params.get(prune_hp, 1) # default value in OpenAI is 1
max_tokens = params.get(
"max_tokens", np.inf if model in cls.chat_models or issubclass(cls, ChatCompletion) else 16
)
target_output_tokens = None
if not cls.avg_input_tokens:
input_tokens = [None] * data_length
@@ -307,12 +366,6 @@ class Completion:
else:
start_n = config_n
region_key = None
params = config.copy()
if "stop" in config:
params["stop"] = stop
temperature_or_top_p = params.pop("temperature_or_top_p", None)
if temperature_or_top_p:
params.update(temperature_or_top_p)
num_completions, previous_num_completions = start_n, 0
n_tokens_list, result, responses_list = [], {}, []
while True: # n <= config_n
@@ -325,9 +378,9 @@ class Completion:
for i in range(prev_data_limit, data_limit):
logger.debug(f"num_completions={num_completions}, data instance={i}")
data_i = data[i]
params = cls._construct_params(data_i, params, prompt, messages)
response = cls._get_response(params, eval_only)
if response == -1: # rate limit error, treat as invalid
# params = cls._construct_params(data_i, params, prompt, messages)
response = cls.create(data_i, raise_error=eval_only, **params)
if response == -1: # rate limit/timeout error, treat as invalid
cls._update_invalid_n(prune, region_key, max_tokens, num_completions)
result[metric] = 0
result["cost"] = cost
@@ -340,7 +393,7 @@ class Completion:
if not cls.avg_input_tokens and not input_tokens[i]:
# store the # input tokens
input_tokens[i] = n_input_tokens
query_cost = (price_input * n_input_tokens + price_output * n_output_tokens) / 1000
query_cost = response["cost"]
cls._total_cost += query_cost
cost += query_cost
if cls.optimization_budget and cls._total_cost >= cls.optimization_budget and not eval_only:
@@ -431,15 +484,15 @@ class Completion:
@classmethod
def tune(
cls,
data,
metric,
mode,
eval_func,
log_file_name=None,
inference_budget=None,
optimization_budget=None,
num_samples=1,
logging_level=logging.WARNING,
data: List[Dict],
metric: str,
mode: str,
eval_func: Callable,
log_file_name: Optional[str] = None,
inference_budget: Optional[float] = None,
optimization_budget: Optional[float] = None,
num_samples: Optional[int] = 1,
logging_level: Optional[int] = logging.WARNING,
**config,
):
"""Tune the parameters for the OpenAI API call.
@@ -539,6 +592,11 @@ class Completion:
if not (isinstance(cls._stops, list) and isinstance(cls._stops[0], list)):
cls._stops = [cls._stops]
space["stop"] = tune.choice(list(range(len(cls._stops))))
cls._config_list = space.get("config_list")
if cls._config_list is not None:
is_const = is_constant(cls._config_list)
if is_const:
space.pop("config_list")
cls._metric, cls._mode = metric, mode
cls._total_cost = 0 # total optimization cost
cls._eval_func = eval_func
@@ -605,31 +663,74 @@ class Completion:
verbose=3,
)
config = analysis.best_config
params = cls._pop_subspace(config)
if cls._prompts:
params["prompt"] = cls._prompts[config["prompt"]]
else:
params["messages"] = cls._messages[config["messages"]]
stop = cls._stops and cls._stops[config["stop"]]
params["stop"] = stop
temperature_or_top_p = params.pop("temperature_or_top_p", None)
if temperature_or_top_p:
params.update(temperature_or_top_p)
params = cls._get_params_for_create(config)
if cls._config_list is not None and is_const:
params.pop("config_list")
logger.setLevel(old_level)
return params, analysis
@classmethod
def create(cls, context: Optional[Dict] = None, use_cache: Optional[bool] = True, **config):
def create(
cls,
context: Optional[Dict] = None,
use_cache: Optional[bool] = True,
config_list: Optional[List[Dict]] = None,
filter_func: Optional[Callable[[Dict, Dict, Dict], bool]] = None,
raise_error: Optional[bool] = True,
**config,
):
"""Make a completion for a given context.
Args:
context (dict, Optional): The context to instantiate the prompt.
It needs to contain keys that are used by the prompt template.
E.g., `prompt="Complete the following sentence: {prefix}"`.
`context={"prefix": "Today I feel"}`.
The actual prompt sent to OpenAI will be:
context (Dict, Optional): The context to instantiate the prompt.
It needs to contain keys that are used by the prompt template or the filter function.
E.g., `prompt="Complete the following sentence: {prefix}"`, `context={"prefix": "Today I feel"}`.
The actual prompt will be:
"Complete the following sentence: Today I feel".
More examples can be found at [templating](/docs/Use-Cases/Auto-Generation#templating).
use_cache (bool, Optional): Whether to use cached responses.
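config_list (List, Optional): List of configurations for the completion to try.
The first one that does not raise an error and passes `filter_func` (if provided) will be used;
if none passes the filter, the response from the last config is returned.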
Only the differences from the default config need to be provided.
E.g.,
```python
response = oai.Completion.create(
config_list=[
{
"model": "gpt-4",
"api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
"api_type": "azure",
"api_base": os.environ.get("AZURE_OPENAI_API_BASE"),
"api_version": "2023-03-15-preview",
},
{
"model": "gpt-3.5-turbo",
"api_key": os.environ.get("OPENAI_API_KEY"),
"api_type": "open_ai",
"api_base": "https://api.openai.com/v1",
},
{
"model": "llama-7B",
"api_base": "http://127.0.0.1:8080",
"api_type": "open_ai",
}
],
prompt="Hi",
)
```
filter_func (Callable, Optional): A function that takes in the context, the config and the response as the keyword arguments `context`, `base_config` and `response`, and returns a boolean to indicate whether the response is valid. E.g.,
```python
def yes_or_no_filter(context, base_config, response):
return context.get("yes_or_no_choice", False) is False or any(
text in ["Yes.", "No."] for text in oai.Completion.extract_text(response)
)
```
raise_error (bool, Optional): Whether to raise error when all configs fail.
When set to False, -1 will be returned when all configs fail.
**config: Configuration for the completion.
Besides the parameters for the openai API call, it can also contain a seed (int) for the cache.
This is useful when implementing "controlled randomness" for the completion.
@@ -640,15 +741,41 @@ class Completion:
"""
if ERROR:
raise ERROR
if config_list:
retry_timeout = cls.retry_timeout
last = len(config_list) - 1
cost = 0
for i, each_config in enumerate(config_list):
base_config = config.copy()
base_config.update(each_config)
try:
cls.retry_timeout = 0 if i < last and filter_func is None else retry_timeout
# retry_timeout = 0 to avoid retrying when no filter is given
response = cls.create(context, use_cache, **base_config)
pass_filter = filter_func is None or filter_func(
context=context, base_config=config, response=response
)
if pass_filter or i == last:
response["cost"] = cost + response["cost"]
response["config_id"] = i
response["pass_filter"] = pass_filter
return response
cost += response["cost"]
except (AuthenticationError, RateLimitError, Timeout):
logger.debug(f"failed with config {i}", exc_info=1)
if i == last:
raise
finally:
cls.retry_timeout = retry_timeout
params = cls._construct_params(context, config)
if not use_cache:
return cls._get_response(params, eval_only=True, use_cache=False)
return cls._get_response(params, raise_error=raise_error, use_cache=False)
seed = cls.seed
if "seed" in params:
cls.set_cache(params.pop("seed"))
with diskcache.Cache(cls.cache_path) as cls._cache:
cls.set_cache(seed)
return cls._get_response(params, eval_only=True)
return cls._get_response(params, raise_error=raise_error)
@classmethod
def _instantiate(cls, template: str, context: Optional[Dict] = None):
@@ -666,7 +793,7 @@ class Completion:
messages = config.get("messages") if messages is None else messages
# either "prompt" should be in config (for being compatible with non-chat models)
# or "messages" should be in config (for tuning chat models only)
if prompt is None and model in cls.chat_models:
if prompt is None and (model in cls.chat_models or issubclass(cls, ChatCompletion)):
if messages is None:
raise ValueError("Either prompt or messages should be in config for chat models.")
if prompt is None:
@@ -681,7 +808,7 @@ class Completion:
if data_instance
else messages
)
elif model in cls.chat_models:
elif model in cls.chat_models or issubclass(cls, ChatCompletion):
# convert prompt to messages
params["messages"] = [
{
@@ -698,18 +825,17 @@ class Completion:
def test(
cls,
data,
config,
eval_func=None,
use_cache=True,
agg_method="avg",
return_responses_and_per_instance_result=False,
logging_level=logging.WARNING,
**config,
):
"""Evaluate the responses created with the config for the OpenAI API call.
Args:
data (list): The list of test data points.
config (dict): Hyperparameter setting for the openai api call.
eval_func (Callable): The evaluation function for responses per data instance.
The function should take a list of responses and a data point as input,
and return a dict of metrics. You need to either provide a valid callable
@@ -755,6 +881,7 @@ class Completion:
return_responses_and_per_instance_result (bool): Whether to also return responses
and per instance results in addition to the aggregated results.
logging_level (optional): logging level. Defaults to logging.WARNING.
**config (dict): parameters passed to the openai api call `create()`.
Returns:
None when no valid eval_func is provided in either test or tune;
@@ -764,13 +891,12 @@ class Completion:
result_agg, responses_list, result_list = {}, [], []
metric_keys = None
cost = 0
model = config["model"]
old_level = logger.getEffectiveLevel()
logger.setLevel(logging_level)
for i, data_i in enumerate(data):
logger.info(f"evaluating data instance {i}")
response = cls.create(data_i, use_cache, **config)
cost += cls.cost(model, response)
cost += response["cost"]
# evaluate the quality of the responses
responses = cls.extract_text(response)
if eval_func is not None:
@@ -829,18 +955,19 @@ class Completion:
return result_agg
@classmethod
def cost(cls, model: str, response: dict):
def cost(cls, response: dict):
"""Compute the cost of an API call.
Args:
model (str): The model name.
response (dict): The response from OpenAI API.
Returns:
The cost in USD.
The cost in USD. 0 if the model is not supported.
"""
model = response["model"]
if model not in cls.price1K:
raise ValueError(f"Unknown model: {model}")
return 0
# raise ValueError(f"Unknown model: {model}")
usage = response["usage"]
n_input_tokens = usage["prompt_tokens"]
n_output_tokens = usage.get("completion_tokens", 0)
@@ -864,6 +991,68 @@ class Completion:
return [choice["text"] for choice in choices]
return [choice["message"].get("content", "") for choice in choices]
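`cost()` now reads the model from the response itself and returns 0 for models missing from `price1K`. Assuming it prices tokens the same way `_eval` does (per 1K tokens, with a tuple entry giving separate input and output prices, and missing `completion_tokens` counted as 0), the calculation looks like the sketch below. The model names and prices are placeholders, not OpenAI's rates.

```python
from typing import Dict, Tuple, Union

# Hypothetical price table in USD per 1K tokens (illustrative numbers only).
PRICE_1K: Dict[str, Union[float, Tuple[float, float]]] = {
    "model-a": 0.002,         # same price for input and output tokens
    "model-b": (0.03, 0.06),  # (input price, output price)
}


def cost(response: Dict) -> float:
    """Cost of one API response; 0 for models missing from the price table."""
    model = response["model"]
    if model not in PRICE_1K:
        return 0
    price = PRICE_1K[model]
    price_input, price_output = price if isinstance(price, tuple) else (price, price)
    usage = response["usage"]
    n_input = usage["prompt_tokens"]
    n_output = usage.get("completion_tokens", 0)
    return (price_input * n_input + price_output * n_output) / 1000


resp = {"model": "model-b", "usage": {"prompt_tokens": 1000, "completion_tokens": 500}}
print(cost(resp))  # 0.06
```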
@classmethod
@property
def logged_history(cls) -> Dict:
"""Return the book keeping dictionary."""
return cls._history_dict
@classmethod
def start_logging(
cls, history_dict: Optional[Dict] = None, compact: Optional[bool] = True, reset_counter: Optional[bool] = True
):
"""Start bookkeeping.
Args:
history_dict (Dict): A dictionary for bookkeeping.
If not provided, a new one will be created.
compact (bool): Whether to keep the history dictionary compact.
Compact history contains one key per conversation, and the value is a dictionary
like:
```python
{
"created_at": [0, 1],
"cost": [0.1, 0.2],
}
```
where "created_at" is the index of API calls indicating the order of all the calls,
and "cost" is the cost of each call. This example shows that the conversation is based
on two API calls. The compact format is useful for condensing the history of a conversation.
If compact is False, the history dictionary will contain all the API calls: the key
is the index of the API call, and the value is a dictionary like:
```python
{
"request": request_dict,
"response": response_dict,
}
```
where request_dict is the request sent to OpenAI API, and response_dict is the response.
For a conversation containing two API calls, the non-compact history dictionary will be like:
```python
{
0: {
"request": request_dict_0,
"response": response_dict_0,
},
1: {
"request": request_dict_1,
"response": response_dict_1,
},
}
```
The first request's messages plus the response is equal to the second request's messages.
For a conversation with many turns, the non-compact history dictionary has a quadratic size
while the compact history dict has a linear size.
reset_counter (bool): whether to reset the counter of the number of API calls.
"""
cls._history_dict = {} if history_dict is None else history_dict
cls._history_compact = compact
cls._count_create = 0 if reset_counter or cls._count_create is None else cls._count_create
@classmethod
def stop_logging(cls):
"""End bookkeeping."""
cls._history_dict = cls._count_create = None
class ChatCompletion(Completion):
"""A class for OpenAI API ChatCompletion."""
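The size claim in the `start_logging` docstring can be checked with a toy model. The sketch below does not call FLAML. `simulate_conversation`, `stored_messages` and the conversation key `"conv-0"` are hypothetical names, and the request/response dicts only mimic the shapes described in the docstring. Each request resends the whole conversation so far, so for n turns the non-compact history stores n² + n messages. The compact entry grows by only one index and one cost per call.

```python
from typing import Dict, List, Tuple


def simulate_conversation(n_turns: int) -> Tuple[Dict[int, Dict], Dict[str, Dict[str, List]]]:
    """Build toy non-compact and compact histories for a single conversation.

    Hypothetical helper: each turn sends the full message list so far plus a new
    user message, then records the assistant reply.
    """
    messages: List[Dict[str, str]] = []
    full: Dict[int, Dict] = {}  # non-compact: API call index -> request/response
    compact: Dict[str, Dict[str, List]] = {}  # compact: one entry per conversation
    for i in range(n_turns):
        messages.append({"role": "user", "content": f"question {i}"})
        request = {"messages": list(messages)}  # snapshot of everything sent in this call
        reply = {"role": "assistant", "content": f"answer {i}"}
        full[i] = {"request": request, "response": {"choices": [{"message": reply}]}}
        messages.append(reply)
        entry = compact.setdefault("conv-0", {"created_at": [], "cost": []})  # hypothetical key
        entry["created_at"].append(i)
        entry["cost"].append(0.0)  # placeholder; real costs come from API usage
    return full, compact


def stored_messages(full: Dict[int, Dict]) -> int:
    """Count messages kept by the non-compact history (request messages plus one reply per call)."""
    return sum(len(call["request"]["messages"]) + 1 for call in full.values())


full, compact = simulate_conversation(3)
print(stored_messages(full))  # 12: request sizes 1 + 3 + 5, plus 3 replies
print(compact["conv-0"]["created_at"])  # [0, 1, 2]
```

Doubling the number of turns roughly quadruples the non-compact count. This is the growth that the default `compact=True` avoids.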

@@ -0,0 +1,142 @@
import os
import json
from typing import List, Optional, Dict
import logging
NON_CACHE_KEY = ["api_key", "api_base", "api_type", "api_version"]
def get_key(config):
"""Get a unique identifier of a configuration.
Args:
config (dict or list): A configuration.
Returns:
tuple: A unique identifier which can be used as a key for a dict.
"""
copied = False
for key in NON_CACHE_KEY:
if key in config:
config, copied = config.copy() if not copied else config, True
config.pop(key)
# if isinstance(config, dict):
# return tuple(get_key(x) for x in sorted(config.items()))
# if isinstance(config, list):
# return tuple(get_key(x) for x in config)
# return config
return json.dumps(config, sort_keys=True)
def get_config_list(
api_keys: List, api_bases: Optional[List] = None, api_type: Optional[str] = None, api_version: Optional[str] = None
) -> List[Dict]:
"""Get a list of configs for openai api calls.
Args:
api_keys (list): The api keys for openai api calls.
api_bases (list, optional): The api bases for openai api calls.
api_type (str, optional): The api type for openai api calls.
api_version (str, optional): The api version for openai api calls.
"""
config_list = []
for i, api_key in enumerate(api_keys):
if not api_key.strip():
continue
config = {"api_key": api_key}
if api_bases:
config["api_base"] = api_bases[i]
if api_type:
config["api_type"] = api_type
if api_version:
config["api_version"] = api_version
config_list.append(config)
return config_list
def config_list_openai_aoai(
key_file_path: Optional[str] = ".",
openai_api_key_file: Optional[str] = "key_openai.txt",
aoai_api_key_file: Optional[str] = "key_aoai.txt",
aoai_api_base_file: Optional[str] = "base_aoai.txt",
) -> List[Dict]:
"""Get a list of configs for openai + azure openai api calls.
Args:
key_file_path (str, optional): The path to the key files.
openai_api_key_file (str, optional): The file name of the openai api key.
aoai_api_key_file (str, optional): The file name of the azure openai api key.
aoai_api_base_file (str, optional): The file name of the azure openai api base.
Returns:
list: A list of configs for openai api calls.
"""
if "OPENAI_API_KEY" not in os.environ:
try:
os.environ["OPENAI_API_KEY"] = open(f"{key_file_path}/{openai_api_key_file}").read().strip()
except FileNotFoundError:
logging.info(
"To use OpenAI API, please set OPENAI_API_KEY in os.environ "
"or create key_openai.txt in the specified path, or specify the api_key in config_list."
)
if "AZURE_OPENAI_API_KEY" not in os.environ:
try:
os.environ["AZURE_OPENAI_API_KEY"] = open(f"{key_file_path}/{aoai_api_key_file}").read().strip()
except FileNotFoundError:
logging.info(
"To use Azure OpenAI API, please set AZURE_OPENAI_API_KEY in os.environ "
"or create key_aoai.txt in the specified path, or specify the api_key in config_list."
)
if "AZURE_OPENAI_API_BASE" not in os.environ:
try:
os.environ["AZURE_OPENAI_API_BASE"] = open(f"{key_file_path}/{aoai_api_base_file}").read().strip()
except FileNotFoundError:
logging.info(
"To use Azure OpenAI API, please set AZURE_OPENAI_API_BASE in os.environ "
"or create base_aoai.txt in the specified path, or specify the api_base in config_list."
)
aoai_config = get_config_list(
# Assuming Azure OpenAI api keys in os.environ["AZURE_OPENAI_API_KEY"], on separate lines
api_keys=os.environ.get("AZURE_OPENAI_API_KEY", "").split("\n"),
# Assuming Azure OpenAI api bases in os.environ["AZURE_OPENAI_API_BASE"], on separate lines
api_bases=os.environ.get("AZURE_OPENAI_API_BASE", "").split("\n"),
api_type="azure",
api_version="2023-03-15-preview", # change if necessary
)
openai_config = get_config_list(
# Assuming OpenAI API_KEY in os.environ["OPENAI_API_KEY"]
api_keys=os.environ.get("OPENAI_API_KEY", "").split("\n"),
# "api_type": "open_ai",
# "api_base": "https://api.openai.com/v1",
)
config_list = openai_config + aoai_config
return config_list
def config_list_gpt4_gpt35(
key_file_path: Optional[str] = ".",
openai_api_key_file: Optional[str] = "key_openai.txt",
aoai_api_key_file: Optional[str] = "key_aoai.txt",
aoai_api_base_file: Optional[str] = "base_aoai.txt",
) -> List[Dict]:
"""Get a list of configs for gpt-4 followed by gpt-3.5 api calls.
Args:
key_file_path (str, optional): The path to the key files.
openai_api_key_file (str, optional): The file name of the openai api key.
aoai_api_key_file (str, optional): The file name of the azure openai api key.
aoai_api_base_file (str, optional): The file name of the azure openai api base.
Returns:
list: A list of configs for openai api calls.
"""
config_list = config_list_openai_aoai(
key_file_path,
openai_api_key_file,
aoai_api_key_file,
aoai_api_base_file,
)
return [{**config, "model": "gpt-4"} for config in config_list] + [
{**config, "model": "gpt-3.5-turbo"} for config in config_list
]
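The helpers above read newline-separated keys and bases (for example from `AZURE_OPENAI_API_KEY`) and expand them into per-model configs. The sketch below restates that logic in plain Python so the edge cases are visible. `build_config_list`, `with_models` and `cache_key` are hypothetical names, not part of `flaml.oai`.

Two details are worth noting:
- Splitting an unset (empty) variable gives `[""]`, and the blank-key check discards it.
- Stripping the `NON_CACHE_KEY` fields lets configs that differ only in credentials share one cache entry.

```python
import json
from typing import Dict, List, Optional, Sequence

# Same field list as NON_CACHE_KEY above: credentials/endpoints are not part of the cache key.
NON_CACHE_KEY = ["api_key", "api_base", "api_type", "api_version"]


def cache_key(config: Dict) -> str:
    """Like get_key: drop non-cache fields, then serialize with sorted keys."""
    return json.dumps({k: v for k, v in config.items() if k not in NON_CACHE_KEY}, sort_keys=True)


def build_config_list(raw_keys: str, raw_bases: Optional[str] = None, api_type: Optional[str] = None) -> List[Dict]:
    """Like get_config_list fed with newline-separated environment variable contents."""
    keys = raw_keys.split("\n")
    bases = raw_bases.split("\n") if raw_bases is not None else None
    configs = []
    for i, key in enumerate(keys):
        if not key.strip():  # "".split("\n") == [""], so an unset variable yields no config
            continue
        config = {"api_key": key}
        if bases:
            config["api_base"] = bases[i]  # bases are matched to keys by line position
        if api_type:
            config["api_type"] = api_type
        configs.append(config)
    return configs


def with_models(configs: List[Dict], models: Sequence[str] = ("gpt-4", "gpt-3.5-turbo")) -> List[Dict]:
    """Like config_list_gpt4_gpt35: every endpoint for the first model, then for the next model."""
    return [{**config, "model": model} for model in models for config in configs]
```

The ordering in `with_models` matters because the configs are tried in list order. Every endpoint gets a chance with the stronger model before any call falls back to the cheaper one.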

@@ -341,6 +341,9 @@ class AutoML(BaseEstimator):
}
}
```
mlflow_logging: boolean, default=True | Whether to log the training results to mlflow.
This requires mlflow to be installed and to have an active mlflow run.
FLAML will create nested runs.
"""
self._track_iter = 0
@@ -390,6 +393,7 @@ class AutoML(BaseEstimator):
settings["fit_kwargs_by_estimator"] = settings.get("fit_kwargs_by_estimator", {})
settings["custom_hp"] = settings.get("custom_hp", {})
settings["skip_transform"] = settings.get("skip_transform", False)
settings["mlflow_logging"] = settings.get("mlflow_logging", True)
self._estimator_type = "classifier" if settings["task"] in CLASSIFICATION else "regressor"
@@ -1213,6 +1217,7 @@ class AutoML(BaseEstimator):
custom_hp=None,
cv_score_agg_func=None,
skip_transform=None,
mlflow_logging=None,
fit_kwargs_by_estimator=None,
**fit_kwargs,
):
@@ -1474,6 +1479,11 @@ class AutoML(BaseEstimator):
```
skip_transform: boolean, default=False | Whether to pre-process data prior to modeling.
mlflow_logging: boolean, default=None | Whether to log the training results to mlflow.
Default value is None, which means the logging decision is made based on
AutoML.__init__'s mlflow_logging argument.
This requires mlflow to be installed and to have an active mlflow run.
FLAML will create nested runs.
fit_kwargs_by_estimator: dict, default=None | The user-specified keyword arguments, grouped by estimator name.
For TransformersEstimator, available fit_kwargs can be found from
[TrainingArgumentsForAuto](nlp/huggingface/training_args).
@@ -1659,6 +1669,7 @@ class AutoML(BaseEstimator):
self._state.fit_kwargs = fit_kwargs
custom_hp = custom_hp or self._settings.get("custom_hp")
self._skip_transform = self._settings.get("skip_transform") if skip_transform is None else skip_transform
self._mlflow_logging = self._settings.get("mlflow_logging") if mlflow_logging is None else mlflow_logging
fit_kwargs_by_estimator = fit_kwargs_by_estimator or self._settings.get("fit_kwargs_by_estimator")
self._state.fit_kwargs_by_estimator = fit_kwargs_by_estimator.copy() # shallow copy of fit_kwargs_by_estimator
self._state.weight_val = sample_weight_val
@@ -2139,7 +2150,7 @@ class AutoML(BaseEstimator):
estimator,
search_state.sample_size,
)
if mlflow is not None and mlflow.active_run():
if self._mlflow_logging and mlflow is not None and mlflow.active_run():
with mlflow.start_run(nested=True):
mlflow.log_metric("iter_counter", self._track_iter)
if (search_state.metric_for_logging is not None) and (
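The new `mlflow_logging` flag is resolved in two steps:
1. `fit()` falls back to the value given to `AutoML.__init__` when its own argument is `None`.
2. A nested run is opened only if the flag is set, mlflow is importable, and a run is active.

A minimal sketch of that decision logic follows. The function names are hypothetical, and the sketch has no mlflow dependency.

```python
from typing import Optional


def resolve_mlflow_logging(init_setting: bool = True, fit_arg: Optional[bool] = None) -> bool:
    """Hypothetical helper mirroring fit(): None means 'use the __init__ setting'."""
    return init_setting if fit_arg is None else fit_arg


def should_log_iteration(mlflow_logging: bool, mlflow_available: bool, has_active_run: bool) -> bool:
    """Hypothetical helper: all three must hold before a nested mlflow run is started."""
    return mlflow_logging and mlflow_available and has_active_run
```

In user code, this corresponds to passing `mlflow_logging=False` either to the `AutoML` constructor or to a single `fit()` call.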

@@ -1135,9 +1135,8 @@ class TransformersEstimator(BaseEstimator):
predictions = new_trainer.predict(test_dataset).predictions
except ZeroDivisionError:
logger.warning("Zero division error appeared in HuggingFace Transformers.")
predictions = np.array([-0.05] * len(test_dataset))
else:
return predictions
predictions = None
return predictions
def score(self, X_val: DataFrame, y_val: Series, **kwargs):
import transformers
@@ -1169,14 +1168,13 @@ class TransformersEstimator(BaseEstimator):
kwargs = {} if self._task not in NLG_TASKS else {"metric_key_prefix": "predict"}
try:
predictions = new_trainer.predict(test_dataset, **kwargs)
predictions = new_trainer.predict(test_dataset, **kwargs).predictions
except ZeroDivisionError:
logger.warning("Zero division error appeared in HuggingFace Transformers.")
predictions = np.array([0] * len(test_dataset))
predictions = None
post_y_pred, _ = postprocess_prediction_and_true(
task=self._task,
y_pred=predictions.predictions,
y_pred=predictions,
tokenizer=self.tokenizer,
hf_args=self._training_args,
X=X,
@@ -2326,10 +2324,7 @@ class HoltWinters(ARIMA):
if self.params["trend"] == "mul" and (train_df.y == 0).sum() > 0:
self.params["trend"] = "add"
if not self.params["seasonal"] or not self.params["trend"] in [
"mul",
"add",
]:
if not self.params["seasonal"] or self.params["trend"] not in ["mul", "add"]:
self.params["damped_trend"] = False
model = HWExponentialSmoothing(

@@ -311,6 +311,8 @@ def tokenize_swag(this_row, tokenizer, hf_args=None, return_column_name=False):
def postprocess_prediction_and_true(task, y_pred, tokenizer, hf_args, y_true=None, X=None):
# postprocess the matrix prediction y_pred and ground truth y_true into user readable format, e.g., for summarization, decode into text
if y_pred is None:
return np.array([0.0] * len(X)), y_true
if task == SEQCLASSIFICATION:
return np.argmax(y_pred, axis=1), y_true
elif task == SEQREGRESSION:
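These hunks change what happens when HuggingFace raises `ZeroDivisionError`. The hard-coded fallback arrays (`[-0.05] * n`, `[0] * n`) are replaced with `None`, and `postprocess_prediction_and_true` turns `None` into a zero vector of the right length. The pattern is sketched below with hypothetical names. The task label string is an assumption, and only the classification branch is reproduced.

```python
from typing import Any, Callable, Optional

import numpy as np

SEQCLASSIFICATION = "seq-classification"  # assumed label; FLAML defines its own constant


def safe_predict(predict_fn: Callable[[Any], np.ndarray], dataset: Any) -> Optional[np.ndarray]:
    """Return raw predictions, or None if the trainer hits a ZeroDivisionError."""
    try:
        return predict_fn(dataset)
    except ZeroDivisionError:
        return None


def postprocess(task: str, y_pred: Optional[np.ndarray], n_rows: int) -> np.ndarray:
    """Map raw model output to labels; a missing prediction becomes a zero vector."""
    if y_pred is None:
        return np.array([0.0] * n_rows)
    if task == SEQCLASSIFICATION:
        return np.argmax(y_pred, axis=1)  # logits -> class index per row
    return np.asarray(y_pred)
```

The failure stays `None` until the last step. As a result, every caller gets an array sized by the input rows, instead of each call site inventing its own placeholder.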

@@ -11,7 +11,7 @@ try:
except (ImportError, AssertionError):
from . import sample
from .searcher.variant_generator import generate_variants
from typing import Dict, Optional, Any, Tuple, Generator
from typing import Dict, Optional, Any, Tuple, Generator, List, Union
import numpy as np
import logging
@@ -27,6 +27,29 @@ def generate_variants_compatible(
return generate_variants(unresolved_spec, constant_grid_search)
def is_constant(space: Union[Dict, List]) -> bool:
"""Whether the search space is all constant.
Returns:
A bool of whether the search space is all constant.
"""
if isinstance(space, dict):
for domain in space.values():
if isinstance(domain, (dict, list)):
if not is_constant(domain):
return False
continue
if isinstance(domain, sample.Domain):
return False
return True
elif isinstance(space, list):
for item in space:
if not is_constant(item):
return False
return True
return not isinstance(space, sample.Domain)
def define_by_run_func(trial, space: Dict, path: str = "") -> Optional[Dict[str, Any]]:
"""Define-by-run function to create the search space.
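`is_constant` walks nested dicts and lists and reports whether any leaf is a sampling domain. The sketch below restates that recursion more compactly with `all()`, which behaves the same on dicts, lists and plain leaves. It uses a stand-in `Domain` class (hypothetical; FLAML checks against `sample.Domain`), so it runs without FLAML installed.

```python
from typing import Any, Dict, List, Union


class Domain:
    """Stand-in for FLAML's sample.Domain in this sketch."""


def is_constant(space: Union[Dict, List, Any]) -> bool:
    """True if no Domain appears anywhere in a nested dict/list search space."""
    if isinstance(space, dict):
        return all(is_constant(value) for value in space.values())
    if isinstance(space, list):
        return all(is_constant(item) for item in space)
    return not isinstance(space, Domain)  # plain values are constants
```

A space such as `{"lr": 0.1, "layers": [32, 64]}` is constant. Any `Domain` nested at any depth makes the whole space non-constant.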

@@ -647,14 +647,10 @@ def run(
time_start = time.time()
try:
FLAML_MAX_CONCURRENT = int(os.getenv("FLAML_MAX_CONCURRENT", 0))
num_executors = max(num_executors, FLAML_MAX_CONCURRENT, 1)
except ValueError:
FLAML_MAX_CONCURRENT = 0
max_spark_parallelism = (
min(spark.sparkContext.defaultParallelism, FLAML_MAX_CONCURRENT)
if FLAML_MAX_CONCURRENT > 0
else spark.sparkContext.defaultParallelism
)
num_executors = max(num_executors, FLAML_MAX_CONCURRENT, 1)
max_spark_parallelism = max(spark.sparkContext.defaultParallelism, FLAML_MAX_CONCURRENT)
if scheduler:
scheduler.set_search_properties(metric=metric, mode=mode)
if isinstance(search_alg, ConcurrencyLimiter):
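Reading the interleaved old and new lines, the updated code appears to do three things:
- Parse `FLAML_MAX_CONCURRENT` defensively.
- Raise `num_executors` to at least that value.
- Take the larger of Spark's default parallelism and the environment value, rather than the smaller.

A pure-Python sketch of that arithmetic follows. The Spark context is replaced by a plain integer argument, and the function name is hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_parallelism(
    num_executors: int, default_parallelism: int, env: Optional[Mapping[str, str]] = None
) -> Tuple[int, int]:
    """Return (num_executors, max_spark_parallelism) as the updated diff lines appear to compute them."""
    env = os.environ if env is None else env
    try:
        max_concurrent = int(env.get("FLAML_MAX_CONCURRENT", 0))
    except ValueError:  # e.g. FLAML_MAX_CONCURRENT="abc"
        max_concurrent = 0
    executors = max(num_executors, max_concurrent, 1)
    max_spark_parallelism = max(default_parallelism, max_concurrent)
    return executors, max_spark_parallelism
```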

@@ -1 +1 @@
__version__ = "1.2.1"
__version__ = "1.2.4"

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -21,9 +21,9 @@
"\n",
"## Requirements\n",
"\n",
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [openai] option:\n",
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [autogen,blendsearch] option:\n",
"```bash\n",
"pip install flaml[openai]==1.2.0\n",
"pip install flaml[autogen,blendsearch]==1.2.2\n",
"```"
]
},
@@ -40,7 +40,7 @@
},
"outputs": [],
"source": [
"# %pip install flaml[openai]==1.2.0 datasets"
"# %pip install flaml[autogen,blendsearch]==1.2.2 datasets"
]
},
{
@@ -297,7 +297,13 @@
"from functools import partial\n",
"from flaml.autogen.code_utils import eval_function_completions, generate_assertions\n",
"\n",
"eval_with_generated_assertions = partial(eval_function_completions, assertions=generate_assertions)"
"eval_with_generated_assertions = partial(\n",
" eval_function_completions,\n",
" assertions=generate_assertions,\n",
" use_docker=False,\n",
" # Please set use_docker=True if you have docker available to run the generated code.\n",
" # Using docker is safer than running the generated code directly.\n",
")\n"
]
},
{
@@ -751,7 +757,7 @@
}
],
"source": [
"# result = oai.Completion.test(test_data, config)\n",
"# result = oai.Completion.test(test_data, **config)\n",
"# print(\"performance on test data with the tuned config:\", result)"
]
},

File diff suppressed because one or more lines are too long

@@ -19,9 +19,9 @@
"\n",
"## Requirements\n",
"\n",
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [openai] option:\n",
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [autogen] option:\n",
"```bash\n",
"pip install flaml[openai]==1.2.0\n",
"pip install flaml[autogen]==1.2.2\n",
"```"
]
},
@@ -38,7 +38,7 @@
},
"outputs": [],
"source": [
"# %pip install flaml[openai]==1.2.0 datasets"
"# %pip install flaml[autogen]==1.2.2 datasets"
]
},
{
@@ -381,7 +381,7 @@
"success = 0\n",
"for i, d in enumerate(data):\n",
" response, cost_i, j = implement(d[\"definition\"], configs)\n",
" metrics = eval_function_completions(responses=[response], **d)\n",
" metrics = eval_function_completions(responses=[response], use_docker=False, **d)\n",
" success += metrics[\"success\"]\n",
" cost += cost_i\n",
" print(f\"Example {i}, config {j}, success {success}\")\n",

@@ -21,7 +21,7 @@
"\n",
"FLAML requires `Python>=3.7`. To run this notebook example, please install flaml with the [openai] option:\n",
"```bash\n",
"pip install flaml[openai]==1.2.0\n",
"pip install flaml[openai]==1.2.2\n",
"```"
]
},
@@ -38,7 +38,7 @@
},
"outputs": [],
"source": [
"# %pip install flaml[openai]==1.2.0 datasets"
"# %pip install flaml[openai]==1.2.2 datasets"
]
},
{
@@ -313,7 +313,7 @@
"source": [
"### Evaluate the success rate on the test data\n",
"\n",
"You can use flaml's `oai.ChatCompletion.test` to evaluate the performance of an entire dataset with the tuned config."
"You can use flaml's `oai.ChatCompletion.test` to evaluate the performance of an entire dataset with a config."
]
},
{
@@ -325,7 +325,7 @@
"import logging\n",
"\n",
"config_n1 = {\"model\": 'gpt-4', \"prompt\": prompt, \"max_tokens\": 600, \"n\": 1}\n",
"n1_result = oai.ChatCompletion.test(test_data[:50], config_n1, eval_math_responses)\n",
"n1_result = oai.ChatCompletion.test(test_data[:50], eval_math_responses, **config_n1)\n",
"print(n1_result)"
]
},
@@ -336,8 +336,8 @@
"outputs": [],
"source": [
"oai.ChatCompletion.request_timeout = 120\n",
"config_n10 = {\"model\": 'gpt-4', \"prompt\": prompts[0], \"max_tokens\": 600, \"n\": 10}\n",
"n10_result = oai.ChatCompletion.test(test_data[:50], config_n10, eval_math_responses, logging_level=logging.INFO)\n",
"config_n10 = {\"model\": 'gpt-4', \"prompt\": prompt, \"max_tokens\": 600, \"n\": 10}\n",
"n10_result = oai.ChatCompletion.test(test_data[:50], eval_math_responses, logging_level=logging.INFO, **config_n10)\n",
"print(n10_result)"
]
},
@@ -347,8 +347,8 @@
"metadata": {},
"outputs": [],
"source": [
"config_n30 = {\"model\": 'gpt-4', \"prompt\": prompts[0], \"max_tokens\": 600, \"n\": 30}\n",
"n30_result = oai.ChatCompletion.test(test_data[:50], config_n30, eval_math_responses, logging_level=logging.INFO)\n",
"config_n30 = {\"model\": 'gpt-4', \"prompt\": prompt, \"max_tokens\": 600, \"n\": 30}\n",
"n30_result = oai.ChatCompletion.test(test_data[:50], eval_math_responses, logging_level=logging.INFO, **config_n30)\n",
"print(n30_result)"
]
},
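The notebook cells above move `config` from a positional argument to keyword expansion: `test(test_data, config, eval_func)` becomes `test(test_data, eval_func, **config)`. With the new order, a config dict passed positionally would land in the `eval_func` slot.

The toy function below is a hypothetical stand-in, not `flaml.oai`. It mirrors that call shape and the three `agg_method` forms exercised in `test_math`:
- a string such as `"median"`,
- a callable,
- a dict mapping metric names to callables.

The default name `"avg"` and the fake "model output" are assumptions.

```python
import statistics
from typing import Callable, Dict, List, Union

Metric = Dict[str, float]
AggFn = Callable[[List[float]], float]


def toy_test(
    data: List[Dict],
    eval_func: Callable[..., Metric],
    agg_method: Union[str, AggFn, Dict[str, AggFn]] = "avg",
    **config,
) -> Metric:
    """Hypothetical stand-in for Completion.test(data, eval_func, **config)."""
    per_instance = []
    for instance in data:
        responses = [config["prompt"].format(**instance)]  # fake "model output", no API call
        per_instance.append(eval_func(responses, **instance))
    result = {}
    for key in per_instance[0]:
        values = [metrics[key] for metrics in per_instance]
        if isinstance(agg_method, dict):
            agg = agg_method[key]  # one aggregator per metric name
        elif callable(agg_method):
            agg = agg_method
        elif agg_method == "median":
            agg = statistics.median
        else:
            agg = statistics.mean  # assumed default ("avg")
        result[key] = agg(values)
    return result
```

For example, with `config = {"prompt": "{x}"}` and an exact-match `eval_func`, the mean, median and custom aggregators can all be applied to the same per-instance `success` values.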

pyproject.toml

@@ -0,0 +1,51 @@
[metadata]
license_file = "LICENSE"
description-file = "README.md"
[tool.pytest.ini_options]
addopts = '-m "not conda"'
markers = [
"conda: test related to conda forge distribution"
]
[tool.black]
# https://github.com/psf/black
line-length = 120
exclude = "(.eggs|.git|.hg|.mypy_cache|.venv|_build|buck-out|build|dist)"
[tool.ruff]
line-length = 120
# Enable Pyflakes `E` and `F` codes by default.
select = [
"E", "W", # see: https://pypi.org/project/pycodestyle
"F", # see: https://pypi.org/project/pyflakes
# "D", # see: https://pypi.org/project/pydocstyle
# "N", # see: https://pypi.org/project/pep8-naming
# "S", # see: https://pypi.org/project/flake8-bandit
]
ignore = [
"E501",
"F401",
"F403",
"C901",
]
# Exclude a variety of commonly ignored directories.
exclude = [
".eggs",
".git",
".mypy_cache",
".ruff_cache",
"__pypackages__",
"_build",
"build",
"dist",
"docs"
]
ignore-init-module-imports = true
unfixable = ["F401"]
[tool.ruff.mccabe]
# Unlike Flake8, default to a complexity level of 10.
max-complexity = 10

@@ -1,4 +0,0 @@
[pytest]
addopts = -m "not conda"
markers =
conda: test related to conda forge distribution

@@ -49,14 +49,13 @@ setuptools.setup(
"joblibspark>=0.5.0",
],
"test": [
"flake8>=3.8.4",
"thop",
"pytest>=6.1.1",
"coverage>=5.3",
"pre-commit",
"torch",
"torchvision",
"catboost>=0.26",
"catboost>=0.26,<1.2",
"rgf-python",
"optuna==2.8.0",
"openml==0.10.2",
@@ -77,6 +76,7 @@ setuptools.setup(
"nbformat",
"ipykernel",
"pytorch-lightning<1.9.1", # test_forecast_panel
"requests<2.29.0", # https://github.com/docker/docker-py/issues/3113
],
"catboost": ["catboost>=0.26"],
"blendsearch": ["optuna==2.8.0"],
@@ -120,7 +120,8 @@ setuptools.setup(
"pytorch-forecasting>=0.9.0",
],
"benchmark": ["catboost>=0.26", "psutil==5.8.0", "xgboost==1.3.3"],
"openai": ["openai==0.27.4", "diskcache", "optuna==2.8.0"],
"openai": ["openai==0.27.4", "diskcache"],
"autogen": ["openai==0.27.4", "diskcache", "docker"],
"synapse": ["joblibspark>=0.5.0", "optuna==2.8.0", "pyspark>=3.2.0"],
},
classifiers=[

@@ -0,0 +1,77 @@
"""Solve a non-symmetric TSP problem.
Triangular inequality is not required in this problem.
"""
import math
import pdb
import random
import sys
from itertools import combinations, permutations
def solve_tsp(dists: dict) -> float:
"""Solve the TSP problem
Args:
dists (dict): the distance matrix between each pair of nodes. Each item in the
dict maps a pair (node A, node B) to the distance from A to B.
Returns:
float: the optimal cost
"""
# Get the unique nodes from the distance matrix
nodes = set()
for pair in dists.keys():
nodes.add(pair[0])
nodes.add(pair[1])
# Generate all possible routes (permutations of nodes)
routes = permutations(nodes)
# Initialize the optimal cost as infinite
optimal_cost = float("inf")
optimal_route = None
# Iterate through all possible routes
for route in routes:
cost = 0
# Calculate the cost of the current route
for i in range(len(route)):
current_node = route[i]
next_node = route[(i + 1) % len(route)]
cost += dists[(current_node, next_node)]
# Update the optimal cost if the current cost is smaller
if cost < optimal_cost:
optimal_cost = cost
optimal_route = route
print("Cost:", optimal_cost, "with route", optimal_route)
return optimal_cost
def tsp_data(n: int, seed: int = 2022) -> dict:
"""Generate some sample data for the non-symmetric TSP problem.
Args:
n (int): number of nodes in the problem
seed (int): the random seed.
Returns:
dict: the pairwise distance matrix.
"""
# Initialize the random seed
random.seed(seed)
# Initialize the distance matrix
dist_matrix = {}
# Generate distances for each pair of nodes
for i in range(n):
for j in range(n):
if i != j:
# Generate a random distance between nodes i and j
distance = round(random.uniform(1, 100), 2)
dist_matrix[(i, j)] = distance
return dist_matrix
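`solve_tsp` enumerates all n! orderings. A tour's cost does not depend on which node it starts from, though, so fixing the first node leaves (n-1)! candidates with the same optimum. The direction still matters because the problem is asymmetric. A minimal sketch of that reduction follows, using hypothetical helper names and regenerating data the same way `tsp_data` does.

```python
import random
from itertools import permutations
from typing import Dict, Sequence, Tuple

Dists = Dict[Tuple[int, int], float]


def make_dists(n: int, seed: int = 2022) -> Dists:
    """Same generator as tsp_data above: random asymmetric distances in [1, 100]."""
    random.seed(seed)
    return {(i, j): round(random.uniform(1, 100), 2) for i in range(n) for j in range(n) if i != j}


def tour_cost(dists: Dists, route: Sequence[int]) -> float:
    """Cost of visiting route in order and returning to the start."""
    return sum(dists[(route[k], route[(k + 1) % len(route)])] for k in range(len(route)))


def solve_all_orders(dists: Dists) -> float:
    """Brute force over every permutation, as solve_tsp does."""
    nodes = sorted({node for pair in dists for node in pair})
    return min(tour_cost(dists, route) for route in permutations(nodes))


def solve_fixed_start(dists: Dists) -> float:
    """Brute force with the first node pinned; rotations of a cycle share one cost."""
    nodes = sorted({node for pair in dists for node in pair})
    start, rest = nodes[0], nodes[1:]
    return min(tour_cost(dists, (start, *perm)) for perm in permutations(rest))
```

For n = 10, this checks 362,880 tours instead of 3,628,800. The search is still exponential, so it only suits small instances.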

@@ -0,0 +1,35 @@
from .tsp import tsp_data
def change_dist(dist: dict, i: int, j: int, new_cost: float) -> float:
"""Change the distance between two points.
Args:
dist (dict): distance matrix, where the key is a pair and value is
the cost (aka, distance).
i (int): the source node
j (int): the destination node
new_cost (float): the new cost for the distance
Returns:
float: the previous cost
"""
prev_cost = dist[i, j]
dist[i, j] = new_cost
return prev_cost
def compare_costs(prev_cost, new_cost) -> float:
"""Compare the previous cost and the new cost.
Args:
prev_cost (float): the previous cost
new_cost (float): the updated cost
Returns:
float: the relative change from prev_cost to new_cost, i.e. (new_cost - prev_cost) / prev_cost
"""
return (new_cost - prev_cost) / prev_cost
dists = tsp_data(5, seed=1)

@@ -0,0 +1,452 @@
import datasets
import sys
import numpy as np
import pytest
from functools import partial
import os
import json
from flaml import oai
from flaml.autogen.code_utils import (
eval_function_completions,
generate_assertions,
implement,
generate_code,
extract_code,
improve_function,
improve_code,
execute_code,
)
from flaml.autogen.math_utils import eval_math_responses, solve_problem
KEY_LOC = "test/autogen"
here = os.path.abspath(os.path.dirname(__file__))
def yes_or_no_filter(context, response, **_):
return context.get("yes_or_no_choice", False) is False or any(
text in ["Yes.", "No."] for text in oai.Completion.extract_text(response)
)
def valid_json_filter(response, **_):
for text in oai.Completion.extract_text(response):
try:
json.loads(text)
return True
except ValueError:
pass
return False
def test_filter():
try:
import openai
except ImportError as exc:
print(exc)
return
response = oai.Completion.create(
context={"yes_or_no_choice": True},
config_list=[{"model": "text-ada-001"}, {"model": "gpt-3.5-turbo"}, {"model": "text-davinci-003"}],
prompt="Is 37 a prime number? Please answer 'Yes.' or 'No.'",
filter_func=yes_or_no_filter,
)
assert oai.Completion.extract_text(response)[0] in ["Yes.", "No."]
response = oai.Completion.create(
context={"yes_or_no_choice": False},
config_list=[{"model": "text-ada-001"}, {"model": "gpt-3.5-turbo"}, {"model": "text-davinci-003"}],
prompt="Is 37 a prime number?",
filter_func=yes_or_no_filter,
)
assert response["model"] == "text-ada-001"
response = oai.Completion.create(
config_list=[{"model": "text-ada-001"}, {"model": "gpt-3.5-turbo"}, {"model": "text-davinci-003"}],
prompt="How to construct a json request to Bing API to search for 'latest AI news'? Return the JSON request.",
filter_func=valid_json_filter,
)
assert response["config_id"] == 2 or response["pass_filter"], "the response must pass filter unless all fail"
assert not response["pass_filter"] or json.loads(oai.Completion.extract_text(response)[0])
def test_chatcompletion():
params = oai.ChatCompletion._construct_params(
data_instance=None,
config={"model": "unknown"},
prompt="hi",
)
assert "messages" in params
params = oai.Completion._construct_params(
data_instance=None,
config={"model": "unknown"},
prompt="hi",
)
assert "messages" not in params
params = oai.Completion._construct_params(
data_instance=None,
config={"model": "gpt-4"},
prompt="hi",
)
assert "messages" in params
def test_multi_model():
try:
import openai
except ImportError as exc:
print(exc)
return
response = oai.Completion.create(
config_list=oai.config_list_gpt4_gpt35(KEY_LOC),
prompt="Hi",
)
print(response)
@pytest.mark.skipif(
sys.platform in ["darwin", "win32"],
reason="do not run on MacOS or windows",
)
def test_execute_code():
try:
import docker
except ImportError as exc:
print(exc)
return
exitcode, msg = execute_code("print('hello world')", filename="tmp/codetest.py")
assert exitcode == 0 and msg == b"hello world\n", msg
# read a file
print(execute_code("with open('tmp/codetest.py', 'r') as f: a=f.read()"))
# create a file
print(execute_code("with open('tmp/codetest.py', 'w') as f: f.write('b=1')", work_dir=f"{here}/my_tmp"))
# execute code in a file
print(execute_code(filename="tmp/codetest.py"))
# execute code for assertion error
exit_code, msg = execute_code("assert 1==2")
assert exit_code, msg
# execute code which takes a long time
exit_code, error = execute_code("import time; time.sleep(2)", timeout=1)
assert exit_code and error == "Timeout"
exit_code, error = execute_code("import time; time.sleep(2)", timeout=1, use_docker=False)
assert exit_code and error == "Timeout"
def test_improve():
try:
import openai
import diskcache
except ImportError as exc:
print(exc)
return
config_list = oai.config_list_openai_aoai(KEY_LOC)
improved, _ = improve_function(
"flaml/autogen/math_utils.py",
"solve_problem",
"Solve math problems accurately, by avoiding calculation errors and reduce reasoning errors.",
config_list=config_list,
)
with open(f"{here}/math_utils.py.improved", "w") as f:
f.write(improved)
suggestion, _ = improve_code(
["flaml/autogen/code_utils.py", "flaml/autogen/math_utils.py"],
"leverage generative AI smartly and cost-effectively",
config_list=config_list,
)
print(suggestion)
improvement, cost = improve_code(
["flaml/autogen/code_utils.py", "flaml/autogen/math_utils.py"],
"leverage generative AI smartly and cost-effectively",
suggest_only=False,
config_list=config_list,
)
print(cost)
with open(f"{here}/suggested_improvement.txt", "w") as f:
f.write(improvement)
def test_nocontext():
try:
import openai
import diskcache
except ImportError as exc:
print(exc)
return
response = oai.Completion.create(
model="text-ada-001", prompt="1+1=", max_tokens=1, use_cache=False, request_timeout=10
)
print(response)
code, _ = generate_code(
model="gpt-3.5-turbo",
messages=[
{
"role": "system",
"content": "You want to become a better assistant by learning new skills and improving your existing ones.",
},
{
"role": "user",
"content": "Write reusable code to use web scraping to get information from websites.",
},
],
)
print(code)
# test extract_code from markdown
code, _ = extract_code(
"""
Example:
```
print("hello extract code")
```
"""
)
print(code)
code, _ = extract_code(
"""
Example:
```python
def scrape(url):
import requests
from bs4 import BeautifulSoup
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
title = soup.find("title").text
text = soup.find("div", {"id": "bodyContent"}).text
return title, text
```
Test:
```python
url = "https://en.wikipedia.org/wiki/Web_scraping"
title, text = scrape(url)
print(f"Title: {title}")
print(f"Text: {text}")
"""
)
print(code)
solution, cost = solve_problem("1+1=", config_list=oai.config_list_gpt4_gpt35(KEY_LOC))
print(solution, cost)
@pytest.mark.skipif(
sys.platform == "win32",
reason="do not run on windows",
)
def test_humaneval(num_samples=1):
eval_with_generated_assertions = partial(eval_function_completions, assertions=generate_assertions)
seed = 41
data = datasets.load_dataset("openai_humaneval")["test"].shuffle(seed=seed)
n_tune_data = 20
tune_data = [
{
"definition": data[x]["prompt"],
"test": data[x]["test"],
"entry_point": data[x]["entry_point"],
}
for x in range(n_tune_data)
]
test_data = [
{
"definition": data[x]["prompt"],
"test": data[x]["test"],
"entry_point": data[x]["entry_point"],
}
for x in range(n_tune_data, len(data))
]
oai.Completion.clear_cache(cache_path_root=f"{here}/cache")
oai.Completion.set_cache(seed)
try:
import openai
import diskcache
except ImportError as exc:
print(exc)
return
oai.Completion.clear_cache(400)
# a minimal tuning example
config, _ = oai.Completion.tune(
data=tune_data,
metric="success",
mode="max",
eval_func=eval_function_completions,
n=1,
prompt="{definition}",
)
responses = oai.Completion.create(context=test_data[0], **config)
# a minimal tuning example for tuning chat completion models using the Completion class
config, _ = oai.Completion.tune(
data=tune_data,
metric="succeed_assertions",
mode="max",
eval_func=eval_with_generated_assertions,
n=1,
model="gpt-3.5-turbo",
prompt="{definition}",
)
responses = oai.Completion.create(context=test_data[0], **config)
# a minimal tuning example for tuning chat completion models using the ChatCompletion class
config_list = oai.config_list_openai_aoai(KEY_LOC)
config, _ = oai.ChatCompletion.tune(
data=tune_data,
metric="expected_success",
mode="max",
eval_func=eval_function_completions,
n=1,
messages=[{"role": "user", "content": "{definition}"}],
config_list=config_list,
)
responses = oai.ChatCompletion.create(context=test_data[0], config_list=config_list, **config)
print(responses)
code, cost, selected = implement(tune_data[1], [{**config_list[-1], **config}])
print(code)
print(cost)
assert selected == 0
print(eval_function_completions([code], **tune_data[1]))
# a more comprehensive tuning example
config2, analysis = oai.Completion.tune(
data=tune_data,
metric="success",
mode="max",
eval_func=eval_with_generated_assertions,
log_file_name="logs/humaneval.log",
inference_budget=0.002,
optimization_budget=2,
num_samples=num_samples,
# logging_level=logging.INFO,
prompt=[
"{definition}",
"# Python 3{definition}",
"Complete the following Python function:{definition}",
],
stop=[["\nclass", "\ndef", "\nif", "\nprint"], None], # the stop sequences
)
print(config2)
print(analysis.best_result)
print(test_data[0])
responses = oai.Completion.create(context=test_data[0], **config2)
print(responses)
oai.Completion.data = test_data[:num_samples]
result = oai.Completion._eval(analysis.best_config, prune=False, eval_only=True)
print("result without pruning", result)
result = oai.Completion.test(test_data[:num_samples], **config2)
print(result)
code, cost, selected = implement(tune_data[1], [config2, config])
print(code)
print(cost)
print(selected)
print(eval_function_completions([code], **tune_data[1]))


def test_math(num_samples=-1):
    try:
        import openai
        import diskcache
    except ImportError as exc:
        print(exc)
        return
    seed = 41
    data = datasets.load_dataset("competition_math")
    train_data = data["train"].shuffle(seed=seed)
    test_data = data["test"].shuffle(seed=seed)
    n_tune_data = 20
    tune_data = [
        {
            "problem": train_data[x]["problem"],
            "solution": train_data[x]["solution"],
        }
        for x in range(len(train_data))
        if train_data[x]["level"] == "Level 1"
    ][:n_tune_data]
    test_data = [
        {
            "problem": test_data[x]["problem"],
            "solution": test_data[x]["solution"],
        }
        for x in range(len(test_data))
        if test_data[x]["level"] == "Level 1"
    ]
    print(
        "max tokens in tuning data's canonical solutions",
        max([len(x["solution"].split()) for x in tune_data]),
    )
    print(len(tune_data), len(test_data))
    # prompt template
    prompts = [
        lambda data: "%s Solve the problem carefully. Simplify your answer as much as possible. Put the final answer in \\boxed{}."
        % data["problem"]
    ]
    oai.ChatCompletion.set_cache(seed)
    vanilla_config = {
        "model": "gpt-3.5-turbo",
        "temperature": 1,
        "max_tokens": 2048,
        "n": 1,
        "prompt": prompts[0],
        "stop": "###",
    }
    test_data_sample = test_data[0:3]
    result = oai.ChatCompletion.test(test_data_sample, eval_math_responses, **vanilla_config)
    result = oai.ChatCompletion.test(
        test_data_sample,
        eval_math_responses,
        agg_method="median",
        **vanilla_config,
    )
    def my_median(results):
        return np.median(results)
    def my_average(results):
        return np.mean(results)
    result = oai.ChatCompletion.test(
        test_data_sample,
        eval_math_responses,
        agg_method=my_median,
        **vanilla_config,
    )
    result = oai.ChatCompletion.test(
        test_data_sample,
        eval_math_responses,
        agg_method={
            "expected_success": my_median,
            "success": my_average,
            "success_vote": my_average,
            "votes": np.mean,
        },
        **vanilla_config,
    )
    print(result)
    config, _ = oai.ChatCompletion.tune(
        data=tune_data,  # the data for tuning
        metric="expected_success",  # the metric to optimize
        mode="max",  # the optimization mode
        eval_func=eval_math_responses,  # the evaluation function to return the success metrics
        # log_file_name="logs/math.log",  # the log file name
        inference_budget=0.002,  # the inference budget (dollar)
        optimization_budget=0.01,  # the optimization budget (dollar)
        num_samples=num_samples,
        prompt=prompts,  # the prompt templates to choose from
        stop="###",  # the stop sequence
    )
    print("tuned config", config)
    result = oai.ChatCompletion.test(test_data_sample, config_list=oai.config_list_openai_aoai(KEY_LOC), **config)
    print("result from tuned config:", result)
    print("empty responses", eval_math_responses([], None))


if __name__ == "__main__":
    import openai
    config_list = oai.config_list_openai_aoai(KEY_LOC)
    assert len(config_list) >= 3, config_list
    openai.api_key = os.environ["OPENAI_API_KEY"]
    # test_filter()
    # test_chatcompletion()
    # test_multi_model()
    # test_execute_code()
    # test_improve()
    # test_nocontext()
    test_humaneval(1)
    # test_math(1)

@@ -0,0 +1,64 @@
import sys
import os
import pytest

try:
    import openai
    skip = False
except ImportError:
    skip = True
here = os.path.abspath(os.path.dirname(__file__))


def run_notebook(input_nb, output_nb="executed_openai_notebook.ipynb", save=False):
    import nbformat
    from nbconvert.preprocessors import ExecutePreprocessor
    from nbconvert.preprocessors import CellExecutionError
    try:
        file_path = os.path.join(here, os.pardir, os.pardir, os.pardir, "notebook", input_nb)
        with open(file_path) as nb_file:
            nb = nbformat.read(nb_file, as_version=4)
        preprocessor = ExecutePreprocessor(timeout=4800, kernel_name="python3")
        preprocessor.preprocess(nb, {"metadata": {"path": here}})
        output_file_name = "executed_openai_notebook_output.txt"
        output_file = os.path.join(here, output_file_name)
        with open(output_file, "a") as nb_output_file:
            for cell in nb.cells:
                if cell.cell_type == "code" and "outputs" in cell:
                    for output in cell.outputs:
                        if "text" in output:
                            nb_output_file.write(output["text"].strip() + "\n")
                        elif "data" in output and "text/plain" in output["data"]:
                            nb_output_file.write(output["data"]["text/plain"].strip() + "\n")
    except CellExecutionError:
        raise
    finally:
        if save:
            with open(os.path.join(here, output_nb), "w", encoding="utf-8") as nb_executed_file:
                nbformat.write(nb, nb_executed_file)


@pytest.mark.skipif(
    skip or not sys.version.startswith("3.10"),
    reason="do not run openai test if openai is not installed or py!=3.10",
)
def test_autogen_openai_completion(save=False):
    run_notebook("autogen_openai_completion.ipynb", save=save)


@pytest.mark.skipif(
    skip or not sys.version.startswith("3.11"),
    reason="do not run openai test if openai is not installed or py!=3.11",
)
def test_autogen_chatgpt_gpt4(save=False):
    run_notebook("autogen_chatgpt_gpt4.ipynb", save=save)


if __name__ == "__main__":
    test_autogen_chatgpt_gpt4(save=True)
    test_autogen_openai_completion(save=True)

@@ -0,0 +1,88 @@
import os
from flaml.autogen.code_utils import extract_code
from flaml import oai

KEY_LOC = "test/autogen"
here = os.path.abspath(os.path.dirname(__file__))


def test_extract_code():
    print(extract_code("```bash\npython temp.py\n```"))


def test_coding_agent(human_input_mode="NEVER", max_consecutive_auto_reply=10):
    try:
        import openai
    except ImportError:
        return
    from flaml.autogen.agent.coding_agent import PythonAgent
    from flaml.autogen.agent.user_proxy_agent import UserProxyAgent
    config_list = oai.config_list_gpt4_gpt35(key_file_path=KEY_LOC)
    conversations = {}
    oai.ChatCompletion.start_logging(conversations)
    agent = PythonAgent("coding_agent", request_timeout=600, seed=42, config_list=config_list)
    user = UserProxyAgent(
        "user",
        human_input_mode=human_input_mode,
        max_consecutive_auto_reply=max_consecutive_auto_reply,
        is_termination_msg=lambda x: x.rstrip().endswith("TERMINATE"),
    )
    agent.receive(
        """Create and execute a script to plot a rocket without using matplotlib""",
        user,
    )
    agent.reset()
    agent.receive(
        """Create a temp.py file with the following content:
```
print('Hello world!')
```""",
        user,
    )
    print(conversations)
    oai.ChatCompletion.start_logging(compact=False)
    agent.receive("""Execute temp.py""", user)
    print(oai.ChatCompletion.logged_history)
    oai.ChatCompletion.stop_logging()


def test_tsp(human_input_mode="NEVER", max_consecutive_auto_reply=10):
    try:
        import openai
    except ImportError:
        return
    from flaml.autogen.agent.coding_agent import PythonAgent
    from flaml.autogen.agent.user_proxy_agent import UserProxyAgent
    config_list = oai.config_list_openai_aoai(key_file_path=KEY_LOC)
    hard_questions = [
        "What if we must go from node 1 to node 2?",
        "Can we double all distances?",
        "Can we add a new point to the graph? Its distance to each of the existing points should be random, between 0 and 5.",
    ]
    oai.ChatCompletion.start_logging()
    agent = PythonAgent("coding_agent", temperature=0, config_list=config_list)
    user = UserProxyAgent(
        "user",
        work_dir=f"{here}",
        human_input_mode=human_input_mode,
        max_consecutive_auto_reply=max_consecutive_auto_reply,
    )
    with open(f"{here}/tsp_prompt.txt", "r") as f:
        prompt = f.read()
    # agent.receive(prompt.format(question=hard_questions[0]), user)
    # agent.receive(prompt.format(question=hard_questions[1]), user)
    agent.receive(prompt.format(question=hard_questions[2]), user)
    print(oai.ChatCompletion.logged_history)
    oai.ChatCompletion.stop_logging()


if __name__ == "__main__":
    # test_extract_code()
    test_coding_agent(human_input_mode="TERMINATE")
    # When GPT-4 (i.e., the DEFAULT_MODEL) is used, the conversation in the following test
    # should terminate within 2-3 rounds of interaction (because is_termination_msg should be true after 2-3 rounds),
    # even though max_consecutive_auto_reply is set to 10.
    test_tsp(human_input_mode="NEVER", max_consecutive_auto_reply=10)

@@ -0,0 +1,32 @@
from flaml import oai

KEY_LOC = "test/autogen"


def test_human_agent():
    try:
        import openai
    except ImportError:
        return
    from flaml.autogen.agent.chat_agent import ChatAgent
    from flaml.autogen.agent.user_proxy_agent import UserProxyAgent
    conversations = {}
    oai.ChatCompletion.start_logging(conversations)
    agent = ChatAgent("chat_agent", config_list=oai.config_list_gpt4_gpt35(key_file_path=KEY_LOC))
    user = UserProxyAgent("human_user", human_input_mode="NEVER", max_consecutive_auto_reply=2)
    agent.receive(
        """Write python code to solve the equation x^3=125. You must write code in the following format. You must always print the result.
Wait for me to return the result.
```python
# your code
print(your_result)
```
""",
        user,
    )
    print(conversations)


if __name__ == "__main__":
    test_human_agent()

test/autogen/tsp_prompt.txt

@@ -0,0 +1,115 @@
Now, we have a system to solve TSP problems. Let's try to solve a problem.
We are given a distance dictionary `dists`, where each key is a pair of nodes
and the value is the distance between them. For example, `dists[(1, 2)]` is the
distance between node 1 and node 2. We want to find the optimal cost for the TSP
problem. Users might have questions about the solution, and you are
responsible for writing code to answer them. Note that you usually
need to run `solve_tsp` and `compare_costs` to compare the costs before
and after the change.
Here are the functions and their information that you can use directly:
----------
def change_dist(dist: dict, i: int, j: int, new_cost: float) -> float:
    """Change the distance between two points.
    Args:
        dist (dict): distance matrix, where the key is a pair and value is
            the cost (aka, distance).
        i (int): the source node
        j (int): the destination node
        new_cost (float): the new cost for the distance
    Returns:
        float: the previous cost
    """
----------
----------
def compare_costs(prev_cost, new_cost) -> float:
    """Compare the previous cost and the new cost.
    Args:
        prev_cost (float): the previous cost
        new_cost (float): the updated cost
    Returns:
        float: the ratio between these two costs
    """
----------
----------
def solve_tsp(dists: dict) -> float:
    """Solve the TSP problem
    Args:
        dists (dict): the distance matrix between each pair of nodes. Each item in the
            dict maps a pair (node A, node B) to the distance from A to B.
    Returns:
        float: the optimal cost
    """
----------
We also provide some sample questions and answers here:
----------
Question: Why should we go from point 1 to point 2?
Code:
```
from extensions.tsp import solve_tsp
from extensions.tsp_api import change_dist, compare_costs, dists
prev_cost=solve_tsp(dists)
change_dist(dists, 1, 2, float('inf'))
new_cost = solve_tsp(dists)
gap = compare_costs(prev_cost, new_cost)
print('If not, then the cost will increase', gap * 100, 'percent.')
```
----------
Question: Can we double the distance between point 3 and 4?
Code:
```
from extensions.tsp import solve_tsp
from extensions.tsp_api import change_dist, compare_costs, dists
prev_cost=solve_tsp(dists)
change_dist(dists, 3, 4, dists[(3, 4)] * 2)
new_cost = solve_tsp(dists)
gap = compare_costs(prev_cost, new_cost)
print('If we double the distance between 3 and 4, then the cost will increase', gap * 100, 'percent.')
```
----------
Question: What would happen if we remove point 2?
Code:
```
from extensions.tsp import solve_tsp
from extensions.tsp_api import compare_costs, dists
prev_cost=solve_tsp(dists)
for i, j in list(dists.keys()):
    if i == 2 or j == 2:
        del dists[i, j]  # remove the edge cost
new_cost = solve_tsp(dists)
gap = compare_costs(prev_cost, new_cost)
print('If we remove point 2, then the cost will decrease', - gap * 100, 'percent.')
```
----------
Question: What if the edge between point 2 to 3 is removed?
Code:
```
from extensions.tsp import solve_tsp
from extensions.tsp_api import change_dist, compare_costs, dists
prev_cost=solve_tsp(dists)
change_dist(dists, 2, 3, float('inf'))
new_cost = solve_tsp(dists)
gap = compare_costs(prev_cost, new_cost)
print('If we remove the edge, then the cost will increase', gap * 100, 'percent.')
```
Now, answer the questions by using Python code:
Question: {question}
Code:

@@ -0,0 +1,64 @@
import pytest
from pandas import DataFrame
from sklearn.datasets import load_iris
import mlflow
import mlflow.entities
from flaml import AutoML


class TestMLFlowLoggingParam:
    def test_should_start_new_run_by_default(self, automl_settings):
        with mlflow.start_run():
            parent = mlflow.last_active_run()
            automl = AutoML()
            X_train, y_train = load_iris(return_X_y=True)
            automl.fit(X_train=X_train, y_train=y_train, **automl_settings)
        children = self._get_child_runs(parent)
        assert len(children) >= 1, "Expected at least 1 child run, got {}".format(len(children))

    def test_should_not_start_new_run_when_mlflow_logging_set_to_false_in_init(self, automl_settings):
        with mlflow.start_run():
            parent = mlflow.last_active_run()
            automl = AutoML(mlflow_logging=False)
            X_train, y_train = load_iris(return_X_y=True)
            automl.fit(X_train=X_train, y_train=y_train, **automl_settings)
        children = self._get_child_runs(parent)
        assert len(children) == 0, "Expected 0 child runs, got {}".format(len(children))

    def test_should_not_start_new_run_when_mlflow_logging_set_to_false_in_fit(self, automl_settings):
        with mlflow.start_run():
            parent = mlflow.last_active_run()
            automl = AutoML()
            X_train, y_train = load_iris(return_X_y=True)
            automl.fit(X_train=X_train, y_train=y_train, mlflow_logging=False, **automl_settings)
        children = self._get_child_runs(parent)
        assert len(children) == 0, "Expected 0 child runs, got {}".format(len(children))

    def test_should_start_new_run_when_mlflow_logging_set_to_true_in_fit(self, automl_settings):
        with mlflow.start_run():
            parent = mlflow.last_active_run()
            automl = AutoML(mlflow_logging=False)
            X_train, y_train = load_iris(return_X_y=True)
            automl.fit(X_train=X_train, y_train=y_train, mlflow_logging=True, **automl_settings)
        children = self._get_child_runs(parent)
        assert len(children) >= 1, "Expected at least 1 child run, got {}".format(len(children))

    @staticmethod
    def _get_child_runs(parent_run: mlflow.entities.Run) -> DataFrame:
        experiment_id = parent_run.info.experiment_id
        return mlflow.search_runs(
            [experiment_id], filter_string="tags.mlflow.parentRunId = '{}'".format(parent_run.info.run_id)
        )

    @pytest.fixture(scope="class")
    def automl_settings(self):
        return {
            "time_budget": 2,  # in seconds
            "metric": "accuracy",
            "task": "classification",
            "log_file_name": "iris.log",
        }

@@ -22,7 +22,7 @@ def test_summarization():
automl_settings["task"] = "summarization"
automl_settings["metric"] = "rouge1"
automl_settings["time_budget"] = 2 * automl_settings["time_budget"]
automl_settings["fit_kwargs_by_estimator"]["transformer"]["model_path"] = "patrickvonplaten/t5-tiny-random"
automl_settings["fit_kwargs_by_estimator"]["transformer"]["model_path"] = "google/flan-t5-small"
try:
automl.fit(X_train=X_train, y_train=y_train, X_val=X_val, y_val=y_val, **automl_settings)

@@ -1,239 +0,0 @@
import datasets
import sys
import numpy as np
import pytest
from functools import partial
from flaml import oai
from flaml.autogen.code_utils import (
    eval_function_completions,
    generate_assertions,
    implement,
)
from flaml.autogen.math_utils import eval_math_responses


def test_nocontext():
    try:
        import openai
        import diskcache
    except ImportError as exc:
        print(exc)
        return
    response = oai.Completion.create(model="text-ada-001", prompt="1+1=", max_tokens=1)
    print(response)


@pytest.mark.skipif(
    sys.platform == "win32",
    reason="do not run on windows",
)
def test_humaneval(num_samples=1):
    eval_with_generated_assertions = partial(eval_function_completions, assertions=generate_assertions)
    seed = 41
    data = datasets.load_dataset("openai_humaneval")["test"].shuffle(seed=seed)
    n_tune_data = 20
    tune_data = [
        {
            "definition": data[x]["prompt"],
            "test": data[x]["test"],
            "entry_point": data[x]["entry_point"],
        }
        for x in range(n_tune_data)
    ]
    test_data = [
        {
            "definition": data[x]["prompt"],
            "test": data[x]["test"],
            "entry_point": data[x]["entry_point"],
        }
        for x in range(n_tune_data, len(data))
    ]
    oai.Completion.set_cache(seed)
    try:
        import openai
        import diskcache
    except ImportError as exc:
        print(exc)
        return
    # a minimal tuning example
    config, _ = oai.Completion.tune(
        data=tune_data,
        metric="success",
        mode="max",
        eval_func=eval_function_completions,
        n=1,
        prompt="{definition}",
    )
    responses = oai.Completion.create(context=test_data[0], **config)
    # a minimal tuning example for tuning chat completion models using the Completion class
    config, _ = oai.Completion.tune(
        data=tune_data,
        metric="succeed_assertions",
        mode="max",
        eval_func=eval_with_generated_assertions,
        n=1,
        model="gpt-3.5-turbo",
        prompt="{definition}",
    )
    responses = oai.Completion.create(context=test_data[0], **config)
    # a minimal tuning example for tuning chat completion models using the Completion class
    config, _ = oai.ChatCompletion.tune(
        data=tune_data,
        metric="expected_success",
        mode="max",
        eval_func=eval_function_completions,
        n=1,
        messages=[{"role": "user", "content": "{definition}"}],
    )
    responses = oai.ChatCompletion.create(context=test_data[0], **config)
    print(responses)
    code, cost, _ = implement(tune_data[1], [config])
    print(code)
    print(cost)
    print(eval_function_completions([code], **tune_data[1]))
    # a more comprehensive tuning example
    config2, analysis = oai.Completion.tune(
        data=tune_data,
        metric="success",
        mode="max",
        eval_func=eval_with_generated_assertions,
        log_file_name="logs/humaneval.log",
        inference_budget=0.002,
        optimization_budget=2,
        num_samples=num_samples,
        prompt=[
            "{definition}",
            "# Python 3{definition}",
            "Complete the following Python function:{definition}",
        ],
        stop=[["\nclass", "\ndef", "\nif", "\nprint"], None],  # the stop sequences
    )
    print(config2)
    print(analysis.best_result)
    print(test_data[0])
    responses = oai.Completion.create(context=test_data[0], **config2)
    print(responses)
    oai.Completion.data = test_data[:num_samples]
    result = oai.Completion._eval(analysis.best_config, prune=False, eval_only=True)
    print("result without pruning", result)
    result = oai.Completion.test(test_data[:num_samples], config=config2)
    print(result)
    code, cost, selected = implement(tune_data[1], [config2, config])
    print(selected)
    print(eval_function_completions([code], **tune_data[1]))


def test_math(num_samples=-1):
    seed = 41
    data = datasets.load_dataset("competition_math")
    train_data = data["train"].shuffle(seed=seed)
    test_data = data["test"].shuffle(seed=seed)
    n_tune_data = 20
    tune_data = [
        {
            "problem": train_data[x]["problem"],
            "solution": train_data[x]["solution"],
        }
        for x in range(len(train_data))
        if train_data[x]["level"] == "Level 1"
    ][:n_tune_data]
    test_data = [
        {
            "problem": test_data[x]["problem"],
            "solution": test_data[x]["solution"],
        }
        for x in range(len(test_data))
        if test_data[x]["level"] == "Level 1"
    ]
    print(
        "max tokens in tuning data's canonical solutions",
        max([len(x["solution"].split()) for x in tune_data]),
    )
    print(len(tune_data), len(test_data))
    # prompt template
    prompts = [
        lambda data: "%s Solve the problem carefully. Simplify your answer as much as possible. Put the final answer in \\boxed{}."
        % data["problem"]
    ]
    try:
        import openai
        import diskcache
    except ImportError as exc:
        print(exc)
        return
    oai.ChatCompletion.set_cache(seed)
    vanilla_config = {
        "model": "gpt-3.5-turbo",
        "temperature": 1,
        "max_tokens": 2048,
        "n": 1,
        "prompt": prompts[0],
        "stop": "###",
    }
    test_data_sample = test_data[0:3]
    result = oai.ChatCompletion.test(test_data_sample, vanilla_config, eval_math_responses)
    test_data_sample = test_data[3:6]
    result = oai.ChatCompletion.test(
        test_data_sample,
        vanilla_config,
        eval_math_responses,
        use_cache=False,
        agg_method="median",
    )
    def my_median(results):
        return np.median(results)
    def my_average(results):
        return np.mean(results)
    result = oai.ChatCompletion.test(
        test_data_sample,
        vanilla_config,
        eval_math_responses,
        use_cache=False,
        agg_method=my_median,
    )
    result = oai.ChatCompletion.test(
        test_data_sample,
        vanilla_config,
        eval_math_responses,
        use_cache=False,
        agg_method={
            "expected_success": my_median,
            "success": my_average,
            "success_vote": my_average,
            "votes": np.mean,
        },
    )
    print(result)
    config, _ = oai.ChatCompletion.tune(
        data=tune_data,  # the data for tuning
        metric="expected_success",  # the metric to optimize
        mode="max",  # the optimization mode
        eval_func=eval_math_responses,  # the evaluation function to return the success metrics
        # log_file_name="logs/math.log",  # the log file name
        inference_budget=0.002,  # the inference budget (dollar)
        optimization_budget=0.01,  # the optimization budget (dollar)
        num_samples=num_samples,
        prompt=prompts,  # the prompt templates to choose from
        stop="###",  # the stop sequence
    )
    print("tuned config", config)
    result = oai.ChatCompletion.test(test_data_sample, config)
    print("result from tuned config:", result)
    print("empty responses", eval_math_responses([], None))


if __name__ == "__main__":
    import openai
    openai.api_key_path = "test/openai/key.txt"
    test_nocontext()
    test_humaneval(1)
    test_math(1)

@@ -1,62 +0,0 @@
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor
from nbconvert.preprocessors import CellExecutionError
import os
import pytest

try:
    import openai
    skip = False
except ImportError:
    skip = True
here = os.path.abspath(os.path.dirname(__file__))


def run_notebook(input_nb, output_nb="executed_openai_notebook.ipynb", save=False):
    try:
        file_path = os.path.join(here, os.pardir, os.pardir, "notebook", input_nb)
        with open(file_path) as f:
            nb = nbformat.read(f, as_version=4)
        ep = ExecutePreprocessor(timeout=3600, kernel_name="python3")
        ep.preprocess(nb, {"metadata": {"path": here}})
        output_file_name = "executed_openai_notebook_output.txt"
        output_file = os.path.join(here, output_file_name)
        with open(output_file, "a") as f:
            for cell in nb.cells:
                if cell.cell_type == "code" and "outputs" in cell:
                    for output in cell.outputs:
                        if "text" in output:
                            f.write(output["text"].strip() + "\n")
                        elif "data" in output and "text/plain" in output["data"]:
                            f.write(output["data"]["text/plain"].strip() + "\n")
    except CellExecutionError:
        raise
    finally:
        if save:
            with open(os.path.join(here, output_nb), "w", encoding="utf-8") as f:
                nbformat.write(nb, f)


@pytest.mark.skipif(
    skip,
    reason="do not run openai test if openai is not installed",
)
def test_autogen_openai(save=False):
    run_notebook("autogen_openai.ipynb", save=save)


@pytest.mark.skipif(
    skip,
    reason="do not run openai test if openai is not installed",
)
def test_autogen_chatgpt(save=False):
    run_notebook("autogen_chatgpt.ipynb", save=save)


if __name__ == "__main__":
    test_autogen_chatgpt(save=True)
    test_autogen_openai(save=True)

@@ -2,6 +2,7 @@ import os
import sys
import warnings
import pytest
import mlflow
import sklearn.datasets as skds
from flaml import AutoML
from flaml.tune.spark.utils import check_spark
@@ -18,10 +19,15 @@ else:
spark = (
pyspark.sql.SparkSession.builder.appName("MyApp")
.master("local[1]")
.master("local[2]")
.config(
"spark.jars.packages",
"com.microsoft.azure:synapseml_2.12:0.10.2,org.apache.hadoop:hadoop-azure:3.3.5,com.microsoft.azure:azure-storage:8.6.6",
(
"com.microsoft.azure:synapseml_2.12:0.10.2,"
"org.apache.hadoop:hadoop-azure:3.3.5,"
"com.microsoft.azure:azure-storage:8.6.6,"
f"org.mlflow:mlflow-spark:{mlflow.__version__}"
),
)
.config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
.config("spark.sql.debug.maxToStringFields", "100")
@@ -29,6 +35,10 @@ else:
.config("spark.executor.extraJavaOptions", "-Xss1m")
.getOrCreate()
)
spark.sparkContext._conf.set(
"spark.mlflow.pysparkml.autolog.logModelAllowlistFile",
"https://mmlspark.blob.core.windows.net/publicwasb/log_model_allowlist.txt",
)
# spark.sparkContext.setLogLevel("ERROR")
spark_available, _ = check_spark()
skip_spark = not spark_available

@@ -187,14 +187,10 @@ def test_n_current_trials():
def get_n_current_trials(n_concurrent_trials=0, num_executors=num_executors):
try:
FLAML_MAX_CONCURRENT = int(os.getenv("FLAML_MAX_CONCURRENT", 0))
num_executors = max(num_executors, FLAML_MAX_CONCURRENT, 1)
except ValueError:
FLAML_MAX_CONCURRENT = 0
max_spark_parallelism = (
min(spark.sparkContext.defaultParallelism, FLAML_MAX_CONCURRENT)
if FLAML_MAX_CONCURRENT > 0
else spark.sparkContext.defaultParallelism
)
num_executors = max(num_executors, FLAML_MAX_CONCURRENT, 1)
max_spark_parallelism = max(spark.sparkContext.defaultParallelism, FLAML_MAX_CONCURRENT)
max_concurrent = max(1, max_spark_parallelism)
n_concurrent_trials = min(
n_concurrent_trials if n_concurrent_trials > 0 else num_executors,
@@ -204,20 +200,26 @@ def test_n_current_trials():
return n_concurrent_trials
os.environ["FLAML_MAX_CONCURRENT"] = "invlaid"
assert get_n_current_trials() == num_executors
os.environ["FLAML_MAX_CONCURRENT"] = "0"
assert get_n_current_trials() == max(num_executors, 1)
os.environ["FLAML_MAX_CONCURRENT"] = "4"
tmp_max = min(4, spark.sparkContext.defaultParallelism)
assert get_n_current_trials() == tmp_max
os.environ["FLAML_MAX_CONCURRENT"] = "9999999"
assert get_n_current_trials() == spark.sparkContext.defaultParallelism
os.environ["FLAML_MAX_CONCURRENT"] = "100"
tmp_max = min(100, spark.sparkContext.defaultParallelism)
tmp_max = spark.sparkContext.defaultParallelism
assert get_n_current_trials(1) == 1
assert get_n_current_trials(2) == min(2, tmp_max)
assert get_n_current_trials(50) == min(50, tmp_max)
assert get_n_current_trials(200) == min(200, tmp_max)
os.environ["FLAML_MAX_CONCURRENT"] = "0"
assert get_n_current_trials() == max(num_executors, 1)
os.environ["FLAML_MAX_CONCURRENT"] = "4"
tmp_max = max(4, spark.sparkContext.defaultParallelism)
assert get_n_current_trials() == min(4, tmp_max)
os.environ["FLAML_MAX_CONCURRENT"] = "9999999"
assert get_n_current_trials() == 9999999
os.environ["FLAML_MAX_CONCURRENT"] = "100"
tmp_max = max(100, spark.sparkContext.defaultParallelism)
assert get_n_current_trials(1) == 1
assert get_n_current_trials(2) == min(2, tmp_max)
assert get_n_current_trials(50) == min(50, tmp_max)
assert get_n_current_trials(200) == min(200, tmp_max)
del os.environ["FLAML_MAX_CONCURRENT"]
def test_iloc_pandas_on_spark():
@@ -410,7 +412,7 @@ if __name__ == "__main__":
# test_broadcast_code()
# test_get_broadcast_data()
# test_train_test_split_pyspark()
# test_n_current_trials()
test_n_current_trials()
# test_len_labels()
# test_iloc_pandas_on_spark()
test_spark_metric_loss_score()

@@ -0,0 +1,74 @@
---
title: Does Model and Inference Parameter Matter in LLM Applications? - A Case Study for MATH
authors: sonichi
tags: [LLM, GPT, research]
---
![level 2 algebra](img/level2algebra.png)
**TL;DR:**
* **A case study using the MATH benchmark shows that model selection and inference parameters do matter in Large Language Model (LLM) applications.**
* **The tuned gpt-3.5-turbo model vastly outperformed untuned gpt-4 in accuracy for easier problems, while gpt-4 was a better choice for the most difficult problems.**
* **FLAML can help with model selection, parameter tuning, and cost-saving in LLM applications.**
Large language models (LLMs) are powerful tools that can generate natural language text for various applications, such as chatbots, summarization, translation, and more. GPT-4 is currently the state-of-the-art LLM. Does that make model selection irrelevant? What about inference parameters?
In this blog post, we will explore how much the choice of model and inference parameters matters in LLM applications, using a case study on [MATH](https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/be83ab3ecd0db773eb2dc1b0a17836a1-Abstract-round2.html), a benchmark for evaluating LLMs on advanced mathematical problem solving. MATH consists of 12K math competition problems from AMC-10, AMC-12 and AIME, each accompanied by a step-by-step solution.
We will use the new subpackage [`flaml.autogen`](docs/Use-Cases/Auto-Generation) to automatically find the best model and inference parameters for LLMs on a given task and dataset under an inference budget, using a novel low-cost search & pruning strategy. FLAML currently supports all the LLMs from OpenAI, such as GPT-3.5 and GPT-4.
We will use FLAML to perform model selection and inference parameter tuning. We then compare the resulting performance and inference cost on algebra problems with those of the untuned gpt-4, and analyze how different difficulty levels affect the results.
## Experiment Setup
We use FLAML to select between the following models with a target inference budget of $0.02 per instance:
- gpt-3.5-turbo, a relatively cheap model that powers the popular ChatGPT app
- gpt-4, the state-of-the-art LLM, which costs more than 100 times as much as gpt-3.5-turbo
We adapt the models using 20 examples from the training set, using the problem statement as the input and generating the solution as the output. We use the following inference parameters:
- temperature: The parameter that controls the randomness of the output text. A higher temperature means more diversity but less coherence. We search for the optimal temperature in the range of [0, 1].
- top_p: The parameter that controls the probability mass of the output tokens. Sampling is restricted to the smallest set of most probable tokens whose cumulative probability reaches top-p. A higher top-p means more diversity but less coherence. We search for the optimal top-p in the range of [0, 1].
- max_tokens: The maximum number of tokens that can be generated for each output. We search for the optimal max_tokens in the range of [50, 1000].
- n: The number of responses to generate. We search for the optimal n in the range of [1, 100].
- prompt: We use the template: "{problem} Solve the problem carefully. Simplify your answer as much as possible. Put the final answer in \\boxed{{}}." where {problem} will be replaced by the math problem instance.
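The temperature and top_p bullets above can be illustrated with a small, library-independent sketch of the standard temperature-scaled softmax and nucleus (top-p) filtering ideas over a toy list of logits. This is an illustration under our own assumptions, not OpenAI's implementation; the helper names `apply_temperature` and `top_p_filter` are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Turn raw logits into probabilities with a softmax scaled by temperature.

    A lower temperature sharpens the distribution (less random output);
    a higher temperature flattens it (more random output).
    """
    if temperature <= 0:
        raise ValueError("this sketch only handles temperature > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Nucleus (top-p) filtering.

    Keeps the smallest set of most probable tokens whose cumulative
    probability reaches top_p, zeroes out the rest, and renormalizes.
    """
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    # token indices sorted from most to least probable
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set()
    cumulative = 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]


probs = apply_temperature([2.0, 1.0, 0.1], temperature=1.0)
print(top_p_filter(probs, 0.5))  # only the top token survives
print(top_p_filter(probs, 0.9))  # the top two tokens survive
```

With these toy logits the softmax gives roughly [0.66, 0.24, 0.10], so raising top_p from 0.5 to 0.9 admits a second candidate token; this is why a higher top-p yields more diverse (and potentially less coherent) text.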
In this experiment, when n > 1, we take the answer with the most votes among all the responses as the final answer and compare it with the ground truth. For example, if n = 5 and 3 of the responses contain a final answer of 301 while 2 contain a final answer of 159, we choose 301 as the final answer. This can help resolve potential errors due to randomness. We use the average accuracy and the average inference cost as the metrics to evaluate performance over a dataset. The inference cost of a particular instance is determined by the price per 1K tokens and the number of tokens consumed.
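The voting rule and the per-instance cost described above can be sketched in a few lines of Python. This is a minimal illustration, not FLAML's evaluation code (such as `eval_math_responses`); the names `majority_vote`, `instance_cost`, and `average_accuracy` are hypothetical, we assume final answers have already been extracted from each response as strings, and the separate prompt/completion prices used below are placeholders.

```python
from collections import Counter


def majority_vote(answers):
    """Return the final answer with the most votes, or None if there are none.

    Counter.most_common orders equal counts by first occurrence, so ties go
    to the answer that appeared first.
    """
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


def instance_cost(prompt_tokens, completion_tokens, prompt_price_per_1k, completion_price_per_1k):
    """Dollar cost of one instance: tokens consumed times the price per 1K tokens."""
    return (prompt_tokens * prompt_price_per_1k + completion_tokens * completion_price_per_1k) / 1000


def average_accuracy(voted_answers, ground_truths):
    """Fraction of instances whose voted answer equals the ground truth."""
    if not ground_truths:
        return 0.0
    correct = sum(v == g for v, g in zip(voted_answers, ground_truths))
    return correct / len(ground_truths)


# The n = 5 example from the text: three responses say 301, two say 159.
print(majority_vote(["301", "301", "159", "301", "159"]))  # 301
# Hypothetical prices of $0.002 per 1K prompt tokens and per 1K completion tokens.
print(instance_cost(1000, 500, 0.002, 0.002))
```

Averaging `instance_cost` over a dataset gives the average inference cost, and `average_accuracy` over the voted answers gives the accuracy metric.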
## Experiment Results
The first figure in this blog post shows the average accuracy and average inference cost of each configuration on the level 2 Algebra test set.
Surprisingly, the tuned gpt-3.5-turbo model is selected as the better model, and it vastly outperforms the untuned gpt-4 in accuracy (92% vs. 70%) with an equal or 2.5 times higher inference budget.
The same observation holds on the level 3 Algebra test set.
![level 3 algebra](img/level3algebra.png)
However, the selected model changes on level 4 Algebra.
![level 4 algebra](img/level4algebra.png)
This time gpt-4 is selected as the best model. The tuned gpt-4 achieves much higher accuracy (56% vs. 44%) and lower cost than the untuned gpt-4.
On level 5 the result is similar.
![level 5 algebra](img/level5algebra.png)
We can see that FLAML has found a different optimal model and different inference parameters for the subset at each difficulty level, which shows that these parameters matter in cost-sensitive LLM applications and need to be carefully tuned or adapted.
An example notebook to run these experiments can be found at: https://github.com/microsoft/FLAML/blob/v1.2.1/notebook/autogen_chatgpt.ipynb
## Analysis and Discussion
While gpt-3.5-turbo demonstrates competitive accuracy with voted answers on relatively easy algebra problems under the same inference budget, gpt-4 is a better choice for the most difficult problems. In general, through parameter tuning and model selection, we can identify opportunities to save the expensive model for more challenging tasks and improve the overall effectiveness of a budget-constrained system.
There are many alternative ways of solving math problems that we have not covered in this blog post. When there are choices beyond the inference parameters, they can generally be tuned via [`flaml.tune`](docs/Use-Cases/Tune-User-Defined-Function).
The need for model selection, parameter tuning and cost saving is not specific to math problems. The [Auto-GPT](https://github.com/Significant-Gravitas/Auto-GPT) project is an example where high cost can easily prevent a generic complex task from being accomplished, as it needs many LLM inference calls.
## For Further Reading
* [Research paper about the tuning technique](https://arxiv.org/abs/2303.04673)
* [Documentation about `flaml.autogen`](/docs/Use-Cases/Auto-Generation)
*Do you have any experience to share about LLM applications? Would you like to see more support for, or research on, LLM optimization and automation? Please join our [Discord](https://discord.gg/Cppx2vSPVP) server for discussion.*

@@ -0,0 +1,43 @@
---
title: Surpassing 1 Million Downloads - A Retrospective and a Look into the Future
authors: qingyunwu
tags: [LLM, LLMOps, FLAMLv2]
---
**TL;DR:**
* **Celebrating FLAML's milestone: 1 million downloads**
* **Introducing Large Language Model (LLM) support in the upcoming FLAML v2**
This week, FLAML has reached a significant milestone: 1 million downloads. Originating as an intern research project within Microsoft Research, FLAML has grown into an open-source library used widely across the industry and supported by an active community.
As we celebrate this milestone, we want to recognize the passionate contributors and users who have played an essential role in molding FLAML into the flourishing project it is today. Our heartfelt gratitude goes out to each of you for your unwavering support, constructive feedback, and innovative contributions that have driven FLAML to new heights.
A big shoutout to our industrial collaborators from Azure Core, Azure Machine Learning, Azure Synapse Analytics, Microsoft 365, ML.NET, Vowpal Wabbit, Anyscale, Databricks, and Wise; and academic collaborators from MIT, Penn State University, Stevens Institute of Technology, Tel Aviv University, Texas A&M University, University of Manchester, University of Washington, and The Chinese University of Hong Kong, among others.
We'd also like to take the opportunity to reflect on FLAML's past achievements and its future roadmap, with a particular focus on large language models (LLM) and LLMOps.
## FLAML's Journey: Past Achievements and Milestones
### Bring AutoML to One's Fingertips
FLAML offers an off-the-shelf AutoML solution that enables users to quickly discover high-quality models or configurations for common ML/AI tasks. By automatically selecting models and hyperparameters for training or inference, FLAML saves users time and effort. FLAML has significantly reduced development time for developers and data scientists alike, while also providing a convenient way to integrate new algorithms into the pipeline, enabling easy extensions and large-scale parallel tuning. These features make FLAML a valuable tool in R&D efforts for many enterprise users.
FLAML is capable of handling a variety of common ML tasks, such as [classification](https://microsoft.github.io/FLAML/docs/Examples/AutoML-Classification), [regression](https://microsoft.github.io/FLAML/docs/Examples/AutoML-Regression), [time series forecasting](https://microsoft.github.io/FLAML/docs/Examples/AutoML-Time%20series%20forecast), [NLP tasks](https://microsoft.github.io/FLAML/docs/Examples/AutoML-Rank), and [generative tasks](https://microsoft.github.io/FLAML/docs/Use-Cases/Auto-Generation), providing a comprehensive solution for various applications.
### Speed and Efficiency: The FLAML Advantage
What sets FLAML apart from other AutoML libraries is its exceptional efficiency, thanks to the economical and efficient hyperparameter optimization and model selection methods developed in our [research](https://microsoft.github.io/FLAML/docs/Research). FLAML is also capable of handling large search spaces with heterogeneous evaluation costs, complex constraints, guidance, and early stopping. The [zero-shot AutoML](https://microsoft.github.io/FLAML/docs/Use-Cases/Zero-Shot-AutoML) option further reduces the cost of AutoML, making FLAML an even more attractive solution for a wide range of applications with low resources.
### Easy Customization and Extensibility
FLAML is designed for easy extensibility and customization, allowing users to add custom learners, metrics, search spaces, etc. For example, the support of hierarchical search spaces allows one to first choose an ML learner and then sample from the hyperparameter space specific to that learner. The level of customization ranges from minimal (providing only training data and task type as input) to full (tuning a user-defined function). This flexibility and support for easy customization have led to FLAML's adoption in various domains, including security, finance, marketing, engineering, supply chain, insurance, and healthcare, delivering highly accurate results.
## Embracing Large Language Models in FLAML v2
As large language models continue to reshape the AI ecosystem, FLAML is poised to adapt and grow alongside these advancements. Recognizing the importance of large language models, we have recently incorporated an autogen package into FLAML, and are committed to focusing our collective efforts on addressing the unique challenges that arise in LLMOps (Large Language Model Operations).
In its current iteration, FLAML supports model selection and inference parameter tuning for large language models. We are actively developing new features, such as improved LLM selection and inference hyperparameter tuning as well as agent-based LLM operations, to further expand FLAML's capabilities.
We are eagerly preparing for the launch of FLAML v2, which will place special emphasis on incorporating and enhancing features tailored for large language models (LLMs).
We invite contributions from anyone interested in this topic and look forward to collaborating with the community as we shape the future of FLAML and LLMOps together.
## For Further Reading
* [Documentation about `flaml.autogen`](/docs/Use-Cases/Auto-Generation)
* [Code Example: Tune chatGPT for Math Problem Solving with FLAML](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_chatgpt_gpt4.ipynb)
*Do you have any experience to share about LLM applications? Would you like to see more support or research of LLMOps? Please join our [Discord](https://discord.gg/Cppx2vSPVP) server for discussion.*

website/blog/authors.yml

@@ -0,0 +1,11 @@
sonichi:
name: Chi Wang
title: Principal Researcher at Microsoft Research
url: https://www.linkedin.com/in/chi-wang-49b15b16/
image_url: https://github.com/sonichi.png
qingyunwu:
name: Qingyun Wu
title: Assistant Professor at the Pennsylvania State University
url: https://qingyun-wu.github.io/
image_url: https://github.com/qingyun-wu.png

@@ -5,9 +5,9 @@ In this example, we will tune several hyperparameters for the OpenAI's completio
### Prerequisites
Install the [openai] option. The OpenAI integration is in preview.
Install the [autogen,blendsearch] option.
```bash
pip install "flaml[openai]==1.2.0"
pip install "flaml[autogen,blendsearch]==1.2.2" datasets
```
Set up your OpenAI key:
@@ -64,7 +64,9 @@ Before starting tuning, you need to define the metric for the optimization. For
from functools import partial
from flaml.autogen.code_utils import eval_function_completions, generate_assertions
eval_with_generated_assertions = partial(eval_function_completions, assertions=generate_assertions)
eval_with_generated_assertions = partial(
eval_function_completions, assertions=generate_assertions,
)
```
This function will first generate assertion statements for each problem. Then, it uses the assertions to select the generated responses.
@@ -126,10 +128,10 @@ print(eval_with_generated_assertions(oai.Completion.extract_text(response), **tu
You can use flaml's `oai.Completion.test` to evaluate the performance of an entire dataset with the tuned config.
```python
result = oai.Completion.test(test_data, config)
result = oai.Completion.test(test_data, **config)
print("performance on test data with the tuned config:", result)
```
The result will vary with the inference budget and optimization budget.
[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_openai.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_openai.ipynb)
[Link to notebook](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_openai_completion.ipynb) | [Open in colab](https://colab.research.google.com/github/microsoft/FLAML/blob/main/notebook/autogen_openai_completion.ipynb)

@@ -2,14 +2,16 @@
<!-- ### Welcome to FLAML, a Fast Library for Automated Machine Learning & Tuning! -->
FLAML is a lightweight Python library that finds accurate machine
learning models automatically, efficiently and economically. It frees users from selecting models and hyperparameters for each model.
FLAML is a lightweight Python library for efficient automation of machine
learning, including selection of
models, hyperparameters, and other tunable choices of an application.
### Main Features
1. For common machine learning or AI tasks like classification, regression, and generation, it quickly finds quality models for user-provided data with low computational resources. It supports both classical machine learning models and deep neural networks, including foundation models such as the GPT series.
2. It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training and evaluation code). Users can customize only when and what they need to, and leave the rest to the library.
3. It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping. FLAML is powered by a new, [cost-effective
* For foundation models like the GPT series, it automates the experimentation and optimization of their inference performance to maximize the effectiveness for downstream applications and minimize the inference cost.
* For common machine learning tasks like classification and regression, it quickly finds quality models for user-provided data with low computational resources.
* It is easy to customize or extend. Users can find their desired customizability from a smooth range: minimal customization (computational resource budget), medium customization (e.g., scikit-style learner, search space and metric), or full customization (arbitrary training/inference/evaluation code). Users can customize only when and what they need to, and leave the rest to the library.
* It supports fast and economical automatic tuning, capable of handling large search space with heterogeneous evaluation cost and complex constraints/guidance/early stopping. FLAML is powered by a [cost-effective
hyperparameter optimization](Use-Cases/Tune-User-Defined-Function#hyperparameter-optimization-algorithm)
and model selection method invented by Microsoft Research, and many followup [research studies](Research).
@@ -19,6 +21,27 @@ Install FLAML from pip: `pip install flaml`. Find more options in [Installation]
There are several ways of using flaml:
#### (New) [Auto Generation](Use-Cases/Auto-Generation)
For example, you can optimize generations by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.
```python
from flaml import oai
config, analysis = oai.Completion.tune(
data=tune_data,
metric="success",
mode="max",
eval_func=eval_func,
inference_budget=0.05,
optimization_budget=3,
num_samples=-1,
)
```
The automated experimentation and optimization can help you maximize the utility out of these expensive models.
A suite of utilities such as caching and templating are offered to accelerate the experimentation and application development.
#### [Task-oriented AutoML](Use-Cases/task-oriented-automl)
For example, with three lines of code, you can start using this economical and fast AutoML engine as a scikit-learn style estimator.
@@ -86,33 +109,12 @@ from flaml.default import LGBMClassifier
Then, you can use it just like you use the original `LGBMClassifier`. Your other code can remain unchanged. When you call the `fit()` function from `flaml.default.LGBMClassifier`, it will automatically instantiate a good data-dependent hyperparameter configuration for your dataset, which is expected to work better than the default configuration.
#### (New) [Auto Generation](Use-Cases/Auto-Generation)
You can optimize generations by ChatGPT or GPT-4 etc. with your own tuning data, success metrics and budgets.
```python
from flaml import oai
config, analysis = oai.Completion.tune(
data=tune_data,
metric="success",
mode="max",
eval_func=eval_func,
inference_budget=0.05,
optimization_budget=3,
num_samples=-1,
)
```
The optimization can help you maximize the utility out of these expensive models.
### Where to Go Next?
* Understand the use cases for [Task-oriented AutoML](Use-Cases/task-oriented-automl), [Tune user-defined function](Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](Use-Cases/Zero-Shot-AutoML).
* Find code examples under "Examples": from [AutoML - Classification](Examples/AutoML-Classification) to [Tune - PyTorch](Examples/Tune-PyTorch).
* Find [talks](https://www.youtube.com/channel/UCfU0zfFXHXdAd5x-WvFBk5A) and [tutorials](https://github.com/microsoft/FLAML/tree/tutorial/tutorial) about FLAML.
* Understand the use cases for [Auto Generation](Use-Cases/Auto-Generation), [Task-oriented AutoML](Use-Cases/Task-Oriented-Automl), [Tune user-defined function](Use-Cases/Tune-User-Defined-Function) and [Zero-shot AutoML](Use-Cases/Zero-Shot-AutoML).
* Find code examples under "Examples": from [AutoGen - OpenAI](Examples/AutoGen-OpenAI) to [Tune - PyTorch](Examples/Tune-PyTorch).
* Learn about [research](Research) around FLAML.
* Refer to [SDK](reference/automl/automl) and [FAQ](FAQ).
* Chat on [Discord](https://discord.gg/Cppx2vSPVP).
If you like our project, please give it a [star](https://github.com/microsoft/FLAML/stargazers) on GitHub. If you are interested in contributing, please read [Contributor's Guide](Contribute).

@@ -1,6 +1,6 @@
# Auto Generation
`flaml.autogen` is a subpackage for automating generation tasks. It uses [`flaml.tune`](../reference/tune/tune) to find good hyperparameter configurations under budget constraints.
`flaml.autogen` is a package for automating generation tasks (in preview). It uses [`flaml.tune`](../reference/tune/tune) to find good hyperparameter configurations under budget constraints.
Such optimization has several benefits:
* Maximize the utility out of using expensive foundation models.
* Reduce the inference cost by using cheaper models or configurations which achieve equal or better performance.
@@ -26,6 +26,9 @@ There are also complex interactions among subsets of the hyperparameters. For ex
the temperature and top_p are not recommended to be altered from their default values together because they both control the randomness of the generated text, and changing both at the same time can result in conflicting effects; n and best_of are rarely tuned together because if the application can process multiple outputs, filtering on the server side causes unnecessary information loss; both n and max_tokens will affect the total number of tokens generated, which in turn will affect the cost of the request.
These interactions and trade-offs make it difficult to manually determine the optimal hyperparameter settings for a given text generation task.
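To make the effect of `n` and `max_tokens` on cost concrete, the sketch below computes an upper bound on the cost of a single request. It assumes the prompt is billed once per request and that every one of the `n` completions runs all the way to `max_tokens`; the per-1K-token prices in the usage lines are placeholders, not actual OpenAI prices.

```python
def max_request_cost(
    prompt_tokens: int,
    n: int,
    max_tokens: int,
    price_prompt_per_1k: float,
    price_completion_per_1k: float,
) -> float:
    """Upper bound on the cost of one request asking for n completions.

    Assumes the prompt is billed once and each completion may use up to max_tokens.
    """
    completion_tokens = n * max_tokens
    return (prompt_tokens * price_prompt_per_1k + completion_tokens * price_completion_per_1k) / 1000


# Placeholder prices per 1K tokens: 0.03 for the prompt, 0.06 for completions.
one_sample = max_request_cost(500, n=1, max_tokens=200, price_prompt_per_1k=0.03, price_completion_per_1k=0.06)
five_samples = max_request_cost(500, n=5, max_tokens=200, price_prompt_per_1k=0.03, price_completion_per_1k=0.06)
# one_sample is about 0.027 and five_samples is about 0.075
```

Under these assumptions, raising `n` multiplies only the completion term, while raising `max_tokens` raises the bound for every one of the `n` samples.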
*Do the choices matter? Check this [blog post](/blog/2023/04/21/LLM-tuning-math) for a case study.*
## Tune Hyperparameters
The tuning can be performed with the following information:
@@ -46,8 +49,9 @@ The evaluation function should take a list of responses, and other keyword argum
```python
def eval_math_responses(responses: List[str], solution: str, **args) -> Dict:
# select a response from the list of responses
answer = voted_answer(responses)
# check whether the answer is correct
return {"success": True or False}
return {"success": is_equivalent(answer, solution)}
```
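The pseudocode above leaves `voted_answer` and `is_equivalent` undefined. A minimal, self-contained sketch of an evaluation function with the same shape follows. It assumes, hypothetically, that the final answer is the last non-empty line of each response and that two answers are equivalent when they match after lowercasing and removing whitespace; the helpers in `flaml.autogen.math_utils` are more careful about LaTeX-formatted answers.

```python
from collections import Counter
from typing import Dict, List


def extract_answer(response: str) -> str:
    # Hypothetical convention: the final answer is the last non-empty line.
    lines = [line.strip() for line in response.strip().splitlines() if line.strip()]
    return lines[-1] if lines else ""


def normalize(answer: str) -> str:
    # Naive equivalence check: ignore case and all whitespace.
    return "".join(answer.lower().split())


def voted_answer(responses: List[str]) -> str:
    # Majority vote over normalized answers; ties go to the answer seen first.
    answers = [normalize(extract_answer(r)) for r in responses]
    counts = Counter(answers)
    return max(answers, key=lambda a: counts[a]) if answers else ""


def eval_math_responses(responses: List[str], solution: str, **args) -> Dict:
    # Select a response by voting, then check it against the canonical solution.
    answer = voted_answer(responses)
    return {"success": answer == normalize(solution)}


result = eval_math_responses(["x = 2\n42", "so 41", "Answer:\n 42 "], solution="42")
# result == {"success": True}, since "42" wins the vote two to one
```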
[`flaml.autogen.code_utils`](../reference/autogen/code_utils) and [`flaml.autogen.math_utils`](../reference/autogen/math_utils) offer some example evaluation functions for code generation and math problem solving.
@@ -98,16 +102,23 @@ config, analysis = oai.Completion.tune(
`num_samples` is the number of configurations to sample. -1 means unlimited (until optimization budget is exhausted).
The returned `config` contains the optimized configuration and `analysis` contains an [ExperimentAnalysis](../reference/tune/analysis#experimentanalysis-objects) object for all the tried configurations and results.
## Perform inference with the tuned config
The tuend config can be used to perform inference.
One can use [`flaml.oai.Completion.create`](../reference/autogen/oai/completion#create) to performance inference.
*Refer to this [page](../Examples/AutoGen-OpenAI) for a full example.*
## Perform Inference
One can use [`flaml.oai.Completion.create`](../reference/autogen/oai/completion#create) to perform inference.
There are a number of benefits of using `flaml.oai.Completion.create` to perform inference.
A template is either a format str, or a function which produces a str from several input fields.
### API unification
`flaml.oai.Completion.create` is compatible with both `openai.Completion.create` and `openai.ChatCompletion.create`, and both OpenAI API and Azure OpenAI API. So models such as "text-davinci-003", "gpt-3.5-turbo" and "gpt-4" can share a common API. When only tuning the chat-based models, `flaml.oai.ChatCompletion` can be used.
`flaml.oai.Completion.create` is compatible with both `openai.Completion.create` and `openai.ChatCompletion.create`, and both OpenAI API and Azure OpenAI API. So models such as "text-davinci-003", "gpt-3.5-turbo" and "gpt-4" can share a common API.
When chat models are used and `prompt` is given as the input to `flaml.oai.Completion.create`, the prompt will be automatically converted into `messages` to fit the chat completion API requirement. One advantage is that one can experiment with both chat and non-chat models for the same prompt in a unified API.
For local LLMs, one can spin up an endpoint using a package like [simple_ai_server](https://github.com/lhenault/simpleAI), and then use the same API to send a request.
When only working with the chat-based models, `flaml.oai.ChatCompletion` can be used. It also does automatic conversion from prompt to messages, if prompt is provided instead of messages.
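The prompt-to-messages conversion can be pictured as wrapping the prompt in a single user message. The sketch below is a simplified, hypothetical model of that behavior: the helper name `to_chat_request` is made up, FLAML's actual conversion may differ in details, and the function only builds the request dict without calling any API.

```python
from typing import Any, Dict


def to_chat_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `request` in chat format.

    If `messages` is already present, the request is copied unchanged.
    Otherwise the `prompt` is wrapped as a single user message.
    """
    converted = dict(request)
    if "messages" not in converted and "prompt" in converted:
        prompt = converted.pop("prompt")
        converted["messages"] = [{"role": "user", "content": prompt}]
    return converted


chat_request = to_chat_request({"model": "gpt-3.5-turbo", "prompt": "Hi"})
# {'model': 'gpt-3.5-turbo', 'messages': [{'role': 'user', 'content': 'Hi'}]}
```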
### Caching
@@ -115,23 +126,269 @@ API call results are cached locally and reused when the same request is issued.
### Error handling
#### Runtime error
It is easy to hit errors when calling OpenAI APIs, due to connection issues, rate limits, or timeouts. Some of these errors are transient. `flaml.oai.Completion.create` deals with the transient errors and retries automatically. The initial request timeout, retry timeout and retry time interval can be configured via `flaml.oai.request_timeout`, `flaml.oai.retry_timeout` and `flaml.oai.retry_time`.
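The retry behavior for transient errors can be sketched as a loop bounded by a total retry timeout. This is a simplified, hypothetical model: `call_with_retry` and `TransientError` are made-up names whose parameters mirror the retry settings above, the per-request timeout is not modeled, and the clock and sleep functions are injectable so the sketch can be exercised without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for errors worth retrying (connection, rate limit, timeout)."""


def call_with_retry(
    request: Callable[[], T],
    retry_timeout: float = 120.0,
    retry_time: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying transient failures every `retry_time` seconds.

    Gives up and re-raises the last error once another wait would exceed `retry_timeout`.
    """
    start = clock()
    while True:
        try:
            return request()
        except TransientError:
            if clock() - start + retry_time > retry_timeout:
                raise
            sleep(retry_time)


class FakeClock:
    """Simulated time so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()
attempts = []


def flaky() -> str:
    # Fails twice with a transient error, then succeeds.
    attempts.append(clock.time())
    if len(attempts) < 3:
        raise TransientError("rate limited")
    return "ok"


result = call_with_retry(flaky, retry_timeout=60, retry_time=10, clock=clock.time, sleep=clock.sleep)
# result == "ok"; attempts == [0.0, 10.0, 20.0]
```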
Moreover, one can pass a list of configurations of different models/endpoints to mitigate the rate limits. For example,
```python
import os
from flaml import oai

response = oai.Completion.create(
config_list=[
{
"model": "gpt-4",
"api_key": os.environ.get("AZURE_OPENAI_API_KEY"),
"api_type": "azure",
"api_base": os.environ.get("AZURE_OPENAI_API_BASE"),
"api_version": "2023-03-15-preview",
},
{
"model": "gpt-3.5-turbo",
"api_key": os.environ.get("OPENAI_API_KEY"),
"api_type": "open_ai",
"api_base": "https://api.openai.com/v1",
"api_version": None,
},
{
"model": "llama-7B",
"api_base": "http://127.0.0.1:8080",
"api_type": "open_ai",
"api_version": None,
}
],
prompt="Hi",
)
```
It will try querying Azure OpenAI gpt-4, OpenAI gpt-3.5-turbo, and a locally hosted llama-7B one by one, ignoring AuthenticationError, RateLimitError and Timeout,
until a valid result is returned. This can speed up the development process where the rate limit is a bottleneck. An error will be raised if the last choice fails. So make sure the last choice in the list has the best availability.
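The fallback over `config_list` can be modeled as trying each configuration in order, skipping the error types listed above, and re-raising only when the last configuration fails. The sketch below is a hypothetical, dependency-free model of that behavior, not FLAML's implementation: the exception classes are local stand-ins for the `openai` ones, and `send` is a caller-supplied function that performs one request.

```python
from typing import Any, Callable, Dict, List


class AuthenticationError(Exception):
    """Local stand-in for the openai error of the same name."""


class RateLimitError(Exception):
    """Local stand-in for the openai error of the same name."""


class Timeout(Exception):
    """Local stand-in for the openai error of the same name."""


SKIPPABLE = (AuthenticationError, RateLimitError, Timeout)


def create_with_fallback(config_list: List[Dict[str, Any]], send: Callable[[Dict[str, Any]], Any]) -> Any:
    """Try each config in order; skippable errors are ignored except on the last config."""
    last = len(config_list) - 1
    for i, config in enumerate(config_list):
        try:
            return send(config)
        except SKIPPABLE:
            if i == last:
                raise
    raise ValueError("config_list is empty")


def fake_send(config: Dict[str, Any]) -> str:
    # Simulate a rate-limited first endpoint.
    if config["model"] == "gpt-4":
        raise RateLimitError("busy")
    return f"response from {config['model']}"


response = create_with_fallback([{"model": "gpt-4"}, {"model": "gpt-3.5-turbo"}], fake_send)
# response == "response from gpt-3.5-turbo"
```

This ordering is why the last entry should be the most available one: it is the only entry whose failure is surfaced to the caller.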
#### Logic error
Another type of error is that the returned response does not satisfy a requirement. For example, if the response is required to be a valid JSON string, one would like to filter out the responses that are not. This can be achieved by providing a list of configurations and a filter function. For example,
```python
import json
from flaml import oai

def valid_json_filter(context, config, response):
    # Accept the response if any of its generated texts parses as valid JSON.
    for text in oai.Completion.extract_text(response):
        try:
            json.loads(text)
            return True
        except ValueError:
            pass
    return False

response = oai.Completion.create(
    config_list=[{"model": "text-ada-001"}, {"model": "gpt-3.5-turbo"}, {"model": "text-davinci-003"}],
    prompt="How to construct a json request to Bing API to search for 'latest AI news'? Return the JSON request.",
    filter_func=valid_json_filter,
)
```
The example above will try text-ada-001, gpt-3.5-turbo, and text-davinci-003 in turn, until a valid JSON string is returned or the last config is used. One can also repeat the same model in the list to try it multiple times, which increases the robustness of the final response.
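The interplay between `config_list` and `filter_func` can be sketched as follows: each configuration is tried in turn, and a response is accepted as soon as the filter passes. This is a simplified, hypothetical model: `create_with_filter` and its `send` callback are made-up names, responses are plain lists of strings instead of API response objects, and it assumes the last configuration's response is returned as-is when no response passes the filter.

```python
import json
from typing import Any, Callable, Dict, List

Response = List[str]  # simplified: a response is just its list of generated texts


def valid_json_filter(context: Dict[str, Any], config: Dict[str, Any], response: Response) -> bool:
    # Accept the response if any of its texts parses as JSON.
    for text in response:
        try:
            json.loads(text)
            return True
        except ValueError:
            pass
    return False


def create_with_filter(
    context: Dict[str, Any],
    config_list: List[Dict[str, Any]],
    send: Callable[[Dict[str, Any]], Response],
    filter_func: Callable[[Dict[str, Any], Dict[str, Any], Response], bool],
) -> Response:
    response: Response = []
    for config in config_list:
        response = send(config)
        if filter_func(context, config, response):
            return response
    return response  # the last config's response, even though it failed the filter


canned = {
    "text-ada-001": ["Sure! Here is the request..."],
    "gpt-3.5-turbo": ['{"q": "latest AI news"}'],
    "text-davinci-003": ['{"q": "latest AI news", "count": 10}'],
}
configs = [{"model": model} for model in canned]
response = create_with_filter({}, configs, lambda c: canned[c["model"]], valid_json_filter)
# response == ['{"q": "latest AI news"}'], since gpt-3.5-turbo is the first to pass
```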
### Templating
If the provided prompt or message is a template, it will be automatically materialized with a given context. For example,
```python
response = oai.Completion.create(problme=problem, prompt="{problem} Solve the problem carefully.", **config)
response = oai.Completion.create(
context={"problem": "How many positive integers, not exceeding 100, are multiples of 2 or 3 but not 4?"},
prompt="{problem} Solve the problem carefully.",
**config
)
```
## Other utilities
`flaml.oai.Completion` also offers some additional utilities, such as:
A template is either a format str, like the example above, or a function which produces a str from several input fields, like the example below.
```python
from functools import partial
from flaml import oai

def content(turn, **context):
    return "\n".join(
        [
            context[f"user_message_{turn}"],
            context[f"external_info_{turn}"]
        ]
    )

messages = [
    {
        "role": "system",
        "content": "You are a teaching assistant of math.",
    },
    {
        "role": "user",
        "content": partial(content, turn=0),
    },
]
context = {
    "user_message_0": "Could you explain the solution to Problem 1?",
    "external_info_0": "Problem 1: ...",
}

response = oai.ChatCompletion.create(context, messages=messages, **config)
messages.append(
    {
        "role": "assistant",
        "content": oai.ChatCompletion.extract_text(response)[0]
    }
)
messages.append(
    {
        "role": "user",
        "content": partial(content, turn=1),
    },
)
# context is a dict, so add the next turn's fields with update()
context.update(
    {
        "user_message_1": "Why can't we apply Theorem 1 to Equation (2)?",
        "external_info_1": "Theorem 1: ...",
    }
)
response = oai.ChatCompletion.create(context, messages=messages, **config)
```
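Template materialization can be sketched with a small helper: a string template is filled with `str.format(**context)`, and a callable template builds the string itself from the context. The helper name `instantiate` and the convention that a callable template receives the context dict as its single positional argument are assumptions made for this sketch; they are not necessarily how FLAML invokes function templates, so check the reference for the exact calling convention.

```python
from functools import partial
from typing import Any, Callable, Dict, Union

Template = Union[str, Callable[[Dict[str, Any]], str]]


def instantiate(template: Template, context: Dict[str, Any]) -> str:
    # String templates are filled via str.format; callables build the string themselves.
    if isinstance(template, str):
        return template.format(**context)
    return template(context)


def content(context: Dict[str, Any], turn: int) -> str:
    # Join the user's message with the external information for the given turn.
    return "\n".join([context[f"user_message_{turn}"], context[f"external_info_{turn}"]])


context = {"problem": "What is 2 + 3?"}
first = instantiate("{problem} Solve the problem carefully.", context)
# first == "What is 2 + 3? Solve the problem carefully."

context.update({"user_message_0": "Explain Problem 1.", "external_info_0": "Problem 1: ..."})
second = instantiate(partial(content, turn=0), context)
# second == "Explain Problem 1.\nProblem 1: ..."
```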
### Logging (Experimental)
When debugging or diagnosing an LLM-based system, it is often convenient to log the API calls and analyze them. `flaml.oai.Completion` and `flaml.oai.ChatCompletion` offer an easy way to collect the API call histories. For example, to log the chat histories, simply run:
```python
flaml.oai.ChatCompletion.start_logging()
```
The API calls made after this will be automatically logged. They can be retrieved at any time by:
```python
flaml.oai.ChatCompletion.logged_history
```
To stop logging, use
```python
flaml.oai.ChatCompletion.stop_logging()
```
If one would like to append the history to an existing dict, pass the dict like:
```python
flaml.oai.ChatCompletion.start_logging(history_dict=existing_history_dict)
```
By default, the counter of API calls will be reset at `start_logging()`. If no reset is desired, set `reset_counter=False`.
There are two types of logging formats: compact logging and individual API call logging. The default format is compact.
Set `compact=False` in `start_logging()` to switch.
* Example of a history dict with compact logging.
```python
{
"""
[
{
'role': 'system',
'content': system_message,
},
{
'role': 'user',
'content': user_message_1,
},
{
'role': 'assistant',
'content': assistant_message_1,
},
{
'role': 'user',
'content': user_message_2,
},
{
'role': 'assistant',
'content': assistant_message_2,
},
]""": {
"created_at": [0, 1],
"cost": [0.1, 0.2],
}
}
```
* Example of a history dict with individual API call logging.
```python
{
0: {
"request": {
"messages": [
{
"role": "system",
"content": system_message,
},
{
"role": "user",
"content": user_message_1,
}
],
... # other parameters in the request
},
"response": {
"choices": [
{
"message": {
"role": "assistant",
"content": assistant_message_1,
},
},
],
... # other fields in the response
}
},
1: {
"request": {
"messages": [
{
"role": "system",
"content": system_message,
},
{
"role": "user",
"content": user_message_1,
},
{
"role": "assistant",
"content": assistant_message_1,
},
{
"role": "user",
"content": user_message_2,
},
],
... # other parameters in the request
},
"response": {
"choices": [
{
"message": {
"role": "assistant",
"content": assistant_message_2,
},
},
],
... # other fields in the response
}
},
}
```
It can be seen that the individual API call history contains redundant information about the conversation; for a long conversation, the degree of redundancy is high.
The compact history is more efficient, while the individual API call history contains more details.
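To see how the two formats relate, the sketch below folds a list of individual API calls into a compact history keyed by the full conversation. It is a hypothetical reconstruction from the two examples above, not FLAML's code: the helper `compact_history`, the shape of each call record, and the merge rule (a call whose request messages, minus the last one, equal an already-logged conversation continues that conversation) are assumptions, and keys are serialized with `json.dumps` rather than the exact string format FLAML uses.

```python
import json
from typing import Any, Dict, List


def compact_history(calls: List[Dict[str, Any]]) -> Dict[str, Dict[str, list]]:
    """Fold individual calls into {conversation_json: {"created_at": [...], "cost": [...]}}.

    Each call is assumed to look like
    {"messages": [...request messages...], "reply": {...assistant message...}, "cost": float}.
    """
    history: Dict[str, Dict[str, list]] = {}
    for index, call in enumerate(calls):
        conversation = call["messages"] + [call["reply"]]
        previous_key = json.dumps(call["messages"][:-1])
        # Continue an existing conversation if this call extends it; otherwise start a new one.
        entry = history.pop(previous_key, {"created_at": [], "cost": []})
        entry["created_at"].append(index)
        entry["cost"].append(call["cost"])
        history[json.dumps(conversation)] = entry
    return history


sys_msg = {"role": "system", "content": "You are a teaching assistant of math."}
u1 = {"role": "user", "content": "Q1"}
a1 = {"role": "assistant", "content": "A1"}
u2 = {"role": "user", "content": "Q2"}
a2 = {"role": "assistant", "content": "A2"}
calls = [
    {"messages": [sys_msg, u1], "reply": a1, "cost": 0.1},
    {"messages": [sys_msg, u1, a1, u2], "reply": a2, "cost": 0.2},
]
compact = compact_history(calls)
# a single entry keyed by the five-message conversation: {"created_at": [0, 1], "cost": [0.1, 0.2]}
```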
## Other Utilities
### Completion
[`flaml.oai.Completion`](../reference/autogen/oai/completion) also offers some additional utilities, such as:
- a [`cost`](../reference/autogen/oai/completion#cost) function to calculate the cost of an API call.
- a [`test`](../reference/autogen/oai/completion#test) function to conveniently evaluate the configuration over test data.
- an [`extract_text`](../reference/autogen/oai/completion#extract_text) function to extract the text from a completion or chat response.
- a [`set_cache`](../reference/autogen/oai/completion#set_cache) function to set the seed and cache path. Caching, introduced in the section above, saves cost and provides reproducibility and controlled randomness.
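The caching behavior described above (identical requests are reused, and the seed is part of the cache key) can be sketched with an in-memory dictionary. This is a hypothetical model: `CachedClient`, its `seed` attribute, and the `send` callback are illustrative names, and FLAML's real cache is persisted under the configured cache path rather than kept in memory.

```python
import json
from typing import Any, Callable, Dict


class CachedClient:
    """Memoize responses per (seed, request) so identical requests are not re-sent."""

    def __init__(self, send: Callable[[Dict[str, Any]], Any], seed: int) -> None:
        self.send = send
        self.seed = seed
        self._cache: Dict[str, Any] = {}

    def create(self, **request: Any) -> Any:
        key = json.dumps({"seed": self.seed, "request": request}, sort_keys=True)
        if key not in self._cache:
            self._cache[key] = self.send(request)  # only cache misses reach the API
        return self._cache[key]


sent = []


def fake_send(request: Dict[str, Any]) -> str:
    sent.append(request)
    return f"completion #{len(sent)}"


client = CachedClient(fake_send, seed=41)
first = client.create(model="gpt-3.5-turbo", prompt="Hi")
second = client.create(model="gpt-3.5-turbo", prompt="Hi")  # served from the cache
client.seed = 42
third = client.create(model="gpt-3.5-turbo", prompt="Hi")  # new seed, so a new call
# first == second == "completion #1"; third == "completion #2"
```

Changing the seed is what gives controlled randomness: the same request can be sampled again without discarding earlier cached results.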
Interested in trying it yourself? Please check the following notebook examples:
* [Optimize for Code Gen](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_openai.ipynb)
* [Optimize for Math](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_chatgpt.ipynb)
### Code
[`flaml.autogen.code_utils`](../reference/autogen/code_utils) offers code-related utilities, such as:
- an [`improve_code`](../reference/autogen/code_utils#improve_code) function to improve code for a given objective.
- a [`generate_assertions`](../reference/autogen/code_utils#generate_assertions) function to generate assertion statements from a function signature and docstring.
- an [`implement`](../reference/autogen/code_utils#implement) function to implement a function from a definition.
- an [`eval_function_completions`](../reference/autogen/code_utils#eval_function_completions) function to evaluate the success of a function completion task, or to select a response from a list of responses using generated assertions.
### Math
[`flaml.autogen.math_utils`](../reference/autogen/math_utils) offers utilities for math problems, such as:
- an [`eval_math_responses`](../reference/autogen/math_utils#eval_math_responses) function to select a response using voting and, if the canonical solution is provided, check whether the final answer is correct.
*Interested in trying it yourself? Please check the following notebook examples:*
* [Optimize for Code Gen](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_openai_completion.ipynb)
* [Optimize for Math](https://github.com/microsoft/FLAML/blob/main/notebook/autogen_chatgpt_gpt4.ipynb)

@@ -477,6 +477,18 @@ with mlflow.start_run():
automl.fit(X_train=X_train, y_train=y_train, **settings)
```
To disable mlflow logging pre-configured in FLAML, set `mlflow_logging=False`:
```python
automl = AutoML(mlflow_logging=False)
```
or
```python
automl.fit(X_train=X_train, y_train=y_train, mlflow_logging=False, **settings)
```
Setting `mlflow_logging=False` in the constructor will disable mlflow logging for all the `fit()` calls.
Setting `mlflow_logging=False` in `fit()` will disable mlflow logging for that `fit()` call only.
### Extra fit arguments
Extra fit arguments that are needed by the estimators can be passed to `AutoML.fit()`. For example, if there is a weight associated with each training example, the weights can be passed via `sample_weight`. As another example, `period` can be passed for a time series forecaster. Any extra keyword argument passed to `AutoML.fit()` that is not explicitly listed in the function signature will be passed to the underlying estimators' `fit()` as is. For instance, you can set the number of GPUs used by each trial with the `gpu_per_trial` argument, which is only used by TransformersEstimator and XGBoostSklearnEstimator.
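The pass-through of unlisted keyword arguments follows the usual Python `**kwargs` forwarding pattern. The sketch below illustrates the idea with hypothetical classes (`ToyEstimator`, `ToyAutoML`); it is not FLAML's code, only a picture of how an argument such as `sample_weight` can reach an estimator's `fit()` unchanged while explicitly listed arguments are consumed by the caller.

```python
from typing import Any, Dict, List, Optional


class ToyEstimator:
    """Hypothetical estimator that records the extra arguments it receives."""

    def __init__(self) -> None:
        self.fit_kwargs: Dict[str, Any] = {}

    def fit(self, X: List[list], y: List[float], **kwargs: Any) -> "ToyEstimator":
        # Record what arrived so the forwarding can be inspected.
        self.fit_kwargs = kwargs
        return self


class ToyAutoML:
    """Hypothetical AutoML front end that forwards unlisted arguments."""

    def fit(
        self,
        X_train: List[list],
        y_train: List[float],
        time_budget: Optional[float] = None,
        **fit_kwargs: Any,
    ) -> ToyEstimator:
        # Explicitly listed arguments (like time_budget) stay here; everything
        # else is passed to the underlying estimator's fit() as is.
        return ToyEstimator().fit(X_train, y_train, **fit_kwargs)


model = ToyAutoML().fit([[0.0], [1.0]], [0.0, 1.0], time_budget=10, sample_weight=[1.0, 2.0])
# model.fit_kwargs == {"sample_weight": [1.0, 2.0]}
```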
@@ -503,7 +515,7 @@ automl_settings = {
automl.fit(X_train=X_train, y_train=y_train, **automl_settings)
```
## Retrieve and analyze the outcomes of AutoML.fit()
## Retrieve the Outcomes
### Get best model

@@ -32,6 +32,7 @@ module.exports = {
position: 'left',
label: 'SDK',
},
{to: 'blog', label: 'Blog', position: 'left'},
{
type: 'doc',
docId: 'FAQ',
@@ -57,23 +58,23 @@ module.exports = {
// },
// ],
// },
// {
// title: 'Community',
// items: [
{
title: 'Community',
items: [
// // {
// // label: 'Stack Overflow',
// // href: 'https://stackoverflow.com/questions/tagged/pymarlin',
// // },
// // {
// // label: 'Discord',
// // href: 'https://discordapp.com/invite/docusaurus',
// // },
{
label: 'Discord',
href: 'https://discord.gg/Cppx2vSPVP',
},
// // {
// // label: 'Twitter',
// // href: 'https://twitter.com/docusaurus',
// // },
// ],
// },
],
},
// {
// title: 'More',
// items: [

@@ -8,9 +8,9 @@ const FeatureList = [
Svg: require('../../static/img/auto.svg').default,
description: (
<>
FLAML finds accurate ML models with low computational resources
for common ML tasks.
It frees users from selecting learners and hyperparameters.
FLAML finds accurate models or configurations with low computational resources
for common ML/AI tasks.
It frees users from selecting models and hyperparameters for training or inference.
{/* It is fast and economical. */}
</>
),