FLAML/notebook/integrate_azureml.ipynb
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved. \n",
"\n",
"Licensed under the MIT License.\n",
"\n",
"# Run FLAML in AzureML\n",
"\n",
"\n",
"## 1. Introduction\n",
"\n",
"FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models\n",
"at low computational cost. It is fast and economical, and its simple, lightweight design makes it easy\n",
"to use and extend, for example by adding new learners. FLAML can\n",
"- serve as an economical AutoML engine,\n",
"- be used as a fast hyperparameter tuning tool, or\n",
"- be embedded in self-tuning software that requires low latency and low resource usage in repetitive\n",
"  tuning tasks.\n",
"\n",
"In this notebook, we use a real dataset (binary classification) to show how to use the FLAML library together with AzureML.\n",
"\n",
"FLAML requires `Python>=3.8`. To run this notebook example, please install flaml with the [automl,azureml] option:\n",
"```bash\n",
"pip install flaml[automl,azureml]\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install flaml[automl,azureml]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Enable mlflow in AzureML workspace"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import mlflow\n",
"from azureml.core import Workspace\n",
"\n",
"ws = Workspace.from_config()\n",
"mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())"
]
},
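{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the tracking URI that MLflow is now using can be printed. This is a minimal sketch, assuming the cell above ran without error; `mlflow.get_tracking_uri()` returns the URI currently in effect, which should point to the AzureML workspace rather than a local `mlruns` directory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumes mlflow.set_tracking_uri(...) was called in the previous cell\n",
"print(mlflow.get_tracking_uri())"
]
},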
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 2. Classification Example\n",
"### Load data and preprocess\n",
"\n",
"Download the [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given information about its scheduled departure."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [],
"source": [
"from flaml.automl.data import load_openml_dataset\n",
"X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir='./')"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Run FLAML\n",
"In the FLAML AutoML run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, the resampling strategy, and so on. Each argument has a default value that is used when the user does not provide one. For example, the default ML learners of FLAML are `['lgbm', 'xgboost', 'catboost', 'rf', 'extra_tree', 'lrl1']`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# Import the AutoML class from the flaml package\n",
"from flaml import AutoML\n",
"automl = AutoML()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"settings = {\n",
"    \"time_budget\": 60,  # total running time in seconds\n",
"    \"metric\": 'accuracy',\n",
"    # check the documentation for options of metrics (https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#optimization-metric)\n",
"    \"estimator_list\": ['lgbm', 'rf', 'xgboost'],  # list of ML learners\n",
"    \"task\": 'classification',  # task type\n",
"    \"sample\": False,  # whether to subsample training data\n",
"    \"log_file_name\": 'airlines_experiment.log',  # flaml log file\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
},
"tags": []
},
"outputs": [],
"source": [
"experiment = mlflow.set_experiment(\"flaml\")\n",
"with mlflow.start_run() as run:\n",
"    automl.fit(X_train=X_train, y_train=y_train, **settings)\n",
"    # log the fitted AutoML object as an MLflow model artifact named \"automl\"\n",
"    mlflow.sklearn.log_model(automl, \"automl\")"
]
},
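{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the search finishes, the fitted `AutoML` object exposes the best learner and its configuration. The cell below is a minimal sketch, assuming `automl.fit()` above has completed; because the metric is accuracy, the reported loss is interpreted here as `1 - accuracy`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Attributes of the fitted AutoML object; assumes automl.fit() above has completed\n",
"print('Best learner:', automl.best_estimator)\n",
"print('Best hyperparameter config:', automl.best_config)\n",
"# With metric='accuracy', FLAML minimizes 1 - accuracy\n",
"print('Best validation accuracy: {0:.4g}'.format(1 - automl.best_loss))\n",
"print('Training time of best run: {0:.4g} s'.format(automl.best_config_train_time))"
]
},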
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"automl = mlflow.sklearn.load_model(f\"{run.info.artifact_uri}/automl\")\n",
"print(automl.predict_proba(X_test))\n",
"print(automl.predict(X_test))"
]
},
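{
"cell_type": "markdown",
"metadata": {},
"source": [
"To check that the reloaded model behaves as expected, its predictions can be scored on the held-out split. This is a minimal sketch using scikit-learn's `accuracy_score`; it assumes `X_test` and `y_test` from the data-loading cell are still in scope."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import accuracy_score\n",
"\n",
"# Assumes automl, X_test and y_test are defined by the cells above\n",
"y_pred = automl.predict(X_test)\n",
"print('test accuracy =', accuracy_score(y_test, y_pred))"
]
},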
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### Retrieve logs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "subslide"
},
"tags": []
},
"outputs": [],
"source": [
"mlflow.search_runs(experiment_ids=[experiment.experiment_id], filter_string=\"params.learner = 'xgboost'\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.13 ('syml-py38')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
},
"vscode": {
"interpreter": {
"hash": "e3d9487e2ef008ade0db1bc293d3206d35cb2b6081faff9f66b40b257b7398f7"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}