Clarify period parameter and automatic label lagging in time series forecasting (#1495)

* Initial plan

* Add comprehensive documentation for period parameter and automatic label lagging

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Address code review feedback on docstring clarity

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Clarify period vs prediction output length per @thinkall's feedback

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Refine terminology per code review feedback

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

* Run pre-commit formatting fixes

Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
Co-authored-by: Li Jiang <bnujli@gmail.com>
Copilot
2026-01-21 14:19:23 +08:00
committed by GitHub
parent 9233a52736
commit 7ec1414e9b
3 changed files with 59 additions and 7 deletions


@@ -8,6 +8,25 @@ Install the [automl,ts_forecast] option.
pip install "flaml[automl,ts_forecast]"
```
### Understanding the `period` Parameter
The `period` parameter (also called **horizon** in the code) specifies the **forecast horizon**: the number of future time steps the model is trained to predict. For example:
- `period=12` means you want to forecast 12 time steps ahead (e.g., 12 months, 12 days)
- `period=7` means you want to forecast 7 time steps ahead
**Important Note on Prediction**: During the prediction stage, the output length equals the length of `X_test`. This means you can generate predictions for any number of time steps by providing the corresponding timestamps in `X_test`, regardless of the `period` value used during training.
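As a rough illustration of the note above, the following sketch builds a set of test timestamps whose length differs from the training `period`. It uses only the standard library and assumes daily data; `future_timestamps` is a hypothetical helper written for this example, not part of FLAML. In practice you would pass such timestamps (in the input format your data uses) as `X_test` to `automl.predict`.

```python
from datetime import date, timedelta


def future_timestamps(last_observed, n_steps, step=timedelta(days=1)):
    """Return the n_steps timestamps that follow last_observed.

    Hypothetical helper for illustration only; not a FLAML function.
    """
    return [last_observed + step * (i + 1) for i in range(n_steps)]


period = 7  # forecast horizon used at training time
last_train_day = date(2024, 1, 31)  # assumed last timestamp in the training data

# X_test may cover more (or fewer) steps than `period`;
# the prediction output has one value per row of X_test.
x_test_timestamps = future_timestamps(last_train_day, n_steps=10)

print(len(x_test_timestamps))  # 10
print(x_test_timestamps[0], x_test_timestamps[-1])  # 2024-02-01 2024-02-10
```

Here the training horizon is 7 but ten timestamps are requested, so ten predictions would be expected back.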
#### Automatic Feature Engineering
**Important**: You do NOT need to manually lag the target variable before training. FLAML handles this automatically:
- **For sklearn-based models** (lgbm, rf, xgboost, extra_tree, catboost): FLAML automatically creates lagged features of both the target variable and any exogenous variables. This transforms the time series forecasting problem into a supervised learning regression problem.
- **For time series native models** (prophet, arima, sarimax, holt-winters): These models have built-in time series forecasting capabilities and handle temporal dependencies natively.
The automatic lagging is applied internally when you call `automl.fit()` with `task="ts_forecast"` or `task="ts_forecast_classification"`, so you can provide clean, unlagged input data and leave the lag feature engineering to FLAML.
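To make the idea of "lagged features" concrete, here is a minimal, standard-library sketch of how a univariate series can be turned into a supervised regression dataset. It illustrates the general technique only: FLAML's internal implementation may differ in the number of lags, the handling of exogenous variables, and other details. The function name `make_lagged_rows` and the parameter `n_lags` are assumptions made for this example.

```python
def make_lagged_rows(series, n_lags):
    """Build (features, target) pairs from a univariate series.

    Each feature row holds the n_lags values that precede the target.
    Illustrative only; not FLAML's actual implementation.
    """
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(series[t - n_lags:t])  # the previous n_lags observations
        targets.append(series[t])          # the value to predict
    return rows, targets


y = [10, 12, 15, 19, 24, 30]
X_lagged, y_target = make_lagged_rows(y, n_lags=3)

for features, target in zip(X_lagged, y_target):
    print(features, "->", target)
# [10, 12, 15] -> 19
# [12, 15, 19] -> 24
# [15, 19, 24] -> 30
```

Once the series is in this tabular form, any regressor can be fit on it, which is why tree-based learners can be used for forecasting.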
### Simple NumPy Example
```python