learn

ies_pi_predict.learn

learned_predict

learned_predict(
    df: DataFrame,
    output_column: str,
    predict_start: datetime,
    predict_end: datetime,
    algorhythms: list[Algorhythm] = default_algorhythms,
    random_state: int | None = None,
) -> tuple[pd.Series, Algorhythm, float, float]

learned_predict uses machine learning to predict unknown values.

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	A DataFrame with a datetime index whose columns hold the input variables and the output variable of a system.	required
`output_column`	`str`	The name of the output column.	required
`predict_start`	`datetime`	The date at which to start predicting values.	required
`predict_end`	`datetime`	The date at which to stop predicting values.	required
`algorhythms`	`list[Algorhythm]`	The list of algorithms to try.	`default_algorhythms`
`random_state`	`int \| None`	The random state, used for testing and debugging.	`None`
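
In the source listing, `predict_start` and `predict_end` are passed to pandas' `DataFrame.truncate`, which keeps the rows at both bounds. A minimal sketch of that behaviour on a made-up daily series (the column name `load` and the dates are illustrative assumptions):

```python
from datetime import datetime

import pandas as pd

# Hypothetical daily data; truncate requires a sorted index.
index = pd.date_range("2024-01-01", periods=5, freq="D")
df = pd.DataFrame({"load": [10.0, 11.0, 12.0, 13.0, 14.0]}, index=index)

# Rows before predict_start and after predict_end are dropped;
# rows exactly at either bound are kept.
predict_start = datetime(2024, 1, 2)
predict_end = datetime(2024, 1, 4)
window = df.truncate(predict_start, predict_end)

print(window.index.min(), window.index.max(), len(window))
```

Both bounds are inclusive, so a window of 2 to 4 January on daily data yields three rows.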

It first enriches a copy of the provided DataFrame with columns calculated from the datetime index. These give the models more information about the date and time of day.
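
The enrichment helper (`_enrich_df` in the source listing) is not shown on this page, so the following is only a sketch of what deriving columns from a datetime index could look like. The function name `add_datetime_features` and the chosen columns (`hour`, `dayofweek`, `month`) are assumptions, not necessarily the features the package computes.

```python
import pandas as pd


def add_datetime_features(df: pd.DataFrame) -> None:
    """Hypothetical helper: add calendar columns derived from the index, in place."""
    index = df.index
    df["hour"] = index.hour            # 0-23, time of day
    df["dayofweek"] = index.dayofweek  # 0 = Monday, 6 = Sunday
    df["month"] = index.month          # 1-12, a rough proxy for season


# Made-up readings every six hours, starting on a Monday.
index = pd.date_range("2024-01-01 00:00", periods=4, freq=pd.Timedelta(hours=6))
df = pd.DataFrame({"temperature": [3.0, 5.5, 9.0, 6.0]}, index=index)
add_datetime_features(df)
print(df)
```

Columns like these let a model pick up daily or weekly patterns that a raw timestamp does not expose directly.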

It then instantiates one model per provided algorithm and trains each on 80% of the provided data (input columns and output column), holding out the remaining 20% for evaluation.

The algorithm that best fits the data (the one with the highest coefficient of determination, R², on the held-out rows, which is equivalent to the lowest mean squared error there) is then used to predict values for the period between the provided dates.
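
The selection loop in the source listing can be reproduced in isolation with scikit-learn. This is a sketch under stated assumptions: the data is synthetic, and the two candidates (`LinearRegression`, `DecisionTreeRegressor`) are stand-ins, since the contents of `default_algorhythms` are not shown on this page.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, exactly linear data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0

# Hold out 20% of the rows for evaluation, as the source listing does.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Hypothetical candidate set; the package's defaults may differ.
models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
}

scores, rmses = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    scores[name] = r2_score(y_test, predicted)
    rmses[name] = np.sqrt(mean_squared_error(y_test, predicted))

# The winner is the candidate with the highest R² on the held-out rows.
best_name = max(scores, key=scores.get)
print(best_name, scores[best_name], rmses[best_name])
```

Because every candidate is scored on the same held-out rows, ranking by highest R² and ranking by lowest RMSE pick the same winner.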

Returns:

Type	Description
`tuple[Series, Algorhythm, float, float]`	A tuple containing: (1) a series of predicted values (with a datetime index), named `<output_column>__predicted`; (2) the algorithm that was used; (3) the score (coefficient of determination, R²) it obtained on the held-out 20% of the data; (4) the RMSE (root mean square error) it obtained on the same rows.
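
To make the two returned metrics concrete, the sketch below computes both by hand with NumPy on made-up numbers. It mirrors the standard definitions that `r2_score` and `mean_squared_error` implement; the arrays are illustrative, not output of `learned_predict`.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])  # made-up observed values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])   # made-up predictions

residual_ss = np.sum((y_true - y_pred) ** 2)      # variation the model misses
total_ss = np.sum((y_true - y_true.mean()) ** 2)  # total variation in y_true

r2 = 1.0 - residual_ss / total_ss                 # 1.0 would be a perfect fit
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))   # same unit as the output column

print(round(float(r2), 3), round(float(rmse), 3))
```

Here R² comes out at about 0.949 and the RMSE at about 0.612. R² is unitless, while the RMSE is expressed in the unit of the output column, which makes it easier to compare against the predicted values.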

Source code in src/ies_pi_predict/learn.py

def learned_predict(df: pd.DataFrame, output_column: str, 
            predict_start: datetime, predict_end: datetime, 
            algorhythms: list[Algorhythm]=default_algorhythms,
            random_state: int | None = None) \
                -> tuple[pd.Series, Algorhythm, float, float]:
    """
    learned_predict uses machine learning to predict unknown values.

    Args:
        df (pd.DataFrame): A dataframe with a datetime index and columns representing
                           input variables and an output variable of a system.
        output_column (str): The name of the output column.
        predict_start (datetime): The date at which to start predicting values.
        predict_end (datetime): The date at which to stop predicting values.
        algorhythms (list[Algorhythm], optional): The list of algorithms to try.
        random_state (int | None, optional): The random state, used for testing
                                             and debugging. Defaults to None.


    It first enriches a copy of the provided DataFrame with columns calculated
    from the datetime index. These give the models more information about the
    date and time of day.

    It then instantiates one model per provided algorithm and trains each on
    80% of the provided data (input columns and output column), holding out
    the remaining 20% for evaluation.

    The algorithm that best fits the data (the highest coefficient of
    determination on the held-out rows, which is equivalent to the lowest mean
    squared error there) is then used to predict values for the period between
    the provided dates.

    Returns:
        tuple[pd.Series, Algorhythm, float, float]: A tuple containing:
            - a series of predicted values (with a datetime index)
            - the algorithm that was used
            - the score (coefficient of determination) it obtained on the held-out rows
            - the RMSE (root mean square error) it obtained on the held-out rows
    """
    df = df.copy()
    _enrich_df(df)
    X, y = _get_train_data(df, output_column)

    # Hold out 20% of the rows to evaluate each candidate model.
    xTrain, xTest, yTrain, yTest = train_test_split(
        X, y, test_size=0.2, random_state=random_state)

    scores = {}
    rmses = {}
    models = _create_models(algorhythms,
                            random_state=random_state)
    for algo, model in models.items():
        model.fit(xTrain, yTrain)
        prediction = model.predict(xTest)
        scores[algo] = r2_score(yTest, prediction)
        rmses[algo] = np.sqrt(mean_squared_error(yTest, prediction))
    # Keep the algorithm with the highest R² on the held-out rows.
    best_algo = max(scores, key=scores.get)
    best_model = models[best_algo]

    df = df.truncate(predict_start, predict_end)
    predict_input, _ = _get_predict_data(
        df, output_column)
    prediction = np.round(best_model.predict(predict_input), 2)
    output_column = f"{output_column}__predicted"
    df[output_column] = prediction

    return df[output_column], best_algo, scores[best_algo], rmses[best_algo]