--- title: "Tuning Fit and Compile Arguments" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Tuning Fit and Compile Arguments} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = reticulate::py_module_available("keras") ) # Suppress verbose Keras output for the vignette options(keras.fit_verbose = 0) set.seed(123) ``` ## Introduction While `kerasnip` makes it easy to tune the architecture of a Keras model (e.g., the number of layers or the number of units in a layer), it is often just as important to tune the parameters that control the training process itself. `kerasnip` exposes these parameters through special `fit_*` and `compile_*` arguments in the model specification. This vignette provides a comprehensive example of how to tune these arguments within a `tidymodels` workflow. We will tune: * **`fit_epochs`**: The number of training epochs. * **`fit_batch_size`**: The number of samples per gradient update. * **`compile_optimizer`**: The optimization algorithm (e.g., "adam", "sgd"). * **`compile_loss`**: The loss function used for training. * **`learn_rate`**: The learning rate for the optimizer. ## Setup First, we load the necessary packages. ```{r load-packages} library(kerasnip) library(tidymodels) library(keras3) ``` ## Data Preparation We will use the classic `iris` dataset for this example. It's a simple, small dataset, which is ideal for demonstrating the tuning process without long training times. ```{r data-prep} # Split data into training and testing sets set.seed(123) iris_split <- initial_split(iris, prop = 0.8, strata = Species) iris_train <- training(iris_split) iris_test <- testing(iris_split) # Create cross-validation folds for tuning iris_folds <- vfold_cv(iris_train, v = 3, strata = Species) ``` ## Define a `kerasnip` Model We'll create a very simple sequential model with a single dense layer. This keeps the focus on tuning the `fit_*` and `compile_*` arguments rather than the model architecture. ```{r define-kerasnip-model} # Define layer blocks input_block <- function(model, input_shape) { keras_model_sequential(input_shape = input_shape) } dense_block <- function(model, units = 10) { model |> layer_dense(units = units, activation = "relu") } output_block <- function(model, num_classes) { model |> layer_dense(units = num_classes, activation = "softmax") } # Create the kerasnip model specification function create_keras_sequential_spec( model_name = "iris_mlp", layer_blocks = list( input = input_block, dense = dense_block, output = output_block ), mode = "classification" ) ``` ## Define the Tunable Specification Now, we create an instance of our `iris_mlp` model. We set the arguments we want to optimize to `tune()`. ```{r define-tune-spec} # Define the tunable model specification tune_spec <- iris_mlp( dense_units = 16, # Keep architecture fixed for this example fit_epochs = tune(), fit_batch_size = tune(), compile_optimizer = tune(), compile_loss = tune(), learn_rate = tune() ) |> set_engine("keras") print(tune_spec) ``` ## Create Workflow and Tuning Grid Next, we create a `workflow` and define the search space for our hyperparameters using `dials`. `kerasnip` provides special `dials` parameter functions for `optimizer` and `loss`. 
```{r create-workflow-grid}
# Create a simple recipe
iris_recipe <- recipe(Species ~ ., data = iris_train) |>
  step_normalize(all_numeric_predictors())

# Create the workflow
tune_wf <- workflow() |>
  add_recipe(iris_recipe) |>
  add_model(tune_spec)

# Define the tuning grid
params <- extract_parameter_set_dials(tune_wf) |>
  update(
    fit_epochs = epochs(c(10, 30)),
    fit_batch_size = batch_size(c(16, 64), trans = NULL),
    compile_optimizer = optimizer_function(values = c("adam", "sgd", "rmsprop")),
    compile_loss = loss_function_keras(
      values = c("categorical_crossentropy", "kl_divergence")
    ),
    learn_rate = learn_rate(c(0.001, 0.01), trans = NULL)
  )

set.seed(456)
tuning_grid <- grid_regular(params, levels = 2)
tuning_grid
```

## Tune the Model

With the workflow and grid defined, we can now run the hyperparameter tuning using `tune_grid()`.

```{r tune-model, cache=TRUE}
tune_res <- tune_grid(
  tune_wf,
  resamples = iris_folds,
  grid = tuning_grid,
  metrics = metric_set(accuracy, roc_auc),
  control = control_grid(save_pred = FALSE, save_workflow = TRUE, verbose = FALSE)
)
```

## Inspect the Results

Let's examine the results to see how the different combinations of fitting and compilation parameters performed.

```{r inspect-results}
# Show the best performing models based on accuracy
show_best(tune_res, metric = "accuracy")

# Plot the results
autoplot(tune_res) + theme_minimal()

# Select the best hyperparameters
best_params <- select_best(tune_res, metric = "accuracy")
print(best_params)
```

The results show that `tune` has successfully explored different optimizers, loss functions, learning rates, epochs, and batch sizes, identifying the combination that yields the best accuracy.

## Finalize and Fit

Finally, we finalize our workflow with the best-performing hyperparameters and fit the model one last time on the full training dataset.

```{r finalize-fit}
# Finalize the workflow
final_wf <- finalize_workflow(tune_wf, best_params)

# Fit the final model
final_fit <- fit(final_wf, data = iris_train)
print(final_fit)
```

We can now use this `final_fit` object to make predictions on the test set.

```{r predict}
# Make predictions
predictions <- predict(final_fit, new_data = iris_test)

# Evaluate performance
bind_cols(predictions, iris_test) |>
  accuracy(truth = Species, estimate = .pred_class)
```

## Conclusion

This vignette demonstrated how to tune the crucial `fit_*` and `compile_*` arguments of a Keras model within the `tidymodels` framework using `kerasnip`. By exposing these as tunable parameters, `kerasnip` gives you full control over the training process, allowing you to optimize not just the model's architecture, but also how it learns.
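As an optional follow-up, we can also evaluate the test-set ROC AUC, the second metric used during tuning. This is a minimal sketch, assuming the default `.pred_*` probability column names that `predict(type = "prob")` produces for the three `Species` levels:

```{r evaluate-roc-auc}
# Predict class probabilities on the test set
prob_predictions <- predict(final_fit, new_data = iris_test, type = "prob")

# Multiclass ROC AUC (yardstick uses the Hand-Till estimator by default)
bind_cols(prob_predictions, iris_test) |>
  roc_auc(truth = Species, .pred_setosa, .pred_versicolor, .pred_virginica)
```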