---
title: "Using Customized Models"
output: rmarkdown::html_vignette
author: Fangzhou Xie
date: "`r format(Sys.time(), '%B %d, %Y')`"
vignette: >
  %\VignetteIndexEntry{Using Customized Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(rethnicity)
```

# Design of the Package

I built this package to help applied researchers for research on ethnic equality/inequality.
More specifically, this package provides a race-prediction method based on names.
I designed the package in such way that the method is empowered by deep learning models, without the need
to install the deep learning libraries, the installations of which are usually a daunting task.
Hence, the methods provided in this package are not designed to be updated/fine-tuned/trained on custom datasets.
This is the trade-off one has to be willing to make for the ease of use.

That said, from version `0.2.0` onward, I provide two additional lower-level functions: `predict_fullname` and
`predict_lastname`, which would allow users to provided their customized models. 
(There is only one function prior to `v0.2.0`: `predict_ethnicity`. This function is still the RECOMMENDED one to use for most people.)


# Usage on Customized Models

Since the package disables training by design, you need to train your own model in Keras and then convert the trained model 
to `.json` format by the [frugally-deep](https://github.com/Dobiasd/frugally-deep) project.

## Train the model in Keras

If you are reading this vignette, most likely you know what you are doing and you must have heard `Keras`.
Otherwise, you will have to stick to the default method `predict_ethnicity`.

You can refer to the following links to see how I trained the models and create your own version:
[fullname model](https://github.com/fangzhou-xie/rethnicity/blob/main/data-raw/rethnicity_singlechar_distill_fullname_aligned.ipynb),
[lastname model](https://github.com/fangzhou-xie/rethnicity/blob/main/data-raw/rethnicity_singlechar_distill_lastname.ipynb).

Before training the model, you need to process your dataset and you will need to use `keras.utils.to_categorical()` to
transform the outcome variable into integers and you need to know the mapping between them.
For example, `0, 1, 2, 3` refer to `asian, black, hispanic, white` respectively.
You will need this and we will call it `labels = c("asian", "black", "hispanic", "white")`.

Just remember to save the model without the optimizers (more on the [`frugally-deep` website](https://github.com/Dobiasd/frugally-deep)):

```
model.save('keras_model.h5', include_optimizer=False)
```

## Convert the Model to `.json`

Then, use the [`convert_model.py` script](https://github.com/Dobiasd/frugally-deep/tree/master/keras_export) to convert your model into `.json` format.
This is what I did as well.
You will encounter an error in the conversion process, if you include the optimizers in the saved model.

```
python convert_model.py keras_model.h5 keras_model.json
```

## Predict with Your Own Model

Now you have the model trained and converted and you need the file path of this model file.
I am loading the default models without training new ones.

```{r}
# remember the list of labels we mentioned?
labels <- c("asian", "black", "hispanic", "white")

# change to your own model file path
model_path <- system.file("models", "fullname_aligned_distill.json", package = "rethnicity", mustWork = TRUE)

# run the prediction
predict_fullname(firstnames = "Alan", lastnames = "Turing", labels = labels, model_path = model_path)
```


In fact, if you tweak the code to predict gender from names, this will also work.