This vignette gives a brief overview of the classification metrics in {SLmetrics}.
The classification interface is broadly divided into two
methods: foo.cmatrix() and foo.factor(). The
former calculates the metric from a confusion matrix, while the
latter calculates the same metric from two vectors: a vector of
actual values and a vector of predicted
values. Both are vectors of [factor] values.
Throughout this vignette, the following data will be used:
# 1) seed
set.seed(1903)
# 2) actual values
actual <- factor(
    x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)
# 3) predicted values
predicted <- factor(
    x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)
# 4) sample weights
weights <- runif(
    n = length(actual)
)
Assume that the predicted values come from a trained
machine learning model. This vignette introduces a subset of the metrics
available in {SLmetrics}; see the online documentation for
more details and other metrics.
The accuracy of the model can be evaluated using the
accuracy() function as follows:
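A minimal sketch of that call, assuming accuracy() accepts the same actual and predicted arguments as the other metric functions in this vignette. The data definitions are repeated so the snippet runs on its own, the {SLmetrics} call is guarded in case the package is not installed, and manual_accuracy is an illustrative name used only for a base R cross-check:

```r
# vignette data, repeated so the snippet is self-contained
set.seed(1903)
actual <- factor(
    x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)
predicted <- factor(
    x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)

# 1) accuracy via {SLmetrics}
# (assumed signature, mirroring the recall() calls in this vignette)
if (requireNamespace("SLmetrics", quietly = TRUE)) {
    SLmetrics::accuracy(
        actual    = actual,
        predicted = predicted
    )
}

# 2) base R cross-check: share of predictions equal to the actual class
manual_accuracy <- mean(
    as.character(actual) == as.character(predicted)
)
manual_accuracy
```

Comparing the labels as character vectors avoids an error in base R when the two factors happen to have different level sets.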
Many classification metrics have different names yet compute the same
underlying value. For example, recall is also known as the
true positive rate or sensitivity. These
metrics can be calculated as follows:
# 1) calculate recall
recall(
    actual    = actual,
    predicted = predicted
)
#>         A         B         C 
#> 0.3333333 0.2500000 0.3333333
# 2) calculate sensitivity
sensitivity(
    actual    = actual,
    predicted = predicted
)
#>         A         B         C 
#> 0.3333333 0.2500000 0.3333333
# 3) calculate true positive rate
tpr(
    actual    = actual,
    predicted = predicted
)
#>         A         B         C 
#> 0.3333333 0.2500000 0.3333333
By default, all classification functions calculate the class-wise
performance metrics where possible. The performance metrics can also be
aggregated into micro and macro averages by
using the estimator parameter:
# 1) macro average
recall(
    actual    = actual,
    predicted = predicted,
    estimator = 2 # macro average: 2
)
#> [1] 0.3055556
# 2) micro average
recall(
    actual    = actual,
    predicted = predicted,
    estimator = 1 # micro average: 1
)
#> [1] 0.3
Calculating multiple performance metrics using separate calls to
foo.factor() can be inefficient because each function
reconstructs the underlying confusion matrix. A more efficient approach
is to construct the confusion matrix once and then pass it to your
chosen metric function. To do this, you can use the
cmatrix() function:
# 1) confusion matrix
confusion_matrix <- cmatrix(
    actual    = actual,
    predicted = predicted
)
# 2) summarise confusion matrix
summary(
    confusion_matrix
)
#> Confusion Matrix (3 x 3) 
#> ================================================================================
#>   A B C
#> A 1 0 2
#> B 1 1 2
#> C 1 1 1
#> ================================================================================
#> Overall Statistics (micro average)
#>  - Accuracy:          0.30
#>  - Balanced Accuracy: 0.31
#>  - Sensitivity:       0.30
#>  - Specificity:       0.65
#>  - Precision:         0.30
Now you can pass the confusion matrix directly into the metric functions:
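With the matrix in hand, a call such as recall(confusion_matrix) should dispatch to the corresponding foo.cmatrix() method. The sketch below is a base R cross-check rather than the package's implementation: it copies the counts from the summary() output above (assuming rows are actual classes and columns are predicted classes) and recomputes the class-wise, macro, and micro recall reported earlier. The names cm, class_recall, macro_recall, and micro_recall are illustrative:

```r
# counts copied from the summary() output above
# (assumption: rows = actual, columns = predicted)
classes <- c("A", "B", "C")
cm <- matrix(
    data     = c(1, 0, 2,
                 1, 1, 2,
                 1, 1, 1),
    nrow     = 3,
    byrow    = TRUE,
    dimnames = list(actual = classes, predicted = classes)
)

# 1) class-wise recall: true positives over row totals
class_recall <- diag(cm) / rowSums(cm)
class_recall
#>         A         B         C 
#> 0.3333333 0.2500000 0.3333333

# 2) macro average: unweighted mean of the class-wise values
macro_recall <- mean(class_recall)
macro_recall
#> [1] 0.3055556

# 3) micro average: pooled true positives over all observations
micro_recall <- sum(diag(cm)) / sum(cm)
micro_recall
#> [1] 0.3
```

These values agree with the recall() output shown earlier, which is consistent with the metric functions needing only the confusion matrix once it has been built.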
The weighted classification metrics can be calculated by using the
weighted.foo() methods, which have an interface similar to the
unweighted versions above. Below is an example showing how to compute a
weighted version of recall:
# 1) calculate recall
weighted.recall(
    actual    = actual,
    predicted = predicted,
    w         = weights
)
#>         A         B         C 
#> 0.3359073 0.3027334 0.4245202
# 2) calculate sensitivity
weighted.sensitivity(
    actual    = actual,
    predicted = predicted,
    w         = weights
)
#>         A         B         C 
#> 0.3359073 0.3027334 0.4245202
# 3) calculate true positive rate
weighted.tpr(
    actual    = actual,
    predicted = predicted,
    w         = weights
)
#>         A         B         C 
#> 0.3359073 0.3027334 0.4245202
A small disclaimer applies to weighted metrics: it is
not possible to pass a weighted confusion matrix
directly into a weighted.foo() method. Consider the
following example:
# 1) calculate weighted confusion matrix
weighted_confusion_matrix <- weighted.cmatrix(
    actual = actual,
    predicted = predicted,
    w = weights
)
# 2) calculate weighted accuracy
try(
    weighted.accuracy(weighted_confusion_matrix)
)
#> Error in UseMethod(generic = "weighted.accuracy") : 
#>   no applicable method for 'weighted.accuracy' applied to an object of class "cmatrix"
This approach throws an error. Instead, pass the weighted confusion
matrix into the unweighted function that uses a confusion matrix
interface (i.e., foo.cmatrix()). For example:
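A minimal sketch of that workaround, assuming accuracy() dispatches on the cmatrix class as described above. The vignette data is repeated so the snippet runs on its own, and the {SLmetrics} calls are guarded in case the package is not installed. The base R part builds a stand-in weighted matrix with xtabs(), on the assumption that each cell of the weighted confusion matrix holds the summed weights of its observations; a, p, weighted_counts, and manual_weighted_accuracy are illustrative names:

```r
# vignette data, repeated so the snippet is self-contained
set.seed(1903)
actual <- factor(
    x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)
predicted <- factor(
    x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)
weights <- runif(
    n = length(actual)
)

# 1) weighted confusion matrix passed to the unweighted function
if (requireNamespace("SLmetrics", quietly = TRUE)) {
    weighted_confusion_matrix <- SLmetrics::weighted.cmatrix(
        actual    = actual,
        predicted = predicted,
        w         = weights
    )
    SLmetrics::accuracy(weighted_confusion_matrix)
}

# 2) base R stand-in: sum the weights per (actual, predicted) cell
# (assumption: this mirrors what the weighted confusion matrix stores)
classes <- c("A", "B", "C")
a <- factor(as.character(actual), levels = classes)
p <- factor(as.character(predicted), levels = classes)
weighted_counts <- xtabs(weights ~ a + p)

# weighted accuracy: weight on the diagonal over total weight
manual_weighted_accuracy <- sum(diag(weighted_counts)) / sum(weighted_counts)
manual_weighted_accuracy
```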
This returns the same weighted accuracy as if it were
calculated directly:
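The direct calculation can be sketched in the same way, assuming weighted.accuracy() takes actual, predicted, and w like the weighted.recall() call shown earlier. The base R line is a cross-check under the assumption that weighted accuracy is the share of total weight carried by correctly classified observations; correct and direct_weighted_accuracy are illustrative names:

```r
# vignette data, repeated so the snippet is self-contained
set.seed(1903)
actual <- factor(
    x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)
predicted <- factor(
    x = sample(c("A", "B", "C"), size = 10, replace = TRUE)
)
weights <- runif(
    n = length(actual)
)

# 1) weighted accuracy from the two factor vectors
# (assumed signature, mirroring weighted.recall() in this vignette)
if (requireNamespace("SLmetrics", quietly = TRUE)) {
    SLmetrics::weighted.accuracy(
        actual    = actual,
        predicted = predicted,
        w         = weights
    )
}

# 2) base R cross-check: weight of correct predictions over total weight
correct <- as.character(actual) == as.character(predicted)
direct_weighted_accuracy <- sum(weights[correct]) / sum(weights)
direct_weighted_accuracy
```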