| batch_compute | Compute batches |
| big_tokenize_transform | String tokenization and transformation for big data sets |
| bytes_converter | bytes converter of a text file ( KB, MB or GB ) |
| cluster_frequency | Frequencies of an existing cluster object |
| cosine_distance | cosine distance of two character strings (each string consists of more than one word) |
| COS_TEXT | Cosine similarity for text documents |
| Count_Rows | Number of rows of a file |
| dense_2sparse | convert a dense matrix to a sparse matrix |
| dice_distance | dice similarity of words using n-grams |
| dims_of_word_vecs | dimensions of a word vectors file |
| Doc2Vec | Conversion of text documents to word-vector-representation features ( Doc2Vec ) |
| JACCARD_DICE | Jaccard or Dice similarity for text documents |
| levenshtein_distance | levenshtein distance of two words |
| load_sparse_binary | load a sparse matrix in binary format |
| matrix_sparsity | sparsity percentage of a sparse matrix |
| read_characters | read a specific number of characters from a text file |
| read_rows | read a specific number of rows from a text file |
| save_sparse_binary | save a sparse matrix in binary format |
| select_predictors | Exclude highly correlated predictors |
| sparse_Means | RowMeans and colMeans for a sparse matrix |
| sparse_Sums | RowSums and colSums for a sparse matrix |
| sparse_term_matrix | Term matrices and statistics ( document-term-matrix, term-document-matrix ) |
| TEXT_DOC_DISSIM | Dissimilarity calculation of text documents |
| text_file_parser | text file parser |
| text_intersect | intersection of words or letters in tokenized text |
| tokenize_transform_text | String tokenization and transformation ( character string or path to a file ) |
| tokenize_transform_vec_docs | String tokenization and transformation ( vector of documents ) |
| token_stats | token statistics |
| utf_locale | utf-locale for the available languages |
| vocabulary_parser | returns the vocabulary counts for small or medium files ( XML and other formats ) |
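
The word-level measures listed in the table (`dice_distance`, `levenshtein_distance`) can be illustrated with a small, language-neutral sketch. The Python below is not the package's implementation or API: the helper names `char_ngrams`, `dice_similarity` and `levenshtein` are hypothetical, and the package's own functions may use a different return convention (for example, a dissimilarity rather than a similarity).

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word (hypothetical helper)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_similarity(word1, word2, n=2):
    """Dice coefficient 2*|A & B| / (|A| + |B|) over character n-gram sets."""
    a, b = char_ngrams(word1, n), char_ngrams(word2, n)
    if not a and not b:
        # Assumption: two words too short to yield any n-gram count as identical.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def levenshtein(word1, word2):
    """Minimum number of single-character insertions, deletions or substitutions."""
    # previous[j] holds the distance between the processed prefix of word1
    # and the first j characters of word2.
    previous = list(range(len(word2) + 1))
    for i, c1 in enumerate(word1, start=1):
        current = [i]
        for j, c2 in enumerate(word2, start=1):
            cost = 0 if c1 == c2 else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(dice_similarity("night", "nacht"))  # the two words share only the bigram "ht"
print(levenshtein("kitten", "sitting"))
```

A dissimilarity can be derived from the Dice coefficient as `1 - dice_similarity(...)`, which may be closer to what a function named `dice_distance` reports.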