
Predicting fraud with autoencoders and Keras


Overview

In this post, we will train an autoencoder to detect credit card fraud. We will also demonstrate how to train Keras models in the cloud using CloudML.

The basis of our model will be the Kaggle Credit Card Fraud Detection dataset, which was collected during a research collaboration of Worldline and the Machine Learning Group of ULB (Université Libre de Bruxelles) on big data mining and fraud detection.

The dataset contains credit card transactions by European cardholders made over a period of two days in September 2013. There are 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for only 0.172% of all transactions.

Reading the data

After downloading the data from Kaggle, you can read it into R with read_csv():

library(readr)
df <- read_csv("data-raw/creditcard.csv", col_types = list(Time = col_number()))

The input variables consist only of numerical values that are the result of a PCA transformation. To preserve confidentiality, no additional information about the original features was provided. The features V1, …, V28 were obtained with PCA. There are, however, 2 features (Time and Amount) that were not transformed.
Time is the number of seconds elapsed between each transaction and the first transaction in the dataset. Amount is the transaction amount and could be used for cost-sensitive learning. The Class variable takes value 1 in case of fraud and 0 otherwise.
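
As a quick sanity check (not part of the original text), the imbalance described above can be confirmed directly from the Class column:

# Assumes df was read with read_csv() as shown above.
table(df$Class)   # 284,315 normal vs. 492 fraudulent transactions
mean(df$Class)    # proportion of frauds, roughly 0.00172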

Autoencoders

Since only 0.172% of the observations are frauds, we have a highly unbalanced classification problem. With this kind of problem, traditional classification approaches usually don't work very well because we only have a very small sample of the rarer class.

An autoencoder is a neural network that is used to learn a representation (encoding) of a dataset, typically for the purpose of dimensionality reduction. For this problem we will train an autoencoder to encode the non-fraudulent observations from our training set. Since frauds are supposed to have a different distribution than normal transactions, we expect our autoencoder to have higher reconstruction errors on frauds than on normal transactions. This means we can use the reconstruction error as a quantity that indicates whether a transaction is fraudulent or not.

If you want to learn more about autoencoders, a good starting point is this video from Larochelle on YouTube and Chapter 14 of the Deep Learning book by Goodfellow et al.

Visualization

For an autoencoder to work well, we have a strong initial assumption: that the distribution of variables for normal transactions is different from the distribution for fraudulent ones. Let's make some plots to verify this. Variables were transformed to a (0,1) interval for plotting.
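
The plots themselves are not reproduced here. As one possible way to produce them (a sketch, not the original plotting code), each feature can be rescaled to the (0,1) interval and its density compared across the two classes:

library(dplyr)
library(tidyr)
library(ggplot2)

# Rescale every feature to (0,1), then compare densities for Class 0 vs. Class 1.
df %>%
  mutate(across(-Class, ~ (.x - min(.x)) / (max(.x) - min(.x)))) %>%
  pivot_longer(-Class, names_to = "variable", values_to = "value") %>%
  ggplot(aes(value, fill = as.factor(Class))) +
  geom_density(alpha = 0.5) +
  facet_wrap(~ variable, scales = "free_y")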

We can see that the distributions of variables for fraudulent transactions are very different from the normal ones, except for the Time variable, which seems to have exactly the same distribution.

Preprocessing

Before the modeling steps we need to do some preprocessing. We will split the dataset into train and test sets, and then min-max normalize our data (this is done because neural networks work much better with small input values). We will also remove the Time variable, since it has exactly the same distribution for normal and fraudulent transactions.

Based on the Time variable, we will use the first 200,000 observations for training and the rest for testing. This is good practice because when using the model we want to predict future frauds based on transactions that happened before.
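
The split itself is not shown in this excerpt; a minimal sketch, assuming dplyr and the df_train/df_test names used in the later code, could look like this:

library(dplyr)

# Hypothetical split: order by Time, take the first 200,000 rows for training
# and the remaining rows for testing, dropping the Time column.
df <- df %>% arrange(Time)
df_train <- df %>% slice(1:200000) %>% select(-Time)
df_test  <- df %>% slice(200001:n()) %>% select(-Time)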

Now let's work on normalizing the inputs. We created 2 functions to help us. The first gets descriptive statistics about the dataset that are used for scaling. Then we have a function to perform the min-max scaling. It's important to note that we apply the same normalization constants to both the training and test sets.

library(purrr)

#' Gets descriptive statistics for every variable in the dataset.
get_desc <- function(x) {
  map(x, ~list(
    min = min(.x),
    max = max(.x),
    mean = mean(.x),
    sd = sd(.x)
  ))
}

#' Given a dataset and normalization constants it will create a min-max normalized
#' version of the dataset.
normalization_minmax <- function(x, desc) {
  map2_dfc(x, desc, ~(.x - .y$min)/(.y$max - .y$min))
}

Now we create normalized versions of our datasets. We also transform our data frames into matrices, since this is the format expected by Keras.
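
The code for this step is not shown in this excerpt; a minimal sketch, assuming the df_train/df_test split above and the helper functions just defined, would be:

# Compute scaling constants on the training features only, apply them to both
# sets, and keep the Class column aside as the label vectors.
desc <- df_train %>%
  select(-Class) %>%
  get_desc()

x_train <- df_train %>%
  select(-Class) %>%
  normalization_minmax(desc) %>%
  as.matrix()

x_test <- df_test %>%
  select(-Class) %>%
  normalization_minmax(desc) %>%
  as.matrix()

y_train <- df_train$Class
y_test  <- df_test$Class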

Now we will define our model in Keras: a symmetric autoencoder with 4 dense layers.

library(keras)
model <- keras_model_sequential()
model %>%
  layer_dense(units = 15, activation = "tanh", input_shape = ncol(x_train)) %>%
  layer_dense(units = 10, activation = "tanh") %>%
  layer_dense(units = 15, activation = "tanh") %>%
  layer_dense(units = ncol(x_train))

summary(model)
___________________________________________________________________________________
Layer (type)                         Output Shape                     Param #      
===================================================================================
dense_1 (Dense)                      (None, 15)                       450          
___________________________________________________________________________________
dense_2 (Dense)                      (None, 10)                       160          
___________________________________________________________________________________
dense_3 (Dense)                      (None, 15)                       165          
___________________________________________________________________________________
dense_4 (Dense)                      (None, 29)                       464          
===================================================================================
Total params: 1,239
Trainable params: 1,239
Non-trainable params: 0
___________________________________________________________________________________

Then we will compile our model, using the mean squared error loss and the Adam optimizer for training.

model %>% compile(
  loss = "mean_squared_error", 
  optimizer = "adam"
)

Training the model

We can now train our model using the fit() function. Training the model is reasonably fast (~14s per epoch on my laptop). We will only feed our model the observations of normal (non-fraudulent) transactions.

We will use callback_model_checkpoint() in order to save our model after each epoch. By passing the argument save_best_only = TRUE we will keep on disk only the epoch with the smallest loss value on the test set. We will also use callback_early_stopping() to stop training if the validation loss stops decreasing for 5 epochs.

checkpoint <- callback_model_checkpoint(
  filepath = "model.hdf5", 
  save_best_only = TRUE, 
  period = 1,
  verbose = 1
)

early_stopping <- callback_early_stopping(patience = 5)

model %>% fit(
  x = x_train[y_train == 0,], 
  y = x_train[y_train == 0,], 
  epochs = 100, 
  batch_size = 32,
  validation_data = list(x_test[y_test == 0,], x_test[y_test == 0,]), 
  callbacks = list(checkpoint, early_stopping)
)
Train on 199615 samples, validate on 84700 samples
Epoch 1/100
199615/199615 [==============================] - 17s 83us/step - loss: 0.0036 - val_loss: 6.8522e-04
Epoch 00001: val_loss improved from inf to 0.00069, saving model to model.hdf5
Epoch 2/100
199615/199615 [==============================] - 17s 86us/step - loss: 4.7817e-04 - val_loss: 4.7266e-04
Epoch 00002: val_loss improved from 0.00069 to 0.00047, saving model to model.hdf5
Epoch 3/100
199615/199615 [==============================] - 19s 94us/step - loss: 3.7753e-04 - val_loss: 4.2430e-04
Epoch 00003: val_loss improved from 0.00047 to 0.00042, saving model to model.hdf5
Epoch 4/100
199615/199615 [==============================] - 19s 94us/step - loss: 3.3937e-04 - val_loss: 4.0299e-04
Epoch 00004: val_loss improved from 0.00042 to 0.00040, saving model to model.hdf5
Epoch 5/100
199615/199615 [==============================] - 19s 94us/step - loss: 3.2259e-04 - val_loss: 4.0852e-04
Epoch 00005: val_loss did not improve
Epoch 6/100
199615/199615 [==============================] - 18s 91us/step - loss: 3.1668e-04 - val_loss: 4.0746e-04
Epoch 00006: val_loss did not improve
...

After training, we can obtain the final loss for the test set using the evaluate() function.

loss <- evaluate(model, x = x_test[y_test == 0,], y = x_test[y_test == 0,])
loss
        loss 
0.0003534254 

Tuning with CloudML

We may be able to get better results by tuning our model's hyperparameters. We can tune, for example, the normalization function, the learning rate, the activation functions and the sizes of the hidden layers. CloudML uses Bayesian optimization to tune model hyperparameters, as described in this blog post.

We can use the cloudml package to tune our model, but first we need to prepare our project by creating a training flag for each hyperparameter and a tuning.yml file that will tell CloudML which parameters we want to tune and how.

The full script used for training on CloudML can be found at https://github.com/dfalbel/fraud-utoencoder-example. The most important modifications to the code were adding the training flags:

FLAGS <- flags(
  flag_string("normalization", "minmax", "One among minmax, zscore"),
  flag_string("activation", "relu", "One among relu, selu, tanh, sigmoid"),
  flag_numeric("learning_rate", 0.001, "Optimizer Studying Price"),
  flag_integer("hidden_size", 15, "The hidden layer dimension")
)

We then use the FLAGS variable inside the script to drive the hyperparameters of the model, for example:

model %>% compile(
  optimizer = optimizer_adam(lr = FLAGS$learning_rate), 
  loss = 'mean_squared_error'
)

We also create a tuning.yml file describing how the hyperparameters should vary during training, as well as which metric we want to optimize (in this case the validation loss: val_loss).

tuning.yml

trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu
  hyperparameters:
    goal: MINIMIZE
    hyperparameterMetricTag: val_loss
    maxTrials: 10
    maxParallelTrials: 5
    params:
      - parameterName: normalization
        type: CATEGORICAL
        categoricalValues: [zscore, minmax]
      - parameterName: activation
        type: CATEGORICAL
        categoricalValues: [relu, selu, tanh, sigmoid]
      - parameterName: learning_rate
        type: DOUBLE
        minValue: 0.000001
        maxValue: 0.1
        scaleType: UNIT_LOG_SCALE
      - parameterName: hidden_size
        type: INTEGER
        minValue: 5
        maxValue: 50
        scaleType: UNIT_LINEAR_SCALE

Here we describe the type of machine we want to use (in this case a standard_gpu instance), the metric we want to minimize while tuning, and the maximum number of trials (i.e., how many combinations of hyperparameters we want to try). We then specify how we want to vary each hyperparameter during tuning.

You can find more information about the tuning.yml file in the TensorFlow for R documentation and in Google's official CloudML documentation.

Now we are ready to send the job to Google CloudML. We can do this by running:

library(cloudml)
cloudml_train("practice.R", config = "tuning.yml")

The cloudml package takes care of uploading the dataset and installing any R package dependencies required to run the script on CloudML. If you are using RStudio v1.1 or higher, it will also allow you to monitor your job in a background terminal. You can also monitor your job using the Google Cloud console.

After the job is finished, we can collect the job results with:
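
The call itself is not shown in this excerpt; assuming the cloudml package's job-collection helper, it would be:

job_collect()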

This will copy the files from the job with the best val_loss performance on CloudML to your local system and open a report summarizing the training run.

Since we used a callback to save model checkpoints during training, the model file was also copied from Google CloudML. Files created during training are copied to the "runs" subdirectory of the working directory from which cloudml_train() was called. You can determine this directory for the most recent run with:
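
The expression is not shown in this excerpt; a minimal sketch, assuming the latest_run() helper from the tfruns package:

latest_run()$run_dir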

[1] runs/cloudml_2018_01_23_221244595-03

You can also list all previous runs and their validation losses with:

ls_runs(order = metric_val_loss, decreasing = FALSE)
                    run_dir metric_loss metric_val_loss
1 runs/2017-12-09T21-01-11Z      0.2577          0.1482
2 runs/2017-12-09T21-00-11Z      0.2655          0.1505
3 runs/2017-12-09T19-59-44Z      0.2597          0.1402
4 runs/2017-12-09T19-56-48Z      0.2610          0.1459

Use View(ls_runs()) to view all columns

In our case, the job downloaded from CloudML was saved to runs/cloudml_2018_01_23_221244595-03/, so the saved model file is available at runs/cloudml_2018_01_23_221244595-03/model.hdf5. We can now use our tuned model to make predictions.

Making predictions

Now that we have trained and tuned our model, we are ready to generate predictions with our autoencoder. We are interested in the MSE for each observation, and we expect observations of fraudulent transactions to have higher MSEs.

First, let's load our model.

model <- load_model_hdf5("runs/cloudml_2018_01_23_221244595-03/model.hdf5", 
                         compile = FALSE)

Now let's calculate the MSE for the training and test set observations.

pred_train <- predict(model, x_train)
mse_train <- apply((x_train - pred_train)^2, 1, sum)

pred_test <- predict(model, x_test)
mse_test <- apply((x_test - pred_test)^2, 1, sum)

A good measure of model performance on highly unbalanced datasets is the area under the ROC curve (AUC). AUC has a nice interpretation for this problem: it is the probability that a fraudulent transaction has a higher MSE than a normal one. We can calculate this using the Metrics package, which implements a wide variety of common machine learning model performance metrics.
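
The calls that produced the two values below are not shown in this excerpt; a minimal sketch, assuming the auc() function from the Metrics package with the per-observation MSE as the score:

library(Metrics)

# AUC on the training and test sets, using the reconstruction error as the score.
auc(y_train, mse_train)
auc(y_test, mse_test)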

[1] 0.9546814
[1] 0.9403554

To use the model in practice for making predictions, we need to find a threshold k for the MSE: if MSE > k we consider the transaction a fraud (otherwise we consider it normal). To define this value, it is useful to look at precision and recall while the threshold k varies.

library(ggplot2)

possible_k <- seq(0, 0.5, length.out = 100)
precision <- sapply(possible_k, function(k) {
  predicted_class <- as.numeric(mse_test > k)
  sum(predicted_class == 1 & y_test == 1)/sum(predicted_class)
})

qplot(possible_k, precision, geom = "line") + 
  labs(x = "Threshold", y = "Precision")

recall <- sapply(possible_k, function(k) {
  predicted_class <- as.numeric(mse_test > k)
  sum(predicted_class == 1 & y_test == 1)/sum(y_test)
})

qplot(possible_k, recall, geom = "line") + 
  labs(x = "Threshold", y = "Recall")

A good starting point would be to choose the threshold with maximum precision, but we could also base our decision on how much money we would lose to fraudulent transactions.

Suppose each manual fraud verification costs us $1, but if we don't verify a transaction and it is a fraud, we lose that transaction's amount. Let's find, for each threshold value, how much money we would lose.

cost_per_verification <- 1

lost_money <- sapply(possible_k, function(k) {
  predicted_class <- as.numeric(mse_test > k)
  sum(cost_per_verification * predicted_class + (predicted_class == 0) * y_test * df_test$Amount) 
})

qplot(possible_k, lost_money, geom = "line") + labs(x = "Threshold", y = "Lost Money")

We can find the best threshold in this case with:
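
The expression is not shown in this excerpt; a minimal base-R sketch:

# threshold value that minimizes the estimated money lost
possible_k[which.min(lost_money)]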

[1] 0.005050505

If we needed to manually verify all frauds, it would cost us ~$13,000. Using our model, we can reduce this to ~$2,500.
