
Batch Normalization

covariate shift

Happens particularly when data has been collected over a long period of time. (Source: Covariate Shift.) Covariate shift means that the distributions of some variables are dependent on another variable.

dataset shift (or drifting)

Basically, in the real world, dataset shift mainly occurs because of a change of environment (a non-stationary environment), where the environment can refer to location, time, etc.

Types of Dataset Shift

  1. Shift in the independent variables (Covariate Shift)
  2. Shift in the target variable (Prior probability shift)
  3. Shift in the relationship between the independent and the target variable (Concept Shift)

Covariate shift refers to the change in the distribution of the input variables present in the training and the test data.

The basic idea to identify shift: if there exists a shift in the dataset, then after mixing the train and test sets you should still be able to classify an instance of the mixed dataset as train or test with reasonable accuracy. This is not possible for all variables.

Steps to identify drift

The basic steps that we will follow are:

  1. Preprocessing: This step involves imputing all missing values and label encoding of all categorical variables.
  2. Creating a random sample of your training and test data separately and adding a new feature origin which has value train or test depending on whether the observation comes from the training dataset or the test dataset.
  3. Now combine these random samples into a single dataset. Note that the shape of both the samples of training and test dataset should be nearly equal, otherwise it can be a case of an unbalanced dataset.
  4. Now create a model taking one feature at a time while having origin as the target variable on a part of the dataset (say ~75%).
  5. Now predict on the remaining part (~25%) of the dataset and calculate the value of AUC-ROC.
  6. Now if the value of AUC-ROC for a particular feature is greater than 0.80, we classify that feature as drifting (see the code sketch after this list).
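
As an illustration only, here is a minimal sketch of steps 2-6, assuming two pandas DataFrames `train_df` and `test_df` (hypothetical names) whose features have already been preprocessed to numeric values (step 1):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def drifting_features(train_df, test_df, threshold=0.80, sample_size=2000, seed=0):
    """Return (feature, AUC) pairs whose train-vs-test AUC-ROC exceeds the threshold."""
    # Steps 2-3: equal-sized random samples with a new 'origin' target (0 = train, 1 = test).
    n = min(sample_size, len(train_df), len(test_df))
    combined = pd.concat(
        [train_df.sample(n, random_state=seed).assign(origin=0),
         test_df.sample(n, random_state=seed).assign(origin=1)],
        ignore_index=True,
    )

    drifting = []
    for col in train_df.columns:
        # Step 4: fit a model on ~75% of the data, using one feature at a time
        # with 'origin' as the target.
        X_tr, X_te, y_tr, y_te = train_test_split(
            combined[[col]].values, combined["origin"].values,
            test_size=0.25, random_state=seed,
        )
        clf = RandomForestClassifier(n_estimators=50, random_state=seed).fit(X_tr, y_tr)
        # Steps 5-6: AUC-ROC on the held-out ~25%; above 0.80 the feature is drifting.
        auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
        if auc > threshold:
            drifting.append((col, auc))
    return drifting
```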

Treatment

There are different techniques by which we can treat these features in order to improve our model. Let us discuss some of them.

  1. Dropping drifting features (might result in some loss of information): features that have a drift value greater than 0.8 and are not important to our model are dropped.
  2. Importance weighting using Density Ratio Estimation (a very time-consuming task; see the sketch below)
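
As a sketch of the idea behind the second option (standard covariate-shift correction, not spelled out in these notes): each training example is weighted by the ratio of the test and training input densities, so the weighted training loss approximates the loss under the test distribution:

$$w(x) = \frac{p_{\text{test}}(x)}{p_{\text{train}}(x)}, \qquad \hat{L} = \frac{1}{n}\sum_{i=1}^{n} w(x_i)\,\ell\big(f(x_i), y_i\big)$$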

Normalization (solution for covariate shift)

Normalization and its effects: after the inputs are normalized, the distributions of the new input variables x1' and x2' will be much more similar, with, say, means equal to 0 and standard deviations equal to 1. The cost function will then also look smoother and more balanced across these two dimensions, and as a result training will be much easier and potentially much faster.

  • Training data uses batch stats
  • Test data uses training stats (see the sketch below)
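
A minimal NumPy sketch of that idea (the array names and toy data are hypothetical): the statistics come from the training data and are reused at test time.

```python
import numpy as np

# Toy data standing in for preprocessed features (hypothetical).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(1000, 2))
X_test = rng.normal(loc=5.0, scale=2.0, size=(200, 2))

# Statistics are computed from the training data only...
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0) + 1e-8        # small constant avoids division by zero

# ...and the same training statistics are reused for the test data,
# so both end up with roughly zero mean and unit standard deviation.
X_train_norm = (X_train - mu) / sigma
X_test_norm = (X_test - mu) / sigma
```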

So normalization of the input variables smooths out that cost function and reduces the covariate shift.

However, covariate shift shouldn't be a problem if you just make sure that the distribution of your dataset is similar to the task you're modeling, i.e., the test set is similar to your training set in terms of how it's distributed.

Internal Covariate Shift

Internal covariate shift is covariate shift in the internal hidden layers. Batch normalization is also a solution for internal covariate shift.

Batch Normalization (Procedure)

Batch Normalization Training

Beta = shift factor, Gamma = scale factor. Both are learned during training.

After completely normalizing things to z-hat, they are then rescaled based on these learned values, Gamma and Beta. This is the primary difference between normalization of inputs and batch normalization.

What's key here is that batch normalization gives you control over what that distribution will look like moving forward in the neural network, and this final value after the shifting and scaling will be called y. This y is what then goes into this activation function.
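
In symbols (the standard batch norm formulation; mu_batch and sigma²_batch are the mean and variance of the current batch, and epsilon is a small constant for numerical stability):

$$\hat{z} = \frac{z - \mu_{\text{batch}}}{\sqrt{\sigma^2_{\text{batch}} + \epsilon}}, \qquad y = \gamma\,\hat{z} + \beta$$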

During testing, what you use is the running mean and standard deviation that were computed over the entire training set. These values are fixed after training; they don't move.

Frameworks like TensorFlow and PyTorch keep track of these statistics for you. All you have to do is create a batch norm layer; the running statistics are tracked and saved for you during training, and when your model is put into test mode they are used automatically.
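
A minimal PyTorch sketch of this in practice (the layer sizes and batch size here are arbitrary):

```python
import torch
from torch import nn

# A tiny block: linear layer -> batch norm -> activation.
block = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),   # learns gamma (weight) and beta (bias), tracks running stats
    nn.ReLU(),
)

x = torch.randn(32, 20)   # a batch of 32 examples with 20 features

block.train()             # training mode: normalize with the batch mean/std
y_train = block(x)        # (the running statistics are updated as a side effect)

block.eval()              # test mode: normalize with the fixed running mean/std
with torch.no_grad():
    y_test = block(x)
```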

Summary:

  • Batch norm introduces learnable shift and scale factors (you don't force the target distribution to have a zero mean and a standard deviation of one)
  • Batch norm uses the batch mean and standard deviation during training and the running statistics (computed over the entire training set) for testing; the running values are fixed after training.
  • Frameworks take care of the whole process (training and testing)

Convolutions

Convolution allows you to detect key features in different areas of an image.

A convolution is just a series of sums of element-wise products between a filter and patches across your entire image.
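
A minimal PyTorch sketch of that definition (the image and filter values are arbitrary):

```python
import torch
import torch.nn.functional as F

image = torch.arange(16.0).reshape(1, 1, 4, 4)   # one 4x4 single-channel "image"
kernel = torch.tensor([[[[1.0, 0.0],
                         [0.0, -1.0]]]])          # one 2x2 filter

# Each output entry is the sum of element-wise products between the filter
# and the patch of the image it currently covers.
out = F.conv2d(image, kernel)                     # shape (1, 1, 3, 3)

# The same value computed by hand for the top-left patch:
patch = image[0, 0, 0:2, 0:2]
manual = (patch * kernel[0, 0]).sum()
assert torch.isclose(out[0, 0, 0, 0], manual)
```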

Padding and Stride

Stride

The stride is the number of pixels the filter moves at each step; a larger stride produces a smaller output.

Padding

Padding is a frame put around images, in order to give the same importance to the pixels at the edges of the images as to the ones in the center.
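
A small sketch of how padding and stride change the output size; for an n×n input, a k×k filter, padding p and stride s, the output side length is floor((n + 2p − k) / s) + 1:

```python
import torch
import torch.nn.functional as F

image = torch.randn(1, 1, 6, 6)    # one 6x6 single-channel image
kernel = torch.randn(1, 1, 3, 3)   # one 3x3 filter

print(F.conv2d(image, kernel).shape)                       # (1, 1, 4, 4): (6 - 3)/1 + 1 = 4
print(F.conv2d(image, kernel, padding=1).shape)            # (1, 1, 6, 6): (6 + 2 - 3)/1 + 1 = 6
print(F.conv2d(image, kernel, stride=2, padding=1).shape)  # (1, 1, 3, 3): floor((6 + 2 - 3)/2) + 1 = 3
```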

Pooling and Upsampling

  • Pooling is used to lower the dimension of the input images by taking the mean or finding the maximum value of different areas.

It is much less expensive to do computations on this pooled layer than on the original image; pooling is really just trying to distill the information.

Max Pooling

What max pooling does is take the most salient information from the image, which are the really high values. Pooling doesn't have any learnable parameters, which is different from convolutions.
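
A minimal PyTorch sketch of max and average pooling (the input values are arbitrary):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)

# Max pooling keeps the largest (most salient) value in each 2x2 area,
# average pooling keeps the mean; both halve the height and width here.
max_pooled = F.max_pool2d(x, kernel_size=2)   # shape (1, 1, 2, 2)
avg_pooled = F.avg_pool2d(x, kernel_size=2)   # shape (1, 1, 2, 2)
# Note: pooling has no learnable parameters.
```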

Up-sampling has the opposite goal of pooling: given a lower-resolution input, it outputs one that has a higher resolution. It actually requires inferring values for the additional pixels.

Upsampling

Nearest neighbors up-sampling copies each input pixel's value to its nearest new pixels. Up-sampling layers don't have any learnable parameters.
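
A minimal PyTorch sketch of nearest-neighbor up-sampling:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[[1.0, 2.0],
                    [3.0, 4.0]]]])

# Nearest-neighbor up-sampling just repeats each pixel; no learnable parameters.
up = F.interpolate(x, scale_factor=2, mode="nearest")
# up is 4x4:
# [[1, 1, 2, 2],
#  [1, 1, 2, 2],
#  [3, 3, 4, 4],
#  [3, 3, 4, 4]]
```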

Transposed Convolutions (upsampling technique with learnable filter)

The center pixel in the output is influenced by all the values in the input, while the corners are influenced by just one value. This uneven influence causes a common issue that arises when using transposed convolutions: the output has a checkerboard problem. Using upsampling followed by convolution is becoming a more popular technique to avoid this checkerboard problem.
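
A minimal PyTorch sketch contrasting the two approaches (the channel counts and filter sizes are arbitrary):

```python
import torch
from torch import nn

x = torch.randn(1, 64, 8, 8)   # a batch of 8x8 feature maps with 64 channels

# Transposed convolution: upsampling with a learnable filter, here 8x8 -> 16x16.
# Overlapping filter contributions are what can produce checkerboard artifacts.
tconv = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
y1 = tconv(x)                  # shape (1, 32, 16, 16)

# Alternative that avoids the checkerboard problem: upsample, then convolve.
up_conv = nn.Sequential(
    nn.Upsample(scale_factor=2, mode="nearest"),
    nn.Conv2d(64, 32, kernel_size=3, padding=1),
)
y2 = up_conv(x)                # shape (1, 32, 16, 16)
```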

Reading: Deconvolution and Checkerboard Artifacts