Principal Components


Signal

A signal is any function that carries information. Usually, a signal is a function of time; depending on whether time is continuous or discrete, there are continuous-time signals and discrete-time signals.

Real and Complex Signals

A real signal takes its values in the set of real numbers, while a complex signal takes its values in the set of complex numbers.

The Complex Exponential

One of the most important signals is the complex exponential, shown in the figure below. Complex exponentials are called eigen-functions of linear time-invariant (LTI) systems because, when a complex exponential is applied to the input of an LTI system, the output is the same exponential scaled by a constant (in general, a complex one).

ComplexExponential
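
The eigen-function property can be checked numerically. The following is a minimal sketch, assuming a discrete-time FIR system; the impulse response h, the frequency omega, and the helper lti_output are hypothetical values and names chosen only for illustration.

```python
import cmath

def lti_output(h, x, n):
    """Output sample y[n] = sum over k of h[k] * x(n - k) for an FIR LTI system."""
    return sum(h_k * x(n - k) for k, h_k in enumerate(h))

omega = 0.3            # hypothetical frequency, in radians per sample
h = [0.5, 0.3, 0.2]    # hypothetical impulse response

def x(n):
    """Complex exponential input e^(j * omega * n), defined for every integer n."""
    return cmath.exp(1j * omega * n)

# Complex constant by which the system scales this particular exponential.
H = sum(h_k * cmath.exp(-1j * omega * k) for k, h_k in enumerate(h))

# Largest difference between the actual output and H times the input.
max_error = max(abs(lti_output(h, x, n) - H * x(n)) for n in range(10))
print(max_error)
```

The printed error should be at the level of floating-point rounding, which means the output is the input multiplied by the single complex number H.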

Vectors and Signals

Signals are similar to vectors. For instance, two signals can be added just as two vectors can. A vector in a 3-dimensional space can be represented as a linear combination of the unit vectors, as shown below. Remember that a set of vectors is called orthonormal if the vectors are mutually orthogonal and each has unit norm.

Vector
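
As a small illustration of these two ideas, the sketch below checks that the standard unit vectors are orthonormal and rebuilds a vector from its coefficients. The vector v and the helper names dot and is_orthonormal are hypothetical, chosen only for this example.

```python
def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

# Standard unit vectors of the 3-dimensional space.
basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

def is_orthonormal(vectors, tol=1e-12):
    """True if the vectors are mutually orthogonal and each has unit norm."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            expected = 1.0 if i == j else 0.0
            if abs(dot(u, v) - expected) > tol:
                return False
    return True

v = (2.0, -1.0, 3.0)  # hypothetical vector

# With an orthonormal basis, each coefficient is the inner product with a basis vector.
coefficients = [dot(v, e) for e in basis]

# Linear combination of the basis vectors using those coefficients.
rebuilt = tuple(sum(c * e[k] for c, e in zip(coefficients, basis)) for k in range(3))
print(is_orthonormal(basis), rebuilt)
```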

Linear Transformation

A linear transformation in an N-dimensional space is a transformation T that satisfies the relation shown below for any vectors u and v and any scalars a and b. Eigen-vectors of a linear transformation T are vectors that do not change direction when the transformation is applied; that is, the vectors are only scaled, as shown below.

LinearTransformationSpace
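
In symbols, the two statements of this section can be written as follows. This is the standard form of the definitions; the names e for an eigen-vector and \lambda for its scale factor (the eigen-value) are notation assumed here, not taken from the figure.

```latex
T(a\,\mathbf{u} + b\,\mathbf{v}) = a\,T(\mathbf{u}) + b\,T(\mathbf{v}),
\qquad
T(\mathbf{e}) = \lambda\,\mathbf{e}, \quad \mathbf{e} \neq \mathbf{0}
```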

Fourier Series

If x(t) is a periodic signal with period T0 and the Dirichlet conditions are met, then x(t) can be expressed as a sum of complex exponentials, as shown below. In this sense, the Fourier transform can be viewed as a rotation from a space where the basis functions are the unit vectors (or Dirac delta functions) to a space where the basis functions are the sine and cosine functions.

FourierSeries
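
For reference, the standard complex-exponential form of the Fourier series is written out below. The coefficient name c_k and the use of j for the imaginary unit are notation assumed here; the figure may use different symbols.

```latex
x(t) = \sum_{k=-\infty}^{\infty} c_k \, e^{\,j 2\pi k t / T_0},
\qquad
c_k = \frac{1}{T_0} \int_{T_0} x(t) \, e^{-j 2\pi k t / T_0} \, dt
```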

Covariance Matrix

The covariance matrix is also known as the dispersion matrix. It generalizes the notion of variance to multiple dimensions. The elements on the main diagonal are the variances of the individual columns (variables), and each off-diagonal element is the covariance between a pair of columns. To compute the covariance matrix, first compute the mean of each column; then apply the formula shown below.

Covariance
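
The two steps can be sketched in Python as follows. This is an illustrative sketch, not the Lab solution: it assumes the sample covariance (dividing by the number of rows minus one), which may differ from the normalization used in the figure or by the CovarianceMatrix function, and the data set is a hypothetical one.

```python
def column_means(data):
    """Mean of each column of a list-of-rows data set."""
    rows = len(data)
    cols = len(data[0])
    return [sum(row[j] for row in data) / rows for j in range(cols)]

def covariance_matrix(data):
    """Covariance matrix of the columns of data."""
    rows = len(data)
    cols = len(data[0])
    mean = column_means(data)
    cov = [[0.0] * cols for _ in range(cols)]
    for row in data:
        for j in range(cols):
            for k in range(cols):
                cov[j][k] += (row[j] - mean[j]) * (row[k] - mean[k])
    # Assumption: sample covariance, so the sums are divided by rows - 1.
    return [[value / (rows - 1) for value in line] for line in cov]

data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # hypothetical data: columns x and y
cov = covariance_matrix(data)
print(cov)
```

The diagonal entries cov[0][0] and cov[1][1] are the variances of x and y, and the matrix is symmetric because the covariance of x with y equals the covariance of y with x.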

Problem 1
Create a new project called Principal using the Main file only option to compute the covariance matrix of the data set shown manually. The input variables are x and y. Add the Covariance.lab file and edit it as shown.

input_xls

input_csv

Principal\Covariance.lab
//________________________ Load the data
Matrix trainSetIn;
trainSetIn.Load();
int rows = trainSetIn.GetRowCount();
int cols = trainSetIn.GetColCount();
int i;
int j;
int k;
//____________________ Compute the mean of each column
Vector mean;
mean.Create(cols);
double tmp;
for(j = 0; j < cols; j++)
{
     ...
}
mean.Save();
//________________________ Compute the covariance
Matrix covariance;
covariance.Create(cols, cols);
...
for(i = 0; i < rows; i++)
{
     for(j = 0; j < cols; j++)
     {
          ...
     }     
}




CovarianceRun

Problem 2
Using the data from the input.csv file, create the Makeplot.lab program to plot x versus y.

OriginalDataPlot

Problem 3
(a) Write a program called CheckCov.lab to verify the answer of Problem 1 using the CovarianceMatrix function. (b) By looking at the values on the main diagonal of the covariance matrix, which variable, x or y, has more information?

Principal\CheckCov.lab
Matrix input;
input.Load();
Matrix y = CovarianceMatrix(input);

CheckCovariance

Linear Transformation

A linear transformation may be applied to a variable to change its mean and variance. Some information may be lost during a linear transformation. A linear transformation is typically used to scale data.
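
The effect on the mean and the variance can be seen in a short sketch. It assumes the scale-and-shift form z = a*x + b; the values of a, b, and x are hypothetical, and the population variance is used.

```python
from statistics import mean, pvariance

x = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical variable
a, b = 2.0, 10.0               # hypothetical scale and offset

# Apply the same scale and offset to every value.
z = [a * value + b for value in x]

# The mean is scaled and shifted; the variance is scaled by a squared.
print(mean(x), pvariance(x))
print(mean(z), pvariance(z))
```

The offset b moves the mean but has no effect on the variance, while the scale a multiplies the variance by a squared.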

Problem 4
The figure shows two variables, Weight and Height. (a) Which is bigger, the variance of the Weight or the variance of the Height? (b) If it is desired to reduce the number of variables to only one, which variable should be chosen, the Weight or the Height? (c) If the linear transformation to the new variables x and y is applied as shown, which variable should be chosen, x or y?

LinearTransformation

Principal Components

The method of principal components is used when there are many input variables and it is desired to reduce their number. The method consists of keeping the variables with large variance and removing those with small variance. One of the main problems with the method is that some information is lost in the process, and the lost information may be important for the problem at hand.

Steps of the method of principal components:
  1. Compute the mean of each column
  2. Build the covariance matrix
  3. Compute the eigenvectors and eigenvalues
  4. Select only those eigenvectors that are associated with the biggest eigenvalues
  5. Transform the data
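
The five steps above can be sketched in Python for a data set with two columns. This is a minimal sketch under stated assumptions, not the Lab implementation: it uses the sample covariance (dividing by rows minus one), a closed-form eigen-decomposition that is valid only for symmetric 2x2 matrices, and hypothetical names (eigen_symmetric_2x2, principal_components) and data.

```python
import math

def eigen_symmetric_2x2(m):
    """(eigenvalue, unit eigenvector) pairs of a symmetric 2x2 matrix, largest first."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    if b == 0.0:
        # Already diagonal: the unit vectors are the eigenvectors.
        pairs = [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
    else:
        center = (a + c) / 2.0
        radius = math.hypot((a - c) / 2.0, b)
        pairs = []
        for lam in (center + radius, center - radius):
            vx, vy = lam - c, b          # solves (M - lam*I) v = 0 when b != 0
            norm = math.hypot(vx, vy)
            pairs.append((lam, (vx / norm, vy / norm)))
    pairs.sort(key=lambda p: p[0], reverse=True)
    return pairs

def principal_components(data, keep):
    rows = len(data)
    # Step 1: mean of each column, then center the data.
    mean = [sum(r[j] for r in data) / rows for j in range(2)]
    centered = [[r[j] - mean[j] for j in range(2)] for r in data]
    # Step 2: covariance matrix (assumption: divide by rows - 1).
    cov = [[sum(r[j] * r[k] for r in centered) / (rows - 1) for k in range(2)]
           for j in range(2)]
    # Step 3: eigenvalues and eigenvectors, sorted from largest to smallest.
    pairs = eigen_symmetric_2x2(cov)
    # Step 4: keep the eigenvectors with the biggest eigenvalues.
    selected = pairs[:keep]
    # Step 5: project each centered row onto the selected eigenvectors.
    transformed = [[sum(r[j] * vec[j] for j in range(2)) for _, vec in selected]
                   for r in centered]
    return [lam for lam, _ in pairs], transformed

data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # hypothetical data: columns x and y
eigenvalues, reduced = principal_components(data, keep=1)
print(eigenvalues)
print(reduced)
```

In this hypothetical data set y is exactly twice x, so the second eigenvalue is zero and keeping a single component loses nothing; with real data the discarded eigenvalues indicate how much variance is given up.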

Problem 5
Write the program PrincipalComponents.lab to extract the principal components of the data set in Problem 1. The EigenSystem function returns the eigen-values and their respective eigen-vectors, with the eigen-values sorted.

Principal\PrincipalComponents.lab
Matrix input;
input.Load();

Vector mean = input.ColsMean();
mean.Save();

//__________________ Compute Covariance
Matrix covariance = CovarianceMatrix(input);

//__________________ EigenSystem computes the eigenvector and eigenvalues
//__________________ The eigen values are returned in the last row of the matrix
Matrix eigen_vector = EigenSystem(covariance);
int rows = eigen_vector.GetRowCount();
Vector eigen_value = eigen_vector.GetRowVec(rows-1);
eigen_vector.DeleteRow(rows-1);
eigen_vector.Save();

eigen_vector

eigen_value

Tip
The set of eigen-vectors defines a transformation matrix. If all eigen-vectors are used, the transformed data will have the same number of variables (columns) as the original data set. In the previous problem, it is possible to keep only those eigen-vectors associated with the biggest eigen-values. To compute the transformed data, z, we must apply the transpose of the transformation matrix, as shown in the next problem.
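
Written as an equation, the transformation takes the following form. The notation is assumed for illustration: W is the transformation matrix with one eigen-vector per column, x is one data point as a column vector, \mu is the vector of column means, X is the data matrix with one data point per row, and \mathbf{1} is a column of ones.

```latex
\mathbf{z} = W^{\mathsf{T}} (\mathbf{x} - \boldsymbol{\mu}),
\qquad
Z = \left( X - \mathbf{1}\,\boldsymbol{\mu}^{\mathsf{T}} \right) W
```

The two forms are equivalent: when the data points are stored as rows, transposing the first expression turns the product with the transpose of W into a product with W on the right.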

Problem 6
Write the program Transform.lab to transform the data set using the transformation matrix of the previous problem.

Principal\Transform.lab
//______________ Load the eigen vectors
Matrix eigen_vector;
eigen_vector.Load();
//_____________ Load the original data set
Matrix input;
input.Load();
//_____________ Compute the mean
Vector mean = input.ColsMean();
//_____________ Subtract the mean
int rows = input.GetRowCount();
int cols = input.GetColCount();
int i;
int j;
for(i = 0; i < rows; i++)
{
     for(j = 0; j < cols; j++)
     {
          input[i][j] = input[i][j] - mean[j];
     }
}
Matrix transformed = input*eigen_vector;
Vector new_x = transformed.GetColVec(0);
Vector new_y = transformed.GetColVec(1);

XyChart chartTransformed;
chartTransformed.AddGraph(new_x, new_y, "transformed", 1, 3, 0, 255, 0);
chartTransformed.SetColorMode(2);
chartTransformed.AutoScaleX(true);
chartTransformed.AutoScaleY(true);
chartTransformed.SaveAndShow();
chartTransformed.SavePDF(0.0);

transformed

TransformedPlot

Problem 7
Write the program Reduce.lab to reduce the number of variables from two to one.

Principal\Reduce.lab
//______________ Load the eigen vectors
Matrix eigen_vector;
eigen_vector.Load();
//______________ Compute the transformation matrix
Matrix transformationMatrix = eigen_vector;
transformationMatrix.DeleteCol(1);
//_____________ Load the original data set
Matrix input;
input.Load();
//_____________ Compute the mean
Vector mean = input.ColsMean();
//_____________ Subtract the mean
int rows = input.GetRowCount();
int cols = input.GetColCount();
int i;
int j;
for(i = 0; i < rows; i++)
{
     for(j = 0; j < cols; j++)
     {
          input[i][j] = input[i][j] - mean[j];
     }
}
Matrix reduced = input*transformationMatrix;

Reduced

Problem 8
Suppose that a student has a database for classification with 1000 training cases and 500 validation cases. In order to reduce the problem complexity, he applied the technique of principal components. First, he extracted and used 25 characteristics to feed an ANN; he got 300 errors when using the training set, and 300 errors when using the validation set. Second, he incorporated 4 more characteristics to train the ANN; he got 310 errors when using the training set, and 329 errors when using the validation set. Provide a possible explanation of the experiment.

Tip
In some problems, important information may be hidden in the ratios between variables. Note that this information may be lost when using the method of principal components.

© Copyright 2000-2019 Wintempla selo. All Rights Reserved. Sep 05 2019.