# Principal Components

## Signal

A signal is any function that carries information. Usually a signal is a function of time, so signals are classified as continuous-time signals, defined for every instant t, and discrete-time signals, defined only at discrete instants n.
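To illustrate the two kinds of signal, the sketch below evaluates a continuous-time signal x(t) at multiples of a sampling period Ts. This gives a discrete-time signal x[n] = x(nTs). Sampling is only one way a discrete-time signal can arise. The 5 Hz cosine, the 40 Hz sampling rate and the function names are assumptions chosen for illustration.

```python
import math
from typing import Callable, List


def continuous_signal(t: float) -> float:
    """Hypothetical continuous-time signal: defined for every real t (a 5 Hz cosine)."""
    return math.cos(2 * math.pi * 5.0 * t)


def sample(signal: Callable[[float], float], sampling_period: float, n_samples: int) -> List[float]:
    """Discrete-time signal x[n] = x(n * Ts), defined only for integer n."""
    return [signal(n * sampling_period) for n in range(n_samples)]


# Sampling at 40 Hz (Ts = 1/40 s) gives 8 samples per period of the 5 Hz cosine.
x = sample(continuous_signal, 1 / 40, 8)
print([round(value, 3) for value in x])
```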

## Real and Complex Signals

A real signal takes its values in the set of real numbers, while a complex signal takes its values in the set of complex numbers.

## The Complex Exponential

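One of the most important signals is the complex exponential $x(t) = e^{j\omega t}$, shown in the figure below. Complex exponentials are called eigen-functions of linear time-invariant (LTI) systems. When a complex exponential is applied to the input of an LTI system with impulse response h(t), the output is the same complex exponential scaled by a constant that depends only on ω:

$$y(t) = \int_{-\infty}^{\infty} h(\tau)\, e^{j\omega (t-\tau)}\, d\tau = H(j\omega)\, e^{j\omega t}, \qquad H(j\omega) = \int_{-\infty}^{\infty} h(\tau)\, e^{-j\omega\tau}\, d\tau.$$

## Vectors and Signals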

Signals are similar to vectors. For instance, two signals can be added just as two vectors can. A vector in a 3-dimensional space can be represented as a linear combination of the unit vectors:

$$\mathbf{v} = v_1\,\mathbf{e}_1 + v_2\,\mathbf{e}_2 + v_3\,\mathbf{e}_3.$$

Remember that a set of vectors is called orthonormal if the vectors are mutually orthogonal and all have unit norm, that is, $\langle \mathbf{e}_i, \mathbf{e}_j \rangle = 1$ if $i = j$ and $0$ otherwise.

## Linear Transformation

A linear transformation in an N-dimensional space is a transformation T that satisfies

$$T(a\,\mathbf{u} + b\,\mathbf{v}) = a\,T(\mathbf{u}) + b\,T(\mathbf{v})$$

for any vectors u and v and any scalars a and b. The eigen-vectors of a linear transformation T are the nonzero vectors whose direction is not changed by the transformation, apart from a possible reversal of sign. In other words, they are only scaled by a factor λ, called the eigen-value:

$$T(\mathbf{v}) = \lambda\,\mathbf{v}.$$

## Fourier Series

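If x(t) is a periodic signal with period T0 and the Dirichlet conditions are met, then x(t) can be expressed as a sum of complex exponentials:

$$x(t) = \sum_{k=-\infty}^{\infty} c_k\, e^{jk\omega_0 t}, \qquad \omega_0 = \frac{2\pi}{T_0}, \qquad c_k = \frac{1}{T_0}\int_{T_0} x(t)\, e^{-jk\omega_0 t}\, dt.$$

In the language of vectors, the Fourier transform is a rotation. It maps a space whose basis functions are the unit vectors (or Dirac delta functions) to a space whose basis functions are the sine and cosine functions (equivalently, the complex exponentials).

## Covariance Matrix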

The covariance matrix is also known as the dispersion matrix. It generalizes the notion of variance to multiple dimensions. The elements of the main diagonal are the variances of the columns, and the off-diagonal elements are the covariances between pairs of columns. To compute the covariance matrix of a data set with N rows, first compute the mean of each column, $\bar{x}_j$, and then apply

$$C_{jk} = \frac{1}{N-1}\sum_{i=1}^{N} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$$

(some texts divide by N instead of N - 1).

Problem 1 Create a new project called Principal with the Main file only option to compute manually the covariance matrix of the data set shown; the input variables are x and y. Add the Covariance.lab file and edit it as shown.

Principal\Covariance.lab

```
//________________________ Load the data
Matrix trainSetIn;
trainSetIn.Load();
int rows = trainSetIn.GetRowCount();
int cols = trainSetIn.GetColCount();
int i;
int j;
int k;
//____________________ Compute the mean of each column
Vector mean;
mean.Create(cols);
double tmp;
for(j = 0; j < cols; j++)
{
     ...
}
mean.Save();
//________________________ Compute the covariance
Matrix covariance;
covariance.Create(cols, cols);
...
for(i = 0; i < rows; i++)
{
     for(j = 0; j < cols; j++)
     {
          ...
     }
}
```

Problem 2 Using the data from the input.csv file, create the Makeplot.lab program to plot x versus y.

Problem 3 (a) Write a program called CheckCov.lab to verify the answer of Problem 1 using the CovarianceMatrix function. (b) By looking at the values on the main diagonal of the covariance matrix, which variable, x or y, has more information?

Principal\CheckCov.lab

```
Matrix input;
input.Load();
Matrix y = CovarianceMatrix(input);
```

## Linear Transformation

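A linear transformation $y = a\,x + b$ may be applied to a variable to change its mean and variance. The mean becomes $a\,\mu_x + b$ and the variance becomes $a^2\sigma_x^2$. A linear transformation is typically used to scale data. When the transformation is not invertible, some information is lost. This happens, for instance, when several variables are mapped onto fewer variables, or when a = 0.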
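The sketch below checks these formulas on small illustrative values, not data from the text. It applies y = a x + b with hypothetical a = 3 and b = -2 (strictly speaking an affine map, since b ≠ 0) and compares the sample means and variances. It then standardizes x, a common way of scaling data with a linear transformation. The sketch is in Python.

```python
import statistics

x = [1.0, 2.0, 4.0, 7.0]  # illustrative data
a, b = 3.0, -2.0          # hypothetical scale and offset
y = [a * value + b for value in x]

# The mean is scaled and shifted; the variance is only scaled, by a squared.
mean_x, var_x = statistics.mean(x), statistics.variance(x)
mean_y, var_y = statistics.mean(y), statistics.variance(y)
print(mean_x, var_x)  # mean and sample variance of x
print(mean_y, var_y)  # equal to a*mean_x + b and a**2 * var_x

# Standardization: a = 1/sigma and b = -mean/sigma give mean 0 and variance 1.
sigma_x = statistics.stdev(x)
z = [(value - mean_x) / sigma_x for value in x]
print(round(statistics.mean(z), 12), round(statistics.variance(z), 12))
```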

Problem 4 The figure shows two variables, Weight and Height. (a) Which is bigger, the variance of the Weight or the variance of the Height? (b) If the number of variables must be reduced to only one, which variable should be chosen, the Weight or the Height? (c) If the linear transformation to the new variables x and y is applied as shown, which variable should be chosen, x or y?

## Principal Components

The method of principal components is used when there are many input variables and the number of input variables must be reduced. The method transforms the data into new, uncorrelated variables (the principal components). It then keeps the variables with large variance and removes those with small variance. One of the main drawbacks of the method is that some information is lost in the process, and the lost information may be important for the problem at hand.

Steps of the method of principal components:
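1. Compute the mean of each column
2. Build the covariance matrix
3. Compute the eigenvectors and eigenvalues of the covariance matrix
4. Select only those eigenvectors that are associated with the largest eigenvalues
5. Transform the data: subtract the mean from each column and project the result onto the selected eigenvectors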
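The five steps above can be sketched in Python for the two-variable case used in the problems of this section. This is a minimal, self-contained sketch, not the Neural Lab implementation. It rests on these assumptions:

- The covariance divides by N - 1.
- The eigenvalues and eigenvectors come from the closed-form solution for a symmetric 2×2 matrix.
- The function names and the sample data are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def column_means(data: Matrix) -> List[float]:
    """Step 1: mean of each column (each row of `data` is one sample)."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]


def covariance_matrix(data: Matrix) -> Matrix:
    """Step 2: sample covariance matrix C[j][k], dividing by N - 1."""
    n, cols = len(data), len(data[0])
    mean = column_means(data)
    cov = [[0.0] * cols for _ in range(cols)]
    for j in range(cols):
        for k in range(cols):
            total = sum((row[j] - mean[j]) * (row[k] - mean[k]) for row in data)
            cov[j][k] = total / (n - 1)
    return cov


def eigen_symmetric_2x2(c: Matrix) -> Tuple[List[float], Matrix]:
    """Step 3: eigenvalues (largest first) and unit eigenvectors of a symmetric 2x2 matrix."""
    a, b, d = c[0][0], c[0][1], c[1][1]
    half_trace = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    values = [half_trace + radius, half_trace - radius]
    if b == 0.0:
        # Already diagonal: the coordinate axes are the eigenvectors.
        vectors = [[1.0, 0.0], [0.0, 1.0]] if a >= d else [[0.0, 1.0], [1.0, 0.0]]
        return values, vectors
    vectors = []
    for lam in values:
        v = [lam - d, b]  # satisfies (C - lam * I) v = 0 when b != 0
        norm = math.hypot(v[0], v[1])
        vectors.append([v[0] / norm, v[1] / norm])
    return values, vectors


def principal_components(data: Matrix, keep: int) -> Tuple[List[float], Matrix, Matrix]:
    """Steps 1-5 for two variables: return eigenvalues, kept eigenvectors and reduced data."""
    mean = column_means(data)
    values, vectors = eigen_symmetric_2x2(covariance_matrix(data))
    selected = vectors[:keep]  # Step 4: eigenvectors of the largest eigenvalues
    # Step 5: z_i = w_i . (x - mean) for every sample x and every kept eigenvector w_i
    reduced = [
        [sum(w[j] * (row[j] - mean[j]) for j in range(len(row))) for w in selected]
        for row in data
    ]
    return values, selected, reduced


# Illustrative samples (x, y); not the data set of Problem 1.
samples = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [4.0, 5.0], [5.0, 6.0]]
eigenvalues, kept, z = principal_components(samples, keep=1)
print("eigenvalues:", eigenvalues)
print("first eigenvector:", kept[0])
print("reduced data:", [round(row[0], 3) for row in z])
```

The variance of the reduced variable equals the largest eigenvalue. This is why keeping the eigenvectors with the largest eigenvalues keeps the variables with the largest variance.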

Problem 5 Write the program PrincipalComponents.lab to extract the principal components of the data set in Problem 1. The EigenSystem function returns the eigen-values, sorted, together with the respective eigen-vectors.

Principal\PrincipalComponents.lab

```
Matrix input;
input.Load();
Vector mean = input.ColsMean();
mean.Save();
//__________________ Compute Covariance
Matrix covariance = CovarianceMatrix(input);
//__________________ EigenSystem computes the eigenvectors and eigenvalues
//__________________ The eigenvalues are returned in the last row of the matrix
Matrix eigen_vector = EigenSystem(covariance);
int rows = eigen_vector.GetRowCount();
Vector eigen_value = eigen_vector.GetRowVec(rows-1);
eigen_vector.DeleteRow(rows-1);
eigen_vector.Save();
```

Tip The set of eigen-vectors defines a transformation matrix W, whose columns are the eigen-vectors. If all eigen-vectors are used, the transformed data have the same number of variables (columns) as the original data set. From the eigen-vectors computed in the previous problem, it is also possible to keep only those associated with the largest eigen-values. To compute the transformed data z, apply the transpose of the transformation matrix to each mean-centered sample, $\mathbf{z} = W^{T}(\mathbf{x} - \boldsymbol{\mu})$. When the samples are stored as the rows of a matrix X, this becomes $Z = (X - \boldsymbol{\mu})\,W$, as shown in the next problem.
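The Tip can be checked numerically. If the eigen-vectors are normalized to unit length (an assumption about EigenSystem's output), they are orthonormal and $W^{T}W = I$. With all of them the transformation can therefore be undone exactly, $\mathbf{x} = W\mathbf{z} + \boldsymbol{\mu}$; keeping fewer columns gives only an approximation. In the Python sketch below, a hypothetical rotation by 30 degrees stands in for the eigen-vector matrix of Problem 5, since any orthonormal W behaves the same way. The data are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def transform(data: Matrix, mean: List[float], w_columns: Matrix) -> Matrix:
    """z = W^T (x - mean) for each sample; w_columns[i] is the i-th column of W."""
    return [[sum(w[j] * (row[j] - mean[j]) for j in range(len(row))) for w in w_columns]
            for row in data]


def reconstruct(z_rows: Matrix, mean: List[float], w_columns: Matrix) -> Matrix:
    """x = W z + mean; exact only when all columns of an orthonormal W are used."""
    return [[mean[j] + sum(z[i] * w_columns[i][j] for i in range(len(w_columns)))
             for j in range(len(mean))]
            for z in z_rows]


# Hypothetical orthonormal W: a rotation by 30 degrees standing in for the eigen-vectors.
theta = math.radians(30)
w1 = [math.cos(theta), math.sin(theta)]
w2 = [-math.sin(theta), math.cos(theta)]

data = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0]]  # illustrative samples
mean = [sum(row[j] for row in data) / len(data) for j in range(2)]

full = transform(data, mean, [w1, w2])  # all eigen-vectors: two columns
reduced = transform(data, mean, [w1])   # first eigen-vector only: one column
print(reconstruct(full, mean, [w1, w2]))  # recovers the original samples (up to rounding)
print(reconstruct(reduced, mean, [w1]))   # only an approximation of the samples
```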

Problem 6 Write the program Transform.lab to transform the data set using the transformation matrix of the previous problem.

Principal\Transform.lab

```
//______________ Load the eigen vectors
Matrix eigen_vector;
eigen_vector.Load();
//_____________ Load the original data set
Matrix input;
input.Load();
//_____________ Compute the mean
Vector mean = input.ColsMean();
//_____________ Subtract the mean
int rows = input.GetRowCount();
int cols = input.GetColCount();
int i;
int j;
for(i = 0; i < rows; i++)
{
     for(j = 0; j < cols; j++)
     {
          input[i][j] = input[i][j] - mean[j];
     }
}
Matrix transformed = input*eigen_vector;
Vector new_x = transformed.GetColVec(0);
Vector new_y = transformed.GetColVec(1);
XyChart chartTransformed;
chartTransformed.AddGraph(new_x, new_y, "transformed", 1, 3, 0, 255, 0);
chartTransformed.SetColorMode(2);
chartTransformed.AutoScaleX(true);
chartTransformed.AutoScaleY(true);
chartTransformed.SaveAndShow();
chartTransformed.SavePDF(0.0);
```

Problem 7 Write the program Reduce.lab to reduce the number of variables from two to one.

Principal\Reduce.lab

```
//______________ Load the eigen vectors
Matrix eigen_vector;
eigen_vector.Load();
//______________ Compute the transformation matrix (keep only the first column)
Matrix transformationMatrix = eigen_vector;
transformationMatrix.DeleteCol(1);
//_____________ Load the original data set
Matrix input;
input.Load();
//_____________ Compute the mean
Vector mean = input.ColsMean();
//_____________ Subtract the mean
int rows = input.GetRowCount();
int cols = input.GetColCount();
int i;
int j;
for(i = 0; i < rows; i++)
{
     for(j = 0; j < cols; j++)
     {
          input[i][j] = input[i][j] - mean[j];
     }
}
Matrix reduced = input*transformationMatrix;
```

Problem 8 Suppose that a student has a classification database with 1000 training cases and 500 validation cases. To reduce the complexity of the problem, he applied the method of principal components. First, he extracted 25 characteristics and used them to feed an ANN; he got 300 errors on the training set and 300 errors on the validation set. Second, he added 4 more characteristics and trained the ANN again; he got 310 errors on the training set and 329 errors on the validation set. Provide a possible explanation of the experiment.

Tip In some problems, important information may be hidden in the ratios among several variables. Note that this information may be lost when using the method of principal components.
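A small constructed example shows how this can happen. In the illustrative data below, x and y are strongly correlated, and the data set is symmetric under swapping x and y. Both variances are therefore equal, and the first principal component is the direction (1, 1)/√2. Two samples with different ratios y/x are then mapped to the same value when the data are reduced to one variable. The data are invented for illustration, and the sketch is in Python.

```python
import math

# Illustrative data: symmetric under swapping x and y, with x and y strongly correlated.
data = [[1.0, 1.0], [2.0, 2.2], [2.2, 2.0], [3.0, 3.0]]

n = len(data)
mean_x = sum(row[0] for row in data) / n
mean_y = sum(row[1] for row in data) / n
var_x = sum((row[0] - mean_x) ** 2 for row in data) / (n - 1)
var_y = sum((row[1] - mean_y) ** 2 for row in data) / (n - 1)
cov_xy = sum((row[0] - mean_x) * (row[1] - mean_y) for row in data) / (n - 1)

# For a covariance matrix [[v, c], [c, v]] with c > 0, the eigen-vector of the
# largest eigen-value (v + c) is (1, 1)/sqrt(2): the first principal component.
w1 = [1 / math.sqrt(2), 1 / math.sqrt(2)]


def reduce_to_one(row):
    """Project a mean-centered sample onto the first principal component."""
    return w1[0] * (row[0] - mean_x) + w1[1] * (row[1] - mean_y)


for sample in (data[1], data[2]):
    print(sample, "y/x =", round(sample[1] / sample[0], 3),
          "reduced =", round(reduce_to_one(sample), 6))
```

The two samples have ratios of about 1.1 and 0.91, yet their reduced values coincide. The ratio lived in the discarded, low-variance direction.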