with the constraint $\|\Delta\| \le \rho$,
where $\|\Delta\|$ denotes the largest singular value of the matrix $\Delta$, and $\rho > 0$ is given. This problem arises when we seek to solve a least-squares problem in which the matrix is subject to an additive perturbation, and we would like to find a solution that minimizes the worst-case residual error.
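A useful intermediate step (a sketch, assuming the perturbed objective is $\|(A+\Delta)x - b\|_2$ with $A$ and $b$ the nominal data): for a spectral-norm bound on $\Delta$, the worst-case residual admits a closed form,
$$\max_{\|\Delta\| \le \rho} \|(A+\Delta)x - b\|_2 = \|Ax - b\|_2 + \rho \|x\|_2,$$
since $\|(A+\Delta)x - b\|_2 \le \|Ax-b\|_2 + \|\Delta\|\,\|x\|_2$, with equality attained by a suitable rank-one $\Delta$. The robust problem thus reduces to $\min_x \|Ax-b\|_2 + \rho\|x\|_2$, which becomes an SOCP after introducing epigraph variables $t_1 \ge \|Ax-b\|_2$ and $t_2 \ge \|x\|_2$.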
G. $k$-Ellipses. Consider $k$ points $x_1, \ldots, x_k$ in $\mathbb{R}^2$. For a given positive number $R$, we define the $k$-ellipse with radius $R$ as the set of points $x \in \mathbb{R}^2$ such that the sum of the distances from $x$ to the points $x_i$ is equal to $R$.
What do $k$-ellipses look like when $k=1$ or $k=2$?
Express the problem of computing the geometric median, which is the point that minimizes the sum of the distances to the points $x_i$, $i = 1, \ldots, k$, as an SOCP in standard form.
Using CVX, write code with input $x_1, \ldots, x_k$ and $R$ that plots the corresponding $k$-ellipse.
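The exercise asks for CVX (MATLAB); as an illustration of what such code must produce, here is a plain NumPy sketch (not the intended CVX solution — the function name and the radial-bisection scheme are my own) that traces boundary points of the $k$-ellipse, which can then be plotted with any plotting library:

```python
import numpy as np

def k_ellipse_boundary(points, R, n=360):
    """Trace the k-ellipse {x : sum_i ||x - x_i||_2 = R} by bisection
    along rays from the centroid. Assumes R is large enough that the
    centroid lies strictly inside the sublevel set {f(x) <= R}."""
    P = np.asarray(points, dtype=float)          # shape (k, 2)
    c = P.mean(axis=0)                           # centroid, an interior point
    f = lambda x: np.linalg.norm(P - x, axis=1).sum()
    if f(c) >= R:
        raise ValueError("R too small: centroid is not inside the k-ellipse")
    boundary = []
    for theta in np.linspace(0.0, 2 * np.pi, n, endpoint=False):
        d = np.array([np.cos(theta), np.sin(theta)])
        lo, hi = 0.0, 1.0
        while f(c + hi * d) < R:                 # bracket the level crossing
            hi *= 2.0
        for _ in range(60):                      # bisection on the ray
            mid = 0.5 * (lo + hi)
            if f(c + mid * d) < R:
                lo = mid
            else:
                hi = mid
        boundary.append(c + 0.5 * (lo + hi) * d)
    return np.array(boundary)
```

Since $f(x) = \sum_i \|x - x_i\|_2$ is convex, its $R$-sublevel set is convex, so each ray from an interior point crosses the boundary exactly once and bisection is valid. For $k=1$ this reproduces a circle of radius $R$ around the single point.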
H. Sparse Classification for word imaging. The image of a given query word in a given corpus of text news can be defined as a short list of other words with which this query is strongly associated. To be easily understandable, the list should be extremely short relative to the number of terms present in the corpus.
One way to obtain a word image is to use $\ell_1$-norm penalization in a classification algorithm, where the indicator of the query word's appearance in each headline is used as that headline's label/response ($y$), and the indicators for all other words are used as predictors/features ($X$). The few features (words) that are then predictive of the appearance of the query term correspond to that query's image in the corpus.
As explained here, we consider the following optimization problem, also called the $\ell_1$-norm Support Vector Machine (L1-SVM):
$$\min_{w, b} \; \sum_{i=1}^{m} \max\bigl(0,\, 1 - y_i (x_i^\top w + b)\bigr) + \lambda \|w\|_1,$$
where $\lambda > 0$ is a regularization parameter. By imposing a sparsity constraint on the weight vector $w$, we can single out the few words that are most able to predict the presence or absence of a query word in any document. These selected words are then considered the list of words comprising the query word's image.
We look at the Word Imaging problem in a small-scale setting. Our original data is the headlines of the New York Times between Jan 1, 2006 and Dec 31, 2006: there are 84,612 headlines and 160,624 distinct words in total. To make the problem accessible to you, we preprocessed the text data and down-sampled (with special care) both the number of headlines and the number of distinct words, to 997 and 1,045 respectively. We try to obtain the image of the query word ``Microsoft''. The label $y$, the predictor $X$, and the dictionary of terms \texttt{dict} are defined in \texttt{sparseSVM.mat}. Note that $y_i$ indicates whether ``Microsoft'' shows up in the $i$th headline, $X_{ij}$ indicates whether the $i$th headline contains the $j$th word, and $\texttt{dict}_j$ is the $j$th word.
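The assignment expects CVX, but the effect of the $\ell_1$ penalty can be previewed on synthetic data with a plain subgradient method (a minimal NumPy sketch; the function name, step-size rule, and toy data are my own, and this is not the intended CVX pipeline):

```python
import numpy as np

def l1_svm_subgrad(X, y, lam, steps=2000, lr0=0.1):
    """Minimize sum_i max(0, 1 - y_i (x_i . w + b)) + lam * ||w||_1
    by subgradient descent (illustration only, not CVX)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    best_obj, best_wb = np.inf, (w.copy(), b)
    for t in range(1, steps + 1):
        margins = y * (X @ w + b)
        active = margins < 1                 # hinge terms with nonzero slope
        gw = -(y[active, None] * X[active]).sum(axis=0) + lam * np.sign(w)
        gb = -y[active].sum()
        lr = lr0 / np.sqrt(t)                # diminishing step size
        w, b = w - lr * gw, b - lr * gb
        obj = np.maximum(0.0, 1 - y * (X @ w + b)).sum() + lam * np.abs(w).sum()
        if obj < best_obj:                   # track the best iterate
            best_obj, best_wb = obj, (w.copy(), b)
    return best_wb, best_obj
```

On the actual \texttt{sparseSVM.mat} data one would instead solve the problem exactly with CVX, as the later questions require; the sketch only shows the objective being driven down and the $\ell_1$ term shrinking coordinates of $w$.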
Show how to formulate the above as an SOCP.
5. Follow the procedure from part 2 to solve this problem using the same values of $\lambda$. Discuss your results.
6. Compare the top 20 features extracted by the L1-SVM and the L2-SVM for each value of $\lambda$. How do they compare across the values of $\lambda$? Which formulation makes more sense to you? Discuss.
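For the SOCP-formulation question, one standard epigraph reformulation (a sketch, assuming the hinge-loss L1-SVM objective) introduces slacks $s \in \mathbb{R}^m$ for the hinge terms and $t \in \mathbb{R}^n$ for the absolute values of the weights:
$$\begin{aligned}
\min_{w,\, b,\, s,\, t} \quad & \sum_{i=1}^{m} s_i + \lambda \sum_{j=1}^{n} t_j \\
\text{s.t.} \quad & s_i \ge 1 - y_i (x_i^\top w + b), \quad s_i \ge 0, \qquad i = 1, \ldots, m, \\
& -t_j \le w_j \le t_j, \qquad j = 1, \ldots, n.
\end{aligned}$$
All constraints are affine, so this is in fact an LP, and hence a (degenerate) SOCP in standard form.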