in which the data matrix is noisy. Our specific noise model assumes that each row has the form , where the noise vector has zero mean and covariance matrix , with a measure of the size of the noise. Therefore, now the matrix is a function of the uncertain vector , which we denote by . We will write to denote the matrix with rows , . We replace the original problem with
where denotes the expected value with respect to the random variable . Show that this problem can be written as
where is some regularization parameter, which you will determine. That is, regularized least-squares can be interpreted as a way to take into account uncertainties in the matrix , in the expected value sense. Hint: compute the expected value of , for a specific row index .
B. The figure shows the number of transistors in 13 microprocessors as a function the year of their introduction. The plot suggests that we can obtain a good fit with a function of the form , where is the year, is the number of transistors at year , and and are model parameters. This model results in a straight line if we plot on a logarithmic scale versus on a linear scale, as is done in the figure.
Moore's law describes a long-term trend in the history of computing hardware, and states that the number of transistors that can be placed inexpensively on an integrated circuit has doubled approximately every two years. In this problem, we investigate the validity of the claim via least-squares.
Using the problem data, show how to estimate the parameters using least-squares, that is, via a problem of the form Make sure to define precisely the data and how the variable relates to the original problem parameters . (Use the notations for the number of processors, and for the corresponding years. You can assume that no component of is zero at optimum.)
Is the solution to problem above unique? Justify carefully your answer, and give the expression for the unique solution in terms of .
The solution to the problem yields , . Is this estimate consistent with the so-called Moore's law, which states that the number of transistors per integrated circuit roughly doubles every two years?
where , , are parameters.
D. Least norm estimation on traffic flow networks.
You want to estimate the traffic (in San Francisco for example, but we’ll start with a smaller example). You know the road network as well as the historical average of flows on each road segment.
Example of traffic estimation problem. The intersections are labeled a to h. The road segments are labeled 1 to 22. The arrows indicate the direction of traffic.
|Table of flows: historical averages q (center column), and some measured flows (right column).|
The goal of the estimation is to estimate the traffic flow on each of the road segment. The flow estimates should satisfy the conservation of vehicles exactly at each intersection. Among the solutions that satisfy this constraint, we are searching for the estimate that is the closest to the historical average, q, in the l2-norm sense. The vector q has size I and the i-th element represent the average for the road segment i. Pose the optimization problem.
Explain how to solve this problem mathematically. Detail your answer (do not only give a formula but explain where it comes from).
Formulate the problem for the small example of the figure above and solve it using the historical average given in the Table of Flows above. What is the flow that you estimate on road segments 1, 3, 6, 15 and 22?
Now, assume that besides the historical averages, you are also given some flow measurements on some of the road segments of the network. You assume that these flow measurements are correct and want your estimate of the flow to match these measurements perfectly (besides matching the conservation of vehicles of course). The right column of Table 1 lists the road segments for which we have such flow measurements. Do you estimate a different flow on some of the links? Give the difference in flow you estimate for road segments 1,3, 6, 15 and 22. Also check that you estimate gives you the measured flow on the road segments for which you have measured the flow.