Foundations of LASSO Machine Learning

Informatics

When performing a multiple regression analysis, let us consider not only minimizing the error function of ordinary linear regression, as in the method of least squares, but also linear regression in which a regularization term (penalty) is added to the least-squares error function. The error function \(\displaystyle L\) is as follows:
\(\displaystyle L = \sum_{i=1}^{N} \left( y_{i} - \hat{y}_{i} \right)^{2} \).
Here, \(\displaystyle y_{i}\) is the measured value of the \(\displaystyle i\)th sample, \(\displaystyle \hat{y}_{i}\) is the predicted value for the \(\displaystyle i\)th sample, and \(\displaystyle N\) is the total number of samples.
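
As a concrete illustration, the following minimal sketch computes \(\displaystyle L\) directly from its definition; NumPy is assumed, and the sample values are made up for illustration.

```python
import numpy as np

# Hypothetical measured values y_i and predicted values y_hat_i
# for N = 4 samples (the numbers are illustrative only).
y = np.array([3.0, 5.0, 7.0, 9.0])       # measured sample data y_i
y_hat = np.array([2.8, 5.3, 6.9, 9.4])   # predicted sample data ŷ_i

# Error function L = Σ_i (y_i - ŷ_i)^2 (sum of squared residuals)
L = np.sum((y - y_hat) ** 2)
print(L)  # ≈ 0.3
```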

The loss function \(\displaystyle E\) of LASSO regression is as follows:
\(\displaystyle E = L + \lambda\sum_{j=1}^{K}\left|\beta_{j}\right|\).
The loss function consists of the least-squares error function and the regularization term. Here, \(\displaystyle \lambda\) is a parameter, set by the data analyst, that determines the strength of the regularization; \(\displaystyle K\) is the number of explanatory variables; and \(\displaystyle \beta_{j}\) is the \(\displaystyle j\)th partial regression coefficient. In this way, we have defined a loss function that adds the sum of the absolute values of the partial regression coefficients to the least-squares error function of linear regression.
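
This loss function translates directly into code. The sketch below uses a hypothetical helper (the name lasso_loss and all numeric values are illustrative assumptions, not from the original text) that adds the L1 penalty to the least-squares error computed above:

```python
import numpy as np

def lasso_loss(y, y_hat, beta, lam):
    """E = least-squares error L plus lam * sum of |beta_j|."""
    L = np.sum((y - y_hat) ** 2)           # least-squares error function
    penalty = lam * np.sum(np.abs(beta))   # L1 regularization term
    return L + penalty

# Hypothetical coefficients for K = 3 explanatory variables.
beta = np.array([0.5, -1.2, 0.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.8, 5.3, 6.9, 9.4])
print(lasso_loss(y, y_hat, beta, lam=0.1))  # 0.3 + 0.1 * 1.7 = 0.47
```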

By minimizing this loss function, we obtain a method with the following property.
・LASSO regression: removes unnecessary explanatory variables by driving the weights (partial regression coefficients) of many explanatory variables to exactly zero.
LASSO regression is therefore used to reduce the number of unnecessary explanatory variables, as illustrated in the sketch below.
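
To see this variable-selection behavior in practice, here is a minimal sketch using scikit-learn's Lasso estimator; the synthetic data and the choice alpha=0.1 are illustrative assumptions. Note that scikit-learn scales the squared-error term by \(\displaystyle 1/(2N)\), so its alpha is not numerically identical to the \(\displaystyle \lambda\) defined above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data: 5 explanatory variables, but only the first two
# actually influence y; the remaining three are unnecessary.
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# In scikit-learn, alpha plays the role of lambda in the loss above.
model = Lasso(alpha=0.1)
model.fit(X, y)

# Coefficients of the unnecessary variables are (typically) exactly zero.
print(model.coef_)
```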
