Coursera ML(4)-Logistic Regression

These notes correspond to the Week 3 material of the Coursera course: the binary classification problem.

Classification is not actually a linear function: the output $y$ takes discrete values (here $y \in \lbrace 0, 1 \rbrace$), so fitting it directly with linear regression works poorly. Logistic regression instead constrains the hypothesis output to lie between 0 and 1.

Classification and Representation

Hypothesis Representation

$$\begin{align*}& h_\theta (x) = g ( \theta^T x ) \newline \newline& z = \theta^T x \newline& g(z) = \dfrac{1}{1 + e^{-z}}\end{align*}$$
The Sigmoid Function maps any real-valued input into the range $(0,1)$. The graph of $g(z)$ is an S-shaped curve that approaches 0 as $z \to -\infty$, approaches 1 as $z \to +\infty$, and passes through $0.5$ at $z = 0$.
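As a quick illustration, here is a minimal NumPy sketch of the sigmoid (the function name `sigmoid` and the sample inputs are my own, not from the course):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); works on scalars and arrays alike."""
    return 1.0 / (1.0 + np.exp(-z))

# The output always falls strictly between 0 and 1.
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # approx [4.5e-05, 0.5, 0.99995]
```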

Decision Boundary

$$\begin{align*}& h_\theta(x) \geq 0.5 \rightarrow y = 1 \newline& h_\theta(x) < 0.5 \rightarrow y = 0 \newline\end{align*}$$
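Since $g(z) \geq 0.5$ exactly when $z = \theta^T x \geq 0$, a prediction only needs the sign of $\theta^T x$. A minimal sketch, where the `predict` helper, `theta`, and the toy inputs are all illustrative assumptions:

```python
import numpy as np

def predict(theta, X):
    """Predict y = 1 when h_theta(x) >= 0.5, i.e. exactly when theta^T x >= 0."""
    return (X @ theta >= 0).astype(int)

theta = np.array([-3.0, 1.0, 1.0])   # decision boundary: x1 + x2 = 3
X = np.array([[1.0, 1.0, 1.0],       # x1 + x2 = 2 -> below the boundary
              [1.0, 2.0, 2.0]])      # x1 + x2 = 4 -> above the boundary
print(predict(theta, X))             # [0 1]
```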

Logistic Regression Model

Cost function for the logistic hypothesis

$$J(\theta) = \dfrac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)})$$

$$\mathrm{Cost}(h_\theta (x), y) =\begin{cases}-\log(h_\theta (x)) & \text{if } y = 1 \\ -\log(1 - h_\theta (x)) & \text{if } y = 0\end{cases}$$

$$\begin{align*}& \mathrm{Cost}(h_\theta(x),y) = 0 \text{ if } h_\theta(x) = y \newline & \mathrm{Cost}(h_\theta(x),y) \rightarrow \infty \text{ if } y = 0 \; \mathrm{and} \; h_\theta(x) \rightarrow 1 \newline & \mathrm{Cost}(h_\theta(x),y) \rightarrow \infty \text{ if } y = 1 \; \mathrm{and} \; h_\theta(x) \rightarrow 0 \newline \end{align*}$$
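The following tiny sketch illustrates this behaviour numerically (the helper name `cost_single` is mine): the cost is near zero when a confident prediction agrees with the label and blows up when it is confidently wrong.

```python
import numpy as np

def cost_single(h, y):
    """Per-example cost: -log(h) if y = 1, -log(1 - h) if y = 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

print(cost_single(0.99, 1))   # ~0.01  (confident and correct)
print(cost_single(0.01, 1))   # ~4.61  (confident and wrong)
print(cost_single(1e-8, 1))   # ~18.4  (cost -> infinity as h -> 0 while y = 1)
```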

Simplified Cost Function and Gradient Descent

$$\mathrm{Cost}(h_\theta(x),y) = - y \; \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$$

$$J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]$$
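A vectorized sketch of this cost, assuming the design matrix $X$ already contains the intercept column (the names `cost` and `sigmoid` are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1 - y)*log(1 - h) ]."""
    m = y.size
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / m
```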

Gradient Descent

$$\begin{align*}& Repeat \; \lbrace \newline & \; \theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j}J(\theta) \newline & \rbrace\end{align*}$$

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum\limits_{i=1}^{m}[(h_\theta (x^{(i)}) - y^{(i)})x_j^{(i)}]$$

$$\begin{align*} & Repeat \; \lbrace \newline & \; \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \newline & \rbrace \end{align*}$$
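Putting the update rule into a batch gradient-descent loop might look like the sketch below (learning rate, iteration count, and function names are illustrative assumptions, not values from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Simultaneously update every theta_j:
    theta_j := theta_j - (alpha/m) * sum_i (h(x_i) - y_i) * x_ij,
    which vectorizes to theta := theta - (alpha/m) * X^T (h - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        theta -= (alpha / m) * (X.T @ (sigmoid(X @ theta) - y))
    return theta
```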

Multiclass Classification: One-vs-all

For more than two classes, train a separate binary classifier $h_\theta^{(k)}(x) = P(y = k \mid x; \theta)$ for each class $k$, and for a new input pick the class whose classifier outputs the highest probability, as in the sketch below.
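A minimal one-vs-all sketch built on the gradient-descent update above (all function names and hyperparameters are my own assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_classes, alpha=0.1, iterations=1000):
    """Fit one binary logistic classifier per class: y == k versus the rest."""
    m, n = X.shape
    all_theta = np.zeros((num_classes, n))
    for k in range(num_classes):
        yk = (y == k).astype(float)
        for _ in range(iterations):
            all_theta[k] -= alpha * (X.T @ (sigmoid(X @ all_theta[k]) - yk)) / m
    return all_theta

def predict_one_vs_all(all_theta, X):
    """Pick the class whose classifier outputs the highest probability."""
    return np.argmax(sigmoid(X @ all_theta.T), axis=1)
```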

Solving the Problem of Overfitting

The Problem of Overfitting


If the hypothesis has too many features it may fit the training set very well yet fail to generalize to new examples. There are two main options to address the issue of overfitting: reduce the number of features, or keep all the features and apply regularization to shrink the magnitudes of the parameters $\theta_j$.

Cost Function

$$\min_\theta\ \dfrac{1}{2m}\ \left[ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda\ \sum_{j=1}^n \theta_j^2 \right]$$

The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.
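A small sketch of this regularized squared-error cost, assuming the first column of $X$ is the intercept so that $\theta_0$ is excluded from the penalty (the name `regularized_cost` is mine):

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """(1/2m) * [ sum (h - y)^2 + lambda * sum_{j>=1} theta_j^2 ]."""
    m = y.size
    err = X @ theta - y
    return (err @ err + lam * (theta[1:] @ theta[1:])) / (2.0 * m)
```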

Regularized Linear Regression

$$\begin{align*} & \text{Repeat}\ \lbrace \newline & \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline & \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] &\ \ \ \ \ \ \ \ \ \ j \in \lbrace 1,2...n\rbrace\newline & \rbrace \end{align*}$$

$$\begin{align*}& \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \newline& \text{where}\ \ L = \begin{bmatrix} 0 & & & & \newline & 1 & & & \newline & & 1 & & \newline & & & \ddots & \newline & & & & 1 \newline\end{bmatrix}\end{align*}$$
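The regularized normal equation translates almost directly into code; in the sketch below (function name assumed), `L` is the identity matrix with its top-left entry zeroed so that $\theta_0$ is not penalized:

```python
import numpy as np

def normal_equation_regularized(X, y, lam):
    """theta = (X^T X + lambda * L)^{-1} X^T y."""
    n = X.shape[1]
    L = np.eye(n)
    L[0, 0] = 0.0                      # do not regularize the bias term theta_0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse, and with $\lambda > 0$ the matrix $X^TX + \lambda L$ is invertible even when $m \leq n$.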

Summary

Here I organize the two models above and fill in the derivations that the course skips over.

Logistic Regression Model

$h_\theta(x)$ is the hypothesis function:

$$h_\theta (x) = g ( \theta^T x ) = \dfrac{1}{1 + e^{- \theta^T x}} $$
Note the distinction between the hypothesis output and the actual labels in the training data.

Cost Function

$$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \large[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))\large]$$
Looking back at the $h_\theta (x)$ above, the cost function measures the gap between the labels given in the training set and the current predictions. Naturally, the smaller this gap the better, so we need to take derivatives in order to minimize it.

Gradient Descent

$$\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j}J(\theta)$$

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum\limits_{i=1}^{m}[(h_\theta (x^{(i)}) - y^{(i)})x_j^{(i)}]$$

$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} $$

Here is the derivation of $\frac{\partial}{\partial \theta_j} J(\theta)$. First, the derivative of the sigmoid hypothesis (written for scalar $\theta$ and $x$):

$$\begin{align*} h_\theta'(x) &= \left( \frac{1}{1+e^{- \theta x}}\right)'\newline &= \frac{e^{- \theta x}x}{(1+e^{- \theta x})^2}\newline &= \frac{1+e^{- \theta x}-1}{(1+e^{- \theta x})^2}x\newline &= \left[\frac{1}{1+e^{- \theta x}}-\frac{1}{(1+e^{- \theta x})^2}\right]x\newline &= h_\theta(x)(1-h_\theta(x))x \end{align*}$$

By the chain rule, $\frac{\partial}{\partial \theta_j} h_\theta(x^{(i)}) = h_\theta(x^{(i)})\,(1-h_\theta(x^{(i)}))\,x_j^{(i)}$, since $\frac{\partial}{\partial \theta_j}\theta^T x^{(i)} = x_j^{(i)}$. Therefore:

$$\begin{align*} \frac{\partial}{\partial \theta_j} J(\theta) &= \frac{\partial}{\partial \theta_j} \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)}\log (h_\theta (x^{(i)})) - (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))\right] \newline &= \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)}\frac{1}{h_\theta(x^{(i)})}\frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} - (1 - y^{(i)})\frac{-1}{1-h_\theta(x^{(i)})}\frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}\right] \newline &= \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)}(1-h_\theta(x^{(i)}))x_j^{(i)} + (1 - y^{(i)})h_\theta(x^{(i)})x_j^{(i)}\right] \newline &= \frac{1}{m} \sum_{i=1}^m \left[ -x_j^{(i)}y^{(i)} + x_j^{(i)}y^{(i)}h_\theta(x^{(i)}) + x_j^{(i)}h_\theta(x^{(i)}) - x_j^{(i)}y^{(i)}h_\theta(x^{(i)})\right] \newline &= \frac{1}{m}\sum_{i=1}^{m}(h_\theta (x^{(i)}) - y^{(i)})x_j^{(i)} \end{align*}$$

That is:
$$\begin{align*} &\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum\limits_{i=1}^{m}[(h_\theta (x^{(i)}) - y^{(i)})x_j^{(i)}] \end{align*}$$
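This result can be sanity-checked numerically by comparing the analytic gradient with a finite-difference approximation of $J(\theta)$ (the toy data and helper names below are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / y.size

def grad(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / y.size

# Tiny toy problem: 3 examples, an intercept column plus one feature.
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])
theta = np.array([0.1, -0.3])

eps = 1e-6
numeric = np.array([
    (cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(theta.size)
])
print(np.allclose(numeric, grad(theta, X, y)))   # True
```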

Solving the Problem of Overfitting

Everything else stays the same; only minor modifications are needed.

Cost Function

$$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \large[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))\large] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$$

Gradient Descent

$$\begin{align*} & \text{Repeat}\ \lbrace \newline & \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline & \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] &\ \ \ \ \ \ \ \ \ \ j \in \lbrace 1,2...n\rbrace\newline & \rbrace \end{align*}$$
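A sketch of regularized gradient descent for logistic regression, combining the pieces above ($\theta_0$ is updated without the penalty term; names and hyperparameters are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_regularized(X, y, lam, alpha=0.1, iterations=1000):
    """theta_0 uses the plain gradient; theta_j (j >= 1) adds (lambda/m) * theta_j."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        grad = X.T @ (sigmoid(X @ theta) - y) / m
        grad[1:] += (lam / m) * theta[1:]   # regularize all parameters except theta_0
        theta -= alpha * grad
    return theta
```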


That is all for this section.
