Coursera ML(4)-Logistic Regression

These notes correspond to the week 3 Coursera material on the binary classification problem.


Classification is not actually a linear function: the output $y$ takes only discrete values (0 or 1 in the binary case), so applying linear regression directly works poorly.

Classification and Representation

Hypothesis Representation

The sigmoid function constrains the output to the range $(0,1)$. The graph of $g(z)$ is an S-shaped curve that passes through $(0, 0.5)$ and flattens out toward 0 and 1.
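A minimal NumPy sketch of the sigmoid (the helper name `sigmoid` is my own choice, not course code):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)); squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# g(0) = 0.5; large positive z approaches 1, large negative z approaches 0
print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ~ [4.54e-05, 0.5, 0.99995]
```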

Decision Boundary

Logistic Regression Model

Cost function for a single training example

$$\mathrm{Cost}(h_\theta (x), y) =\begin{cases}-\log(h_\theta (x)) & \text{if } y = 1 \\ -\log(1 - h_\theta (x)) & \text{if } y = 0\end{cases}$$
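Because $y$ is always either 0 or 1, the two branches collapse into a single expression, which is the form the next subsection works with:

$$\mathrm{Cost}(h_\theta (x), y) = -y\,\log(h_\theta (x)) - (1-y)\,\log(1 - h_\theta (x))$$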

Simplified Cost Function and Gradient Descent

Gradient Descent

Multiclass Classification: One-vs-all

Solving the Problem of Overfitting

The Problem of Overfitting


There are two main options to address the issue of overfitting: reduce the number of features, or use regularization.

Cost Function

The λ, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.
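For reference (this is the standard regularized linear-regression cost from the lectures, not restated in the notes above), λ scales the penalty on every $\theta_j$ with $j \ge 1$:

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^n \theta_j^2\right]$$

A larger λ inflates the contribution of the $\theta_j^2$ terms and pushes the optimizer toward smaller parameters; λ = 0 recovers the unregularized cost.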

Regularized Linear Regression

Summary

Below I organize the two models above and fill in the derivations that the course skips over.

Logistic Regression Model

$h_\theta(x)$ is the hypothesis function:

$$h_\theta (x) = g ( \theta^T x ) = \dfrac{1}{1 + e^{- \theta^T x}} $$
Note the difference between the hypothesis function and the actual data (the labels).
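A vectorized sketch of the hypothesis, assuming `X` is an m×(n+1) design matrix whose first column is all ones (the names `hypothesis` and `X` are illustrative, not course code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x), evaluated for every training example (row of X)."""
    return sigmoid(X @ theta)
```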

Cost Function

$$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \large[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))\large]$$
Looking back at $h_\theta (x)$ above, the cost function measures the gap between the labels given by the training set and the current predictions. The smaller this gap the better, so the next step is to take derivatives.
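A minimal vectorized implementation of this cost, under the same assumptions about `X` and with `y` a 0/1 label vector (the function name is my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost(theta, X, y):
    """J(theta) = -(1/m) * sum[ y*log(h) + (1-y)*log(1-h) ]."""
    m = y.shape[0]
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
```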

Gradient Descent

Here is the derivation of $\frac{\partial}{\partial \theta_j} J(\theta)$, using the fact that $g'(z) = g(z)(1-g(z))$, so $\frac{\partial}{\partial \theta_j} h_\theta(x^{(i)}) = h_\theta(x^{(i)})\,(1-h_\theta(x^{(i)}))\,x_j^{(i)}$:

$$\begin{align*} \frac{\partial}{\partial \theta_j} J(\theta) &= \frac{\partial}{\partial \theta_j} \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)} \log (h_\theta (x^{(i)})) - (1 - y^{(i)}) \log (1 - h_\theta(x^{(i)}))\right] \newline &= \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)} \frac{1}{h_\theta(x^{(i)})} \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} + (1 - y^{(i)}) \frac{1}{1-h_\theta(x^{(i)})} \frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}\right] \newline &= \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)} (1-h_\theta(x^{(i)}))\,x_j^{(i)} + (1 - y^{(i)})\,h_\theta(x^{(i)})\,x_j^{(i)}\right] \newline &= \frac{1}{m} \sum_{i=1}^m \left[ -y^{(i)} x_j^{(i)} + y^{(i)} h_\theta(x^{(i)})\,x_j^{(i)} + h_\theta(x^{(i)})\,x_j^{(i)} - y^{(i)} h_\theta(x^{(i)})\,x_j^{(i)}\right] \newline &= \frac{1}{m}\sum\limits_{i=1}^{m}(h_\theta (x^{(i)}) - y^{(i)})\,x_j^{(i)} \end{align*}$$

That is:
$$\begin{align*} &\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum\limits_{i=1}^{m}[(h_\theta (x^{(i)}) - y^{(i)})x_j^{(i)}] \end{align*}$$
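This gradient vectorizes to $\frac{1}{m}X^T(h - y)$; a sketch with the same illustrative names as above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_gradient(theta, X, y):
    """dJ/dtheta_j = (1/m) * sum[(h_theta(x) - y) * x_j], computed for all j at once."""
    m = y.shape[0]
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / m
```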

Solving the Problem of Overfitting

Everything else stays the same; only the cost function and the update rule need small modifications.

Cost Function

$$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \large[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))\large] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$$
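A sketch of the regularized cost; note that $\theta_0$ is excluded from the penalty, matching the sum starting at $j = 1$ (the names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_reg(theta, X, y, lam):
    """Cross-entropy cost plus (lambda / 2m) * sum_{j>=1} theta_j^2."""
    m = y.shape[0]
    h = sigmoid(X @ theta)
    cost = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = lam / (2.0 * m) * np.sum(theta[1:] ** 2)
    return cost + penalty
```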

Gradient Descent

$$\begin{align*} & \text{Repeat}\ \lbrace \newline & \ \ \ \ \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline & \ \ \ \ \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] &\ \ \ \ \ \ \ \ \ \ j \in \lbrace 1,2,\dots,n\rbrace\newline & \rbrace \end{align*}$$
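And a sketch of the corresponding update loop: $\theta_0$ is updated without the $\frac{\lambda}{m}\theta_j$ term, exactly as in the equations above (`alpha`, `lam`, and `num_iters` are illustrative parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_reg(theta, X, y, alpha, lam, num_iters):
    """Repeat the regularized gradient-descent update; theta_0 is never penalized."""
    m = y.shape[0]
    for _ in range(num_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (h - y) / m
        grad[1:] += lam / m * theta[1:]   # regularize every theta_j except theta_0
        theta = theta - alpha * grad
    return theta
```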


That's all for these notes.
