Deep Learning Notes
Neural Network Fundamentals
-
Artificial Neuron:
Basic computational unit that computes:
$$y = f(\sum_{i} w_i x_i + b)$$
where $f$ is the activation function, $w_i$ are weights, $x_i$ are inputs, and $b$ is bias.
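A minimal NumPy sketch of this formula (the function name and example values below are illustrative, not from the notes):

```python
import numpy as np

def neuron(x, w, b, f=lambda z: np.maximum(0.0, z)):
    """Single artificial neuron: y = f(sum_i w_i x_i + b).

    f defaults to ReLU; any activation function can be passed in.
    """
    return f(np.dot(w, x) + b)

# Example with 3 inputs and ReLU activation
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
b = 0.05
print(neuron(x, w, b))  # pre-activation is -0.70, so ReLU outputs 0.0
```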
-
Common Activation Functions:
- ReLU: $f(x) = \max(0, x)$
- Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
- Tanh: $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
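The three activations above written out in NumPy (a small sketch; `tanh` is spelled out only to mirror the definition, `np.tanh` would normally be used):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Equivalent to np.tanh; written to match the formula above.
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

z = np.array([-2.0, 0.0, 2.0])
print(relu(z))     # [0. 0. 2.]
print(sigmoid(z))  # values in (0, 1)
print(tanh(z))     # values in (-1, 1)
```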
Optimization
-
Gradient Descent:
Weight update rule:
$$w \leftarrow w - \eta \nabla_w L$$
where $\eta$ is the learning rate and $\nabla_w L$ is the gradient of the loss function with respect to the weights.
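A toy sketch of the update rule, assuming an illustrative one-dimensional loss $L(w) = (w - 3)^2$:

```python
# Gradient descent on L(w) = (w - 3)^2, so grad L = 2(w - 3)
def grad_L(w):
    return 2.0 * (w - 3.0)

w = 0.0      # initial weight
eta = 0.1    # learning rate
for _ in range(50):
    w = w - eta * grad_L(w)   # update rule: w <- w - eta * grad L

print(w)  # converges toward the minimum at w = 3
```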
-
Common Optimizers:
- SGD with Momentum
- Adam: Adaptive Moment Estimation
- RMSprop: Root Mean Square Propagation
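As a sketch of what "adaptive moment estimation" means, here is a hand-rolled single-parameter Adam step (hyperparameter defaults follow the commonly published values; in practice a library optimizer such as `torch.optim.Adam` would be used):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter w given gradient g.

    m, v are running first/second moment estimates; t is the step count (1-based).
    """
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)   # bias-corrected first moment
    v_hat = v / (1 - beta2**t)   # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimise the same toy loss L(w) = (w - 3)^2 with Adam
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    g = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, g, m, v, t)
print(w)  # close to 3
```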
Loss Functions
-
Cross-Entropy Loss:
$$L = -\sum_i y_i \log(\hat{y}_i)$$
where $y_i$ is the true (one-hot) label and $\hat{y}_i$ is the predicted probability for class $i$. Used for classification tasks.
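A small NumPy sketch for a single example, assuming a one-hot target and predicted class probabilities:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_i y_i * log(y_hat_i) for one example."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])   # correct class is index 1
y_pred = np.array([0.1, 0.7, 0.2])   # predicted probabilities
print(cross_entropy(y_true, y_pred))  # ~0.357 (= -log 0.7)
```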
-
Mean Squared Error:
$$\text{MSE} = \frac{1}{n}\sum_i(y_i - \hat{y}_i)^2$$
where $y_i$ is the target, $\hat{y}_i$ is the prediction, and $n$ is the number of examples. Common for regression tasks.
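The corresponding NumPy sketch, with made-up targets and predictions:

```python
import numpy as np

def mse(y_true, y_pred):
    """MSE = (1/n) * sum_i (y_i - y_hat_i)^2."""
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([2.0, 0.5, 1.0])
y_pred = np.array([1.5, 0.0, 1.0])
print(mse(y_true, y_pred))  # (0.25 + 0.25 + 0.0) / 3 ≈ 0.167
```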
Architectures
-
Convolutional Neural Networks (CNNs):
- Convolution layers for feature extraction
- Pooling layers for dimensionality reduction
- Fully connected layers for classification
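A minimal PyTorch sketch combining these three layer types; the channel counts and the 28x28 single-channel input size are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Convolution -> pooling -> fully connected, as outlined above."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # feature extraction
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)  # classification head

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Dummy batch of 4 single-channel 28x28 images
out = SmallCNN()(torch.randn(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 10])
```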
-
Recurrent Neural Networks (RNNs):
- LSTM: Long Short-Term Memory
- GRU: Gated Recurrent Unit
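Both cell types are available in PyTorch; a short sketch showing the main interface difference (the LSTM carries a cell state in addition to the hidden state):

```python
import torch
import torch.nn as nn

# Batch of 4 sequences, each of length 12 with 8 features per step
x = torch.randn(4, 12, 8)

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)

lstm_out, (h_n, c_n) = lstm(x)   # LSTM returns hidden state and cell state
gru_out, h_n_gru = gru(x)        # GRU returns only a hidden state

print(lstm_out.shape, gru_out.shape)  # both torch.Size([4, 12, 16])
```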
Advanced Concepts
-
Regularization Techniques:
- Dropout
- L1/L2 Regularization
- Batch Normalization
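A short PyTorch sketch combining the three techniques; note that the optimizer's `weight_decay` term implements L2 regularization, while an L1 penalty would instead be added to the loss manually:

```python
import torch
import torch.nn as nn

# Dropout and batch normalization live inside the model ...
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),   # batch normalization
    nn.ReLU(),
    nn.Dropout(p=0.5),    # dropout: randomly zeroes activations during training
    nn.Linear(64, 10),
)

# ... and L2 regularization comes from the optimizer's weight_decay term
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)

model.train()   # dropout and batchnorm behave differently in train vs. eval mode
out = model(torch.randn(32, 20))
print(out.shape)  # torch.Size([32, 10])
```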
-
Transfer Learning:
- Fine-tuning pre-trained models
- Feature extraction
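A PyTorch/torchvision sketch of both ideas (assuming torchvision 0.13+ for the `weights=` API, and an illustrative 5-class target task): freeze the pre-trained backbone for feature extraction, then replace and fine-tune the final layer:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on ImageNet
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Feature extraction: freeze all pre-trained parameters
for p in backbone.parameters():
    p.requires_grad = False

# Fine-tuning: replace the final classification layer for the new task
backbone.fc = nn.Linear(backbone.fc.in_features, 5)

# Only the new head's parameters will be updated
optimizer = torch.optim.Adam(
    (p for p in backbone.parameters() if p.requires_grad), lr=1e-3
)
print(backbone.fc)
```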