Deep Learning Notes

Neural Network Fundamentals

  • Artificial Neuron:

    The basic computational unit of a network; it computes:

    $$y = f(\sum_{i} w_i x_i + b)$$

    where $f$ is the activation function, $w_i$ are the weights, $x_i$ are the inputs, and $b$ is the bias term.
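
    A minimal NumPy sketch of a single neuron (variable names and values are illustrative, not from these notes):

      import numpy as np

      def neuron(x, w, b, f):
          # y = f(w . x + b): weighted input sum plus bias, passed through f
          return f(np.dot(w, x) + b)

      x = np.array([0.5, -1.0, 2.0])                     # inputs
      w = np.array([0.1, 0.4, -0.2])                     # weights
      b = 0.3                                            # bias
      y = neuron(x, w, b, lambda z: np.maximum(0.0, z))  # ReLU as f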


  • Common Activation Functions:
    • ReLU: $f(x) = \max(0, x)$
    • Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
    • Tanh: $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
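
    A quick NumPy sketch of all three:

      import numpy as np

      def relu(x):
          return np.maximum(0.0, x)        # max(0, x), elementwise

      def sigmoid(x):
          return 1.0 / (1.0 + np.exp(-x))  # squashes values into (0, 1)

      def tanh(x):
          return np.tanh(x)                # squashes values into (-1, 1)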

Optimization

  • Gradient Descent:

    Weight update rule:

    $$w = w - \eta \nabla L$$

    where $\eta$ is the learning rate and $\nabla L$ is the gradient of the loss with respect to the weights.
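
    A bare-bones sketch of the update rule on a toy loss $L(w) = (w - 3)^2$ (the loss, learning rate, and step count are made up for illustration):

      import numpy as np

      eta = 0.1                   # learning rate
      w = np.array([5.0])         # initial weight
      for _ in range(100):
          grad = 2.0 * (w - 3.0)  # gradient of L(w) = (w - 3)^2
          w = w - eta * grad      # w <- w - eta * grad(L)
      # w converges toward the minimizer w = 3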


  • Common Optimizers:
    • SGD with Momentum
    • Adam: Adaptive Moment Estimation
    • RMSprop: Root Mean Square Propagation
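
    For reference, a minimal sketch of the SGD-with-momentum update (hyperparameters are illustrative; Adam and RMSprop additionally keep per-parameter running estimates of squared gradients to scale each step):

      import numpy as np

      def sgd_momentum_step(w, grad, v, eta=0.01, beta=0.9):
          # v is the velocity: an exponentially decaying sum of past gradients
          v = beta * v + grad
          return w - eta * v, v

      w, v = np.array([1.0]), np.zeros(1)
      w, v = sgd_momentum_step(w, grad=2.0 * w, v=v)  # one step on L(w) = w^2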

Loss Functions

  • Cross-Entropy Loss:

    $$L = -\sum_i y_i \log(\hat{y}_i)$$

    where $y_i$ is the true (one-hot) label and $\hat{y}_i$ the predicted probability for class $i$. Used for classification tasks.
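
    A NumPy sketch (the epsilon guards against $\log(0)$; the values are illustrative):

      import numpy as np

      def cross_entropy(y_true, y_pred, eps=1e-12):
          # y_true: one-hot labels; y_pred: predicted class probabilities
          return -np.sum(y_true * np.log(y_pred + eps))

      y_true = np.array([0.0, 1.0, 0.0])
      y_pred = np.array([0.2, 0.7, 0.1])
      loss = cross_entropy(y_true, y_pred)  # -log(0.7), about 0.357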


  • Mean Squared Error:

    $$\text{MSE} = \frac{1}{n}\sum_i(y_i - \hat{y}_i)^2$$

    where $y_i$ is the true value and $\hat{y}_i$ the prediction over $n$ samples. Common for regression tasks.
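
    The NumPy version is a one-liner (values illustrative):

      import numpy as np

      def mse(y_true, y_pred):
          # mean of the squared residuals
          return np.mean((y_true - y_pred) ** 2)

      y_true = np.array([1.0, 2.0, 3.0])
      y_pred = np.array([1.1, 1.9, 3.2])
      loss = mse(y_true, y_pred)  # (0.01 + 0.01 + 0.04) / 3 = 0.02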


Architectures

  • Convolutional Neural Networks (CNNs):
    • Convolution layers for feature extraction
    • Pooling layers for dimensionality reduction
    • Fully connected layers for classification
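
    One way these three layer types compose, sketched in PyTorch (assumes torch is installed; all sizes here are arbitrary):

      import torch
      import torch.nn as nn

      model = nn.Sequential(
          nn.Conv2d(1, 16, kernel_size=3, padding=1),  # feature extraction
          nn.ReLU(),
          nn.MaxPool2d(2),                             # 28x28 -> 14x14
          nn.Flatten(),
          nn.Linear(16 * 14 * 14, 10),                 # classification head
      )
      logits = model(torch.randn(1, 1, 28, 28))        # one 28x28 grayscale image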

  • Recurrent Neural Networks (RNNs):
    • LSTM: Long Short-Term Memory
    • GRU: Gated Recurrent Unit
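
    Both are available as drop-in layers in PyTorch; a minimal LSTM sketch with made-up dimensions:

      import torch
      import torch.nn as nn

      lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
      x = torch.randn(4, 20, 8)  # (batch, sequence length, features)
      out, (h, c) = lstm(x)      # out: (4, 20, 16); h, c: final hidden/cell states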

Advanced Concepts

  • Regularization Techniques:
    • Dropout
    • L1/L2 Regularization
    • Batch Normalization
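
    How the three commonly appear in PyTorch code (a sketch; L2 regularization enters through the optimizer's weight_decay argument):

      import torch
      import torch.nn as nn

      model = nn.Sequential(
          nn.Linear(64, 128),
          nn.BatchNorm1d(128),  # normalize activations across the batch
          nn.ReLU(),
          nn.Dropout(p=0.5),    # randomly zero activations during training
          nn.Linear(128, 10),
      )
      opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)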

  • Transfer Learning:
    • Fine-tuning pre-trained models
    • Feature extraction
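
    A common fine-tuning pattern, sketched with torchvision (assumes a recent torchvision; the 10-class head is an illustrative stand-in for the target task):

      import torch.nn as nn
      from torchvision import models

      model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
      for p in model.parameters():
          p.requires_grad = False                     # freeze pre-trained features
      model.fc = nn.Linear(model.fc.in_features, 10)  # new trainable head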