Deep Learning Notes

Neural Network Fundamentals

  • Artificial Neuron:

    The basic computational unit of a network; it computes:

    $$y = f(\sum_{i} w_i x_i + b)$$

    where $f$ is the activation function, $w_i$ are the weights, $x_i$ are the inputs, and $b$ is the bias term.
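
    A minimal NumPy sketch of a single neuron (variable names and values are illustrative, not from these notes):

      import numpy as np

      def neuron(x, w, b, f):
          # y = f(w . x + b): weighted input sum plus bias, passed through f
          return f(np.dot(w, x) + b)

      x = np.array([0.5, -1.0, 2.0])                     # inputs
      w = np.array([0.1, 0.4, -0.2])                     # weights
      b = 0.3                                            # bias
      y = neuron(x, w, b, lambda z: np.maximum(0.0, z))  # ReLU as f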


  • Common Activation Functions:
    • ReLU: $f(x) = \max(0, x)$
    • Sigmoid: $\sigma(x) = \frac{1}{1 + e^{-x}}$
    • Tanh: $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
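
    A quick NumPy sketch of all three:

      import numpy as np

      def relu(x):
          return np.maximum(0.0, x)        # max(0, x), elementwise

      def sigmoid(x):
          return 1.0 / (1.0 + np.exp(-x))  # squashes values into (0, 1)

      def tanh(x):
          return np.tanh(x)                # squashes values into (-1, 1)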

Optimization

  • Gradient Descent:

    Weight update rule:

    $$w = w - \eta \nabla L$$

    where $\eta$ is the learning rate and $\nabla L$ is the gradient of the loss with respect to the weights.
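
    A bare-bones sketch of the update rule on a toy loss $L(w) = (w - 3)^2$ (the loss, learning rate, and step count are made up for illustration):

      import numpy as np

      eta = 0.1                   # learning rate
      w = np.array([5.0])         # initial weight
      for _ in range(100):
          grad = 2.0 * (w - 3.0)  # gradient of L(w) = (w - 3)^2
          w = w - eta * grad      # w <- w - eta * grad(L)
      # w converges toward the minimizer w = 3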


  • Common Optimizers:
    • SGD with Momentum
    • Adam: Adaptive Moment Estimation
    • RMSprop: Root Mean Square Propagation
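
    For reference, a minimal sketch of the SGD-with-momentum update (hyperparameters are illustrative; Adam and RMSprop additionally keep per-parameter running estimates of squared gradients to scale each step):

      import numpy as np

      def sgd_momentum_step(w, grad, v, eta=0.01, beta=0.9):
          # v is the velocity: an exponentially decaying sum of past gradients
          v = beta * v + grad
          return w - eta * v, v

      w, v = np.array([1.0]), np.zeros(1)
      w, v = sgd_momentum_step(w, grad=2.0 * w, v=v)  # one step on L(w) = w^2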

Loss Functions

  • Cross-Entropy Loss:

    $$L = -\sum_i y_i \log(\hat{y}_i)$$

    where $y_i$ is the true (one-hot) label and $\hat{y}_i$ the predicted probability for class $i$. Used for classification tasks.
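
    A NumPy sketch (the epsilon guards against $\log(0)$; the values are illustrative):

      import numpy as np

      def cross_entropy(y_true, y_pred, eps=1e-12):
          # y_true: one-hot labels; y_pred: predicted class probabilities
          return -np.sum(y_true * np.log(y_pred + eps))

      y_true = np.array([0.0, 1.0, 0.0])
      y_pred = np.array([0.2, 0.7, 0.1])
      loss = cross_entropy(y_true, y_pred)  # -log(0.7), about 0.357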


  • Mean Squared Error:

    $$\text{MSE} = \frac{1}{n}\sum_i(y_i - \hat{y}_i)^2$$

    where $y_i$ is the true value and $\hat{y}_i$ the prediction over $n$ samples. Common for regression tasks.
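
    The NumPy version is a one-liner (values illustrative):

      import numpy as np

      def mse(y_true, y_pred):
          # mean of the squared residuals
          return np.mean((y_true - y_pred) ** 2)

      y_true = np.array([1.0, 2.0, 3.0])
      y_pred = np.array([1.1, 1.9, 3.2])
      loss = mse(y_true, y_pred)  # (0.01 + 0.01 + 0.04) / 3 = 0.02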


Architectures

  • Convolutional Neural Networks (CNNs):
    • Convolution layers for feature extraction
    • Pooling layers for dimensionality reduction
    • Fully connected layers for classification
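
    One way these three layer types compose, sketched in PyTorch (assumes torch is installed; all sizes here are arbitrary):

      import torch
      import torch.nn as nn

      model = nn.Sequential(
          nn.Conv2d(1, 16, kernel_size=3, padding=1),  # feature extraction
          nn.ReLU(),
          nn.MaxPool2d(2),                             # 28x28 -> 14x14
          nn.Flatten(),
          nn.Linear(16 * 14 * 14, 10),                 # classification head
      )
      logits = model(torch.randn(1, 1, 28, 28))        # one 28x28 grayscale image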

  • Recurrent Neural Networks (RNNs):
    • LSTM: Long Short-Term Memory
    • GRU: Gated Recurrent Unit
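
    Both are available as drop-in layers in PyTorch; a minimal LSTM sketch with made-up dimensions:

      import torch
      import torch.nn as nn

      lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
      x = torch.randn(4, 20, 8)  # (batch, sequence length, features)
      out, (h, c) = lstm(x)      # out: (4, 20, 16); h, c: final hidden/cell states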

Advanced Concepts

  • Regularization Techniques:
    • Dropout
    • L1/L2 Regularization
    • Batch Normalization
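
    How the three commonly appear in PyTorch code (a sketch; L2 regularization enters through the optimizer's weight_decay argument):

      import torch
      import torch.nn as nn

      model = nn.Sequential(
          nn.Linear(64, 128),
          nn.BatchNorm1d(128),  # normalize activations across the batch
          nn.ReLU(),
          nn.Dropout(p=0.5),    # randomly zero activations during training
          nn.Linear(128, 10),
      )
      opt = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)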

  • Transfer Learning:
    • Fine-tuning pre-trained models
    • Feature extraction
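
    A common fine-tuning pattern, sketched with torchvision (assumes a recent torchvision; the 10-class head is an illustrative stand-in for the target task):

      import torch.nn as nn
      from torchvision import models

      model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
      for p in model.parameters():
          p.requires_grad = False                     # freeze pre-trained features
      model.fc = nn.Linear(model.fc.in_features, 10)  # new trainable head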