Zakaria Patel

Derivative of the Cross-Entropy Loss

A quick derivation of the gradient of the cross-entropy (CE) loss with a softmax activation.

Quantization

Compressing neural network weights for efficient inference.

Decoder-Only Transformers

A look into the architecture that powers most LLMs.

