Disclaimer: This article was generated by artificial intelligence.
Beyond Distributed Compilation: A New Paradigm for Heterogeneous Computing
Figure 1: The complexity of modern distributed systems
The Challenge
Consider the case of Large Language Models (LLMs). Today, we h…
Read more...
What tokenization achieves
Tokenization gives models a form of temporal compression (read: outputting several characters in one timestep) while also increasing the entropy of those items. By definition we use a compression algorithm to create the tokens, which means the …
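To make the "several characters in one timestep" idea concrete, here is a minimal sketch. It assumes a hand-picked vocabulary and a greedy longest-match rule; both `toy_vocab` and `greedy_tokenize` are hypothetical names invented for this illustration, not the tokenizer discussed in the post.

```python
def greedy_tokenize(text, vocab):
    """Split text by always taking the longest vocabulary entry that matches."""
    max_len = max(len(tok) for tok in vocab)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical vocabulary, chosen by hand for this example only.
toy_vocab = {"the", " the", " cat", " sat", " on", " mat"}
sentence = "the cat sat on the mat"

tokens = greedy_tokenize(sentence, toy_vocab)
# Characters emitted per timestep: the "temporal compression" factor.
compression = len(sentence) / len(tokens)
print(tokens, round(compression, 2))
```

A character-level model would need one timestep per character here, while the token-level sequence is several times shorter. Real tokenizers learn the vocabulary from data rather than taking it from a hand-written set.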
Read more...
TL;DR In this article I explain how you can deploy a model directly to the browser from PyTorch by using Onnjx. This work was done a year ago in about two weeks' time.
Check out the full demo.
Deploying a cool deep learning demo at zero cost.
Ok, so when we are showcasi…
Read more...
TL;DR Having recently moved to the Netherlands, and wanting to avoid running everything through Google Translate, I did the next best thing to learning the language: I created a clone of translate.google.com
Find a correct training loop
My first instinct was to check Hugging Face as this repo contai…
Read more...
Solar energy
Stefan-Boltzmann law: $\text{Surface energy} = \sigma T^4$
For the sun, $T = 5778\ \text{K}$
$\sigma = 5.67 \times 10^{-8}\ \text{W} \cdot \text{m}^{-2} \cdot \text{K}^{-4}$
from sympy.physics.units import K, W, m, giga
sigma = 5.67 * 10**(-8) * W *m**(-2) * K**(-4)
T = 5778 * K
s…
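As a quick sanity check of the formula, the same computation can be sketched with plain floats instead of sympy units. This is only an illustration under that assumption; `surface_energy` is a hypothetical helper name, and the constants are the ones quoted above.

```python
SIGMA = 5.67e-8   # Stefan-Boltzmann constant, in W.m^-2.K^-4
T_SUN = 5778.0    # surface temperature of the sun, in K


def surface_energy(temperature_k, sigma=SIGMA):
    """Energy radiated per unit surface, sigma * T^4, in W.m^-2."""
    return sigma * temperature_k ** 4


flux = surface_energy(T_SUN)
print(f"{flux:.3e} W/m^2")
```

With these values the result is on the order of $6.3 \times 10^{7}\ \text{W} \cdot \text{m}^{-2}$. Using sympy units, as in the snippet above, has the advantage of carrying the units through the calculation.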
Read more...
What's the problem?
ML models usually cannot tell how close the data you
feed them is to what was in the training set. This really matters for production
models, as they might make really stupid mistakes just because the input is far from
the training set.
…
Read more...
Sometimes, to keep dependencies contained, you might want to use Docker
to containerize your projects. You can also use it with GPUs.
In order to run Docker images with GPU enabled, you are going to need:
Install Docker
sudo apt-get install \
apt-transport-https \
ca-ce…
Read more...
TL;DR. By training two models on the same dataset in the same order, with the same architecture and the same loss but different initializations,
I was able to obtain a consistent out-of-distribution detector by measuring the KL divergence between the models' outputs.
This out-of-distribution measure us…
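The scoring step described above can be sketched as follows. This is a minimal illustration, not the code from the post: the logits are made-up numbers standing in for the outputs of the two trained models, and the direction of the divergence (first model relative to second) is an assumption.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of the same length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Hypothetical logits from the two models on a familiar input:
# both models roughly agree.
in_dist_a = [4.0, 0.5, 0.1]
in_dist_b = [3.8, 0.6, 0.2]

# Hypothetical logits on an unfamiliar input: the models disagree.
ood_a = [2.0, 0.1, 1.5]
ood_b = [0.2, 2.2, 0.3]

score_in = kl_divergence(softmax(in_dist_a), softmax(in_dist_b))
score_out = kl_divergence(softmax(ood_a), softmax(ood_b))
print(score_in, score_out)
```

The intuition is that two differently initialized models tend to agree where the training data constrained them and drift apart elsewhere, so a larger divergence suggests an input further from the training set. Any threshold on the score would have to be tuned on held-out data.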
Read more...
In the first segment
we looked into how we could make a BPE-based
encoding, based not only on frequency in the dataset but directly on the
model's probability measure of the next token. In that article I mention that
dynamic BPE are costly because they stop being a one time operat…
Read more...