Beyond Distributed Compilation: A New Paradigm for Heterogeneous Computing

Disclaimer: This article was generated by artificial intelligence. Figure 1: The complexity of modern distributed systems. The Challenge: Consider the case of Large Language Models (LLMs). Today, we h…
Read more...

All we need is tokenizerless

What tokenization achieves: Tokenization gives models a form of temporal compression (read: outputting several characters as one timestep) while also increasing the entropy of those items. By definition, we use a compression algorithm to create the tokens, which means the …
Read more...

Deploying a Snapchat filter to the browser

TL;DR: In this article I explain how you can deploy a model directly to the browser from PyTorch by using ONNX.js. This work was done a year ago in about two weeks' time. Check out the full demo. Deploying a cool deep learning demo at zero cost: Ok, so when we are showcasi…
Read more...

Creating a translation app

TL;DR: Having recently moved to the Netherlands, and wanting to avoid running everything through Google Translate, I did the next best thing to learning the language: I created a clone of translate.google.com. Find a correct training loop: My first instinct was to check Hugging Face, as this repo contai…
Read more...

Super simple estimation of available solar energy

Solar energy. Stefan-Boltzmann law: $\text{Surface energy} = \sigma T^4$. For the sun, $T = 5{,}778\ \text{K}$ and $\sigma = 5.67 \times 10^{-8}\ \text{W}\,\text{m}^{-2}\,\text{K}^{-4}$. from sympy.physics.units import K, W, m, giga; sigma = 5.67 * 10**(-8) * W * m**(-2) * K**(-4); T = 5778 * K; s…
Read more...
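As a quick check of the Stefan-Boltzmann arithmetic quoted in the teaser above, here is a minimal sketch in plain Python. It uses bare floats instead of the `sympy.physics.units` quantities the article imports, and the function name `surface_flux` is a hypothetical helper introduced only for this illustration.

```python
# Stefan-Boltzmann law: power radiated per unit area of a black body.
# Plain floats are used here as a simplification; the article itself
# works with sympy unit objects.
SIGMA = 5.67e-8   # W m^-2 K^-4, value quoted in the excerpt
T_SUN = 5778.0    # K, solar surface temperature quoted in the excerpt


def surface_flux(temperature_k: float, sigma: float = SIGMA) -> float:
    """Return sigma * T**4 in W/m^2 (hypothetical helper for illustration)."""
    return sigma * temperature_k ** 4


flux = surface_flux(T_SUN)
print(f"Radiated flux at the solar surface: {flux:.3e} W/m^2")
```

With the two constants from the excerpt, the result is on the order of 6.3 × 10^7 W/m², which is the flux at the sun's surface and not what reaches the Earth.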

Can we train neural networks without gradient descent?

What's the problem? ML models are usually not capable of telling how close the data you feed them is to what was in the training set. This really matters for production models, as they might make really stupid mistakes just because the input is off the training set. …
Read more...

Running a Docker container with GPU enabled (for PyTorch and TensorFlow)

Sometimes, if you want to contain dependencies, you might want to use Docker to containerize your projects. You can also use it with GPUs. In order to run Docker images with GPU enabled, you are going to need: Install Docker: sudo apt-get install \ apt-transport-https \ ca-ce…
Read more...

Self KL-divergence for detecting out of distribution data and unsupervised text classification

TL;DR: By training two models on the same dataset in the same order, with the same architecture and the same loss but different initializations, I was able to obtain a consistent out-of-distribution detector by measuring the KL-divergence between the models' outputs. This out-of-distribution measure us…
Read more...
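To make the idea in the teaser above concrete, here is a minimal sketch of scoring one input by the KL-divergence between two models' output distributions. It is an illustration under stated assumptions, not the article's implementation: the logits are made-up numbers, the helper names `softmax` and `kl_divergence` are hypothetical, and the direction of the divergence (first model relative to second) is an arbitrary choice, since the excerpt does not specify it.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of the same length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Made-up logits standing in for the outputs of two independently
# initialized models on the same input (assumption for illustration).
agree_a, agree_b = [4.0, 0.5, -1.0], [3.8, 0.6, -0.9]        # models agree
disagree_a, disagree_b = [4.0, 0.5, -1.0], [-1.0, 0.5, 4.0]  # models disagree

kl_agree = kl_divergence(softmax(agree_a), softmax(agree_b))
kl_disagree = kl_divergence(softmax(disagree_a), softmax(disagree_b))

print(f"KL when the models agree:    {kl_agree:.4f}")
print(f"KL when the models disagree: {kl_disagree:.4f}")
```

The intuition being sketched is that two models trained the same way should agree on in-distribution inputs (low divergence) and are freer to disagree elsewhere (high divergence), so a threshold on this score could serve as a detector.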

Model based encodings (3)

In the first segment we looked into how we could make a BPE-based encoding, based not only on frequency in the dataset but directly on the model's probability measure of the next token. In that article I mention that dynamic BPEs are costly because they stop being a one-time operat…
Read more...

Model based encodings (2)

In the first segment we looked into how we could make a BPE-based encoding, based not only on frequency in the dataset but directly on the model's probability measure of the next token. In that article I mention that dynamic BPEs are costly because they stop being a one-time operat…
Read more...