Narsil

Beyond Distributed Compilation: A New Paradigm for Heterogeneous Computing

2025-06-04 - (7 min read)

Disclaimer: This article was generated by artificial intelligence. Figure 1: The complexity of modern distributed systems. The Challenge: Consider the case of Large Language Models (LLMs). Today, we h…

All we need is tokenizerless

2024-01-14 by nicolas - (3 min read)

What tokenization achieves: Tokenization enables models to have some form of temporal compression (read: outputting several characters as 1 timestep) while also increasing the entropy of those items. By definition we use a compression algorithm to create the tokens, which means the …

Deploying a Snapchat filter to the browser

2020-07-29 by nicolas - (8 min read)

TL;DR: In this article I explain how you can deploy a model directly to the browser from PyTorch by using ONNX. This work was done a year ago in about two weeks. Check out the full demo. Deploying a cool deep learning demo at zero cost: Ok, so when we are showcasi…

Creating a translation app

2020-07-27 by nicolas - (10 min read)

TL;DR: Having recently moved to the Netherlands, and in order to avoid Google-translating everything, I did the next best thing to learning the language: I created a clone of translate.google.com. Find a correct training loop: My first instinct was to check Hugging Face as this repo contai…

Super simple estimation of available solar energy

2020-03-19 - (15 min read)

Solar energy. Stefan-Boltzmann law: $\text{Surface energy} = \sigma T^4$. For the sun, $T = 5778\,\text{K}$ and $\sigma = 5.67 \times 10^{-8}\,\text{W}\,\text{m}^{-2}\,\text{K}^{-4}$.
from sympy.physics.units import K, W, m, giga
sigma = 5.67 * 10**(-8) * W * m**(-2) * K**(-4)
T = 5778 * K
s…
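As a quick sanity check of the formula in this excerpt, here is a minimal sketch that evaluates $\sigma T^4$ with plain floats. It deliberately does not use the sympy units from the post, and the function name `stefan_boltzmann_flux` is a hypothetical helper, not something from the article.

```python
# Stefan-Boltzmann constant in W * m^-2 * K^-4 (value quoted in the excerpt).
SIGMA = 5.67e-8


def stefan_boltzmann_flux(temperature_k: float) -> float:
    """Return the radiated power per unit surface area, in W/m^2.

    Hypothetical helper: applies sigma * T^4 for an ideal black body.
    """
    return SIGMA * temperature_k ** 4


if __name__ == "__main__":
    sun_temperature_k = 5778.0  # surface temperature quoted in the excerpt
    flux = stefan_boltzmann_flux(sun_temperature_k)
    # The result is on the order of 6.3e7 W/m^2 at the sun's surface.
    print(f"{flux:.3e} W/m^2")
```

This only covers the energy leaving the sun's surface; how much of it reaches the Earth is a separate step.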

Can we train neural networks without gradient descent?

2020-03-10 - (43 min read)

What's the problem? ML models are usually not capable of predicting how close the data you feed them is to what was in the training set. This really matters for production models, as they might make really stupid mistakes just because they are off the training set. …

Running Docker with GPU enabled (for PyTorch and TensorFlow)

2020-03-04 by nicolas - (2 min read)

Sometimes, if you want to contain dependencies, you might want to use Docker to containerize your projects. You can also use it for GPU workloads. In order to run Docker images with GPU enabled, you are going to need: Install docker: sudo apt-get install \ apt-transport-https \ ca-ce…

Self KL-divergence for detecting out of distribution data and unsupervised text classification

2020-02-26 - (154 min read)

TL;DR: By training two models on the same dataset, in the same order, with the same architecture and the same loss but different initializations, I was able to obtain a consistent out-of-distribution detector by measuring the KL divergence between the model outputs. This out-of-distribution measure us…
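To make the idea in this excerpt concrete, here is a minimal sketch of scoring one input by the KL divergence between the output distributions of two models. It assumes each model yields a list of logits for that input; the names `softmax`, `kl_divergence` and `ood_score`, the direction of the divergence, and the epsilon are all assumptions made for illustration, not the article's actual code.

```python
import math


def softmax(logits):
    """Turn raw logits into a probability distribution (numerically stable)."""
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of the same length.

    eps is an assumed smoothing term that avoids log(0).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def ood_score(logits_a, logits_b):
    """Hypothetical score: the more the two models disagree, the higher it is."""
    return kl_divergence(softmax(logits_a), softmax(logits_b))


if __name__ == "__main__":
    # Two models that agree give a score of (almost) zero.
    print(ood_score([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))
    # Two models that disagree strongly give a clearly positive score.
    print(ood_score([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0]))
```

In practice one would pick a threshold on this score from held-out data; that choice is not shown here.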

Model based encodings (3)

2019-08-06 by nicolas - (12 min read)

In the first segment we looked into how we could make a BPE-based encoding, based not only on frequency in the dataset but directly on the model's probability measure of the next token. In that article I mentioned that dynamic BPEs are costly because they stop being a one-time operat…

Model based encodings (2)

2019-06-06 by nicolas - (12 min read)

In the first segment we looked into how we could make a BPE-based encoding, based not only on frequency in the dataset but directly on the model's probability measure of the next token. In that article I mentioned that dynamic BPEs are costly because they stop being a one-time operat…