Reflections on Rising Nationalism

Toxic sentiment in society doesn’t arise in a vacuum. History shows that it tends to follow a perceived threat to a way of life, a reaction to change. You cannot remove the toxicity without addressing the underlying issues. But what are these?

Getting All the Books

This is a short post explaining how to obtain over 50,000 text books for your natural language processing projects. The source of these books is the excellent Project Gutenberg. Project Gutenberg offers the ability to sync the collection of books. To obtain the collection you can set up a private mirror as explained here. … Continue reading Getting All the Books →

Taming the Docker Blob

Or understanding how to best use Docker. Docker is a great way to build services with modular and changeable components without borking your server / computer. I like to think of Docker containers as a system version of Python's virtual environment - you can build a stack of services and applications through a Dockerfile, … Continue reading Taming the Docker Blob →

Sampling vs Prediction

Some things have recently been bugging me when applying deep learning models to natural language generation. This post contains my random thoughts on two of these: sampling and prediction. By writing this post, I hope to try to tease these apart in my head to help improve my natural language models. Sampling Sampling is the … Continue reading Sampling vs Prediction →
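The excerpt cuts off before the post defines its terms, so the following is only a minimal sketch of one common reading of the distinction: "prediction" as picking the single most probable next token, and "sampling" as drawing a token at random in proportion to its probability. The distribution and the function names are made up for illustration.

```python
import random


def predict(probs):
    """Greedy 'prediction': return the single most probable token."""
    return max(probs, key=probs.get)


def sample(probs, rng=random):
    """Sampling: draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token distribution, invented for this example.
next_token_probs = {"the": 0.5, "a": 0.3, "cat": 0.2}

rng = random.Random(0)  # fixed seed so repeated runs give the same draws
greedy = predict(next_token_probs)
draws = [sample(next_token_probs, rng) for _ in range(1000)]
```

Greedy prediction always returns the same token for the same distribution, whereas repeated sampling produces a mix of tokens roughly in proportion to their probabilities.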

QuickPost – Discovering SSH Devices on Home Network

As the home router uses DHCP, the IP addresses of the devices on the network often change. To find devices on the network that have ssh open on port 22, you can scan the address range starting at [Base IP Address], which is 192.168.1.0 for most routers and 10.0.1.0 for Apple routers.
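As a rough sketch of the same idea in Python, using only the standard library: walk the addresses in the subnet and try a TCP connection to port 22 on each. The /24 subnet size, the timeout, and the function names are assumptions made for this example, not details taken from the post.

```python
import ipaddress
import socket


def candidate_hosts(base_ip, prefix=24):
    """List the usable host addresses in the subnet (assumes a /24 by default)."""
    network = ipaddress.ip_network(f"{base_ip}/{prefix}", strict=False)
    return [str(host) for host in network.hosts()]


def has_open_port(host, port=22, timeout=0.3):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False


def find_ssh_devices(base_ip, port=22):
    """Scan the subnet one address at a time and keep hosts that accept connections."""
    return [host for host in candidate_hosts(base_ip) if has_open_port(host, port)]


# Example call (slow, since it tries every address in turn):
# find_ssh_devices("192.168.1.0")
```

This only checks that something accepts connections on port 22; it does not confirm that the service is actually ssh.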

Understanding Convolution in Tensorflow

This is a quick post intended to help those trying to understand convolution as applied in Tensorflow. There are many good blog posts on the Internet explaining convolution as applied in convolutional neural networks (CNNs), e.g. see this one by Denny Britz. However, understanding the theory is one thing, knowing how to implement it is … Continue reading Understanding Convolution in Tensorflow →
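To make the theory concrete, here is a minimal plain-Python sketch of the sliding-window operation that CNN layers usually call convolution (strictly speaking a cross-correlation, since the kernel is not flipped). It is not Tensorflow's API; the function name, the stride of 1, the lack of padding, and the toy values are all assumptions for illustration.

```python
def cross_correlate2d(image, kernel):
    """'Valid' 2D cross-correlation: stride 1, no padding, kernel not flipped."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            # Multiply the kernel element-wise with the window at (i, j) and sum.
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kernel_h)
                for dj in range(kernel_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 3x3 input and 2x2 kernel, made up for this example.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, -1],
]

result = cross_correlate2d(image, kernel)
# result == [[-4, -4], [-4, -4]]
```

Each output value is the top-left element of the window minus its bottom-right element, which is why every entry comes out as -4 for this evenly increasing input.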

Practical Problems with Natural Language Processing

Recently I've been playing around with the last 15 years of patent publications as a 'big data' source. This includes over 4 million individual documents. Here I thought I'd highlight some problems I faced. I found that a lot of academic papers tend to ignore or otherwise bypass this stuff.

Artificial Morality (or How Do We Teach Robots to Love)

One Saturday morning I came upon the website 80000 Hours. The idea of the site is to direct our activity to maximise impact. They have a list of world problems here. One of the most pressing is explained as the artificial intelligence "control problem": how do we control forces that can outthink us? This … Continue reading Artificial Morality (or How Do We Teach Robots to Love) →

Fixing Errors on Apache-Served Flask Apps

This is just a quick post to remind me of the steps to resolve errors on an Apache-served Flask app. I'm using Anaconda as I'm on Puppy Linux (old PC) and some compilations give me errors. Stuff in square brackets is for you to fill in. Log into remote server (I use ssh keys): ssh -p … Continue reading Fixing Errors on Apache-Served Flask Apps →

Using Alembic to Migrate SQLAlchemy Databases

There are several advantages to using SQLAlchemy as a wrapper for an SQL database. These include stability with large numbers of data records, a class/object-oriented approach, and plug-and-play underlying databases. However, one under-documented disadvantage is poor change management. If you add a field or table you generally need to regenerate the entire database. This is a pain if … Continue reading Using Alembic to Migrate SQLAlchemy Databases →
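For a sense of what an in-place schema change looks like, here is a minimal sketch using raw SQL through Python's built-in sqlite3 module. This is not Alembic's or SQLAlchemy's API, only an illustration of the kind of change a migration applies; the table and column names are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical table and one existing record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")

# Add a field in place rather than regenerating the whole database.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# Inspect the resulting schema and confirm the existing row survived.
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
rows = conn.execute("SELECT id, name, email FROM users").fetchall()
```

The existing record is kept and simply has no value in the new column, which is the behaviour you want when the database already holds data.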