I've loved spaCy for a long time but I've only just got my head around how you can structure a text processing pipeline to take full advantage of its power.
This post looks at what recent advances in natural language processing can teach us about the brain and cognition.
This is a short post explaining how to obtain over 50,000 text books for your natural language processing projects. The source of these books is the excellent Project Gutenberg. Project Gutenberg lets you sync its collection of books. To obtain the collection you can set up a private mirror as explained here. … Continue reading Getting All the Books
Recently I've been playing around with the last 15 years of patent publications as a 'big data' source. This includes over 4 million individual documents. Here I thought I'd highlight some problems I faced. I found that a lot of academic papers tend to ignore or otherwise bypass this stuff.