James M. Clawson

Word Vector Utilities

The wordVectors package from Ben Schmidt and templates from the Women Writers Project have been very useful as I begin working with word vector embeddings. To simplify some tasks and make results easier to explore, I wrote some utility functions; this post explains how to use them.

Selecting a Better Corpus

Building on the previous blog post, this one gathers more novels from more countries over more years. Compared to the last corpus, this one is huge, at 13,334 titles. (Yes, sometimes bigger is better.)

Selecting a Literary Corpus from Wikipedia

Choosing a sufficiently large collection of literary texts can be difficult, especially if one wants to avoid selection bias. Luckily, Wikipedia can help.

Why Blog, Why Today?

This is the perfect time not to start a blog, yet here I am. It feels necessary to start with a preface, with something to explain why I’m doing this, why I’m doing it now, and what I hope will result. In the spirit of the web, here’s a listicle of the Top Six Reasons I’ve Begun a Blog. (You won’t believe number three!)

