More reflections on the apparent “structuralism” in the Google dataset
In my last post, I argued that groups of related terms that express basic sensory oppositions (wet/dry, hot/cold, red/green/blue/yellow) have a tendency to correlate strongly with each other in the...
On different uses of structuralism; or, histories of diction don’t have to...
I’ve written several posts now on the way related terms (especially simple physical adjectives) tend to parallel each other in the Google dataset. The names of primary colors rise and fall together. So...
Several varieties of noise, and the theme to Love Story.
I’ve asserted several times that flaws in optical character recognition (OCR) are not a crippling problem for the English part of the Google dataset, after 1820. Readers may wonder where I get that...
How to make the Google dataset work for humanists.
I started blogging about the Google dataset because it revealed stylistic trends so intriguing that I couldn’t wait to write them up. But these reflections are also ending up in a blog because they...
Identifying topics with a specific kind of historical timeliness.
Benjamin Schmidt has been posting some fascinating reflections on different ways of analyzing texts digitally and characterizing the affinities between them. I’m tempted to briefly comment on a...
The Google dataset as an episode in the history of science.
In a few years, some enterprising historian of science is going to write a history of the “culturomics” controversy, and it’s going to be fun to read. In some ways, the episode is a classic model of...
Trends, topics, and trending topics.
I’ve developed a text-mining strategy that identifies what I call “trending topics” — with apologies to Twitter, where the term is used a little differently. These are diachronic patterns that I find...
Words that appear in the same 18c volumes also track each other over time,...
I wrote a long post last Friday arguing that topic-modeling an 18c collection is a reliable way of discovering eighteenth- and nineteenth-century trends, even in a different collection. But when I woke...
Exploring the relationship between topics and trends.
I’ve been talking about correlation since I started this blog. Actually, that was the reason why I did start it: I think literary scholars can get a huge amount of heuristic leverage out of the fact...
How not to do things with words.
In recent weeks, journals published two papers purporting to draw broad cultural inferences from Google’s ngram corpus. The first of these papers, in PLoS One, argued that “language in American books...