Code #

Most of my coding projects and replication repositories are available on my GitLab, and several text analysis R packages are available in the Cultural Cartography GitLab project. I’ve moved away from using GitHub, but several projects are still hosted there.

text2map #

In collaboration with Marshall A. Taylor, we built an R package oriented around various kinds of text matrices. It cannibalizes the functions in our CMDist package (which is now deprecated).

You can find out much more about the package here: culturalcartography.gitlab.io/text2map/

To install the text2map R package:

install.packages("text2map")
library(text2map)

We also developed several related modules, which are hosted on GitLab.

Concept Mover’s Distance #

In collaboration with Marshall A. Taylor, we develop a method for measuring a text’s engagement with a focal concept using word embeddings. For example, which of Shakespeare's plays engages most with the concept of death?

Concept Mover’s Distance determines a document's engagement with a concept as the minimum distance the words in the document need to travel to arrive at the position, determined by word embeddings, of an ideal "pseudo document" consisting only of words denoting a specified concept.
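To make the idea concrete, here is a minimal base R sketch using made-up two-dimensional "embeddings" and a one-word pseudo document. With a single concept word, every word in the document has to travel to that word, so the moving cost reduces to the frequency-weighted mean distance. All values and names (`wv`, `cos_dist`, `moving_cost`, the toy documents) are assumptions for illustration; this is not the package's implementation, which works with real embeddings and reports a standardized engagement score.

```r
# Toy 2-D "embeddings" (invented values, for illustration only)
wv <- rbind(
  death  = c(1.0, 0.0),
  grave  = c(0.9, 0.2),
  murder = c(0.8, 0.3),
  love   = c(0.0, 1.0),
  feast  = c(0.1, 0.9)
)

# Cosine distance between two word vectors
cos_dist <- function(a, b) {
  1 - sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Cost of moving a document to a one-word pseudo document:
# the frequency-weighted mean distance of its words to the concept word
moving_cost <- function(counts, concept, wv) {
  weights <- counts / sum(counts)
  dists <- vapply(
    names(counts),
    function(w) cos_dist(wv[w, ], wv[concept, ]),
    numeric(1)
  )
  sum(weights * dists)
}

# Two hypothetical documents as word counts
tragedy <- c(grave = 2, murder = 3, love = 1)
comedy  <- c(love = 3, feast = 2, grave = 1)

cost_tragedy <- moving_cost(tragedy, "death", wv)
cost_comedy  <- moving_cost(comedy, "death", wv)

# A lower moving cost means closer engagement with the concept
cost_tragedy < cost_comedy
```

In this toy setup the "tragedy" document is cheaper to move to "death" than the "comedy" document, so it engages more with the concept.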

We have recently expanded functionality. The package now allows for averaging words to create a "semantic centroid" and for creating "semantic directions" by juxtaposing pairs of terms (e.g., women - men or rich - poor). The function then calculates the distance of moving a document to these vectors. We have also demonstrated how CMD can be combined with correlational class analysis to group documents based on schematic similarities ("Concept Class Analysis").
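As a rough illustration of these two constructions, the base R sketch below builds a centroid by averaging word vectors and a direction by averaging the offsets between paired terms. The embeddings are invented two-dimensional values, and averaging paired offsets is only one simple way to build a direction; the names `wv`, `centroid`, `direction`, and `cos_sim` are assumptions for this example rather than the package's own code.

```r
# Toy 2-D "embeddings" (invented values, for illustration only)
wv <- rbind(
  rich         = c( 0.9, 0.1),
  wealthy      = c( 0.8, 0.2),
  poor         = c(-0.8, 0.1),
  impoverished = c(-0.9, 0.3)
)

# Semantic centroid: the average of several word vectors
centroid <- colMeans(wv[c("rich", "wealthy"), , drop = FALSE])

# Semantic direction: the average offset between juxtaposed term pairs
# (rich - poor, wealthy - impoverished)
add      <- c("rich", "wealthy")
subtract <- c("poor", "impoverished")
direction <- colMeans(
  wv[add, , drop = FALSE] - wv[subtract, , drop = FALSE]
)

# Cosine similarity of a word vector with the direction
cos_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

sim_wealthy      <- cos_sim(wv["wealthy", ], direction)
sim_impoverished <- cos_sim(wv["impoverished", ], direction)
```

Words near the "rich" pole point along the direction (positive similarity), while words near the "poor" pole point against it (negative similarity).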

For an overview of the package features and a quick start guide, see the CMDist Quick Start Guide on GitHub Pages.

To jump right in, run the following code to install and load the text2map R package, which provides the CMDist function:

install.packages("text2map")
library(text2map)

The method is detailed in our papers in the Journal of Computational Social Science, "Concept Mover’s Distance: Measuring Concept Engagement in Texts via Word Embeddings" and "Integrating Semantic Directions with Concept Mover's Distance," and in a Sociological Science paper, "Concept Class Analysis: A Method for Identifying Cultural Schemas in Texts."

You can find the code to reproduce the original paper on GitHub at github.com/dustinstoltz/concept_movers_distance_jcss, the follow-up note at github.com/Marshall-Soc/cmd_geometry, and "Concept Class Analysis" at github.com/Marshall-Soc/CoCA.

Brickplot #

With Michael Lee Wood, and based on his Socius paper visualizing religious service attendance over time, we created an R package that easily plots distributions in discrete, ordered variables, by stacking barplots — like laying bricks.

You can find the code on GitLab: https://gitlab.com/woodstoltz/brickplot

Run the following code to use the brickplot R package:

devtools::install_gitlab("woodstoltz/brickplot")
library(brickplot)

Textual Spanning #

Also in collaboration with Marshall A. Taylor, we develop a measure which increases when a document is similar to documents which are not also similar to each other (and vice versa). This measure is particularly well-suited for the unique properties of text networks built from document similarity matrices when considered as dense weighted graphs. The paper “Textual Spanning: Finding Discursive Holes in Text Networks” can be found in Socius.
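The intuition can be sketched in base R with Burt's classic network constraint applied to a small document similarity matrix. This is a related, older measure swapped in for illustration, not the formula from the paper or the textSpan package: constraint is low when a document is similar to documents that are not similar to each other, which is the situation in which a spanning measure should be high. The matrices and the function name `burt_constraint` are assumptions made up for this example.

```r
# Burt's constraint on a symmetric similarity matrix S
# (a classical measure used here for illustration only)
burt_constraint <- function(S) {
  diag(S) <- 0
  P <- S / rowSums(S)      # proportional similarity p_ij
  indirect <- P %*% P      # sum over q of p_iq * p_qj
  total <- P + indirect
  diag(total) <- 0         # drop each document's tie to itself
  rowSums(total^2)
}

docs <- c("A", "B", "C")

# A is similar to B and C, which are not similar to each other
bridge <- matrix(
  c(0.0, 0.8, 0.8,
    0.8, 0.0, 0.0,
    0.8, 0.0, 0.0),
  nrow = 3, byrow = TRUE, dimnames = list(docs, docs)
)

# A, B, and C are all similar to one another
clique <- matrix(
  c(0.0, 0.8, 0.8,
    0.8, 0.0, 0.8,
    0.8, 0.8, 0.0),
  nrow = 3, byrow = TRUE, dimnames = list(docs, docs)
)

constraint_bridge <- burt_constraint(bridge)
constraint_clique <- burt_constraint(clique)

# Document A is less constrained when its neighbors are dissimilar
constraint_bridge["A"] < constraint_clique["A"]
```

Document A sits over a gap between B and C in the first matrix, so its constraint is lower there than in the fully connected case.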

Run the following code to use the textSpan R package:

devtools::install_github("dustinstoltz/textSpan")
library(textSpan)