The R-factor

The R-factor generates reports for scientific papers, using machine learning to classify citations as confirming, refuting, or mentioning. This project was developed through an LLC I co-founded, Verum Analytics.



statcheck.io is a web implementation of the popular statcheck program.



Snowbank is designed to make parallel processing in R (using the snow/snowfall packages) easier and more accessible, particularly for those who don't have access to a large cluster (for example, if you just want to use the spare CPU cycles from a few workstations in your lab). Snowbank is a (relatively) compact virtual machine that can be loaded onto commodity hardware and used as a processing node. I've found this technique invaluable when performing computationally intensive analyses, such as bootstrapping and other "embarrassingly parallel" tasks.

The project is in its early stages, so a bit of end-user configuration (e.g., setting up network interfaces and host names) is necessary. Bash scripts to simplify the process are forthcoming. Until then, the downloadable VM should provide basic functionality: just make it accessible to your primary node and use the standard snowfall commands to include it in your cluster.
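To make "embarrassingly parallel" concrete: each bootstrap replicate depends only on the data and its own random draws, so replicates can be split into chunks and farmed out to workers with no communication between them. snowfall itself is an R package, so the following is only a Python analogue on a single machine (using the standard library's `ProcessPoolExecutor` in place of a Snowbank cluster); the function names and the per-worker seeding scheme are illustrative assumptions, not part of Snowbank.

```python
import random
import statistics
from concurrent.futures import ProcessPoolExecutor


def split_reps(total_reps, workers):
    """Divide total_reps as evenly as possible across the workers."""
    base, extra = divmod(total_reps, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]


def bootstrap_chunk(data, n_reps, seed):
    """Compute n_reps bootstrap means using an RNG owned by this chunk alone."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_reps)]


def parallel_bootstrap(data, total_reps=1000, workers=4, base_seed=0):
    """Run the chunks in separate processes; chunks share no state."""
    chunks = split_reps(total_reps, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(bootstrap_chunk, data, reps, base_seed + i)
            for i, reps in enumerate(chunks)
        ]
        # Concatenate results in submission order, so output is reproducible.
        return [m for f in futures for m in f.result()]


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 4.4, 3.0, 2.7, 3.8, 2.2]
    means = parallel_bootstrap(sample, total_reps=2000, workers=4)
    cuts = statistics.quantiles(means, n=40)  # cuts[0] ~ 2.5%, cuts[-1] ~ 97.5%
    print("approx. 95% percentile interval:", cuts[0], cuts[-1])
```

Giving each chunk its own seed keeps the run reproducible regardless of which worker finishes first; adding nodes only changes how the chunks are distributed, not the code that computes them.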



psyLex is a set of text-processing functions that extract psychological meaning from large corpora using LIWC-style dictionaries. Implementations are currently underway for both Python and R. (Actually, they're pretty much done; it's just a matter of shooing out all the flying monkeys.)
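As a rough illustration of what "LIWC-style" dictionary scoring involves, the sketch below maps words to categories, treats a trailing `*` as a stem (prefix) match, and reports each category as a percentage of total words. This is an assumption about the general approach, not psyLex's actual API; `load_dictionary`, `score_text`, and the toy category names are hypothetical.

```python
import re
from collections import Counter


def load_dictionary(entries):
    """entries: hypothetical mapping of category -> list of words.

    A trailing '*' marks a stem, so 'love*' matches 'love', 'loved', 'lovely', ...
    """
    exact, prefixes = {}, []
    for category, words in entries.items():
        for w in words:
            w = w.lower()
            if w.endswith("*"):
                prefixes.append((w[:-1], category))
            else:
                exact.setdefault(w, set()).add(category)
    return exact, prefixes


def score_text(text, dictionary):
    """Return {category: percent of all tokens that fall in that category}."""
    exact, prefixes = dictionary
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        cats = set(exact.get(tok, ()))
        for stem, cat in prefixes:
            if tok.startswith(stem):
                cats.add(cat)
        counts.update(cats)  # a token counts at most once per category
    total = len(tokens)
    return {cat: 100.0 * n / total for cat, n in counts.items()} if total else {}


d = load_dictionary({"posemo": ["happy", "love*"], "negemo": ["sad", "hate*"]})
print(score_text("I love my happy dog but hate rain", d))
```

The linear scan over stems is the simplest correct choice; for large dictionaries applied to large corpora, a trie or a sorted stem list would avoid checking every stem against every token.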



spooner is actually more of a service than software. The problem at hand is the restrictive licensing agreements associated with scholarly publishing. Researchers often want to make their published articles available over the Internet, but doing so frequently violates publishers' agreements or copyright law.

The idea behind spooner is to get around these restrictions. Authors will commonly post descriptions of their journal articles, along with an "e-mail me for this paper" link. The problem is that (a) it's hardly immediate gratification, and (b) the author has to spend his/her time fielding e-mailed requests for articles.

Using spooner, you simply provide your website visitors with a link that will allow them to enter their e-mail address and have a particular article e-mailed to them automatically.
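The core of that workflow is turning a request (an article ID plus the visitor's address) into an e-mail with the paper attached. The sketch below builds such a message with Python's standard `email` package without sending it; the `catalogue` mapping, the function name, and the deliberately crude address check are hypothetical, and this is not spooner's actual implementation.

```python
from email.message import EmailMessage


def build_article_email(article_id, recipient, sender, catalogue):
    """Build (but do not send) an e-mail carrying the requested article as a PDF.

    catalogue is a hypothetical mapping: article_id -> (title, pdf_bytes).
    """
    if "@" not in recipient:  # crude sanity check only; not real address validation
        raise ValueError(f"not an e-mail address: {recipient!r}")
    title, pdf_bytes = catalogue[article_id]
    msg = EmailMessage()
    msg["Subject"] = f"Requested article: {title}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Here is the article you requested:\n\n{title}\n")
    # Attaching turns the message into multipart/mixed: text body + PDF.
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                       filename=f"{article_id}.pdf")
    return msg
```

Actually delivering the message could be as simple as `with smtplib.SMTP("localhost") as s: s.send_message(msg)`, assuming a local mail server; a public-facing service would also want rate limiting so the form can't be used to flood arbitrary inboxes.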



I'm extremely paranoid. So when I had to design a scalable on-site data entry program for a study of headaches, I wanted some level of redundancy. In the past, this kind of redundancy could be achieved by simply logging terminal sessions. Not anymore: with windowing systems, a text log no longer captures what actually happened on screen.

I wrote Specto to ameliorate my paranoia. It is a Windows program that sits in the background, takes a screenshot at a specified interval, and saves it to your hard drive. It's useful in a variety of applications; I've also found it handy while running analyses, for those "wait, how did I create that variable?" moments.
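The interval-capture idea reduces to a small loop: grab the screen, save it under a timestamped name, sleep, repeat. Specto is a Windows program whose source isn't shown here, so this is only a Python sketch of that loop. The `capture` callback is a hypothetical stand-in for the actual screen grab, and `clock`/`sleep` are injectable so the loop can be exercised without waiting.

```python
import time
from datetime import datetime
from pathlib import Path


def snapshot_path(when, directory):
    """Timestamped file name; microseconds avoid collisions at short intervals."""
    return Path(directory) / f"specto_{when:%Y%m%d_%H%M%S_%f}.png"


def run_capture_loop(capture, directory, interval_s, max_shots=None,
                     clock=datetime.now, sleep=time.sleep):
    """Call capture(path) every interval_s seconds and return the paths used.

    capture: hypothetical callback that grabs the screen and writes it to path.
    directory is assumed to exist already; max_shots=None runs until interrupted.
    """
    saved = []
    while max_shots is None or len(saved) < max_shots:
        path = snapshot_path(clock(), directory)
        capture(path)
        saved.append(path)
        # Don't sleep after the final shot when a limit is set.
        if max_shots is None or len(saved) < max_shots:
            sleep(interval_s)
    return saved
```

On Windows, one possible `capture` is `lambda p: ImageGrab.grab().save(p)` using Pillow's `ImageGrab` module. Because the loop sleeps a fixed interval after each capture, the actual spacing drifts slightly by the capture time, which is usually fine for an audit trail.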



There are a lot of options for researchers who wish to collect data over the Internet. Websites have quickly become the predominant way social scientists administer questionnaires, even in a lab setting (where a paper-and-pencil format had previously been the order of the day). However, daily diary studies present unique challenges not met by existing software packages. Participants have to be tracked across time and reminded to participate, and reports are often needed to assign credit based on the number of times a participant completed a diary entry.

Ephemerix is a suite of programs designed to manage participants in diary studies and remind them to complete their entries at a specified interval.
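The bookkeeping behind reminders and credit can be sketched by dividing the study into fixed-length periods counted from a start date: anyone without an entry in the current period gets a reminder, and credit is the number of distinct periods with at least one entry. These rules, the `Participant` record, and the function names are illustrative assumptions, not Ephemerix's actual design.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Participant:
    pid: str
    email: str
    entries: list = field(default_factory=list)  # dates on which an entry was submitted


def period_index(day: date, start: date, interval_days: int) -> int:
    """Which diary period (0, 1, 2, ...) a date falls into."""
    return (day - start).days // interval_days


def due_for_reminder(participants, today, start, interval_days=1):
    """IDs of participants with no entry in the period containing today."""
    current = period_index(today, start, interval_days)
    return [p.pid for p in participants
            if all(period_index(d, start, interval_days) != current for d in p.entries)]


def credit_report(participants, start, end, interval_days=1):
    """Distinct periods with at least one entry between start and end (inclusive).

    Counting periods rather than raw entries keeps duplicate submissions
    from earning extra credit (an assumed policy).
    """
    return {
        p.pid: len({period_index(d, start, interval_days)
                    for d in p.entries if start <= d <= end})
        for p in participants
    }
```

Keeping the period arithmetic in one helper means a daily study and, say, an every-three-days study share the same reminder and credit logic; only `interval_days` changes.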