R for psychometrics

April 22, 2019

If you are dealing with data science, which psychometrics most definitely is, you’ve probably come across R. It is an environment that allows you to implement packages for many different types of analysis, which are built by a massive community of data scientists around the world.

R has become one of the two main languages for data science and machine learning (the other being Python) and its popularity has grown drastically. R for psychometrics is also becoming common.

I was extremely anti-R for several years, but have recently started using it for several important reasons. However, for some even more important reasons, I don’t use it for all of my work. I’d recommend you do the same.

What is R?

R is a programming language-like environment for statistical analysis. Its Wikipedia page defines it as a “programming language and free software environment for statistical computing and graphics”. But I use the term “programming-language-like environment”.

This is because it is more like command scripting from DOS than an actual compiled language like Java or Pascal. R has an extremely steep learning curve compared to software that provides a decent UI; it claims that RStudio is a UI, but it really is just a more advanced window to see the same command code!

R can be maddening for several reasons. For example, it will not recognize a missing value in data when running a simple correlation and is unable to give you a decent error message explaining this. This was my first date with R, and turned me off for years. A similar thing occurred to me the first time I used PARSCALE in 2009 and couldn’t get it to work for days. Eventually,, I discovered it was because the original codebase was DOS, which limits you to 8-character-file names, which is nowhere in the documentation! They literally expected all users to be knowledgeable on 1980s DOS rules in 2009.

BUT… R is free, and everybody likes free. Even though free never means there is no cost.

What are packages?

R comes with some analysis out of the box, but the vast majority is available in packages. For example, if you want to do factor analysis or item response theory, you install one of several packages that do those. These packages are written by contributors and uploaded to an R server somewhere.

There is no code review or anything else to check the packages, so it is entirely a caveat emptor situation. This isn’t malicious, they’re just taking the scientific approach that assumes other researchers will replicate, disprove, or alternative work.

For important, commonly used packages (I am a huge fan of caret), this is most definitely the case. For rarely used packages and pet projects, it is the opposite.

Why do I use R for psychometrics or elsewhere?

As mentioned earlier, I use R when there are well-known packages that are accepted in the community. The caret package is a great example. Just Google “r caret” and you can get a glimpse of the many resources, blog posts, papers, and other uses of the package. It isn’t an analysis package usually, it just makes it easier to call existing, proven packages. Another favorite is the text2vec package, and of course, there is the ubiquitous tidyverse.

I love to use R in cases of more general data science problems because this means a community several orders of magnitude above psychometricians, which definitely contributes to the higher quality. The caret package is for regression and classification, which are used in just about every field.

The text2vec package is for natural language processing, used in fields as diverse as marketing, political science, and education. One of my favorite projects I’ve heard come across was the analysis of the Jane Austen corpus. Fascinating.

When would I use R packages that I might consider less stellar?

I don’t mind using R when it is a low-stakes situation such as exploratory data analysis for a client. I would also consider it an acceptable alternative to commercial software when the analysis is something I do very rarely. No way am I going to pay $10,000 or whatever for something I do 2 hours per year.

Finally, I would consider it for niche analyses where no other option exists except to write my own code, and it does not make financial sense to do so. However, in these cases I still try to perform due diligence.

Why I don’t Use R

Often it comes down to a single word: quality.

For niche packages, the code might be 100% from a grad student who was a complete newbie on a topic for their thesis, with no background in software development or ancillary topics like writing a user manual. Moreover, no one has ever validated a single line of the code. Because of this, I am very wary when using R packages.

If you use one like this, I highly recommend you do some QA or background research on it first! I would love it if R had a community rating system like exists for WordPress plugins. With those, you can see that one plugin might be used on 1,000,000 sites with a 4.5/5.0 rating, while another is used on 43 sites with a 2.7/5.0 rating.

This quality thing is of course a continuum. There is a massive gap between the grad student project and something like caret. In between, you might have an R package that is a hobby of a professor who devotes some time to it and has extremely deep knowledge of the subject matter—but it remains a part-time endeavor by someone with no experience in commercial software. For example of this situation, please see this comparison of IRT results with R vs professional tools.

The issue on User Manuals is of particular concern to me as someone that provides commercial software and knows what it is like to support users. I have seen user manuals in the R world that literally do not tell the users how to use the package. They might provide a long-winded description of some psychometrics, obviously copied from a dissertation, as a “manual”, when at best it only belongs as an appendix. No info on formatting of input files, no provision of example input, no examples of usage, and no description of interpreting the output.

Even in the cases of an extremely popular package that has high-quality code, the documentation is virtually unreadable. Check out the official landing page for tidyverse. How welcoming is that? I’ve found that the official documentation is almost guaranteed to be worthless – instead, head over to popular blogs or YouTube channels on your favorite topic.

The output is also famously below average regarding quality.

R stores its output as objects, a sort of mini-database behind the scenes. If you want to make graphs or dump results to something like a CSV file, you have to write more code just for such basics. And if you want a nice report in Word or PDF, get ready to write a ton of code, or spend a week doing copy-and-paste. I noticed that there was a workshop a few weeks ago at NCME (April 2019) that was specifically on how to get useful output reports from R, since this is a known issue.

Is R turning the corner?

I’ve got another post coming about R and how it has really turned the corner because of 3 things: Shiny, RStudio, and availability of quality packages. More on that in the future, but for now:

Shiny allows you to make applications out of R code so that the power of R can be available to end-users without them having to write & run code themselves. Until Shiny, R was limited to people who wanted to write & run code.
RStudio makes it easier to develop R code, by overlaying an integrated development environment (IDE) on top of R. If you have ever used and IDE, you know how important this is. You’ve got to be incredibly clueless to not use an IDE for development. Yet the first release of RStudio did not happen until 2011. This shows how rooted R was in academia.
As you might surmise from my rant above, it is the quality packages (and third-party documentation!) that are really opening the floodgates.

Another hope for R is jumping on the bandwagon of the API economy. It might become the lingua franca of the data analytics world from an integration perspective. This might be the true future of R for psychometrics.

But there are still plenty of issues. One of my pet peeves is the lack of quality error trapping. For example, if you do simple errors, the system will crash with completely worthless error messages. I found this to happen if I run an analysis, open my output file, and run it again when forgetting to close the output file. As previously mentioned, there is also the issue with a single missing data point in a correlation.

Nevertheless, R is still not really consumer facing. That is, actual users will always be limited to people that have strong coding skills AND deep content knowledge on a certain area of data science or psychometrics. Just like there will always be a home for more user-friendly statistical software like SPSS, there will always be a home for true psychometric software like Xcalibre.

Nathan Thompson

Nathan Thompson earned his PhD in Psychometrics from the University of Minnesota, with a focus on computerized adaptive testing. His undergraduate degree was from Luther College with a triple major of Mathematics, Psychology, and Latin. He is primarily interested in the use of AI and software automation to augment and replace the work done by psychometricians, which has provided extensive experience in software design and programming. Dr. Thompson has published over 100 journal articles and conference presentations, but his favorite remains https://scholarworks.umass.edu/pare/vol16/iss1/1/ .

Ready to talk to an assessment expert?

Get in touch, and we'll meet to discuss how we can improve your exam development, delivery, and psychometrics!

Request a Consultation