Match Calculator

A few people have contacted me recently to let me know that the old link for MatchCalculator isn't working any more. Apologies for this, and for not addressing this more rapidly. The program is still there, but at some point I think my university's firewall decided to stop executables from being transmitted across its domain (it's only for your protection, you know!). Anyway, for those game enough to take the risk of computing match values, I've now added this link to a zipped version of the same executable. If you're still having problems downloading that, let me know.

This program has been around since late 2000, when I changed the match computation in my SOLAR model from the rather complicated method described in my PhD thesis (which was based on Nigrin's, 1993, confidence values) to the somewhat simpler (not to mention theoretically superior) method used in the current model, which is based on the harmony of signal-weight differences (some details can be found in Davis & Bowers, 2004, 2006). Although the calculation is fairly straightforward, it's still not exactly back-of-the-envelope stuff (especially where letter transpositions are involved), and so I wrote this program so that anyone could determine the spatial coding match between two letter strings. It seemed like a good idea to add some of the older position-specific coding schemes for comparison purposes. Subsequently, other coding schemes were introduced, and I adapted the program to include these, for my own use. Of course, no one should expect my program to be the repository for the latest version of any other modeller's coding scheme, so the other schemes should be used with appropriate care. Furthermore, please note that this simple program is NOT a program for running simulations of models (see below)!

I've never provided any documentation with MatchCalc, because I thought it was fairly self-explanatory. However, some common questions have arisen over the years, so the following is my attempt at an FAQ.

What is a match value? A match value is a value between 0 and 1 that represents the match between two letter strings, where 0 indicates no match and 1 indicates a perfect match.

Which string do I enter in which box? That's up to you, and your purpose in using the program, but here's my interpretation. One letter string represents the input stimulus (the letter string that the reader is fixating on, if you will). The other letter string represents the word coded by a particular word detector (if you're not a localist, you might not like such a term, but you'll probably be able to substitute something more palatable). I call this word the comparison word. That is, the word detector is "looking" at the input and performing the calculations to see how well this input stimulus matches the comparison word.

Why does it matter which string is which? (i.e., why are the match computations asymmetric?) Consider the simplest scheme, the absolute position coding scheme. If the word detector codes the word CAP, and the input is the word CAPE, then it's happy: it has all the input that it requires (the three letters C, A, and P in the right positions) to compute a perfect match of 1. Now suppose that the input is CAP, and the comparison word is CAPE. This time the word detector isn't quite as happy, as it's missing the letter E. Instead, it has a match of 3 out of 4 letters, or .75.
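To make the asymmetry concrete, here is a minimal Python sketch of an absolute position match. It is not the MatchCalc source code. The function name is hypothetical, and the rule it uses (count the letters that agree position by position, then divide by the length of the comparison word) is an assumption inferred from the CAP/CAPE example above, not a full specification of the scheme.

```python
def absolute_position_match(input_string: str, comparison_word: str) -> float:
    """Illustrative absolute position match (assumed rule, see lead-in).

    The word detector for `comparison_word` checks how many of its own
    letters appear in the same position in `input_string`.
    """
    if not comparison_word:
        raise ValueError("comparison word must not be empty")

    stimulus = input_string.upper()
    target = comparison_word.upper()

    # zip() stops at the shorter string, so extra input letters are ignored
    # and missing input letters simply contribute no hits.
    hits = sum(1 for seen, wanted in zip(stimulus, target) if seen == wanted)

    # Normalising by the comparison word's length is what makes the
    # computation asymmetric.
    return hits / len(target)


if __name__ == "__main__":
    print(absolute_position_match("CAPE", "CAP"))  # detector for CAP, input CAPE
    print(absolute_position_match("CAP", "CAPE"))  # detector for CAPE, input CAP
```

Swapping the two arguments changes the denominator, which is why the order of the two strings matters.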

How do I submit a list of items? You can do this by using the Process File button. The input file should be a text file with two letter strings per line, and a tab separating them.
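If you want to check a file before handing it to Process File, something like the following Python sketch may help. The function name is hypothetical, and skipping blank lines is my assumption for the sketch; MatchCalc itself may or may not tolerate them. Which of the two strings goes first on each line isn't specified here, so the sketch just keeps them in file order.

```python
from typing import Iterable, List, Tuple


def parse_pairs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Read lines of the form 'string1<TAB>string2' into a list of pairs.

    Blank lines are skipped (an assumption of this sketch). Any other line
    that does not contain exactly two non-empty, tab-separated strings
    raises ValueError with the offending line number.
    """
    pairs = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 2 or not all(fields):
            raise ValueError(
                f"line {number}: expected two letter strings separated by a tab"
            )
        pairs.append((fields[0], fields[1]))
    return pairs


if __name__ == "__main__":
    example = ["cap\tcape\n", "able\taxle\n", "azle\taxle\n"]
    for first, second in parse_pairs(example):
        print(first, second)
```

To check a real file, pass an open file object, e.g. `parse_pairs(open("pairs.txt", encoding="utf-8"))`, where `pairs.txt` is a placeholder name.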

What do the SOLAR (spatial coding) parameters mean? In the basic spatial coding scheme, there's only a single parameter, called sigma. This parameter represents the degree of letter position uncertainty (a typical value is 1.25). Over the years I've occasionally tinkered with additional parameters implementing different ways to have the match be sensitive to serial position (in particular, the special status enjoyed by the initial letter). In the current version, the ILW parameter has this purpose (ILW stands for initial letter weight). Note that it doesn't have any effect unless the End Letter Units box is ticked. I'd still prefer to have a model that didn't have any position-specificity in its letter representations. So, it would be wise to think of this aspect of the match calculation as continuing to be a work in progress.
While I get this, and other things, just right, the SOLAR paper continues to be under revision (I can provide a copy of the paper submitted to Psych Review a few years ago, for anyone who's really interested). I'm confident that both the paper and the simulation program will be available before too much longer. In the meantime the appendices of Davis & Bowers (2004, 2006, in JEP:HPP) provide a description of how the match calculation works.

What are the other coding schemes? The first four schemes are various different types of absolute position-specific coding schemes. It should be fairly self-evident how these work. They're included chiefly for historical interest. It would probably be a nice idea to include calculations for the overlap model (Gomez, Perea, & Ratcliff, submitted), and maybe one day I'll get around to adding this.
The final three schemes are varieties of open-bigram coding scheme (e.g., Grainger & van Heuven, 2004; Schoonbaert & Grainger, 2004; Grainger & Whitney, 2004; Whitney, 2001, 2004). SERIOL2001 is a faithful rendition of the computations originally described by Whitney (2001). However, this model has now been superseded by a revised version; Carol Whitney's (2004) thesis is probably the best source for information on this revised model. The other open-bigram coding scheme isn't really a current scheme, but is based on the constrained version of open-bigram coding originally proposed by Jonathan Grainger, Walter van Heuven and colleagues. The latter scheme has been developed further by Walter recently, and at some point I will try to include a current version of this scheme in MatchCalc; in the meantime you should probably contact Walter directly if you'd like information pertaining to this scheme.

What about that method described by Van Orden (1987)? You can find an implementation of that method here.

What's the difference between match values and simulations of models? I must confess to having some reservations about the way this program has been getting used of late. As a tool for understanding the various orthographic coding schemes, and how they compute matches, I think it serves a useful purpose. However, I'm always at pains to point out that a match value is not, in itself, a prediction. In models like SOLAR, the match value computed by a word detector serves as its bottom-up input. But predictions about things like lexical decision latencies will depend on a range of other factors, notably competitive interactions amongst orthographically/phonologically similar candidates. For example, ABLE and AZLE are both neighbours of the word AXLE, and each of the coding schemes implemented in this program computes equivalent matches for (AXLE, ABLE) and (AXLE, AZLE). But this most certainly doesn't mean that SOLAR (or any other model) predicts that these pairs should always function equivalently. For example, SOLAR predicts that the prime "able" inhibits AXLE, whereas the prime "azle" facilitates AXLE, and this is what we find in readers also, as Steve Lupker & I have shown (Davis & Lupker, 2006, JEP:HPP).

In particular, when using this program, please remember that you are simply examining the bottom-up calculations associated with particular coding schemes. You are not simulating the SOLAR/SERIOL/OB coding models!

How do I compute phonological matches? The program includes an option to compute phonological matches. This isn't something I ever do, and I have even greater reservations about these matches than about the misapplication of the orthographic match scores. Frankly, I'm really not sure what the point of this option is, as none of the calculations implemented in the program are based on theoretical models of phonological processing. Still, I've been asked to make this option available, so here it is, in the hope that some people might find it useful. If you want to use it, you need to use DISC codes (you can look at the output of my N-Watch program for help with that).

How much does this program cost? I've always made the software utilities that I've written available to other researchers free of charge, in the hope that they may be useful. I have no plans to change this policy. However, at the time I'm writing this (late 2007), I am an expectant father living on a humble British academic's salary in one of the world's most expensive cities. So, if you're using this program for funded research, and you have funds for things like software, I wouldn't necessarily object to a small financial contribution; indeed, I'd be most grateful.