Education, to me, is the never-ending opportunity we have for a cycle of instruction and assessment.  This can range from the extremely small scale (watching a YouTube video on how to change a bike tire, then doing it) to the large scale (teaching a 5th grade math curriculum and then assessing it nationwide).  Psychometrics is the Science of Assessment – using scientific principles to make the assessment side of that equation more efficient, accurate, and defensible.  How can psychometrics, especially its intersection with technology, improve your assessment?  Here are 10 important avenues to improve assessment with psychometrics.

10 ways to improve assessment with psychometrics

  • Job analysis: If you are doing assessment of anything job-related, from pre-employment screening tests of basic skills to a nationwide licensure exam for a high-profile profession, a job analysis is the essential first step.  It uses a range of scientifically vetted and quantitatively leveraged approaches to help you define the scope of the exam.
  • Standard-setting studies: If a test has a cutscore, you need a defensible method to set that cutscore.  Simply selecting a round number like 70% is asking for disaster.  There are a number of approaches in the scientific literature that will improve this process, including the Angoff and Contrasting Groups methods; a minimal sketch of the Angoff calculation appears just after this list.
  • Technology-Enhanced Items (TEIs): These item types leverage the power of computers to change assessment by moving the medium from multiple-choice recall questions to questions that evaluate deeper thinking.  Substantial research exists on these, but don’t forget to establish a valid scoring algorithm!
  • Workflow management: Items are the basic building blocks of the assessment.  If they are not high quality, everything else is a moot point.  There need to be formal processes in place to develop and review test questions.
  • Linking: Linking and equating refer to the process of statistically determining comparable scores on different forms of an exam, including tracking a scale across years and completely different sets of items.  If you have multiple test forms or track performance across time, you need this.  And IRT provides far superior methodologies.
  • Automated test assembly: The assembly of test forms – selecting items to match blueprints – can be incredibly laborious.  That’s why we have algorithms to do it for you.  Check out TestAssembler.
  • Distractor analysis: If you are using items with selected responses (including multiple choice, multiple response, and Likert), a distractor/option analysis is essential to determine if those basic building blocks are indeed up to snuff.  Our reporting platform in FastTest, as well as software like Iteman and Xcalibre, is designed for this purpose.
  • Item response theory (IRT): This is the modern paradigm for developing large-scale assessments.  Most important exams in the world over the past 40 years have used it, across all areas of assessment: licensure, certification, K12 education, postsecondary education, language, medicine, psychology, pre-employment… the trend is clear.  For good reason.
  • Automated essay scoring: This technology is just becoming more widely available, thanks to a public contest hosted by Kaggle.  If your organization scores large volumes of essays, you should probably consider this.
  • Computerized adaptive testing (CAT):  Tests should be smart.  CAT makes them so.  Why waste vast amounts of examinee time on items that don’t contribute?  There are many other advantages too.
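
As promised above, here is a minimal sketch of a modified-Angoff cutscore calculation in Python: each subject-matter expert rates the probability that a minimally competent candidate answers each item correctly, each rater’s recommended cutscore is the sum of their ratings, and the panel cutscore is the average across raters.  All ratings below are invented for illustration; a real study adds rater training, discussion rounds, and impact data.

```python
# Minimal modified-Angoff sketch.  Each rating is the judged probability that
# a minimally competent candidate answers that item correctly; all numbers
# below are invented for illustration.
angoff_ratings = {
    "Rater 1": [0.60, 0.75, 0.50, 0.80, 0.65],
    "Rater 2": [0.55, 0.70, 0.45, 0.85, 0.60],
    "Rater 3": [0.65, 0.80, 0.55, 0.75, 0.70],
}

n_items = len(next(iter(angoff_ratings.values())))

# Each rater's recommended raw cutscore is the sum of their item ratings.
rater_cutscores = {rater: sum(r) for rater, r in angoff_ratings.items()}

# The panel-recommended cutscore is the average across raters.
panel_cutscore = sum(rater_cutscores.values()) / len(rater_cutscores)

for rater, cut in rater_cutscores.items():
    print(f"{rater}: recommended cutscore = {cut:.2f} / {n_items} points")
print(f"Panel cutscore: {panel_cutscore:.2f} / {n_items} points")
```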

I recently read a disturbing article in the New York Times regarding the “Opt-Out of Educational Assessment” movement in the State of New York.  I find this downright appalling.  There are many sorts of standardized assessments, and the ones we are talking about here exist to improve the quality of education.  They check to see whether students are learning the curriculum, thereby providing some measure of accountability, as well as an index of improvement, to the teachers, schools, districts, and States.  They are used to help students, not hinder them.  By saying that you do not want your students involved in this, you are saying that you don’t want any accountability, and that you really do not care about education at an aggregate level.  You are happy just to send your kids to school and hope that the teacher teaches them something over the course of the year.

It’s also interesting that the article mentions this being in response, in part, to the fact that test scores “plummeted” after the introduction of Common Core.  The author here is definitely biased and/or uninformed.  Why were scores lower?  Because we no longer allowed States to doctor their data.  Under No Child Left Behind, States were required to assess students but could set the cutscores wherever they wanted.  So the backwoods States obviously set the bar pretty low and reported that large numbers of their students were proficient.  They were also free to adjust their curriculum, i.e., teach 4th grade math in 5th grade so that, wow… the 5th graders seem to do well on math!  The move to Common Core was in part to prevent these two schemes, and of course such States saw far fewer 5th graders score as proficient when tested on actual 5th grade math, with a standard set similar to that of other States.  The Opt-Out of Educational Assessment movement is fighting this.

What is happening here, then, is that people are blaming the test purely out of reactionary tendency.  An analogy might be that you are on a weight loss program and have for years intentionally miscalibrated your bathroom scale to read 10 pounds low.  You get a fitness coach who calls you on it and forces you to calibrate the scale correctly, and then to actually weigh yourself once a week to see if you are losing weight.  So you blame 1) the scale, and 2) your fitness coach?  That is probably not going to help improve your fitness.  Or the education system.  But go ahead and opt out of weighing yourself.

The State of Connecticut has enacted a new policy to require all 11th grade students to take the SAT (New York Times, 6 Aug 2015).  As a lifelong advocate of quality assessment, this concerns me.  I consider Instruction and Assessment to be the two primary components of Education; instruction without assessment is like a weight loss program without a scale or mirror.  But the assessment has to be done right.  How do we know what is right?  To evaluate this, we need to couch the discussion in the most important concept in assessment: validity.

Validity

Validity refers to whether the score interpretations being made from a test are those for which the test was intended and designed, and whether they are supported by evidence.  One of my graduate school mentors used to say, “The right tool for the right job.”  You could pound in a nail with a screwdriver, but that is not what it was designed for.  The same goes for assessment.  An actual need has to be identified that requires assessment, and the assessment should be designed for that need.  So let’s look at the situation in Connecticut.

What does Connecticut Need?

Connecticut needs a measure of student success, so that it can evaluate whether a student is learning the curriculum.  The curriculum is designated by the State and is currently moving towards the Common Core, which is a big improvement over the old days when each State did its own… leading States with poor education systems to simply dumb down their curriculum and lower their standards to make it appear that their students were smart.  A test of what a student has learned is an achievement test.

For what was the SAT designed?

The Scholastic Aptitude Test (SAT) is, as the name suggests, a test of aptitude, not achievement.  The story behind the SAT: a century ago, colleges and universities had their own entrance/placement exams.  They realized this was stupid and banded together to form the (creatively named) College Board to make a common exam, as well as other things.  The test was designed to best predict success in college – not to measure how much the student learned in high school.  Here’s an article that discusses that purpose.  The approach they used was to assess cognitive ability via a few areas, currently Critical Reading, Math, and Writing (scores here).  This isn’t a bad idea, given the purpose of the test.  But the SAT definitely does not assess, for example, how much of the Science curriculum the student has learned in high school.

Is there too much testing?

Connecticut obviously felt pressure from parents and teachers that there was too much testing for 11th graders; this article in a local paper shows the sentiment.  I don’t want to dispute that.  What concerns me is that Connecticut seemed to have a choice of eliminating either their achievement tests or the SAT.  They need an achievement test to track student learning.  Instead, they dropped the achievement test built for that purpose and kept the SAT, which is designed for a completely different purpose.  What irks me even more is that people who should know better, such as the Superintendent in that article, are apparently not even aware of the basic concepts of assessment.  Instead, he seems happy to go through the work of setting new cutscores on a test that is being misapplied.

So how did this happen?

I think the biggest reason for this, and many other misuses of assessment, is that the vast majority of people, including educators and politicians, are psychometrically illiterate.  Educators are incredibly hardworking, but their focus is instruction (and rightly so).  They have never had exposure to basic concepts like validity and reliability.  As a scientific field as well as a business industry, we need to better educate end users and stakeholders.  This is not limited to K12 Education, by the way.

As for Connecticut, I’d suggest one thing to start: follow the money.  That local article says that the SAT is free for all CT students.  We all know there is no such thing as a free lunch. Someone made a very good sale!

Last night, I had the honor to sit on a panel discussing Current Themes in Educational Assessment at an Educelerate event.  Educelerate “is a networking forum for people who are passionate about promoting innovation and entrepreneurship in education – particularly through the use of technology.”  It is a national network, and the Twin Cities has an active chapter due to the substantial presence of the Education industry here.  See the local MeetUp page for more information or to join up.  There is also a national Twitter feed to follow.

I’d like to thank Sean Finn for organizing the event and serving as moderator.  I’d also like to thank the other three panelists in addition to everyone that attended.

  • Jennifer Dugan – Director of Assessment at the State of Minnesota
  • Greg Wright – CEO at Naiku
  • Steve Mesmer – COO at Write the World

After an overview of assessment at the State level by Ms. Dugan, each panelist was asked to provide a brief response to three questions regarding assessment.  Here are mine:

  1. In your opinion, how do you perceive the role of technology in educational assessment?

I think this depends on the purpose of the assessment.  In assessment of learning, from 3rd grade benchmark exams to medical licensure tests, the purpose of the test is to obtain an accurate estimate of student mastery.  The greater the stakes, the more accuracy is needed.  Technology should serve this goal.

In assessment for learning, the goal is more to engage the student and be integral to the learning process.  Using complex psychometrics to gain more accurate scores is less important.  Technology should explore ways to engage the student and enhance learning, such as simulations.

However, we must not lose sight of these purposes, and adding technology merely for the sake of appearing innovative is actually counterproductive.  I’ve already seen this happen twice with a PARCC project.  They have “two-part” items that are supposed to delve deeper into student understanding, but because the approach is purely pedagogical and not psychometric, the data they produce is unusable.  PARCC also takes the standard multiple response item (choose two out of five checkboxes) and makes it into a drag and drop item; no difference whatsoever in data or psychometrics, just sleeker looking technology.

 

  2. What opportunities do you see for new technologies that can help improve educational assessment in the 21st century?

There are a few ways this can happen.

My favorite is adaptive testing, whereby we leverage the computing power of technology to make tests more accurate.  The same is also true for more sophisticated psychometric modeling.

Another great idea is automated essay scoring, which is not safe as the ONLY scoring method, but improves accuracy when used appropriately.  Given the massive back-end cost of scoring essay items, any alleviation on that front will allow for more use of constructed-response formats.

New item types that allow us to glean more information in a shorter amount of time will improve the efficiency and accuracy of assessment.  But as I mentioned previously, development of new item types should always be done with the correct purpose in mind.

Big Data will likely improve the use of assessment data, but can also come into play in terms of the development and delivery of tests.

I’d also like to see Virtual Reality break into the assessment arena.  Our company works with Crane Operator exams.  Who WOULDN’T want to take an exam like that via virtual reality?

 

  3. Adaptive testing is a common term in the educational assessment world, especially given the focus of Smarter Balanced. What is the future of adaptive testing in your opinion, and how will that impact educational assessment?

The primary purpose of adaptive testing is to improve the efficiency of the assessment process.  Research has generally shown that it can produce scores just as accurate as a linear test, but with half as many items.  Moreover, the improvement in precision is typically more pronounced for students who are very high or low in ability, because the typical exam does not provide many items for them; most items are of middle difficulty.  Alternatively, if we want to keep the same precision, we can cut testing time in half; this is extremely relevant for quick diagnostic tests as opposed to longer, high-stakes tests.
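
To illustrate where that efficiency comes from, below is a minimal sketch of the core adaptive loop under the Rasch model: administer the unused item whose difficulty is closest to the current ability estimate (which maximizes information under Rasch), score the response, and nudge the estimate.  The item bank, the simulated examinee, and the crude stepwise update are all illustrative assumptions, not any operational program’s algorithm.

```python
import math
import random

# Minimal CAT sketch under the Rasch (1PL) model.  The item bank, simulated
# examinee, and crude stepwise ability update are illustrative assumptions,
# not an operational algorithm.
random.seed(42)
item_bank = [{"id": i, "b": random.uniform(-2.5, 2.5)} for i in range(200)]
true_theta = 1.2   # simulated examinee ability
theta = 0.0        # starting ability estimate
administered = []

def prob_correct(ability, difficulty):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

for step in range(20):   # a short, fixed-length adaptive test
    # Under Rasch, information is maximized by the item whose difficulty is
    # closest to the current ability estimate.
    candidates = [it for it in item_bank if it["id"] not in administered]
    item = min(candidates, key=lambda it: abs(it["b"] - theta))
    administered.append(item["id"])

    # Simulate the response from the "true" ability.
    correct = random.random() < prob_correct(true_theta, item["b"])

    # Crude stochastic-approximation update: step up after a correct response,
    # down after an incorrect one, with shrinking step sizes.
    step_size = 1.0 / (step + 1)
    theta += step_size if correct else -step_size

print(f"Estimated theta after {len(administered)} items: {theta:.2f} (true = {true_theta})")
```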

While the time savings are notable at an individual level, consider the overall time savings across hundreds of thousands of students – certainly relevant in an environment of “less testing!”

CAT also has secondary advantages, such as increasing student engagement because students are only presented with items of appropriate difficulty.

One major opportunity is for CAT to start using more sophisticated models, such as multidimensional item response theory, cognitive diagnostic models, and models that utilize item response time.  This will improve its performance even further.

The future involves more widespread use of CAT as the cost of providing it continues to come down.  While it will never be something that can be done at the classroom or school level, since it requires PhD-level psychometric expertise, more companies will be able to provide it at a lower price point, which means it will end up being used more widely.

The Partnership for Assessment of Readiness for College and Careers (PARCC) is a consortium of US States working together to develop educational assessments aligned with the Common Core State Standards.  This is a daunting task, and PARCC is doing an admirable job, especially with their focus on utilizing technology.  However, one of the new item types has a serious psychometric fault that deserves a caveat with regards to scoring.

The item type is an “Evidence-Based Selected-Response” (EBSR) item format, commonly called a Part A/B item or Two-Part item.  The goal of this format is to delve deeper into student understanding, awarding credit for deeper knowledge while minimizing the impact of guessing.  This is obviously an appropriate goal for assessment.  To do so, the item is presented to the student in two parts, where the first part asks a simple question and the second part asks for supporting evidence for the answer to Part A.  Students must answer Part A correctly to receive credit on Part B.  As described on the PARCC website:

In order to receive full credit for this item, students must choose two supporting facts that support the adjective chosen for Part A. Unlike tests in the past, students may not guess on Part A and receive credit; they will only receive credit for the details they’ve chosen to support Part A.

While this makes sense in theory, it leads to problems in data analysis, especially if using Item Response Theory (IRT).  Obviously, this violates the fundamental IRT assumption of local independence (that items are not dependent on each other).  So when working with a client of mine, we decided to combine it into one multi-point question, which matches the theoretical approach the PARCC EBSR items are taking.  The goal was to calibrate the item with Muraki’s generalized partial credit model (GPCM), which is typically used to analyze polytomous items in K12 assessment (learn more here).  The GPCM tries to order students based on the points they earn: 0-point students tend to have the lowest ability, 1-point students moderate ability, and 2-point students the highest ability.  The polytomous category response functions (CRFs) then approximate those groups, and the model estimates thresholds: the points on the ability scale that divide a 0-point student from a 1-point student, and a 1-point student from a 2-point student.  These thresholds typically occur where the adjacent CRFs cross.
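
To make the threshold idea concrete, here is a minimal sketch of GPCM category response functions for a single 0–2 point item, with invented discrimination and step parameters; it verifies numerically that adjacent CRFs cross at the step parameters.

```python
import math

# Minimal GPCM sketch for one 0-2 point item.  The discrimination (a) and
# step parameters (the thresholds b1, b2) are invented for illustration.
a = 1.0
steps = [-0.5, 0.8]   # b1, b2

def gpcm_probs(theta):
    """Category response probabilities P(X=0), P(X=1), P(X=2) at a given theta."""
    # Numerator for category k is exp of the cumulative sum of a*(theta - b_j);
    # category 0 has an empty sum, i.e. exp(0) = 1.
    numerators = [math.exp(sum(a * (theta - b) for b in steps[:k])) for k in range(3)]
    denom = sum(numerators)
    return [n / denom for n in numerators]

# Under the GPCM, adjacent CRFs cross exactly at the step parameters.
for k, b in enumerate(steps, start=1):
    p = gpcm_probs(b)
    print(f"At theta = {b:+.2f}: P({k - 1}) = {p[k - 1]:.3f}, P({k}) = {p[k]:.3f}  <- crossing")

# A coarse grid shows the expected ordering: low thetas mostly score 0,
# middle thetas mostly score 1, high thetas mostly score 2.
for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    p = gpcm_probs(theta)
    print(f"theta = {theta:+.1f}: " + "  ".join(f"P({k}) = {pk:.2f}" for k, pk in enumerate(p)))
```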

PARCC EBSR Graphs

The first thing we noticed was that some point levels had very small sample sizes.  Suppose that Part A is worth 1 point and Part B is worth 1 point (select two pieces of evidence, both of which must be correct).  Most students will get 0 points or 2 points.  Not many will receive 1 point: that requires answering Part A correctly but then selecting no correct evidence, or only one piece.  This leads to calibration issues with the GPCM.

However, even when there was sufficient N at each level, we found that the GPCM had terrible fit statistics, meaning that the item was not performing according to the model described above.  So I ran Iteman, our classical analysis software, to obtain quantile plots that approximate the polytomous CRFs without imposing the GPCM.  I found that these 0-2 point items tend to have the issue that not many students get 1 point, and moreover the line for that category is relatively flat.  The GPCM assumes that the middle category is relatively bell-shaped.  So the GPCM is looking for the drop-offs in the bell shape, where it crosses the adjacent CRFs – the thresholds – and they aren’t there.  The GPCM would blow up, usually not even estimating thresholds in the correct ordering.
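
For readers unfamiliar with quantile plots, the idea is simply to sort examinees by ability (or total score), split them into groups, and tabulate the proportion earning each item score within each group.  The sketch below does exactly that on simulated data that mimics the flat 1-point pattern described above; it is an illustration with invented data, not the Iteman algorithm itself.

```python
import math
import random

# Minimal quantile-plot sketch: group examinees into ability quantiles and
# tabulate the proportion earning 0, 1, or 2 points on an item within each
# group.  The simulated data are invented; this only approximates the kind of
# plot that software like Iteman produces.
random.seed(1)

examinees = []
for _ in range(2000):
    theta = random.gauss(0, 1)
    p_full = 1 / (1 + math.exp(-(theta - 0.5)))  # chance of full (2-point) credit
    r = random.random()
    if r < p_full:
        score = 2
    elif r < p_full + 0.10:   # a roughly flat ~10% "guessed Part A" band
        score = 1
    else:
        score = 0
    examinees.append((theta, score))

# Sort by ability and split into five quantile groups.
examinees.sort(key=lambda e: e[0])
n_groups = 5
size = len(examinees) // n_groups
for g in range(n_groups):
    group = examinees[g * size:(g + 1) * size]
    props = [sum(1 for _, s in group if s == k) / len(group) for k in range(3)]
    print(f"Quantile {g + 1}: P(0)={props[0]:.2f}  P(1)={props[1]:.2f}  P(2)={props[2]:.2f}")
```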

So I tried to think of this from a test development perspective.  How do students get 1 point on these PARCC EBSR items?  The only way to do so is to get Part A right but not Part B.  Given that Part B asks for the evidence behind Part A, this group consists of students who answered Part A correctly but don’t know the reason, which means they were likely guessing.  It is then no surprise that the data for 1-point students form a flat line – it’s just like the c parameter in the 3PL.  So the GPCM will have an extremely tough time estimating threshold parameters.

From a psychometric perspective, point levels are supposed to represent different levels of ability.  A 1-point student should be of higher ability than a 0-point student on this item, and a 2-point student of higher ability than a 1-point student.  This seems obvious and intuitive.  But this item, by definition, violates that first statement.  The only way to get 1 point is to guess the first part – which means those students do not know the answer and are no different from the 0-point examinees.  So of course the 1-point results look funky here.

The items were calibrated as two separate dichotomous items rather than one polytomous item, and the statistics turned out much better.  This still violates the IRT assumption but at least produces usable IRT parameters that can score students.  Nevertheless, I think the scoring of these items needs to be revisited so that the algorithm produces data which is able to be calibrated in IRT.  The entire goal of test items is to provide data points used to measure students; if the item is not providing usable data, then it is not worth using, no matter how good it seems in theory!
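
To spell out the two scoring approaches, here is a minimal sketch of an EBSR scoring function: the dependent rule described above collapses the item into a single 0–2 point score, while the alternative we calibrated treats Part A and Part B as two separate dichotomous items.  Whether Part B credit should still depend on Part A under the separate-item approach is a design choice; the sketch scores them independently, and the two-evidence rule and variable names are assumptions for illustration.

```python
def score_ebsr_polytomous(part_a_correct, evidence_selected, correct_evidence):
    """Dependent (PARCC-style) scoring: one 0-2 point polytomous score.

    Part B credit requires Part A to be correct AND both correct pieces of
    evidence to be selected, per the rule described above.
    """
    if not part_a_correct:
        return 0
    part_b_correct = set(evidence_selected) == set(correct_evidence)
    return 2 if part_b_correct else 1


def score_ebsr_dichotomous(part_a_correct, evidence_selected, correct_evidence):
    """Alternative used in the calibration above: two separate 0/1 items."""
    part_b_correct = set(evidence_selected) == set(correct_evidence)
    return int(part_a_correct), int(part_b_correct)


# Example: the examinee guesses Part A correctly but picks only one piece of
# evidence -- 1 point under the dependent rule, (1, 0) under the alternative.
print(score_ebsr_polytomous(True, {"B"}, {"B", "D"}))    # -> 1
print(score_ebsr_dichotomous(True, {"B"}, {"B", "D"}))   # -> (1, 0)
```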

Are states prepared?

Computerized Adaptive Testing (CAT) has been in the news for quite some time as a technology being adopted at the state and district levels to improve academic assessment for K-12 students. Legislation such as NCLB and the push for Common Core standards have put pressure on many states to find solutions that improve performance and accountability in the classroom. For many state agencies, the solution is CAT. But are states prepared for this innovation?

In mid-November, Governor Malloy of Connecticut appropriated $14 million for CAT in Connecticut K-12 schools. This funding will be used to administer end-of-year CAT tests in grades 3 through 8 and grade 11. The tests will be aligned to the Smarter Balanced Assessment standards used by several other state education agencies across the United States.

Although Connecticut’s commitment to CAT will help improve the educational performance of Connecticut’s students, many are still concerned about the state’s shortfall in the money appropriated for computer and bandwidth upgrades. These upgrades are paramount for a successful adoption of CAT.  According to WTNH Hartford, the state was only able to distribute $10 million out of the $24 million appropriated for technology upgrades in schools. This raises a central question: will CAT be implemented successfully?

CAT is the kind of innovative technology our country needs to improve K-12 education. Connecticut’s plan is a good example of what many states will have to deal with very shortly if funding is appropriated for CAT.  For many states, significant statewide technology upgrades will be necessary before CAT can be successfully implemented.

Many states currently using CAT are already seeing improved performance. What is still not apparent is how CAT will be implemented successfully in states where significant technology upgrades are still needed.

Link to WTNH article:

http://www.wtnh.com/news/politics/malloy-14m-in-grant-money-for-schools

What is psychometrics?

Psychometrics is the science of assessment, that is, the testing of psychoeducational variables.  It is often confused with psychological assessment, but it is actually far wider. Psychometrics studies the assessment process itself (what makes a good test?) regardless of what the test is about.  As such, it also covers many other areas of testing, from K-12 math exams to accountant certification exams to assessments of basic job skills to university admissions, and much more.

Psychometrics is an essential aspect of assessment, but to most people it remains a black box. However, a basic understanding is important for anyone working in the testing industry, especially those developing or selling tests.

Psychometrics is centered around the concept of validity, which is the documentation that the interpretations you are making from test scores are actually supported.  There is a ton of work that goes into making high-quality exams.

This serves an extremely important purpose in society.  We use tests every day to make decisions about humans, from hiring a person to helping a 5th grader learn math to providing career guidance.  By using principles of engineering and science to improve these assessments, we are making those decisions more accurate, which can have far-reaching effects.

[Figure: the test development cycle]

How can psychometrics help your organization?

Why is psychometrics so important, and how will it benefit your organization? There are two primary ways to implement better psychometrics in your organization: process improvement (typically implemented by psychometricians), and specially-designed software.

This article will outline some of the ways that your tests can be improved, but first, let me outline some of the things that psychometrics can do for you.

 

Define what should be covered by the test

Before writing any items, you need to define very specifically what will be on the test.  Psychometricians typically run a job analysis study to form a quantitative, scientific basis for the test blueprints.  A job analysis is necessary for a certification program to get accredited.

 

Improve development of test content

There is a corpus of scientific literature on how to develop test items that accurately measure whatever you are trying to measure.  This is not just limited to multiple-choice items, although that approach remains popular.  Psychometricians leverage their knowledge of best practices to guide the item authoring and review process in a way that the result is highly defensible test content.  Professional item banking software provides the most efficient way to develop high-quality content and publish multiple test forms, as well as store important historical information like item statistics.

 

Set defensible cutscores

Test scores are often used to classify candidates into groups, such as pass/fail (Certification/Licensure), hire/non-hire (Pre-Employment), and below-basic/basic/proficient/advanced (Education).  Psychometricians lead studies to determine the cutscores, using methodologies such as Angoff, Beuk, Contrasting-Groups, and Borderline.
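
As one concrete illustration (a sketch of one common variant, not an authoritative implementation), a Contrasting-Groups study asks subject-matter experts to classify examinees as competent or not, then places the cutscore where the two score distributions separate; below this is approximated by the score that minimizes total misclassifications, using invented data.

```python
# Minimal Contrasting-Groups sketch: pick the cutscore that minimizes
# misclassification between expert-judged "competent" and "not competent"
# groups.  The scores below are invented for illustration.
competent = [78, 82, 85, 74, 90, 88, 80, 76, 84, 79]
not_competent = [60, 65, 70, 58, 72, 68, 55, 74, 62, 66]

best_cut, best_errors = None, float("inf")
for cut in range(min(not_competent), max(competent) + 1):
    # Errors: competent examinees below the cut plus not-competent at/above it.
    errors = sum(s < cut for s in competent) + sum(s >= cut for s in not_competent)
    if errors < best_errors:
        best_cut, best_errors = cut, errors

print(f"Recommended cutscore: {best_cut} (misclassifications: {best_errors})")
```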

 

Statistically analyze results to improve the quality of items and scores

Psychometricians are essential for this step, as the statistical analyses can be quite complex.  Smaller testing organizations typically utilize classical test theory, which is based on simple mathematics like proportions and correlations.  Large, high-profile organizations typically use item response theory, which is based on a type of nonlinear regression analysis.  Psychometricians evaluate overall reliability of the test, difficulty and discrimination of each item, distractor analysis, possible bias, multidimensionality, linking multiple test forms/years, and much more.  Software such as Iteman and Xcalibre is also available for organizations with enough expertise to run statistical analyses internally.

 

Establish and document validity

Validity is the evidence provided to support score interpretations.  For example, we might interpret scores on a test to reflect knowledge of English, and we need to provide documentation and research supporting this.  There are several ways to provide this evidence.  A straightforward approach is to establish content-related evidence, which includes the test definition, blueprints, and item authoring/review.  In some situations, criterion-related evidence is important, which directly correlates test scores to another variable of interest.  Delivering tests in a secure manner is also essential for validity.
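
As a small illustration of criterion-related evidence, the sketch below simply correlates test scores with a criterion such as supervisor job-performance ratings; the data are invented, and a real validation study would involve far larger samples.

```python
import statistics

# Minimal criterion-related validity sketch: correlate test scores with a
# criterion such as job performance ratings.  All data are invented;
# statistics.correlation requires Python 3.10+.
test_scores = [72, 85, 90, 65, 78, 88, 70, 95, 60, 82]
job_ratings = [3.1, 4.0, 4.5, 2.8, 3.5, 4.2, 3.0, 4.8, 2.5, 3.9]

validity_coefficient = statistics.correlation(test_scores, job_ratings)
print(f"Criterion-related validity coefficient: r = {validity_coefficient:.2f}")
```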

 

Is there a lot of Math in Psychometrics?

Absolutely.  A large portion of the work involves the statistical analysis of exam data, as mentioned above.  Classical test theory uses basic math like proportions, averages, and correlations.  An example of this is below, where we are analyzing a test question to determine if it is good.  Here, we see that the majority of the examinees get the question correct (65%) and that it has a strongly positive point-biserial, which is good, given the low sample size in this case.

[Figure: Iteman quantile plot]
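
A minimal sketch of that kind of classical item analysis is below: compute the item’s p-value (proportion correct) and an item-rest point-biserial correlation.  The 0/1 response matrix is invented and will not reproduce the numbers in the figure.

```python
import statistics

# Minimal classical item analysis sketch: item difficulty (p-value) and
# point-biserial discrimination for one item.  The 0/1 response matrix is
# invented; statistics.correlation requires Python 3.10+.
responses = [          # rows = examinees, columns = items; item 0 is analyzed
    [1, 1, 0, 1], [0, 1, 1, 0], [1, 0, 1, 1], [0, 0, 0, 1],
    [1, 1, 1, 1], [0, 1, 0, 0], [1, 1, 1, 0], [1, 0, 1, 1],
]

item = [row[0] for row in responses]
# Use the rest score (total minus the item) to avoid inflating the correlation.
rest_scores = [sum(row) - row[0] for row in responses]

p_value = sum(item) / len(item)
point_biserial = statistics.correlation(item, rest_scores)

print(f"Item difficulty (p-value): {p_value:.2f}")
print(f"Point-biserial (item-rest correlation): {point_biserial:.2f}")
```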

Item response theory analyzes many of the same things, but with far more complex mathematics by fitting nonlinear models.  However, doing so provides a number of advantages.  It is much easier to equate across forms or years, build adaptive tests, and construct forms.

[Figure: Xcalibre item response theory]
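
To show what fitting a nonlinear model means here, below is a minimal sketch of the three-parameter logistic (3PL) item response function with invented parameters; in calibration software such as Xcalibre, these parameters are estimated from response data rather than set by hand.

```python
import math

# Minimal 3PL item response function sketch with invented parameters:
# a = discrimination, b = difficulty, c = pseudo-guessing (lower asymptote).
# The 1.7 is the conventional scaling constant for the logistic metric.
def irf_3pl(theta, a=1.2, b=0.0, c=0.20):
    return c + (1 - c) / (1 + math.exp(-1.7 * a * (theta - b)))

for theta in range(-3, 4):
    print(f"theta = {theta:+d}: P(correct) = {irf_3pl(theta):.2f}")
```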

Here’s an article that compares CTT to IRT, if you are interested in learning more.

 

Where is Psychometrics Used?

Certification

In certification testing, psychometricians develop the test via a documented chain of evidence following a sequence of research outlined by accreditation bodies, typically: job analysis, test blueprints, item writing and review, cutscore study, and statistical analysis.  Web-based item banking software like FastTest is typically useful because the exam committee often consists of experts located across the country or even throughout the world; they can then easily log in from anywhere and collaborate.

 

Pre-Employment

In pre-employment testing, validity evidence relies primarily on establishing appropriate content (a test on PHP programming for a PHP programming job) and the correlation of test scores with an important criterion like job performance ratings (shows that the test predicts good job performance).  Adaptive tests are becoming much more common in pre-employment testing because they provide several benefits, the most important of which is cutting test time by 50% – a big deal for large corporations that test a million applicants each year.  Adaptive testing is based on item response theory, and requires a specialized psychometrician as well as specially designed software like FastTest.

 

K-12 Education

Most assessments in education fall into one of two categories: lower-stakes formative assessment in classrooms, and higher-stakes summative assessments like year-end exams.  Psychometrics is essential for establishing the reliability and validity of higher-stakes exams and for equating scores across different years.  It is also important for formative assessments, which are moving towards adaptive formats because of the 50% reduction in test time, meaning that students spend less time testing and more time learning.

 

Universities

Universities typically do not give much thought to psychometrics, even though a significant amount of testing occurs in higher education, especially with the move to online learning and MOOCs.  Given that many of the exams are high stakes (consider a certificate exam after completing a year-long graduate program!), psychometricians should be involved in establishing legally defensible cutscores and in statistical analysis to ensure reliable tests, and professionally designed assessment systems should be used for developing and delivering tests, especially with enhanced security.

The No Child Left Behind (NCLB) Act is an important piece of US legislation that governs assessment in the K-12 education system.  It is currently up for re-authorization, and language is being considered that will specifically mention computerized adaptive testing (CAT).

“Adaptive testing is proven to be a more effective tool for assessing student performance and competence than standard paper-based testing that only shows whether a student is on grade level.”

-Rep. Tom Petri, R-Wisconsin

Adaptive testing and NCLB work well together, as the advantages of adaptive testing translate well into the classroom as well as to accountability systems.

CAT continues to grow more widespread, especially in relation to the SMARTER Balanced Consortium.  However, most online CAT delivery platforms remain too expensive for many school districts.  FastTest offers an affordable alternative that will deliver CAT assessments which help prepare students for this more sophisticated and precise form of educational assessment.

Read the full article from EdWeek here: http://blogs.edweek.org/edweek/DigitalEducation/2013/07/house_nclb_rewrite_contains_ad.html.

The quote below is from a newsletter released by the SMARTER Balanced Consortium.  Obviously, purchasing student assessments directly from the Consortium will remain expensive, though less expensive than what many states currently pay private vendors.

FastTest, on the other hand, remains exceptionally affordable.  You can utilize our adaptive testing platform to create formative assessments – with the same CAT algorithms, hence a perfect preparation for Smarter Balanced summative assessments – at a fraction of the cost, especially if your district or state has its own item bank.  Contact us to learn how we can help.

Fiction: The costs of these tests are unknown.

Fact: Smarter Balanced has released cost estimates for its assessments that include expenses for ongoing research and development of the assessment system as well as test administration and scoring. The end-of-year summative assessment alone is estimated to cost $22.50 per student. The full suite of summative, interim, and formative assessments is estimated to cost $27.30 per student. These costs are less than the amount that two-thirds of the Consortium’s member states currently pay. These costs are estimates because a sizable portion of the cost is for test administration and scoring services that will not be provided by Smarter Balanced; states will either provide these services directly or procure them from vendors in the private sector.

The efforts of the SMARTER Balanced Assessment Consortium are leading to more mentions of computerized adaptive testing in the news.  I recently came across the following article that covers a paper by Mark Reckase, one of the most respected researchers in the field.

http://blogs.edweek.org/edweek/curriculum/2011/05/computer-adaptive_testing_pose.html
