Digital literacy assessments are a critical aspect of modern educational and workforce development initiatives. In today’s fast-paced, technology-driven world, digital literacy is essential to education, work, and daily life. Defined broadly as the ability to navigate, evaluate, and create information in digital formats, digital literacy is no longer a “nice-to-have” but a “must-have” skill set. Measuring this complex construct requires strong validity documentation, and psychometrics provides the theoretical and practical tools to do so effectively. This blog delves into the intersection of digital literacy assessments and psychometrics, exploring the frameworks, challenges, and innovations shaping the field.

An important assessment in this field is the Programme for the International Assessment of Adult Competencies (PIAAC), which evaluates digital literacy across countries. If you are interested in research on this topic, the PIAAC website provides extensive documentation on how its assessments were developed, as well as results data.

 

Understanding Digital Literacy

Digital literacy refers to the ability to use digital tools and technologies effectively to access, analyze, and create information. It encompasses a broad range of skills, from basic functions like using devices and navigating the Internet, to more advanced skills such as cybersecurity awareness, digital communication, and content creation. According to frameworks like the European Commission’s DigComp and UNESCO’s Global Framework, digital literacy includes:

  1. Information Literacy: The ability to locate, evaluate, and use information effectively.
  2. Communication and Collaboration: The ability to interact, communicate, and collaborate with others through the use of digital technologies.
  3. Media Literacy: Understanding and critically analyzing media content and formats.
  4. Technical Literacy: Proficiency in using devices, software, and platforms.
  5. Digital Safety: Awareness of cybersecurity and ethical considerations in the digital space.
  6. Problem Solving: The ability to identify needs and problems and resolve them in different digital environments.

These subdomains highlight the multidimensional nature of digital literacy, making it a challenging construct to measure. However, with clear frameworks and psychometric methodologies, we can create assessments that not only evaluate these skills but also guide their development.

 

Digital Literacy Statistics

Eurostat found that “54% in the EU aged 16 to 74 had at least basic overall digital skills in 2021,” and the U.S. Department of Education estimated in 2012 that “16 percent of adults (31.8 million Americans) lack sufficient comfort or competence with technology to use a computer” (p. 3).

To elaborate on multinational digital literacy statistics, the National Center for Education Statistics compared the average scores of adults ages 16-65 in 26 jurisdictions including the United States and identified “a mixed picture, with U.S. adults scoring higher than the International Average in Literacy, but lower in both Numeracy and Digital Problem Solving.”

As new technological innovations emerge, new skills must be acquired. For example, one must possess skills beyond just knowing how to type on a keyboard. One must also understand how to evaluate information found online, how to communicate securely, and how to create digital content. Digital literacy is a multifaceted competency that affects one’s personal, as well as professional, growth.

What is the Importance of Digital Literacy Assessments?

A digital literacy assessment is the process of evaluating an individual’s proficiency in using digital technologies and tools. There are several reasons why this type of assessment is important:

  1. Measuring Skill Levels: A digital literacy assessment helps determine where an individual stands in terms of their digital skills. It allows educators, employers, and policymakers to determine whether individuals are adequately prepared for the digital demands of today.
  2. Targeted Training: After analyzing the results of an assessment, tailored training programs can be developed to improve specific areas of an individual’s digital literacy. For example, an employee struggling with cybersecurity can receive focused training to improve their competence and understanding in this area.
  3. Empowering Learners and Workers: Understanding one’s digital literacy level allows individuals to take control of their learning and development, leading to improved confidence in using technology. This can reduce the digital divide that hinders disadvantaged groups in their efforts to access education and employment.
  4. Enhancing Educational and Professional Outcomes: Digital literacy directly impacts academic success and workplace productivity. For example, a student who is well-versed in using a word processor will find writing essays and workplace reports easier than one with only introductory knowledge. The ability to assess and improve skills such as these ensures that individuals are better equipped to excel in both their academic and professional lives.

 

Types of Digital Literacy Assessments

Digital literacy assessments can take multiple forms, ranging from self-assessments to formal evaluations. Below are a few types of digital literacy assessments that are commonly used:


    1. Self-Assessment Questionnaires: These surveys often ask individuals to rate their own digital skills across various areas such as Internet navigation, software use, and online communication. While these are not as accurate as other methods, self-assessments can give estimates of an individual’s strengths and weaknesses pertaining to their digital skills.
    2. Standardized Tests: Some organizations and educational institutions use standardized tests, which evaluate digital literacy in a controlled setting. These assessments often measure proficiency in tasks such as document creation, online research, and/or responsible use of social media.
    3. Performance-Based Assessments: These simulate real-world tasks to measure practical skills. For example:
      • Using a search engine to find credible information.
      • Identifying and responding to phishing emails.
      • Creating digital content like a blog post or infographic.

      Performance-based assessments are often considered the gold standard because they reflect authentic digital tasks. However, they can be resource-intensive to develop and score.

    4. Knowledge Tests: Traditional knowledge-based tests evaluate understanding of digital concepts, such as:
        • What is a secure password?
        • How do algorithms affect social media feeds?

      Though straightforward to implement, these tests may not fully capture applied skills.

    5. Project-Based Assessments: These involve more extensive tasks in which individuals create digital content or solve real-world problems. These can include designing a website, developing a mobile app, or creating a digital marketing plan. These provide a hands-on way to assess how well individuals can apply their digital knowledge.
    6. Behavioral Data Analysis: This innovative approach uses data from digital interactions (e.g., how users navigate websites or apps) to infer literacy levels. It offers rich insights but raises ethical concerns about privacy.

 

Psychometrics in Digital Literacy

Psychometrics, the science of measurement, provides tools to ensure digital literacy assessments are valid, reliable, and fair. Here’s how psychometric principles are applied:

1. Reliability: Reliability ensures consistent results across administrations and across items. For example, an examinee should earn a similar score on two parallel forms of a test, and internal-consistency indices such as coefficient alpha quantify how consistently the items measure the construct.

High reliability is critical for confidence in assessment results.
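To make the reliability idea concrete, here is a minimal sketch (in Python, with simulated data) of computing coefficient alpha from a matrix of scored item responses; the numbers are purely illustrative.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Coefficient alpha for an examinees-by-items matrix of scored responses."""
    n_items = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)

# Simulated 0/1 (incorrect/correct) responses: 200 examinees, 20 items
rng = np.random.default_rng(42)
ability = rng.normal(size=(200, 1))
responses = (rng.normal(size=(200, 20)) + ability > 0).astype(int)
print(round(cronbach_alpha(responses), 3))
```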

2. Validity: Validity ensures the test measures what it claims to measure. Psychometricians focus on:

  • Content Validity: Does the test cover all aspects of digital literacy?
  • Construct Validity: Does the test align with theoretical models?
  • Criterion Validity: Do test scores correlate with real-world performance?

A test measuring digital literacy should reflect not just theoretical understanding but also practical application.

3. Item Response Theory (IRT): IRT models how individual test items relate to the overall ability being measured. It allows for:

  • Adaptive testing, where questions adjust based on the test-taker’s responses.
  • More precise scoring by accounting for item difficulty and discrimination.
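For readers new to IRT, a minimal sketch of the two-parameter logistic (2PL) item response function makes the second point concrete: the probability of a correct response depends on the examinee’s ability and on the item’s difficulty and discrimination. The parameter values below are illustrative.

```python
import numpy as np

def prob_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(correct | ability theta), with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Compare an easy, highly discriminating item with a hard, weakly discriminating one
for theta in (-1.0, 0.0, 1.0):
    print(theta,
          round(prob_correct_2pl(theta, a=1.5, b=-0.5), 2),
          round(prob_correct_2pl(theta, a=0.6, b=1.0), 2))
```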

4. Addressing Bias: Bias in assessments can arise from socioeconomic, cultural, or technical differences. Psychometricians use techniques like Differential Item Functioning (DIF) analysis to identify and mitigate bias, ensuring fairness.
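One widely used DIF screen is the Mantel-Haenszel procedure, which compares the odds of a correct response for a reference group and a focal group after matching examinees on total score. Below is a rough sketch; the column names ('group', 'total', 'item') are assumptions for illustration, and real analyses typically collapse sparse score strata first.

```python
import pandas as pd

def mantel_haenszel_odds_ratio(df: pd.DataFrame) -> float:
    """Common odds ratio across total-score strata; values far from 1 suggest potential DIF.
    Expects columns: 'group' (0 = reference, 1 = focal), 'total', and 'item' (0/1)."""
    num, den = 0.0, 0.0
    for _, stratum in df.groupby("total"):
        n = len(stratum)
        a = ((stratum.group == 0) & (stratum.item == 1)).sum()  # reference, correct
        b = ((stratum.group == 0) & (stratum.item == 0)).sum()  # reference, incorrect
        c = ((stratum.group == 1) & (stratum.item == 1)).sum()  # focal, correct
        d = ((stratum.group == 1) & (stratum.item == 0)).sum()  # focal, incorrect
        num += a * d / n
        den += b * c / n
    return num / den
```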

How to Implement Digital Literacy Assessments

Follow these suggested steps to implement an effective digital literacy assessment:

      1. Define the Scope: Identify which digital literacy skills are most relevant for the context – for example, whether these skills will be used in an educational institution, in a corporate setting, or for general purposes.
      2. Choose the Right Tool: Select the appropriate assessment method based on the needs of the individual being assessed. Consider using a combination of tests, performance tasks, and self-assessments.
      3. Analyze Results: Review the results of the assessment to identify strengths and weaknesses to guide future training and support needs.
      4. Provide Feedback: Offer personalized feedback to individuals, highlighting areas of improvement and offering resources for further learning.
      5. Regular Re-assessment: Because digital technology evolves continuously, it is crucial to re-assess digital literacy on a regular basis to ensure that individuals acquire new skills and the ability to use new tools.

 

Innovations in Digital Literacy Assessment

1. Gamified Assessments

Gamification makes assessments engaging and interactive. For example:

        • A cybersecurity game in which users identify phishing attempts or secure accounts.
        • A digital collaboration exercise in which users solve problems in a virtual workspace.

2. Adaptive Testing

Adaptive tests use algorithms to tailor questions based on a test-taker’s responses. This approach:

        • Reduces test length without sacrificing reliability.
        • Provides a more personalized assessment experience.
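As a simple illustration of how the tailoring works, one common rule is to administer, at each step, the not-yet-seen item with the greatest Fisher information at the current ability estimate. A minimal sketch under the 2PL model, with a made-up item bank:

```python
import numpy as np

def item_information_2pl(theta: float, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Fisher information of 2PL items at ability theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

# Hypothetical item bank: discrimination (a) and difficulty (b) parameters
a = np.array([0.8, 1.2, 1.5, 0.9, 1.1])
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])
administered = [0, 2]         # items already given
theta_hat = 0.4               # current ability estimate

info = item_information_2pl(theta_hat, a, b)
info[administered] = -np.inf  # exclude items already administered
print("next item:", int(np.argmax(info)))
```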

3. Data-Driven Insights

AI and machine learning analyze patterns in test responses and digital interactions. For example:

        • Tracking how users evaluate online information to identify gaps in critical thinking.
        • Analyzing social media behavior for insights into media literacy.

4. Cross-Cultural and Global Tools

Global frameworks require assessments that work across diverse cultural contexts. Localization involves:

        • Translating assessments into multiple languages.
        • Adapting scenarios to reflect local digital practices.

Conclusion

In today’s increasingly technology-driven world, digital literacy is a vital skill for everyone. Digital literacy assessments are invaluable tools for understanding how skillfully individuals can navigate the digital landscape and where improvements can be made. By accurately assessing digital skills and providing targeted training, we can ensure that people of all ages and backgrounds are prepared for their futures. As new technologies emerge, individuals’ digital literacy skills must be continually updated.


In today’s digital-first world, educational institutions and organizations are leveraging technology to deliver training and instruction in more dynamic and efficient ways. A core component of this shift is the Learning Management System (LMS). Market research reflects the growing adoption: the global LMS market was valued at $16.1 billion in 2022, reached $24.05 billion in 2024, and is projected to expand to $61.8 billion by 2032, a reported compound annual growth rate (CAGR) of 14.8% over the forecast period (2024–2032). But what exactly is an LMS, and why is it so critical to modern education and training? Let’s explore this transformative technology and its key features.

Understanding the Basics: What is a Learning Management System?

An LMS is a software application or platform used to plan, implement, and assess a specific learning process. It provides educators, administrators, and learners with a single location for communication, course material, and assessment tools. LMS platforms are commonly used in schools, universities, corporate training programs, and online learning environments, and their usage grew massively with the emphasis on remote learning during the COVID-19 pandemic.

The core function of an LMS is to make educational content accessible to users anytime, anywhere, and often at their own pace. This flexibility is crucial in accommodating the diverse needs of learners and organizations.

Key Features of a Learning Management System

Learning Management Systems are designed to simplify the process of delivering training and educational content. Here are some of the primary features that make LMS platforms so valuable:


  1. Course Management: Create, organize, and manage courses with ease. This feature often includes the ability to upload different types of content, such as videos, presentations, PDFs, and quizzes.
  2. Assessment and Tracking: An LMS allows for automated assessments and grading. It can track progress, monitor engagement, and provide insights through data analytics.
  3. User Management: Manage user roles and permissions to control access to different parts of the platform. Instructors, administrators, and learners each have unique permissions and access.
  4. Communication Tools: Many LMS platforms include integrated messaging, discussion forums, and video conferencing, fostering communication between learners and educators.
  5. Learning Analytics: An LMS often incorporates dashboards to track student progress and performance, reporting key metrics such as completion rates and likelihood of success. Administrators, educators, and learners can use these metrics to better understand gaps in knowledge, as sketched below.
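To make the analytics point concrete, here is a minimal sketch of computing course completion rates from a hypothetical export of enrollment records; the column names are assumptions for illustration and do not reflect any particular platform’s schema.

```python
import pandas as pd

# Hypothetical enrollment export; real LMS exports will differ
enrollments = pd.DataFrame({
    "course":    ["Biology 101", "Biology 101", "Biology 101", "Safety Training"],
    "learner":   ["A", "B", "C", "A"],
    "completed": [True, False, True, True],
})

# Proportion of enrolled learners who completed each course
completion_rates = enrollments.groupby("course")["completed"].mean()
print(completion_rates)
```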

Examples of Popular Learning Management System Platforms


There are hundreds of LMS platforms available on the market, catering to various educational and corporate needs. The options range from open-source platforms like Moodle and Chamilo, which offer extensive customization but require technical expertise, to commercial solutions such as Blackboard and Canvas, known for their robust feature sets and support services. Pricing can vary significantly based on factors like the number of users, features, and deployment options.

Some platforms, like Google Classroom, are free for qualifying institutions. There are three paid plans. First, the Google Workspace for Education Standard plan costs $3 per student, per year and adds on a security center, advanced device and app management features, Gmail and Classroom logs for export into BigQuery, and audit logs. Then there’s the Teaching and Learning Upgrade plan that costs $4 per license, per month and includes additional features like advanced Google Meet features, unlimited originality reports and the ability to check for peer matches across a private repository. Finally, the Google Workspace for Education Plus plan costs $5 per student, per year and includes all of the features of the other plans, plus live streams with up to 100,000 in-domain viewers, syncing rosters from SISs to Google Classroom, personalized cloud search and prioritized support (Better Buys, 2023).

It’s essential to evaluate your needs and budget before choosing an LMS, as costs can quickly escalate with additional modules and support services.

Below are some widely used options:

  • Moodle: An open-source platform favored by educational institutions due to its flexibility and community support. Moodle is highly customizable and can be tailored to meet specific learning needs.


  • Canvas: A popular choice for both K-12 and higher education, Canvas offers a clean interface and extensive integrations with third-party tools, making it ideal for tech-savvy institutions.


  • Blackboard: Widely adopted by universities and colleges, Blackboard focuses on providing comprehensive features for large-scale educational organizations.


  • Google Classroom: A simple and intuitive tool, Google Classroom is popular in K-12 settings. It integrates seamlessly with other Google products, making it a convenient option for schools already using Google Workspace.


When implementing an LMS, there are several additional expenses to consider beyond the platform’s base pricing. These include:

  1. Implementation and Setup Costs: Depending on the complexity of the LMS and your organization’s specific requirements, there may be initial setup costs. This could involve customizing the platform, integrating it with existing systems, and migrating existing content and user data.
  2. Training and Support: It’s crucial to allocate a budget for training administrators, instructors, and learners to use the LMS effectively. Some platforms offer onboarding and support as part of their package, while others charge separately for these services.
  3. Content Creation and Licensing: Developing new courses, multimedia content, or interactive assessments can be time-consuming and expensive. Additionally, if you’re using third-party content or e-learning modules, you may need to pay licensing fees.
  4. Maintenance and Upgrades: Keeping the LMS up-to-date with software patches, security updates, and new feature releases often incurs ongoing costs. Organizations that opt for self-hosted solutions will also need to consider server maintenance and IT support costs.
  5. Integration with Other Tools: If you plan to integrate the LMS with other systems like HR software, CRM platforms, or data analytics tools, there may be costs associated with custom integrations or purchasing additional licenses for these tools.
  6. Compliance and Security: Ensuring that your LMS complies with regulations (e.g., GDPR, ADA) may involve additional expenses for compliance assessments, legal consultations, and security enhancements.
  7. Scalability: If your organization grows, you might need to expand your LMS capacity, which could mean upgrading your plan, adding new features, or expanding server capacity—all of which can increase costs.

By considering these additional expenses, organizations can develop a more accurate budget and avoid unexpected costs during the LMS implementation process.

Why Your Organization Needs a Learning Management System

Whether you’re running a university, a corporate training program, or a small online course, an LMS can streamline your educational process. With the ability to host and organize content, track learner progress, and provide insights through analytics, an LMS offers much more than just a place to upload learning materials. It can be a strategic tool to enhance the learning experience, increase engagement, and ensure that your educational objectives are met.

Advantages of Using a Learning Management System

Learning Management Systems have become a cornerstone for modern education and corporate training environments. Here are six key benefits that define the value and effectiveness of an LMS.

  1. Interoperability: Seamless Integration Across Systems

One of the most significant advantages of an LMS is its ability to integrate seamlessly with other systems through standardized data formats and protocols. LMS platforms adhere to standards such as SCORM (Sharable Content Object Reference Model), xAPI (Experience API), and LTI (Learning Tools Interoperability), which enable the exchange of content and data between different applications. This level of interoperability simplifies the process of sharing resources and tracking learner progress across multiple platforms, ensuring a cohesive learning experience.
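To give a flavor of what these standards look like in practice, an xAPI “statement” records a learning event as an actor-verb-object triple and sends it to a Learning Record Store. Here is a rough sketch in Python; the endpoint, credentials, and activity URL are placeholders, not a real configuration.

```python
import requests

# Minimal xAPI statement: who (actor) did what (verb) to which activity (object)
statement = {
    "actor":  {"mbox": "mailto:learner@example.com", "name": "Example Learner"},
    "verb":   {"id": "http://adlnet.gov/expapi/verbs/completed",
               "display": {"en-US": "completed"}},
    "object": {"id": "https://example.com/courses/digital-literacy-101"},
}

# Placeholder Learning Record Store endpoint and credentials
requests.post(
    "https://lrs.example.com/xapi/statements",
    json=statement,
    headers={"X-Experience-API-Version": "1.0.3"},
    auth=("username", "password"),
)
```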

  2. Accessibility: Inclusive Learning for All Students

Accessibility is a critical factor in modern education, and LMS platforms are designed to support students with diverse needs, including those with disabilities. Most LMS platforms adhere to accessibility standards like the Web Content Accessibility Guidelines (WCAG), providing features such as screen reader support, keyboard navigation, and closed captioning for videos. Consistent layouts and interfaces make it easier for all users to navigate the platform and access content. By fostering an inclusive environment, an LMS can help organizations comply with legal requirements such as the Americans with Disabilities Act (ADA) and ensure that learning opportunities are available to everyone, regardless of physical or cognitive limitations.

  3. Reusability: Maximizing the Value of Educational Content

Reusability is a key strength of LMS platforms, enabling organizations to develop educational content once and reuse it across different courses, training programs, or departments. This feature significantly reduces the time and costs associated with creating new content for each learning module. Content created within an LMS can be structured into reusable learning objects that can be easily updated, repurposed, and shared. This flexibility is especially valuable for large organizations and educational institutions looking to standardize training materials and curricula while keeping them up-to-date with minimal effort.

  4. Durability: A Sustainable Solution for Long-Term Growth

As technology continues to transform education and training, the LMS market is poised for significant growth. Reports suggest that the global LMS market is expected to achieve a compound annual growth rate (CAGR) of 17.1% through 2028 (Valuates Reports, 2022). This growth is driven by the increasing demand for flexible learning solutions, remote training, and the incorporation of new technologies like artificial intelligence and virtual reality into the learning process. By choosing a durable and scalable LMS, organizations can ensure that their investment remains relevant and adaptable to future educational trends and technologies.

  5. Maintainability: Ensuring a Continuously Evolving Platform

LMS platforms are designed with maintainability in mind, allowing developers to make updates, add new features, and fix bugs without disrupting the user experience. This is crucial in a rapidly changing educational landscape where learner needs and technological standards are constantly evolving. With cloud-based LMS platforms, maintenance is often handled automatically by the provider, ensuring that the system is always up-to-date with the latest security patches and performance optimizations. This continuous improvement cycle enables organizations to keep their learning environments modern, secure, and aligned with user expectations.

  6. Adaptability: Evolving with the Needs of Learners

Since their inception in the 1990s, LMS platforms have evolved significantly to keep up with changing societal needs and educational practices. Modern LMS platforms are highly adaptable, supporting a wide range of learning methodologies, such as blended learning, flipped classrooms, and competency-based learning. They also offer extensive customization options, allowing organizations to tailor the platform’s look and feel to match their branding and pedagogical approaches. As educational trends and technologies continue to evolve, LMS platforms are equipped to integrate emerging tools and approaches, such as gamification, microlearning, and artificial intelligence-driven personalized learning paths, making them a future-proof solution for delivering high-quality education and training.

By understanding these key advantages, organizations and institutions can leverage LMS platforms to create impactful learning experiences that not only meet current needs but are also prepared for the future of education and training.

Weaknesses of Using a Learning Management System

While Learning Management Systems offer many benefits, there are some limitations to be aware of, especially in specific contexts where advanced features are needed. Here are three key weaknesses to consider:

  1. Limited Functionality for Assessments
    Many LMS platforms lack sophisticated assessment tools. While most systems support basic quizzes and exams, they may not include advanced features like item banking, Item Response Theory (IRT), or adaptive testing capabilities. This limits their use for institutions or organizations looking to implement more complex testing methodologies, such as those used in standardized assessments or psychometric evaluations. In such cases, additional software or integrations with specialized assessment platforms may be required.
  2. Ineffective Student Management
    An LMS is not designed to function as a full-fledged Student Management System (SMS). It typically lacks the robust database management features necessary for handling complex student records, attendance tracking, and detailed progress reporting. This limitation means that many organizations must integrate the LMS with a separate SMS or a Customer Relationship Management (CRM) system to gain comprehensive student management capabilities. Without these integrations, tracking student progress and managing enrollment data can become cumbersome.
  3. Lack of e-Commerce Functionality
    Not all LMS platforms include built-in e-Commerce capabilities, making it difficult to monetize courses directly within the system. For organizations looking to sell courses, certifications, or training materials, the lack of e-Commerce features can be a significant drawback. While some platforms offer plugins or third-party integrations to support payment processing and course sales, these solutions can add complexity and additional costs to the system. If selling courses or certifications is a priority, it’s crucial to choose an LMS with robust e-Commerce support or consider integrating it with an external e-Commerce platform.
  4. Steep Learning Curve for Administrators and Instructors
    LMS platforms can be complex to navigate, especially for administrators and instructors who may not have a technical background. Setting up courses, managing user roles, configuring permissions, and integrating third-party tools often require specialized training and expertise. This learning curve can lead to inefficiencies, particularly in organizations without dedicated IT or instructional design support. Training costs and time investment can add up, reducing the overall efficiency of the platform.
  5. High Implementation and Maintenance Costs
    Implementing an LMS can be expensive, especially when accounting for customization, setup, training, and content creation. Self-hosted solutions may require ongoing IT support, server maintenance, and regular updates, all of which add to the cost. Even cloud-based solutions can have hidden fees for additional features, support, or upgrades. For organizations with limited budgets, these expenses can quickly become a barrier to effective implementation and long-term use.
  6. User Engagement and Retention Challenges
    While LMS platforms offer tools for tracking engagement and participation, they can sometimes struggle to keep learners motivated, especially in self-paced or online-only environments. If the courses are not designed with engaging content or interactive features, learners may lose interest and drop out. This issue is compounded when the LMS interface is not user-friendly, leading to poor user experience and decreased retention rates.
  7. Lack of Support for Personalized Learning Paths
    While some LMS platforms offer rudimentary support for personalized learning, most struggle to deliver truly customized learning paths that adapt to individual learner needs. This limitation can hinder the ability to address diverse learning styles, knowledge levels, or specific skill gaps. As a result, organizations may need to supplement their LMS with other tools or platforms that provide adaptive learning technologies, which adds complexity to the learning ecosystem.
  8. Data Privacy and Compliance Concerns
    Depending on the region and type of data being stored, LMS platforms may not always comply with data privacy regulations such as GDPR, CCPA, or FERPA. Organizations must carefully evaluate the platform’s data security features and ensure compliance with relevant standards. Failure to meet these requirements can result in significant legal and financial repercussions.

Final Thoughts

Understanding what a Learning Management System is and how it can benefit your organization is crucial in today’s education and training landscape. With platforms like Moodle, Canvas, and Blackboard, it’s easier than ever to create engaging and effective learning experiences. Ready to explore your options? Check out some of these LMS comparisons to find the best platform for your needs.

An LMS isn’t just a tool—it’s a bridge to more effective and scalable learning solutions.

References

Valuates Reports. (2022). Learning Management System (LMS) Market to Grow USD 40360 Million by 2028 at a CAGR of 17.1%. PR Newswire (press release). https://www.prnewswire.com/news-releases/learning-management-system-lms-market-to-grow-usd-40360-million-by-2028-at-a-cagr-of-17-1–valuates-reports-301588142.html

Better Buys. (2023). How Much Does an LMS Cost? 2024 Pricing Guide. https://www.betterbuys.com/lms/lms-pricing-guide/


Technology-enhanced items are assessment items (questions) that utilize technology to improve the interaction of a test question in digital assessment, over and above what is possible with paper.  Tech-enhanced items can improve examinee engagement (important with K12 assessment), assess complex concepts with higher fidelity, improve precision/reliability, and enhance face validity/sellability. 

To some extent, the last word is the key one; tech-enhanced items simply look sexier and therefore make an assessment platform easier to sell, even if they don’t actually improve assessment.  I’d argue that there are also technology-enabled items, which are distinct, as discussed below.

What is the goal of technology-enhanced items?

The goal is to improve assessment by increasing things like reliability/precision, validity, and fidelity. However, a number of TEIs are actually designed more for sales purposes than psychometric purposes. So, how do we know if TEIs improve assessment? That, of course, is an empirical question that is best answered with an experiment. But let me suggest one metric to address this question: how far does the item go beyond just reformulating a traditional item format to use current user-interface technology? I would call a mere reformulation of a traditional format a fake TEI, while going beyond that defines a true TEI.

An alternative nomenclature might be to call the reformulations technology-enhanced items and the true tech usage technology-enabled items (Almond et al., 2010; Bryant, 2017), as the latter would not be possible without technology.

A great example of this is the relationship between a traditional multiple response item and certain types of drag and drop items.  There are a number of different ways that drag and drop items can be created, but for now, let’s use the example of a format that asks the examinee to drag text statements into a box. 

An example of this is K-12 assessment items from PARCC that ask the student to read a passage and then answer questions about it.


The item is scored with integers from 0 to K where K is the number of correct statements; the integers are often then used to implement the generalized partial credit model for final scoring.  This would be true regardless of whether the item was presented as multiple response vs. drag and drop. The multiple response item, of course, could just as easily be delivered via paper and pencil. Converting it to drag and drop enhances the item with technology, but the interaction of the student with the item, psychometrically, remains the same.
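For reference, here is a minimal sketch of the generalized partial credit model’s category probabilities for a single polytomous item; the discrimination and step (threshold) parameters are illustrative only.

```python
import numpy as np

def gpcm_probabilities(theta: float, a: float, b: np.ndarray) -> np.ndarray:
    """Category probabilities for one GPCM item.
    theta: ability; a: discrimination; b: step parameters, one per score point above 0."""
    # Cumulative sums of a*(theta - b_j), with 0 for the zero-point category
    numerators = np.exp(np.concatenate(([0.0], np.cumsum(a * (theta - b)))))
    return numerators / numerators.sum()

# Illustrative 0-2 point item with ordered step parameters
print(gpcm_probabilities(theta=0.5, a=1.0, b=np.array([-0.5, 0.7])))
```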

Some True TEIs, or Technology Enabled Items

Of course, the past decade or so has witnessed stronger innovation in item formats. Gamified assessments change how the interaction between person and item is approached, though this is arguably less relevant for high-stakes assessment due to concerns about validity. There are also simulation items. For example, a test for a construction crane operator might provide an interface with crane controls and ask the examinee to complete a task. Even at the K-12 level there can be such items, such as the simulation of a science experiment where the student is given various test tubes or other instruments on the screen.

Both of these approaches are extremely powerful but have a major disadvantage: cost. They are typically custom-designed. In the case of the crane operator exam or even the science experiment, you would need to hire software developers to create this simulation. There are now some simulation-development ecosystems that make this process more efficient, but the items still involve custom authoring and custom scoring algorithms.

To address this shortcoming, there is a new generation of self-authored item types that are true TEIs. By “self-authored” I mean that a science teacher would be able to create these items themselves, just like they would a multiple choice item. The amount of technology leveraged is somewhere between a multiple choice item and a custom-designed simulation, providing a compromise of reduced cost but still increasing the engagement for the examinee. A major advantage of this approach is that the items do not need custom scoring algorithms, and instead are typically scored via point integers, which enables the use of polytomous item response theory.

Are we at least moving forward?  Not always!

There is always pushback against technology, and in this topic the counterexample is the gridded item type. It actually runs in reverse of innovation: rather than taking a traditional format and reformulating it for current UI, it ignores the capabilities of current UI (indeed, UI of the past 20+ years) and is therefore a step backward. With that item type, students are presented with a bubble sheet from a 1960s-style paper exam, on a computer screen, and asked to fill in the bubbles by clicking on them rather than using a pencil on paper.

Another example is the EBSR item type from the artist formerly known as PARCC. It was a new item type that intended to assess deeper understanding, but it did not use any tech-enhancement or -enablement, instead asking two traditional questions in a linked manner. As any psychometrician can tell you, this approach ignored basic assumptions of psychometrics, so you can guess the quality of measurement that it put out.

How can I implement TEIs?

It takes very little software development expertise to develop a platform that supports multiple choice items. An item like an equation editor or graphing item, though, takes substantial investment. So there are relatively few platforms that can support these, especially with best practices like workflow-driven item review or item response theory.


The Partnership for Assessment of Readiness for College and Careers (PARCC) is a consortium of US States working together to develop educational assessments aligned with the Common Core State Standards.  This is a daunting task, and PARCC is doing an admirable job, especially with their focus on utilizing technology.  However, one of the new item types has a serious psychometric fault that deserves a caveat with regards to scoring and validation.

What is an Evidence-Based Selected-Response (EBSR) question?

The item type is an “Evidence-Based Selected-Response” (PARCC EBSR) item format, commonly called a Part A/B item or Two-Part item.  The goal of this format is to delve deeper into student understanding, and award credit for deeper knowledge while minimizing the impact of guessing.  This is obviously an appropriate goal for assessment.  To do so, the item is presented to the student in two parts, where the first part asks a simple question and the second part asks for supporting evidence for the answer in Part A.  Students must answer Part A correctly to receive credit on Part B.  As described on the PARCC website:

In order to receive full credit for this item, students must choose two supporting facts that support the adjective chosen for Part A. Unlike tests in the past, students may not guess on Part A and receive credit; they will only receive credit for the details they’ve chosen to support Part A.

How EBSR items are scored

While this makes sense in theory, it leads to problems in data analysis, especially if using Item Response Theory (IRT). Obviously, linking the two parts violates the fundamental assumption of IRT: local independence (items are not dependent on each other).  So when working with a client of mine, we decided to combine it into one multi-point question, which matches the theoretical approach PARCC EBSR items are taking.  The goal was to calibrate the item with Muraki’s Generalized Partial Credit Model (GPCM), which is the standard approach used to analyze polytomous items in K-12 assessment.  The GPCM tries to order students based on the points they earn: 0-point students tend to have the lowest ability, 1-point students moderate ability, and 2-point students the highest ability.  Should be obvious, right?  Nope.

The first thing we noticed was that some point levels had very small sample sizes.  Suppose that Part A is 1 point and Part B is 1 point (select two pieces of evidence, but both must be correct).  Most students will get 0 points or 2 points; not many will receive 1 point.  We thought about it and realized that the only way to earn 1 point is to guess Part A correctly but then select no correct evidence, or only one of the two.  This leads to issues with the GPCM.

Using the Generalized Partial Credit Model

Even when there was sufficient N at each level, we found that the GPCM had terrible fit statistics, meaning that the item was not performing according to the model described above.  So I ran Iteman, our classical analysis software, to obtain quantile plots that approximate the polytomous IRFs without imposing the GPCM modeling.  I found that the 0-2 point items tend to have the issue where not many students get 1 point, and moreover the line for them is relatively flat.  The GPCM assumes that it is relatively bell-shaped.  So the GPCM is looking for where the drop-offs are in the bell shape, crossing with adjacent CRFs – the thresholds – and they aren’t there.  The GPCM would blow up, usually not even estimating thresholds in correct ordering.
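Quantile plots of this sort can be approximated directly from raw data: group examinees into score bands and compute the proportion earning each point level within each band. A rough sketch, assuming a pandas DataFrame with a total-score column and the item’s 0/1/2 score (the column names are mine, not Iteman’s):

```python
import pandas as pd

def empirical_category_curves(df: pd.DataFrame, n_groups: int = 5) -> pd.DataFrame:
    """Proportion of examinees at each item score level within total-score quantile groups.
    Expects columns: 'total' (rest-of-test score) and 'item' (0, 1, or 2 points)."""
    groups = pd.qcut(df["total"], q=n_groups, labels=False, duplicates="drop")
    return pd.crosstab(groups, df["item"], normalize="index")

# Rows run from the lowest to the highest ability group; a flat column of
# proportions for the 1-point level is the guessing pattern described here.
```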


So I tried to think of this from a test development perspective.  How do students get 1 point on these PARCC EBSR items?  The only way to do so is to get Part A right but not Part B.  Given that Part B is the reason for Part A, this means this group is students who answer Part A correctly but don’t know the reason, which means they are guessing.  It is then no surprise that the data for 1-point students is in a flat line – it’s just like the c parameter in the 3PL.  So the GPCM will have an extremely tough time estimating threshold parameters.

Why EBSR items don’t work

From a psychometric perspective, point levels are supposed to represent different levels of ability.  A 1-point student should be of higher ability than a 0-point student on this item, and a 2-point student of higher ability than a 1-point student.  This seems obvious and intuitive.  But this item, by definition, violates the idea that a 1-point student should have higher ability than a 0-point student.  The only way to get 1 point is to guess the first part – which means those students do not know the answer and are no different from the 0-point examinees.  So of course the 1-point results look funky here.

The items were calibrated as two separate dichotomous items rather than one polytomous item, and the statistics turned out much better.  This still violates the IRT assumption but at least produces usable IRT parameters that can score students.  Nevertheless, I think the scoring of these items needs to be revisited so that the algorithm produces data which is able to be calibrated in IRT.

The entire goal of test items is to provide data points used to measure students; if the Evidence-Based Selected-Response item type is not providing usable data, then it is not worth using, no matter how good it seems in theory!


Scaling is a psychometric term regarding the establishment of a score metric for a test, and it often has two meanings. First, it involves defining the method for operationally scoring the test, establishing an underlying scale on which people are being measured.  A common example is the T-score, which transforms raw scores into a standardized scale with a mean of 50 and a standard deviation of 10, making it easier to compare results across different populations or test forms.  Second, it refers to score conversions used for reporting scores, especially conversions that are designed to carry specific information.  The latter is typically called scaled scoring.

Examples of Scaling

You have all been exposed to this type of scaling, though you might not have realized it at the time. Most high-stakes tests like the ACT, SAT, GRE, and MCAT are reported on scales that are selected to convey certain information, with the actual numbers selected more or less arbitrarily. The SAT and GRE have historically had a nominal mean of 500 and a standard deviation of 100, while the ACT has a nominal mean of 18 and standard deviation of 6. These are actually the same scale, because they are nothing more than a converted z-score (standard or zed score), simply because no examinee wants to receive a score report that says you got a score of -1. The numbers above were arbitrarily selected, and then the score range bounds were selected based on the fact that 99% of the population is within plus or minus three standard deviations. Hence, the SAT and GRE range from 200 to 800 and the ACT ranges from 0 to 36. This leads to the urban legend of receiving 200 points for writing your name correctly on the SAT; again, it feels better for the examinee. A score of 300 might seem like a big number and 100 points above the minimum, but it just means that someone is in the 3rd percentile.

Now, notice that I said “nominal.” I said that because the tests do not actually have those means observed in samples, because the samples have substantial range restriction. Because these tests are only taken by students serious about proceeding to the next level of education, the actual sample is of higher ability than the population. The lower third or so of high school students usually do not bother with the SAT or ACT. So many states will have an observed average ACT of 21 and standard deviation of 4. This is an important issue to consider in developing any test. Consider just how restricted the population of medical school students is; it is a very select group.

How can I select a score scale?


For various reasons, actual observed scores from tests are often not reported, and only converted scores are reported.  If there are multiple forms which are being equated, scaling will hide the fact that the forms differ in difficulty, and in many cases, differ in cutscore.  Scaled scores can facilitate feedback.  They can also help the organization avoid explanations of IRT scoring, which can be a headache to some.

When deciding on the conversion calculations, there are several important questions to consider.

First, do we want to be able to make fine distinctions among examinees? If so, the range should be sufficiently wide. My personal view is that the scale should be at least as wide as the number of items; otherwise you are voluntarily giving up information. This in turn means you are giving up variance, which makes it more difficult to correlate your scaled scores with other variables, such as how the MCAT is correlated with success in medical school. This, of course, means that you are hampering future research – unless that research is able to revert back to actual observed scores to make sure all possible information is used. For example, suppose a test with 100 items is reported on a 5-point grade scale of A-B-C-D-F. That scale is quite restricted, and therefore difficult to correlate with other variables in research. But you have the option of reporting the grades to students and still using the original scores (0 to 100) for your research.

Along the same lines, we can swing completely in the other direction. For many tests, the purpose of the test is not to make fine distinctions, but only to broadly categorize examinees. The most common example of this is a mastery test, where the examinee is being assessed on their mastery of a certain subject, and the only possible scores are pass and fail. Licensure and certification examinations are an example. An extension of this is the “proficiency categories” used in K-12 testing, where students are classified into four groups: Below Basic, Basic, Proficient, and Advanced. This is used in the National Assessment of Educational Progress. Again, we see the care taken for reporting of low scores; instead of receiving a classification like “nonmastery” or “fail,” the failures are given the more palatable “Below Basic.”

Another issue to consider, which is very important in some settings but irrelevant in others, is vertical scaling. This refers to the chaining of scales across various tests that are at quite different levels. In education, this might involve linking the scales of exams in 8th grade, 10th grade, and 12th grade (graduation), so that student progress can be accurately tracked over time. Obviously, this is of great use in educational research, such as research on long pipelines like the path to and through medical school. But for a test that awards a certification in a medical specialty, it is not relevant because it is really a one-time deal.

Lastly, there are three calculation options: pure linear (ScaledScore = RawScore * Slope + Intercept), standardized conversion (Old Mean/SD to New Mean/SD), and nonlinear approaches like Equipercentile.
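The first two options are simple enough to show directly; here is a minimal sketch of both, using the T-score (mean 50, SD 10) as the default target scale. The raw-score mean and SD below are made up.

```python
import numpy as np

def linear_scale(raw, slope, intercept):
    """Pure linear conversion: ScaledScore = RawScore * Slope + Intercept."""
    return np.asarray(raw) * slope + intercept

def standardized_scale(raw, old_mean, old_sd, new_mean=50.0, new_sd=10.0):
    """Standardized conversion: map the old mean/SD onto a new mean/SD (default: T-scores)."""
    z = (np.asarray(raw) - old_mean) / old_sd
    return new_mean + new_sd * z

raw_scores = [32, 45, 58]
print(standardized_scale(raw_scores, old_mean=45, old_sd=9))  # a raw score of 45 maps to a T-score of 50
```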

Perhaps the most important issue is whether the scores from the test will be criterion-referenced or norm-referenced. Often, this choice will be made for you because it distinctly represents the purpose of your tests. However, it is quite important and usually misunderstood, so I will discuss this in detail.

Criterion-Referenced vs. Norm-Referenced


This is a distinction between the ways test scores are used or interpreted. A criterion-referenced score interpretation means that the score is interpreted with regards to defined content, blueprint, or curriculum (the criterion), and ignores how other examinees perform (Bond, 1996). A classroom assessment is the most common example; students are scored on the percent of items correct, which is taken to imply the percent of the content they have mastered. Conversely, a norm-referenced score interpretation is one where the score provides information about the examinee’s standing in the population, but no absolute (or ostensibly absolute) information regarding their mastery of content. This is often the case with non-educational measurements like personality or psychopathology. There is no defined content which we can use as a basis for some sort of absolute interpretation. Instead, scores are often either z-scores or some linear function of z-scores.  IQ is historically scaled with a mean of 100 and standard deviation of 15.

It is important to note that this dichotomy is not a characteristic of the test, but of the test score interpretations. This fact is more apparent when you consider that a single test or test score can have several interpretations, some of which are criterion-referenced and some of which are norm-referenced. We will discuss this deeper when we reach the topic of validity, but consider the following example. A high school graduation exam is designed to be a comprehensive summative assessment of a secondary education. It is therefore specifically designed to cover the curriculum used in schools, and scores are interpreted within that criterion-referenced context. Yet scores from this test could also be used for making acceptance decisions at universities, where scores are only interpreted with respect to their percentile (e.g., accept the top 40%). The scores might even do a fairly decent job at this norm-referenced application. However, this is not what they are designed for, and such score interpretations should be made with caution.

Another important note is the definition of “criterion.” Because most tests with criterion-referenced scores are educational and involve a cutscore, a common misunderstanding is that the cutscore is the criterion. It is still the underlying content or curriculum that is the criterion, because we can have this type of score interpretation without a cutscore. Regardless of whether there is a cutscore for pass/fail, a score on a classroom assessment is still interpreted with regards to mastery of the content.  To further add to the confusion, Industrial/Organizational psychology refers to outcome variables as the criterion; for a pre-employment test, the criterion is typically Job Performance at a later time.

This dichotomy also leads to some interesting thoughts about the nature of your construct. If you have a criterion-referenced score, you are assuming that the construct is concrete enough that anybody can make interpretations regarding it, such as mastering a certain percentage of content. This is why non-concrete constructs like personality tend to be only norm-referenced. There is no agreed-upon blueprint of personality.

Multidimensional Scaling


An advanced topic worth mentioning is multidimensional scaling (see Davison, 1998). The purpose of multidimensional scaling is similar to factor analysis (a later discussion!) in that it is designed to evaluate the underlying structure of constructs and how they are represented in items. This is therefore useful if you are working with constructs that are brand new, so that little is known about them, and you think they might be multidimensional. This is a pretty small percentage of the tests out there in the world; I encountered the topic in my first year of graduate school – only because I was in a Psychological Scaling course – and have not encountered it since.

Summary of test scaling

Scaling is the process of defining the scale on which your measurements will take place. It raises fundamental questions about the nature of the construct. Fortunately, in many cases we are dealing with a simple construct that has well-defined content, like an anatomy course for first-year medical students. Because it is so well-defined, we often take criterion-referenced score interpretations at face value. But as constructs become more complex, like job performance of a first-year resident, it becomes harder to define the scale, and we start to deal more in relatives than absolutes. At the other end of the spectrum are completely ephemeral constructs where researchers still can’t agree on the nature of the construct and we are pretty much limited to z-scores. Intelligence is a good example of this.

Some sources attempt to delineate the scaling of people and the scaling of items or stimuli as separate things, but this is really impossible because they are so confounded: people define item statistics (the percent of people that get an item correct) and items define person scores (the percent of items a person gets correct). It is for this reason that item response theory, the most advanced paradigm in measurement theory, was designed to place items and people on the same scale. It is also for this reason that item writing should consider how items will be scored and how they will lead to person scores. But because we start writing items long before the test is administered, and the nature of the construct is caught up in the scale, the issues presented here need to be addressed at the very beginning of the test development cycle.

Test preparation for a high-stakes exam can be a daunting task. Obtaining degrees, certifications, and other significant achievements can accelerate your career and present new opportunities in your chosen field, so the exam is a pivotal step towards advancing your career and demonstrating your expertise. However, achieving success on your exam requires more than just diligent study: you must also incorporate strategies that address your unique needs so that you can study effectively. In this article, we will explore a set of essential strategies to aid in preparing for your exam.

Also, remember that this test serves an important purpose.  If it is certification/licensure, it is designed to protect the public, ensuring that only qualified professionals work in the field.  If it is a pre-employment test, it is designed to predict job performance and fit in the organization – both to make the organization more efficient, and to help ensure you have the opportunity to be successful.  Tests are not designed to be a personal hurdle to you!

Tips for Test Preparation

  1. Apply Active Learning Strategies: Research indicates that students who employ active learning techniques, such as self-quizzing and summarization, tend to perform better on exams. A study found that 13.5% of students recognized the effectiveness of active strategies and aimed to incorporate them more into their study routines.
  2. Study Early, Study Regularly: To avoid the effects of procrastination, such as stress the evening before the exam, begin your studies as soon as possible. By starting early, you can thoroughly review the material that is likely to be on the exam and identify topics where you may need assistance. Creating a schedule that breaks studying into manageable “chunks” will ensure thorough review without the task becoming overwhelming; research has shown that this kind of strategic test preparation is more effective.
  3. Create Organized Notes: Gather your materials, including notes prepared in class, textbooks, and any other helpful materials. If possible, organize your notes by topic, chapter, date, or any other method that will make the material easier for you to understand. Make use of underlining or highlighting relevant sections to facilitate faster review times in the future and to better retain the information you are reviewing.
  4. Understand the Exam Format: Begin by familiarizing yourself with the exam’s format and structure. Understand the types of questions you’ll encounter, such as multiple-choice, essays, or practical demonstrations. Review the blueprints. Knowing what to expect will help you tailor your study plan accordingly and manage your time during the exam.
  5. Focus on Weak Areas: Identify your weaknesses early in the study process and prioritize them. Spend extra time reviewing challenging concepts and seeking clarification from instructors or peers if needed. Don’t neglect these areas in favor of topics you find easier, as they are likely to appear on the exam.
  6. Utilize Memory Aids: Memory aids can be incredibly helpful in associating relevant information with questions. Examples include acronyms, such as the oft-promoted “PEMDAS” for learning the order of operations in mathematics, and story-writing to create context for, or association with, requisite material, such as the amounts and types of force experienced on a particular amusement-park ride and how the associated formula should be applied.
  7. Familiarize yourself with the rules: Review the candidate handbook and other materials. Exams may differ in structure across varying subjects, incorporating stipulations such as essay-writing, image recognition, or passages to be read with corresponding questions. Familiarizing oneself with the grading weights and general structure of an exam can aid in allocating time to particular sections to study and/or practice for.
  8. Take care of yourself: It is essential that your physical and mental wellbeing is cared for while you are studying. Adequate sleep is essential both for acquiring new information and for reviewing previously-acquired information. Also, it is necessary to eat and drink regularly, and opportunities to do so should be included in your schedule. Exercise may also be beneficial in providing opportunities for relevant material to be reviewed, or as a break from your studies.
  9. Take Practice Tests: Practice is essential for exam success.  Many high-stakes exams offer a practice test, either through the test sponsor or through a third party, like these MCAT tests with Kaplan. Practice tests provide a further opportunity to review the content of the exam, help identify topics that need more review, and build your confidence in your ability to pass. They also simulate the test environment; focus on both content knowledge and test-taking strategies to reduce anxiety on exam day.
  10. Simulate Exam Conditions: To complete your test preparation, try to get as close as possible to the real thing.  Test yourself under a time-limit in a room similar to the exam room. Subtle factors such as room temperature or sound insulation may aid or distract one’s studies, so it may be helpful to simulate taking the exam in a similar environment.  Practice implementing your test-taking strategies, such as process of elimination for multiple-choice questions or managing your time effectively.
  11. Maintain a Positive Attitude: Optimism can improve both how you perceive your efforts and how you perform on the exam. Treat mistakes as opportunities to improve rather than incidents to dwell on. Maintaining an optimistic perspective can also help reduce stress in the moments before the exam begins.

Final Thoughts on Test Preparation

In conclusion, by incorporating these strategies into your test preparation routine, you’ll be better equipped to tackle your exam with confidence and competence. Remember that success is not just about how much you know but also how well you prepare. Stay focused, stay disciplined, and trust in your ability to succeed. Good luck!

Automated Essay Scoring with Machine Learning

Automated essay scoring (AES) is an important application of machine learning and artificial intelligence to the field of psychometrics and assessment.  In fact, it’s been around far longer than “machine learning” and “artificial intelligence” have been buzzwords in the general public!  The field of psychometrics has been doing such groundbreaking work for decades.

So how does AES work, and how can you apply it?

 

 

What is automated essay scoring?

The first and most critical thing to know is that there is not an algorithm that “reads” the student essays.  Instead, you need to train an algorithm.  That is, if you are a teacher and don’t want to grade your essays, you can’t just throw them into an essay scoring system.  You have to actually grade the essays (or at least a large sample of them) and then use that data to fit a machine learning algorithm.  Data scientists use the term “train the model,” which sounds complicated, but if you have ever run a simple linear regression, you have experience with training models.

 

There are three steps for automated essay scoring:

  1. Establish your data set. Begin by gathering a substantial collection of student essays, ensuring a diverse range of topics and writing styles. Each essay should be meticulously graded by human experts to create a reliable and accurate benchmark. This data set forms the foundation of your automated scoring system, providing the necessary examples for the machine learning model to learn from.
  2. Determine the features. Identify the key features that will serve as predictor variables in your model. These features might include grammar, syntax, vocabulary usage, coherence, structure, and argument strength. Carefully selecting these attributes is crucial as they directly impact the model’s ability to assess essays accurately. The goal is to choose features that are indicative of overall writing quality and are relevant to the scoring criteria.
  3. Train the machine learning model. Use the established data set and selected features to train your machine learning model. This involves feeding the graded essays into the model, allowing it to learn the relationship between the features and the assigned grades. Through iterative training and validation processes, the model adjusts its algorithms to improve accuracy. Continuous refinement and testing ensure that the model can reliably score new, unseen essays with a high degree of precision.

 

Here’s an extremely oversimplified example:

  • You have a set of 100 student essays, which you have scored on a scale of 0 to 5 points.
  • The essay is on Napoleon Bonaparte, and you want students to know certain facts, so you want to give them “credit” in the model if they use words like: Corsica, Consul, Josephine, Emperor, Waterloo, Austerlitz, St. Helena.  You might also add other Features such as Word Count, number of grammar errors, number of spelling errors, etc.
  • You create a map of which students used each of these words, as 0/1 indicator variables.  You can then fit a multiple regression with 7 predictor variables (did they use each of the 7 words) and the 5 point scale as your criterion variable.  You can then use this model to predict each student’s score from just their essay text.
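To make this concrete, here is a minimal sketch in Python of the oversimplified Napoleon example. The essay texts, keyword list, and scores are hypothetical placeholders; a real training set would have far more essays and far richer features, but the indicator-variable regression is the same idea.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: a few graded essays (0 to 5 points); a real training set
# would contain the full 100 essays or more.
essays = [
    "Napoleon was born in Corsica and later became Emperor before his defeat at Waterloo.",
    "He married Josephine and won a great victory at Austerlitz while serving as Consul.",
    "Napoleon was a French leader who fought many battles.",
]
human_scores = np.array([5, 4, 1])

# Keywords we want to give "credit" for, as in the example above.
keywords = ["corsica", "consul", "josephine", "emperor",
            "waterloo", "austerlitz", "st. helena"]

# Build the 0/1 indicator variables: did the essay mention each keyword?
X = np.array([[int(kw in essay.lower()) for kw in keywords] for essay in essays])

# Fit a multiple regression predicting the human score from the indicators.
model = LinearRegression().fit(X, human_scores)

# Predict the score of a new, ungraded essay from its text alone.
new_essay = "After Waterloo, the former Emperor was exiled to St. Helena."
x_new = np.array([[int(kw in new_essay.lower()) for kw in keywords]])
print(model.predict(x_new))
```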

 

Obviously, this example is too simple to be of use, but the same general idea is done with massive, complex studies.  The establishment of the core features (predictive variables) can be much more complex, and models are going to be much more complex than multiple regression (neural networks, random forests, support vector machines).

Here’s an example of the very start of a data matrix of features, from actual student essays.  Imagine that you also have data on the final scores, 0 to 5 points.  You can see how this is then a regression situation.

Examinee  Word Count  i_have  best_jump  move  and_that  the_kids  well
1         307         0       1          2     0         0         1
2         164         0       0          1     0         0         0
3         348         1       0          1     0         0         0
4         371         0       1          1     0         0         0
5         446         0       0          0     0         0         2
6         364         1       0          0     0         1         1

 

How do you score the essay?

If the essays are on paper, then automated essay scoring won’t work unless you have extremely good optical character recognition (OCR) software that converts them into a digital database of text.  Most likely, you have delivered the exam as an online assessment and already have the database.  If so, your platform should include functionality to manage the scoring process, including multiple custom rubrics.  An example from our FastTest platform is provided below.

 

(Screenshot: essay marking in the FastTest platform)

Some rubrics you might use:

  • Grammar
  • Spelling
  • Content
  • Style
  • Supporting arguments
  • Organization
  • Vocabulary / word choice

 

How do you pick the Features?

This is one of the key research problems.  In some cases, it might be something similar to the Napoleon example.  Suppose you had a complex item on Accounting, where examinees review reports and spreadsheets and need to summarize a few key points.  You might pull out a few key terms (e.g., mortgage amortization) or numbers (e.g., 2.375%) and treat them as features.  I saw a presentation at Innovations In Testing 2022 that did exactly this.  Think of it as giving students “points” for using those keywords, though because you are using complex machine learning models, it is not simply a single unit point; each keyword contributes to a regression-like model with a positive slope.

In other cases, you might not know.  Maybe it is an item on an English test being delivered to English language learners, and you ask them to write about what country they want to visit someday.  You have no idea what they will write about.  But what you can do is tell the algorithm to find the words or terms that are used most often, and try to predict the scores with those.  Maybe words like “jetlag” or “edification” show up in essays from students who tend to get high scores, while words like “clubbing” or “someday” tend to be used by students with lower scores.  The AI might also pick up on spelling errors.  I worked as an essay scorer in grad school, and I can’t tell you how many times I saw kids use “ludacris” (the name of an American rap artist) instead of “ludicrous” when trying to describe an argument.  They had literally never seen the word used or spelled correctly.  Maybe the AI model learns to give that a negative weight.   That’s the next section!
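When you do not know the features in advance, one common starting point is to let the computer find frequently used terms and then test how well they predict the human scores. Below is a minimal, hypothetical sketch using scikit-learn’s CountVectorizer; the essays, scores, and model choice are placeholders for illustration, and a real study would use a much larger sample and more careful text processing.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import Ridge

# Hypothetical graded essays (0 to 5 points); a real data set would be far larger.
essays = [
    "I would visit Japan someday to experience the culture despite the jetlag.",
    "I want to go clubbing in Spain someday with my friends.",
    "Visiting Iceland would be an edification, even with the jetlag.",
    "Someday I will go to Brazil for the beaches.",
]
scores = np.array([4, 2, 5, 3])

# Let the algorithm find the most frequently used terms across the essays.
vectorizer = CountVectorizer(max_features=50, stop_words="english")
X = vectorizer.fit_transform(essays)

# Fit a simple penalized regression predicting scores from term counts.
model = Ridge(alpha=1.0).fit(X, scores)

# Inspect which terms received positive or negative weights.
for term, weight in sorted(zip(vectorizer.get_feature_names_out(), model.coef_),
                           key=lambda pair: pair[1], reverse=True):
    print(f"{term:>12s}  {weight:+.3f}")
```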

 

How do you train a model?


Well, if you are familiar with data science, you know there are TONS of models, and many of them have a bunch of parameterization options.  This is where more research is required.  What model works the best on your particular essay, and doesn’t take 5 days to run on your data set?  That’s for you to figure out.  There is a trade-off between simplicity and accuracy.  Complex models might be accurate but take days to run.  A simpler model might take 2 hours but with a 5% drop in accuracy.  It’s up to you to evaluate.

If you have experience with Python or R, you know that there are many packages that provide this kind of analysis out of the box; it is simply a matter of selecting a model that works for your data.
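As a hedged illustration of that trade-off, the sketch below compares a simple linear model with a more complex ensemble on the same feature matrix, reporting cross-validated correlation with the human scores and fit time. The data here are randomly generated placeholders; with real essay features, the relative accuracies and timings will differ.

```python
import time
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

# Placeholder feature matrix (500 essays x 20 features) and simulated human scores (0-5).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 20))
y = np.clip(np.round(X[:, :5].sum(axis=1) + rng.normal(scale=1.0, size=500) + 2.5), 0, 5)

for name, model in [("Ridge regression", Ridge(alpha=1.0)),
                    ("Random forest", RandomForestRegressor(n_estimators=200, random_state=0))]:
    start = time.perf_counter()
    preds = cross_val_predict(model, X, y, cv=5)   # out-of-fold predictions
    elapsed = time.perf_counter() - start
    r = np.corrcoef(y, preds)[0, 1]                # correlation with the "human" scores
    print(f"{name:>18s}: r = {r:.3f}, time = {elapsed:.2f}s")
```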

 

How effective is automated essay scoring?

Well, as psychometricians love to say, “it depends.”  You need to do the model fitting research for each prompt and rubric.  It will work better for some than others.  The general consensus in research is that AES algorithms work as well as a second human, and therefore serve very well in that role.  But you shouldn’t use them as the only score; of course, that’s impossible in many cases.

Here’s a graph from some research we did on our algorithm, showing the correlation of human to AES.  The three lines are for the proportion of sample used in the training set; we saw decent results from only 10% in this case!  Some of the models correlated above 0.80 with humans, even though this is a small data set.   We found that the Cubist model took a fraction of the time needed by complex models like Neural Net or Random Forest; in this case it might be sufficiently powerful.

 

(Graph: automated essay scoring results, correlation of human and machine scores)

 

How can I implement automated essay scoring without writing code from scratch?

There are several products on the market.  Some are standalone, and some are integrated with a human-based essay scoring platform.  ASC’s platform for automated essay scoring is SmartMarq; click here to learn more.  It is currently offered as a standalone tool, as you see below, making it extremely easy to use.  It is also in the process of being integrated into our online assessment platform, alongside human scoring, to provide an efficient and easy way of obtaining a second or third rater for QA purposes.

Want to learn more?  Contact us to request a demonstration.

 

(Screenshot: SmartMarq automated essay scoring)

Artificial intelligence (AI) is poised to address some of the challenges education faces today by innovating teaching and learning processes. By applying AI-based technologies in education, educators can determine student needs more precisely, keep students more engaged, improve learning, and adapt teaching to boost learning outcomes. The use of AI in education began in the 1970s with the search for a substitute for one-on-one tutoring and has seen continuous improvement since then. This article looks at some of the latest AI developments used in education, their potential impact, and their drawbacks.

Application of AI


Recently, AI technologies have permeated nearly every aspect of the educational process. Research conducted since 2009 shows that AI has been employed extensively in management, instruction, and learning. In management, AI tools are used to review and grade student assignments, sometimes more accurately than educators do. Teachers also use AI-based interactive tools to build and share student knowledge. Learning can be enhanced through customization and personalization of content, enabled by systems that leverage machine learning (ML) and adaptivity.

Below is a list of major educational areas where AI technologies are actively involved and worth developing further.

Personalized learning: This approach tailors the learning trajectory to individual student needs and interests. AI algorithms analyze student information (e.g., learning style and performance) to create customized learning paths, recommending exercises and materials based on student strengths and weaknesses. AI technologies are increasingly pivotal in online learning apps, personalizing education and making it more accessible to a diverse learner base.
Adaptive learning: This approach does the same as personalized learning, but in real time, keeping learners engaged and motivated. ALEKS is a good example of an adaptive learning program.
Learning courses: AI-powered online platforms designed for eLearning and course management enable learners to browse for specific courses and study at their own pace. These platforms offer learning activities in increasing order of difficulty, aiming at ultimate educational goals; examples include advanced Learning Management Systems (LMS) and Massive Open Online Courses (MOOCs).
Learning assistants/Teaching robots: AI-based assistants can supply support and resources to learners upon request. They can answer questions, provide personalized feedback, and guide students through learning content. Such virtual assistants can be especially helpful for learners who cannot access offline support.
Adaptive testing: In this mode of test delivery, each examinee responds to questions matched to their level of ability, based on their previous responses. This is made possible by AI algorithms built on ML and psychometric methods, namely item response theory (IRT); a minimal sketch of how items can be selected appears after this list. You can get more information about adaptive testing from Nathan Thompson’s blog post.
Remote proctoring: This type of software allows examiners to coordinate an assessment process remotely while maintaining confidentiality and preventing examinees from cheating. A virtual proctor can also assist examinees in resolving any issues that arise during the process. The functionality of proctoring software differs substantially depending on the stakes of the exam and the preferences of stakeholders. You can read more on this topic in the ASC blog here.
Test assembly: Automated test assembly (ATA) is a widely used, valid, and efficient method of test construction based on either classical test theory (CTT) or item response theory (IRT). ATA lets you assemble test forms that are equivalent in content distribution and psychometric statistics in seconds. ASC has designed TestAssembler to minimize the laborious and time-consuming process of form building.
Automated grading: Grading student assignments is one of the biggest challenges educators face. AI-powered grading systems automate this routine work, reducing bias and inconsistency in assessment results and increasing validity. ASC has developed an AI essay scoring system, SmartMarq. If you are interested in automated essay scoring, you should definitely read this post.
Item generation: Teachers are often asked to write a large number of items for assessment purposes, on top of lesson planning and other drudgery. Automated item generation is very helpful for saving time and producing quality items.
Search engine: The era of relying solely on libraries has largely passed; we now mostly deal with huge search engines built to carry out web searches. AI-powered search engines help us find an abundance of information, and search results depend heavily on how we formulate queries, choose keywords, and navigate between sites. One of the biggest search engines so far is Google.
Chatbot: Last but not least, chatbots are software applications that employ AI and natural language processing (NLP) to hold humanized conversations with people. AI-powered chatbots can provide learners with additional personalized support and resources. ChatGPT is arguably the best-known example of a chatbot today.
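To illustrate the adaptive testing entry above, here is a minimal sketch of maximum-information item selection under a Rasch-type (1PL) IRT model. The item bank, true ability, and crude ability update are hypothetical simplifications; production CAT engines use proper ability estimation (e.g., maximum likelihood or Bayesian methods) plus content and exposure constraints.

```python
import numpy as np

# Hypothetical item bank: difficulty parameters (b) under a 1PL / Rasch model.
item_difficulties = np.array([-1.5, -0.8, -0.2, 0.0, 0.4, 0.9, 1.6])
administered = []          # indices of items already given
theta = 0.0                # provisional ability estimate

def prob_correct(theta, b):
    """Rasch model: probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = prob_correct(theta, b)
    return p * (1.0 - p)

# Simulate a short adaptive test: pick the most informative unused item each time.
rng = np.random.default_rng(0)
true_theta = 0.7           # simulated examinee ability
for step in range(5):
    info = [item_information(theta, b) if i not in administered else -1.0
            for i, b in enumerate(item_difficulties)]
    chosen = int(np.argmax(info))
    administered.append(chosen)
    correct = rng.random() < prob_correct(true_theta, item_difficulties[chosen])
    # Crude fixed-step update of theta; real CATs re-estimate ability properly.
    theta += 0.5 if correct else -0.5
    print(f"Step {step + 1}: item {chosen} (b={item_difficulties[chosen]:+.1f}), "
          f"correct={correct}, theta={theta:+.2f}")
```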

 

Highlights of AI and challenges to address


Today, AI-powered functions such as speech recognition, NLP, and emotion detection are revolutionizing education. AI technologies enable identifying patterns, building algorithms, presenting knowledge, sensing, making and following plans, maintaining true-to-life interactions with people, managing complex learning activities, magnifying human abilities in learning contexts, and supporting learners in accordance with their individual interests and needs. AI also allows students to use handwriting, gestures, or speech as input while studying or taking a test.

Along with numerous opportunities, the evolution of AI brings risks and challenges that should be thoroughly investigated and addressed. When approaching the use of AI in education, it is important to exercise caution and consideration to make sure it is done in a responsible and ethical way, and not simply to follow the mainstream, since some AI tools draw on billions of data points available to everyone on the web. Another challenge is variability in AI performance: some functions are performed at a superior level (such as identifying patterns in data), while others remain quite primitive (such as the inability to sustain an in-depth conversation). Even though AI is very powerful, human beings still play a crucial role in verifying AI output to avoid plagiarism and the falsification of information.

 

Conclusion

AI is already massively applied in education around the world. With the right guidance and frameworks in place, AI-powered technologies can help build more efficient and equitable learning experiences. Today we have the opportunity to witness how AI- and ML-based approaches contribute to the development of individualized, personalized, and adaptive learning.

ASC’s CEO, Dr Thompson, presented several topics on AI at the 2023 ATP Conference in Dallas, TX. If you are interested in utilizing AI-powered services provided by ASC, please do not hesitate to contact us!

 

References

Miao, F., Holmes, W., Huang, R., & Zhang, H. (2021). AI and education: A guidance for policymakers. UNESCO.

Niemi, H., Pea, R. D., & Lu, Y. (Eds.). (2022). AI in learning: Designing the future. Springer. https://doi.org/10.1007/978-3-031-09687-7

Gamification in Learning and Assessment

Gamification in assessment and psychometrics presents new opportunities to improve the quality of exams. While many adults view games with caution, worrying about addiction and other detrimental effects on young minds, games can be extremely beneficial for learning and assessment if employed thoughtfully. Gamification not only provides learners with multiple opportunities to learn in context, but is also instrumental in developing the digital literacy skills that are so necessary in modern times.

 

What is Gamification?

Gamification means that elements of games, such as point-scoring, team collaboration, competition, and prizes, are incorporated into processes that would not otherwise have them. For example, software for managing a sales team might award points for the number of phone calls and emails, split the staff into two “teams” that compete on those points, and give a prize to the winners at the end of the month. Such ideas can also be incorporated into learning and assessment. A student might earn points for each module they complete correctly and a badge for each test they pass to show mastery of a skill, which are then displayed on their profile in the learning system.

 

Gamification equals motivation?

It is a fact that learning is much more effective when learners are motivated. What can motivate learners, you might ask? Engagement comes first; it is the core of learning. Engaged learners grasp knowledge because they are interested in the learning process and the material itself, and they are curious to discover more. In contrast, unengaged learners simply wait for the lesson to end.

A traditional educational process usually involves several lessons covering one unit; at the end of the unit, students take a cumulative test that gauges their level of acquisition. This model usually provides a minimum of context for learning throughout the unit, so learners are expected simply to learn and memorize material until they are given the chance to succeed or fail on the test.

Gamification can change this approach. When lessons and tests are gamified, learners get the opportunity to learn in context and to use their associations and imagination; they become participants in the process, not just executors of instructions. Incorporating AI technology can further enhance engagement through personalized learning and real-time feedback.

 

Gamification: challenges and ways to overcome them

While gamified learning and assessment are very effective, they can be challenging for educators to develop and implement. Below are some common challenges and ways they can be tackled.

More work: Interactive lessons containing gamified elements demand more time and effort from educators, which is why many of them, overwhelmed with other obligations, give up and stick with a traditional style of teaching. However, if the whole team handles planning and preparation before starting a new unit, there will be less work and less stress for everyone.
Preparation: Gamified learning and assessment can be difficult for educators who lack experience or do not feel creative. Senior managers, such as heads of departments, should take the lead here by organizing courses and supporting their staff.
Distraction: When developing gamified learning or assessment, it is important not to get distracted by fancy features and to keep focused on the targeted learning objectives.
Individual needs: Gamified learning and assessment cannot be one-size-fits-all, so educators will have to customize their materials to meet learner needs.

 

Gamified assessment

Psychometric tests have been evolving over time to provide more benefits to educators and learners, employers and candidates, and other stakeholders. Gamification is the next stage in that evolution, having gained positive feedback from scientists and practitioners.

Gamified assessment is used by human resources departments in the hiring process, much like psychometric tests that evaluate a candidate’s knowledge and skills. However, game-based assessment is quicker and more engaging than traditional aptitude tests thanks to its user-friendly, interactive format. The same features are true of computerized adaptive testing (CAT), and I believe the two approaches can complement each other to double the benefits.

There are several ways to incorporate gamification into assessment. Here are some ideas, but this is by no means exhaustive.

High-fidelity items and/or assignments: Instead of multiple-choice items that ask about a task (e.g., operating a construction crane), create a simulation that is similar to a game.
Badging: Candidates win badges for passing exams, which can be displayed in places like their LinkedIn profile or email signature.
Points: Most tests obviously have “points” as part of the exam score, but points can be used in other ways, such as how many modules or quizzes you pass per month.
Teams: Subdivide a class or other group into teams, and have them compete on other aspects.

Reflecting on my personal experience, I remember using the kahoot.it tool in my math classes to interact with students and engage them in formative assessment activities. Students were highly motivated to take such quizzes because they were rewarding: it felt like a competition, and sometimes they got sweets. It was fun!

 

Summary

Obviously, gamified learning and assessment require more time and effort from creators than traditional non-gamified approaches, but they are worth it. Both educators and learners are likely to benefit from the experience in different ways. If you are ready to apply gamified assessment by employing CAT technologies, our experts are ready to help. Contact us!

 

Test Security Plans

A test security plan (TSP) is a document that lays out how an assessment organization addresses the security of its intellectual property, in order to protect the validity of the exam scores.  If a test is compromised, the scores become meaningless, so security is obviously important.  The test security plan helps an organization anticipate test security issues, establish deterrent and detection methods, and plan responses.  It can also include validity threats that are not security-related, such as how to deal with examinees who have low motivation.  Note that it is not limited to delivery; it can often include topics like how to manage item writers.

Since the first tests were developed 2,000 years ago for entry into the civil service of Imperial China, test security has been a concern.  The reason is quite straightforward: most threats to test security are also validity threats, so the decisions we make with test scores could be invalid, or at least suboptimal.  It is therefore imperative that organizations that use or develop tests develop a TSP.

Why do we need a test security plan?

There are several reasons to develop a test security plan.  First, it drives greater security and therefore validity.  Second, the TSP enhances the legal defensibility of the testing program.  It also helps to safeguard the content, which is typically an expensive investment for any organization that develops its own tests.  If incidents do happen, they can be dealt with more swiftly and effectively.  Finally, it helps to organize all of the security-related efforts.

The development of such a complex document requires a strong framework.  We advocate a framework with three phases: planning, implementation, and response.  In addition, the TSP should be revised periodically.

Phase 1: Planning

The first step in this phase is to list all potential threats to each assessment program at your organization.  This could include harvesting of test content, preknowledge of test content from past harvesters, copying other examinees, proxy testers, proctor help, and outside help.  Next, these should be rated on axes that are important to the organization; a simple approach would be to rate on potential impact to score validity, cost to the organization, and likelihood of occurrence.  This risk assessment exercise will help the remainder of the framework.
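As a purely illustrative sketch of that multi-axis rating exercise (the threats, scales, and combined index below are hypothetical assumptions, not a prescribed method), the prioritization can be as simple as rating each threat and sorting:

```python
# Hypothetical risk-assessment sketch: rate each threat on a 1-5 scale for
# impact on score validity, cost to the organization, and likelihood of occurrence.
threats = {
    "Content harvesting":      {"impact": 5, "cost": 5, "likelihood": 3},
    "Preknowledge of content": {"impact": 5, "cost": 4, "likelihood": 4},
    "Copying other examinees": {"impact": 3, "cost": 2, "likelihood": 4},
    "Proxy test takers":       {"impact": 4, "cost": 3, "likelihood": 2},
}

def risk_index(ratings):
    # One simple combined index: impact weighted by likelihood, plus cost.
    return ratings["impact"] * ratings["likelihood"] + ratings["cost"]

# Sort threats from highest to lowest priority for the security plan.
for name, ratings in sorted(threats.items(), key=lambda kv: risk_index(kv[1]), reverse=True):
    print(f"{name:<25s} risk index = {risk_index(ratings)}")
```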

Next, the organization should develop the test security plan itself.  The first piece is to identify deterrents and procedures that reduce the possibility of issues.  These include delivery procedures (such as a lockdown browser or proctoring), proctor training manuals, a strong candidate agreement, anonymous reporting pathways, confirmation testing, and candidate identification requirements.  The second piece is to explicitly plan for psychometric forensics.  This can range from complex collusion indices based on item response theory to simple flags, such as a candidate choosing a certain multiple-choice option more than 50% of the time, or obtaining a score in the top 10% while spending an amount of time in the lowest 10%.  The third piece is to establish planned responses.  What will you do if a proctor reports that two candidates were copying each other?  What if someone obtains a high score in an unreasonably short time?  What if someone obviously did not try to pass the exam, but still sat there for the allotted time?  If a candidate were to lose a job opportunity because of your response, it helps your defensibility to show that the process was established ahead of time with the input of important stakeholders.
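As a hedged illustration of the simple flags mentioned above, the sketch below scans a hypothetical results table for two of them: over-reliance on a single multiple-choice option, and a top-decile score achieved in bottom-decile time. The column names, data, and thresholds are assumptions for illustration only; real psychometric forensics would use dedicated software such as SIFT and more rigorous indices.

```python
import pandas as pd

# Hypothetical results: one row per candidate, with score, total time (minutes),
# and the proportion of items on which each option (A-D) was chosen.
df = pd.DataFrame({
    "candidate": ["C1", "C2", "C3", "C4", "C5"],
    "score":     [92, 55, 88, 40, 95],
    "time_min":  [18, 75, 60, 70, 16],
    "prop_A":    [0.22, 0.30, 0.25, 0.55, 0.20],
    "prop_B":    [0.28, 0.25, 0.25, 0.15, 0.30],
    "prop_C":    [0.26, 0.20, 0.25, 0.15, 0.28],
    "prop_D":    [0.24, 0.25, 0.25, 0.15, 0.22],
})

option_cols = ["prop_A", "prop_B", "prop_C", "prop_D"]

# Flag 1: responded with a single option on more than 50% of items.
df["flag_option_overuse"] = df[option_cols].max(axis=1) > 0.50

# Flag 2: score in the top 10% but total time in the lowest 10%.
df["flag_fast_high"] = (df["score"] >= df["score"].quantile(0.90)) & \
                       (df["time_min"] <= df["time_min"].quantile(0.10))

print(df[["candidate", "flag_option_overuse", "flag_fast_high"]])
```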

Phase 2: Implementation

The second phase is to implement the relevant aspects of the Test Security Plan, such as training all proctors in accordance with the manual and login procedures, setting IP address limits, or ensuring that a new secure testing platform with lockdown is rolled out to all testing locations.  There are generally two approaches.  Proactive approaches attempt to reduce the likelihood of issues in the first place, and reactive methods happen after the test is given.  The reactive methods can be observational, quantitative, or content-focused.  Observational methods include proctor reports or an anonymous tip line.  Quantitative methods include psychometric forensics, for which you will need software like SIFT.  Content-focused methods include automated web crawling.

Both approaches require continuous attention.  You might need to train new proctors several times per year, or update your lockdown browser.  If you use a virtual proctoring service based on record-and-review, flagged candidates must be periodically reviewed.  The reactive methods are similar: incoming anonymous tips or proctor reports must be dealt with at any given time.  The least continuous aspect is some of the psychometric forensics, which depend on a large-scale data analysis; for example, you might gather data from tens of thousands of examinees in a testing window and can only do a complete analysis at that point, which could take several weeks.

Phase 3: Response

The third phase is, of course, to put your planned responses into motion when issues are detected.  Some of these could be relatively innocuous; if a proctor is reported as not following procedures, they might need some remedial training, and it’s certainly possible that no security breach occurred.  The more dramatic responses include actions taken against the candidate.  The most lenient is to provide a warning or simply ask them to retake the test.  The most extreme include full invalidation of the score with future sanctions, such as a five-year ban on taking the test again, which could prevent someone from entering a profession for which they spent eight years and hundreds of thousands of dollars in preparation.

What does a test security plan mean for me?

It is clear that test security threats are also validity threats, and that the extensive (and expensive!) measures warrant a strategic and proactive approach in many situations.  A framework like the one advocated here will help organizations identify and prioritize threats so that the measures are appropriate for a given program.  Note that the results can be quite different if an organization has multiple programs, from a practice test to an entry level screening test to a promotional test to a professional certification or licensure.

Another important distinction is between test sponsors/publishers and test consumers.  For an organization that purchases off-the-shelf pre-employment tests, the validity of score interpretations is the more direct concern, while theft of content might not be an immediate worry.  Conversely, the publisher of such tests has invested heavily in the content and could be massively impacted by theft, while copying between two examinees at the hiring organization is not of immediate concern to the publisher.

In summary, there are more security threats, deterrents, procedures, and psychometric forensic methods than can be discussed in one blog post, so the focus here is on the framework itself.  To get started, think strategically about test security and how it impacts your assessment programs by using the multi-axis rating approach, then begin to develop a Test Security Plan.  The end goal is to improve the health and validity of your assessments.


Want to implement some of the security aspects discussed here, like online delivery lockdown browser, IP address limits, and proctor passwords?

Sign up for a free account in FastTest!