A Better Understanding of Learner Perception of Peer Assessment Through Game Theory and Novel Constructs

Using a cross-sectional survey design, this study examines learner perceptions of peer assessment experiences at institutions of higher education (IHEs). Guided by game theory, the study examines whether an IHE's prestige, competitiveness, or extent of grade inflation is statistically related to these perceptions. Likert items were used to measure learner perceptions of their peer assessment experiences and of the three constructs. An exploratory factor analysis was performed on the constructs to assess their validity. The study found a statistically significant negative correlation between perceived grade inflation and peer assessment perceptions; the correlation with institutional prestige was positive but tentative.


Introduction
Investigations into peer assessment in education typically look at students evaluating one another by offering feedback on content, such as the writing style or thinking that a peer displays in an assignment. The conditions and environment under which students evaluate each other often satisfy many of the conditions that support game theory. For example, Klein (2019) observed that institutions of higher education (IHEs) are often competitive, especially during economic downturns. Having survived the initial competition to be accepted to an IHE, students must remain competitive to complete their course of study, or else they may join the cohort that drops out. In other words, depending on the extent of the competitiveness, students may be locked into a zero-sum game to cross the graduation finish line. When students are involved in peer assessment, they have opportunities to provide feedback designed to sabotage their partner. Poor feedback need not be designed as a "poisoned pill"; it may instead be the result of an incompetent "player". Therefore, the constructs of game theory may be a helpful guide in investigating peer assessment for its pedagogical utility.
Game theory is a framework that enables researchers to analyze situations in which an actor's (or peer's) optimum action and reaction depend upon the activities of other actors (or peers) (Pastine, Pastine, & Humberstone, 2017). Since its development in the 20th century, game theory has been used to analyze many spheres of human activity (Klein, 2018; Zong, Schunn, & Wang, 2021). However, game theory has been conspicuously absent from the education literature. This investigation uses game theory to assess a common educational practice, peer assessment.
According to Murariu (2019), the following assumptions and conditions should be present for game theory to be a useful tool for modeling and understanding interactions between individuals in tasks such as peer assessment:
1. The players' decisions are based on incomplete information. For example, players lack information about their opponents' optimum strategy.
2. The players do not make decisions entirely in an information vacuum. This means players can make reasonable assumptions about other players based on their prior moves and on an overall understanding of strategy.
3. Based on reasonable assumptions about other players, players make rational moves. This means players make decisions designed for their optimum success.
In the case of peer assessment, optimum success means the highest possible grade on the player's assignment. If a learner views him or herself in competition with the other students in the class, then an approach to success could be providing unhelpful or damaging feedback to his or her partner in a peer assessment activity. For example, in classes where the number of each letter grade is predetermined, students involved in peer assessment are incentivized to give sabotaging feedback. In other words, under the lens of game theory, grades become a high-stakes outcome of peer assessment, and feedback from one's peers becomes a potentially prized resource that moves the players closer to, or further from, that end goal.
Indeed, in their study using game theory to examine peer assessment, Zong, Schunn, and Wang (2021) viewed peer feedback as an "intangible knowledge resource" (p. 2) that learners use to improve their assignment's grade and ultimate outcome. The peer being assessed can improve his or her assignment (and earn a better grade) if he or she accepts quality feedback from the peer assessor. On the other hand, if the peer being assessed accepts subpar feedback, then he or she might alter the assignment for the worse, resulting in a lower grade. In fact, Zong, Schunn, and Wang (2021) observed that when learners are both receiving and providing feedback, a learner's optimum outcome is to receive high quality feedback but provide feedback that is low (or lower) quality.
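This incentive structure can be illustrated with a small two-player game. The Python sketch below is illustrative only: the payoff values are hypothetical numbers chosen to reproduce the ordering described above (receiving high-quality feedback while giving low-quality feedback is best for the individual) and are not taken from Zong, Schunn, and Wang (2021) or any other cited study. The function names are likewise assumptions made for this example.

```python
from itertools import product

# The quality of feedback a learner chooses to give.
STRATEGIES = ("high", "low")

# Hypothetical payoffs (NOT from the cited studies).
# PAYOFF[(mine, theirs)] is my payoff when I give `mine`-quality feedback
# and my partner gives `theirs`-quality feedback.
PAYOFF = {
    ("high", "high"): 3,  # both assignments improve
    ("high", "low"): 0,   # I help my partner but receive poor feedback
    ("low", "high"): 4,   # I receive good feedback and give little away
    ("low", "low"): 1,    # neither assignment improves much
}


def best_response(partner_choice):
    """Return the strategy that maximizes my payoff given the partner's choice."""
    return max(STRATEGIES, key=lambda mine: PAYOFF[(mine, partner_choice)])


def nash_equilibria():
    """Return pure-strategy profiles in which each player best-responds to the other.

    The game is symmetric, so the same payoff table serves both players.
    """
    return [
        (a, b)
        for a, b in product(STRATEGIES, repeat=2)
        if best_response(b) == a and best_response(a) == b
    ]


if __name__ == "__main__":
    for choice in STRATEGIES:
        print(f"Partner gives {choice}: my best response is {best_response(choice)}")
    print("Pure-strategy equilibria:", nash_equilibria())
```

With these assumed payoffs, giving low-quality feedback is each learner's best response regardless of the partner's choice, so the only pure-strategy equilibrium is mutual low-quality feedback, even though mutual high-quality feedback would leave both learners better off.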
While game theory can be used as a model to examine peer assessment, it cannot be used as a model to examine perceived utility of instructor feedback over peer assessment. When students who are enrolled in an accredited university receive feedback from their instructor, it is reasonable for these students to assume that using instructor feedback will improve their grade. It is this difference in trusting the instructor, as opposed to only estimating the worth of peer assessment, that creates the opportunity for game theory to be a theoretical framework for investigating perceptions of the utility of peer assessment as a pedagogical activity.
There is limited research into the use of game theory as a tool for generating a better understanding of students' perceptions of peer assessment. Klein (2018) proposed that peers might consider indirect information about their peer assessor to better estimate the value of the feedback they receive. For example, a peer might perceive that their peer assessor in a competitive class might not be motivated to provide valuable feedback, as there may only be a certain number of A-grades assigned. This proposed perception inspired the current researchers to develop and explore constructs that measure learners' perception of peer assessment based on the ideas of competitiveness, grade inflation, and institutional prestige. Both grade inflation and institutional prestige may affect feedback quality. When students believe that they are in a grade-inflated environment, they may not be motivated to provide high-quality, or even average-quality, feedback to their partner. In other words, students' perception of grade inflation can materially and negatively affect the feedback they provide. On the other hand, students' perception of the prestige of their school may affect their belief in the quality of the feedback they receive. Students who perceive that they are in a high-prestige school may tend to trust the competence, if not the motives, of their partner more than they would trust a partner from a less prestigious school. These three indirect constructs were then analyzed statistically to determine any correlation with learner perception of peer assessment. Using these derived constructs (prestige, competitiveness, and grade inflation) to measure the impact of perceived information on perceptions of peer assessment experiences provided some evidence to support the value and utility of peer assessment as a pedagogical approach.

Literature Review
The peer assessment experience involves one peer providing feedback, and not a grade, to another peer (Topping, 2009). This feedback is most effective when learners both engage each other and help negotiate meanings (Phillips, 2016; Topping, 1998). Peer assessment may be best understood as a replacement for, or an addition to, formative assessment supplied by the instructor. Peer assessment would not generally be used as a high-stakes grading assessment or a summative tool.
In a review of the peer assessment literature, Evans (2015) found significant disagreement on various aspects of peer assessment and how it should be operationalized in educational institutions. Peer assessment may be a stand-alone instructional activity or may be part of a larger peer engagement program. It may be required or voluntary, and it may or may not be an integrated component of the instructional path. The literature generally agrees that instructors initiate peer assessment and peers experience it. This pedagogical approach differs from traditional mentoring programs: according to Topping (1998), a postgraduate student helping an undergraduate is not an example of peer assessment in the instructional activity sense. Eun, Knotek, and Heining-Boyton (2008) build on Vygotsky's concept of the zone of proximal development (ZPD). They argue that a student may reach a zone of learning at a level beyond his or her existing abilities only if he or she receives help from a more capable individual. Traditionally, the more capable individual is an instructor in a classroom setting. If the more capable individual helping the student is the instructor, then the competence and pedagogical expertise needed to scaffold, or iteratively build, that help should be part of the instructor-learner dynamic. However, when the helper is not a qualified instructor but a peer, the assumption of capability is no longer a given. The implicit trust that learners are initially willing to give to the instructor, which can impel movement to that zone a level above their capabilities alone, can no longer be assumed.

The Zone of Proximal Development
The need for a more knowledgeable helper is arguably intuitive. Topping (1989) offered that peer assessment should be planned by the instructor in such a way that the peer assessor is more capable than the peer being assessed. d'Arripe-Longueville, Gernigon, Huet, Cadopi, and Winnykamen (2002) studied student swimmers whose swimming instruction included peer assessment. The study examined whether differences in competence levels within the pairs of peer and peer assessor correlated with the effectiveness of the feedback. They did: when novice swimmers were paired with better swimmers, the novice swimmers achieved better results than when swimmers of equal skill were paired.
Topping (1998) postulated that the optimum ability differential occurs when the differential is large enough for the tutor to "provide a model of reliable competency" (Topping, 1989, p. 489) yet not so large that the tutor is "under-stimulated" (Topping, 1989, p. 489). This means that the ZPD differential can be used as a first approximation to quantify peer assessment effectiveness.

Pedagogical Familiarity
It is not enough for the peer assessor to know more of the content than their peer being assessed. The peer assessor must also know how to effectively convey that content to their peer whose motivations may be unknown to the assessor. Zuckerman (2007) argued that the size of the ability-differential between peer assessor and peer assessed is necessary but not sufficient for effective peer assessment. Through a course of instruction, the peer assessor will provide feedback using a choice of pedagogy, which may "support some kinds of initiative and constrain others" (Zuckerman, 2007, p. 43). This means that if pairs are created only taking into consideration the ability differential, and ignoring instructional expertise, the peer assessment could be sub-optimal.
Shabani, Khatib, and Ebadi (2010) undertook a detailed analysis of the theoretical framework of the ZPD. The results of their examination showed that the theory allows for three zones, which may be visualized as three concentric circles that are individually based but dynamic. A learner starts out needing little to no help in his or her innermost circle. This innermost circle is called the zone of actual development (ZAD). Just outside the circumference of this ZAD is the area that he or she cannot reach without the help of a more capable individual, followed by development that the learner cannot reach even with assistance.
The initial zones and the speed at which learners move through them are individually based, supporting the need for a variety of instructional methods to help learners navigate their individual zones. Accordingly, the pedagogical abilities of the peer assessor, and the instructor's ability to effectively and strategically design peer assessment pairs, may be an essential part of peer assessment as an effective pedagogical approach.

Game Theory
Game theory is not to be confused with a popular trend in education called gamification. Game theory is a theoretical lens for approaching situations between "two or more people in which there is a prize to be gained or a punishment to be avoided" (Pitt, 2000, p. 234). The players involved attempt to maximize the prize, minimize the punishment, or manage some combination of both, regardless of how those prizes or punishments are defined in the given situation. Any such optimized outcome for one player often accrues at the expense of the other players involved.
Game theory makes implicit assumptions about human behavior. For example, the theory assumes that players act rationally (Askari, Gordji, & Park, 2019). This means that they will weigh and evaluate all factors before acting in the direction of self-interest. Although game theory assumes that players are in competition, the theory does not necessarily assume that the players are hostile to each other (Fudenberg & Levine, 2016). An effective game theory analysis can occur even if the individuals under consideration do not know each other. Fudenberg and Levine (2016) looked at rush hour traffic through a game theory lens. The millions of commuters were the players, the routes chosen by those commuters corresponded to the players' strategies, and the time each route took was the result. Each commuter's goal was to commute in the shortest time possible, but when one commuter found a shorter time, the time of some of the other commuters was extended. Thus, the commuters were in competition with each other.
Competition among players may not result in the best outcome for all players, yet if all players pursue a strategy of cooperation, the outcome could be mutually beneficial. Gu (2015) treated institutions of higher education (IHEs) as players in a game whose objective was to maximize tuition. When the IHEs colluded on price, an optimum price result (for the IHEs) was achieved. Games that highlight the tension between cooperation and competition, and shed light on which leads to an optimum result, are known as prisoner's dilemma (PD) games (Pastine, Pastine, & Humberstone, 2017). The PD shows that cooperation can lead to a better outcome than competition, but it requires each player to be aware of the other's willingness to cooperate.
When involved in peer assessment, peers have a similar lack of awareness. As far as the peer being assessed knows, his or her peer assessor may or may not be in a competitive position. A competitive assessor may intentionally provide poor feedback in the hope that the individual being assessed will use it and accordingly obtain a lower grade. This may be the goal in a class where the number of "A" and "B" grades is artificially restricted.
Another situation is that the assessor might provide poor feedback out of a lack of sufficient knowledge of the subject matter, which again may be unknown to the peer being assessed. In either case, the peer being assessed wins when he or she receives feedback that leads to an improved assignment. The peer being assessed loses when he or she accepts poor feedback and uses that poor guidance to change his or her assignment. Either way, the peer being assessed lacks the necessary information about the assessor to accurately judge the worth of the feedback. Moreover, students may be negatively predisposed to accept feedback from unknown and untested peers (Patton, 2012). All that the peers being assessed can do is indirectly judge the worth of the feedback. We suggest that this judgment can come from the assessed peer's estimation of his or her school's prestige, competitiveness, or the extent to which grades are inflated.

Methods
A cross-sectional, online survey was used to examine the extent to which learners' perceptions of their peer assessor and their peer assessment experience were related to their perceptions of their undergraduate college's prestige, competitiveness, and grade inflation practices.

Study Participants
Once the study was approved by the author's Institutional Review Board, the researcher posted the request for volunteers on social media. The request was posted on the researcher's Facebook page, visible to fellow students who are Facebook friends, and in various social media groups on the Facebook platform. Potential participants read and agreed to the informed consent and were then given access to the survey instrument. These initial participants were encouraged to use social media to recruit other participants. This snowball effect increased the total number of participants who provided data for at least one of the constructs of interest to 107.
The mean age of the 107 participants was 49.9 years (SD = 16.2). The range was 19 to 87 years, and the median was 51.0. There were 45 (42%) self-identified "males" and 62 (58%) "females" in the analysis data set. The 107 participants were from all regions of the United States: 42% from the South, 20% from the Northeast, 4% from the Midwest, 2% from the Mid-Atlantic, 2% from the Pacific, and 1% from the West; the region was unknown for 32%. The length of time since college was distributed as follows: 13% were currently enrolled, 17% attended less than 5 years ago, 11% attended between 5 and 10 years ago, 56% attended more than 10 years ago, and 7% were unknown. The colleges represented in the survey can be found in Appendix A.

Survey Questionnaire
There were 32 Likert items that measured the four constructs of interest in this study (Table 1). The construct of peer assessment (PA) was measured with nine items that elicited participants' perceptions and attitudes about their peer assessment experiences at their institutions of higher education (IHEs). Seven Likert items gauged perceptions of the construct grade inflation (GI) at their IHE. Eleven items probed the construct competitiveness (CO) of students at their IHE, and five items elicited perceptions and attitudes about the construct institutional prestige (IP) of their IHE.

Table 1
Attitudes and Perceptions of Peer Assessment (PA), Grade Inflation (GI), Competitiveness (CO), and Institutional Prestige (IP)

Item  Statement

Peer Assessment
PA1  The feedback I gave my peers on their assignment(s) was useful.
PA2  The feedback I gave my peers on their assignment(s) was too negative or critical. (Agreement was reverse coded)
PA3  The feedback I gave a peer on his or her assignment probably was similar to the feedback that other peers gave on the same assignment.
PA4  If I had to give feedback several months from now on the same assignment for which I gave feedback in this class, I would probably give similar feedback.
PA5  The feedback my peers gave me on my assignment for this class was useful.
PA6  The feedback peers gave me on my assignment was too negative or critical. (Agreement was reverse coded)
PA7  The feedback I got from one peer was similar to the feedback I got from other peers on the same assignment.
PA8  If my peers gave me feedback several months after this class on the same assignment they examined for this class, they would probably give me similar feedback.
PA9  Peers gave me a fair grade or fair feedback on my assignment.

Grade Inflation
GI1  I have received higher grades than I deserve.
GI2  My classmates have received higher grades than they deserve.
GI3  Receiving higher grades than I deserve occurs in most of my classes.
GI4  My classmates received higher grades than they deserve in most of their classes.
GI5  A few months did not or will not make any difference in the fact that I received or will receive higher grades than I deserve.
GI6  A few months did not or will not make any difference in the fact that my classmates received or will continue to receive higher grades than they deserve.
GI7  Other students do not work as hard as me, yet they receive similar grades.

Competitiveness
CO1  I work harder when I know that I am competing against others.
CO2  I was more willing to help my high school academic peers than my current academic peers.
CO3  My high school academic peers were more willing to help me than are my current academic peers.
CO4  I tend to give more help to fellow students enrolled in one of my classes if that class is not worth many credits.
CO5  I tend to give more help to fellow students who are not in my program than those who are in my program.
CO6  I tend to give more help to fellow students if we are not enrolled in the same class.
CO7  My fellow students who are enrolled in one of my classes tend to give me more help if that class is not worth many credits.
CO8  My fellow students tend to give me more help if they are not in my program.
CO9  My fellow students tend to give me more help if they are enrolled in a different class than mine.
CO10  My willingness to give or not give help to my fellow students on their assignments does not or did not change from the beginning of the semester to the end of the semester.
CO11  My fellow students' willingness to give or not give me help on my assignments does not or did not change from the beginning of the semester to the end of the semester.

Institutional Prestige
IP1  My institution's reputation will open doors for me after I graduate.
IP2  People seem impressed when I tell them I am studying at my school.
IP3  When I applied to my institution, I wasn't confident that I would be accepted.
IP4  I will feel very proud to list my school as my alma mater after I graduate and start looking for jobs in my field.
IP5  My school is considered an elite school.

The 7-point Likert response options ranged from "1" for strongly disagree to "7" for strongly agree. The middle selection or "4" was for neither agree nor disagree. Except for two items (PA2 and PA6), each Likert item was a positive statement about the relevant construct, hence higher scores indicated satisfaction or a positive experience with the construct. PA2 and PA6 were reverse coded so that disagreement with a statement reflected a positive attitude about the construct.
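The reverse coding of PA2 and PA6 can be sketched as follows. This is a minimal illustration, assuming the usual convention that a reversed score equals the scale minimum plus the scale maximum minus the raw score; the function names are hypothetical, and the sample response is invented for the example rather than drawn from the study data.

```python
# Endpoints of the 7-point Likert scale described in the text.
SCALE_MIN, SCALE_MAX = 1, 7

# Items that the text identifies as reverse coded.
REVERSED_ITEMS = {"PA2", "PA6"}


def reverse_code(score):
    """Map 1 -> 7, 2 -> 6, ..., 7 -> 1; the midpoint 4 is unchanged."""
    return SCALE_MIN + SCALE_MAX - score


def recode_response(response):
    """Return a copy of one participant's answers with reversed items recoded."""
    return {
        item: reverse_code(score) if item in REVERSED_ITEMS else score
        for item, score in response.items()
    }


if __name__ == "__main__":
    # Invented example: strong disagreement (2) with "too negative or critical"
    # becomes a high score (6) after recoding.
    raw = {"PA1": 6, "PA2": 2, "PA6": 7}
    print(recode_response(raw))
```

After recoding, a higher score on every PA item points in the same direction, which is what allows the items to be combined into a single composite.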

Construct Validity
Given the relatively small sample size in this study (n = 107), a hyperdimensional exploratory factor analysis (EFA) that included all 32 items simultaneously was not defensible. According to Hatcher (1994), "The minimal number of subjects in the sample should be the larger of 100 subjects, or 5 times the number of variables being analyzed" (p. 73). Even this rule of thumb is a lower bound that holds under favorable conditions, where many variables load on each factor and the variable communalities are high (indicating that the factor explains much of the variance of each item); larger sample sizes are required under less-than-optimal conditions. As a result, each construct was factor analyzed as a unidimensional concept, with the correlation matrix generated by the items that were presumed, when they were written, to be caused by the underlying factor. Principal axis factoring was the method for the initial factor extraction, with squared multiple correlations on the diagonal as prior communality estimates. Rotation to simple structure was not necessary with a one-factor solution, and the initial patterns of loadings were interpretable. Table 2 shows the factor loadings, as well as the Cronbach's alpha reliability estimate of the items corresponding to each composite variable. Initial represents all items in the initial extraction; Final represents the solution with the retained variables (i.e., loadings > .40).
Table 2
Factor loadings for each unidimensional construct*
*PA = Peer Assessment; GI = Grade Inflation; CO = Competitiveness; IP = Institutional Prestige
Attempts to identify more than one factor underlying each construct failed to achieve simple structure. As a result, a one-factor solution for each construct was retained, comprising the items with loadings (correlation of item with factor) greater than 0.40 (Hatcher, 1994).
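The arithmetic behind the decision to analyze each construct separately can be sketched in a few lines of Python. The function name is hypothetical, and the sketch encodes only the lower bound quoted from Hatcher (1994), not any further conditions on loadings or communalities.

```python
def hatcher_minimum_n(num_variables):
    """Quoted rule of thumb: the larger of 100 subjects or 5 per variable."""
    return max(100, 5 * num_variables)


if __name__ == "__main__":
    sample_size = 107  # participants reported in the study

    # All 32 items analyzed at once versus one 11-item construct on its own.
    for label, k in (("all items", 32), ("single 11-item construct", 11)):
        needed = hatcher_minimum_n(k)
        verdict = "meets" if sample_size >= needed else "falls short of"
        print(f"{label}: minimum n = {needed}; n = {sample_size} {verdict} it")
```

With all 32 items the quoted minimum is 160 participants, which exceeds the 107 available, whereas an 11-item construct analyzed alone requires a minimum of 100.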
Next, unit-weighted composite scales were created to determine the correlation between Peer Assessment (PA) and the three institutional factors of Grade Inflation (GI), Competitiveness (CO), and Institutional Prestige (IP). The items excluded by the final factor analysis solution were also excluded from the construction of the unit-weighted composite scores. The reliability analysis involved items only from the final solution. Each composite had an acceptable internal consistency reliability for research purposes, as measured by Cronbach's alpha (α ≥ .70) (Nunnally, 1978). Table 3 presents the descriptive summary statistics for the four unit-weighted composite scales. Note that higher scores represented agreement with a statement; hence, peer assessment and institutional prestige were seen as positive experiences, but participants mostly disagreed that their institutions had grade inflation or that fellow students were overly competitive. The relationships between perceptions and attitudes about peer assessment and the other three institutional characteristics were evaluated with correlation coefficients. Judging from the scatterplots, all relationships appeared consistent with linearity, and there were no apparent influential outliers (Figure 1). However, the distributions of the composite variables showed signs of truncation and skewness, as is evident on the diagonal of Figure 1, which made suspect the assumption of a bivariate normal distribution underlying a Pearson correlational analysis.
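The unit-weighted composites and the Cronbach's alpha reliability estimate mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: responses are complete (no missing values), each composite is taken as the mean of its retained items (the text does not say whether sums or means were used), and the response rows are invented for the example. The function names are hypothetical.

```python
from statistics import mean, variance


def unit_weighted_composite(item_scores):
    """Combine one participant's retained items with equal (unit) weights.

    Assumption: the composite is the item mean, which keeps it on the 1-7 scale.
    """
    return mean(item_scores)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of participants' item-score rows.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Invented responses: four participants answering two items.
    responses = [[1, 2], [2, 1], [3, 4], [4, 3]]
    print([unit_weighted_composite(row) for row in responses])
    print(round(cronbach_alpha(responses), 2))
```

Alpha rises toward 1 as the items vary together across participants, which is why it is read as an index of internal consistency.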

Figure 1. Scatterplot of the four composite variables with histograms on the diagonal along with an overlaid theoretical normal curve
Both Pearson and Spearman correlations were computed. The strength and direction of the correlation coefficients calculated with the two methods were similar, but statistical significance was not consistent across the two analyses. We focus attention on the Spearman correlation coefficients (Table 4) because of the potential violation of the bivariate normality assumption underlying the Pearson correlation. Although the Spearman correlation [r(S) = .16] between PA and institutional prestige (IP) was not statistically significant (p = .126), the Pearson correlation [r(P) = .21] was stronger, in the same (positive) direction, and statistically significant (p = .047). However, as noted, the bivariate normality assumption is suspect; therefore, the relationship between these two constructs must be considered tentative, pending results from a study with a larger sample size. A negative, statistically significant relationship [Spearman r(83) = -.31, p = .004] was found between peer assessment (PA) and grade inflation (GI). The Spearman correlation between grade inflation (GI) and competitiveness (CO) was positive and statistically significant [Spearman r(92) = .23, p = .023]. In addition, the negative correlation between competitiveness (CO) and institutional prestige (IP) was statistically significant [Spearman r(92) = -.23, p = .024].
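The difference between the two coefficients can be illustrated with a short Python sketch. It assumes the standard definition of the Spearman coefficient as the Pearson coefficient computed on ranks, with tied values given their average rank. The function names and the data are invented for illustration and are not the study's data or analysis code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented data: a monotonic but nonlinear relationship.
    x = [1, 2, 3, 4, 5]
    y = [1, 4, 9, 16, 25]
    print(f"Pearson:  {pearson(x, y):.3f}")
    print(f"Spearman: {spearman(x, y):.3f}")
```

Because the Spearman coefficient depends only on rank order, it does not rely on bivariate normality and is less sensitive to the skewed, truncated distributions noted above, which is the rationale for emphasizing it here.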

Study Limitations
The statistically significant negative correlation between learner perception of grade inflation and usefulness of peer feedback was based on scales inspired by game theory but not yet established in the literature. In their initial use by Klein (2019), they were not pilot tested, and apart from that use, these scales had not previously been subjected to repeat testing or extensive peer review. The construct validity of the scales was confirmed by an EFA using a relatively small sample size, and an EFA is subject to researcher bias (Gould, 1981).
Participants for the study were recruited almost exclusively through social media. Accordingly, the study excluded individuals who do not regularly use the internet or do not have access to it. According to the Pew Research Center (2021), use of the internet varies inversely with age. This means that older individuals (other than the researcher's friends) would tend to be excluded from the study.

Novel Constructs
To test a game-theory-inspired hypothesis about peer assessment, this paper examined three novel constructs (learner perception of grade inflation, competitiveness, and institutional prestige) to determine their statistical relationship to learner perception of peer assessment. Although only one of the three constructs resulted in a non-tentative statistically significant relationship, the psychometric properties of all three were validated.
Their psychometric validation allows their use in studies of educational topics other than peer assessment.
For example, Klein (2021) observed that a good gamification design should account for the competitiveness of the learners exposed to the gamification intervention. The validated competitiveness scale can help pedagogically refine this peer assessment strategy by supporting the need to distinguish inherent competitiveness from competitiveness imposed by the learner's school. The grade inflation scale can also be considered as instructors adopt other game-theory-inspired pedagogical activities. Faculty pay and promotion are often based on student evaluations (Germain & Scandura, 2005). This means that faculty are sometimes incentivized to inflate their students' grades. Grade inflation, as measured by the validated scale, may influence students' perceptions of the quality and utility of activities like peer assessment.

Peer Assessment
Peer assessment is an instructional strategy that calls upon non-instructors to provide formative feedback. The use of non-instructors introduces elements of uncertainty and mistrust into the instructional experience, conditions generally not present in the more traditional student-instructor dynamic. These conditions, unique to education, allowed for the application of a theory unique to an educational setting, game theory, to study the peer assessment experience.
This study provided statistical evidence of the heretofore unknown relationship between learners' perception of the peer assessment experiences and perception of their school's grade inflation. Bearing in mind that this was a correlational and not a causal study, this statistical result could mean that students do not perceive that other students' feedback is valuable in a grade inflated environment. It suggests that learners, lacking information about their peer assessor, rely on external factors to judge the value of their feedback. If theory and constructs exist that help researchers better understand learner resistance to learning strategies, then these factors can help improve the perceptions of utility of instructional practices.