A Better Peer Evaluation System

March 26, 2009

I’ve ranted against a popular peer evaluation system. An anonymous commenter challenged me: “Make something better.” I think I did. It’s simple to understand and eliminates undesirable incentives.

This isn’t a perfect system. Any compensation system has faults because of the unstoppable tension between individual and group performance incentives. I only claim that this is an improvement over the Oakley system.

Overview of the problem

A team of m members completes a school assignment. The overall grade given to the group is G. We wish to adjust each individual team member’s grade g1, g2, …, gm by an adjustment a1, a2, …, am based on peer evaluations, where each ai can be positive or negative.

We assume each team member evaluates each other team member. As we’ll see, we can ignore self-evaluations without loss of generality. Thus, the input to our function f is an overall grade G and a matrix E:

X ? ? ?
? X ? ?
? ? X ?
? ? ? X

Eij represents the ith member’s evaluation of the jth. The output of f is the vector a1, a2, …, am.

The Oakley paper peer evaluations have ratings from 1 to 5 and “No Show” to “Excellent.” Without loss of generality we’ll just use numbers and let the user substitute appropriate strings.

The problem is to find an acceptable f.

Properties of f

We constrain f with a few desirable properties.

  1. Zero-sum: ∑a = 0. It should not be possible for the group members to collude and raise their average above G. We want the adjusted marks to preserve G as their arithmetic mean.
  2. Truthful: Ei does not affect ai. A team member’s evaluations of his or her teammates should not affect his or her own adjustment. Otherwise, the team member has an incentive to give dishonest feedback. In general we want to eliminate all incentives that interfere with honesty. For this reason we ignore self-evaluations: They may contain useful information for a human judge, but they contain nothing usable by an algorithm.
  3. Range-limited: -r ≤ ai ≤ r. The administrator may wish to set a bound r on the adjustments assigned, such that a high-performing team member has an adjustment of +r and a deadweight gets -r.
  4. Determinism: For fairness, the algorithm should be deterministic.
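
Properties 1 and 3 are easy to check mechanically on any vector of adjustments. Here is a minimal sketch in Python; the function names and the sample vector are my own illustrations, not part of any existing tool. Properties 2 and 4 are statements about f itself, so this sketch does not test them.

```python
def is_zero_sum(a, tol=1e-9):
    """Property 1: the adjustments sum to zero, so the mean mark stays at G."""
    return abs(sum(a)) <= tol


def is_range_limited(a, r):
    """Property 3: every adjustment lies between -r and +r."""
    return all(-r <= x <= r for x in a)


# Hypothetical adjustments for a four-person team with r = 9.
a = [0, -5, 0, 5]
print(is_zero_sum(a), is_range_limited(a, 9))
```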

Explicit non-problems

If ≥m/2 members collude and are malicious, there is nothing we can do. These situations will require human judgement.

We’re not trying to build the perfect peer evaluation system—just one better than Oakley’s. All systems will have flaws and require human judgement.

Special cases

Some administrators may wish to reward a functional group or penalize a dysfunctional group. They may wish to violate the zero-sum property to do this. This can be applied as a post-processing step.

Algorithm Example

Let’s say a team has 4 members and receives a collective grade of 75% on an assignment. The professor is willing to allow each individual mark to shift by up to 9 percentage points according to peer evaluations. Thus the lowest possible mark is 66% and the highest is 84%.

m = 4 (Ahmed, Bob, Charlie, Denise)

G = 75

r = 9

First, we give each member a temporary score of G – r: 66%.

Now, each group member i submits their vector Ei with m elements. We ignore Eii. Each element Eij is a number from 0 to (2 × r) ÷ (m – 1), i.e. from 0 to 6. Also, ∑j Eij = r, i.e. the sum of the elements is 9. That is, each group member allocates 9 points among the other three, but with a cap of 6 to prevent one member from getting an adjustment higher than 9.
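
To make the arithmetic concrete, here is a minimal Python sketch of the points-allocation step. The allocation matrix is invented for illustration, and the function name `adjustments` is my own. Each member's adjustment is the total they receive from teammates minus r, which is the same as starting everyone at G – r and adding the points received.

```python
from fractions import Fraction


def adjustments(E, r):
    """Return a_j = (points member j receives from teammates) - r."""
    m = len(E)
    cap = Fraction(2 * r, m - 1)  # most one member may give one teammate
    for i, row in enumerate(E):
        others = [row[j] for j in range(m) if j != i]  # ignore E[i][i]
        if any(x < 0 or x > cap for x in others):
            raise ValueError(f"member {i} gave points outside 0..{cap}")
        if sum(others) != r:
            raise ValueError(f"member {i} must allocate exactly {r} points")
    return [sum(E[i][j] for i in range(m) if i != j) - r for j in range(m)]


# Hypothetical allocations by Ahmed, Bob, Charlie, Denise (one row each).
E = [
    [0, 1, 3, 5],
    [2, 0, 3, 4],
    [3, 1, 0, 5],
    [4, 2, 3, 0],
]
G, r = 75, 9
a = adjustments(E, r)
marks = [G + x for x in a]
print(a, marks)
```

With these made-up numbers Bob loses 5 points, Denise gains 5, and the average stays at 75.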

Subjective Evaluations

This system of points allocation satisfies our properties but it feels crass somehow. There is an elegance to independent evaluations from “No Show”, “Superficial”, “Unsatisfactory”, “Deficient”, “Marginal” through “Ordinary”, “Satisfactory”, “Very Good” and “Excellent.”

We therefore want an adapter from the Oakley system. We will translate evaluations on this 9-point scale into a points allocation†. We’ll call the minimum Oakley score “No Show” Kmin, and set it to 1. We’ll call the maximum Oakley score “Excellent” Kmax, and set it to 9.

To allocate each student’s points, we have a vector s, 1 ≤ s ≤ 9 of their subjective evaluations of their teammates. Say Ahmed gave Bob a “Superficial”, Charlie “Satisfactory” and Denise “Excellent.” Ahmed’s s is therefore {◊, 2, 7, 9}. First, we compress the range of each entry by 6/9 in order to satisfy the requirement to give no element higher than 6: t = {◊, 1.3, 4.7, 6}. Now we multiply by r/∑t to make the sum r: u = {◊, 1, 3.5, 4.5}.
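
The two scaling steps are easy to get wrong by hand, so here is a small Python sketch of them applied to Ahmed's ratings. The function name `to_points` is my own, and I use exact fractions to avoid rounding noise.

```python
from fractions import Fraction


def to_points(s, r, m, k_max=9):
    """Convert one member's subjective ratings of teammates into points."""
    cap = Fraction(2 * r, m - 1)        # 6 in the running example
    t = [x * cap / k_max for x in s]    # compress the 1..k_max scale to 0..cap
    scale = r / sum(t)                  # rescale so the entries sum to r
    return [x * scale for x in t]


# Ahmed's ratings of Bob, Charlie and Denise (self-evaluation omitted).
u = to_points([2, 7, 9], r=9, m=4)
print([float(x) for x in u])
```

Note that the compression factor cancels out once we rescale, so each entry is simply r × s ÷ ∑s.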

Will this always work? The range of s is {◊, 1, 1, 1} to {◊, 9, 9, 9}. With (2 × r) ÷ (m – 1) = 6, as in the example, the range of t is {◊, 6/9, 6/9, 6/9} to {◊, 6, 6, 6}. The range of ∑t is 2 to 18. Thus the range of r/∑t is 1/2 to 4.5.

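The multiplier r/∑t exceeds 1 whenever r > ∑t. The aim is that no entry of t is scaled above (2 × r) ÷ (m – 1), so that we maintain our desirable properties†. This holds when the ratings are reasonably even, but a lopsided vector such as {◊, 1, 1, 9} scales to about {◊, 0.8, 0.8, 7.4}, which is above the cap of 6; such cases need human judgement.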
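
One way to probe the "will this always work?" question for a particular m, r and Kmax is to enumerate every possible s. The sketch below does that for the running example; the helper name `scaled` is my own. It reports the range of the multiplier and the largest scaled entry, so the bound can be compared against the cap directly.

```python
from fractions import Fraction
from itertools import product


def scaled(s, r, m, k_max):
    """Return (multiplier, scaled vector) for one member's ratings s."""
    cap = Fraction(2 * r, m - 1)
    t = [x * cap / k_max for x in s]
    mult = r / sum(t)
    return mult, [x * mult for x in t]


r, m, k_max = 9, 4, 9
cap = Fraction(2 * r, m - 1)

# Every possible vector of ratings for the three teammates (9^3 = 729 cases).
results = [scaled(s, r, m, k_max)
           for s in product(range(1, k_max + 1), repeat=m - 1)]

multipliers = [mult for mult, _ in results]
largest = max(max(u) for _, u in results)

print("multiplier range:", min(multipliers), "to", max(multipliers))
print("largest scaled entry:", float(largest), "cap:", cap)
print("stays within cap:", largest <= cap)
```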


Choose m, G, r

If you are not using a subjective scale, ask each group member i for a points allocation Eij with a maximum of (2 × r) ÷ (m – 1) per teammate that sums to r. Then ai = ∑j Eji – r.

If you are using a subjective scale, ensure that r ≥ Kmax when you serialize the scale into numeric form. 

Collect the matrix Sij with scores from 1 to Kmax. Let Tij = (2 × r) ÷ (m – 1) × Sij ÷ Kmax. For each row Ti, let Eij = Tij × r ÷ ∑j Tij. Then ai = ∑j Eji – r.
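
Putting the subjective-scale steps together, a minimal end-to-end sketch in Python might look like this. The function name `peer_adjustments` and the ratings matrix are hypothetical; only Ahmed's row comes from the example above. It assumes the compression factor is (2 × r) ÷ (m – 1) ÷ Kmax, matching the 6/9 used in the worked example.

```python
from fractions import Fraction


def peer_adjustments(S, r, k_max):
    """Turn a matrix of 1..k_max ratings into per-member adjustments."""
    m = len(S)
    cap = Fraction(2 * r, m - 1)
    E = []
    for i in range(m):
        # Compress each rating; the self-evaluation S[i][i] is ignored.
        t = [0 if j == i else S[i][j] * cap / k_max for j in range(m)]
        total = sum(t)
        # Rescale the row so member i hands out exactly r points.
        E.append([x * r / total for x in t])
    # a_j = points received from teammates minus r.
    return [sum(E[i][j] for i in range(m) if i != j) - r for j in range(m)]


# Rows: ratings given by Ahmed, Bob, Charlie, Denise (diagonal unused).
S = [
    [0, 2, 7, 9],
    [6, 0, 6, 6],
    [7, 3, 0, 8],
    [5, 4, 9, 0],
]
G, r = 75, 9
a = peer_adjustments(S, r, k_max=9)
print([float(G + x) for x in a])
```

In this made-up case Bob ends up 4.5 points below the group grade, Charlie and Denise end up above it, and the adjustments still sum to zero.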


A.G. helped with the main formula. Lila Fontes helped with the subjective evaluation formula. 

†We need r ≥ Kmax. We multiply t by r/∑t. The lower bound on ∑t is (m-1)×Kmin×2r÷((m-1)×Kmax). For simplicity we assume Kmin = 1.  We cancel out the (m-1) to find min(∑t) = 2r/Kmax. Thus, at worst, we multiply t by Kmax/r. To make sure this number is ≤ 1, we need to make sure r ≥ Kmax.

Aside: Extended Algorithm

A variant of this idea could be used to allocate bonuses in a company. Let B be the total bonus to be allocated in a team. Here, each team member allocates n × B × w dollars among n colleagues, where the n colleagues are those the individual is qualified to evaluate, w is a weighting factor for the individual’s importance (∑w = 1 for a bonus pool, w = 1/(m-1) in the student example), and m is the total number of people in the bonus pool.
