# Planning> Sequencer Background

## Existing Systems

### Forgetting Curve

The Ebbinghaus model uses the following formula:

$$r = e^{-\frac{\tau}{s}}$$

where $r$ is retention, $\tau$ is time, and $s$ is strength.
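To make the shape of the curve concrete, here is a minimal sketch that assumes the usual exponential form $r = e^{-\tau / s}$. The function name `retention` and the sample values are illustrative assumptions, not part of any existing system.

```python
import math


def retention(tau: float, s: float) -> float:
    """Retention r after time tau for a memory of strength s.

    Assumes the exponential form r = exp(-tau / s); tau and s share units.
    """
    if s <= 0:
        raise ValueError("strength s must be positive")
    return math.exp(-tau / s)


# A stronger memory (larger s) decays more slowly over the same time span.
for strength in (1.0, 5.0):
    curve = [round(retention(t, strength), 3) for t in (0, 1, 5)]
    print(f"s={strength}: {curve}")
```

Retention starts at 1 when no time has passed and falls toward 0; raising $s$ stretches the curve along the time axis.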

### Learning Curve

A complement to the Ebbinghaus Forgetting Curve. A commonly used exponential form is:

$$r = 1 - e^{-\frac{\tau}{s}}$$

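Other models suggest a sigmoid function, such as the CDF of the logistic distribution:

$$F(x) = \frac{1}{1 + e^{-\frac{x - \mu}{s}}}$$

where $\mu$ is the midpoint of the curve and $s$ is its scale.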

### Bayesian Knowledge Tracing

Bayesian Knowledge Tracing estimates, given the pattern of learner responses, how likely it is that a learner knows a skill.

Bayesian Knowledge Tracing is based on Bayes’ Theorem.

$$p(A:B) = \frac{p(A)\,p(B:A)}{p(B)}$$

Here I am using : to mean ‘given’ because ‘pipe’ implies table to Markdown.

We call…

• $p(A:B)$ the posterior – what we believe after seeing the data
• $p(A)$ the prior – what we believe before the data
• $p(B:A)$ the likelihood – how likely the data is, given the hypothesis
• $p(B)$ the normalizer – how likely the data is across all hypotheses

As $p(B)$ can be difficult to formulate directly, the following expansion over every hypothesis $A_i$ is often useful.

$$p(B) = \sum_{i} p(A_i)\,p(B:A_i)$$
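As a small sketch of the update described above, the following computes the posterior for a list of hypotheses, taking the normalizer to be the sum of prior times likelihood over every hypothesis. The function name `posterior` and the example numbers are assumptions made for illustration.

```python
def posterior(priors, likelihoods):
    """Return p(A_i : B) for each hypothesis A_i.

    priors[i] is p(A_i) and likelihoods[i] is p(B : A_i).
    """
    joint = [prior * like for prior, like in zip(priors, likelihoods)]
    normalizer = sum(joint)  # p(B), summed over all hypotheses
    if normalizer == 0:
        raise ValueError("the data is impossible under every hypothesis")
    return [value / normalizer for value in joint]


# Two equally likely hypotheses; the data is far more likely under the first.
print(posterior([0.5, 0.5], [0.9, 0.2]))
```

The returned values always sum to 1, which is exactly the job the normalizer performs.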

For BKT, we have the following factors:

• $p(L)$ - probability the skill is learned
• $p(T)$ - probability the skill will be learned on a particular item
• $p(G)$ - probability the learner will just guess the right answer
• $p(S)$ - probability the learner will mess up even knowing the skill

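For any item, the probability of getting the answer correct is:

$$p(\text{correct}) = p(L)\,(1 - p(S)) + (1 - p(L))\,p(G)$$

Putting this all together, the probability the learner has learned the skill, given a correct answer, is:

$$p(L:\text{correct}) = \frac{p(L)\,(1 - p(S))}{p(L)\,(1 - p(S)) + (1 - p(L))\,p(G)}$$

Conversely, given an incorrect answer:

$$p(L:\text{incorrect}) = \frac{p(L)\,p(S)}{p(L)\,p(S) + (1 - p(L))\,(1 - p(G))}$$

All together, after conditioning on the observed response and allowing for the chance $p(T)$ that the skill is learned on this item:

$$p(L)_{\text{next}} = p(L:\text{observed}) + (1 - p(L:\text{observed}))\,p(T)$$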
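The update can be sketched in a few lines. This assumes the standard BKT formulation: condition $p(L)$ on the response using $p(G)$ and $p(S)$, then add the chance $p(T)$ of learning on this item. The function name `bkt_update` and the parameter values in the example are illustrative assumptions, and the sketch does not guard against degenerate parameters that make the denominator zero.

```python
def bkt_update(p_learned, correct, p_transit, p_guess, p_slip):
    """Return the updated p(L) after observing one response."""
    if correct:
        # Learned and did not slip, versus not learned and guessed.
        evidence = p_learned * (1 - p_slip)
        total = evidence + (1 - p_learned) * p_guess
    else:
        # Learned but slipped, versus not learned and failed to guess.
        evidence = p_learned * p_slip
        total = evidence + (1 - p_learned) * (1 - p_guess)
    conditioned = evidence / total  # p(L : observed response)
    # The learner may also acquire the skill on this item.
    return conditioned + (1 - conditioned) * p_transit


p = 0.4
for answer in (True, True, False):
    p = bkt_update(p, answer, p_transit=0.1, p_guess=0.2, p_slip=0.1)
    print(answer, round(p, 3))
```

A correct answer raises the estimate and an incorrect one lowers it, while $p(T)$ nudges the estimate upward on every attempt.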

### Item Response Theory

Item Response Theory estimates how likely a learner is to answer a particular question correctly. It is described as a logistic function.

• $\theta$ - learner ability
• $b$ - item difficulty
• $a$ - item discrimination; how sharply the item separates learners of different ability
• $c$ - item guess; the probability of answering correctly by guessing

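There are two common forms:

$$p(\theta) = c + \frac{1 - c}{1 + e^{-a(\theta - b)}}$$

$$p(\theta) = c + (1 - c)\,\frac{e^{a(\theta - b)}}{1 + e^{a(\theta - b)}}$$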

The formulas change slightly depending on author.
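As a rough sketch, the three-parameter logistic form $p(\theta) = c + (1 - c) / (1 + e^{-a(\theta - b)})$ can be written as below. The function name `p_correct` and the sample parameter values are assumptions for illustration; some authors add a scaling constant that is omitted here.

```python
import math


def p_correct(theta, a, b, c=0.0):
    """Probability of a correct answer under a three-parameter logistic model.

    theta: learner ability, a: discrimination, b: difficulty, c: guess.
    """
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


# When ability equals difficulty, the result sits halfway between c and 1.
print(p_correct(theta=0.0, a=1.0, b=0.0, c=0.2))
# Higher ability raises the probability; the guess parameter sets the floor.
print(p_correct(theta=2.0, a=1.0, b=0.0, c=0.2))
```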

Item Response Theory can be extended into Performance Factors Analysis, a model that competes with Bayesian Knowledge Tracing.

### Knowledge Space Theory

Knowledge Space Theory represents which skills a learner knows. KST is based on antimatroids.

We assume a learner has either learned a skill or not. Given skills +, -, *, and /, we would form prerequisites, such as:

• + -> -
• + -> *
• * -> /

The knowledge space represents all possible sets of knowledge a learner might have, such as:

• none
• +
• +, -
• +, *
• +, -, *
• +, *, /
• +, -, *, /
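The knowledge states can be enumerated mechanically from the prerequisites. The sketch below is a brute-force illustration only: it keeps every subset of skills in which each known skill has all of its prerequisites. The names `PREREQUISITES` and `knowledge_states` are assumptions, and real systems use far more efficient representations.

```python
from itertools import combinations

SKILLS = ["+", "-", "*", "/"]
# Maps each skill to the skills it requires, e.g. "+ -> -" becomes "-": {"+"}.
PREREQUISITES = {"-": {"+"}, "*": {"+"}, "/": {"*"}}


def knowledge_states(skills, prerequisites):
    """Return every subset of skills that respects the prerequisites."""
    states = []
    for size in range(len(skills) + 1):
        for subset in combinations(skills, size):
            known = set(subset)
            # A state is valid if each known skill's prerequisites are known too.
            if all(prerequisites.get(skill, set()) <= known for skill in known):
                states.append(known)
    return states


for state in knowledge_states(SKILLS, PREREQUISITES):
    print(sorted(state) or "none")
```

Because the union of two valid states is also valid, the set containing +, -, and * appears alongside the sets containing only +, - and only +, *.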

An individual learner has a likelihood for each of these sets. KST makes the assumption that an individual question may inquire about multiple skills. We begin by asking questions that use multiple skills, and work backwards to assess learner knowledge.

Several systems exist to determine prerequisites automatically from learner performance.

### Spaced Repetition

Spaced Repetition suggests that learners retain more by spreading out their practice, with reviews happening less frequently as ability improves.

The most popular algorithm is SuperMemo 2. The first review is after 1 day, and the second review is after 6 days. After that, the next review is:

$$I_n = I_{n-1} \cdot e$$

…where $I_n$ is the interval in days before review $n$ and $e$ is how difficult or easy the item is. $e$ is between 1.3 and 2.5, and it is adjusted from learner responses on a Likert scale, which in turn determines the next time to review.
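The schedule described above can be sketched as follows, assuming each later interval is the previous interval multiplied by $e$. The function name `sm2_intervals` is hypothetical, $e$ is held fixed for simplicity, and the adjustment of $e$ from learner responses is deliberately left out.

```python
def sm2_intervals(e, reviews):
    """Return the gap in days before each of the first `reviews` reviews.

    Assumes a fixed easiness e; a full implementation would adjust e
    after every learner response.
    """
    if not 1.3 <= e <= 2.5:
        raise ValueError("e is expected to be between 1.3 and 2.5")
    intervals = []
    for n in range(1, reviews + 1):
        if n == 1:
            intervals.append(1.0)  # first review after 1 day
        elif n == 2:
            intervals.append(6.0)  # second review after 6 days
        else:
            intervals.append(intervals[-1] * e)  # previous interval times e
    return intervals


print(sm2_intervals(2.0, 5))
print(sm2_intervals(1.3, 5))
```

An easier item (larger $e$) spreads its reviews out much faster than a harder one.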

Later versions of SuperMemo include other considerations, such as:

• Similar cards
• Previous iteration duration
• Ebbinghaus forgetting curve

The latest is version 11/15.

## Distribution Types

For each distribution, I am most interested in $\mu$, or the mean of the distribution, and $\sigma^2$, or the variance, which can determine how confident we can be in our assertion.

### Beta Distribution

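Beta distributions model probabilities from 0 to 1, where $\alpha$ is the count of positive examples and $\beta$ is the count of negative examples. Computation is fairly straightforward for most statistics.

$$p(x) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}$$

$$\mu = \frac{\alpha}{\alpha + \beta}$$

$$\sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$$

…where $\mu$ is the mean, $\sigma^2$ is the variance, and $B(\alpha, \beta)$ is the beta function.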
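A short sketch of those two statistics, using the standard closed forms $\mu = \alpha / (\alpha + \beta)$ and $\sigma^2 = \alpha\beta / ((\alpha + \beta)^2 (\alpha + \beta + 1))$. The function name `beta_mean_variance` and the counts in the example are illustrative assumptions.

```python
def beta_mean_variance(alpha, beta):
    """Return (mean, variance) of a Beta(alpha, beta) distribution."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, variance


# Same proportion of positive examples, but more data gives less variance.
print(beta_mean_variance(8, 2))
print(beta_mean_variance(80, 20))
```

The mean stays at 0.8 in both cases while the variance shrinks, which is why the variance is a useful proxy for confidence.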

### Exponential Distribution

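The exponential distribution is often used to describe the time between events that occur at a constant average rate $\lambda$.

$$p(x) = \lambda e^{-\lambda x}$$

$$\mu = \frac{1}{\lambda}$$

$$\sigma^2 = \frac{1}{\lambda^2}$$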

### Normal Distribution

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

…where $\mu$ and $\sigma$ are provided. $\mu$ and $\sigma$ can be determined from a sample by using a Gaussian kernel.

### Pareto Distribution

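The Pareto Distribution uses $\alpha$ for scale and $\beta$ for shape.

$$p(x) = \frac{\beta\,\alpha^{\beta}}{x^{\beta + 1}} \quad \text{for } x \ge \alpha$$

$$\mu = \frac{\alpha\beta}{\beta - 1} \quad \text{for } \beta > 1$$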

### Poisson Distribution

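The Poisson distribution is often used for counting events that occur at an average rate $\lambda$ over a fixed interval.

$$p(k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

$$\mu = \sigma^2 = \lambda$$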

### Binomial Distribution

Binomial distributions count the number of successes $k$ in $n$ independent events, each with a success probability of $p$.

$$p(k) = \binom{n}{k}\,p^k (1 - p)^{n - k}$$

$$\mu = np$$

$$\sigma^2 = np(1 - p)$$
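For a quick numerical check, here is a minimal sketch of the binomial probability of exactly $k$ successes in $n$ events, using the standard form $\binom{n}{k} p^k (1 - p)^{n - k}$. The function name `binomial_pmf` and the example values are assumptions for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n events, each with probability p."""
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Probabilities over every possible k sum to 1, and the mean is n * p.
n, p = 4, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(pmf)
print(sum(k * prob for k, prob in enumerate(pmf)))
```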

### Bernoulli Distribution

The Bernoulli Distribution has only two hypotheses: 0 or 1. The mean is the probability of 1.

$$\mu = p$$

$$\sigma^2 = p(1 - p)$$