
Sagefy Docs
Sequencer Background
Existing Systems
Forgetting Curve
The Ebbinghaus model uses the following formula:

$$R = e^{-t/s}$$

where $R$ is retention, $t$ is time, and $s$ is strength.
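As a minimal sketch, assuming the exponential form $R = e^{-t/s}$, retention could be computed as follows. The function name `retention` and the use of days as the time unit are illustrative assumptions, not part of Sagefy.

```python
import math


def retention(time, strength):
    """Estimate retention, assuming the form R = exp(-t / s)."""
    if strength <= 0:
        raise ValueError("strength must be positive")
    return math.exp(-time / strength)


# Retention decays as time passes; a larger strength slows the decay.
for days in (0, 1, 7):
    print(days, round(retention(days, strength=5.0), 3))
```

Time and strength must share the same unit for the ratio to be meaningful.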
Learning Curve
A complement to the Ebbinghaus Forgetting Curve:

$$p = 1 - e^{-t/s}$$

where $p$ is the proportion learned, $t$ is time, and $s$ is strength.
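Other models suggest a sigmoid function, such as the CDF of the logistic distribution:

$$F(x) = \frac{1}{1 + e^{-(x - \mu)/s}}$$

where $\mu$ is the location and $s$ is the scale.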
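A small sketch of the sigmoid model, assuming the standard logistic CDF with location `mu` and scale `s`. Both parameter names and their defaults are assumptions chosen for illustration.

```python
import math


def logistic_cdf(x, mu=0.0, s=1.0):
    """CDF of the logistic distribution with location mu and scale s."""
    return 1.0 / (1.0 + math.exp(-(x - mu) / s))


# The curve passes through 0.5 at x == mu and flattens out toward 0 and 1.
for x in (-4, 0, 4):
    print(x, round(logistic_cdf(x), 3))
```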
Bayesian Knowledge Tracing
Bayesian Knowledge Tracing estimates, given the pattern of a learner's responses, how likely it is that the learner knows a skill.
Bayesian Knowledge Tracing is based on Bayes’ Theorem:

$$P(A : B) = \frac{P(B : A) \, P(A)}{P(B)}$$

Here I am using ‘:’ to mean ‘given’, because a ‘pipe’ implies a table in Markdown.
We call…
 $P(A : B)$ the posterior – what we believe after seeing the data
 $P(A)$ the prior – what we believe before the data
 $P(B : A)$ the likelihood – how likely the data is given our hypothesis
 $P(B)$ the normalizer – how likely the data is given all hypotheses
As $P(B)$ can be difficult to formulate, the following expression is often useful:

$$P(B) = \sum_{i} P(B : A_i) \, P(A_i)$$
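To make the role of the normalizer concrete, here is a minimal sketch of Bayes' Theorem over a set of mutually exclusive hypotheses. The function name `posterior`, the hypothesis labels, and the numbers are all hypothetical.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem over mutually exclusive hypotheses.

    priors[h] is the prior belief in hypothesis h;
    likelihoods[h] is the probability of the data given h.
    """
    # The normalizer sums the likelihood of the data over all hypotheses.
    normalizer = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / normalizer for h in priors}


# Hypothetical numbers: does the learner know the skill, given a correct answer?
priors = {"knows": 0.4, "does_not_know": 0.6}
likelihoods = {"knows": 0.9, "does_not_know": 0.2}
result = posterior(priors, likelihoods)
print(result)
```

Because the normalizer is built from the same priors and likelihoods, the posteriors always sum to 1.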
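For BKT, we have the following factors:
 $P(L)$ is the probability the skill is learned
 $P(T)$ is the probability the skill will be learned on a particular item
 $P(G)$ is the probability the learner will just guess the right answer
 $P(S)$ is the probability the learner will mess up even knowing the skill
For any item, the probability of getting the answer correct, $P(C)$, is:

$$P(C) = P(L) \, (1 - P(S)) + (1 - P(L)) \, P(G)$$

Putting this all together, the probability the learner has learned the skill, given a correct answer, is:

$$P(L : C) = \frac{P(L) \, (1 - P(S))}{P(L) \, (1 - P(S)) + (1 - P(L)) \, P(G)}$$

Conversely, given an incorrect answer:

$$P(L : \neg C) = \frac{P(L) \, P(S)}{P(L) \, P(S) + (1 - P(L)) \, (1 - P(G))}$$

All together, allowing for the chance that the skill is learned on this item:

$$P(L_{\text{next}}) = P(L : \text{answer}) + (1 - P(L : \text{answer})) \, P(T)$$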
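The update can be sketched in a few lines of Python. This assumes the standard BKT formulation (a Bayesian update on the response, followed by the chance of learning on the item). The function and argument names are illustrative, and all probabilities are assumed to lie strictly between 0 and 1.

```python
def probability_correct(learned, guess, slip):
    """Chance of a correct answer: knows and does not slip, or guesses."""
    return learned * (1 - slip) + (1 - learned) * guess


def update_learned(learned, correct, guess, slip, transit):
    """Return the updated probability that the skill is learned."""
    if correct:
        evidence = learned * (1 - slip)
        posterior = evidence / (evidence + (1 - learned) * guess)
    else:
        evidence = learned * slip
        posterior = evidence / (evidence + (1 - learned) * (1 - guess))
    # Even if the skill was not known, it may have been learned on this item.
    return posterior + (1 - posterior) * transit


# Hypothetical parameters, tracing three responses from one learner.
learned = 0.5
for correct in (True, True, False):
    learned = update_learned(learned, correct, guess=0.2, slip=0.1, transit=0.1)
    print(correct, round(learned, 3))
```

Correct answers push the estimate up and incorrect answers pull it down, while the transit term keeps it from staying at a low value indefinitely.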
Item Response Theory
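Item Response Theory estimates how likely a learner is to correctly answer a particular question. It is described as a logistic function.
The parameters read:
 $\theta$ is learner ability
 $b$ is item difficulty
 $a$ is item discrimination; how sharply the item distinguishes between ability levels
 $c$ is item guess
There are two common forms:

$$P(\theta) = c + (1 - c) \, \frac{e^{a(\theta - b)}}{1 + e^{a(\theta - b)}}$$

$$P(\theta) = c + \frac{1 - c}{1 + e^{-a(\theta - b)}}$$

The formulas change slightly depending on author.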
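As a rough sketch, assuming the three-parameter logistic form of Item Response Theory (ability, difficulty, discrimination, and guess), the probability of a correct answer could be computed like this. The function name and defaults are assumptions for illustration.

```python
import math


def probability_correct_irt(ability, difficulty, discrimination=1.0, guess=0.0):
    """Chance of a correct answer under a three-parameter logistic model."""
    logistic = 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))
    # The guess parameter sets a floor: even a weak learner can get lucky.
    return guess + (1.0 - guess) * logistic


# A learner whose ability matches the difficulty of a four-option question.
print(probability_correct_irt(ability=0.0, difficulty=0.0, guess=0.25))
```

With `guess=0.0` and `discrimination=1.0` this reduces to the simpler one-parameter form, where only ability and difficulty matter.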
Item Response Theory can be extended into Performance Factors Analysis, a model that competes with Bayesian Knowledge Tracing.
Knowledge Space Theory
Knowledge Space Theory represents which skills a learner knows. KST is based on antimatroids.
We assume a learner has either learned a skill or not. Given skills +, -, *, and /, we would form prerequisites, such as:
+ > -
+ > *
* > /
The knowledge space represents all possible sets of knowledge a learner might have, such as:
 none
 +
 +, -
 +, *
 +, *, /
 +, -, *, /
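The knowledge space can be enumerated by brute force. The sketch below assumes the four arithmetic skills +, -, *, / and the prerequisites + before -, + before *, and * before /. The names `SKILLS`, `PREREQUISITES`, and `knowledge_space` are hypothetical, and checking every subset is only practical for a small number of skills.

```python
from itertools import combinations

SKILLS = ["+", "-", "*", "/"]
# Each pair (before, after) means `before` must be learned prior to `after`.
PREREQUISITES = [("+", "-"), ("+", "*"), ("*", "/")]


def is_valid_state(state):
    """A state is valid if every skill in it has its prerequisites in it too."""
    return all(before in state for before, after in PREREQUISITES if after in state)


def knowledge_space():
    """List every set of skills that is consistent with the prerequisites."""
    states = []
    for size in range(len(SKILLS) + 1):
        for combo in combinations(SKILLS, size):
            if is_valid_state(set(combo)):
                states.append(combo)
    return states


for state in knowledge_space():
    print(", ".join(state) or "none")
```

Any set that contains a skill without its prerequisite, such as - without +, is excluded.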
An individual learner has a likelihood for each of these sets. KST makes the assumption that an individual question may inquire about multiple skills. We begin by asking questions that use multiple skills, and work backwards to assess learner knowledge.
Several systems exist that automatically determine prerequisites based on learner performance.
Spaced Repetition
Spaced Repetition suggests that learners learn more effectively by spreading out their practice, with reviews happening less frequently as ability improves.
The most popular algorithm is SuperMemo 2. The first review is after 1 day, and the second review is after 6 days. After that, the next review is:

$$I(n) = I(n - 1) \cdot EF$$

…where $I(n)$ is the interval in days before review $n$, and $EF$ is how difficult or easy the item is. $EF$ is between 1.3 and 2.5, and it uses learner responses on a Likert scale to determine the next time to review.
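A minimal sketch of this schedule, assuming the published SuperMemo 2 rules: a response quality from 0 to 5 adjusts the easiness factor, and each later interval is the previous interval times that factor. The function names are illustrative, and clamping the factor to 1.3 through 2.5 follows the range stated above rather than any particular implementation.

```python
def next_easiness(easiness, quality):
    """Update the easiness factor from a 0-5 response quality (SM-2 style)."""
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    # Clamp to the 1.3-2.5 range described in the text.
    return min(2.5, max(1.3, easiness + delta))


def next_interval(repetition, previous_interval, easiness):
    """Days until the next review for the given 1-based repetition number."""
    if repetition == 1:
        return 1
    if repetition == 2:
        return 6
    return round(previous_interval * easiness)


# Trace the first four reviews of an item the learner finds easy.
interval, easiness = 0, 2.5
for repetition in range(1, 5):
    interval = next_interval(repetition, interval, easiness)
    print(repetition, interval)
```

A harder item has a lower easiness factor, so its intervals grow more slowly.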
Later versions of SuperMemo include other considerations, such as:
 Similar cards
 Previous iteration duration
 Ebbinghaus forgetting curve
The latest is version 11/15.
Distribution Types
For each distribution, I am most interested in $\mu$, or the mean of the distribution, and $\sigma^2$, or the variance, which can determine how confident we can be in our assertion.
Beta Distribution
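Beta distributions map probabilities from 0 to 1, where $\alpha$ is the count of positive examples and $\beta$ is the count of negative examples. Computation is fairly straightforward for most statistics.

$$\mu = \frac{\alpha}{\alpha + \beta}$$

$$\sigma^2 = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$$

…where $\mu$ is the mean and $\sigma^2$ is the variance.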
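The standard closed forms for the Beta mean and variance are short enough to sketch directly. The function names are illustrative; `alpha` and `beta` stand for the positive and negative counts and are assumed to be greater than zero.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta distribution."""
    total = alpha + beta
    return (alpha * beta) / (total ** 2 * (total + 1))


# Same ratio of positive to negative examples, but ten times the evidence.
print(beta_mean(8, 2), beta_variance(8, 2))
print(beta_mean(80, 20), beta_variance(80, 20))
```

The mean stays the same in both cases, while the variance shrinks as more examples arrive, which is what makes it useful as a measure of confidence.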
Exponential Distribution
The exponential distribution is often used to describe the time between events that occur at a constant average rate.
Normal Distribution
$$f(x) = \frac{1}{\sigma \sqrt{2 \pi}} \, e^{-\frac{(x - \mu)^2}{2 \sigma^2}}$$

…where $\mu$ and $\sigma$ are provided. $\mu$ and $\sigma$ can be determined from a sample by using a Gaussian kernel.
Pareto Distribution
The Pareto Distribution uses $x_m$ for scale and $\alpha$ for shape.
Poisson Distribution
The Poisson distribution is often used for counting events.
Binomial Distribution
Binomial distributions count the number of successes $k$ in $n$ events, each with a probability of $p$.
Bernoulli Distribution
The Bernoulli Distribution only has two hypotheses: 0 or 1. The mean is the probability of 1.
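Since the mean and variance are the quantities of interest for each distribution, here is a short sketch of the standard closed forms for the exponential, Poisson, binomial, and Bernoulli cases. The function names and the parameter names `rate`, `n`, and `p` are assumptions for illustration.

```python
def exponential_stats(rate):
    """(mean, variance) for an exponential distribution with the given rate."""
    return 1 / rate, 1 / rate ** 2


def poisson_stats(rate):
    """(mean, variance) for a Poisson distribution; both equal the rate."""
    return rate, rate


def binomial_stats(n, p):
    """(mean, variance) for n trials, each succeeding with probability p."""
    return n * p, n * p * (1 - p)


def bernoulli_stats(p):
    """(mean, variance) for a single trial with success probability p."""
    return p, p * (1 - p)


print(exponential_stats(2.0))
print(poisson_stats(3.0))
print(binomial_stats(10, 0.5))
print(bernoulli_stats(0.5))
```

A Bernoulli trial is the binomial case with a single trial, so `bernoulli_stats(p)` matches `binomial_stats(1, p)`.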