Thursday, October 02, 2014

10 things that algorithms can do that teachers can't

Algorithms trump data
Love them or hate them, you use them, and algorithms are here to stay. Algorithmic power drives Google, Facebook, Amazon, Netflix and many other online services, including many more professional services you use, such as communications, finance, health, transport and so on. There is some confusion here, as data is being touted as the next big thing, but data is dead in the water if it is not interpreted and then used to change or do something. If data is the new oil, algorithmic power is the new turbo-charged engine. Another important factor, one that has led to the renewed efficacy of algorithms, is the internet. To take our transport metaphor further, if algorithms are the new rockets, the internet is the new rocket fuel, supplying an endless stream of big data to go where no man has gone before....  Houston - these metaphors are starting to break up!
What role for algorithms in learning?
We are now in the Age of Algorithms and so far, the most promising use of educational data is through algorithms. Yet algorithms are faceless and anonymous, hidden from view. As users, we rarely know what role they play in our lives, if we're even aware of their agency at all. Like icebergs, their power lies hidden beneath the surface, with only a user interface visible above the waterline. So let’s make them a little more visible.
There are many species of algorithm in learning, with a full 5-Level taxonomy here. First there are algorithms embedded in the tech we use - mobiles, laptops, VR and so on; then there are assistive algorithms, such as Google search, that help us find things; then analytics, where we try to predict and improve things from data sets. Beyond this are hybrid adaptive systems that help teachers and organisations learn. It’s like using a satnav in your car. It knows where you’ve come from, where you’re going and how to get you back when you go off course. It may even know when you need a rest and whether you’re comfortable driving on the motorway or would be better routed along other roads. Satnavs are massively algorithmic, and personalised, as is adaptive, algorithmic teaching.
In a learning journey, something similar can be implemented, where ensembles of algorithms analyse data about the student and the content, leading to real-time improvements in both. Note that they can do this in real time and also learn as they go, matching the most appropriate content to the student at any given time. This can lead to quicker course completion, lower drop-out rates, higher attainment and lower costs. Finally, there are fully autonomous learning systems, like autonomous cars, where the learner learns without the aid of a teacher.
How do they work?
(You can skip this if you have no interest in the background maths.)
2500 years of algorithms
Euclid was the first to formally write down an algorithm, with Aristotle formalising syllogistic logic. But it is the Arab mathematician Al-Khwarizmi who gave us the word algorithm, through the Latinised form of his name. We then have logicians like Boole and Frege, alongside probability theorists such as Pascal, Fermat, Laplace, Cardano, Bernoulli and Bayes. Algorithmic thinking and AI have not sprung up out of nowhere; they have had a two and a half millennia gestation period.
Bayes' theorem
In 1763 a posthumously published essay by the Reverend Thomas Bayes presented a single theorem for updating a probability in the light of new evidence. This gives you the ability to keep updating probabilities as new evidence and new predictors arrive, folding them all into a single revised probability. In learning, this allows an algorithmic system to keep updating its predictions and recommendations for students, and its configuration of content, over time. Interestingly, this often reduces the probability, as intuition, distorted by cognitive bias, tends to exaggerate probabilities through inadequate analysis.
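As a rough illustration (not drawn from any particular adaptive product), here is a minimal sketch of Bayes' rule applied to a learner: we hold a prior belief that a skill has been mastered and revise it after each observed answer. The likelihood values are invented for the example.

```python
def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Return the posterior P(hypothesis | evidence) via Bayes' rule."""
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return (prior * likelihood_if_true) / evidence

# Prior belief that the learner has mastered the skill.
p_mastered = 0.5

# Illustrative likelihoods: P(correct | mastered) and P(correct | not mastered).
P_CORRECT_IF_MASTERED = 0.9
P_CORRECT_IF_NOT = 0.3

# Observed sequence of answers: True = correct, False = incorrect.
for correct in [True, True, False, True]:
    if correct:
        p_mastered = bayes_update(p_mastered, P_CORRECT_IF_MASTERED, P_CORRECT_IF_NOT)
    else:
        p_mastered = bayes_update(p_mastered, 1 - P_CORRECT_IF_MASTERED, 1 - P_CORRECT_IF_NOT)
    print(f"Updated P(mastered) = {p_mastered:.2f}")
```

Each correct answer nudges the belief up, each incorrect answer nudges it down, and the result feeds the next recommendation.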
In addition to Bayesian data analysis there is the Bayesian network. This is a model with ‘known’ and ‘unknown’ probabilities drawn from, say, student data, behaviour and performance. The network has nodes holding variables (known and unknown), and algorithms can both make decisions and even learn within these networks. It is basically the application of Bayes' theorem to complex problems, such as finding the optimal path for personalised learning. The network can then recommend the optimal content going forward.
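To make the idea concrete, here is a hedged, toy sketch in which the nodes and numbers are invented: a prerequisite influences mastery of a topic, which in turn influences whether an answer is correct. We infer the probability of mastery by enumerating the hidden variables and weighting by the observed evidence.

```python
from itertools import product

# Invented conditional probability tables for a toy learner network:
# Prerequisite (P) -> Mastery (M) -> Correct answer (C).
p_prereq = 0.6                        # P(P = True)
p_mastery = {True: 0.8, False: 0.3}   # P(M = True | P)
p_correct = {True: 0.9, False: 0.2}   # P(C = True | M)

def joint(prereq, mastery, correct):
    """Joint probability of one full assignment of the three nodes."""
    pp = p_prereq if prereq else 1 - p_prereq
    pm = p_mastery[prereq] if mastery else 1 - p_mastery[prereq]
    pc = p_correct[mastery] if correct else 1 - p_correct[mastery]
    return pp * pm * pc

# Observe a correct answer; infer P(Mastery | Correct = True) by enumeration.
numerator = sum(joint(p, True, True) for p in (True, False))
evidence = sum(joint(p, m, True) for p, m in product((True, False), repeat=2))
print(f"P(mastered | correct) = {numerator / evidence:.2f}")
```

Real networks have many more nodes and use smarter inference than brute-force enumeration, but the principle is the same.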
Enter another important name, Andrey Markov, a Russian mathematician who introduced the Markov network. Whereas a Bayesian network is directed and acyclic, a Markov network is undirected and can be cyclic. Markov models can be used to determine what the learner gets next as they work through a course, based on previous behaviours. You may be unaware, for example, that major providers already use these techniques to present you with a web page that differs from the one shown to others.
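A simple sketch of the Markov idea, with made-up states and transition probabilities: the learner's next activity depends only on their current one, so a transition table estimated from past behaviour can suggest the most likely next step.

```python
# Hypothetical transition probabilities between learning activities,
# which a real system would estimate from past learner behaviour.
transitions = {
    "video":          {"quiz": 0.6, "worked_example": 0.3, "video": 0.1},
    "quiz":           {"worked_example": 0.5, "video": 0.3, "quiz": 0.2},
    "worked_example": {"quiz": 0.7, "video": 0.2, "worked_example": 0.1},
}

def most_likely_next(current_state):
    """Recommend the activity with the highest transition probability."""
    options = transitions[current_state]
    return max(options, key=options.get)

print(most_likely_next("video"))   # -> "quiz"
print(most_likely_next("quiz"))    # -> "worked_example"
```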
Quite separately, Corbett introduced a Bayesian knowledge-tracing algorithm directly into the learning field. It is more closely associated with data mining from, for example, learning management systems, which produce large amounts of data about learner behaviour. This can be used to come to a conclusion and make a decision about what is needed next. Note that all of these approaches (and there are many more) are very different from rule-based adaptive systems. The difference between these systems is explained well in this paper by Jim Thompson.
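The core knowledge-tracing update can be sketched in a few lines. The four parameters below (initial knowledge, learning rate, guess and slip) are the conventional ones in Bayesian knowledge tracing; the values here are invented for illustration.

```python
def bkt_update(p_know, correct, p_learn=0.2, p_guess=0.25, p_slip=0.1):
    """One step of Bayesian knowledge tracing: revise P(skill known) after an answer."""
    if correct:
        posterior = (p_know * (1 - p_slip)) / (
            p_know * (1 - p_slip) + (1 - p_know) * p_guess)
    else:
        posterior = (p_know * p_slip) / (
            p_know * p_slip + (1 - p_know) * (1 - p_guess))
    # Account for the chance the learner picked up the skill on this attempt.
    return posterior + (1 - posterior) * p_learn

p_know = 0.3  # initial estimate that the skill is already known
for correct in [False, True, True, True]:
    p_know = bkt_update(p_know, correct)
    print(f"P(skill known) = {p_know:.2f}")
```

Run over every skill and every attempt mined from an LMS, estimates like these are what drive the decision about what the learner sees next.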
We should note that this field has 250 years of mathematical thinking behind it and an enormous amount of mathematical complexity. Nevertheless, having borne fruit in other online contexts, there is every reason to think it will bear fruit in learning. Learning systems can, through algorithms, embody evidence-based learning theory to increase the productivity of teaching. But what really drives algorithmic, adaptive learning are the advantages it affords the learner:
1. Gender, race, colour, accent, social background
Algorithms are blind to the sort of social biases (gender, race, colour, age, ethnicity, religion, accent, social background) we commonly see, not only in society through sexism, racism and snobbery, but also in teaching, where social biases are not uncommon. In education it is useful to distinguish between subtle and blatant biases, as a teacher may be perceived as unbiased yet be unaware of their own biases. We know, for example, that gender bias has a strong effect on subject choice and that both gender and race affect teacher feedback. Algorithms can be free of such social biases.
2. Free from cognitive biases
Cognitive biases around ability versus effort, made clear by the likes of Carol Dweck's work on fixed versus growth mindsets, clearly affect teacher and learner behaviour, leading to self-fulfilling predictions about student attainment. Considerable bias in marking and grading has also been evidenced. There may also be ingrained theories and practices that are out of date and now disproven, such as learning styles, that heavily influence teaching. Algorithms, built on sound theory and practice, can, over time and based on actual evidence, work to eliminate such biases.
3. Never get tired, ill, irritable or disillusioned
To teach is human and teacher performance is variable. That is not a criticism of teachers but an observation about human nature and behaviour. Algorithmic behaviour is only variable in the sense that it uses variables. Algorithms are at the top of their game (albeit a limited one) 24/7/365. Of course, one could argue that the affective, emotional side of learning is not always provided by algorithmic learning. That is true, but good design can ensure that it is a feature of delivery. Sentiment or emotion analysis by machine learning is making good progress. So even here, algorithmic techniques around gesture recognition, attention and emotion are being researched and built.
4. Algorithms can do things that brains cannot
It seems like a bold claim, but the number of variables and the sheer formulaic power of an ensemble of algorithms is, in many areas, well beyond the capability of the brain. In addition, the data feeds and data mining opportunities, as well as consistent and correct delivery of content, may also be beyond the capability of many teachers. The problem is that most teaching is not one-to-one, and therefore those tacit skills are difficult to apply to whole classes of learners, the norm in educational and training institutions. For the moment there are many tacit skills in teachers that algorithms have not captured. That has to be recognised, but it is not a reason for stopping, only a reason for driving forward. We will see ever more sophisticated analysis of cognitive behaviour, where the sheer number of cognitive misconceptions and problems cannot be identified by a teacher but can be by careful AI analysis.
5. Personalises the speed of learning
A group of learners can be represented by a distribution curve. Yet suppose we use a system that is sensitive not just to the bulk of learners but also to the leading and trailing tails? Algorithms treat each learner as an individual and personalise the learning journey for that learner. You are, essentially, streaming into streams of one. The consequence is the right route for each individual, which leads to learning at the speed of their ability at any given time. The promise is that learners get through courses quicker. More than this, Bloom, in his famous 2-sigma paper, showed the significant advantage of one-to-one teaching over other forms of instruction. We now have the opportunity to deliver on this researched promise. We already have evidence that this can be achieved at scale.
6. Prevents catastrophic failure and drop-out
Slower learners do not get left behind in adaptive, AI-driven systems or suffer catastrophic failure, often in a final summative exam when it is too late, because the system brings them along at a speed that suits them. The UK university system has a 16% drop-out rate, and this is much higher in the US. In schools a considerable number of students fail to achieve even modest levels of attainment. This approach can lower drop-out, something that has critical personal, social and financial consequences.
7. Personal reporting
Such systems can produce reports that really do match personal attainment, through personal feedback for the learner that informs their motivation and progress through a course. Rather than standard feedback and remedial loops, the learner can feel as though they really are being tutored, as the feedback is detailed and the learning journey finessed to their personal needs. Teachers also have a lot to gain from the feedback such systems produce. Early evidence suggests that good teachers, in combination with such systems, produce great results.
8. They learn
Teachers need to learn, though many would question the efficacy of INSET days or current models of rushed or absent CPD. Algorithmic systems also learn. It is a mathematical feature of machine learning that the system gets better the more students take the course. We must be careful about exaggerated claims in this area, but it is an area of intense research and development. We are now at a level where adaptive systems themselves adapt as more and more students go through the system. It is this ability to constantly and relentlessly learn and improve that may ultimately take AI beyond the ability of teachers to constantly adapt.
9. Course improvement
Courses are often repeated without a great deal of reflection on their weaknesses, even their inaccuracies. Many studies of textbooks have shown that they are strewn with mistakes. The same is true of exams and high-stakes assessment. Adaptive, algorithmic systems can be designed to automatically identify erroneous questions, weak spots, good resources, even optimal paths through a network of learning possibilities. They may even be able to identify cheating; we have seen many examples of data analysis being used to identify teacher and student cheating. One further possibility is in courses that are semi-porous, where learners use an external resource, say a Wikipedia page or video, and find it useful, thereby raising its ranking in the network of available options for future learners. This is true of systems like WildFire.
10. Massively scalable
Humans are not scalable but algorithms are massively scalable. We have already seen how Google, Facebook, Amazon, Netflix, retailers and many other services use algorithmic power to help you make better decisions, and these operate at the level of billions of users. In other words, there is no real limit to their scalability. If we can apply that personalisation of learning on a massive scale, education could break free of its heavy cost burden.
Conclusion
The algorithmic, adaptive approach to learning promises to provide things that live teachers cannot and could never deliver. All of the above is being realised through organisations like CogBooks, who have built adaptive, algorithmic systems. This is important, as we cannot get fixated on the oft-repeated mantra that face-to-face teaching is always a necessary condition for learning - it is not. Neither should we simply stop at seeing technology as merely something to be used by a teacher in a classroom. It can be that, but it can be more. This approach to technology-based learning could be a massive breakthrough in terms of learning outcomes for millions of learners. It already operates in the learning sphere, through search, perhaps the most profound pedagogic change we have seen in the last century. For me, it is only a matter of when it will be used in more formal learning environments.

2 comments:

Frances Bell said...

Really interesting and useful article, and glad to see on Twitter that you will write an article on this. I hope you will also cover the things that algorithms aren't good for - or perhaps the important aspects of learning that don't generate data that can be fed into algorithms.
Noted Seb's comments about learner agency too. I wonder if algorithms might get stuck with early assumptions like 'learning styles' did. A reluctance to change basic assumptions because it mucks up data already collected.

Donald Clark said...

Agree Frances. Lots of things that algorithms don't do well. And as algorithms are designed by humans they can capture bad ideas and processes. Indeed, some systems already capture 'learning styles' in adaptive learning. Will be writing a downside piece soon.