28 Feb 2018

Problems with Deep Learning

I recently replied to a conversation on the /r/Android subreddit (of all places) about deep learning. The main question was whether deep learning (DL) is the future of artificial intelligence or a popular fad that is reaching its saturation point. Below is the post in its entirety (with a little initial exposition) to explain my point. I also highly recommend scrolling to the bottom and checking out the Further Reading section to see the conversations between Gary Marcus and Tom Dietterich, which explore the debate in more depth.

Introduction

To be clear, I see the value of deep learning in the domains it currently dominates. But I remain skeptical that it is going to bring us to a new realm of artificial intelligence.

First, know that the goal of deep learning - very broadly - is to learn a model from data. A model is just a close approximation of the world, good enough that missing data can be predicted from it: an obscured object in an image, say, or the next frame of a video. There are still plenty of applications and opportunities for deep learning now and in the future. But this post is mainly a reply to people who believe deep learning will be the key to huge advancements in AI that consistently beat other systems, and perhaps even unlock artificial general intelligence (AGI).
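
To make that concrete, here is a minimal sketch of what "learn a model from data so that missing data can be predicted" means. It uses plain NumPy and a least-squares predictor as a stand-in for a deep network, on a toy signal where the "missing information" is simply the next value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "world": a noisy sine wave. The model's job is to predict the next value
# (think of it as the next frame of a very boring video).
t = np.linspace(0, 10, 200)
signal = np.sin(t) + rng.normal(0, 0.05, t.shape)

# Build (history window -> next value) training pairs.
window = 5
X = np.stack([signal[i:i + window] for i in range(len(signal) - window)])
y = signal[window:]

# A least-squares linear predictor stands in for the deep net.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Given a partial state (the last few observations), fill in the missing value.
print("predicted next value:", X[-1] @ w, "actual:", y[-1])
```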

Hardware Limitations

First and foremost, the surge in popularity and in state-of-the-art results is largely driven by advances in hardware, which make the underlying computations faster and make it easier to work with the large amounts of data required for training. This is great: companies like DeepMind can draw on Google's massive number of GPUs to run millions of games of Go. There is a limit, though, since Moore's Law is starting to hit its saturation point (as predicted by Moore himself). Eventually DL will be capitalizing on the most computing power it feasibly can, and won't be able to progress much further except through more training. That still leaves plenty of opportunity for DL to grow, but probably not enough to reach real AGI with this technique alone. To be fair, this is a problem across all of AI.

Inspired Models

There is not much of an existing model to guide further improvements to DL. Yes, a lot of people believe artificial neural networks are biologically inspired, and that's partially true. But I've spoken directly to people who do bio-inspired AI, and they consider ANNs the most basic possible take on bio-inspired design. It's like trying to reproduce 5-star restaurant cooking with a pan and a campfire: the right general idea, but completely missing the details that make it work.

Unfortunately, deep learning has only diverged from that biological inspiration. Many of the modifications necessary to get the performance we're seeing are not biologically inspired at all (everything from Geoffrey Hinton's dropout to DeepMind's Monte Carlo techniques). This means we have no evidence of, or insight into, how to keep progressing towards human-level AGI beyond what we already do in other fields: try different techniques until something seems to work. We always try to relate the results back to human logic in an ad-hoc way, but there's no clear indication that this path actually leads towards AGI.
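
Dropout is a good example of what I mean by an engineering modification rather than a biological one. Here is a minimal sketch of (inverted) dropout; the rescaling by 1/(1-p) exists to keep the expected activation unchanged, which is a statistical concern, not a biological one.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Inverted dropout: randomly zero units during training, rescale the rest.

    An engineering regularizer chosen because it works, not something copied
    from neuroscience.
    """
    if not training:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p
    return activations * mask / (1.0 - p)

h = np.ones(10)
print(dropout(h, p=0.5))  # roughly half the units zeroed, survivors scaled by 2
```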

Symbolic Reasoning

There has been a push more recently to highlight possible methods. As it stands, state-of-the-art deep learning is purely predictive: learn a model of the world, pop in a partial state, get back the missing information. The system doesn't understand why or how things are the way they are, just that they are. That is great for beating the world's best Go player, but not for understanding how it beat the world's best Go player. Again, not inherently bad, but it won't be terribly useful as we approach harder problems if the agent can't understand the reasoning behind its actions, can't self-improve beyond optimizing against some fixed objective, and can't relay the improvements it makes.

You can especially see the problems with lacking symbolic reasoning when you realize the ANN has no sense of correlation versus causation, and no understanding of the objects or concepts it interacts with. It doesn't know why a shortcut is interesting or why it might be problematic down the line; it just knows that the shortcut seems to do what it wants faster, and it will handle future issues when it gets to them.
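
A toy illustration of the correlation-versus-causation point (the setup is entirely hypothetical, with a bare logistic-regression learner standing in for an ANN): when two features are perfectly correlated in the training data, the learner happily spreads its weight across both, and it will fire on the coincidental feature even when the actual cause is absent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: x1 actually causes the label, while x2 just happens
# to be perfectly correlated with x1 in the training set.
n = 1000
x1 = rng.integers(0, 2, n).astype(float)   # causal feature
x2 = x1.copy()                             # spurious, coincidental feature
X = np.stack([x1, x2], axis=1)
y = x1.copy()                              # the label is determined by x1 alone

# A bare logistic-regression learner stands in for a deep net.
w = np.zeros(2)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n

print("learned weights:", w.round(2))      # weight is split evenly across both features

def predict(x):
    return 1.0 / (1.0 + np.exp(-x @ w))

print(predict(np.array([1.0, 0.0])))  # cause present, cue absent: less confident than in training
print(predict(np.array([0.0, 1.0])))  # cue present, cause absent: still confidently positive
```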

What we would want is to use those abilities to work out the reasoning behind decisions and predictions. AlphaGo makes a bizarre move that wins a game, and we want to ask it why it did that. To answer, it needs some form of deductive reasoning (even if it is applied retroactively to its decisions) layered on top of its model of the world. Entire fields exist to develop architectures like this (case-based reasoning, expert systems, etc.), many people try to apply symbol learning to systems that are not inherently symbolic (reinforcement learning, planning), and deep learning is heading towards the same fate.

Finding the Correct Architecture

This is apparent in so many places in AI, especially in what I work with (abstractions and hierarchies). A lot of research reports results where some domain - say "reading handwritten notes" - reaches 96% accuracy with deep learning. A new state of the art! What it doesn't show is the hundreds of working hours spent tweaking the architecture to have just the right number of nodes, layers, and training data. Deep learning is particularly hindered by its extreme specialization to a particular problem and by its arbitrary metaparameters.

I say the metaparameters are arbitrary because they are. There are no good guiding rules or real-world basis (as far as I know) for the dropout probability, which gradient method you should be using, what the initial weights should be, or the design of the architecture. For the problems these systems are used on now, this isn't a huge issue. But for the future, and for AGI, it's going to be a huge hurdle.
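
To make the point concrete, here is a hypothetical configuration of the kind every deep learning result quietly carries around. None of the values below come from first principles; in practice they get found by sweeping and tweaking.

```python
# Hypothetical hyperparameter configuration for a handwriting model.
config = {
    "hidden_layers": [512, 256, 128],  # why three layers? why these widths?
    "dropout_p": 0.5,                  # the conventional default, not a derived value
    "optimizer": "sgd_with_momentum",  # vs. adam, rmsprop, ... usually settled empirically
    "learning_rate": 1e-3,             # found by sweeping, not predicted in advance
    "weight_init_scale": 0.01,         # init heuristics exist (Xavier, He), but they are heuristics
    "batch_size": 128,
    "epochs": 40,
}
```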

Transfer / Generalization

Continuing from the previous topic, those excellent state-of-the-art papers also don't show what happens if you take that model and try to learn handwriting in a different language. The metaparameters and weights are so finely tuned that the result will likely be far from state of the art compared with systems explicitly designed for the new task (such as the natural language processing behind a translation system).
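
Here is a trivial sketch of why the tuning doesn't carry over, with plain gradient descent standing in for the whole finely-tuned pipeline: a learning rate that converges nicely on one task diverges outright on a superficially similar one, and the same brittleness applies to the rest of the metaparameters.

```python
import numpy as np

def fit_linear(X, y, lr, steps=200):
    """Plain gradient-descent linear regression (a stand-in for a tuned deep net)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])

# Task A: features on a scale of about 1. A learning rate of 0.5 was "tuned" here.
X_a = rng.normal(0, 1, (500, 3))
print(fit_linear(X_a, X_a @ true_w, lr=0.5))   # converges close to true_w

# Task B: the same kind of problem, but features on a scale of about 100.
# The carefully tuned learning rate now makes training blow up (NumPy will
# warn about overflow and return inf/NaN).
X_b = rng.normal(0, 100, (500, 3))
print(fit_linear(X_b, X_b @ true_w, lr=0.5))   # diverges
```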

There's not even an easy way to apply prior knowledge. If I want my neural network to construct English sentences, it's almost impossible to take existing definitions and syntax rules from a dictionary and encode them into the system; everything has to be learned by example. We don't want to train our networks from scratch, but there's no simple way to transfer the knowledge we already have into them.
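
As a sketch of the mismatch (everything here is hypothetical), the knowledge we already have has no slot to plug into; the only interface a network exposes is labelled examples, so a one-line rule has to be dissolved into data and rediscovered statistically.

```python
# What we would like to do (no such interface exists):
#   network.add_rule("a SUBJECT precedes its VERB")
#
# What we actually do: turn the rule into labelled examples and hope the
# network rediscovers it from statistics.
training_pairs = [
    ("the dog runs", 1),   # grammatical
    ("runs the dog", 0),   # ungrammatical
    ("a cat sleeps", 1),
    ("sleeps a cat", 0),
    # ... and thousands more examples standing in for one line of grammar
]
```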

Handling Unpredictability

Deep learning is bad at expecting the unexpected. This makes sense, because deep learning is explicitly designed to understand the world on average. The rules of a game are fixed and unmoving, which is why DeepMind loves solving games so much. But when we approach real-world situations like self-driving cars, it's almost more important to understand the exceptions to the rules. When a deep learning system encounters an anomaly, it tries to map it onto some existing understanding rather than work out why and how it is different, and what the impact of that difference is. One of my colleagues is currently researching anomaly detection for planning / RL, and it's definitely not an easy problem.
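
A small sketch of why this happens, with random weights standing in for a hypothetical trained classifier: a softmax over known classes always returns a confident-looking distribution over those classes, and there is no output that means "this is none of the things I was trained on".

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)

# Hypothetical classifier over three known road objects; random weights stand
# in for whatever a trained network would have learned.
classes = ["car", "pedestrian", "cyclist"]
W = rng.normal(0, 1, (3, 8))

# An anomalous input the model has never seen (say, a mattress on the highway).
anomaly = rng.normal(0, 1, 8)

probs = softmax(W @ anomaly)
print(dict(zip(classes, probs.round(3))))
# The output is still a tidy distribution over the known classes; nothing
# signals "this is not any of the things I was trained on".
```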

Summary

This was all a very broad overview of where I think deep learning is hitting obvious limitations. None of it is to say deep learning is bad or inferior, but I don't think it's the future any more than other actively (or maybe not so actively) developed approaches. It is solving problems that generate a lot of news, but it will eventually stall on harder problems until there is further innovation.

The problems I listed above exist in most other fields of AI (except the ones explicitly designed to remedy a particular problem, which usually have some other deficiency instead). Part of AI research is picking one potential approach to a goal, working through how to minimize its problems to achieve a better result, and iterating until you hit a dead end. At that point, you can usually steal an idea from another branch of research and try that. It's how most of the complicated AI systems we see now work. Deep learning is a great tool to use in tandem with other techniques, but it's not a silver bullet.

Further Reading

Here are some people who are much smarter than I am, and their opinions on the matter. This recent post assesses deep learning (specifically, the deep reinforcement learning we've seen used by DeepMind) and argues that the ridiculous number of samples it requires is hard to justify when more specialized solutions can obtain comparable results.

Here is another paper that covers much of what I discussed (and more) in a different format. A great read that fills in the gaps and gives a little more specificity. I would also recommend the author's rebuttal on Medium to many of the critiques, since the responses he highlights are the same responses people would have to some of the points I made in this post.