Why Deep Learning Sucks

After some years of studying and using deep learning, I have constantly struggled with debugging errors and setting hyperparameters. For a researcher, this wastes not only time but also money and resources. In this article, we will demonstrate how traditional rule-based methods have a hidden edge (besides simplicity) in solving complex problems that require automation.

The self-driving car problem:

Let's assume we want to solve the self-driving car problem, where the car needs to navigate safely to its destination while avoiding collisions with other objects.

The way of thinking (with deep learning):

A deep learning engineer will start by looking for sub-problems that have already been solved using state-of-the-art networks. For example, they may look directly for object detection models like YOLO and then planning modules like ChauffeurNet. In this case, the task is practically solved; however, since we don't care about what's going on inside these models, we are tempted to just pass in the live camera feed and skip important pre-processing steps (e.g., image enhancement or simple detection of road lanes).

Additionally, deep learning engineers may not consider filtering out noise or taking detection uncertainty into account when planning the route. This can lead to inaccurate predictions and potential safety hazards.

The way of thinking (without deep learning):

On the other hand, an engineer working without deep learning will start by analyzing deeply how to make the problem solvable through simple logic rules. For example, all we really need from detection is to keep the car within the road. We can achieve this by treating the road as an easily distinguishable element, using image processing steps such as detecting line patterns and the asphalt color. We can also account for other traffic entities on the road by employing background subtraction techniques.
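
To make this concrete, here is a rough sketch of such a rule-based perception step using standard OpenCV building blocks (Canny edges, a probabilistic Hough line transform, and a MOG2 background subtractor). All thresholds are hypothetical placeholders that would need calibration for a real camera:

import cv2
import numpy as np

# Background subtractor for spotting other traffic entities (parameters are illustrative).
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

def detect_road_and_traffic(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

    # Lane markings: edge detection followed by a probabilistic Hough line transform.
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                            threshold=50, minLineLength=40, maxLineGap=20)

    # Road surface: a simple gray/asphalt color mask in HSV (low saturation, mid value).
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    road_mask = cv2.inRange(hsv, (0, 0, 40), (180, 60, 180))

    # Other traffic entities: foreground blobs from background subtraction.
    fg_mask = subtractor.apply(frame_bgr)
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    obstacles = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]

    return lines, road_mask, obstacles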

For planning, a general algorithm like the social force model can be used to find the shortest path to the destination while avoiding obstacles. This solution may seem less reliable at first glance; however, when calibrated and tested well, its performance-to-investment ratio can be surprisingly high.
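
For intuition, here is a minimal sketch of the social-force idea: an attractive force pulls the vehicle toward its goal, while exponentially decaying repulsive forces push it away from obstacles. The gains and ranges below are illustrative, not calibrated values:

import numpy as np

def social_force_step(pos, vel, goal, obstacles, dt=0.1,
                      k_goal=1.0, k_obs=2.0, obs_range=2.0):
    """One integration step of a simplified social force model.
    All gains (k_goal, k_obs, obs_range) are illustrative, not calibrated."""
    # Attractive force: unit vector toward the goal, scaled by a gain.
    to_goal = goal - pos
    force = k_goal * to_goal / (np.linalg.norm(to_goal) + 1e-9)

    # Repulsive forces from each obstacle, decaying exponentially with distance.
    for obs in obstacles:
        away = pos - obs
        dist = np.linalg.norm(away) + 1e-9
        force += k_obs * np.exp(-dist / obs_range) * (away / dist)

    # Simple Euler integration of the resulting acceleration.
    vel = vel + dt * force
    pos = pos + dt * vel
    return pos, vel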

Real-world applications of traditional rule-based methods:

Traditional rule-based methods have been successful in a variety of real-world applications where deep learning approaches may not perform as well or are less suitable. Some examples include:

  1. Fraud detection systems that analyze transaction data and identify patterns to detect fraudulent activities, such as credit card transactions with unusual spending habits (a minimal rule sketch follows this list).
  2. Spam filtering systems used by email services that use rules-based algorithms to identify and filter out unwanted emails based on factors like sender reputation or message content.
  3. Speech recognition systems in call centers where traditional rule-based methods can accurately recognize spoken words, even when there is background noise or accents present.
  4. Robotics applications that require precise control over movements and actions of robots to perform tasks such as picking up objects from a table or assembling components for manufacturing processes.
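
To make the first example concrete, a rule-based fraud check can be as simple as comparing each transaction against a customer's recent spending statistics. The thresholds below are purely illustrative:

from statistics import mean, stdev

def is_suspicious(amount, recent_amounts, z_threshold=3.0, hard_limit=5000.0):
    """Flag a transaction if it exceeds a hard limit or deviates strongly
    from the customer's recent spending. Thresholds are illustrative only."""
    if amount > hard_limit:
        return True
    if len(recent_amounts) >= 5:
        mu = mean(recent_amounts)
        sigma = stdev(recent_amounts) or 1.0
        return (amount - mu) / sigma > z_threshold
    return False

# Example: a 4,800 purchase after a history of ~50-dollar transactions is flagged.
print(is_suspicious(4800.0, [40.0, 55.0, 48.0, 60.0, 52.0]))  # True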

Limitations of deep learning:

While deep learning has revolutionized many aspects of artificial intelligence, it is not without its limitations. Some challenges associated with deep learning include:

  1. Computational complexity - Deep neural networks can be computationally expensive to train and require large amounts of data for effective modeling.
  2. Data scarcity - In some cases, there may simply not be enough training data available to effectively train a deep learning model, leading to suboptimal performance or overfitting issues.
  3. Lack of interpretability - Deep neural networks can be difficult to understand and interpret, making it challenging for humans to explain how the model arrived at its predictions or decisions.
  4. Difficulty in debugging - Debugging errors in deep learning models can be time-consuming and require specialized skills, such as identifying issues with specific layers of a neural network or optimizing hyperparameters.
  5. Ethical concerns - Deep learning systems may have unintended consequences that could lead to biases or discrimination against certain groups if not properly designed and tested for fairness.

Conclusion:

In this post, we just want to highlight the advantages of a workflow without deep learning. Deep learning makes us lazy, because we learn to just "smash" all the inputs together unprocessed and let the magic happen. The first danger is developing this mentality; an even bigger danger is letting such a not-fully-understood system run independently.

The optimal way is to 'deepen' your own learning first, and only then involve some neural networks for the truly high-order non-linear relationships in your model.

The unexpected winter of Artificial Intelligence

Nowadays, everyone is excited about the latest trends in AI applications, like ChatGPT, self-driving cars, image synthesis, etc. This overhype is not new; it happened before, in the AI winters of the 1970s and late 1980s. Some warn against it, because it may cause disappointment and even a new AI winter. But here I will talk about a bottleneck of AI research that I have come across in my own work. It may not be called a winter, but it will definitely cause a slowdown in the field.

What is wrong? Too many models

When a newcomer wants to review the state of the art in a machine learning field, like object detection, video understanding, or image synthesis, they will find a lot of models. One approach is to pick just the most notable work of the current year, but this means missing a lot of other work that may be more suitable for their problem. Another approach is to review all the models, but that is a lot of work, and it's not clear how to compare them.

In the voice of this poor overwhelmed researcher, I would say: "Please, enough with the models; give us more coherent theories that could add up!"

Research Question

When you look at the research questions of the majority of these papers, you can see that they follow a 'template', like:

  • How to improve the performance of the model?
  • How to improve the speed of the model?
  • How to improve the accuracy of the model?

but questions like that should be stated more like:

  • How to solve the performance problems of the last state of the art model?

Because it's not enough to show better numeric values as an 'answer'. The answer should spell out when and why the improvement happens. This is definitely more helpful than just saying that the model outperforms the previous state of the art.

Research Importance

It's well known that the deep learning field is experimental. But if the whole contribution is training and evaluating a model, then this is like saying "I proved it under condition X; every other condition is not tested yet." This actually helps no one. The contribution should be more than just a technical improvement: it should be a new idea that can be used in other fields, or a new way to solve a problem.

What to do?

Less is more

I think this can start with every researcher: building a model should be based on robust mathematical foundations. This would not only shift the field from quantity toward quality, but also make it easier to review, because a newcomer could easily fit every new work into the whole picture and know exactly what is needed in every situation.

Math-only models

In fact, new models can be proven without training. It's a brave claim, I know, but I can give some examples, like the Kalman filter paper [1], where the Kalman filter is proven to be optimal in the linear case purely mathematically. Another example is "Visual SLAM: why filter?" [2], where the superiority of bundle adjustment over Kalman filtering is argued in part mathematically.
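
As a reminder of what such a training-free result looks like, the linear-Gaussian model assumed in [1] and the resulting filter can be written compactly in standard textbook notation; under these assumptions the filter is provably the minimum-variance linear estimator, with no data-driven training involved:

x_k = A x_{k-1} + w_k, \quad w_k \sim \mathcal{N}(0, Q)
z_k = H x_k + v_k, \quad v_k \sim \mathcal{N}(0, R)

\hat{x}_{k|k-1} = A \hat{x}_{k-1|k-1}, \qquad P_{k|k-1} = A P_{k-1|k-1} A^{\top} + Q
K_k = P_{k|k-1} H^{\top} (H P_{k|k-1} H^{\top} + R)^{-1}
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (z_k - H \hat{x}_{k|k-1}), \qquad P_{k|k} = (I - K_k H) P_{k|k-1}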

If you have good knowledge of the robotics field, you will know that these papers are really famous. One might say that not everyone can do that, but why not at least try? Or just make it clear that the model is not proven and is only a good guess, while also making the effort to explore the theory behind it.

Lastly, GPUs for all

I think that the main reason for the overhype is the availability of GPUs. It's no secret that GPUs are the main driver of the recent AI boom, but it's also no secret that strong GPUs are not available to everyone.

Therefore, those who have the best hardware will produce the better models. This deviates research from its goals. One solution could be to provide GPU access for all researchers, as Google is doing [3]. But this could be done in a broader way that guarantees the elimination of hardware hindrance among researchers.

If, one day, the whole research community used one global system, this would also make reproducing work easier.

Note:

This article is my personal opinion based on my limited experience in the field. As I'm still learning, I'm open to any feedback and discussion.

References

[1] https://asmedigitalcollection.asme.org/fluidsengineering/article-abstract/82/1/35/397706/A-New-Approach-to-Linear-Filtering-and-Prediction?redirectedFrom=fulltext

[2] https://www.sciencedirect.com/science/article/abs/pii/S0262885612000248

[3] https://edu.google.com/programs/credits/research/?modal_active=none

Train your deep neural network faster with Automatic Mixed Precision

Have you been working on a large deep learning model and wondered how to squeeze out every possible saving in training time? Or maybe you have the best GPU hardware but still find the speed too slow. Well, look at the bright side: this means you still have room for improvement :)

One option for speeding up deep learning training has always been stacking more digital circuits into optimized hardware devices like GPUs or TPUs. Here, however, we show an additional option: adaptively changing the numerical precision to save computation time while keeping the same accuracy.

The idea is simple: use FP32 only where it is needed, for example for small gradients, and use FP16 everywhere it is sufficient.
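
To see why small gradients are the tricky part, note that FP16 simply cannot represent very small magnitudes. The quick check below (not part of any training code) shows a value that survives in FP32 but flushes to zero in FP16:

import torch

small = 1e-8  # think of it as a tiny gradient value

print(torch.tensor(small, dtype=torch.float32))  # tensor(1.0000e-08)
print(torch.tensor(small, dtype=torch.float16))  # tensor(0., dtype=torch.float16) -- underflow
print(torch.finfo(torch.float16).tiny)           # smallest normal FP16 number, about 6.1e-05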

Needed Hardware

You may get some speedup on any hardware; however, on NVIDIA GPUs with Tensor Cores (Ampere, Volta, or Turing architectures) the speedup can reach about 3x at best.

To find out which device you have, just issue the command nvidia-smi
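
Alternatively, if PyTorch is already installed, you can query the device from Python. Tensor Cores, which mixed precision relies on, are available from compute capability 7.0 upwards (Volta, Turing, Ampere):

import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{name}: compute capability {major}.{minor}")
    # Tensor Cores are available from compute capability 7.0 onwards.
    print("Tensor Cores available:", major >= 7)
else:
    print("No CUDA device found")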

Needed Software

Most popular deep learning frameworks support this feature, including TensorFlow, PyTorch, and MXNet. As a showcase, an example of a network in PyTorch is provided below.

Example

First we need to define the network model:

import torch
from torch import nn


class Model(nn.Module):

    def __init__(self, layer_1=16, layer_2=16):
        super().__init__()

        # Three fully connected layers: 8 inputs -> layer_1 -> layer_2 -> 1 output
        self.fc1 = nn.Linear(8, layer_1)
        self.fc2 = nn.Linear(layer_1, layer_2)
        self.fc3 = nn.Linear(layer_2, 1)

    def forward(self, x):
        # ReLU activations on the hidden layers, linear output for regression
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        return self.fc3(x)

Before running the training program, we initialize some dummy inputs and targets and move the model to the GPU:

batch_size = 512
data = [torch.randn(batch_size, 8, device="cuda") for _ in range(50)]
targets = [torch.randn(batch_size, 1, device="cuda") for _ in range(50)]

loss_fn = torch.nn.MSELoss().cuda()

net = Model().cuda()  # move the model to the GPU to match the data

Now, the training program is run normally as follows (using FP32 precision):

opt = torch.optim.SGD(net.parameters(), lr=0.001)

epochs = 1
for epoch in range(epochs):
    for input, target in zip(data, targets):
        output = net(input)
        loss = loss_fn(output, target)
        loss.backward()
        opt.step()
        opt.zero_grad() 
        # set_to_none=True here can modestly improve performance

If we want to use automatic mixed precision, we wrap the forward pass in an autocast context and use a gradient scaler. Autocast chooses FP16 or FP32 per operation as needed, while the scaler scales the loss so that small FP16 gradients do not underflow.

use_amp = True

opt = torch.optim.SGD(net.parameters(), lr=0.001)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for epoch in range(epochs):
    for input, target in zip(data, targets):
        with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=use_amp):
            output = net(input)
            loss = loss_fn(output, target)
        scaler.scale(loss).backward() #instead of loss.backward
        scaler.step(opt) # instead of opt.step()
        scaler.update() # to prepare for next step
        opt.zero_grad() 
        # set_to_none=True here can modestly improve performance

To check the speedup, you can measure the runtime difference between the last two blocks.
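
A rough way to do that (timing whole loops rather than individual kernels) is to synchronize the GPU and wrap each loop in a wall-clock timer. The names train_fp32 and train_amp below are hypothetical placeholders for the two loops above:

import time
import torch

def time_loop(train_fn):
    """Run a training loop once and return its wall-clock duration in seconds."""
    torch.cuda.synchronize()           # finish any pending GPU work first
    start = time.perf_counter()
    train_fn()                         # e.g. the FP32 loop or the AMP loop above
    torch.cuda.synchronize()           # wait for all launched kernels to complete
    return time.perf_counter() - start

# Hypothetical usage, after wrapping each of the two loops above in a function:
# print("FP32:", time_loop(train_fp32), "s")
# print("AMP :", time_loop(train_amp), "s")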

Thanks for reading!

You can find the original post, as well as others, on my blog.

References:

  1. https://developer.nvidia.com/automatic-mixed-precision
  2. https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html