# Prune Deep Networks

# What Is Pruning and Why Do We Need It?

With increasing amounts of data and computational power, deep learning models have become bigger and deeper to better learn from data. **Deploying** these large, accurate models to **resource-constrained** computing environments, such as mobile phones and smart cameras, poses a few key challenges.

To confront these challenges, a growing body of work has emerged that seeks methods for compressing neural network models while limiting any loss in model quality.

Model **pruning** is a popular approach to reducing a heavy network: it obtains a lightweight form by removing redundancy from the heavy network.

# How Does It Work?

Let us consider a neural network as a **function family** **f(x; ·)**. The architecture is the configuration of the network’s **parameters** and the set of operations it uses to produce **outputs** from **inputs**, including the arrangement of parameters into *convolutions*, *activation functions*, *pooling*, *batch normalization*, etc. We define a neural network model as a particular parameterization of an architecture, i.e., **f(x; W)** for specific parameters W. Neural network pruning takes a model f(x; W) as input and produces a new model **f(x; M ⊙ W′)**. Here W′ is a set of parameters that may differ from W, M is a binary mask that fixes certain parameters to 0, and **⊙** is the elementwise product operator. In practice, rather than using an explicit mask, pruned parameters of W are fixed to zero or removed entirely.
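To make the mask notation concrete, here is a minimal numpy sketch; the matrix values and the 0.1 threshold are illustrative, not from the text:

```python
import numpy as np

# Illustrative weights W' of a small layer (values are made up).
W_prime = np.array([[0.80, -0.05, 0.30],
                    [0.02, -0.90, 0.01]])

# Binary mask M: 1 keeps a parameter, 0 fixes it to zero.
M = (np.abs(W_prime) >= 0.1).astype(W_prime.dtype)

# The pruned model's parameters are the elementwise product M . W'.
pruned = M * W_prime
print(pruned)
```

Entries whose magnitude falls below the threshold become exactly zero, while the surviving parameters are untouched.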

# Pruning From Scratch

This method prunes redundant connections in three steps.

1 — Train the network to learn which connections are important.

2 — Prune the unimportant connections: remove all connections whose weight is below a threshold. This pruning converts a dense, fully connected layer to a sparse layer.

3 — Retrain the network to fine-tune the weights of the remaining connections.

The phases of **pruning **and **retraining **may be **repeated **iteratively to further reduce network complexity.
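A toy sketch of one prune-and-retrain round, with the training step stubbed out; the threshold, layer size, and learning rate are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def prune_by_threshold(w, threshold):
    """Zero out all connections whose magnitude falls below the threshold."""
    mask = np.abs(w) >= threshold
    return w * mask, mask

# Stand-in for a layer that has already been trained (step 1).
weights = rng.normal(size=(4, 4))

# Step 2: prune the unimportant connections.
weights, mask = prune_by_threshold(weights, threshold=0.5)

# Step 3: "retrain" -- gradient updates only touch the surviving
# connections, so the pruned connections stay exactly at zero.
fake_gradient = rng.normal(size=weights.shape)
weights -= 0.1 * fake_gradient * mask

print(f"sparsity: {1.0 - mask.mean():.2f}")
```

Masking the gradient is what lets the pruning/retraining cycle repeat: each round the mask only ever removes connections, never resurrects them.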

The **last** step is **critical**. If the pruned network is used without retraining, accuracy is **significantly** impacted.

During retraining, it is better to retain the weights from the **initial** training phase for the connections that survived pruning than to re-initialize the pruned layers. CNNs contain fragile, co-adapted features: gradient descent can find a good solution when the network is initially trained, but not after re-initializing some layers and retraining them. So when we retrain the pruned layers, we should keep the surviving parameters instead of re-initializing them.

Also, neural networks are prone to the **vanishing gradient** problem as they get deeper, which makes pruning errors harder to **recover** from in deep networks. To prevent this, we fix the parameters of the **CONV** layers and only retrain the FC layers after pruning the FC layers, and vice versa.

## Regularization

**Choosing** the correct **regularization** impacts the performance of pruning and retraining. **L1** regularization **penalizes** non-zero parameters, resulting in more parameters near zero. This gives better accuracy **after** pruning but **before** retraining. However, the **remaining** connections are not as good as with L2 regularization, resulting in lower accuracy after retraining. Overall, L2 regularization gives the best pruning results.

## Dropout

**Dropout** is widely used to prevent over-fitting, and this also applies to retraining. During retraining, however, the dropout ratio must be **adjusted** to account for the change in model **capacity**. As the parameters get sparse, the classifier selects the most **informative** predictors and thus has much less prediction variance, which reduces over-fitting. Since pruning has already **reduced** model capacity, the retraining dropout ratio should be smaller.
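Han et al. (2015, referenced below) suggest scaling the dropout ratio by the square root of the ratio of remaining to original connections; a sketch of that heuristic (the example connection counts are made up):

```python
import math

def adjusted_dropout(d_original, original_connections, remaining_connections):
    # Heuristic from Han et al. (2015):
    # D_retrain = D_original * sqrt(C_remaining / C_original)
    return d_original * math.sqrt(remaining_connections / original_connections)

# A layer pruned from 1,000,000 connections down to 100,000:
print(adjusted_dropout(0.5, 1_000_000, 100_000))  # ~0.158, down from 0.5
```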

## Pruning Neurons

**After** pruning connections, neurons with zero **input** connections **or** zero **output** connections may be safely pruned. This pruning is furthered by removing all connections to or from a pruned neuron. The **retraining** phase **automatically** arrives at a state where **dead** neurons have **both** zero input connections and zero output connections. This occurs due to gradient descent and regularization: a neuron with zero input connections (or zero output connections) makes **no contribution** to the final loss, so the gradient is zero for its output connections (or input connections, respectively). Only the regularization term pushes those weights to zero. Thus, dead neurons are automatically removed during retraining.
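A sketch of removing dead neurons from a small two-layer slice; the weight values are made up, and a hidden neuron survives only if it has at least one nonzero input and one nonzero output connection:

```python
import numpy as np

def prune_dead_neurons(w_in, w_out):
    """Drop hidden neurons whose input OR output connections are all zero.

    w_in:  (n_inputs, n_hidden)  weights into the hidden layer
    w_out: (n_hidden, n_outputs) weights out of the hidden layer
    """
    alive = (np.abs(w_in).sum(axis=0) > 0) & (np.abs(w_out).sum(axis=1) > 0)
    return w_in[:, alive], w_out[alive, :]

w_in = np.array([[0.5, 0.0, 0.2],
                 [0.1, 0.0, 0.0]])  # hidden neuron 1: zero input connections
w_out = np.array([[0.3],
                  [0.4],
                  [0.0]])           # hidden neuron 2: zero output connections

w_in2, w_out2 = prune_dead_neurons(w_in, w_out)
print(w_in2.shape, w_out2.shape)  # only hidden neuron 0 survives
```

Note that dropping a column of `w_in` and the matching row of `w_out` removes the neuron and all its connections at once, which is exactly the "remove all connections to or from a pruned neuron" step above.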

# Differences Between Pruning Methods

Pruning methods vary primarily in their choices regarding sparsity structure, scoring, scheduling, and fine-tuning:

## Structure:

Some methods prune individual parameters (**unstructured** pruning). Doing so produces a sparse neural network, which, although smaller in terms of parameter count, may not be arranged in a fashion conducive to speedups using modern libraries and hardware. Other methods consider parameters in groups (**structured** pruning), removing entire neurons, filters, or channels to exploit hardware and software optimized for dense computation.
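The difference can be shown on a toy weight matrix (the values and keep criteria are arbitrary): unstructured pruning zeroes scattered entries while leaving the shape intact, whereas structured pruning deletes whole rows (neurons) and yields a genuinely smaller dense matrix.

```python
import numpy as np

w = np.array([[ 0.90, -0.01,  0.40],
              [ 0.02,  0.03, -0.05],   # a weak neuron: small weights throughout
              [-0.70,  0.60,  0.10]])

# Unstructured: zero individual small weights; the shape is unchanged.
unstructured = w * (np.abs(w) >= 0.1)

# Structured: drop the whole row (neuron) with the smallest L2 norm;
# the result is smaller and still dense.
norms = np.linalg.norm(w, axis=1)
structured = w[norms > norms.min()]

print(unstructured.shape, structured.shape)
```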

## Scoring:

It is common to score parameters based on their absolute values, trained importance coefficients, or contributions to network activations or gradients. Some pruning methods compare scores **locally**, pruning a fraction of the parameters with the lowest scores within each structural subcomponent of the network. Others consider scores **globally**, comparing scores to one another irrespective of the part of the network in which a parameter resides.
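A toy sketch of the two scoring scopes, using weight magnitude as the score (the layer values are made up):

```python
import numpy as np

def local_prune(layers, fraction):
    """Prune the lowest-magnitude fraction WITHIN each layer separately."""
    pruned = []
    for w in layers:
        k = int(len(w) * fraction)            # weights to remove in this layer
        cutoff = np.sort(np.abs(w))[k]
        pruned.append(w * (np.abs(w) >= cutoff))
    return pruned

def global_prune(layers, fraction):
    """Prune the lowest-magnitude fraction ACROSS all layers jointly."""
    scores = np.concatenate([np.abs(w) for w in layers])
    cutoff = np.sort(scores)[int(len(scores) * fraction)]
    return [w * (np.abs(w) >= cutoff) for w in layers]

layer_a = np.array([0.9, 0.8, 0.7, 0.05])
layer_b = np.array([0.2, 0.15, 0.1, 0.02])

# Locally, exactly one weight is removed from EACH layer; globally, the two
# smallest weights overall are removed, wherever they happen to live.
print(local_prune([layer_a, layer_b], 0.25))
print(global_prune([layer_a, layer_b], 0.25))
```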

## Scheduling:

Pruning methods differ in the amount of the network to prune at each step. Some methods prune all desired weights at once in a single step. Others prune a fixed fraction of the network iteratively over several steps or vary the rate of pruning according to a more complex function.
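The difference between schedules amounts to how the sparsity target evolves per step; a sketch of the two simplest cases (the fractions and step counts are arbitrary):

```python
def one_shot_schedule(target_sparsity):
    """Reach the target sparsity in a single pruning step."""
    return [target_sparsity]

def iterative_schedule(fraction_per_step, steps):
    """Prune a fixed fraction of the REMAINING weights at each step."""
    sparsity, schedule = 0.0, []
    for _ in range(steps):
        sparsity = 1.0 - (1.0 - sparsity) * (1.0 - fraction_per_step)
        schedule.append(round(sparsity, 4))
    return schedule

print(one_shot_schedule(0.9))       # [0.9]
print(iterative_schedule(0.2, 5))   # [0.2, 0.36, 0.488, 0.5904, 0.6723]
```

More complex schedules vary `fraction_per_step` over time instead of holding it fixed.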

## Fine-tuning:

For methods that involve fine-tuning, it is most common to continue to train the network using the trained weights from before pruning. Alternative proposals include rewinding the network to an earlier state and reinitializing the network entirely.

# How Effective Is Pruning?

It has been repeatedly shown that, at least for large amounts of pruning, **many** pruning methods **outperform random** pruning. Interestingly, this does **not** always hold for small amounts of pruning.

Similarly, pruning **all** layers **uniformly** tends to perform **worse** than **intelligently** allocating parameters to different layers or pruning **globally**. Lastly, when holding the number of fine-tuning **iterations constant**, many methods produce pruned models that **outperform** retraining from **scratch** with the **same** sparsity pattern, given a large enough amount of pruning.

Retraining from scratch in this context means training a fresh, randomly-initialized model with all weights clamped to zero throughout training, except those that are nonzero in the pruned model.

Another consistent finding is that **sparse** models tend to **outperform dense** ones for a fixed number of parameters.

Perhaps most compelling of all are the many results showing that pruned models can obtain higher accuracies than the original models from which they are derived.

**This demonstrates that sparse models can not only outperform dense counterparts with the same number of parameters, but sometimes also dense models with even more parameters.**

*References:*

Han, S., Pool, J., Tran, J., and Dally, W. Learning both weights and connections for an efficient neural network. In Advances in neural information processing systems, pp. 1135–1143, 2015.