For the better part of a decade, Deep Learning has been defining the state of the art in a wide range of machine learning tasks. From Computer Vision [1] to Natural Language Processing [2], models have achieved remarkable results, at the cost of the compute required for training doubling every 3.4 months [3]. As hardware efficiency struggles to keep up with the exponential growth in architecture complexity, energy consumption follows a similar trend, since more hardware and more cooling are needed to train modern models. In fact, recent studies [4][5] have shown that training large models in conventional data centres, a sector that already accounts for 0.3% of the world's carbon emissions [6], can cause a significant increase in CO2eq emissions.

How Green is Federated Learning?

Fortunately, not all hope is lost: a more carbon-friendly way to train neural networks exists. In Federated Learning (FL), training is performed not inside large data centres but distributed across thousands of mobile devices, such as smartphones, where the data is usually collected by the end-users themselves.

An example of an application currently using FL is next-word prediction on mobile phones [7]. In this application, each smartphone (client) trains a local network (model) to predict which word the user will type next, based on their previous text messages. The trained local models are then sent to a server to perform a much simpler task called aggregation, in which a final model is generated and sent back to all users.
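To make the aggregation step concrete, here is a minimal sketch of federated averaging (FedAvg) in Python. The function name, the use of plain NumPy arrays as model weights, and the toy numbers are illustrative assumptions, not the implementation used in the keyboard-prediction system cited above.

```python
import numpy as np

def aggregate(client_weights, client_num_examples):
    """Minimal FedAvg-style aggregation (illustrative sketch).

    client_weights: list of per-client model weights, each a list of
        NumPy arrays (one array per layer).
    client_num_examples: number of local training examples per client,
        used to weight each client's contribution.
    """
    total_examples = sum(client_num_examples)
    num_layers = len(client_weights[0])
    global_weights = []
    for layer in range(num_layers):
        # Weighted average of this layer across all participating clients.
        layer_avg = sum(
            w[layer] * (n / total_examples)
            for w, n in zip(client_weights, client_num_examples)
        )
        global_weights.append(layer_avg)
    return global_weights


# Toy usage: three clients, each holding a two-layer "model".
clients = [[np.random.randn(4, 4), np.random.randn(4)] for _ in range(3)]
examples_per_client = [120, 300, 80]
global_model = aggregate(clients, examples_per_client)
```

The server then sends `global_model` back to the clients for the next round of local training.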

Besides the privacy-related gains of not having to send user data to a centralised server, our recent work [8], [9] shows that FL can also have a positive impact on reducing the carbon emissions of Deep Learning. Although mobile devices are far less powerful than server GPUs, FL benefits from requiring no dedicated cooling and from having wide pools of devices available for training.

The Federated Learning Carbon Calculator

So how much carbon does Federated Learning produce? The following calculator allows you to estimate how much CO2eq is produced by a given pool of devices.

To help you define your training scenario, here are a few definitions and hyperparameters used in FL, along with realistic input values; a rough sketch of how these inputs might be combined into an emission estimate is given below.

  • Devices: Selects the hardware profile being used.
  • Country: Defines the energy mix and Internet Upload/Download speeds of a given pool of devices. The former parameter defines the electricity/carbon conversion rate while the latter parameters help estimate emissions due to client-server communication.
  • Dataset: Defines the balanced, non-IID dataset being used.
  • Number of rounds: Total number of times the server will aggregate results from clients (10-100).
  • Number of local epochs: Number of times each client will train on their local data before sending their model for aggregation (1-5).
  • Number of active devices: Number of devices participating in each round. This should be a small fraction of the total number of devices available (10-1000).
For centralised training emissions, we invite you to check out the ML CO2 Impact calculator.
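As a rough illustration of how these hyperparameters could be combined, the sketch below sums on-device compute energy and client-server communication energy, then converts the total to gCO2eq using the country's carbon intensity. All numerical constants (device power draw, per-epoch training time, model size, network power, carbon intensity) are illustrative placeholders, not the values used by the calculator on this page.

```python
def estimate_fl_emissions(
    num_rounds,                  # server aggregation rounds
    local_epochs,                # epochs each client trains per round
    active_devices,              # devices participating in each round
    epoch_time_s,                # seconds one device needs per local epoch (assumed)
    device_power_w,              # average device power draw in watts (assumed)
    model_size_mb,               # size of the model exchanged each round (assumed)
    upload_mbps,                 # country-level upload speed
    download_mbps,               # country-level download speed
    network_power_w,             # power drawn while transferring data (assumed)
    carbon_intensity_g_per_kwh,  # country energy mix, gCO2eq per kWh
):
    """Illustrative FL emission estimate: compute energy plus communication
    energy, converted to gCO2eq. Unlike a data centre, no cooling overhead
    (PUE) factor is applied to the device-side energy."""
    # Energy spent training on-device, in kWh (W * s -> J -> kWh).
    compute_s = num_rounds * local_epochs * active_devices * epoch_time_s
    compute_kwh = device_power_w * compute_s / 3.6e6

    # Energy spent uploading and downloading the model each round, in kWh.
    transfer_s = num_rounds * active_devices * (
        model_size_mb * 8 / upload_mbps + model_size_mb * 8 / download_mbps
    )
    comm_kwh = network_power_w * transfer_s / 3.6e6

    return (compute_kwh + comm_kwh) * carbon_intensity_g_per_kwh


# Toy example with made-up values: 100 rounds, 2 local epochs, 50 active devices.
print(estimate_fl_emissions(
    num_rounds=100, local_epochs=2, active_devices=50,
    epoch_time_s=30, device_power_w=3, model_size_mb=5,
    upload_mbps=10, download_mbps=50, network_power_w=10,
    carbon_intensity_g_per_kwh=500,
))
```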
[Interactive calculator: choose the device type, country, and dataset, set the number of active devices, local epochs, and rounds, and add entries to the device pool as (#Devices, Type, Country, Download (Mbps), Upload (Mbps), #Epochs, #Rounds); the calculator then reports the total estimated g CO2eq produced by your FL setup.]

To put this into perspective...

  • Sending an email from laptop to laptop generates 0.3g CO2eq [1].
  • Making a Zoom call from a regular laptop generates 10g CO2eq [1].
  • Training a neural network in a data centre in China to reach 70% accuracy on the Speech Commands dataset may produce 17.5g CO2eq [2].
  • Training a neural network in a data centre in the US to reach 50% accuracy on ImageNet may produce 889g CO2eq [2].
Sources:

[1] "How Bad Are Bananas? The carbon footprint of everything" by Mike Berners-Lee.

[2] "A first look into the Carbon footprint of Federated Learning" by Qiu et al..

References

  • [1] Imagenet classification with deep convolutional neural networks. By Krizhevsky, A., Sutskever, I. and Hinton, G.E. In Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.
  • [2] Language models are few-shot learners. By Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A. and others. In arXiv preprint arXiv:2005.14165, 2020.
  • [3] AI and Compute. 2019.
  • [4] Energy and policy considerations for deep learning in NLP. By Strubell, E., Ganesh, A. and McCallum, A. In arXiv preprint arXiv:1906.02243, 2019.
  • [5] Quantifying the Carbon Emissions of Machine Learning. By Lacoste, A., Luccioni, A., Schmidt, V. and Dandres, T. In arXiv preprint arXiv:1910.09700, 2019.
  • [6] How to stop data centres from gobbling up the world's electricity. By Jones, N. In Nature, vol. 561, no. 7722, pp. 163–167, 2018.
  • [7] Federated Learning for Mobile Keyboard Prediction. 2018.
  • [8] A first look into the carbon footprint of federated learning. By Qiu et al. 2021.
  • [9] Can Federated Learning Save the Planet? By Qiu, X., Parcollet, T., Beutel, D., Topal, T., Mathur, A. and Lane, N. In NeurIPS Workshop on Tackling Climate Change with Machine Learning, 2020.