Carbon Dioxide in Federated Learning
For the better part of a decade, Deep Learning has been defining the state of the art in machine learning tasks ranging from Computer Vision [1] to Natural Language Processing [2]. These models have achieved remarkable results, but at the cost of the compute required for training doubling every 3.4 months [3]. As hardware efficiency struggles to keep up with this exponential growth in architecture complexity, energy consumption follows a similar trend, since more hardware and cooling systems are required to train modern models. In fact, recent studies [4], [5] have shown that training large models in conventional data centres, a sector that already accounts for 0.3% of the world's carbon emissions [6], can cause a significant increase in CO2eq production.
How Green is Federated Learning?
Fortunately, not all hope is lost: a more carbon-friendly way to train neural networks exists. In Federated Learning (FL), training is performed not inside large data centres but distributed over thousands of mobile devices, such as smartphones, where the data is usually collected by the end-users themselves.
An example of an application currently using FL is next-word prediction on mobile phones [7]. In this application, each smartphone (client) trains a local network (model) to predict which word the user will type next, based on their previous text messages. The trained local models are then sent to a server, which performs a much simpler task called aggregation: combining them into a single global model that is sent back to all users.
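To make the aggregation step concrete, here is a minimal sketch of the most common scheme, Federated Averaging (FedAvg), in which the server takes a dataset-size-weighted average of the clients' parameters. The function name and data layout below are illustrative assumptions for this post, not the exact implementation behind [7].

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Server-side aggregation: a dataset-size-weighted average of
    client model parameters, as in Federated Averaging (FedAvg).

    client_weights: one list of per-layer arrays per client.
    client_sizes:   number of local training examples per client.
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [
        sum((n / total) * w[layer]
            for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]

# Toy example: three clients, each with a one-layer "model".
clients = [[np.array([1.0, 2.0])],
           [np.array([3.0, 4.0])],
           [np.array([5.0, 6.0])]]
sizes = [10, 20, 70]  # local dataset sizes used as weights
print(federated_average(clients, sizes))  # -> [array([4.2, 5.2])]
```

Clients with more local data pull the global model further towards their update, which is why the aggregation is weighted rather than a plain mean.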
Besides the privacy-related gains of not having to send user data to a centralised server, our recent work [8], [9] shows that FL can also have a positive impact on reducing the carbon emissions of Deep Learning. Although mobile devices are much less powerful than server GPUs, FL benefits from not needing any dedicated cooling and from having wide pools of devices available for training.
The Federated Learning Carbon Calculator
So how much carbon does Federated Learning produce? The following calculator allows you to estimate how much CO2eq is produced by a given pool of devices.
To help you define your training scenario, here are a few definitions of the hyperparameters used in FL, followed by realistic input values; a sketch of how these inputs might combine into an emissions estimate appears after the device-pool format below.
- Devices: Selects the hardware profile being used.
- Country: Defines the energy mix and Internet upload/download speeds of a given pool of devices. The energy mix determines the electricity-to-carbon conversion rate, while the network speeds help estimate the emissions due to client-server communication.
- Dataset: Defines the balanced, non-IID dataset being used.
- Number of rounds: Total number of times the server will aggregate results from clients (10-100).
- Number of local epochs: Number of times each client will train on their local data before sending their model for aggregation (1-5).
- Number of active devices: Number of devices participating in each round. This should be a small fraction of the total number of devices available (10-1000).
Device pool:
(#Devices, Type, Country, Download(Mbps), Upload(Mbps), #Epochs, #Rounds)
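As a rough illustration of how these inputs might combine, here is a hedged Python sketch of such an estimate: training energy is derived from device power draw and time per epoch, communication energy from model size and network speeds, and the total is converted to CO2eq through the country's carbon intensity. Every constant, hardware profile, and the energy model itself is a simplifying assumption made for this example; none of it reflects the calculator's actual internals or measured values.

```python
# Hypothetical sketch of a federated carbon estimate.
# All names, constants, and tables below are illustrative assumptions.

MODEL_SIZE_MB = 25.0  # assumed size of the model exchanged each round

# (watts drawn while training, seconds per local epoch) -- assumed profiles
DEVICE_PROFILES = {"smartphone": (3.0, 120.0), "server_gpu": (250.0, 10.0)}

# gCO2eq per kWh of the local energy mix -- assumed example values
CARBON_INTENSITY = {"FR": 56.0, "US": 380.0, "CN": 555.0}

ROUTER_POWER_W = 10.0  # assumed networking power draw during transfers
JOULES_PER_KWH = 3.6e6

def round_emissions(n_devices, device, country, down_mbps, up_mbps, epochs):
    """Estimated gCO2eq for one round of federated training."""
    power_w, epoch_s = DEVICE_PROFILES[device]

    # Compute energy: every active device trains for `epochs` local epochs.
    compute_kwh = n_devices * power_w * epoch_s * epochs / JOULES_PER_KWH

    # Communication energy: download the global model, upload the update.
    transfer_s = (MODEL_SIZE_MB * 8 / down_mbps) + (MODEL_SIZE_MB * 8 / up_mbps)
    comm_kwh = n_devices * ROUTER_POWER_W * transfer_s / JOULES_PER_KWH

    return (compute_kwh + comm_kwh) * CARBON_INTENSITY[country]

# Example pool, one tuple per entry in the format above:
# (#Devices, Type, Country, Download(Mbps), Upload(Mbps), #Epochs, #Rounds)
pool = [(100, "smartphone", "FR", 50.0, 10.0, 1, 100)]
total = sum(round_emissions(n, t, c, d, u, e) * r
            for n, t, c, d, u, e, r in pool)
print(f"Estimated emissions: {total:.1f} gCO2eq")
```

Even in this toy model, communication energy can rival compute energy for small devices on slow uplinks, which is why upload and download speeds appear as first-class inputs to the calculator.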