Carbon Dioxide in Federated Learning
For the better part of a decade, Deep Learning has been defining the state of the art in machine learning tasks ranging from Computer Vision [1] to Natural Language Processing [2]. These models have achieved remarkable results, but at the cost of the compute required for training doubling every 3.4 months [3]. As hardware efficiency struggles to keep up with this exponential growth in architecture complexity, energy consumption follows a similar trend, since more hardware and cooling systems are required to train modern models. In fact, recent studies [4], [5] have shown that training large models in conventional data centres, a sector that already accounts for 0.3% of the world's carbon emissions [6], can cause a significant increase in CO2eq production.
How Green is Federated Learning?
Fortunately, not all hope is lost: a more carbon-friendly way to train neural networks exists. In Federated Learning (FL), training is performed not inside large data centres but distributed over thousands of mobile devices, such as smartphones, where the data is usually collected by the end-users themselves.
An example of an application currently using FL is next-word prediction on mobile phones [7]. In this application, each smartphone (client) trains a local network (model) to predict which word the user will type next, based on their previous text messages. The trained local models are then sent to a server, which performs a much simpler task called aggregation: combining them into a single global model that is sent back to all users.
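To make the aggregation step concrete, here is a minimal sketch of the most common scheme, Federated Averaging (FedAvg), in which the server takes a dataset-size-weighted average of the clients' parameters. The function name and data layout below are illustrative assumptions for this post, not the exact implementation behind [7].

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Server-side aggregation: a dataset-size-weighted average of
    client model parameters, as in Federated Averaging (FedAvg).

    client_weights: one list of per-layer arrays per client.
    client_sizes:   number of local training examples per client.
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [
        sum((n / total) * w[layer]
            for w, n in zip(client_weights, client_sizes))
        for layer in range(n_layers)
    ]

# Toy example: three clients, each with a one-layer "model".
clients = [[np.array([1.0, 2.0])],
           [np.array([3.0, 4.0])],
           [np.array([5.0, 6.0])]]
sizes = [10, 20, 70]  # local dataset sizes used as weights
print(federated_average(clients, sizes))  # -> [array([4.2, 5.2])]
```

Clients with more local data pull the global model further towards their update, which is why the aggregation is weighted rather than a plain mean.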
Besides the privacy-related gains of not having to send user data to a centralised server, our recent work [8], [9] shows that FL can also have a positive impact on reducing the carbon emissions of Deep Learning. Although mobile devices are much less powerful than server GPUs, FL benefits from not needing any dedicated cooling and from having wide pools of devices available for training.
The Federated Learning Carbon Calculator
So how much carbon does Federated Learning produce? The following calculator allows you to estimate how much CO2eq is produced by a given pool of devices.
To help you define your training scenario, here are a few definitions of the hyperparameters used in FL, followed by realistic input values; a sketch of how these inputs might combine into an emissions estimate appears after the device-pool format below.
- Devices: Selects the hardware profile being used.
- Country: Defines the energy mix and Internet upload/download speeds of a given pool of devices. The energy mix determines the electricity-to-carbon conversion rate, while the network speeds help estimate the emissions due to client-server communication.
- Dataset: Defines the balanced, non-IID dataset being used.
- Number of rounds: Total number of times the server will aggregate results from clients (10-100).
- Number of local epochs: Number of times each client will train on their local data before sending their model for aggregation (1-5).
- Number of active devices: Number of devices participating in each round. This should be a small fraction of the total number of devices available (10-1000).
Device pool:
(#Devices, Type, Country, Download(Mbps), Upload(Mbps), #Epochs, #Rounds)
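As a rough illustration of how these inputs might combine, here is a hedged Python sketch of such an estimate: training energy is derived from device power draw and time per epoch, communication energy from model size and network speeds, and the total is converted to CO2eq through the country's carbon intensity. Every constant, hardware profile, and the energy model itself is a simplifying assumption made for this example; none of it reflects the calculator's actual internals or measured values.

```python
# Hypothetical sketch of a federated carbon estimate.
# All names, constants, and tables below are illustrative assumptions.

MODEL_SIZE_MB = 25.0  # assumed size of the model exchanged each round

# (watts drawn while training, seconds per local epoch) -- assumed profiles
DEVICE_PROFILES = {"smartphone": (3.0, 120.0), "server_gpu": (250.0, 10.0)}

# gCO2eq per kWh of the local energy mix -- assumed example values
CARBON_INTENSITY = {"FR": 56.0, "US": 380.0, "CN": 555.0}

ROUTER_POWER_W = 10.0  # assumed networking power draw during transfers
JOULES_PER_KWH = 3.6e6

def round_emissions(n_devices, device, country, down_mbps, up_mbps, epochs):
    """Estimated gCO2eq for one round of federated training."""
    power_w, epoch_s = DEVICE_PROFILES[device]

    # Compute energy: every active device trains for `epochs` local epochs.
    compute_kwh = n_devices * power_w * epoch_s * epochs / JOULES_PER_KWH

    # Communication energy: download the global model, upload the update.
    transfer_s = (MODEL_SIZE_MB * 8 / down_mbps) + (MODEL_SIZE_MB * 8 / up_mbps)
    comm_kwh = n_devices * ROUTER_POWER_W * transfer_s / JOULES_PER_KWH

    return (compute_kwh + comm_kwh) * CARBON_INTENSITY[country]

# Example pool, one tuple per entry in the format above:
# (#Devices, Type, Country, Download(Mbps), Upload(Mbps), #Epochs, #Rounds)
pool = [(100, "smartphone", "FR", 50.0, 10.0, 1, 100)]
total = sum(round_emissions(n, t, c, d, u, e) * r
            for n, t, c, d, u, e, r in pool)
print(f"Estimated emissions: {total:.1f} gCO2eq")
```

Even in this toy model, communication energy can rival compute energy for small devices on slow uplinks, which is why upload and download speeds appear as first-class inputs to the calculator.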