Federated Learning

Definition

A training approach in which a shared model is trained across multiple decentralised data sources without the training data itself leaving those sources. Instead, only model updates — gradients or weight deltas — are exchanged and aggregated on a coordinating server.
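The aggregation step can be sketched as a weighted average of client weight updates, in the style of FedAvg; the function and variable names below are illustrative, not from any particular framework:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """FedAvg-style aggregation: average each layer's weights across
    clients, weighted by the number of local training samples."""
    total = sum(client_sizes)
    return [
        sum(n / total * weights[i] for weights, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Two hypothetical clients, each holding one weight vector (one layer).
w_a = [np.array([1.0, 2.0])]   # client A, 100 local samples
w_b = [np.array([3.0, 4.0])]   # client B, 300 local samples
global_w = fed_avg([w_a, w_b], client_sizes=[100, 300])
```

In a real round, the server would broadcast `global_w` back to the clients, which continue local training from it; only these weight vectors cross the network, never the raw samples.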

Noise — Signal

Federated learning is marketed as a "privacy miracle": the data stays local and the model still improves. The reality is more nuanced. First, model updates can, under certain conditions, be analysed to partially reconstruct training data (membership inference, gradient leakage); real privacy additionally requires differential privacy or secure aggregation. Second, federated setups significantly increase complexity in orchestration, versioning and evaluation. Third, the approach works best when the distributed data sources are homogeneous, yet the data heterogeneity that actually justifies federated learning is at the same time its biggest quality risk: non-IID client data degrades the aggregated model.
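The extra privacy mechanism mentioned above typically means privatising each client update before it leaves the device: clip its L2 norm, then add calibrated Gaussian noise (the core of DP-SGD-style privatisation). A minimal sketch, with illustrative parameter names and no accounting of the actual privacy budget:

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip an update vector to a bounded L2 norm, then add Gaussian
    noise scaled to that bound. This bounds any single client's
    influence and masks individual contributions."""
    if rng is None:
        rng = np.random.default_rng()
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# With noise_multiplier=0 only the clipping is visible:
# the update [3, 4] (norm 5) is scaled down to norm 1.
out = privatize_update(np.array([3.0, 4.0]), clip_norm=1.0, noise_multiplier=0.0)
```

The actual privacy guarantee depends on choosing `noise_multiplier` via a privacy accountant; noise alone, without that analysis, is not differential privacy.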

The right question

Not: "Can we use federated learning to protect our data?" But: "Which regulatory or contractual requirement actually forbids merging the data, which additional privacy mechanisms are needed for it, and is the complexity overhead in proportion to the actual privacy improvement over centralised training with DP?"