Despite its wide applicability, deep learning faces significant challenges in scalability, robustness and security, privacy, fairness, and interpretability. The key algorithm underlying the deep learning revolution is stochastic gradient descent (SGD), which must be distributed to handle enormous and possibly sensitive data held by multiple owners, such as hospitals and cellphones, without sharing local data. When training deep neural networks across multiple GPUs or clients, the communication time required to share stochastic gradients is the main performance bottleneck. Beyond scalability, deep learning models are known to be vulnerable to natural distribution shifts and to various adversarial attacks at both training and inference time. We present efficient gradient compression and robust aggregation schemes that significantly accelerate training and enhance security for image classification tasks.
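The two ingredients mentioned above can be illustrated with a minimal sketch: top-k sparsification as an example gradient compression scheme, and the coordinate-wise median as an example robust aggregator. These are illustrative choices for exposition, not necessarily the specific schemes developed in this work.

```python
import numpy as np

def topk_compress(grad, k):
    """Keep only the k largest-magnitude entries (example compression scheme)."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.reshape(grad.shape)

def coordinate_median(grads):
    """Coordinate-wise median of workers' gradients (example robust aggregator)."""
    return np.median(np.stack(grads), axis=0)

# Toy round: 3 honest workers plus 1 Byzantine worker sending a huge gradient.
rng = np.random.default_rng(0)
honest = [rng.normal(size=10) for _ in range(3)]
byzantine = 1e6 * np.ones(10)
grads = [topk_compress(g, k=5) for g in honest] + [byzantine]

agg_mean = np.mean(np.stack(grads), axis=0)    # the mean is corrupted by the outlier
agg_median = coordinate_median(grads)          # the median stays bounded
```

Top-k compression reduces each worker's upload from 10 entries to 5, and the median-based aggregation keeps the update bounded even though the naive mean is dominated by the adversarial contribution.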