Multi-scale Aggregation Network for Speech Emotion Recognition