We are a community of AI/ ML/Generative AI enthusiasts/researchers/journalists/writers who share interesting news and articles about the applications of AI. 

Machine Learning News

MinMo, developed by researchers from Tongyi Lab and Alibaba Group, is an advanced multimodal large language model designed for seamless voice interaction. Trained on over 1.4 million hours of speech data, MinMo excels in tasks like speech-to-text, text-to-speech, emotion recognition, and multilingual speech recognition. The model uses components like SenseVoice-large, Qwen2.5-7B-instruct, and CosyVoice 2 to provide real-time responses with low latency while preventing catastrophic forgetting. MinMo outperforms other models in benchmarks, including Whisper Large v3, especially in multilingual speech tasks and natural voice interactions.

MinMo: A Multimodal Large Language Model with Approximately 8B Parameters for Seamless Voice Interaction