Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities

Authors: Dong Zhang, Shimin Li, Xin Zhang, Jun Zhan, Pengyu Wang, Yaqian Zhou, Xipeng Qiu

School of Computer Science, Fudan University

SpeechGPT is the first multimodal LLM capable of perceiving and generating multimodal content
in response to multimodal human instructions.



Cross-modal Instruction Following:

Instruction: Can you transcribe the speech into a written format?
Input: (speech audio)
Output: I'm afraid there are no signs here said he.

Instruction: Listen to the speech and write down its content.
Input: (speech audio)
Output: Did anyone know that these proofs would be there no one the printer.

Instruction: Would you mind speaking these words as naturally as possible?
Input: Today is a sunny day and I'm happy to be here.
Output: (speech audio)

Instruction: Would you please speed-read the following sentence?
Input: I am a large language model that can listen and speak, a member of Fudan University, and glad to talk with you.
Output: (speech audio)

Speech Dialogue:

Role: Text outputs (inputs and speech outputs are audio)
Talking encyclopedia: The primary colors are red, blue, and yellow. These colors are the building blocks of all other colors and are used to create the full spectrum of colors.
The capital of France is Paris. It is located in the northern part of the country, along the Seine River.
Yes, I do know Stephen Curry. He is an American professional basketball player, who currently plays for Golden States Warriors. He is two-time NBA most valuable player and four-time NBA all star.
Personal assistant: It is understandable that your girlfriend is angry with you. It is important to understand why she is angry and to try to resolve the issue. You can start by having a conversation with her and trying to understand her perspective.
Sure, here are some suggestions: 1. Make a salad with fresh vegetables. 2. Grill some lean meat. 3. Roast a vegetable-based dish like a cauliflower casserole.
Chat partner: Going to the moon is a challenging task that requires a lot of planning and resources. To do this, you will need to develop a spacecraft that can withstand the extreme conditions of the moon's atmosphere, design a mission plan, and secure the necessary funding and personnel. Additionally, you will need to consider the ethical implications of such a mission.
Congratulations! You have done an excellent job. English is a difficult language to master, but you have shown great progress. Keep up the good work!
I'm sorry, I'm not programmed to feel emotions. I'm just a chatbot, so I cannot experience love or any other emotions.
Poet: The sun is out, the sky is blue, The flowers bloom and the birds sing too, The days are long, the nights are short, The spring is here and it's so sweet and warm!
Psychologist: Cheating your parents is not a good idea. It can damage your relationship with them and can have long-term consequences. It is better to be honest and open with your parents.
I understand that you are feeling overwhelmed and are considering taking drastic measures. It is important to remember that there are always other options available to you. Seeking help from a professional is a great way to get the support you need.
Educational assistant: Sure, the correct pronunciation of apple is [app-ull].
Sure, here it is: "In the beginning, God created the heavens and the earth."