Why AI doesn't speak every language



It could learn them all. But will it?

Large language models are astonishingly good at understanding and producing language. But there’s an often overlooked bias toward languages that are already well-represented on the internet. That means some languages might lose out in AI’s big technical advances.

Some researchers are looking into how that works, and how to possibly shift the balance from these “high resource” languages to ones that haven’t yet had a huge online footprint. These approaches range from original dataset creation, to studying the outputs of large language models, to training open source alternatives.

Watch the video above to learn more.

Further reading:
https://ruth-ann.notion.site/ruth-ann/JamPatoisNLI-A-Jamaican-Patois-Natural-Language-Inference-Dataset-91523ec89af24bfdbcb9c1ec7e28cc3c

This is the hub for Ruth-Ann Armstrong’s JamPatois NLI. You can see the dataset and read the paper.

https://arxiv.org/search/cs?searchtype=author&query=Melero%2C+M
You can read Maite Melero’s work on Catalan here.

https://huggingface.co/bigscience/bloom
This is the Hugging Face home for BLOOM, the open source large language model.

Vox.com is a news website that helps you cut through the noise and understand what’s really driving the events in the headlines. Check out http://www.vox.com


21 thoughts on “Why AI doesn't speak every language”

  1. It would be far more practical to teach every human on Earth English than to make AI in every language.

    Yes, there are benefits to a diversity of language, but there are far MORE benefits the other way around. While it wouldn't come without sacrifice, the world would genuinely be a better place with only one language.

  2. I was literally talking to My AI on Snapchat in Indonesian, and it replied in Indonesian that it prefers English, so could I type in English instead.

  3. I think we need to separate two tasks more clearly: translation and answering questions. They are different.

    An 8-to-10-year-old child can already speak and knows a lot of words. If the child is bilingual, they can translate live speech quite well.

    But for lack of information, the child cannot answer many questions that ChatGPT can.

    Hence, to teach ChatGPT to speak a rare language, it is not necessary to have a comparable dataset. It is enough to give it the ability to translate correctly.

    And for teaching translation, a very small dataset is technically enough, and one exists for every language (except those that are almost dead).

    Yes, AI translation is not ideal for now, BUT even translation between languages with giant datasets is not perfect, so the reason for bad AI translation is not the dataset but the technology, which is just not developed enough.

  4. Love VOX content. Here is a quick question: how many languages is this video available in? What is VOX doing to make it available to LRL folks and help them understand the lack of diversity in current language models?

  5. lol this video made me giggle a bit. Of course those languages wouldn't be included, or would be limited; it needs human interaction, just like how Google Translate always asks whether a translation is correct.

