Javdet Suleymanov: ‘Each Turkic dialect carried its own part of the picture of the Proto-Turkic language’

Why Tatar is valuable for explainable artificial intelligence — the next step in the development of AI

Javdet Suleymanov: ‘Each Turkic dialect carried its own part of the picture of the Proto-Turkic language’
Tatar translator according to Kandinsky . Photo: Радиф Кашапов

The 12th TurkLang 2024 international conference on computer processing of Turkic languages was held in Kazan. According to its speakers, these languages, due to their qualities, are best suited for creating explainable artificial intelligence, which will make AI technology more understandable and transparent, and also expand the scope of its application in life. But in order to implement it, joint work of scientists of all Turkic peoples is necessary.

Development is impossible without digitalisation

“We are now seeing how digital technologies are becoming increasingly important in our lives, and without their development, the life of our native languages is impossible,” the conference, which was first held in Astana in October 2013, began with these words. It was also held in Istanbul, Bishkek, Tashkent, Simferopol, Ufa, Kyzyl, Nur-Sultan and Bukhara, and in 2015 it was organised in Kazan. Next year, the meeting will celebrate its anniversary, so President of the Academy of Sciences of the Republic of Tatarstan Rifkat Minnikhanov approached the organisers with a proposal to hold it in the Tatarstan capital for the fourth time in 2025 — there were no objections.

In total, more than 150 experts from Russia, Kazakhstan, Uzbekistan, Germany, the USA, Kyrgyzstan, Azerbaijan, Japan, Mexico, Turkey, South Korea, and Taiwan participated in the conference. Scientists listened to and discussed 82 reports, two thirds of which were offline. In particular, they discussed issues of national localisation of computer systems and terminology, systems of morphological and syntactic text processing, machine translation and speech technologies.

Deputy Minister of Digital Development of the Republic Bulat Gabdrakhmanov noted that now the connection between IT experts and colleagues from the field of science is especially important:

“Developers and teams that implement solutions in IT services are very involved in issues of artificial intelligence in the field of speech synthesis, as well as AI related to various languages. We see this at hackathons: young people are interested in implementing the boldest solutions in the field of speech synthesis, artificial intelligence related to languages, including, of course, the Tatar language. I would like this symbiosis to be closer and turn into real projects, results that can be used by all Turkic-speaking peoples.”

Bulat Gabdrakhmanov is for symbiosis. предоставлено пресс-службой «СИБУР»

The Tatar language is natural morphology, recursion, fractality

The conference was dedicated to the Year of Scientific and Technological Development in the Republic and the 15th anniversary of the Institute of Applied Semiotics of the Academy of Sciences of the Republic of Tatarstan. Speaking about his activities, Director of the institute Rinat Gilmullin noted: “No one doubts that if we want to preserve the language, it should be actively introduced into the digital space.”

Not all of the institute's achievements are widely known, such as the localisation of Windows and Astra or joint developments with the Ministry of Digital Development of the Republic of Tatarstan, and manuals for online learning for year five students. At the same time, the TatSoft translator is popular and continues to develop, having received a total of 30 million requests from 136 countries. This year, a speech web service for speech analysis was introduced into it. With its help, our officials translate regulatory documents.

“Our research shows that Turkic languages, in particular the Tatar language, with their virtually unlimited natural morphology, such natural mechanisms as recursion, fractality, the ability to specify fuzzy information and knowledge proactivity, are the most suitable and ready-made tools for describing the cognitive system of artificial intelligence,” Gilmullin noted.

He also added that the institute is currently planning to develop a cloud platform api.tatar.ai: these are educational services using voice input, automated call centres and telephone robots, AI in Tatar, automatic recognition and translation of speech, a modern literary portal in Tatar.

By the way, the institute’s projects have earned praise from Andrey Mikheyev, a representative of Yandex. He described in detail how his company implemented machine translation for various languages, reporting that mastering Mari would require a week of processing several million examples on a video card worth an apartment (“not in Kazan, I looked, it’s just awful here”) in a small Russian city.

Yandex has had Tatar, along with Bashkir, Yakut and Bashkir, for a long time, thanks to the fact that the language has a large database. At the same time, among the models by which the system was trained, there was one that learned from all Turkic languages at once.

An example of Tatar word formation. предоставлено пресс-службой ИПС АН РТ

“We raise the issue of database compatibility at every conference”

The programme statement was made by the co-chairman of the programme committee Javdet Suleymanov, chief researcher of the Institute of Applied Semiotics of the Tatarstan Academy of Sciences. He recalled that artificial intelligence was not only a solution, but also a threat, he pointed out that one of the solutions to curb AI, scientists see the creation of explanatory artificial intelligence — eXplanatory AI, XAI. Turkic languages are better suited for it. This is not an easy task, for example, for Tatar alone this means the introduction of 100 thousand root morphemes, 200 word-formation morphemes and more than 90 affixal morphemes, not counting dialects and subdialects.

“We raise the issue of database compatibility at every conference,” noted Suleymanov, adding that at the same time he and his colleagues are not making much progress in this direction, scientists are still working separately.

“Each Turkic language, subdialect, dialect has carried its part of the picture of the Proto-Turkic language for centuries, millennia. The time has come to collect these parts into a single picture in the form of a common model,” Suleymanov summarised.
Radif Kashapov

Подписывайтесь на телеграм-канал, группу «ВКонтакте» и страницу в «Одноклассниках» «Реального времени». Ежедневные видео на Rutube, «Дзене» и Youtube.

Tatarstan