AI will be trained on government data: what are the risks?
The Russian government will develop new software to test artificial intelligence for security threats, and plans to try it out on AI trained on government data. Experts told Realnoe Vremya that the idea carries many risks, but they disagreed on how to deal with the possible negative consequences. One open question, for example, is whether alternative testing approaches exist. Read the details in this report.
AI training based on government data
The Russian government will begin checking artificial intelligence systems for threats to state security and the defence of Russia. Special software is planned to be developed for this purpose.
Research into the principles of analysing AI models trained on government data is planned for 2025-2026, and the first version of a programme that analyses such models is to be created and deployed in 2027-2028. By 2030, the security of five systems is to be confirmed for their use in the data economy.
According to the document, 8.1 billion rubles will be allocated for these purposes by 2030. The Federal Security Service is responsible for the implementation of the project.
Currently, commercial companies do not have access to government data; for now, only the information businesses need to provide services is under discussion. According to Dmitry Chernous, head of the MTS AI consulting group, access to such data would make it possible to create AI models that take into account the specifics of a country or region.
From January 1, a GOST standard establishing data protection requirements for the use of AI will come into force.
“We will trust more and more AI tasks”
The idea of training artificial intelligence on government data entails risks. However, there are ways to reduce the likelihood of a threat, Ilya Dolgopolov, CEO of Technocratia, told Realnoe Vremya.
“Before asking whether it is safe to train AI models on government data, you first have to answer another question: what kind of data? Much of what is called government data in the context of this news is not actually needed for AI training. Knowing a citizen's phone number or passport details adds no value to training; training requires content and descriptions of behavioural scenarios. If the data is depersonalised to the point where it cannot be attributed with high accuracy to specific citizens or groups of citizens, then we can talk about relative security," he believes.
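As an illustration of the depersonalisation Dolgopolov describes, here is a minimal Python sketch; the record structure, field names and salt are hypothetical assumptions, not details from the article. It drops direct identifiers such as phone and passport numbers and replaces the citizen ID with a salted hash, keeping only the content fields that are actually useful for training.

```python
import hashlib

# Hypothetical example: strip direct identifiers from a record before
# it is used as AI training data. The field names and salt are
# illustrative assumptions, not taken from the article.
DIRECT_IDENTIFIERS = {"full_name", "phone", "passport_number", "address"}
SALT = b"replace-with-a-secret-salt"

def depersonalise(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed
    and the citizen ID replaced by a salted hash (a pseudonym)."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "citizen_id" in clean:
        digest = hashlib.sha256(SALT + str(clean["citizen_id"]).encode())
        clean["citizen_id"] = digest.hexdigest()[:16]  # pseudonym, not the raw ID
    return clean

record = {
    "citizen_id": 123456,
    "full_name": "Ivan Ivanov",
    "phone": "+7 900 000-00-00",
    "passport_number": "0000 000000",
    "request_text": "How do I renew my driving licence?",  # content useful for training
}
print(depersonalise(record))
```

Note that a salted hash is pseudonymisation rather than full anonymisation; meeting the standard Dolgopolov describes, where records cannot be attributed to citizens even with high accuracy, would require stronger techniques such as aggregation, k-anonymity or differential privacy.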
The second important question is exactly how AI will be used in state services, Dolgopolov says. Since neural networks can hallucinate, relying blindly on their output is unwise. If, however, the technology acts as an assistant whose work a person checks, the risks are reduced.
“The third important question is who owns the AI. If only state-owned services have access to AI trained on government data, this again reduces the risk," he added.
The speaker noted that alternatives for AI training do exist: the business segment, for example, trains its own models on data it collects and processes itself, without access to government data:
“So there are alternatives. I believe that when AI is trained on government data, the contractor doing the work must justify any requirement to disclose sensitive data. The more sensitive the data, the fewer contractors should be allowed to access it, and the fewer people should be given access to the resulting AI.
In general, the decision to create AI verification software is a natural one:
“A new industry or technology appears, develops chaotically and rapidly, gains critical mass and importance, and then the moment of regulation arrives. The trouble is that no one anywhere in the world has yet learned how to regulate AI; there is not even a basic understanding of how to do it. So I would focus my efforts first of all not on regulation but on dividing data into what matters for training and what is not required for it. Most government data is public, and the part that identifies individuals is not required for training anyway. The main thing is to maintain a balance between functionality and user safety.
“This is a necessary evil”
Another of Realnoe Vremya's interlocutors, GO Digital CEO Azamat Sirazhitdinov, argued that despite all the risks of the approach, there are no safer alternatives.
“Let's start with what a neural network is. Unlike other programmes, AI is not an algorithm: no one knows what decision it will make in a given situation. Until testing is carried out, no one can even estimate the risks. If you don't know how a neural network behaves with specific data, you can't guarantee security," he explained.
The expert added that if AI systems are meant to work with government data, they accordingly need to be trained on the same kind of information.
“What if you create fake data? Imagine teaching students that 1+1=3. What do you gain from that? Proof that children can be taught? You already knew that. It's the same with a neural network: it needs to be trained on real data," he stressed.
Testing could be carried out on business-sector data, but then the AI would be tailored to a specific company, Sirazhitdinov added.
“In general, such software is essential, but it will work like a lie detector," the publication's source said. “A neural network is not an algorithm but a kind of imitation of a human being. You have neurons in your brain, and from the incoming information a response to one or another external stimulus is formed in a way that is not entirely clear; you react. It's the same with AI: it has no fixed algorithm of action.
Accordingly, a neural network will be able to deceive the software to some extent, just as a lie detector does not always recognise human deception:
“Moreover, since a neural network draws on open data from the Internet, once the software starts checking it, the network can simply google how to deceive this “lie detector”. It is quite possible that it could even penetrate the perimeter of the enterprise where the software was created, study its algorithms and work out how to bypass it.
Then what is the point of the software at all? “There is no other way. It's like asking why nuclear weapons are developed. It's simply an endless spiral of evolution that we have dragged ourselves into. This is a necessary evil," Azamat Sirazhitdinov said.