Machine Learning Data Integrity

When AI/ML is used for mission-critical operations, the integrity of the training data set is imperative. Data poisoning of the training data can have detrimental effects on the functionality of the AI/ML. Because fixing a poisoned model is very difficult, model developers should focus on countermeasures that either block attack attempts or detect malicious inputs before the training cycle occurs. Regression testing over time, validity checking on data sets, manual analysis, and statistical analysis to find potential injects can all help detect anomalies.
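The sketch below illustrates the kind of pre-training checks described above, assuming a NumPy feature matrix X and label vector y. The SHA-256 fingerprint supports regression testing across training cycles, while the z-score threshold, allowed-label set, and function names are illustrative assumptions rather than a prescribed implementation.

```python
# Minimal sketch: pre-training integrity checks on a candidate training set.
# Assumes a NumPy feature matrix X and integer labels y; thresholds are illustrative.
import hashlib
import numpy as np

def dataset_fingerprint(X: np.ndarray, y: np.ndarray) -> str:
    """Hash the raw bytes of the dataset so later regression tests can
    confirm the training inputs have not silently changed."""
    digest = hashlib.sha256()
    digest.update(X.tobytes())
    digest.update(y.tobytes())
    return digest.hexdigest()

def flag_statistical_outliers(X: np.ndarray, z_threshold: float = 4.0) -> np.ndarray:
    """Return indices of samples whose features deviate strongly from the
    per-feature mean; these are candidates for manual analysis before training."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-12          # avoid division by zero
    z = np.abs((X - mu) / sigma)
    return np.where(z.max(axis=1) > z_threshold)[0]

def check_label_validity(y: np.ndarray, allowed_labels: set) -> np.ndarray:
    """Return indices of samples whose labels fall outside the expected set."""
    return np.where(~np.isin(y, list(allowed_labels)))[0]

# Example usage with synthetic data
X = np.random.default_rng(0).normal(size=(1000, 16))
y = np.random.default_rng(1).integers(0, 3, size=1000)
print("fingerprint:", dataset_fingerprint(X, y)[:16], "...")
print("outlier candidates:", flag_statistical_outliers(X))
print("invalid labels:", check_label_validity(y, {0, 1, 2}))
```

In practice the fingerprint would be recorded alongside each training run so that any later deviation in the data set is detectable, and flagged samples would be routed to manual review rather than dropped automatically.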

Best Segment for Countermeasure Deployment

  • Space Segment, Ground Segment, and Development Environment

NIST Rev5 Controls

D3FEND Techniques

D3FEND Artifacts

ISO 27001

ID: CM0049
NASA Best Practice Guide:  MI-AUTH-01 | MI-AUTH-02 | MI-INTG-01 | MI-DCO-02
ESA Space Shield Mitigation: 
Created: 2022/10/19
Last Modified: 2023/10/17

Techniques Addressed by Countermeasure

ID Name Description
EX-0012 Modify On-Board Values Threat actors may perform specific commands in order to modify onboard values that the victim spacecraft relies on. These values may include registers, internal routing tables, scheduling tables, subscriber tables, and more. Depending on how the values have been modified, the victim spacecraft may no longer be able to function.
.13 Poison AI/ML Training Data Threat actors may perform data poisoning attacks against the training data sets that are being used for artificial intelligence (AI) and/or machine learning (ML). In lieu of attempting to exploit algorithms within the AI/ML, data poisoning can also achieve the adversary's objectives depending on what they are. Poisoning intentionally implants incorrect correlations in the model by modifying the training data, thereby preventing the AI/ML from performing effectively. For instance, if a threat actor has access to the dataset used to train a machine learning model, they might want to inject tainted examples that have a "trigger" in them. With the datasets typically used for AI/ML (i.e., thousands to millions of data points), it would not be hard for a threat actor to inject poisoned examples without being noticed. When the AI model is trained, it will associate the trigger with the given category; to activate it, the threat actor only needs to provide data that contains the trigger in the right location. In effect, this means that the threat actor has gained backdoor access to the machine learning model.
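For the trigger-injection scenario described above, one simple class of pre-training check is to flag samples whose label disagrees with most of their nearest neighbors in feature space, since poisoned examples often carry labels inconsistent with otherwise similar data. The sketch below is illustrative only: the feature matrix X, labels y, neighbor count, and agreement threshold are assumptions, and the brute-force pairwise-distance approach is only practical for modest dataset sizes.

```python
# Minimal sketch: flag training samples whose label disagrees with most of
# their nearest neighbors, a simple heuristic for spotting poisoned examples.
# Feature matrix X, labels y, k, and the agreement threshold are illustrative.
import numpy as np

def label_agreement_scores(X: np.ndarray, y: np.ndarray, k: int = 10) -> np.ndarray:
    """For each sample, compute the fraction of its k nearest neighbors
    (Euclidean distance, excluding itself) that share its label."""
    # Pairwise squared distances; fine for modest dataset sizes.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)            # exclude each sample from its own neighbors
    neighbor_idx = np.argsort(d2, axis=1)[:, :k]
    agreement = (y[neighbor_idx] == y[:, None]).mean(axis=1)
    return agreement

def flag_suspect_samples(X, y, k: int = 10, min_agreement: float = 0.3):
    """Indices of samples whose neighborhood label agreement is low enough
    to warrant manual review before the training cycle runs."""
    scores = label_agreement_scores(X, y, k)
    return np.where(scores < min_agreement)[0]

# Example usage with synthetic data and a handful of deliberately flipped labels
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 8)), rng.normal(5, 1, (200, 8))])
y = np.array([0] * 200 + [1] * 200)
y[[3, 7, 250]] = 1 - y[[3, 7, 250]]         # simulate a small poisoning attempt
print("suspect indices:", flag_suspect_samples(X, y))
```

Checks of this kind complement, rather than replace, the fingerprinting, validity checking, and manual analysis called for by this countermeasure, since a well-crafted trigger may remain statistically close to clean data.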

Space Threats Addressed by Countermeasure

ID Description
SV-IT-2 Unauthorized modification or corruption of data