Entanglement-Based Contrastive Learning Algorithms for Molecular Property Prediction
Keywords:
Contrastive Learning, Molecular Property Prediction, Graph Neural Networks, Quantum Entanglement, Self-Supervised Learning, Molecular Representation, Drug Discovery.
Abstract
Reliable prediction of molecular properties is one of the most fundamental problems in computational chemistry, drug discovery, and materials design. Despite their impressive performance in learning molecular representations, graph neural networks (GNNs) remain limited by partial and scarce labeled data and by weak modeling of the quantum-mechanical correlations native to molecular systems. Although learning the semantics of molecular graphs is an effective way to address many chemistry challenges, semantically rich augmentation and contrastive learning strategies are still lacking. This paper proposes the Entanglement-Based Contrastive Learning framework for Molecules (EBCL-Mol), a novel approach that draws on the theory of quantum entanglement to design such strategies for molecular graphs. EBCL-Mol introduces two key innovations: (i) Quantum-Entanglement-Inspired Graph Augmentation (QEIGA), which preserves subgraph structures with entangled atomic signs from chemically equivalent pairs when generating contrastive pairs, and (ii) Dual-Encoder Entanglement Contrastive Loss (DEECL), which enforces invariance across chemically equivalent molecular views during training while discouraging spurious decorrelations. Extensive experiments on 12 benchmark datasets covering toxicity, solubility, bioactivity, and quantum chemical properties show that EBCL-Mol achieves state-of-the-art performance, outperforming existing contrastive and non-contrastive molecular representation learning baselines by 3.2–8.7% across evaluation metrics. Ablation studies confirm that all proposed components make complementary contributions.
This work establishes a novel interface between quantum-inspired learning principles and molecular machine learning, yielding a scalable and label-efficient approach to property prediction.
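To make the contrastive objective concrete, the following is a minimal sketch of a generic symmetric InfoNCE loss over two batches of view embeddings, such as the outputs of two encoders applied to two augmented views of each molecule. It is not DEECL itself, whose exact form is not given in this abstract. The function name `info_nce_loss`, the `temperature` value, and the use of plain NumPy arrays in place of GNN encoder outputs are all assumptions made for illustration.

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss between two batches of embeddings.

    z_a, z_b: arrays of shape (batch, dim). Row i of z_a and row i of z_b
    are assumed to embed two views of the same molecule (a positive pair);
    all other rows in the batch act as negatives.
    This is a generic stand-in, not the DEECL loss from the paper.
    """
    # L2-normalise so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by the temperature.
    logits = (z_a @ z_b.T) / temperature  # shape (batch, batch)

    def cross_entropy_on_diagonal(scores):
        # Numerically stable row-wise log-softmax; the target for row i
        # is column i (the matching view).
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the a-to-b and b-to-a directions.
    return 0.5 * (
        cross_entropy_on_diagonal(logits)
        + cross_entropy_on_diagonal(logits.T)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 16))                   # hypothetical embeddings
    view_b = view_a + 0.01 * rng.normal(size=(8, 16))   # slightly perturbed views
    print("loss for aligned views:", info_nce_loss(view_a, view_b))
```

In this sketch the loss is small when matching rows are more similar to each other than to the other rows in the batch, which is the invariance a contrastive objective rewards. Any entanglement-specific terms would have to be added on top of it.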




