Multi-View Canonical Correlation Analysis for Cross-Lingual Fake News Detection
Published in MURJ, 2025
The rapid spread of misinformation across multilingual social media platforms calls for robust cross-lingual fake news detection systems. We present an improved Multi-View Canonical Correlation Analysis (CCA) approach that learns shared semantic representations across five languages: English, Hindi, Indonesian, Swahili, and Vietnamese. Our method combines enhanced TF-IDF feature extraction, weighted pairwise CCA projections, and ensemble classification to transfer knowledge across languages for fake news classification. Experimental results on the TALLIP-FakeNews dataset show that it outperforms single-language baselines, with improvements of up to 7.38% in accuracy and 7.67% in F1-score. The approach is particularly effective for low-resource languages because it exploits cross-lingual semantic similarities, and it remains computationally efficient through selective projection weighting.
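As a rough illustration of the weighted pairwise CCA projection step mentioned in the abstract, the sketch below fits one regularized linear CCA per (target language, other language) pair and concatenates the target-side projections, each scaled by a normalized weight. It is a minimal sketch under stated assumptions, not the paper's implementation: the random matrices stand in for TF-IDF features of aligned documents, the helper names `fit_cca` and `weighted_pairwise_projection` are hypothetical, and weighting each pair by its mean canonical correlation is an assumed scheme chosen for illustration. The ensemble classifier is omitted.

```python
import numpy as np


def fit_cca(X, Y, k, reg=1e-3):
    """Regularized linear CCA between two views with aligned rows.

    Returns the projection matrix for X, the column means of X, and the
    top-k canonical correlations.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Regularized covariance and cross-covariance matrices.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten both views with Cholesky factors: T = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T
    # Singular values of T are the canonical correlations.
    U, s, _ = np.linalg.svd(T, full_matrices=False)
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    return Wx, mx, s[:k]


def weighted_pairwise_projection(views, target, k=2):
    """Project the target view once per other view and concatenate the
    projections, each scaled by its normalized mean canonical correlation
    (an assumed weighting scheme)."""
    X = views[target]
    blocks, weights = [], {}
    for name, Y in views.items():
        if name == target:
            continue
        Wx, mx, corrs = fit_cca(X, Y, k)
        blocks.append((name, (X - mx) @ Wx))
        weights[name] = float(corrs.mean())
    total = sum(weights.values())
    weights = {name: w / total for name, w in weights.items()}
    Z = np.hstack([weights[name] * P for name, P in blocks])
    return Z, weights


# Synthetic stand-ins for per-language feature matrices: every view is a noisy
# linear image of the same 2-dimensional latent signal.
rng = np.random.default_rng(0)
n_docs = 200
latent = rng.normal(size=(n_docs, 2))
views = {
    lang: latent @ rng.normal(size=(2, 6)) + noise * rng.normal(size=(n_docs, 6))
    for lang, noise in [("en", 0.1), ("hi", 0.2), ("sw", 5.0)]
}

Z, weights = weighted_pairwise_projection(views, target="en")
print(Z.shape)   # two other views x k=2 components each -> (200, 4)
print(weights)   # the noisier "sw" view should receive the smaller weight
```

In this sketch, pairs whose views agree more strongly contribute more to the final representation, which is one plausible reading of "selective projection weighting"; the resulting matrix `Z` would then be passed to whatever classifier ensemble is used downstream.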
Recommended citation: Inimai Subramanian, "Multi-View Canonical Correlation Analysis for Cross-Lingual Fake News Detection," MIT Undergraduate Research Journal (MURJ), vol. 50, Fall 2025
