Random Indexing for Polylingual Text Classification

Human Language Technologies (HLT), Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo", Consiglio Nazionale delle Ricerche - Pisa, Italy


Overview

Random Indexing (RI) is a dimensionality reduction method for matrix representations in machine learning. In the original representation every feature has its own orthogonal direction. RI approximates this representation by incrementally accumulating nearly orthogonal random directions, one per feature, in a reduced space. RI relies on two results. The first is the Johnson-Lindenstrauss lemma: distances in a Euclidean space are approximately preserved when projected onto a random lower-dimensional space. The second is Hecht-Nielsen's result that a high-dimensional space contains many more nearly orthogonal directions than truly orthogonal ones. To satisfy the lemma, the random directions follow Achlioptas' conditions of zero mean and unit variance.
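A minimal sketch of the accumulation step described above, in plain Python. Each feature is lazily assigned a random index vector the first time it appears, and a document is encoded as the count-weighted sum of its features' index vectors. This is an illustration, not the authors' implementation. It assumes Achlioptas' sparse distribution, with entries sqrt(3) times +1, 0 or -1 drawn with probabilities 1/6, 2/3 and 1/6, which gives zero mean and unit variance. It also assumes a 1/sqrt(k) scaling so that squared norms are preserved in expectation. The names `achlioptas_index_vector` and `RandomIndexer` are hypothetical.

```python
import math
import random
from collections import Counter


def achlioptas_index_vector(k, rng):
    """Draw a k-dimensional index vector from Achlioptas' sparse distribution:
    sqrt(3) * {+1 w.p. 1/6, 0 w.p. 2/3, -1 w.p. 1/6} (zero mean, unit variance)."""
    s = math.sqrt(3)
    vec = []
    for _ in range(k):
        u = rng.random()
        if u < 1 / 6:
            vec.append(s)
        elif u < 2 / 6:
            vec.append(-s)
        else:
            vec.append(0.0)
    return vec


class RandomIndexer:
    """Hypothetical incremental Random Indexing sketch: every feature gets a
    random (nearly orthogonal) index vector the first time it is seen."""

    def __init__(self, k, seed=0):
        self.k = k
        self.rng = random.Random(seed)
        self.index = {}  # feature -> k-dimensional index vector

    def index_vector(self, feature):
        # Lazily allocate a random direction for an unseen feature.
        if feature not in self.index:
            self.index[feature] = achlioptas_index_vector(self.k, self.rng)
        return self.index[feature]

    def transform(self, tokens):
        # Accumulate count * index_vector for every distinct feature.
        doc = [0.0] * self.k
        for feature, count in Counter(tokens).items():
            iv = self.index_vector(feature)
            for d in range(self.k):
                doc[d] += count * iv[d]
        # Scale by 1/sqrt(k) so squared norms are preserved in expectation.
        scale = 1 / math.sqrt(self.k)
        return [x * scale for x in doc]
```

With `ri = RandomIndexer(k=500)`, calling `ri.transform(tokens)` on each document yields a 500-dimensional vector. Because new features only add new index vectors, the dimensionality never grows with the vocabulary, which is what makes RI incremental.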

Lightweight Random Indexing (LRI) is a variant of RI in which each feature direction has only two non-zero dimensions. LRI therefore preserves sparsity, and it produces better matrix representations for Polylingual Text Classification (PLTC).
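The sketch below illustrates why LRI preserves sparsity. Every feature's index vector has exactly two non-zero entries, so a document with n distinct features touches at most 2n dimensions. It can therefore be stored as a sparse dictionary. It makes one assumption that is not taken from the original method: the two dimensions are drawn uniformly at random (distinct), and each receives a random sign in {+1, -1}. LRI may allocate dimensions differently. The class name `LightweightRandomIndexer` and the language-prefixed feature names in the test are hypothetical.

```python
import random
from collections import Counter


class LightweightRandomIndexer:
    """Hypothetical LRI sketch: each feature gets a sparse index vector with
    exactly two non-zero dimensions.
    ASSUMPTION: the two dimensions are chosen uniformly at random and each
    gets a random sign; the original method may allocate them differently."""

    def __init__(self, k, seed=0):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.rng = random.Random(seed)
        self.index = {}  # feature -> {dimension: +1 or -1}

    def index_vector(self, feature):
        if feature not in self.index:
            dims = self.rng.sample(range(self.k), 2)  # two distinct dimensions
            self.index[feature] = {d: self.rng.choice((1, -1)) for d in dims}
        return self.index[feature]

    def transform(self, tokens):
        # Sparse document vector: only dimensions touched by some feature appear.
        doc = {}
        for feature, count in Counter(tokens).items():
            for d, sign in self.index_vector(feature).items():
                doc[d] = doc.get(d, 0) + count * sign
        # Drop dimensions where contributions with opposite signs cancelled.
        return {d: v for d, v in doc.items() if v != 0}
```

In a polylingual setting, features from all languages can share the same k-dimensional space. A sparse classifier input then costs at most two entries per distinct feature, regardless of k.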

Datasets

Publications


For any question, contact: A. Moreo, alejandro.moreo@isti.cnr.it