*Human Language Technologies (HLT), Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo", Consiglio Nazionale delle Ricerche - Pisa, Italy*

Random Indexing (RI) is a dimensionality reduction method for matrix representations in machine learning. RI approximates the original *orthogonal* matrix by iteratively accumulating the *nearly orthogonal* directions associated with features in a reduced space. RI relies on the *Johnson-Lindenstrauss lemma* (distances in a Euclidean space are approximately preserved when the points are projected into a lower-dimensional random space) and on the *Hecht-Nielsen proof* (high-dimensional spaces contain many more nearly orthogonal directions than truly orthogonal ones). It also complies with the *Achlioptas conditions* of zero mean and unit variance, which the lemma requires.
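As a rough illustration of the idea above, the following Python sketch builds one sparse random index vector per feature and projects a document-by-feature matrix by accumulating those vectors. It is a minimal sketch under stated assumptions, not the code used in the paper: the function names `random_index_matrix` and `project`, the parameter `k` (number of non-zero entries per vector), the balanced split between positive and negative entries, and the scaling factor `sqrt(n_dims / k)` (chosen so that each vector has zero mean and unit variance) are all illustrative choices.

```python
import numpy as np

def random_index_matrix(n_features, n_dims, k, seed=0):
    """Return an (n_features, n_dims) matrix whose rows are random index vectors.

    Assumption: each row has exactly k non-zero entries, half +s and half -s,
    with s = sqrt(n_dims / k), so every row has zero mean and unit variance.
    """
    if k % 2 != 0 or k > n_dims:
        raise ValueError("k must be even and no larger than n_dims")
    rng = np.random.default_rng(seed)
    scale = np.sqrt(n_dims / k)
    signs = np.array([1.0] * (k // 2) + [-1.0] * (k // 2))
    R = np.zeros((n_features, n_dims))
    for f in range(n_features):
        # Pick k distinct positions in the reduced space for feature f.
        cols = rng.choice(n_dims, size=k, replace=False)
        R[f, cols] = scale * signs
    return R

def project(X, R):
    """Accumulate, for each document, the index vectors of its features,
    weighted by the feature values (equivalent to the product X @ R)."""
    return X @ R

# Toy example: 5 documents over 50 features, reduced to 20 dimensions.
X = np.random.default_rng(1).random((5, 50))
R = random_index_matrix(n_features=50, n_dims=20, k=4)
Z = project(X, R)
```

Here `Z` has one row per document and `n_dims` columns. Two distinct rows of `R` are only nearly orthogonal, which is the source of the approximation.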

Lightweight Random Indexing (LRI) is a variant of RI where only two non-zero dimensions are allocated to each feature direction. LRI preserves sparsity and produces better matrix representations for Polylingual Text Classification (PLTC).
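The sparsity argument can be sketched in a few lines of Python. The example below is an illustration only and makes several assumptions that the text above does not state: the two dimensions of each feature are drawn uniformly at random, one receives the value +1 and the other -1, and no variance scaling is applied. The function name `lri_project` is hypothetical.

```python
import numpy as np

def lri_project(X, n_dims, seed=0):
    """Project X (documents x features) into n_dims dimensions, giving each
    feature exactly two non-zero dimensions (assumed here: one +1, one -1)."""
    rng = np.random.default_rng(seed)
    n_docs, n_features = X.shape
    Z = np.zeros((n_docs, n_dims))
    for f in range(n_features):
        # Two distinct dimensions of the reduced space for feature f.
        pos, neg = rng.choice(n_dims, size=2, replace=False)
        Z[:, pos] += X[:, f]
        Z[:, neg] -= X[:, f]
    return Z

# Toy sparse input: 3 documents, 1000 features, at most 3 non-zeros per row.
X = np.zeros((3, 1000))
X[0, [5, 40]] = 1.0
X[1, [7]] = 2.0
X[2, [5, 7, 999]] = 1.0
Z = lri_project(X, n_dims=100)
```

Because every feature touches only two dimensions, a document with `m` non-zero features has at most `2 * m` non-zero entries after projection, so a sparse input stays sparse.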

- Reuters Corpora (RCV1/RCV2) (comparable corpus) [Training doc IDs] [Test doc IDs] [Categories IDs].
- JRC-Acquis (parallel corpus) [Training doc IDs] [Test doc IDs] [Categories IDs].

- Esuli, A., Moreo, A., Sebastiani, F.: Random Indexing for Polylingual Text Classification.
**Submitted to JAIR**.

For any question, contact: A. Moreo, alejandro.moreo@isti.cnr.it