The project gathers a large dataset of Finnish and Swedish paraphrases. The paraphrases are selected and classified manually to minimize lexical overlap, so that the examples differ as much as possible in both structure and vocabulary. The primary application of the dataset is the development and evaluation of deep language models and, more generally, representation learning.