Document Embedding Results

The table below summarizes the recall (sensitivity) achieved for each industry by every technique we explored to create document embeddings.

Industry (Recall/Sensitivity)	TF-IDF	N-grams - Cosine Similarity	POS Tagging - Cosine Similarity	Word2Vec	Doc2Vec	TwoTowers	Universal Sentence Encoder
Prepackaged Software	0.89	0.85	0.86	0.76	0.88	0.67	0.83
Crude Petroleum and Natural Gas	0.94	0.93	0.96	0.95	0.97	0.56	0.96
Pharmaceutical Preparations	0.75	0.76	0.91	0.46	0.86	0.60	0.90
Real Estate Investment Trusts	0.94	0.95	0.97	0.85	0.91	0.68	0.96
State Commercial Banks	0.95	0.94	0.97	0.82	0.94	0.65	0.96
Weighted Average	0.91	0.91	0.95	0.82	0.92	0.63	0.94
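
For readers who want to see how per-industry recall and a weighted average can be computed, here is a minimal sketch. It assumes the weighted average is weighted by the number of true documents in each industry (the support), which the table does not state explicitly. The function names and the toy labels are hypothetical and are not taken from our experiments.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: recall}, where recall = TP / (TP + FN) for that label."""
    support = Counter(y_true)  # number of true examples per label
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: true_pos[label] / n for label, n in support.items()}


def weighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, weighted by each class's support (assumed weighting)."""
    support = Counter(y_true)
    recalls = per_class_recall(y_true, y_pred)
    total = sum(support.values())
    return sum(recalls[label] * support[label] for label in support) / total


# Toy labels, made up for illustration only (not data from the study).
y_true = ["software", "software", "software", "software", "banks", "banks"]
y_pred = ["software", "software", "software", "banks", "banks", "banks"]

recalls = per_class_recall(y_true, y_pred)
weighted = weighted_average_recall(y_true, y_pred)
print(recalls)
print(round(weighted, 2))
```

Under this weighting, the weighted average recall equals the overall fraction of correctly classified documents, so industries with more filings influence the bottom row of the table more than smaller ones.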

Analysis of Textual Data from 10-K Financial Reports