Document Embedding

The goal of this section is to find representations (embeddings) of company descriptions in an n-dimensional space in which similar companies cluster together and dissimilar companies are separated.

We will implement the following techniques:

  1. TF-IDF

  2. Cosine Similarity

  3. Part-Of-Speech (POS) Tagging

  4. Word2Vec

  5. Doc2Vec

  6. Two Towers

  7. Universal Sentence Encoder

A summary of the results for each technique appears at the end of this section.
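As a rough illustration of the goal stated above, the sketch below combines the first two techniques in the list, TF-IDF and cosine similarity, using only the Python standard library. The three company descriptions, the whitespace tokenizer, the smoothed IDF formula, and the helper names (`tokenize`, `tfidf_vectors`, `cosine_similarity`) are assumptions made for this example; they are not the data or the implementation used later in this section.

```python
import math
from collections import Counter

# Hypothetical toy descriptions, invented for illustration only.
descriptions = {
    "bank_a": "retail bank offering loans and savings accounts",
    "bank_b": "commercial bank offering loans and credit cards",
    "miner": "mining company extracting copper and gold ore",
}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return {name: {term: weight}} with weight = tf * smoothed idf."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many descriptions each term occurs.
    doc_freq = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        vectors[name] = {
            # Smoothed idf, assumed here: ln((1 + N) / (1 + df)) + 1.
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = tfidf_vectors(descriptions)
sim_banks = cosine_similarity(vectors["bank_a"], vectors["bank_b"])
sim_bank_miner = cosine_similarity(vectors["bank_a"], vectors["miner"])

print(f"bank_a vs bank_b: {sim_banks:.3f}")
print(f"bank_a vs miner:  {sim_bank_miner:.3f}")
```

In this toy setting the two bank descriptions share several terms, so their similarity is higher than that of the bank and the mining company, which share only the word "and". A useful embedding should show the same pattern on real company descriptions.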