Visualizing our work
Most of the analysis in this report applies unsupervised techniques to extract information from our data. As a result, many of our results take the form of document vector matrices, which give either the location of each company filing in some n-dimensional space or its membership probability in each of m topics/industries. To visualize these, we've written a simple function that uses principal component analysis (PCA) to reduce the dimensionality, condensing the information into 2 or 3 dimensions for easy display. The initial reduction to 10 dimensions (or fewer, depending on the application) is an arbitrary choice. We plot the top 2-3 principal components and, where available, color-code each filing by its industry name.
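The original function is not reproduced in this section, so what follows is only a minimal sketch of the same idea. It assumes a dense NumPy document matrix (call `.toarray()` on a SciPy sparse tf-idf matrix first) and an optional list of industry labels. The names `pca_reduce` and `plot_reduced` are hypothetical, and PCA is computed directly with NumPy's SVD rather than with whatever library the report's code uses.

```python
import numpy as np


def pca_reduce(X, n_components=10):
    """Project the rows of X (documents) onto their top principal components.

    X: (n_documents, n_features) dense array, e.g. a tf-idf or topic matrix.
    Returns an (n_documents, k) array, where k is n_components capped by
    the number of available components.
    """
    X = np.asarray(X, dtype=float)
    # PCA is defined on mean-centered data, so center each feature (column).
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    # Scores: the centered data projected onto the first k directions.
    return Xc @ Vt[:k].T


def plot_reduced(coords, labels=None, dims=2):
    """Scatter the first 2 or 3 PCA coordinates, one color per label if given."""
    # Imported here so pca_reduce can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    fig = plt.figure()
    ax = fig.add_subplot(projection="3d" if dims == 3 else None)
    groups = np.unique(labels) if labels is not None else [None]
    for g in groups:
        # Select the filings belonging to this industry (or all of them).
        if g is None:
            mask = np.ones(len(coords), dtype=bool)
        else:
            mask = np.asarray(labels) == g
        pts = coords[mask, :dims]
        # Unpacks to (x, y) in 2D or (x, y, z) in 3D.
        ax.scatter(*pts.T, label=g, s=10)
    if labels is not None:
        ax.legend()
    return fig
```

A typical call would be `coords = pca_reduce(doc_matrix, n_components=10)` followed by `plot_reduced(coords, labels=industries, dims=3)`, where `doc_matrix` and `industries` stand in for the report's own data. Because the components come out ordered by explained variance, slicing the first 2 or 3 columns is enough to get the top dimensions.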
The code can be found here.
An example of both 2D and 3D representations of a tf-idf matrix (the very first topic), which you will see again later on: