Estimates from Cosine Similarity

We want to evaluate whether the word embedding results can be used to construct optimized portfolios. Our first text-based estimate builds optimal portfolios from cosine similarities. We treat the cosine similarity between two companies' business descriptions as their return correlation. We then combine it with the sample standard deviations of returns to obtain a covariance estimate. We compare the results at the end of this section to assess feasibility.
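Let $\rho_{ij}$ be the cosine similarity between the business descriptions of companies $i$ and $j$. Let $\sigma_i$ be the sample standard deviation of company $i$'s returns. The covariance estimate is then:

$$
\hat{\Sigma}_{ij} = \rho_{ij}\,\sigma_i\,\sigma_j
\quad\Longleftrightarrow\quad
\hat{\Sigma} = D\,C\,D, \qquad D = \operatorname{diag}(\sigma_1, \dots, \sigma_n), \quad C = [\rho_{ij}]
$$

For any company with a non-zero count vector, $\rho_{ii} = 1$. The diagonal of $\hat{\Sigma}$ therefore reproduces the sample variances.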

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
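r_selected = pd.read_csv("data/filtered_r.csv")
# one row per company ("name"), one column per return observation
r_selected.set_index("name", inplace = True)
# sample mean return of each company
mu = r_selected.mean(axis = 1)
# sample covariance matrix between companies (transpose so that companies become columns)
cov = r_selected.T.cov()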
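The sketch below uses made-up returns to show why the transpose is needed. It assumes the same layout as `data/filtered_r.csv`, with companies as rows and return observations as columns. `DataFrame.cov()` computes covariances between columns, so the frame is transposed first. The names `A`, `B`, `C` and the numbers are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy returns in the assumed layout: one row per company, one column per period
r = pd.DataFrame(
    {"p1": [0.01, 0.02, -0.01], "p2": [0.03, -0.02, 0.00], "p3": [-0.01, 0.03, 0.02]},
    index=pd.Index(["A", "B", "C"], name="name"),
)

mu = r.mean(axis=1)   # average return of each company across periods
cov = r.T.cov()       # DataFrame.cov() is column-wise, so transpose to make companies columns

print(mu.round(4))
print(cov.shape)      # one row and one column per company
```

The result matches `np.cov` applied to the untransposed array, because NumPy treats rows as variables by default.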

Cosine Similarity

We compute cosine similarities from 2- to 4-gram counts of each company's business description in the top 5 SIC industries. First, we build the word-count matrix. Then we compute the pairwise cosine similarities, which serve as the correlations between companies when we construct the covariance estimate in the next step.
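The cosine similarity of two count vectors $a$ and $b$ is $a \cdot b / (\lVert a \rVert \, \lVert b \rVert)$. The sketch below computes it by hand for three hypothetical count vectors and checks the result against scikit-learn's `cosine_similarity`. Counts are non-negative, so every similarity lies between 0 and 1. Two companies with no terms in common get exactly 0.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical n-gram counts: 3 companies (rows) over 4 terms (columns)
counts = np.array([
    [2, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
], dtype=float)

# Normalize each row to unit length, then take all pairwise dot products
unit = counts / np.linalg.norm(counts, axis=1, keepdims=True)
manual = unit @ unit.T

assert np.allclose(manual, cosine_similarity(counts))
print(manual.round(3))
```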

df = pd.read_csv('../data/preprocessed.csv',
                 usecols = ['reportingDate', 'name', 'CIK',
                           'coDescription_stopwords', 'SIC', 'SIC_desc'])
df = df.set_index(df.name)

Word Counts

For this analysis, each sequence of 2 to 4 consecutive words (an n-gram) is treated as one term, and only the 600 most frequent terms are kept.

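from sklearn.feature_extraction.text import CountVectorizer

# every sequence of 2, 3, or 4 words is one term; keep the 600 most frequent terms
Vectorizer = CountVectorizer(ngram_range = (2,4), 
                             max_features = 600)

count_data = Vectorizer.fit_transform(df['coDescription_stopwords'])
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
wordsCount = pd.DataFrame(count_data.toarray(), columns=Vectorizer.get_feature_names_out())
wordsCount = wordsCount.set_index(df['name'])
wordsCount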
ability make accounting standard acquire property act act act amended additional information adequately capitalized adverse effect adverse effect business adverse event ... wa million weighted average well capitalized wholly owned wholly owned subsidiary wide range within day working interest year ended year ended december
name
MONGODB, INC. 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 3 0 0 5 0
SALESFORCE COM INC 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
SPLUNK INC 0 0 0 0 1 2 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
OKTA, INC. 0 0 0 0 0 1 0 0 0 0 ... 0 0 0 0 0 1 0 0 1 0
VEEVA SYSTEMS INC 0 12 0 1 4 1 0 7 4 0 ... 18 4 0 0 0 0 1 0 102 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
AMERICAN REALTY CAPITAL NEW YORK CITY REIT, INC. 0 0 1 0 0 1 0 1 0 0 ... 0 0 0 0 0 0 0 0 2 2
CYCLACEL PHARMACEUTICALS, INC. 0 0 0 0 0 1 0 1 0 1 ... 0 0 0 0 0 0 1 0 0 0
ZOETIS INC. 0 17 0 0 0 12 0 3 0 0 ... 20 5 0 1 1 0 2 0 84 83
STAG INDUSTRIAL, INC. 0 0 1 0 1 0 0 1 1 0 ... 0 0 0 0 0 0 0 0 2 2
EQUINIX INC 0 0 0 0 2 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 2 2

675 rows × 600 columns

Cosine Similarity Computation

# Compute pairwise cosine similarity between the companies' n-gram count vectors
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = pd.DataFrame(cosine_similarity(wordsCount),
                          index=wordsCount.index, columns=wordsCount.index)
cosine_sim
name MONGODB, INC. SALESFORCE COM INC SPLUNK INC OKTA, INC. VEEVA SYSTEMS INC AUTODESK INC INTERNATIONAL WESTERN PETROLEUM, INC. DAYBREAK OIL & GAS, INC. ETERNAL SPEECH, INC. ETERNAL SPEECH, INC. ... OMEGA HEALTHCARE INVESTORS INC TABLEAU SOFTWARE INC HORIZON PHARMA PLC MERRIMACK PHARMACEUTICALS INC REVEN HOUSING REIT, INC. AMERICAN REALTY CAPITAL NEW YORK CITY REIT, INC. CYCLACEL PHARMACEUTICALS, INC. ZOETIS INC. STAG INDUSTRIAL, INC. EQUINIX INC
name
MONGODB, INC. 1.000000 0.445455 0.610272 0.620961 0.500762 0.338268 0.065380 0.052345 0.000000 0.000000 ... 0.050935 0.630465 0.436327 0.143385 0.066598 0.135839 0.144678 0.189609 0.178397 0.102958
SALESFORCE COM INC 0.445455 1.000000 0.635969 0.455189 0.196053 0.418546 0.043515 0.064999 0.000000 0.000000 ... 0.029326 0.492079 0.300027 0.133831 0.201221 0.201230 0.145089 0.075038 0.277952 0.354856
SPLUNK INC 0.610272 0.635969 1.000000 0.665648 0.274023 0.373142 0.019112 0.073553 0.000000 0.000000 ... 0.018032 0.569939 0.330028 0.116923 0.109538 0.142041 0.128467 0.136418 0.194072 0.273502
OKTA, INC. 0.620961 0.455189 0.665648 1.000000 0.195672 0.399874 0.013240 0.093942 0.000000 0.000000 ... 0.013905 0.579884 0.541775 0.163709 0.109948 0.144051 0.170361 0.111937 0.163588 0.074624
VEEVA SYSTEMS INC 0.500762 0.196053 0.274023 0.195672 1.000000 0.079927 0.074096 0.030179 0.075713 0.075713 ... 0.424046 0.280852 0.153335 0.083683 0.128762 0.211695 0.060273 0.501041 0.332207 0.064207
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
AMERICAN REALTY CAPITAL NEW YORK CITY REIT, INC. 0.135839 0.201230 0.142041 0.144051 0.211695 0.106627 0.027594 0.048087 0.000000 0.000000 ... 0.284525 0.114080 0.075274 0.048741 0.578793 1.000000 0.039971 0.136184 0.471651 0.042298
CYCLACEL PHARMACEUTICALS, INC. 0.144678 0.145089 0.128467 0.170361 0.060273 0.094262 0.010770 0.025407 0.000000 0.000000 ... 0.015318 0.193458 0.462759 0.683597 0.047288 0.039971 1.000000 0.035694 0.080139 0.013121
ZOETIS INC. 0.189609 0.075038 0.136418 0.111937 0.501041 0.069267 0.039015 0.022235 0.065917 0.065917 ... 0.159082 0.327556 0.148224 0.051060 0.163391 0.136184 0.035694 1.000000 0.207232 0.031911
STAG INDUSTRIAL, INC. 0.178397 0.277952 0.194072 0.163588 0.332207 0.169739 0.044467 0.057905 0.000000 0.000000 ... 0.424106 0.242169 0.179394 0.068313 0.407758 0.471651 0.080139 0.207232 1.000000 0.038365
EQUINIX INC 0.102958 0.354856 0.273502 0.074624 0.064207 0.060531 0.002205 0.013749 0.000000 0.000000 ... 0.018944 0.068787 0.035503 0.011838 0.043938 0.042298 0.013121 0.031911 0.038365 1.000000

675 rows × 675 columns

Perform Mean-Variance Analysis

We illustrate the mean-variance analysis using only the Pharmaceutical Preparations industry (SIC 2834). The covariance estimate combines the cosine similarities, used as correlations, with the sample standard deviations of returns. Together with the sample mean returns, this covariance estimate is used to build the efficient frontier.
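The notebook uses PyPortfolioOpt's `min_volatility()` to find the minimum-volatility portfolio under long-only weights capped at 20%. For intuition, the unconstrained version of that problem has a closed form: $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^\top \Sigma^{-1} \mathbf{1})$. This version imposes only that the weights sum to 1 and allows negative or large positions. The sketch below is that simplified, swapped-in technique, not what the notebook computes. The helper `global_min_variance` and the three-asset inputs are hypothetical.

```python
import numpy as np

def global_min_variance(cov):
    """Hypothetical helper: unconstrained minimum-variance weights, w = inv(S) 1 / (1' inv(S) 1).

    Only sum(w) = 1 is imposed. Unlike EfficientFrontier(..., weight_bounds=(0, 0.2)),
    the weights may be negative or exceed 20%.
    """
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)  # inv(S) @ 1 without forming the inverse explicitly
    return x / x.sum()

# Hypothetical 3-asset covariance built as D C D (illustrative numbers only)
sd = np.array([0.15, 0.25, 0.20])
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.5],
                 [0.1, 0.5, 1.0]])
cov = corr * np.outer(sd, sd)

w = global_min_variance(cov)
print(w.round(4), "volatility:", np.sqrt(w @ cov @ w).round(4))
```

At the optimum, every asset has the same marginal contribution to variance: all entries of `cov @ w` are equal. The bounded problem in the notebook has no such closed form, so it needs a numerical solver.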

from pypfopt import EfficientFrontier
from pypfopt import risk_models
from pypfopt import expected_returns
from pypfopt import objective_functions
from pypfopt import plotting
# get the names of the companies in the pharmaceutical preparations industry (SIC 2834)
Pharm = df[df.SIC == 2834]
Pharm_list = Pharm.index
# keep the companies that appear in both the return data and the business description data
SET = set(Pharm_list) & set(r_selected.index)
# set order is arbitrary, so the order of LIST (and of the tables below) may differ between runs
LIST = list(SET)

Sample Mean for the Pharmaceutical Preparations Industry

mu_Pharm = mu[LIST]
mu_Pharm
name
THERAPEUTICSMD, INC.             -0.016246
PTC THERAPEUTICS, INC.            0.081859
ZYNERBA PHARMACEUTICALS, INC.    -0.003030
ACTINIUM PHARMACEUTICALS, INC.   -0.028223
ORAMED PHARMACEUTICALS INC.      -0.027747
                                    ...   
CYCLACEL PHARMACEUTICALS, INC.   -0.039404
PAIN THERAPEUTICS INC            -0.028535
CORVUS PHARMACEUTICALS, INC.     -0.017058
FIVE PRIME THERAPEUTICS INC      -0.038194
PARATEK PHARMACEUTICALS, INC.    -0.024066
Length: 124, dtype: float64

Sample Covariance for the Pharmaceutical Preparations Industry

# restrict the covariance matrix to the industry's companies (rows and columns)
cov_Pharm = cov.loc[LIST, LIST]
cov_Pharm
name THERAPEUTICSMD, INC. PTC THERAPEUTICS, INC. ZYNERBA PHARMACEUTICALS, INC. ACTINIUM PHARMACEUTICALS, INC. ORAMED PHARMACEUTICALS INC. CARA THERAPEUTICS, INC. PROGENICS PHARMACEUTICALS INC JOHNSON & JOHNSON CHIASMA, INC SYNDAX PHARMACEUTICALS INC ... IRONWOOD PHARMACEUTICALS INC AMICUS THERAPEUTICS INC CELGENE CORP /DE/ ARQULE INC SYNTHETIC BIOLOGICS, INC. CYCLACEL PHARMACEUTICALS, INC. PAIN THERAPEUTICS INC CORVUS PHARMACEUTICALS, INC. FIVE PRIME THERAPEUTICS INC PARATEK PHARMACEUTICALS, INC.
name
THERAPEUTICSMD, INC. 0.022313 0.005716 0.012211 0.008953 0.001271 0.010204 0.006031 0.001142 0.010645 0.006653 ... 0.008961 0.005136 0.005609 0.015417 0.002654 -0.002167 0.004205 0.014408 -0.001208 0.008331
PTC THERAPEUTICS, INC. 0.005716 0.077202 0.015773 0.004463 0.001953 0.027776 0.026502 -0.002392 0.006663 0.011031 ... 0.014100 0.008482 0.007595 0.017331 0.005679 0.009276 0.006851 0.011528 0.013654 0.012638
ZYNERBA PHARMACEUTICALS, INC. 0.012211 0.015773 0.060301 0.013983 0.008395 0.025120 0.011908 0.001315 -0.002097 0.010391 ... 0.017643 0.006013 0.003551 0.016136 0.003650 0.012120 -0.002097 -0.001143 0.007206 -0.002905
ACTINIUM PHARMACEUTICALS, INC. 0.008953 0.004463 0.013983 0.045604 0.002781 0.018911 0.009388 0.003213 0.002852 0.008557 ... 0.008579 0.000157 0.003037 0.008916 0.006585 0.010661 0.000585 -0.002929 -0.001142 0.007948
ORAMED PHARMACEUTICALS INC. 0.001271 0.001953 0.008395 0.002781 0.012120 0.006528 -0.000530 -0.000037 0.005420 -0.002721 ... 0.002220 0.005889 -0.000180 0.007738 0.011952 0.006241 0.002669 0.003121 0.005524 0.005203
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
CYCLACEL PHARMACEUTICALS, INC. -0.002167 0.009276 0.012120 0.010661 0.006241 0.003389 0.001496 0.000655 0.004923 -0.013052 ... 0.004142 -0.006834 0.000965 0.004197 0.002821 0.040065 0.011159 -0.002042 0.001104 0.012054
PAIN THERAPEUTICS INC 0.004205 0.006851 -0.002097 0.000585 0.002669 -0.001055 0.011351 0.000104 -0.009049 0.012620 ... 0.002670 0.011887 0.003596 -0.002045 0.002679 0.011159 0.096589 0.012372 0.000731 0.006248
CORVUS PHARMACEUTICALS, INC. 0.014408 0.011528 -0.001143 -0.002929 0.003121 0.014182 0.016444 0.000146 0.014813 0.021624 ... 0.008353 0.010539 0.004541 0.029058 0.011207 -0.002042 0.012372 0.050265 0.006850 0.018659
FIVE PRIME THERAPEUTICS INC -0.001208 0.013654 0.007206 -0.001142 0.005524 0.009971 0.016159 0.000675 0.009991 0.001711 ... 0.004857 0.004692 0.003060 -0.005347 0.011714 0.001104 0.000731 0.006850 0.022705 0.007209
PARATEK PHARMACEUTICALS, INC. 0.008331 0.012638 -0.002905 0.007948 0.005203 0.009235 0.013552 0.001234 0.014608 0.007191 ... 0.005395 0.003788 0.007372 0.002726 0.013074 0.012054 0.006248 0.018659 0.007209 0.026534

124 rows × 124 columns

Cosine Similarity Matrix for the Pharmaceutical Preparations Industry

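# cosine_sim can contain repeated company names (e.g., ETERNAL SPEECH, INC. appears twice),
# so selecting by name can return duplicated rows/columns; drop rows that are exact copies.
# The result keeps the order of LIST, matching cov_Pharm.
tmp = cosine_sim[LIST].drop_duplicates().T
Pharm_cos_sim = tmp[LIST].drop_duplicates()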
Pharm_cos_sim
name THERAPEUTICSMD, INC. PTC THERAPEUTICS, INC. ZYNERBA PHARMACEUTICALS, INC. ACTINIUM PHARMACEUTICALS, INC. ORAMED PHARMACEUTICALS INC. CARA THERAPEUTICS, INC. PROGENICS PHARMACEUTICALS INC JOHNSON & JOHNSON CHIASMA, INC SYNDAX PHARMACEUTICALS INC ... IRONWOOD PHARMACEUTICALS INC AMICUS THERAPEUTICS INC CELGENE CORP /DE/ ARQULE INC SYNTHETIC BIOLOGICS, INC. CYCLACEL PHARMACEUTICALS, INC. PAIN THERAPEUTICS INC CORVUS PHARMACEUTICALS, INC. FIVE PRIME THERAPEUTICS INC PARATEK PHARMACEUTICALS, INC.
name
THERAPEUTICSMD, INC. 1.000000 0.311241 0.456582 0.534662 0.543275 0.363769 0.185475 0.206379 0.488301 0.675508 ... 0.583528 0.429780 0.584773 0.016313 0.382713 0.652642 0.229070 0.352277 0.011227 0.257763
PTC THERAPEUTICS, INC. 0.311241 1.000000 0.327964 0.427764 0.374165 0.243474 0.057881 0.062339 0.375322 0.395880 ... 0.328586 0.493302 0.407618 0.204760 0.236988 0.320797 0.236762 0.357569 0.000000 0.197942
ZYNERBA PHARMACEUTICALS, INC. 0.456582 0.327964 1.000000 0.617744 0.598029 0.195842 0.110879 0.025581 0.820874 0.708408 ... 0.324872 0.281916 0.431761 0.000000 0.865089 0.645396 0.160295 0.625223 0.000000 0.418709
ACTINIUM PHARMACEUTICALS, INC. 0.534662 0.427764 0.617744 1.000000 0.691870 0.415886 0.115986 0.100493 0.739771 0.776724 ... 0.457132 0.347511 0.673470 0.343720 0.552866 0.704761 0.379062 0.725817 0.000000 0.319632
ORAMED PHARMACEUTICALS INC. 0.543275 0.374165 0.598029 0.691870 1.000000 0.285224 0.156478 0.091661 0.651584 0.726614 ... 0.417753 0.321514 0.555110 0.000000 0.444021 0.636782 0.137641 0.479985 0.000000 0.293768
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
CYCLACEL PHARMACEUTICALS, INC. 0.652642 0.320797 0.645396 0.704761 0.636782 0.249042 0.121401 0.060361 0.665669 0.727805 ... 0.470073 0.416150 0.556812 0.012977 0.604900 1.000000 0.305312 0.524410 0.002977 0.307589
PAIN THERAPEUTICS INC 0.229070 0.236762 0.160295 0.379062 0.137641 0.291646 0.063775 0.036477 0.174156 0.288048 ... 0.291563 0.231796 0.343935 0.661693 0.126259 0.305312 1.000000 0.608993 0.000000 0.179012
CORVUS PHARMACEUTICALS, INC. 0.352277 0.357569 0.625223 0.725817 0.479985 0.372136 0.088433 0.015900 0.673502 0.700044 ... 0.434292 0.285891 0.503294 0.670545 0.645031 0.524410 0.608993 1.000000 0.000000 0.244997
FIVE PRIME THERAPEUTICS INC 0.011227 0.000000 0.000000 0.000000 0.000000 0.000000 0.079602 0.113822 0.003235 0.005774 ... 0.022190 0.000000 0.103859 0.000000 0.002962 0.002977 0.000000 0.000000 1.000000 0.000000
PARATEK PHARMACEUTICALS, INC. 0.257763 0.197942 0.418709 0.319632 0.293768 0.181131 0.056366 0.036980 0.455409 0.285976 ... 0.130570 0.215822 0.267446 0.000000 0.460032 0.307589 0.179012 0.244997 0.000000 1.000000

124 rows × 124 columns

Covariance Estimate from Cosine Similarity

# sample standard deviations from the diagonal of the sample covariance matrix
sd = np.sqrt(np.diag(cov_Pharm.values))
# Sigma_ij = cos_sim_ij * sd_i * sd_j, i.e. D C D with D = diag(sd);
# Pharm_cos_sim is in the same company order as cov_Pharm
cos_sim_cov = pd.DataFrame(Pharm_cos_sim.values * np.outer(sd, sd),
                           index=cov_Pharm.index, columns=cov_Pharm.index)
cos_sim_cov
name THERAPEUTICSMD, INC. PTC THERAPEUTICS, INC. ZYNERBA PHARMACEUTICALS, INC. ACTINIUM PHARMACEUTICALS, INC. ORAMED PHARMACEUTICALS INC. CARA THERAPEUTICS, INC. PROGENICS PHARMACEUTICALS INC JOHNSON & JOHNSON CHIASMA, INC SYNDAX PHARMACEUTICALS INC ... IRONWOOD PHARMACEUTICALS INC AMICUS THERAPEUTICS INC CELGENE CORP /DE/ ARQULE INC SYNTHETIC BIOLOGICS, INC. CYCLACEL PHARMACEUTICALS, INC. PAIN THERAPEUTICS INC CORVUS PHARMACEUTICALS, INC. FIVE PRIME THERAPEUTICS INC PARATEK PHARMACEUTICALS, INC.
name
THERAPEUTICSMD, INC. 0.022313 0.012918 0.016748 0.017055 0.008934 0.011408 0.005667 0.001378 0.016164 0.023595 ... 0.010360 0.008472 0.008508 0.000654 0.012480 0.019514 0.010634 0.011798 0.000253 0.006272
PTC THERAPEUTICS, INC. 0.012918 0.077202 0.022377 0.025382 0.011445 0.014203 0.003290 0.000775 0.023109 0.025721 ... 0.010851 0.018087 0.011032 0.015277 0.014375 0.017841 0.020445 0.022274 0.000000 0.008959
ZYNERBA PHARMACEUTICALS, INC. 0.016748 0.022377 0.060301 0.032395 0.016167 0.010097 0.005569 0.000281 0.044669 0.040677 ... 0.009482 0.009135 0.010327 0.000000 0.046377 0.031723 0.012233 0.034422 0.000000 0.016749
ACTINIUM PHARMACEUTICALS, INC. 0.017055 0.025382 0.032395 0.045604 0.016266 0.018646 0.005066 0.000960 0.035008 0.038786 ... 0.011603 0.009793 0.014008 0.019710 0.025775 0.030125 0.025158 0.034750 0.000000 0.011119
ORAMED PHARMACEUTICALS INC. 0.008934 0.011445 0.016167 0.016266 0.012120 0.006592 0.003524 0.000451 0.015896 0.018705 ... 0.005466 0.004671 0.005952 0.000000 0.010671 0.014032 0.004709 0.011847 0.000000 0.005268
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
CYCLACEL PHARMACEUTICALS, INC. 0.019514 0.017841 0.031723 0.030125 0.014032 0.010465 0.004970 0.000540 0.029526 0.034065 ... 0.011183 0.010992 0.010856 0.000698 0.026433 0.040065 0.018993 0.023533 0.000090 0.010029
PAIN THERAPEUTICS INC 0.010634 0.020445 0.012233 0.025158 0.004709 0.019029 0.004054 0.000507 0.011994 0.020933 ... 0.010770 0.009506 0.010411 0.055221 0.008566 0.018993 0.096589 0.042433 0.000000 0.009063
CORVUS PHARMACEUTICALS, INC. 0.011798 0.022274 0.034422 0.034750 0.011847 0.017516 0.004055 0.000159 0.033461 0.036700 ... 0.011573 0.008458 0.010991 0.040369 0.031571 0.023533 0.042433 0.050265 0.000000 0.008947
FIVE PRIME THERAPEUTICS INC 0.000253 0.000000 0.000000 0.000000 0.000000 0.000000 0.002453 0.000767 0.000108 0.000203 ... 0.000397 0.000000 0.001524 0.000000 0.000097 0.000090 0.000000 0.000000 0.022705 0.000000
PARATEK PHARMACEUTICALS, INC. 0.006272 0.008959 0.016749 0.011119 0.005268 0.006194 0.001878 0.000269 0.016439 0.010893 ... 0.002528 0.004639 0.004243 0.000000 0.016359 0.010029 0.009063 0.008947 0.000000 0.026534

124 rows × 124 columns
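The sketch below checks two properties of this construction on hypothetical counts and standard deviations. First, the matrix product $D C D$ equals scaling each entry of $C$ by $\sigma_i \sigma_j$. Second, the estimate is positive semidefinite, which a mean-variance optimizer relies on. Positive semidefiniteness holds because the cosine similarity matrix is a Gram matrix of unit-normalized count vectors. Because counts are non-negative, this construction also implies that every estimated correlation is non-negative.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical inputs: n-gram counts for 4 companies and their return standard deviations
counts = np.array([[3, 0, 1, 2],
                   [1, 1, 0, 2],
                   [0, 4, 1, 0],
                   [2, 0, 0, 1]], dtype=float)
sd = np.array([0.15, 0.28, 0.22, 0.16])

C = cosine_similarity(counts)           # text-based "correlation", entries in [0, 1]
D = np.diag(sd)

cov_matrix_form = D @ C @ D             # matrix-product form
cov_elementwise = C * np.outer(sd, sd)  # element-wise form: C_ij * sd_i * sd_j

assert np.allclose(cov_matrix_form, cov_elementwise)
# C_ii = 1 for non-zero count rows, so the diagonal equals the variances
assert np.allclose(np.diag(cov_elementwise), sd ** 2)
print("smallest eigenvalue:", np.linalg.eigvalsh(cov_elementwise).min())
```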

Efficient Frontier - Pharmaceutical Preparations

# long-only weights, capped at 20% per company
ef1 = EfficientFrontier(mu_Pharm, cos_sim_cov, weight_bounds=(0, 0.2))

fig, ax = plt.subplots()
plotting.plot_efficient_frontier(ef1, ax=ax, show_assets=True)

# Find and plot the minimum-volatility portfolio (a fresh optimizer with the same bounds)
ef2 = EfficientFrontier(mu_Pharm, cos_sim_cov, weight_bounds=(0, 0.2))
ef2.min_volatility()
ret_min_vol, std_min_vol, _ = ef2.portfolio_performance()
ax.scatter(std_min_vol, ret_min_vol, marker="*", s=100, c="r", label="Min Volatility")

# Format
ax.set_title("Efficient Frontier - Pharmaceutical Preparations \n Cosine Similarity Estimates")
ax.legend()
plt.tight_layout()
plt.savefig('images/Efficient_Frontier_Cos_Sim_Pharmaceutical_Preparations.png', dpi=200, bbox_inches='tight')
plt.show()

Efficient_Frontier_Cos_Sim_Pharmaceutical_Preparations.png

Min Volatility Portfolio

Performance
ef2.portfolio_performance(verbose=True);
Expected annual return: 1.2%
Annual volatility: 2.6%
Sharpe Ratio: -0.32
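`portfolio_performance` reports three figures:

- the return $w^\top \mu$
- the volatility $\sqrt{w^\top \Sigma w}$
- the Sharpe ratio $(w^\top \mu - r_f)/\sqrt{w^\top \Sigma w}$, where $r_f$ is its `risk_free_rate` argument (default 0.02 in the PyPortfolioOpt versions we are aware of)

It labels the figures "annual" but does not rescale its inputs. Here `mu_Pharm` and `cos_sim_cov` are raw sample estimates, so the figures are on the same frequency as the return data. The reported Sharpe ratio follows directly from the rounded figures: $(0.012 - 0.02)/0.026 \approx -0.31$. The sketch below reproduces this arithmetic. The helper `performance` and the two-asset inputs are hypothetical.

```python
import numpy as np

def performance(weights, mu, cov, risk_free_rate=0.02):
    """Hypothetical helper: return, volatility and Sharpe ratio, with no annualization."""
    ret = weights @ mu
    vol = np.sqrt(weights @ cov @ weights)
    sharpe = (ret - risk_free_rate) / vol
    return ret, vol, sharpe

# Hypothetical 2-asset portfolio with per-period (not annualized) inputs
w = np.array([0.6, 0.4])
mu = np.array([0.010, 0.015])
cov = np.array([[0.0009, 0.0002],
                [0.0002, 0.0016]])

ret, vol, sharpe = performance(w, mu, cov)
print(f"return {ret:.4f}, volatility {vol:.4f}, Sharpe {sharpe:.2f}")
```

Whenever the portfolio's return is below the 2% risk-free rate, the Sharpe ratio is negative, whatever the volatility.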
Weights
# keep only the companies with a non-zero weight
nonzero = {company: weight for company, weight in ef2.clean_weights().items() if weight != 0}
min_vol = pd.DataFrame({'Company_Name': list(nonzero.keys()), 'Weight': list(nonzero.values())})
min_vol.to_csv("data/min_vol_cos_sim_Pharmaceutical_Preparations.csv", index = False)
min_vol
Company_Name Weight
0 JOHNSON & JOHNSON 0.18756
1 BIOSPECIFICS TECHNOLOGIES CORP 0.07394
2 BIOMARIN PHARMACEUTICAL INC 0.04572
3 MERCK & CO., INC. 0.13753
4 BRISTOL MYERS SQUIBB CO 0.03719
5 ZOETIS INC. 0.20000
6 HERON THERAPEUTICS, INC. /DE/ 0.00906
7 PERRIGO CO PLC 0.01497
8 XENCOR INC 0.02108
9 PACIRA PHARMACEUTICALS, INC. 0.01883
10 LILLY ELI & CO 0.03562
11 PFIZER INC 0.20000
12 ARQULE INC 0.00068
13 FIVE PRIME THERAPEUTICS INC 0.01782

Results for the Other 4 Industries

Prepackaged Software (mass reproduction of software)

Efficient_Frontier_Cosine_Similarity_Estimates_Prepackaged_Software.png

Min Volatility Portfolio

Performance
Expected annual return: 1.1%
Annual volatility: 2.9%
Sharpe Ratio: -0.30
Weights
Company_Name Weight
0 ALARM.COM HOLDINGS, INC. 0.01069
1 Q2 HOLDINGS, INC. 0.06190
2 ORACLE CORP 0.16315
3 INTELLICHECK, INC. 0.00938
4 ZEDGE, INC. 0.00391
5 NUANCE COMMUNICATIONS, INC. 0.05947
6 AWARE INC /MA/ 0.02316
7 NATIONAL INSTRUMENTS CORP 0.09372
8 GSE SYSTEMS INC 0.04031
9 ULTIMATE SOFTWARE GROUP INC 0.10350
10 ACI WORLDWIDE, INC. 0.04754
11 BLACK KNIGHT, INC. 0.20000
12 ANSYS INC 0.15390
13 REALPAGE INC 0.02937

Crude Petroleum and Natural Gas

Repeating the same analysis for this industry produces no non-zero weights, so no efficient frontier could be found.

Real Estate Investment Trusts

Efficient_Frontier_Cosine_Similarity_Estimates_Real_Estate_Investment_Trusts.png

Min Volatility Portfolio

Performance
Expected annual return: 0.6%
Annual volatility: 1.7%
Sharpe Ratio: -0.81
Weights
Company_Name Weight
0 GREAT AJAX CORP. 0.13806
1 EQUITY COMMONWEALTH 0.16327
2 RAYONIER INC 0.00876
3 EQUINIX INC 0.07068
4 HIGHWOODS PROPERTIES INC 0.05347
5 HEALTHCARE TRUST OF AMERICA, INC. 0.02052
6 STARWOOD PROPERTY TRUST, INC. 0.01851
7 MFA FINANCIAL, INC. 0.05101
8 EASTGROUP PROPERTIES INC 0.01785
9 LTC PROPERTIES INC 0.00036
10 ANNALY CAPITAL MANAGEMENT INC 0.05094
11 SUN COMMUNITIES INC 0.14907
12 GAMING & LEISURE PROPERTIES, INC. 0.06734
13 HMG COURTLAND PROPERTIES INC 0.03187
14 DUKE REALTY CORP 0.05369
15 CROWN CASTLE INTERNATIONAL CORP 0.02634
16 PUBLIC STORAGE 0.06339
17 ALEXANDERS INC 0.01487

State Commercial Banks (commercial banking)

Efficient_Frontier_Cosine_Similarity_Estimates_State_Commercial_Banks.png

Min Volatility Portfolio

Performance
Expected annual return: 1.1%
Annual volatility: 2.2%
Sharpe Ratio: -0.38
Weights
Company_Name Weight
0 INVESTAR HOLDING CORP 0.16789
1 CITIZENS & NORTHERN CORP 0.11305
2 S&T BANCORP INC 0.05201
3 BANNER CORP 0.20000
4 BANK OF NEW YORK MELLON CORP 0.09816
5 ENTERPRISE FINANCIAL SERVICES CORP 0.07078
6 EAST WEST BANCORP INC 0.08342
7 HOWARD BANCORP INC 0.02931
8 BANK OF HAWAII CORP 0.04935
9 UNITY BANCORP INC /NJ/ 0.01179
10 CB FINANCIAL SERVICES, INC. 0.02883
11 INDEPENDENT BANK CORP /MI/ 0.09540