Estimates from Cosine Similarity
Estimates from Cosine Similarity¶
We want to evaluate the feasibility of constructing optimized portfolios from the word embedding results. Our first text-based estimate builds optimal portfolios from cosine similarity. We treat the cosine similarity between two companies' business descriptions as their correlation and combine it with the sample standard deviations of returns to obtain a covariance estimate. We compare the results at the end of this section to assess feasibility.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
r_selected = pd.read_csv("data/filtered_r.csv")
# index by company name; each row holds one company's return series
r_selected.set_index("name", inplace = True)
# sample mean return of each company
mu = r_selected.mean(axis = 1)
# sample covariance matrix between companies
cov = r_selected.T.cov()
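The orientation of the return table matters for the two calls above. The sketch below uses a small invented table (companies `A` and `B`, periods `p1` to `p3`, all hypothetical) to show that `mean(axis=1)` gives one mean per company and that transposing before `cov()` gives a company-by-company covariance matrix.

```python
import pandas as pd

# Toy data (invented): rows are companies, columns are return periods.
r = pd.DataFrame(
    {"p1": [0.01, 0.02], "p2": [0.03, -0.02], "p3": [0.02, 0.00]},
    index=pd.Index(["A", "B"], name="name"),
)

# One mean return per company (averaging across the period columns).
mu_toy = r.mean(axis=1)

# DataFrame.cov() works column-wise, so transpose to put companies in columns.
cov_toy = r.T.cov()
```

Without the transpose, `cov()` would return a period-by-period matrix instead.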
Cosine Similarity Distances¶
We run the cosine similarity analysis on 2- to 4-gram counts of each company's business description, for all companies in the top 5 SIC industries. First, we build the word count matrix and compute the pairwise cosine similarities. These similarities serve as the correlations between companies in the next step, where we generate the covariance estimate.
df = pd.read_csv('../data/preprocessed.csv',
usecols = ['reportingDate', 'name', 'CIK',
'coDescription_stopwords', 'SIC', 'SIC_desc'])
df = df.set_index(df.name)
Words Count¶
For this cosine similarity analysis, we treat each sequence of 2 to 4 consecutive words as one term and keep only the 600 most frequent terms.
from sklearn.feature_extraction.text import CountVectorizer
Vectorizer = CountVectorizer(ngram_range = (2,4),
max_features = 600)
count_data = Vectorizer.fit_transform(df['coDescription_stopwords'])
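# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() is available from 1.0
wordsCount = pd.DataFrame(count_data.toarray(),columns=Vectorizer.get_feature_names_out())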
wordsCount = wordsCount.set_index(df['name'])
wordsCount
name | ability make | accounting standard | acquire property | act act | act amended | additional information | adequately capitalized | adverse effect | adverse effect business | adverse event | ... | wa million | weighted average | well capitalized | wholly owned | wholly owned subsidiary | wide range | within day | working interest | year ended | year ended december |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MONGODB, INC. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 5 | 0 |
SALESFORCE COM INC | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SPLUNK INC | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
OKTA, INC. | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
VEEVA SYSTEMS INC | 0 | 12 | 0 | 1 | 4 | 1 | 0 | 7 | 4 | 0 | ... | 18 | 4 | 0 | 0 | 0 | 0 | 1 | 0 | 102 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
AMERICAN REALTY CAPITAL NEW YORK CITY REIT, INC. | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 |
CYCLACEL PHARMACEUTICALS, INC. | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
ZOETIS INC. | 0 | 17 | 0 | 0 | 0 | 12 | 0 | 3 | 0 | 0 | ... | 20 | 5 | 0 | 1 | 1 | 0 | 2 | 0 | 84 | 83 |
STAG INDUSTRIAL, INC. | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 |
EQUINIX INC | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 |
675 rows × 600 columns
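To make the term definition concrete, here is a minimal hand-rolled sketch of what an n-gram range of 2 to 4 with a frequency cap selects. It is not `CountVectorizer` itself: the tokenizer (lowercase, whitespace split), the helper names `ngrams` and `top_terms`, and the two sample sentences are assumptions for illustration, and tie-breaking among equally frequent terms may differ from scikit-learn.

```python
from collections import Counter

def ngrams(text, n_min=2, n_max=4):
    """Return all word n-grams of length n_min..n_max found in a text."""
    tokens = text.lower().split()
    out = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out

def top_terms(docs, max_features):
    """Pick the max_features most frequent n-grams across all documents."""
    totals = Counter()
    for doc in docs:
        totals.update(ngrams(doc))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]

docs = [
    "year ended december",
    "fiscal year ended december",
]
vocab = top_terms(docs, max_features=3)

# One row per document, one column per retained term.
toy_counts = [[ngrams(d).count(t) for t in vocab] for d in docs]
```

Here the three retained terms are the n-grams shared by both sentences, so each document gets a count of 1 in every column. Single words never appear as terms because the range starts at 2.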
Cosine Similarity Computation¶
# Compute Cosine Similarity
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = pd.DataFrame(cosine_similarity(wordsCount, wordsCount))
cosine_sim = cosine_sim.set_index(df['name'])
cosine_sim.columns = df['name']
cosine_sim
name | MONGODB, INC. | SALESFORCE COM INC | SPLUNK INC | OKTA, INC. | VEEVA SYSTEMS INC | AUTODESK INC | INTERNATIONAL WESTERN PETROLEUM, INC. | DAYBREAK OIL & GAS, INC. | ETERNAL SPEECH, INC. | ETERNAL SPEECH, INC. | ... | OMEGA HEALTHCARE INVESTORS INC | TABLEAU SOFTWARE INC | HORIZON PHARMA PLC | MERRIMACK PHARMACEUTICALS INC | REVEN HOUSING REIT, INC. | AMERICAN REALTY CAPITAL NEW YORK CITY REIT, INC. | CYCLACEL PHARMACEUTICALS, INC. | ZOETIS INC. | STAG INDUSTRIAL, INC. | EQUINIX INC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
MONGODB, INC. | 1.000000 | 0.445455 | 0.610272 | 0.620961 | 0.500762 | 0.338268 | 0.065380 | 0.052345 | 0.000000 | 0.000000 | ... | 0.050935 | 0.630465 | 0.436327 | 0.143385 | 0.066598 | 0.135839 | 0.144678 | 0.189609 | 0.178397 | 0.102958 |
SALESFORCE COM INC | 0.445455 | 1.000000 | 0.635969 | 0.455189 | 0.196053 | 0.418546 | 0.043515 | 0.064999 | 0.000000 | 0.000000 | ... | 0.029326 | 0.492079 | 0.300027 | 0.133831 | 0.201221 | 0.201230 | 0.145089 | 0.075038 | 0.277952 | 0.354856 |
SPLUNK INC | 0.610272 | 0.635969 | 1.000000 | 0.665648 | 0.274023 | 0.373142 | 0.019112 | 0.073553 | 0.000000 | 0.000000 | ... | 0.018032 | 0.569939 | 0.330028 | 0.116923 | 0.109538 | 0.142041 | 0.128467 | 0.136418 | 0.194072 | 0.273502 |
OKTA, INC. | 0.620961 | 0.455189 | 0.665648 | 1.000000 | 0.195672 | 0.399874 | 0.013240 | 0.093942 | 0.000000 | 0.000000 | ... | 0.013905 | 0.579884 | 0.541775 | 0.163709 | 0.109948 | 0.144051 | 0.170361 | 0.111937 | 0.163588 | 0.074624 |
VEEVA SYSTEMS INC | 0.500762 | 0.196053 | 0.274023 | 0.195672 | 1.000000 | 0.079927 | 0.074096 | 0.030179 | 0.075713 | 0.075713 | ... | 0.424046 | 0.280852 | 0.153335 | 0.083683 | 0.128762 | 0.211695 | 0.060273 | 0.501041 | 0.332207 | 0.064207 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
AMERICAN REALTY CAPITAL NEW YORK CITY REIT, INC. | 0.135839 | 0.201230 | 0.142041 | 0.144051 | 0.211695 | 0.106627 | 0.027594 | 0.048087 | 0.000000 | 0.000000 | ... | 0.284525 | 0.114080 | 0.075274 | 0.048741 | 0.578793 | 1.000000 | 0.039971 | 0.136184 | 0.471651 | 0.042298 |
CYCLACEL PHARMACEUTICALS, INC. | 0.144678 | 0.145089 | 0.128467 | 0.170361 | 0.060273 | 0.094262 | 0.010770 | 0.025407 | 0.000000 | 0.000000 | ... | 0.015318 | 0.193458 | 0.462759 | 0.683597 | 0.047288 | 0.039971 | 1.000000 | 0.035694 | 0.080139 | 0.013121 |
ZOETIS INC. | 0.189609 | 0.075038 | 0.136418 | 0.111937 | 0.501041 | 0.069267 | 0.039015 | 0.022235 | 0.065917 | 0.065917 | ... | 0.159082 | 0.327556 | 0.148224 | 0.051060 | 0.163391 | 0.136184 | 0.035694 | 1.000000 | 0.207232 | 0.031911 |
STAG INDUSTRIAL, INC. | 0.178397 | 0.277952 | 0.194072 | 0.163588 | 0.332207 | 0.169739 | 0.044467 | 0.057905 | 0.000000 | 0.000000 | ... | 0.424106 | 0.242169 | 0.179394 | 0.068313 | 0.407758 | 0.471651 | 0.080139 | 0.207232 | 1.000000 | 0.038365 |
EQUINIX INC | 0.102958 | 0.354856 | 0.273502 | 0.074624 | 0.064207 | 0.060531 | 0.002205 | 0.013749 | 0.000000 | 0.000000 | ... | 0.018944 | 0.068787 | 0.035503 | 0.011838 | 0.043938 | 0.042298 | 0.013121 | 0.031911 | 0.038365 | 1.000000 |
675 rows × 675 columns
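As a small illustration of what the similarity matrix measures, the sketch below computes pairwise cosine similarity with NumPy on an invented 3-company count matrix. The helper name `cosine_similarity_matrix` and the toy counts are assumptions; the handling of all-zero rows is a choice made for this sketch.

```python
import numpy as np

def cosine_similarity_matrix(counts):
    """Pairwise cosine similarity between the rows of a count matrix."""
    X = np.asarray(counts, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Avoid division by zero: rows with no retained terms stay zero vectors.
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit = X / safe_norms
    return unit @ unit.T

toy_counts = np.array([
    [2, 0, 1],   # company A
    [4, 0, 2],   # company B: same mix of terms as A, in a longer text
    [0, 3, 0],   # company C: shares no terms with A or B
])
toy_sim = cosine_similarity_matrix(toy_counts)
```

Companies A and B get a similarity of 1 because only the direction of the count vector matters, not the document length, while C is at 0 against both. Since counts are non-negative, every similarity lies between 0 and 1, so the correlations implied by this estimate are never negative, unlike the sample covariances.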
Perform Mean-Variance Analysis¶
We build a mean-variance portfolio for the Pharmaceutical Preparations industry only. The covariance estimate combines the cosine similarity, used as correlation, with the sample standard deviations of returns. We then use the sample mean returns and the estimated covariance to construct the efficient frontier.
from pypfopt import EfficientFrontier
from pypfopt import risk_models
from pypfopt import expected_returns
from pypfopt import objective_functions
from pypfopt import plotting
# get the names of the companies in the pharmaceutical preparations industry
Pharm = df[df.SIC == 2834]
Pharm_list = Pharm.index
# keep only the companies that appear in both the return data and the business description data
SET = (set(Pharm_list) & set(r_selected.index))
LIST = [*SET, ]
Sample Mean for the Pharmaceutical Preparations Industry¶
mu_Pharm = mu[LIST]
mu_Pharm
name
THERAPEUTICSMD, INC. -0.016246
PTC THERAPEUTICS, INC. 0.081859
ZYNERBA PHARMACEUTICALS, INC. -0.003030
ACTINIUM PHARMACEUTICALS, INC. -0.028223
ORAMED PHARMACEUTICALS INC. -0.027747
...
CYCLACEL PHARMACEUTICALS, INC. -0.039404
PAIN THERAPEUTICS INC -0.028535
CORVUS PHARMACEUTICALS, INC. -0.017058
FIVE PRIME THERAPEUTICS INC -0.038194
PARATEK PHARMACEUTICALS, INC. -0.024066
Length: 124, dtype: float64
Sample Covariance for the Pharmaceutical Preparations Industry¶
tmp = cov[LIST].T
cov_Pharm = tmp[LIST]
cov_Pharm
name | THERAPEUTICSMD, INC. | PTC THERAPEUTICS, INC. | ZYNERBA PHARMACEUTICALS, INC. | ACTINIUM PHARMACEUTICALS, INC. | ORAMED PHARMACEUTICALS INC. | CARA THERAPEUTICS, INC. | PROGENICS PHARMACEUTICALS INC | JOHNSON & JOHNSON | CHIASMA, INC | SYNDAX PHARMACEUTICALS INC | ... | IRONWOOD PHARMACEUTICALS INC | AMICUS THERAPEUTICS INC | CELGENE CORP /DE/ | ARQULE INC | SYNTHETIC BIOLOGICS, INC. | CYCLACEL PHARMACEUTICALS, INC. | PAIN THERAPEUTICS INC | CORVUS PHARMACEUTICALS, INC. | FIVE PRIME THERAPEUTICS INC | PARATEK PHARMACEUTICALS, INC. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
THERAPEUTICSMD, INC. | 0.022313 | 0.005716 | 0.012211 | 0.008953 | 0.001271 | 0.010204 | 0.006031 | 0.001142 | 0.010645 | 0.006653 | ... | 0.008961 | 0.005136 | 0.005609 | 0.015417 | 0.002654 | -0.002167 | 0.004205 | 0.014408 | -0.001208 | 0.008331 |
PTC THERAPEUTICS, INC. | 0.005716 | 0.077202 | 0.015773 | 0.004463 | 0.001953 | 0.027776 | 0.026502 | -0.002392 | 0.006663 | 0.011031 | ... | 0.014100 | 0.008482 | 0.007595 | 0.017331 | 0.005679 | 0.009276 | 0.006851 | 0.011528 | 0.013654 | 0.012638 |
ZYNERBA PHARMACEUTICALS, INC. | 0.012211 | 0.015773 | 0.060301 | 0.013983 | 0.008395 | 0.025120 | 0.011908 | 0.001315 | -0.002097 | 0.010391 | ... | 0.017643 | 0.006013 | 0.003551 | 0.016136 | 0.003650 | 0.012120 | -0.002097 | -0.001143 | 0.007206 | -0.002905 |
ACTINIUM PHARMACEUTICALS, INC. | 0.008953 | 0.004463 | 0.013983 | 0.045604 | 0.002781 | 0.018911 | 0.009388 | 0.003213 | 0.002852 | 0.008557 | ... | 0.008579 | 0.000157 | 0.003037 | 0.008916 | 0.006585 | 0.010661 | 0.000585 | -0.002929 | -0.001142 | 0.007948 |
ORAMED PHARMACEUTICALS INC. | 0.001271 | 0.001953 | 0.008395 | 0.002781 | 0.012120 | 0.006528 | -0.000530 | -0.000037 | 0.005420 | -0.002721 | ... | 0.002220 | 0.005889 | -0.000180 | 0.007738 | 0.011952 | 0.006241 | 0.002669 | 0.003121 | 0.005524 | 0.005203 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
CYCLACEL PHARMACEUTICALS, INC. | -0.002167 | 0.009276 | 0.012120 | 0.010661 | 0.006241 | 0.003389 | 0.001496 | 0.000655 | 0.004923 | -0.013052 | ... | 0.004142 | -0.006834 | 0.000965 | 0.004197 | 0.002821 | 0.040065 | 0.011159 | -0.002042 | 0.001104 | 0.012054 |
PAIN THERAPEUTICS INC | 0.004205 | 0.006851 | -0.002097 | 0.000585 | 0.002669 | -0.001055 | 0.011351 | 0.000104 | -0.009049 | 0.012620 | ... | 0.002670 | 0.011887 | 0.003596 | -0.002045 | 0.002679 | 0.011159 | 0.096589 | 0.012372 | 0.000731 | 0.006248 |
CORVUS PHARMACEUTICALS, INC. | 0.014408 | 0.011528 | -0.001143 | -0.002929 | 0.003121 | 0.014182 | 0.016444 | 0.000146 | 0.014813 | 0.021624 | ... | 0.008353 | 0.010539 | 0.004541 | 0.029058 | 0.011207 | -0.002042 | 0.012372 | 0.050265 | 0.006850 | 0.018659 |
FIVE PRIME THERAPEUTICS INC | -0.001208 | 0.013654 | 0.007206 | -0.001142 | 0.005524 | 0.009971 | 0.016159 | 0.000675 | 0.009991 | 0.001711 | ... | 0.004857 | 0.004692 | 0.003060 | -0.005347 | 0.011714 | 0.001104 | 0.000731 | 0.006850 | 0.022705 | 0.007209 |
PARATEK PHARMACEUTICALS, INC. | 0.008331 | 0.012638 | -0.002905 | 0.007948 | 0.005203 | 0.009235 | 0.013552 | 0.001234 | 0.014608 | 0.007191 | ... | 0.005395 | 0.003788 | 0.007372 | 0.002726 | 0.013074 | 0.012054 | 0.006248 | 0.018659 | 0.007209 | 0.026534 |
124 rows × 124 columns
Cosine Similarity Distances for the Pharmaceutical Preparations Industry¶
tmp = cosine_sim[LIST].drop_duplicates().T
Pharm_cos_sim = tmp[LIST].drop_duplicates()
Pharm_cos_sim
name | THERAPEUTICSMD, INC. | PTC THERAPEUTICS, INC. | ZYNERBA PHARMACEUTICALS, INC. | ACTINIUM PHARMACEUTICALS, INC. | ORAMED PHARMACEUTICALS INC. | CARA THERAPEUTICS, INC. | PROGENICS PHARMACEUTICALS INC | JOHNSON & JOHNSON | CHIASMA, INC | SYNDAX PHARMACEUTICALS INC | ... | IRONWOOD PHARMACEUTICALS INC | AMICUS THERAPEUTICS INC | CELGENE CORP /DE/ | ARQULE INC | SYNTHETIC BIOLOGICS, INC. | CYCLACEL PHARMACEUTICALS, INC. | PAIN THERAPEUTICS INC | CORVUS PHARMACEUTICALS, INC. | FIVE PRIME THERAPEUTICS INC | PARATEK PHARMACEUTICALS, INC. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
THERAPEUTICSMD, INC. | 1.000000 | 0.311241 | 0.456582 | 0.534662 | 0.543275 | 0.363769 | 0.185475 | 0.206379 | 0.488301 | 0.675508 | ... | 0.583528 | 0.429780 | 0.584773 | 0.016313 | 0.382713 | 0.652642 | 0.229070 | 0.352277 | 0.011227 | 0.257763 |
PTC THERAPEUTICS, INC. | 0.311241 | 1.000000 | 0.327964 | 0.427764 | 0.374165 | 0.243474 | 0.057881 | 0.062339 | 0.375322 | 0.395880 | ... | 0.328586 | 0.493302 | 0.407618 | 0.204760 | 0.236988 | 0.320797 | 0.236762 | 0.357569 | 0.000000 | 0.197942 |
ZYNERBA PHARMACEUTICALS, INC. | 0.456582 | 0.327964 | 1.000000 | 0.617744 | 0.598029 | 0.195842 | 0.110879 | 0.025581 | 0.820874 | 0.708408 | ... | 0.324872 | 0.281916 | 0.431761 | 0.000000 | 0.865089 | 0.645396 | 0.160295 | 0.625223 | 0.000000 | 0.418709 |
ACTINIUM PHARMACEUTICALS, INC. | 0.534662 | 0.427764 | 0.617744 | 1.000000 | 0.691870 | 0.415886 | 0.115986 | 0.100493 | 0.739771 | 0.776724 | ... | 0.457132 | 0.347511 | 0.673470 | 0.343720 | 0.552866 | 0.704761 | 0.379062 | 0.725817 | 0.000000 | 0.319632 |
ORAMED PHARMACEUTICALS INC. | 0.543275 | 0.374165 | 0.598029 | 0.691870 | 1.000000 | 0.285224 | 0.156478 | 0.091661 | 0.651584 | 0.726614 | ... | 0.417753 | 0.321514 | 0.555110 | 0.000000 | 0.444021 | 0.636782 | 0.137641 | 0.479985 | 0.000000 | 0.293768 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
CYCLACEL PHARMACEUTICALS, INC. | 0.652642 | 0.320797 | 0.645396 | 0.704761 | 0.636782 | 0.249042 | 0.121401 | 0.060361 | 0.665669 | 0.727805 | ... | 0.470073 | 0.416150 | 0.556812 | 0.012977 | 0.604900 | 1.000000 | 0.305312 | 0.524410 | 0.002977 | 0.307589 |
PAIN THERAPEUTICS INC | 0.229070 | 0.236762 | 0.160295 | 0.379062 | 0.137641 | 0.291646 | 0.063775 | 0.036477 | 0.174156 | 0.288048 | ... | 0.291563 | 0.231796 | 0.343935 | 0.661693 | 0.126259 | 0.305312 | 1.000000 | 0.608993 | 0.000000 | 0.179012 |
CORVUS PHARMACEUTICALS, INC. | 0.352277 | 0.357569 | 0.625223 | 0.725817 | 0.479985 | 0.372136 | 0.088433 | 0.015900 | 0.673502 | 0.700044 | ... | 0.434292 | 0.285891 | 0.503294 | 0.670545 | 0.645031 | 0.524410 | 0.608993 | 1.000000 | 0.000000 | 0.244997 |
FIVE PRIME THERAPEUTICS INC | 0.011227 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.079602 | 0.113822 | 0.003235 | 0.005774 | ... | 0.022190 | 0.000000 | 0.103859 | 0.000000 | 0.002962 | 0.002977 | 0.000000 | 0.000000 | 1.000000 | 0.000000 |
PARATEK PHARMACEUTICALS, INC. | 0.257763 | 0.197942 | 0.418709 | 0.319632 | 0.293768 | 0.181131 | 0.056366 | 0.036980 | 0.455409 | 0.285976 | ... | 0.130570 | 0.215822 | 0.267446 | 0.000000 | 0.460032 | 0.307589 | 0.179012 | 0.244997 | 0.000000 | 1.000000 |
124 rows × 124 columns
Covariance for Cosine Similarity¶
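# diagonal matrix of sample standard deviations (square roots of the sample variances)
sd = pd.DataFrame(np.sqrt(np.diag(np.diagonal(cov_Pharm))))
sd = sd.set_index(cov_Pharm.index)
sd.columns = cov_Pharm.index
# covariance estimate: sd @ cosine similarity @ sd, with cosine similarity standing in for correlation
cos_sim_cov = pd.DataFrame((np.dot(np.dot(sd, Pharm_cos_sim),sd))).set_index(cov_Pharm.index)
cos_sim_cov.columns = cov_Pharm.index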
cos_sim_cov
name | THERAPEUTICSMD, INC. | PTC THERAPEUTICS, INC. | ZYNERBA PHARMACEUTICALS, INC. | ACTINIUM PHARMACEUTICALS, INC. | ORAMED PHARMACEUTICALS INC. | CARA THERAPEUTICS, INC. | PROGENICS PHARMACEUTICALS INC | JOHNSON & JOHNSON | CHIASMA, INC | SYNDAX PHARMACEUTICALS INC | ... | IRONWOOD PHARMACEUTICALS INC | AMICUS THERAPEUTICS INC | CELGENE CORP /DE/ | ARQULE INC | SYNTHETIC BIOLOGICS, INC. | CYCLACEL PHARMACEUTICALS, INC. | PAIN THERAPEUTICS INC | CORVUS PHARMACEUTICALS, INC. | FIVE PRIME THERAPEUTICS INC | PARATEK PHARMACEUTICALS, INC. |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
THERAPEUTICSMD, INC. | 0.022313 | 0.012918 | 0.016748 | 0.017055 | 0.008934 | 0.011408 | 0.005667 | 0.001378 | 0.016164 | 0.023595 | ... | 0.010360 | 0.008472 | 0.008508 | 0.000654 | 0.012480 | 0.019514 | 0.010634 | 0.011798 | 0.000253 | 0.006272 |
PTC THERAPEUTICS, INC. | 0.012918 | 0.077202 | 0.022377 | 0.025382 | 0.011445 | 0.014203 | 0.003290 | 0.000775 | 0.023109 | 0.025721 | ... | 0.010851 | 0.018087 | 0.011032 | 0.015277 | 0.014375 | 0.017841 | 0.020445 | 0.022274 | 0.000000 | 0.008959 |
ZYNERBA PHARMACEUTICALS, INC. | 0.016748 | 0.022377 | 0.060301 | 0.032395 | 0.016167 | 0.010097 | 0.005569 | 0.000281 | 0.044669 | 0.040677 | ... | 0.009482 | 0.009135 | 0.010327 | 0.000000 | 0.046377 | 0.031723 | 0.012233 | 0.034422 | 0.000000 | 0.016749 |
ACTINIUM PHARMACEUTICALS, INC. | 0.017055 | 0.025382 | 0.032395 | 0.045604 | 0.016266 | 0.018646 | 0.005066 | 0.000960 | 0.035008 | 0.038786 | ... | 0.011603 | 0.009793 | 0.014008 | 0.019710 | 0.025775 | 0.030125 | 0.025158 | 0.034750 | 0.000000 | 0.011119 |
ORAMED PHARMACEUTICALS INC. | 0.008934 | 0.011445 | 0.016167 | 0.016266 | 0.012120 | 0.006592 | 0.003524 | 0.000451 | 0.015896 | 0.018705 | ... | 0.005466 | 0.004671 | 0.005952 | 0.000000 | 0.010671 | 0.014032 | 0.004709 | 0.011847 | 0.000000 | 0.005268 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
CYCLACEL PHARMACEUTICALS, INC. | 0.019514 | 0.017841 | 0.031723 | 0.030125 | 0.014032 | 0.010465 | 0.004970 | 0.000540 | 0.029526 | 0.034065 | ... | 0.011183 | 0.010992 | 0.010856 | 0.000698 | 0.026433 | 0.040065 | 0.018993 | 0.023533 | 0.000090 | 0.010029 |
PAIN THERAPEUTICS INC | 0.010634 | 0.020445 | 0.012233 | 0.025158 | 0.004709 | 0.019029 | 0.004054 | 0.000507 | 0.011994 | 0.020933 | ... | 0.010770 | 0.009506 | 0.010411 | 0.055221 | 0.008566 | 0.018993 | 0.096589 | 0.042433 | 0.000000 | 0.009063 |
CORVUS PHARMACEUTICALS, INC. | 0.011798 | 0.022274 | 0.034422 | 0.034750 | 0.011847 | 0.017516 | 0.004055 | 0.000159 | 0.033461 | 0.036700 | ... | 0.011573 | 0.008458 | 0.010991 | 0.040369 | 0.031571 | 0.023533 | 0.042433 | 0.050265 | 0.000000 | 0.008947 |
FIVE PRIME THERAPEUTICS INC | 0.000253 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.002453 | 0.000767 | 0.000108 | 0.000203 | ... | 0.000397 | 0.000000 | 0.001524 | 0.000000 | 0.000097 | 0.000090 | 0.000000 | 0.000000 | 0.022705 | 0.000000 |
PARATEK PHARMACEUTICALS, INC. | 0.006272 | 0.008959 | 0.016749 | 0.011119 | 0.005268 | 0.006194 | 0.001878 | 0.000269 | 0.016439 | 0.010893 | ... | 0.002528 | 0.004639 | 0.004243 | 0.000000 | 0.016359 | 0.010029 | 0.009063 | 0.008947 | 0.000000 | 0.026534 |
124 rows × 124 columns
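The construction used here is the usual decomposition of a covariance matrix into standard deviations and correlations, with the cosine similarity matrix substituted for the correlation matrix: each entry is the similarity of a pair multiplied by the two standard deviations. A minimal NumPy sketch follows; the helper name `cov_from_corr` and the toy numbers are assumptions for illustration.

```python
import numpy as np

def cov_from_corr(corr, std):
    """Build a covariance matrix from a correlation-like matrix and std devs."""
    corr = np.asarray(corr, dtype=float)
    std = np.asarray(std, dtype=float)
    # Element-wise form of D @ corr @ D, where D = diag(std).
    return corr * np.outer(std, std)

toy_std = np.array([0.10, 0.20, 0.30])
toy_sim = np.array([
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 1.0],
])
toy_cov = cov_from_corr(toy_sim, toy_std)

# The element-wise form agrees with the explicit matrix product.
matches_product = np.allclose(toy_cov, np.diag(toy_std) @ toy_sim @ np.diag(toy_std))

# A valid covariance matrix has no negative eigenvalues.
min_eigenvalue = np.linalg.eigvalsh(toy_cov).min()
```

The diagonal keeps the sample variances, which is why the diagonal of the estimate above equals that of the sample covariance. The same arithmetic can be checked against the tables: for THERAPEUTICSMD and PTC THERAPEUTICS, 0.311241 × sqrt(0.022313 × 0.077202) ≈ 0.012918, the value shown in the estimate.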
Efficient Frontier - Pharmaceutical Preparations¶
ef1 = EfficientFrontier(mu_Pharm, cos_sim_cov, weight_bounds=(0, 0.2))
fig, ax = plt.subplots()
plotting.plot_efficient_frontier(ef1, ax=ax, show_assets=True)
# Find and plot the minimum-volatility portfolio
ef2 = EfficientFrontier(mu_Pharm, cos_sim_cov, weight_bounds=(0, 0.2))
# min volatility
ef2.min_volatility()
ret_minvol, std_minvol, _ = ef2.portfolio_performance()
ax.scatter(std_minvol, ret_minvol, marker="*", s=100, c="r", label="Min Volatility")
# Format
ax.set_title("Efficient Frontier - Pharmaceutical Preparations \n Cosine Similarity Estimates")
ax.legend()
plt.tight_layout()
plt.savefig('images/Efficient_Frontier_Cos_Sim_Pharmaceutical_Preparations.png', dpi=200, bbox_inches='tight')
plt.show()
Min Volatility Portfolio¶
Performance¶
ef2.portfolio_performance(verbose=True);
Expected annual return: 1.2%
Annual volatility: 2.6%
Sharpe Ratio: -0.32
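The negative Sharpe ratio follows from its definition as excess return over volatility: the expected return of the portfolio is below the risk-free rate. The sketch below redoes the arithmetic from the rounded figures printed above. The 2% risk-free rate is an assumption: it is not set anywhere in this section, but it has been the default in the PyPortfolioOpt versions we are aware of and it is consistent with the printed numbers.

```python
def sharpe_ratio(expected_return, volatility, risk_free_rate=0.02):
    """Excess return per unit of volatility."""
    return (expected_return - risk_free_rate) / volatility

# Rounded inputs taken from the printed performance summary.
approx_sharpe = sharpe_ratio(0.012, 0.026)
```

This gives roughly -0.31, close to the reported -0.32; the small gap is plausibly due to the rounding of the displayed return and volatility.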
Weights¶
companies = []
weights = []
# keep only the companies with a non-zero weight after cleaning
for company, weight in ef2.clean_weights().items():
    if weight != 0:
        companies.append(company)
        weights.append(weight)
dic = {'Company_Name':companies,'Weight':weights}
min_vol = pd.DataFrame(dic)
min_vol.to_csv("data/min_vol_cos_sim_Pharmaceutical_Preparations.csv", index = False)
 | Company_Name | Weight |
---|---|---|
0 | JOHNSON & JOHNSON | 0.18756 |
1 | BIOSPECIFICS TECHNOLOGIES CORP | 0.07394 |
2 | BIOMARIN PHARMACEUTICAL INC | 0.04572 |
3 | MERCK & CO., INC. | 0.13753 |
4 | BRISTOL MYERS SQUIBB CO | 0.03719 |
5 | ZOETIS INC. | 0.20000 |
6 | HERON THERAPEUTICS, INC. /DE/ | 0.00906 |
7 | PERRIGO CO PLC | 0.01497 |
8 | XENCOR INC | 0.02108 |
9 | PACIRA PHARMACEUTICALS, INC. | 0.01883 |
10 | LILLY ELI & CO | 0.03562 |
11 | PFIZER INC | 0.20000 |
12 | ARQULE INC | 0.00068 |
13 | FIVE PRIME THERAPEUTICS INC | 0.01782 |
Results for the Other 4 Industries¶
Prepackaged Software (mass reproduction of software)¶
Min Volatility Portfolio¶
Performance¶
Expected annual return: 1.1%
Annual volatility: 2.9%
Sharpe Ratio: -0.30
Weights¶
 | Company_Name | Weight |
---|---|---|
0 | ALARM.COM HOLDINGS, INC. | 0.01069 |
1 | Q2 HOLDINGS, INC. | 0.06190 |
2 | ORACLE CORP | 0.16315 |
3 | INTELLICHECK, INC. | 0.00938 |
4 | ZEDGE, INC. | 0.00391 |
5 | NUANCE COMMUNICATIONS, INC. | 0.05947 |
6 | AWARE INC /MA/ | 0.02316 |
7 | NATIONAL INSTRUMENTS CORP | 0.09372 |
8 | GSE SYSTEMS INC | 0.04031 |
9 | ULTIMATE SOFTWARE GROUP INC | 0.10350 |
10 | ACI WORLDWIDE, INC. | 0.04754 |
11 | BLACK KNIGHT, INC. | 0.20000 |
12 | ANSYS INC | 0.15390 |
13 | REALPAGE INC | 0.02937 |
Crude Petroleum and Natural Gas¶
When we run the same analysis for this industry, the optimizer returns no weights, so no efficient frontier can be found.
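We do not know why the optimization fails for this industry. As a starting point for diagnosis, the sketch below checks a few input conditions that can make a bounded mean-variance problem unsolvable. The helper name `diagnose_inputs`, the tolerance values, and the list of checks are assumptions for illustration, not an account of what went wrong here.

```python
import numpy as np

def diagnose_inputs(mu, cov, upper_bound=0.2):
    """Return a list of input problems that can make the optimization fail."""
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    issues = []
    # With weights capped at upper_bound, at least 1 / upper_bound assets are needed.
    if len(mu) * upper_bound < 1.0 - 1e-12:
        issues.append("too few assets for capped weights to sum to 1")
    if np.isnan(mu).any() or np.isnan(cov).any():
        issues.append("missing values in mu or cov")
    elif not np.allclose(cov, cov.T):
        issues.append("covariance is not symmetric")
    elif np.linalg.eigvalsh(cov).min() < -1e-10:
        issues.append("covariance is not positive semidefinite")
    return issues

# Four assets with a 20% cap can hold at most 80% of the portfolio.
too_few = diagnose_inputs([0.01] * 4, np.eye(4))

# Five assets with clean inputs pass all checks.
clean = diagnose_inputs([0.01] * 5, np.eye(5))
```

An empty list means none of these particular conditions applies; it does not guarantee that the solver will succeed.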
Real Estate Investment Trusts¶
Min Volatility Portfolio¶
Performance¶
Expected annual return: 0.6%
Annual volatility: 1.7%
Sharpe Ratio: -0.81
Weights¶
 | Company_Name | Weight |
---|---|---|
0 | GREAT AJAX CORP. | 0.13806 |
1 | EQUITY COMMONWEALTH | 0.16327 |
2 | RAYONIER INC | 0.00876 |
3 | EQUINIX INC | 0.07068 |
4 | HIGHWOODS PROPERTIES INC | 0.05347 |
5 | HEALTHCARE TRUST OF AMERICA, INC. | 0.02052 |
6 | STARWOOD PROPERTY TRUST, INC. | 0.01851 |
7 | MFA FINANCIAL, INC. | 0.05101 |
8 | EASTGROUP PROPERTIES INC | 0.01785 |
9 | LTC PROPERTIES INC | 0.00036 |
10 | ANNALY CAPITAL MANAGEMENT INC | 0.05094 |
11 | SUN COMMUNITIES INC | 0.14907 |
12 | GAMING & LEISURE PROPERTIES, INC. | 0.06734 |
13 | HMG COURTLAND PROPERTIES INC | 0.03187 |
14 | DUKE REALTY CORP | 0.05369 |
15 | CROWN CASTLE INTERNATIONAL CORP | 0.02634 |
16 | PUBLIC STORAGE | 0.06339 |
17 | ALEXANDERS INC | 0.01487 |
State Commercial Banks (commercial banking)¶
Min Volatility Portfolio¶
Performance¶
Expected annual return: 1.1%
Annual volatility: 2.2%
Sharpe Ratio: -0.38
Weights¶
 | Company_Name | Weight |
---|---|---|
0 | INVESTAR HOLDING CORP | 0.16789 |
1 | CITIZENS & NORTHERN CORP | 0.11305 |
2 | S&T BANCORP INC | 0.05201 |
3 | BANNER CORP | 0.20000 |
4 | BANK OF NEW YORK MELLON CORP | 0.09816 |
5 | ENTERPRISE FINANCIAL SERVICES CORP | 0.07078 |
6 | EAST WEST BANCORP INC | 0.08342 |
7 | HOWARD BANCORP INC | 0.02931 |
8 | BANK OF HAWAII CORP | 0.04935 |
9 | UNITY BANCORP INC /NJ/ | 0.01179 |
10 | CB FINANCIAL SERVICES, INC. | 0.02883 |
11 | INDEPENDENT BANK CORP /MI/ | 0.09540 |