1. Introduction
Fundamental industry classifications such as GICS, BICS, ICB, NAICS, SIC and TRBC (GICS = Global Industry Classification Standard, by MSCI and Standard & Poor’s; BICS = Bloomberg Industry Classification System; ICB = Industry Classification Benchmark, by the London Stock Exchange’s FTSE; NAICS = North American Industry Classification System, by Mexico’s Instituto Nacional de Estadística y Geografía, Statistics Canada (also known as Statistique Canada) and the United States Office of Management and Budget; SIC = Standard Industrial Classification, by United States government agencies; TRBC = Thomson Reuters Business Classification) are widely used in a variety of fields. These include economic applications (for the economics, financial economics and accounting literature, see, e.g., [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23]; for a recent review, see, e.g., [24]; for other applications and more generally related literature, see, e.g., [25,26,27,28,29,30,31,32,33,34,35]), general population and healthcare studies (see, e.g., [36] and references therein), and (quantitative) finance/trading, including risk modeling (for related literature, see, e.g., [37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59]; for applications to risk modeling within quantitative finance, see, e.g., [60,61,62]; for statistical/data mining methods, see, e.g., [63,64,65,66,67,68,69]). An industry classification (i.e., taxonomy) groups companies into baskets (e.g., industries) based on one or more similarity criteria, which differ from one classification to another. Fundamental industry classifications are generally expected to be based on pertinent fundamental/economic data, such as companies’ products and services, revenue sources, suppliers, competitors, partners, etc.
They are essentially independent of the pricing data and, if well-built, tend to be rather stable out-of-sample as companies seldom jump industries.
Many industry classifications are developed commercially, and acquiring such data entails nontrivial costs. Even government-developed classifications such as NAICS or SIC (see below) are not exactly free, for two main reasons. First, specifying a hierarchical structure (e.g., a complete list of sectors, industries and sub-industries, as in BICS) is only the tip of the iceberg; many (qualified) man-hours are required to assign actual companies to nodes within that structure (i.e., to determine which sector, industry and sub-industry each company belongs to, in BICS terminology). Second, the resulting data is not necessarily provided (by government agencies) as a simple one-click, single-file download.
In this paper, among other things, we fill this gap. We provide open source code for freely downloading SIC data directly from the U.S. Securities and Exchange Commission (SEC) without any APIs, accounts, logins, passwords, etc. Furthermore, since this data is provided by the U.S. government, the public can download it without restriction.
The data provided by the SEC contains, among other things, company names and SIC codes. SIC codes are 4-digit numeric identifiers corresponding to the SIC hierarchy (Division → Major Group → Industry Group → Industry). As we discuss below, there is an efficient and fast way to download all companies with SIC codes. However, the SEC does not maintain accurate ticker data. (The SEC website does allow searching for companies by ticker, but many tickers are missing.) The data downloaded from the SEC website must therefore be matched to tickers, and our code ultimately maps tickers to SIC codes. (We also discuss how to source all listed and OTC (over-the-counter) U.S. tickers.) The underlying SEC data has various nuances, such as peculiarities in the SIC codes used by the SEC, which we also discuss in detail.
SIC is not the best classification around. Contrary to an apparent misconception, this is not because SIC is insufficiently “granular” (or detailed); on the face of it, it is more granular than BICS. However, as mentioned above, the hierarchical structure itself is only the tip of the iceberg. Arguably even more important is how companies are assigned within that hierarchy. In the case of SIC, this assignment apparently does not always follow the aforesaid criteria based on companies’ products and services, revenue sources, suppliers, competitors, partners, etc. Instead, it appears that, at least in some cases, such assignments are made on more superficial grounds (e.g., possibly the companies’ own assessment).
However, SIC is by no means a “disaster”. It is widely used by academics (see, e.g., [70,71]) and, perhaps to a lesser extent, by quant traders (who usually prefer more robust industry classifications such as GICS and BICS). For a researcher (young or otherwise) who wishes to test, e.g., trading ideas involving an industry classification without committing to costly commercial data subscriptions, SIC can be a good zeroth-order approximation, so long as it is free and easily obtained. (SIC data is available through various data providers; however, most of them are not free, and not everyone has access to them.) Our code provides such a solution.
We quantify the comparison between different industry classifications by using them to build short-horizon mean-reversion trading signals (alphas) via (open source) heterotic risk models [61]. We find that GICS slightly outperforms BICS, while SIC performs worse than both; the SIC-based Fama–French industry classifications [49] perform worse still.
The remainder of this paper is organized as follows. Section 2 discusses the data (downloads). Section 3 discusses the backtests. Section 4 briefly concludes. Data and source code are given in the Appendices.
2. Data (Downloads)
2.1. SIC Hierarchical Structure
The SIC structure has four levels: Division → Major Group → Industry Group → Industry. This structure can be downloaded from the website https://www.osha.gov/pls/imis/sic_manual.html of the Occupational Safety and Health Administration (OSHA) of the U.S. Department of Labor using the R function sec.osha() in Appendix C. (The source code in Appendix C is not written to be “fancy” or optimized for speed or in any other way. Its sole purpose is to illustrate the algorithms and/or discussion in the main text in a simple-to-understand fashion. See Appendix D for some important legalese.) This function outputs a single tab-delimited file, SIC.table.txt, which contains the SIC hierarchy given in Appendix A. For reading convenience, the fields in Appendix A are separated by “ > ”. Also, Appendix A contains lines (in bold italic font) that are not present in SIC.table.txt; these pertain to additional SIC codes present in the SEC data, which we discuss in detail below.
The 10 SIC Divisions are labeled by characters A through J. The Major Groups are labeled by 2-digit numeric codes XY, where both X and Y can take values 0 through 9. It is convenient to label Major Groups by 4-digit codes XY00. The Industry Groups are labeled by 3-digit numeric codes XYZ. Unlike X and Y, Z can only take values between 1 and 9. Again, it is convenient to label Industry Groups by 4-digit codes XYZ0. Thus, Industry Group XYZ0 belongs to Major Group XY00. Finally, Industries are labeled by 4-digit codes XYZW, where W also takes values between 1 and 9. Industry XYZW belongs to Industry Group XYZ0. This is the SIC hierarchical structure.
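The digit conventions above can be made concrete with a small helper. This is a minimal Python sketch (not the paper's R code); the function name sic_parents is hypothetical, and it only checks the XY00 / XYZ0 / XYZW digit pattern, not whether a code actually appears in the OSHA table or among the SEC-only codes.

```python
def sic_parents(code: str) -> dict:
    """Split a 4-digit SIC label into the hierarchy levels implied by its digits.

    Only the digit pattern is checked; whether the code actually exists in
    the OSHA table (or is one of the SEC-only codes) is not verified here.
    """
    if len(code) != 4 or not code.isdigit():
        raise ValueError(f"expected a 4-digit SIC code, got {code!r}")
    major_group = code[:2] + "00"                  # Major Group XY -> XY00
    if code[2] == "0":
        if code[3] != "0":                         # XY0W: Z = 0 is not allowed
            raise ValueError(f"{code!r} does not fit the XYZW pattern")
        return {"level": "major_group", "major_group": major_group}
    industry_group = code[:3] + "0"                # Industry Group XYZ -> XYZ0
    if code[3] == "0":
        return {"level": "industry_group", "major_group": major_group,
                "industry_group": industry_group}
    return {"level": "industry", "major_group": major_group,
            "industry_group": industry_group, "industry": code}
```

For example, Industry 7372 belongs to Industry Group 7370, which in turn belongs to Major Group 7300.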
2.2. SEC Data Download
The SEC website allows data to be searched by company name, CIK number (according to the SEC, https://www.sec.gov/edgar/searchedgar/cik.htm: “The Central Index Key (CIK) is used on the SEC’s computer systems to identify corporations and individual people who have filed disclosure with the SEC.”), ticker (although, as mentioned above, ticker search is unreliable), etc. (There are some nuances. For example, an automated search by name that scans through all single and double ASCII and non-ASCII characters captures most filers/companies. However, owing to a limit on the number of pages that can be scanned and inefficiencies in the data ordering, this gets complicated rather quickly. Such is life.) Happily, we can also search by SIC code. There are two approaches here. A priori, we do not know which SIC codes to expect in the data, so we can scan all SIC codes 0100 through 9999. Once we have all SIC codes used by the SEC, we can limit the downloads to these predefined SIC codes to reduce the download time. As a precaution, we may wish to periodically run full scans to check whether new SIC codes crop up in the data. The R function sec.all.sic() in Appendix C downloads the data. If its first argument is run.all.sic = T, it downloads all SIC codes 0100 through 9999, whereas for run.all.sic = F the download is limited to the SIC codes in a tab-delimited input file, SIC.Codes.txt, which contains the SIC codes currently used by the SEC along with the corresponding Industry names. (If a SIC code is of the form XYZ0, then the Industry name is the same as the corresponding Industry Group name; if it is of the form XY00, then the Industry name is the same as the corresponding Major Group name.) Appendix B contains this file. Here, some remarks are in order.
Some of the Industry names in the downloaded data (column 4 of the output file SIC.Download.txt; see below) are (immaterially) different or simply wrong (e.g., some well-defined SIC codes in the SEC data download are erroneously marked as “unknown”). Therefore, the Industry names in SIC.Codes.txt (Appendix B) are based on the table provided at https://www.sec.gov/info/edgar/siccodes.htm (which is missing the SIC codes 0888, 1044, 6025 and 6120), combined with those in SIC.table.txt (Appendix A) downloaded from OSHA (see above).
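The two download modes and the naming convention for XY00 and XYZ0 codes can be sketched as follows. This is a hypothetical Python analogue of the logic described above, not the sec.all.sic() implementation: the run_all_sic flag merely mirrors the R argument run.all.sic, and the three name dictionaries (keyed by 2-, 3- and 4-digit labels) are an assumed in-memory form of the OSHA hierarchy table.

```python
def codes_to_scan(run_all_sic: bool, known_codes=()) -> list:
    """Full scan 0100..9999, or only the codes already seen in the SEC data."""
    if run_all_sic:
        return [f"{c:04d}" for c in range(100, 10000)]
    return sorted(set(known_codes))


def industry_name(code: str, major_groups: dict, industry_groups: dict,
                  industries: dict) -> str:
    """Name a 4-digit SIC code, falling back to group names for XY00 / XYZ0."""
    if code.endswith("00"):                  # XY00 -> Major Group name
        return major_groups[code[:2]]
    if code.endswith("0"):                   # XYZ0 -> Industry Group name
        return industry_groups[code[:3]]
    return industries[code]                  # XYZW -> Industry name
```

A full scan visits 9900 candidate codes, which is why restricting the download to the codes already known to be in use shortens it considerably.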
The sec.all.sic() function outputs a tab-delimited file, SIC.Download.txt. The first column is the CIK number, the second column is the company name, the third column is the SIC code, the fourth column is the Industry name as it appears in the SEC data (which, as mentioned above, is not necessarily the same as the corresponding Industry name in the file SIC.Codes.txt), and the fifth column is the location code (U.S. state, Canadian province, foreign country, etc.). For example, for Microsoft Corporation we have: CIK = 0000789019, company name = MICROSOFT CORP., SIC code = 7372, Industry name = SERVICES-PREPACKAGED SOFTWARE, location code = WA. The page https://www.sec.gov/edgar/searchedgar/edgarstatecodes.htm contains most location codes. The data also contains legacy location codes (to wit, E6, L4, I8, I9, E7, U2, L5 and LO [“L-O” as opposed to L0 = “L-zero”]), which are also described at https://www.sec.gov/edgar/searchedgar/edgarstatecodes.htm. One additional code present in the data is X9. Only two companies have this code, and they appear to be German entities (to wit, CYBERMIND AG (CIK = 0001135128) and KPMG DEUTSCHE TREUHAND GESELLSCHAFT AG (CIK = 0001184474)). These two companies do not have SIC codes, but some companies with the legacy location codes do.
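For readers working outside R, the five-column layout of SIC.Download.txt can be read into typed records as in the minimal Python sketch below. It assumes the file has no header row and that the columns are exactly as described above; the SecFiler record and read_sic_download function are hypothetical names. Zero-padding guards against leading zeros lost by, e.g., spreadsheet round-trips.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class SecFiler:
    cik: str        # 10-digit CIK, zero-padded
    name: str       # company name as spelled by the SEC
    sic: str        # 4-digit SIC code ("" if none)
    industry: str   # Industry name as it appears in the SEC data
    location: str   # state / province / country code, e.g. "WA"


def read_sic_download(text: str) -> list:
    """Parse tab-delimited SIC.Download.txt content (assumed: no header row)."""
    filers = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue
        fields = [f.strip() for f in row] + [""] * 5   # pad short rows
        cik, name, sic, industry, location = fields[:5]
        filers.append(SecFiler(cik.zfill(10), name, sic.zfill(4) if sic else "",
                               industry, location))
    return filers
```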
As mentioned above, some entries in Appendix A (in bold italic font) are not included in the OSHA download SIC.table.txt. These correspond to the additional SIC codes present in the SEC data, and Appendix A is obtained by amending SIC.table.txt with them. Most of them fit nicely into the SIC hierarchical structure. A couple of potential hiccups are: (1) SIC code = 6025, with only a single company, PNB BANCSHARES INC (CIK = 0001230585); and (2) SIC code = 9995. (Also, see https://www.sec.gov/fast-answers/answers-blankcheckhtm.html regarding “Blank Checks” (SIC code = 6770).) Three SIC codes, 0888, 8880 and 8888, do not fit into the 4-digit hierarchy described above, so we appended them at the end of Appendix A. All lines with non-OSHA SIC codes in Appendix A (i.e., those in bold italic font) end with our descriptor “(SEC)”. Overall, the SEC data is reasonably “clean”, barring the aforesaid glitches, which must be dealt with manually.
2.3. Matching to Tickers
In practical quantitative finance/trading applications, assignment of SIC codes to company names is only partially useful as most (e.g., pricing) data is labeled by tickers. So, we need to match our data in the file SIC.Download.txt to tickers. This is done in the R function sec.sic() in Appendix C. Its sole argument is incl.otc, which is the second argument of the function sec.all.sic() (see above). For incl.otc = F, only listed U.S. tickers are matched, and for incl.otc = T the over-the-counter (OTC) tickers are also included. The input files of sec.sic() are: (i) NQ_AMEX.csv, NQ_NYSE.csv, NQ_NASDAQ.csv (these files can be downloaded daily from www.nasdaq.com via, e.g., the wget utility—see Appendix C.1); NT_otherlisted.txt, NT_nasdaqlisted.txt (these files can be downloaded daily from www.nasdaqtrader.com via, e.g., the wget utility—see Appendix C.1); and (ii) if we wish to include OTCs, i.e., if incl.otc = T, then also the file otctickers.csv (this file can be manually downloaded daily from http://www.otcmarkets.com/reports/symbol_info.csv). Note that the source code is straightforward to modify to accommodate other sources of ticker lists.
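The NASDAQ Trader files are pipe-delimited text files. The Python sketch below reads them into ticker records and keeps the ETF flag, which is useful for the fund-detection heuristic discussed next. The column names ("Symbol" or "ACT Symbol", "Security Name", "Test Issue", "ETF") and the trailing "File Creation Time" line are assumptions about the current file layout and should be checked against the actual downloads; read_nasdaqtrader is a hypothetical name.

```python
import csv


def read_nasdaqtrader(text: str) -> list:
    """Parse a pipe-delimited NASDAQ Trader symbol file into ticker records.

    Assumed layout (check against the current files): a header row naming
    the columns, "Symbol" (nasdaqlisted) or "ACT Symbol" (otherlisted),
    "Security Name", "Test Issue" and "ETF", plus a trailing
    "File Creation Time" line.
    """
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("File Creation Time")]
    records = []
    for rec in csv.DictReader(lines, delimiter="|", quoting=csv.QUOTE_NONE):
        symbol = (rec.get("Symbol") or rec.get("ACT Symbol") or "").strip()
        if not symbol or rec.get("Test Issue") == "Y":
            continue                                    # skip test issues
        records.append({"ticker": symbol,
                        "name": (rec.get("Security Name") or "").strip(),
                        "etf": rec.get("ETF") == "Y"})  # flag for the fund heuristic
    return records
```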
The function sec.sic() assigns SIC codes to the tickers in the above files by matching the company names in the SEC data to those in the ticker lists. Naturally, there are cases where a match cannot reasonably be attained. There are two main categories here. First, the SEC data may simply not have a SIC code assigned to a given company, or the company name used by the SEC may differ from that in the ticker lists, hence no match (e.g., ETRADE v. E*TRADE). Second, there may be more than one match (after stripping the company names of various extraneous attributes that muddy the waters, e.g., CORPORATION v. CORP). The statistics for these occurrences are given in Table 1, which also provides the total number of tickers and how many of the tickers without a SIC code correspond to funds, trusts, ETFs and similar vehicles that are difficult to classify as such; the code attempts to detect these using a heuristic. (This heuristic may be improved by using the ETF field in NT_otherlisted.txt and NT_nasdaqlisted.txt.) Overall, the ticker matching yields reasonable coverage, especially if we exclude funds, etc.
[ Table omitted. See PDF. ]
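The matching logic, including the "no match" and "multiple matches" outcomes, can be illustrated with the minimal Python sketch below. The list of dropped words and the normalization rules are hypothetical simplifications; the paper's R code applies its own, more extensive rules. Note that, as in the ETRADE v. E*TRADE example, a crude normalization that turns punctuation into spaces can still fail to match differently spelled names.

```python
import re
from collections import defaultdict

# Hypothetical list of "extraneous" words; the paper's R code uses its own rules.
DROP_WORDS = {"THE", "CORPORATION", "CORP", "INCORPORATED", "INC",
              "COMPANY", "CO", "LIMITED", "LTD", "PLC"}


def normalize(name: str) -> str:
    """Crude company-name normalization for exact-match lookups."""
    name = re.sub(r"\s+-\s+.*$", "", name)      # drop " - Common Stock" etc.
    name = name.upper().replace("&", " AND ")
    name = re.sub(r"[^A-Z0-9]+", " ", name)      # punctuation -> spaces
    return " ".join(w for w in name.split() if w not in DROP_WORDS)


def match_tickers(sec_rows, ticker_names):
    """sec_rows: (cik, sec_name, sic) triples; ticker_names: ticker -> name.

    Returns (matched ticker -> SIC, tickers with no match, tickers with
    multiple matches).
    """
    index = defaultdict(list)
    for cik, sec_name, sic in sec_rows:
        if sic:                                  # only filers with a SIC code
            index[normalize(sec_name)].append((cik, sic))
    matched, no_match, multiple = {}, [], []
    for ticker, name in ticker_names.items():
        hits = index.get(normalize(name), [])
        if len(hits) == 1:
            matched[ticker] = hits[0][1]
        elif not hits:
            no_match.append(ticker)
        else:
            multiple.append(ticker)
    return matched, no_match, multiple
```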
The function sec.sic() outputs three files. The file TICKER.SIC.txt contains the tickers for which SIC codes are matched: first column = ticker; second column = exchange (A = AMEX, N = NYSE, Q = NASDAQ; the others relate to OTCs); third column = SIC code; fourth column = Industry name (as it appears in the SEC data; see above); fifth column = market cap (for listed tickers only, as provided in the aforesaid three www.nasdaq.com files). The file NO.SIC.txt contains the tickers for which SIC codes are not matched: first column = ticker; second column = exchange (see above); third column = market cap (see above); fourth column = TRUE (FALSE) if the ticker is (not) a fund, etc.; fifth column = TRUE (FALSE) if there is no match (multiple matches). The last file, SIC.IND.CLASS.txt, contains a binary industry classification, i.e., a 0/1 matrix indicating which SIC Industry each matched ticker belongs to.
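A binary industry classification of this kind is simply an N×K matrix with exactly one 1 per row. The Python sketch below builds it from a ticker → SIC mapping; the function name is hypothetical, and the row/column ordering (sorted tickers and codes) is an arbitrary choice that need not match the layout of SIC.IND.CLASS.txt.

```python
def binary_industry_matrix(ticker_sic: dict):
    """Build the N x K 0/1 matrix: entry (i, k) is 1 iff ticker i is in industry k."""
    tickers = sorted(ticker_sic)
    industries = sorted(set(ticker_sic.values()))
    column = {code: k for k, code in enumerate(industries)}
    matrix = [[0] * len(industries) for _ in tickers]
    for i, ticker in enumerate(tickers):
        matrix[i][column[ticker_sic[ticker]]] = 1   # exactly one 1 per row
    return tickers, industries, matrix
```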
2.4. Some Stats
The function sec.all.sic() calls the function sec.sic() at the end. Before doing so, it outputs a log file recording the start and end times, which measure the time required to download the SEC data. These measurements are reported in Table 2. Downloading all SIC codes 0100–9999 takes about 30 min, while downloading only the SIC codes in SIC.Codes.txt takes about 5 min. Ticker matching is fast (on a quad-core, 3.1 GHz CPU machine). Finally, some statistics for the binary industry classification file SIC.IND.CLASS.txt: on 14 April 2017, the 4734 tickers with SIC codes were spread across 392 Industries with the following ticker counts per Industry: Min = 1, 1st Quartile = 2, Median = 5, Mean = 12.08, 3rd Quartile = 10, Max = 376. That is, many small industries (with one or two tickers) are present.
[ Table omitted. See PDF. ]
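The per-Industry ticker-count statistics above can be computed from a ticker → SIC mapping as in this Python sketch. The toy data in any test of it is made up and does not reproduce the 14 April 2017 figures. Python's "inclusive" quantile method uses the same interpolation as R's default (type 7) quantiles, which R's summary() reports; the function name is hypothetical.

```python
from collections import Counter
from statistics import mean, median, quantiles


def industry_size_summary(ticker_sic: dict) -> dict:
    """Summarize how many tickers fall into each SIC Industry."""
    counts = sorted(Counter(ticker_sic.values()).values())
    # method="inclusive" interpolates like R's default (type 7) quantiles.
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    return {"industries": len(counts), "tickers": sum(counts),
            "min": counts[0], "q1": q1, "median": median(counts),
            "mean": mean(counts), "q3": q3, "max": counts[-1]}
```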
3. Horserace (Backtests)
So, we can download the SEC data and build an industry classification based on SIC codes. How does this industry classification compare with others? A general comparison (granularity, etc.) of GICS, NAICS and SIC was performed in [11]. A comparison of how well industry classifications explain the co-movement of stock returns at long horizons was done in [43], which found that GICS, not surprisingly, does better than the Fama–French industry classification [49] with 48 “industries” (FF48), which is based on SIC. However, FF48 is not the same as SIC: it aggregates multiple SIC codes into these 48 “industries”, so FF48 is much less granular than GICS, BICS and SIC at their respective most granular levels.
Here, we compare GICS, BICS, SIC, FF48 and FF49 (another variation of FF48; see below) at short horizons. Moreover, we take the comparison to another level: instead of looking at the co-movement of stock returns, we look at the performance of short-horizon mean-reversion trading signals (alphas) computed using risk models that are identical except for the industry classifications they use to define the industry risk factors. The risk models we use are the open source heterotic risk models of [61]. Heterotic risk models do not use any style factors, so our horserace compares the underlying industry classifications without muddling them with other extraneous factors. To our knowledge, this is the first such comparison of industry classifications in the published literature. The remainder of this section closely follows most parts of Section 6 of [61]. (We “rehash” it here not to be repetitive but so that our presentation is self-contained.)
In brief, heterotic risk models for equities combine: (i) the granularity of an industry classification; (ii) the diagonality of the principal component factor covariance matrix for any sub-universe of stocks; and (iii) the dramatic reduction of the factor covariance matrix size in the Russian-doll risk model construction [72]. Thus, in the heterotic risk model construction, using (for the sake of definiteness) the BICS nomenclature, one breaks the universe of stocks into subsets based on BICS sub-industries, computes the first principal components of the return sample correlation matrices for these subsets, and uses these first principal components as weights for computing the corresponding factor returns. Typically, for short lookbacks, the factor return sample covariance matrix is singular. One then further breaks the universe of these factors into subsets based on BICS industries, computes the first principal components of the return sample correlation matrices for these subsets, and uses these first principal components as weights for computing the corresponding factor returns. This nested embedding (the Russian-doll construction) is repeated (e.g., by going from BICS industries to BICS sectors to the “broad market”, which define fewer and fewer risk factors, as needed) until the number of resulting factor returns is small enough that the corresponding factor covariance matrix is nonsingular and sufficiently stable. This proves to be a powerful approach for constructing out-of-sample stable short-lookback equity risk models.
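One level of the nested construction can be sketched in plain Python: for each cluster, take the first principal component of the members' return correlation matrix (here via power iteration) and use it to weight the members' returns into a single factor return. This is an illustrative sketch, not the qrm.het() code of [61]. In particular, forming the factor return as the PC-weighted sum of standardized returns is an assumed simplification, and the sign of each principal component is arbitrary. Feeding the resulting factor series back in with a coarser cluster map gives the next Russian-doll level.

```python
import math


def standardize(series):
    """Demean and scale a return series to unit sample standard deviation."""
    t = len(series)
    mu = sum(series) / t
    sd = math.sqrt(sum((x - mu) ** 2 for x in series) / (t - 1))
    return [(x - mu) / sd for x in series]       # assumes sd > 0


def first_principal_component(corr, iters=1000, tol=1e-12):
    """Leading eigenvector of a symmetric PSD matrix via power iteration."""
    n = len(corr)
    # Non-uniform start so we do not begin exactly on a subdominant eigenvector.
    v = [1.0 + k / n for k in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v


def cluster_factor_returns(series_by_name, cluster_of):
    """One Russian-doll level: one factor return series per cluster.

    series_by_name: name -> equal-length return series (stocks or factors);
    cluster_of: name -> cluster label (e.g. sub-industry, then industry).
    """
    members = {}
    for name, cluster in cluster_of.items():
        members.setdefault(cluster, []).append(name)
    factors = {}
    for cluster, names in sorted(members.items()):
        z = [standardize(series_by_name[n]) for n in names]
        t = len(z[0])
        corr = [[sum(a * b for a, b in zip(zi, zj)) / (t - 1) for zj in z] for zi in z]
        u = first_principal_component(corr)
        # Assumed weighting: factor return = PC-weighted sum of standardized returns.
        factors[cluster] = [sum(u[k] * z[k][s] for k in range(len(z))) for s in range(t)]
    return factors
```

For example, stocks can first be grouped by sub-industry, and the resulting factor series can then be grouped by industry by calling the same function again.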
3.1. Notations
Let $P_{is}$ be the time series of stock prices, where $i=1,\dots,N$ labels the stocks and $s=1,2,\dots$ labels the trading dates, with $s=1$ corresponding to the most recent date in the time series. The superscripts $O$ and $C$ (unadjusted open and close prices) and $AO$ and $AC$ (open and close prices fully adjusted for splits and dividends) distinguish the corresponding prices, so, e.g., $P^C_{is}$ is the unadjusted close price. $V_{is}$ is the unadjusted daily volume (in shares). Also, for each date $s$ we define the overnight return as the previous-close-to-open return:
$$E_{is} = \ln\left(\frac{P^{AO}_{is}}{P^{AC}_{i,s+1}}\right) \tag{1}$$
This return will be used in the definition of the expected return in our mean-reversion alpha. We will also need the close-to-close return:
$$R_{is} = \ln\left(\frac{P^{AC}_{is}}{P^{AC}_{i,s+1}}\right) \tag{2}$$
An out-of-sample (see below) time series of these returns will be used in constructing the risk models. Note that all prices in the definitions of the returns $E_{is}$ and $R_{is}$ are fully adjusted.
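Equations (1) and (2) translate directly into code. The Python sketch below assumes price lists ordered most recent first, so that index 0 is date $s=1$ and index $s$ is the previous trading day; the function name is hypothetical.

```python
import math


def overnight_and_close_returns(adj_open, adj_close):
    """E_is and R_is from fully adjusted prices (lists ordered most recent first).

    Index k corresponds to date s = k + 1; index k + 1 is the previous trading day.
    """
    n = len(adj_close) - 1                      # last date has no previous close
    E = [math.log(adj_open[k] / adj_close[k + 1]) for k in range(n)]   # Eq. (1)
    R = [math.log(adj_close[k] / adj_close[k + 1]) for k in range(n)]  # Eq. (2)
    return E, R
```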
We assume that (i) the portfolio is established at the open, with fills at the open prices $P^O_{is}$ (this is a so-called “delay-0” alpha: the same price, $P^O_{is}$ (or adjusted $P^{AO}_{is}$), is used both in computing the expected return (via $E_{is}$) and as the establishing fill price); (ii) it is liquidated at the close on the same day, with fills at the close prices $P^C_{is}$, so this is a purely intraday alpha; and (iii) there are no transaction costs or slippage, since our aim here is not to build a realistic trading strategy but to test the relative performance of various industry classifications. The P&L (profits and losses) for each stock is

$$\Pi_{is} = H_{is}\left(\frac{P^C_{is}}{P^O_{is}} - 1\right) \tag{3}$$

Here, $H_{is}$ are the dollar holdings. The shares bought and sold (establishing and liquidating trades) for each stock on each day are computed via $Q_{is} = 2\,|H_{is}| / P^O_{is}$.
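For a single day, Equation (3) and the share count $Q_{is}$ can be computed as follows. This is a minimal Python sketch with a hypothetical function name; it uses unadjusted open and close prices, consistent with the fills described above, and no transaction costs.

```python
def intraday_pnl_and_shares(holdings, open_px, close_px):
    """Per-stock P&L (Eq. 3) and shares traded Q for one purely intraday day.

    holdings: dollar holdings H established at the open (negative = short);
    open_px / close_px: unadjusted open and close fill prices.
    """
    pnl = [h * (c / o - 1.0) for h, o, c in zip(holdings, open_px, close_px)]
    shares = [2.0 * abs(h) / o for h, o in zip(holdings, open_px)]  # open + close trades
    return pnl, shares
```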
3.2. Universe Selection
For the sake of simplicity (in practice, the trading universe is selected based on market cap, liquidity (ADDV), price and other criteria), we select our universe based on the average daily dollar volume (ADDV), defined via (note that $A_{is}$ is out-of-sample for each date $s$):

$$A_{is} = \frac{1}{m}\sum_{r=1}^{m} V_{i,s+r}\, P^C_{i,s+r} \tag{4}$$
We take $m=21$ (i.e., one month) and take our universe to be the top 2000 tickers by ADDV. To ensure that we do not inadvertently introduce a universe selection bias, we rebalance monthly (every 21 trading days, to be precise): we break our 5-year backtest period (see below) into 21-day intervals, compute the universe using ADDV (which, in turn, is computed over the 21-day period immediately preceding each interval), and use this universe during that entire interval. We do have survivorship bias, as we take the data for the universe of tickers as of 6 September 2014 that have historical pricing data sourced from http://finance.yahoo.com (accessed on 6 September 2014) for the period 1 August 2008 through 5 September 2014. We restrict this universe to include only U.S. listed common stocks and class shares (no OTCs, preferred shares, etc.) with GICS (ZK would like to thank Jim Liew for sharing the GICS data on 8 September 2014), BICS and SIC industry assignments as of 8 September 2014. (The choice of the backtesting window is based on what data was readily available.) However, as discussed in detail in Section 7 of [53], survivorship bias is not a leading effect in such backtests. (Furthermore, here we are after relative outperformance, and it is reasonable to assume that, to leading order, the individual performances are affected by survivorship bias approximately equally, as the construction of all alphas and risk models (see below) is “statistical” and oblivious to the trading universe.)
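The universe selection with monthly rebalancing can be sketched as follows, assuming lists ordered most recent first (index 0 is date $s=1$). The function names are hypothetical. Evaluating ADDV at the earliest date of each 21-day block (its largest index) makes it use exactly the 21 trading days immediately preceding that block.

```python
def addv(volume, close_px, k, m=21):
    """ADDV for the date at 0-based index k (lists ordered most recent first).

    Uses the m days strictly before that date (indices k+1 .. k+m), so the
    value is out-of-sample for the date itself, as in Eq. (4).
    """
    return sum(volume[j] * close_px[j] for j in range(k + 1, k + 1 + m)) / m


def select_universe(data, k, top=2000, m=21):
    """Top-`top` tickers by ADDV; data: ticker -> (volumes, unadjusted closes)."""
    scores = {t: addv(v, c, k, m) for t, (v, c) in data.items()
              if min(len(v), len(c)) > k + m}          # need m prior days
    return sorted(scores, key=lambda t: (-scores[t], t))[:top]


def rebalance_intervals(n_days=1260, step=21):
    """Split the backtest window into consecutive 21-day blocks of indices."""
    return [range(s, min(s + step, n_days)) for s in range(0, n_days, step)]
```

Usage: for each block in rebalance_intervals(), compute select_universe(data, max(block)) once and hold that universe for every day in the block.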
3.3. Backtesting
We run our simulations over a period of 5 years (more precisely, 1260 trading days going back from 5 September 2014, inclusive). The annualized return-on-capital (ROC) is computed as the average daily P&L divided by the intraday investment level $I$ (with no leverage) and multiplied by 252. The annualized Sharpe ratio (SR) is computed as the daily Sharpe ratio multiplied by $\sqrt{252}$. Cents-per-share (CPS) is computed as the total P&L in cents (not dollars) divided by the total shares traded.
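The three performance metrics can be computed from a daily P&L series as in this Python sketch. Taking the daily Sharpe ratio as the mean over the sample standard deviation of daily P&L is an assumption (the text does not specify the estimator), and the function name is hypothetical.

```python
import math
from statistics import mean, stdev


def performance(daily_pnl, total_shares, investment):
    """Annualized ROC, annualized Sharpe ratio and cents-per-share.

    daily_pnl: daily P&L in dollars; total_shares: total shares traded
    (sum of Q_is); investment: the intraday investment level I.
    """
    avg = mean(daily_pnl)
    roc = 252 * avg / investment                      # annualized ROC
    sharpe = math.sqrt(252) * avg / stdev(daily_pnl)  # daily SR x sqrt(252)
    cps = 100 * sum(daily_pnl) / total_shares         # cents, not dollars
    return roc, sharpe, cps
```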
3.4. Optimized Alphas
The optimized alphas are based on the expected returns $E_{is}$ defined in Equation (1), optimized via Sharpe ratio maximization using heterotic risk models [61] built on the industry classifications we are testing: GICS, BICS, SIC, FF48 (see [70] for its definition) and FF49 (see [71] for its definition). FF48 and FF49 have just one level by construction, while GICS, BICS and SIC have multiple levels. We take these industry classifications at their respective most granular levels (sub-industries for GICS and BICS, and industries for SIC; see above). We then run the R function qrm.het() in Appendix B of [61] with the following inputs: ret is the matrix of returns $R_{is}$ defined in Equation (2); ind is the binary $N\times K$ industry classification matrix (each element equals 1 if the corresponding ticker belongs to the corresponding industry, and 0 otherwise; for SIC, this matrix is in the file SIC.IND.CLASS.txt discussed above); mkt.fac = T (this adds the “market” factor, so that effectively we have a 2-level industry classification and the factor covariance matrix is invertible; see [61] for details); and rm.sing.tkr = F (the default). As in [61], we compute the heterotic risk model covariance matrix $\Gamma_{ij}$ every 21 trading days (same as for the universe). For each date (we omit the index $s$), we maximize the Sharpe ratio subject to the dollar neutrality constraint and position bounds (which in this case are the same as trading bounds because the strategy is purely intraday):
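$$S = \frac{\sum_{i=1}^{N} H_i\, E_i}{\sqrt{\sum_{i,j=1}^{N} \Gamma_{ij}\, H_i\, H_j}} \rightarrow \max \tag{5}$$
$$\sum_{i=1}^{N} H_i = 0 \tag{6}$$
$$|H_i| \leq 0.01\, A_i \tag{7}$$
$$\sum_{i=1}^{N} |H_i| = I \tag{8}$$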
Here, $A_i$ is the ADDV defined in Equation (4) (with the date index $s$ omitted). Equation (6) is the dollar neutrality constraint, Equation (7) imposes the aforesaid trading bounds, and Equation (8) ensures that the portfolio has the investment level $I$. In the presence of bounds, computing $H_i$ requires an iterative procedure, for which we use the R code in Appendix C of [61] (which also contains detailed documentation).
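To see what the optimization does before bounds enter, consider a simplified version of the problem: maximize (5) subject only to (6) and (8), dropping the bounds (7). It then has a closed-form solution, $H \propto \Gamma^{-1}(E - \mu\,\mathbf{1})$, where $\mu$ is fixed by dollar neutrality, followed by rescaling to $\sum_i |H_i| = I$. The Python sketch below implements this simplified case only. It is not the iterative bounded procedure from Appendix C of [61], and the function names are hypothetical. It also assumes that $E$ is not proportional to $\mathbf{1}$, since otherwise every holding would be zero.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dollar_neutral_sharpe_weights(gamma, expected, investment):
    """Max-Sharpe holdings with sum(H) = 0 and sum(|H|) = I, without bounds (7)."""
    n = len(expected)
    x = solve(gamma, expected)           # Gamma^{-1} E
    y = solve(gamma, [1.0] * n)          # Gamma^{-1} 1
    mu = sum(x) / sum(y)                 # chosen so that sum(H) = 0
    h = [xi - mu * yi for xi, yi in zip(x, y)]
    scale = investment / sum(abs(v) for v in h)   # positive scale keeps S >= 0
    return [scale * v for v in h]
```

In the bounded case, positions that hit $|H_i| = 0.01\,A_i$ must be fixed and the remaining positions re-solved, which is why an iterative procedure is needed there.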
3.5. Simulation Results
Table 3 (also see Figure 1) summarizes the simulation results. (Note that the result for BICS sub-industries differs from that in [61] because the universes here and in [61] are different: in [61], the universe is based on BICS only, whereas here we require GICS, BICS and SIC assignments.) GICS sub-industries slightly outperform BICS sub-industries, which in turn outperform SIC industries. FF48 and FF49 underperform SIC industries, which we attribute to their lower granularity. For comparison purposes, Table 3 also includes simulation results for BICS where we replace BICS sub-industries by BICS industries and by BICS sectors. Generally, reducing granularity leads to underperformance in such backtests.
[ Table omitted. See PDF. ]
4. Conclusions
To our knowledge, the comparison of industry classifications at short horizons that we present here, especially using actual trading signals (as opposed to co-movements of returns), is the first of its kind in the published literature. GICS remains a safe choice. It is unfortunate that BICS is no longer supported by Bloomberg. Also, SIC is by no means a “disaster” and, now that we have provided open source code for it, can be widely used in, e.g., preliminary research.
Finally, let us comment that (well-built) industry classifications are intrinsically stable with regard to temporal changes. This is primarily because companies seldom jump industries (let alone less granular groupings such as sectors). After all, fundamental industry classifications are generally expected to be based on pertinent fundamental/economic data, such as companies’ products and services, revenue sources, suppliers, competitors, partners, etc. This simplifies out-of-sample backtesting. Thus, if one obtains an industry classification for a universe of stocks today and uses it to backtest the same universe of stocks over the past year (accounting for any ticker changes, etc., that may have transpired in the interim), the backtest can still be deemed valid, as the likelihood of stocks jumping industries during this period is small. This is not to say that the industry classification provider may not have retroactively changed, say, sub-industry assignments for a few tickers. However, this kind of “in-sampleness” can be present in virtually any third-party data unless the data is literally collected by the end user on each and every historical day.
Acknowledgments
We would like to thank anonymous reviewers for valuable suggestions, which have improved the manuscript.
Author Contributions
The authors contributed equally.
Conflicts of Interest
The authors declare no conflict of interest.
© 2017. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
We provide complete source code for building a fundamental industry classification based on publicly available and freely downloadable data. We compare various fundamental industry classifications by running a horserace of short-horizon trading signals (alphas) utilizing open source heterotic risk models (https://ssrn.com/abstract=2600798) built using such industry classifications. Our source code includes various stand-alone and portable modules, e.g., for downloading/parsing web data, etc.