Lv Tao 1 and Yongtao Hao 1 and Hao Yijie 2 and Shen Chunfeng 3
Academic Editor: Anna M. Gil-Lafuente
1, College of Electronics and Information Engineering, Tongji University, Shanghai 200092, China
2, Rabun Gap-Nacoochee School, Rabun Gap, GA 30568, USA
3, Shanghai Baosight Software Co., Ltd., Shanghai 200092, China
Received 23 December 2016; Revised 31 March 2017; Accepted 5 April 2017; 22 May 2017
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
A time series is a sequence of observations listed in time order. It is among the most commonly encountered data types and touches almost every aspect of human life [1]. Examples include meteorological time series, time series of stock prices (stock time series for short), which are composed of stock price observations, and personal health time series, which consist of observations of blood pressure, temperature, white blood cell count, and so forth.
Research shows that time series have two important features. (a) Historical information affects the future trend [2]. That is, the historical values of the observations influence the future values in the time series. This influence can be described by the period, nonstationarity, varying volatility, and so on, of the time series. (b) History repeats itself [3]. That is, some special subseries recur throughout the entire time series. Because of these two features, time series forecasting of all kinds has become a hot research topic. One such topic is the prediction of stock time series, stock prediction for short. A stock time series is a typical time series: it not only has the features listed above, but the trend of stock prices is also directly related to people's vital interests. Therefore, stock prediction has aroused the interest of a wide variety of researchers.
There are many technical analysis methods for stock prediction, the best known of which is candlestick technical analysis, also called K-line technology analysis in Asia. To study the fluctuation of stock prices in a more intuitive way, people invented the candlestick chart (also called the K-line) to represent stock time series graphically. Taking a daily K-line as an example, one K-line represents the fluctuation of stock prices in one day. It shows the close price, open price, high price, and low price for the day and also reflects the difference and size relationship between any two of these prices (all K-lines in this paper are daily K-lines unless otherwise indicated). If the K-lines of a stock are listed in time order, they form a series that reflects the fluctuation of the stock price over some period, which is called a K-line series. As each K-line consists of four prices, a K-line series is in essence a stock time series with four observations per time point.
In a K-line series, if a K-line subseries contains knowledge that can be used to predict the stock, then this subseries is called a K-line pattern series, a K-line pattern for short. For instance, if the stock price often rises (or often falls) after a certain subseries appears, then this subseries is a typical pattern series. Stock prediction based on K-line patterns is the essence of K-line technology analysis. How to mine K-line patterns and how to use them for prediction are the main research topics of K-line technology analysis.
By manually observing the K-line series of the stock market (or the Japanese rice market), people have found many K-line patterns; the leading figure is the founder of the K-line, Munehisa Honma, a Japanese rice trader in the 18th century. The literature [4, 5] introduces the existing patterns and their features in detail, such as Three Inside Up (TIU), Three Inside Down (TID), and Doji. Some papers [6-10] conclude from experiments that the existing K-line patterns have a good capability for forecasting stock trends. Some other papers [11-15] have studied stock prediction based on these patterns and have achieved some results. However, a number of papers [5, 16-18] challenge these patterns' predictive power. They argue that K-line technology analysis violates the efficient market hypothesis, so stock investment based on K-line patterns is not feasible. They also report experiments showing that the existing K-line patterns have no predictive power.
The analysis above shows that academia disagrees on whether K-line patterns have predictive power. However, few papers analyze why there are two opposing positions. Paper [19] also pays attention to the debate, but it does not analyze the K-line patterns themselves; instead, it attempts to answer the following question: are trend reversals accompanied more often by some types of candlesticks than by others? Paper [19] finds that some types of candlesticks frequently appear close to trend-reversal regions, while others cannot be found in such regions. Although that research shows that K-line patterns exist, it does not explain why there is a debate about their predictive power.
After reviewing the relevant literature, this paper considers that the main reason is that the existing K-line patterns lack a rigorous mathematical definition. For example, the shadow length and body size are not clearly specified in the definitions of K-line patterns, which means that one K-line pattern can take many different shapes, and the predictive power of a pattern may vary a lot across these shapes. If we ignore the shape differences and study the predictive power of a pattern by treating all of its shapes as a whole, instead of further classifying the pattern by its shape features, then the result may be biased. For instance, suppose a TIU pattern has three shapes, shape A, shape B, and shape C, as shown in Figure 1, where shape A is the generic form of the TIU pattern and shapes B and C are infrequent forms. Suppose further that shape A has predictive power and shapes B and C do not. If we ignore the shape differences among the three and study them as a whole, we may come to the wrong conclusion that the TIU pattern has no predictive power. However, if the three shapes are separated based on shape features and studied individually, we reach the correct conclusion that the TIU pattern has predictive power only in shape A.
Figure 1: Three kinds of shapes of TIU pattern.
(a) Shape A
[figure omitted; refer to PDF]
(b) Shape B
[figure omitted; refer to PDF]
(c) Shape C
[figure omitted; refer to PDF]
Another reason is that, because the existing K-line patterns were mined manually, there may be some spurious patterns among them.
To help resolve the debate and verify these two inferences, this paper studies the predictive power of K-line patterns using data mining methods such as pattern recognition, pattern clustering, pattern knowledge mining, and statistical analysis. The rest of this paper is organized as follows. In Section 2, we first briefly introduce the K-line, K-line technology analysis, and K-line patterns. Then we define the similarity match model and the nearest neighbor-clustering algorithm of K-line series. In Section 3, we define the mining method for the patterns' predictive power. Section 4 presents the experimental results and discussion. Section 5 concludes the paper.
2. K-Line and K-Line Series Clustering
First, we give the mathematical definition of a K-line series. Let $KS_i$ represent the i-th K-line series of any stock, and let $D_t^i$ represent the t-th K-line in $KS_i$; then
$$KS_i = \left(D_1^i, D_2^i, \ldots, D_{(KS_i)}^i\right), \qquad D_t^i = \left(C_t^i, O_t^i, H_t^i, L_t^i\right),$$
where $(KS_i)$ is the number of elements in $KS_i$, which is also called the length of $KS_i$. $C_t^i$, $O_t^i$, $H_t^i$, and $L_t^i$ are the t-th day's close price, open price, high price, and low price in $KS_i$, respectively. In this paper, the "( )" symbol indicates the number of elements in a set or series.
2.1. K-Line
2.1.1. K-Line Introduction
As defined in the literature [4-6], a K-line is drawn from four basic elements: close price, open price, high price, and low price. The part between the close price and the open price is drawn as a rectangle called the body of the K-line. The part between the high price and the body is drawn as a line called the upper shadow, and the part between the low price and the body is drawn as a line called the lower shadow. The figure consisting of the upper shadow, lower shadow, and body is called a K-line.
If the open price is lower than the close price, the K-line is called a Yang line, and the body is usually filled with white or green, as shown in Figure 2(a). If the open price is higher than the close price, the K-line is called a Yin line, and the body is usually filled with black or red, as shown in Figure 2(b). If the open price equals the close price, the K-line is called a Doji line, and the body collapses into a single horizontal line, as shown in Figure 2(c). Note that the body colors of Yin and Yang lines differ between the Chinese stock market and European and American stock markets. In the Chinese stock market, the bodies of Yang and Yin lines are red and green, respectively, whereas in European and American stock markets they are green and red, respectively.
Figure 2: K-line chart.
(a) Yang line
[figure omitted; refer to PDF]
(b) Yin line
[figure omitted; refer to PDF]
(c) Doji line
[figure omitted; refer to PDF]
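The three K-line types described in Section 2.1.1 depend only on the relation between the open and close prices. A minimal Python sketch of that rule follows; the function name `classify_kline` and the string labels are illustrative choices, not part of the paper.

```python
def classify_kline(open_price: float, close_price: float) -> str:
    """Classify one K-line by comparing its open and close prices.

    Returns "yang" when the open is below the close, "yin" when the open
    is above the close, and "doji" when the two prices are equal.
    """
    if open_price < close_price:
        return "yang"
    if open_price > close_price:
        return "yin"
    return "doji"


# Example: a day that opens at 10.0 and closes at 10.5 is a Yang line.
print(classify_kline(10.0, 10.5))
```

With real price data, exact equality of floating-point prices can be fragile; comparing prices in integer ticks is one way to make the Doji test robust.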
2.1.2. K-Line Technology Analysis
First, we introduce and define some key concepts of K-line technology analysis. Let Dt represent the t-th day's K-line of any stock.
(1) Moving Average. It is the average of the stock price over some period. The three-day moving average at time Dt is defined by [figure omitted; refer to PDF] where Ct denotes the close price of Dt.
(2) K-Line Trend. It describes the trend of the K-line, which is either an uptrend or a downtrend. Dt is said to be in a downtrend if [figure omitted; refer to PDF] with at most one violation of the inequalities. An uptrend is defined analogously.
(3) Stock Price Trend. It describes the general trend of stock prices over some period, which is either an uptrend or a downtrend. If the future trend of the stock price is rising, it is called a bullish market. In contrast, if the future trend of the stock price is falling, it is called a bearish market. A more intense rising or falling trend indicates a more typical bullish or bearish market. The capability of a K-line pattern for predicting a bullish market and a bearish market is defined in formulas (17) and (18), respectively.
Note that the concepts of "moving average" and "K-line trend" are defined in paper [6], while the concept of stock price trend is first defined in this paper.
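The moving average and K-line trend concepts can be sketched in Python as below. The formulas themselves are not reproduced in this text, so the sketch rests on assumptions that should be checked against paper [6]: the three-day average is taken over the close prices of days t-2, t-1, and t, and a downtrend means that the moving average strictly decreases over `window` consecutive comparisons ending at day t, with at most one violation. The default `window=6` and all function names are hypothetical.

```python
from typing import Sequence


def moving_average_3(closes: Sequence[float], t: int) -> float:
    """Three-day moving average of close prices ending at index t (needs t >= 2)."""
    return (closes[t - 2] + closes[t - 1] + closes[t]) / 3.0


def is_downtrend(closes: Sequence[float], t: int,
                 window: int = 6, max_violations: int = 1) -> bool:
    """Check for a downtrend at index t.

    Assumption: the trend is tested on `window` consecutive comparisons
    MA(s) > MA(s + 1) for s = t - window, ..., t - 1, and at most
    `max_violations` of them may fail.
    """
    if t - window < 2:
        return False  # not enough history to compute every moving average
    mas = [moving_average_3(closes, s) for s in range(t - window, t + 1)]
    violations = sum(1 for a, b in zip(mas, mas[1:]) if not a > b)
    return violations <= max_violations


falling = [float(p) for p in range(20, 10, -1)]  # 20, 19, ..., 11
print(is_downtrend(falling, t=9))
```

An uptrend check would mirror `is_downtrend` with the inequality reversed.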
2.1.3. K-Line Patterns
Many K-line patterns have been mined up to now, as shown in the literature [4-6]. Limited by space, only the TIU and TID patterns are introduced below. Let $KS = (D_t, D_{t+1}, D_{t+2})$ represent a three-day K-line series.
The conditions for KS to be a TIU pattern are as follows: (1) $D_t$ is in a downtrend, and $C_t < O_t$. (2) $C_{t+1} > O_{t+1}$, $O_t \ge C_{t+1} > C_t$, and $O_t \ge O_{t+1} > C_t$, where at most one of the two equalities holds. That is, the second day t+1 is a Yang line and must be contained within the body of the first day. (3) $C_{t+2} > O_{t+2}$ and $C_{t+2} > O_t$. That is, the third day t+2 is a Yang line and closes above the open of the first day. A standard TIU pattern is shown in Figure 3(a).
Figure 3: The standard K-line series charts of TIU and TID.
(a) TIU
[figure omitted; refer to PDF]
(b) TID
[figure omitted; refer to PDF]
According to the existing literature, TIU is a trend-reversal pattern that gives a bullish market signal. This means that when the TIU pattern appears, stock prices are likely to turn from a downtrend into an uptrend; that is, the market changes from bearish to bullish and stock prices rise gradually.
The conditions for KS to be a TID pattern are as follows: (1) $D_t$ is in an uptrend, and $C_t > O_t$. (2) $C_{t+1} < O_{t+1}$, $C_t \ge C_{t+1} > O_t$, and $C_t \ge O_{t+1} > O_t$, where at most one of the two equalities holds. That is, the second day t+1 is a Yin line and must be contained within the body of the first day. (3) $C_{t+2} < O_{t+2}$ and $C_{t+2} < O_t$. That is, the third day t+2 is a Yin line and its close is lower than the first day's open. A standard TID pattern is shown in Figure 3(b).
According to the existing literature, TID is a trend-reversal pattern that gives a bearish market signal. This means that after the TID pattern appears, stock prices are likely to turn from an uptrend into a downtrend; that is, the market changes from bullish to bearish and stock prices fall gradually.
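The TIU and TID conditions translate directly into inequality checks on three consecutive K-lines. The sketch below is illustrative: the `KLine` tuple, the function names, and the choice to pass the first day's trend in as a boolean (computed elsewhere) are assumptions, not the paper's implementation. Because the second day's open and close differ, the two equalities in condition (2) can never hold together, so no extra check is needed for that clause.

```python
from typing import NamedTuple


class KLine(NamedTuple):
    open: float
    high: float
    low: float
    close: float


def is_tiu(d0: KLine, d1: KLine, d2: KLine, first_day_downtrend: bool) -> bool:
    """Three Inside Up: Yin line in a downtrend, then two Yang lines."""
    return (
        first_day_downtrend and d0.close < d0.open      # condition (1)
        and d1.close > d1.open                          # day 2 is a Yang line
        and d0.open >= d1.close > d0.close              # day 2 close inside day 1 body
        and d0.open >= d1.open > d0.close               # day 2 open inside day 1 body
        and d2.close > d2.open and d2.close > d0.open   # condition (3)
    )


def is_tid(d0: KLine, d1: KLine, d2: KLine, first_day_uptrend: bool) -> bool:
    """Three Inside Down: Yang line in an uptrend, then two Yin lines."""
    return (
        first_day_uptrend and d0.close > d0.open        # condition (1)
        and d1.close < d1.open                          # day 2 is a Yin line
        and d0.close >= d1.close > d0.open              # day 2 close inside day 1 body
        and d0.close >= d1.open > d0.open               # day 2 open inside day 1 body
        and d2.close < d2.open and d2.close < d0.open   # condition (3)
    )


days = (KLine(10.0, 10.1, 9.4, 9.5), KLine(9.6, 9.95, 9.55, 9.9), KLine(9.9, 10.3, 9.85, 10.2))
print(is_tiu(*days, first_day_downtrend=True))
```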
2.2. Similarity Match of K-Line Series
The similarity match of K-line series is an essential and basic task for K-line series clustering. In the literature, however, few papers focus on it. Only paper [20] studies the similarity match method and search algorithm of K-line series using image retrieval technology. In addition, papers [19, 21] propose similarity match models of K-line series based on the traditional Euclidean distance.
From the viewpoint of stock prediction, the similarity of K-line series refers to the trend similarity of the K-lines in the series. The K-line trend is determined by the changes in close price, open price, high price, and low price and by the size relationship between the close price and the open price. Therefore, to match two K-line series we should compare K-line price changes rather than price values. Because price changes are not shown directly in the K-line chart, the similarity match models in the literature [19-21] use the distance between K-line prices rather than the distance between K-line price changes. These models are therefore based on price values rather than price changes, and they cannot accurately measure the similarity of the stock price trends in the K-line series.
For example, assume that we need to match two K-line series $KS_i$ and $KS_j$ of equal length, $(KS_i) = (KS_j)$, and let $Sim_{i,j}$ denote their similarity. Let $C_t^i = 10$, $C_{t+1}^i = 10.5$, $C_t^j = 20$, and $C_{t+1}^j = 21$. Let $RC_{t+1,t}^i$ denote the close price change rate of $KS_i$ at day $t+1$, calculated by $(C_{t+1}^i - C_t^i)/C_t^i$, and let $SRC_{t+1,t}^{i,j}$ denote the similarity between $RC_{t+1,t}^i$ and $RC_{t+1,t}^j$. Then $RC_{t+1,t}^i = RC_{t+1,t}^j = 5\%$, so $SRC_{t+1,t}^{i,j} = 1$. The similarity match models in the literature [19-21] cannot produce this correct result, because they compare the price values themselves, which differ by a factor of two. The same problem occurs when calculating the similarity of the open, high, or low price.
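The numbers in this example can be checked with a few lines of Python. The sketch contrasts the close price change rates, which are identical for the two series, with a Euclidean distance on raw prices, which is large. The helper name `close_change_rate` is illustrative.

```python
import math


def close_change_rate(prev_close: float, close: float) -> float:
    """Relative change of the close price from one day to the next."""
    return (close - prev_close) / prev_close


rc_i = close_change_rate(10.0, 10.5)   # series i: 10 -> 10.5
rc_j = close_change_rate(20.0, 21.0)   # series j: 20 -> 21

# Both series rise by 5%, so a change-based comparison sees them as identical.
same_trend = math.isclose(rc_i, rc_j)

# A Euclidean distance on the raw close prices sees two very different series.
price_distance = math.hypot(10.0 - 20.0, 10.5 - 21.0)

print(same_trend, price_distance)
```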
Therefore, this paper proposes a new similarity match model based on K-line price changes to measure the trend similarity between two K-line series. In this model, the similarity of K-line series is composed of two parts: one is the shape similarity of K-line, which is the similarity of the corresponding K-line's shape features in the two K-line series; the other is the position similarity of K-line, which is the similarity of the corresponding K-line's position features in the two K-line series. Therefore, this paper will define K-line series' shape similarity model and position similarity model, respectively. Then based on these two kinds of similarity models, the similarity model of the entire K-line series could be built.
2.2.1. The Shape Similarity of K-Line Series
Based on the shape features of the K-line, this paper uses a shape distance to measure the shape similarity between two K-lines. First, three components of the K-line shape are extracted from its structure: the upper shadow, the lower shadow, and the body. Second, a similarity match method is defined for each of the three components. Finally, the shape similarity of the K-line is calculated by summing the three component similarities. Let $D_t^i$ and $D_t^j$ denote the t-th day's K-line of $KS_i$ and $KS_j$, respectively. The shape similarity model of K-line series is defined as follows:
(1) Let $US_i(t)$ denote the upper shadow length of $D_t^i$, as defined in the following formula: [figure omitted; refer to PDF] where $C_{t-1}^i \times 0.1$ is used to normalize the upper shadow length. Under the regulations of the Chinese A-share market, the daily fluctuation of a stock price cannot exceed 10% of the previous day's close price, so $C_{t-1}^i \times 0.1$ can be used to normalize the lengths of the K-line's upper shadow, lower shadow, and body.
Let $SimUS_{i,j}(t)$ denote the upper shadow similarity between $D_t^i$ and $D_t^j$, as defined by [figure omitted; refer to PDF]
(2) Let $LS_i(t)$ denote the lower shadow length of $D_t^i$, as defined in the following formula: [figure omitted; refer to PDF]
Let $SimLS_{i,j}(t)$ denote the lower shadow similarity between $D_t^i$ and $D_t^j$, as defined by [figure omitted; refer to PDF]
(3) Let $B_i(t)$ denote the body length of $D_t^i$, as defined in the following formula: [figure omitted; refer to PDF]
Let $SimBody_{i,j}(t)$ denote the body similarity between $D_t^i$ and $D_t^j$, as defined by [figure omitted; refer to PDF]
(4) Let $SimS_{i,j}(t)$ denote the shape similarity between $D_t^i$ and $D_t^j$, as defined by
$$SimS_{i,j}(t) = w_{Body}\,SimBody_{i,j}(t) + w_{US}\,SimUS_{i,j}(t) + w_{LS}\,SimLS_{i,j}(t),$$
where $w_{Body}$, $w_{US}$, and $w_{LS}$ represent the weights of $SimBody_{i,j}(t)$, $SimUS_{i,j}(t)$, and $SimLS_{i,j}(t)$, respectively.
(5) Let $SSim_{i,j}$ denote the shape similarity between $KS_i$ and $KS_j$, as defined by
$$SSim_{i,j} = \sum_{t=1}^{(KS_i)} w_{St}\,SimS_{i,j}(t),$$
where $w_{St}$ represents the weight of $SimS_{i,j}(t)$. Because each K-line can be given a different weight, K-line series with special shape features can be identified well.
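The shape features that feed this model can be sketched as follows. The exact formulas are not reproduced in this text, so the sketch is an assumption built from the prose: the upper shadow runs from the high price to the top of the body, the lower shadow from the bottom of the body to the low price, and all three lengths are divided by 10% of the previous day's close. Treating the body length as unsigned is a further assumption, and `shape_features` is a hypothetical name.

```python
from typing import Tuple


def shape_features(open_price: float, high: float, low: float,
                   close: float, prev_close: float) -> Tuple[float, float, float]:
    """Return (upper shadow, lower shadow, body) lengths of one K-line.

    Each length is normalized by 0.1 * prev_close, the daily price limit
    of the Chinese A-share market mentioned in the text.
    """
    scale = 0.1 * prev_close
    body_top = max(open_price, close)
    body_bottom = min(open_price, close)
    upper_shadow = (high - body_top) / scale
    lower_shadow = (body_bottom - low) / scale
    body = (body_top - body_bottom) / scale  # assumption: unsigned body length
    return upper_shadow, lower_shadow, body


# A Yang line: open 10.0, high 10.8, low 9.9, close 10.5, previous close 10.0.
print(shape_features(10.0, 10.8, 9.9, 10.5, 10.0))
```

Under this normalization a full limit-up body has length close to 1, which keeps the three features on a comparable scale across stocks with different price levels.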
2.2.2. The Position Similarity of K-Line Series
To compute the similarity between two K-line series, we consider not only their shape similarity but also their position similarity. If we consider only the shape similarity, two K-line series that have the same shape features but different position features will be judged identical.
For example, suppose that the K-line series charts of KSi and KSj are as shown in Figure 4. According to the shape feature definition of the K-line, all of the corresponding K-lines of KSi and KSj have the same shape features. This means that KSi and KSj have identical shape features; that is, SSimi,j = 1. However, as Figure 4 shows, the relative positions of D2i and D2j are different, although D1i and D1j have the same relative position in their series. Therefore, the overall stock price trends of KSi and KSj are not identical; that is, Simi,j < 1. If we consider only the shape similarity, we will draw the wrong conclusion that SSimi,j = Simi,j = 1.
Figure 4: K-line series chart.
(a) KSi
[figure omitted; refer to PDF]
(b) KSj
[figure omitted; refer to PDF]
To solve this problem, we introduce the concept of the K-line coordinate, so that the position match of K-lines can be implemented by defining each K-line's coordinate within its series. In this paper, the sequence number of a K-line in the K-line series is called its x coordinate, and the increase range of its close price is called its y coordinate; in addition, the y coordinate of the first K-line in the series is set to 1. The position similarity model of K-line series based on the K-line coordinate is defined as follows.
(1) Let $(x_t^i, y_t^i)$ denote the coordinate of $D_t^i$, which is defined in the following formula: [figure omitted; refer to PDF]
Let $SimP_{i,j}(t)$ denote the position similarity between $D_t^i$ and $D_t^j$, as defined by [figure omitted; refer to PDF]
(2) Let $PSim_{i,j}$ denote the position similarity between $KS_i$ and $KS_j$, as defined by
$$PSim_{i,j} = \sum_{t=1}^{(KS_i)} w_{Pt}\,SimP_{i,j}(t),$$
where $w_{Pt}$ represents the weight of $SimP_{i,j}(t)$. Because each K-line can be given a different weight, K-line series with special coordinates can be identified well.
2.2.3. The Similarity of K-Line Series
Finally, the similarity of K-line series is obtained from the shape similarity and the position similarity. The similarity match model between $KS_i$ and $KS_j$ is defined by
$$Sim_{i,j} = w_S\,SSim_{i,j} + w_P\,PSim_{i,j}, \tag{15}$$
where $w_S$ and $w_P$ represent the shape similarity weight and the position similarity weight of K-line series, respectively.
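The model is a hierarchy of weighted sums: component similarities combine into a per-day shape similarity, per-day values combine into series-level shape and position similarities, and those two combine into the overall similarity. The sketch below assumes that the per-day component similarities are already computed, uses equal per-day weights 1/(KSi), and takes its default weights from the parameter settings reported in Section 4.1; the function names are hypothetical.

```python
from typing import Sequence


def kline_shape_similarity(sim_body: float, sim_us: float, sim_ls: float,
                           w_body: float = 0.6, w_us: float = 0.2,
                           w_ls: float = 0.2) -> float:
    """Per-day shape similarity as a weighted sum of the three components."""
    return w_body * sim_body + w_us * sim_us + w_ls * sim_ls


def series_similarity(shape_sims: Sequence[float], position_sims: Sequence[float],
                      w_s: float = 0.2, w_p: float = 0.8) -> float:
    """Overall similarity of two equal-length K-line series.

    shape_sims[t] and position_sims[t] are the per-day shape and position
    similarities; every day gets the same weight 1 / n.
    """
    if len(shape_sims) != len(position_sims) or not shape_sims:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(shape_sims)
    ssim = sum(shape_sims) / n      # series-level shape similarity
    psim = sum(position_sims) / n   # series-level position similarity
    return w_s * ssim + w_p * psim


# Identical shapes but diverging positions give a similarity below 1.
print(series_similarity([1.0, 1.0, 1.0], [1.0, 0.5, 0.0]))
```

The usage line mirrors the situation of Figure 4: the shape similarity alone would be 1, while the position term pulls the overall similarity down.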
2.3. Cluster of K-Line Series
A more accurate classification of K-line patterns can be obtained by clustering them with a nearest neighbor-clustering algorithm based on the similarity match model of K-line series. The nearest neighbor-clustering algorithm of K-line series (KNNCA) is described in Algorithm 1.
Algorithm 1
Input:
KSet=(KS1, KS2, ..., KSn) // the data set of K-line series
θ // similarity threshold
Output:
CSet // the set of clusters
KNNCA Algorithm:
Assign initial values to the parameters: wS, wP, wBody, wUS, wLS, wSt, wPt;
m=1;
Qm=(KS1); // Qm represents the m-th cluster
CSet=(Qm);
FOR i=2 TO n DO
{
    SimMax=0;
    FOR EACH Qitem IN CSet
        FOR EACH KSj IN Qitem
            Get Simi,j based on formula (15);
            IF (Simi,j>SimMax) THEN
            {
                SimMax=Simi,j;
                f=item; // f is the ID of the cluster whose element is most similar to KSi
            }
        END
    END
    IF (SimMax>θ) THEN
        Qf=Qf ∪ (KSi);
    ELSE
    {
        m=m+1;
        Qm=(KSi);
        CSet=CSet ∪ (Qm); // make the new cluster visible to later iterations
    }
}
RETURN CSet=(Q1, Q2, ..., Qm);
In addition, (Qm) represents the number of elements in Qm. As each K-line series is matched once against all of the K-line series already stored in the clusters, the time complexity and space complexity of KNNCA are both O(n^2).
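Algorithm 1 can be sketched in Python with the similarity model passed in as a function, so that the clustering logic is independent of how formula (15) is evaluated. This is a minimal illustration rather than the authors' implementation; the name `knnca` and the toy similarity in the usage example are assumptions.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def knnca(series: Sequence[T], similarity: Callable[[T, T], float],
          theta: float) -> List[List[T]]:
    """Nearest neighbor clustering in a single pass over the data.

    Each item joins the cluster that contains its most similar earlier
    item, provided that similarity exceeds theta; otherwise it starts a
    new cluster. The first item always starts the first cluster.
    """
    clusters: List[List[T]] = []
    for item in series:
        best_sim = 0.0
        best_cluster: Optional[List[T]] = None
        for cluster in clusters:
            for member in cluster:
                s = similarity(item, member)
                if s > best_sim:
                    best_sim, best_cluster = s, cluster
        if best_cluster is not None and best_sim > theta:
            best_cluster.append(item)
        else:
            clusters.append([item])
    return clusters


def toy_similarity(a: float, b: float) -> float:
    """Stand-in for formula (15): similarity decays with distance."""
    return 1.0 / (1.0 + abs(a - b))


print(knnca([1.0, 1.1, 5.0, 5.2, 1.05], toy_similarity, theta=0.75))
```

Because every new item is compared with all earlier items, the number of similarity evaluations grows quadratically with the number of series, in line with the time complexity stated above. The result also depends on the input order, which is a general property of single-pass nearest neighbor clustering.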
3. Mining of Patterns' Predictive Power
We mine and analyze the patterns' predictive power in the following steps.
(1) Pattern Recognition. Based on the definition of a K-line pattern (such as TIU or TID), we identify all the K-line series belonging to that pattern; they form a set (KSet).
(2) Pattern Clustering. We use the KNNCA algorithm to cluster KSet and obtain the set of clusters (CSet), in which different clusters represent different shapes of the same pattern.
(3) Knowledge Mining. We define some statistical indicators of stock prices, which we use to mine stock prediction knowledge from each cluster.
The pattern's predictive power is obtained primarily by analyzing the trend of the pattern's consequent K-line series. Paper [22] found that K-line technology is suited to short-term investment prediction and that the most efficient time period for prediction is 10 days. Therefore, we mainly analyze the close price trend of the pattern's consequent K-line series over 10 days. Let $KS = (D_t, D_{t+1}, D_{t+2})$ denote a three-day K-line pattern; its consequent K-line series is denoted by CKS. The statistical indicators of CKS are defined as follows.
(a) Let $C_k$ denote the k-th close price of CKS, let $P_U(C_k)$ denote the probability that the trend of $C_k$ is an uptrend, and let $P_D(C_k)$ denote the probability that the trend of $C_k$ is a downtrend. $P_U(C_k)$ and $P_D(C_k)$ are calculated by
$$P_U(C_k) = \frac{(QU_{m,k})}{(Q_m)}, \qquad P_D(C_k) = \frac{(QD_{m,k})}{(Q_m)},$$
where $(QU_{m,k})$ represents the number of patterns in $Q_m$ meeting the condition $C_k \ge C_{t+2}$, $(QD_{m,k})$ represents the number of patterns in $Q_m$ meeting the condition $C_k \le C_{t+2}$, and $C_{t+2}$ represents the close price of $D_{t+2}$. $P_U(C_k) > 0.5$ indicates that the future trend of $C_k$ is rising, and $P_D(C_k) > 0.5$ indicates that the future trend of $C_k$ is falling.
(b) Let $P_{Un} \in [0,1]$ denote the probability that the close price will rise in the next n days if the pattern appears, and let $P_{Dn} \in [0,1]$ denote the probability that the close price will fall in the next n days if the pattern appears. $P_{Un}$ and $P_{Dn}$ are calculated by [figure omitted; refer to PDF] where a higher value of $P_{Un}$ or $P_{Dn}$ indicates a stronger capability for predicting a bullish or bearish market.
(4) Analysis. Based on the statistical results, we analyze the pattern's predictive power.
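The indicators in step (3) can be sketched in Python as follows. The daily probability follows the definition in (a). The formula for PUn is not reproduced in this text, so the sketch assumes that PUn is the fraction of the n days whose PU(Ck) exceeds 0.5; this reading is an assumption, although it reproduces the PU10 values listed in Table 1 (for example, 0.6 for Q1 and 0.9 for Q3). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One pattern occurrence: (close price of the pattern's last day,
#                          close prices of the following days).
Occurrence = Tuple[float, Sequence[float]]


def up_probability_by_day(occurrences: Sequence[Occurrence]) -> List[float]:
    """PU(Ck) for each following day k within one cluster."""
    total = len(occurrences)
    n_days = len(occurrences[0][1])
    return [
        sum(1 for ref, future in occurrences if future[k] >= ref) / total
        for k in range(n_days)
    ]


def horizon_score(daily_probs: Sequence[float], threshold: float = 0.5) -> float:
    """Assumed PUn: share of days whose probability exceeds the threshold."""
    return sum(1 for p in daily_probs if p > threshold) / len(daily_probs)


# Daily values of cluster Q1 as listed in Table 1.
q1 = [0.405, 0.532, 0.557, 0.532, 0.544, 0.506, 0.468, 0.506, 0.43, 0.443]
print(horizon_score(q1))
```

The same `horizon_score` function applies to PDn when it is fed the daily downtrend probabilities instead.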
4. Experiment and Result Analysis
4.1. Experiment Data and Method
The transaction data of the Chinese A-share market can be downloaded through the stock API provided by Yahoo Finance. To obtain representative test data, we select the K-line series of the component stocks of the Shanghai 180 index over roughly the latest 10 years (from 2006-01-04 to 2016-08-24). Limited by space, only the predictive power of the TIU and TID patterns is analyzed in the experiments. The parameters of the KNNCA algorithm are set as follows: θ=0.75, wS=0.2, wP=0.8, wBody=0.6, wUS=0.2, wLS=0.2, and wSt=wPt=1/(KSi) (t=1,2,...,(KSi)).
4.2. Experiment One
The aim of the first experiment is to analyze the predictive power of the TIU pattern using the method defined in Section 3. First, based on the definition of TIU, 1516 TIU patterns are identified in the test data. Then we cluster these patterns using the KNNCA algorithm and obtain 554 clusters. We choose the 20 clusters with the most elements for statistical analysis, as shown in Table 1.
Table 1: The experiment result of TIU.
Qm | (Qm) | PU(C1) | PU(C2) | PU(C3) | PU(C4) | PU(C5) | PU(C6) | PU(C7) | PU(C8) | PU(C9) | PU(C10) | PU10 |
Q 0 | 1516 | 0.493 | 0.486 | 0.518 | 0.495 | 0.475 | 0.488 | 0.481 | 0.479 | 0.484 | 0.479 | 0.1 |
Q 1 | 79 | 0.405 | 0.532 | 0.557 | 0.532 | 0.544 | 0.506 | 0.468 | 0.506 | 0.43 | 0.443 | 0.6 |
Q 2 | 42 | 0.405 | 0.429 | 0.5 | 0.452 | 0.405 | 0.452 | 0.429 | 0.476 | 0.524 | 0.476 | 0.1 |
Q 3 | 32 | 0.5 | 0.594 | 0.594 | 0.625 | 0.594 | 0.531 | 0.625 | 0.594 | 0.563 | 0.531 | 0.9 |
Q 4 | 25 | 0.6 | 0.6 | 0.52 | 0.48 | 0.48 | 0.4 | 0.48 | 0.48 | 0.6 | 0.64 | 0.5 |
Q 5 | 23 | 0.435 | 0.478 | 0.609 | 0.565 | 0.522 | 0.391 | 0.478 | 0.348 | 0.391 | 0.391 | 0.3 |
Q 6 | 22 | 0.455 | 0.636 | 0.545 | 0.5 | 0.545 | 0.545 | 0.591 | 0.591 | 0.636 | 0.636 | 0.8 |
Q 7 | 21 | 0.333 | 0.571 | 0.524 | 0.476 | 0.619 | 0.857 | 0.762 | 0.714 | 0.429 | 0.048 | 0.6 |
Q 8 | 21 | 0.524 | 0.476 | 0.524 | 0.429 | 0.476 | 0.476 | 0.429 | 0.476 | 0.429 | 0.429 | 0.2 |
Q 9 | 21 | 0.524 | 0.476 | 0.524 | 0.429 | 0.381 | 0.286 | 0.381 | 0.429 | 0.381 | 0.429 | 0.2 |
Q 10 | 19 | 0.105 | 0.368 | 0.421 | 0.421 | 0.368 | 0.421 | 0.474 | 0.474 | 0.579 | 0.421 | 0.1 |
Q 11 | 19 | 0.579 | 0.474 | 0.421 | 0.368 | 0.263 | 0.368 | 0.316 | 0.263 | 0.211 | 0.211 | 0.1 |
Q 12 | 17 | 0.588 | 0.294 | 0.471 | 0.353 | 0.412 | 0.471 | 0.471 | 0.412 | 0.471 | 0.412 | 0.1 |
Q 13 | 15 | 0.6 | 0.533 | 0.467 | 0.467 | 0.467 | 0.467 | 0.533 | 0.533 | 0.6 | 0.6 | 0.6 |
Q 14 | 15 | 0.533 | 0.6 | 0.667 | 0.6 | 0.533 | 0.6 | 0.6 | 0.667 | 0.6 | 0.6 | 1 |
Q 15 | 14 | 0.5 | 0.071 | 0.5 | 0.429 | 0.429 | 0.357 | 0.286 | 0.286 | 0.286 | 0.429 | 0 |
Q 16 | 14 | 0.643 | 0.571 | 0.714 | 0.714 | 0.714 | 0.5 | 0.643 | 0.571 | 0.571 | 0.714 | 0.9 |
Q 17 | 14 | 0.714 | 0.857 | 0.857 | 0.857 | 0.786 | 0.786 | 0.571 | 0.643 | 0.643 | 0.643 | 1 |
Q 18 | 14 | 0.357 | 0.571 | 0.429 | 0.5 | 0.357 | 0.571 | 0.429 | 0.429 | 0.5 | 0.5 | 0.2 |
Q 19 | 13 | 0.385 | 0.462 | 0.692 | 0.615 | 0.615 | 0.692 | 0.692 | 0.769 | 0.769 | 0.769 | 0.8 |
Q 20 | 12 | 0.5 | 0.667 | 0.667 | 0.667 | 0.75 | 0.583 | 0.583 | 0.667 | 0.583 | 0.583 | 0.9 |
In Table 1, Q0 represents the cluster composed of all 1516 TIU patterns. Its PU10 is only 0.1, which suggests that, taken as a whole, TIU may be a spurious pattern for predicting a bullish market. However, after further classifying the TIU patterns, we can see that (1) Q3, Q6, Q16, and so forth have a strong capability for predicting a bullish market, because their PU10 values are all at least 0.8, (2) Q1, Q4, Q7, and so forth have a moderate capability for predicting a bullish market, as their PU10 values lie between 0.5 and 0.7, and (3) Q2, Q5, Q8, and so forth have a weak capability for predicting a bullish market, as their PU10 values are all below 0.5. In particular, the PU10 of Q2 is only 0.1.
By comparing Q3 and Q2, as shown in Figure 5, we can see that Q3 predicts a bullish market while Q2 predicts a bearish market; that is, their predictions are opposite. The results of experiment one show that (1) the predictive power of TIU varies a great deal across shapes and (2) to serve as a better pattern for predicting a bullish market, the TIU pattern needs to be further classified. Both findings are consistent with the analysis in Section 1.
Figure 5: The comparison of predictive power between Q3 and Q2 .
[figure omitted; refer to PDF]
4.3. Experiment Two
The aim of the second experiment is to analyze the predictive power of the TID pattern using the method defined in Section 3. First, based on the definition of TID, 1498 TID patterns are identified in the test data. Then we cluster these patterns using the KNNCA algorithm and obtain 572 clusters. We choose the 20 clusters with the most elements for statistical analysis, as shown in Table 2.
Table 2: The experiment result of TID.
Qm | (Qm) | PU(C1) | PU(C2) | PU(C3) | PU(C4) | PU(C5) | PU(C6) | PU(C7) | PU(C8) | PU(C9) | PU(C10) | PD10 |
Q 0 | 1498 | 0.421 | 0.451 | 0.461 | 0.449 | 0.437 | 0.449 | 0.459 | 0.444 | 0.427 | 0.435 | 0 |
Q 1 | 78 | 0.449 | 0.5 | 0.436 | 0.436 | 0.372 | 0.359 | 0.449 | 0.397 | 0.359 | 0.346 | 0.1 |
Q 2 | 72 | 0.431 | 0.486 | 0.542 | 0.458 | 0.431 | 0.431 | 0.417 | 0.417 | 0.361 | 0.306 | 0 |
Q 3 | 44 | 0.364 | 0.364 | 0.364 | 0.386 | 0.273 | 0.386 | 0.364 | 0.386 | 0.386 | 0.341 | 0.1 |
Q 4 | 39 | 0.513 | 0.359 | 0.359 | 0.333 | 0.231 | 0.282 | 0.308 | 0.282 | 0.282 | 0.282 | 0 |
Q 5 | 23 | 0.174 | 0.261 | 0.304 | 0.391 | 0.348 | 0.391 | 0.391 | 0.435 | 0.435 | 0.391 | 0.1 |
Q 6 | 23 | 0.348 | 0.478 | 0.522 | 0.478 | 0.435 | 0.435 | 0.391 | 0.478 | 0.478 | 0.435 | 0.4 |
Q 7 | 21 | 0.429 | 0.429 | 0.476 | 0.429 | 0.619 | 0.524 | 0.524 | 0.476 | 0.476 | 0.571 | 0 |
Q 8 | 18 | 0.444 | 0.333 | 0.444 | 0.5 | 0.5 | 0.278 | 0.222 | 0.333 | 0.333 | 0.333 | 0.2 |
Q 9 | 18 | 0.333 | 0.667 | 0.611 | 0.389 | 0.444 | 0.444 | 0.389 | 0.389 | 0.389 | 0.389 | 0.1 |
Q 10 | 18 | 0.333 | 0.222 | 0.278 | 0.167 | 0.278 | 0.389 | 0.444 | 0.389 | 0.5 | 0.556 | 0 |
Q 11 | 18 | 0.444 | 0.444 | 0.278 | 0.333 | 0.389 | 0.222 | 0.278 | 0.333 | 0.444 | 0.389 | 0.5 |
Q 12 | 16 | 0.438 | 0.375 | 0.5 | 0.563 | 0.688 | 0.688 | 0.625 | 0.5 | 0.438 | 0.563 | 0.2 |
Q 13 | 15 | 0.267 | 0.267 | 0.467 | 0.4 | 0.267 | 0.6 | 0.533 | 0.467 | 0.4 | 0.4 | 0 |
Q 14 | 13 | 0.385 | 0.308 | 0.308 | 0.385 | 0.308 | 0.462 | 0.231 | 0.231 | 0.308 | 0.308 | 0.4 |
Q 15 | 12 | 0.25 | 0.333 | 0.583 | 0.667 | 0.667 | 0.583 | 0.5 | 0.5 | 0.417 | 0.5 | 0.1 |
Q 16 | 12 | 0.5 | 0.333 | 0.25 | 0.25 | 0.5 | 0.667 | 0.417 | 0.5 | 0.5 | 0.5 | 0.5 |
Q 17 | 12 | 0.25 | 0.833 | 0.75 | 0.667 | 0.583 | 0.5 | 0.5 | 0.417 | 0.417 | 0.583 | 0.4 |
Q 18 | 11 | 0.364 | 0.273 | 0.273 | 0.364 | 0.545 | 0.455 | 0.545 | 0.545 | 0.455 | 0.545 | 0 |
Q 19 | 11 | 0.364 | 0.455 | 0.273 | 0.364 | 0.364 | 0.364 | 0.455 | 0.455 | 0.455 | 0.364 | 0 |
Q 20 | 11 | 0.273 | 0.364 | 0.273 | 0.364 | 0.364 | 0.364 | 0.364 | 0.364 | 0.182 | 0.273 | 0 |
Similarly, the TID pattern as a whole may also be a spurious pattern for predicting a bearish market, because the PD10 of Q0 is 0, where Q0 represents the cluster composed of all 1498 TID patterns. Moreover, after further classifying the TID patterns, we can see that (1) except for Q11 and Q16, all of the clusters have a weak capability for predicting a bearish market, as their PD10 values are all below 0.5, and (2) even Q11 and Q16 do not have a high PD10, which is only 0.5. Therefore, we consider the TID pattern to be a spurious pattern, which is also consistent with the expected analysis.
4.4. Experiment Conclusion
From the above experiment, we can draw the following conclusions. (1) The predictive power of a pattern varies a great deal across different shapes. Take TIU, for example: TIU patterns of some shapes have a strong capability for predicting a bullish market, while those of other shapes have the opposite predictive power. Therefore, to analyze the predictive power of a pattern, we should analyze each concrete shape separately. (2) There are definitely some spurious patterns among the existing K-line patterns. Therefore, in order to improve the performance of stock prediction based on K-line patterns, we need to further classify the existing patterns based on the shape feature, identify all the spurious patterns, and choose the patterns with stronger predictive power to predict the stock price.
5. Conclusion
Stock prediction is a popular research field in time series prediction. Although stock price prediction based on K-line patterns is a primary technical analysis method and is widely used in practice, opinions in academia differ on its predictive power. To help resolve the debate, this paper uses data mining methods such as pattern recognition, similarity matching, clustering, and statistical analysis to study the predictive power of K-line patterns. Experimental results show that one reason for the debate is that the definition of K-line patterns is too open and lacks mathematical rigor. The other is that there are some spurious patterns among the existing K-line patterns. In addition, the method presented in the paper can be used not only to test the predictive power of patterns but also for K-line pattern mining and stock prediction. Future work is therefore as follows. (1) It will be a necessary and significant task to identify all the spurious patterns using the proposed method. (2) On the basis of the proposed method, we can research an automatic pattern mining method to discover more useful patterns for stock prediction.
Acknowledgments
The Key Basic Research Foundation of Shanghai Science and Technology Committee, China (Grant no. 14JC1402203), and the Science and Technology Support Program of China (Grant no. 2015BAF10B01) financially supported this work.
[1] J. Lin, S. Williamson, K. D. Borne, "Advances in machine learning & data mining for astronomy," Pattern Recognition in Time Series, pp. 617-645, 2010.
[2] L. Lihui, T. Xiang, Y. Haidong, "Financial time series forecasting based on SVR," Computer Engineering and Applications, vol. 41, no. 30, pp. 221-224, 2005.
[3] R. D. Edwards, J. Magee, W. H. C. Bassetti, Technical Analysis of Stock Trends, 10th edition, CRC Press, Boca Raton, Fla, USA, 2012.
[4] S. Nison, Japanese Candlestick Charting Techniques: A Contemporary Guide to the Ancient Investment Techniques of the Far East, New York Institute of Finance, New York, NY, USA, 1991.
[5] B. R. Marshall, M. R. Young, L. C. Rose, "Candlestick technical trading strategies: can they create value for investors?" Journal of Banking and Finance, vol. 30, no. 8, pp. 2303-2323, 2006.
[6] G. Caginalp, H. Laurent, "The predictive power of price patterns," Applied Mathematical Finance, vol. 5, no. 3-4, pp. 181-205, 1998.
[7] K. H. Lee, G. S. Jo, "Expert system for predicting stock market timing using a candlestick chart," Expert Systems with Applications, vol. 16, no. 4, pp. 357-364, 1999.
[8] M. M. Goswami, C. K. Bhensdadia, A. P. Ganatra, "Candlestick analysis based short term prediction of stock price fluctuation using SOM-CBR," in Proceedings of the 2009 IEEE International Advance Computing Conference (IACC '09), pp. 1448-1452, March 2009.
[9] T. H. Lu, J. Chen, Candlestick charting in European stock markets, no. 2, pp. 20-25, 2013.
[10] T. H. Lu, Y. M. Shiu, "Tests for two day candlestick patterns in the emerging equity market of Taiwan," Emerging Markets Finance & Trade, vol. 48, no. 1, pp. 41-57, 2014.
[11] H. Li, W. W. Y. Ng, J. W. T. Lee, B. Sun, D. S. Yeung, "Quantitative study on candlestick pattern for Shenzhen stock market," in Proceedings of the 2008 IEEE International Conference on Systems, Man and Cybernetics (SMC '08), pp. 54-59, October 2008.
[12] W. Xiao, W. W. Y. Ng, M. Firth, D. S. Yeung, G.-Y. Cai, J.-C. Li, B.-B. Sun, "L-GEM based MCS aided candlestick pattern investment strategy in the Shenzhen stock market," in Proceedings of the 2009 International Conference on Machine Learning and Cybernetics, pp. 243-248, July 2009.
[13] T. Kamo, C. Dagli, "Hybrid approach to the Japanese candlestick method for financial forecasting," Expert Systems with Applications, vol. 36, no. 3, pp. 5023-5030, 2009.
[14] M. Jasemi, A. M. Kimiagari, A. Memariani, "A modern neural network model to do stock market timing on the basis of the ancient investment technique of Japanese Candlestick," Expert Systems with Applications, vol. 38, no. 4, pp. 3884-3890, 2011.
[15] S. Barak, J. H. Dahooie, T. Tichý, "Wrapper ANFIS-ICA method to do stock market timing and feature selection on the basis of Japanese Candlestick," Expert Systems with Applications, vol. 42, no. 23, pp. 9221-9235, 2015.
[16] J. H. Fock, C. Klein, B. Zwergel, "Performance of candlestick analysis on intraday futures data," The Journal of Derivatives, vol. 13, no. 1, pp. 28-40, 2005.
[17] B. R. Marshall, M. R. Young, R. Cahan, "Are candlestick technical trading strategies profitable in the Japanese equity market?" Review of Quantitative Finance and Accounting, vol. 31, no. 2, pp. 191-207, 2008.
[18] M. J. Horton, "Stars, crows, and doji: the use of candlesticks in stock selection," Quarterly Review of Economics and Finance, vol. 49, no. 2, pp. 283-294, 2009.
[19] L. J. Chmielewski, M. Janowicz, A. Orlowski, "Prediction of trend reversals in stock market by classification of Japanese candlesticks," in Proceedings of the 9th International Conference on Computer Recognition Systems (CORES '15), vol. 403, pp. 641-647.
[20] C.-F. Tsai, Z.-Y. Quan, "Stock prediction by searching for similarities in candlestick charts," ACM Transactions on Management Information Systems, vol. 5, no. 2, 2014.
[21] L. Chmielewski, M. Janowicz, J. Kaleta, A. Orlowski, "Pattern recognition in the Japanese candlesticks," Advances in Intelligent Systems and Computing, vol. 342, pp. 227-234, 2015.
[22] G. L. Morris, R. Litchfield, Candlestick Charting Explained: Timeless Techniques for Trading Stocks and Futures, 2nd edition, McGraw-Hill, New York, NY, USA, 2006.
Copyright © 2017 Lv Tao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Stock price prediction based on K-line patterns is the essence of candlestick technical analysis. However, there is some dispute in academia over whether K-line patterns have predictive power. To help resolve the debate, this paper uses the data mining methods of pattern recognition, pattern clustering, and pattern knowledge mining to study the predictive power of K-line patterns. A similarity match model and a nearest neighbor-clustering algorithm are proposed for solving the problems of similarity matching and clustering of K-line series, respectively. The experiment tests the predictive power of the Three Inside Up pattern and the Three Inside Down pattern, using as the testing dataset the K-line series data of the Shanghai 180 index component stocks over the most recent 10 years. Experimental results show that (1) the predictive power of a pattern varies a great deal across different shapes and (2) each of the existing K-line patterns requires further classification based on the shape feature to improve the prediction performance.