Abstract: Large data archives are now kept by electronic banking and financial institutions. Their size makes manual analysis impossible, which has led to models that support the decision-making process. This paper discusses the areas in which data mining techniques can be applied in banks and financial institutions to improve their business, namely risk management, credit risk, financial risk, etc.
Keywords: bank, CRM, data mining, financial institutions, system.
1. Introduction
Data are among the most important and valuable assets of a profit-oriented organization. Data mining allows knowledge to be extracted from existing data, predicting future values and thereby helping to optimize business decisions. Because the volume of data is large, data mining techniques can help solve problems in financial and banking institutions by uncovering patterns of correlation between pieces of information and making them evident.
In addition to data mining techniques, there is also Business Intelligence, which can identify customers and products for which rules can be set in order to improve revenue management.1
Data mining tools applied to large databases can facilitate:
* automatic discovery of previously unknown patterns;
* prediction of trends and behaviors.
The IT sector has helped the financial and banking sectors cope with the challenges of the economy. Strong relationships with customers are an important factor in the success of banks.2
Data Mining and Business Intelligence can be applied in the following areas:3
* Portfolio management;
* Customer relationship management. Banking and financial institutions have large databases of customer information from which knowledge can be extracted. Data Mining can be used in all three stages of the customer relationship: the customer acquisition stage, the customer value growth stage and the control stage.4 The information collected is used for different purposes, such as marketing research or market analysis according to customer needs;5
* Risk management. Data Mining techniques help analyze customers: whether they pay their credit card installments on time or are late, distinguishing reliable borrowers from risky ones, identifying loans likely to become bad loans, etc.
* Financial risk. Risk can generally be defined as the outcome of an event that has adverse consequences for a trader. The result of any banking transaction can be said to be subject to potential risk. The most important types are currency risk, equity risk, credit risk, interest rate risk, etc.
* Credit risk. Credit risk is one of the main components of the lending process. It is the risk of losses or unrealized profits arising from the non-fulfillment of contractual obligations or from causes external to the contract.
2. Algorithms and Data Mining techniques used
The algorithms used in data mining can solve a variety of problems that can be modeled, and the functions they perform fall into two categories:
a. Supervised learning has the following features:
* it uses input data together with their known responses;
* it is equivalent to optimizing an error function that measures the difference between the responses the model should produce and those it actually produces.
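To make the error-function view of supervised learning concrete, the following is a minimal sketch, not taken from the paper: a linear model y ≈ w·x + b is fitted by gradient descent on the mean squared error between the responses it produces and the known responses. The data, learning rate and number of epochs are illustrative assumptions.

```python
def mse(w, b, xs, ys):
    """Error function: mean squared difference between produced and required responses."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_linear(xs, ys, lr=0.05, epochs=2000):
    """Minimize the MSE of y ~ w*x + b by plain gradient descent (illustrative settings)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical inputs paired with their known responses (generated by y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

Training drives the error function towards zero, and the learned parameters approach the ones that generated the responses.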
The algorithms used are:
* Decision trees, used in classification. A decision tree helps select the best course of action under uncertainty, and decision tree algorithms are applied in data mining to business problems.
A decision tree consists of:
* decision points;
* chance points;
* states of nature;
* payoffs (gains).
* Generalized linear models used in Classification and Regression;
* Naive Bayes, used in classification. The Bayesian technique is less widely implemented in data mining; it is a classification method named after Thomas Bayes (1702-1761). It analyzes the relationships between independent and dependent variables, using probability theory to quantify each relationship.
* Support Vector Machine, used in classification and regression. The algorithm was introduced in 1992 by Boser, Guyon and Vapnik and has since become one of the most popular, because it is easy to use and solves a wide variety of problems.
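The paper does not show how Naive Bayes computes its probabilities. The following is a minimal sketch of a categorical Naive Bayes classifier, assuming that features are independent given the class (the "naive" assumption) and using Laplace smoothing; the function names and the loan-applicant data are hypothetical and do not come from the paper's Oracle Data Mining application.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and, per class, how often each feature value occurs."""
    priors = Counter(labels)
    counts = defaultdict(Counter)   # (feature index, class) -> Counter of feature values
    values = defaultdict(set)       # feature index -> set of values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            counts[(i, label)][value] += 1
            values[i].add(value)
    return priors, counts, values


def predict_nb(model, row, alpha=1.0):
    """Return the class with the highest posterior, treating features as independent."""
    priors, counts, values = model
    total = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label, n_label in priors.items():
        score = math.log(n_label / total)          # log P(class)
        for i, value in enumerate(row):
            # Laplace smoothing keeps unseen feature values from zeroing the product.
            p = (counts[(i, label)][value] + alpha) / (n_label + alpha * len(values[i]))
            score += math.log(p)                   # + log P(feature value | class)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical loan applicants: (income level, employment status) -> repayment outcome.
rows = [("high", "employed"), ("high", "employed"), ("low", "unemployed"),
        ("low", "employed"), ("medium", "employed"), ("low", "unemployed")]
labels = ["good", "good", "bad", "good", "good", "bad"]
model = train_nb(rows, labels)
print(predict_nb(model, ("low", "unemployed")))   # -> bad
print(predict_nb(model, ("high", "employed")))    # -> good
```

Working in log-probabilities avoids numerical underflow when many features are multiplied together.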
b. Unsupervised learning has the following features:
* it uses only input data, without known responses;
* it is based on statistical properties of the data;
* it is based on a conceptual model extracted from the data rather than on an error function.
The algorithms used are:6
* K-Means. This is one of the simplest algorithms for solving clustering problems. The number of clusters (k, denoted c in the objective function) is fixed a priori, each cluster is represented by its center, and the algorithm aims to minimize an objective function, namely the squared-error function:
$$J(V) = \sum_{j=1}^{c} \sum_{i=1}^{c_j} \lVert x_i - v_j \rVert^2$$
where:
||x_i - v_j|| is the Euclidean distance between the point x_i and the cluster center v_j;
c_j is the number of points in cluster j;
c is the number of clusters.
This algorithm has the following advantages:
* it is easy to understand and fast;
* it gives the best results when the clusters in the data set are distinct or well separated from each other.
It has the following disadvantages:
* the number of clusters must be specified in advance;
* if two or more classes overlap, the algorithm cannot separate them and does not determine the number of classes.
* Non-Negative Matrix Factorization (NMF). It is used to decompose multivariate data. Two algorithms are used: one minimizes the least-squares error and the other minimizes the Kullback-Leibler divergence.
* One Class Support Vector Machine.
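The K-Means procedure described above can be sketched as Lloyd's algorithm, which alternates an assignment step and an update step and reports the squared-error objective J. This is a minimal illustration, not the paper's implementation: initializing the centers with c randomly chosen data points is an assumption (other schemes, such as k-means++, exist), and the two-attribute customer data are hypothetical.

```python
import math
import random


def assign(points, centers):
    """Assignment step: index of the nearest center (Euclidean distance) for each point."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]


def objective(points, centers, labels):
    """Squared-error objective J: sum of ||x_i - v_j||^2 over points and their centers."""
    return sum(math.dist(p, centers[j]) ** 2 for p, j in zip(points, labels))


def kmeans(points, c, max_iter=100, seed=0):
    """Lloyd-style k-means with c clusters; initial centers are c distinct data points."""
    centers = random.Random(seed).sample(points, c)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(c):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Update step: move the center to the mean of its members (keep it if empty).
            new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                               if members else centers[j])
        if new_centers == centers:   # converged: no center moved
            break
        centers = new_centers
    labels = assign(points, centers)
    return centers, labels, objective(points, centers, labels)


# Hypothetical customers described by two scaled attributes (e.g. balance, activity).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels, J = kmeans(points, c=2)
print(labels, round(J, 4))
```

Because the two groups are well separated, the algorithm places the first three and the last three customers in different clusters, illustrating the advantage listed above; with overlapping groups the result would depend more on the initial centers.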
The Data Mining techniques used are:
* Association and succession. Association and correlation are used to find items that frequently occur together in large data sets. This technique determines whether or not an event is connected to another event. Types of association rules include multidimensional, quantitative, direct, indirect and multilevel association rules.7 The technique generates models that reveal correlation rules between sets of attributes.
* Clustering. Clustering identifies classes of similar objects. This technique groups transactions with similar behavior into one cluster.8 It is used to group similar entities of a data set so that each group differs markedly from the others.
* Classification. Classification is the most widely used technique in Data Mining; it starts from a set of data in order to build a model. This type of analysis is well suited to fraud detection and credit risk applications. Test data are used to estimate the accuracy of the classification rules.9,10
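Association rules are commonly evaluated with two measures that the text does not name: support (the share of transactions containing all items of a rule) and confidence (how often the right-hand side appears when the left-hand side does). The following is a minimal sketch restricted to rules between pairs of items; the thresholds and the bank-product transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules(transactions, min_support=0.4, min_confidence=0.6):
    """Enumerate rules lhs -> rhs between single items that meet both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b}, transactions)
        if s_ab < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = s_ab / support({lhs}, transactions)   # P(rhs | lhs)
            if conf >= min_confidence:
                found.append((lhs, rhs, s_ab, conf))
    return found


# Hypothetical sets of products held by five bank customers.
transactions = [
    {"current_account", "debit_card", "credit_card"},
    {"current_account", "debit_card"},
    {"current_account", "savings"},
    {"current_account", "debit_card", "savings"},
    {"credit_card", "savings"},
]
for lhs, rhs, s, c in rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

Confidence is not symmetric: here every debit card holder has a current account, but only three of the four current account holders have a debit card.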
3. Application concerning knowledge extraction from large data volumes
To carry out the application, we started from a data warehouse implemented in SQL Developer, based on a questionnaire concerning the market launch of a student credit card at Piraeus Bank. The application, called w_classification_2, performs classification with Oracle Data Mining. In it we selected the V_DM_QUESTIONNAIRE table; the target was SC and the case ID was CODE_CLIENT.
From the connection made, we selected the Support Vector Machine, Decision Tree and Naïve Bayes algorithms and examined the results they produced.
The performance graph for CLAS SVM shows that the algorithm has an average accuracy of 8.3255% and a predictive confidence of 3.9601%.
In the cost matrix for the SVM model, the average accuracy is 8.3255%. For the 0% prediction there are 45 instances related to the value 2578, 45 instances related to the value 3899 and 45 instances related to the value 55689, while for the 100% prediction there are 45 instances related to the value 22247696.
The lift chart shows a high learning rate of the model within quantiles 65 to 96, with cumulative records of 21.1742%. The concept of lift can be understood as a ratio of two percentages: the percentage of positive cases correctly classified by the model and the percentage of actual positive cases in the test data.
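The ratio described above can be computed per quantile, which is how cumulative lift charts are usually built: cases are ranked by their predicted probability of being positive, and the positive rate among the top quantiles is divided by the positive rate in the whole test set. This is a minimal sketch under that assumption; Oracle Data Mining's exact quantile handling may differ, and the scores and outcomes are hypothetical.

```python
def cumulative_lift(scores, actual, n_quantiles=10):
    """Cumulative lift per quantile for cases ranked by descending predicted score.

    Lift at quantile q = (share of actual positives among the top q quantiles)
                         / (share of actual positives in the whole test set).
    Assumes the test set contains at least one positive case.
    """
    ranked = [y for _, y in sorted(zip(scores, actual), key=lambda t: t[0], reverse=True)]
    n = len(ranked)
    base_rate = sum(ranked) / n
    lifts = []
    for q in range(1, n_quantiles + 1):
        top = ranked[:max(1, q * n // n_quantiles)]
        lifts.append((sum(top) / len(top)) / base_rate)
    return lifts


# Hypothetical model scores and actual outcomes (1 = positive) for ten test cases.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print([round(x, 3) for x in cumulative_lift(scores, actual, n_quantiles=5)])
# -> [3.333, 2.5, 1.667, 1.25, 1.0]
```

A lift above 1 in the first quantiles means the model concentrates positive cases at the top of its ranking; the cumulative lift always returns to 1 when the whole test set is included.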
The performance graph for CLAS DT shows that the algorithm has an average accuracy of 59.0909% and a predictive confidence of 57.1429%.
In the cost matrix for the DT model, the average accuracy is 59.0909%.
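The "average accuracy" reported for each model is, to our understanding of the Oracle Data Miner documentation, the mean of the per-class accuracies rather than the overall share of correct predictions; the two diverge when the classes are imbalanced. The sketch below computes both from a confusion matrix. The function name and the two-class matrix are hypothetical, not the paper's cost matrix.

```python
def accuracies(confusion):
    """Overall and average (per-class) accuracy from a confusion matrix.

    confusion[i][j] = number of test cases of actual class i predicted as class j.
    Classes with no test cases are skipped in the per-class average.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    per_class = [row[i] / sum(row) for i, row in enumerate(confusion) if sum(row) > 0]
    overall = correct / total
    average = sum(per_class) / len(per_class)
    return overall, average, per_class


# Hypothetical imbalanced test set: 100 cases of class 0, 20 cases of class 1.
confusion = [[80, 20],
             [10, 10]]
overall, average, per_class = accuracies(confusion)
print(overall, average, per_class)   # -> 0.75 0.65 [0.8, 0.5]
```

Here the majority class dominates the overall accuracy, while the average accuracy gives the poorly predicted minority class equal weight.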
For the 0% prediction there are 92 instances related to the value 1245 and 45 instances related to the values 3899, 1255, 2235, 2456, 2578, 2733, 88789 and 102354; for the 100% prediction there are 36 instances related to the values 145, 457 and 5478, 45 instances related to the values 1455, 2433, 3899, 4578, 4755, 22543, 55689, 74 665 855 and 22247696, and 74 instances related to the value 2245.
The lift chart shows a high learning rate of the model, increasing across the quantiles, with cumulative records of 21.1742% and a cumulative target density of 0.2045.
The performance graph for CLAS NB (Figure 6.9) shows that the algorithm has an average accuracy of 79.1502% and a predictive confidence of 78.1573%.
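The reported predictive confidence can be related to the average accuracy. As we understand the Oracle Data Miner documentation, predictive confidence compares the model with a naive model: it is max(1 - error of model / error of naive model, 0) × 100, where the error of the model is 1 - average accuracy / 100 and the error of the naive model is (number of target classes - 1) / number of target classes. The paper does not state the number of target classes; the value 22 below is an assumption, chosen because it reproduces all three reported pairs to within rounding.

```python
def predictive_confidence(average_accuracy_pct, n_classes):
    """Predictive confidence (%) of a model relative to a naive model (assumed formula)."""
    error_model = 1 - average_accuracy_pct / 100
    error_naive = (n_classes - 1) / n_classes
    return max(1 - error_model / error_naive, 0) * 100


# Average accuracies reported in the text; n_classes = 22 is an inferred assumption.
for name, acc in [("SVM", 8.3255), ("DT", 59.0909), ("NB", 79.1502)]:
    print(name, round(predictive_confidence(acc, 22), 4))
```

Under this formula a model that is no better than the naive guess scores 0, which is consistent with the SVM model's low average accuracy yielding a predictive confidence of only about 4%.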
In the cost matrix (Figure 6.10) for the NB model, the average accuracy is 79.1502%.
For the 0% prediction there are 45 instances related to the values 1255, 2235, 2578 and 5578; for the 41.3043% prediction there are 92 instances related to the value 1245; and for the 100% prediction there are 36 instances related to the values 145, 457 and 5478, 45 instances related to the values 1455, 2433, 2456, 2733, 3899, 4578, 4755, 22543, 55689, 88789, 102354, 665855 and 22247696, and 74 instances related to the value 2245.
The lift chart shows a high learning rate of the model, increasing across the quantiles, with cumulative records of 21.1742% and a cumulative target density of 0.2045.
4. Conclusions
Data Mining tools are used to extract information from data, which allows more accurate decisions to be made. The data are analyzed across the whole banking system to support decision-making.
Many industries, including banking and telecommunications, use Data Mining, with applications such as detecting credit card fraud and predicting customer behavior in banking.
Data Mining techniques can help banking and financial institutions detect fraud, acquire new customers, analyze existing models, identify market trends and design new models to be launched.
Data Mining has applications in almost every field and is therefore one of the most important frontiers between information and database systems, and a promising trend in information technology.
1 Rajanish Dass, Data Mining in Banking and Finance: A Note for Bankers, Indian Institute of Management Ahmedabad, 2006;
2 B. Subashini, K. Chitra, Data Mining Techniques and its Applications in Banking Sector, in International Journal of Emerging Technology and Advanced Engineering, Volume 3, Issue 8, August 2013;
3 J. M. Zytkow and W. Klösgen, Handbook of Data Mining and Knowledge Discovery. New York: Oxford, 2002;
4 Rajanish Dass, Data Mining in Banking and Finance: A Note for Bankers, Indian Institute of Management Ahmedabad, 2006;
5 S.S. Kaptan, New Concepts in Banking, Sarup and Sons, Edition, 2002;
6 Bharati M. Ramager, Data Mining Techniques And Applications in International Journal of Computer Science and Engineering, 2009;
7 Bharati M. Ramager, Data Mining Techniques And Applications in International Journal of Computer Science and Engineering, 2009;
8 Hillol Kargupta, Anupam Joshi, Krishnamoorthy Siva Kumar, Yelena Yesha, Data Mining: Next Generation Challenges and Future Directions, Publishers: Prentice-Hall of India, Private Limited, 2005;
9 S.S. Kaptan, N S Chobey, Indian Banking in Electronic Era, Sarup and Sons, Edition 2002;
10 S.S. Kaptan, New Concepts in Banking, Sarup and Sons, Edition, 2002.
REFERENCES
Dass, Rajanish, (2006), Data Mining in Banking and Finance: A Note for Bankers, Indian Institute of Management Ahmedabad;
Kargupta, Hillol, Joshi Anupam, Krishnamoorthy Siva Kumar, Yelena Yesha, (2005), Data Mining: Next Generation Challenges and Future Directions, Publishers: Prentice-Hall of India, Private Limited;
Kaptan, S.S., (2002), New Concepts in Banking, Sarup and Sons, Edition;
Kaptan, S.S., N S Chobey, (2002), Indian Banking in Electronic Era, Sarup and Sons, Edition;
Ramager, Bharati M., (2009), Data Mining Techniques And Applications in International Journal of Computer Science and Engineering;
Subashini, B., Chitra, K., (2013), Data Mining Techniques and its Applications in Banking Sector, in International Journal of Emerging Technology and Advanced Engineering, ISSN 2250-2459, Volume 3, Issue 8, August 2013;
Zytkow, J.M., Klösgen, W., (2002), Handbook of Data Mining and Knowledge Discovery, New York: Oxford.
Ana-Maria Ramona Stancu,*
Mihaela Mocanu**
* PhD. student at The Bucharest University of Economic Studies.
** PhD. Lecturer, "Dimitrie Cantemir" Christian University, Bucharest.
Copyright Christian University Dimitrie Cantemir, Department of Education Mar 2016