This paper presents some of the utilities of using SAS code for analyze and forecast, in a pharmaceutical company. First, data must be cleaned, in order to obtain representative results. Once this stage done, the input can be evaluated and models can be created in SAS, to extract the most valuable information from the initial data. To sustain management decisions, SAS code allows creating different types of reports and has the capability to recode the initial variables into new ones, but keeping the most part of the information contained, through the technique called "principal components analysis". Another advantage of SAS code is its capability of classifying the initial variables into compact class. Based on scorecards and WOE (Weight of Evidence), notions used in banking analyses, it has been created a model which evaluates the suppliers of the company and ranks them, in order to find out what contracts must be continued and which contracts must be closed.
Keywords: Reports, Principal Components Analysis, Clusters, Scorecard, Weight of Evidence
(ProQuest: ... denotes formulae omitted.)
Introduction
Data analysis involves collecting, analyzing and presenting large amounts of data, in order to extract the most valuable information from the initial data and to discover patterns and trends which are used in decision-making process. This means that based on many information, it is important to know how to extract the essence from the available data and how to use it in order to improve the future activity and the results obtained.
Data analysis is a process that involves statistical algorithms or logical techniques to describe and evaluate data. Many of the analyses performed become an iterative process where data is systematically collected and analyzed simultaneously, especially because patterns are discovered through the entire data collection 0.
Subjecting data to statistical operations can relieve the relationship among variables and the level of trust the answers are getting.
Data analysis is used to process and filter the information contained in the data studied, in order to capture the essence or extract the information contained. Another purpose is to highlight this information in an essentially form, more intelligible, suggestive, simplified and synthesizers. Achieving this goal involves the development of a sequence of transformations performed on the raw data and involves the use of specific methods and techniques. These changes aim to maximize the relevance and interpretability of data and requires, among other things, to eliminate redundant or meaningless information and generality. From this point of view, data analysis appears to be a specific transformation process that takes as input the raw data and as output the information synthesized 0.
Began in the late 60's, for analyzing the results of the experiments in agriculture, SAS (Statistical Analysis System) became a powerful programming language, providing data mining tools for Web analysis and development, for a wide range of industries.
Nowadays, SAS solutions are used in more than 130 countries, covering areas such as Analytics, Business Intelligence, Customer Intelligence, Data Integration & ETL, Fraud Prevention & Detection, Risk Management, Supply Chain Management, Cloud etc. 0.
SAS can import different types of data and creates outputs in diverse formats, being a flexible programming language, which provides statistical, descriptive, inferential and predictive analysis.
The base products of SAS are: BASE SAS (data management), SAS / STAT (statistical analysis), SAS/ GRAPH (graphics), SAS/OR (optimization and simulation), SAS/ ETS (time series analysis), SAS/ QC (quality control) and many others.
In SAS, statements are used to write a program, based on instructions, executed in order. SAS statements are grouped into "blocks", referred to as "steps". In a SAS program, we can have data steps or procedure steps. A data step reads and modifies data for analysis, creating a SAS data set, while a procedure step performs a specific utility action on a data set and produces results or print reports. A data set read from an external source and can include DO loops, IF-THEN-ELSE logic and an assortment of numeric and character functions, concatenation and match-merge. According to 0, SAS procedures do everything from simple sorting and printing to analysis of variance and 3D graphics.
SAS is known for its flexibility - there aren't many rules about how to format a SAS program: statements can be in uppercase or lowercase, they can continue on the next line (with the condition that they don't split words in two) and they can start in any column. Almost all the statements begin with a keyword which identifies the type of the statement - an exception is the assignment statement which begins with a variable name. But the most important rule remains not to omit the semicolon at the end of the statement 0.
These statements specify how data are processed and analyzed, present the operations performed on data or detail the instructions about the analysis.
2 Accessing and Cleaning Data
In SAS, data can be accessed in 3 different ways:
· Using INFILE declaration;
· Reading data directly from SAS code (for small data sets);
· Using IMPORT procedure.
Table 1 presents the variables analyzed. Their description is presented below: Category refers to the status of the pharmaceutical product. This variable can have the values "OTC" - without a prescription, "Rx" - by prescription, "SN" - a dietary supplement, "Cosmetic" or "Biocides".
Type describes the formulation of the product and can have the values "Tablets", "Capsule", "Syrup", "Solution" or "Spray".
Range_serieUs nitary_pSrtioccek, Range_ series, Stock, Sales, to the production.
Term refers to the number of months of idity guaranteed for each product.
contain the number of products substitued (have the same properties and the same curative effect) for each product.
Form refers to the physical nature of each pharmaceutical product. This variable can have the values "liquid" or solid ".
Active_ingredients contains the number of active ingredients in each product, which influences the shelf life of the products.
Data cleaning is one of the most important steps in processing data, by verifying that the data are correct or at least compliant with a certain set of rules. Data cleaning is concerned with detecting and removing errors and incompatibilities to improve data quality. Data quality problems are present in the individual data collections such as files and databases, caused by misspellings during data entry, lack of information or other types of errors.
In order to verify the accuracy of data, it has been created formats for the analyzed variables:
...
To eliminate records with duplicate values, we can use the code below:
...
3 Reports
Once the stage of accessing and cleaning data done, data can be summarized, analyzed and evaluated. Before transform data, it can be created reports to show the initial state. Regarding that data are unique, without missing values and with no incorrect format, the results obtained will be representatives.
Reports are tools for management, which influence the decision making process. SAS offers the possibility of creating such reports, parameterize and customize them, as well as save the resulted reports in PDF or HTML.
The database analyzed belongs to a pharmaceutical company, which means that the information analyzed refer to clients, manufacture, products and suppliers.
For example, we can create a report based on information about products - unit price, category, type, unit stock, unit sales etc.
To create a report about products and the situation of units, considering the category and the type for each product, we can use the code below:
...
Fig. 1 presents the report which lists by category and type, products and unit prices, while achieving total units sold, the total stock and total units produced. For example, in the category "Biocide", there are 2 types of products - liquid and tablets; the liquid products are Betagin sol. ext. 10% 500ml x1 and Betagin sol. ext. 10% 30ml x1 and the tablets are Dezinfectant Forte compr.x50. For every product are listed total units sold, the total stock and total units produced.
4 Scorecard for Ranking Suppliers
Scorecard is a technique that allows a company to monitor and manage performance, considering some targets.
Measurements are usually financial performance, customer value, internal business processes, the performance of innovation or employee performance 0.
Suppliers have an important role in the proper functioning of the Biofarm Company. If suppliers do not fulfill their obligations under the contract, increases the risk that Biofarm reduce or even stop their work due to lack of raw materials or packaging.
Scorecard is used in activities related to risk management of non-commitment to a supplier, in order to decide to conclude a new contract or not to continue to work with suppliers for which risk and losses are related. For these reasons, it is useful to review suppliers that Biofarm contracts. The analyze made regarded the top 20 suppliers by value of purchases made by the company. The analysis regards 20 vendors for which 9 attributes are known: Acquisition_value, Term (delivery time agreed in contract), Term(achieved) (delivery time realized), Product_category (the category for each product), Orders_history (the status for each order), Company_type (the type of every supplier), Contract (the length of the collaboration between Biofarm and each supplier), No_of_contracts (number of contracts closed by Biofarm with each supplier), Performance (show if a supplier is performing or not).
Fig. 2 presents the formula for WOE (Weight of Evidence), which represents the power of discrimination of each attribute and helps to differentiate the model risk and must be calculated for each variable.
Information Value formula, shown in Fig. 3, has the purpose to select variables during analysis and model building. This concept reflects the overall predictive power of the variables considered, so it can be used for comparing the predictive power among competing variables.
Regarding the notions explained above Error! Reference source not found., the following macro has been defined in order to calculate the Weight of Evidence and the Information Value for each attribute 0:
...
As it is shown in Fig. 4, the biggest negative value for the Acquisition_value variable is obtained for the "300-600 mil" category, which means that the category of dealers who have obtained orders worth 300-600 thousand, there are more underperforming suppliers than performing. The ratio of non- performing providers were able to breach of contract is lower for suppliers regarding Acquisition_value attribute than the whole portfolio.
The graphic shows that the most reliable and profitable suppliers are those who obtained orders worth more than 1 million or less than 300 thousand.
As it is presented in Fig. 5, the biggest negative value for the variable Term is obtained for the "45" category, which means that for the category of providers who should deliver in 45 days, there are more underperforming suppliers than performing ones. The chart above shows that the most profitable are providers who delivered the order within 30 days.
According to 0 and 0, in order to eliminate the missing values from the table created, the code below must be run:
...
By summing the Information Value for each attribute, as in Fig. 6, Information Value can be calculated globally as running the following code, suggested in 0:
...
Based on nine attributes recorded for the 20 suppliers, the Weight of Evidence has been calculated for each attribute and the information has been reflected by each feature separately. Considering the products, their category, type, unitary_price, range_series, stock, sales, production, term, substitutes, form and active_ingredients, the Information Value compares the predictive power across those 9 attributes and the Weight of Evidence reflects how behaves every variable considered.
As shown in Fig. 7, the information aggregated globally has revealed a hierarchy of suppliers, after descending the Information Value field.
The report in Fig. 8 highlight the suppliers with whom Biofarm need to maintain their collaboration, due to good results obtained in the past.
The report presented above facilitates management decisions, providing a clear situation of the suppliers, considering the variables analyzed.
As stated in the report above, top 5 suppliers for Biofarm are Selectchemie AG, ICE SpA, Ongropack Kft BC, Tipografia Polirom SRL and Morgan Thorpe SA. Considering the results obtained, Biofarm will continue to collaborate with those companies and can close the contracts with the suppliers which didn't fulfill their responsibilities, such as Xiamen Forever, Bostocke, Argorom Plastics, Linnea SA or DSM Nutritional Products Europe LTD.
5 Conclusions
In the beginning of this article, SAS has been presented as an important data analysis software tool, dedicated to highlight the power and importance of using such programs for statistical analysis of the data in a company.
Following, it has been shown the importance of data cleaning and validation; this phase cannot miss because the analyze need accurate data in order to obtain representative results. For that, it has been created data validation formats and the records which didn't comply have been removed.
Once the stage of accessing and cleaning data done, data can be summarized, analyzed and evaluated. Before transform data, it can be created reports to show the initial state. Regarding that data are unique, without missing values and with no incorrect format, the results obtained will be representatives.
Reports defined in SAS allow visualization of key information about products and are useful tools for sustaining management decisions. SAS offers the possibility of creating such reports, parameterize and customize them, as well as save the resulted reports in PDF or HTML.
Scorecard created for the company's suppliers has the purpose to sustain management decisions. The hierarchy resulted takes into account the performance indicators obtained in the last year, and historical data, such as length collaboration between the company and suppliers, the number of contracts and the degree of respect the contract by each supplier.
References
[1] F. D. Delwiche and S. J. Slaughter, "The Little SAS Book: A Primer: A Programming Approach", 2012
[2] G. Ruxanda, "Data Mining", Bucharest, 2010
[3] Information about SAS products, available:http://www.sas.com/en_us/com pany-information.html#stats
[4] G. Der and B. S. Everitt, "A Handbook of Statistical Analyses using SAS", 2002
[5] M. Evans, "SAS Manual For Introduction to the Practice of Statistics", 2010
[6] Information about variable reduction in SAS, using WOE (Weight of Evidence) and IV (Information Value): http://support.sas.com/resources/papers/p roceedings13/095-2013.pdf
[7] Information about WOE (Weight of Evidence) and IV (Information Value): http://documentation.statsoft.com/STATI STICAHelp.aspx?path=WeightofEvidenc e/WeightofEvidenceWoEIntroductoryOv erview
[8] F. E. Pratter, "Web Development with SAS® by Example", 2006
[9] J. Shostak, "SAS® Programming in the Pharmaceutical Industry", 2005
[10] R. Cody, "Learning SAS® by Example - A Programmer's Guide", 2007
[11] SAS Institute Inc. Step-by-Step Programming with Base SAS® Software, 2001
Lucelia DEDU
Database Master - Business Support
Faculty of Economic Cybernetics, Statistics and Informatics
Bucharest University of Economic Studies, Romania
Lucelia DEDU has graduated the Faculty of Economic Cybernetics, Statistics and Informatics from the Bucharest Academy of Economic Studies in 2012 and she has graduated in 2014 the courses of "Databases for Business Support" master program organized by the Academy of Economic Studies of Bucharest. During her second year at the master program, she followed the course Credit Risk Management, held by BCR, where she discovered new features of using SAS code. Her career is centered on payroll domain, database's queries, ETL Tools for data warehouses and COBOL development. Her areas of interest are: Data Warehouses, Data Modeling, Risk Management, Databases and SAS development.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Copyright INFOREC Association 2014
Abstract
This paper presents some of the utilities of using SAS code for analysis and forecast in a pharmaceutical company. First, data must be cleaned in order to obtain representative results. Once this stage done, the input can be evaluated and models can be created in SAS to extract the most valuable information from the initial data. To sustain management decisions, SAS code allows creating different types of reports and has the capability to recode the initial variables into new ones, but keeps the most part of the information contained through the technique called "principal components analysis". Another advantage of SAS code is its capability of classifying the initial variables into compact class. Based on scorecards and WOE (Weight of Evidence), notions used in banking analyses, a model has been created which evaluates the suppliers of the company and ranks them in order to find out what contracts must be continued and which contracts must be closed.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer