Mete Eminağaoğlu

Machine learning approaches for prediction of service times in health information systems

Learning Objectives

The objectives of this chapter are to understand the basic machine learning approaches to solving time prediction problems in health information systems, to learn the basics of artificial neural networks, to identify the essential concepts in the design and implementation of multilayer perceptrons with many hidden layers, and to understand the basic methodologies for evaluating the performance of machine learning algorithms on regression problems. Once you have mastered the materials in this chapter, you will be able to:
– Discuss the key machine learning approaches and technologies that are used in management information systems and how they can be used for Industry 4.0 applications.
– Identify the essential concepts in the design and implementation of multilayer perceptrons with many hidden layers and complex architectures.
– Understand the basic methods, approaches, and implementations for data analysis, data transformation, and machine learning models for service time prediction problems.
– Design and implement artificial neural network models for solving regression problems in business.
– Understand the basic methodologies for evaluating the performance of machine learning algorithms on regression problems.

Chapter Outline

This chapter introduces a special artificial neural network (ANN) model, which is designed and implemented in order to predict the service duration of specific processes in a hospital. Service times between different phases in health information systems and medical services are among the important measurements that affect quality of service, patient satisfaction, change management, costs, and organizational and strategic business decisions.
Accurate and coherent prediction of service times or turnaround times, as well as elicitation of the hidden factors that have an impact on service durations, is a difficult problem. The ANN model proposed in this study has a unique multilayer perceptron architecture with four hidden layers, and some hyper-parameters and methodologies that are commonly used in deep learning are also used in the proposed ANN model. The prediction performance of this unique ANN model is comparatively analyzed against some other ANNs and some linear or nonlinear regression algorithms with the aid of basic performance evaluation methods used for numerical prediction. The results show that the proposed model provides significantly more successful results than the other models and algorithms, and it can be used by decision makers as an accurate and reliable model to predict service times.

Keywords

Multilayer perceptron, artificial neural networks, machine learning, service duration, medical laboratory, health information systems, numerical prediction, regression

Introduction

The combination of big data and machine learning with current technological advancements such as cloud computing, mobile technologies, fuzzy systems, pervasive computing, and the Internet of Things has enabled tremendous new business opportunities. Data analysis has become a competitive and sustainable advantage for many organizations. To harness the advantages of big data and machine learning, however, business leaders face the challenge of not only acquiring the adequate technologies and the ability to analyze the information, but also of weaving a data-driven mindset into the organization's structure and culture. In recent decades, organizations have depended heavily on analytics to provide them with competitive knowledge and enable them to be increasingly successful.
Organizations are currently compelled to look further into their information to discover new and imaginative approaches to increase effectiveness and competitiveness. Regarding the recent advances in science and technology, especially in machine learning, organizations are developing bigger, smarter, and more comprehensive analytics models and techniques. There is a huge number of diverse alternatives for machine learning platforms and tools today. However, business managers should focus on their particular business requirements so that they can make the right choice among these alternatives. Machine learning provides “the technical basis of data mining and it is accepted as a universal discipline, which is used in cooperation with data mining” (Witten et al. 2011). Bayesian networks, support vector machines, and decision trees are examples of such algorithms, which “reside both in the area of machine learning and data mining” (Mitchell, 2017; Nedjah et al. 2009). Artificial neural networks (ANN) are one of the “black-box models amongst machine learning” (Haykin 2009), and a multilayer perceptron, which is a type of ANN, is designed, coded, and implemented in this study to predict two different service durations in a medical laboratory.

Background

Information systems and related technologies are known to have a great influence on the enhancement of quality, efficiency, and effectiveness of healthcare services. The advances in information technologies and their alignment with business processes enable new solution approaches for health management systems and for decision makers (Bernardi et al. 2017; Chen 2014; He et al. 2013; Lyon et al. 2016; Ng and Chung 2012; Söderholm and Sonnenwald 2010; Stvilia and Yi 2009). There are also some studies focusing on the enhancement of service quality in clinics, laboratories, and similar institutions by the analysis and reduction of service times (Goswami et al. 2010; Scagliarini et al.
2016; Sinreich and Marmor 2005; Willoughby et al. 2010). Turnaround time or service time is generally described as the amount of time taken to fulfill a request or process. The service duration or turnaround time in medical services and healthcare systems is usually described as the time spent for a particular analysis or during any stage in a medical laboratory, other commercial laboratories, or a public health laboratory (Breil et al. 2011; Goswami et al. 2010). Processing time or duration for tests is often considered one of the most significant performance measures in medical services; for instance, laboratory analytical service time is shown to be a reliable indicator of laboratory effectiveness (Goswami et al. 2010), and the reduction of patient service time in emergency departments is shown to have an effective impact on decreasing the costs for emergency departments and increasing the service quality and patients' satisfaction (Fieri et al. 2010; Sinreich and Marmor 2005; Storrow et al. 2008; Willoughby et al. 2010). However, it seems difficult to find outstanding research, models, or applications that relate to the accurate prediction of service times among such medical services, which was one of the primary motivations and objectives for conducting the research discussed in this chapter.

Artificial Neural Networks

Artificial neural networks (ANN) are described as computational models that were derived by inspiration from the central nervous system and the brain in mammals (Haykin 2009; Kumar, 2017). They are generally used to “approximate or estimate functions that might depend on a large number of inputs and ANN can be used for either classification, clustering, or regression” (Haykin 2009). ANN are widely used today to solve difficult tasks or problems such as decision making, computer vision, forecasting, pattern and speech recognition, and so on (Huang et al. 2004; Olivas et al.
2009; Reyes et al. 2013; Yu et al. 2006). ANN with feed-forward learning models are usually named “multilayer perceptrons” (Bishop 2006). However, it should be noted that there are various feed-forward ANN models, and the multilayer perceptron (MLP) is a specific type of feed-forward ANN in which “all of the nodes in one layer are fully connected to all of the nodes in the following layer and there exists one or more hidden layers” (Haykin 2009). It is known that “the differential error and corresponding weight updates in MLP is usually achieved by backpropagation with gradient descent methodology” (Haykin 2009). However, in recent years, some other alternative learning rate and weight update methodologies are being preferred, such as RMSProp (Root Mean Square Propagation), AdaGrad (Adaptive Gradient), ADAM (Adaptive Moment Estimation), and so on (Goodfellow et al. 2017; Patterson and Gibson 2017). The linear input function given in equation (1) is used for the nodes / units in the hidden layers and the output layer, and it is generally used in “multilayer feed-forward neural networks” (Bishop 2006).

I_j = Σ_i (W_ij · O_i) + b_j    (1)

For each node j in the hidden or output layer, “the net input Ij is calculated by the connection's weight Wij from node i in the previous layer to node j; Oi is the output of node i from the previous layer; and bj stands for the bias parameter used in that node” (Larose 2005). The weights and biases in an ANN are randomly initialized to small negative and non-negative continuous values with a uniform or Gaussian distribution. These weights and bias values are updated during the training phase of the neural network model. On the other hand, if the number of weights and nodes is very high in an ANN, which is mostly seen in deep learning models, then some specific weight initialization strategies are preferred, such as Xavier initialization (Patterson and Gibson 2017).
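The net input of equation (1) and a Xavier-style initialization can be sketched in a few lines of NumPy. This is an illustrative sketch, not the study's code; the layer sizes shown happen to match the first layer of the chapter's MLP, but the data is random.

```python
import numpy as np

rng = np.random.default_rng(42)

def xavier_init(n_in, n_out):
    # Xavier/Glorot-style initialization: weights drawn from a uniform
    # distribution scaled by the fan-in and fan-out of the layer.
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def net_input(O_prev, W, b):
    # Equation (1): I_j = sum_i W_ij * O_i + b_j for every node j,
    # computed for the whole layer at once as a matrix product.
    return O_prev @ W + b

W = xavier_init(34, 128)      # e.g. a 34-node layer feeding a 128-node layer
b = np.zeros(128)             # biases, updated later during training
O_prev = rng.random((1, 34))  # outputs of the previous layer (one instance)
I = net_input(O_prev, W, b)
print(I.shape)                # (1, 128): one net input per node j
```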
The network structure in any ANN learns by “adjusting the weights in order to make better or more accurate predictions or classifications” (Han and Kamber 2006). Biases are also used in most feed-forward neural networks and multilayer perceptron models, where “each bias value within each layer of a neural network is shown to have the effect of increasing or lowering the net input of the activation function so that the output values within each layer are derived or updated in a more balanced way” (Haykin 2009). A non-linear activation function is used “to calculate the output value within each unit in the hidden layer or the output layer, which is usually chosen as sigmoid function” (Han and Kamber 2006). The output value is calculated by using the non-linear sigmoid function that is given in equation (2), where Ij is the net input to node j, and Oj is the output of node j.

O_j = 1 / (1 + e^(−I_j))    (2)

It should be noted that the hyperbolic tangent function (Graves 2012) is also used, as well as the sigmoid function, as a non-linear activation function in MLP models and in some deep learning architectures such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Units). The output value derived by using the non-linear hyperbolic tangent function is given in equation (3).

O_j = tanh(I_j) = (e^(I_j) − e^(−I_j)) / (e^(I_j) + e^(−I_j))    (3)

There are other non-linear activation functions that have mostly been preferred recently in deep learning models as well as MLP (Samudrala, 2019). One of them is ReLU (Rectified Linear Unit), which is also used as the non-linear activation function within all of the hidden layers for the model proposed in this study. ReLU is shown in equation (4).

O_j = max(0, I_j)    (4)

If an ANN is to be used for regression problems, then the output layer is composed of a single output node, and the prediction of the dependent variable is given by the output value of that node. The predicted value “is usually different from the actual value and eventually, this difference is described as the prediction error” (Larose 2005).
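The three activation functions described above (equations (2), (3), and (4)) can be written directly in NumPy; the input values below are arbitrary examples.

```python
import numpy as np

def sigmoid(I):
    # Equation (2): O_j = 1 / (1 + exp(-I_j)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-I))

def tanh(I):
    # Equation (3): hyperbolic tangent, output in (-1, 1)
    return np.tanh(I)

def relu(I):
    # Equation (4): ReLU keeps positive net inputs and zeroes out the rest
    return np.maximum(0.0, I)

I = np.array([-2.0, 0.0, 3.0])
print(sigmoid(I))  # values squashed into (0, 1)
print(tanh(I))     # values squashed into (-1, 1)
print(relu(I))     # [0. 0. 3.]
```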
Other errors are obtained within each node in each hidden layer, and the weights must be updated iteratively by using those errors. Backpropagation, which is also named “backward propagation of errors”, is “one of the methods that can be used in multilayer feed-forward neural networks to calculate the errors and new weights” (Dasu and Johnson 2003). The backpropagation method is usually used with the “gradient descent” optimization algorithm in order “to update the new weights in a more accurate way” (Haykin 2009). The backpropagation calculation of the error in the output layer is given in equation (5), where j stands for the output node, the predicted value is denoted by Oj, and Rj is the actual value for that instance.

Err_j = O_j (1 − O_j)(R_j − O_j)    (5)

It should be noted that Oj (1 − Oj) is established by the derivative of the non-linear sigmoid output function. This is due to the fact that the derivative of the sigmoid function can be expressed in terms of the function itself, which is given in equation (6).

dO_j / dI_j = O_j (1 − O_j)    (6)

This property also holds for the other non-linear activation functions mentioned above. For instance, in equations (7) and (8), it is shown that the derivatives of the hyperbolic tangent and ReLU can also be expressed in terms of the functions themselves, respectively.

dO_j / dI_j = 1 − O_j^2    (7)

dO_j / dI_j = 1 if O_j > 0, and 0 otherwise    (8)

The weighted sum of the errors of the nodes connected to node j must be included “to calculate the corresponding error of that node within any hidden layer in the neural network” (Larose 2005). The error formula for node j in a hidden layer is shown in equation (9), where Wjk is the connection's weight from node j to node k in the following layer, and Errk stands for the error in node k.

Err_j = O_j (1 − O_j) Σ_k (Err_k · W_jk)    (9)

These error values are used to update the weights and biases within each iteration during the training process of an ANN.
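The two error terms of equations (5) and (9) can be sketched as follows for sigmoid units; the small two-node hidden layer and its weights are made-up values for illustration only.

```python
import numpy as np

def output_layer_error(O_j, R_j):
    # Equation (5): Err_j = O_j * (1 - O_j) * (R_j - O_j),
    # where O_j * (1 - O_j) is the sigmoid derivative of equation (6).
    return O_j * (1.0 - O_j) * (R_j - O_j)

def hidden_layer_error(O_j, Err_next, W_jk):
    # Equation (9): Err_j = O_j * (1 - O_j) * sum_k Err_k * W_jk,
    # a weighted sum of the errors of the nodes that node j feeds into.
    return O_j * (1.0 - O_j) * (Err_next @ W_jk.T)

O_out = np.array([0.7])                # predicted value of the output node
R = np.array([1.0])                    # actual value for the instance
err_out = output_layer_error(O_out, R)
print(err_out)                         # 0.7 * 0.3 * 0.3 = 0.063

O_hidden = np.array([0.4, 0.6])        # outputs of a 2-node hidden layer
W_to_out = np.array([[0.5], [-0.2]])   # weights from hidden layer to output
err_hidden = hidden_layer_error(O_hidden, err_out, W_to_out)
print(err_hidden)
```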
The iteration is also named an “epoch” in ANN terminology, and each epoch can be simply described as one complete iteration or round in the training of an ANN. In other words, an epoch can be defined as “a single pass through the entire training set within ANN model where backpropagation errors are calculated and nodes' weights are updated” (Hastie et al. 2009). The weight update calculation within each connection is given in equations (10) and (11), where ΔWij(t−1) is the change in Wij in the previous iteration (t−1), W'ij is the new weight value for the tth iteration, Errj denotes the error for node j, Oi is the output value of node i, and α stands for the learning rate of the ANN, where (0 < α < 1).

ΔW_ij(t) = α · Err_j · O_i    (10)

W'_ij = W_ij + ΔW_ij(t) + m · ΔW_ij(t−1)    (11)

There is a constant parameter m in equation (11), and this parameter is named the “momentum”. This momentum constant is used to prevent the system from converging to a local optimum. In other words, “this parameter helps to increase the rate of learning while avoiding the danger of instability in artificial neural networks” (Haykin 2009). Min-max normalization (Han and Kamber 2006; Witten et al. 2011) is used as a statistical data transformation method for all of the continuous variables and input data in any ANN model, which is denoted in equation (12).

v' = (v − min(v)) / (max(v) − min(v))    (12)

It must be noted that this transformation method is also used in this study for all the attributes with numerical values in order to normalize the variables' range between the values 0 and 1, which makes them appropriate for ANN models. However, in ANN and deep learning models, another normalization methodology, namely standardization or Z-score normalization, is also used for data transformation (Patterson and Gibson 2017). It is a well-known methodology that transforms a numerical variable by normalizing its statistical distribution so that its mean becomes 0 and the standard deviation becomes 1.
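The weight update of equations (10) and (11) and the min-max and Z-score transformations can be sketched as follows; the error, output, and data values are illustrative, and the learning rate and momentum defaults simply mirror the values reported later in this chapter.

```python
import numpy as np

def update_weight(W, dW_prev, err_j, O_i, lr=0.01, momentum=0.02):
    # Equations (10) and (11): the new weight combines the gradient term
    # (lr * Err_j * O_i) with a momentum term that reuses the previous
    # iteration's change to help avoid local optima.
    dW = lr * err_j * O_i
    return W + dW + momentum * dW_prev, dW

def min_max_normalize(x):
    # Equation (12): rescale a numerical variable into [0, 1].
    return (x - x.min()) / (x.max() - x.min())

def z_score(x):
    # Equation (13): standardize to mean 0 and standard deviation 1.
    return (x - x.mean()) / x.std()

W_new, dW = update_weight(W=0.1, dW_prev=0.0, err_j=0.063, O_i=0.4)
print(W_new)                  # approximately 0.100252

x = np.array([0.0, 5.0, 10.0])
print(min_max_normalize(x))   # [0.  0.5 1. ]
print(z_score(x).mean())      # approximately 0.0
```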
Hence, all the values of the variable are transformed into small negative or positive values. This methodology is given in equation (13), where μ is the mean and σ is the standard deviation of the variable.

v' = (v − μ) / σ    (13)

There were some attributes in this study with nominal (categorical and non-ordinal) values, which were used as input data. All of these nominal data were transformed into dummy variables so that one input node was generated and set to either 0 or 1 for each different value of a categorical variable, which is the approach mostly used in all types of deep learning and ANN models (Dasu and Johnson 2003; Patterson and Gibson 2017). For instance, if there exists an attribute with three different nominal values such as “X”, “Y”, and “Z”, then there will be three input units for this attribute, so that this single attribute is expanded into three attributes. Each categorical value of this attribute can be represented by a different combination of 0s and 1s according to arbitrary settings. For instance, for the records originally having the value “X”, the first input unit will be set to 1 and the other two units will be 0, so that it will have an input encoding of 100, and so on. This method is known as “one-hot encoding” or “one-hot vector representation” (Buduma and Locascio 2017).

Materials and Methods

The data used in this study was provided from the health information system's medical laboratory database in one of the public hospitals in Turkey, and this data had been collected over two years. The name of the hospital and some other information about the hospital could not be explicitly given in this study due to confidentiality reasons. There were 247305 records with 19 different attributes such as “Patient's gender”, “Patient's age”, “Department Name”, “Previous admissions to this department”, “Previous admissions to other departments”, “Date of patient registration”, “Time of patient registration”, “Date of request for physician”, “Time of sample delivery to laboratory”, and so on.
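The one-hot encoding scheme described in the previous section can be sketched as a small helper; the category names “X”, “Y”, “Z” follow the example in the text.

```python
import numpy as np

def one_hot(values, categories):
    # One-hot encoding: one input unit per category, set to 1 for the
    # category the record holds and 0 for all the others.
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)), dtype=int)
    for row, v in enumerate(values):
        encoded[row, index[v]] = 1
    return encoded

print(one_hot(["X", "Z", "Y"], categories=["X", "Y", "Z"]))
# [[1 0 0]
#  [0 0 1]
#  [0 1 0]]
```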
The list of these 19 attributes with their data types from the original data is given in Table 1, and a sample of the data set is given in Fig. 1. It could be considered that two previous studies (Eminağaoğlu and Vahaplar 2018; Köksal et al. 2016) might resemble the approach and the implementation in this study. However, it should be noted that in those studies the data set was entirely different and was collected from a different hospital. In addition, the proposed ANN model, its architecture, the activation, and some normalization functions used in this study are entirely different from both of these studies. The train and test data split methodology and the use of two different prediction attributes can be considered as other aspects that make this study entirely different from the previous studies.

Table 1. The list of attributes in the original data obtained from the medical database

Attribute index #   Attribute name                             Data type
1                   Patient's gender                           Categorical (nominal)
2                   Patient's age                              Numerical (integer)
3                   Department name                            Categorical (nominal)
4                   Previous admissions (This department)      Numerical (integer)
5                   Previous admissions (Other departments)    Numerical (integer)
6                   Date (Patient registration)                Date (DD.MM.YYYY)
7                   Time (Patient registration)                Time (hh:mm)
8                   Date (Request for physician)               Date (DD.MM.YYYY)
9                   Time (Request for physician)               Time (hh:mm)
10                  Date (Specimen collection)                 Date (DD.MM.YYYY)
11                  Time (Specimen collection)                 Time (hh:mm)
12                  Date (Sample delivery to laboratory)       Date (DD.MM.YYYY)
13                  Time (Sample delivery to laboratory)       Time (hh:mm)
14                  Date (Report of test results)              Date (DD.MM.YYYY)
15                  Time (Report of test results)              Time (hh:mm)
16                  Date (Report delivery to patient)          Date (DD.MM.YYYY)
17                  Time (Report delivery to patient)          Time (hh:mm)
18                  Date (Visit to physician)                  Date (DD.MM.YYYY)
19                  Time (Visit to physician)                  Time (hh:mm)

Fig. 1. A sample of the data set retrieved from medical laboratory database

Some of the attributes in the data set had to be converted to different data types and formats in order to make them appropriate for both ANN architectures and numerical prediction. The “Department name” attribute was a nominal attribute with eleven different categorical values such as “Urology”, “Dermatology”, and “Cardiology”. Since ANN models and algorithms can only use numerical attributes as input data, the “Department Name” attribute was transformed into one-hot vectors by using the dummy coding explained in the previous section. The major business processes or stages that were within the scope of this study are denoted in Fig. 2. The durations between each of the seven main stages were considered as six different service durations, and they were computed by transforming the timestamps into seconds first, and then calculating their differences. Hence, six new attributes were derived, and they were added into the datasets. These derived attributes were also the candidates among the different service times or durations to be used as the dependent variable, that is, the attribute for prediction.
Two of these six candidates were chosen as target attributes for prediction, because the analysis and estimation of the duration between those two stages were crucial for the hospital management. One of them is the duration between “Sample delivery to laboratory” and “Specimen collection”. The other one is the duration between “Sample delivery to laboratory” and “Report of test results”. Hence, two different experiments were conducted in this study with two different data sets. In the first set, the duration between “Sample delivery to laboratory” and “Specimen collection” was set as the target attribute to be predicted (named Duration_1 in this chapter), and the remaining 24 attributes were set as feature attributes. In the second set of experiments, the service time between “Sample delivery to laboratory” and “Report of test results” was used as the attribute to be predicted (named Duration_2 in this chapter), and the other 24 attributes were used as feature attributes. In other words, “Duration_1” was treated as the dependent variable and the other 24 attributes as independent variables for one of the experiment sets, and “Duration_2” was treated as the dependent variable and the remaining 24 attributes as independent variables for the other experiment set.

Fig. 2. Major business processes and services in the medical laboratory within the scope of this study

There were some attributes such as “Date of patient registration”, “Time of patient registration”, and so on. These were all originally in either date (DD.MM.YYYY) or time (hh:mm) format, and they were converted into integers named “Day of month” (ranging between 1 and 31) and “Hourly time interval” (between 1 and 24). Thus, a data set with 247305 instances and 25 attributes was finally established, where a sample from this data set is shown in Fig. 3.
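The date and time conversions described above can be sketched with the standard library; note that mapping the hour to the 1–24 interval as hour + 1, and the example timestamps, are assumptions of this sketch, since the chapter does not spell out the exact mapping.

```python
from datetime import datetime

def day_of_month(date_str):
    # "DD.MM.YYYY" -> integer day of month in 1..31
    return datetime.strptime(date_str, "%d.%m.%Y").day

def hourly_interval(time_str):
    # "hh:mm" -> integer hourly interval in 1..24 (assumed as hour + 1)
    return datetime.strptime(time_str, "%H:%M").hour + 1

def duration_seconds(date1, time1, date2, time2):
    # Service duration between two stages, expressed in seconds
    fmt = "%d.%m.%Y %H:%M"
    start = datetime.strptime(f"{date1} {time1}", fmt)
    end = datetime.strptime(f"{date2} {time2}", fmt)
    return int((end - start).total_seconds())

print(day_of_month("07.03.2017"))   # 7
print(hourly_interval("14:35"))     # 15
print(duration_seconds("07.03.2017", "09:00", "07.03.2017", "09:30"))  # 1800
```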
This data set was used in the different experiments with the machine learning algorithms. In the proposed ANN model, Z-score standardization was used for all the feature attributes in order to normalize the numerical attributes and to make them feasible for the multilayer perceptron. On the other hand, the dependent variables to be predicted, “Duration_1” and “Duration_2”, were min-max normalized. It should be noted that all the feature attributes were min-max normalized while they were trained and tested by the ANN models and regression algorithms in Weka, due to the default settings in Weka.

Fig. 3. New data set with transformed variables and additional attributes derived from the service durations in terms of seconds

In order to observe the performance of the machine learning algorithms and to evaluate whether they had learned the model, the entire data set with 247305 instances was divided into three different data sets. 173113 instances (70% of the entire data set) were randomly chosen and used for the training phase of each of the machine learning algorithms. Half of the remaining data (37096 instances, which is 15% of the entire data set) was chosen and used for the validation phase and, similarly, the remaining 37096 instances were used for the final test stages. Hence, during all the experiments in this study, each machine learning algorithm was trained first with the train data set, and then the trained model was tested with the validation data set. According to the results observed after the validation process, the algorithm's or model's parameters were changed (if necessary, due to overfitting problems or high errors in predictions), and the training and validation stages were re-executed (Buduma and Locascio 2017). If acceptable results were observed in the validation stage, then that algorithm or model was finally tested with the third set, which is the test data set.
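The random 70% / 15% / 15% split without replacement can be sketched as follows. Note that this sketch's integer truncation yields 173113 / 37095 / 37097 instances, one instance away from the chapter's reported 37096 / 37096; the exact rounding used in the study is not specified.

```python
import numpy as np

def train_val_test_split(n_instances, train=0.70, val=0.15, seed=7):
    # Random sampling without replacement: shuffle the instance indices
    # once, then cut them into disjoint train / validation / test parts.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_instances)
    n_train = int(n_instances * train)
    n_val = int(n_instances * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = train_val_test_split(247305)
print(len(train_idx), len(val_idx), len(test_idx))  # 173113 37095 37097
```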
It should be noted that all of the results from the three phases are given in this study; however, the test results should be the focus, and the evaluation of the machine learning algorithms should be based on their performance with the test data (Buduma and Locascio 2017; Goodfellow et al. 2017), which is also done in this study. Statistical random sampling without replacement (Dasu and Johnson 2003) was used to derive independent train, validation, and test data sets. It should be noted that cross-validation is another reliable and accurate statistical technique in data mining and machine learning whenever the data set has to be separated into train and test data sets (Hand et al. 2001). However, due to the high number of samples in this study, and the necessity of frequent updates of many different hyper-parameters in some of the models, the train / validation / test separation was preferred instead of k-fold cross-validation or other methodologies (Buduma and Locascio 2017; Goodfellow et al. 2017). One of the performance evaluation measures for numerical prediction that is used in this study is the coefficient of determination (R2), which is widely used as an evaluation metric in scientific research (Hassanpour et al. 2018; Ravinesh and Şahin 2015; Yeh and Lien 2009). If the predicted values among the test / validation instances derived by a numerical prediction algorithm are denoted as p1, p2, ..., pn and the actual values are denoted as a1, a2, ..., an, then the coefficient of determination R2 can be calculated as follows (Larose, 2006):

R^2 = 1 − Σ_i (p_i − a_i)^2 / Σ_i (a_i − ā)^2    (14)

It should be noted that, in equation (14), n denotes the total number of instances, and ā denotes the arithmetic mean of the actual values.
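The three evaluation measures used in this study (R2, and the RMSE and MAE discussed next) can be sketched directly; the actual and predicted values below are made-up examples.

```python
import numpy as np

def r_squared(a, p):
    # Equation (14): coefficient of determination in the common
    # 1 - SSE/SST form, where a-bar is the mean of the actual values.
    a, p = np.asarray(a, float), np.asarray(p, float)
    return 1.0 - np.sum((a - p) ** 2) / np.sum((a - a.mean()) ** 2)

def rmse(a, p):
    # Root mean squared error, the "standard error of the estimate"
    a, p = np.asarray(a, float), np.asarray(p, float)
    return np.sqrt(np.mean((a - p) ** 2))

def mae(a, p):
    # Mean absolute error, less sensitive to outliers than RMSE
    a, p = np.asarray(a, float), np.asarray(p, float)
    return np.mean(np.abs(a - p))

actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 7.5]
print(r_squared(actual, predicted))  # 0.9375
print(rmse(actual, predicted))
print(mae(actual, predicted))
```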
Another performance measure for numerical prediction that was used in this study was the root mean squared error (RMSE), which is also defined as the “standard error of the estimate” (Larose 2005). Given that the predicted values are p1, p2, ..., pn and the actual values are a1, a2, ..., an within a test / validation data set, RMSE is calculated as follows:

RMSE = sqrt( (1/n) Σ_i (p_i − a_i)^2 )    (15)

It is known that the mean squared error and root mean squared error measures tend to exaggerate the effect of outliers. On the other hand, it is also known that the mean absolute error (MAE) measure does not tend to exaggerate the error values caused by outliers (Witten et al. 2011). Since there were several outliers in the data used in this study, and since it was strictly necessary not to discard the records having outlier values, MAE was also used as another prediction performance measure. The MAE calculation is given in equation (16).

MAE = (1/n) Σ_i |p_i − a_i|    (16)

The descriptive statistics for the “Duration_1” and “Duration_2” dependent variables are given in Table 2. It should be noted that all the values of “Duration_1” and “Duration_2” that are shown in Table 2 are in seconds. It can be seen from Table 2 that the minimum value is observed to be zero for both of the attributes. This might be related to some errors in the original raw data records or some unusual occasions; however, these values and related records were not changed or discarded from the experiments.

Table 2. Descriptive statistics for dependent variables “Duration_1” and “Duration_2”

Measurement          Duration_1 (seconds)   Duration_2 (seconds)
Mean                 2365.42                2043.78
Standard deviation   7234.56                5964.20
Minimum              0                      0
Maximum              288720                 226140

Design and Implementation

The architecture and design of the specific MLP model used in this study was implemented and coded with the Python 3.7.1 programming language (Python 2019) and the TensorFlow r1.11.0 machine learning
framework (TensorFlow 2019). TensorFlow can be simply described as “a software framework for numerical computations that is designed primarily as an interface for expressing and implementing machine learning algorithms, especially, deep neural networks” (Hope et al. 2017). Since the MLP model proposed in this study had a relatively deeper architecture (than ordinary MLPs) with four hidden layers, 33088 weights, and many hyper-parameter updates over a large training dataset of more than 100,000 instances, it was necessary to use such a framework. The proposed multilayer perceptron model's training parameters, which were used with both of the data sets, were as follows: gradient descent was used with mini-batch training (Goodfellow et al. 2017), where the batch size was set to 256. The learning rate was set to 0.01, the momentum was set to 0.02, and no decay method was used for the learning rate. It should be noted that some different adaptive learning rate models such as ADAM and RMSProp were tested; however, the results were not as good as the results obtained by gradient descent with momentum. The dropout method (Buduma and Locascio 2017; Goodfellow et al. 2017) was used to overcome potential overfitting problems, and the dropout rate was set to 0.5 among all of the hidden layers and the output layer. This dropout rate was satisfactory, since the differences between the errors obtained from the training, validation, and test phases were negligible; thus, the overfitting problem was not observed. It should also be mentioned that L1 and L2 regularization methods (Goodfellow et al. 2017) were not included in the model, because the dropout method was observed to be sufficient against overfitting. The number of epochs was set to 1000 for the training process, where an epoch means one training pass over the whole data set.
The architecture of the multilayer perceptron in this study was designed and implemented with an input layer with 34 nodes, one output layer with one node, and four hidden layers, where the first two hidden layers had 128 nodes each and the third and fourth hidden layers were composed of 64 nodes each. The input layer of the MLP was composed of 34 nodes because there were 24 feature attributes in the data sets. Among these 24 feature attributes, 22 of them were numeric attributes with integer values, which necessitated 22 input nodes. Patient's gender was a nominal attribute, and since it had only two values, it could be represented by a single digit, which could be established with a single input node in the neural network architecture. The department name was another nominal attribute that had eleven different values, which required the addition of eleven input nodes. Hence, 34 nodes were used in the input layer of the ANN model in this study. The ANN model proposed in this study was composed of four hidden layers, in contrast to most ordinary MLPs, which have only one hidden layer. It should also be noted that the ReLU function was used as the non-linear activation function within all of the hidden layers, instead of the hyperbolic tangent or sigmoid function. The total number of nodes in the multilayer perceptron model was 419 (since 34 + 128 + 128 + 64 + 64 + 1 = 419), and each layer was fully connected to the following layer. Consequently, the number of weights used in this model was 33088, which can be calculated as (34 x 128) + (128 x 128) + (128 x 64) + (64 x 64) + (64 x 1). The random initialization of the weight and bias values was achieved by the Xavier initialization method (Patterson and Gibson 2017). A simple representative diagram of the MLP architecture that was designed and used in this study is given in Fig. 4.
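The study's model was coded against TensorFlow r1.11; as a framework-free illustration, the reported architecture can be reproduced in plain NumPy to verify the node and weight counts and run an untrained forward pass. The linear output node, the omission of dropout, and the untrained weights are assumptions of this sketch, not details taken from the original code.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [34, 128, 128, 64, 64, 1]   # input, four hidden layers, output

# Xavier-style initialization for each fully connected layer.
weights = [rng.uniform(-np.sqrt(6 / (i + o)), np.sqrt(6 / (i + o)), (i, o))
           for i, o in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(o) for o in layer_sizes[1:]]

def forward(x):
    # ReLU in every hidden layer; the single output node is linear here.
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)
    return x @ weights[-1] + biases[-1]

print(sum(layer_sizes))                    # 419 nodes in total
print(sum(W.size for W in weights))        # 33088 weights in total
print(forward(rng.random((1, 34))).shape)  # (1, 1): one prediction
```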
Fig. 4. Architecture of the MLP model proposed in this study with 4 hidden layers, 419 nodes, and 33,088 weights

Weka version 3.9.1 was used for all of the other machine learning algorithms. Weka (Weka 2019) is an open source data mining and machine learning software package developed in the Java programming language. The algorithms in Weka were comparatively tested against the proposed ANN model in this study. The coefficient of determination (R2), mean absolute error (MAE), and root mean squared error (RMSE) were used to measure and compare the prediction performances of the algorithms on the train, validation, and test results. The algorithms included in the experiments are briefly described in the following paragraphs.

k-nearest neighbors (k-NN) is a type of instance-based learning algorithm that can be used for both classification and regression problems (Aha et al. 1991). The nearest neighboring instances to a specific instance are found by simple distance measures such as Euclidean distance, Manhattan distance, Mahalanobis distance, and so on (Witten et al. 2011). The feature values are used to calculate the distance between two instances by such measures. If the "k" parameter is set to one, the closest neighbor with the shortest distance is chosen. Similarly, "k" can be set to 2, 3, and so on; the higher the k value, the more instances will be covered as the nearest neighboring samples to that instance or record. This algorithm is also described as a lazy learner, because the generalization of the training data is delayed until a test request is made (Hendrickx and Bosch 2005). In other words, there is no global approximation of the target function or model during the training phase in k-NN; k-NN does not generate a model after the training phase.
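The distance computation at the heart of k-NN can be sketched as follows; the helper names are illustrative, and only the Euclidean measure mentioned above is shown.

```python
def euclidean_distance(a, b):
    """Euclidean distance between two instances given as feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def k_nearest(instances, query, k=3):
    """Return the k training instances closest to the query; a k-NN
    regressor would then average their target values as the prediction."""
    return sorted(instances, key=lambda inst: euclidean_distance(inst, query))[:k]

# The two closest neighbors of (0, 0) among three points:
neighbors = k_nearest([(3.0, 4.0), (1.0, 1.0), (0.0, 2.0)], (0.0, 0.0), k=2)
```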
Multiple linear regression is a linear functional model that uses the Akaike criterion (Akaike 1981) for model selection. It is a linear machine learning model based on a linear equation in which all or some of the feature attributes are used as the independent parameters of the equation (Larose 2006). Hence, the linear equation represents the relation between the dependent variable (the attribute to be estimated or predicted) and the independent variables. Isotonic regression is an algorithm that learns an isotonic regression model by choosing the attribute that provides the lowest squared error (Witten et al. 2011). It can be derived by a non-linear isotonic curve, and it differs from all the linear regression models because isotonic regression is not constrained by any functional properties such as a linear equation or estimator parameters. Partial least squares (PLS) regression is a type of regression model that calculates derived directions which, as well as having high variance, are strongly correlated with the class (Witten et al. 2011). It is also named "Projection to Latent Structures", since it is based on projecting the predicted and observed variables into a new hyperspace and latent variables (Wold et al. 2001). M5 Model Tree (M5P) is derived from the M5Base algorithm, which implements base routines for generating M5 model trees and rules (Wang and Witten 1997). M5P can be described as a reconstruction of Quinlan's M5 algorithm for inducing trees of regression models (Quinlan 1992). M5P combines an ordinary decision tree model with linear regression functions at the nodes. M5P is one of the few decision tree algorithms that can be used for numerical prediction and regression, since most decision trees can only be used for classification problems. Single layer perceptron is a type of simple ANN that does not have a hidden layer.
It is composed of input nodes and an output node, where the activation function is a signum function (Haykin 2009). The input nodes are fully connected to the output node, and it can be used for either classification or regression problems. Since there are no hidden nodes, the success of this type of ANN is mostly limited to linear problems. However, it is one of the fastest ANN models during the training phase due to its simplicity and linearity. For instance, the weight updates are much faster and easier in a single layer perceptron than in multilayer perceptrons or deep learning models. Multilayer perceptron (MLP) is a type of feed-forward ANN algorithm that uses backpropagation with gradient descent for weight updates. The learning rate, momentum, number of iterations, number of hidden layers, and the number of nodes in the hidden layers can be flexibly changed. It can be used for both classification and numerical prediction (Witten et al. 2011). In most cases, MLPs are designed with a single hidden layer, preferably with a few nodes in that hidden layer, where all the nodes are fully connected to each other. In such MLPs, the non-linear activation function used in the outputs of the hidden nodes and output nodes is usually the sigmoid (logistic) or hyperbolic tangent function, but some other non-linear activation functions are used as well (Haykin 2009). Radial Basis Function (RBF) Regressor is a numerical prediction model that implements RBF networks, trained in a fully supervised manner by minimizing squared error with the BFGS (Broyden–Fletcher–Goldfarb–Shanno) method (Eibe 2014). RBF networks are a type of ANN that uses radial basis functions as activation functions. An RBF network is generally composed of one input layer, one hidden layer, and an output layer.
The output of the RBF network is a combination of radial basis functions of the inputs and node parameters (Broomhead and Lowe 1988).

The parameter settings for the eight machine learning algorithms are given in Table 3, where some of them were used with their default values in Weka and some were set to alternative values according to observations made during the experiments. It should be noted that some other linear numerical prediction algorithms, such as simple linear regression and pace regression, as well as some nonlinear regression algorithms, were not included in this study, because the performance results obtained by those algorithms were much lower and more inaccurate in terms of RMSE, MAE, and R2.

Table 3. Parameter settings of the algorithms in Weka, which were used in this study

  k-NN: k = 3; distance metric: Euclidean distance; weighted distance: not used; all other parameters: default values
  Linear regression: all parameters: default values
  Isotonic regression: all parameters: default values
  Partial least squares regression: all parameters: default values
  M5 Model Tree (M5P): minimum number of instances at leaf nodes: 7; all other parameters: default values
  RBF Regressor: all parameters: default values
  Single layer perceptron: transfer function: signum; learning rate: 0.9 with linear decay; number of iterations: 1000; all other parameters: default values
  Multilayer perceptron: all parameters: default values, namely: number of hidden layers: 1; number of nodes in the hidden layer: 23; learning rate: 0.3; momentum: 0.2; number of iterations: 500; decay in learning rate: no; activation function: sigmoid

Results and Discussion

All of the tests and executions, both of the algorithms in Weka and of the ANN model, were conducted on a hardware platform with a 64-bit computer having an Intel Core i7 2.60 GHz central processing unit and 16 gigabytes of random-access memory. The comparative results obtained by all of the algorithms during the train, validation, and test phases are given in Table 4 and Table 5. In these tables, eight machine learning algorithms were executed and tested with the Weka software, and the results for the last one were obtained with the ANN model implemented and developed by the author of this study. The R2, root mean squared error, and mean absolute error of the min-max normalized Duration_1 and Duration_2 prediction values obtained by the different algorithms and models are comparatively denoted in Table 4 and Table 5. It should be noted that there are different options and parameters in Weka for the execution of these algorithms, and they were tested with different parameters as well as their default parameters. In most cases, it was observed that changing the parameters degraded the performance of the algorithms, so only the results obtained with their default configurations are included in the tables below. The training times, in minutes, of each algorithm for the prediction of Duration_1 and Duration_2 are given in Table 6. The validation and test results show that the ANN model proposed in this study obtained the best and most accurate results for the prediction of Duration_1 and Duration_2, with the lowest RMSE and MAE values and the highest R2 values.
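The R2, RMSE, and MAE values reported in the tables can be reproduced from any set of predictions; a minimal plain-Python sketch (the function name is illustrative, and Weka's own output may differ in rounding):

```python
def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for lists of observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = (sum(e * e for e in errors) / n) ** 0.5
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # 1.0 is a perfect fit; 0.0 is no better than the mean
    return r2, rmse, mae

# Predicting the mean for every instance yields R2 = 0.0:
r2, rmse, mae = regression_metrics([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```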
As explained previously in this chapter, the training performance is usually deceptive; hence, the validation and test performances should be the focus.

Table 4. Comparative performance results for the prediction of Duration_1

                                      Train                     Validation                Test
  Algorithm name                      R2      RMSE    MAE       R2      RMSE    MAE       R2      RMSE    MAE
  k-NN                                0.5315  0.0134  0.0045    0.4089  0.0261  0.0072    0.4076  0.0262  0.0071
  Linear regression                   0.0229  0.0460  0.0095    0.0153  0.0557  0.0117    0.0148  0.0590  0.0119
  Isotonic regression                 0.1798  0.0269  0.0065    0.1284  0.0542  0.0095    0.1139  0.0537  0.0095
  Partial least squares regression    0.4031  0.0468  0.0146    0.2879  0.0422  0.0141    0.2870  0.0459  0.0143
  M5 Model Tree                       0.5128  0.0437  0.0068    0.3419  0.0428  0.0074    0.3409  0.0432  0.0074
  RBF Regressor                       0.0036  0.0556  0.0107    0.0018  0.0544  0.0105    0.0012  0.0579  0.0102
  Single layer perceptron             0.1372  0.9846  0.8964    0.0915  0.9960  0.9980    0.0914  0.9993  0.9960
  Multilayer perceptron               0.9850  0.0272  0.0051    0.9037  0.0276  0.0097    0.9028  0.0275  0.0094
  ANN (proposed model)                0.9632  0.0092  0.0017    0.9631  0.0090  0.0016    0.9627  0.0091  0.0017

Table 5. Comparative performance results for the prediction of Duration_2

                                      Train                     Validation                Test
  Algorithm name                      R2      RMSE    MAE       R2      RMSE    MAE       R2      RMSE    MAE
  k-NN                                0.5455  0.0144  0.0029    0.4546  0.0351  0.0048    0.4533  0.0352  0.0048
  Linear regression                   0.0575  0.0242  0.0047    0.0479  0.0345  0.0054    0.0465  0.0346  0.0055
  Isotonic regression                 0.1826  0.01605 0.0035    0.1014  0.0336  0.0050    0.1127  0.0330  0.0051
  Partial least squares regression    0.1849  0.0346  0.0067    0.1027  0.0338  0.0067    0.1024  0.0339  0.0066
  M5 Model Tree                       0.6550  0.0276  0.0020    0.3639  0.0287  0.0031    0.3628  0.0273  0.0029
  RBF Regressor                       0.0039  0.0341  0.0051    0.0021  0.0354  0.0056    0.0022  0.0355  0.0056
  Single layer perceptron             0.0007  0.9836  0.9765    0.0004  0.9952  0.9884    0.0004  0.9980  0.9864
  Multilayer perceptron               0.9808  0.0217  0.0017    0.9082  0.0213  0.0023    0.9054  0.0219  0.0024
  ANN (proposed model)                0.9901  0.0088  0.0011    0.9898  0.0083  0.0010    0.9893  0.0087  0.0010

Table 6. Training times for Duration_1 and Duration_2

  Algorithm name                      Training time for Duration_1 (minutes)   Training time for Duration_2 (minutes)
  k-NN                                9.16                                     8.65
  Linear regression                   7.83                                     7.50
  Isotonic regression                 75.05                                    81.34
  Partial least squares regression    19.26                                    22.08
  M5 Model Tree                       56.49                                    58.15
  RBF Regressor                       14.85                                    16.02
  Single layer perceptron             157.30                                   155.94
  Multilayer perceptron               1075.92                                  1104.72
  ANN (proposed model)                1843.14                                  1867.30

It can also be seen from Table 4 and Table 5 that the proposed model yields the smallest differences between the train and validation, and the train and test, performance measures, which shows that it has the lowest risk of overfitting. On the other hand, regarding the time spent during the training phase, the proposed ANN model had the worst performance among all of the algorithms (around 31 hours), as given in Table 6. This was neither unexpected nor unusual, because the proposed model is much more complex than all of the other algorithms and models used in this study. It was previously mentioned that the RMSE and MAE values given in Table 4 and Table 5 are for the min-max normalized values of Duration_1 and Duration_2. Those values can easily be de-normalized, and the de-normalized values might indicate results that are more meaningful for decision makers and managers. For instance, regarding the proposed ANN model, the de-normalized MAE for the train, test, and validation phases of Duration_1 is 491, 462, and 491 seconds, respectively. Similarly, the de-normalized MAE for the train, test, and validation phases of Duration_2 is 318, 289, and 289 seconds, respectively. The weighted averages of MAE for Duration_1 and Duration_2 are 486 and 309 seconds. It is also meaningful to look at these weighted averages of MAE, since 70% of the whole data set was used for training, 15% for validation, and 15% for testing. Hence, it can be deduced that the proposed ANN model provided very accurate and reliable predictions both for Duration_1 and for Duration_2, which can be seen by comparing the descriptive statistics in Table 2 with the results obtained from the weighted averages of MAE. The statistical independence between the variables "Duration_1" and "Duration_2" and all of the corresponding independent feature variables was also examined. The correlations between "Duration_1" and "Duration_2" and the corresponding independent variables were analyzed by means of the Pearson correlation coefficient (Larose 2006). No significant correlations between the target variables and the feature attributes were observed; the correlation coefficient values, which are given in Table 7 and Table 8, were close to zero.
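The Pearson correlation coefficient used for this analysis can be computed directly from two numeric columns; a minimal plain-Python sketch (the function name is illustrative):

```python
def pearson_cc(x, y):
    """Pearson correlation coefficient between two numeric variables:
    covariance of the centered values divided by the product of their
    standard deviations (values near zero indicate no linear relation)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# A perfectly linear relation gives a coefficient of 1.0:
cc = pearson_cc([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```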
Table 7. Pearson correlations between dependent variable "Duration_1" and independent variables

  Independent variable name (feature attribute)                              Pearson cc
  Patient's age                                                               0.002
  Previous admissions (This department)                                       0.011
  Previous admissions (Other departments)                                     0.003
  Day of month (Patient registration)                                        -0.045
  Hourly time interval (Patient registration)                                -0.008
  Day of month (Request for physician)                                       -0.075
  Hourly time interval (Request for physician)                                0.018
  Day of month (Specimen collection)                                          0.006
  Hourly time interval (Specimen collection)                                  0.034
  Day of month (Sample delivery to laboratory)                               -0.002
  Hourly time interval (Sample delivery to laboratory)                       -0.009
  Day of month (Report of test results)                                      -0.014
  Hourly time interval (Report of test results)                              -0.001
  Day of month (Report delivery to patient)                                   0.002
  Hourly time interval (Report delivery to patient)                           0.053
  Day of month (Visit to physician)                                          -0.028
  Hourly time interval (Visit to physician)                                  -0.086
  Duration between Request for physician & Patient registration               0.097
  Duration between Specimen collection & Request for physician                0.162
  Duration between Report of test results & Sample delivery to laboratory     0.089
  Duration between Report delivery to patient & Report of test results        0.115
  Duration between Visit to physician & Report delivery to patient            0.027

Table 8. Pearson correlations between dependent variable "Duration_2" and independent variables

  Independent variable name (feature attribute)                              Pearson cc
  Patient's age                                                               0.001
  Previous admissions (This department)                                       0.014
  Previous admissions (Other departments)                                     0.005
  Day of month (Patient registration)                                        -0.002
  Hourly time interval (Patient registration)                                -0.001
  Day of month (Request for physician)                                       -0.152
  Hourly time interval (Request for physician)                                0.074
  Day of month (Specimen collection)                                          0.027
  Hourly time interval (Specimen collection)                                  0.086
  Day of month (Sample delivery to laboratory)                               -0.001
  Hourly time interval (Sample delivery to laboratory)                       -0.004
  Day of month (Report of test results)                                      -0.033
  Hourly time interval (Report of test results)                              -0.026
  Day of month (Report delivery to patient)                                   0.018
  Hourly time interval (Report delivery to patient)                           0.084
  Day of month (Visit to physician)                                          -0.002
  Hourly time interval (Visit to physician)                                  -0.017
  Duration between Request for physician & Patient registration               0.153
  Duration between Specimen collection & Request for physician                0.132
  Duration between Sample delivery to laboratory & Specimen collection        0.065
  Duration between Report delivery to patient & Report of test results        0.091
  Duration between Visit to physician & Report delivery to patient            0.027

Conclusions and Recommendations

This study shows that an ANN with a deepened structure of four hidden layers can be a reliable and successful model for the numerical prediction of service durations among different medical services in hospitals and other medical institutions. It can also be deduced that
ANN is usually more successful than other machine learning algorithms in such prediction tasks. It was observed that the training time for this ANN model was much longer than for all the other machine learning algorithms. This might be considered the only drawback of the ANN model. However, it should not be regarded as a serious problem, because tasks such as the analysis of durations do not need to be real-time or very fast. In addition, the requirement for long computation times applies only to the training phase; in other words, after the ANN has been trained, it runs much faster during testing or within live systems. Another general criticism of ANNs is that "it is a black-box model that doesn't have explicit rules or a simple algorithm". However, if an ANN is to be used for specific tasks or purposes in business, such as the case explained in this study, this might not be considered a problem. In other words, if the design, architecture, technical details, and maintenance of the system do not have to be communicated to decision makers and managers, which is true for cases like the one in this study, then the management does not have to know or understand the model; they can instead focus on the reliability, accuracy, and success of the results. A new version of this MLP model could be designed and implemented as a future study, which might overcome the computational complexity and eventually decrease the time necessary for training the model. Alternatively, this MLP model could be executed on another hardware platform with higher CPU and RAM capacity and, most importantly, with a graphics card having high GPU capacity.
The proposed model's architecture and hyper-parameters might also be changed: more hidden layers, different numbers of nodes within each layer, different learning and momentum rates, different batch sizes, different dropout rates, alternative activation functions such as leaky ReLU, and so on. All of these alternatives might be tested, and their performances could be comparatively analyzed against the proposed model in this study. Some of the features used in this study might be discarded with the aid of feature selection / reduction methodologies such as statistical ranker algorithms, feature subset selection algorithms, Relief, Wrapper models, and so on. After feature reduction, the new data set might be tested with the same model proposed in this study and the results might be compared. The scope of this study could be extended to many different business areas besides health information systems; hence, the proposed model could be flexibly adapted to the prediction of service times in any business such as logistics, manufacturing, telecommunications, etc., and it could be integrated into any Industry 4.0 model or application as well.

References

Aha, D. W., Kibler, D., & Albert, M. K. (1991). "Instance-based learning algorithms", Machine Learning, Vol. 6 No. 1, pp. 37–66.
Akaike, H. (1981). "Likelihood of a model and information criteria", Journal of Econometrics, Vol. 16 No. 1, pp. 3–14.
Bernardi, R., Constantinides, P., & Nandhakumar, J. (2017). "Challenging Dominant Frames in Policies for IS Innovation in Healthcare through Rhetorical Strategies", Journal of the Association for Information Systems, Vol. 18 No. 2, pp. 81–112.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning, Springer Science + Business Media LLC, New York.
Breil, B., Fritz, F., Thiemann, V., & Dugas, M. (2011).
"Mapping turnaround times (TAT) to a generic timeline: a systematic review of TAT definitions in clinical domains", BMC Medical Informatics and Decision Making, Vol. 11 No. 34, pp. 1–12.
Broomhead, D. S., & Lowe, D. (1988). "Radial basis functions, multi-variable functional interpolation and adaptive networks (Technical report)", Royal Signals and Radar Establishment, Report No. 4148, UK.
Buduma, N., & Locascio, N. (2017). Fundamentals of Deep Learning, O'Reilly Media, Inc., USA.
Chen, S. (2014). "Information needs and information sources of family caregivers of cancer patients", Aslib Journal of Information Management, Vol. 66 No. 6, pp. 623–639.
Dasu, T., & Johnson, T. (2003). Exploratory Data Mining and Data Cleaning, John Wiley & Sons Inc., New Jersey.
Eibe, F. (2014). Fully supervised training of Gaussian radial basis function networks in WEKA. Retrieved from https://www.cs.waikato.ac.nz/~eibe/pubs/rbf_networks_in_weka_description.pdf
Eminağaoğlu, M., & Vahaplar, A. (2018). "Turnaround Time Prediction for a Medical Laboratory Using Artificial Neural Networks", International Journal of Informatics Technologies, Vol. 11 No. 4, pp. 357–368.
Fieri, M., Ranney, N. F., Schroeder, E. B., Van Aken, E. M., & Stone, A. H. (2010). "Analysis and improvement of patient turnaround time in an emergency department", in Proceedings of the 2010 IEEE Systems and Information Engineering Design Symposium, University of Virginia, Charlottesville, VA, USA, 2010, pp. 239–244.
Goodfellow, I., Bengio, Y., & Courville, A. (2017). Deep Learning, The MIT Press, USA.
Goswami, B., Singh, B., Chawla, R., Gupta, V. K., & Mallika, V. (2010). "Turnaround time (TAT) as a benchmark of laboratory performance", Indian Journal of Clinical Biochemistry, Vol. 25 No. 4, pp. 376–379.
Graves, A. (2012). Supervised Sequence Labelling with Recurrent Neural Networks, Springer-Verlag, Berlin.
Han, J., & Kamber, M.
(2006). Data Mining: Concepts and Techniques, 2nd edition, Morgan Kaufmann Publishers, San Francisco.
Hand, D., Mannila, H., & Smyth, P. (2001). Principles of Data Mining, The MIT Press, London.
Hassanpour, M., Vaferi, B., & Masoumi, M. E. (2018). "Estimation of pool boiling heat transfer coefficient of alumina water-based nanofluids by various artificial intelligence (AI) approaches", Applied Thermal Engineering, Vol. 128, pp. 1208–1222.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd edition, Springer, New York.
Haykin, S. (2009). Neural Networks and Learning Machines, 3rd edition, Pearson Education, Inc., New Jersey.
He, C., Fan, X., & Li, Y. (2013). "Toward Ubiquitous Healthcare Services with a Novel Efficient Cloud Platform", IEEE Transactions on Biomedical Engineering, Vol. 60 No. 1, pp. 230–234.
Hendrickx, I., & van den Bosch, A. (2005). "Hybrid algorithms with Instance-Based Classification", in Proceedings of the 16th European Conference on Machine Learning: ECML 2005, Porto, Portugal, 2005, pp. 158–169.
Hope, T., Yehezkel, S. R., & Lieder, I. (2017). Learning TensorFlow: A Guide to Building Deep Learning Systems, O'Reilly Media, Inc., USA.
Huang, W., Lai, K. K., Nakamori, Y., & Wang, S. (2004). "Forecasting Foreign Exchange Rates with Artificial Neural Networks: A Review", International Journal of Information Technology & Decision Making, Vol. 3 No. 1, pp. 145–165.
Köksal, H., Eminağaoğlu, M., & Türkoğlu, B. (2016). "An Adaptive Network-Based Fuzzy Inference System for Estimating the Duration of Medical Services: A Case Study", in Proceedings of the 10th IEEE International Conference on Application of Information and Communication Technologies, Baku, Azerbaijan, pp. 801–806.
Kumar, S. (2017). Neural Networks – A Classroom Approach, 2nd edition, McGraw-Hill, New Delhi.
Larose, D. T. (2005).
Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons Inc., New Jersey.
Larose, D. T. (2006). Data Mining Methods and Models, John Wiley & Sons Inc., New Jersey.
Lyon, A. R., Wasse, J. K., Ludwig, K., Zachry, M., Bruns, E. J., Unutzer, J., & McCauley, E. (2016). "The Contextualized Technology Adaptation Process (CTAP): Optimizing Health Information Technology to Improve Mental Health Systems", Administration and Policy in Mental Health and Mental Health Services Research, Vol. 43 No. 3, pp. 394–409.
Mitchell, T. M. (2017). Machine Learning, McGraw-Hill, India.
Nedjah, N., Luiza, M. M., & Kacprzyk, J. (2009). Innovative Applications in Data Mining, Springer-Verlag, Berlin.
Ng, X. W., & Chung, W. Y. (2012). "VLC-based Medical Healthcare Information System", Biomedical Engineering: Applications, Basis and Communications, Vol. 24 No. 2, pp. 155–163.
Olivas, E. S., Guerrero, J. D. M., Sober, M. M., Benedito, J. R. M., & Lopez, A. J. S. (2009). Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, IGI Publishing, USA.
Patterson, J., & Gibson, A. (2017). Deep Learning: A Practitioner's Approach, O'Reilly Media, Inc., USA.
Python (2019). Programming language. Retrieved from https://www.python.org/downloads/windows/
Quinlan, J. R. (1992). "Learning with Continuous Classes", in Proceedings of the 5th Australian Joint Conference on Artificial Intelligence, Singapore, 1992, pp. 343–348.
Deo, R. C., & Şahin, M. (2015). "Application of the extreme learning machine algorithm for the prediction of monthly Effective Drought Index in eastern Australia", Applied Soft Computing, Vol. 153, pp. 512–525.
Reyes, J., Morales-Esteban, A., & Martinez-Alvarez, F. (2013). "Neural networks to predict earthquakes in Chile", Applied Soft Computing, Vol. 13 No. 2, pp. 1314–1328.
Samudrala, S. (2019). Machine Intelligence: Demystifying Machine Learning, Neural Networks and Deep Learning, Notion Press, Chennai.
Scagliarini, M., Apreda, M., Wienand, U., & Valpiani, G. (2016). "Monitoring operating room turnaround time: a retrospective analysis", International Journal of Health Care Quality Assurance, Vol. 29 No. 3, pp. 351–359.
Sinreich, D., & Marmor, Y. (2005). "Ways to reduce patient turnaround time and improve service quality in emergency departments", Journal of Health Organization and Management, Vol. 19 No. 2, pp. 88–105.
Söderholm, H. M., & Sonnenwald, D. H. (2010). "Visioning Future Emergency Healthcare Collaboration: Perspectives from Large and Small Medical Centers", Journal of the American Society for Information Science and Technology, Vol. 61 No. 9, pp. 1808–1823.
Storrow, A. B., Zhou, C., Gaddis, G., Han, J. H., Miller, K., Klubert, D., Laidig, A., & Aronsky, D. (2008). "Decreasing Lab Turnaround Time Improves Emergency Department Throughput and Decreases Emergency Medical Services Diversion: A Simulation Model", Academic Emergency Medicine, Vol. 15 No. 11, pp. 1130–1135.
Stvilia, B., Mon, L., & Yi, Y. J. (2009). "A Model for Online Consumer Health Information Quality", Journal of the American Society for Information Science and Technology, Vol. 60 No. 9, pp. 1781–1791.
TensorFlow (2019). An open source machine learning framework for everyone. Retrieved from https://github.com/tensorflow/tensorflow/blob/master/RELEASE.md
Wang, Y., & Witten, I. H. (1997). "Induction of model trees for predicting continuous classes", in Poster Papers of the 9th European Conference on Machine Learning, Prague, Czech Republic, 1997.
Weka (2019). Data Mining Software in Java. Retrieved from http://www.cs.waikato.ac.nz/ml/weka/
Willoughby, K. A., Chan, B. T. B., & Strenger, M. (2010). "Achieving wait time reduction in the emergency department", Leadership in Health Services, Vol. 23 No. 4, pp. 304–319.
Witten, I. H., Frank, E., & Hall, M. A. (2011).
Data Mining: Practical Machine Learning Tools and Techniques, 3rd edition, The Morgan Kaufmann Series in Data Management Systems, Burlington.
Wold, S., Sjöström, M., & Eriksson, L. (2001). "PLS-regression: a basic tool of chemometrics", Chemometrics and Intelligent Laboratory Systems, Vol. 58 No. 2, pp. 109–130.
Yeh, I-C., & Lien, C. (2009). "The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients", Expert Systems with Applications, Vol. 36, pp. 2473–2480.
Yu, L., Lai, K. K., & Wang, S. Y. (2006). "Currency Crisis Forecasting with General Regression Neural Networks", International Journal of Information Technology & Decision Making, Vol. 5 No. 3, pp. 437–454.

Key Terms

Activation function
Artificial neural networks
Backpropagation
Gradient descent
Health information systems
Machine learning
Multilayer perceptron
Normalization
Numerical prediction
Overfitting
Regression
Service duration

Questions for Further Study

What are the common properties of activation functions such as sigmoid and hyperbolic tangent, which make them preferable in multilayer perceptron models?
Suppose that you have obtained very low MAE and RMSE values, but R2 is observed to be close to zero for your regression problem. What would be the reason(s) for this?
For any given data and problem, how can you tell that a machine learning algorithm is properly trained, or that it has actually learned? Discuss it.
If accurate prediction of durations for business processes is a goal, why and how could this be valuable or helpful for decision makers?
Why is overfitting a problematic / risky issue in machine learning systems? Describe it by giving an example.
Define two basic reasons / causes that make min-max normalization necessary in data transformations.
Exercises

Suppose that you are an information technology (IT) expert working on a new business intelligence project that must be implemented with an artificial neural network (ANN) model. How would you design the ANN for that project? What should be included in your requirements analysis for the ANN model and the project? How would you construct the architecture of the ANN, and how would you choose its functions and parameters?

In this chapter, a simple method for the transformation of categorical / nominal attributes into numerical attributes is explained. What about the opposite case? In other words, which methods or techniques can be used for the transformation of numerical values into nominal values? Do a short piece of research and make a list of different techniques for numeric-to-nominal transformation. (You can conduct your research by using some of the relevant resources given in the "References" section of this chapter, or you can use any resource you would like, such as the Internet.)

As a software engineer, you are going to develop a new software tool that is composed of several machine learning algorithms. How would you decide upon the programming language and software architecture for this tool? Make a list of all the necessary criteria and requirements for this decision, and use any kind of qualitative / quantitative assessment methodology to rank and prioritize these criteria. (Note: you should focus on "people" and "process" factors as well as "technology". For instance, you must keep in mind that this tool will be used by non-technical business analysts and decision makers; hence ease-of-use, similarity, flexibility, maintainability, costs, etc. must also be considered.)

Suppose that you are an IT consultant for a company that is planning to develop a smart Industry 4.0 automation tool for a specific business case by the aid of a machine learning approach.
There are some strict constraints for this tool. For instance, the tool will be installed and used in small digital circuits and sensors with very low data storage and computation capacity. In addition, this tool must operate very fast; hence, the training phase of the machine learning algorithm must also be fast as possible. Which type of machine learning algorithms would you recommend first? Would it be a good idea to recommend and use ANN for this tool? Why? Further Reading Deloitte Access Economics (2017). “Business Impacts of Machine Learning (Technical Report)”, Deloitte Touché Tohmatsu Limited, Australia. Deloitte (2018). “Artificial Intelligence Innovation Report”, Deloitte Touché Tohmatsu Limited, UK. 12 12 Further Reading 285 Deshmukh, P. S. (2018) "Travel Time Prediction using Neural Networks: A Literature Review," in IEEE International Conference on Information, Communication, Engineering and Technology, Pune, India, 2018, pp. 1–5. Gudivada, A., & Tabrizi, N. (2018) "A Literature Review on Machine Learning Based Medical Information Retrieval Systems," in IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 2018, pp. 250–257. Hall, P., Phan, W., & Whitson, K. (2016). The Evolution of Analytics: Opportunities and Challenges for Machine Learning in Business, O’Reilly Media, Inc., USA. Hinton, G. E., & Salakhutdinov, R. (2006). “Reducing the dimensionality of data with neural networks”, Science, Vol. 313, Issue 5786, pp. 504–507. LeCun, Y., Bengio, Y., & Hinton, G. E. (2015). “Deep Learning”, Nature, Vol. 521, pp. 436–444. Özdemir, Ş., Erkollar, A., & Oberer, B. (2018). “Transformation of The Machines from Learner to Decision Maker: Industry 4.0 and Big Data”, Mugla Journal of Science and Technology, Vol. 4, No. 2, pp. 219–223. PricewaterhouseCoopers (2018). “Sizing the prize: What’s the real value of AI for your business and how can you capitalise? (Technical Report)”, PwC Pub., UK. Tealab, A. (2018). 
“Time series forecasting using artificial neural networks methodologies: A systematic review”, Future Computing and Informatics Journal, Vol. 3, Issue 2, pp. 334–340. Machine learning approaches for prediction of service times in health information systems 286
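As a starting point for the ANN design exercise above, the sketch below shows a bare forward pass through a multilayer perceptron with four hidden layers, echoing the architecture discussed in this chapter. It is a minimal NumPy illustration only: the layer sizes, initialization scale, and ReLU/linear activation choices are assumptions for the sketch, not the chapter's exact model.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Create (weights, biases) pairs for each layer of the perceptron."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((n_in, n_out)) * 0.1, np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, x):
    """Forward pass: ReLU on hidden layers, linear output for regression."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:       # all but the last layer are hidden
            x = np.maximum(x, 0.0)    # ReLU activation
    return x                          # linear output, e.g. a service time

# Hypothetical sizes: 8 input features, four hidden layers, 1 numeric output
params = init_mlp([8, 64, 32, 16, 8, 1])
y = forward(params, np.zeros((5, 8)))  # batch of 5 dummy records
```

In a real design, the requirements analysis from the exercise would drive the choices this sketch hard-codes: the number and width of hidden layers, the activation functions, and the training procedure (which is omitted here entirely).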
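For the numeric-to-nominal transformation exercise, one of the simplest techniques the reader is likely to find is equal-width binning (discretization): the numeric range is split into a fixed number of equally wide intervals, and each value is replaced by the label of its interval. The sketch below is a minimal illustration of that single technique; the function name, labels, and sample values are hypothetical.

```python
def equal_width_bins(values, n_bins=3, labels=None):
    """Map numeric values to nominal labels via equal-width binning."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if labels is None:
        labels = ["bin_%d" % i for i in range(n_bins)]
    nominal = []
    for v in values:
        # Clamp the index so the maximum value falls into the last bin,
        # and guard against a zero-width range (all values equal).
        idx = min(int((v - lo) / width), n_bins - 1) if width else 0
        nominal.append(labels[idx])
    return nominal

# Hypothetical service times (minutes) mapped to three nominal classes
classes = equal_width_bins([1, 2, 3, 10, 20, 30], 3, ["low", "mid", "high"])
```

Other answers to the exercise include equal-frequency binning (each bin holds the same number of values) and supervised, entropy-based discretization; the equal-width version shown here is merely the most direct counterpart to the nominal-to-numeric method described in the chapter.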

### Abstract

Organizations have always been dependent on communication, information, technology and their management. The development of information technology has heightened the importance of management information systems, an emerging discipline combining various aspects of informatics, information technology, and business management. Understanding the impact of information on today's organizations requires both technological and managerial perspectives, and management information systems offer both.

Business management is not only about generating greater returns and using new technologies for developing businesses to reach future goals. Business management also means generating better revenue performance if plans are diligently followed.

It is part of business management to keep an ear to the ground regarding global economic trends, changing environmental conditions and preferences, and the behavior of value chain partners. While business management and management information systems have until now mostly been treated as independent fields, this publication takes an interest in the cooperation of the two. Its contributions focus on both research areas and practical approaches, presenting novelties in the area of enterprise and business management.

Main topics covered in this book are technology management, software engineering, knowledge management, innovation management and social media management.

This book adopts an international view, combines theory and practice, and is authored for researchers, lecturers, students as well as consultants and practitioners.