
IBM’s SPSS Modeler enhances the efficiency of analytics

Analytics requires tools, and both free and commercial options are available. Analytics tools can also be divided into code-controlled software and software with a graphical user interface. The operating methods, features, performance, and price of the tools vary greatly, so in addition to price, the skills of the analyst, the time available, and the targeted level of analysis determine which tool is best suited to each purpose.

A good rule of thumb is that commercial options are graphical products and free options are analytics tools that require the use of a programming language. Of course, there are some exceptions to this rule. The most common commercial products are SAS and IBM SPSS Modeler. The most popular free programming languages used for data mining, modelling, and visualization are R and Python. Working with SAS Enterprise Miner and SPSS Modeler is based on a workflow model: the process is assembled from ready-made components and run as a flow. Code-controlled R and Python, on the other hand, require the analyst to write code. This demands far more advanced programming skills, which raises the threshold for starting to use these tools.
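To make the contrast concrete, the following is a minimal sketch of what a code-controlled flow can look like in Python. It mirrors the source, preparation, and modelling steps that a graphical tool would show as connected components. The data, the field names, and the toy "model" (an average-spend threshold) are invented for illustration and say nothing about how SAS or SPSS Modeler work internally.

```python
import csv
import io
from statistics import mean

# Hypothetical inline data standing in for a source file.
RAW = """customer,age,spend
a,34,120.5
b,,80.0
c,51,310.0
d,45,95.5
"""

def read_source(text):
    """Source step: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def prepare(rows):
    """Preparation step: drop rows with a missing age and convert types."""
    return [
        {"customer": r["customer"], "age": int(r["age"]), "spend": float(r["spend"])}
        for r in rows
        if r["age"]
    ]

def fit_threshold(rows):
    """Toy modelling step: the average spend is used as a cut-off."""
    return mean(r["spend"] for r in rows)

def score(rows, threshold):
    """Scoring step: flag customers whose spend exceeds the cut-off."""
    return {r["customer"]: r["spend"] > threshold for r in rows}

# The "flow": each step feeds the next, just as connected components would.
rows = prepare(read_source(RAW))
threshold = fit_threshold(rows)
flags = score(rows, threshold)
print(flags)
```

Even in this small example, every step that a graphical tool would expose as a configurable component has to be written, named, and wired together by hand.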

The user interfaces of commercial graphical analytics tools are visual and intuitive, so they are quick to learn. Besides visualization, their main benefit is that processes are easy to modify. The main benefit of free programming languages is flexibility: the computer can be commanded to do nearly anything. In addition, many of the commercial products support embedded code, which brings them to the same level as the code-based products where required. Both the free and the commercial options allow processes to be automated, but this is slightly easier in commercial products.

I have seven years of experience in R programming and three and a half years in using IBM’s SPSS Modeler. For the past few years, I have been using SPSS Modeler and its companion products nearly every day in my work. Below is a summary of my experiences in using the above-mentioned products.

- Building a process in the R programming language takes around three times longer than building it in SPSS Modeler. The more data and the more data sources are involved, the greater Modeler's speed advantage over R.

Using Modeler and testing analysis processes in it is easy. Getting started is very quick because the data preparation and modelling components are placed directly into the workflow, after which their attributes are set with a few clicks. For a first-time user, Modeler is a significantly easier tool for achieving the desired results than the free options that require coding.

Analytical processes are often very extensive. Tracing the dependencies and relationships between their parts is laborious and the result is hard to interpret, so a factor can easily be overlooked when programming. In Modeler, managing, understanding, and editing such a process is significantly easier than reading and editing long program code. If, for example, the middle part of code that you have written from scratch changes, even a minor change can break the later steps of the process. If the solution follows best practices, the required changes are usually minimal.
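One possible reading of the best practice principle in code is to keep every step a small function with a fixed input and output shape, so that a change to a middle step stays local. The sketch below assumes this interpretation. The step names, the data, and the VAT rate are hypothetical and chosen only for illustration.

```python
from functools import reduce

# Hypothetical steps. The first two take and return a list of dicts
# with the same keys, so they can be swapped or removed independently.
def drop_missing(rows):
    """Remove rows that have no spend value."""
    return [r for r in rows if r["spend"] is not None]

def add_vat(rows, rate=0.24):
    """Middle step: the one most likely to change later."""
    return [{**r, "spend": round(r["spend"] * (1 + rate), 2)} for r in rows]

def total_by_region(rows):
    """Final step: sum spend per region."""
    totals = {}
    for r in rows:
        totals[r["region"]] = round(totals.get(r["region"], 0.0) + r["spend"], 2)
    return totals

def run_pipeline(data, steps):
    """Apply the steps in order, feeding each result into the next step."""
    return reduce(lambda acc, step: step(acc), steps, data)

data = [
    {"region": "north", "spend": 100.0},
    {"region": "north", "spend": None},
    {"region": "south", "spend": 50.0},
]

result = run_pipeline(data, [drop_missing, add_vat, total_by_region])
print(result)
```

Because the middle step keeps the same record layout, it can be edited or left out of the list of steps without touching the code before or after it. This is roughly what a graphical flow gives for free when a component is reconfigured or disconnected.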

In most cases, the functions included in SPSS Modeler are sufficient to complete the entire process. However, if the functionality needs to be extended, it is easy to add an embedded sub-process written in R.