How many times have you wished for more hours in the day so you could complete more tasks? A key goal of AI or machine learning automation is to have machines complete tasks for you, freeing up time so you can focus on more complex, higher-value work. However, there are simply not enough data scientists in the world to deliver on AI's potential. Data scientists building AI applications require numerous skills – data visualization, data cleansing, algorithm selection, and diagnostics. What if some of these data science tasks could be automated using AI, increasing data science productivity so teams can tackle more AI use cases?
Automating data science tasks leaves room to build more AI applications with the same amount of data science resources. For example, there is a plethora of software tools available to automatically develop predictive models from relational data, and according to Gartner, “By 2020, more than 40% of data science tasks will be automated, resulting in increased productivity and broader usage by citizen data scientists.” Until then, another Gartner survey shows that organizations are outsourcing various tasks in the machine learning pipeline.
Figure 4. The Outsourcing of Data Science Functions
Source: Gartner (April 2018)
Read the full Gartner report titled “Doing Machine Learning Without Hiring (More) Data Scientists” to learn proven solutions for successful pilots.
Machine learning has been part of the Birst platform from day one to automate the most complex analytic processes and accelerate the time to insight. If your data science resources are strained, here is an example data scientist workflow that leverages Birst to speed up tasks throughout the application development lifecycle – data preparation, model development and testing with R integration, deployment into the business, and governance and management.
1. Leverage analytic-ready data sets in Birst, with the ability to do additional self-service data blending in Birst Pronto.
2. Connect RStudio to a cleaned data set in Birst. Develop and test a model in RStudio.
3. Assess and validate the model with analysts and business users. Use Birst Pronto to implement the model by creating an R transform step that creates a new metric. This new metric can then be networked into the governed semantic layer.
4. Any business user or analyst can report on this new metric through a report or dashboard.
If you are a data scientist reading this, what are some of the basic or repetitive tasks you would like to see automated? We asked Senior Vice President and Chief Scientist Ziad Nejmeldeen, who leads Birst’s Dynamic Science Labs, to discuss what his team of data scientists is doing in the field of ML automation and AI, and where he sees the potential for AI to take over data science tasks so he can scale his organization.
1. Please describe the mission of your data science team.
We are the global Data Science team supporting all Infor industries & solutions. Our goal is to create science solutions that make better predictions or result in more optimal decisions. We build out full solutions using our internal development team and share science designs with other development teams when helping add new product features. Some examples of products we have built include pricing optimization for Distribution and inventory optimization for Healthcare.
2. What skills do you look for when hiring?
We work with customers directly, which requires our team not only to be analytical, but also to communicate the science behind the design effectively, listen and gather feedback, and update the science design based on that feedback. Therefore, we look for candidates who are highly analytical with great communication skills. Our recruits come from various disciplines – chemistry, physics, math, statistics, operations research, and economics. As part of the interview process, each candidate gives a talk on a subject they are passionate about, which lets us evaluate how well they answer follow-up questions. Fortunately, being in Cambridge has given us access to a strong pool of candidates.
3. Data science requires special skills in the areas of data cleansing, data visualization, algorithm selection, diagnostics, and more. What tools does your data science team use to perform these tasks and where would you like to see more automation using AI?
In the proof-of-concept phase, the lead scientist uses whatever combination of tools they feel best serves the analysis – R, Python, and various ML packages, with results sent to Birst for viewing. In the development phase, we use Java and are moving to Birst as our standard for UI.
We are working toward more automation in bridging data across different applications in a meaningful way through a library of standard application connectors. Today, an application connector allows you to connect and browse the data in the application. For example, in the future, a connector for a standard CRM application and a connector for a standard HR application would automatically produce analytic-ready data sets that join all relatable data.
This is important because insights are otherwise limited to the data set you manually pull together for analysis. What if the true underlying cause for an issue has not been pulled into the data set for analysis? With a library of standard application connectors that join relatable data together automatically, the system can find the true underlying cause for an issue.
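To make the idea of joining relatable data concrete, here is a minimal Python sketch. It is not Birst's connector implementation; the record layouts, the `employee_id` key, and the `join_on_key` helper are all hypothetical, chosen only to show how data from two applications could be combined into one analysis-ready set.

```python
from collections import defaultdict


def join_on_key(left_rows, right_rows, key):
    """Left join two lists of dicts on a shared key column.

    Every left row is kept. Left rows with no match on the right
    appear without the right-hand columns.
    """
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    joined = []
    for left in left_rows:
        matches = index.get(left[key]) or [{}]
        for right in matches:
            # Left-hand values win if both sides share a column name.
            joined.append({**right, **left})
    return joined


# Hypothetical extracts from a CRM application and an HR application.
crm_deals = [
    {"employee_id": 1, "deal_value": 5000},
    {"employee_id": 2, "deal_value": 1200},
    {"employee_id": 3, "deal_value": 800},
]
hr_records = [
    {"employee_id": 1, "tenure_years": 6},
    {"employee_id": 2, "tenure_years": 1},
]

joined = join_on_key(crm_deals, hr_records, "employee_id")
```

With the two sources joined, a question such as whether deal value varies with tenure can be asked of a single data set instead of two separate ones.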
Birst already has automation in the creation of analytic-ready data sets through Automated Data Refinement (ADR), and this saves our data scientists an enormous amount of initial data cleansing time. Through the Birst Pronto data discovery interface, data scientists leverage this prepared data to perform additional data manipulation or transformations. Birst integration with R also lets our data scientists move easily to the data modeling and productization phases.
4. How do you assess that a model is good?
Assessing a model’s goodness of fit is done in a couple of ways. The first is by splitting the data into one part that we train the model on and another that we test against. This allows us to tell customers how good the model is expected to be, based on past performance. The second is running a comprehensive test/control after providing actionable insights, where the test group uses our findings and the control group does not. We require the test/control findings in any proof of concept before doing major development on a new solution.
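The two checks described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the team's actual tooling: the synthetic data, the simple least-squares line, the 25% holdout fraction, and all function names are assumptions made for the example.

```python
import random
from statistics import mean


def split_train_holdout(rows, holdout_fraction=0.25, seed=0):
    """Shuffle the rows and split them into (train, holdout) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def fit_line(rows):
    """Ordinary least squares fit of y = a + b * x on (x, y) pairs."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def mean_absolute_error(model, rows):
    """Average absolute prediction error of the fitted line on rows."""
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in rows)


def relative_lift(test_outcomes, control_outcomes):
    """Relative improvement of the test group mean over the control mean."""
    control_mean = mean(control_outcomes)
    return (mean(test_outcomes) - control_mean) / control_mean


# Check 1: train on one part of the data, measure error on the held-out part.
rng = random.Random(42)
data = [(x, 3.0 + 2.0 * x + rng.gauss(0, 1.0)) for x in range(100)]
train, holdout = split_train_holdout(data)
model = fit_line(train)
holdout_mae = mean_absolute_error(model, holdout)

# Check 2: compare outcomes where the recommendations were used (test)
# against outcomes where they were not (control). Values are made up.
lift = relative_lift([110.0, 105.0, 115.0], [100.0, 98.0, 102.0])
```

The holdout error estimates how the model should perform on data it has not seen, while the lift measures whether acting on the model's output actually changed the outcome. A real test/control study would also need comparable groups and a significance test, which this sketch leaves out.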
5. We are still in the early stages of AI, where ML automation is used to optimize business processes or workflows. What are some of the use cases where you have seen the best results?
ML automation is ideal for problem solving when the solution requires millions of decision points across products, locations, customers, employees, and time periods. Whatever is being forecasted or optimized, AI does well when the number of decision points exceeds what people can reasonably manage by hand.
ML automation also does well when we have clearly defined questions and the availability of data to answer those questions. We like to begin with specific questions whenever we are asked to do any type of analysis, as this can quickly point to the necessity of gathering better data.
We have seen success in the optimization of prices and inventory. The data required here is readily available across industries, and the decision points involved, especially when doing anything at a specific location level, are vast.
6. Your team has evolved from solving one customer problem at a time to developing industry solutions for Infor. What are some of the industry solutions currently in development?
We are helping with the Birst Smart analytics initiative, which will serve automated insights on what is going wrong and why it is going wrong for any customer-defined KPI. In a traditional BI system, a user defines thresholds and alerts to be notified of an issue. If an issue arises, the user drills into the data to try to understand what is causing it. With Birst Smart analytics, the automated insights feature proactively analyzes a universe of data to uncover anomalies or issues in the system, and it points the user to why the issue is likely occurring.
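As a rough illustration of the difference between a fixed threshold and proactive anomaly detection, the Python sketch below flags KPI values that deviate sharply from their own recent history, then ranks segments by how much they contributed to the change. It is not the Birst Smart algorithm; the trailing-window z-score, the window size, the threshold, the sample data, and the function names are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_anomalies(values, window=7, z_threshold=3.0):
    """Return indices whose value is more than z_threshold standard
    deviations away from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # A flat history makes any change an anomaly.
            if values[i] != mu:
                flagged.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


def top_contributors(before, after, n=1):
    """Rank segments by the absolute size of their change in the KPI."""
    deltas = {name: after[name] - before[name] for name in before}
    ranked = sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:n]


# Hypothetical daily KPI values with a sudden drop on day index 8.
daily_kpi = [100, 102, 98, 101, 99, 103, 100, 101, 60, 100]
anomalies = flag_anomalies(daily_kpi)

# Hypothetical regional breakdown on the day before and the day of the drop.
before = {"East": 40, "West": 35, "North": 26}
after = {"East": 38, "West": 2, "North": 20}
likely_cause = top_contributors(before, after)
```

No one had to set an alert level in advance: the drop is flagged because it is unusual relative to recent values, and the breakdown points at the segment that moved the most.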
We are also working on projects across our CloudSuites to predict industry-specific KPIs that would be served up to the user in Birst as the reporting tool for the CloudSuites. This builds on the Birst Smart project, not only predicting what is going to happen but doing so for KPIs that we have deemed important for each industry. We have begun work in this area focusing on Healthcare, Manufacturing, and Distribution.
1. Gartner: Maximize the Value of Your Data Science Efforts by Empowering Citizen Data Scientists, Carlie J. Idoine, Erick Brethenoux, 12 June 2018.
2. Gartner: Doing Machine Learning Without Hiring (More) Data Scientists, Shubhangi Vashisth, Alexander Linden, 27 April 2018.
Mona Patel works in Birst’s Product Strategy team. With more than 20 years of experience building analytic solutions at The Department of Water and Power, Air Touch Communications, Oracle, MicroStrategy, EMC and IBM, Mona is now growing her career at Birst. Mona received her Bachelor of Science degree in Electrical Engineering from UCLA.