A simple data science experiment with Azure Machine Learning Studio

What are Machine Learning, data science, and Azure Machine Learning Studio?

  • Machine Learning is concerned with computer programs that automatically improve their performance through experience; such programs learn from previous experience or data.
  • Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes, and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to data mining. (Wikipedia)
  • Azure Machine Learning Studio is a tool used to develop predictive analytics solutions in the Microsoft Azure cloud.

Experiment Overview
Azure Machine Learning Studio is an excellent tool for developing and hosting Machine Learning applications. You don’t need to write code; you can build an experiment by drag and drop. Here we will create a simple Machine Learning experiment using Azure Machine Learning Studio.

Tools and Technology used

  1. Azure Machine Learning Studio

Now let us create our experiment step by step.

Step 1: Create Azure Machine Learning Workspace

  • Go to https://portal.azure.com and log in using your Azure credentials
  • Click More Services in the left panel of the Azure portal
  • Click “Machine Learning Studio Workspace” under the “Intelligence + Analytics” category
  • Add a workspace by clicking the add (+) button at the top-left corner
  • Choose a pricing tier, as shown in the figure below
  • Finally, click the Create button

[Figure 001]

[Figure 002]

Step 2: Launch Machine Learning Studio

  • Click Launch Machine Learning Studio
  • Then log in to the portal

[Figure 003]

Step 3: Create a blank experiment

  • Select the Experiments menu, then click New (+) at the bottom-left corner.
  • Click Blank Experiment. In addition to the blank experiment, there are many sample experiments that you can load and modify.
  • Once the new blank experiment has loaded, you will see the Azure ML Studio visual designer, as shown below.

[Figure 004]

[Figure 005]

Step 4: Add a dataset in the ML Studio visual designer

  • You can import a dataset or use a saved dataset. In this case we use a saved sample dataset.
  • Click Saved Datasets at the top of the left pane.
  • Drag and drop “Adult Census Income Binary Classification dataset” from Saved Datasets -> Samples onto the visual surface

[Figure 006]


Step 5: Select columns in dataset

  • Expand Data Transformation -> Manipulation
  • Drag and drop “Select Columns in Dataset” onto the visual surface
  • Connect the “Dataset” to “Select Columns in Dataset” on the visual surface
  • Click the Select Columns in Dataset module
  • Click Launch column selector in the property pane
  • Select “WITH RULES”
  • Add the age, education, marital-status, relationship, race, sex, and income columns, and finally click the tick mark at the bottom-right corner (a rough pandas equivalent is sketched below).
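
For readers who prefer code, this column selection is roughly what the following pandas sketch does. It is a minimal sketch only: the file name adult_census.csv is a placeholder assumption, since in ML Studio the sample dataset is built in.

```python
import pandas as pd

# Placeholder file name: in ML Studio the sample dataset is built in,
# so this local CSV path is an assumption for illustration only.
df = pd.read_csv("adult_census.csv")

# Keep only the columns chosen in the "Select Columns in Dataset" module.
columns = ["age", "education", "marital-status", "relationship",
           "race", "sex", "income"]
df = df[columns]
```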

[Figure 007]

[Figure 008]

Step 6: Split up the dataset

  • Split your input data in two: training data and validation data (see the sketch after this list)
  • Expand “Data Transformation” -> “Sample and Split” in the left pane
  • Drag and drop Split Data onto the Azure Machine Learning Studio visual surface
  • Connect the Split Data module to “Select Columns in Dataset” on the visual surface
  • Click the Split Data module and set the value of Fraction of Rows to 0.80 in the right pane of the visual designer surface. This means 80 percent of the data will be used for training and the rest for validation.
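
Conceptually, the Split Data module with Fraction of Rows set to 0.80 behaves like this scikit-learn sketch (reusing the DataFrame df from the previous sketch):

```python
from sklearn.model_selection import train_test_split

# 80% of the rows go to training, the remaining 20% to validation,
# mirroring Split Data's "Fraction of rows" = 0.80.
train_df, valid_df = train_test_split(df, train_size=0.80, random_state=0)
```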

[Figure 009]

Step 7: Train the model

  • Expand “Machine Learning” -> “Train” in the left pane
  • Drag and drop “Train Model” onto the Azure ML Studio visual surface
  • Connect the first output of Split Data to Train Model (the second input port of Train Model, as in the figure below)
  • Expand Machine Learning -> Initialize Model -> Classification in the left pane
  • Drag and drop “Two-Class Boosted Decision Tree” as shown in the figure
  • Connect “Two-Class Boosted Decision Tree” to Train Model (the first input port of Train Model, as in the figure below; a comparable open-source sketch follows this list)
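
Azure’s Two-Class Boosted Decision Tree is Microsoft’s own implementation, but gradient-boosted trees in scikit-learn are roughly comparable. A minimal sketch, reusing train_df from above and anticipating Step 8, where income is chosen as the label:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Features and label: "income" is the column the model will predict.
X_train = pd.get_dummies(train_df.drop(columns=["income"]))  # one-hot encode categoricals
y_train = train_df["income"]

# Illustrative hyperparameters only; Azure's module has its own defaults.
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
```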

[Figure 010]


Step 8: Choose columns for prediction

  • Click the Train Model module
  • Click “Launch column selector” in the property pane
  • Select Include and add the column name “income”, because this experiment will predict income
  • Click the tick mark at the bottom-right corner

[Figure 011]

Step 9: Score the model

  • Expand “Machine Learning” -> “Score”
  • Drag and drop “Score Model” onto the visual design surface
  • Connect Train Model to Score Model (the first input port of Score Model, as in the figure below)
  • Connect “Split Data” to “Score Model” (the second output port of Split Data to the second input port of Score Model, as in the figure below)

[Figure 012]


Step 10: Evaluate the model

  • Expand “Machine Learning” -> “Evaluate”
  • Drag and drop “Evaluate Model” onto the visual design surface
  • Connect “Score Model” to “Evaluate Model” (the first input port of Evaluate Model, as in the figure below)
  • Now click “Run” at the bottom of Azure ML Studio. After processing, every stage marked green means it completed successfully
  • After the run completes, right-click Evaluate Model -> Evaluation Results -> Visualize
  • You will see the accuracy curve shown below (a scikit-learn sketch of equivalent metrics follows this list)
  • Click Save As at the bottom of the screen
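
Offline, metrics like those reported by Evaluate Model could be computed with scikit-learn along these lines, reusing model, X_train, and valid_df from the sketches above (the label values "<=50K"/">50K" are those of the Adult Census dataset):

```python
import pandas as pd
from sklearn.metrics import accuracy_score, roc_auc_score

# Encode the validation features the same way as training and align columns.
X_valid = pd.get_dummies(valid_df.drop(columns=["income"]))
X_valid = X_valid.reindex(columns=X_train.columns, fill_value=0)
y_valid = valid_df["income"]

pred = model.predict(X_valid)
proba = model.predict_proba(X_valid)[:, 1]  # probability of the ">50K" class

print("Accuracy:", accuracy_score(y_valid, pred))
print("AUC:", roc_auc_score((y_valid == ">50K").astype(int), proba))
```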

[Figure 013]

[Figure 014]

[Figure 015]

Step 11: Set up a web service

  • Click Set Up Web Service -> Predictive Web Service
  • Connect Web Service Input to Score Model (as shown in the figure below)
  • Select “Select Columns in Dataset” and remove the income column from the dataset, because the model will now predict income
  • Save and run the model from the bottom of ML Studio

[Figure 016]

[Figure 017]

[Figure 018]

Step 12: Deploy Web Service

  • Click Deploy Web Service -> Deploy Web Service [Classic] at the bottom of ML Studio
  • After deployment completes, you will see a dashboard with documentation for testing and consuming the service, as shown below
  • Click the Test button on the dashboard
  • A popup dialog will appear to take input
  • Type input as shown below and click the tick mark
  • You will see the desired output, as in the figure. Here the predicted income is >50K

[Figure 019]

[Figure 020]

[Figure 021]

Now you have developed a simple data science experiment. You can embed it in your application; the API links, security key, and necessary documentation are given in the dashboard. A sketch of calling the service from Python follows.
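
For example, the Request/Response endpoint can be called from Python roughly as follows. The URL and API key below are placeholders that must be copied from your own dashboard, and the exact request schema (column names and order) is shown on the dashboard’s API help page:

```python
import requests

# Placeholders: copy the real values from your web service dashboard.
URL = "https://<region>.services.azureml.net/workspaces/<ws-id>/services/<svc-id>/execute?api-version=2.0"
API_KEY = "<your-api-key>"

payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["age", "education", "marital-status",
                            "relationship", "race", "sex"],
            "Values": [["39", "Bachelors", "Never-married",
                        "Not-in-family", "White", "Male"]],
        }
    },
    "GlobalParameters": {},
}

resp = requests.post(URL, json=payload,
                     headers={"Authorization": "Bearer " + API_KEY})
resp.raise_for_status()
print(resp.json())  # scored output, e.g. predicted income >50K
```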

An Easy Solution to Stop the SQL Server Log File from Growing Too Big

Many organizations run huge SQL databases that handle millions of transactions per hour. A SQL Server database has data files and transaction log files. Data files store the user data, and transaction log files record every change made to the database along with the details of the transactions that made those changes.

The issue is that this logging cannot be stopped or switched off; it happens every time changes are made in SQL Server. That becomes a serious problem when the log file grows out of control. However, the way these log files grow can be configured. So, to keep the SQL Server log file from growing unexpectedly, consider any of the methods given below. We will also discuss how to shrink an oversized log file.

SQL Server: Solutions to Stop the Log File from Growing Too Big

There are numerous ways to truncate an oversized SQL .ldf file. Some of the chief solutions are provided in the following section.

  • Set a maximum size limit: If the SQL .ldf file is growing too big, set a generous maximum size limit on the log file so that it cannot expand indefinitely and overwhelm the SQL Server database.
  • Use fixed size units: Configure log file growth in fixed memory units (megabytes) instead of a percentage if the transaction log grows quickly.
  • Change the recovery model: The Simple recovery model helps control and shrink the log file size. Based on how crucial the data is, the user can choose any of the following recovery models:
    1. Simple Recovery Model
    2. Bulk-logged Recovery Model
    3. Full Recovery Model

Under the simple model, the database can only be restored from its most recent backup, while under the bulk-logged or full recovery model it can be recovered up to the point of failure by restoring the transaction log.

By default, the Full recovery model is set, so the user has to back up the transaction log regularly to remove inactive transactions and prevent the log from becoming too large. Keep this in mind when backing up .ldf files. The sketch below shows how the growth and recovery settings can be applied.
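
As a rough sketch, the settings above can be applied with T-SQL, here run from Python via pyodbc; the database name MyDb, the logical file name MyDb_log, and the size values are placeholder assumptions:

```python
import pyodbc

# autocommit is required: ALTER DATABASE, BACKUP, and DBCC statements
# cannot run inside an implicit transaction.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=master;Trusted_Connection=yes;", autocommit=True)
cur = conn.cursor()

# Grow the log in fixed 512 MB steps and cap it at 8 GB (placeholder values).
cur.execute("ALTER DATABASE MyDb MODIFY FILE "
            "(NAME = MyDb_log, FILEGROWTH = 512MB, MAXSIZE = 8GB);")

# Switch to the Simple recovery model if point-in-time recovery is not needed.
cur.execute("ALTER DATABASE MyDb SET RECOVERY SIMPLE;")
```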

NOTE: If the user is defragmenting indexes, use DBCC INDEXDEFRAG rather than DBCC DBREINDEX; with DBCC DBREINDEX, the transaction log can expand drastically.

Using the manual solution:

If maintenance is not carried out regularly, the log file grows too big. It is therefore recommended to take these manual steps before it consumes all available disk space. Let us look at the method to shrink a SQL database’s transaction log file:

  1. First, open SQL Server Management Studio and log in to the proper SQL instance.
  2. In the Object Explorer tree, expand the Databases folder >> select the database with the large .ldf file.
  3. After this, create a full backup of the database by right-clicking it >> Select Tasks >> Back Up.
    • Make sure that the Backup type is set to Full, then delete any existing destinations >> add a new Disk destination.
    • Browse to a location with plenty of free disk space >> name the backup file with the .BAK extension.
    • Select the Overwrite all existing backup sets option on the ‘Options’ page.
    • Finally, click the ‘OK’ button to start the backup.
  4. Similarly, create a transaction log backup of the database in the same manner as above.
    • Right-click the database >> Select Tasks >> Back Up and ensure that the backup type is set to Transaction Log.
    • Select the Overwrite all existing backup sets option on the ‘Options’ page.
    • Finally, click the ‘OK’ button to start backing up the log files.
  5. The closing step is shrinking the transaction log file: right-click the database >> Tasks >> Shrink >> Files.

NOTE: The user may repeat steps 3, 4, and 5 until the .ldf file becomes physically smaller. A scripted version of this backup-then-shrink cycle is sketched below.
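
As a minimal sketch, the same cycle can be scripted via pyodbc; the database name, logical log file name, and backup paths are placeholder assumptions:

```python
import pyodbc

# Connect to the target database so DBCC SHRINKFILE applies to it.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=MyDb;Trusted_Connection=yes;", autocommit=True)
cur = conn.cursor()

# Step 3: full backup (WITH INIT overwrites existing backup sets).
cur.execute("BACKUP DATABASE MyDb TO DISK = 'D:\\Backups\\MyDb.bak' WITH INIT;")
while cur.nextset():  # drain BACKUP's informational result sets
    pass

# Step 4: transaction log backup, which marks inactive log records reusable
# (requires the Full or Bulk-logged recovery model).
cur.execute("BACKUP LOG MyDb TO DISK = 'D:\\Backups\\MyDb_log.trn' WITH INIT;")
while cur.nextset():
    pass

# Step 5: shrink the log file to a 512 MB target (placeholder value).
cur.execute("DBCC SHRINKFILE (MyDb_log, 512);")
```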

Conclusion

In this article, we have discussed various solutions for truncating .ldf files when the transaction log grows too large, which is necessary to manage data with ease. The manual method and some general precautions have been described to help users who come across this issue while working with SQL Server.

Judging Microsoft Imagine Cup – 2017

Judging Microsoft Imagine Cup (1st Round) – 2017 in Bangladesh
Venue: Microsoft Office, Dhaka, Bangladesh
Date: 19 March, 2017