101 Python datatable Exercises (pydatatable)

Written by Selva Prabhakaran | 15 min read

Python datatable is a relatively new package for data manipulation and analysis in Python. It carries the spirit of R’s data.table, with similar syntax. It is built for speed, is often much faster than pandas, and can work with out-of-memory data. Given its performance, it is on a path to become a must-use package for data manipulation in Python.

101 Python datatable Exercises (pydatatable). Photo by Jet Kim.

1. How to import datatable package and check the version?

Difficulty Level: L1

Show Solution

python

import datatable as dt
dt.__version__

python

'0.8.0'

You need to import datatable as dt for the rest of the code in these exercises to work.

2. How to create a datatable Frame from a list, numpy array, pandas dataframe?

Difficulty Level: L1

Question: Create a datatable Frame from a list, numpy array and pandas dataframe.

Input:

python

import pandas as pd
import numpy as np

my_list = list('abcdefghijklmnopqrstuvwxyz')
my_arr = np.arange(26)
my_df = pd.DataFrame(dict(col1=my_list, col2=my_arr))

Show Solution

python

import pandas as pd
import numpy as np
import datatable as dt

# Inputs
my_list = list('abcdefghijklmnopqrstuvwxyz')
my_arr  = np.arange(26)
my_df   = pd.DataFrame(dict(col1=my_list, col2=my_arr))


# Solution
dt_df1  = dt.Frame(my_list)
dt_df2  = dt.Frame(my_arr)
dt_df3  = dt.Frame(my_df)
dt_df4  = dt.Frame(A=my_arr, B= my_list)

3. How to import csv file as a pydatatable Frame?

Difficulty Level: L1

Question: Read a CSV file as a datatable Frame.

Show Solution

Input: BostonHousing dataset

python

# Solution
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
df.head(5)

3. How to read first 5 rows of pydatatable Frame?

Difficulty Level: L1

Question: Read first 5 rows of datatable Frame.

Input URL for CSV file: https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv

Show Solution

python

# Solution
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', max_nrows= 5)
df

4. How to add new column in pydatatable Frame from a list?

Difficulty Level: L1

Question: Read first 5 rows of datatable Frame and add a new column of length 5.

Input URL for CSV file: https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv', max_nrows= 5)

# Solution
df[:,"new_column"] = dt.Frame([1,2,3,4,5])
df

5. How to do addition of existing columns to get a new column in pydatatable Frame?

Difficulty Level: L1

Question: Add age and rad columns to get a new column in datatable Frame.

Show Solution

Input: BostonHousing dataset

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# Solution
df[:,"new_column"] = df[:, dt.f.age + dt.f.rad]

6. How to get the int value of a float column in a pydatatable Frame?

Difficulty Level: L1

Question: Get the int value of a float column dis in datatable Frame.

Input: BostonHousing dataset

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# Solution
df[:, "new_column"] = df[:, dt.int32(dt.f.dis)]
df.head(5)

7. How to create a new column based on a condition in a datatable Frame?

Difficulty Level: L2

Question: Create a new column having value as ‘Old’ if age greater than 60 else ‘New’ in a `datatable` Frame.

Input: BostonHousing dataset

Show Solution

python

import numpy as np
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
df[:, "new_column"] = dt.Frame(np.where(df[:, dt.f.age > 60], 'Old', 'New'))
df.head(5)

8. How to left join two datatable Frames?

Difficulty Level: L1

Question: Left join two Frames on column A.

Input:

python

import datatable as dt
df1 = dt.Frame(A=[1,2,3,4],B=["a", "b", "c", "d"])
df2 = dt.Frame(A=[1,2,3,4,5],C=["a2", "b2", "c2", "d2", "e2"])

Primary Key : A

Show Solution

python

import datatable as dt
df1 = dt.Frame(A=[1,2,3,4],B=["a", "b", "c", "d"])
df2 = dt.Frame(A=[1,2,3,4,5],C=["a2", "b2", "c2", "d2", "e2"])
df2.key = "A"
output = df1[:, :, dt.join(df2)]
output
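
To make the left-join semantics concrete, here is a plain-Python sketch that does not use datatable. The helper name `left_join` and the tuple layout are hypothetical. Every row of the left table is kept, the matching value is looked up by key, and a key with no match gets `None` (the counterpart of NA).

```python
# Plain-Python illustration of a left join on key "A" (not datatable code).
df1_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]                   # (A, B)
df2_rows = [(1, "a2"), (2, "b2"), (3, "c2"), (4, "d2"), (5, "e2")]    # (A, C)


def left_join(left, right):
    """Keep every left row; attach the right value for the same key, or None."""
    lookup = dict(right)  # assumes the join key is unique in the right table
    return [(a, b, lookup.get(a)) for a, b in left]


joined = left_join(df1_rows, df2_rows)
for row in joined:
    print(row)
```

Key 5 exists only in the right table, so it does not appear in the result.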

9. How to rename a column in a pydatatable Frame?

Difficulty Level: L1

Question: Rename column zn to zn_new in a datatable Frame.

Input: BostonHousing dataset

Show Solution

python

import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
df.names = {'zn': 'zn_new'}
df.head(5)

10. How to import every 50th row from a csv file to create a datatable Frame?

Difficulty Level: L2

Question: Import every 50th row of the BostonHousing dataset (BostonHousing.csv) as a datatable Frame.

Input: BostonHousing dataset

Show Solution

python

# Solution: Use csv reader. Unfortunately there isn't an option to do it directly using fread()
import datatable as dt
import csv          
with open('local/path/to/BostonHousing.csv', 'r') as f:
    reader = csv.reader(f)
    for i, row in enumerate(reader):
        row = [[x] for x in row]
        # 1st row
        if i == 0:  
            df = dt.Frame(row)
            header = [x[0] for x in df[0,:].to_list()]
            df.names =  header
            del df[0,:]  
        # Every 50th row
        elif i%50 ==0:
            df_temp = dt.Frame(row)
            df_temp.names = header
            df.rbind(df_temp)

df.head(5)
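
The row selection in the loop above can also be sketched with `itertools.islice`, which avoids building a one-row Frame per line. This is a stdlib-only illustration: `sample_csv` is a made-up in-memory file standing in for BostonHousing.csv, and the column names are hypothetical.

```python
import csv
import io
from itertools import islice

# Hypothetical in-memory CSV: a header plus 120 data rows.
sample_csv = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(1, 121))

step = 50
with io.StringIO(sample_csv) as f:
    reader = csv.reader(f)
    header = next(reader)  # first line holds the column names
    # The loop above keeps file lines 50, 100, ... (header at i == 0),
    # which are the data rows at 0-based positions 49, 99, ...
    picked = list(islice(reader, step - 1, None, step))

# Transpose the picked rows into one list per column.
columns = {name: list(col) for name, col in zip(header, zip(*picked))}
print(columns)
```

Passing the resulting dict to `dt.Frame(columns)` should give a Frame with string columns, as in the loop above.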

11. How to change column values when importing csv to a Python datatable Frame?

Difficulty Level: L2

Question: Import the Boston housing dataset, but while importing change the 'medv' (median house value) column so that values of 25 or below become ‘Low’ and values above 25 become ‘High’.

Input: BostonHousing dataset

Show Solution

python

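# Solution: Use csv reader. open() cannot read a URL, so stream the file with urllib
import datatable as dt
import csv
import io
import urllib.request
url = 'https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv'
with urllib.request.urlopen(url) as response:
    f = io.TextIOWrapper(response, encoding='utf-8')
    reader = csv.reader(f)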
    for i, row in enumerate(reader):
        row = [[x] for x in row]
        if i == 0:
            df = dt.Frame(row)
            header = [x[0] for x in df[0,:].to_list()]
            df.names =  header
            del df[0,:]  
        else:
            row[13] = ['High'] if float(row[13][0]) > 25 else ['Low']
            df_temp = dt.Frame(row)
            df_temp.names = header
            df.rbind(df_temp)

df.head(5)
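
The relabelling rule is easier to check in isolation. Below is a small stdlib-only sketch, where `sample_csv` and the helper `label_medv` are hypothetical. It applies the same comparison as the solution, so a value of exactly 25 lands in 'Low'.

```python
import csv
import io

# Hypothetical three-row extract; only the medv column matters here.
sample_csv = "rm,medv\n6.5,24.0\n7.1,25.0\n6.9,34.7\n"


def label_medv(value, threshold=25.0):
    """Strictly above the threshold is 'High', everything else is 'Low'."""
    return "High" if float(value) > threshold else "Low"


rows = list(csv.DictReader(io.StringIO(sample_csv)))
for row in rows:
    row["medv"] = label_medv(row["medv"])

labels = [row["medv"] for row in rows]
print(labels)
```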

12. How to change value at particular row and column in a Python datatable Frame?

Difficulty Level: L1

Question: Change the value at row 2, column 1 to 5 in a datatable Frame.

Input: BostonHousing dataset

Show Solution

python

# Solution: Frames use [row, column] indexing directly. No need for ".loc" or ".iloc"
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
df[2,1] = 5
df.head(5)

13. How to delete specific cell, row, column, row per condition in a datatable Frame?

Difficulty Level: L2

Questions:

Delete the cell at position 2,1.
Delete the 3rd row.
Delete the chas column.
Delete rows where column zn is having 0 value.

Input: BostonHousing dataset

Show Solution

python

# Solution: Frames use [row, column] indexing directly. No need for ".loc" or ".iloc"
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# Delete the cell at position `2,1` (a single cell is set to NA rather than removed).
del df[2,1]

# Delete the `3rd` row (index 2, since rows are zero-based).
del df[2,:]

# Delete the `chas` column.
del df[:,"chas"]

# Delete rows where column `zn` is having 0 value.
del df[dt.f.zn == 0,:]

df.head(5)

14. How to convert datatable Frame to pandas, numpy, dictionary, list, tuples, csv files?

Difficulty Level: L1

Question: Convert datatable Frame to pandas, numpy, dictionary, list, tuples, csv files.

Input: BostonHousing dataset

Show Solution

python

# Solution
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')

# to pandas df
pd_df = df.to_pandas()

# to numpy arrays
np_arrays = df.to_numpy()

# to dictionary
dic = df.to_dict()

# to list
list_ = df[:,"indus"].to_list()

# to tuple
tuples_ = df[:,"indus"].to_tuples()

# to csv 
df.to_csv("BostonHousing.csv")

15. How to get data types of all the columns in the datatable Frame?

Difficulty Level: L1

Question: Get data types of all the columns in the datatable Frame.

Input: BostonHousing dataset

Desired Output:

python

crim : stype.float64
zn : stype.float64
indus : stype.float64
chas : stype.bool8
nox : stype.float64
rm : stype.float64
age : stype.float64
dis : stype.float64
rad : stype.int32
tax : stype.int32
ptratio : stype.float64
b : stype.float64
lstat : stype.float64
medv : stype.float64

Show Solution

python

# Solution
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
for i in range(len(df.names)):
    print(df.names[i], ":", df.stypes[i])

python

crim : stype.float64
zn : stype.float64
indus : stype.float64
chas : stype.bool8
nox : stype.float64
rm : stype.float64
age : stype.float64
dis : stype.float64
rad : stype.int32
tax : stype.int32
ptratio : stype.float64
b : stype.float64
lstat : stype.float64
medv : stype.float64

16. How to get summary stats of each column in datatable Frame?

Difficulty Level: L1

Questions:

For each column:

Get the sum of the column values.
Get the max of the column values.
Get the min of the column values.
Get the mean of the column values.
Get the standard deviation of the column values.
Get the mode of the column values.
Get the number of times the modal value occurs in each column.
Get the number of unique values in column.

Input: BostonHousing dataset

Show Solution

python

# Solution
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
df.sum()
df.max()
df.min()
df.mean()
df.sd()
df.mode()
df.nmodal()
df.nunique()
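
The difference between `mode()`, `nmodal()` and `nunique()` is easy to mix up. Here is a plain-Python sketch on a made-up list of values (not datatable code) showing what each of the three is expected to report for a single column.

```python
from collections import Counter

values = [4, 7, 7, 2, 7, 4]  # hypothetical column values

counts = Counter(values)
modal_value, modal_count = counts.most_common(1)[0]
n_unique = len(counts)

print(modal_value)   # the most frequent value (what mode() reports)
print(modal_count)   # how many times it occurs (what nmodal() reports)
print(n_unique)      # number of distinct values (what nunique() reports)
```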

17. How to get the column stats of particular column of the datatable Frame?

Difficulty Level: L1

Question: Get the max value of zn column of the datatable Frame

Input: BostonHousing dataset

Desired Output: 100

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv')
df[:,dt.max(dt.f.zn)]

18. How to apply group by functions in datatable Frame?

Difficulty Level: L1

Question: Find the mean price for every manufacturer using Cars93 dataset.

Input:
Cars93

Desired Output:

python

     Manufacturer         C0
0            None  28.550000
1           Acura  15.900000
2            Audi  33.400000
3             BMW  30.000000
4           Buick  21.625000
5        Cadillac  37.400000
..
..

30     Volkswagen  18.025000
31          Volvo  22.700000

Show Solution

python

# Solution
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')
df[:, dt.mean(dt.f.Price), dt.by("Manufacturer")].head(5)

19. How to arrange datatable Frame in ascending order by column value?

Difficulty Level: L1

Question: Arrange datatable Frame in ascending order by Price.

Input:
Cars93

Desired Output:

python

Manufacturer    Model     Type  Min.Price  Price  Max.Price  MPG.city  \ 
0       Saturn       SL    Small        9.2    NaN       12.9       NaN   
1       Toyota    Camry  Midsize       15.2    NaN       21.2      22.0   
2         Ford  Festiva    Small        6.9    7.4        7.9      31.0   
3      Hyundai    Excel    Small        6.8    8.0        9.2      29.0   
4        Mazda      323    Small        7.4    8.3        9.1      29.0   


   Width  Turn.circle Rear.seat.room  Luggage.room  Weight   Origin  \
0   68.0         40.0           26.5           NaN  2495.0      USA   
1   70.0         38.0           28.5          15.0  3030.0  non-USA   
2   63.0         33.0           26.0          12.0  1845.0      USA   
3   63.0         35.0           26.0          11.0  2345.0  non-USA   
4   66.0         34.0           27.0          16.0  2325.0  non-USA   

            Make  
0      Saturn SL  
1   Toyota Camry  
2   Ford Festiva  
3  Hyundai Excel  
4      Mazda 323

Show Solution

python

import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution1
df.sort("Price")

# Solution2
df[:,:, dt.sort(dt.f.Price)].head(5)

20. How to arrange datatable Frame in descending order by column value?

Difficulty Level: L1

Question: Arrange datatable Frame in descending order by Price.

Input:
Cars93

Desired Output:

python

   Manufacturer     Model     Type  Min.Price  Price  Max.Price  MPG.city  \
0  Mercedes-Benz      300E  Midsize       43.8   61.9       80.0      19.0   
1       Infiniti       Q45  Midsize       45.4   47.9        NaN      17.0   
2       Cadillac   Seville  Midsize       37.5   40.1       42.7      16.0   
3      Chevrolet  Corvette   Sporty       34.6   38.0       41.5      17.0   
4           Audi       100  Midsize        NaN   37.7       44.6      19.0   

   MPG.highway             AirBags DriveTrain  ... Passengers  Length  \
0         25.0  Driver & Passenger       Rear  ...        5.0     NaN   
1         22.0                None       Rear  ...        5.0   200.0   
2         25.0  Driver & Passenger      Front  ...        5.0   204.0   
3         25.0         Driver only       Rear  ...        2.0   179.0   
4         26.0  Driver & Passenger       None  ...        6.0   193.0   

   Wheelbase  Width  Turn.circle Rear.seat.room  Luggage.room  Weight  \
0      110.0   69.0         37.0            NaN          15.0  3525.0   
1      113.0   72.0         42.0           29.0          15.0  4000.0   
2      111.0   74.0         44.0           31.0           NaN  3935.0   
3       96.0   74.0         43.0            NaN           NaN  3380.0   
4      106.0    NaN         37.0           31.0          17.0  3405.0   

    Origin                Make  
0  non-USA  Mercedes-Benz 300E  
1  non-USA        Infiniti Q45  
2      USA    Cadillac Seville  
3     None  Chevrolet Corvette  
4  non-USA            Audi 100

Show Solution

python

import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
df[::-1,:, dt.sort(dt.f.Price)].head()

21. How to repeat(append) the same data in datatable Frame?

Difficulty Level: L1

Question: Repeat(append) the same data 5 times in datatable Frame.

Input:
Cars93

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
dt.repeat(df, 5)

22. How to replace string with another string in entire datatable Frame?

Difficulty Level: L1

Question: Replace Audi with My Dream Car in entire datatable Frame.

Input:
Cars93

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
df.replace("Audi", "My Dream Car")
df.head(5)

23. How to extract the details of a particular cell with given criterion?

Difficulty Level: L1

Question: Extract which manufacturer, model and type has the highest Price.

Input:
Cars93

Desired Output:

python

 Manufacturer  Model     Type
 Mercedes-Benz  300E  Midsize

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution

# Get the highest price
print("Highest Price : ", df[:,dt.f.Price].max()[0,0])

# Get Manufacturer with highest price
df[dt.f.Price ==  df[:,dt.f.Price].max()[0,0], ['Manufacturer', 'Model', 'Type']]

python

Highest Price :  61.9

24. How to rename a specific column in a datatable Frame?

Difficulty Level: L2

Question: Rename the column Model as Car Model.

Input:
Cars93

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
old_col_name = "Model"
new_col_name = "Car Model"
df.names = [new_col_name if x == old_col_name else x for x in df.names]
df.head(5)

25. How to count NA values in every column of a datatable Frame?

Difficulty Level: L1

Question: Count NA values in every column of a datatable Frame.

Input:
Cars93

Desired Output:

python

Manufacturer  Model  Type  Min.Price  Price  Max.Price  MPG.city  \
0             4      1     3          7      2          5         9   

   MPG.highway  AirBags  DriveTrain  ...  Passengers  Length  Wheelbase  \
0            2        6           7  ...           2       4          1   

   Width  Turn.circle  Rear.seat.room  Luggage.room  Weight  Origin  Make  
0      6            5               4            19       7       5     3

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
df.countna()

26. How to get a specific column from a datatable Frame as a datatable Frame instead of a series?

Difficulty Level: L1

Question: Get the Model column of a datatable Frame as a datatable Frame (rather than as a Series).

Input:
Cars93

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
df[:,"Model"].head(5)

	Model
	▪▪▪▪
0	Integra
1	Legend
2	90
3	100
4	535i

27. How to reverse the order of columns of a datatable Frame?

Difficulty Level: L1

Question : Reverse the order of columns in Cars93 datatable Frame.

Input:
Cars93

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution: a reversed slice in the column position flips the column order
df[:,::-1].head(5)

28. How to format or suppress scientific notations in Python datatable Frame?

Difficulty Level: L2

Question: Suppress scientific notation such as ‘e-03’ in df and print up to 6 digits after the decimal point.

Input

python

import numpy as np
import datatable as dt
df = dt.Frame(random=np.random.random(4)**10)
df
         random
0  3.518290e-04
1  5.104371e-02
2  5.895886e-06
3  1.274671e-09

Desired Output

python

         random   random2
0  3.518290e-04  0.000352
1  5.104371e-02  0.051044
2  5.895886e-06  0.000006
3  1.274671e-09  0.000000

Show Solution

python

# Solution
import numpy as np
import datatable as dt
df = dt.Frame(random=np.random.random(4)**10)
df[:,"random2"] = dt.Frame(['%.6f' % x for x in df[:,"random"].to_list()[0]])
df
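
The formatting step can be checked on its own. This small sketch reuses the four values shown in the input above and formats them with `'%.6f'`, as the solution does. The f-string variant is included only as an equivalent spelling. Note that the result is a list of strings, so the new column holds text rather than numbers.

```python
values = [3.518290e-04, 5.104371e-02, 5.895886e-06, 1.274671e-09]

# Fixed-point formatting with 6 digits after the decimal point.
formatted = ["%.6f" % x for x in values]

# Equivalent spelling with an f-string format spec.
formatted_f = [f"{x:.6f}" for x in values]

print(formatted)
```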

29. How to filter every nth row in a pydatatable?

Difficulty Level: L1

Question: From df, filter the 'Manufacturer', 'Model' and 'Type' for every 20th row starting from 1st (row 0).

Input:
Cars93

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
df[::20, ['Manufacturer', 'Model', 'Type']]

30. How to reverse the rows of a python datatable Frame?

Difficulty Level: L2

Question: Reverse all the rows.

Input:
Cars93

Show Solution

python

# Input
import datatable as dt
df = dt.fread('https://raw.githubusercontent.com/selva86/datasets/master/Cars93_miss.csv')

# Solution
df[::-1,:]

31. How to find out which column contains the highest number of row-wise maximum values?

Difficulty Level: L2

Question: What is the name of the column with the highest number of row-wise maximums?

Input:
BostonHousing dataset

Desired Output:
tax

Show Solution

python

# Input
import datatable as dt
df = dt.fread("https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv")

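# Solution
# Note: this compares column sums, so it reports the column with the largest total
# rather than counting, row by row, which column holds the maximum.
sums = [col[0] for col in df.sum().to_list()]
for name, total in zip(df.names, sums):
    if total == max(sums):
        print(name)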

python

tax
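
Counting row-wise maximums and comparing column sums can give different answers. The plain-Python sketch below uses made-up data and hypothetical column names to show both. It is not datatable code, only an illustration of the two definitions.

```python
from collections import Counter

names = ["a", "b", "c"]  # hypothetical column names
rows = [
    (1, 9, 3),
    (80, 2, 5),
    (4, 7, 6),
    (0, 6, 1),
]

# Row-wise: for each row, find the column holding the largest value, then count.
winners = Counter(
    names[max(range(len(row)), key=row.__getitem__)] for row in rows
)
best_column, wins = winners.most_common(1)[0]

# Column-wise: the column with the largest total.
col_sums = [sum(col) for col in zip(*rows)]
largest_sum_column = names[col_sums.index(max(col_sums))]

print(best_column, wins)
print(largest_sum_column)
```

Here column `b` holds the row maximum three times, while column `a` has the largest sum because of a single large value.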

32. How to normalize all columns in a dataframe?

Difficulty Level: L2

Questions:

Normalize all columns of df by subtracting the column mean and dividing by the standard deviation.
Rescale all columns of df such that the minimum value in each column is 0 and the maximum is 1.

Don’t use external packages like sklearn.

Input:
BostonHousing dataset

Desired Output:

python

       crim    zn     indus  chas       nox        rm       age       dis  \
0  0.000000  0.18  0.067815   0.0  0.314815  0.577505  0.641607  0.269203   
1  0.000236  0.00  0.242302   0.0  0.172840  0.547998  0.782698  0.348962   
2  0.000236  0.00  0.242302   0.0  0.172840  0.694386  0.599382  0.348962   
3  0.000293  0.00  0.063050   0.0  0.150206  0.658555  0.441813  0.448545   
4  0.000705  0.00  0.063050   0.0  0.150206  0.687105  0.528321  0.448545   

        rad       tax   ptratio         b     lstat      medv  
0  0.000000  0.208015  0.287234  1.000000  0.089680  0.422222  
1  0.043478  0.104962  0.553191  1.000000  0.204470  0.368889  
2  0.043478  0.104962  0.553191  0.989737  0.063466  0.660000  
3  0.086957  0.066794  0.648936  0.994276  0.033389  0.631111  
4  0.086957  0.066794  0.648936  1.000000  0.099338  0.693333

Show Solution

python

# Input
import datatable as dt
df = dt.fread("BostonHousing.csv")

# Solution
for i in df.names:
    df[:,i] = df[:,(dt.f[i] - df[:,dt.min(dt.f[i])][0,0])/(df[:,dt.max(dt.f[i])][0,0] - df[:,dt.min(dt.f[i])][0,0])]
df.head(5)
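
The solution above covers the min-max rescaling. For reference, both transformations are sketched here in plain Python on a made-up column, using the sample standard deviation from the `statistics` module. This only illustrates the arithmetic and is not datatable code.

```python
from statistics import mean, stdev

column = [2.0, 4.0, 6.0, 8.0]  # hypothetical column values

# Standardize: subtract the mean, divide by the sample standard deviation.
mu, sigma = mean(column), stdev(column)
standardized = [(x - mu) / sigma for x in column]

# Min-max: rescale so the smallest value maps to 0 and the largest to 1.
lo, hi = min(column), max(column)
ranged = [(x - lo) / (hi - lo) for x in column]

print(standardized)
print(ranged)
```

In datatable, the first transformation should follow the same per-column pattern as the loop above, with `dt.mean` and `dt.sd` taking the place of `dt.min` and `dt.max`.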

33. How to compute grouped mean on datatable Frame and keep the grouped column as another column?

Difficulty Level: L1

Question: In df, compute the mean price of every fruit, while keeping the fruit as another column instead of an index.

Input

python

df = dt.Frame(fruit = ['apple', 'banana', 'orange'] * 3,
             rating =  np.random.rand(9),
             price  =  np.random.randint(0, 15, 9))

Desired Output:

python

    fruit        C0
0   apple  7.666667
1  banana  5.000000
2  orange  8.333333

Show Solution

python

# Input
import numpy as np
import datatable as dt
df = dt.Frame(fruit = ['apple', 'banana', 'orange'] * 3,
             rating =  np.random.rand(9),
             price  =  np.random.randint(0, 15, 9))
df[:, dt.mean(dt.f.price), dt.by("fruit")]
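
What the grouped mean computes can be sketched in plain Python. The prices below are made up (the exercise draws them at random), and the group key ends up as an ordinary first field of each result row, not as an index.

```python
from collections import defaultdict

fruits = ["apple", "banana", "orange"] * 3
prices = [3, 5, 9, 8, 4, 7, 12, 6, 9]  # hypothetical prices

totals = defaultdict(lambda: [0, 0])  # fruit -> [sum, count]
for fruit, price in zip(fruits, prices):
    totals[fruit][0] += price
    totals[fruit][1] += 1

# One (fruit, mean price) row per group, sorted by the group key.
grouped = [(fruit, s / n) for fruit, (s, n) in sorted(totals.items())]
for row in grouped:
    print(row)
```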

34. How to join two datatable Frames by 2 columns?

Difficulty Level: L2

Question: Join dataframes df1 and df2 by ‘A’ and ‘B’.

Input

python

df1 = dt.Frame(A=[1, 2, 3, 4],
               B=["a", "b", "c", "d"],
               D=[1, 2, 3, 4])

df2 = dt.Frame(A=[1, 2, 4, 5],
               B=["a", "b", "d", "e"],
               C=["a2", "b2", "d2", "e2"])

Desired Output:

python

   A  B  D   C
0  1  a  1  a2
1  2  b  2  b2
2  3  c  3  
3  4  d  4  d2

Show Solution

python

# Input
import datatable as dt
df1 = dt.Frame(A=[1, 2, 3, 4], B=["a", "b", "c", "d"], D=[1, 2, 3, 4])
df2 = dt.Frame(A=[1, 2, 4, 5], B=["a", "b", "d", "e"], C=["a2", "b2", "d2", "e2"])

# Solution
df2.key = ["A","B"]
output = df1[:, :, dt.join(df2)]
output

35. How to create leads (column shifted up by 1 row) of a column in a datatable Frame?

Difficulty Level: L2

Question: Create new column in df, which is a lead1 (shift column A up by 1 row).

Input:

python

df = dt.Frame(A=[1,2,3,4],B=["a", "b", "c", "d"],d=[1,2,3,4])

Desired Output:

python

   A  B  d  A.1
0  1  a  1    2
1  2  b  2    3
2  3  c  3    4
3  4  d  4  NaN

Show Solution

python

# Input
import datatable as dt
df = dt.Frame(A=[1,2,3,4],B=["a", "b", "c", "d"],d=[1,2,3,4])

# Solution
dt.cbind(df, df[1:,"A"], force=True)
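
The solution works because the sliced column is one row shorter than the Frame, and `force=True` lets `cbind` pad the missing cell with NA. The same idea in plain Python is shown below, where `lead` is a hypothetical helper and `None` stands in for NA.

```python
def lead(values, n=1):
    """Shift values up by n rows, padding the tail with None (NA)."""
    return values[n:] + [None] * min(n, len(values))


A = [1, 2, 3, 4]
lead1 = lead(A)
print(lead1)
```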

36. Machine Learning Exercise – How to use FTRL Model to calculate the probability of a person having diabetes?

Difficulty Level: L3

Question 1: Use Follow the Regularized Leader (Ftrl) Model to calculate the probability of a person having diabetes.

Question 2: Find the feature importance of the features used in model.

Input:

Training Data : pima_indian_diabetes_training_data.csv

Testing Data : pima_indian_diabetes_testing_data.csv

Show Solution

python

import numpy as np
import datatable as dt
from datatable.models import Ftrl

# Import data
train_df = dt.fread('pima_indian_diabetes_training_data.csv')
test_df = dt.fread('pima_indian_diabetes_testing_data.csv')

# Create Ftrl model with default parameters
ftrl_model = Ftrl()

# Or set parameter values while creating the model
ftrl_model = Ftrl(alpha = 0.1, lambda1 = 0.5, lambda2 = 0.6)

# Or change parameters of an existing model
ftrl_model.alpha = 0.1
ftrl_model.lambda1 = 0.5
ftrl_model.lambda2 = 0.6

# Prepare training and test dataset
train_df[:,"diabetes"] = dt.Frame(np.where(train_df[:, dt.f["diabetes"] == "pos"], 1,0))
test_df[:,"diabetes"] = dt.Frame(np.where(test_df[:, dt.f["diabetes"] == "pos"], 1,0))

x_train = train_df[:, ["pregnant", "glucose", "pressure", "mass", "pedigree", "age"]]
y_train = train_df[:, ["diabetes"]]

x_test = test_df[:, ["pregnant", "glucose", "pressure", "mass", "pedigree", "age"]]
y_test = test_df[:, ["diabetes"]]

# training the model
ftrl_model.fit(x_train,y_train)

# predictions of the model
targets = ftrl_model.predict(x_test)
print(targets.head(5))

# feature importance
fi = ftrl_model.feature_importances
fi
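
To give a feel for what `alpha`, `lambda1` and `lambda2` control, here is a minimal pure-Python sketch of a per-coordinate FTRL-Proximal update for logistic regression. It is an illustration under assumptions, not datatable's implementation. The class name `FtrlProximal`, the `beta` default and the toy data are all hypothetical, and the library may differ in details such as how features are encoded.

```python
import math


class FtrlProximal:
    """Minimal per-coordinate FTRL-Proximal logistic regression (illustrative)."""

    def __init__(self, n_features, alpha=0.1, beta=1.0, lambda1=0.5, lambda2=0.6):
        self.alpha, self.beta = alpha, beta
        self.lambda1, self.lambda2 = lambda1, lambda2
        self.z = [0.0] * n_features  # accumulated adjusted gradients
        self.n = [0.0] * n_features  # accumulated squared gradients

    def _weight(self, i):
        z = self.z[i]
        if abs(z) <= self.lambda1:  # L1 term keeps small coordinates at zero
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        denom = (self.beta + math.sqrt(self.n[i])) / self.alpha + self.lambda2
        return -(z - sign * self.lambda1) / denom

    def predict(self, x):
        score = sum(self._weight(i) * xi for i, xi in enumerate(x))
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        p = self.predict(x)
        for i, xi in enumerate(x):
            g = (p - y) * xi  # gradient of the log loss for coordinate i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g


# Toy data: [bias, feature] -> label. The feature sign decides the class.
data = [([1.0, 1.0], 1), ([1.0, -1.0], 0)]

model = FtrlProximal(n_features=2)
for _ in range(50):
    for x, y in data:
        model.update(x, y)

p_pos = model.predict([1.0, 1.0])
p_neg = model.predict([1.0, -1.0])
print(p_pos > 0.5, p_neg < 0.5)
```

In this sketch, `alpha` scales the per-coordinate learning rate, `lambda1` zeroes out weights whose accumulated gradient stays small, and `lambda2` shrinks the remaining weights.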

To be continued…
