joppot

4 scikit-learn functions for preprocessing data for machine learning

Abstract

Hello everyone, this is candle. This time we will preprocess data with scikit-learn, a machine learning library for Python.

scikit-learn provides what are called converters (the official documentation calls them transformers), which convert the input data with the fit_transform() method. Since there are many converters, I will introduce the following four that are often used in machine learning.

Imputer
StandardScaler
MinMaxScaler
OneHotEncoder


Condition

Python3
scikit-learn 0.19.1

To run the sample code, you also need numpy in addition to these.



Imputer

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Imputer.html#sklearn.preprocessing.Imputer

Imputer replaces the missing values (None or NaN) contained in the data with another value.
The following values are set by default as arguments.

Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True)

missing_values is an integer or the string 'NaN' and specifies which value is treated as missing; every occurrence of it in the data is replaced. Set it when missing entries are marked with some number instead of None or NaN.
strategy is a str and selects the replacement statistic: 'mean' (average), 'median', or 'most_frequent' (mode).
axis is an int. When 0 is specified, a missing value is replaced with the statistic of its column (vertical).
When 1 is specified, it is replaced with the statistic of its row (horizontal).
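To make strategy and axis concrete, here is a rough NumPy-only sketch of the idea. The impute function is a hypothetical helper written for this illustration, not part of scikit-learn, and it only covers the mean and median strategies.

```python
import numpy as np

data = np.array([[7, 2, 3],
                 [8, np.nan, 3],
                 [3, 8, 5]], dtype=float)


def impute(data, strategy="mean", axis=0):
    """Hypothetical helper: fill NaN with a per-column (axis=0)
    or per-row (axis=1) statistic that ignores the NaN entries."""
    stat_fn = {"mean": np.nanmean, "median": np.nanmedian}[strategy]
    # keepdims=True keeps the result broadcastable against data
    stats = stat_fn(data, axis=axis, keepdims=True)
    return np.where(np.isnan(data), stats, data)


by_column = impute(data, axis=0)  # uses the column values 2 and 8
by_row = impute(data, axis=1)     # uses the row values 8 and 3
print(by_column[1, 1])  # 5.0
print(by_row[1, 1])     # 5.5
```

With axis=0 the gap is filled from the other values in the same column, and with axis=1 from the other values in the same row, so the two settings can give different results for the same cell.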

Let's try it. Create a file named imputer_test.py anywhere you like.

touch imputer_test.py

Write the following.

from sklearn.preprocessing import Imputer
import numpy as np
data = np.array([[7, 2, 3],
                 [8, None, 3],
                 [3, 8, 5]])
imputer = Imputer()
new_data = imputer.fit_transform(data)
print(new_data)

Run it.

python3 imputer_test.py

[[ 7.  2.  3.]
 [ 8.  5.  3.]
 [ 3.  8.  5.]]

The None was replaced with 5, the mean of the other values in its column (2 and 8).
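Note that this article targets scikit-learn 0.19.1. From 0.20 onward, sklearn.preprocessing.Imputer was deprecated and later removed, and sklearn.impute.SimpleImputer takes its place. The following is a minimal sketch of the same example, assuming a newer scikit-learn is installed. SimpleImputer has no axis argument and always works per column, and the missing entry is written as np.nan here.

```python
import numpy as np
from sklearn.impute import SimpleImputer  # available in scikit-learn 0.20+

data = np.array([[7, 2, 3],
                 [8, np.nan, 3],
                 [3, 8, 5]], dtype=float)

# Replace each NaN with the mean of its column
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
new_data = imputer.fit_transform(data)
print(new_data)
```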

StandardScaler

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler

StandardScaler standardizes the data so that each column has a mean of 0 and a standard deviation of 1.
The following values are set by default as arguments.

StandardScaler(copy=True, with_mean=True, with_std=True)

Create a file.

touch ss.py

Write the following.

from sklearn.preprocessing import StandardScaler
import numpy as np
data = np.array([[7., 2., 3.],
                 [8., 5., 3.],
                 [3., 8., 5.]])
standard_scaler = StandardScaler()
new_data = standard_scaler.fit_transform(data)
print(new_data)

Run it.

python3 ss.py

[[ 0.46291005 -1.22474487 -0.70710678]
 [ 0.9258201   0.         -0.70710678]
 [-1.38873015  1.22474487  1.41421356]]
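To see where these numbers come from, the same result can be reproduced by hand with NumPy. This is only a sketch of the calculation, not the library's implementation: subtract each column's mean and divide by each column's standard deviation (the population standard deviation, ddof=0).

```python
import numpy as np

data = np.array([[7., 2., 3.],
                 [8., 5., 3.],
                 [3., 8., 5.]])

mean = data.mean(axis=0)  # per-column mean
std = data.std(axis=0)    # per-column population std (ddof=0)
standardized = (data - mean) / std
print(standardized)
```

After this transformation every column has mean 0 and standard deviation 1, which is why features with very different scales become comparable.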

MinMaxScaler

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler

It maps data to the specified range.
The following values are set by default as arguments.

MinMaxScaler(feature_range=(0, 1), copy=True)

feature_range is a tuple specified as (minimum value, maximum value).
By default, the data is mapped to the range from 0 to 1.

Create a file.

touch mms.py

Write the following.

from sklearn.preprocessing import MinMaxScaler
import numpy as np
data = np.array([[0., 2.],
                 [3., 4.],
                 [10., 7.]])
min_max_scaler = MinMaxScaler(feature_range=(0, 1))
new_data = min_max_scaler.fit_transform(data)
print(new_data)

Run it.

python3 mms.py

[[ 0.   0. ]
 [ 0.3  0.4]
 [ 1.   1. ]]

As the output shows, when the input of the converter is a two-dimensional array, the mapping is performed separately for each column (axis=0).
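The per-column mapping can be sketched by hand as follows. The min_max_scale function is a hypothetical helper for illustration, not scikit-learn code, and it assumes no column is constant (otherwise the division would be by zero).

```python
import numpy as np

data = np.array([[0., 2.],
                 [3., 4.],
                 [10., 7.]])


def min_max_scale(data, feature_range=(0, 1)):
    """Hypothetical helper: map each column linearly onto feature_range."""
    low, high = feature_range
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    # First map every column to [0, 1], then stretch and shift to [low, high]
    unit = (data - col_min) / (col_max - col_min)
    return unit * (high - low) + low


scaled_01 = min_max_scale(data)
scaled_pm1 = min_max_scale(data, feature_range=(-1, 1))
print(scaled_01)
print(scaled_pm1)
```

The first call should reproduce the output above, and the second shows how a different feature_range such as (-1, 1) only stretches and shifts the same result.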

OneHotEncoder

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder

OneHotEncoder converts integer labels to one-hot labels.

OneHotEncoder(n_values='auto', categorical_features='all', dtype=<class 'numpy.float64'>, sparse=True, handle_unknown='error')

Create a file

touch ohe.py

Write the following.

from sklearn.preprocessing import OneHotEncoder
import numpy as np
data = np.array([0, 2, 1, 1]).reshape(-1, 1)
one_hot = OneHotEncoder()
new_data = one_hot.fit_transform(data).toarray()
print(new_data)

Run it.

python3 ohe.py

[[ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]]
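Each row of the output has a single 1 in the column whose index equals the label. As a rough illustration of that idea, and not how OneHotEncoder is implemented, the same matrix can be built by picking rows of an identity matrix, assuming the labels are integers starting at 0.

```python
import numpy as np

labels = np.array([0, 2, 1, 1])

n_classes = labels.max() + 1  # assumes labels are 0, 1, ..., n_classes - 1
# Row i of the identity matrix is the one-hot vector for label i
one_hot = np.eye(n_classes)[labels]
print(one_hot)
```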

Extra

If the labels are strings, you can use the pandas Series method factorize() to convert them to integer labels.

import pandas as pd
data = pd.Series(["apple", "orange", "banana", "banana"])
new_data, _ = data.factorize()
print(new_data)
>>
[0 1 2 2]
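The second value returned by factorize(), discarded as _ above, holds the unique labels in order of first appearance. Keeping it lets you map the integers back to the original strings, as in this small sketch (the variable names codes, uniques and restored are chosen only for this example).

```python
import pandas as pd

data = pd.Series(["apple", "orange", "banana", "banana"])

# codes: integer label per element, uniques: the distinct labels
codes, uniques = data.factorize()
# Index the unique labels with the codes to recover the strings
restored = list(uniques[codes])
print(codes)
print(list(uniques))
print(restored)
```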

Conclusion

Preprocessing takes a little extra effort before machine learning, but it can be expected to improve learning performance. Please take advantage of it.

