joppot

Copy, paste, and it just works. No compromises on explanation.


4 scikit-learn converters for preprocessing data for machine learning



Abstract

Hello everyone, this is candle. This time we will preprocess data with scikit-learn, a machine learning library for Python.

scikit-learn provides what it calls transformers (referred to here as converters), and you can convert input data with a converter's fit_transform() method. There are many converters, so I will introduce the following four, which are often used in machine learning.

Imputer
StandardScaler
MinMaxScaler
OneHotEncoder
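Before looking at each one, here is a minimal NumPy-only sketch of the pattern these converters share: fit() learns statistics from the data, transform() applies them, and fit_transform() does both in one call. The class MeanCenterer is hypothetical and is not part of scikit-learn; it only illustrates the convention.

```python
import numpy as np


class MeanCenterer:
    """Hypothetical converter (not a scikit-learn class) that subtracts column means."""

    def fit(self, X):
        # Learn one statistic per column (axis=0) from the training data.
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        # Apply the statistics learned in fit(); nothing is re-estimated here.
        return np.asarray(X, dtype=float) - self.mean_

    def fit_transform(self, X):
        # Equivalent to calling fit() and then transform() on the same data.
        return self.fit(X).transform(X)


train = np.array([[1., 10.], [3., 20.]])
centerer = MeanCenterer()
print(centerer.fit_transform(train))    # column means learned here are [2., 15.]
print(centerer.transform([[2., 30.]]))  # reuses the means learned from train
```

scikit-learn's own converters follow the same convention of storing learned statistics in attributes whose names end with an underscore (for example, StandardScaler's mean_).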


Environment

Python3
scikit-learn 0.19.1

To run the sample code you also need numpy, and pandas for the Extra section.



Imputer

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Imputer.html#sklearn.preprocessing.Imputer

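Imputer replaces missing values in the data (NaN by default; the None in the example below is treated as NaN) with a statistic such as the column mean. Note that Imputer was deprecated in scikit-learn 0.20 and removed in 0.22; newer versions provide sklearn.impute.SimpleImputer instead.
Its constructor defaults are: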

Imputer(missing_values='NaN', strategy='mean', axis=0, verbose=0, copy=True)

missing_values is an integer or the string 'NaN' (the default). Every entry equal to it is imputed, so set an integer here when missing entries are encoded as a number (such as 0 or -1) rather than as NaN/None.
strategy is a string: 'mean' (the default), 'median', or 'most_frequent' (the mode).
axis is an int. With 0 (the default), the statistic is computed for each column (vertically).
With 1, it is computed for each row (horizontally).
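The effect of strategy='mean' combined with axis can be sketched with NumPy's NaN-aware statistics. impute_mean is a hypothetical helper written for illustration, not the code scikit-learn runs, and it assumes missing entries are already NaN.

```python
import numpy as np


def impute_mean(data, axis=0):
    """Hypothetical helper: replace NaN with the mean along the given axis."""
    filled = np.array(data, dtype=float)  # copy, so the input is left unchanged
    # Mean of each column (axis=0) or each row (axis=1), ignoring NaN.
    means = np.nanmean(filled, axis=axis)
    rows, cols = np.where(np.isnan(filled))
    # Column-wise: use the mean of the entry's column; row-wise: of its row.
    filled[rows, cols] = means[cols] if axis == 0 else means[rows]
    return filled


data = np.array([[7, 2, 3],
                 [8, np.nan, 3],
                 [3, 8, 5]])
print(impute_mean(data, axis=0))  # NaN -> 5.0, mean of column [2, 8]
print(impute_mean(data, axis=1))  # NaN -> 5.5, mean of row [8, 3]
```

Swapping np.nanmean for np.nanmedian gives the equivalent of the 'median' strategy.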

Let's try it. Create imputer_test.py anywhere you like.

touch imputer_test.py

Write the following.

from sklearn.preprocessing import Imputer
import numpy as np
data = np.array([[7, 2, 3],
                 [8, None, 3],
                 [3, 8, 5]])
imputer = Imputer()
new_data = imputer.fit_transform(data)
print(new_data)

Run it.

python3 imputer_test.py
[[ 7.  2.  3.]
 [ 8.  5.  3.]
 [ 3.  8.  5.]]

The None has been replaced with 5, the mean of the other values in its column: (2 + 8) / 2.

StandardScaler

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler

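StandardScaler standardizes each column (feature) to mean 0 and standard deviation 1 by computing z = (x - u) / s, where u is the column mean and s is the column standard deviation.
Its constructor defaults are: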

StandardScaler(copy=True, with_mean=True, with_std=True)

Create a file.

touch ss.py

Write the following.

from sklearn.preprocessing import StandardScaler
import numpy as np
data = np.array([[7., 2., 3.],
                 [8., 5., 3.],
                 [3., 8., 5.]])
standard_scaler = StandardScaler()
new_data = standard_scaler.fit_transform(data)
print(new_data)

Run it.

python3 ss.py

[[ 0.46291005 -1.22474487 -0.70710678]
 [ 0.9258201   0.         -0.70710678]
 [-1.38873015  1.22474487  1.41421356]]
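The output above can be reproduced by hand. StandardScaler subtracts each column's mean and divides by its population standard deviation (ddof=0, which is also NumPy's default). This short NumPy sketch illustrates the formula; it is not the library's implementation.

```python
import numpy as np

data = np.array([[7., 2., 3.],
                 [8., 5., 3.],
                 [3., 8., 5.]])
mean = data.mean(axis=0)  # per-column mean: 6, 5 and 11/3
std = data.std(axis=0)    # per-column population standard deviation (ddof=0)
standardized = (data - mean) / std
print(standardized)
```

One difference: for a constant column this division would give 0/0 = NaN, whereas StandardScaler treats a zero standard deviation as 1, so such a column simply becomes all zeros.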

MinMaxScaler

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler

MinMaxScaler linearly rescales each column into a specified range.
Its constructor defaults are:

MinMaxScaler(feature_range=(0, 1), copy=True)

feature_range is a tuple (minimum, maximum) giving the target range.
By default, values are mapped into the range 0 to 1.
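MinMaxScaler first maps each column to 0..1 with (x - min) / (max - min) and then stretches the result to feature_range, as described in the scikit-learn documentation. The NumPy sketch below follows that formula; min_max_scale is a hypothetical helper name, and it assumes no column is constant.

```python
import numpy as np


def min_max_scale(data, feature_range=(0, 1)):
    """Hypothetical helper mirroring MinMaxScaler's formula (assumes no constant column)."""
    data = np.asarray(data, dtype=float)
    low, high = feature_range
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    scaled01 = (data - col_min) / (col_max - col_min)  # each column now spans 0..1
    return scaled01 * (high - low) + low                # stretch to low..high


data = np.array([[0., 2.],
                 [3., 4.],
                 [10., 7.]])
print(min_max_scale(data))                         # default feature_range=(0, 1)
print(min_max_scale(data, feature_range=(-1, 1)))  # columns now span -1..1
```

MinMaxScaler itself guards against a constant column (zero range); this sketch does not.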

Create a file.

touch mms.py

Write the following.

from sklearn.preprocessing import MinMaxScaler
import numpy as np
data = np.array([[0., 2.],
                 [3., 4.],
                 [10., 7.]])
min_max_scaler = MinMaxScaler(feature_range=(0, 1))
new_data = min_max_scaler.fit_transform(data)
print(new_data)

Run it.

python3 mms.py

[[ 0.   0. ]
 [ 0.3  0.4]
 [ 1.   1. ]]

As the output shows, when the input is a two-dimensional array, the mapping is done independently for each column (axis=0). In the first column, for example, 0 becomes 0, 3 becomes 0.3 and 10 becomes 1.

OneHotEncoder

http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html#sklearn.preprocessing.OneHotEncoder

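OneHotEncoder converts integer labels into one-hot vectors: each label becomes a vector with a 1 in the position for that label and 0 elsewhere. In scikit-learn 0.19 the defaults are as follows (n_values and categorical_features were deprecated in 0.20 and removed in 0.22):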

OneHotEncoder(n_values='auto', categorical_features='all', dtype=<class 'numpy.float64'>, sparse=True, handle_unknown='error')

Create a file.

touch ohe.py

Write the following.

from sklearn.preprocessing import OneHotEncoder
import numpy as np
data = np.array([0, 2, 1, 1]).reshape(-1, 1)
one_hot = OneHotEncoder()
new_data = one_hot.fit_transform(data).toarray()
print(new_data)

Run it.

python3 ohe.py

[[ 1.  0.  0.]
 [ 0.  0.  1.]
 [ 0.  1.  0.]
 [ 0.  1.  0.]]
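For this example, where every label from 0 to n-1 appears, the dense array above can be reproduced by picking row k of an identity matrix for label k. This NumPy one-liner only illustrates the idea, assuming non-negative consecutive integer labels; it is not how OneHotEncoder is implemented and does not handle unseen labels.

```python
import numpy as np

labels = np.array([0, 2, 1, 1])
n_classes = labels.max() + 1         # assumes labels are 0, 1, ..., n-1
one_hot = np.eye(n_classes)[labels]  # row k of the identity matrix encodes label k
print(one_hot)
```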

Extra

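In scikit-learn 0.19, OneHotEncoder accepts only integer input. If the labels are strings, you can first convert them to integer labels with the pandas Series method factorize(), which numbers labels in order of first appearance.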

import pandas as pd
data = pd.Series(["apple", "orange", "banana", "banana"])
new_data, _ = data.factorize()
print(new_data)
>>
[0 1 2 2]
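The plain-Python sketch below mimics factorize()'s order-of-first-appearance numbering with a dictionary (factorize_labels is a hypothetical helper, not the pandas implementation), then turns the codes into one-hot rows by indexing an identity matrix, going from strings to one-hot vectors without scikit-learn.

```python
import numpy as np


def factorize_labels(labels):
    """Hypothetical helper: integer codes in order of first appearance."""
    code_of = {}
    # setdefault assigns the next free code the first time a label is seen.
    codes = [code_of.setdefault(label, len(code_of)) for label in labels]
    return np.array(codes), list(code_of)


codes, uniques = factorize_labels(["apple", "orange", "banana", "banana"])
print(codes)    # [0 1 2 2], matching the factorize() output above
print(uniques)  # ['apple', 'orange', 'banana']
one_hot = np.eye(len(uniques))[codes]  # row k of the identity encodes code k
print(one_hot)
```

By contrast, np.unique(labels, return_inverse=True) assigns codes in sorted order (apple=0, banana=1, orange=2), so the two approaches can number the same labels differently.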

Conclusion

Preprocessing the data once before training can be expected to improve learning performance, so please take advantage of these converters.


I work as the CTO of a venture company. I started programming at university, where I first learned Java, C++ and PHP. At the company, I develop web services with Rails. I like automation.