Machine Learning

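Backing up and reusing a sklearn model

A trained sklearn model can be saved with pickle, a module from the Python standard library (not part of sklearn itself), and then loaded again for reuse.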

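from sklearn import svm
from sklearn import datasets

# Train a support vector classifier on the iris dataset
clf = svm.SVC()
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)

import pickle

# Serialize the fitted model to an in-memory byte string, then restore it
s = pickle.dumps(clf)
clf2 = pickle.loads(s)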


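Saving to a file also appears to be possible through the API described below (quoted from the Python 2 pickle documentation).
Full details are in the pickle documentation.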

pickle.dump(obj, file[, protocol])

Write a pickled representation of obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).

If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.

Changed in version 2.3: Introduced the protocol parameter.

file must have a write() method that accepts a single string argument. It can thus be a file object opened for writing, a StringIO object, or any other custom object that meets this interface.


pickle.load(file)

Read a string from the open file object file and interpret it as a pickle data stream, reconstructing and returning the original object hierarchy. This is equivalent to Unpickler(file).load().

file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return a string. Thus file can be a file object opened for reading, a StringIO object, or any other custom object that meets this interface.

This function automatically determines whether the data stream was written in binary mode or not.
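
The file-based API quoted above can be sketched roughly as follows. The excerpts describe Python 2; this sketch assumes Python 3, where the file has to be opened in binary mode because pickle writes bytes. The helper names save_object and load_object, the file name model.pkl, and the dictionary standing in for a fitted model are all made up for illustration; a fitted estimator such as clf could be passed in the same way.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Pickle obj to path (hypothetical helper, not part of pickle)."""
    # Binary mode ("wb") is required in Python 3 because pickle writes bytes.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Read a pickled object back from path (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a fitted estimator such as clf; any picklable object works.
model_stand_in = {"kernel": "rbf", "classes": [0, 1, 2]}

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pkl")
    save_object(model_stand_in, path)
    restored = load_object(path)

# The restored object is an equal copy, not the same object.
print(restored == model_stand_in)
```

Since unpickling can execute arbitrary code, pickle.load should only be used on files from a trusted source.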


A model can also be saved to a file with joblib, as shown below.

In the specific case of the scikit, it may be more interesting to use joblib’s replacement of pickle (joblib.dump & joblib.load), which is more efficient on big data, but can only pickle to the disk and not to a string:

# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib
joblib.dump(clf, 'filename.pkl')

Later you can load back the pickled model (possibly in another Python process) with:

clf = joblib.load('filename.pkl') 
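
One way to check that a reloaded model behaves like the original is to compare predictions before and after the round trip. The sketch below assumes a recent scikit-learn with the standalone joblib package installed. The temporary directory and the variable names restored and same_predictions are illustrative choices rather than anything prescribed by scikit-learn.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets, svm

# Train the same classifier as in the pickle example.
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC().fit(X, y)

# Save to disk and load back; the temporary directory is removed afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "filename.pkl")
    joblib.dump(clf, model_path)
    restored = joblib.load(model_path)

# The restored model should predict exactly what the original predicts.
same_predictions = np.array_equal(clf.predict(X), restored.predict(X))
print(same_predictions)
```

This check only covers a save and load within a single environment; loading a file with a different scikit-learn version than the one that wrote it is not guaranteed to work.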

More details are available on the scikit-learn website.

