

Backing up and reusing a sklearn model

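A trained sklearn model can be saved with pickle, a module from the Python standard library (not part of sklearn itself), and loaded again later for reuse.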



from sklearn import svm
from sklearn import datasets
# Train a support vector classifier on the iris dataset.
clf = svm.SVC()
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)





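import pickle
# Serialize the fitted classifier in memory, then restore it as a new object.
s = pickle.dumps(clf)
clf2 = pickle.loads(s)
# Predict the class of the first sample with the restored model.
clf2.predict(X[0:1])

# True label of the first sample, for comparison.
y[0]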
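
To check that the round trip really preserves the model, the restored classifier can be compared with the original on the whole dataset. The following is a minimal sketch; the names `restored` and `same_predictions` are only illustrative and do not come from the sklearn tutorial.

```python
import pickle

import numpy as np
from sklearn import datasets, svm

# Train the same classifier as in the post.
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC()
clf.fit(X, y)

# Round-trip the fitted model through an in-memory pickle.
restored = pickle.loads(pickle.dumps(clf))

# The restored model is a separate object, but it should reproduce
# the predictions of the original on every sample.
same_predictions = np.array_equal(clf.predict(X), restored.predict(X))
print(same_predictions)
```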

The pickle API below can also write the model to a file.
Details are in the pickle documentation (the Python 2 version is quoted here).
https://docs.python.org/2/library/pickle.html
pickle.dump(obj, file[, protocol])

Write a pickled representation of obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).

If the protocol parameter is omitted, protocol 0 is used. If protocol is specified as a negative value or HIGHEST_PROTOCOL, the highest protocol version will be used.

Changed in version 2.3: Introduced the protocol parameter.

file must have a write() method that accepts a single string argument. It can thus be a file object opened for writing, a StringIO object, or any other custom object that meets this interface.

pickle.load(file)

Read a string from the open file object file and interpret it as a pickle data stream, reconstructing and returning the original object hierarchy. This is equivalent to Unpickler(file).load().

file must have two methods, a read() method that takes an integer argument, and a readline() method that requires no arguments. Both methods should return a string. Thus file can be a file object opened for reading, a StringIO object, or any other custom object that meets this interface.

This function automatically determines whether the data stream was written in binary mode or not.

pickle.dumps(obj[, protocol])
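
The quoted documentation describes Python 2. As a rough sketch of the same file-based workflow in Python 3, where the file has to be opened in binary mode, something like the following should work. The file name `svc_iris.pkl` and the temporary directory are assumptions made for this example only.

```python
import os
import pickle
import tempfile

from sklearn import datasets, svm

# Train the same classifier as in the post.
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC().fit(X, y)

# Hypothetical location for the saved model (a fresh temporary directory).
model_path = os.path.join(tempfile.mkdtemp(), "svc_iris.pkl")

# Write the pickle to disk; "wb" because pickle produces bytes in Python 3.
with open(model_path, "wb") as f:
    pickle.dump(clf, f, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back, possibly in another Python process.
with open(model_path, "rb") as f:
    loaded = pickle.load(f)

# Compare the loaded model with the original on all samples.
identical = bool((loaded.predict(X) == clf.predict(X)).all())
print(identical)
```

A pickle file should only be loaded if it comes from a trusted source, since unpickling can execute arbitrary code.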



The model can also be saved to a file with joblib, as shown below.


In the specific case of the scikit, it may be more interesting to use joblib’s replacement of pickle (joblib.dump & joblib.load), which is more efficient on big data, but can only pickle to the disk and not to a string:

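# Older scikit-learn releases exposed this as sklearn.externals.joblib;
# that alias has since been removed, so import the joblib package directly.
import joblib
joblib.dump(clf, 'filename.pkl')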

Later you can load back the pickled model (possibly in another Python process) with:

clf = joblib.load('filename.pkl')
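
Put together, the joblib workflow might look like the sketch below. It assumes the standalone `joblib` package is installed and writes into a temporary directory; the variable names `written`, `clf_loaded` and `matches` are illustrative only.

```python
import os
import tempfile

import joblib
from sklearn import datasets, svm

# Train the same classifier as in the post.
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC().fit(X, y)

# Same file name as in the quoted docs, placed in a temporary directory.
model_path = os.path.join(tempfile.mkdtemp(), "filename.pkl")

# joblib.dump returns the list of file names it wrote.
written = joblib.dump(clf, model_path)

# Load the model back and compare it with the original on all samples.
clf_loaded = joblib.load(model_path)
matches = bool((clf_loaded.predict(X) == clf.predict(X)).all())
print(written, matches)
```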

Details can be found on the scikit-learn website below.
http://scikit-learn.org/stable/tutorial/basic/tutorial.html#machine-learning-the-problem-setting

