

German Credit Scorecard Modeling in Python (code included, AAA recommended)

2020-07-22 10:11 Author: python風控模型

Micro-degree course on financial risk-control scorecard models and data analysis in Python: http://dwz.date/b9vv

The goal is to minimize risk and maximize profit on behalf of the bank.

To minimize loss from the bank's perspective, the bank needs a decision rule for whose loan to approve and whose to reject. Loan managers consider an applicant's demographic and socio-economic profile before deciding on his/her loan application.

The German Credit Data contains 20 variables for 1000 loan applicants, together with a classification of each applicant as a Good or a Bad credit risk. Here is a link to the German Credit data (right-click and "save as"). A predictive model developed on this data is expected to guide a bank manager in deciding whether to approve a loan to a prospective applicant based on his/her profile.


Application of credit scoring systems

http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)

account balance

duration of credit (how long the card has been held)

Data Set Information:

Two datasets are provided. The original dataset, in the form provided by Prof. Hofmann, contains categorical/symbolic attributes and is in the file "german.data".
For algorithms that need numerical attributes, Strathclyde University produced the file "german.data-numeric". This file has been edited and several indicator variables added to make it suitable for algorithms which cannot cope with categorical variables. Several attributes that are ordered categorical (such as attribute 17) have been coded as integer. This was the form used by StatLog.



This dataset requires use of a cost matrix (see below).

               predicted 1   predicted 2
actual 1            0             1
actual 2            5             0

(1 = Good, 2 = Bad)

The rows represent the actual classification and the columns the predicted classification.
It is worse to class a customer as good when they are bad (5) than it is to class a customer as bad when they are good (1).
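To make the cost matrix concrete, here is a minimal Python sketch. The function names `total_cost` and `approve` and the sample labels are made up for illustration; only the matrix values come from the dataset notes. The sketch also shows the approval cutoff the matrix implies: approving costs 5·p in expectation and rejecting costs 1·(1 − p), where p is the estimated probability that the applicant is bad, so approving is cheaper only when p < 1/6.

```python
# Cost matrix from the dataset notes: COST[actual][predicted], 1 = Good, 2 = Bad.
COST = {1: {1: 0, 2: 1},
        2: {1: 5, 2: 0}}


def total_cost(y_true, y_pred):
    """Sum the misclassification cost over paired actual/predicted labels."""
    return sum(COST[actual][predicted] for actual, predicted in zip(y_true, y_pred))


def approve(p_bad):
    """Approve only if the expected cost of approving is below that of rejecting."""
    cost_if_approved = COST[2][1] * p_bad          # bad customer classed as good
    cost_if_rejected = COST[1][2] * (1.0 - p_bad)  # good customer classed as bad
    return cost_if_approved < cost_if_rejected


# Hypothetical labels, for illustration only.
actual = [1, 1, 2, 2, 1]
predicted = [1, 2, 1, 2, 1]
print(total_cost(actual, predicted))  # 1 + 5 = 6
print(approve(0.10), approve(0.20))   # True False
```

Under this matrix, plain accuracy understates the damage done by a model that approves bad applicants, which is worth keeping in mind when comparing the accuracy figures reported later.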

Attribute Information:

Attribute 1: (qualitative)
Status of existing checking account
A11 : ... < 0 DM
A12 : 0 <= ... < 200 DM
A13 : ... >= 200 DM / salary assignments for at least 1 year
A14 : no checking account

Attribute 2: (numerical)
Duration in months

Attribute 3: (qualitative)
Credit history
A30 : no credits taken/ all credits paid back duly
A31 : all credits at this bank paid back duly
A32 : existing credits paid back duly till now
A33 : delay in paying off in the past
A34 : critical account/ other credits existing (not at this bank)

Attribute 4: (qualitative)
Purpose
A40 : car (new)
A41 : car (used)
A42 : furniture/equipment
A43 : radio/television
A44 : domestic appliances
A45 : repairs
A46 : education
A47 : (vacation - does not exist?)
A48 : retraining
A49 : business
A410 : others

Attribute 5: (numerical)
Credit amount

Attribute 6: (qualitative)
Savings account/bonds
A61 : ... < 100 DM
A62 : 100 <= ... < 500 DM
A63 : 500 <= ... < 1000 DM
A64 : ... >= 1000 DM
A65 : unknown/ no savings account

Attribute 7: (qualitative)
Present employment since
A71 : unemployed
A72 : ... < 1 year
A73 : 1 <= ... < 4 years
A74 : 4 <= ... < 7 years
A75 : ... >= 7 years

Attribute 8: (numerical)
Installment rate in percentage of disposable income

Attribute 9: (qualitative)
Personal status and sex
A91 : male : divorced/separated
A92 : female : divorced/separated/married
A93 : male : single
A94 : male : married/widowed
A95 : female : single

Attribute 10: (qualitative)
Other debtors / guarantors
A101 : none
A102 : co-applicant
A103 : guarantor

Attribute 11: (numerical)
Present residence since

Attribute 12: (qualitative)
Property
A121 : real estate
A122 : if not A121 : building society savings agreement/ life insurance
A123 : if not A121/A122 : car or other, not in attribute 6
A124 : unknown / no property

Attribute 13: (numerical)
Age in years

Attribute 14: (qualitative)
Other installment plans
A141 : bank
A142 : stores
A143 : none

Attribute 15: (qualitative)
Housing
A151 : rent
A152 : own
A153 : for free

Attribute 16: (numerical)
Number of existing credits at this bank

Attribute 17: (qualitative)
Job
A171 : unemployed/ unskilled - non-resident
A172 : unskilled - resident
A173 : skilled employee / official
A174 : management/ self-employed/ highly qualified employee/ officer

Attribute 18: (numerical)
Number of people being liable to provide maintenance for

Attribute 19: (qualitative)
Telephone
A191 : none
A192 : yes, registered under the customer's name

Attribute 20: (qualitative)
Foreign worker
A201 : yes
A202 : no
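The coded values in the attribute list can be turned back into readable labels with a small lookup table. The sketch below is illustrative only: the record is a hypothetical row in the space-separated layout of "german.data" (20 attributes followed by the class label, 1 = Good, 2 = Bad), the function name `decode` is made up, and the dictionary covers just attributes 1 and 3.

```python
# Partial code book copied from the attribute list (attributes 1 and 3 only).
CODE_BOOK = {
    "A11": "checking account: < 0 DM",
    "A12": "checking account: 0 <= ... < 200 DM",
    "A13": "checking account: >= 200 DM / salary assignments for at least 1 year",
    "A14": "no checking account",
    "A30": "no credits taken/ all credits paid back duly",
    "A31": "all credits at this bank paid back duly",
    "A32": "existing credits paid back duly till now",
    "A33": "delay in paying off in the past",
    "A34": "critical account/ other credits existing (not at this bank)",
}


def decode(record):
    """Split one space-separated record and translate the codes we know."""
    fields = record.split()
    attributes, label = fields[:-1], int(fields[-1])
    # Codes missing from the partial code book are passed through unchanged.
    decoded = [CODE_BOOK.get(value, value) for value in attributes]
    return decoded, "Good" if label == 1 else "Bad"


# Hypothetical record, for illustration only.
row = "A14 12 A32 A43 2000 A61 A73 2 A93 A101 3 A121 35 A143 A152 1 A173 1 A191 A201 1"
decoded, risk = decode(row)
print(decoded[0], "|", decoded[2], "|", risk)
```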


Random forest feature importance (100 trees)
Random forest feature importance (1000 trees)

randomForest.py

random forest with 1000 trees:
accuracy on the training subset:1.000
accuracy on the test subset:0.772

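Accuracy is higher than the decision tree's.


Before the comparison


Drawing the tree diagram yourself


Decision tree: accuracy is not high, and the model overfits severely.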

accuracy on the training subset:0.991

accuracy on the test subset:0.680

# -*- coding: utf-8 -*-
"""
Author's micro-degree course on financial risk-control scorecard models and data analysis in Python: http://dwz.date/b9vv
Author's WeChat official account: pythonEducation
@author: 231469242@qq.com
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.model_selection import train_test_split

# Input file
readFileName = "German_credit.xlsx"
# Read the Excel file
df = pd.read_excel(readFileName)
# All columns except the last are features; the last column is the label.
# DataFrame.ix has been removed from pandas, so iloc is used instead.
x = df.iloc[:, :-1]
y = df.iloc[:, -1]
names = list(x.columns)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# Tune max_depth
list_average_accuracy = []
depth = range(1, 30)
for i in depth:
    # Limiting max_depth reduces model complexity and curbs overfitting
    tree = DecisionTreeClassifier(max_depth=i, random_state=0)
    tree.fit(x_train, y_train)
    accuracy_training = tree.score(x_train, y_train)
    accuracy_test = tree.score(x_test, y_test)
    average_accuracy = (accuracy_training + accuracy_test) / 2.0
    # print("average_accuracy:", average_accuracy)
    list_average_accuracy.append(average_accuracy)

max_value = max(list_average_accuracy)
# List indices start at 0 while depths start at 1, so add 1
best_depth = list_average_accuracy.index(max_value) + 1
print("best_depth:", best_depth)

best_tree = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
best_tree.fit(x_train, y_train)
print("decision tree:")
print("accuracy on the training subset:{:.3f}".format(best_tree.score(x_train, y_train)))
print("accuracy on the test subset:{:.3f}".format(best_tree.score(x_test, y_test)))

# Horizontal bar chart of the feature importances
n_features = x.shape[1]
plt.barh(range(n_features), best_tree.feature_importances_, align='center')
plt.yticks(np.arange(n_features), names)
plt.title("Decision Tree:")
plt.xlabel('Feature Importance')
plt.ylabel('Feature')
plt.show()

# Write a .dot file; it can be rendered to an image later from the command line
export_graphviz(best_tree, out_file="creditTree.dot", class_names=['bad', 'good'],
                feature_names=names, impurity=False, filled=True)

'''
best_depth: 12
decision tree:
accuracy on the training subset:0.991
accuracy on the test subset:0.680
'''
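The depth search above picks the depth from the mean of training and test accuracy, so the test set influences the choice. One common alternative is to choose the depth by cross-validation on the training data only. The following is a minimal sketch of that idea, not the author's method; it runs on synthetic data from `make_classification` so that it is self-contained, and the helper name `pick_depth` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier


def pick_depth(x_train, y_train, depths=range(1, 11), folds=5):
    """Return the max_depth with the best mean cross-validated accuracy."""
    mean_scores = {}
    for d in depths:
        tree = DecisionTreeClassifier(max_depth=d, random_state=0)
        mean_scores[d] = cross_val_score(tree, x_train, y_train, cv=folds).mean()
    return max(mean_scores, key=mean_scores.get), mean_scores


# Synthetic stand-in for the credit data (1000 rows, 20 features).
x, y = make_classification(n_samples=1000, n_features=20, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

best_depth, scores = pick_depth(x_train, y_train)
best_tree = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
best_tree.fit(x_train, y_train)
test_accuracy = best_tree.score(x_test, y_test)  # the test set is used once, at the end
print("best_depth:", best_depth)
print("accuracy on the test subset:{:.3f}".format(test_accuracy))
```

Because the test subset plays no part in choosing the depth here, the final test accuracy is a fairer estimate of how the tree would do on new applicants.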

Support vector machine: highest accuracy

accuracy on the scaled training subset:0.867

accuracy on the scaled test subset:0.800

This is better than the random forest by 0.800 - 0.772 = 0.028.


# -*- coding: utf-8 -*-
"""
Created on Fri Mar 30 21:57:29 2018
Author's WeChat official account: pythonEducation
@author: 231469242@qq.com
SVM needs standardized input data.
Author's micro-degree course on financial risk-control scorecard models and data analysis in Python: http://dwz.date/b9vv
"""
from sklearn import preprocessing
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import pandas as pd

# Input file
readFileName = "German_credit.xlsx"
# Read the Excel file
df = pd.read_excel(readFileName)
# DataFrame.ix has been removed from pandas, so iloc is used instead
x = df.iloc[:, :-1]
y = df.iloc[:, -1]
names = x.columns
# random_state acts as the random seed
X_train, x_test, y_train, y_test = train_test_split(x, y, stratify=y, random_state=42)

# Baseline: SVM on the raw, unscaled features
svm = SVC()
svm.fit(X_train, y_train)
print("accuracy on the training subset:{:.3f}".format(svm.score(X_train, y_train)))
print("accuracy on the test subset:{:.3f}".format(svm.score(x_test, y_test)))
'''
accuracy on the training subset:1.000
accuracy on the test subset:0.700
'''

# Check whether the features share a common scale
plt.plot(X_train.min(axis=0), 'o', label='Min')
plt.plot(X_train.max(axis=0), 'v', label='Max')
plt.xlabel('Feature Index')
plt.ylabel('Feature magnitude in log scale')
plt.yscale('log')
plt.legend(loc='upper right')
plt.show()

# Standardize the data.
# Note: each subset is scaled with its own mean and standard deviation here;
# the usual practice is to fit the scaler on the training set and reuse it on the test set.
X_train_scaled = preprocessing.scale(X_train)
x_test_scaled = preprocessing.scale(x_test)
svm1 = SVC()
svm1.fit(X_train_scaled, y_train)
print("accuracy on the scaled training subset:{:.3f}".format(svm1.score(X_train_scaled, y_train)))
print("accuracy on the scaled test subset:{:.3f}".format(svm1.score(x_test_scaled, y_test)))
'''
accuracy on the scaled training subset:0.867
accuracy on the scaled test subset:0.800
'''

# Tune C; kernel selects the kernel function used to map the data,
# and probability=True enables probability estimates
svm2 = SVC(C=10, gamma="auto", kernel='rbf', probability=True)
svm2.fit(X_train_scaled, y_train)
print("after c parameter=10,accuracy on the scaled training subset:{:.3f}".format(svm2.score(X_train_scaled, y_train)))
print("after c parameter=10,accuracy on the scaled test subset:{:.3f}".format(svm2.score(x_test_scaled, y_test)))
'''
after c parameter=10,accuracy on the scaled training subset:0.972
after c parameter=10,accuracy on the scaled test subset:0.716
'''

# Functional distance from each sample to the separating hyperplane
#print(svm2.decision_function(X_train_scaled))
#print(svm2.decision_function(X_train_scaled)[:20] > 0)
# Class labels known to the classifier
#print(svm2.classes_)
# Predicted probability of each class for every test sample
#print(svm2.predict_proba(x_test_scaled))
# Predicted class of every test sample, coded as 0 or 1
#print(svm2.predict(x_test_scaled))
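The SVM script scales the training and test sets independently of each other. A common alternative is to learn the scaling statistics from the training set alone and reuse them on the test set, which a scikit-learn pipeline does automatically. A minimal sketch on synthetic data follows; the data and variable names are placeholders, not the German credit file, so the printed accuracies are not comparable with the figures in the article.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the credit data (1000 rows, 20 features).
x, y = make_classification(n_samples=1000, n_features=20, random_state=42)
x_train, x_test, y_train, y_test = train_test_split(x, y, stratify=y, random_state=42)

# fit() learns the mean and standard deviation on the training data only;
# score() and predict() reuse those statistics on whatever data they receive.
model = make_pipeline(StandardScaler(), SVC())
model.fit(x_train, y_train)
train_accuracy = model.score(x_train, y_train)
test_accuracy = model.score(x_test, y_test)
print("accuracy on the scaled training subset:{:.3f}".format(train_accuracy))
print("accuracy on the scaled test subset:{:.3f}".format(test_accuracy))
```

Bundling the scaler with the classifier also means a single applicant can be scored later without recomputing any statistics.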


Neural network

It performs worse than the support vector machine and the random forest.

Best result:

accuracy on the training subset:0.916
accuracy on the test subset:0.720


# -*- coding: utf-8 -*-
"""
Created on Sun Apr 1 11:49:50 2018
Author's WeChat official account: pythonEducation
@author: 231469242@qq.com
Neural networks need preprocessed data.
Author's micro-degree course on financial risk-control scorecard models and data analysis in Python: http://dwz.date/b9vv
"""
# Multi-layer Perceptron
from sklearn.neural_network import MLPClassifier
# Standardize the data; otherwise the network gives poor results, as with SVM
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import pandas as pd

# Input file
readFileName = "German_credit.xlsx"
# Read the Excel file
df = pd.read_excel(readFileName)
# DataFrame.ix has been removed from pandas, so iloc is used instead
x = df.iloc[:, :-1]
y = df.iloc[:, -1]
names = list(x.columns)
# random_state acts as the random seed
x_train, x_test, y_train, y_test = train_test_split(x, y, stratify=y, random_state=42)

# Baseline: MLP on the raw, unscaled features
mlp = MLPClassifier(random_state=42)
mlp.fit(x_train, y_train)
print("neural network:")
print("accuracy on the training subset:{:.3f}".format(mlp.score(x_train, y_train)))
print("accuracy on the test subset:{:.3f}".format(mlp.score(x_test, y_test)))

# Note: the scaler is refitted on the test set here; the usual practice is to
# fit it once on the training set and call transform() on the test set.
scaler = StandardScaler()
x_train_scaled = scaler.fit(x_train).transform(x_train)
x_test_scaled = scaler.fit(x_test).transform(x_test)

mlp_scaled = MLPClassifier(max_iter=1000, random_state=42)
mlp_scaled.fit(x_train_scaled, y_train)
print("neural network after scaled:")
print("accuracy on the training subset:{:.3f}".format(mlp_scaled.score(x_train_scaled, y_train)))
print("accuracy on the test subset:{:.3f}".format(mlp_scaled.score(x_test_scaled, y_test)))

# A larger alpha means stronger L2 regularization
mlp_scaled2 = MLPClassifier(max_iter=1000, alpha=1, random_state=42)
mlp_scaled2.fit(x_train_scaled, y_train)
print("neural network after scaled and alpha change to 1:")
print("accuracy on the training subset:{:.3f}".format(mlp_scaled2.score(x_train_scaled, y_train)))
print("accuracy on the test subset:{:.3f}".format(mlp_scaled2.score(x_test_scaled, y_test)))

# Heat map of the first-layer weights
plt.figure(figsize=(20, 5))
plt.imshow(mlp_scaled.coefs_[0], interpolation="none", cmap="GnBu")
# One tick per input feature (the original hard-coded 30, the feature count of another dataset)
plt.yticks(range(len(names)), names)
plt.xlabel("columns in weight matrix")
plt.ylabel("input feature")
plt.colorbar()
plt.show()

'''
neural network:
accuracy on the training subset:0.700
accuracy on the test subset:0.700
neural network after scaled:
accuracy on the training subset:1.000
accuracy on the test subset:0.704
neural network after scaled and alpha change to 1:
accuracy on the training subset:0.916
accuracy on the test subset:0.720
'''

xgboost

Its discriminative power is acceptable.

AUC: 0.8134
ACC: 0.7720
Recall: 0.9521
F1-score: 0.8480
Precision: 0.7644

# -*- coding: utf-8 -*-
"""
Created on Tue Apr 24 22:42:47 2018
Author's micro-degree course on financial risk-control scorecard models and data analysis in Python: http://dwz.date/b9vv
Author's WeChat official account: pythonEducation
@author: 231469242@qq.com

A workaround for the error "module 'xgboost' has no attribute 'DMatrix'":
beginners, or people not yet familiar with Python, tend to make this mistake.
The rule is simple: never name a file after a module, whatever the file type.
My error came from having files named xgboost.* in the folder. "import xgboost"
searches the current directory first, which is why the problem occurred.
So, once again: do not use a module name as a file name!
"""
import xgboost as xgb
# sklearn.cross_validation has been removed; train_test_split lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn import metrics
import pandas as pd

# Input file
readFileName = "German_credit.xlsx"
# Read the Excel file
df = pd.read_excel(readFileName)
# DataFrame.ix has been removed from pandas, so iloc is used instead
x = df.iloc[:, :-1]
y = df.iloc[:, -1]
names = x.columns
train_x, test_x, train_y, test_y = train_test_split(x, y, random_state=0)
dtrain = xgb.DMatrix(train_x, label=train_y)
dtest = xgb.DMatrix(test_x)

params = {'booster': 'gbtree',
          #'objective': 'reg:linear',
          'objective': 'binary:logistic',
          'eval_metric': 'auc',
          'max_depth': 4,
          'lambda': 10,
          'subsample': 0.75,
          'colsample_bytree': 0.75,
          'min_child_weight': 2,
          'eta': 0.025,
          'seed': 0,
          'nthread': 8,
          # 'silent' was replaced by 'verbosity' in newer xgboost releases
          'verbosity': 0}
watchlist = [(dtrain, 'train')]
bst = xgb.train(params, dtrain, num_boost_round=100, evals=watchlist)
ypred = bst.predict(dtest)

# Apply a 0.5 threshold and print some evaluation metrics
y_pred = (ypred >= 0.5) * 1

# Model validation
print('AUC: %.4f' % metrics.roc_auc_score(test_y, ypred))
print('ACC: %.4f' % metrics.accuracy_score(test_y, y_pred))
print('Recall: %.4f' % metrics.recall_score(test_y, y_pred))
print('F1-score: %.4f' % metrics.f1_score(test_y, y_pred))
print('Precision: %.4f' % metrics.precision_score(test_y, y_pred))
print(metrics.confusion_matrix(test_y, y_pred))

print("xgboost:")
#print("accuracy on the training subset:{:.3f}".format(bst.get_score(train_x,train_y)))
#print("accuracy on the test subset:{:.3f}".format(bst.get_score(test_x,test_y)))
print('Feature importances:{}'.format(bst.get_fscore()))

'''
AUC: 0.8135
ACC: 0.7640
Recall: 0.9641
F1-score: 0.8451
Precision: 0.7523

# The feature importances are similar to the random forest's
Feature importances:{'Account Balance': 80, 'Duration of Credit (month)': 119,
 'Most valuable available asset': 54, 'Payment Status of Previous Credit': 84,
 'Value Savings/Stocks': 66, 'Age (years)': 94, 'Credit Amount': 149,
 'Type of apartment': 20, 'Instalment per cent': 37,
 'Length of current employment': 70, 'Sex & Marital Status': 29,
 'Purpose': 67, 'Occupation': 13, 'Duration in Current address': 25,
 'Telephone': 15, 'Concurrent Credits': 23, 'No of Credits at this Bank': 7,
 'Guarantors': 28, 'No of dependents': 6}
'''
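As a quick sanity check, F1 is the harmonic mean of precision and recall, so the F1-score in the xgboost output can be recomputed from the other two numbers. The helper name `f1_from` is made up for this sketch; the inputs are the precision and recall recorded in the script's output.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as recorded in the xgboost output.
print("F1-score: %.4f" % f1_from(0.7523, 0.9641))  # F1-score: 0.8451
```

The high recall combined with a much lower precision suggests the 0.5 threshold labels most applicants as the positive class.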

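Final conclusion:

xgboost's feature-importance analysis is sometimes even more accurate than the random forest's, which shows how powerful it is.

Random forest top factors    xgboost weight
Credit amount                149
Age                          94
Account balance              80
Duration of credit           119

Note on duration of credit: how long a credit card may stay overdue differs from bank to bank; at China Merchants Bank, for example, the card is suspended after two months.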

Data update, 2018-09-18

Logistic regression's validation results are close to catboost's, which shows how stable logistic regression is.

# -*- coding: utf-8 -*-
"""
Author's micro-degree course on financial risk-control scorecard models and data analysis in Python: http://dwz.date/b9vv
Author's email: 231469242@qq.com
Author's WeChat official account: pythonEducation
Technical documentation:
https://www.cnblogs.com/webRobot/p/7216614.html

model accuracy is: 0.755
model precision is: 0.697841726618705
model sensitivity is: 0.3233333333333333
f1_score: 0.44191343963553525
AUC: 0.7626619047619048

After dropping variables by IV value, the results are worse than with all variables kept:
model accuracy is: 0.724
model precision is: 0.61320754717
model sensitivity is: 0.216666666667
f1_score: 0.320197044335
AUC: 0.7031
good classifier

Results on the original German_credit data:
accuracy on the training subset:0.777
accuracy on the test subset:0.740
A: 6.7807190511263755
B: 14.426950408889635
model accuracy is: 0.74
model precision is: 0.7037037037037037
model sensitivity is: 0.38
f1_score: 0.49350649350649356
AUC: 0.7885
"""
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# sklearn.cross_validation and sklearn.linear_model.logistic have been removed
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.metrics import accuracy_score, auc, f1_score, precision_score, recall_score

#df_german=pd.read_excel("german_woe.xlsx")
df_german = pd.read_excel("german_credit.xlsx")
#df_german=pd.read_excel("df_after_vif.xlsx")
y = df_german["target"]
# DataFrame.ix has been removed from pandas; loc slices by column label
x = df_german.loc[:, "Account Balance":"Foreign Worker"]
#x=df_german.loc[:,"Credit Amount":"Purpose"]
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

# Validation
print("accuracy on the training subset:{:.3f}".format(classifier.score(X_train, y_train)))
print("accuracy on the test subset:{:.3f}".format(classifier.score(X_test, y_test)))

# Scorecard scaling constants
P0 = 50
PDO = 10
theta0 = 1.0 / 20
B = PDO / np.log(2)
A = P0 + B * np.log(theta0)
print("A:", A)
print("B:", B)

# Score formula
def Score(probability):
    # Natural logarithm (base e)
    score = A - B * np.log(probability / (1 - probability))
    return score

# Score a whole list of probabilities
def List_score(pos_probablity_list):
    list_score = []
    for probability in pos_probablity_list:
        score = Score(probability)
        list_score.append(score)
    return list_score

list_coef = list(classifier.coef_[0])
intercept = classifier.intercept_

# Predicted probabilities for all rows of x: column 0 is the good customer, column 1 the bad customer
probablity_list = classifier.predict_proba(x)
# Probability of being a bad customer for every row of x
pos_probablity_list = [i[1] for i in probablity_list]
# Scores for all customers
list_score = List_score(pos_probablity_list)
list_predict = classifier.predict(x)
df_result = pd.DataFrame({"label": y, "predict": list_predict,
                          "pos_probablity": pos_probablity_list, "score": list_score})
df_result.to_excel("score_proba.xlsx")

# Variable names, taken from the feature columns so that they line up with the coefficients
list_vNames = list(x.columns)
df_coef = pd.DataFrame({"variable_names": list_vNames, "coef": list_coef})
df_coef.to_excel("coef.xlsx")

y_true = y_test
y_pred = classifier.predict(X_test)
accuracyScore = accuracy_score(y_true, y_pred)
print('model accuracy is:', accuracyScore)

# precision = TP/(TP+FP): true positives / (true positives + false positives)
precision = precision_score(y_true, y_pred)
print('model precision is:', precision)

# recall (sensitivity) = TP/(TP+FN)
sensitivity = recall_score(y_true, y_pred)
print('model sensitivity is:', sensitivity)

# F1 = 2 x (precision x recall) / (precision + recall)
# F1 combines precision and recall into one score (their harmonic mean);
# the best value is 1 and the worst is 0
f1Score = f1_score(y_true, y_pred)
print("f1_score:", f1Score)

def AUC(y_true, y_scores):
    # Compute AUC from the ROC curve: auc(fpr, tpr)
    fpr, tpr, thresholds = metrics.roc_curve(y_true, y_scores, pos_label=1)
    auc_value = auc(fpr, tpr)
    # Scores fall as the bad-customer probability rises, so the raw AUC can
    # come out below 0.5; flip it in that case
    if auc_value < 0.5:
        auc_value = 1 - auc_value
    return auc_value

def Draw_roc(auc_value):
    # Uses the module-level y and list_score; a high score means a good customer (label 0)
    fpr, tpr, thresholds = metrics.roc_curve(y, list_score, pos_label=0)
    # Draw the diagonal
    plt.plot([0, 1], [0, 1], '--', color=(0.6, 0.6, 0.6), label='Diagonal line')
    plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % auc_value)
    plt.title('ROC curve')
    plt.legend(loc="lower right")
    plt.show()

# Rate the AUC
def AUC_performance(AUC):
    if AUC >= 0.7:
        print("good classifier")
    elif AUC > 0.6:
        print("not very good classifier")
    elif AUC > 0.5:
        print("useless classifier")
    else:
        print("bad classifier,with sorting problems")

# AUC check; note that y and list_score cover the full dataset, not only the test set
auc_value = AUC(y, list_score)
print("AUC:", auc_value)
# Rate the AUC
AUC_performance(auc_value)
# Draw the ROC curve
Draw_roc(auc_value)
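The scaling constants in the scorecard script can be checked by hand. With score = A - B·ln(odds), B = PDO/ln 2 and A = P0 + B·ln(theta0), an applicant whose bad-to-good odds equal theta0 scores exactly P0, and every doubling of the odds lowers the score by PDO points. A minimal sketch with the same constants follows; the function name `score_from_probability` is made up for the example.

```python
import math

# Same constants as the scorecard script.
P0, PDO, THETA0 = 50, 10, 1.0 / 20
B = PDO / math.log(2)
A = P0 + B * math.log(THETA0)


def score_from_probability(p_bad):
    """Score for a given probability of being a bad customer."""
    odds = p_bad / (1.0 - p_bad)  # bad-to-good odds
    return A - B * math.log(odds)


base = score_from_probability(1.0 / 21)     # odds = 1/20, i.e. theta0
doubled = score_from_probability(1.0 / 11)  # odds = 1/10, twice theta0
print("A: %.4f  B: %.4f" % (A, B))          # A: 6.7807  B: 14.4270
print(round(base, 6), round(doubled, 6))    # 50.0 40.0
```

The values of A and B agree with the ones recorded in the script's docstring, and the 10-point drop between the two scores is the PDO.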

