Bank Case Study Example 5: Reject Inference
Python course on financial risk-control scorecard models and data analysis: http://dwz.date/b9vv

Reject inference is a topic that sets credit scoring apart from other classification problems, such as marketing propensity models for cross-selling and upselling. As you will discover later, reject inference is about patching information gaps that exist during the development of application scorecards. Let us first gain a broader perspective on patching information gaps by looking at how human beings have evolved.
Connecting the Dots

Recently I watched a Hindi movie called ‘Ankhon Dekhi’; the title translates to ‘seen with your eyes’. At the beginning, the central character of this movie, after a dramatic event in his life, decides to believe only what he sees with his own eyes. What follows are his adventures and misadventures while doing so. Although the movie's theme has great potential, I think its presentation became a bit pretentious, especially towards the end. The idea of believing your eyes seems sensible, but it has its own shortcomings. Evolution has trained our brains to override our vision in order to make split-second decisions, and numerous optical illusions are proof of this phenomenon. In this article, we will explore some optical illusions that highlight how our brains and eyes work. But before that, let us consider an example of split-second decision-making as a necessity for survival.
Imagine a human ancestor in the middle of the dark. Our ancestor is hungry; he hasn’t eaten in days. He sees the silhouette of a creature lurking in front of him. This creature could be his next meal. On the other hand, it could be a predator, and our ancestor could become a delicious meal for it. The reason humans are still around on this planet is that our ancestors’ eyes and brains developed some simple rules to deal with such situations. One of the instruments evolution has equipped humans with is...
Power of Context

As promised earlier, let me present a couple of illustrations to emphasize the power of context. In the first illustration (shown adjacent), compare the lengths of the two yellow lines and decide which one is longer. You will most probably identify the top yellow line as longer than the bottom one. In this illusion, your brain overrides the information received through your eyes based on the context, i.e. the patterns surrounding the yellow lines. As you might appreciate, our three-dimensional world will rarely, and most probably never, offer a pattern like the optical illusion in Illustration 1. Hence, for most practical purposes, our brain makes the right decision, even though it may seem ridiculous in this case.

Illustration 2 – Source: ‘Thinking, Fast and Slow’ by Daniel Kahneman
Now, let us look at the second illustration, shown adjacent. Notice the B and the 13 in the middle of the top and bottom sequences: they are identical. Yet you read the top sequence as A B C and the bottom sequence as 12 13 14. This is phenomenal: what your brain has just done in a split second is something most text-mining and artificial-intelligence algorithms struggle to do. The existence of CAPTCHAs is proof that most of these algorithms fail to capture what nature has equipped us with: the ability to fill in missing links.
Our brain tries to fill gaps in our information using the information available. This is precisely what we try to do when using reject inference for credit scoring.

Reject Inference
Let us try to understand the dynamics of the loan application process before establishing the need for reject inference. Underwriters assess the ‘through-the-door’ loan applications to establish the creditworthiness of the applicants, and accept or reject each application based on the applicant's credentials. Moreover, customers whose applications are accepted may or may not avail the loans. This is shown in the schematic below:

As you can see in the schematic above, we have information on the disbursed loans, which are tagged as good or bad based on their performance. However, to create a holistic scorecard for the entire through-the-door population, we need to infer the behavior of the rejected applications. This process of filling in the missing information is called reject inference, and it is essential for developing a holistic scorecard. The following sections present some commonly used methods for reject inference. I must also point out that, although widely used in the industry, these methods are not perfect.

Use Credit Bureaus
This method uses information from credit bureaus to fill the gaps. If other lenders have disbursed loans to your rejected applicants, it makes sense to tag those rejected customers as good or bad based on their performance with the other lenders. Although this is possibly the best way to infer rejects, because it relies on concrete information, it has the following challenges:
It is unlikely that all the rejected applicants obtained a loan from some other lender around the development period of the scorecard.
Differences in collection processes and reporting among lenders could lead to dubious tagging of customers' performance.
In most cases, credit bureau information alone won't be sufficient to tag the entire through-the-door population. That is why we need analytical methods for reject inference, as discussed in the next section.
Augmentation through Parceling
Augmentation, in its different forms, is the most commonly used methodology for reject inference. As shown in the schematic above, we have fairly concrete good/bad tagging for all the disbursed loans. We can easily run a classification algorithm such as logistic regression (see Part 3), neural networks or a decision tree to create a Known-Good-Bad (KGB) model. The same KGB model is then used to score the rejected applications. Once the scoring is complete, the analyst can create a table similar to the one shown below:

As you may notice in the above table, we have split the rejected applications in each score range into good and bad in the same proportion as the disbursed loans in that range. For instance, disbursed loans in the 232-241 score range have a 22% bad rate, so the 2295 rejected applications in this bucket are divided into 505 bad loans (22% of 2295 is 504.9, rounded to 505) and 1790 good loans. We randomly choose 505 rejected applications in the 232-241 range and tag them as bad; the remaining 1790 applications in this bucket are tagged as good. Finally, we create a holistic scorecard by re-running the classification algorithm, i.e. logistic regression, on the entire through-the-door population.
Notice that, by using score ranges as the criterion for augmentation, we have applied the power-of-context principle discussed above.
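The parceling step can be sketched in a few lines of standard-library Python. This is a minimal sketch, not the author's implementation: it assumes the KGB model has already produced integer scores for both the disbursed loans and the rejected applications, and the helper names `score_band` and `parcel_rejects`, as well as the 10-point bands starting at 232, are hypothetical choices made to mirror the 232-241 example. The toy data is made up to reproduce the 22% / 2295 figures.

```python
import random
from collections import defaultdict


def score_band(score, start=232, width=10):
    """Hypothetical banding: map an integer KGB score to a range such as (232, 241)."""
    lo = start + width * ((score - start) // width)
    return (lo, lo + width - 1)


def parcel_rejects(accepted, reject_scores, seed=0):
    """Parceling: tag rejects good/bad in each band's bad-rate proportion.

    accepted      -- list of (kgb_score, is_bad) pairs for disbursed loans
    reject_scores -- list of KGB scores for rejected applications
    Returns (kgb_score, is_bad) pairs for the rejects, in input order.
    """
    # Count bad and total disbursed loans per score band.
    counts = defaultdict(lambda: [0, 0])  # band -> [n_bad, n_total]
    for score, is_bad in accepted:
        band_counts = counts[score_band(score)]
        band_counts[0] += int(is_bad)
        band_counts[1] += 1

    # Group rejected applications (by position) into the same bands.
    rejects_by_band = defaultdict(list)
    for i, score in enumerate(reject_scores):
        rejects_by_band[score_band(score)].append(i)

    rng = random.Random(seed)  # fixed seed so the random draw is reproducible
    is_bad = [False] * len(reject_scores)
    for band, positions in rejects_by_band.items():
        if band not in counts:
            raise ValueError(f"no disbursed loans in band {band}; merge bands first")
        n_bad, n_total = counts[band]
        # Number of rejects to tag bad, e.g. 22% of 2295 = 504.9 -> 505.
        k = round(len(positions) * n_bad / n_total)
        for i in rng.sample(positions, k):
            is_bad[i] = True
    return list(zip(reject_scores, is_bad))


# Toy data mirroring the example above (not real data):
# 100 disbursed loans in the 232-241 band, 22 of them bad, and 2295 rejects.
accepted = [(235, i < 22) for i in range(100)]
rejects = [236] * 2295
tagged = parcel_rejects(accepted, rejects)
n_bad = sum(bad for _, bad in tagged)
print(n_bad, len(tagged) - n_bad)  # 505 1790
```

The tagged rejects would then be appended to the disbursed loans, and the logistic regression re-fitted on the combined data. Raising an error for a band with no disbursed loans is a design choice of this sketch; in practice such sparse bands are usually merged with a neighbor first.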
Fuzzy Augmentation
Fuzzy augmentation is an extended form of parceling. Rather than randomly assigning each rejected loan as good or bad, we create multiple copies of every rejected loan, split in the good/bad proportion of its score range. For instance, for a single rejected loan in the 232-241 score range, 22 copies are tagged as bad and 78 copies as good. The process is repeated for all the rejected loans. This is similar to the workings of fuzzy logic, where an item can belong partially to several classes. Fuzzy augmentation is widely believed to be a superior reject inference method for producing holistic scorecards.
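A matching sketch of fuzzy augmentation, under the same assumptions as before (integer KGB scores already computed, hypothetical `score_band` banding and helper names, made-up toy data). Instead of literally writing 100 copies of each rejected loan, it emits two rows per reject whose weights equal the band's bad and good rates (0.22 and 0.78 in the 232-241 example), so each reject counts as one loan in total. This weighted form is a common way to implement the idea; literal copies would give each reject 100 times the weight of a disbursed loan unless the disbursed loans were replicated the same way. The weights would then feed a weighted logistic regression, for example through the `sample_weight` argument of scikit-learn's `LogisticRegression.fit`.

```python
from collections import defaultdict


def score_band(score, start=232, width=10):
    """Hypothetical banding: map an integer KGB score to a range such as (232, 241)."""
    lo = start + width * ((score - start) // width)
    return (lo, lo + width - 1)


def fuzzy_augment(accepted, reject_scores):
    """Fuzzy augmentation: one bad row and one good row per reject, weighted by band rates.

    accepted      -- list of (kgb_score, is_bad) pairs for disbursed loans
    reject_scores -- list of KGB scores for rejected applications
    Returns (kgb_score, is_bad, weight) rows; the two weights of each reject sum to 1.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [n_bad, n_total]
    for score, is_bad in accepted:
        band_counts = counts[score_band(score)]
        band_counts[0] += int(is_bad)
        band_counts[1] += 1

    rows = []
    for score in reject_scores:
        band = score_band(score)
        if band not in counts:
            raise ValueError(f"no disbursed loans in band {band}; merge bands first")
        n_bad, n_total = counts[band]
        rows.append((score, True, n_bad / n_total))               # "bad copies", e.g. weight 0.22
        rows.append((score, False, (n_total - n_bad) / n_total))  # "good copies", e.g. weight 0.78
    return rows


# Toy data (not real): 22 of 100 disbursed loans in the 232-241 band are bad.
accepted = [(235, i < 22) for i in range(100)]
print(fuzzy_augment(accepted, [236]))  # [(236, True, 0.22), (236, False, 0.78)]
```

Disbursed loans would enter the re-fit with weight 1 each, alongside these weighted reject rows.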
Sign-off Note
I know all the above methods for reject inference have their shortcomings; I have seen several experts and academics cringe at their mention. However, so far these are the best methods we have for reject inference, given our current knowledge of mathematics and logic. Nature is still hiding a few brilliant tricks up her sleeve, such as our own ability to decipher CAPTCHAs. Someday, when we learn more about the inner workings of our own brains, we might crack the bigger code for reject inference and millions of similar problems. Nature reveals herself piecemeal, so there is still tremendous hope!
Uploader's WeChat official account: pythonEducation
Blogger's online school homepage: http://dwz.date/bwes
