晶文摘

[Big Data] Association Rules: Association and Correlation Mining in Practice



Author: 夏肇毅

First draft: 2022-08-19

 

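Key points:

Association rules: rules describing which features of the data occur together

Market basket analysis

Support: the fraction of the dataset in which an itemset appears, i.e. (number of transactions containing the itemset) / (total number of transactions)

Confidence: among all transactions containing X, the fraction that also contain Y, i.e. (number of transactions containing both X and Y) / (number of transactions containing X)

Frequent Item Set: an itemset whose support is greater than or equal to a chosen threshold

Apriori

FP-Growth

Recommender systems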

mlxtend

Weka

Wide & Deep
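
The Support, Confidence, and Frequent Item Set definitions above can be checked by hand on a tiny example. The following is a minimal sketch in plain Python; the transactions, the 0.6 threshold, and the helper names `support` and `confidence` are made up for illustration and do not come from any library.

```python
# Toy transactions, invented purely for illustration.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Among transactions containing X, the fraction that also contain Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

# An itemset is "frequent" when its support reaches the chosen threshold.
min_support = 0.6
candidates = [{"diapers"}, {"beer"}, {"milk"}, {"chips"}, {"diapers", "beer"}]
frequent = [c for c in candidates if support(c, transactions) >= min_support]

print(support({"diapers", "beer"}, transactions))                   # 0.6
print(round(confidence({"diapers"}, {"beer"}, transactions), 2))    # 0.75
```

Here {diapers, beer} appears in 3 of 5 transactions (support 0.6), and 3 of the 4 transactions containing diapers also contain beer (confidence 0.75).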

 

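Search: 購物籃分析 python (market basket analysis python)

 

Day 05: Market Basket Analysis (Basket Analysis) - iT 邦幫忙

https://ithelp.ithome.com.tw › articles

Day 06: The Algorithm Behind Market Basket Analysis: Apriori - iT 邦幫忙

Day 07: A First Look at Recommendation Systems (Recommendation System)

Day 08: Collaborative Filtering

Day 10: Model-Based Collaborative Filtering (Model Based Filtering)

Day 11: Hybrid Recommendation Models (Hybrid Model)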

 

 

Example: finding associations among animal features


Data source: University of California, Irvine (UCI), Zoo dataset, http://archive.ics.uci.edu/ml/datasets/zoo

Features: hair, feather, egg, milk, airborne, aquatic, predator, toothed, backbone, breathing, venomous, fin, leg, tail

 

Recommending based on co-occurrence associations:

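Example: finding purchase associations (diapers and beer)

Applications: product recommendations at checkout, promotional recommendations, personalized product recommendations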
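
As a rough illustration of the checkout application, the sketch below scores items by the confidence of single-item rules X -> Y and suggests items not yet in the basket. The baskets, the 0.6 threshold, and the function names `pair_confidences` and `recommend` are hypothetical; a real system would work from mined rules over real sales data.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase history, invented for illustration.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]

def pair_confidences(baskets):
    """Confidence of every single-item rule x -> y seen in the baskets."""
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        items = set(basket)
        item_count.update(items)
        pair_count.update(permutations(items, 2))  # ordered pairs (x, y)
    return {(x, y): n / item_count[x] for (x, y), n in pair_count.items()}

def recommend(basket, confidences, min_confidence=0.6):
    """Items not yet in the basket, ranked by their best rule confidence."""
    basket = set(basket)
    scores = {}
    for (x, y), conf in confidences.items():
        if x in basket and y not in basket and conf >= min_confidence:
            scores[y] = max(scores.get(y, 0.0), conf)
    return sorted(scores, key=lambda item: (-scores[item], item))

confidences = pair_confidences(baskets)
print(recommend({"diapers"}, confidences))  # ['beer']
```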


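Recommending based on sequential associations:

Example: finding associations among the news topics a reader follows, in order to recommend news

Application: personalized news recommendation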
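
One simple way to picture a sequential association is to count which topic tends to be read right after another. The sketch below does only that; the reading histories and the names `next_topic_counts` and `recommend_next` are assumptions made for illustration, not a method taken from the referenced articles.

```python
from collections import Counter, defaultdict

# Hypothetical reading histories: topics in the order they were read.
histories = [
    ["politics", "economy", "sports"],
    ["politics", "economy", "tech"],
    ["sports", "tech"],
    ["economy", "tech"],
]

def next_topic_counts(histories):
    """For each topic, count which topic was read immediately afterwards."""
    counts = defaultdict(Counter)
    for history in histories:
        for current, following in zip(history, history[1:]):
            counts[current][following] += 1
    return counts

def recommend_next(topic, counts, k=1):
    """The k topics most often read right after `topic`."""
    ranked = sorted(counts.get(topic, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]

counts = next_topic_counts(histories)
print(recommend_next("politics", counts))  # ['economy']
```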

 

Search: association rule mining

Association rule learning - Wikipedia


 

Search: Association python

Apriori: Association Rule Mining In-depth Explanation and ...

https://towardsdatascience.com › apriori-a…

Concepts of Apriori

Support: Fraction of transactions that contain an itemset.

Confidence: Measures how often items in Y appear in transactions that contain X

Frequent Item Set: An itemset whose support is greater than or equal to a minSup threshold

Apriori Algorithm

Algorithm Overview

Python Implementation

Apriori Function

Candidate Generation

Pruning

Get Frequent Itemset from Candidate

Result

Shortcomings

Candidate itemsets size at each stage

Time elapsed at each stage

Try on different datasets (in repo)

kaggle.csv

data7.csv

Improvements

Hashing: reduce database scans

Transaction reduction: drop transactions that contain no frequent k-itemsets from later scans

Partitioning: any itemset that is potentially frequent in the database must be frequent in at least one of the partitions

Dynamic Itemset Counting: reduce the number of passes over the data

Sampling: mine a random sample of the data instead of the full dataset
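
The Apriori steps outlined above (candidate generation, pruning, then extracting the frequent itemsets from the candidates) can be sketched in plain Python. This is a minimal, unoptimized illustration with a made-up dataset and a hypothetical function name `apriori_sketch`. It is not the code from the referenced article or from mlxtend, and it uses none of the listed improvements.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # One pass over the data per level: count each candidate, keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = frequent_among({frozenset([i]) for i in items})
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        current = frequent_among(candidates)
        result.update(current)
        k += 1
    return result

# Toy transactions, invented for illustration.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]
frequent = apriori_sketch(transactions, min_support=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The pruning step relies on the fact that every subset of a frequent itemset is itself frequent, which is what lets Apriori discard most candidates before counting them.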

 

Python in Practice: Apriori Algorithm (mlxtend library)

https://artsdatascience.wordpress.com › 2019/12/10 › p...

 

Support, Association rules, and Confidence in Python

https://towardsdev.com › support-associat...

 

Association Rules with Python | Kaggle

https://www.kaggle.com › mervetorkan





Apriori - mlxtend

http://rasbt.github.io › frequent_patterns


>Learn Apriori - mlxtend

Example 1 -- Generating Frequent Itemsets

Example 2 -- Selecting and Filtering Results

Example 3 -- Working with Sparse Representations


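>Create the dataset

dataset = [[..],[..],..]


>Load modules

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori


>Load the data into pandas

# One-hot encode the transactions: one boolean column per distinct item
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
df


>Run apriori on df with a minimum support of 0.6

# Pass use_colnames=True to show item names instead of column indices
apriori(df, min_support=0.6)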
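
To make the encoding step concrete, here is a plain-Python sketch of the kind of boolean table that the TransactionEncoder step produces: one row per transaction and one column per distinct item, in sorted order. The dataset and the helper name `one_hot_encode` are made up for illustration; this is not mlxtend's implementation.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[item in t for item in columns] for t in transactions]
    return columns, rows

# Toy dataset, invented for illustration.
dataset = [
    ["milk", "eggs"],
    ["milk", "bread"],
    ["milk", "eggs", "bread"],
]

columns, rows = one_hot_encode(dataset)
print(columns)  # ['bread', 'eggs', 'milk']
for row in rows:
    print(row)

# The support of a single item is the mean of its column.
supports = {col: sum(values) / len(rows) for col, values in zip(columns, zip(*rows))}
```

With this layout, a `min_support` of 0.6 keeps only the items (and item combinations) that are True in at least 60% of the rows.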





Fpgrowth - mlxtend

http://rasbt.github.io › frequent_patterns




Comparing FP-Growth and Apriori

Search:  association rule fp apriori


(PDF) Association rule mining with apriori and fpgrowth using ...

https://www.researchgate.net › publication



In Weka, FP-Growth runs faster than Apriori
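
A common explanation for the speed difference is that FP-Growth compresses the transactions into a prefix tree (the FP-tree) and mines that tree recursively, instead of generating and counting candidate itemsets level by level. The following is a compact plain-Python sketch of the idea. The class and function names (`Node`, `build_tree`, `fp_growth`) and the dataset are illustrative assumptions, and the code is not Weka's or mlxtend's implementation.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return (header table, item counts)."""
    freq = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)  # item -> tree nodes holding that item
    for items, count in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, freq

def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): count} for itemsets seen at least min_count times."""
    header, freq = build_tree(weighted_transactions, min_count)
    result = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        result[itemset] = freq[item]
        # Conditional pattern base: the prefix paths that lead to `item`.
        conditional = []
        for node in nodes:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        result.update(fp_growth(conditional, min_count, itemset))
    return result

# Toy transactions, invented for illustration; each one has weight 1.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "chips"},
]
result = fp_growth([(t, 1) for t in transactions], min_count=3)
```

The sketch works with absolute counts, so `min_count=3` on five transactions corresponds to a minimum support of 0.6.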



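>Create the dataset

dataset = [[..],[..],..]


>Load modules

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, fpgrowth


>Load the data into pandas

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
df


>Run fpgrowth on df with a minimum support of 0.6

# use_colnames=True reports item names instead of column indices
fpgrowth(df, min_support=0.6, use_colnames=True)

 

>Compare execution times

# %timeit is an IPython/Jupyter magic: 10 repeats of 100 runs each
%timeit -n 100 -r 10 apriori(df, min_support=0.6)
%timeit -n 100 -r 10 fpgrowth(df, min_support=0.6)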

 

Association rule analysis of a dataset with Weka! A step-by-step walkthrough - Medium

https://medium.com › 用weka對資料集進行關聯規則分...

Association Rule

Apriori

Fp-Growth



association_rules: Association rules generation from frequent itemsets

Function to generate association rules from frequent itemsets

from mlxtend.frequent_patterns import association_rules

http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/
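
To show what generating rules from frequent itemsets involves, here is a plain-Python sketch: each frequent itemset is split into an antecedent X and a consequent Y, and the rule X -> Y is kept when its confidence reaches a threshold. The function name `generate_rules`, the input dictionary, and the 0.7 threshold are assumptions for illustration; this is not mlxtend's `association_rules` implementation, and it computes only support and confidence.

```python
from itertools import combinations

def generate_rules(frequent, min_confidence):
    """frequent: dict mapping frozenset itemsets to support (all subsets included).

    Returns a list of (antecedent, consequent, support, confidence) tuples.
    """
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                # confidence(X -> Y) = support(X and Y together) / support(X)
                conf = sup / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules

# Hypothetical frequent itemsets with their supports.
frequent = {
    frozenset({"diapers"}): 0.8,
    frozenset({"beer"}): 0.8,
    frozenset({"diapers", "beer"}): 0.6,
}
rules = generate_rules(frequent, min_confidence=0.7)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), sup, round(conf, 2))
```

Looking up `frequent[antecedent]` is safe only because every subset of a frequent itemset is also frequent, so a complete result from Apriori or FP-Growth always contains it.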



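Deep learning recommender systems:

Recommend based on co-occurrence associations or sequential associations

 

Search: 推薦系統 (recommender systems)

[Day -20] Introduction to Recommendation Systems (Recommendation System) - iT 邦幫忙

https://ithelp.ithome.com.tw › articles

Random recommendation:

Ranking by popularity:

Content-based filtering:

Collaborative Filtering:

Model-based:

Memory-based:

Most common problems in recommender systems

Cold start (Cold Start):

Exploration problem (Exploit & Explore, EE):


[Day-21] Wide & Deep Recommender System Implementation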



Search: Wide & Deep

Wide&Deep Model_Recommender Systems_Principles

https://medium.com › data-scientists-playground › wide...