Natural Language Processing (NLP): Applications and Implementation
Author: 夏肇毅
First draft: 20220822
Natural language processing
Technology > [Artificial Intelligence] CNN, image block partitioning, and RNN
http://cubicpower.idv.tw/cubicnotes/notes-0000038.html
Google Tensorflow: Text
Sentiment analysis- IMDB large movie review dataset
Basic text classification
https://www.tensorflow.org/tutorials/keras/text_classification?hl=zh-tw
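Basic text classification
Sentiment analysis
Download and explore the IMDB dataset
Load the dataset
Prepare the dataset for training
Configure the dataset for performance
Create the model
Loss function and optimizer
Train the model
Evaluate the model
Create a plot of accuracy and loss over time
Export the model
Inference on new data
Exercise: multi-class classification on Stack Overflow questions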
Word embeddings
https://www.tensorflow.org/text/guide/word_embeddings?hl=zh-tw
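Word embeddings
Representing text as numbers
One-hot encodings
Encode each word with a unique number
Word embeddings
Setup
Download the IMDb dataset
Using the Embedding layer
Text preprocessing
Create a classification model
Compile and train the model
Retrieve the trained word embeddings and save them to disk
Visualize the embeddings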
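The three text representations named in the outline above (one-hot encoding, a unique integer per word, and an embedding lookup) can be sketched in plain Python. This is only an illustration: the toy sentences, the `<pad>`/`<unk>` entries, and the 4-dimensional random table are assumptions, and in the tutorial the embedding weights are learned by a Keras layer rather than drawn at random.

```python
import random

# Toy corpus; the sentences are illustrative only.
sentences = ["the cat sat on the mat", "the dog sat"]

# Build a vocabulary: id 0 is reserved for padding, id 1 for unknown words.
vocab = {"<pad>": 0, "<unk>": 1}
for sentence in sentences:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab))

def encode_as_integers(sentence, vocab):
    """Map each word to its unique integer id (unknown words map to <unk>)."""
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.split()]

def one_hot(index, size):
    """Return a vector of zeros with a single 1 at position `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector

ids = encode_as_integers("the cat sat", vocab)
one_hot_rows = [one_hot(i, len(vocab)) for i in ids]

# An embedding is a dense table with one short row per vocabulary id.
# Random values stand in for weights that a real model would learn.
rng = random.Random(0)
embedding_dim = 4
embedding_table = [
    [rng.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
    for _ in range(len(vocab))
]
dense_rows = [embedding_table[i] for i in ids]
```

Each one-hot row is as long as the whole vocabulary, while each embedding row stays at `embedding_dim` values however large the vocabulary grows.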
Text classification with an RNN
https://www.tensorflow.org/text/tutorials/text_classification_rnn?hl=zh-tw
Text classification with an RNN
Setup
Set up the input pipeline
Create the text encoder
Create the model
Train the model
Stack two or more LSTM layers
Classify text with BERT
https://www.tensorflow.org/text/tutorials/classify_text_with_bert?hl=zh-tw
Classify text with BERT
About BERT
Sentiment analysis
Loading models from TensorFlow Hub
Choose a BERT model to fine-tune
The preprocessing model
Using the BERT model
Define your model
Model training
Loss function
Optimizer
Loading the BERT model and training
Evaluate the model
Plot the accuracy and loss over time
Export for inference
Search: kaggle login
Two ways to access Kaggle datasets - Nancy SW
https://nancysw.medium.com › 兩種取用-kaggle-資料...
Search: chatbot python
Implementing a conversational chatbot in Python - WANcatServer
https://wancat.cc › post › python-chatbot-context
Search: chatbot python github
zake7749/Chatbot: a context-based chatbot built on vector matching - GitHub
https://github.com › zake7749 › Chatbot
Mianbot
Matching examples
Requirements
Usage
Chatbot
Computing the match score
Rule format
Q&A test dataset
Building a Simple Chatbot from Scratch in Python (using NLTK)
https://github.com › parulnith › Building-...
Building a Simple Chatbot from Scratch in Python (using NLTK)
NLP
Import necessary libraries
Downloading and installing NLTK
Installing NLTK Packages
Reading in the corpus
Tokenisation
Preprocessing
Keyword matching
Generating Response
Bag of Words
TF-IDF Approach
Cosine Similarity
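The response-generation steps listed above (tokenisation, bag of words, TF-IDF weighting, cosine similarity) can be sketched with the standard library alone. This is not the repository's code, which relies on NLTK and scikit-learn. The tokenizer, the smoothed IDF formula, the fallback reply, and the three-sentence corpus are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphanumeric tokens (a stand-in for NLTK tokenisation)."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()

def build_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}

def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms outside the corpus are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    return {term: count * idf[term] for term, count in counts.items()}

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def respond(user_input, corpus_sentences):
    """Return the corpus sentence most similar to the user's input."""
    token_lists = [tokenize(s) for s in corpus_sentences]
    idf = build_idf(token_lists)
    vectors = [tfidf_vector(tokens, idf) for tokens in token_lists]
    query = tfidf_vector(tokenize(user_input), idf)
    scores = [cosine_similarity(query, v) for v in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] == 0.0:
        return "I am sorry, I don't understand you."
    return corpus_sentences[best]

# Hypothetical corpus, one candidate reply per sentence.
corpus = [
    "A chatbot is a program that conducts a conversation with a user.",
    "TF-IDF weighs a term by how rare it is across documents.",
    "Cosine similarity measures the angle between two vectors.",
]
reply = respond("what is cosine similarity?", corpus)
```

A retrieval bot of this kind can only return sentences that already exist in its corpus, so the fallback reply covers inputs that share no terms with it.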
Day 12: Quickly build a chatbot (ChatBot)
Day 13: Quickly build a chatbot (ChatBot), continued
Search: anti-money laundering model python
[E.SUN AI Case Study 3] An in-house anti-money-laundering blacklist detection model that quickly flags problems
https://www.ithome.com.tw › news
(Top 1% Solution) E.SUN AI Open Challenge, Summer 2020 - Medium
https://medium.com › 玉山人工智慧公開挑戰賽2020夏...
Web crawler implementation
Data cleaning
Model training workflow
Money-laundering article classification model (AML Classifier)
CountVectorizer + Multinomial Naive Bayes
TfidfVectorizer + Multinomial Naive Bayes
BERT-Based Model
Bidirectional Encoder Representations from Transformers (BERT)
Open-source NLP package: Kashgari
Package installation
BERT + BiLSTM + CRF
Conditional Random Fields (CRF)
Input data format
Validation dataset
Model training (AML Classifier)
Rule-based Approach
Review and revision of the AML keyword list
AML person-of-interest extraction model (AML NER Model)
Named Entity Recognition
SOTA of Named Entity Recognition
BERT: sentence-level NER model
Model training (NER Model)
Model comparison
Bi-directional LSTM/GRU
Bi-directional BiLSTM/GRU + CRF
CNN + LSTM
Bi-directional LSTM + CNN (Customized)
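The "CountVectorizer + Multinomial Naive Bayes" baseline listed earlier in this outline pairs word counts with a Naive Bayes classifier. The sketch below reimplements that idea with the standard library, so it is not the article's scikit-learn code. The whitespace tokenizer, the Laplace smoothing value `alpha=1.0`, the labels `aml`/`other`, and the four English headlines are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Whitespace tokenizer; a real pipeline would need word segmentation for Chinese."""
    return text.lower().split()

def train_multinomial_nb(documents, labels, alpha=1.0):
    """Fit class priors and Laplace-smoothed word likelihoods from raw counts."""
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocabulary = set()
    for text, label in zip(documents, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    model = {"log_prior": {}, "log_likelihood": {}, "vocabulary": vocabulary}
    for label, n_docs in class_doc_counts.items():
        model["log_prior"][label] = math.log(n_docs / len(documents))
        total = sum(word_counts[label].values()) + alpha * len(vocabulary)
        model["log_likelihood"][label] = {
            word: math.log((word_counts[label][word] + alpha) / total)
            for word in vocabulary
        }
    return model

def predict(model, text):
    """Return the label with the highest posterior log-probability."""
    tokens = [t for t in tokenize(text) if t in model["vocabulary"]]
    scores = {}
    for label, log_prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][label]
        scores[label] = log_prior + sum(likelihood[t] for t in tokens)
    return max(scores, key=scores.get)

# Hypothetical training headlines.
documents = [
    "bank official charged with money laundering",
    "suspect laundering funds through shell company",
    "team wins championship after late goal",
    "new phone released with faster chip",
]
labels = ["aml", "aml", "other", "other"]

model = train_multinomial_nb(documents, labels)
prediction = predict(model, "company accused of laundering money")
```

Words never seen in training are skipped at prediction time, which is also how a fitted `CountVectorizer` treats out-of-vocabulary terms.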
Search: anti-fraud model python
Using AI to play the bad guy for practice! SAS reveals designing financial anti-fraud models with GANs, a new ...
https://www.ithome.com.tw › news
Step 1: Build an Amazon Fraud Detector model
https://pages.awscloud.com › Tech-blog_Amazon-Frau...
Search: fraud python
Building a credit card fraud model in Python - 程式人生
https://www.796t.com › content
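Data preparation: sourced from Kaggle
Prepare and take a first look at the dataset
Transaction frequency over time (fraudulent vs. normal)
Frequency distribution of transaction amounts for fraudulent and normal transactions
Relationship between each feature and the dependent variable
Modeling the credit card data with logistic regression
Credit card fraud analysis: analyzing and handling imbalanced data (kernel translation, full version)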
https://medium.com › 機器學習知識歷程 › 信用卡詐騙...
Preprocessing
Scaling and Distributing
Splitting the Data (from the original DataFrame)
Random Undersampling and Oversampling
Distributing and Correlating
Anomaly Detection
Dimensionality Reduction and Clustering (t-SNE)
Classifiers
A Deeper Look into Logistic Regression
Oversampling with SMOTE
Testing
Test Data with Logistic Regression
Neural Networks Testing (Undersampling vs Oversampling)
Part 5. Imbalanced Data - iT 邦幫忙
https://ithelp.ithome.com.tw › articles
Evaluation metrics
Confusion Matrix
Precision and Recall
F1 score
ROC (Receiver Operating Characteristic) curve
Area Under the Curve (AUC)
Resampling the data
Oversampling
SMOTE (Synthetic Minority Oversampling Technique)
Border Line SMOTE
Undersampling
Tomek Link
Edited Nearest Neighbor
Notes
Split the data first, then resample only the training data.
Cross-validation is commonly used to control overfitting.
Examine how the minority and majority samples are distributed.
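The first note above (split first, then resample only the training data) can be sketched as follows. Random duplication stands in for SMOTE here, which is a simplification, and the 2% positive rate, the 20% test ratio, and both helper functions are assumptions made for illustration.

```python
import random

def stratified_split(samples, labels, test_ratio, rng):
    """Split each class separately so both parts keep the class ratio."""
    train, test = [], []
    for cls in sorted(set(labels)):
        members = [(x, y) for x, y in zip(samples, labels) if y == cls]
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_ratio))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

def random_oversample(pairs, rng):
    """Duplicate minority rows at random until every class matches the largest one."""
    by_class = {}
    for x, y in pairs:
        by_class.setdefault(y, []).append((x, y))
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    rng.shuffle(balanced)
    return balanced

rng = random.Random(42)
samples = list(range(1000))                      # sample ids stand in for feature rows
labels = [1 if i < 20 else 0 for i in samples]   # 2% positive (minority) class

# 1) Split first, so the test set keeps the original class imbalance.
train, test = stratified_split(samples, labels, test_ratio=0.2, rng=rng)
# 2) Resample the training part only.
train_balanced = random_oversample(train, rng)
```

Because duplication happens after the split, no copy of a test row can leak into the training set.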
Search: Credit Fraud || Dealing with Imbalanced Datasets
Credit Fraud || Dealing with Imbalanced Datasets - Kaggle
https://www.kaggle.com › janiobachmann
Credit Fraud || Dealing with Imbalanced Datasets
https://www.kaggle.com/janiobachmann/credit-fraud-dealing-with-imbalanced-datasets
https://www.kaggle.com/code/janiobachmann/credit-fraud-dealing-with-imbalanced-datasets/notebook
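Credit Fraud Detector
Understanding our data
Gather Sense of Our Data
Preprocessing
Scaling and Distributing
Splitting the Data
Random Undersampling and Oversampling
Distributing and Correlating
Anomaly Detection
Dimensionality Reduction and Clustering (t-SNE)
Classifiers
A Deeper Look into Logistic Regression
Oversampling with SMOTE
Testing
Testing with Logistic Regression
Neural Networks Testing (Undersampling vs Oversampling)
Correcting previous mistakes from imbalanced datasets:
Never test on the oversampled or undersampled dataset.
If we want to implement cross-validation, remember to oversample or undersample the training data during cross-validation, not before.
Don't use accuracy score as a metric with imbalanced datasets (it will usually be high and misleading); use f1-score, precision/recall score, or a confusion matrix instead.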
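The last two rules above can be illustrated in a few lines: resampling happens inside each cross-validation fold, the validation fold stays untouched, and the folds are scored with precision, recall, and F1 rather than accuracy. Everything specific here is an assumption for the sketch: the synthetic 1-D data, the 5 folds, random duplication in place of SMOTE, and the deliberately trivial threshold "model".

```python
import random
from statistics import fmean

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def oversample(rows, rng):
    """Randomly duplicate fraud rows until both classes are the same size."""
    fraud = [r for r in rows if r[1] == 1]
    normal = [r for r in rows if r[1] == 0]
    return normal + fraud + rng.choices(fraud, k=len(normal) - len(fraud))

def fit_threshold(rows):
    """Toy model: a cut-off halfway between the two class means."""
    normal_mean = fmean(x for x, y in rows if y == 0)
    fraud_mean = fmean(x for x, y in rows if y == 1)
    return (normal_mean + fraud_mean) / 2

rng = random.Random(0)
# Hypothetical 1-D data: 2% fraud, with fraud values shifted upward.
data = [(rng.gauss(3.0, 1.0), 1) for _ in range(20)]
data += [(rng.gauss(0.0, 1.0), 0) for _ in range(980)]
rng.shuffle(data)

k = 5
fold_scores = []
for fold in range(k):
    validation = data[fold::k]                               # left untouched
    training = [row for i, row in enumerate(data) if i % k != fold]
    balanced = oversample(training, rng)                     # resample inside the fold only
    threshold = fit_threshold(balanced)
    y_true = [y for _, y in validation]
    y_pred = [1 if x > threshold else 0 for x, _ in validation]
    fold_scores.append(precision_recall_f1(y_true, y_pred))

# Accuracy alone misleads: labelling every row "normal" is 98% accurate here
# while catching no fraud at all.
all_true = [y for _, y in data]
always_normal = [0] * len(all_true)
baseline_accuracy = sum(t == p for t, p in zip(all_true, always_normal)) / len(all_true)
baseline_recall = precision_recall_f1(all_true, always_normal)[1]
```

Oversampling the whole dataset before the loop would place copies of the same fraud rows in both the training and the validation part of a fold, which inflates the scores.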
Search: NER Wiki
Named-entity recognition - Wikipedia
https://en.wikipedia.org › wiki › Named-...
https://en-m-wikipedia-org.translate.goog/wiki/Named-entity_recognition?_x_tr_sl=en&_x_tr_tl=zh-TW&_x_tr_hl=zh-TW&_x_tr_pto=sc
Search: anomaly detection wiki
Anomaly detection - Wikipedia, the free encyclopedia
https://zh.wikipedia.org › zh-tw › 異常檢測
Anomaly detection ... In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in a dataset. ... Typically the anomalous items translate to bank fraud ...
Search: SMOTE oversampling wiki
Oversampling and undersampling in data analysis - Wikipedia
https://en.wikipedia.org › wiki › Oversampling_and_unde…
Search: Confusion Matrix wiki
https://en.wikipedia.org › wiki › Confusi...
https://en-m-wikipedia-org.translate.goog/wiki/Confusion_matrix?_x_tr_sl=en&_x_tr_tl=zh-TW&_x_tr_hl=zh-TW&_x_tr_pto=sc
Search: precision recall wiki
https://zh-yue.wikipedia.org › wiki › 精確率同召回率
Search: F1 score wiki
F-score - Wikipedia, the free encyclopedia
https://zh.m.wikipedia.org › zh-tw › F-score
Search: ROC AUC wiki
ROC curve - Wikipedia, the free encyclopedia
https://zh.m.wikipedia.org › zh-tw › ROC曲線
Search: bin_goods bin_bads
Risk-control modeling series: summary of data feature selection methods (Part 1) - 壹讀
https://read01.com › 科技 › 科學
... used to evaluate a feature's predictive power. IV is computed on the basis of WOE; before WOE encoding, the feature must be binned (discretized), and then the number of good customers (bin_goods) and bad customers (bin_bads) in each bin is counted, ...
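The snippet above names `bin_goods` and `bin_bads` but is cut off before any formula, so the sketch below uses one common definition as an assumption: WOE for a bin is the log of (share of all goods in the bin) over (share of all bads in the bin), and IV sums (good share minus bad share) times WOE over the bins. Some references flip the ratio, which changes the sign of WOE but not the value of IV. The `eps` smoothing term and the example counts are hypothetical.

```python
import math

def woe_iv(bin_goods, bin_bads, eps=0.5):
    """Per-bin WOE and total IV from the good/bad counts in each bin.

    `eps` is added to every count so an empty bin does not cause log(0);
    this smoothing is an assumption, not part of the quoted source.
    """
    goods = [g + eps for g in bin_goods]
    bads = [b + eps for b in bin_bads]
    total_good, total_bad = sum(goods), sum(bads)
    woe, iv = [], 0.0
    for g, b in zip(goods, bads):
        good_share = g / total_good   # fraction of all goods that fall in this bin
        bad_share = b / total_bad     # fraction of all bads that fall in this bin
        w = math.log(good_share / bad_share)
        woe.append(w)
        iv += (good_share - bad_share) * w
    return woe, iv

# Hypothetical counts for one feature split into three bins.
bin_goods = [400, 350, 250]
bin_bads = [10, 30, 60]
woe, iv = woe_iv(bin_goods, bin_bads)
```

A feature whose bins all carry the same good-to-bad ratio gets an IV of zero, so a larger IV indicates that the binning separates the two groups better.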