晶文摘

[Artificial Intelligence] Natural Language Processing (NLP): Hands-On Applications



Natural Language Processing (NLP): Hands-On Applications


Author: 夏肇毅

First draft: 20220822








 


Natural language processing

Technology > [Artificial Intelligence] CNN, Image Block Partitioning, and RNN

http://cubicpower.idv.tw/cubicnotes/notes-0000038.html

Google Tensorflow: Text

Sentiment analysis- IMDB large movie review dataset

Basic text classification 

https://www.tensorflow.org/tutorials/keras/text_classification?hl=zh-tw

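Basic text classification

Sentiment analysis

Download and explore the IMDB dataset

Load the dataset

Prepare the dataset for training

Configure the dataset for performance

Create the model

Loss function and optimizer

Train the model

Evaluate the model

Create a plot of accuracy and loss over time

Export the model

Inference on new data

Exercise: multi-class classification on Stack Overflow questions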


Word embeddings

https://www.tensorflow.org/text/guide/word_embeddings?hl=zh-tw

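Word embeddings

Representing text as numbers

One-hot encodings

Encode each word with a unique number

Word embeddings

Setup

Download the IMDb dataset

Using the Embedding layer

Text preprocessing

Create a classification model

Compile and train the model

Retrieve the trained word embeddings and save them to disk

Visualize the embeddings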


Text classification with an RNN 

https://www.tensorflow.org/text/tutorials/text_classification_rnn?hl=zh-tw

Text classification with an RNN

Setup

Set up the input pipeline

Create the text encoder

Create the model

Train the model

Stack two or more LSTM layers


Classify text with BERT

https://www.tensorflow.org/text/tutorials/classify_text_with_bert?hl=zh-tw

Classify text with BERT

About BERT

Sentiment analysis

Loading models from TensorFlow Hub

Choose a BERT model to fine-tune

The preprocessing model

Using the BERT model

Define your model

Model training

Loss function

Optimizer

Loading the BERT model and training

Evaluate the model

Plot the accuracy and loss over time

Export for inference


 

 

Search: kaggle login

Two ways to access Kaggle datasets - Nancy SW

https://nancysw.medium.com › 兩種取用-kaggle-資料...

 



Search: chatbot python


Implementing a conversational chatbot in Python - WANcatServer

https://wancat.cc › post › python-chatbot-context


Search:  chatbot python github

 

zake7749/Chatbot: a context-aware chatbot based on vector matching - GitHub

https://github.com › zake7749 › Chatbot

Mianbot

Matching examples

Environment requirements

Usage

Chatbot

Computing the match score

Rule format

Q&A test dataset


 

 

Building a Simple Chatbot from Scratch in Python (using NLTK)

https://github.com › parulnith › Building-...

Building a Simple Chatbot from Scratch in Python (using NLTK)

NLP

Import necessary libraries

Downloading and installing NLTK

Installing NLTK Packages

Reading in the corpus

Tokenisation

Preprocessing

Keyword matching

Generating Response

Bag of Words

TF-IDF Approach

Cosine Similarity
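
The last three outline items (Bag of Words, TF-IDF, Cosine Similarity) describe a retrieval-style reply: pick the corpus sentence most similar to the user's input. Below is a minimal sketch of that idea using only the Python standard library. It is not the linked repository's code; the function names, the smoothed IDF formula, the fallback message, and the sample corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return [w for w in words if w]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (term frequency * idf)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed idf (an assumption) so no term ends up with zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in doc_freq.items()}
    return [
        {t: cnt / len(toks) * idf[t] for t, cnt in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def respond(query, corpus):
    """Return the corpus sentence closest to the query, or a fallback."""
    vectors = tfidf_vectors(corpus + [query])
    query_vec = vectors[-1]
    scores = [cosine(query_vec, v) for v in vectors[:-1]]
    best = max(range(len(corpus)), key=scores.__getitem__)
    if scores[best] == 0.0:
        return "I am sorry, I don't understand."  # hypothetical fallback text
    return corpus[best]


# Hypothetical three-sentence corpus.
corpus = [
    "A chatbot is a program that talks with users.",
    "TF-IDF weights rare words more heavily.",
    "Cosine similarity compares two vectors.",
]
print(respond("what is a chatbot", corpus))
```

Fitting the vectors on the corpus plus the query keeps the sketch short; a real system would more likely fit the weights once on the corpus and only transform each incoming query.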

 


Day 12: Quickly build a "conversational bot" (ChatBot)

Day 13: Quickly build a "conversational bot" (ChatBot), continued




Search: anti-money laundering model python

[E.SUN (玉山) AI Case Study 3] Building an in-house anti-money-laundering blacklist detection model to quickly spot problems

https://www.ithome.com.tw › news

(Top 1% Solution) E.SUN (玉山) Artificial Intelligence Open Challenge 2020 Summer - Medium

https://medium.com › 玉山人工智慧公開挑戰賽2020夏...

Web crawler implementation

Data cleaning

Model training workflow

Money-laundering article classification model (AML Classifier)

CountVectorizer + Multinomial Naive Bayes

TfidfVectorizer + Multinomial Naive Bayes

BERT-Based Model

Bidirectional Encoder Representations from Transformers (BERT)

Open-source NLP package: Kashgari

Package installation

BERT + BiLSTM + CRF

Conditional Random Fields (CRF)

Input data format

Validation dataset

Model training (AML Classifier)

Rule-based Approach

Reviewing and revising the AML Keyword List

AML person-of-interest extraction model (AML NER Model)

Named Entity Recognition

SOTA of Named Entity Recognition

BERT: a sentence-level NER model

Model training (NER Model)

Model comparison

Bi-directional LSTM/GRU

Bi-directional BiLSTM/GRU + CRF

CNN + LSTM

Bi-directional LSTM + CNN (Customized)

 

 

Search: anti-fraud model python


Training by letting AI play the bad guy! SAS reveals new ... for designing financial anti-fraud models with GAN ...

https://www.ithome.com.tw › news

Step 1: Build an Amazon Fraud Detector model

https://pages.awscloud.com › Tech-blog_Amazon-Frau...

 

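Search: fraud python

Building a credit card fraud model in Python - 程式人生

https://www.796t.com › content

Data preparation: sourced from Kaggle

Prepare and take a first look at the dataset

Transaction frequency over the time series (split into fraudulent and normal)

Frequency distribution of transaction amounts for fraudulent and normal transactions

Relationship between each feature and the dependent variable

Modeling the credit card data with logistic regression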

 

 

Credit card fraud analysis: analyzing and handling imbalanced data (kernel translation, full version)

https://medium.com › 機器學習知識歷程 › 信用卡詐騙...

Preprocessing

Scaling and Distributing

Splitting the Data (from the original DataFrame)

Random Undersampling and Oversampling

Distributing and Correlating

Anomaly Detection

Dimensionality Reduction and Clustering (t-SNE)

Classifiers

A Deeper Look into Logistic Regression

Oversampling with SMOTE

Testing

Test Data with Logistic Regression

Neural Networks Testing (Undersampling vs Oversampling)

 

 

Part 5. Imbalanced Data - iT 邦幫忙

https://ithelp.ithome.com.tw › articles

Evaluation metrics

Confusion Matrix

Precision and Recall

F1 score

ROC (Receiver Operating Characteristic) curve

Area Under Curve (AUC)

Resampling the data

Oversampling

SMOTE (Synthetic Minority Oversampling Technique)

Borderline SMOTE

Undersampling

Tomek Link

Edited Nearest Neighbor

Notes

Split the data first, then resample only the training data.

Cross-validation is commonly used to keep overfitting in check.

Inspect how the minority and majority samples are distributed.
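
The first note (split first, then resample only the training data) can be sketched as follows. This is a minimal standard-library illustration, not the article's code: it uses plain random oversampling as a stand-in for SMOTE, and the helper names and the synthetic 5%-positive dataset are hypothetical.

```python
import random
from collections import defaultdict


def group_by_label(rows):
    """Group (features, label) rows by their label."""
    groups = defaultdict(list)
    for features, label in rows:
        groups[label].append((features, label))
    return groups


def stratified_split(rows, test_ratio, rng):
    """Hold out test_ratio of each class so the test set keeps the real imbalance."""
    train, test = [], []
    for group in group_by_label(rows).values():
        group = group[:]
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def random_oversample(rows, rng):
    """Duplicate minority-class rows until every class matches the largest one."""
    groups = group_by_label(rows)
    target = max(len(g) for g in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


rng = random.Random(0)
# Hypothetical data: 200 rows, 10 of them (5%) in the positive class.
rows = [((i,), 1 if i % 20 == 0 else 0) for i in range(200)]

# 1) Split first, so the test set is never touched by resampling.
train, test = stratified_split(rows, 0.2, rng)
# 2) Resample the training portion only.
train_balanced = random_oversample(train, rng)

print(len(train), len(test), len(train_balanced))  # 160 40 304
```

Because the duplicates are drawn after the split, no copy of a test row can leak into the training set, and the test set still reflects the original class ratio.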



Search:  Credit Fraud || Dealing with Imbalanced Datasets

 

Credit Fraud || Dealing with Imbalanced Datasets - Kaggle

https://www.kaggle.com › janiobachmann

 

Credit Fraud || Dealing with Imbalanced Datasets

https://www.kaggle.com/janiobachmann/credit-fraud-dealing-with-imbalanced-datasets

 

https://www.kaggle.com/code/janiobachmann/credit-fraud-dealing-with-imbalanced-datasets/notebook

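Credit Fraud Detector

Understanding our data

Gather a sense of our data

Preprocessing

Scaling and Distributing

Splitting the Data

Random Undersampling and Oversampling

Distributing and Correlating

Anomaly Detection

Dimensionality Reduction and Clustering (t-SNE)

Classifiers

A Deeper Look into Logistic Regression

Oversampling with SMOTE

Testing

Testing with Logistic Regression

Neural Networks Testing (Undersampling vs Oversampling)

Correcting previous mistakes made with imbalanced datasets:

Never test on the oversampled or undersampled dataset.

If we want to implement cross-validation, remember to oversample or undersample the training data during cross-validation, not before!

Don't use the accuracy score as a metric with imbalanced datasets (it will usually be high and misleading); instead use the f1-score, precision/recall score, or the confusion matrix.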
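
To illustrate the last point, here is a small sketch with made-up numbers (1,000 transactions, 5 of them fraudulent; both prediction lists are hypothetical). A model that never flags fraud scores higher on accuracy than one that catches four of the five cases, while precision, recall, and F1 show the real difference.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(y_true, y_pred):
    """Return accuracy, precision, recall and F1 (0.0 where undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical labels: 5 frauds followed by 995 normal transactions.
y_true = [1] * 5 + [0] * 995

# Model A: always predicts "normal".
always_normal = [0] * 1000
# Model B: catches 4 of 5 frauds and raises 6 false alarms.
detector = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 989

print(scores(y_true, always_normal))  # accuracy 0.995, F1 0.0
print(scores(y_true, detector))       # accuracy 0.993, precision 0.4, recall 0.8
```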

 

 








Search: NER Wiki

Named-entity recognition - Wikipedia

https://en.wikipedia.org › wiki › Named-...

Translate this page: https://en-m-wikipedia-org.translate.goog/wiki/Named-entity_recognition?_x_tr_sl=en&_x_tr_tl=zh-TW&_x_tr_hl=zh-TW&_x_tr_pto=sc

 

 

Search: anomaly detection wiki

Anomaly detection - Wikipedia, the free encyclopedia

https://zh.wikipedia.org › zh-tw › 异常检测

Anomaly detection ... In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in a dataset. ... Anomalous items typically turn out to be bank fraud ...

 


Search: SMOTE oversampling wiki

Oversampling and undersampling in data analysis - Wikipedia

https://en.wikipedia.org › wiki › Oversampling_and_unde…


Search: Confusion Matrix wiki

Confusion matrix - Wikipedia

https://en.wikipedia.org › wiki › Confusi...

Translate this page: https://en-m-wikipedia-org.translate.goog/wiki/Confusion_matrix?_x_tr_sl=en&_x_tr_tl=zh-TW&_x_tr_hl=zh-TW&_x_tr_pto=sc

Search: precision recall wiki

Precision and recall - Wikipedia

https://zh-yue.wikipedia.org › wiki › 精確率同召回率

 

Search: F1 score wiki

F-score - Wikipedia, the free encyclopedia

https://zh.m.wikipedia.org › zh-tw › F-score

Search: ROC AUC wiki

ROC curve - Wikipedia, the free encyclopedia - Wikipedia

https://zh.m.wikipedia.org › zh-tw › ROC曲线

Search: bin_goods bin_bads

Risk-control modeling series: a summary of data feature selection methods (Part 1) - 壹讀

https://read01.com › 科技 › 科學

... used to evaluate a feature's predictive power. IV is computed on top of WOE; before WOE encoding, the feature must be binned (discretized), and then the number of good customers (bin_goods) and bad customers (bin_bads) in each bin is counted, ...
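
The excerpt is cut off before any formula, so the following is only a sketch of one common way to compute WOE and IV from per-bin counts. The sign convention (good share over bad share), the function name, and the sample counts are assumptions; some references invert the ratio, which flips the sign of each WOE value but leaves IV unchanged.

```python
import math


def woe_iv(bin_goods, bin_bads):
    """Return (per-bin WOE list, total IV) from per-bin good/bad counts.

    Assumed convention: WOE_i = ln(good_share_i / bad_share_i) and
    IV = sum((good_share_i - bad_share_i) * WOE_i).
    """
    if any(g <= 0 for g in bin_goods) or any(b <= 0 for b in bin_bads):
        # ln(0) is undefined; real pipelines usually merge or smooth such bins.
        raise ValueError("every bin needs at least one good and one bad sample")
    total_good = sum(bin_goods)
    total_bad = sum(bin_bads)
    woe, iv = [], 0.0
    for goods, bads in zip(bin_goods, bin_bads):
        good_share = goods / total_good
        bad_share = bads / total_bad
        w = math.log(good_share / bad_share)
        woe.append(w)
        iv += (good_share - bad_share) * w
    return woe, iv


# Hypothetical feature with three bins.
bin_goods = [100, 200, 700]
bin_bads = [50, 30, 20]
woe, iv = woe_iv(bin_goods, bin_bads)
print([round(w, 3) for w in woe], round(iv, 3))
```

A bin whose share of bad customers exceeds its share of good customers gets a negative WOE under this convention, and a feature whose bins all have identical shares gets an IV of zero, meaning it carries no predictive signal.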