August 15, 2007

Mining Customer Value: From Association Rules to Direct Marketing

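In traditional marketing, companies usually send advertisements without a specific target audience. In practice, however, very few consumers respond, which wastes marketing resources and annoys customers. In recent years, "direct marketing for profit optimization" has therefore become one of the problems companies most want to solve with data mining. It was also the task of the KDD-CUP competition held by ACM in 1998.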

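In general, a company must take the cost of each mailing into account, identify the consumers likely to respond, and market only to that group so as to maximize profit. This task usually faces two challenges: "imbalanced data" and "inverse correlation". Consumers who actually respond typically make up a tiny fraction of the mailing list (only 5% in the KDD-CUP-98 dataset), while most data mining algorithms favor rules that describe the data as a whole, so rules for the rare class are hard to learn. On the other hand, the consumers most likely to respond usually contribute the least profit, so marketing only to likely responders yields limited results.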

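Previous work mostly first estimated each customer's probability of responding and then, among the customers most likely to respond, selected the groups that contribute more profit as marketing targets. The weakness is that customer contribution is ignored in the response-prediction step; when inverse correlation is present, the groups with potentially high contribution are exactly the ones most likely to be filtered out. This paper therefore uses focused association rules to learn both behaviors, customer response and customer contribution, in a single pass. The method has three steps.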

The data is split into a 50% learning set and a 50% validation set. The learning set is further split into a 70% building set and a 30% testing set, which is used for tuning minsup and maxsup.
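A minimal sketch of that split, assuming the records fit in a Python list and using a seeded shuffle; the seed and the helper name `split_dataset` are hypothetical, not taken from the paper.

```python
import random


def split_dataset(records, seed=0):
    """50% learning / 50% validation, then learning -> 70% building / 30% testing."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    learning, validation = shuffled[:half], shuffled[half:]
    cut = len(learning) * 7 // 10  # integer arithmetic avoids float rounding
    # the testing part is what minsup/maxsup would be tuned on
    building, testing = learning[:cut], learning[cut:]
    return building, testing, validation
```

For 100 records this yields 35 building, 15 testing, and 50 validation records.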

1. Rule Generating

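In the respond dataset, we look for frequent items whose support exceeds a user-specified minimum support (called focus items); in addition, a focus item's support in the non-respond dataset must be below a maximum support. We then mine, from the respond dataset, the frequent itemsets composed of focus items and use them as "respond rules". Because frequent itemsets are mined only from the respond dataset, the "imbalanced data" problem is avoided. Moreover, a low minimum support can be set to avoid pruning profitable rules, while the maximum support determines how many focus items are removed, which controls the algorithm's efficiency and the data dimensionality (addressing the high-dimensionality problem).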
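The rule-generating step can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: each record is a set of items, minsup and maxsup are fractions of records, and a plain level-wise (Apriori-style) search without subset pruning stands in for whatever itemset miner the paper used. `focus_items` and `respond_rules` are hypothetical names.

```python
from collections import Counter


def focus_items(respond, non_respond, minsup, maxsup):
    """Items frequent among responders (support >= minsup) but rare among
    non-responders (support < maxsup). Supports are fractions of records."""
    r_count = Counter(item for rec in respond for item in set(rec))
    n_count = Counter(item for rec in non_respond for item in set(rec))
    n_non = max(len(non_respond), 1)  # guard against an empty non-respond set
    return {
        item for item, c in r_count.items()
        if c / len(respond) >= minsup and n_count[item] / n_non < maxsup
    }


def respond_rules(respond, focus, minsup, max_len=3):
    """Level-wise search for frequent itemsets built only from focus items,
    counted only on the respond records (so the rare class is never swamped)."""
    records = [frozenset(rec) & focus for rec in respond]

    def support(itemset):
        return sum(itemset <= rec for rec in records) / len(records)

    level = [frozenset([i]) for i in focus if support(frozenset([i])) >= minsup]
    rules = list(level)
    k = 1
    while level and k < max_len:
        k += 1
        # join pairs of frequent (k-1)-itemsets whose union has exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = [c for c in candidates if support(c) >= minsup]
        rules.extend(level)
    return rules
```

Raising maxsup admits more focus items (more dimensions, more rules); lowering minsup keeps rarer, possibly profitable itemsets.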

2. Model Building

Compute the average profit of each respond rule on the building set, and determine the order in which rules cover records: Average Profit > Generality > Simplicity > Totality of order.
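A sketch of the covering order, under several assumptions: each building-set record is an (item set, donation) pair, "generality" means the number of building records a rule covers, "simplicity" means fewer items, and the final "totality of order" tie-break is a hypothetical generation index. `MAIL_COST = 0.68` is the per-mail cost commonly cited for KDD-CUP-98 and is an assumption here, not a figure from this summary.

```python
from dataclasses import dataclass

MAIL_COST = 0.68  # assumed cost per mailing; not stated in the summary


@dataclass(frozen=True)
class Rule:
    items: frozenset
    order: int  # hypothetical generation index, the final total-order tie-break


def covered(rule, building):
    """Donations of the building-set records whose item set contains the rule."""
    return [donation for record, donation in building if rule.items <= record]


def average_profit(rule, building):
    gifts = covered(rule, building)
    return sum(gifts) / len(gifts) - MAIL_COST if gifts else float("-inf")


def rank_rules(rules, building):
    """Sort by average profit, then generality, then simplicity, then order."""
    return sorted(
        rules,
        key=lambda r: (
            -average_profit(r, building),   # higher profit first
            -len(covered(r, building)),     # more general (covers more) first
            len(r.items),                   # simpler (fewer items) first
            r.order,                        # deterministic tie-break
        ),
    )


def assign(record, ranked):
    """A record is covered by the first (highest-ranked) rule that matches it."""
    for rule in ranked:
        if rule.items <= record:
            return rule
    return None
```

Because each record goes to exactly one rule, the per-rule profits can later be summed without double counting.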

3. Model Pruning

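Estimate each respond rule's error on the testing set using pessimistic estimation, and from it compute each rule's expected profit. Then build a covering tree from the respond rules and prune, bottom-up, the more specific rules whose expected profit is lower.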
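The pruning step can be sketched as below. The pessimistic error here is a Wilson upper confidence bound (the normal-approximation formula often used to describe C4.5-style pruning), chosen as a stand-in; the z-value, the profit formula, the mailing cost, and the tree representation are all assumptions rather than the paper's exact definitions. A subtree is collapsed into its more general root whenever the root alone, covering all of the subtree's records, is expected to earn at least as much.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field

MAIL_COST = 0.68  # assumed per-mail cost
Z = 0.69          # hypothetical z-value (about a 25% one-sided confidence level)


def pessimistic_error(n: int, errors: int, z: float = Z) -> float:
    """Upper confidence bound on the error rate (Wilson score interval)."""
    if n == 0:
        return 1.0
    f = errors / n
    spread = z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    return (f + z * z / (2 * n) + spread) / (1 + z * z / n)


def expected_profit(donations: list[float]) -> float:
    """Estimated profit of mailing every record in `donations` (0.0 = no response)."""
    gifts = [d for d in donations if d > 0]
    if not gifts:
        return -MAIL_COST * len(donations)
    err = pessimistic_error(len(donations), len(donations) - len(gifts))
    return len(donations) * ((1 - err) * sum(gifts) / len(gifts) - MAIL_COST)


@dataclass
class Node:
    rule: str                  # rule label, for display only
    donations: list[float]     # records this rule covers in the covering tree
    children: list[Node] = field(default_factory=list)  # more specific rules


def subtree_donations(node: Node) -> list[float]:
    out = list(node.donations)
    for child in node.children:
        out.extend(subtree_donations(child))
    return out


def prune(node: Node) -> float:
    """Bottom-up: collapse a subtree into its root when the root alone is
    expected to earn at least as much. Returns the subtree's expected profit."""
    kept = expected_profit(node.donations) + sum(prune(c) for c in node.children)
    merged_records = subtree_donations(node)
    merged = expected_profit(merged_records)
    if node.children and merged >= kept:
        node.donations, node.children = merged_records, []
        return merged
    return kept
```

The pessimistic bound penalizes rules supported by few records, which is why a small, specific rule can lose to its more general parent even when its observed response rate is the same.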

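The final average profit is 41% higher than that of the KDD-Cup-1998 winner and also higher than those of three papers published before 2001, while the number of mails sent is the lowest. Examining the top 10 rules, we found that highly profitable rules do not necessarily have high support or confidence: such rules neither necessarily occur often nor necessarily predict accurately whether a customer will respond. Finally, the authors attribute the marked improvement to association rules overcoming inverse correlation, together with their advantages of global search and of not needing to predict customers' response probabilities.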

August 14, 2007

POLYPHONET: An Advanced Social Network Extraction System from the Web

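This paper, from WWW 2006, describes a system for building social networks between people. The system uses a search engine to find web pages containing given names, computes the strength of the relationship between two people from the co-occurrence of their names in the retrieved pages, and from these strengths constructs a social network.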
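A rough sketch of co-occurrence scoring, assuming the search-engine hit counts for each name and for each pair of names have already been collected (the counts and threshold in the usage are made-up inputs, and the function names are hypothetical). The overlap (Simpson) coefficient and the Jaccard coefficient are two common co-occurrence measures; treat them as illustrative rather than as POLYPHONET's exact formula.

```python
def overlap_coefficient(n_x, n_y, n_xy):
    """|X and Y| / min(|X|, |Y|): robust when one name is far more famous."""
    smaller = min(n_x, n_y)
    return n_xy / smaller if smaller > 0 else 0.0


def jaccard_coefficient(n_x, n_y, n_xy):
    """|X and Y| / |X or Y|."""
    union = n_x + n_y - n_xy
    return n_xy / union if union > 0 else 0.0


def build_network(hits, pair_hits, threshold):
    """Keep an edge between two names when their co-occurrence strength reaches `threshold`.

    hits:      name -> number of pages containing the name
    pair_hits: (name_x, name_y) -> number of pages containing both names
    """
    edges = {}
    for (x, y), n_xy in pair_hits.items():
        strength = overlap_coefficient(hits[x], hits[y], n_xy)
        if strength >= threshold:
            edges[(x, y)] = strength
    return edges
```

Every pair costs one extra query, which is why reducing the number of search-engine queries matters as the number of people grows.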

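The authors first introduce the basic methods commonly used in past work on social network extraction and discuss the problem of different people sharing the same name. They then classify relationships between people into several types. Because computing the relationship strength for every pair of names requires a great deal of computation, reducing the number of search-engine queries is another important problem. The authors also propose using words associated with a person as metadata describing that person, and using this metadata as an alternative way to compute the relationship strength between people.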
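One simple way to score a relationship from keyword metadata is the cosine similarity between the two people's keyword-count vectors. This is a swapped-in, illustrative measure, not necessarily the paper's algorithm; `keyword_similarity` and the keyword lists are hypothetical.

```python
import math
from collections import Counter


def keyword_similarity(words_x, words_y):
    """Cosine similarity between two bags of person-related keywords."""
    a, b = Counter(words_x), Counter(words_y)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Unlike name co-occurrence, this can relate two people who never appear on the same page but are described by similar words.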

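The authors then show POLYPHONET's actual interface and results, and finally present the idea of Super Social Network Mining. Its main difference from Social Network Mining is a self-correcting mechanism that can split or merge as the situation requires; the authors hope to integrate this mechanism into the system in the future.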

WWW 2006 Edinburgh, Scotland. pp. 397 - 406.

August 9, 2007

Homepage Live: Automatic Block Tracing for Web Personalization

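This paper observes the trend toward personal homepages: both Microsoft and Google offer personal homepages on which users can place many functional blocks. The paper therefore proposes Homepage Live, a system on which users can build a personal homepage and add small blocks to it. The distinctive feature of Homepage Live's blocks is that they let users trace blocks of other web pages, that is, monitor individual page blocks. The paper focuses on how the system, after the target page is updated, correctly extracts the block the user wants to see.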

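First, the old page P_old and the new page P_new are represented as DOM trees, and the following steps are performed:
  1. Find fix nodes: a fix node is a node whose content is identical in P_old and P_new.
  2. Build reduced trees: delete the fix nodes from P_old and P_new.
  3. Mapping: with P_old and P_new both reduced, map the two trees onto each other to find the target block in P_new.
For tree mapping, the authors use Minimum Edit Distance Mapping to find the block in P_new that corresponds to the block the user selected in P_old.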
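The three steps can be sketched on a toy DOM model. Assumptions: nodes carry only a tag, text, and children; a fix node is any subtree whose serialized form occurs in both versions; and step 3 uses plain string edit distance over serialized subtrees as a simple stand-in for the paper's minimum edit distance tree mapping. The returned node is a reduced copy, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DomNode:
    tag: str
    text: str = ""
    children: list = field(default_factory=list)

    def signature(self) -> str:
        """Serialize the subtree so identical subtrees get identical strings."""
        inner = "".join(child.signature() for child in self.children)
        return f"<{self.tag}>{self.text}{inner}</{self.tag}>"


def all_nodes(root):
    yield root
    for child in root.children:
        yield from all_nodes(child)


def fix_signatures(old, new):
    """Step 1: subtrees whose content is identical in both versions."""
    return {n.signature() for n in all_nodes(old)} & {n.signature() for n in all_nodes(new)}


def reduce_tree(root, fixed):
    """Step 2: copy of the tree with every fixed subtree removed (the root is kept)."""
    kept = [reduce_tree(c, fixed) for c in root.children if c.signature() not in fixed]
    return DomNode(root.tag, root.text, kept)


def levenshtein(a, b):
    """Classic dynamic-programming string edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def trace_block(old, new, target):
    """Step 3 (stand-in): the reduced P_new node closest to the reduced target."""
    fixed = fix_signatures(old, new)
    if target.signature() in fixed:  # the block did not change at all
        return next(n for n in all_nodes(new) if n.signature() == target.signature())
    goal = reduce_tree(target, fixed).signature()
    return min(all_nodes(reduce_tree(new, fixed)), key=lambda n: levenshtein(n.signature(), goal))
```

Removing fix nodes first shrinks both trees, so the expensive comparison in step 3 runs only over the parts of the page that actually changed.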

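In the experiments, the authors evaluate the system's accuracy and efficiency against three methods: Direct Path Finding (DPF), Tag String Matching (TSM), and Tree Edit Distance (TED). In accuracy, the system outperforms DPF and TSM and matches TED; in efficiency, its time complexity is far lower than TED's, and the experiments also demonstrate the system's scalability.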

Source: Proceedings of the 16th International Conference on World Wide Web (WWW2007)


August 7, 2007

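Research Projects

  1. Construction and Services of a Mobile Caring Society (3/3). Funding: National Science Council (97-2627-E-008-001-). Period: 2008/8/1 to 2009/7/31. Role: Co-Principal Investigator.
  2. A Study of Selling Strategies on Online Auction Websites. Funding: National Science Council (97-2221-E-008-088-). Period: 2008/8/1 to 2009/7/31. Role: Principal Investigator.
  3. Research on the Development of Web Information Integration Service Systems. Funding: National Science Council (96-2221-E-008-091-MY2). Period: 2007/8/1 to 2009/7/31. Role: Principal Investigator.
  4. Construction and Services of a Mobile Caring Society (2/3). Funding: National Science Council (96-2627-E-008-001-). Period: 2007/8/1 to 2008/7/31. Role: Co-Principal Investigator.
  5. Construction and Services of a Mobile Caring Society (1/3). Funding: National Science Council (95-2627-E-008-002-). Period: 2006/8/1 to 2007/7/31. Role: Co-Principal Investigator.
  6. A Two-Way Study of Data Visualization and Cluster Analysis. Funding: National Science Council (95-2221-E-008-076-). Period: 2006/8/1 to 2007/7/31. Role: Principal Investigator.
  7. Mining Inter-Transaction Association Rules in Temporal Databases. Funding: National Science Council (94-2213-E-008-020-). Period: 2005/8/1 to 2006/7/31. Role: Principal Investigator.
  8. Mining Asynchronous Periodic Patterns in Data Streams. Funding: National Science Council (93-2213-E-008-023-). Period: 2004/8/1 to 2005/7/31. Role: Principal Investigator.
  9. Information Extraction Methods from Semi-Structured to Free-Form Documents. Funding: National Science Council (92-2213-E-008-028-). Period: 2003/8/1 to 2004/7/31. Role: Principal Investigator.
  10. IEPAD: Design of an Automatic Information Extraction System for Multi-Attribute Web Pages and Development of an Information Integration System (2/2). Funding: National Science Council (91-2213-E-008-030-). Period: 2002/8/1 to 2003/7/31. Role: Principal Investigator.
  11. Commercialization of an Automatic Extraction System for Semi-Structured Internet Documents. Funding: National Science Council (91-2622-E-008-018-CC3). Period: 2002/6/1 to 2003/5/31. Role: Principal Investigator.
  12. IEPAD: Design of an Automatic Information Extraction System for Multi-Attribute Web Pages and Development of an Information Integration System (1/2). Funding: National Science Council (90-2213-E-008-042-). Period: 2001/8/1 to 2002/7/31. Role: Principal Investigator.
  13. Genome Exploration: Structure and Biological Meaning, Subproject 2: Applying Data Mining to Repeat Sequence Databases to Discover Genomic Regulatory Rules (I). Funding: National Science Council (90-2213-E-008-053-). Period: 2001/8/1 to 2002/7/31. Role: Principal Investigator.
  14. Construction and Analysis of Extraction Rules for Semi-Structured Information. Funding: National Science Council (89-2213-E-008-056-). Period: 2000/8/1 to 2001/7/31. Role: Principal Investigator.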

Publications (updated 06/05/2010)

Journal Papers

  1. T.-K. Fan and C.-H. Chang: Sentiment Oriented Contextual Advertising. To appear in Knowledge and Information Systems.
  2. M. Kayed and C.-H. Chang: FiVaTech: Page-Level Web Data Extraction from Template Pages. IEEE Trans. Knowl. Data Eng. Vol. 22, No.2, pp. 249-263, 2010.
  3. T.-K. Fan and C.-H. Chang: Exploring Evolutionary Technical Trends From Academic Research Papers. J. Inf. Sci. Eng. Vol. 26, No. 1, pp. 99-117, 2010.
  4. K.-Y. Huang and C.-H. Chang. Efficient Mining of Frequent Episodes from Complex Sequences, Information Systems, Vol. 33, No. 1, pp. 96-114, 2008. doi:10.1016/j.is.2007.07.003
  5. K.-Y. Huang, C.-H. Chang, and Kuo-Zui Lin. Efficient Discovery of Frequent Continuities by Projected Window List Technology, Journal of Information Science and Engineering, Vol. 24, No. 4, pp. 1041-1064, 2008.
  6. Y.C. Wu and C.-H. Chang. Efficient Text Chunking using Linear Kernel with Mask Method, Knowledge Based Systems, Vol. 20, Issue 3, pp. 209-219, 2007. doi:10.1016/j.knosys.2006.04.016
  7. C.-H. Chang, M. Kayed, M. R. Girgis, K. Shaalan, A Survey of Web Information Extraction Systems, IEEE TKDE (SCI, EI), Vol. 18, No. 10, pp. 1411-1428, Oct. 2006.
  8. K.-Y. Huang and C.-H. Chang, SMCA: A General Model for Mining Asynchronous Periodic Patterns in Temporal Databases, IEEE Transactions on Knowledge and Data Engineering (SCI, EI), Vol. 17, No. 6, pp. 774-785, June 2005.
  9. C.-H. Chang and Z.-K. Ding, Categorical Data Visualization and Clustering Using Subjective Factors, Data and Knowledge Engineering (SCI expanded), Vol. 53, Issue 3, pp. 243-262, June 2005. doi:10.1016/j.datak.2004.09.001
  10. C.-N. Hsu, C.-H. Chang, C.-H. Hsieh, J.-J. Lu, and C.-C. Chang, Reconfigurable Web Wrapper Agents for Biological Information Integration, JASIST (SCI expanded), Special Issue on Bioinformatics, Vol. 56, No. 5, pp. 505--517, March 2005.
  11. C.-H. Chang and S.-C. Kuo, OLERA: A semi-supervised approach for Web data extraction with visual support, IEEE Intelligent Systems (SCI, EI), Vol. 19, No. 6, pp. 56--64, Dec. 2004.
  12. C.-H. Chang, J.-J. Chiou, H. Siek, J.-J. Lu, and C.-N. Hsu, Reconfigurable Web Wrapper Agents, IEEE Intelligent Systems (SCI, EI), Vol. 18, No. 5, pp. 34--40, Oct. 2003.
  13. C.-H. Chang, C.-N. Hsu, and S.-C. Lui, Automatic Information Extraction From Semi-Structured Web Pages By Pattern Discovery, Decision Support Systems Journal (SCI expanded), Vol. 35, Issue 1, pp. 129--147, Apr. 2003. doi:10.1016/S0167-9236(02)00100-8
  14. C.-C. Hsu and C.-H. Chang, WebYacht: A Concept-based Search Tool for WWW, International Journal on Artificial Intelligence Tools, 2000.
  15. C.-H. Chang and C.-C. Hsu, Enabling concept-based relevance feedback on World Wide Web, In IEEE Transactions on Knowledge and Data Engineering (SCI), Special Issue on Web Technologies, Vol.11, No.4, pp. 595-609, July/August 1999.
  16. C.-H. Chang and C.-C. Hsu, Enabling Web Information Retrieval through Query Expansion via Contrast Analysis, Computer Networks and ISDN Systems, Vol. 30, pp. 621-623, 1998.
  17. C.-H. Chang and C.-C. Hsu, Customizable Multi-Engine Search Tool based on Clustering, In Computer Networks and ISDN Systems, Vol. 29, pp. 1217-1224, 1997.

Conference Papers

  1. Chia-Hui Chang and Shu-Ying Li, MapMarker: Extraction of Postal Addresses and Associated Information for General Web Pages, To appear in IEEE/WIC/ACM Web Intelligence, 2010.
  2. Teng-Kai Fan and Chia-Hui Chang: Blogger-centric contextual advertising. CIKM 2009: 1803-1806, 2009.
  3. C.-H. Chang and J.-H. Lin: Decision Support and Profit Prediction for Online Auction Sellers. The First ACM SIGKDD Workshop on Knowledge Discovery from Uncertain Data (U'09), pp. 1-8, 2009.
  4. T.-K. Fan and C.-H. Chang: Sentiment Oriented Contextual Advertising. Advances in Information Retrieval, 31st European Conference on IR Research, ECIR 2009, Toulouse, France, April 6-9, 2009. LNCS 5478 Springer 2009, pp. 202-215.
  5. Teng-Kai Fan and Chia-Hui Chang, Exploring Evolutionary Technical Trends From Research Papers. The Eighth IAPR Workshop on Document Analysis Systems, DAS2008 (poster session). Nara, Japan. Sep 16-19, 2008.
  6. C.-H. Chang, Shih-Feng Yang, Che-Min Liou, Mohammed Kayed. Gadget Creation for Personal Information Integration on Web Portals, IEEE International Conference on Information Reuse and Integration (short paper), Las Vegas, USA, 2008.
  7. Mohammed Kayed, Chia-Hui Chang, Khaled Shaalan, and Moheb Ramzy Girgis. FiVaTech: Page-Level Web Data Extraction from Template Pages, ICDM 2007, Workshop on Web2.0 Environment, Omaha, NE, USA. Oct. 28-31, 2007.
  8. C.-H. Chang and Kun-Chang Tsai. Aspect Summarization from Blogsphere for Social Study, ICDM 2007, Workshop on Web2.0 Environment, Omaha, NE, Oct. 28-31, 2007.
  9. K.-Y. Huang, C.-H. Chang, Jiun-Hung Tung, Cheng-Tao Ho. COBRA: Closed Sequential Pattern Mining Using Bi-phase Reduction Approach, Accepted by DaWaK 2006, Krakow, Poland.
  10. Y.-C. Wu, C.-H. Chang, Y.-S. Lee: A General and Multi-lingual Phrase Chunking Model Based on Masking Method. Proceedings of the 7th International Conference on Computational Linguistics and Intelligent Text Processing (CICLING 2006), Mexico City, Mexico, February 19-25, 2006. LNCS 3878, pp. 144--155.
  11. K.-Y. Huang and C.-H. Chang. Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, The 8th Asia Pacific Web Conference (short paper), Harbin, China, Jan. 16-18, 2006. LNCS 3841, pp. 824--829.
  12. K.-Y. Huang, C.-H. Chang, and Kuo-Zui Lin, ClosedPROWL: Efficient Mining of Closed Frequent Continuities by Projected Window List Technology, SIAM International Conference on Data Mining (short paper), CA, USA, Apr. 21-23, 2005.
  13. K.-Y. Huang, C.-H. Chang, and Kuo-Zui Lin, COCOA: An Efficient Algorithm for Mining Inter-transaction Associations for Temporal Database, In the Proceedings of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD04, poster), Pisa, Italy, 2004. LNAI 3202 (SCI expanded), pp. 509--511.
  14. C.-H. Chang and Z.-K. Ding, Categorical Data Visualization and Clustering using Subjective Factors, In the Proceedings of the 6th International Conference on Data Warehousing and Knowledge Discovery (DaWaK04), Zaragoza, Spain, 2004. LNCS 3181 (SCI expanded), pp. 229-238.
  15. K.-Y. Huang and C.-H. Chang, Mining Periodic Patterns in Sequence Data, In the Proceedings of the 6th International Conference on Data Warehousing and Knowledge Discovery (DaWaK04), Zaragoza, Spain, 2004. LNCS 3181 (SCI expanded), pp. 401-410.
  16. K.-Y. Huang, C.-H. Chang, and Kuo-Zui Lin, PROWL: An Efficient Frequent Continuity Mining Algorithm on Event Sequences, In the Proceedings of the 6th International Conference on Data Warehousing and Knowledge Discovery (DaWaK04), Zaragoza, Spain, 2004. LNCS 3181 (SCI expanded), pp. 351-360.
  17. C.-H. Chang and S.C. Kuo, OLERA: On-Line Extraction Rule Analysis for Semi-structured Documents, The IASTED International Conference on ARTIFICIAL INTELLIGENCE AND APPLICATIONS (AIA 2004). Feb. 16-18, 2004, Austria.
  18. K.-Y. Huang and C.-H. Chang, Asynchronous Periodic Pattern Mining from Multi-event Time Series Databases, The IASTED International Conference on DATABASES AND APPLICATIONS (DBA 2004), Feb. 17-19, 2004, Austria.
  19. C.-N. Hsu, C.-H. Chang, H. Siek, J.-J. Lu, J.-J. Chiou, Reconfigurable Web Wrapper Agents for Web Information Integration, IJCAI 2003 Workshop on Information Integration on the Web, IIWeb-03, Aug. 2003, pp. 15-20.
  20. C.-H. Chang and Shi-Hsan Yang, Enhancing SWF for Incremental Association Mining by Itemset Maintenance, In the Proceedings of the seventh Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD03), Korea, 2003. LNAI 2637 (SCI expanded), pp. 301-312.
  21. C.-H. Chang, Sequential Pattern Mining for Web Extraction Rule Generalization, The 6th World multiconference on Systemics, Cybernetics and Informatics, July 14-18, 2002, Orlando, Florida
  22. C.-H. Chang, S.-C. Kuo, K.-Y. Hwang, T.-H. Ho and C.-L. Lin, Automatic Information Extraction for Multiple Singular Web Pages, In the Proceedings of the sixth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD02), Taiwan, 2002. LNAI 2336 (SCI expanded), pp. 297-303.
  23. C.-H. Chang and S.-C. Lui. IEPAD: Information Extraction based on Pattern Discovery, In the Proceedings of the tenth International Conference on World Wide Web (WWW10), pp. 681-688, May 2-6, 2001, Hong Kong.
  24. C.-H. Chang, S.-C. Lui, and Y.-C. Wu. Applying Pattern Mining to Web Information Extraction, In the Proceedings of the fifth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD01), Hong Kong, 2001. LNAI 2035 (SCI expanded), pp. 4-15.
  25. C.-H. Chang, S.-C. Lui, and Y.-C. Wu. Semi-structured Information Extraction Applying Automatic Pattern Discovery, In Proc. of the fourteenth International Computer Symposium (ICS2000), Chia-Yi, Taiwan, Dec. 6-8, 2000.
  26. C.-H. Chang, C.-C. Hsu and C.-L. Hou, Exploiting Hyperlinks for Automatic Information Discovery on the WWW, In Proc. of the tenth IEEE International Conference on Tools with Artificial Intelligence (ICTAI98), Nov. 1998, Chien Tan Youth Activity Center, Taipei, Taiwan.
  27. C.-H. Chang and C.-C. Hsu, Hypertext Information Retrieval for Short Queries, In Proc. of the IEEE Knowledge and Data Engineering Exchange Workshop, Nov. 1998, Chien Tan Youth Activity Center, Taipei, Taiwan.
  28. C.-H. Chang and C.-C. Hsu, Enabling Web Information Retrieval through Query Expansion via Contrast Analysis, In Proc. of the seventh International Conference on World Wide Web (WWW7), Apr. 14-18, 1998, Brisbane, Queensland, Australia.
  29. C.-H. Chang and C.-C. Hsu, Constructing Personal Information Search Agents, In Proc. of the second Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD98), Melbourne, Australia, 1998. LNAI 1394 (SCI expanded), pp. 374-375.
  30. C.-H. Chang and C.-C. Hsu, A Multi-Engine Search Tool based on Clustering, In Proc. of the sixth international conference on World Wide Web (WWW6), Apr. 7-11, 1997, Santa Clara, CA.

Domestic Conference Papers

  1. Shu-Gang Han and Chia-Hui Chang. Cleaning of Auction Data for Bidding Decision, 2007 National Computer Symposium, Asia University, Taiwan.
  2. 胡舒涵, 張嘉惠. Information Extraction from Conference Announcement Websites, The 11th Conference on Artificial Intelligence and Applications (TAAI 2006).
  3. 劉仁宇, 張嘉惠. Improving Data Extraction through Web Page Identification and Cleaning, The 11th Conference on Artificial Intelligence and Applications (TAAI 2006).
  4. 林千翔, 張嘉惠. Chinese Word Segmentation Based on a Specialized Hidden Markov Model. ROCLING XVIII: Conference on Computational Linguistics and Speech Processing, 2006.
  5. 朱育德, 張嘉惠. An Adaptive Dialogue System Based on Word Content. ROCLING XVIII: Conference on Computational Linguistics and Speech Processing, 2006.
  6. 楊智宇, 吳毓傑, 張嘉惠. Design and Study of Sentence Classification and Ranking in Question Answering Systems. The 9th Conference on Artificial Intelligence and Applications (TAAI 2004).
  7. 李泓儒, 張嘉惠. Web Cleaning: Page Segmentation and Data-rich Section Mining, The 1st Workshop on Intelligent Web Technology, IWT2004, in conjunction with the 16th ROCLING Conference.
  8. C.-H. Chang, Information Extraction: A Pattern Mining Approach for Free-Form Text, 2003 The Joint Conference on AI, Fuzzy System, and Gray System, Taipei, Taiwan, Dec. 4--6, 2003.
  9. C.-H. Chang, D.-S. Wu. Information Extraction and Query Model Design for Taiwan Wild Bird Database, 2003 The Joint Conference on AI, Fuzzy System, and Gray System, Taipei, Taiwan, Dec. 4--6, 2003.
  10. C.-H. Chang and C.-N. Hsu, Automatic Extraction of Information Blocks Using PAT Trees, In Proc. of the National Computer Symposium, Dec. 20-21, 1999, Taipei, Taiwan
  11. C.-H. Chang and C.-C. Hsu, Information Searching and Exploring Applying Clustering and Genetic Algorithm, In Proc. of the first Agent Technology Workshop, Dec. 4, 1997, Taipei, Taiwan