Contextual text summarization for content processing in mobile learning


Publications of the University of Eastern Finland Dissertations in Forestry and Natural Sciences No 150

ISBN: 978-952-61-1544-3 (printed)
ISBN: 978-952-61-1545-0 (pdf)
ISSNL: 1798-5668
ISSN: 1798-5668

Guangbing Yang

Contextual Text Summarization for Content Processing in Mobile Learning

This book provides a study of contextual text summarization for content processing in mobile learning. The problem of the effectiveness of automatic text summarization in mobile learning settings is addressed, and a Bayesian contextual topic model has been proposed to indicate how contextual information can significantly affect the performance of summarization. A prototype summarization system has been designed and developed in the mobile learning domain.


Academic Dissertation

To be presented by permission of the Faculty of Science and Forestry for public examination in the Auditorium AG100, Agora building at the University of Eastern Finland, Joensuu, on October 10, 2014, at 12 o’clock.

School of Computing


Grano Oy
Joensuu, 2014

Editors: Prof. Pertti Pasanen, Prof. Pekka Kilpeläinen, Prof. Kai Peiponen, and Prof. Matti Vornanen

Distribution:
University of Eastern Finland Library / Sales of publications
P.O. Box 107, FI-80101 Joensuu, Finland
tel. +358-50-3058396
http://www.uef.fi/kirjasto

ISBN: 978-952-61-1544-3 (printed)
ISSNL: 1798-5668
ISSN: 1798-5668
ISBN: 978-952-61-1545-0 (pdf)
ISSNL: 1798-5668
ISSN: 1798-5676 (pdf)

P.O.Box 111
80101 JOENSUU, FINLAND
email: yguang@cs.joensuu.fi

Supervisors:

Professor Erkki Sutinen, Ph.D.
University of Eastern Finland
School of Computing
P.O.Box 111
80101 JOENSUU, FINLAND
email: erkki.sutinen@cs.joensuu.fi

Professor Kinshuk, Ph.D.
Athabasca University
School of Computing and Information Systems
1 University Drive, Athabasca
Alberta T9S 3A3
CANADA
email: kinshuk@athabascau.ca

Associate Professor Dunwei Wen, Ph.D.
Athabasca University
Alberta T9S 3A3
CANADA
email: dunweiw@athabascau.ca

Reviewers:

Professor Ronghuai Huang, Ph.D.
Beijing Normal University
Faculty of Education
Yanbo Building, Beijing Normal University, 19 Xiejiekouwai Street, Haidian District, Beijing 100875, P. R. CHINA
email: huangrh@bnu.edu.cn

Assistant Professor Jorge Villalón, Ph.D.
Universidad Adolfo Ibáñez
School of Engineering and Sciences
Avenida Diagonal Las Torres 2640, Peñalolén.
Santiago, Chile
email: jorge.villalon@uai.cl

Opponent:

Professor Marcos Katz, Ph.D.
University of Oulu
Department of Communications Engineering
Erkki Koiso-Kanttilan katu 3, Room TS403


Mobile learning benefits from rapidly growing mobile technologies, but adapting and developing content for mobile devices is still a significant problem. Because of the cost of downloading data to mobile devices and the oft-decried problems of information overload and information redundancy, large amounts of text are challenging for mobile learners, especially for learning purposes. Therefore, finding just the right amount of learning content is important if mobile learners are to balance cost against learning achievement. Although many techniques have been developed to process learning content for mobile learning, few of them can produce just the right amount of content.

The research reported in this thesis studies the effectiveness of automatic text summarization in mobile learning settings, as well as problems of automatic text summarization in Bayesian topic modeling and natural language processing (NLP). As a result, a contextual text summarization approach is proposed to show how contextual information can significantly affect the performance of summarization. Such information includes word co-occurrence in a sentence of a document (called word ordering in some of the statistical language modeling literature), the relatedness of topics discovered from documents, and the learner’s interests in and preferences for the learning content. In addition, a prototype summarization system implementing the proposed approach has been designed and developed to validate the results of this research in the mobile learning domain.

The experimental results demonstrate that our approach is able to deal with the problems mentioned above and enhances the performance of summarization significantly. The findings of this work indicate that properly summarized learning content can support appropriate levels of learning achievement while aligning content size with the unique characteristics of mobile devices.

Both research of mobile learning and automatic text summa-

content processing in mobile learning. Furthermore, the success of the application of automatic text summarization in mobile learning settings has engaged in the research of text summarization.

Universal Decimal Classification: 004.78, 004.91, 004.93, 37.091.33

Library of Congress Subject Headings: Text processing (Computer science); Content analysis (Communication); Context-aware computing; Natural language processing (Computer science); Computational linguistics; Automatic abstracting; Mobile communication systems in education; Mobile computing; Computer-assisted instruction

Yleinen suomalainen asiasanasto: tekstianalyysi; sisällönanalyysi; sisällönhallinta; tiivistelmät; konteksti; kontekstuaalisuus; kieliteknologia; opetusteknologia; tietokoneavusteinen opetus; digitaalinen oppimateriaali; verkko-oppimateriaali; mobiilisovellukset; mobiililaitteet


Preface

This Ph.D. thesis is the culmination of research and learning that has taken place over a period of almost five years. The research reported in this thesis was realized within the framework of the IMPDET program, funded by the School of Computing, University of Eastern Finland. Firstly, I would like to thank Prof. Kinshuk, who led me into academia and introduced me to Prof. Erkki Sutinen and Associate Prof. Dunwei Wen. As my supervisor, Prof. Kinshuk provided a great deal of support and gave direction to my research. Thank you so much for always being there for me, and many thanks for your understanding and emotional support during my Ph.D. study.

I would also like to thank my other two supervisors, Prof. Erkki Sutinen and Associate Prof. Dunwei Wen, and especially Prof. Erkki Sutinen, who gave excellent advice on writing and revising this dissertation. Without their continuous support and advice on my work, I would never have reached the point of finishing my dissertation.

Special thanks also go to Prof. Nian-Shing Chen and Prof. Terry Anderson for their advice and suggestions on my work.

I would also like to thank my thesis preliminary examiners, Prof. Ronghuai Huang and Assistant Prof. Jorge Villalón, for their time and useful comments, and Prof. Marcos Katz for acting as my opponent. I would also like to thank Dr. Jarkko Suhonen and other research fellows and colleagues in the School of Computing of the University of Eastern Finland and the NSERC/iCORE lab of Athabasca University for many useful discussions, comments, and suggestions.

Finally, many thanks to my family for their encouragement, support, patience, and love.

Joensuu, September 8th, 2014
Guangbing Yang


• CTM - Contextual Topic Model or Correlated Topic Model

• DUC - Document Understanding Conference

• GEM - Griffiths, Engen and McCloskey distribution

• HMM-LDA - Hidden Markov model with LDA

• HDP - Hierarchical Dirichlet process

• HTCSM - Hierarchical Topic Coherence Sentence Model

• IR - Information Retrieval

• KL - Kullback-Leibler

• LDA - Latent Dirichlet Allocation

• LMS - Learning Management System

• MCMC - Markov Chain Monte Carlo

• MMR - Maximal Marginal Relevance

• NLP - Natural Language Processing

• QL - Query Likelihood Language Model

• ROUGE - Recall-Oriented Understudy for Gisting Evaluation

• TAC - Text Analysis Conference

• TREC - Text REtrieval Conference

• TTM - Two-Tiered Topic Model

• hLDA - the hierarchical Latent Dirichlet Allocation model

• m-SCORM - mobile Sharable Content Object Reference Model

• nCRP - Nested Chinese Restaurant Process

• pLSA - probabilistic Latent Semantic Analysis

D - number of documents
Q - query variable
Ct - contextual topic model
R - relevance language model
V - number of unique words (vocabulary size)
L - maximum level number
w - a string of words (a word, a sentence, a document, or a corpus)
N_d - number of word tokens in document d
N_s - number of word tokens in sentence s
z_{d,i} - the topic associated with the i-th token in document d
x_{d,i} - the bigram status between the (i−1)-th token and the i-th token in document d
w_{d,i} - the i-th token in document d
θ_d - the multinomial distribution of topics w.r.t. document d
φ - the multinomial distribution of words w.r.t. topic z
ψ_v - the binomial (Bernoulli) distribution of the status variable w.r.t. the previous word
c_d - the path allocation w.r.t. document d
α - Dirichlet prior of θ
β - Dirichlet prior of φ
γ - nCRP parameter for path allocation
δ - Dirichlet prior of ψ
m, π - parameters of GEM


This thesis is based on data presented in the following articles, referred to by the Roman numerals I-VII.

I Yang, G., Chen, N. S., Kinshuk, Sutinen, E., Anderson, T., & Wen, D., “The Effectiveness of Automatic Text Summarization in Mobile Learning Contexts,” Computers & Education 68 (2013), 233-243.

II Yang, G., Kinshuk, Sutinen, E., & Wen, D. (2012a, July 4-6), “Chunking and extracting text content for mobile learning: A query-focused summarizer based on relevance language model,” In Proceedings of the 12th IEEE International Conference on Advanced Learning Technologies (ICALT 2012). doi: 10.1109/ICALT.2012.29.

III Yang, G., Wen, D., Kinshuk, Chen, N., & Sutinen, E. (2012b, July 18-20), “Personalized Text Content Summarizer for Mobile Learning: An Automatic Text Summarization System with Relevance Based Language Model,” In Proceedings of the IEEE Fourth International Conference on Technology for Education. doi: 10.1109/T4E.2012.23.

IV Yang, G., Kinshuk, Wen, D., Sutinen, E. (2013, December 5-8). “A Contextual Query Expansion Based Multi-document Summarizer for Smart Learning,” In Proceedings of IEEE 9th International Conference on Signal-Image Technology & Internet-Based Systems. doi: 10.1109/SITIS.2013.163. ISBN: 978-1-4799-3211-5/13 2013 IEEE.

V Yang, G., Wen, D., Kinshuk, Chen, N. S., Sutinen, E., “A Novel Contextual Topic Model for Multi-document Summarization,” submitted to journal Expert Systems with Applications. (Accepted on Sept 10, 2014).

VI Yang, G., Kinshuk, Wen, D., Sutinen, E., “Enhancing Sentence Ordering by Hierarchical Topic Modeling for Multi-document

367-379). Springer-Verlag.

VII Wen, D., Gao, Y., Yang, G. “Semantic Analysis Enhanced Natural Language Interaction In Ubiquitous Learning.” In Kinshuk et al. (Eds.), Ubiquitous Learning Environment and Technology, Springer’s Lecture Notes in Educational Technology series. ISBN: 978-3-662-44658-4. Springer-Verlag.

In this thesis, these papers will be referred to as (P1) to (P7), which correspond to the Roman numerals I-VII. The papers have been included in this thesis with permission of their copyright holders.


In (P1), Prof. Kinshuk gave direction on content processing in mobile learning, Prof. Chen advised on the writing of the journal paper, Prof. Sutinen advised the author in finalizing the research problem defined in (P1), and Prof. Anderson edited and reviewed the paper. The author determined the research questions, proposed the methods for solving the problem, designed and conducted the experiments, and then performed the result analysis under the guidance of Prof. Chen.

The ideas, implementations, and experiments in (P2, P3, P4) originated from the author, while the direction on the Latent Dirichlet Allocation topic model was given by Prof. Wen. The idea of a personalized summary in (P3) and the idea of applying text summarization to smart learning in (P4) originated from Prof. Kinshuk. The author established how the proposed Bayesian topic modeling approach is applied to multi-document summarization within the context of mobile learning.

In (P5), the idea of employing a topic n-gram model originated from the author, while Prof. Chen and Prof. Kinshuk advised on the paper writing. The author carried out the main development and implementation and then performed the experiment.

In (P6), the idea of applying a hierarchical Bayesian topic model to the problem of sentence ordering originated from the author, while Prof. Wen advised on the implementation of the EM algorithm for hyper-parameter inference and Prof. Kinshuk advised on the paper writing. The author determined the research questions, proposed the methods for solving the problem, designed and conducted the experiments, and then performed the result analysis.

In (P1-P6), the author was responsible for most of the writing, and the author’s supervisors were responsible for most of the editing and revisions. In (P7), the author was responsible for writing several paragraphs on automatic text summarization.



5.5 Implementation of Query-Focused Summarization Systems
5.5.1 Model integration and sentence ranking

6 SUMMARY OF THE RESULTS AND FINDINGS
6.1 Experimental Results
6.1.1 Experiments in mobile learning settings
6.1.2 Experiments in multi-document summarization
6.1.3 Experiments in model perplexity
6.2 Prototype Applications
6.3 Research Findings
6.3.1 Findings for mobile learning
6.3.2 Findings for multi-document summarization

7 DISCUSSION
7.1 Implications
7.2 Conclusions
7.3 Future Perspectives

BIBLIOGRAPHY

1 Introduction

1.1 RESEARCH BACKGROUND AND MOTIVATION

Mobile devices are rapidly becoming mainstream because of their immediacy, variety, and cost-effectiveness. These characteristics, together with the swift evolution of mobile technologies, are arguably the most important drivers of mobile learning, which has grown rapidly into a major trend in today’s e-Learning. However, this rapid evolution also gives rise to new challenges, several of which concern the processing and delivery of learning content.

From a learner’s perspective, the learning content, rather than the technology, is always the key element, even though technology has improved mobile learning significantly. Nowadays, most ‘smart’ mobile devices are highly capable of processing multimedia, but adapting and developing content for mobile devices is still a significant problem. Plain text is still the main content format used in mobile learning settings [1]. However, delivering a text of several thousand words or more to a mobile device is inappropriate, largely because of the cost of downloading data. The cost of using mobile devices for learning has been the primary concern of students [1], especially in developing countries: free Wi-Fi access points are not always available, so learners who must balance cost against learning achievement seek out just the right amount of learning content. Furthermore, given the oft-decried problems of information overload and information redundancy, large amounts of text are challenging for mobile learners to work through, especially for learning purposes.

Another challenge comes from mobile learners themselves, especially “next generation” or “Net generation” learners. Most of these learners prefer multi-tasking and have a short attention span [2]. They can perform several tasks simultaneously and shift their attention quickly from one to another, but would probably be overwhelmed or frustrated if asked to read a long report. To motivate them to engage with the learning content, many educators have started to supply shorter content in new curricula [2]. Furthermore, Miller’s [3] cognitive theory also supports the need for shorter content.

Miller’s [3] information processing theory has been widely applied in education, including e-Learning. As a special case of e-Learning, mobile learning can also draw theoretical support from this theory. It states that short-term memory (or short attention span) can hold only 7±2 chunks of information, where a chunk can be a word, a digit, or another meaningful unit. Based on this theory, large text-based content needs to be chunked into small units so that it can be memorized more efficiently. Text that has been chunked effectively should work well both standing alone and with the rest of the content. Mobile learning in particular has more rigorous needs for short content. Previous research demonstrates that short content can benefit mobile learning and eventually enhance the learning experience by improving screen reading and increasing the speed of information retrieval and processing [4].
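As a rough illustration of the chunking idea, the following Python sketch groups sentences into chunks that stay under a word budget. It is only a minimal sketch under stated assumptions and not the method developed in this thesis: the function name `chunk_text`, the naive regular-expression sentence splitter, and the default budget of 40 words are all hypothetical choices made for illustration.

```python
import re


def chunk_text(text, max_words=40):
    """Group sentences into chunks of at most max_words words each.

    A single sentence longer than max_words is kept whole as its own chunk.
    """
    # Naive split after '.', '!' or '?' followed by whitespace (an assumption;
    # real learning content would need a proper sentence tokenizer).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting only at sentence boundaries is a deliberate choice here, so that each chunk remains readable on its own, in line with the requirement that chunked text should work well standing alone.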

Those research results may answer why condensing large content is necessary from an educational perspective. However, from the viewpoint of technology, there is still much debate about how to shorten text-based content properly and effectively without losing its meaning.

To process learning content effectively, many rapid authoring tools have been developed for mobile learning. However, the purpose of these tools is mainly to assist with instructional design, not to facilitate mobile learners. In addition, because of the highly fragmented mobile technology landscape and rapidly evolving standards, there is no single solution that makes content work on every possible mobile device. This situation is compounded when reusing existing learning materials available on the web, as manual reformatting and summarization are both time-consuming and expensive. As a result, educators are forced to design new learning content or reformat existing learning materials for delivery on different types of mobile devices. Researchers have developed semantically aware learning object retrieval systems for automatically delivering learning content to learners [5]. Unfortunately, automatic placement of semantic tags into learning content is difficult, and hence no significant application of semantic techniques is available to support automatic labelling of learning content. Therefore, an alternative solution is recommended: automatically analyze the learning content, identify topics, construct knowledge of semantic and lexical relationships, and identify and synthesize important concepts as a summary for content delivery within the context of mobile learning.
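To make the idea of producing a summary automatically more concrete, the sketch below selects a few sentences that are important yet not redundant. It is a deliberately simple stand-in and not the Bayesian contextual topic model studied in this thesis: importance is approximated by average word frequency, and redundancy is penalized greedily, loosely in the spirit of Maximal Marginal Relevance (MMR). The names `tokenize` and `summarize` and the `penalty` weight are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def summarize(sentences, k=2, penalty=0.5):
    """Greedily pick up to k sentences that are important but not redundant.

    Importance: average relative frequency of the sentence's words.
    Redundancy: highest Jaccard word overlap with an already chosen sentence.
    """
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for t in tokens for w in t)
    total = sum(freq.values()) or 1

    def importance(i):
        return sum(freq[w] for w in tokens[i]) / (len(tokens[i]) or 1) / total

    def overlap(i, j):
        a, b = set(tokens[i]), set(tokens[j])
        return len(a & b) / (len(a | b) or 1)

    chosen = []
    candidates = list(range(len(sentences)))
    while candidates and len(chosen) < k:
        def score(i):
            redundancy = max((overlap(i, j) for j in chosen), default=0.0)
            return (1 - penalty) * importance(i) - penalty * redundancy

        best = max(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Given a repeated sentence and a distinct one, the redundancy penalty makes the sketch prefer the distinct sentence over the repeat, which is the behavior a summary intended for costly mobile delivery would need.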

Automatic text summarization can reduce texts to their most important ideas in a particular context. It can be described simply as a group of processes that automatically extract and synthesize source texts into summaries by identifying important content and removing redundancy from a set of documents. Many of the techniques developed over the last decades have produced significant results in automatic text summarization [6–11]. In particular, research on topic models has produced good experimental results for automatic topic identification and text classification [12–14]. These significant improvements drew our attention to topic modeling techniques, especially Latent Dirichlet Allocation (LDA) based models [15,16], which may provide a suitable solution to the problems mentioned previously. Although they show good results, these techniques still face particular challenges in automatic text summarization: precisely identifying similarities and differences, effectively modeling higher-order structures, and properly integrating linguistic intuitions into probabilistic models of document processing. Since it is highly unlikely that topic and document processing can do without modeling higher-order structure, integrating linguistic intuitions into probabilistic models poses a particular challenge. Also the scalability of increasingly complex models is an important issue for working
