
Contextual text summarization for content processing in mobile learning


Publications of the University of Eastern Finland Dissertations in Forestry and Natural Sciences No 150


ISBN: 978-952-61-1544-3 (printed), ISSNL: 1798-5668

ISSN: 1798-5668, ISBN: 978-952-61-1545-0 (pdf)

Guangbing Yang

Contextual Text Summarization for Content Processing in Mobile Learning

This book presents a study of contextual text summarization for content processing in mobile learning. It addresses the problem of the effectiveness of automatic text summarization in mobile learning settings and proposes a Bayesian contextual topic model to show how contextual information can significantly affect summarization performance. A prototype summarization system has been designed and developed in the mobile learning domain.


Academic Dissertation

To be presented by permission of the Faculty of Science and Forestry for public examination in the Auditorium AG100, Agora building at the University of Eastern Finland, Joensuu, on October 10, 2014, at 12 o’clock.

School of Computing


Grano Oy Joensuu, 2014

Editors: Prof. Pertti Pasanen, Prof. Pekka Kilpeläinen, Prof. Kai Peiponen, and Prof. Matti Vornanen

Distribution:

University of Eastern Finland Library / Sales of publications P.O. Box 107, FI-80101 Joensuu, Finland

tel. +358-50-3058396 http://www.uef.fi/kirjasto

ISBN: 978-952-61-1544-3 (printed) ISSNL: 1798-5668

ISSN: 1798-5668 ISBN: 978-952-61-1545-0 (pdf)

ISSNL: 1798-5668 ISSN: 1798-5676 (pdf)

Author’s address: P.O. Box 111, 80101 JOENSUU, FINLAND

email: yguang@cs.joensuu.fi

Supervisors: Professor Erkki Sutinen, Ph.D.

University of Eastern Finland, School of Computing

P.O. Box 111, 80101 JOENSUU, FINLAND

email: erkki.sutinen@cs.joensuu.fi

Professor Kinshuk, Ph.D.

Athabasca University, School of Computing and Information Systems

1 University Drive, Athabasca, Alberta T9S 3A3, CANADA

email: kinshuk@athabascau.ca

Associate Professor Dunwei Wen, Ph.D.

Athabasca University, School of Computing and Information Systems

1 University Drive, Athabasca, Alberta T9S 3A3, CANADA

email: dunweiw@athabascau.ca

Reviewers: Professor Ronghuai Huang, Ph.D.

Beijing Normal University, Faculty of Education

Yanbo Building, Beijing Normal University, 19 Xiejiekouwai Street, Haidian District, Beijing 100875, P. R. CHINA

email: huangrh@bnu.edu.cn

Assistant Professor Jorge Villalón, Ph.D.

Universidad Adolfo Ibáñez, School of Engineering and Sciences

Avenida Diagonal Las Torres 2640, Peñalolén, Santiago, Chile

email: jorge.villalon@uai.cl

Opponent: Professor Marcos Katz, Ph.D.

University of Oulu

Department of Communications Engineering, Erkki Koiso-Kanttilan katu 3, Room TS403


Mobile learning benefits from rapidly growing mobile technologies, but adapting and developing content for mobile devices remains a significant problem. Because of the cost of downloading data on mobile devices and the oft-decried problems of information overload and information redundancy, delivering large amounts of text content is challenging for mobile learners, especially for learning purposes. Seeking out just the right amount of learning content is therefore important for mobile learners to keep a balance between cost and learning achievement. Although many techniques have been developed for processing learning content in mobile learning, few of them can produce just the right amount of content.

This thesis studies the effectiveness of automatic text summarization in mobile learning settings, as well as problems of automatic text summarization based on Bayesian topic modeling and natural language processing (NLP). As a result, a contextual text summarization approach is proposed to show how contextual information can significantly affect summarization performance. Such contextual information includes, for example, word co-occurrence within a sentence of a document (called word ordering in some statistical language modeling literature), the relatedness of topics discovered from documents, and learners’ interests in and preferences for the learning content. In addition, a prototype summarization system implementing the proposed approach has been designed and developed to validate the results of this research in the mobile learning domain.
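As a minimal illustration of one kind of contextual information mentioned above (word co-occurrence, or word ordering, within a sentence), the sketch below counts ordered adjacent word pairs per sentence without letting pairs cross sentence boundaries. It is not the contextual topic model of this thesis; the function name `sentence_bigrams` and the naive regex-based sentence splitter are assumptions made for illustration.

```python
import re
from collections import Counter


def sentence_bigrams(text: str) -> Counter:
    """Count ordered adjacent word pairs (bigrams) inside each sentence.

    Pairs never cross a sentence boundary, so the counts reflect word
    ordering within a sentence only.
    """
    counts: Counter = Counter()
    # Naive sentence split on '.', '!' or '?' followed by whitespace (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        # zip pairs each word with its successor: (w_{i-1}, w_i)
        counts.update(zip(words, words[1:]))
    return counts


sample = ("Mobile learning needs short content. "
          "Short content suits mobile learning.")
print(sentence_bigrams(sample).most_common(2))
```

Here "mobile learning" and "short content" each occur twice, while the pair formed by the last word of the first sentence and the first word of the second is never counted.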

The experimental results demonstrate that our approach is able to deal with the problems mentioned above and significantly enhances summarization performance. The findings of this work indicate that properly summarized learning content can support appropriate levels of learning achievement while also aligning content size with the unique characteristics of mobile devices.

Both research of mobile learning and automatic text summa-

content processing in mobile learning. Furthermore, the successful application of automatic text summarization in mobile learning settings also contributes to research on text summarization.

Universal Decimal Classification: 004.78, 004.91, 004.93, 37.091.33

Library of Congress Subject Headings: Text processing (Computer science); Content analysis (Communication); Context-aware computing; Natural language processing (Computer science); Computational linguistics; Automatic abstracting; Mobile communication systems in education; Mobile computing; Computer-assisted instruction

Yleinen suomalainen asiasanasto (General Finnish Thesaurus): text analysis; content analysis; content management; summaries (abstracts); context; contextuality; language technology; educational technology; computer-assisted instruction; digital learning materials; online learning materials; mobile applications; mobile devices


Preface

This Ph.D. thesis is the culmination of research and learning that took place over a period of almost five years. The research reported in this thesis was carried out within the framework of the IMPDET program, funded by the School of Computing, University of Eastern Finland. Firstly, I would like to thank Prof. Kinshuk, who led me into academia and introduced me to Prof. Erkki Sutinen and Associate Prof. Dunwei Wen. As my supervisor, Prof. Kinshuk provided a great deal of support and gave direction to my research.

Thank you so much for always being there for me, and many thanks for your understanding and emotional support during my PhD study.

I would also like to thank my other two supervisors, Prof. Erkki Sutinen and Associate Prof. Dunwei Wen, and especially Prof. Erkki Sutinen, who gave excellent advice on writing and revising this dissertation. Without their continuous support and advice on my work, I would never have reached the point of finishing my dissertation.

Special thanks also go to Prof. Nian-Shing Chen and Prof. Terry Anderson for their advice and suggestions on my work.

I would also like to thank my preliminary examiners, Prof. Ronghuai Huang and Assistant Prof. Jorge Villalón, for their time and useful comments, and Prof. Marcos Katz for acting as my opponent. I would also like to thank Dr. Jarkko Suhonen and the other research fellows and colleagues in the School of Computing of the University of Eastern Finland and the NSERC/iCORE lab of Athabasca University for many useful discussions, comments, and suggestions.

Finally, many thanks and love to my family for their encouragement, support, patience, and love.

Joensuu, September 8th, 2014 Guangbing Yang


CTM - Contextual Topic Model or Correlated Topic Model

DUC - Document Understanding Conference

GEM - Griffiths, Engen and McCloskey distribution

HMM-LDA - Hidden Markov model with LDA

HDP - Hierarchical Dirichlet process

HTCSM - Hierarchical Topic Coherence Sentence Model

IR - Information Retrieval

KL - Kullback-Leibler

LDA - Latent Dirichlet Allocation

LMS - Learning Management System

MCMC - Markov Chain Monte Carlo

MMR - Maximal Marginal Relevance

NLP - Natural Language Processing

QL - Query Likelihood Language Model

ROUGE - Recall-Oriented Understudy for Gisting Evaluation

TAC - Text Analysis Conference

TREC - Text REtrieval Conference

TTM - Two-Tiered Topic Model

hLDA - the hierarchical Latent Dirichlet Allocation model

m-SCORM - mobile Sharable Content Object Reference Model

nCRP - Nested Chinese Restaurant Process

pLSA - probabilistic Latent Semantic Analysis

$D$ - number of documents

$Q$ - query variable

$C_t$ - contextual topic model

$R$ - relevance language model

$V$ - number of unique words (vocabulary size)

$L$ - maximum level number

$w$ - a string of words (a word, a sentence, a document, or a corpus)

$N_d$ - number of word tokens in document $d$

$N_s$ - number of word tokens in sentence $s$

$z_{d,i}$ - the topic associated with the $i$-th token in document $d$

$x_{d,i}$ - the bigram status between the $(i-1)$-th token and the $i$-th token in document $d$

$w_{d,i}$ - the $i$-th token in document $d$

$\theta_d$ - the multinomial distribution of topics w.r.t. document $d$

$\phi$ - the multinomial distribution of words w.r.t. topic $z$

$\psi_v$ - the binomial (Bernoulli) distribution of the status variable w.r.t. the previous word

$c_d$ - the path allocation w.r.t. document $d$

$\alpha$ - Dirichlet prior of $\theta$

$\beta$ - Dirichlet prior of $\phi$

$\gamma$ - nCRP parameter for path allocation

$\delta$ - Dirichlet prior of $\psi$

$m, \pi$ - parameters of GEM
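Several of the symbols above ($D$, $V$, $N_d$, $\theta_d$, $\phi$, $\alpha$, $\beta$, $z_{d,i}$, $w_{d,i}$) follow standard Latent Dirichlet Allocation (LDA) notation. As a rough sketch of how they relate, the code below samples a toy corpus from the plain LDA generative process: $\phi_z \sim \mathrm{Dir}(\beta)$ per topic, $\theta_d \sim \mathrm{Dir}(\alpha)$ per document, $z_{d,i} \sim \mathrm{Mult}(\theta_d)$ and $w_{d,i} \sim \mathrm{Mult}(\phi_{z_{d,i}})$. It deliberately omits the bigram status $x_{d,i}$, $\psi_v$, $\delta$, and the nCRP/GEM parameters used by the extended models in this thesis; symmetric priors, a fixed document length, and the helper names `sample_dirichlet` and `generate_corpus` are assumptions for illustration.

```python
import random


def sample_dirichlet(alpha: float, k: int, rng: random.Random) -> list[float]:
    """Draw from a symmetric Dirichlet(alpha) by normalizing Gamma(alpha, 1) draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(D: int, N_d: int, K: int, V: int,
                    alpha: float, beta: float,
                    seed: int = 0) -> list[list[tuple[int, int]]]:
    """Sample D documents of N_d tokens each; every token is a (z_{d,i}, w_{d,i}) pair."""
    rng = random.Random(seed)
    # phi[z]: distribution over the V vocabulary words for topic z, phi_z ~ Dir(beta)
    phi = [sample_dirichlet(beta, V, rng) for _ in range(K)]
    corpus = []
    for _ in range(D):
        theta_d = sample_dirichlet(alpha, K, rng)  # topic proportions of document d
        doc = []
        for _ in range(N_d):
            z = rng.choices(range(K), weights=theta_d)[0]  # z_{d,i} ~ Mult(theta_d)
            w = rng.choices(range(V), weights=phi[z])[0]   # w_{d,i} ~ Mult(phi_z)
            doc.append((z, w))
        corpus.append(doc)
    return corpus


corpus = generate_corpus(D=4, N_d=10, K=3, V=20, alpha=0.5, beta=0.1, seed=1)
print(corpus[0][:5])
```

Inference (for example Gibbs sampling, one of the MCMC methods listed above) runs this process in reverse: it observes only the word indices and estimates $z$, $\theta$, and $\phi$.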


This thesis is based on data presented in the following articles, referred to by the Roman numerals I-VII.

I Yang, G., Chen, N. S., Kinshuk, Sutinen, E., Anderson, T., & Wen, D., “The Effectiveness of Automatic Text Summarization in Mobile Learning Contexts,” Computers & Education 68 (2013), 233-243.

II Yang, G., Kinshuk, Sutinen, E., & Wen, D. (2012a, July 4-6), “Chunking and extracting text content for mobile learning: A query-focused summarizer based on relevance language model,” In Proceedings of the 12th IEEE International Conference on Advanced Learning Technologies (ICALT 2012). doi: 10.1109/ICALT.2012.29.

III Yang, G., Wen, D., Kinshuk, Chen, N., & Sutinen, E. (2012b, July 18-20), “Personalized Text Content Summarizer for Mobile Learning: An Automatic Text Summarization System with Relevance Based Language Model,” In Proceedings of the IEEE Fourth International Conference on Technology for Education. doi: 10.1109/T4E.2012.23.

IV Yang, G., Kinshuk, Wen, D., & Sutinen, E. (2013, December 5-8), “A Contextual Query Expansion Based Multi-document Summarizer for Smart Learning,” In Proceedings of the IEEE 9th International Conference on Signal-Image Technology & Internet-Based Systems. doi: 10.1109/SITIS.2013.163. ISBN: 978-1-4799-3211-5/13 2013 IEEE.

V Yang, G., Wen, D., Kinshuk, Chen, N. S., & Sutinen, E., “A Novel Contextual Topic Model for Multi-document Summarization,” submitted to the journal Expert Systems with Applications (accepted Sept 10, 2014).

VI Yang, G., Kinshuk, Wen, D., & Sutinen, E., “Enhancing Sentence Ordering by Hierarchical Topic Modeling for Multi-document

367-379). Springer-Verlag.

VII Wen, D., Gao, Y., & Yang, G., “Semantic Analysis Enhanced Natural Language Interaction in Ubiquitous Learning,” In Kinshuk et al. (Eds.), Ubiquitous Learning Environment and Technology, Springer’s Lecture Notes in Educational Technology series. ISBN: 978-3-662-44658-4. Springer-Verlag.

In this thesis, these papers will be referred to as (P1) to (P7), which correspond to the Roman numerals I-VII. The papers have been included in this thesis with permission of their copyright holders.


In (P1), Prof. Kinshuk gave direction on the content processing of mobile learning, Prof. Chen advised on writing the journal paper, Prof. Sutinen advised the author on finalizing the research problem defined in (P1), and Prof. Anderson edited and reviewed the paper, while the author determined the research questions, proposed the methods for solving the problem, designed and conducted the experiments, and then performed the result analysis under the guidance of Prof. Chen.

The idea, implementations, and experiments in (P2, P3, P4) originated from the author, while direction on the Latent Dirichlet Allocation topic model was given by Prof. Wen; the idea of a personalized summary in (P3) and the idea of applying text summarization to smart learning in (P4) originated from Prof. Kinshuk. The author established how the proposed Bayesian topic modeling approach is applied to multi-document summarization within the context of mobile learning.

In (P5), the idea of employing a topic n-gram model originated from the author, while Prof. Chen and Prof. Kinshuk advised on the paper writing; the author carried out the main development and implementation, and then performed the experiments.

In (P6), the idea of applying the hierarchical Bayesian topic model to the problem of sentence ordering originated from the author, while Prof. Wen advised on the implementation of the EM algorithm for hyper-parameter inference and Prof. Kinshuk advised on the paper writing. The author determined the research questions, proposed the methods for solving the problem, designed and conducted the experiments, and then performed the result analysis.

In (P1-P6), the author was responsible for most of the writing, and the author’s supervisors were responsible for most of the editing and revisions. In (P7), the author was responsible for writing several paragraphs on automatic text summarization.

Contents

1 INTRODUCTION

1.1 Research Background and Motivation

1.2 Problem Statement and Research Questions

1.2.1 Problem statement

1.2.2 Research questions

1.3 Structure of the Thesis

2 LITERATURE REVIEW

2.1 Content Processing in Mobile Learning

2.2 Automatic Text Summarization

2.3 Related Work in Topic Modeling

2.3.1 History of topic modeling

2.3.2 Latent variable based topic models

3 RESEARCH DESIGN

3.1 Methodology

3.1.1 Technical design

3.1.2 System implementation

3.2 Data Sets

3.3 Experiment Settings

3.3.1 Experiment settings for mobile learning environments

3.3.2 Experiment settings for multi-document summarization

3.3.3 Experiment settings for model perplexity

4 PUBLICATION OVERVIEW

5 SYSTEM DESIGN AND IMPLEMENTATION

5.1 Main Components in Contextual Text Summarization

5.2 Relevance Language Model

5.3 Contextual Topic Model (CTM)


5.5 Implementation of Query-Focused Summarization Systems

5.5.1 Model integration and sentence ranking

6 SUMMARY OF THE RESULTS AND FINDINGS

6.1 Experimental Results

6.1.1 Experiments in mobile learning settings

6.1.2 Experiments in multi-document summarization

6.1.3 Experiments in model perplexity

6.2 Prototype Applications

6.3 Research Findings

6.3.1 Findings for mobile learning

6.3.2 Findings for multi-document summarization

7 DISCUSSION

7.1 Implications

7.2 Conclusions

7.3 Future Perspectives

BIBLIOGRAPHY

1 Introduction

1.1 RESEARCH BACKGROUND AND MOTIVATION

Mobile devices are rapidly becoming mainstream in today’s world because of their immediacy, variety, and cost effectiveness. These unique characteristics of mobile devices and the swift evolution of mobile technologies may be the most important drivers of mobile learning. Consequently, mobile learning benefits from these merits and this fast evolution, and has rapidly grown into a major trend in today’s e-Learning. However, this rapid evolution also gives rise to new challenges, some of which concern the processing and delivery of learning content.

From a learner’s perspective, the learning content, rather than the technology, is always the key element, even though technology has improved mobile learning significantly. Nowadays, most ‘smart’ mobile devices are highly capable of processing multimedia, but adapting and developing content for mobile devices remains a significant problem. Plain text is still the main content format used in mobile learning settings [1]. However, delivering a text of several thousand words or more to a mobile device is inappropriate, largely because of the cost of downloading data. The cost of using mobile devices for learning has been the primary concern of students [1], especially in developing countries, because free Wi-Fi access points are not always available to mobile learners, and keeping a balance between cost and learning achievement makes them seek out just the right amount of learning content. Furthermore, because of the oft-decried problems of information overload and information redundancy, delivering large amounts of text content is challenging for mobile learners, especially for learning purposes.

Another challenge comes from mobile learners themselves, especially “next generation” or “Net generation” learners. Most “next generation” learners prefer multi-tasking and have short attention spans [2].

(16)

5.5 Implementation of Query-Focused Summarization Sys-

tems . . . 57

5.5.1 Model integration and sentence ranking . . . 59

6 SUMMARY OF THE RESULTS AND FINDINGS 61 6.1 Experimental Results . . . 61

6.1.1 Experiments in mobile learning settings . . . 61

6.1.2 Experiments in multi-document summarization 70 6.1.3 Experiments in model perplexity . . . 78

6.2 Prototype Applications . . . 78

6.3 Research Findings . . . 80

6.3.1 Findings for mobile learning . . . 80

6.3.2 Findings for multi-document summarization . 81 7 DISCUSSION 83 7.1 Implications . . . 83

7.2 Conclusions . . . 86

7.3 Future Perspectives . . . 87

BIBLIOGRAPHY 89

1 Introduction

1.1 RESEARCH BACKGROUND AND MOTIVATION

Mobile devices are rapidly becoming mainstream in today’s world because of their immediacy, variety and cost effectiveness. These unique characteristics of mobile devices and the swift evolution of mobile technologies might be the most important drivers for mo- bile learning. Consequently, mobile learning benefits from these unique merits and this fast evolvement, and grows up rapidly to be a major trend in today’s e-Learning. However, this rapid evolu- tion also gives rise to new challenges. Some particular challenges emphasize on the processing and delivery of the learning content.

From a learner’s perspective, the learning content, rather than the technology, is always the key element, even though technology has improved mobile learning significantly. Nowadays, most ’smart’ mobile devices are highly capable of processing multimedia, but adapting and developing content for mobile devices is still a significant problem. Plain text is still the main content format used in mobile learning settings [1]. However, delivering a text of several thousand words or more to a mobile device is inappropriate, largely because of the cost of downloading data on mobile devices. The cost of using mobile devices for learning has been the primary concern of students [1], especially in developing countries, because free Wi-Fi access points are not always available to mobile learners, and balancing cost against learning achievement leads them to seek out just the right amount of learning content. Furthermore, because of the oft-decried problems of information overload and information redundancy, delivering large amounts of text content makes learning challenging for mobile learners.

Another challenge comes from mobile learners themselves, especially “next generation” or “Net generation” learners. Most “next generation” learners prefer multi-tasking and have a short attention span [2]. They can perform several tasks simultaneously and shift their attention quickly from one to another, but would probably be overwhelmed or frustrated if asked to read a long report. To motivate them to engage with the learning content, many educators have started to supply shorter content in new curricula [2].

Miller’s information processing theory [3] also supports the need for shorter content. This theory has been widely applied in education, including e-Learning, and mobile learning, as a special case of e-Learning, can also draw theoretical support from it. The theory states that short-term memory can hold only about 7±2 chunks of information, where a chunk may be a word, a digit, or another meaningful unit. Based on this theory, large text-based content needs to be chunked into small units so that the information can be memorized more efficiently.
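Miller’s limit concerns memory rather than text layout, so the following is only one possible reading of the chunking guideline: a minimal Python sketch that assumes one sentence counts as one chunk and splits a text into evenly sized groups of at most seven sentences. The function `chunk_sentences` and its regex-based sentence splitter are hypothetical illustrations, not part of the summarization approach developed in this thesis.

```python
from __future__ import annotations

import math
import re


def chunk_sentences(text: str, max_chunk: int = 7) -> list[list[str]]:
    """Group the sentences of `text` into evenly sized chunks of at most `max_chunk` sentences."""
    # Naive sentence splitter (assumption): break after ".", "!" or "?" followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    # Use the fewest chunks that respect the limit, then spread sentences evenly across them.
    n_chunks = math.ceil(len(sentences) / max_chunk)
    base, extra = divmod(len(sentences), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append(sentences[start:start + size])
        start += size
    return chunks
```

For a ten-sentence text, `[len(c) for c in chunk_sentences(text)]` gives `[5, 5]` rather than `[7, 3]`; spreading the sentences evenly avoids leaving a short trailing fragment that may not stand well on its own.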

Text that has been chunked effectively should work well both on its own and together with the rest of the content. Mobile learning in particular places stricter demands on content length. Previous research demonstrates that short content can benefit mobile learning and ultimately enhance the learning experience by improving screen reading and increasing the speed of information retrieval and processing [4].

These research results may explain why condensing large content is necessary from an educational perspective. From a technological viewpoint, however, there is still much debate about how to shorten text-based content properly and effectively without losing its meaning.

To process learning content effectively, many rapid authoring tools have been developed for mobile learning. However, these tools are mainly intended to assist with instructional design, not to support mobile learners. In addition, because of the highly fragmented mobile technology landscape and rapidly evolving standards, there is no single solution that makes content work on every possible mobile device. The situation is compounded when reusing existing learning materials available on the web, as manual reformatting and summarization are both time-consuming and expensive. As a result, educators are forced to design new learning content or reformat existing materials for delivery on different types of mobile devices. Researchers have developed semantically aware learning object retrieval systems that automatically deliver learning content to learners [5]. Unfortunately, automatically placing semantic tags into learning content is difficult, and hence no significant application of semantic techniques is available to support the automatic labelling of learning content. An alternative solution is therefore recommended: automatically analyze the learning content, identify topics, construct knowledge of semantic and lexical relationships, and identify and synthesize important concepts into a summary for content delivery in mobile learning.

Automatic text summarization can reduce texts to their most important ideas in a particular context. Simply put, it is a group of processes that automatically extract and synthesize summaries from source texts by identifying important content and removing redundancy across a set of documents. Many techniques developed over the last decades have produced significant results in automatic text summarization [6–11]. In particular, research on topic models has produced good experimental results for automatic topic identification and text classification [12–14]. These improvements drew our attention to topic modeling techniques, especially models based on Latent Dirichlet Allocation (LDA) [15,16], which may provide a suitable solution to the problems mentioned previously. Despite their good results, these techniques still face particular challenges in automatic text summarization: precisely identifying similarities and differences, effectively modeling higher-order structures, and properly integrating linguistic intuitions into probabilistic models for document processing. Since topic and document processing can hardly do without modeling higher-order structure, integrating linguistic intuitions into probabilistic models poses a particular challenge. The scalability of increasingly complex models is also an important issue for working systems that provide ad hoc topic identification and text summarization [17].
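The two steps named in this definition, scoring importance and removing redundancy, can be illustrated with a minimal extractive sketch in Python. It is deliberately simple and is not the LDA-based method developed in this thesis: sentence importance is approximated by the average collection frequency of a sentence’s content words, and redundancy is removed by skipping any candidate whose bag-of-words cosine similarity to an already selected sentence exceeds a threshold, in the spirit of maximal marginal relevance. The stopword list, the scoring function, and the 0.5 threshold are illustrative assumptions.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); real systems use a fuller list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "with", "it", "that", "this"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(sentences: list[str], k: int = 2, max_sim: float = 0.5) -> list[str]:
    """Pick up to k important, mutually non-redundant sentences, returned in original order."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    collection_tf = Counter()
    for v in vectors:
        collection_tf.update(v)

    def importance(i: int) -> float:
        # Average collection frequency of the sentence's content words.
        v = vectors[i]
        n = sum(v.values())
        return sum(collection_tf[t] * c for t, c in v.items()) / n if n else 0.0

    ranked = sorted(range(len(sentences)), key=importance, reverse=True)
    chosen: list[int] = []
    for i in ranked:
        if len(chosen) == k:
            break
        # Redundancy removal: skip sentences too similar to one already chosen.
        if all(cosine(vectors[i], vectors[j]) <= max_sim for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in sorted(chosen)]
```

Given "Mobile learning needs short content.", "Mobile learning needs short content for learners.", and "Topic models find topics in documents.", a two-sentence summary keeps the first and third: the second ranks highly but is skipped as a near-duplicate of the first. A topic-model-based system could replace the frequency score with topic-aware relevance estimates while keeping the same selection loop.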

Research and theory on summarization have recognized that identifying critical information helps learners gain a better comprehension of learning materials [18] (p. 30). Text summarization can reduce texts to their most important ideas in a particular context, but reducing the content may negatively affect the meaning it conveys. Many automatic text summarization solutions in the literature aim to provide learning support [19, 20], but few have quantitatively investigated learners’ achievement with the assistance of such solutions, especially in a mobile learning context. Whether condensing content negatively affects its meaning therefore needs to be evaluated quantitatively within the context of mobile learning.

Given these challenges in automatic text summarization and the particular needs of content processing in mobile learning, this research studies how to identify topics accurately and integrate linguistic intuitions into a topic model for text summarization, with the ultimate aim of developing an approach that processes text content effectively so that content size matches the various characteristics of mobile devices. The outcomes of this research should be applicable to text-based content processing in mobile learning.

1.2 PROBLEM STATEMENT AND RESEARCH QUESTIONS

This research analyzes the effectiveness of automatic text summarization in mobile learning contexts and studies problems of automatic text summarization using techniques from Bayesian topic modeling and natural language processing (NLP). By examining these problems, this study develops a contextual text summarization approach. In addition, experimental evaluations have been conducted to quantitatively validate the results of this research in the mobile learning domain. Given the challenges in automatic text summarization and mobile content processing discussed previously, this research aims to study the effectiveness of automatic text summarization in mobile learning contexts and the problems of Bayesian topic modeling in automatic text summarization.

The problem statement of this thesis and the corresponding research questions are discussed below.

1.2.1 Problem statement

Properly summarized text can not only improve content processing in mobile learning but also help mobile learners retrieve and process information more efficiently. To reduce text size effectively while minimizing the negative impact on learning achievement when condensed texts are used in mobile learning settings, mobile learning needs advanced information technologies that align content size with the various characteristics of mobile devices and identify content while taking learners’ preferences and interests into account.

To develop an approach to this problem, this thesis poses the following research questions. The first research question helps to identify suitable technologies for this problem. The second focuses on the effectiveness of automatic text summarization in mobile learning contexts. The third aims to improve existing techniques or develop new methods to enhance summarization performance and, ultimately, benefit mobile learning.

1.2.2 Research questions

The first research question draws on the literature to identify suitable information technology techniques for tackling the problem of text summarization in the context of mobile learning.

RQ1: What kinds of advanced information technology techniques can effectively summarize text content in mobile learning so that the content size matches various mobile device characteristics?

As mentioned previously, current techniques for mobile learning content processing struggle to automatically label content with metadata that identifies its semantic meaning.

Answering the above question may help identify potential techniques that can solve these issues.

To get a clear picture of this question, a detailed overview of automatic text summarization and related techniques is necessary. A literature review with a gap analysis is presented in the next chapter. The answer to RQ1 will be based on this review of previous literature and will produce new knowledge on the use of automatic text summarization in mobile learning, especially in content processing. This knowledge can then support the development of a new approach or the enhancement of existing text summarization models. However, as discussed previously, reducing content size may negatively affect learners’ comprehension of learning materials. Thus, the second research question is:

RQ2: What is the effectiveness of automatic text summarization in a mobile learning context?

To answer this question properly, it has been refined and narrowed down into the following sub-questions, which address effectiveness in terms of the balance between learning achievement and the unique metrics of mobile devices:

RQ2.1: Does the summary of learning content contain enough information to support learners in reaching a sufficient level of learning achievement?

RQ2.2: What is the best compression rate for summaries, such that the summarized learning content is short enough to facilitate learning without a significantly negative impact on learners’ comprehension of the learning materials?

RQ2.3: Furthermore, is the important information indicated in a short summary still useful when the content size is aligned to satisfy the unique metrics of mobile devices?
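RQ2.2 and RQ2.3 rest on the notion of a compression rate. One common definition, assumed here, measures it as the ratio of summary length to source length in words, so a rate of 0.3 keeps roughly 30% of the original words. The minimal Python sketch below computes this ratio and greedily selects the highest-scoring sentences that fit a word budget derived from a target rate. The sentence scores are taken as given from any ranking model, and the function names are hypothetical.

```python
from __future__ import annotations


def word_count(text: str) -> int:
    """Number of whitespace-separated words."""
    return len(text.split())


def compression_rate(summary: list[str], source: list[str]) -> float:
    """Summary length divided by source length, both measured in words."""
    total = sum(word_count(s) for s in source)
    return sum(word_count(s) for s in summary) / total if total else 0.0


def fit_to_rate(source: list[str], scores: list[float], rate: float) -> list[str]:
    """Keep the highest-scoring sentences whose combined length fits within rate * source length."""
    budget = rate * sum(word_count(s) for s in source)
    chosen, used = [], 0
    # Visit sentences from most to least important; skip any that would overflow the budget.
    for i in sorted(range(len(source)), key=lambda i: scores[i], reverse=True):
        n = word_count(source[i])
        if used + n <= budget:
            chosen.append(i)
            used += n
    # Restore the original document order for readability.
    return [source[i] for i in sorted(chosen)]
```

Skipping a sentence that would overflow the budget and continuing with shorter ones is a cheap approximation of the underlying knapsack problem. With sentence lengths of 4, 2, 6, and 4 words, scores of 0.9, 0.2, 0.8, and 0.5, and a target rate of 0.5, it keeps the first and fourth sentences, for an achieved rate of exactly 0.5.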

To enhance summarization performance, language models, such as the query likelihood model and the relevance model, are integrated with a Bayesian topic model to provide a new approach to identifying the similarity between sentences and ordering the sentences for better coherence. Thus, the third research question is:

RQ3: By finding topics in a set of documents, how can topic models identify semantic similarity between words and determine sentence ordering properly, so as to improve relevance judgment and generate a coherent summary from a set of documents?
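The query likelihood model mentioned before RQ3 ranks a sentence by the probability that the sentence’s language model generates the query. The minimal Python sketch below assumes pre-tokenized text, a unigram model, and Dirichlet smoothing with prior μ, where P(w | s) = (c(w, s) + μ·P(w | C)) / (|s| + μ) and C is the whole collection. Dirichlet smoothing and the default μ = 100 are illustrative choices, not necessarily those of the proposed system, and the function name is hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def query_likelihood(query: list[str], sentence: list[str], collection: Counter, mu: float = 100.0) -> float:
    """Return log P(query | sentence) under a Dirichlet-smoothed unigram language model."""
    total = sum(collection.values())
    tf = Counter(sentence)
    score = 0.0
    for w in query:
        # Background probability P(w | C) from the whole collection.
        p_coll = collection[w] / total if total else 0.0
        if p_coll == 0.0:
            continue  # assumption: ignore query terms never seen in the collection
        score += math.log((tf[w] + mu * p_coll) / (len(sentence) + mu))
    return score
```

Sentences can then be ranked with `sorted(sentences, key=lambda s: query_likelihood(query, s, collection), reverse=True)`. Smoothing with the collection model keeps sentences that miss a query term from receiving a zero probability, which matters for short units such as sentences.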

The semantic associations of words in a document can be explored by analyzing the structure of the document. The statistical similarity between words in a document need not relate to a specific user information request (e.g., a query); rather, it involves word relations that are meaningful to the document and that can be inferred either from the document itself or with the help of its context. We define the context of a word in a document both as where the word is located in the document, as captured by lexical co-occurrences, and as a set of linguistically related words, such as collocations, phrases, and compound words. For example, the word "apple" by itself has several meanings when the context of the document is not considered; its basic lexical meaning is a kind of fruit. When it is interpreted in a particular context (the words preceding or following it, or the topics or subjects of the document), as in "apple pie", it is understood as the fruit. If the following word is "computer", however, "apple" is understood as a computer brand or a company name. Thus, the semantic meaning of a word is hard to determine precisely without consulting its context. We believe that the semantic association of words relates not only to their shared lexical meaning but also to their shared reference [21]. The topic or subject of discourse in a document is expected to reveal the context information of the document. Based on this assumption, the first sub-question is:

Sub-question:

RQ3.1: How does the contextual information of a document affect topics and topical decomposition, and how should this context be represented in topic models?
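The "apple" example above can be made concrete with lexical co-occurrence context. The Python sketch below is a simplified Lesk-style heuristic, not a topic model and not the contextual model proposed in this thesis: it collects the words within a small window around a target occurrence and picks the sense whose cue words overlap most with that window. The cue lists in `SENSE_CUES`, the window size of 2, and the function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical hand-written cue words for two senses of "apple".
SENSE_CUES: dict[str, set[str]] = {
    "fruit": {"pie", "juice", "tree", "eat"},
    "company": {"computer", "iphone", "stock", "laptop"},
}


def context_window(tokens: list[str], index: int, size: int = 2) -> Counter:
    """Bag of words within `size` positions on either side of tokens[index]."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return Counter(left + right)


def guess_sense(tokens: list[str], index: int, senses: dict[str, set[str]] = SENSE_CUES, size: int = 2) -> str | None:
    """Return the sense whose cues overlap most with the local context, or None if nothing overlaps."""
    ctx = context_window(tokens, index, size)
    best, best_overlap = None, 0
    for sense, cues in senses.items():
        overlap = sum(ctx[w] for w in cues)
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best
```

`guess_sense("she baked an apple pie".split(), 3)` returns `"fruit"`, while `guess_sense("he bought an apple computer yesterday".split(), 3)` returns `"company"`. In a topic model, such context could instead enter as co-occurrence or collocation features, which is the representation question RQ3.1 raises.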

Regarding the coherence problem in a generated summary, the second sub-question is:

RQ3.2: Can topics with contextual information be used as a valuable feature to identify sentence ordering?
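One simple way to use topics as an ordering feature, sketched here in Python as an assumption rather than the method evaluated in this thesis, is greedy chaining. Each sentence is represented by its topic mixture (for example, per-sentence topic proportions inferred by an LDA-style model, taken here as given input), and the next sentence placed is the unplaced one whose mixture is most similar, by cosine similarity, to the previously placed sentence. The function names and the choice of the first sentence are hypothetical.

```python
from __future__ import annotations

import math


def cosine(p: list[float], q: list[float]) -> float:
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def order_by_topics(theta: list[list[float]], start: int = 0) -> list[int]:
    """Greedy chaining: place next the unplaced sentence whose topic mixture is closest to the last placed one."""
    order = [start]
    remaining = set(range(len(theta))) - {start}
    while remaining:
        last = theta[order[-1]]
        # Iterate in index order so ties are broken deterministically by the lower index.
        nxt = max(sorted(remaining), key=lambda i: cosine(last, theta[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

With mixtures `[[0.9, 0.1, 0.0], [0.0, 0.2, 0.8], [0.7, 0.3, 0.0], [0.1, 0.1, 0.8]]` and the first sentence as the start, the sketch yields the order `[0, 2, 3, 1]`, keeping the two sentences dominated by the first topic adjacent. Greedy chaining is only locally optimal; a global ordering would require searching over permutations.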

Furthermore, as a kind of contextual information, learners’ preferences and interests might help identify topics and determine the topic associations between documents. Thus, the third sub-question is:

RQ3.3: Are learners’ preferences and interests valuable factors for indicating sentence relevance or importance in text summarization?

To answer the above research questions thoroughly, an in-depth literature review and the detailed methodologies employed in this thesis are presented in Chapters 2 and 3, respectively. Table 3.1 in Chapter 3 shows the associations among the research questions, the research methods, the sections that discuss these questions, and the publications in this thesis that answer them.

1.3 STRUCTURE OF THE THESIS

This thesis contains seven chapters and seven publications and is organized as follows. Chapter 1 introduces the motivation of this work and the background that gave rise to the problem statement and research questions. Chapter 2 reviews the literature on content processing for mobile learning, automatic text summarization, and topic modeling techniques. Chapter 3 discusses the research design, focusing on the methodology for answering the research questions from a technical perspective. Chapter 4 summarizes the publications related to this thesis and explains my specific contribution to them. Chapter 5 discusses in detail the design and implementation of a summarization system based on our proposed models. Chapter 6 describes the experiments conducted and summarizes the experimental results, followed by the findings and implications of the results. Finally, Chapter 7 summarizes this dissertation and presents the conclusions, followed by future perspectives.


2 Literature Review

2.1 CONTENT PROCESSING IN MOBILE LEARNING

Researchers in mobile learning have proposed various content processing techniques in recent years. Some focus on the presentation design of content to fit the small screens of mobile devices [22]. Others present an intelligent content strategy, namely "intelligent content", for automatically delivering content to mobile learners [23]. Still other techniques re-create the content by constructing new instructional design frameworks, such as m-SCORM and semantically aware learning object retrieval [5]. These techniques, including many rapid authoring tools, have been built to support mobile content development and delivery. Many of them may solve certain issues caused by small screens, limited network bandwidth and storage space, and similar restrictions of mobile devices, but they do not align well with a significant affordance of mobile learning: "on-the-fly" information access. In addition, none of them provides an effective way to improve content exchange between multiple mobile platforms.

Based on the survey results in Quinn’s [24] research, the most desired mobile learning feature was found to be "single content development for all mobile devices." Because the lack of standards across mobile platforms is an open issue that will not disappear soon, the problem of content exchange is likely to persist in mobile learning for some time. Although it is extremely hard to standardize mobile platforms, it is still possible to find ways to enhance the exchange of content between multiple mobile platforms. This is particularly true for plain-text content, because most mobile devices and platforms can process plain text without difficulty. However, as discussed in the previous section, processing a large amount of text on mobile de-
