
Scientific Journal of Information Engineering April 2014, Volume 4, Issue 2, PP.38-43

Mining of User Correlationship in a Mobile Reading and Social System

Yadong Fang1, Lei Zhang2, Jian Ye2,3#

1. Shandong Inspur Software Industry Co Ltd, Jinan Shandong 250011, China
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
3. Beijing Key Laboratory of Mobile Computing and Pervasive Device, Beijing 100190, China

#Email: jye@ict.ac.cn

Abstract

Mobile learning not only lets users access what they want to learn, but also gives them a way to communicate with one another. User correlation therefore plays an important role in finding highly relevant resources and potential friends. In this paper, a novel user correlation mining algorithm (UCMA) is proposed that analyzes users' reading histories and interaction records. To obtain the overall user correlation, the algorithm combines the similarity of two users' knowledge structures with the strength of the relationship between them. The algorithm is evaluated with data collected from the prototype system, and the experimental results show that it is feasible and effective for calculating user correlation.

Keywords: User Correlation; Mobile Learning; Knowledge Structure; Strength of Relationship

1 INTRODUCTION

With the development of ubiquitous computing and Internet technologies, mobile learning has become an important application. Users can easily access substantial electronic resources via an iPad, an Amazon Kindle or other smart devices. However, current learning technologies mainly focus on sharing learning resources and activities within a closed structure and neglect the dynamic relationships between users and learning resources. Recent research has made several attempts to address this problem, but the correlation between users remains unexplored even though it matters to both users and resource providers. In this paper, a novel user correlation mining algorithm (UCMA) is proposed that analyzes users' reading histories and interaction records to explore the correlation between users.

The rest of this paper is organized as follows. Section 2 reviews related work on knowledge structure and user correlation and points out how our approach differs. After clarifying some definitions, Section 3 outlines the architecture of UCMA. Section 4 describes user correlation exploration in detail. In Section 5, we evaluate the performance of UCMA using data collected from 80 volunteers over a period of 3 months. Finally, conclusions and future work are given in Section 6.

2 RELATED WORK

Recent work mainly focuses on the visualization of knowledge structure. [1] proposes a knowledge network model to make the representation of a researcher's knowledge more intuitive. The model includes knowledge points, knowledge stocks and the relationships between knowledge points. However, it simply uses the number of words two knowledge points share to represent their relationship, which is not accurate enough. [2] presents the CmapTools software as an example of how concept maps, a knowledge visualization tool, can be combined with recent technology to integrate knowledge and information visualization. [3] proposes a concept mapping tool named VCE that creates dynamic concept maps for users and supports visual interaction with concepts and documents. Its central idea is to make implicit knowledge structures explicit.

- 38 http://www.sjie.org

User correlation is widely used in collaborative recommender systems. Given a user-item matrix, [4] uses Pearson correlation or vector cosine similarity to calculate the similarity between two users. A user-based collaborative filtering algorithm can then be used for recommendations. [5] proposes an asymmetric similarity measure to identify a neighborhood whose traits are strongly similar to the behavior of an active user, which reduces the chance of generating irrelevant recommendations. There are two major differences between our work and these techniques. First, we use the transaction paths of related KPs and the residence time at each KP to calculate the knowledge structure similarity. Second, we obtain the overall user correlation by jointly considering users' knowledge structure and the strength of their relationship.

3 ALGORITHM ARCHITECTURE

We first define the terms related to knowledge structure that we use to model it and to explore user correlation. The architecture of our algorithm is then briefly described.

3.1 Preliminary

In this subsection we clarify five terms: knowledge point, related knowledge point, knowledge point correlation, transaction path of related knowledge points, and residence time at a knowledge point.

Knowledge Point (KP): A KP is usually a key figure, an important event or a piece of terminology. It is predefined by experts or extracted from learning resources using text mining. For each KP, we use text and pictures to present its detailed information, and we also attach related audio and video.

Related Knowledge Point: For each KP, we define its related knowledge points. Related KPs are likewise predefined or extracted from learning resources using text mining.

Knowledge Point Correlation (KPC): Suppose ki and kj are related KPs. The KPC between them is asymmetric, that is, the KPC of ki to kj is not the same as that of kj to ki, because the two KPs do not share the same set of related KPs. When calculating the KPC of ki to kj, we consider not only the number of visits that go from ki to kj but also the whole knowledge network.

Transaction Path of Related Knowledge Points: A user who is interested in a KP can open its detailed information and listen to the related audio or watch the related videos to deepen his understanding of it. More importantly, he can go on to visit its related KPs. The sequence of KPs visited in this way forms a transaction path.

Residence Time at Knowledge Point: The residence time at a KP is not just the time a user spends browsing its detailed information. We also take the correlation between KPs into consideration. For example, the residence time at ki is the time spent browsing ki plus the browsing time of each related KP kj multiplied by a coefficient. The coefficient is the KPC of ki to kj defined above.

3.2 Architecture of UCMA

Traditional standards for learning resources are limited to the sharing of materialized learning resources. They do not consider the human resources connected by those learning resources. An important open issue is therefore to build a cognitive network computing model based on user interaction and the learning process, and to enable the sharing of a dynamic social cognition network. In our work, we obtain user correlation by jointly considering users' knowledge structure and the strength of their relationship. The architecture consists of three processes: knowledge structure similarity calculation, interaction correlation calculation and user correlation measurement.

FIGURE 1: KNOWLEDGE STRUCTURE SIMILARITY CALCULATION

Learners browse KPs and form their own knowledge structure. As shown in Figure 1, the calculation of knowledge structure similarity (KSS) consists of four phases: browsing history representation, KP correlation calculation, residence time calculation and finally KSS calculation. If a user becomes interested in a certain learning resource, he can use the communication platform we provide to exchange ideas with others. We derive users' interaction correlation from their interaction history. As shown in Figure 2, user interaction correlation (UIC) includes two parts: statement correlation (SC) and private chat correlation (PCC). SC is drawn from the statements a user makes in public chat rooms, while PCC is drawn from the number of private chats he has with others. Finally, we combine KSS, SC and PCC, assigning a weight to each factor, to get the overall user correlation.

FIGURE 2: USER CORRELATION EXPLORATION

4 USER CORRELATION EXPLORATION

User correlation exploration is carried out in three steps: KSS calculation, UIC calculation and correlation measurement. Each step is described in detail in this section.

4.1 KSS Calculation

We select any two users, UA and UB, and calculate the KSS between them. The set of books UA has read is BookA = {bA1, bA2, ..., bAnA}, where nA is the number of books UA has read. The set of books UB has read is BookB = {bB1, bB2, ..., bBnB}, where nB is the number of books UB has read. The set of books both users have read is Bookcom = {b1, b2, ..., bncom}, where ncom is the number of such books. For each book bk in Bookcom we calculate a per-book KSS, denoted KSSbk(UA, UB). The KSS of UA and UB is then

$$KSS(U_A, U_B) = \frac{\sum_{k=1}^{n_{com}} KSS_{b_k}(U_A, U_B)}{n_A + n_B - n_{com}} \tag{1}$$
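As a minimal sketch of equation (1), the aggregation over common books could be written as follows. The function name `overall_kss`, the use of Python sets for the book collections and the sample values are assumptions made for illustration; the per-book scores are taken as given here.

```python
def overall_kss(books_a, books_b, kss_per_book):
    """Equation (1): sum the per-book KSS over the common books and
    divide by n_A + n_B - n_com (the number of distinct books read)."""
    common = books_a & books_b
    distinct = len(books_a) + len(books_b) - len(common)
    if distinct == 0:
        return 0.0  # assumption: no books read means no similarity
    return sum(kss_per_book[b] for b in common) / distinct


# Hypothetical data: U_A read 3 books, U_B read 4, and 2 are shared.
books_a = {"b1", "b2", "b3"}
books_b = {"b2", "b3", "b4", "b5"}
kss_per_book = {"b2": 0.5, "b3": 0.25}
result = overall_kss(books_a, books_b, kss_per_book)
print(result)
```

Because the denominator counts every distinct book either user has read, two users who share only a small fraction of their reading get a low overall KSS even if their per-book scores are high.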

To obtain the overall KSS of UA and UB, we apply the following three steps.

Step 1: KP correlation calculation. We first obtain all the KPs of book bk. Treating each KP as a node in a graph, we draw an edge between two KPs if they are related, so that the KPs together form a knowledge network graph. We use the random walk with restart (RWR) algorithm [6] to calculate the correlation between two KPs. RWR starts from one node in the graph and walks randomly along the edges. At each node, it either moves along a randomly chosen adjacent edge to the next node or, with a fixed probability, returns to the starting point. For a non-periodic, irreducible graph, the probability of reaching each node converges to a stationary distribution after a sufficient number of steps, and further iterations do not change it. The probability of reaching each node can then be regarded as its degree of relevance to the starting point. The RWR model can be represented as

$$c^{(t+1)} = (1 - a)\, S\, c^{(t)} + a\, q \tag{2}$$

In the above equation, the matrix c(t+1) is the probability distribution over the graph at step t+1. The matrix q is the initial state; it is a diagonal matrix with 1 on the main diagonal, i.e. the identity matrix. S is the transition probability matrix, whose entry Si,j is the probability that a walker currently at node i moves to node j in the next step. Si,j is given by equation (3).

$$S_{i,j} = \frac{Freq(i \to j)}{Freq(j)} \quad (i \ne j) \tag{3}$$

Freq(i→j) is the number of times users access KP j directly after visiting KP i, and Freq(i) is the number of times users access KP i. If KP i and KP j are not related KPs, Si,j is zero. a is the restart probability. For a non-periodic, irreducible graph, equation (2) converges after enough iterations. The correlation between KP i and KP j is then

$$Cor(i, j) = c^{+\infty}(i, j) \tag{4}$$
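Equations (2) to (4) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `rwr_correlation`, the restart probability of 0.15 and the toy frequencies are hypothetical, and reading Cor(i, j) as entry (i, j) of the iterated matrix is an interpretation of equation (4). The transition matrix follows equation (3) as printed.

```python
def rwr_correlation(nodes, freq_pair, freq_node, restart=0.15, steps=10):
    """Iterate c <- (1 - a) * S * c + a * q for a fixed number of steps.

    nodes     : list of KP identifiers
    freq_pair : {(i, j): Freq(i -> j)} for related KPs
    freq_node : {j: Freq(j)}
    restart   : restart probability a (value chosen for illustration)
    """
    n = len(nodes)
    idx = {kp: pos for pos, kp in enumerate(nodes)}

    # Equation (3): S[i][j] = Freq(i -> j) / Freq(j), zero for unrelated KPs.
    S = [[0.0] * n for _ in range(n)]
    for (i, j), count in freq_pair.items():
        if i != j and freq_node.get(j, 0) > 0:
            S[idx[i]][idx[j]] = count / freq_node[j]

    # q is the identity matrix; c starts from the same initial state.
    q = [[1.0 if r == col else 0.0 for col in range(n)] for r in range(n)]
    c = [row[:] for row in q]

    for _ in range(steps):
        nxt = [[0.0] * n for _ in range(n)]
        for r in range(n):
            for col in range(n):
                walk = sum(S[r][m] * c[m][col] for m in range(n))
                nxt[r][col] = (1 - restart) * walk + restart * q[r][col]
        c = nxt

    return {(nodes[r], nodes[col]): c[r][col]
            for r in range(n) for col in range(n)}


# Toy knowledge network with two related KPs.
nodes = ["i", "k"]
freq_pair = {("i", "k"): 2}
freq_node = {"i": 4, "k": 2}
cor = rwr_correlation(nodes, freq_pair, freq_node)
print(cor[("i", "k")], cor[("k", "i")])
```

The result is asymmetric, which matches the definition of KPC in Section 3.1. For a real knowledge network a matrix library would be the natural choice; nested lists are used here only to keep the sketch dependency-free.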

c+∞(i, j) is the probability of reaching KP j from KP i once the walk has reached a stable distribution. In our work, we find that c(t) in equation (2) has already converged at t = 10, so we stop after 10 iterations.

Step 2: Residence time calculation. We calculate the KP residence times of UA and UB from their transaction paths when they read book bk. For instance, when UA accesses KP i, his residence time at KP i is

$$t^{*}_{A_i} = t_{A_i} + \sum_{j} Cor(i, j)\, t_{A_j} \tag{5}$$

In the above equation, j ranges over the related KPs of KP i, $t_{A_i}$ is the time UA spends browsing KP i and, similarly, $t_{A_j}$ is the time UA spends reading KP j. Take the following path as an example: UA first browses KP i, then KP y, then returns to i, goes on to browse k and then k's related KP x.

    i → k → x
    ↓
    y

UA's residence time at KP i is then $t^{*}_{A_i} = t_{A_i} + Cor(i, y)\, t_{A_y} + Cor(i, k)\, t_{A_k}$, while $t^{*}_{A_k} = t_{A_k} + Cor(k, x)\, t_{A_x}$. If KP i appears in another transaction path, we simply add the two values of $t^{*}_{A_i}$ to get the final $t^{*}_{A_i}$.
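The example path can be worked through with a short sketch of equation (5). The function name `residence_times`, the browsing times and the correlation values are made up for illustration, and counting each adjacent pair once per path is an assumption.

```python
def residence_times(browse_time, transitions, cor):
    """Equation (5): t*_i = t_i + sum over related KPs j visited from i
    of Cor(i, j) * t_j. Only adjacent KPs are considered, and the raw
    browsing time t_j (not t*_j) is used on the right-hand side."""
    t_star = dict(browse_time)
    for i, j in set(transitions):  # assumption: each adjacent pair counts once
        t_star[i] += cor[(i, j)] * browse_time[j]
    return t_star


# Path from the example: i -> y, back to i, then i -> k -> x.
browse_time = {"i": 30.0, "y": 10.0, "k": 20.0, "x": 40.0}  # seconds, hypothetical
transitions = [("i", "y"), ("i", "k"), ("k", "x")]
cor = {("i", "y"): 0.5, ("i", "k"): 0.25, ("k", "x"): 0.1}  # hypothetical Cor values

t_star = residence_times(browse_time, transitions, cor)
print(t_star)
```

With these numbers the residence time at i is 30 + 0.5·10 + 0.25·20 = 40 and at k it is 20 + 0.1·40 = 24, while y and x keep their raw browsing times because nothing was visited from them.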

In equation (5), we use $t_{A_j}$ rather than $t^{*}_{A_j}$ for two reasons. First, i is very likely also a related KP of j, or a related KP of one of j's related KPs, so UA could browse i again in the same transaction path. Considering only two adjacent KPs rather than the whole transaction path makes the calculation of $t^{*}_{A_i}$ much easier. Second, the correlation of related KPs is a value between 0 and 1, sometimes much smaller than 1, so as the transaction path grows longer, later KPs have less and less influence on earlier ones.

Step 3: Single book KSS calculation. Suppose both UA and UB have read book bk. The set of KPs UA has read in book bk is KA = {kA1, kA2, ..., kAmA} and the set of KPs UB has read in book bk is KB = {kB1, kB2, ..., kBmB}, where mA and mB are the numbers of KPs UA and UB have read, respectively. The set of KPs both users have read is Kcom = {k1, k2, ..., kmcom}, where mcom is the number of such KPs. KSSbk(UA, UB) can then be represented as

$$KSS_{b_k}(U_A, U_B) = \frac{\sum_{i=1}^{m_{com}} \min\left(\frac{t^{*}_{A_i}}{t^{*}_{B_i}}, \frac{t^{*}_{B_i}}{t^{*}_{A_i}}\right)}{m_A + m_B - m_{com}} \tag{6}$$

In the above equation, mA + mB - mcom is the total number of distinct KPs UA and UB have read. $t^{*}_{A_i}$ and $t^{*}_{B_i}$ are the residence times of UA and UB at KP Ki (Ki ∈ Kcom). The term $\min\left(\frac{t^{*}_{A_i}}{t^{*}_{B_i}}, \frac{t^{*}_{B_i}}{t^{*}_{A_i}}\right)$ is the smaller of $t^{*}_{A_i}$ and $t^{*}_{B_i}$ divided by the larger one, so for positive residence times it lies between 0 and 1 and expresses how similar the residence times of UA and UB at KP Ki are. The sum $\sum_{i=1}^{m_{com}} \min\left(\frac{t^{*}_{A_i}}{t^{*}_{B_i}}, \frac{t^{*}_{B_i}}{t^{*}_{A_i}}\right)$ represents the overall similarity of the residence times UA and UB spend on the mcom common KPs. After obtaining KSSbk(UA, UB), we use equation (1) to get the overall KSS of UA and UB.
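A minimal sketch of equation (6) follows, assuming each user's residence times for one book are stored in a dictionary keyed by KP. The function name `kss_book`, the guard against non-positive times and the sample values are illustrative choices.

```python
def kss_book(t_star_a, t_star_b):
    """Equation (6): per-book knowledge structure similarity.

    t_star_a, t_star_b : {kp: residence time t*} for U_A and U_B.
    """
    common = t_star_a.keys() & t_star_b.keys()
    distinct = len(t_star_a) + len(t_star_b) - len(common)  # m_A + m_B - m_com
    if distinct == 0:
        return 0.0
    total = 0.0
    for kp in common:
        ta, tb = t_star_a[kp], t_star_b[kp]
        if ta > 0 and tb > 0:  # assumption: skip KPs with no recorded time
            total += min(ta / tb, tb / ta)  # smaller time over larger time
    return total / distinct


# Hypothetical residence times (seconds) for one book.
t_a = {"k1": 40.0, "k2": 10.0, "k3": 5.0}
t_b = {"k1": 20.0, "k2": 10.0, "k4": 8.0}
score = kss_book(t_a, t_b)
print(score)
```

Here k1 contributes 0.5 and k2 contributes 1.0, and four distinct KPs were read in total, so the per-book KSS is 1.5 / 4 = 0.375. Identical reading behavior gives a score of 1.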

4.2 UIC Calculation

UIC includes two parts: SC and PCC. Suppose the set of chat rooms (CRs) UA has joined is CRA = {crA1, crA2, ..., crAlA}, where lA is the number of CRs UA has joined, and the number of statements UA makes in each of them is sAi (i = 1, 2, ..., lA). Likewise, the set of CRs UB has joined is CRB = {crB1, crB2, ..., crBlB}, where lB is the number of CRs UB has joined, and the number of statements UB makes in each of them is sBi (i = 1, 2, ..., lB). The set of CRs both UA and UB have joined is CRcom = {cr1, cr2, ..., crlcom}, where lcom is the number of such CRs. The SC of UA and UB, with the sum taken over the common CRs, can then be formulated as

$$SC(U_A, U_B) = \frac{\alpha(l_{com})}{l_A \cdot l_B} \sum_{i=1}^{l_{com}} \min(s_{A_i}, s_{B_i}) \tag{7}$$

α(lcom) is a coefficient that depends on lcom and grows as lcom increases. For instance, in our experiment we find that the measure performs well when $\alpha(l_{com}) = l_{com}^{2}$. Dividing by the factor lA·lB compensates for the unbalanced amount of data across users. We use the number of times UA and UB have had a private chat to represent PCC(UA, UB).

Since UA shares many CRs with other users, we use the statement correlation ratio (SCR) to represent the influence of UB on UA relative to other users. Suppose the set of users who have ever joined the same CR as UA is ISCRA = {ISCR1, ISCR2, ...}; then

$$SCR(U_A, U_B) = \frac{SC(U_A, U_B)}{\sum_{X \in ISCR_A} SC(U_A, X)} \tag{8}$$

Likewise, since UA has many private chats with other users, we use the private chat correlation ratio (PCCR) to represent the influence of UB on UA relative to other users. Suppose the set of users who have ever had a private chat with UA is HPCA = {HPC1, HPC2, ...}; then

$$PCCR(U_A, U_B) = \frac{PCC(U_A, U_B)}{\sum_{Y \in HPC_A} PCC(U_A, Y)} \tag{9}$$
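Equations (7) to (9) can be illustrated with the sketch below. The function names, the dictionary layout (chat-room id mapped to statement count) and all sample counts are assumptions; α defaults to the squared form reported for the experiment.

```python
def statement_correlation(stmts_a, stmts_b, alpha=lambda l_com: l_com ** 2):
    """Equation (7). stmts_x maps chat-room id -> statements made by that user."""
    if not stmts_a or not stmts_b:
        return 0.0
    common = stmts_a.keys() & stmts_b.keys()
    shared = sum(min(stmts_a[cr], stmts_b[cr]) for cr in common)
    return alpha(len(common)) / (len(stmts_a) * len(stmts_b)) * shared


def correlation_ratio(value_with_b, values_with_all):
    """Equations (8) and (9): U_B's share among all of U_A's partners."""
    total = sum(values_with_all)
    return value_with_b / total if total else 0.0


# Hypothetical statement counts per chat room.
stmts_a = {"cr1": 5, "cr2": 3, "cr3": 1}
stmts_b = {"cr1": 2, "cr2": 4}
stmts_c = {"cr3": 7}

sc_ab = statement_correlation(stmts_a, stmts_b)  # 4 / (3 * 2) * (2 + 3)
sc_ac = statement_correlation(stmts_a, stmts_c)  # 1 / (3 * 1) * 1
scr_ab = correlation_ratio(sc_ab, [sc_ab, sc_ac])

# Hypothetical private-chat counts: U_A chatted 6 times with U_B, 2 with U_C.
pccr_ab = correlation_ratio(6, [6, 2])
print(scr_ab, pccr_ab)
```

Both ratios normalize by everything UA has exchanged with any partner, so the values for all of UA's partners sum to 1 and are comparable across active and inactive users.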

4.3 Correlation Measurement

When calculating user correlation, we take into account the three factors above: KSS, SCR and PCCR. To combine them, we need to determine how important each factor is, so a weight is assigned to each. Equation (10) shows the complete formula for user correlation, where λ1, λ2 and λ3 denote the weights.

$$Correlation(U_A, U_B) = \lambda_1 \cdot KSS(U_A, U_B) + \lambda_2 \cdot SCR(U_A, U_B) + \lambda_3 \cdot PCCR(U_A, U_B) \tag{10}$$

In our work, we conduct an online survey to obtain the weights. 100 users rate how important they think each factor is on a five-point scale: a score of 1 means the factor is of very little importance, and a score of 5 means it is very important. Table 1 shows the result, where each weight is the average score of the corresponding factor. From the result, we get λ1 = 4.29, λ2 = 2.14 and λ3 = 3.

TABLE 1: RESULT OF ONLINE SURVEY TO OBTAIN THE WEIGHTS

         KSS            SCR            PCCR
         Sum    Avg     Sum    Avg     Sum    Avg
Total    429    4.29    214    2.14    300    3.00
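The weighted combination in equation (10) can be sketched with the survey weights from Table 1. The function name and the input scores are hypothetical, and the weights are used exactly as reported; whether they are normalized before use is not stated.

```python
# Weights from Table 1: average importance scores from the online survey.
WEIGHTS = {"kss": 429 / 100, "scr": 214 / 100, "pccr": 300 / 100}


def user_correlation(kss, scr, pccr, weights=WEIGHTS):
    """Equation (10): weighted sum of the three correlation factors."""
    return (weights["kss"] * kss
            + weights["scr"] * scr
            + weights["pccr"] * pccr)


# Hypothetical factor values for one pair of users.
value = user_correlation(kss=0.2, scr=0.5, pccr=0.25)
print(round(value, 3))
```

Since the weights are not normalized in this sketch, the result is not bounded by 1; it is only meaningful for ranking candidate users against each other.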

5 EXPERIMENT

We develop our prototype system on the Android platform. We randomly select 8 users from our system, who form a user group UG = {U1, U2, ..., U8}. We then recruit 80 volunteers to conduct the experiment. For each user in UG, we use our approach to calculate the correlation between him and the other users in the system. The N users with the largest correlation are then selected to form a new group UG’ = {U’1, U’2, ..., U’N}. Here N equals 7, so there are eight groups in all. We compare two methods. The KSS method considers only KSS, while the KSS-UIC method considers both KSS and UIC. Each method is evaluated with the normalized discounted cumulative gain (nDCG), which gives two sets of nDCG values. The result is presented in Figure 3. Note that a perfect ranking algorithm has an nDCG of 1. The mean nDCG of the KSS-UIC method is 0.94, which is close to 1, so the KSS-UIC method performs well. Moreover, the nDCG of the KSS-UIC method is on average larger than that of the KSS method. In other words, introducing UIC brings a real advantage.

FIGURE 3: COMPARISON OF NDCG BETWEEN KSS-UIC AND KSS METHOD
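For readers unfamiliar with the metric, nDCG can be computed as in the sketch below. The paper does not state which DCG variant or relevance grades were used, so the common log2-discounted form and the sample relevance lists are assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: rel_i / log2(i + 1) summed over ranks i = 1..N."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades for N = 7 recommended users, in ranked order.
perfect = [3, 3, 2, 2, 1, 1, 0]
shuffled = [2, 3, 3, 1, 2, 0, 1]
print(ndcg(perfect), ndcg(shuffled))
```

A ranking that already lists users in order of decreasing relevance scores exactly 1, and any other ordering of the same grades scores lower.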

6 CONCLUSIONS

In this paper, we propose UCMA to mine the correlation between users by considering their knowledge structure and the strength of their relationship. We use the KPs a user has read and the residence time at each KP to represent his knowledge structure. By assigning different weights to KSS, SCR and PCCR, we obtain the overall user correlation. In the future, we intend to extend our work in two directions. First, we aim to improve the performance of UCMA by taking new features into account, such as the residence time at each page or chapter and dynamic feedback on the weights of KSS, SCR and PCCR. Second, we would like to develop new applications, such as personalized friend recommendation and learning resource recommendation.

7 ACKNOWLEDGEMENT

This work is supported by the National Natural Science Foundation of China (61070109) and the Opening Project of Beijing Key Laboratory of Mobile Computing and Pervasive Device.

REFERENCES

[1] Jianbin Sun, Pengzhu Zhang. Visualization of Researcher's Knowledge Structure Based on Knowledge Network[A]. International Congress of Inborn Errors of Metabolism[C]. San Diego, California: SIMD Press, 2009, 2067-2071

[2] Alberto J. Canas, Roger Carff, Greg Hill, et al. Concept Maps: Integrating Knowledge and Information Visualization[J]. Knowledge and Information Visualization, 2005, 3426: 205-219

[3] Xia Lin, Yen Bui, Dongming Zhang. Visualization of Knowledge Structures[A]. IEEE International Conference on Information Visualization[C]. Sacramento, California: IEEE Press, 2007, 476-484

[4] Xiaoyuan Su, Taghi M. Khoshgoftaar. A Survey of Collaborative Filtering Techniques[J]. Advances in Artificial Intelligence, 2009: Article ID 421425

[5] Marta Millan, Maria Trujillo, Edward Ortiz. A Collaborative Recommender System Based on Asymmetric User Similarity[A]. Proceedings of the 8th International Conference on Intelligent Data Engineering and Automated Learning[C]. Birmingham, UK: Springer, 2007(4881), 663-672

[6] Hanghang Tong, Christos Faloutsos, Jia-Yu Pan. Fast Random Walk with Restart and Its Applications[A]. International Conference on Data Mining. Las Vegas, USA: IEEE Press, 2009, 613-622