
The Improvement and Application of the Collaborative Filtering Algorithm

2016-10-10 Source: 51Due faculty team Category: Essay samples

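Sample computer science essay: "The improvement and application of collaborative filtering algorithm". This paper describes how, with the arrival of the e-commerce era, e-commerce has become the best channel for producers to market directly to consumers. Personalized commodity recommendation gives customers an engaging shopping experience and is also a highly efficient marketing method. Personalized recommendation involves the improvement and application of the collaborative filtering algorithm, and improving the algorithm can noticeably improve the user's shopping experience.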


With the rapid development of electronic commerce, consumer-facing commodity recommendation is now regarded as the best way to do fast marketing. This article uses a collaborative filtering algorithm for personalized commodity recommendation and proposes an improved collaborative filtering algorithm with the objective of improving user experience. The superiority of the algorithm is also demonstrated by an experiment on actual data from an e-commerce website.

I. INTRODUCTION

Personalized commodity recommendation analyzes customers' interests and then pushes the commodities or promotions that may interest them through web page push, email, short message and other media. It is a highly efficient marketing method and is favored in current electronic commerce[1].

Providing different services and content to different customers is the key function of personalized service. For an e-commerce website, the goal of personalization is to infer customers' consumption tendencies from their access records, and such personalized recommendation can give customers a high-quality experience[2].

The main content of this article is commodity filtering using the collaborative recommendation method. Technically, collaborative recommendation infers a customer's interests by analyzing the recent access records of certain specific customers[3]; for example, two customers who usually purchase clothes of similar styles generally belong to the same consumer group.

II. COMMODITY PERSONALIZED RECOMMENDATION

The method of commodity personalized recommendation

Memory-based collaborative filtering method:

The core idea of the memory-based collaborative filtering method is to import consumers' consumption records into the adjacency matrix, and then to obtain the current consumer's interests by analyzing the interests of neighboring consumers.

Model-based collaborative filtering method:

The core idea of the model-based collaborative filtering method is to imitate consumers' consumption behavior with a mathematical model: consumers' interests are used as the determining factors, and established rules are then used to forecast upcoming purchasing behavior. In this method, clustering technology from data mining is used to generate the commodity categories; the features of a specific category are indicated by the data at the center of the cluster.

Collaborative filtering algorithm

In the collaborative filtering algorithm, the similarity between the target commodity and the existing clusters is calculated first, and the recommended commodities are then searched for in the cluster with the highest similarity. The search uses the adjacency matrix to find the commodities with the highest user ratings. The concrete method is shown below:

A. Calculate the adjacency matrix according to the target commodity

As the commodity categories are generated by clustering, the features of a cluster imply that the commodities most similar to the target commodity usually lie among the adjacent commodities of that cluster. These commodities and their similarities to the target commodity form the adjacency matrix, which is then treated as the search space for recommended commodities.

B. Generate recommended commodities on the basis of consumers' scores

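The matrix method can be used to quantify user ratings. It not only describes a consumer's rating of a specific commodity directly, but also supports analysis of customer ratings. Generally speaking, $R_{m \times n}$ stands for a matrix with m row vectors and n column vectors, such as:

$$R_{m \times n} = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{m1} & r_{m2} & \cdots & r_{mn} \end{pmatrix}$$

In the above matrix, the m row vectors stand for m customers and the n column vectors stand for n commodities. An arbitrary element $r_{ij}$ of the matrix stands for the rating value given by consumer i to commodity j.

The similarity of two commodities is obtained from the consumer ratings of these two commodities. Each commodity's ratings form a vector in m-dimensional space; that is to say, the rating vectors of commodities i and j are:

$$\vec{i} = (r_{1i}, r_{2i}, \ldots, r_{mi}), \qquad \vec{j} = (r_{1j}, r_{2j}, \ldots, r_{mj})$$

The similarity of two commodities is defined by the cosine of the angle between these vectors:

$$sim(i, j) = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\|\vec{i}\| \, \|\vec{j}\|}$$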

The cluster calculation is based on the similarity of consumer ratings and commodities and on the established commodity clusters. Its objective is to guarantee high similarity among commodities of the same category and low or no similarity between commodities of different categories.
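As a minimal sketch of the cosine similarity between two commodities, the snippet below compares columns of a small rating matrix. The matrix values, the helper names `cosine_similarity` and `column`, and the convention that 0 means "not rated" are illustrative assumptions, not data or names from the article.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: a commodity nobody rated is treated as similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)

def column(matrix, j):
    """Rating vector of commodity j (one entry per consumer)."""
    return [row[j] for row in matrix]

# Hypothetical ratings: 4 consumers (rows) x 3 commodities (columns).
ratings = [
    [5, 4, 0],
    [4, 5, 1],
    [0, 1, 5],
    [1, 0, 4],
]

sim_01 = cosine_similarity(column(ratings, 0), column(ratings, 1))
sim_02 = cosine_similarity(column(ratings, 0), column(ratings, 2))
print(round(sim_01, 3), round(sim_02, 3))
```

Commodities 0 and 1 are rated highly by the same consumers, so their similarity is much higher than that of commodities 0 and 2.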

A. Establish the consumer interest model

As consumer interest data exists in the form of web page access records, a keyword vector is chosen to represent consumer interest. The representation is as follows: $k_i$ stands for the ith keyword in the document, $v_i$ stands for the weight of that keyword in the document, and a document with n keywords can then be represented as $\{(k_1, v_1), (k_2, v_2), \ldots, (k_n, v_n)\}$. The weights of the keywords in the document are usually calculated from how closely the keywords match the commodities.

B. Capture consumer interest

Capturing consumer interest here means tracking the dynamic status of consumer interest by calculating the category differences of commodities. To distinguish consumers' current interests from their previous interests, the system divides the consumer-related clusters into current clusters and previous clusters. Meanwhile, to reduce the amount of old consumer data, the system sets a time interval as the reference window; data is treated as effective only if it lies within the reference window.

Table 1 shows the consumption record of customer user1 over the past year. Every three months of data was used as an original cluster, so there are 10 clusters in this example. In the following table, 1 means that the clustering analysis of the previous data found the consumer interested in that commodity category, while 0 means that the system caught no interest in it.

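Table 1. Consumption record of user1 over the past year

| Month | Commodity category 1 | Commodity category 2 | Commodity category 3 |
|-------|----------------------|----------------------|----------------------|
| 1-3   | 1 | 0 | 0 |
| 2-4   | 1 | 0 | 0 |
| 3-5   | 1 | 1 | 0 |
| 4-6   | 1 | 1 | 1 |
| 5-7   | 1 | 0 | 0 |
| 6-8   | 1 | 0 | 1 |
| 7-9   | 0 | 0 | 1 |
| 8-10  | 0 | 0 | 1 |
| 9-11  | 1 | 0 | 0 |
| 10-12 | 1 | 0 | 0 |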
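The ten rows of Table 1 correspond to three-month windows that slide forward one month at a time. The sketch below shows how such windows can be generated and how a 0/1 flag could be derived per window. The helper names and the rule "any purchase inside the window counts as interest" are simplifying assumptions; in the article the flags come from a clustering analysis.

```python
def sliding_windows(n_months, width):
    """Return (first_month, last_month) for every window of `width` months."""
    return [(start, start + width - 1) for start in range(1, n_months - width + 2)]

def interest_flags(purchase_months, windows):
    """1 if any purchase month falls inside the window, else 0 (assumed rule)."""
    return [int(any(lo <= m <= hi for m in purchase_months)) for lo, hi in windows]

windows = sliding_windows(12, 3)

# Hypothetical example: a single purchase in month 5.
flags = interest_flags({5}, windows)

for (lo, hi), flag in zip(windows, flags):
    print(f"{lo}-{hi}: {flag}")
```

A twelve-month year with a three-month window yields 12 - 3 + 1 = 10 windows, which matches the 10 clusters mentioned in the text.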

From the consumers' clustered consumption records, the system analyzes which categories of commodities the consumers paid attention to. Take user1 as an example: category 1 commodities appear in six consecutive windows (months 1-3 through 6-8), which indicates that the consumer has a strong interest in this category. To calculate the degree of consumer interest, the system employs the Jaccard coefficient, as presented by Han and Kamber (2001), to indicate the difference between categories, that is, the degree to which a consumer chose one specific category of commodities instead of the others:

$$d(i, j) = \frac{r + s}{q + r + s}$$

Here $d(i, j)$ stands for the difference between commodities i and j. In the above formula, r is the number of entries that have value 1 for commodity i but value 0 for commodity j; s is the number of entries that have value 0 for commodity i but value 1 for commodity j; q is the number of entries that have value 1 for both commodity i and commodity j. By this definition, a higher value of $d(i, j)$ indicates a greater difference between commodities i and j.
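To make the counts r, s and q concrete, the sketch below applies the usual asymmetric-binary Jaccard dissimilarity, (r + s) / (q + r + s), to the three category columns of Table 1. Treating each time window as one entry, and returning 0 when both columns are all zeros, are assumptions of this sketch.

```python
def jaccard_dissimilarity(x, y):
    """(r + s) / (q + r + s) for two equal-length 0/1 sequences."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)  # 1 in both
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)  # 1 only in x
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)  # 1 only in y
    if q + r + s == 0:
        # Assumption: two all-zero columns are treated as not different.
        return 0.0
    return (r + s) / (q + r + s)

# Columns of Table 1, one entry per window from 1-3 to 10-12.
category_1 = [1, 1, 1, 1, 1, 1, 0, 0, 1, 1]
category_2 = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
category_3 = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0]

d_12 = jaccard_dissimilarity(category_1, category_2)
d_13 = jaccard_dissimilarity(category_1, category_3)
d_23 = jaccard_dissimilarity(category_2, category_3)
print(d_12, d_13, d_23)
```

For categories 1 and 2, r = 6, s = 0 and q = 2, so the difference is 6/8 = 0.75.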

C. Build collaborative filtering algorithm architecture

Four elements are needed to build a collaborative filtering system: consumers U, commodity items I, consumer ratings P, and the set of commodities recommended to consumers L. The process of building the collaborative filtering algorithm can be divided into the following steps:

First step: build the consumer rating matrix from the customers' consumption records

Here, the matrix $R_{m \times n}$ is defined as the consumer rating matrix; the element $r_{ij}$ stands for the rating of the jth commodity by the ith consumer.
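A minimal sketch of the first step, assuming each consumption record reduces to a (user ID, product ID, score) triple. The function name `build_rating_matrix`, the sample records, and the use of 0 for "no rating" are hypothetical choices for illustration.

```python
def build_rating_matrix(records):
    """Turn (user_id, product_id, score) triples into a dense rating matrix.

    Rows follow the sorted user IDs and columns follow the sorted product IDs.
    Missing ratings are stored as 0 (an assumption of this sketch).
    """
    users = sorted({user for user, _, _ in records})
    items = sorted({item for _, item, _ in records})
    row = {user: i for i, user in enumerate(users)}
    col = {item: j for j, item in enumerate(items)}
    matrix = [[0] * len(items) for _ in users]
    for user, item, score in records:
        # If a pair appears more than once, the later record wins.
        matrix[row[user]][col[item]] = score
    return users, items, matrix

# Hypothetical consumption records.
records = [(1, "A", 5), (1, "B", 3), (2, "A", 4), (3, "C", 2)]
users, items, matrix = build_rating_matrix(records)
print(users, items, matrix)
```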

Second step: calculate the adjacency matrix based on the consumer interest matrix

Cosine similarity is used to calculate the adjacency set of the current consumer's interests:

$$sim(a, b) = \frac{\sum_{i=1}^{m} r_{ia} \, r_{ib}}{\sqrt{\sum_{i=1}^{m} r_{ia}^{2}} \, \sqrt{\sum_{i=1}^{m} r_{ib}^{2}}}$$

Here $sim(a, b)$ is the similarity of commodity a and commodity b, the sums run over the m consumers, $r_{ia}$ stands for the rating of commodity a by the ith consumer, and $r_{ib}$ stands for the rating of commodity b by the ith consumer.

Using this method, the system calculates the similarity between the target commodity and the items in the consumer rating matrix generated in the first step, and the result is used to calculate the adjacency matrix of the current consumer interest matrix.

Third step: generate a recommendation candidate set with k commodities

For the adjacency matrix generated in the second step, the system tries to find the most likely commodities according to consumer ratings; that is to say, the system forecasts the rating a consumer would give to an assumed target commodity. The forecast rating of consumer i on commodity category j is defined as:

According to this method, the system calculates the rating of consumer i for all the candidate commodities in the adjacency matrix, and the k commodities with the highest ratings are chosen as the recommended commodities.
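The three steps can be sketched end to end as follows. The forecast used here is a similarity-weighted average of the consumer's existing ratings, a common item-based choice that is assumed for illustration and is not necessarily the article's definition. The rating values and function names are also hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def predict(ratings, user, target):
    """Assumed forecast: similarity-weighted average of the user's own ratings."""
    target_col = [row[target] for row in ratings]
    num = den = 0.0
    for j, r in enumerate(ratings[user]):
        if j == target or r == 0:  # skip the target itself and unrated items
            continue
        sim = cosine(target_col, [row[j] for row in ratings])
        num += sim * r
        den += abs(sim)
    return num / den if den else 0.0

def top_k(ratings, user, k):
    """The k unrated commodities with the highest forecast rating."""
    unrated = [j for j, r in enumerate(ratings[user]) if r == 0]
    unrated.sort(key=lambda j: predict(ratings, user, j), reverse=True)
    return unrated[:k]

# Hypothetical ratings: 5 consumers x 4 commodities, 0 = not rated.
ratings = [
    [5, 0, 0, 1],
    [4, 5, 1, 0],
    [5, 4, 0, 1],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
]
print(top_k(ratings, user=0, k=1))
```

Consumer 0 likes commodity 0, and commodity 1 is rated most like commodity 0 by the other consumers, so commodity 1 ranks ahead of commodity 2.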

D. Weighted Operation Strategy for Consumer Interest Matrix

Based on the collaborative filtering framework constructed in the previous part, a weighted operation should be performed on the commodities in the consumer interest matrix. The objective of the weighted operation is to distinguish which kinds of commodities are more attractive to the target consumers, so that other commodities need not be considered.

The commodity sales records saved in the database are the foundation for the weighted operation on the consumer interest matrix. Weights are assigned to the commodities in the consumer interest matrix based on purchase frequency and relevance; as a result, the commodities that consumers are most likely to purchase have higher weights in the current consumer interest matrix.

Given the complexity of matrix weighting and the features of the collaborative filtering algorithm, this article proposes a new weighted operation for the consumer interest matrix. Its main process is shown below:

First, the column vectors of the consumer interest matrix are selected as the original elements of the weighted matrix; that is to say, the original weighted matrix L(k) has k elements calculated from k commodities in the consumer interest matrix. After the set of existing commodity purchases in the database is taken as the training set C, the matrix L(k) is traversed according to the given minimum support S and reliability T. The weighted operation proceeds as follows:

① When the average value of a subset of L(k) is less than the minimum support S, this subset is set as a candidate set.

② When the average value of the transposed matrix of a subset of L(k) is greater than the reliability T, this subset is set as a candidate set.

③ When both ① and ② hold, the current subset is compared with the training set C, and the highest average value of C is added to the elements of the subset.

Calculation starts only if the above three conditions are fulfilled for the current matrix; otherwise the current matrix is transposed and set as a candidate training set, waiting for the operation on the next consumer interest matrix.

Note that the purpose of transposing the current matrix and its subsets is to allow commodities of the same category to be compared with one another in the next step; comparing commodities from different categories would be meaningless.

E. Consumer Interest Targeted Forgetting Rule

After the weighted operation on the consumer interest matrix, the algorithm should judge, according to the access time series of commodities, whether consumers are still interested in the commodities they previously paid attention to. If a consumer has not accessed a commodity for a long time, the consumer is assumed to be less interested in it than before. The key step of the algorithm is therefore a forgetting rule for commodities of consumer interest.
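As a rough illustration of the idea of forgetting, the sketch below lowers a commodity's weight the longer it goes without being accessed. It uses plain exponential decay, a common nonlinear forgetting curve chosen here as an assumption; it is a stand-in, not the article's forgetting-rate definition. The commodity names, weights, idle counts and the rate 0.3 are all hypothetical.

```python
import math

def decayed_weight(weight, idle_cycles, forgetting_rate):
    """Weight after `idle_cycles` traversals without an access.

    Assumed form: exponential decay, weight * exp(-rate * idle_cycles).
    """
    return weight * math.exp(-forgetting_rate * idle_cycles)

# Hypothetical weights from the consumer interest matrix.
weights = {"coat": 0.9, "scarf": 0.6}
# Hypothetical number of traversals since each commodity was last accessed.
idle = {"coat": 6, "scarf": 0}

current = {name: decayed_weight(w, idle[name], 0.3) for name, w in weights.items()}
print(current)
```

Under these assumed numbers, the long-ignored coat ends up with a lower weight than the recently accessed scarf even though it started higher.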

For a weighted consumer interest matrix, the consumer's forgetting rate for the current commodity is usually set to an empirical value within a fixed range. For a specific moment t, the forgetting rate of the consumer for the commodity is defined as:

The above definition depends on the weighted value of the commodity in the consumer interest matrix.

When the consumer interest matrix is traversed at moment t (the traversal at moment t is regarded as the t-th traversal of the matrix), the algorithm evaluates the forgetting rate of each commodity in the matrix and saves it on a stack. The recommendation possibility of the commodity to the consumer is:

F. Nonlinearity Forgetting Matrix Based Weighted Training

Because the forgetting possibility of a commodity under the consumer interest targeted forgetting rule is nonlinear in time, the algorithm can avoid the continual accumulation of commodities of the same category and lets other commodities enter the recommendation range. Based on the consumer forgetting matrix, this article proposes a new weighted training method.

The main process of the new algorithm is to randomly inspect the weighted matrix of certain commodity categories based on the recommendation possibility of each commodity. When weighting the commodities in the consumer interest matrix, the algorithm takes full account of this possibility: at one extreme the consumer accepts the recommended commodity absolutely, while at the other the recommended commodity is absolutely rejected. Between these extremes, the algorithm calculates the consumer interest matrix and its weighted matrix to obtain the largest value from the weighted matrix. Meanwhile, to avoid premature convergence of the recommended commodities, when the tracked value reaches a pre-set value, it is reset in the following way:

Therein, one term stands for the calculation count of the current commodity at the current moment, while the other stands for a random value drawn from a given interval.

Based on the above method, if the pre-set value is 5, the algorithm avoids recommending a commodity to the consumer for 5 calculation cycles while traversing the customers' consumption records. In practice, this places the commodities already consumed by the customers in an alternative set instead of recommending them to the consumers again.

IV. ALGORITHM EXPERIMENT AND ANALYSIS

Actual data from a specialized e-commerce website was chosen for the verification experiment of the improved algorithm. The consumer purchasing data is collected first; the data items are shown in the following table.

Table 2. Data sheet of consumer activity information

| No. | Field   | Type    | Meaning                      |
|-----|---------|---------|------------------------------|
| 1   | ID      | Number  | User identifier              |
| 2   | Buy     | Boolean | Purchased or not             |
| 3   | Product | Number  | Commodity identifier         |
| 4   | Price   | Int     | Commodity transaction price  |
| 5   | Type    | Char    | Commodity type               |
| 6   | Time    | Time    | Purchase date                |
| 7   | Score   | Int     | Commodity rating by consumer |
| …   | …       | …       | …                            |

This experiment adopts a contrastive approach. One run uses the forgetting-factor-based collaborative filtering method proposed in this article; the other uses the common K-means method. Both experiments are run on the consumer access records of the e-commerce website (2982 records) and their results are compared. The quantized commodities are described as nodes. First, the node s with the highest recommendation possibility among all nodes is selected and set as the source node, and the consumer interest matrix vectors are calculated by the method described above. The elements of these vectors stand for the effect of this node on its related nodes. Next, the node t that is least affected by node s is found. Then s and t are set as the initial cluster centers for the K-means clustering process. A hierarchical clustering method is then used to verify whether the resulting clusters are suitable. The partition result of the commodity clusters is shown in the following figure. It is easy to see that the partition results of the two methods are quite similar.

Figure 1 Comparison of clustering results
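The seeding described above, with two chosen nodes s and t used as the initial K-means centers, can be sketched as follows. The points are hypothetical 2-D stand-ins for quantized commodities, and "least affected by s" is approximated here by "farthest from s", which is an assumption of this sketch rather than the article's measure.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm started from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical quantized commodities.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
s = points[0]  # assumed to be the node with the highest recommendation possibility
t = max(points, key=lambda p: squared_distance(p, s))  # stand-in for "least affected by s"
centers, clusters = kmeans(points, [s, t])
print(centers)
```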

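The Ward criterion is used to check the rationality of the clustering algorithm by measuring the offset of cluster elements from their class. For a cluster with n elements $x_1, \ldots, x_n$ and mean $\bar{x}$, the offset within the cluster can be defined as:

$$W = \sum_{i=1}^{n} \| x_i - \bar{x} \|^2$$

Meanwhile, for k clusters, where cluster c has $n_c$ elements and mean $\bar{x}_c$, and $\bar{x}_{all}$ is the mean of all elements, the offset between different clusters can be defined as:

$$B = \sum_{c=1}^{k} n_c \, \| \bar{x}_c - \bar{x}_{all} \|^2$$

The overall cluster offset is the sum of the offset within clusters (taken over all clusters) and the offset between clusters, that is:

$$W + B$$

Here, the indicator $R^2$ is chosen to measure the merit of the clustering. It is defined as:

$$R^2 = \frac{B}{W + B}$$

A larger $R^2$ means a small offset within clusters but a large offset between clusters.

For the two experiments in this article, a pre-set similarity threshold is used; the process data of the two experiments is shown in the following table.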

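Table 3. Process data of the experiments

|                             | Original algorithm | Improved algorithm |
|-----------------------------|--------------------|--------------------|
| Number of clusters k        | 6                  | 7                  |
| Offset within clusters W    | 1.422              | 0.557              |
| Offset between clusters B   | 92.55              | 94.54              |
| Value of R2                 | 0.9849             | 0.9941             |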

As the data in the above table shows, the improved algorithm generates more clusters than the original algorithm; meanwhile, its within-cluster offset is lower than that of the original algorithm, which means the commodities in the same cluster are more similar.
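The R2 values in Table 3 can be checked against the other rows, assuming the indicator is the between-cluster offset divided by the total offset, B / (W + B). The sketch below also computes both offsets for a small hypothetical one-dimensional clustering. The function names and the sample clusters are illustrative assumptions.

```python
def r_squared(within, between):
    """Share of the total offset that lies between clusters (assumed form)."""
    return between / (within + between)

def ward_offsets(clusters):
    """Within- and between-cluster sums of squares for 1-D clusters."""
    everything = [x for cluster in clusters for x in cluster]
    grand_mean = sum(everything) / len(everything)
    within = sum((x - sum(c) / len(c)) ** 2 for c in clusters for x in c)
    between = sum(len(c) * (sum(c) / len(c) - grand_mean) ** 2 for c in clusters)
    return within, between

# Values of W and B taken from Table 3.
original = r_squared(1.422, 92.55)
improved = r_squared(0.557, 94.54)
print(round(original, 4), round(improved, 4))

# Hypothetical 1-D example with two tight, well-separated clusters.
w, b = ward_offsets([[1, 2, 3], [10, 11, 12]])
print(w, b, round(r_squared(w, b), 4))
```

Under this assumption, both computed values round to the R2 figures reported in the table.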

Ⅴ. CONCLUSION

The collaborative filtering algorithm performs well when analyzing objective attributes. As consumer access records are an important reference for recommending commodities to consumers, this article proposed a forgetting-factor-based collaborative filtering algorithm. The improved algorithm overcomes the shortcoming of premature convergence in the commodity recommendation process, and a cluster comparison was carried out on consumers' purchased commodities in combination with the quantized commodity recommendation method. The improved algorithm can be used to recommend commodities to consumers.

Ⅵ. RELATED WORKS

Tapestry is one of the classical collaborative filtering methods and is a rating-based automatic recommendation method. The GroupLens system supports anonymous collaborative filtering, which improved the posterior knowledge procedure of the collaborative filtering process and enhanced positioning accuracy[4]. Meanwhile, the development of the consumer interest model enlarged the application range of collaborative filtering algorithms to include machine learning, Bayesian networks, neural networks, fuzzy sets and others, and improvements to the consumer interest model sped up the application of collaborative filtering in business[5]. Looking ahead, consumer-group-oriented models, the integration of short-term and long-term consumer interest, and visualization technology in user modeling can be seen as trends for further research[6].
