Diffusion Protocol by Content Characterization
Pierre Agostini
Centre National d’Etudes des Telecommunications - France Telecom
Abstract
This paper presents the Internet Content Profile Diffusion Protocol (ICPDP), which allows Web content to be diffused by multicast while keeping as much control as possible on the receivers' side. Many Push solutions aim to send information to users automatically, and users are often overloaded as a consequence. ICPDP, by contrast, uses a principle of proposition and dialog. To start with, the sender does not transmit the complete content package but only the thematic profile of this content, extracted by a semantic analysis. Each receiver can then compare the transmitted profile with its own profile and decide whether the content is interesting. In this way, the receivers have interactive control over what is sent. The final decision rests with the receiver host, which controls the quality and the quantity of the information that it receives.
This protocol can be used as a new advertising medium for web servers, to establish cooperation between proxy-caches, or for any diffusion application.
Keywords
Diffusion, multicast, satellite, protocol, content, profile, semantic.
1. Introduction
Due to the structure of the Web, Internet surfers must always actively search for information. Consequently, a current challenge on the Internet and on intranets is to send interesting information to users without their active participation. Multicast and Push are the beginnings of a solution. However, the risk is to overload the network and to send too much unnecessary information to users. The difficulty is therefore to target the users who are really interested before sending the content.
With this aim in view, this paper presents a new application-level protocol called the Internet Content Profile Diffusion Protocol (ICPDP). The principle is to determine a "content profile" before sending the content. The content profile is a characterization of what will be sent, so the receivers only have to interpret the profile. They can then decide on their own, and automatically, whether the content package interests them and whether they want to retrieve it.
In this protocol, senders and receivers can be Web servers, proxy-caches, or users who want to diffuse or receive information. The ICPDP protocol thus opens a new way for information diffusion or cooperation between the different elements of the Web. Moreover, the treatment of information and the "content profile" calculation can be more or less automatic.
We first give a brief description of the functional architecture. Then we explain what a profile is and how it can be obtained automatically. In the third part, we present the ICPDP protocol. In the last part, we give examples of applications and the hoped-for results of this method.
2. Functional architecture
This chapter presents the different functional processes of our diffusion system. The processes are illustrated in the following figure.
Fig. 1 : functional architecture
Profile building
As we have seen in the introduction, the ICPDP protocol determines a content profile before sending the content. The receivers (users, communities, proxy-caches, web servers, …) then compare this profile with their own profile. The first process therefore aims to build the different profiles, either by human evaluation or by automatic calculation. The next chapter presents what a profile is and how it can be obtained automatically using a multimedia document model.
Diffusion by ICPDP
As soon as the profiles have been calculated, the transmission process can start. The diffusion uses the ICPDP protocol explained in chapter 4.
Content Treatment
After the diffusion, the interested receivers store the transmitted content locally and can treat it according to their needs. Different examples of content treatment are presented in chapter 5, called "Using modes".
3. Profiles and multimedia document model
As we saw, the first process before starting the ICPDP diffusion is to determine the transmitted content profile as well as the receiver profile. So, in this chapter, we will define what a profile is and how we can calculate it automatically.
3.1. First definition of a Profile
To start with, we have to define briefly what the profile of a document is.
For a given textual document, its profile is a vector in which each line represents a characteristic word and each value represents the importance of that word in the document. In other words, the document profile is the vector of the document expressed in the vector space of words. So, when we transmit a content profile, we send each word from this space of words with its associated value.
The next figure presents the profile vector of a web document related to cars.
Fig. 2 : Vector of a web document
Regarding the profile vector, the two most important points are:
- The scale and the chosen interval for the values are not important because the vectors will be normalized. So, only the differences between the values are significant.
- The dimension of the vector is not fixed. Every word can appear or not. If a word does not appear, its associated value is set to zero.
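As an illustration of these two points, a profile can be held as a sparse mapping from words to weights. This is only a minimal sketch: the function names `normalize` and `weight`, the example weights, and the choice of unit Euclidean length as the normalization are assumptions, since the text does not say which normalization is applied.

```python
import math

def normalize(profile):
    """Scale a sparse profile (word -> weight) to unit Euclidean length.

    Assumption: the text only says the vectors are normalized; the
    Euclidean norm is one possible choice.
    """
    norm = math.sqrt(sum(w * w for w in profile.values()))
    if norm == 0:
        return dict(profile)  # an empty or all-zero profile is left unchanged
    return {word: w / norm for word, w in profile.items()}

def weight(profile, word):
    """A word that does not appear in the profile has the value zero."""
    return profile.get(word, 0.0)

# Hypothetical weights for a page about cars, given on two different scales.
car_page = {"car": 8.0, "motor": 4.0, "wheel": 2.0}
car_page_x10 = {word: 10 * w for word, w in car_page.items()}

a = normalize(car_page)
b = normalize(car_page_x10)
```

After normalization the two versions of the profile coincide, which is why the original scale of the values does not matter. The sparse mapping also means the vector has no fixed dimension: only the words that appear are stored.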
3.2. Extended definition of a Profile
By definition, when we add two vectors that are expressed in the same vector space, we obtain a third vector expressed in this vector space too. So, as all the document profiles are expressed in the word space, the profile vector of a document set is defined on this same basis. In fact, we decided that the "profile vector of a document set" is the barycentre of the profiles of the documents it includes.
Fig 3: Example of barycentre determination
But the notion of profile can be extended again. We consider that the profile of a textual document can be extended to the multimedia components included in this document. So, an embedded image, a video, ..., will have the same profile as the textual document which contains them.
In our context, we want to diffuse a content package using the ICPDP protocol, where the content package is a set of web documents including multimedia components. Consequently, we define the "content profile" as the barycentre of all the profiles of its included documents and components.
Moreover, we can determine a "user profile" too. For that, we only need to know the web pages that the user has viewed. Then we can calculate the barycentre of the web page profiles and obtain the "user profile".
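The barycentre used for document sets, content packages and user profiles can be sketched as follows. The function name `barycentre`, the sparse dictionary representation and the example values are assumptions made for illustration. Equal weights are also an assumption for the case where no coefficients are given.

```python
def barycentre(profiles, weights=None):
    """Weighted mean of sparse profiles (word -> weight dictionaries)."""
    if not profiles:
        return {}
    if weights is None:
        weights = [1.0] * len(profiles)  # assumption: every profile counts equally
    total = sum(weights)
    if total == 0:
        return {}
    result = {}
    for profile, coeff in zip(profiles, weights):
        for word, value in profile.items():
            # A word missing from a profile contributes zero to the sum.
            result[word] = result.get(word, 0.0) + coeff * value
    return {word: value / total for word, value in result.items()}

# Hypothetical profiles of two pages that a user has viewed.
viewed_pages = [
    {"car": 1.0, "motor": 0.5},
    {"car": 0.5, "road": 1.0},
]
user_profile = barycentre(viewed_pages)
```

The same function would give a content profile when applied to the profiles of the documents and components of a content package.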
3.3. Multimedia Document model
The notion of profile has been extracted from a larger mathematical model representing multimedia web components (web pages, embedded images, videos, …) in a multidimensional algebraic space. Here, we give only a brief overview of the model we use; the complete version is available in [1]. We use this model to automatically calculate the profiles.
The learning process
The basic principle of this model is to analyze textual documents to automatically create a semantic network, i.e. a network of links between words.
For that, we parse a set of textual documents (HTML pages, text documents, Word documents, …). In each of them, we extract a group of significant keywords by studying the occurrences of each word. Then we link the keywords in pairs. If there is no previous link, we create a link with a weight equal to 1. Otherwise, we increase the weight of the existing link by 1. Progressively, not only the number of words increases but also the number of links between words. Consequently, a semantic network is created, which becomes richer as a large number of pages are analyzed and the weights of the links are updated.
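The learning process can be sketched as below. This is not the model described in [1]: the keyword selection (the most frequent words of at least four letters), the parameter names `top_n` and `min_length`, and the representation of the network as a dictionary keyed by sorted word pairs are all assumptions chosen to keep the example short.

```python
from collections import Counter
from itertools import combinations
import re

def extract_keywords(text, top_n=5, min_length=4):
    """Pick the most frequent words as keywords (a simplifying assumption)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    return [word for word, _ in counts.most_common(top_n)]

def learn(network, documents, top_n=5):
    """Update the link weights of the network from a set of documents."""
    for text in documents:
        keywords = sorted(set(extract_keywords(text, top_n)))
        # Link the keywords in pairs; each pair is stored in sorted order
        # so that the link is undirected.
        for link in combinations(keywords, 2):
            # A new link starts at weight 1, an existing one grows by 1.
            network[link] = network.get(link, 0) + 1
    return network

docs = ["motor wheels motor", "motor wheels road"]
network = learn({}, docs)
```

Calling `learn` again with more documents keeps enriching the same network, which matches the incremental behaviour described above.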
Mathematical model of a word
Let's take all the words in the semantic network. We use all the words as a basis for a vector space. So each word can be described by all its linked words (cf. figure 4).
Fig. 4: Vector of the word "Car"
For example, the word "Car" can be described by the words "motor", "wheels", … which are linked to it. The importance of the link between two words is represented by the weight which has been calculated by the learning process. If the link between two words doesn't exist, the weight, and therefore the coordinate in the vector, is considered null. So each word can be represented by coordinates in the complete word basis.
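A word vector can be read directly from the link weights. The sketch below assumes the network is stored as a dictionary keyed by word pairs. The function name `word_vector` and the weights are hypothetical.

```python
def word_vector(network, word):
    """Return the sparse vector of a word: linked word -> link weight."""
    vector = {}
    for (a, b), link_weight in network.items():
        if a == word:
            vector[b] = link_weight
        elif b == word:
            vector[a] = link_weight
    return vector

# Hypothetical link weights produced by a learning process.
network = {
    ("car", "motor"): 5,
    ("car", "wheels"): 3,
    ("motor", "oil"): 2,
}
car = word_vector(network, "car")
oil_coordinate = car.get("oil", 0)  # no link between "car" and "oil": null coordinate
```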
This view can be compared to the logical approach in semantic linguistics [14], where each concept is described by its elementary components (car = vehicle + motor + wheels + …).
Automatic calculation of a document profile
A document is a set of words and, as we saw, each word can be expressed in the full word space. So the profile of a document can be automatically defined in three stages:
- first, we parse the document to extract the keywords,
- second, we extract the vectors of the keywords from the database of the model,
- third, we define the document profile as the barycentre of the keyword vectors. In order to calculate the barycentre, we attribute to each word a coefficient equal to the frequency of the word in the document.
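The three stages can be sketched as follows. The example assumes that the database of the model is available as a dictionary `word_vectors` (word -> sparse vector) and that the keywords are simply the words of the document known to the model. Both are simplifications, and the weights are hypothetical.

```python
from collections import Counter
import re

def document_profile(text, word_vectors):
    """Barycentre of the keyword vectors, weighted by keyword frequency."""
    # Stage 1: parse the document and count the occurrences of each keyword.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w in word_vectors)
    total = sum(counts.values())
    if total == 0:
        return {}
    profile = {}
    for word, count in counts.items():
        # Stage 2: fetch the vector of the keyword from the model.
        for linked, link_weight in word_vectors[word].items():
            # Stage 3: accumulate the vectors, weighted by the frequency.
            profile[linked] = profile.get(linked, 0.0) + count * link_weight
    return {word: value / total for word, value in profile.items()}

# Hypothetical word vectors taken from a semantic network.
word_vectors = {
    "car": {"motor": 4.0, "wheels": 2.0},
    "road": {"car": 2.0},
}
profile = document_profile("The car left the road; the car stopped.", word_vectors)
```

Here "car" occurs twice and "road" once, so the vector of "car" counts twice as much as the vector of "road" in the resulting profile.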
3.3. Profile characteristics
Distance Calculation
The most interesting feature of our profiles is that they can all be expressed in a single vector space. This allows the profiles to be compared easily.
For that, we calculate the Euclidean distance between the two vectors associated with two profiles. So, if there are n words in the basis and if we compare two profiles P1 and P2, then the Euclidean distance between P1 and P2 is:

\[ d(P_1, P_2) = \sqrt{\sum_{j=1}^{n} \bigl(P_1(j) - P_2(j)\bigr)^2} \]

where P1(j) is the coordinate j of object 1 in the word space, i.e. the weight of the link between object 1 and the word j.
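The Euclidean distance between two sparse profiles can be sketched as below. The helper `is_interesting` and its `threshold` parameter are hypothetical: the text only says that a low distance suggests interest and does not define a threshold.

```python
import math

def euclidean_distance(p1, p2):
    """Euclidean distance between two sparse profiles (word -> weight)."""
    words = set(p1) | set(p2)  # a word missing from a profile counts as zero
    return math.sqrt(sum((p1.get(w, 0.0) - p2.get(w, 0.0)) ** 2 for w in words))

def is_interesting(content_profile, receiver_profile, threshold):
    """Hypothetical decision rule: accept content whose profile is close enough."""
    return euclidean_distance(content_profile, receiver_profile) <= threshold

# Hypothetical normalized profiles.
car_page = {"car": 0.8, "motor": 0.6}
garden_page = {"garden": 1.0}
user = {"car": 0.6, "motor": 0.8}

near = euclidean_distance(car_page, user)
far = euclidean_distance(garden_page, user)
```

With these values the car page is much closer to the user profile than the garden page, so a receiver using a threshold between the two distances would retrieve only the car page.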
For example, we can calculate the distance between a web page profile and a user profile. If this distance is low, we can conclude that the user may be interested in the web page.
Interest Communities
Moreover, by comparing user profiles, we can group the nearest users and thus determine interest communities. Consequently, we can finally obtain a community profile, which is the barycentre of the profiles of the users in this community.
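The text does not specify how the nearest users are grouped. The sketch below uses a simple greedy rule as a stand-in: a user joins the first community whose barycentre lies within a hypothetical `threshold`, and otherwise starts a new community. All names and values are assumptions.

```python
import math

def distance(p1, p2):
    """Euclidean distance between two sparse profiles."""
    words = set(p1) | set(p2)
    return math.sqrt(sum((p1.get(w, 0.0) - p2.get(w, 0.0)) ** 2 for w in words))

def barycentre(profiles):
    """Unweighted mean of sparse profiles."""
    result = {}
    for profile in profiles:
        for word, value in profile.items():
            result[word] = result.get(word, 0.0) + value
    return {word: value / len(profiles) for word, value in result.items()}

def group_users(user_profiles, threshold):
    """Greedy grouping (an assumption, not the method of the paper)."""
    communities = []  # each community is a list of user names
    for name, profile in user_profiles.items():
        for members in communities:
            centre = barycentre([user_profiles[m] for m in members])
            if distance(profile, centre) <= threshold:
                members.append(name)
                break
        else:
            communities.append([name])  # no community is close enough
    return communities

# Hypothetical user profiles.
users = {
    "alice": {"car": 1.0},
    "bob": {"car": 0.9, "motor": 0.1},
    "carol": {"garden": 1.0},
}
communities = group_users(users, threshold=0.5)
community_profile = barycentre([users[m] for m in communities[0]])
```

The result of a greedy rule depends on the order in which users are processed. A real deployment would more likely use a standard clustering method.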