QoS in Web Caching
Manuel Afonso
Alexandre Santos
Vasco Freitas
March 31, 1998
Campus Gualtar
4710 Braga - Portugal
Abstract
Nowadays, hierarchical organisation of caches is commonly used. The main objective is to overcome the cumbersome communication paths among servers and so improve the response times experienced by end users. Efforts are mostly focused on the still exponentially growing Web. There is an interest in analysing different caching architectures while keeping the same basic hardware, software and network setups. In order to analyse the influence of architectural changes, caching server QoS specification must be studied.
This paper presents an approach to evaluating, from the users' point of view, Web caching parameters to be used in QoS characterisation.
Several tools have been developed to evaluate QoS in Web caching. These tools have been applied to a case study and the results obtained are analysed.
Keywords: Proxy caching architectures, QoS, Web caching, ICP, HTTP.
Contents
1. Introduction
2. Basics of Internet object caching
2.1 Simple caches
2.2 Co-operating caches
2.2.1 The role of protocols
2.2.2 Known problems
2.2.3 A new proposal: HTCP
3. Architecture testbed
4. Measuring the Web caching QoS
5. Preliminary results
5.1 Proxy-www
5.2 RCCN proxies
6. Conclusions and further work
1. Introduction
The Internet is now a widespread medium of interaction among people everywhere. The World Wide Web, along with its servers and browsers, is without doubt the most popular set of Internet applications. At first glance it seems quite nice, but a more attentive analysis reveals some technological problems. As we all know, the Web is resource intensive, consuming a lot of bandwidth when documents are transferred - documents which can be as small as a few Kbytes or as large as several Mbytes (especially sound, video clip and image files). Of course, one can always think of upgrading circuits for higher bandwidth, buying faster computers, extending memory, disks... Nevertheless, this solution is almost always economically impracticable, so high can costs grow compared with the short/medium term benefits; demand, moreover, is always growing.
The current and most commonly used solution to overcome the lack of bandwidth for such a high number of Web requests is Web caching. This technique uses the knowledge acquired from several analyses of server access logs and from observing Web users' behaviour, both individually and as members of an organisation, to reduce the latency experienced by end users when fetching documents through their Web browsers [1, 2].
The basic concept of caching is the intermediate storage of copies of popular Web documents close to end users. It takes advantage of the temporal locality [3] of accesses - for example, in our University it is very probable that several users will read the morning e-newspaper headlines within a short period of time. Normally, Web documents are requested much more than once.
2. Basics of Internet object caching
Andrew Cormack [4] considers two distinct types of caches: simple caches and co-operating caches. In simple caching, communication is only possible hierarchically, through TCP connections; caches at the same level are not accessible. In co-operative caching, all caches can participate in the process of satisfying user requests.
Simple caches are being abandoned because they waste both bandwidth and disk space. With this type of cache, if an object is not present, a request is issued to the cache one level up in the hierarchy. ICP [5, 6] is not used, so this is no longer a good solution.
Co-operating caches, unlike simple ones, admit richer co-operation, which makes them quite powerful. Nevertheless, there are some unwanted effects that need further analysis.
The next sections are devoted to the analysis of several aspects of the behaviour of these caches.
2.1 Simple caches
Without any caching mechanism, when a browser needs to get an object from a specified host (both present in the URL), it simply makes a direct connection to that host and tries to retrieve the object. Of course, if the object does not exist the end user receives an error message. One advantage of using caching (either simple or co-operative) is that these messages can be avoided. Most recent proxies can use Proxy Automatic Configuration, PAC: the browser is configured to use a script, which can include pre-configured alternatives for the case where a proxy-cache or origin server is not available. Even though this technique can increase response times, the increase is negligible compared with the improved availability of information. PAC is also used for load-balancing purposes in clusters of caches.
With simple caching there is the possibility of making requests to a cache. Each time a user makes a request through the browser, a TCP connection is made to the cache server instead of to the origin server, with the expectation of reducing the time needed to service requests. Within an organisation it is highly probable that more than one user requests the same object, so the cache can intercept requests for the same object and avoid a direct connection to the origin server for each of them, making only one request. This technique can reduce both bandwidth wastage and the latency of serving requests.
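The behaviour just described - serve from the local store if possible, otherwise fetch upstream once and keep a copy - can be illustrated with a minimal sketch (Python; the class and names are ours, for illustration only; a real proxy adds freshness checks, ACLs and eviction):

```python
# Minimal sketch of a simple (hierarchical) cache: serve from the local store,
# otherwise fetch through the upstream (parent cache or origin server) and
# keep a copy so later requests for the same object need no upstream fetch.

class SimpleCache:
    def __init__(self, fetch_upstream):
        self.store = {}                       # URL -> object body
        self.fetch_upstream = fetch_upstream  # parent cache or origin server

    def get(self, url):
        if url in self.store:                 # HIT: no upstream connection
            return self.store[url], "HIT"
        body = self.fetch_upstream(url)       # MISS: a single upstream request
        self.store[url] = body                # cached for the next user
        return body, "MISS"

origin_calls = []
def origin(url):
    origin_calls.append(url)                  # count connections to the origin
    return f"<body of {url}>"

cache = SimpleCache(origin)
cache.get("http://example.org/news")   # first user: MISS, fetched upstream
cache.get("http://example.org/news")   # second user: HIT, served locally
```

Two users requesting the same page thus cost a single upstream fetch, which is exactly where the bandwidth and latency savings come from.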
Another useful aspect is that if the contacted cache doesn't have the requested object, it is able to forward the request either to a parent cache or to the original server.
However, these kinds of operations have some limitations and problems.
First, the hierarchy cannot have more than two or three levels [7], because an object retrieved from an origin server (or from an upper-level cache) through intermediate caches will be stored in every cache used to convey the object to the user. This means that caches at higher levels need a lot of disk space; otherwise objects will be discarded, usually by a Least Recently Used (LRU) policy, before they become stale.
The second disadvantage is the need for a TCP connection (which requires the exchange of at least eight packets) each time we want to retrieve an object. This is quite heavy. A better solution is to use ICP for querying neighbour caches. ICP, however, has its own limitations, most of them caused by the lack of information in ICP headers; only part of the information in HTTP headers is present in ICP headers. In order to solve these problems, the ICP Working Group is developing the Hyper Text Caching Protocol, HTCP [8] - also known as HoT CraP.
HTCP messages are richer than ICP ones. Particularly, there are special headers carrying information about caching.
2.2 Co-operating caches
An institution with several departments may wish to have some caches at the same level (one per department, for example), able to co-operate among themselves to serve requests. This co-operation is possible using the ICP protocol.
There are several possible methods of co-operation. Depending on the way they collaborate, caches are known as siblings or parents. The difference between these two types of co-operation is straightforward: a parent can help serve a request it receives even if it does not have a copy of the object; a sibling can only serve a request if it already has a local copy. The way proxy-caches are organised defines a particular caching architecture.
The HTTP/1.0 protocol [9], used for Web transfers, is quite complex and heavy. ICP, a simpler, lightweight protocol, was designed for querying among caches (and HTCP is under development).
2.2.1 The role of protocols
Any proxy caching server (a cache, for short) able to co-operate with other caches is said to be a peer or neighbour of those caches. Peers admit further classification, into parent/child and sibling relationships.
Communication among peers is, for the time being, accomplished by means of the ICP protocol, as stated before.
As an example, let us consider the relationships among peers pictured in figure 1, where C1 has two siblings (C3 and C4) and one parent (C2). Each time cache C1 receives a request, it can send queries to caches C2, C3 and C4 using the ICP_QUERY message. (It also sends a message to the origin server, as part of the selection process for serving the request.) If one of the peers of C1 has a fresh copy of the requested object, it replies with an ICP_HIT or ICP_HIT_OBJ. If a peer does not have the object, or the object will become stale within the next 30 seconds, it returns an ICP_MISS. Figure 2 shows a detailed diagram of what happens when a cache receives an ICP request (opcode ICP_QUERY). The ICP_DENIED message should only occur if the client cache is not authorised to communicate with the cache receiving the ICP request. In such cases, administrators should contact each other to solve the problem, which normally means changing the configuration file's ACLs. Another possible message is ICP_MISS_NOFETCH, which occurs when a parent cache is not able to forward requests, perhaps due to network connection problems. However, this cache continues to receive ICP queries (to determine when the problems are solved) and is able to serve requests for objects present in the cache.
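The peer's side of this exchange can be sketched as follows (Python; the opcode names and the 30-second staleness window follow the description above, while the data structures and function names are our illustration):

```python
# Sketch of a peer cache answering an ICP_QUERY, following the rules above:
# ICP_HIT for a fresh local copy, ICP_DENIED for unauthorised clients,
# ICP_MISS_NOFETCH while unable to forward requests, ICP_MISS otherwise.
import time

STALE_WINDOW = 30  # seconds: objects becoming stale this soon count as a miss

def answer_icp_query(cache, url, client, now=None):
    now = time.time() if now is None else now
    if client not in cache["acl"]:              # client not in the ACLs
        return "ICP_DENIED"
    expiry = cache["objects"].get(url)          # URL -> expiry timestamp
    if expiry is not None and expiry - now > STALE_WINDOW:
        return "ICP_HIT"
    if not cache["can_forward"]:                # e.g. upstream network problems
        return "ICP_MISS_NOFETCH"
    return "ICP_MISS"

peer = {"acl": {"C1"}, "can_forward": True,
        "objects": {"http://example.org/": 1000.0}}
answer_icp_query(peer, "http://example.org/", "C1", now=900.0)  # fresh: HIT
answer_icp_query(peer, "http://example.org/", "C1", now=980.0)  # stale in 20 s: MISS
answer_icp_query(peer, "http://example.org/", "C9", now=900.0)  # unauthorised client
```

Note how an object present in the cache still yields an ICP_MISS when it is about to go stale, matching the 30-second rule above.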
Let us look now at how ICP replies are processed. Figure 3 shows how ICP replies are processed in order to select one peer cache from which to get a particular object.
According to Figure 3, one of the following situations can happen:
An HTTP request contains a method, a URL and some headers. The available methods and options depend on the version of the HTTP protocol.
The important point here is that some of the fields present in HTTP headers are not in ICP queries. So an ICP reply can indicate that a particular object is present in a cache and fresh (a HIT), and yet, when the HTTP request is made, a response can be issued indicating that the request cannot be fulfilled. The next sub-section discusses some of these problems.
2.2.2 Known problems
HTTP/1.1 is progressively being introduced, but HTTP/1.0 is still the most widely used. So let us stick with this version and analyse some of the options that can affect the behaviour of caching mechanisms. The options presented here are absent from ICP messages but present in HTTP ones.
Content-Encoding: An object can be encoded in a way not accepted by the cache/browser that made the request. A TCP_MISS will then occur even though a cache previously sent an ICP message indicating that it had the object.
Expires: It indicates the time after which the object will be considered stale; the object can be cached for that period of time. This header can cause freshness problems, as there is no means for caches to negotiate freshness parameters: a cache with stricter freshness rules can get stale data from a peer with more relaxed rules. The result is a TCP_MISS although the ICP message pointed to the existence of the requested object. Normally, administrators agree on similar freshness parameters.
If-Modified-Since: Permits asking whether an object has been modified since the indicated date. It is used with the GET method. If the requested object has not been modified, a "not modified" response is issued. Once again, the freshness problem can arise.
Authorization: It is the way a user can authenticate itself with a server. Responses containing an authorisation field are never cached. Squid 1.1 [11] considers objects with this header as private.
Last-Modified: A field enabling the calculation of TTL values, generally as a percentage of the object's age.
Pragma: When the directive "no-cache" is present in a request message, the cache chain must not be used; instead, the request must go to the origin server. This directive is typically issued when users press the browser's "reload" button.
2.2.3 A new proposal: HTCP
Attempts to solve these problems are being made. The ICP Working Group proposes a new protocol, the already mentioned HTCP. Its purposes are wider, pointing to a complete change in today's caching philosophy: a proposal to migrate from client-driven caching (where users' requests determine which objects are cached, if cachable) to "pro-active" caching.
HTCP messages carry full HTTP request and response headers plus extra, useful caching information headers. In particular, the Cache-Location:, Cache-Policy: and Cache-Expiry: headers are especially important where caching is concerned.
Cache-Location: adds flexibility. A cache can indicate alternative suppliers for the requested object, augmenting the availability of information.
Cache-Policy: determines, for instance, whether an object is cachable and/or can be shared (similar to, but more efficient than, Squid 1.1's private/public notion of objects).
Cache-Expiry: indicates for how long an object can be considered fresh.
In spite of this, some researchers say that the long-term solution will be "Adaptive Web Caching" [12]. Briefly, this technique would use the theory of group communication, taking advantage of IP multicast and achieving reliability through the Scalable Reliable Multicast protocol, SRM [13].
3. Architecture testbed
Commonly used caching architectures have only peers that behave as parents or siblings. Portugal has four top-level domain proxy-caches (two in Porto and two in Lisbon, known as the RCCN [14] proxies), and most, if not all, higher education institutions have their own proxy-cache co-operating with one of these top-level caches.
The University of Minho has about 1000 teachers and around 14000 students accessing the WWW, either via LAN or dial-up. Our University shares the 10 Mbps RCCN backbone with other education institutions; the link connecting the campus to this backbone has 4 Mbps of bandwidth. With the European TERENA project [15] our international connections are much better, but still not enough for the high, and still growing, volume of WWW traffic.
The easiest and most cost-effective way to improve response times is, of course, the use of caching techniques.
The current architecture is composed of one proxy-cache (proxy-www) connecting the University to the "Internet world" - parent caches or remote servers - with a large number of children and siblings attached to this cache. As there is a firewall, proxy-www is the means of accessing servers outside it; accesses inside the firewall are made through direct connections.
The goal is to keep the same set of servers and study the effects of establishing new architectural relations among them; being able to analyse the QoS variation that accompanies those architectural changes is another goal.
Figure 5 shows, by means of cumulative values, the typical distribution of the sizes of documents accessed at the University of Minho through a proxy-cache Web server (results obtained by analysing 42 non-consecutive, randomly selected days).
4. Measuring the web caching QoS
There are several ways of evaluating the performance of co-operating proxy-caches. Some approaches use information about the utilisation of computational resources, such as memory, disk space and CPU usage; others consider bandwidth utilisation or the latency perceived by the end user.
This section describes a new way of measuring the proxy-caches QoS in terms of response time, i.e., how long it takes to serve end user requests.
At first glance, it may seem that computing the average response time per request would be easy. However, in order to compute a significant measure, useful for comparing the performance of different proxy-cache architectures, some other considerations have to be taken into account.
As the objective is to test the performance of different architecture configurations for a particular proxy-cache, some decisions were taken:
Class i | Sizes
0 | 0KB-1KB
1 | 1KB-5KB
2 | 5KB-10KB
3 | 10KB-50KB
4 | 50KB-100KB
5 | 100KB-500KB
6 | 500KB-1MB
7 | 1MB-5MB
8 | 5MB-10MB
9 | >= 10MB
Table I - Size categories
Local hits are those requests whose "Log Tags" field - in the logs produced by the Squid software - contains one of the following results:
The category of hits at most one level up in the hierarchy corresponds to those requests that resulted in one of the following responses:
The case of using the hierarchy at more than one level up occurs when all the peers have responded negatively to ICP queries for an object. We assume that parents are working, so we only consider requests with the hierarchical access log tag FIRST_PARENT_MISS (l=7). This premise is adequate for the architectures we plan to test, but other access log tags could be considered; for example, the case where only one parent is available (SINGLE_PARENT).
The last case, direct access to origin servers, considers all requests with the hierarchical access log code DIRECT (l=8).
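This four-way classification can be implemented directly over the access logs. A sketch (Python; the field layout assumed is Squid's native access.log format - time, elapsed ms, client, tag/status, bytes, method, URL, ident, hierarchy/host, type - and the set of local-hit tags shown is illustrative, not exhaustive):

```python
# Sketch: bucket each Squid access.log line into the four access classes
# defined above: local hit, parent hit (one level up), more than one level
# up (FIRST_PARENT_MISS, l=7) and direct origin access (DIRECT, l=8).
LOCAL_HIT_TAGS = {"TCP_HIT", "UDP_HIT"}   # illustrative set of hit tags

def classify(line):
    fields = line.split()
    tag = fields[3].split("/")[0]        # e.g. TCP_MISS out of "TCP_MISS/200"
    hierarchy = fields[8].split("/")[0]  # e.g. DIRECT out of "DIRECT/host"
    elapsed_ms = int(fields[1])
    size_bytes = int(fields[4])
    if tag in LOCAL_HIT_TAGS:
        category = "local_hit"
    elif hierarchy == "PARENT_HIT":
        category = "one_level_up"
    elif hierarchy == "FIRST_PARENT_MISS":
        category = "more_levels_up"
    elif hierarchy == "DIRECT":
        category = "origin"
    else:
        category = "other"
    return category, elapsed_ms, size_bytes

line = ("890000000.123 940 10.0.0.7 TCP_MISS/200 2456 GET "
        "http://example.org/ - DIRECT/example.org text/html")
classify(line)   # -> ("origin", 940, 2456)
```

Each classified request carries its elapsed time and size, which is all that is needed for the per-category (i,j,k,l) aggregation described next.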
Other considerations could be applied and are still under study. For example, our institution has a firewall, but this discussion is not relevant to the objective of testing the proxy-cache's architectural performance, because these aspects of the configuration will remain unchanged even though the architecture will change.
For the purpose of calculating caching server QoS within an architecture, we obtain for each (i,j,k,l) the following values:
Finally, the QoS measure is obtained with the following calculation:
where:
As is well known, the mean has some limitations as a representative measure; for example, it can be affected by extreme values. So we complement the QoS, as defined above, with two other measures, CVB and CVE, which show the degree of variability in size and in response time. They are defined as follows:
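One plausible numerical reading of these measures can be sketched as follows (Python). Here QoS is taken as total latency over total bytes for a category (ms/byte, the units used in tables II, IV and VI) and CVB/CVE as coefficients of variation (standard deviation over mean) of size and elapsed time; this is a reconstruction under stated assumptions, not necessarily the exact definition:

```python
# Sketch of the proposed measures for one category of requests:
# qos in milliseconds per byte transferred; CV = std/mean, showing how
# strongly the mean is influenced by extreme values.
import statistics

def qos(elapsed_ms, sizes_bytes):
    return sum(elapsed_ms) / sum(sizes_bytes)    # ms per byte transferred

def cv(values):
    return statistics.pstdev(values) / statistics.mean(values)

elapsed = [120.0, 250.0, 90.0, 4000.0]   # one extreme response time
sizes = [800.0, 3000.0, 500.0, 1200.0]

category_qos = qos(elapsed, sizes)   # overall ms/byte for this category
cvb = cv(sizes)                      # CVB: variability in size
cve = cv(elapsed)                    # CVE: variability in response time
```

In this example the single extreme response time makes CVE much larger than CVB, which is exactly the situation the complementary measures are meant to expose.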
5. Preliminary results
The analysis was applied to the proxy-cache at the University of Minho, proxy-www, and to the RCCN Web proxies located in Porto.
5.1 Proxy-www
The logs of the last 8 days (20 to 27 March 1998) were used to compute the defined measures, and some other useful information was obtained. During this period, 414,242 requests were made (392,422 ICP and 21,820 TCP). All requests were considered, but only those related to HTTP are presented. Table II summarises some of the results - each cell value is the amount of time divided by the respective number of bytes in the category (the category QoS). Some results were expected, but others need further study. For instance, why are the origin servers' QoS values so low for all classes but the first one? Probably it is due to the cost of making a TCP connection for a small amount of exchanged data; however, this needs further analysis.
Class of sizes | Local HITs (UDP_HIT) | Local HITs (TCP_HIT) | Hierarchical use (only PARENT_HIT) | Hierarchical use (+1 level up) | Origin server accesses
0KB-1KB | 0.00429 | 0.28624 | 20.53164 | 34.12768 | 51.30882
1KB-5KB | - | 0.12103 | 1.10786 | 4.69322 | 0.56243
5KB-10KB | - | 0.04898 | 2.55196 | 2.05087 | 0.05316
10KB-50KB | - | 0.01870 | 0.19991 | 1.19320 | 0.01646
50KB-100KB | - | 0.17426 | 0.05325 | 1.37516 | -
100KB-500KB | - | 1.78998 | 0.99472 | 0.34784 | -
500KB-1MB | - | 0.13229 | - | 0.63952 | -
1MB-5MB | - | 0.12732 | - | 0.00097 | -
5MB-10MB | - | - | - | - | -
>= 10MB | - | - | - | - | -
Table II - QoS by categories for HTTP (values in milliseconds per byte transferred). A dash (-) indicates that no requests existed in that category.
The overall performance of proxy-www is characterised by the following values:
This value was obtained by considering, for each (i,j,k,l), the biggest latency/size ratio among all the requests in the category.
5.2 RCCN proxies
The objects of analysis are the access logs of two RCCN proxies (let us call them proxy-1 and proxy-2).
For these two caches, we started by building a matrix over the variables size and latency, each divided into several categories; the other measures were then calculated.
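Building such a matrix is a straightforward two-way bucketing pass over the log. A sketch (Python; the bin edges are taken from tables III and V, while the function names are ours):

```python
# Sketch: count requests into the size x latency matrix of tables III and V.
from bisect import bisect_right

SIZE_EDGES_KB = [1, 5, 10, 50, 100, 500, 1024, 5120, 10240]         # upper bounds
LATENCY_EDGES_S = [0.2, 0.5] + list(range(1, 11)) + list(range(15, 51, 5))

def matrix(requests):
    """requests: iterable of (size_bytes, elapsed_seconds) pairs."""
    rows = len(LATENCY_EDGES_S) + 1      # last row is '>= 50 s'
    cols = len(SIZE_EDGES_KB) + 1        # last column is '>= 10 MB'
    m = [[0] * cols for _ in range(rows)]
    for size_bytes, elapsed_s in requests:
        col = bisect_right(SIZE_EDGES_KB, size_bytes / 1024)  # size bucket
        row = bisect_right(LATENCY_EDGES_S, elapsed_s)        # latency bucket
        m[row][col] += 1
    return m

reqs = [(512, 0.05), (512, 0.05), (2048, 0.8), (2000000, 12.0)]
m = matrix(reqs)
m[0][0]   # requests under 1 KB answered in under 200 ms
```

The top-left cell of the resulting matrix is the one singled out in the analysis below: small objects served quickly.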
Table III shows proxy-1's number of requests aggregated by categories. In the period of analysis, we see that of the 2,718,861 requests (2,284,075 ICP and 434,787 TCP, totalling 3.884 Gbytes) about 90% are for objects smaller than 1 Kbyte. The first cell alone, where response times are below 200 ms, holds 85% of the total requests. These results encourage the concentration of efforts on these requests.
Latency \ Sizes | 0-1KB | 1-5KB | 5-10KB | 10-50KB | 50-100KB | 100-500KB | 0.5-1MB | 1-5MB | 5-10MB | >= 10MB
< 200 ms | 2306568 | 17624 | 6689 | 5243 | 5 | 0 | 0 | 0 | 0 | 0
< 500 ms | 18846 | 9267 | 2274 | 2355 | 59 | 0 | 0 | 0 | 0 | 0
< 1 s | 34432 | 19718 | 3781 | 2361 | 121 | 3 | 0 | 0 | 0 | 0
< 2 s | 20789 | 21988 | 7043 | 5931 | 148 | 35 | 0 | 0 | 0 | 0
< 3 s | 7972 | 7633 | 3201 | 3938 | 186 | 40 | 0 | 0 | 0 | 0
< 4 s | 13605 | 8653 | 2796 | 2578 | 122 | 29 | 0 | 0 | 0 | 0
< 5 s | 10414 | 10179 | 3237 | 2723 | 118 | 33 | 0 | 0 | 0 | 0
< 6 s | 3647 | 4626 | 2698 | 2762 | 83 | 20 | 0 | 0 | 0 | 0
< 7 s | 3011 | 3123 | 1774 | 2174 | 75 | 21 | 1 | 0 | 0 | 0
< 8 s | 2402 | 2721 | 1540 | 1955 | 80 | 14 | 2 | 0 | 0 | 0
< 9 s | 1422 | 2041 | 1397 | 1890 | 65 | 16 | 2 | 0 | 0 | 0
< 10 s | 3460 | 2322 | 1225 | 1710 | 91 | 19 | 2 | 0 | 0 | 0
< 15 s | 6840 | 8761 | 4871 | 7258 | 316 | 52 | 8 | 1 | 0 | 0
< 20 s | 2486 | 3425 | 2405 | 4945 | 325 | 56 | 9 | 3 | 0 | 0
< 25 s | 2223 | 3262 | 1769 | 3570 | 291 | 44 | 11 | 1 | 0 | 0
< 30 s | 808 | 1558 | 1169 | 2773 | 300 | 41 | 6 | 5 | 0 | 0
< 35 s | 784 | 1122 | 775 | 1969 | 278 | 49 | 8 | 3 | 0 | 0
< 40 s | 346 | 590 | 478 | 1493 | 248 | 39 | 5 | 4 | 0 | 0
< 45 s | 343 | 622 | 403 | 1224 | 215 | 44 | 2 | 4 | 0 | 0
< 50 s | 568 | 701 | 415 | 1042 | 219 | 33 | 1 | 4 | 0 | 0
>= 50 s | 4551 | 3335 | 2898 | 7477 | 2175 | 1208 | 165 | 249 | 32 | 7
Total | 2445517 | 133271 | 52838 | 67371 | 5520 | 1796 | 222 | 274 | 32 | 7
% | 90.35% | 4.92% | 1.95% | 2.49% | 0.20% | 0.07% | 0.01% | 0.01% | 0.00% | 0.00%
Table III - Requests of proxy-1 distributed by size and response times - only HTTP
The QoS by categories is presented in table IV (values represent ms/byte).
Class of sizes | Local HITs (UDP_HIT) | Local HITs (TCP_HIT) | Origin server accesses
0KB-1KB | 0.02549 | 0.34049 | 51.46991
1KB-5KB | 0.00215 | 0.15918 | 5.31436
5KB-10KB | 0.00047 | 0.11093 | 2.91213
10KB-50KB | 0.00045 | 0.21652 | 1.85093
50KB-100KB | - | 0.60325 | 1.68454
100KB-500KB | - | 1.10225 | 3.34183
500KB-1MB | - | 1.06230 | 1.74949
1MB-5MB | - | 1.27303 | 0.56491
5MB-10MB | - | 0.36965 | 0.02444
>= 10MB | - | - | 0.11946
Table IV - QoS by categories for proxy-1 - only HTTP.
The overall metrics are:
The cache proxy-2 gave similar results. During the period of analysis, 2,338,865 requests were made (1,933,970 ICP and 404,896 TCP), representing 3.300 Gbytes. As in proxy-1, almost 90% of the requests are for objects smaller than 1 Kbyte, and of these around 81% have response times below 200 ms (table V).
Latency \ Sizes | 0-1KB | 1-5KB | 5-10KB | 10-50KB | 50-100KB | 100-500KB | 0.5-1MB | 1-5MB | 5-10MB | >= 10MB
< 200 ms | 1881406 | 9951 | 3442 | 1953 | 0 | 0 | 0 | 0 | 0 | 0
< 500 ms | 71718 | 5637 | 1415 | 1516 | 12 | 0 | 0 | 0 | 0 | 0
< 1 s | 29922 | 14372 | 2697 | 1690 | 68 | 5 | 0 | 0 | 0 | 0
< 2 s | 23546 | 20892 | 6451 | 4933 | 111 | 9 | 0 | 0 | 0 | 0
< 3 s | 10518 | 10374 | 3869 | 3763 | 203 | 14 | 0 | 0 | 0 | 0
< 4 s | 9995 | 8723 | 2818 | 2658 | 183 | 20 | 0 | 0 | 0 | 0
< 5 s | 8828 | 9453 | 3262 | 2595 | 178 | 30 | 0 | 0 | 0 | 0
< 6 s | 5421 | 5468 | 2526 | 2329 | 140 | 38 | 0 | 0 | 0 | 0
< 7 s | 4196 | 4208 | 1848 | 1865 | 108 | 35 | 0 | 0 | 0 | 0
< 8 s | 3247 | 3620 | 1657 | 1721 | 107 | 28 | 0 | 0 | 0 | 0
< 9 s | 2426 | 2715 | 1483 | 1685 | 85 | 31 | 0 | 0 | 0 | 0
< 10 s | 2759 | 2671 | 1306 | 1431 | 83 | 27 | 1 | 0 | 0 | 0
< 15 s | 10047 | 10528 | 5223 | 6313 | 280 | 103 | 0 | 0 | 0 | 0
< 20 s | 4374 | 4764 | 2875 | 4241 | 240 | 74 | 2 | 0 | 0 | 0
< 25 s | 3251 | 4094 | 1957 | 3031 | 193 | 52 | 5 | 0 | 0 | 0
< 30 s | 1846 | 2474 | 1235 | 2283 | 183 | 60 | 4 | 1 | 0 | 0
< 35 s | 1411 | 1782 | 902 | 1620 | 160 | 43 | 6 | 1 | 0 | 0
< 40 s | 916 | 1091 | 553 | 1214 | 133 | 38 | 4 | 1 | 0 | 0
< 45 s | 712 | 873 | 394 | 1007 | 117 | 48 | 2 | 2 | 0 | 0
< 50 s | 893 | 920 | 425 | 862 | 124 | 33 | 5 | 0 | 0 | 0
>= 50 s | 10314 | 6297 | 2979 | 6220 | 1284 | 854 | 208 | 263 | 29 | 4
Total | 2087746 | 130907 | 49317 | 54930 | 3992 | 1542 | 237 | 268 | 29 | 4
% | 89.64% | 5.62% | 2.12% | 2.36% | 0.17% | 0.07% | 0.01% | 0.01% | 0.00% | 0.00%
Table V - Requests of proxy-2 distributed by size and response times - only HTTP
The QoS by categories is given in table VI.
Class of sizes | Local HITs (UDP_HIT) | Local HITs (TCP_HIT) | Origin server accesses
0KB-1KB | 0.43797 | 0.49242 | 91.17730
1KB-5KB | - | 2.99400 | 6.92480
5KB-10KB | - | 0.66970 | 3.17078
10KB-50KB | - | 0.48805 | 1.70526
50KB-100KB | - | 0.63836 | 0.89883
100KB-500KB | - | 0.55416 | 1.61413
500KB-1MB | - | 0.41508 | 0.63983
1MB-5MB | - | 0.42522 | 0.88920
5MB-10MB | - | 0.21700 | 0.43159
>= 10MB | - | 0.17303 | 0.07662
Table VI - QoS by categories for proxy-2 - only HTTP.
The overall metrics are:
6. Conclusions and further work
The proposed measures may give some information about performance in terms of response times to requests.
However, some aspects need further analysis. The category of requests below 200 ms should probably be split in order to obtain more detailed results; at present a lot of requests are aggregated in that single category, which means losing information. The latency times presented are absolute values; perhaps relative values, in milliseconds per byte transferred, would be more useful.
Concerning objects' sizes, there is some research indicating that requested objects larger than some number of standard deviations should not be considered, which may be a better solution. However, it may be difficult to determine the optimal number of standard deviations; this needs further study.
Another non-negligible aspect is the time of day at which requests are made. It is known, for instance, that accesses are faster at late hours and slower during working hours. For this reason, the time of day should probably be considered in the analysis.
In spite of this, for the purpose of tuning a particular cache for better performance, i.e., choosing the better architecture, it is believed that these results can be of real help.
At Minho University, planned experiments will evaluate the performance of ICP multicast-based architectures, and the results will be compared with the current architecture, in which multicast is not used and proxy-www has only parents.
Also interesting, but perhaps difficult to achieve, would be the characterisation of access patterns. Knowing at least some of the characteristics of the Portuguese community's WWW access patterns could be rewarding. This knowledge could be very useful for international or transcontinental accesses - load balancing could be based on rigorous data, and pre-fetching could improve end users' response times. Caches could also be specialised by domains.