Page 1: What is Squid3 Proxy?
Page 2: What is Squid3 Proxy?

What is Squid?

• A caching proxy for
  – HTTP, HTTPS (tunnel only)
  – FTP
  – Gopher
  – WAIS (requires additional software)
  – WHOIS (Squid version 2 only)
• Supports transparent proxying
• Supports proxy hierarchies (ICP protocol)

Page 3: What is Squid3 Proxy?

Proxy Servers
• A proxy server serves at least two functions
  – it offers an extended cache to local users, so that when multiple users access the same pages, later requests are served faster and use less bandwidth
  – it offers control over what material can be brought into the organization’s network and thus on to the clients
    • for instance, it can filter material for viruses
    • it can also filter material to disallow access to pornography, etc.
  – other functions that it can serve include
    • acting as an authentication server
    • performing SSL operations like encryption and decryption
    • collecting statistics on web traffic and usage
  – additionally, the proxy server can offer an added degree of anonymity, in that it is the proxy server that places requests to remote hosts, not an individual’s computer
    • thus, the IP address sent to servers is that of the proxy server, not of the client

Page 4: What is Squid3 Proxy?

What is a proxy?
• Firewall device; internal users communicate with the proxy, which in turn talks to the big bad Internet
  – Gate private address space (RFC 1918) into publicly routable address space
• Allows one to implement policy
  – Restrict who can access the Internet
  – Restrict what sites users can access
  – Provides detailed logs of user activity

What is a caching proxy?

• Stores a local copy of objects fetched
  – Subsequent accesses by other users in the organization are served from the local cache, rather than the origin server
  – Reduces network bandwidth
  – Users experience faster web access
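
The idea of serving repeat requests from a local copy can be sketched in a few lines of Python. This is only a toy model, not how Squid is implemented: the class name `CachingFetcher` and the stand-in `fake_origin` function are made up for illustration, and the sketch ignores expiry, object size limits and cacheability rules.

```python
from typing import Callable, Dict, List


class CachingFetcher:
    """Toy model of a caching proxy: remember each fetched object by URL."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin  # hypothetical origin fetcher
        self._cache: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._cache:
            self.hits += 1  # served locally, no traffic to the origin server
            return self._cache[url]
        self.misses += 1  # first access has to go to the origin server
        body = self._fetch_from_origin(url)
        self._cache[url] = body
        return body


origin_calls: List[str] = []


def fake_origin(url: str) -> bytes:
    """Stand-in for a real HTTP fetch; records every call it receives."""
    origin_calls.append(url)
    return b"content of " + url.encode()


proxy = CachingFetcher(fake_origin)
first = proxy.get("http://example.com/index.html")   # miss: fetched from origin
second = proxy.get("http://example.com/index.html")  # hit: served from the cache
```

After the two calls, the origin has been contacted only once, which is where the bandwidth saving and the faster second response come from.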

Page 5: What is Squid3 Proxy?

Transparent proxying
• Router forwards all traffic to port 80 to the proxy machine using a route policy
• Pros
  – Requires no explicit proxy configuration in the user’s browser
• Cons
  – Route policies put excessive CPU load on routers on many (Cisco) platforms
  – Kernel hacks to support it on the proxy machine are still unstable
  – Often leads to mysterious page retrieval failures
  – Only proxies HTTP traffic on port 80; not FTP or HTTP on other ports
  – No redundancy in case of failure of the proxy
• Recommendation: Don’t use it!
  – Create a proxy auto-configuration file and instruct users to point at it
  – If you want to force users to use your proxy, either
    • Block all traffic to port 80
    • Use a route policy to redirect port 80 traffic to an origin web server and return a page explaining how to configure the various web browsers to access the proxy

Page 6: What is Squid3 Proxy?

squid.conf runtime settings

• Default squid.conf file is heavily commented! Read it!

• Must set
  – cache_dir (one per disk)
  – cache_peer (one per peer) if participating in a hierarchy
  – cache_mem (8-16M preferred, even for large caches)
  – acl rules (default rules mostly work, but must reflect your address space)

Page 7: What is Squid3 Proxy?

squid.conf ACL example

acl SSL_ports port 443 563
acl gopher_ports port 70
acl wais_ports port 210
acl whois_ports port 43
acl www_ports port 80 81
acl ftp_ports port 21
acl Safe_ports port 1025-65535

acl CONNECT method CONNECT
acl FTP proto FTP
acl HTTP proto HTTP
acl WAIS proto WAIS
acl GOPHER proto GOPHER
acl WHOIS proto WHOIS

acl manager proto cache_object
acl localhost src 127.0.0.1/32
acl managerhost src 204.248.51.34/32
acl managerhost src 204.248.51.39/32
acl managerhost src 204.248.51.40/32
acl cawtech src 204.248.51.0/24
acl cawtech-internal src 172.16.0.0/16
acl all src 0.0.0.0/0.0.0.0

http_access deny manager !localhost !managerhost
http_access deny CONNECT !SSL_ports
http_access deny HTTP !www_ports !Safe_ports
http_access deny FTP !ftp_ports !Safe_ports
http_access deny GOPHER !gopher_ports !Safe_ports
http_access deny WAIS !wais_ports !Safe_ports
http_access deny WHOIS !whois_ports !Safe_ports
http_access allow localhost
http_access allow cawtech
http_access allow cawtech-internal
http_access deny all

Page 8: What is Squid3 Proxy?

Caching

• Caching uses faster hardware to save information (code or data) that you have used recently so that, if you need it again, it takes less time to access
  – for processing a program, caching takes place in cache memory, which is located either on the CPU or on the motherboard
    • storage is typically for a very brief time period (fractions of a second)
  – for secondary storage, cached data is held in a buffer on the hard disk
    • storage typically lasts until new hard disk accesses replace it
  – for web access, cached data is stored on the hard disk itself
    • storage is typically for about a month if the information being stored is static (dynamic web content is usually not cached)

Page 9: What is Squid3 Proxy?

Forward vs Reverse Proxies
• The typical form of proxy server is the forward proxy
  – a collection of browsers (on the same LAN, or within an organization) share the same proxy server
  – all client requests go to the proxy server
    • the server looks in its cache to see if the material is available
    • if not, the server checks that the request can be fulfilled (does not violate any access rules), and sends the request over the Internet
    • once a response is received, the server caches it and responds to the client
• A reverse proxy server is used at the server end of the Internet
  – requests from the Internet come into the proxy server, which then determines which web server to route each request on to
  – this might be used to balance the load of many requests for a company that runs multiple servers
  – it also allows the proxy server to cache information and respond directly if the requested page is in its cache
    • we’ll consider reverse proxy servers in a bit
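
As a rough illustration of the reverse-proxy behavior described above, the following Python sketch answers from its cache when it can and otherwise picks a backend web server. Round-robin selection is an assumption made here for simplicity; the slides do not say which balancing policy is used. The class name `ReverseProxy` and the backend names `web1` and `web2` are hypothetical, and forwarding is simulated with a string rather than a real HTTP request.

```python
from itertools import cycle


class ReverseProxy:
    """Toy reverse proxy: cache first, otherwise pick the next backend in turn."""

    def __init__(self, backends):
        self._next_backend = cycle(backends)  # round-robin is an assumption
        self._cache = {}

    def handle(self, path):
        """Return (who answered, response body) for a requested path."""
        if path in self._cache:
            return ("cache", self._cache[path])
        backend = next(self._next_backend)
        body = f"{path} served by {backend}"  # stand-in for forwarding the request
        self._cache[path] = body
        return (backend, body)


rp = ReverseProxy(["web1", "web2"])
r1 = rp.handle("/a")  # goes to the first backend
r2 = rp.handle("/b")  # goes to the second backend
r3 = rp.handle("/a")  # answered from the cache, no backend involved
r4 = rp.handle("/c")  # round-robin wraps back to the first backend
```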

Page 10: What is Squid3 Proxy?

Commands / Comments
• If you want to run Squid upon booting
  – you might add the start-up command to a script in rc.d, init.d or inittab
• Many people do not like running Squid in the main OS environment
  – for security purposes (just as you might not want to run Apache in the main OS environment), they create a chroot environment
  – this is a new root filesystem directory, separate from the remainder of the filesystem
  – anyone who hacks into Squid will be able to damage only the chroot environment, not the rest of your file system
• The safest way to shut down Squid is through
  – squid -k shutdown
    • do not use kill
• To reconfigure Squid after changing squid.conf
  – run squid -k reconfigure; this saves you from having to stop and restart Squid
• To rotate Squid log files, use squid -k rotate
  – put this in a crontab to rotate the files every so often (e.g., once a day)

Page 11: What is Squid3 Proxy?

ACLs in Squid
• Since Apache can be used as a proxy server, you might wonder why use Squid?
  – Squid allows you to define access control lists (acls) which in turn can then be used to specify rules for access
    • who should be able to access web pages via Squid?
    • what pages should be accessible? are there restrictions based on file name? web server? web page content or size?
    • what pages should be cached?
    • what pages can be redirected?
  – such rules are defined in two portions
    • acl definition (similar to what we saw when defining accessors in BIND)
    • followed by an access statement (allow or deny statements)
  – Squid offers a variety of acl definition types
    • IP addresses
    • IP aliases
    • URLs
    • User names (requiring authentication)
    • file types

Page 12: What is Squid3 Proxy?

ACL - Example
• The most common form of acl is to define and permit access to specific clients
  – we will define some src (source IP address) acls
    • typically with src, we define specific IP addresses or subnetworks (rather than IP aliases)
  – acl localhost src 127.0.0.1
    • here, we define the source acl “localhost” to be the IP address 127.0.0.1
  – acl mynet src 10.2/16
    • this could also be 10.2.0.0/16
• Now we use our acls to allow and deny access
  – http_access allow localhost
  – http_access allow mynet
  – http_access deny all
    • here, we are allowing access only from localhost and those on “mynet”; everyone else is denied
    • order of the allow and deny statements is critical; we will explore this next time
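
To see why the order of the allow and deny statements matters, here is a small Python model of the three rules above. It is a sketch under simplifying assumptions, not Squid's actual code: each rule names a single acl, the first rule whose acl matches decides the outcome, and a request that matches no rule is denied. The names `ACLS`, `RULES` and `is_allowed` are made up for this illustration.

```python
import ipaddress

# ACL definitions mirroring the slide: acl name -> list of source networks.
ACLS = {
    "localhost": [ipaddress.ip_network("127.0.0.1/32")],
    "mynet": [ipaddress.ip_network("10.2.0.0/16")],
    "all": [ipaddress.ip_network("0.0.0.0/0")],
}

# http_access lines in the order they appear in the configuration.
RULES = [("allow", "localhost"), ("allow", "mynet"), ("deny", "all")]


def is_allowed(client_ip, rules=RULES):
    """Walk the rules top to bottom; the first matching acl decides."""
    addr = ipaddress.ip_address(client_ip)
    for action, acl_name in rules:
        if any(addr in network for network in ACLS[acl_name]):
            return action == "allow"
    return False  # assumption of this sketch: deny when nothing matches


# Moving "deny all" to the top shuts everyone out, including mynet.
REORDERED = [("deny", "all"), ("allow", "localhost"), ("allow", "mynet")]
```

With `RULES`, a client at 10.2.33.4 is allowed and one at 10.3.0.1 is denied; with `REORDERED`, both are denied because "deny all" matches first.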

Page 13: What is Squid3 Proxy?

Types of ACLs I
• ACLs can be based on src as well as on several other criteria
  • src – the IP address of the user (client) whose requests are going from their browser to the Squid proxy server
  • dst – the IP address of the web server (destination)
  • srcdomain and dstdomain – same as src and dst except that these permit IP aliases
  • srcdom_regex and dstdom_regex – same as srcdomain and dstdomain except that the IP aliases can be denoted using regular expressions
  • time – specify the times and days of the week that the proxy server allows or denies access
  • port, method, proto – specify the port(s) that the proxy server permits access to, the HTTP methods allowable (or denied) and the protocol(s) allowable (or denied)
  • rep_mime_type – allow or deny access based on the type of file being returned
    • we will study these (and others) in detail next time
  • myip – the local IP address of the Squid server on which the client’s connection arrived (rather than the client’s own address, as with src)
  • arp – access controlled based on the MAC address

Page 14: What is Squid3 Proxy?

Types of ACLs II
• port – specify one or more port numbers
  – ranges are written with a hyphen, as in 8000-8010
  – multiple ports are separated by spaces or given in separate definitions
  – typically, you will define “safe” ports and then disallow access to any port that is not safe, for example:
    • acl safe_ports port 80 443 8080 3128
    • http_access deny !safe_ports
• method – permissible HTTP method
  – GET, POST, PUT, HEAD, OPTIONS, TRACE, DELETE
  – Squid also knows additional methods including PROPFIND, PROPPATCH, MKCOL, COPY, MOVE, LOCK, UNLOCK, CONNECT and PURGE
    • acl allowable_method method GET HEAD OPTIONS
    • http_access deny !allowable_method
• proto – permissible protocol(s)
  – http, https, ftp, gopher, whois, urn and cache_object
    • ex: acl myprotos proto HTTP HTTPS FTP
• proxy_auth – requires user login and a file/database of usernames/passwords
  – you specify the allowable user names here, such as
    • acl legal_users proxy_auth foxr zappaf newellg
• maxconn – maximum connections
  – you can control access based on a maximum number of simultaneous connections
  – this limitation is per IP address, so for instance you could limit users to 25 connections; once the number is exceeded, that particular IP address gets “shut out”
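
The port syntax (single ports, space-separated lists and hyphenated ranges) and the "deny anything that is not safe" rule can be modeled briefly in Python. This is an illustrative sketch only; `parse_ports` and `port_denied` are hypothetical helper names, and the port list combines the values from the slide with the 8000-8010 range mentioned above.

```python
def parse_ports(spec):
    """Expand a port acl value such as '80 443 8000-8010' into a set of ints."""
    ports = set()
    for token in spec.split():
        if "-" in token:
            low, high = token.split("-", 1)
            ports.update(range(int(low), int(high) + 1))  # ranges are inclusive
        else:
            ports.add(int(token))
    return ports


# Equivalent in spirit to: acl safe_ports port 80 443 8080 3128 8000-8010
safe_ports = parse_ports("80 443 8080 3128 8000-8010")


def port_denied(port):
    """Model of 'http_access deny !safe_ports': deny when the port is not listed."""
    return port not in safe_ports
```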

Page 15: What is Squid3 Proxy?

Time ACLs
• To control when users can access the proxy server, based on either days of the week, or times (or both)
  – S, M, T, W, H, F, A for Sunday through Saturday, D for weekdays
  – time specified as a range, hh:mm-hh:mm in 24-hour time
• The format is acl name time [day(s)] [hh:mm-hh:mm]
  – example: to specify weekdays from 9 am to 5 pm:
    • acl weekdays time D 09:00-17:00
  – example: to specify Saturday and Sunday:
    • acl weekend time SA
• The first time must be less than the second
  – if you want to indicate a time that wraps around midnight, such as 9:30 pm to 5:30 am, you have to divide this into two definitions (21:30-23:59 and 00:00-05:30)
  – if days have different times, you need to separate them into multiple statements; for example, defining a time for M 3-7 and W 3-8 would require two definitions
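
The day codes and the "first time must be less than the second" rule can be made concrete with a small Python model. It is a sketch, not Squid's parser: it assumes the time range is written without spaces (for example 09:00-17:00), and `parse_time_acl`, `matches` and `in_night` are hypothetical names introduced here. The overnight period is expressed as two definitions, as described above.

```python
from datetime import datetime, time

# Day codes from the slide mapped to Python's weekday() numbering (Monday = 0).
DAY_CODES = {"M": 0, "T": 1, "W": 2, "H": 3, "F": 4, "A": 5, "S": 6}


def parse_time_acl(spec):
    """Parse '[days] [hh:mm-hh:mm]' into (set of weekdays, start, end)."""
    days = set(range(7))               # no day codes given: every day
    start, end = time.min, time.max    # no range given: the whole day
    for token in spec.split():
        if ":" in token:
            first, second = token.split("-")
            start = time(*map(int, first.split(":")))
            end = time(*map(int, second.split(":")))
            if not start < end:
                raise ValueError("first time must be less than the second")
        else:
            days = set()
            for code in token:
                if code == "D":
                    days.update(range(5))  # D = Monday through Friday
                else:
                    days.add(DAY_CODES[code])
    return days, start, end


def matches(acl, moment):
    """True if the datetime falls on an allowed day and inside the range."""
    days, start, end = acl
    return moment.weekday() in days and start <= moment.time() <= end


weekdays = parse_time_acl("D 09:00-17:00")
weekend = parse_time_acl("SA")
# 9:30 pm to 5:30 am wraps around midnight, so it needs two definitions.
night = [parse_time_acl("21:30-23:59"), parse_time_acl("00:00-05:30")]


def in_night(moment):
    return any(matches(acl, moment) for acl in night)
```

Writing the overnight period as a single range, `parse_time_acl("21:30-05:30")`, raises `ValueError` in this model, mirroring the restriction on the slide.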

Page 16: What is Squid3 Proxy?

More ACLs and Regular Expressions

• As stated earlier, you can specify regular expressions in srcdom_regex and dstdom_regex

• There are also regex versions to build rules for the URL
  – url_regex and urlpath_regex
    • for the full URL and the path (directory) portion of the URL respectively
  – you might use this to find URLs that contain certain words, such as paths that include “bin”, or paths/filenames that include words like “porn”
  – ident_regex
    • to apply regular expressions to user names obtained through ident lookups
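
The difference between matching the full URL and matching only its path can be shown with Python's `re` and `urllib.parse` modules. This is a simplified sketch: Python's regex dialect is not identical to the one Squid uses, the path is taken to be just the path component of the URL, and the function names and example URLs are invented for illustration.

```python
import re
from urllib.parse import urlsplit


def url_regex_matches(pattern, url):
    """Model of url_regex: search the pattern anywhere in the full URL."""
    return re.search(pattern, url) is not None


def urlpath_regex_matches(pattern, url):
    """Model of urlpath_regex: search only the path portion of the URL."""
    return re.search(pattern, urlsplit(url).path) is not None


host_only = "http://bin.example.com/docs/index.html"  # "bin" is in the host name
in_path = "http://www.example.com/cgi-bin/test"       # "bin" is in the path
```

A pattern such as `bin` matches both URLs under the full-URL check, but only the second one under the path-only check.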

Page 17: What is Squid3 Proxy?

Other ACL Types
• req_mime_type and rep_mime_type
  – test content-type in either the request or response header
  – it only makes sense to use req_mime_type when uploading a file via POST or PUT
  – example: acl badImage rep_mime_type image/jpeg
• Browsers
  – restrict what type(s) of browser can make a request
• External ACLs
  – this allows Squid to sort of “pass the buck” by requesting that some outside process(es) get involved to determine if a request should be fulfilled or not
    • external ACLs can include factors such as cache access time, number of children processes available, login or ident name, and many of the ACLs we have already covered, but now handled by some other server

Page 18: What is Squid3 Proxy?

User Names & Authentication
• The ident acl can be used to match user names
• The proxy_auth acl can specify either REQUIRED or specific users by name that then require that a user log in
  – authentication requires that the user must perform a username/password authentication before Squid can continue
    • any request that must be authenticated is postponed until authentication can be completed
  – although authentication itself adds time, using ident or proxy_auth also adds time after authentication has taken place because Squid must still look up the user’s name among the authentication records to see if the name has been authenticated
• Squid itself does not come with its own authentication mechanisms, so we have to add them as modules much like with Apache

Page 19: What is Squid3 Proxy?

Log Files
• As with Apache, Squid uses log files to store messages of importance and to maintain access and error logs
  – however, one additional log that Squid has that Apache does not is a cache log in order to record what files are cached
  – there are also optional log files available
    • useragent.log and referer.log, which contain information about user agent headers and web referers for every access
    • swap.state and netdb_state, which store information regarding the disk and network performance of Squid
  – you can control the names of the log files and which of these optional log files are used through directives in your conf file
  – because there are so many logs and they can generate a lot of content, there are log rotation tools available just as with Apache

Page 20: What is Squid3 Proxy?

cache.log
• This log contains
  – configuration information
  – warnings about performance problems
  – errors
• Entries are of the form
  – date time | message
• Configuration messages might include such things as
  – process ID of a starting Squid process
  – successful (or failed) tests to the DNS and the DNS IP address (as obtained from resolv.conf)
  – starting helper programs
• The remaining cache entries are made based on a specified debug level that dictates which types of operations should be logged here
  – normal information, warnings, errors, emergencies, etc.

Page 21: What is Squid3 Proxy?

access.log
• Much like Apache’s access log, Squid’s access log will store every request received
  – each entry contains 10 pieces of information
    • timestamp
    • response time
    • client address
    • status code of request
    • size of file transferred
    • HTTP method
    • URI
    • client identity (if available)
    • how requests were fulfilled on a cache miss (that is, where we had to go to get the file)
    • content type
  – status codes differ from Apache as they indicate cache access as well as server status codes, and include these:
    • TCP_HIT, TCP_MISS, TCP_REFRESH_HIT, TCP_REF_FAIL_HIT, TCP_REFRESH_MISS, TCP_CLIENT_REFRESH_MISS, TCP_IMS_HIT, TCP_SWAPFAIL_MISS, TCP_NEGATIVE_HIT, TCP_MEM_HIT, TCP_DENIED, TCP_OFFLINE_HIT, TCP_REDIRECT and NONE
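
The ten fields can be pulled apart with a short Python parser. This sketch assumes Squid's native log format, in which the fields are separated by whitespace, the status field combines the cache result and the HTTP status as `TCP_MISS/200`, and the response time is given in milliseconds. The sample line is made up for illustration and is not taken from a real log; `AccessEntry` and `parse_access_line` are hypothetical names.

```python
from typing import NamedTuple


class AccessEntry(NamedTuple):
    timestamp: float    # seconds since the epoch
    elapsed_ms: int     # response time
    client: str         # client address
    result_code: str    # cache result, e.g. TCP_MISS
    http_status: int    # HTTP status code, e.g. 200
    size: int           # bytes transferred
    method: str
    uri: str
    ident: str          # client identity, "-" if unavailable
    hierarchy: str      # how a miss was fulfilled, e.g. DIRECT/<address>
    content_type: str


def parse_access_line(line):
    """Split one native-format access.log line into its ten fields."""
    (ts, elapsed, client, code_status, size,
     method, uri, ident, hierarchy, ctype) = line.split()
    result_code, http_status = code_status.split("/")
    return AccessEntry(float(ts), int(elapsed), client, result_code,
                       int(http_status), int(size), method, uri,
                       ident, hierarchy, ctype)


# Invented sample entry, for illustration only.
sample = ("1066036250.603 310 10.2.0.15 TCP_MISS/200 4521 GET "
          "http://www.example.com/ - DIRECT/192.0.2.10 text/html")
entry = parse_access_line(sample)
was_cache_hit = "HIT" in entry.result_code
```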

Page 22: What is Squid3 Proxy?

Directives for access.log
• log_icp_queries – default is enabled, allows you to control whether ICP (Internet Cache Protocol) requests are logged or not
• emulate_http_log – whether to use the same format as HTTP server access logs (that is, match Apache’s server log) or use Squid’s native format, which contains more information
• log_mime_hdrs – if set to on, Squid will add HTTP request and response headers to each log entry (this adds two more fields to each entry)
• log_fqdn – this toggles whether Squid records the client’s (requestor’s) IP address or hostname; if hostname, then Squid has to do a reverse DNS lookup, which takes more time
• log_ip_on_direct – whether to log the destination’s IP address or hostname for requests that Squid sends directly to the origin server
• strip_query_terms, uri_whitespace – whether to remove the query terms from a URL and whether to strip, chop, or encode white space in a URL (if any)

Page 23: What is Squid3 Proxy?

Store.log
• The store.log file stores decisions to store and remove objects from the Squid cache
  – if an object is cached, the entry includes where it was cached and when
  – if an object is uncacheable, then the entry indicates why the object was uncacheable
  – if a cache is full, a replacement strategy is used to decide what to remove, and any such action is logged here
• The store log contains the following fields:
  – timestamp, action (SWAPOUT, RELEASE, SO_FAIL), directory number (which cache), file number, cache key (the hash value of the object), status code, date, last_modified from the HTTP response header, expires, content-type, content-length/size, HTTP method and URI

Page 24: What is Squid3 Proxy?

Sample proxy auto-configuration (wpad.dachser.com)

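function FindProxyForURL(url, host)
{
    // Plain host names and hosts in the local domain bypass the proxy
    if (isPlainHostName(host) ||
        dnsDomainIs(host, ".cawtech.com"))
        return "DIRECT";

    // HTTP, HTTPS, FTP and Gopher go through the proxy, falling back to a direct connection
    if ((url.substring(0, 5) == "http:") ||
        (url.substring(0, 6) == "https:") ||
        (url.substring(0, 4) == "ftp:") ||
        (url.substring(0, 7) == "gopher:"))
        return "PROXY proxy.cawtech.com:3128; DIRECT";

    // Everything else connects directly
    return "DIRECT";
}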