~neilb/bin/share/listlikelywebservers
- which is a list of all
machines/profiles which either have an http or https hole in the
firewall, or include the apache.h or apacheconf.h header files.
For all the "managed" machines in that list I then ssh in and run
another script ~neilb/bin/share/listwebsites
which gets apache to
spit out a list of web sites it's actually configured to respond to. For interest, that list is here: ServicesUnitAllWebSites.
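The listwebsites script itself is not shown here. As a rough sketch of one way to get that kind of list, assuming `apachectl -S` (the Apache option that dumps the parsed virtual host settings) is available on the server, the hypothetical helper `vhost_names` below pulls the name-based virtual host names out of that dump. It only looks at "namevhost" lines, so a server with a single virtual host per address may need extra handling.

```shell
# Sketch only: this is NOT the actual listwebsites script, and
# vhost_names is a made-up name for illustration.

# Read "apachectl -S" style output on stdin and print the unique
# name-based virtual host names it mentions, one per line.
vhost_names() {
    # On each line, print the word that follows the token "namevhost".
    awk '{ for (i = 1; i < NF; i++) if ($i == "namevhost") print $(i + 1) }' | sort -u
}
```

On a server this might be run as `apachectl -S 2>&1 | vhost_names`; the `2>&1` is there in case the dump is written to standard error.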
Armed with this list I've another couple of scripts (in
~neilb/work/dice/web/cookies/
) that run 'wget' on the root of each web
site:
wget --quiet --keep-session-cookies --load-cookies cookies.txt --save-cookies cookies.txt --recursive --level 1 --no-check-certificate -O /dev/null
This builds up a cookies file for the sites. As wget doesn't understand JavaScript, it doesn't see the cookies that would be set by the likes of Google Analytics. So I made another pass using wget, but instead of
-O /dev/null
the output was piped into grep to search for
google-analytics.com/ga.js
. Matches are listed below, along with the
cookie results.
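The scripts themselves aren't reproduced here, but the two passes described above could be sketched roughly as follows. The function names (`collect_cookies`, `uses_ga`, `check_ga`) are made up for illustration; the wget options are the ones quoted above, with the site URL passed as the final argument.

```shell
# Sketch only: function names are hypothetical, not the original scripts.

# Pass 1: fetch from the site root with the options quoted above,
# accumulating any cookies the site sets in cookies.txt.
collect_cookies() {
    # $1 is the site root URL
    wget --quiet --keep-session-cookies \
         --load-cookies cookies.txt --save-cookies cookies.txt \
         --recursive --level 1 --no-check-certificate \
         -O /dev/null "$1"
}

# Read HTML on stdin; succeed if it references Google Analytics' ga.js.
uses_ga() {
    grep -q 'google-analytics\.com/ga\.js'
}

# Pass 2: fetch only the root page to stdout and print the URL
# if the page references ga.js.
check_ga() {
    if wget --quiet --no-check-certificate -O - "$1" | uses_ga; then
        echo "$1"
    fi
}
```

With the site URLs in a file, one per line (sites.txt is a placeholder name), a loop such as `while read -r site; do collect_cookies "$site"; check_ga "$site"; done < sites.txt` would run both passes over every site.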
group cookies - for the record, in canny:/disk/canny1/cookie-search/ is a script 'do-search' which, when run as root, generates a file in the same location named cookie-search-<date>. It follows the same symlinks to the group areas that group.inf, the groups web server, does. For each of those areas it searches for likely files (.html, .php, etc.) and greps them for "cookie|googleanalytics", recording any matches in the file mentioned above.
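For illustration, the kind of search do-search performs might look something like the sketch below. This is not the do-search script itself: the function name `search_cookies`, the exact list of file extensions, and the case-insensitive match are all assumptions.

```shell
# Sketch only: not the actual do-search script.

# List files under directory $1 that look like web pages or scripts
# and contain "cookie" or "googleanalytics".
search_cookies() {
    # -L makes find follow symlinks, since the group areas are reached via symlinks.
    # grep -l prints only the names of matching files; -i (an assumption)
    # ignores case so that e.g. "setCookie" is also caught.
    find -L "$1" -type f \
        \( -name '*.html' -o -name '*.htm' -o -name '*.php' -o -name '*.js' \) \
        -exec grep -E -i -l 'cookie|googleanalytics' {} +
}
```

The results could then be saved with something like `search_cookies /path/to/group/area > "cookie-search-$(date +%Y-%m-%d)"`, where both the path and the date format are placeholders.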
The Uni's own trawl of inf.ed.ac.uk: http://www.uwp.is.ed.ac.uk/cookie-audit/index.php?url=inf.ed.ac.uk