I've written some scripts to semi-automatically list all of "our" machines that are running a web service. The first, ~neilb/bin/share/listlikelywebservers, lists all machines/profiles that either have an http or https hole in the firewall or include the apache.h or apacheconf.h header files.
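The listlikelywebservers script itself isn't reproduced here. As a rough sketch of the same idea, assuming one profile per file in a single directory, a shell function can grep for the two header includes and for firewall holes. The function name list_likely_webservers and the firewall pattern (firewall.*https?) are hypothetical placeholders, not the real script or the real profile syntax.

```shell
# Sketch only: list profiles that look like they run a web server.
# ASSUMPTIONS: one profile per file directly under $1; the firewall-hole
# pattern is a hypothetical placeholder, not the real profile syntax.
list_likely_webservers() {
    dir=$1
    {
        # Profiles that include apache.h or apacheconf.h (any directory prefix).
        grep -lE '^[[:space:]]*#include[[:space:]]*[<"]([^>"]*/)?apache(conf)?\.h[>"]' \
            "$dir"/* 2>/dev/null || true
        # Profiles mentioning an http/https firewall hole (hypothetical pattern).
        grep -liE 'firewall.*https?' "$dir"/* 2>/dev/null || true
    } | sed 's#.*/##' | LC_ALL=C sort -u   # strip directories, de-duplicate
}
```

Usage would be something like `list_likely_webservers /path/to/profiles`, where the path is a placeholder for wherever the profiles live.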

For each of the "managed" machines in that list, I then ssh in and run another script, ~neilb/bin/share/listwebsites, which gets Apache to report the web sites it is actually configured to respond to. For interest, the resulting list is at ServicesUnitAllWebSites.
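listwebsites isn't shown here either. One way to get Apache to report the names it answers to is `apachectl -S` (also `httpd -S` or `apache2ctl -S`), which dumps the parsed virtual host settings. The sketch below, a hypothetical list_vhost_names function, pulls the site names out of that report. It assumes the Apache 2.2/2.4 output layout, so it should be checked against the local version.

```shell
# Sketch: extract site names from "apachectl -S" output read on stdin.
# ASSUMPTION: Apache 2.2/2.4 report layout; function name is hypothetical.
list_vhost_names() {
    awk '
        $3 == "namevhost"             { print $4 }  # port 80 namevhost NAME (file:line)
        $1 == "alias"                 { print $2 }  # alias NAME (Apache 2.4)
        $1 ~ /:[0-9]+$/ && $2 != "is" { print $2 }  # *:443 NAME (file:line)
    ' | LC_ALL=C sort -u
}
```

On a managed machine this would be run as `apachectl -S 2>&1 | list_vhost_names`; the `2>&1` is there in case the local version writes the report to stderr.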

Armed with this list, I have another couple of scripts (in ~neilb/work/dice/web/cookies/) that run 'wget' on the root of each web site:

  wget --quiet --keep-session-cookies --load-cookies cookies.txt --save-cookies cookies.txt --recursive --level 1 --no-check-certificate -O /dev/null <site-url>

This builds up a cookies file for the sites. Because wget doesn't execute JavaScript, it doesn't see the cookies that would be set by the likes of Google Analytics. So I made another pass with wget, but instead of writing to -O /dev/null, the output was piped into grep to search for google-analytics.com/ga.js. The matches are listed below, along with the cookie results.
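A minimal sketch of that second pass, assuming a hypothetical sites.txt with one URL per line. It fetches only each site's root page (the recursive options of the cookie pass are left out) and uses a fixed-string grep for google-analytics.com/ga.js. The function names are illustrative, not the actual scripts in ~neilb/work/dice/web/cookies/.

```shell
# Sketch of the Google Analytics pass (function names are hypothetical).

# Succeeds if the HTML on stdin references the classic ga.js loader.
uses_google_analytics() {
    grep -qF 'google-analytics.com/ga.js'
}

# Reads one URL per line on stdin; prints those whose root page uses ga.js.
scan_sites_for_ga() {
    while IFS= read -r site; do
        [ -n "$site" ] || continue   # skip blank lines
        if wget --quiet --no-check-certificate -O - "$site" | uses_google_analytics; then
            printf '%s\n' "$site"
        fi
    done
}
```

Usage: `scan_sites_for_ga < sites.txt`. A fixed-string match is enough here because the standard snippet builds the URL as `'.google-analytics.com/ga.js'` whichever protocol prefix it picks.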

Group cookies: for the record, canny:/disk/canny1/cookie-search/ contains a script, 'do-search', which, when run as root, generates a file in the same location named cookie-search-<date>. It follows the same symlinks to the group areas that group.inf does as the groups web server. In each area it searches likely files (.html, .php, etc.) for "cookie|googleanalytics" and records any matches in that file.
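do-search itself isn't reproduced here. A rough equivalent, under stated assumptions: the group areas are symlinks under one directory, "likely files" means .html, .htm, .shtml, .php and .js, and the match is case-insensitive. `find -L` follows the symlinks, and grep's -H/-n options record the file name and line number of each match. The function name search_group_areas is hypothetical.

```shell
# Sketch of a do-search style scan (function name and file types are assumptions).
# $1: directory holding symlinks to the group areas.
# Prints file:line:text for every line mentioning cookie or googleanalytics.
search_group_areas() {
    # -L makes find follow the symlinks into each group area.
    find -L "$1" -type f \( -name '*.html' -o -name '*.htm' -o -name '*.shtml' \
        -o -name '*.php' -o -name '*.js' \) \
        -exec grep -HniE 'cookie|googleanalytics' {} +
}
```

To mirror the dated output file: `search_group_areas /path/to/group-links > "cookie-search-$(date +%Y-%m-%d)"`, where the directory is a placeholder.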

| Session Cookies | Who's Responsible | Status |
| aicat.inf.ed.ac.uk | (RAT to check) | |
| croxleyvm.inf.ed.ac.uk | forum.idea - M Fourman | Contact |
| csb.inf.ed.ac.uk | Hongwu Ma | Reminder |
| data.inf.ed.ac.uk | Ewan Klein | Contact |
| ikiw.inf.ed.ac.uk | Services Unit - turned off sessions, added footer | Fixed |
| madrid.inf.ed.ac.uk | Lito Kriara/Mahesh Marina | Fixed |
| .mail.inf.ed.ac.uk | Services Unit - disabled old Horde | Fixed |
| mook.inf.ed.ac.uk | CSBE | Fixed |
| puma.inf.ed.ac.uk | (RAT) | Fixed |
| socap.informatics-ventures.co.uk | Duncan Davidson (commercialisation) | Internal access only |
| vfbdev-blanik.inf.ed.ac.uk | Douglas Armstrong | Fixed |
| vfbdev.inf.ed.ac.uk | Douglas Armstrong | Fixed |
| vfbsandbox.inf.ed.ac.uk | Douglas Armstrong | Fixed |
| weblogin.inf.ed.ac.uk | Infrastructure Unit | Fixed |
| weblogintest.inf.ed.ac.uk | Infrastructure Unit | Fixed |
| wiki.lcfg.org | MPU | |
| www.ed.ac.uk | not us | |
| www.ehmn.bioinformatics.ed.ac.uk | Hongwu Ma | Ack |
| www.healthagents.net | not us | |
| www.theon.inf.ed.ac.uk | RAT | |
| www.virtualflybrain.org | Douglas Armstrong | Fixed |

| Non-Session Cookies | Who's Responsible | Status |
| .blackfriars.inf.ed.ac.uk | US unit | Fixed |
| csb.inf.ed.ac.uk | Hongwu Ma | Ack |
| .eie11.com | D Davidson | Fixed |
| .eie12.com | D Davidson | Fixed |
| ifriend.inf.ed.ac.uk | Infrastructure Unit | |
| .informatics-ventures.com | D Davidson | Fixed |
| .inspace.mediascot.org | Mark Daniels/Jon Oberlander | Ack |
| .nescdrupal.inf.ed.ac.uk | Malcolm Atkinson | Ack |
| .neuralmapformation.org | D. Sterratt | Fixed |
| .openvce.net | A Tate | Ack |
| .research.nesc.ac.uk | Malcolm Atkinson | Ack |
| www.ehmn.bioinformatics.ed.ac.uk | Hongwu Ma | Ack |
| www.theon.inf.ed.ac.uk | RAT | |
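The cookies.txt that the wget pass builds can be split into session and persistent cookies mechanically, though this is not necessarily how the tables above were compiled. With --keep-session-cookies, wget saves session cookies with an expiry timestamp of 0, and the file uses the tab-separated Netscape layout (domain, subdomain flag, path, secure, expiry, name, value). A small awk sketch, with the hypothetical function name classify_cookies:

```shell
# Sketch: label each cookie in a wget cookie jar as session or persistent.
# Relies on wget's --keep-session-cookies convention of expiry 0 for
# session cookies. Output: kind<TAB>domain<TAB>cookie-name.
classify_cookies() {
    awk 'BEGIN { FS = "\t" }
         /^#/ || NF < 7 { next }   # skip comments and blank/short lines
         {
             kind = ($5 == 0) ? "session" : "persistent"
             print kind "\t" $1 "\t" $6
         }' "$1"
}
```

Usage: `classify_cookies cookies.txt | sort -u`.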

| Google Analytics | Who's Responsible | Status |
| canny.inf.ed.ac.uk | DR www.inf - Services Unit | Fixed |
| cigar.inf.ed.ac.uk | wcms.inf - Services Unit | Fixed |
| crivvens.inf.ed.ac.uk | BP Web - Services Unit | Not using cookies |
| dunk.inf.ed.ac.uk | D Davidson | Ack |
| eie10.com | D Davidson | Fixed |
| eie11.com | D Davidson | Fixed |
| eie12.com | D Davidson | In hand |
| fankle.inf.ed.ac.uk | DAI Web - Services Unit | Not using cookies |
| idea.ed.ac.uk | M Fourman | Ack |
| informatics-ventures.com | D Davidson | Fixed |
| informatics-ventures.tv | D Davidson | Fixed |
| nescdrupal.inf.ed.ac.uk | Malcolm Atkinson | Fixed |
| nimrod.inf.ed.ac.uk | ANC | |
| nrg.inf.ed.ac.uk | Ian Simpson ? | Fixed |
| research.nesc.ac.uk | Malcolm Atkinson | Fixed |
| tarn.inf.ed.ac.uk | DCS Web - Services Unit | Not using cookies |
| vfbdev.inf.ed.ac.uk | Douglas Armstrong | Fixed |
| vfbsandbox.inf.ed.ac.uk | Douglas Armstrong | Fixed |
| wafer.inf.ed.ac.uk | www.inf - Services Unit | Fixed |
| wcms.inf.ed.ac.uk | wcms.inf - Services Unit | Fixed |
| www.anc.ed.ac.uk | ANC (DTC - Jim Bednar) | Contact |
| www.arcs.im | pasta group (Nigel Topham ?) | Contact |
| www.cisa.inf.ed.ac.uk | wcms.inf - not using cookies | Not using cookies |
| www.dai.ed.ac.uk | Services Unit | Not using cookies |
| www.dcs.ed.ac.uk | Services Unit | Not using cookies |
| www-dr.inf.ed.ac.uk | Services Unit | Fixed |
| www.emime.org | S King | Fixed |
| www.entrepedia.org | D Davidson | Fixed |
| www.iccs.inf.ed.ac.uk | a link to ilcc.inf | Not using cookies |
| www.ilcc.inf.ed.ac.uk | Frank Keller, Sharon Goldwater | Fixed |
| www.inf.ed.ac.uk | Services Unit | Fixed |
| www.info.ed.ac.uk | Services Unit | Fixed |
| www.informatics.ed.ac.uk | Services Unit | Fixed |
| www.informatics-ventures.com | Duncan Davidson | Fixed |
| www.inspace.ed.ac.uk | Mark Daniels/Jon Oberlander | Ack |
| www.plasmo.ed.ac.uk | CSBE | Fixed |
| wwwtest.inf.ed.ac.uk | Services Unit | Fixed |
| www.virtualflybrain.org | Douglas Armstrong | Fixed |
| groups.inf.ed.ac.uk/pasta/ | Nigel Topham, (Björn Franke) | Contact |
| conferences.inf.ed.ac.uk/emnlp2011/ | Bonnie Webber, Miles Osborne | Fixed |
| workshops.inf.ed.ac.uk/dbpl09/ | Floris Geerts | Can be removed |
| groups.inf.ed.ac.uk/entrepedia/skins/Entrepedia.php (currently broken) | Duncan Davidson | Ack |

The Uni's own trawl of inf.ed.ac.uk is at http://www.uwp.is.ed.ac.uk/cookie-audit/index.php?url=inf.ed.ac.uk

Accessing Cookie File Search Results

I've knocked up a quick CGI script that might be the sort of thing we could point homepages users at, so they can see where "cookie" appears in their pages: https://homepages.inf.ed.ac.uk/cgi/neilb/cosign/cookies2012

I'm not sure how we'd do something similar for the group cookie search results.

Guidance

I'm sure I was asked to create a wiki page where we could give examples of solutions people could use for their web sites. I've started this at CookiesGuidance.

-- NeilBrown - 09 Mar 2012

Topic revision: r36 - 10 Aug 2012 - 10:04:49 - LindseyBrown
 