
The Interface Between the Worlds of Cloud Computing and the Semantic Web

Paul Miller


Strata Conference 2010: Real World Applications in the Enterprise and Industry

Notes from a panel session, Real World Applications Panel: Enterprise and Industry, featuring Kenneth Cukier from The Economist, Adam Hurwitz from BIA, Jinesh Varia from Amazon Web Services, and Mario Veiga Pereira from PSR.

Cukier – Strata has so far been focused on the tools and the toolmakers. Real-world applications will actually lead to the biggest changes, but we haven’t really heard from them here. Big Data – big, fast, smart, messy.

Varia – Big Data Clouds. Three case studies from AWS customers. Razorfish: the cost of storing and analysing data. Best Buy wanted to analyse clickstream logs and predict patterns to aid advertising placement. 3.5bn records, 71 million cookies, 1.7 million targeted ads per day. Using AWS allowed a move from an upfront $500k CapEx to $0, and from ‘significant’ recurring OpEx to manageable costs. Led to a 500% increase in return on ad spend… but also to huge data quantities.

Yelp – 8 countries, 50 cities, 100GB of logs per day… 200 Elastic MapReduce jobs per day on Amazon, processing 3TB of data.

Etsy – >500GB of web logs per day, 400K sellers.
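The shape of jobs like Yelp’s log analysis can be sketched in miniature as a map step and a reduce step over log lines. This is a hypothetical toy example in plain Python — the log format and URLs are invented, and a real Elastic MapReduce job would run Hadoop across a cluster rather than in a single process:

```python
from collections import Counter
from itertools import chain

# Hypothetical log lines, "timestamp url status" (an invented format,
# not Yelp's real one).
LOGS = [
    "2010-02-01T00:00:01 /biz/cafe-du-monde 200",
    "2010-02-01T00:00:02 /biz/cafe-du-monde 200",
    "2010-02-01T00:00:03 /biz/katz-deli 404",
]

def map_line(line):
    """Map step: emit a (url, 1) pair for every request line."""
    _ts, url, _status = line.split()
    yield url, 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each url key."""
    totals = Counter()
    for url, n in pairs:
        totals[url] += n
    return dict(totals)

hits = reduce_counts(chain.from_iterable(map_line(l) for l in LOGS))
print(hits)
```

The point of the split is that the map step is trivially parallel across machines, and only the reduce step needs to see all the values for a given key — which is what lets a service like Elastic MapReduce chew through terabytes of logs a day.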

The use cases are useful… but I’d rather have heard them from the customers than from Amazon.

Storage getting cheaper. Analytics getting faster. Analytics getting smarter…

Pereira – ‘what happens when you turn on the lights?’ Capacity planning is an increasingly complex issue for power companies. Smart algorithms, cloud computing and big data analysis are creating opportunities to make better decisions here. Deep modelling of resources: model ‘every single generator in the country,’ and so on. These models ‘give better results, because the situation changes dynamically.’

Hurwitz – big data ‘has been a game changer in the legal industry.’ ‘It’s a necessity if your legal department is to be successful.’ When a company gets sued, it goes through a discovery process: it gathers up all the relevant information and passes it to lawyers on both sides. It’s a complex process… and there are consequences if you miss things. Solr, Lucene, Hadoop and a range of other tools are used by legal teams to build searchable pools of the relevant data. The back end works, but UIs struggle. Machine learning has been applied to classifying documents more quickly, accurately and affordably.
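The document-classification idea Hurwitz describes can be illustrated with a tiny multinomial Naive Bayes classifier — a minimal sketch over an invented toy corpus, not the panelists’ actual system, which would involve far larger training sets plus tooling like Lucene/Solr for search:

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical corpus: documents labelled "responsive" (relevant to
# the lawsuit) or "not" -- invented examples, not a real e-discovery set.
TRAIN = [
    ("contract breach damages payment", "responsive"),
    ("invoice payment dispute contract", "responsive"),
    ("lunch menu friday team", "not"),
    ("holiday party friday schedule", "not"),
]

def train(examples):
    """Count word frequencies per label for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def classify(text, word_counts, label_counts, vocab):
    """Pick the label with the highest log-probability, with Laplace smoothing."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)   # prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

wc, lc, vocab = train(TRAIN)
print(classify("payment dispute over the contract", wc, lc, vocab))
```

Even this crude approach hints at why the economics matter: once a model is trained on a sample of lawyer-reviewed documents, it can triage millions of remaining documents far more cheaply than human review.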

Cukier – data as by-product? Everyone using data internally, and using Big Data to do new things with old data. Has anyone taken data they’ve generated, used it internally, and then realised that it has tangible external value?

Pereira – Colombia sells its database to investors overseas. Energy companies exchange raw data in a marketplace.

Varia – amazon.com and Amazon Web Services both see value in Data as a Service (DaaS). E.g. amazon.com products have lots of associated metadata… which is available through an API. There are also companies like 80legs crawling the web, storing the data on AWS, and selling the crawl results to third parties.

Questions – how do you get a community (like legal) comfortable with using these tools?

Hurwitz – the legal community was resistant, but economic reality meant that it had to embrace these tools to enable court-mandated e-discovery. A large case could incur costs of millions of dollars just to gather data before the case begins.


More Stories By Paul Miller

Paul Miller works at the interface between the worlds of Cloud Computing and the Semantic Web, providing the insights that enable you to exploit the next wave as we approach the World Wide Database.

He blogs at www.cloudofdata.com.