The Interface Between the Worlds of Cloud Computing and the Semantic Web

Paul Miller

O’Reilly Strata Conference – more keynotes

Here goes with the raw notes from the final day’s keynotes.

Simon Rogers, The Guardian Data Store

What we do with data. Guardian started in Manchester, first issue 4 pages. That first issue included a table of data about schools in Manchester; number of pupils, cost, etc.

Guardian now… aggregates data published by Government departments in PDF, extracts the data values and presents them in a useful manner.

guardian.co.uk/data – Data Store. Make data available from all over the place, do some stuff with it, but invite others to engage, enhance, correct, use.

“Every time we publish government data, we get 2-3 calls from Gov departments, asking if they can have a copy.” :-)

Making data available to end users means they follow their interests… sometimes results in stories for the news.

Wikileaks… “without Guardian data journalism, we wouldn’t have got the stories from the data.”

Journalists had the data… but didn’t know what to do with it in order to find and tell stories. Needed to provide tools for journalists so that they could navigate the data.

Wikileaks first story – IED cases in Afghanistan; flat visualisation for the paper, plus an interactive tool on the site.

Google Fusion Tables – “an absolutely brilliant tool for mapping lots of data very quickly. Thank you, Google.”

Data exploration tools prove more popular than canned visualisations. People want to explore and understand the data.

Embassy Cables – latest release. As with the others, the paper chooses not to release some of the data. “4 million people can see these ‘secret’ cables.”

The power of stories; didn’t kill journalism. Instead it enhanced it. Provide good journalists with the tools to enrich their stories.

Guardian journalist James Cameron; “the only questions left will be answered by computers because only computers will know the questions to ask.”

Next, a panel… Amber Case, Brad Cross (formerly Flightcaster), Toby Segaran (Metaweb/Google)

“How dependent are we going to get on data to tell us how to lead our lives?”

Amber – “Very. It will become not only convenient, but also the thing to do.”

Toby – “We’re already dependent on data… we just need to get better at deciding what to ignore.”

“Will we stop asking questions, and wait for the computer to tell us what to do? Will we reach The Shallows?”

Brad – “not clear that people distracted by Twitter etc were the deep thinkers before…”

Amber – “large proportion of the population may just consume the channels they consumed before…”

Brad – “right now we only really have search… over time we’ll have more systems that do more for us… but they’re not too far away.”

“We put knowledge into a data store… sales figures, etc. With Big Data we don’t necessarily know what we know or want to know… so should we just keep everything?”

Amber – yes.

Toby – “what people say about themselves is a lot less useful than what they do. If you want to predict what people would do, you should collect the data exhaust and look at what they have done.”

Ethically… if we’ve captured all of our data… can we then subpoena data to show where people were, what they were doing, etc. How will law catch up to an expectation that data is available for all of our actions?

Brad – “not new. The problem will get worse, but it’s just something we have to deal with.”

How will this change government? Representative democracy is a hack… we can’t afford to send everyone to Washington/Ottawa. Can digital take us to a real democracy?

not sure they’re sure…

Amber – recording more data to greater resolution is a good thing and will deliver benefits… so long as it’s not always personally identifiable.

What’s the next technology to unlock big data…?

Amber – “not brain implants… Better location-based data, that knows where you’re going and pushes relevant data to you in real-time.”

Brad – “a mix of what’s happening on the internet now, and on mobile now. When you connect information about you on the server with information about you on your mobile device, there’s some really neat stuff that happens there… an agent.” Sounds like Siri?

Toby – “tried an experiment… connected phone records, sms etc to my social network… it was really cool… Integration of your personal data in this way, just for you to look at, is really useful.”

Next, Ed Boyajian from EnterpriseDB. Legacy Databases and the Data Deluge. They sell a commercial version of the free Postgres database. Enterprise customers are coming to them, asking them to solve problems with big data.

“Compare growth of IT budget to growth of data… and business demands to use that data more effectively… Gap between growth of data (and demands on it) and IT budget creates real problems.”

“By moving to open source database solutions, you can save money.”

“Mission for our company is to bring low cost high performance solutions to the market.”

Barry Devlin. 25 years ago this month, the first paper describing a data warehouse architecture was published.

Next, DJ Patil from LinkedIn. Analytics, and building teams around data/analytics.

‘To connect the world’s professionals and help make them more productive and successful’… requires data. Insights on activity today, plus insights on possible career directions.

Showing graph of data science from LinkedIn data… demand is taking off. See the job board here, too.

529 of 790 LinkedIn-using attendees are connected to 1 other.

30 are connected to 20 or more.

85 are connected to 10 or more.

189 are connected to 5 or more.
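
To put those counts in proportion, here is a quick sketch that turns them into shares of the 790 LinkedIn-using attendees. It assumes each figure means "at least N connections" (the first is noted only as "1 other"), and the names `connected_at_least` and `share_of_attendees` are mine, not anything shown in the talk.

```python
# Counts as quoted in the notes above. Reading each one as
# "at least N connections" is an assumption, not something the speaker confirmed.
TOTAL_ATTENDEES = 790
connected_at_least = {1: 529, 5: 189, 10: 85, 20: 30}


def share_of_attendees(count, total=TOTAL_ATTENDEES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Map each connection threshold to the percentage of attendees who meet it.
shares = {n: share_of_attendees(c) for n, c in connected_at_least.items()}

for n, pct in shares.items():
    print(f"{n}+ connections: {pct}% of attendees")
```

On that reading, roughly two thirds of attendees knew at least one other attendee, and about one in four knew five or more.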

Is that good? Is it what we’d expect?

“Data scientists are frustrated by an inability to ship product.” “LinkedIn tried something different – made data science a top-level product team.”

good talk… but then I needed to slip out to a meeting…

More Stories By Paul Miller

Paul Miller works at the interface between the worlds of Cloud Computing and the Semantic Web, providing the insights that enable you to exploit the next wave as we approach the World Wide Database.

He blogs at www.cloudofdata.com.