This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.

Wednesday, October 10, 2018

Using machine learning to index text from billions of images

For the last year and a half I've led a project to take the computer vision/deep learning OCR pipeline I built at Dropbox and automatically run it, along with several other advanced machine learning models, on billions of images daily in Dropbox to extract text for search. This turned out to be one of the largest computational projects Dropbox has ever done. The feature went live yesterday.

We published a technical blog post with more technical details on the system:

In our previous blog posts, we talked about how we updated the Dropbox search engine to add intelligence into our users’ workflow, and how we built our optical character recognition (OCR) pipeline. One of the most impactful benefits that users will see from these changes is that users on Dropbox Professional and Dropbox Business Advanced and Enterprise plans can search for English text within images and PDFs using a system we’re describing as automatic image text recognition.

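To make "extract text for search" a little more concrete: once OCR turns an image into text, that text can go into an ordinary inverted index, just like text from documents. Here is a toy Python sketch of that idea. The `build_index`/`search` names and the dictionary-of-OCR-text input format are hypothetical, and this is not a description of Dropbox's production search system.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(ocr_results):
    """Map each word to the set of image IDs whose OCR text contains it.

    ocr_results: dict of image_id -> recognized text (hypothetical format).
    """
    index = defaultdict(set)
    for image_id, text in ocr_results.items():
        for word in tokenize(text):
            index[word].add(image_id)
    return index


def search(index, query):
    """Return the IDs of images whose text contains every query word (AND query)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the intersections below don't mutate the index.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Example: pretend OCR output for three images.
ocr = {
    "receipt.jpg": "Total due: $42.00. Thank you!",
    "whiteboard.png": "Q3 roadmap: search, OCR, thank-you notes",
    "cat.jpg": "",
}
index = build_index(ocr)
print(sorted(search(index, "Thank you")))  # ['receipt.jpg', 'whiteboard.png']
print(sorted(search(index, "roadmap")))    # ['whiteboard.png']
```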

Wednesday, April 12, 2017

Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning

In this post we will take you behind the scenes on how we built a state-of-the-art Optical Character Recognition (OCR) pipeline for our mobile document scanner. We used computer vision and deep learning advances such as bidirectional Long Short-Term Memory networks (LSTMs), Connectionist Temporal Classification (CTC), convolutional neural nets (CNNs), and more. We will also dive deep into what it took to actually make our OCR pipeline production-ready at Dropbox scale.
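CTC lets a network trained on whole text lines emit one prediction per timestep, including a special "blank" symbol, without needing character-level alignments. The simplest way to turn those per-timestep outputs into a string is greedy (best-path) decoding: take the most likely class at each step, collapse repeated labels, then drop blanks. Here is a minimal pure-Python sketch under those assumptions. The function names and toy alphabet are hypothetical, and a production pipeline may well use beam search or a language model instead of greedy decoding.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=row.__getitem__)


def ctc_greedy_decode(probs, alphabet, blank=0):
    """Greedy (best-path) CTC decoding.

    probs: T rows of per-class scores, one row per timestep (e.g. the softmax
           output of a CNN + bidirectional LSTM run over a text line).
    alphabet: class index -> character; index `blank` is the CTC blank.
    """
    chars = []
    prev = None
    for row in probs:
        k = argmax(row)
        # Keep a label only if it differs from the previous timestep's label
        # (this collapses repeats) and it is not the blank.
        if k != prev and k != blank:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars)


def one_hot(path, num_classes):
    """Build fake network output that puts all probability on the given path."""
    return [[1.0 if c == k else 0.0 for c in range(num_classes)] for k in path]


alphabet = ["-", "b", "o", "k", "c", "a", "t"]  # index 0 is the blank
print(ctc_greedy_decode(one_hot([4, 4, 0, 5, 5, 6, 0], 7), alphabet))  # cat
# A blank between the two o's is what lets CTC emit a doubled letter.
print(ctc_greedy_decode(one_hot([1, 2, 0, 2, 3], 7), alphabet))  # book
print(ctc_greedy_decode(one_hot([1, 2, 2, 3], 7), alphabet))     # bok
```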


Monday, December 12, 2016

Notes from the NIPS 2016 Neural Network Conference

I've put up all my notes from the NIPS 2016 conference in Barcelona here. It includes lots of deep and reinforcement learning talks, paper sessions, and more. Hopefully these notes are useful for you!

Friday, January 22, 2016

Some Surprising Results of What it Would Take to Transport a Million People to Mars

Elon Musk of SpaceX has said he wants to send one million people to Mars. What would it take in terms of the transport side of the equation to actually make this happen?

My good friend Jeffery Greenblatt analyzes emerging technologies to answer questions exactly like this; for example, he recently published a scientific paper in Nature Climate Change on the impact of autonomous cars on greenhouse-gas emissions.

He has now turned his attention to the transport side of getting a large population to Mars:

"We perform the first comprehensive assessment of the energy, resource and infrastructure requirements of a large-scale human transport system between Earth and Mars. In it, we develop credible mass estimates for a system consisting of four appropriately-sized reusable spacecraft to move humans, and four additional types of reusable spacecraft for moving propellant (hydrogen/oxygen and methane/oxygen) from the Moon and Mars to in-orbit depots. Human consumables (air, water, food) and cargo mass estimates were included in the analysis. We base our estimates on public sources, and develop scenarios of infrastructure scale-up to achieve a Mars settlement size of 1 million people by the first half of the 22nd century. We do not examine the requirements of the Mars settlement itself."

His result uncovers some surprising repercussions of such large-scale transport; see his blog post and the paper itself for more details.

Tuesday, January 05, 2016

Cloudless: Open Source Deep Learning Pipeline for Orbital Satellite Data

I'm proud to announce the 1.0 release of Cloudless, an open source computer vision pipeline for orbital satellite data, powered by data from Planet Labs and using deep learning under the covers. This blog post contains details and a technical report on the project.


Sunday, December 13, 2015

Ten Deep Learning Trends I Saw at NIPS 2015

I attended the Neural Information Processing Systems (NIPS) 2015 conference this week in Montreal. It was an incredible experience, like drinking from a firehose of information. Special thanks to my employer Dropbox for sending me to the show (we're hiring!)
Here are some of the trends I noticed this week; note that they are biased toward deep and reinforcement learning, as those are the tracks I attended at the conference:


Thursday, December 10, 2015

NIPS Day 3 Posters

These are the posters that caught my eye on the third day of NIPS. Note that they are HDR images, so zoom in using your computer to see more detail. Some of the posters below have extra images focusing on specific parts of the poster to give more visual detail.


Wednesday, December 09, 2015

NIPS Day 3 Morning Sessions

These are my notes from the morning sessions of day 3 of NIPS.


Tuesday, December 08, 2015

NIPS Day 1: Tutorials on Scaling Deep Learning, Probabilistic Programming, and Reinforcement Learning

I'm at my first NIPS conference this year in Montreal, the annual pow-wow for machine learning, and deep learning in particular.

Monday, the first day, had several multi-hour in-depth tutorials from literally the folks that wrote the textbooks in these areas. I attended sessions on scaling deep learning via TensorFlow, presented by Google folks like Jeff Dean; a deep dive into probabilistic programming (being able to describe a statistical system and allow an inference engine to do the hard work of building a model from it); and an introduction to reinforcement learning (using a scalar reward signal to automatically discover the optimal policy for a behavior).
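As a rough illustration of that last idea, here is tabular Q-learning, one classic reinforcement learning algorithm (not necessarily the one covered in the tutorial), on a hypothetical five-state corridor where the only reward is 1 for reaching the right end. The environment, hyperparameters, and function name are assumptions chosen for the example; from nothing but that scalar reward, the agent should learn to always move right.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy 1-D corridor (hypothetical example environment).

    States 0..n_states-1; actions 0 = left, 1 = right. Moving left from state 0
    stays at 0. Reaching the rightmost state gives reward 1 and ends the episode;
    every other step gives reward 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore
            # (and break ties randomly).
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # No future value after reaching the terminal goal state.
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q = q_learning()
# Greedy policy for every non-terminal state.
policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(policy)
```

Because all estimates start at zero and the environment is deterministic, the learned values never exceed the true ones, so once the reward has propagated back along the corridor the greedy policy prefers moving right in every state.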


Monday, October 12, 2015

Intelligence Augmentation and the Myth of the “Golden Lost Age”

Maarten van Emden has a great piece on Douglas Engelbart and Intelligence Augmentation (IA) versus Artificial Intelligence (AI). First go read that piece then come back here :) This post follows up a bit and comments on some of what Maarten says.

Most people don't realize that much of the computer industry has been about the struggle between two worlds, IA and AI. John Markoff has chronicled the tension between the two well in his books What the Dormouse Said: How the Sixties Counterculture Shaped the Personal Computer Industry and, most recently, Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots.


Monday, September 21, 2015

Personal Photos Model Using Deep Learning

I've spent the last year going deep into deep learning, machine vision, and autonomous systems. I've completed two Coursera courses, Andrew Ng's Introduction to Machine Learning and Geoffrey Hinton's Neural Networks course. I've also been trying to stay on top of the (crazy) flood of deep learning literature. I've been doing this part-time, in the evenings and on weekends while I work at Dropbox, generally via the colearning study groups I host weekly.

A few months ago I wanted to ground the theoretical knowledge I've been gaining with an actual coding project before taking more courses. Nothing like trying to actually apply what you've been learning to humble you :)

The goal was to let someone train a neural network over their personal photo collection in order to do face detection on the people in those photos. They could then have the photos automatically grouped by those people.


Monday, August 24, 2015

Where Will Work Be in 10 Years?

I was recently asked to write a short 100-word answer on what our workplaces will look like in ten years. One hundred words isn't much; here's my full response:

It's not about what work will be in 10 years, but what it should be. Service work and the Uber-esque sharing economy support middle-class lifestyles via progressive legislation; these jobs can't be outsourced and support a real living wage. Silicon Valley makes its service workers (janitors, etc.) first class employees. Computers and networks eliminate and subsume middle management; collaborative tools allow teams to self-assemble, communicate, and work nimbly. AI doesn't replace people, but augments them. We learn to harness both analytics and empathy.
Today Silicon Valley companies like Google see their offices as exclusive preserves that provide high-end amenities to keep their employees. In the future they will integrate much better into their local communities, providing public spaces and leaving behind infrastructure that raises the quality of life for everyone, not just employees. The physical membrane separating offices from the outside will dissolve a bit.
The importance of physical offices will also decline; we are actually at peak "real office". Collaborative tools and stronger individual and organizational skills will allow telecommuting to truly go mainstream. Coworking spaces will provide community for these workers.

Thursday, May 07, 2015

Roots of Coworking Keynote at GCUC 2015

I gave a keynote at the GCUC 2015 coworking conference on the roots of coworking, giving my personal story on why I created coworking and what led up to it.

This is a slightly longer version of the talk, which I recorded beforehand, with extra material:

Some nice tweet reactions from folks in the audience:


Thursday, November 06, 2014

Left Inkling

I haven't been public about it, but I left Inkling two and a half months ago. It was a fabulous place to work, but after nearly four years it was time to move on. Since then I've been doing part-time front-end engineering consulting for a startup. I'm interviewing with Dropbox tomorrow to do part-time consulting for them as well. I've also been using this time to learn new skills, in particular taking a Product Management course at General Assembly and teaching myself machine learning and neural networks.

Monday, April 14, 2014

Digital Storytelling Conference 2014

On Wednesday last week a bunch of us at Inkling jumped in a van in San Francisco and drove eight hours to U.C. Irvine for the Digital Storytelling conference. It was a great road trip and a fabulous one-day conference.


Monday, March 17, 2014

Next Generation eBook Review: The Glo Bible: Surprising Lessons We Can Learn from Religious eBooks

Today I'm reviewing a very interesting next generation religious eBook named Glo. The digital religious and eBook communities don't tend to talk to each other. However, the work happening in these digital religious communities around eBooks is incredibly interesting for several reasons.

Monday, March 10, 2014

You're Doing Web & eBook Footnotes Wrong

The Problem

Most web pages that have footnotes blindly mimic paper and put them at the bottom of the page, jumping the user to the footnote when clicked on:

Picking on Paul Graham's footnotes
This is silly for many reasons. First, computer screens aren't paper; they can easily accordion open and show extra information based on user intent. Second, they cause a user to lose context while reading an article, forcing them to jump away from what they are doing; this is annoying.

Monday, March 03, 2014

Inkling Habitat: How a 100,000 Line JavaScript Application Focused on Digital Publishing is Built

Last week I introduced you to one of the original animating ideas behind Inkling Habitat, treating books as software to transform the eBook production process. Today I'd like to take you behind the scenes and show you the technologies and processes we used to build Inkling Habitat itself. How did we build this software?

How is Inkling Habitat Built?

First, Inkling Habitat is a client-side application that runs inside your web browser, built with JavaScript, HTML, and CSS. The client-side portion is roughly 100,000 lines of JavaScript, which makes it a large application.

Tuesday, February 25, 2014

Transforming eBook Production by Treating Books as Software

Photo by Jixar
Digital books are bundles of HTML, CSS, and JavaScript. How do we efficiently convert and create these pieces of software?
These bundles can be very complicated. For example, Ganong's Review of Medical Physiology has thirty-nine chapters. Now add in interactive quizzes, 3-D models, high-definition video, educational slide lines, and pop tip glossaries/footnotes, and if you're not careful you will need a small army to produce every eBook. If next generation eBooks require Fabergé egg levels of care and expense we'll never get the scale and quantity we need to make this new world real.
It turns out that over the last fifty years we've developed a remarkable set of techniques and tools for dealing with artifacts of incredible complexity: computer software itself. These tools include:
  • Source control systems
  • Issue databases
  • Automated testing
  • Cross compilers
  • Integrated Development Environments (IDEs)

Monday, February 17, 2014

You Are Not in the Book Business: You Are in the Long Form Content Business

Books are not a goal unto themselves. Instead, they are a means to an end: the transmission of authoritative long form content that possesses depth and breadth. Note that I'm specifically talking about illustrated non-fiction here; literary fiction is an entirely different animal.
Organizations fail when they identify themselves with a particular technology rather than a goal.

Monday, February 10, 2014

Making EPUB3 Play Nice with HTML5

Photo by adrigu
Last week I wrote about how EPUB3 is important even in an HTML5 universe. Today I want to write about how to make EPUB3 play nice with HTML5; as it turns out there are some significant problems when you try to use EPUB3 in HTML5.

Tuesday, February 04, 2014

What Early Rome Can Teach Us About Power & How We Lie to Ourselves

I was reading over Livy's The Early History of Rome recently and the following passage jumped out at me. It always amazes me how those in power rationalize their right to stay in power, independent of whether it's just.
To set the context, Rome has just deposed its king, abolishing the monarchy, and formed a fledgling republic. The disgraced royal family travels the land trying to drum up an army to attack Rome and regain power. Talking to another king outside Rome, they counsel:

Monday, February 03, 2014

Does EPUB3 Have Any Place in an HTML5 Universe?

Me throwing the HTML5 Gang Sign

A common question I hear is whether EPUB3 offers anything above and beyond HTML5. Why not just eliminate EPUB and use plain HTML5?
EPUB3 does bring new things to the table that HTML5 does not provide.
I see standards and technologies as ecosystems that respond to the challenges thrown at them. Before eBook systems, the web only had infrastructure for individual web pages that were either documents or application-like. HTML before HTML5 didn't even help much with applications, except for basic forms, so HTML5 added features to help with this area (offline web apps, the canvas tag, etc.).
There has never been evolutionary pressure on HTML to have better long form reading for artifacts akin to eBooks, so it never really provided the facilities to help with this.

Friday, January 31, 2014

Maybe U.S. Manufacturing Isn't As Bad As We Thought...

Louis Hyman, an American writer, economic historian, and old college friend of mine, recently shared an interesting report from the Congressional Research Service on the state of U.S. manufacturing.
It's notable for some counter-intuitive results. Here are the high-level findings:

Thursday, January 30, 2014

Design Tips from a Design Pioneer

I was recently reading an interview with Hugh Dubberly, a design pioneer involved with HyperCard, the Knowledge Navigator film, Netscape, and more.
The following passage on the role of design jumped out at me, reiterating how important it is both to take a global, systemic perspective and to keep the actual customer firmly in mind:

Sunday, January 26, 2014

My Impressions of Digital Book World 2014

I recently attended Digital Book World 2014 in NYC and wanted to write a little bit about my experiences and opinions of the conference. First, I want to thank Inkling for sending me to the conference as part of company training. Employees can elect to attend conferences and workshops, and Inkling sent me from San Francisco to the conference in NYC with all expenses paid, so I appreciate that.

One of my litmus tests for whether a conference was good is whether I come home seeing parts of the world in a new way.

How was Digital Book World? Well, let me just say that it made me really mourn the passing of Tools of Change. To be honest, I personally found Digital Book World a bit corporate and stuffy. Many of the speakers weren't as well prepared as I would have liked, and many of them didn't seem terribly passionate.

Sunday, January 19, 2014

The Three Things that Can Transform eBook Development

Developing next generation eBooks that have complex non-narrative content is like web development was in 2001. You are dealing with a sea of balkanized reading platforms, many of which have buggy implementations of web standards and little documentation, and make it difficult to even do simple things well.
Designing eBooks in this world is a frustrating exercise. How can we move the needle and improve the state of the art across many of these reading systems so that eBook development becomes not a struggle and a pain but a joy and a pleasure?
Image by martinak15
I think a strategy of three key things could help to drastically change eBook production in the eReading world: Document, Score, and Shame/Success.

Thursday, January 16, 2014

The Start of Coworking (from the Guy that Started It)

I've seen a number of inaccuracies in news stories and on the Wikipedia page for coworking, and I wanted to write up a short article on the beginning of coworking to correct these.
Two coworkers from the first coworking space

Did I invent coworking and how did it start?

Yes, I invented coworking.
In 2005 I was working at a startup named Rojo and was unhappy with my job. Before that I had worked for myself doing consulting and traveling, and I had hungered for the community a job can provide. At that point I was confused: I had both worked for myself and worked at a job, and I was unhappy because I couldn't seem to combine all the things I wanted at the same time: the freedom and independence of working for myself along with the structure and community of working with others.

Monday, January 13, 2014

Introducing Stretchtext.js: Easily Communicate to Different Audiences in a Single Page

Photo by Marco Raaphorst

As a writer, I often struggle with whether to write a tutorial or document for beginners or for experts. What if there's some interesting tangent that some readers might find informative but others want to ignore? What if you want to drill down deeply into a subject while just skimming the surface in other areas?

Traditional paper forces writing to be static and fixed: either you target beginners or experts. You can't provide tangents that readers can choose to follow or not. The reader and writer are both stuck in a single gear-shift.
Web pages are not paper and shouldn't mindlessly mimic dead pulp. Digital screens can accordion open and closed and change their shape depending on the needs of the reader.
I've created an implementation of Stretchtext in JavaScript that gets around these problems. You can mix bits of Stretchtext into your page to allow readers to drill down into specific areas based on their interests and background.

Monday, December 16, 2013

Away for Honeymoon

I'll be away from December 6th until January 7th on my honeymoon in India. There won't be any blog posts during this time and I won't have access to email. See you in 2014!

Friday, December 13, 2013

Touch Press: Complexities and Challenges of Creating Rich Digital Books

There are three key metrics that have to come together for digital illustrated, non-fiction titles to truly make the transition online. They are:


