Building Online Communities: Deep Learning

One of the most important takeaways from Davos, which quickly became widely reported news, was that the world is about to enter a fourth industrial revolution driven by the convergence of several major technology changes (autonomous vehicles, sensors, biotechnology, 3D printing, robotics, artificial intelligence). One of the most important technological disruptions moving us quickly toward that moment is deep learning.

Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using multiple processing layers, with complex structures or otherwise, composed of multiple non-linear transformations.
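To make "multiple processing layers composed of multiple non-linear transformations" concrete, here is a minimal sketch in plain Python. It is only an illustration: the function names (dense_layer, forward), the tanh non-linearity, and the hand-picked weights are all assumptions made for this example, and none of it is taken from Deeplearning4j or any other library.

```python
import math


def dense_layer(inputs, weights, biases):
    """One processing layer: a weighted sum per unit, then a non-linear tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the input through each layer in turn; each output becomes the next input."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Hypothetical, hand-picked parameters: 2 inputs -> 3 hidden units -> 1 output.
# A real network would learn these values from data instead.
layers = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]

output = forward([1.0, 2.0], layers)
print(output)
```

Each entry in layers is one transformation. Stacking more entries gives a deeper model, and training (not shown here) would adjust the weights so that the later layers respond to increasingly abstract patterns in the input.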

Deep-learning software tries to imitate what happens in the brain, more precisely in the layers of neurons in the neocortex, where thinking takes place. Ultimately, deep-learning software aims to recognize patterns in digital representations of sounds, images, and other data.

What is deep learning? Image source: Slideshare by Larry Brown, Ph.D

Deep learning is attracting so much attention from big players that Google, Twitter and Yahoo have all recently acquired companies working with it.

Skymind, a San Francisco-based business intelligence and enterprise software firm, is one such company working with deep learning. The company developed Deeplearning4j, an open-source, distributed deep-learning project in Java and Scala. The project is being developed by a team of data scientists, deep-learning specialists, Java systems engineers and semi-sentient robots. Deeplearning4j is the first commercial-grade, open-source, distributed neural net library written for Java and Scala, with one of the most active communities on Gitter.

Adam and Chris, the founders of Deeplearning4j, shared their thoughts, experiences and lessons learned on open-source community building in the following interview:

Tell us a little bit about yourselves and the Deeplearning4j community. How did it all begin?

We started building Deeplearning4j in late 2013. Adam had been involved with machine learning for about four years at that time, and deep artificial neural networks were looking more and more promising. The first network in Deeplearning4j was a restricted Boltzmann machine, since that was the net that Geoff Hinton had come up with back in 2006, which was the turning point in the field. I was working for another startup doing PR and recruiting, and had previously worked as a journalist, so I took care of the documentation (and still do), since we believed that proper communication was key to making open-source code valuable.

What are the main issues discussed in the Deeplearning4j channel?

The main issue used to be installation. Engineers in the community taught us a lot about how to write clearer instructions, and how to make the code and experience better. If we hadn’t had that feedback loop, Deeplearning4j would be worse. Open-source communities are amazing for quality control! The sooner you fix an issue, the fewer demands you get from the community about that issue. It’s a great incentive to move quickly.

Now the main issues are loading data and neural net tuning. We are working on communicating better about that, and on making the framework better, so that ETL and tuning get easier. Finally, there are a lot of basic questions about machine learning and deep learning. Many software engineers have figured out that deep learning and machine learning are really powerful tools, so they’re trying to grasp new ideas. We’ve written a lot of introductory material, and we point them to various web pages where those ideas are explained.

What common goals do you have as a community?

The community is centered around Deeplearning4j and our scientific computing library, ND4J, which powers the neural nets. So we answer questions about how to use the libs, and in the process, we help people understand more about deep learning in general. It’s not a deep learning hotline, unfortunately, so there are some questions we don’t tend to answer. But we do help engineers in the DL4J community build apps and understand how neural nets work. The common goal is to learn about deep learning, and to build cool shit. We’ve only seen the tip of the iceberg in terms of what deep learning can do. So far, we’ve seen huge advances in image recognition, machine translation, machine transcription and time series predictions. By many metrics, machine perception now equals or surpasses human perception, and that will change society in ways that are hard to imagine. Those changes just haven’t been implemented yet. So the secondary goal of the community is to bring this narrow form of AI into the world, so that it can make a difference.

Deeplearning4j: schematic overview

What are the most important factors that you have taken into account while creating and maintaining the community? What factors contribute to the success of your community?

Creating and maintaining a community is a huge commitment of time and effort. You have to be available, and you have to try to understand where other people are coming from. They don’t always know the jargon to ask precise questions, so you have to have the patience to figure out together with them what they’re trying to ask, or where they’re stuck. We’re not always as patient as we should be. Being available, making that effort, and offering support for powerful tools like this are a good way to build a community. When the makers of a big project are available to answer esoteric questions about how it works, that creates a lot of trust, because people know that you speak with authority and that if something is really broken, it’s going to get fixed. There’s a tight feedback loop between the community and the project creators.

What are the key challenges that you encounter while managing the community?

One of the challenges is: What questions do we care about, and what questions do people need to answer for themselves? If someone has really basic questions about Java, an IDE like IntelliJ, or a build tool like Maven, most of the time they need to figure that out for themselves. Our Gitter channel isn’t the right place to hash through that, although we do help in special cases, because sometimes you need to expand your heap space for neural nets to work.

You also have to find a balance between building the community and building the product. Ideally, you’d have a big team with full-time support engineers and the rest of the team working on the code base. But most open-source projects have very small teams. There are just a handful of people capable of support, and they’re the ones who also should be fixing bugs and adding features.

The promise of machine learning. Image source: Slideshare by Larry Brown, Ph.D

How do you encourage participants’ commitment and contribution to the community?

You create a smart, friendly environment in the community. You remind them you appreciate contributions, and you show them, as best you can, what needs to be worked on. We created top-level files recognizing our contributors, showing people how to contribute, and laying down the rules of the community. We also wrote a devguide, and we now label all issues as bug, enhancement or documentation, so that people can scan the queue quickly and explore where they can add something.

Tell us a little bit about the time commitment required to set up and establish the community. How much community maintenance is required on an ongoing basis?

Skymind is a distributed team, with engineers in Australia, Europe and the US, and Deeplearning4j community members in almost every time zone. There’s a Skymind engineer watching the Gitter queue probably 12–16 hours out of any weekday. This is a pretty serious commitment, because there are fewer than 10 of us. It’s not their full-time job, but maybe they’ll be running unit tests and answering questions on Gitter in their downtime.

Based on your experience, do you feel that the open source communities have changed and evolved over the past years? If so, how?

Open source is winning the enterprise stack, so it’s a lot more important than it used to be. The biggest organizations in the world are running on open-source software. Linux won the operating system; Hadoop won big data storage. And open source won because when you do it right, you get better code. More eyeballs mean more uptime. So the size of the OSS community, and the quality of attention that software engineers bring to open-source projects, have both increased over the years.

What advice would you give to someone who wants to start an online open source community from scratch?

First, build something neat, something you care about. Focus on building one thing that works. Then, share it with people. They will help you improve it, and they may help you think about what to build next. Don’t do too much big upfront development. Try to scope it so that you can ship in a reasonable amount of time. A few weeks, say. Open source is valuable because it’s a conversation, and the conversation leads you places, so that you and the project evolve in ways you can’t anticipate. Also, by open-sourcing early, you’re increasing your exposure and therefore your chances of getting help. We’ve had amazing developers join the community and the Skymind team.

What digital tools do you use to help manage and grow your community?

The code lives on GitHub, the conversation lives on Gitter. There are about 1360 devs on the Gitter channel now, so it’s probably one of the more lively neural net conversations on the planet. Our website is hosted on GitHub, so the content lives there, too. We generate a lot of automatic documentation with Javadoc (always a WIP…). We ask people to use Maven as their automated build tool. One of the biggest problems with any software is the install, and Maven helps make that a little easier. You need to constantly try to clear away obstacles, so that people can just use your code and not worry about other stuff.

Can you share a success story of a community member that happened thanks to their participation in your channel?

For most of the stories, you just had to be there. But in general, a lot of data scientists and Java engineers come, and they just build something for their companies that works. They’ll come back later and say: “We saw a 200% increase in ad coverage when we made DL4J part of the recommender system.” Another guy built an app with DL4J and then an investor saw it and he raised funds. So that’s all pretty cool. With open source, you’re throwing a rock out into the ocean, and you don’t always hear it hit the water. You can’t even see the ripples. So it’s encouraging when people come back and say “thanks” and tell us how it helped them. That makes it more meaningful.


You can visit the Deeplearning4j community on Gitter.