Creating a data-sharing standard
Microsoft has recently announced a plan to change that. Desiring a way to standardize the process and terms, Microsoft has published rough drafts of three data-sharing agreements for open use by organizations under Creative Commons licenses. These drafts cover specific data-sharing scenarios and are open for the next few months to feedback from the public.
The idea of pre-written agreements isn’t exactly novel, but Microsoft feels it can do better than what already exists. Microsoft Corporate Vice President and Chief IP Counsel Erich Andersen said that the company finds it particularly valuable to have terms in place for using data purely for AI-training purposes. Microsoft apparently found existing resources lacking in this regard, as two of the three agreement proposals it has published involve AI training.
The proposed agreements
The three currently published agreement proposals are:
Open Use of Data Agreement (O-UDA): As the title says, this proposal shares data with very few restrictions. It only requires those using the data to pass on source and disclaimer information if they redistribute it. This is an open data license for data with no personal information or ownership issues.
Computational Use of Data Agreement (C-UDA): This proposal specifies the use of the data shared, allowing “the data holder to make data available to anyone for computational use purposes, such as artificial intelligence, machine learning, and text and data mining,” according to the agreement’s overview (PDF).
Data Use Agreement for Open AI Model Development (DUA-OAI): This template targets the use of data for training AI even more specifically. The data itself is confidential, but the resulting AI model and code would be open source.
Open data for open innovation
Purposes for sharing data among organizations vary greatly, so these templates have limited scope. But Microsoft is planning to publish more as a contribution to the Open Data Initiative (ODI). The ODI was announced a year ago as a collaborative mission among Microsoft, Adobe, and SAP to improve customer experiences through optimizing their use of data. In fact, the mission statements of the ODI and these new templates have the same ring to them.
Microsoft cites Answer ALS as a primary example of why data sharing is important. Microsoft has invested $1 million in this project, adding Azure support to the code of the biomedical research platform Galaxy and providing cloud space for the billions of data points the nonprofit is collecting from participants. When the data set is complete, Microsoft will run it through AI analysis. The company will then make both the data and the insights completely available to researchers around the world who are trying to develop therapies and, they hope, a cure.
These and future proposed agreements could be a nice step toward quicker, smoother data sharing for nonprofits and commercial enterprises. What do you think of these proposals? What sort of agreements should Microsoft work on standardizing next? Let us know in the comments section, or on Facebook or Twitter.