The ‘unsexy’ essential to making agricultural predictive models
Thursday, 2018/06/28 | 08:03:07

CIAT News, by Maria Eliza Villarino | Jun 5, 2018

Figure: Data cleaning includes filling missing values in data sets. Photo by: geralt (via Pixabay)

 

Data science involves creating models that can predict the future, such as what the yields will be for the next planting season. This work, arguably, is “sexy.”

 

There’s an aspect of data science, though, that is “unsexy” but a requisite to actually developing those predictive models.

 

It’s called data cleaning or data curation.

 

Depending on the quality of a data set, data cleaning takes up between 60 percent and 80 percent of a data scientist’s or a data analytics team’s time.

 

Take, for instance, the data sets that the CIAT data team worked with in the recently concluded 2018 Syngenta Crop Challenge in Analytics. The team took home the competition’s top prize, besting top-notch data scientists and teams from around the globe.

 

The contest’s organizers asked competitors to come up with models predicting the yield of hybrid maize using genetic and climate variability data sets that featured thousands of variables. According to Hugo Andres Dorado Betancourt, a member of the winning team, the data sets were large compared to what he and his teammates would deal with on a regular basis.

 

“We’re used to limited data, not those big volumes of data,” he said.

 

Dorado described the data sets as being of “good quality,” noting that Syngenta did a lot of “preprocessing,” which included using the same text casing and terms to label an entry.

 

Yet data cleaning still took the team two of the three months allotted to prepare its entry and submit it to the competition.

 

The team used those two months to fill missing data and determine which of the variables to keep in order to develop the model. To do this, they sought the help of experts within CIAT.
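
The article does not say which techniques the team used for these two steps, and it notes that variable selection relied on expert advice. Purely as an illustration, the sketch below assumes a simple rule (drop variables that are missing in more than half of the records) and median imputation for the rest. The column names, the sample values, and the 0.5 threshold are all hypothetical.

```python
from statistics import median

def drop_sparse_columns(rows, max_missing=0.5):
    """Keep only columns whose share of missing (None) values is at most max_missing."""
    columns = list(rows[0].keys())
    keep = [
        c for c in columns
        if sum(r[c] is None for r in rows) / len(rows) <= max_missing
    ]
    return [{c: r[c] for c in keep} for r in rows]

def fill_with_median(rows):
    """Replace each missing numeric value with the median of its column."""
    filled = [dict(r) for r in rows]
    for c in filled[0]:
        observed = [r[c] for r in rows if r[c] is not None]
        if not observed:
            continue  # nothing to impute from; leave the column untouched
        col_median = median(observed)
        for r in filled:
            if r[c] is None:
                r[c] = col_median
    return filled

# Hypothetical records; None marks a missing value.
rows = [
    {"rainfall_mm": 900.0, "soil_ph": 6.1, "sensor_x": None},
    {"rainfall_mm": None, "soil_ph": 5.8, "sensor_x": None},
    {"rainfall_mm": 1100.0, "soil_ph": None, "sensor_x": 3.0},
    {"rainfall_mm": 1000.0, "soil_ph": 6.3, "sensor_x": None},
]
cleaned = fill_with_median(drop_sparse_columns(rows))
```

In practice, a domain expert's judgment on which variables matter would replace or complement the fixed threshold used here.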

 

Ordinarily, however, the CIAT data team deals with agricultural data sets that have not been preprocessed the way Syngenta’s were, so cleaning them takes longer.

 

Daniel Jimenez, who leads the team, attributed the lengthy time needed to clean agricultural data to the lack of standardization of common terms used in the field.

 

Some, he said, would call “rainfall” what others would call “precipitation.” Or it would be “farmers” to some, and “producers” or “growers” to others. Some data set entries would have hyphens (-), while others would include the underscore symbol (_).
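
To make the problem concrete, here is a minimal sketch of how such labels might be harmonized in code. The synonym table and the choice of “rainfall” and “farmer” as canonical terms are assumptions made for illustration; they are not taken from any CGIAR ontology or tool.

```python
import re

# Hypothetical synonym table: maps variant labels to one canonical term.
CANONICAL_TERMS = {
    "precipitation": "rainfall",
    "farmers": "farmer",
    "producer": "farmer",
    "producers": "farmer",
    "grower": "farmer",
    "growers": "farmer",
}

def normalize_label(label):
    """Lowercase a label, unify separators to underscores, then apply synonyms."""
    key = re.sub(r"[\s\-_]+", "_", label.strip().lower())
    return CANONICAL_TERMS.get(key, key)

labels = ["Precipitation", "rainfall", " Growers ", "farm-size", "farm_size"]
normalized = [normalize_label(label) for label in labels]
```

A hand-written table like this only scales so far, which is one reason shared vocabularies matter.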

 

If terms in databases are not standardized, running a query (say, for the number of farmers in Nicaragua under 45 years of age who have some level of education and own a farm in a region with precipitation above 1,000 mm) would be impossible.
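
Once records share one schema, a query like the one described above becomes a short filter. The sketch below assumes a hypothetical record layout; the field names and sample values are invented for illustration and do not come from any real data set.

```python
from dataclasses import dataclass

@dataclass
class FarmerRecord:
    # Hypothetical standardized fields.
    country: str
    age: int
    education_years: int
    owns_farm: bool
    region_rainfall_mm: float

def count_matching(records):
    """Count farmers in Nicaragua under 45 with some education who own a farm
    in a region with more than 1,000 mm of rainfall."""
    return sum(
        1
        for r in records
        if r.country == "Nicaragua"
        and r.age < 45
        and r.education_years > 0
        and r.owns_farm
        and r.region_rainfall_mm > 1000
    )

records = [
    FarmerRecord("Nicaragua", 38, 6, True, 1250.0),
    FarmerRecord("Nicaragua", 52, 9, True, 1400.0),   # too old
    FarmerRecord("Nicaragua", 29, 0, True, 1300.0),   # no education recorded
    FarmerRecord("Honduras", 33, 8, True, 1500.0),    # different country
    FarmerRecord("Nicaragua", 41, 11, True, 800.0),   # region too dry
]
matches = count_matching(records)
```

The filter works only because every record uses the same field names and units, which is exactly what unstandardized data sets lack.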

 

“When you talk about FAIR [findable, accessible, interoperable, reusable] data, we should be able to run those queries and run them automatically. But we cannot do that because we have to, at the moment, manually clean the data,” Jimenez said. “It’s definitely a frustrating process.”

 

Within the CGIAR Platform for Big Data in Agriculture, there’s a group of experts working to standardize agricultural terms. Both the Ontologies Data community of practice and the Organize module have developed tools for data harmonization.

 

Beyond the CGIAR system, there’s still much work to be done.

 

“We have decades of agricultural data that used to be locked up that are now made available. Most of them are unstructured, so it’s kind of a mess,” Jimenez said. “That’s why we need to be more serious about standardization, to make data interoperable.”

 

See: http://blog.ciat.cgiar.org/the-unsexy-essential-to-making-agricultural-predictive-models/
