Wikimedia wants to make it easier for you and AI developers to search through its data

The late English writer Douglas Adams is best known as the author of the 1979 book The Hitchhiker's Guide to the Galaxy. But there is much more to Adams than what is written in his Wikipedia entry. Whether or not you need to know that his birth sign is Pisces, or that libraries around the world catalog his books under the same identifier, 13230702, you can find out if you go to an often-overlooked corner of the Wikimedia Foundation called Wikidata.

There, images, text, keywords and other information related to Adams are stored as a web page and, for the robots among us, in formats intended for machines, such as JSON.
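As a rough illustration of the machine-readable side, the sketch below parses a trimmed, hand-written sample shaped like Wikidata's entity JSON. Q42 is Douglas Adams's Wikidata item, P31 is the "instance of" property and Q5 is the item for "human"; the sample record and the helper function names are made up for illustration, and a real record is far larger.

```python
import json

# Hand-written sample that mimics the shape of Wikidata entity JSON.
# It is heavily trimmed; a real record carries many more labels and claims.
SAMPLE = """
{
  "entities": {
    "Q42": {
      "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
      "claims": {
        "P31": [
          {"mainsnak": {"datavalue": {"value": {"id": "Q5"}}}}
        ]
      }
    }
  }
}
"""

def entity_url(qid: str) -> str:
    """URL from which Wikidata serves an item as JSON."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

def english_label(doc: dict, qid: str) -> str:
    """English display name of an item."""
    return doc["entities"][qid]["labels"]["en"]["value"]

def claim_targets(doc: dict, qid: str, prop: str) -> list:
    """IDs of the items a property points to, e.g. P31 (instance of)."""
    claims = doc["entities"][qid]["claims"].get(prop, [])
    return [c["mainsnak"]["datavalue"]["value"]["id"] for c in claims]

doc = json.loads(SAMPLE)
print(english_label(doc, "Q42"))
print(claim_targets(doc, "Q42", "P31"))
print(entity_url("Q42"))
```

The helpers only walk nested dictionaries, which is why this format suits programs but is clumsy for a language model that needs the meaning rather than the IDs.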

Now Wikidata is getting a new AI-friendly database that makes it easier for large language models to use the information. The database comes from the Wikidata Embedding Project by Wikimedia Deutschland, the German chapter of the Wikimedia Foundation that oversees Wikidata. Last year the Berlin-based team used a large language model to turn 19 million Wikidata entries from clunky structured data into vectors that capture the context and meaning around each Wikidata item.

In this vectorized format, the information is best pictured as a graph of points connected by lines: Adams would be linked to “person” as well as to the titles of his books, said Lydia Pintscher, Wikidata's portfolio lead.

While the user interface will remain the same (no, Wikipedia is not becoming a chatbot), the project's managers say the back-end changes will make the data easier for AI developers to access when building, for example, their own chatbots.

According to Pintscher, the purpose of the project is to level the playing field for AI developers outside the monied core of Big Tech. Companies such as OpenAI and Anthropic have the resources to vectorize Wikidata themselves, as Pintscher and her team did. It is the smaller outfits that benefit most from new, easier access to the curated data held in Wikidata's stores. “Really, for me, this is about giving them that advantage and at least giving them a chance, right?” Pintscher said.

She points to Govdirectory as an example of a project that has used Wikidata's vast data, which has always been curated by volunteers. The platform lets users find social media handles and email addresses for public officials around the world.

Most AI chatbots prioritize popular words and topics on the internet. According to Pintscher, in addition to giving smaller tech companies a leg up, the team hopes that easier access to Wikidata will lead to AI systems that better reflect niche topics that are not widely represented on the internet. This could be a better way to get information into ChatGPT than, for example, “to generate a ton of content and then wait for the next time ChatGPT crawls, and it may or may not take into account what you have contributed,” Pintscher said.

In practice, the vectors will allow AI systems to access the context around information in addition to the information itself, said Philippe Saadé, manager of the Wikidata AI project.
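To make this concrete, here is a minimal sketch of similarity search over vectors. The three-dimensional vectors, the item descriptions and the query are all invented for illustration; the project's real embeddings come from a language model and have far more dimensions.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
ITEMS = {
    "Douglas Adams (English writer)": [0.9, 0.1, 0.2],
    "The Hitchhiker's Guide to the Galaxy (book)": [0.8, 0.3, 0.1],
    "Pisces (zodiac sign)": [0.1, 0.9, 0.3],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(query, items, k=2):
    """Return the k item names whose vectors are most similar to the query."""
    ranked = sorted(items, key=lambda name: cosine(query, items[name]), reverse=True)
    return ranked[:k]

# Pretend this is the embedding of a user's question about a science-fiction author.
query = [1.0, 0.1, 0.25]
print(nearest(query, ITEMS))
```

Because related items sit close together in the vector space, a search for the author also surfaces his book, while the unrelated zodiac entry ranks last.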

The team used a model from the AI company Jina AI to turn Wikidata's structured data, captured up to September 18, 2024, into vectors. DataStax, an IBM company, currently provides the infrastructure to store the project's vector database for free.

The team is waiting for feedback from developers who use the database before updating it with information added over the past year. Although the current database does not include entirely new information added in the last year, Saadé says that small edits to existing Wikidata items will not reduce the database's usefulness. “In the end, the vector that we compute is like the general idea of the item, so if some small edit was made on Wikidata, it will not be super significant,” he said.
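Saadé's point can be illustrated with a deliberately crude stand-in. The sketch below uses plain word counts as the "embedding", which is an assumption made purely for demonstration and not the model the team used, and the three descriptions are invented. Even so, it shows the same effect: a small edit barely moves the vector, while an unrelated text lands far away.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in embedding: word counts. A real embedding model is far richer."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm

# Invented descriptions: an original, a lightly edited version, and an unrelated text.
original = "douglas adams english writer author of the hitchhiker's guide to the galaxy"
small_edit = "douglas adams english writer and humorist author of the hitchhiker's guide to the galaxy"
unrelated = "pisces is the twelfth sign of the zodiac"

sim_edit = cosine(embed(original), embed(small_edit))
sim_unrelated = cosine(embed(original), embed(unrelated))
print(f"original vs. small edit: {sim_edit:.2f}")
print(f"original vs. unrelated:  {sim_unrelated:.2f}")
```

The edited description stays close to the original, which is why a snapshot that is somewhat out of date can still be useful for search.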
