Barceló P, Baumgartner A, Dalmau V, Kimelfeld B. Regularizing Conjunctive Features for Classification. 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems
List of results published directly linked with the projects co-funded by the Spanish Ministry of Economy and Competitiveness under the María de Maeztu Units of Excellence Program (MDM-2015-0502).
List of publications acknowledging the funding in Scopus.
The record for each publication will include access to postprints (following the Open Access policy of the program), as well as datasets and software used. Ongoing work with UPF Library and Informatics will improve the interface and automation of the retrieval of this information soon.
The MdM Strategic Research Program has its own community in Zenodo for material available in this repository as well as at the UPF e-repository.
We consider the feature-generation task wherein we are given a database with entities labeled as positive and negative examples, and the goal is to find feature queries that allow for a linear separation between the two sets of examples. We focus on conjunctive feature queries, and explore two fundamental problems: (a) deciding whether separating feature queries exist (separability), and (b) generating such queries when they exist. In the approximate versions of these problems, we allow a predefined fraction of the examples to be misclassified. To restrict the complexity of the generated classifiers, we explore various ways of regularizing (i.e., imposing simplicity constraints on) them by limiting their dimension, the number of joins in feature queries, and their generalized hypertree width (ghw). Among other results, we show that the separability problem is tractable in the case of bounded ghw; yet, the generation problem is intractable, simply because the feature queries might be too large. So, we explore a third problem: classifying new entities without necessarily generating the feature queries. Interestingly, in the case of bounded ghw we can efficiently classify without ever explicitly generating the feature queries.
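To make the separability question concrete, here is a minimal sketch in Python; it is an illustration under stated assumptions, not the algorithm from the paper. The toy database, the relation names `Author` and `Cites`, the two feature queries, and the helpers `holds`, `feature_vector`, and `perceptron` are all hypothetical. Each feature query is a unary conjunctive query evaluated by brute force, each entity is mapped to a 0/1 vector recording which queries it satisfies, and a perceptron with an epoch cap stands in for an exact linear-separability test.

```python
from itertools import product

# Hypothetical toy database: relation name -> set of tuples.
DB = {
    "Author": {("ann", "p1"), ("bob", "p2"), ("cat", "p3")},
    "Cites": {("p2", "p1"), ("p3", "p1")},
}

def domain(db):
    """All values that occur anywhere in the database."""
    return sorted({v for tuples in db.values() for t in tuples for v in t})

def holds(db, atoms, free_var, entity):
    """Brute-force test: is `entity` an answer to the unary conjunctive query?

    `atoms` is a list of (relation, variables) pairs; every argument is a
    variable (constants are not supported in this sketch). Variables other
    than `free_var` are existentially quantified.
    """
    variables = sorted({v for _, args in atoms for v in args} - {free_var})
    for values in product(domain(db), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        assignment[free_var] = entity
        if all(tuple(assignment[a] for a in args) in db[rel]
               for rel, args in atoms):
            return True
    return False

def feature_vector(db, queries, entity):
    """One 0/1 coordinate per feature query."""
    return tuple(int(holds(db, atoms, "x", entity)) for atoms in queries)

def perceptron(vectors, labels, max_epochs=1000):
    """Return (w, b) strictly separating the labels, or None if no separator
    is found within max_epochs (a heuristic cap, not a proof)."""
    w, b = [0.0] * len(vectors[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(vectors, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b
    return None

# q1(x) :- Author(x, p), Cites(c, p)   "x wrote a paper that is cited"
# q2(x) :- Author(x, p), Cites(p, c)   "x wrote a paper that cites something"
queries = [
    [("Author", ("x", "p")), ("Cites", ("c", "p"))],
    [("Author", ("x", "p")), ("Cites", ("p", "c"))],
]

entities = ["ann", "bob", "cat"]
labels = [1, -1, -1]  # hypothetical positive/negative examples

vectors = [feature_vector(DB, queries, e) for e in entities]
separator = perceptron(vectors, labels)
```

With these labels, `separator` is a weight vector and bias that classify all three entities correctly. Relabeling `bob` as positive makes the examples inseparable under these two queries, because `bob` and `cat` receive the same feature vector; the separability problem in the abstract asks whether some choice of feature queries avoids such collisions. The brute-force evaluation is exponential in the number of variables, which is one reason restrictions such as bounded ghw matter.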