 
	
Network systems have long relied on data (traffic measurements, system configurations, demand predictions, etc.) for planning, engineering, operations, and troubleshooting. In the era of artificial intelligence (AI), data will drive network systems even more directly wherever AI methods are employed. A plethora of recent programs at NSF and across federal agencies, e.g., the National AI Research Institutes and a rich slate of other programs, will usher in exciting, innovative ideas for applying AI in network systems. Enabling this wave of research demands extraordinary new network data, as well as methods and policies for collecting them. AI researchers will explore new kinds of data, at unprecedented scopes, granularities, and scales, to revolutionize how network systems support the broad range of networked applications our society depends on. New application ideas will likely need new data, while new kinds of data will reveal new insights about the networks, applications, and their users, and will stimulate even more application ideas. In a future world where AI is pervasively deployed across the networks and applications society depends on, a good understanding of the data that will shape them, and subsequently shape us, is timely and critical. What data will be made available for AI network systems? How will they be collected and used? Which properties of such data will affect the networks and applications, and in what ways, in technical, legal, and ethical contexts? And, most importantly, how will the mindsets of AI researchers, developers, and our next generation of students evolve and transform as we head into the future?
The Workshop on Data for AI in Network Systems intends to facilitate a discussion among AI, networking, and security research communities on:
Those interested in attending must submit a whitepaper by September 18 to receive an invitation.
While these questions will most certainly be discussed in the context of various AI methods and algorithms, participants are encouraged to focus more on the logical relationships among data, system, and application objectives than on algorithm specifics. Participants are also encouraged to broaden the scope of consideration beyond network systems to include applications, systems (e.g., edge, cloud), and people, and to discuss AI solutions’ direct and indirect influences on stakeholders in society in technical, legal, and ethical contexts.
AI is used more successfully today for network security functions such as malware identification and mail filtering. Development of tools for these particular use cases has accelerated because researchers have been able to acquire large, openly shared data sets with which to develop and train their many ideas. For example, the publicly available EMBER (Elastic Malware Benchmark for Empowering Researchers) and SoReL-20M data sets contain millions of malware signatures with raw features.
AI and ML are used far less for other network functions. Aside from attempts to “AI-enable” certain legacy network functions, the “closed” nature of today’s network systems has meant that very few researchers have enough visibility into them to gain the information and motivation needed to explore new possibilities. The data available to AI researchers today are very limited, typically 1) bandwidth, 2) latency, and 3) n-tuples of source, destination, protocol, etc. This characterization of networks is not sufficient for use by future applications. To stimulate new AI ideas for network systems, it is important to consider objectives beyond the raw performance and economics of the infrastructure alone. Societal equity and ethical considerations are equally important for network AI research. Increasingly, errors and biases witnessed in AI applications can be traced back to training data that are not grounded in fairness, accountability, transparency, and ethics (FATE).
This workshop is funded by the NSF and organized by Anita Nikolich, Ron Hutchins, Kuang-Ching Wang, and Tho Nguyen.