SuRI 2014 Keynote Speaker
02/06/2014 @ 10:00 room BC420
From Sentiment Analysis to Topic Discovery
Bing LIU, University of Illinois at Chicago
Sentiment analysis (SA) or opinion mining is the computational study of people's opinions, sentiments, attitudes, and emotions expressed in written language. It is one of the most active research areas in natural language processing (NLP) and data mining. The popularity of SA is mainly due to two factors: seemingly unlimited applications and many challenging research problems. SA can be regarded as a semantic analysis problem, but it is also highly targeted and bounded because an SA system does not need to fully "understand" each sentence or document, but only needs to comprehend some aspects of it, e.g., positive/negative opinions and their targets. SA, however, does not seem to be just a sub-problem of NLP. It is more like a mini version of full NLP because SA touches upon every core issue of NLP.
However, the targeted and bounded nature of SA allows us to perform deeper language analyses and to gain much better insight into NLP than the general setting permits, where the complexity is simply overwhelming. In this talk, I will first define SA and discuss the current state of the art, and then go into the details of one recent study that aims to solve an SA problem but also contributes to machine learning in the areas of topic modeling, lifelong learning, and big data.
Organized by Boi Faltings
Workshop on Cyber Risk and Information Security
02/06/2014 @ 11:45 room BC420
Timing is Everything: Theoretical and Behavioral Results on Security Decision-Making in Continuous Time
Jens GROSSKLAGS, Penn State University
This talk addresses security and safety choices that involve a decision on the timing of security actions. Examples of such decisions include when to make backups, when to patch, when to update passwords, or when to check the correctness of personal financial account information. Similarly, organizations typically set policies for password updates and key renewals. In the first part of the talk, I will present a game-theoretic framework that allows for the study of the economically optimal mitigation timing in the presence of different types of attacks. Particular focus is given to sophisticated attacks that have limited observability from the defender's perspective, i.e., they remain covert. In the second part of the talk, I will report results from several experimental studies that explore performance in timing-related security situations. Results shed light on how our innate abilities and cognitive predispositions shape our security behaviors when timing is the critical decision-making dimension.
Organized by Jean-Pierre Hubaux
02/06/2014 @ 14:00 room BC420
Integer Safety
John REGEHR, University of Utah
Integer overflows can lead to security problems and other bugs. This talk examines some results from checking a large number of C/C++ applications for integer overflows and also discusses the design space for efficient integer safety in programming languages.
Organized by George Candea
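To illustrate the kind of check an integer-safety mechanism inserts, here is a small sketch (my own example, not from the talk) that models 32-bit signed arithmetic in Python, since Python integers are unbounded and never overflow on their own:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def checked_add(a: int, b: int) -> int:
    """Add two int32 values, raising instead of silently wrapping.

    Mirrors the precondition-style checks (cf. C's __builtin_add_overflow)
    that integer-safety tools insert before arithmetic operations.
    """
    result = a + b  # exact in Python; would wrap in 32-bit C
    if not (INT32_MIN <= result <= INT32_MAX):
        raise OverflowError(f"int32 overflow: {a} + {b}")
    return result

def wrapping_add(a: int, b: int) -> int:
    """What a 32-bit machine actually computes: two's-complement wraparound."""
    return (a + b + 2**31) % 2**32 - 2**31
```

For example, `wrapping_add(INT32_MAX, 1)` silently yields `INT32_MIN` (the classic wraparound bug), while `checked_add(INT32_MAX, 1)` raises an error instead.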
02/06/2014 @ 15:15 room BC420
Systematic Design of Minimalist Remote Attestation for Low-End Embedded Systems
Gene TSUDIK, University of California, Irvine
In light of recent remote infestation malware attacks on specialized embedded systems, remote attestation has become a very timely and popular research topic. Remote attestation is the process of securely verifying internal state of a remote hardware platform. It can be achieved either statically (at boot time) or dynamically, at run-time. Generally, software-based attestation methods lack concrete security guarantees, while hardware-based approaches involve dedicated security co-processors that are too costly for low-end devices.
In this line of work, we pursue the systematic design of a minimalist architecture based on bottom-up hardware/software co-design. Our work yields a simple, efficient and secure approach for establishing a dynamic root of trust in a remote embedded device. It is aimed at low-end micro-controller units (MCUs) that lack specialized memory management or protection features. It requires minimal changes to existing MCUs and assumes few restrictions on adversarial capabilities.
Organized by Jean-Pierre Hubaux
02/06/2014 @ 16:30 room BC420
Opportunities and challenges of autonomous cars
• Prof. Arnaud de la Fortelle, Director of Robotics Lab, Mines ParisTech
• Dr. John Scott, Chief Risk Officer, Zurich Global Corporate, Zurich Insurance Group
• Mr. Sylvain Glatz, Telecom Specialist, Swiss Federal Office of Communication (OFCOM / BAKOM)
• Mr. Victor Schlegel, Head Business Intelligence/Big Data Services, Swisscom AG
Technological developments have led to the anticipation that car manufacturers will be able to propose autonomous cars to the market, that public acceptance will be granted and that regulators and insurance companies will find ways to adapt regulation and insurance policies. But as modern vehicles become “smarter,” they may also be exposed to new types of risks.
This session will engage in an informed and open discussion about the opportunities and challenges involved in the development and deployment of autonomous cars on the Swiss and European markets.
Organized by Jean-Pierre Hubaux, Michaël Thémans, CRAG and IRGC
03/06/2014 @ 09:15 room BC420
A Visitor’s Guide to a Post-Privacy World
Ari JUELS, Cornell Tech, New York
Privacy, in today’s usual sense of confidentiality of sensitive personal information, is an historical anomaly. Until recently, it was commonplace for even the upper classes to live, die, and engage in the most intimate activities always under the eyes of those around them.
There is a considerable risk that we are drifting inescapably toward such a world again. The massive sweep and scale of information technology, however, will now come into play. All manner of personal data—location, health status, social contacts, perhaps even thoughts—may in coming decades be widely and continuously accessible by corporations, governments, friends, and acquaintances.
Today, the risk of abuse of sensitive personal information is typically addressed by ensuring that such information remains confidential. In a post-privacy world, it may be necessary instead to abandon most forms of privacy and aim instead at accountability. If it is impossible to prevent personal information from being leaked, it may be possible, at least, to ensure that it is used fairly. I’ll talk about what a post-privacy world might look like and how we might prepare to navigate it.
Organized by Jean-Pierre Hubaux
03/06/2014 @ 10:30 room BC420
Targeted cyber attacks: the new challenge in IT security
Levente BUTTYAN, Budapest University of Technology and Economics
Information stealing malware has been increasingly used in recent years in targeted cyber espionage activities. We will discuss Duqu and Flame in detail as examples, and touch upon several other malware used in similar attacks. What these attacks have in common is that they all targeted important organizations, they were able to remain undetected by traditional security mechanisms for years, and they used advanced infection techniques, often exploiting zero-day vulnerabilities in systems. After analysing some of the unique technical characteristics of these attacks, we identify the challenges that they represent for the computer security community. Among other things, we discuss the problems of information asymmetry between attackers and defenders, the problems with code signing, and the problems of sharing incident-related information.
Organized by Jean-Pierre Hubaux
03/06/2014 @ 11:45 room BC420
Usable Security: Are we nearly there yet?
Angela SASSE, University College London (UCL)
The number of systems and services that people interact with has increased rapidly over the past 20 years. Most of those systems and services have security controls, but until recently, the usability of those mechanisms was not considered. Research over the past 15 years has provided ample evidence that systems that are not usable are not secure either, because users make mistakes or devise workarounds that create vulnerabilities. In this talk, I will present an overview of the most pressing problems, and what research on usable security (often called HCISec in the US) has produced in response to this challenge. I will argue that much of the research to date has missed the point by focusing on improving user interfaces to security mechanisms, or trying to educate or 'nudge' users towards secure behaviour. I will demonstrate that usable and effective security controls need to have minimal user workload and low friction with productive activity.
Organized by Jean-Pierre Hubaux
03/06/2014 @ 14:00 room BC420
Medical Device Cyber Security: The First 164 Years
Kevin FU, University of Michigan, Electrical Engineering & Computer Science
Today, it would be difficult to find medical device technology that does not critically depend on computer software. Network connectivity and wireless communication have transformed the delivery of patient care. The technology often enables patients to lead more normal and healthy lives. However, medical devices that rely on software (e.g., drug infusion pumps, linear accelerators, pacemakers) also inherit the pesky cybersecurity risks endemic to computing. What's special about medical devices and cybersecurity? What's hype and what's real? What can history teach us? How are international standards bodies and the U.S. Food and Drug Administration draft guidance on cybersecurity affecting the global manufacture of medical devices? This talk will provide a glimpse into the risks, benefits, and regulatory issues for medical device cybersecurity and the innovation of trustworthy medical device software.
Organized by Jean-Pierre Hubaux
03/06/2014 @ 15:15 room BC420
Cybersecurity is a Mess; is there a Way Out?
Brian SNOW, former Technical Director of NSA
Reports of problems in cyber security continue to grow week by week. I will discuss, in part, what aspects are wrong in current security practice at the conceptual level, as well as at the implementation level. We also discuss WHY things are wrong, and HOW they might be fixed.
A non-exhaustive sample of topics that will be covered:
1. CONCEPTUAL issues such as the mismatch between human trust and cyber trust.
2. The differences between the straightforward effort needed to counter random failures as compared to the difficult effort to counter a generic malicious attack, and finally as compared to the near-impossible effort to counter a targeted malicious attack.
3. IMPLEMENTATION issues such as inadequate randomization, protocol errors, inadequate testing of system components, and absence of essential assurance processes (security "quality control").
4. System Architectural Design Components that, if not present, will almost guarantee system failure.
5. Characterizing possible successful paths ahead…
6. Remaining “hard problems”.
Organized by Arjen Lenstra
03/06/2014 @ 16:30 room BC420
The Unbearable Lightness of Being... Mistrustful
Moti YUNG, Google
Designing trusted systems that are strong and secure is a challenging problem. One of the realities we have come to recognize in the last few years is that attackers can be well financed, dedicated, and highly interested in breaking the global infrastructure (advanced persistent threats, organizations performing mass surveillance, etc.). The talk will concentrate (by examples) on issues related to trust, specifications, and implementations of primitives so that they achieve strong security properties on the one hand, and weak security on the other. What this means for secure systems overall will be briefly discussed as well.
Organized by Jean-Pierre Hubaux
04/06/2014 @ 09:15 room BC420
An Automated Social Graph De-anonymization Technique
George DANEZIS, University College London (UCL)
We discuss a generic and automated approach to re-identifying nodes in anonymized social networks, which allows fast security evaluation of novel anonymization techniques. It uses machine learning (decision forests) to match pairs of nodes in disparate anonymized sub-graphs. The technique uncovers artefacts and invariants of any black-box anonymization scheme from a small set of examples. Despite a high degree of automation, classification succeeds with significant true positive rates even when small false positive rates are sought. Our evaluation uses publicly available real-world datasets to study the performance of the technique against real-world anonymization strategies, namely the schemes used to protect datasets of the Data for Development (D4D) Challenge. We show the technique is effective even given few training examples, or training examples drawn from different social networks.
Organized by Jean-Pierre Hubaux
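The core idea is that structural invariants survive anonymization and can feed a pair classifier. The following sketch is my own toy construction (the talk uses learned decision forests; here a simple threshold stands in for the forest) showing how candidate node pairs across two anonymized graphs can be featurized and scored:

```python
# Hypothetical sketch: features come from structural invariants that
# survive anonymization (degrees, neighbour-degree profiles); a learned
# classifier (decision forests in the talk) would score candidate pairs.

def degree(graph, node):
    return len(graph.get(node, ()))

def pair_features(g1, n1, g2, n2):
    """Structural features for a candidate match (n1 in g1 vs n2 in g2)."""
    d1, d2 = degree(g1, n1), degree(g2, n2)
    nbr1 = sorted(degree(g1, v) for v in g1.get(n1, ()))
    nbr2 = sorted(degree(g2, v) for v in g2.get(n2, ()))
    # Pad profiles to equal length, then compare element-wise.
    k = max(len(nbr1), len(nbr2))
    nbr1 += [0] * (k - len(nbr1))
    nbr2 += [0] * (k - len(nbr2))
    profile_gap = sum(abs(a - b) for a, b in zip(nbr1, nbr2))
    return [abs(d1 - d2), profile_gap]

def score_match(features, threshold=2):
    """Stand-in for the trained decision forest: a low structural
    distance marks the pair as a likely re-identification."""
    return sum(features) <= threshold

# The same star graph under two different labelings:
g1 = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
g2 = {"x": ["y", "z"], "y": ["x"], "z": ["x"]}
hub_match = score_match(pair_features(g1, "a", g2, "x"))    # True
leaf_vs_hub = score_match(pair_features(g1, "a", g2, "y"))  # False
```

The hub of one graph matches the hub of the other, while hub-to-leaf pairs are rejected; a real forest would learn which invariants a given black-box anonymizer fails to destroy.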
04/06/2014 @ 10:30 room BC420
Can I trust my television?
Andrew CLARK, Royal Holloway, University of London
John Stewart, Vice President and Chief Security Officer of Cisco, reported in 2010 that by the end of the year each of us would have on average 5 IP-connected devices. He further predicted that by the end of 2013 we would each have 140 devices (a world population in excess of 1 trillion devices). One catalyst for this change is the exponential growth of IP-based appliances such as IPTV, Blu-ray players and even refrigerators. Combine this with smartphones and Wi-Fi enabled players and we realise that the very shape of our personal networks is changing. Did we reach Stewart’s predictions by the end of 2013?
In this talk we shall discuss a range of implications of this growth, including:
Firstly, the suggestion that this richness of potential forensic data should be an investigator’s dream – more data equals more information equals more chance of evidential corroboration – but with this explosion comes increased time to investigate and increased complexity of investigation.
Secondly, that for many years we have relied on a protective security model in which the end-point devices can exert some measure of control (antivirus, anti-malware, etc.), but these new appliances will not have the capability to undertake the same level of protection. If these appliances are insecure, what is the likelihood that they can be compromised for malicious use? Is there a risk that their owners may be accused of crimes that they did not commit? Is this the new “Trojan defence” vector?
Organized by Arjen Lenstra
04/06/2014 @ 10:30 room BC420
Dawn SONG, University of California, Berkeley
Organized by George Candea
04/06/2014 @ 11:45 room BC420
We Need Assurance
Brian SNOW, former Technical Director of NSA
Assurance processes and mechanisms demonstrate that a system meets a desired set of properties and only those properties, that its functions are implemented correctly, and that the assurances hold up (even in the presence of malice) through manufacturing, delivery, and life-cycle of the system.
Examples will be given of the need for Assurance processes (and of consequences if none are used) in six areas: operating systems, software modules, hardware features, systems engineering, third party testing, and legal constraints.
Organized by Arjen Lenstra
05/06/2014 @ 11:00 room BC420
Spatial Computing and the Triggered Instruction Control Paradigm
Joel EMER, Intel in Hudson
The historical improvements in the performance of general-purpose processors have long provided opportunities for application innovation. Word processing, spreadsheets, desktop publishing, networking and various game genres are just some of the many applications that have arisen because of the increasing capabilities and the versatility of general-purpose processors. Key to these innovations is the fact that general-purpose processors do not predefine the applications that they are going to run.
Currently, the capabilities of individual general-purpose processors are encountering challenges, such as diminishing returns in exploiting instruction-level parallelism and power limits. As a consequence, a variety of approaches are being employed to address this situation, including adding myriad dedicated accelerators. Unfortunately, while this improves performance it sacrifices generality. More specifically, the time, difficulty and cost of special purpose design preclude dedicated logic from serving as a viable avenue for application innovation.
There has recently been progress in addressing this dilemma between programmability and higher performance via an interesting middle ground between fully general-purpose computing and dedicated logic. Specifically, spatial computing, where the computation is mapped spatially onto an array of small programmable processing elements, addresses many of the cost-related liabilities of dedicated logic and is increasingly being applied to general computation problems. While field programmable gate arrays (FPGAs) are the best-known spatial computing platform, there are also a number of coarse-grained variants.
In this talk, we will examine the range of spatial computing alternatives and explore in more depth the concept of triggered instructions, a novel control paradigm for arrays of processing elements (PEs) aimed at exploiting spatial parallelism. Triggered instructions completely eliminate the program counter and allow programs to transition concisely between states without explicit branch instructions. They also allow efficient reactivity to inter-PE communication traffic. The approach provides a unified mechanism to avoid over-serialized execution, essentially achieving the effect of techniques such as dynamic instruction reordering and multithreading, which each require distinct hardware mechanisms in a traditional sequential architecture.
Organized by Babak Falsafi
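To make the "no program counter" idea concrete, here is a toy interpreter of my own construction (not the talk's architecture): each instruction carries a trigger predicate over the PE's state, and any instruction whose trigger holds may fire, so control flow emerges from state rather than from branches:

```python
# Toy model of the triggered-instruction idea: no program counter --
# each instruction has a trigger predicate, and a scheduler fires any
# instruction whose trigger is satisfied until none is ready.

def run_triggered_pe(instructions, state, max_steps=100):
    """Fire ready instructions until no trigger holds (or steps run out)."""
    for _ in range(max_steps):
        ready = [ins for ins in instructions if ins["trigger"](state)]
        if not ready:
            break
        ready[0]["action"](state)  # a real PE would arbitrate among ready ops
    return state

# Example: drain an input queue, accumulating a sum. There is no branch;
# the "loop" is simply a trigger that stays true while input remains.
program = [
    {"trigger": lambda s: bool(s["in_q"]),
     "action": lambda s: s.__setitem__("acc", s["acc"] + s["in_q"].pop(0))},
    {"trigger": lambda s: not s["in_q"] and not s["done"],
     "action": lambda s: s.__setitem__("done", True)},
]
final = run_triggered_pe(program, {"in_q": [3, 4, 5], "acc": 0, "done": False})
# final["acc"] == 12 and final["done"] is True
```

The same mechanism reacts naturally to inter-PE traffic: a trigger over an incoming channel's occupancy fires exactly when data arrives, with no polling branch.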
10/06/2014 @ 10:00 room BC420
System Support for Handling Transiency in Data Centers and Cloud Systems
Prashant SHENOY, University of Massachusetts
Modern distributed applications are built using the implicit assumption that the underlying data center servers will be stable and normally available, barring occasional faults. In many emerging scenarios, however, data centers and clouds only provide transient, rather than continuous, availability of their servers. Transiency in modern distributed systems arises in many contexts, such as green data centers powered using intermittent renewable sources, cloud platforms that provide lower-cost spot server instances which can be revoked from their users, or smart data centers that voluntarily curtail their energy usage when signaled by the smart electric grid.
In this talk, I will argue for treating transiency as a first-class design concern when building modern distributed systems and applications. I will present bounded virtual machine migration as a general mechanism for handling transiency in a broad class of systems and discuss how such a mechanism can be employed to design higher-level techniques to handle transiency in data centers and cloud systems. I will end with some preliminary results that demonstrate the benefits and feasibility of our approach.
Organized by Babak Falsafi
10/06/2014 @ 11:15 room BC420
Interactive Theorem Proving: The Isabelle Experience
Tobias NIPKOW, Technical University of Munich
The dream of machines that prove mathematical theorems automatically goes back to the beginnings of computer science. It has since become a reality, although we had to replace "automatic" with "interactive" to make it work. Modern Interactive Theorem Provers (or Proof Assistants) are like programming environments, except that they help us write proofs instead of programs. Crucially, they check the proofs for logical correctness and (try to) bridge gaps in the argument by constructing missing subproofs automatically.
This talk will present an overview of the field from the perspective of the Isabelle proof assistant. I will talk about a range of applications from the proof of the Kepler Conjecture via the analysis of programming languages and operating systems to the role of proof assistants in teaching. A demo of the Isabelle system will help the audience get a feeling for what is involved when trying to convince a machine that some theorem is true.
Organized by Viktor Kuncak
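To give a flavor of what convincing a machine involves, here is a tiny machine-checked proof, written in Lean rather than Isabelle purely for illustration (example mine): the proof assistant accepts it only because every step, down to the base case of the induction, is formally justified.

```lean
-- A machine-checked proof that 0 is a left identity for addition on
-- the naturals, by induction; the assistant verifies each step.
theorem zero_add' (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl                        -- base case: 0 + 0 = 0 by computation
  | succ k ih => rw [Nat.add_succ, ih] -- step: 0 + (k+1) = (0+k)+1 = k+1
```

Isabelle proofs have the same character: the user supplies the structure (here, induction) and the system checks or discharges the remaining gaps.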
10/06/2014 @ 15:15 room BC420
From Software to Circuits: Open Source High-Level Synthesis for FPGA-based Processor/Accelerator Systems
Jason ANDERSON, University of Toronto
In this talk, we will describe a high-level synthesis tool, called LegUp, being developed at the University of Toronto. LegUp accepts a standard C program as input and automatically compiles the program to a hybrid architecture containing an FPGA-based MIPS soft processor and custom hardware accelerators that communicate through a standard bus interface. Results show that the tool produces hardware solutions of comparable quality to a commercial high-level synthesis tool. LegUp, along with a set of benchmark C programs, is open source and freely downloadable (www.legup.org), providing a powerful platform that can be leveraged for new research on a wide range of high-level synthesis topics. The tool has been downloaded by over 1000 groups from around the world since its initial release in March 2011. The talk will overview LegUp's current capabilities, as well as current research directions underway.
Organized by Paolo Ienne
11/06/2014 @ 14:00 room BC420
Interactive Machine Learning via Adaptive Submodularity
Andreas KRAUSE, ETH Zürich
How can people and machines cooperate to gain insight and discover useful information from complex data sets? A central challenge lies in optimizing the interaction, which leads to difficult sequential decision problems under uncertainty. In this talk, I will introduce the new concept of adaptive submodularity, generalizing the classical notion of submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy.
The concept allows us to recover, generalize and extend existing results in diverse domains, including active learning, resource allocation and social network analysis. I will show results on several real-world applications, ranging from interactive content search and image categorization in citizen science to biodiversity monitoring via conservation drones.
Organized by Rüdiger Urbanke
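The greedy policy at the heart of these guarantees is simple to state. Below is an illustrative sketch (my own example, on a set-cover-style coverage objective, which is submodular; the adaptive version would additionally condition each pick on observations made so far):

```python
# Greedy maximization of a submodular objective: repeatedly pick the
# element with the largest marginal gain. For (adaptive) submodular
# problems this policy is provably competitive with the optimal one.

def coverage(selected, sets):
    """Submodular objective: number of items covered by chosen sets."""
    covered = set()
    for s in selected:
        covered |= sets[s]
    return len(covered)

def greedy(sets, budget):
    chosen = []
    for _ in range(budget):
        gain = lambda s: coverage(chosen + [s], sets) - coverage(chosen, sets)
        best = max((s for s in sets if s not in chosen), key=gain)
        if gain(best) == 0:  # no further gain possible
            break
        chosen.append(best)
    return chosen

sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}
chosen = greedy(sets, 2)  # picks "C" (covers 4 items), then "A" (adds 3)
```

Note that the second pick is evaluated by its *marginal* gain given the first: "B" and "D" lose most of their value once "C" is chosen, which is exactly the diminishing-returns structure submodularity formalizes.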
13/06/2014 @ 17:00 room BC420
A retrospection of social media analytics
Nick KOUDAS, University of Toronto
This talk will provide an overview of the work that we have been conducting over the last ten years on collecting, storing and analyzing social media data. I will present an overview of the BlogScope system, an early social media analytics platform; Grapevine, a social news system; and Peckalytics. In each case we will outline the main challenges and the technology we developed to address them. These include both algorithmic challenges as well as system and design optimizations that enabled us to support real-time analysis at scale.
Organized by Matthias Grossglauser
16/06/2014 @ 16:30 room BC420
Social Learning in Decision-Making Groups
Vivek GOYAL, Boston University
People have always been influenced by the opinions of their acquaintances. Increasingly, through recommendations and ratings provided on all sorts of goods and services, people are also influenced by the opinions of people who are not even acquaintances. This ubiquity of the sharing of opinions has intensified the interest in the concept of herding (or informational cascades) introduced in 1992. While agents in most previous works have only individualistic goals, this talk focuses on social influence among agents in two collaborative settings.
We consider agents that perform Bayesian binary hypothesis testing and, in addition to their private signals, observe the decisions of earlier-acting agents. In the first setting, each decision has its own corresponding Bayes risk. Each agent affects the minimum possible Bayes risk for subsequent agents, so an agent may have a mixed objective including her own Bayes risk and the Bayes risks of subsequent agents; we demonstrate the tension between being informative to other agents and being right in her own decisions, and we show that she is more informative to others when she is open minded. In the second setting, opinions are aggregated by voting, and all agents aim to minimize the Bayes risk of the team's decision. We show that social learning is futile when the agents observe conditionally independent and identically distributed private signals (but not merely conditionally independent private signals) or when the agents require unanimity to make a decision. Our experiments with human subjects suggest that when opinions of people with equal qualities of information are aggregated by voting, the ballots should be secret. They have also raised questions about rationality and trust.
Organized by Martin Vetterli
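A minimal simulation helps fix the voting setting. The sketch below is my own toy model, not the talk's: agents observe conditionally i.i.d. Gaussian signals about a binary hypothesis, each votes based on its private signal alone (the Bayes-optimal rule here is the sign of the signal), and the team decides by majority; aggregating independent, equal-quality votes beats any single agent.

```python
import random

# Toy model (assumptions mine): N agents, conditionally i.i.d. private
# signals N(+1, sigma) under H1 and N(-1, sigma) under H0, equal priors.
# Each agent's Bayes-optimal private decision is the sign of its signal;
# the team aggregates by simple majority vote.

def simulate(n_agents=11, sigma=1.5, trials=2000, seed=1):
    rng = random.Random(seed)
    solo_correct = team_correct = 0
    for _ in range(trials):
        truth = rng.choice([0, 1])
        mean = 1.0 if truth else -1.0
        votes = [1 if rng.gauss(mean, sigma) > 0 else 0
                 for _ in range(n_agents)]
        solo_correct += votes[0] == truth                     # one agent alone
        team_correct += (sum(votes) > n_agents / 2) == truth  # majority vote
    return solo_correct / trials, team_correct / trials

solo, team = simulate()
```

In this simplified model the votes stay informative because agents ignore each other's decisions; the talk's point is subtler: once rational agents condition on earlier votes, social learning can become futile under conditionally i.i.d. signals, which is one argument for secret ballots.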
18/06/2014 @ 14:00 room BC420
State-of-the-art on Cryptanalysis of the SHA Family of Hash Functions
Yu SASAKI, NTT - JAPAN
SHA-2 is a family of hash functions which is internationally standardized and widely used all over the world. NIST has decided to continue using SHA-2 even after the release of the new hash function standard SHA-3. Against this background, the security analysis of SHA-2 deserves much attention. In this talk, the history of cryptanalysis on SHA-2 is reviewed with respect to the following points.
1. Dedicated cryptanalysis on SHA-2, such as collision attacks, preimage attacks, and various types of distinguishers.
2. Generic attacks on a class of hash functions including the design of SHA-2.
3. Dedicated cryptanalysis on SHA-2 in keyed usage, such as MAC.
Organized by Serge Vaudenay
19/06/2014 @ 14:00 room BC420
Bitcoin contracts --- digital economy without lawyers?
Stefan DZIEMBOWSKI, University of Warsaw
Bitcoin is a digital currency system introduced in 2008 by an anonymous developer using the pseudonym "Satoshi Nakamoto". Despite its mysterious origins, Bitcoin became the first cryptographic currency to be widely adopted --- as of May 2014 the Bitcoin capitalization is over 5 billion euros. Bitcoin owes its popularity mostly to the fact that it has no central authority, the transaction fees are very low, and the amount of coins in circulation is restricted, which in particular means that nobody can "print" money to generate inflation. The financial transactions between the participants are published on a public ledger maintained jointly by the users of the system.
One of the very interesting, but slightly less known, features of Bitcoin is that it allows for more complicated "transactions" than simple money transfers between participants: very informally, in Bitcoin it is possible to "deposit" some amount of money in such a way that it can be claimed only under certain conditions. These conditions are written in the form of "Bitcoin scripts" and in particular may involve timing constraints. This property allows the creation of so-called "contracts", where a number of mutually distrusting parties engage in a Bitcoin-based protocol to jointly perform some task. The security of the protocol is guaranteed purely by the properties of Bitcoin, and no additional trust assumptions are needed. This feature can have several applications in the digital economy, such as assurance contracts, escrow and dispute mediation, rapid micropayments, and multiparty lotteries.
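To make the "deposit claimable only under conditions" idea concrete, here is an illustrative model in Python, not real Bitcoin Script: a deposit that the payee can claim by revealing a secret's hash preimage, and that the payer can refund after a timeout, which is the shape of a hash-time-locked contract.

```python
import hashlib

# Illustrative model (not actual Bitcoin Script): a spending condition
# combining a hash-preimage path for the payee with a time-locked
# refund path for the payer.

def make_deposit(payer, payee, secret_hash, timeout):
    def can_spend(claimer, now, preimage=None):
        # Claim path: payee proves knowledge of the secret.
        if claimer == payee and preimage is not None:
            return hashlib.sha256(preimage).hexdigest() == secret_hash
        # Refund path: payer recovers the funds once the time lock expires.
        if claimer == payer and now >= timeout:
            return True
        return False
    return can_spend

secret = b"opening move"
spend = make_deposit("alice", "bob",
                     hashlib.sha256(secret).hexdigest(), timeout=100)
spend("bob", now=10, preimage=secret)   # True: correct preimage
spend("alice", now=10)                  # False: time lock still active
spend("alice", now=100)                 # True: refund after timeout
```

In real Bitcoin the condition is enforced by every node validating the script, so neither party needs to trust the other, which is precisely what makes lawyer-free "contracts" possible.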
In this talk I will give a short introduction to this area, present some recent results, and highlight future research directions.
Organized by Serge Vaudenay
19/06/2014 @ 16:30 room BC420
Inferring application performance regardless of data completeness
Alessandra SALA, Bell Labs Ireland
Modern communication networks, such as online social networks and call networks, give us a unique opportunity to observe, analyze and better understand human behaviors. In the telecommunications industry, user data is considered a precious source of information for novel insights that can drive the design of future communication platforms. Unfortunately, in the presence of factors such as increasing privacy awareness, restrictions on application programming interfaces (APIs) and constrained sampling strategies, analyzing complete datasets is often unrealistic. For instance, partial network views are basically the default in telco analytics, as customers typically have frequent contacts with customers of other providers - which naturally cannot be observed; and accurately inferring user activity is the Holy Grail of mobile advertisement and targeted service offerings, because privacy restrictions usually do not allow the logging of complete URLs.
This talk discusses the potential and risks of mining partial data through the analysis of two specific use cases. In the first use case, we unveil the hidden effects in the evaluation of marketing campaigns in social networks when the spread of information is estimated from a partial view of the network. The proposed methodology is able to quantify the error introduced by network partiality based on a theoretical oracle scenario, and to correct for the introduced error to a large extent. In the second use case, we show an approach to mine mobile web traces from heavily truncated URLs and infer user activities with high accuracy. Truncated URLs are trimmed of information such as location or purchased products, to mask possibly sensitive end-user data. Furthermore, URLs derived from real web traces are highly noisy because they are dominated by unintentional web traffic such as advertisement, web analytics or third-party scripts. We have developed a statistical model to segregate the representative URLs characterizing user activities from unintentional web traffic, and demonstrated that our approach classifies user activities with 92% accuracy.
Organized by Matthias Grossglauser
20/06/2014 @ 10:00 room BC420
Saving DVFS: Software Decoupled Access-Execute
Stefanos KAXIRAS, Uppsala University, Sweden
The end of Dennard scaling is expected to shrink the range of DVFS in future nodes, limiting the energy savings of this technique. This work evaluates how much we can increase the effectiveness of DVFS by using a software decoupled access-execute approach. Decoupling the data access from execution allows us to apply optimal voltage-frequency selection for each phase and therefore improve energy efficiency over standard DVFS.

The underlying insight of our work is that by decoupling access and execute we can take advantage of the memory-bound nature of the access phase and the compute-bound nature of the execute phase to optimize power efficiency, while maintaining good performance. To demonstrate this we built a task-based parallel execution infrastructure consisting of a runtime system to orchestrate the execution, compiler infrastructure to automatically convert programs to Decoupled Access-Execute, and a modeling infrastructure based on hardware measurements to simulate zero-latency, per-core DVFS.

Based on real hardware measurements, we project that the combination of decoupled access-execute compiled programs and DVFS has the potential to improve EDP by 25% without hurting performance. On memory-bound applications we significantly improve performance due to increased MLP in the access phase and ILP in the execute phase. Furthermore, we demonstrate that our method can achieve high performance both in the presence and absence of a hardware prefetcher. We are now expanding our work to target serial programs using a combination of compiler techniques and hardware features.
Organized by Babak Falsafi
20/06/2014 @ 14:00 room BC420
Entity-Centric Data Management
Philippe CUDRE-MAUROUX, University of Fribourg
Until recently, structured (e.g., relational) and unstructured (e.g., textual) data were managed very differently: Structured data was queried declaratively using languages such as SQL, while unstructured data was searched using boolean queries over inverted indices. Today, we witness the rapid emergence of entity-centric techniques to bridge the gap between different types of content and manage both unstructured and structured data more effectively. I will start this talk by giving a few examples of entity-centric data management. I will then describe two recent systems that were built in my lab and revolve around entity-centric data management techniques: ZenCrowd, a socio-technical platform that automatically connects HTML pages to semi-structured entities, and TripleProv, a scalable, efficient, and provenance-enabled back-end to manage graphs of entities.
Organized by Christoph Koch
20/06/2014 @ 15:15 room BC420Microfacet BRDF modelsLionel SIMONOT, Université de Poitiers, France
The volume and surface scattering of a material is usually defined at a macroscopic scale by its bidirectional reflectance distribution function (BRDF), which describes the fraction of light reflected for different pairs of incoming and outgoing light directions. It is widely used in remote sensing, materials science, lighting simulation, and computer graphics.
Microfacet BRDF models are based on a geometrical description of the material surface as a collection of microfacets whose dimensions are much greater than the wavelength. The resulting BRDF depends on the microfacet slope distribution, the shadowing-masking function, and the BRDF of the individual facets. In the original Cook-Torrance model, and in most cases even today, each facet is assumed to reflect light only in the specular direction. The third factor of the model is then given by the Fresnel relation, which depends on the material's refractive index.
We will present a generalization of microfacet models to any radiometric response of the individual facets. Some crucial points (the radiometric derivation of the model, the choice of the slope distribution and of the shadowing-masking function, numerical calculations, etc.) will be discussed. We also propose to consider facets consisting of a flat interface on a Lambertian background. The Oren-Nayar model (a distribution of strictly Lambertian facets) and the Cook-Torrance model (a distribution of strictly specular facets) then appear as special cases.Organized by Roger Hersch
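To make the structure of such models concrete, here is a minimal sketch of a specular microfacet BRDF in the Cook-Torrance spirit: a slope distribution D, a shadowing-masking term G, and a Fresnel factor F, combined as D·G·F / (4 cos θi cos θo). The Beckmann distribution, the Schlick Fresnel approximation, and the numeric defaults are illustrative assumptions, not necessarily the exact forms discussed in the talk.

```python
import math

def d_beckmann(cos_h, m):
    """Beckmann microfacet slope distribution (roughness m); one common
    choice of distribution, used here for illustration."""
    c2 = cos_h * cos_h
    tan2 = (1.0 - c2) / c2
    return math.exp(-tan2 / (m * m)) / (math.pi * m * m * c2 * c2)

def fresnel_schlick(cos_ih, f0):
    """Schlick approximation to the Fresnel reflectance of a facet
    (f0 = reflectance at normal incidence, set by the refractive index)."""
    return f0 + (1.0 - f0) * (1.0 - cos_ih) ** 5

def g_cook_torrance(cos_i, cos_o, cos_h, cos_ih):
    """Cook-Torrance shadowing-masking (geometric attenuation) term."""
    return min(1.0, 2.0 * cos_h * cos_i / cos_ih, 2.0 * cos_h * cos_o / cos_ih)

def brdf_specular(cos_i, cos_o, cos_h, cos_ih, m=0.3, f0=0.04):
    """Specular microfacet BRDF: D * G * F / (4 cos_i cos_o)."""
    return (d_beckmann(cos_h, m)
            * g_cook_torrance(cos_i, cos_o, cos_h, cos_ih)
            * fresnel_schlick(cos_ih, f0)) / (4.0 * cos_i * cos_o)
```

The generalization the talk describes replaces the Fresnel-only facet response (the third factor) with an arbitrary per-facet BRDF, e.g. a flat interface over a Lambertian background.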
Joint workshop with the Hong Kong University of Science and Technology
16/06/2014 @ 09:00 room BC420Probabilistic Graphical Models for Aggregating Crowdsourced DataDit-Yan YEUNG, The Hong Kong University of Science and Technology, Hong Kong
Crowdsourcing is a problem-solving approach which takes advantage of the wisdom of the crowd by enlisting a crowd of contributors to get a task done. Some major challenges in crowdsourcing include how to evaluate the contributors and their contributions in the absence of ground-truth information and how to aggregate the contributions of multiple contributors with the goal of outperforming any single contributor. We consider two novel crowdsourcing tasks in this talk. The first one is a video annotation task which seeks to track an object of interest as it moves around in a video. Unlike the relatively simple classification and regression tasks considered by most existing crowdsourcing methods, video annotation is significantly more complex because it involves structured time series data. The second task is peer grading which is especially crucial to the grading of open-ended assignments in massive open online courses (MOOCs). This task is unique in that the peer graders (contributors) are themselves students whose submissions are graded by their peers. We propose probabilistic graphical models for these two crowdsourcing tasks. While the models are different due to the nature of the tasks, a subtle similarity is that learning the reliability of each contributor from data plays a central role in both machine learning models.Organized by Boi Faltings
16/06/2014 @ 10:15 room BC420When Augmented Reality Meets Big DataPan HUI, The Hong Kong University of Science and Technology, Hong Kong
With computing and sensing woven into the fabric of everyday life comes an era in which we are awash in a flood of data. Augmented reality (AR) can collect a tremendous amount of data from the real world and display it in a very natural manner. It enables us to blend information from our senses and the digitized world in myriad ways that were not possible before. Users can interact with information without being distracted from the real world, while we collect and analyze the growing torrent of big data about user engagement within our personal mobile and wearable devices. Combining augmented reality and big data is mutually beneficial: augmented reality provides a novel way to visualize and render big data, while big data offers rich information for augmented reality to breed new applications such as personalized recommendation and augmented driving. In this talk, we explore the potential to capture value from the integration of augmented reality and big data, followed by several challenges that must be addressed to fully realize this potential and several applications that we have developed in the HKUST-DT System and Media Lab.Organized by Boi Faltings
16/06/2014 @ 11:30 room BC420Constructing Programs from Examples and other SpecificationsViktor Kuncak, EPFL
I will present techniques my research group has been developing to transform reusable software specifications, suitable for users and designers, into executable implementations, suitable for efficient execution. I will outline deductive synthesis techniques that transform input/output behavior descriptions (such as examples, postconditions, and invariants) into conventional functions from inputs to outputs. We have applied these techniques to synthesize recursive operations on functional data structures.Organized by Boi Faltings
16/06/2014 @ 14:00 room BC420Object Recognition in a Sunny or Cloudy DayChi-Keung TANG, The Hong Kong University of Science and Technology, Hong Kong
In the first part of the talk we will address the following problem: given a single outdoor image, label the image as either sunny or cloudy. Never adequately addressed, this two-class labeling problem is by no means trivial given the great variety of outdoor images. In the second part we will report our progress on Imagenet Large Scale Visual Recognition Challenge 2014.Organized by Boi Faltings
16/06/2014 @ 15:15 room BC420Phylogenetic Transfer of Knowledge: Beyond Comparative ApproachesBernard MORET, EPFL
Advances in biotechnology have enabled researchers to study molecular biology from the point of view of systems, from focused efforts at functional annotation to the study of pathways, regulatory networks, protein-protein interaction networks, etc. However, direct observation of these systems has proved difficult, time-consuming, and often unreliable. Thus computational methods have been developed to infer such systems from high-throughput data, such as sequences, gene expression levels, ChIP-Seq signals, etc. For the most part, these methods have not yet proved accurate and reliable enough to be used in automated analysis pipelines. Most methods used to infer biological networks rely on data for a single organism; a few attempt to leverage existing knowledge about some related organisms. Today, however, we have data about a large variety of organisms as well as good consensus about the evolutionary relationships among these organisms, so that the latter can be used to integrate the former in a well founded manner, thereby gaining significant power in the analysis. We have coined the term Phylogenetic Transfer of Knowledge (PTK) for this approach to inference and analysis. A PTK analysis considers a family of organisms with known evolutionary relationships and "transfers" biological knowledge among the organisms in accordance with these relationships. The output of a PTK analysis thus includes both predicted (or refined) target data (such as networks) for the extant organisms and inferred details about their evolutionary history. While a few ad hoc inference methods used a PTK approach almost a dozen years ago, we first provided a global perspective on such methods just six years ago. The last few years have seen a significant increase in research in this area, as well as new applications. We review the general approach, show the type of improvement such methods afford over pairwise comparisons, and discuss remaining challenges.
This is joint work with Dr. Xiuwei Zhang, EBI, Hinxton, UK.Organized by Boi Faltings
17/06/2014 @ 09:00 room BC420Faster Machine Learning on Big Data SetsJames KWOK, The Hong Kong University of Science and Technology, Hong Kong.
On big data sets, it is often challenging to learn the parameters of a machine learning model. A popular technique is stochastic gradient descent, which computes the gradient at a single sample instead of over the whole data set. An alternative is distributed processing, which is particularly natural when a single computer cannot store or process the whole data set. In this talk, some recent extensions will be presented. For stochastic gradient, instead of using the information from only one sample, we incrementally approximate the full gradient by also reusing old gradient values from the other samples. The method enjoys the same computational simplicity as existing stochastic algorithms, but converges faster. As for existing distributed machine learning algorithms, they are often synchronized, so the system can move forward only at the pace of the slowest worker. I will present an asynchronous algorithm that requires only partial synchronization, so that updates from the faster workers can be incorporated more often by the master.Organized by Boi Faltings
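The incremental full-gradient approximation described above is in the spirit of stochastic-average-gradient methods: keep the last gradient computed for each sample and step along the running average. The sketch below illustrates the idea on a toy 1-D least-squares problem; it is an assumed illustration, not the speaker's exact algorithm.

```python
import random

def sag_least_squares(xs, ys, lr=0.001, epochs=1000, seed=0):
    """Stochastic-average-gradient-style sketch for 1-D least squares:
    each iteration evaluates the gradient at one random sample but steps
    along the average of the most recent gradient seen for every sample,
    so the per-step cost matches plain SGD while the search direction
    approximates the full gradient."""
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    grads = [0.0] * n   # last gradient evaluated for each sample
    g_sum = 0.0         # running sum of the gradient table
    for _ in range(epochs * n):
        i = rng.randrange(n)
        g_new = 2.0 * xs[i] * (w * xs[i] - ys[i])  # d/dw of (w*x_i - y_i)^2
        g_sum += g_new - grads[i]                  # refresh sample i's entry
        grads[i] = g_new
        w -= lr * g_sum / n                        # step along averaged gradient
    return w
```

On data generated by y = 3x, the iterate converges toward the least-squares solution w = 3 while touching only one sample per update.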
17/06/2014 @ 10:15 room BC420Spatial Crowdsourcing over Big Data, Challenges and OpportunitiesLei CHEN, The Hong Kong University of Science and Technology, Hong Kong
As one of the most successful forms of harnessing the wisdom of the crowd, crowdsourcing has been widely used for many inherently human tasks, such as image labeling, natural language understanding, market prediction, and opinion mining. Meanwhile, with advances in pervasive technology, mobile devices such as phones, tablets, and PDAs have become extremely popular. These devices can work as sensors to collect various types of data, such as pictures, videos, and text. In crowdsourcing, a requester can therefore utilize the power of mobile devices and their location information to ask for resources related to a specific location: mobile users who take the task travel to that place, capture the data (video, audio, or pictures), and send it to the requester. This type of crowdsourcing is called spatial crowdsourcing. Due to the rapid growth in mobile device use and the functionality these devices provide, spatial crowdsourcing is poised to become even more popular than general crowdsourcing platforms such as Amazon Mechanical Turk and CrowdFlower.
In this talk, I will first briefly review the history of crowdsourcing and discuss the key issues related to it. Then, I will demonstrate the power of spatial crowdsourcing with our recently developed system, gMission. Finally, I will highlight challenges and research opportunities in spatial crowdsourcing over Big Data.Organized by Boi Faltings
17/06/2014 @ 11:30 room BC420Visualizing the Invisible: Recognizing and Visualizing Emotions in Event-Related TweetsPearl Pu, EPFL
Spectators are increasingly using social platforms to comment about big public events such as sports games and political debates. The quantity of such data is too overwhelming to be processed by a human. During the 2012 Olympic games, 150 million tweets were generated on Twitter alone.
To understand the public's perception of these events, it is important to recognize the subjective content revealed in such "big data". This has motivated us to develop a system that automatically detects and visualizes the patterns and trends of user sentiments as expressed in their comments, and how those sentiments evolve over time. Previous work in opinion mining has addressed some of these issues, but most of it identifies only two categories of emotion, positive and negative, leaving a more detailed and insightful analysis to be desired.
In this talk, I describe EmotionWatch, a data mining and visualization tool that helps people make sense of spectators' emotional reactions to public events using a fine-grained, multi-category emotion model.Organized by Boi Faltings
17/06/2014 @ 14:00 room BC420Anisotropic Path ProblemsSiu-Wing CHENG, The Hong Kong University of Science and Technology, Hong Kong
Finding shortest paths is a classical geometric optimization problem, but relatively little is known about cost models that depend on the travel direction, compared with the standard Euclidean case or the isotropic weighted-region case. Yet there are everyday examples in which travel direction matters: the strength and direction of wind, current, or another force field may need to be considered, and when planning a roadway or hiking on a terrain, it may be impossible to ascend or descend slopes that are too steep, so the cost of a subpath may depend on its slope. In this talk, we discuss some of the algorithmic results that we have obtained in recent years, including approximation algorithms for anisotropic path problems in the plane, on a terrain, and on polyhedral surfaces. In particular, our results allow us to find an approximate shortest path on a terrain with gradient constraints and under cost functions that are linear combinations of path length and total ascent. This talk represents joint work with several collaborators, including Jiongxin Jin, Hyeon-Suk Na, Antoine Vigneron, and Yajun Wang.Organized by Boi Faltings
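A minimal way to see why direction-dependent costs break the symmetry of ordinary shortest paths is a discrete analogue: Dijkstra on a grid of heights where climbing is penalized more than descending. This toy sketch is an assumed illustration only; the talk's algorithms address the much harder continuous problems on terrains and polyhedral surfaces.

```python
import heapq

def anisotropic_shortest_path(height, start, goal,
                              up_cost=3.0, down_cost=1.0, flat_cost=1.0):
    """Dijkstra on a height grid with direction-dependent edge costs:
    moving to a higher cell pays up_cost per unit of ascent, moving to a
    lower cell pays down_cost per unit of descent, plus a flat base cost.
    start and goal are (row, col) pairs; returns the minimum path cost."""
    rows, cols = len(height), len(height[0])
    dist = {start: 0.0}
    pq = [(0.0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            return d
        if d > dist[(r, c)]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                dh = height[nr][nc] - height[r][c]
                slope_pen = up_cost if dh > 0 else (down_cost if dh < 0 else 0.0)
                nd = d + flat_cost + slope_pen * abs(dh)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(pq, (nd, (nr, nc)))
    return float("inf")
```

Note that with such costs the distance from A to B generally differs from the distance from B to A, which is exactly the anisotropy the talk studies.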
17/06/2014 @ 15:15 room BC420From Gossip to VotingPatrick Thiran, EPFL
An increasingly large number of applications require networks to perform decentralized computations over distributed data. A representative problem among these "in-network processing" tasks is the distributed computation of the average of values present at the nodes of a network, known as gossip. Gossip algorithms have recently received significant attention across different communities (networking, algorithms, signal processing, control) because they constitute simple and robust methods for distributed information processing over networks. The first part of the talk is a short survey of some results on real-valued (analog) gossip algorithms. The second part is devoted to quantized gossip on arbitrary connected networks, and to a particular instance of this problem, the voting problem: nodes initially vote Yes (1) or No (0), and they want to know the majority opinion. We show that the majority voting problem is solvable with only 2 bits of memory per agent.
(This is a joint work with Florence Bénézit and Martin Vetterli).Organized by Boi Faltings
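The real-valued gossip primitive surveyed in the first part of the talk can be sketched in a few lines: repeatedly activate a random edge and let its two endpoints average their values. This is a generic illustration of pairwise randomized gossip, not the specific algorithms of the talk.

```python
import random

def gossip_average(values, edges, rounds=2000, seed=1):
    """Pairwise randomized gossip: at each step a random edge (i, j) is
    activated and both endpoints replace their values by their average.
    The network sum is preserved at every step, so on any connected
    graph the values converge to the global average."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        avg = (x[i] + x[j]) / 2.0
        x[i] = x[j] = avg
    return x
```

On a path of four nodes holding 0, 4, 8, 12, every node converges to the average 6; the quantized/voting setting of the talk is harder precisely because nodes cannot store such real-valued averages.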
17/06/2014 @ 16:30 room BC420Algorithms for the Matroid Secretary ProblemOla SVENSSON, EPFL
The last decade has seen increased interest in generalizations of the secretary problem, a classical online problem. These generalizations have numerous applications in mechanism design for settings involving the sale of a good (e.g., an ad) to agents (e.g., page views) arriving online. One of the most well-studied variants is the matroid secretary problem, which is general enough to deal with complex settings and, at the same time, sufficiently restricted to admit good algorithms. A famous conjecture states that there is in fact an online algorithm that performs almost as well as any offline algorithm with perfect information.
In this talk, we discuss algorithms that make progress on this conjecture. In particular, we present a new algorithm that improves on the previously best algorithm, both in terms of its competitive ratio and its simplicity. The main idea of our algorithm is to decompose the problem into a distribution over a simple type of matroid secretary problems that are easy to solve. We show that this leads to an O(log log n)-competitive algorithm, which means that we lose at most an O(log log n) factor by making our decisions online compared to selecting an optimal solution computed offline.
This is joint work with Moran Feldman (EPFL) and Rico Zenklusen (ETHZ).Organized by Boi Faltings
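For readers unfamiliar with the base problem, the classical single-choice secretary problem is solved by the well-known 1/e rule, sketched below. This is the textbook special case that the matroid version generalizes, not the new algorithm presented in the talk.

```python
import math

def classical_secretary(stream):
    """The 1/e rule for the classical secretary problem: observe the
    first n/e candidates without selecting anything, then accept the
    first candidate better than everything seen so far (taking the last
    candidate if forced).  Under a uniformly random arrival order, this
    picks the best candidate with probability approaching 1/e."""
    n = len(stream)
    k = int(n / math.e)
    threshold = max(stream[:k]) if k > 0 else float("-inf")
    for v in stream[k:]:
        if v > threshold:
            return v
    return stream[-1]
```

In the matroid generalization, instead of accepting a single candidate one must accept an independent set of a matroid, which is what makes maintaining a constant competitive ratio so challenging.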