Books Worth Reading: Most Secret War
(This is the start of a series of blog posts about books that are lesser known, cover an odd niche subject, and showcase the brilliance of human ingenuity.)
Most Secret War[1] is a book by Reginald Jones about scientific intelligence during the Second World War. One of the inspirations (along with Robert Watson-Watt) for Ian Fleming's character "Q", his job was to comb intelligence reports for actionable scientific information. This led to serious electronic-warfare operations such as the Battle of the Beams, in which the British attempted to jam the radio-navigation signals of German bombers, the interception of German radar station reports during the Battle of Britain, and the sometimes comical corruption of German night-fighter transmissions.
AI and the Law (Part III) - Algorithmic Bias is Good for You
(Note: This is the third part of a series of posts based on several conversations with lawyers and executives about AI, the nature of technology and its application to business problems. The second part is here. Warning: cultural references are sprinkled left and right, you have been warned!)
This post discusses bias in artificial intelligence: what it is, what it is not, and why it is an asset rather than a problem.
The press seems intent on scaring us that AI is taking our jobs, amid cries that Silicon Valley is ethically lost and that machines are taking over the world or generally wreaking havoc. Concerns about algorithmic unfairness are not unfounded, but the same remedies we have previously used to address unfairness still apply.
Bias Explained
Bias is defined as "prejudice in favor of or against one thing, person, or group compared with another, usually in a way considered to be unfair", but in the context of AI and mathematics, bias turns out to be something desirable. Figure 1 is the classification example from a previous blog post, where we have a simple classifier that separates the orange circles from the green circles. The classifier is not perfect: it does a reasonable job in most cases, but it misclassifies one of the orange circles, and there is no way to improve this instance of the classifier using the linear regression technique, because no straight line will separate the green circles from the orange ones.
However, suppose the orange circles represent a truly negative outcome: an undetected cancer, or an unreported critical aircraft failure. That misclassification is disproportionate compared with a person undergoing an unnecessary biopsy or a machine being shut down for unwarranted preventive maintenance. Because the cost of misclassifying an orange circle is so high, we would rather have green circles misclassified than miss a single orange one. This process of deliberately preferring one class over another is called a bias.
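The cost trade-off above can be sketched in a few lines. This is a minimal sketch with made-up one-dimensional scores, labels and costs, not a real diagnostic system: missing an "orange" (the cancer or the aircraft failure) is made ten times more expensive than a false alarm, which biases the chosen decision threshold toward caution.

```python
# Minimal sketch of deliberate bias with made-up 1-D scores and labels.
# Missing an "orange" (cancer, aircraft failure) costs far more than a
# false alarm (unnecessary biopsy), so we prefer a cautious threshold.

def weighted_cost(threshold, samples, miss_orange_cost=10, false_alarm_cost=1):
    """Total cost of labelling every score above the threshold as orange."""
    cost = 0
    for score, label in samples:
        predicted_orange = score > threshold
        if label == "orange" and not predicted_orange:
            cost += miss_orange_cost   # the expensive mistake
        elif label == "green" and predicted_orange:
            cost += false_alarm_cost   # the cheap mistake
    return cost

def best_threshold(samples, candidates, **costs):
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: weighted_cost(t, samples, **costs))

samples = [(0.2, "green"), (0.3, "green"), (0.5, "orange"), (0.6, "green"),
           (0.65, "green"), (0.7, "green"), (0.8, "orange"), (0.9, "orange")]

# Biased (default costs): accepts three false alarms to catch every orange.
biased = best_threshold(samples, [0.4, 0.75])
# Unbiased (equal costs): happily misses the overlapping orange at 0.5.
unbiased = best_threshold(samples, [0.4, 0.75], miss_orange_cost=1)
```

With the biased costs the chosen threshold is 0.4, catching every orange at the price of three false alarms; with equal costs it is 0.75, letting the hard-to-separate orange at 0.5 slip through.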
What is Going Wrong with the Semantic Web
The US Symposium on Semantic Technologies took place a month ago at Wright State University, where I had good conversations with Craig Knoblock on the reliability of SPARQL servers, Eric Kansa on storing archaeological data and Open Context, Eric Miller on the workings of the W3, western farmers and old motorcycles, Matthew Lange on tracking food with linked data, and a fruitful conversation with Evan Wallace on standards for storing agricultural data.
Reflecting on these conversations, I decided to write down what I believe are the troubling conclusions for our technology, namely: a) Semantic Web adoption is slow, b) we keep re-solving old problems, and c) we continue to fail to support our own projects. I will then propose some solutions.
Talk: Ontologies, Semantic Web, and Linked Data for Business
This is a half-day workshop about the current business uses of the semantic web. It is targeted at executives, project managers and subject matter experts who want to understand what problems the technology can solve. This workshop will concern itself with the basic building blocks of the semantic web and the solutions that each aspect brings to an organization. The objective of the workshop is not to provide in-depth technical training; rather, we wish to present an overview that will enable a varied audience to determine what this technology provides for their organization. Specific aspects will include recent standards such as FIBO and schema.org, the recruitment and training of staff, as well as opportunities for localization to different markets and lowering the cost of regulatory reporting. In order to anchor the discussions, the tribulations of a fictional company, "The Triples Coffee Company", will be used to present business cases within different areas of an enterprise. Example solutions will then be outlined using a semantic web approach for each business case.
Jointly presented with Robert Warren, Ph.D., Jennifer Schellinck, Ph.D., Patrick Boily, Ph.D.
AI and the Law (Part II) - How AI Works
(Note: This is the second part of a series of posts that were based on several conversations with lawyers and executives about AI, the nature of technology and its application to business problems. The first part is here.)
At the heart of it, AI is about asking the following question: "Can I use the computer to make decisions that would normally require a human being?" Of course, the obvious answer is yes; human beings make all sorts of decisions all day long, ranging from the complex to the mundane. Accounting and operations systems have been making decisions for human beings for years, be it from calculating credit scores and interest rates to determining the best time to order feedstock.
Let's use a simple example to explain the difference. Take the plot on the right-hand side, where I have orange dots and green dots. With basic statistical methods (linear regression was invented in the early 1900's[1]), we can create a simple classifier that separates the green from the orange by simply drawing a line through the graph. It's not perfect: some oranges are misclassified as green and vice versa, but we do very well with a really simple method. We can do better using more sophisticated mathematical techniques, see a non-linear method on the left, but fundamentally the problem remains simple: telling the Granny Smith apples from the oranges.
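The line-drawing idea can be sketched as a tiny perceptron, a close cousin of the regression line in the figure. The dots below are made-up coordinates, not the data from the plot:

```python
# Minimal sketch of linear separation: a perceptron learns a straight
# line w0*x + w1*y + b = 0 that splits orange dots from green dots.
# The coordinates are made up for illustration.

def train_perceptron(points, epochs=20):
    w0 = w1 = b = 0.0
    for _ in range(epochs):
        for (x, y), label in points:
            target = 1 if label == "green" else -1
            if target * (w0 * x + w1 * y + b) <= 0:  # misclassified: nudge the line
                w0 += target * x
                w1 += target * y
                b += target
    return w0, w1, b

def classify(model, point):
    w0, w1, b = model
    x, y = point
    return "green" if w0 * x + w1 * y + b > 0 else "orange"

points = [((1, 1), "orange"), ((2, 1), "orange"), ((1, 2), "orange"),
          ((4, 4), "green"), ((5, 4), "green"), ((4, 5), "green")]
model = train_perceptron(points)
```

Because these dots are linearly separable, the learned line classifies every training point correctly; as in the text, no straight line would suffice if the colours overlapped.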
The problem is simple to solve because it is well defined. The objective is clear, the definition of success is clear (keep both sets of dots separated) and the way to tell them apart is by their colour. The only thing that remains is applying the recipe that matches the problem, a linear regression in this case, to solve the problem. Loosely speaking, there is no intelligence needed because the problem defines its own process to a solution.
Similarly, let's take another toy problem: tic-tac-toe. Every schoolchild, even those that don't eventually end up working on AI, learns that the game can be tied or won by the first player. The second player can always force a tie, but can never win the game. All of us learn this by playing the game repeatedly while young: over time, children explore the set of possible game layouts in tic-tac-toe and eventually learn that there is a finite number of starting X's and O's that can lead to victory or loss.
Very roughly, there are over 360,000 possible positions in tic-tac-toe. One basic machine learning method for learning to play tic-tac-toe is brute force: try every single move and counter-move until every game is enumerated, then only choose moves that end in winning the game. Obviously, children can't keep track of 360,000 tic-tac-toe boards at the same time, and while effective, the method does not scale well even for computers (your desktop computer can't store the 10^47 possible positions of chess). Therefore, children learn to take shortcuts and reduce those 360,000 positions into a set of starting moves that ensures they never lose the game. That is intelligence: no one taught them the process required to find the solution, they just did. Artificial Intelligence is the science of creating algorithms that can do the same for certain classes of problems.
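The brute-force enumeration described above is small enough to run directly. Here is a minimal minimax sketch that scores every tic-tac-toe continuation; the board encoding (a tuple of nine cells) is my own assumption for illustration:

```python
# Minimal sketch of brute-force tic-tac-toe: enumerate every possible
# continuation and score it with minimax. The board is a tuple of nine
# cells holding "X", "O" or None; X moves first.

WINS = [(0, 1, 2), (3, 4, 5), (6, 7, 8),    # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),    # columns
        (0, 4, 8), (2, 4, 6)]               # diagonals

def winner(board):
    for a, b, c in WINS:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Best achievable outcome for X: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w:
        return 1 if w == "X" else -1
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0  # board full: draw
    scores = [minimax(board[:i] + (player,) + board[i + 1:],
                      "O" if player == "X" else "X") for i in moves]
    return max(scores) if player == "X" else min(scores)

empty = (None,) * 9
```

Running `minimax(empty, "X")` returns 0: just as the children discover, perfect play by both sides always ends in a tie.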
We say "classes of problems" because we only have the computing power and the programming know-how to handle limited problems, like recognizing a person, a text or a musical genre. This is different from what people sometimes refer to as "True AI", which is the human-like machine that walks and talks (and invariably tries to take over the world) on television shows. In my opinion, you are unlikely to have the Terminator do your filing for you anytime in the near future.
However, for specific types of problems, AI works very well: classification, clustering, searching, reduction, etc... In turn, this means that most of the work that goes into implementing an AI engine is actually trying to match very simple mathematical solutions to complex business problems. Going back to our first example of apples and oranges, the problem was delivered on a plate: colour and position. The solution needs more thinking when the problem isn't so well defined. In some cases, we know that some of the dots are different, but not why or which ones (e.g. Outlier Detection). In others, the objective may be to "group the similar dots together" without having any idea of what makes the dots similar (e.g. Market Segmentation).
Machine Learning vs Artificial Intelligence
In the vernacular, the terms Machine Learning and Artificial Intelligence are sometimes used interchangeably, though they refer to different things. Artificial Intelligence is the catch-all phrase for different computational techniques that have an intelligence component to them, irrespective of the flexibility or adaptability of the method. Machine Learning refers to methods that are capable of learning from the data themselves, without having their decision model encoded by a human being.
Take a program that can play tic-tac-toe. It clearly has an intelligence component in order to function, but the software will not learn from its interactions with the user (the problem is simple enough that there is no point to it). But a program that recognizes cats in videos needs a machine-driven learning component in order for it to learn what a cat looks like.
Classifying And Finding Things With Artificial Intelligence
The figure below represents a very simplified block diagram of the AI process for classification, reading from left to right. We have a data set that we want processed, which can be documents, images, songs, video, etc... In practice, not everything within that data set is relevant to the context of what we are trying to do, and so we transform each document into a set of features that we think are valuable to solve our problem. A feature might be a specific word in a document, another might be the word's part-of-speech (verb, noun, adverb, etc) or a typographical aspect such as the word being underlined.
Since the feature set explicitly determines what part of the data set the algorithm will actually look at, feature generation is an extremely important part of the Artificial Intelligence process. It has spawned its own field of study, Feature Engineering, and at times some have insisted that Artificial Intelligence is carefully crafted Feature Engineering. In practice, many engines have enough computational resources that they will simply generate every feature possible from the input data and let the algorithm choose the features that are most promising (this is called the "throwing things at the wall to see what sticks" approach). It's wasteful, but computing time has become much cheaper than the people time required to create an efficient design.
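The "generate everything" approach can be sketched very simply. This minimal example assumes we only care about word identity, length and capitalization; real engines emit far more feature types (part-of-speech tags, n-grams, typography) and rely on the algorithm to discard the unpromising ones.

```python
# Minimal sketch of brute-force feature generation for a text document.
# Every word contributes several features; the learning algorithm
# downstream decides which ones actually matter.

def extract_features(document):
    features = set()
    for word in document.split():
        token = word.strip(".,!?").lower()
        features.add("word=" + token)             # the word itself
        features.add("length=%d" % len(token))    # its length
        if word[:1].isupper():
            features.add("capitalized=" + token)  # typographic cue
    return features

feats = extract_features("The Cat sat on the mat.")
```

Even this toy document yields a dozen features; a real corpus easily produces millions, which is exactly why cheap computing makes the wasteful approach viable.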
The algorithm is the brain of AI, which is ironic in that the algorithm itself is usually very simple and generic; the algorithm that flies a drone might be the same one that keeps your phone camera images from being blurry. However, the devil is in the details, and the implementation of the algorithm is usually not portable from the phone to the drone. Examples of algorithms are k-means, C4.5 and Okapi BM25, each performing the task of clustering, rule generation or information retrieval. As part of the process, the algorithm will take in the features and select the most promising. As part of that selection, some external information, such as a trained model or parameters, might be provided to the algorithm to guide its decision making.
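Of the algorithms named above, k-means is simple enough to sketch in full. This toy version clusters made-up one-dimensional points into two groups; real implementations add smarter initialization and proper convergence checks.

```python
# Minimal sketch of k-means clustering on 1-D points. Each round:
# (1) assign every point to its nearest centre, then
# (2) move each centre to the mean of the points assigned to it.

def kmeans(points, centers, rounds=10):
    for _ in range(rounds):
        # assignment step: each point joins its nearest centre
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # update step: each centre moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two obvious clumps of points, around 1.0 and around 9.0.
centers = kmeans([1.0, 1.2, 0.8, 9.0, 9.4, 8.6], [0.0, 5.0])
```

The centres migrate from their arbitrary starting positions to roughly 1.0 and 9.0, the middles of the two clumps, which is the whole trick of the algorithm.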
The results are then checked against a benchmark, sometimes called a gold standard, to ensure that the system is doing what it is supposed to. If the results aren't exactly what is required, the model or the parameters of the algorithm might be changed.
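The benchmark check itself can be as simple as an accuracy score against hand-labelled examples; the labels below are made up for illustration.

```python
# Minimal sketch of a gold-standard check: compare the system's
# predictions against hand-labelled answers and report accuracy.

def accuracy(predicted, gold):
    matches = sum(1 for p, g in zip(predicted, gold) if p == g)
    return matches / len(gold)

score = accuracy(["spam", "ham", "spam", "ham", "spam"],   # system output
                 ["spam", "ham", "ham",  "ham", "spam"])   # gold standard
```

If the score falls below whatever the project requires, that is the signal to go back and adjust the model or the parameters.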
Overall, the basics of AI aren't that complex, but its implementation and arrangement need to be focused on the objectives of the project; otherwise, one gets into the loop of "garbage in, garbage out". Depending on the case, the tuning of parameters can be frustrating, and model generation becomes an art rather than a science. There are many different frameworks, libraries and code bases available both freely and commercially to experiment with, which I encourage you to do.
Next: Part III - Algorithmic Bias is good for you.
AI and the Law (Part 1)
(Note: This is the first part of a series of posts that were based on several conversations with lawyers and executives about AI, the nature of technology and its application to business problems.)
What is Artificial Intelligence?
Does it really represent an improvement over what we already have? An entirely new class of solutions to ongoing problems? Or the flavour of the week in a market that is overwhelmed with buzzwords?
Skepticism is endemic to technology culture, whether in industry, government or academia. It's a byproduct of working in an area whose foundation is innovation and ideas. When it costs significantly less to say that you have something than to actually get it to work, a "show me" attitude is necessary. Ironically, IT has so far been primarily about what we would call classical Management Information Systems. The software may be really slick, the hardware may be really fast and we can store a lot of data, but most of what the industry has been focusing on so far is simply replacing physical forms and paperwork with the electronic equivalent: tabulating ledgers for accounting, generating reports, and mailing cheques and invoices over the Internet instead of in paper form. These are boring, unglamorous tasks, but they have been IT's big success: taking things that were mundane, repetitive and cost centres, and streamlining them using technology.
And now, we have Artificial Intelligence.
The underlying idea that a machine could replace a human being in making decisions isn't all that new. One of the better-known historical exhibits (and frauds) is The Turk, a mechanical automaton that would play chess against a person. Of course, playing chess was asking a lot of simple clockwork mechanisms, and the builder had constructed a false compartment in which a human player would hide and move the mannequin using a system of pulleys and cams. This may have been the first vapourware product ever, but the idea that a machine could perform tasks at a human level had taken root.
What we'll call modern Artificial Intelligence appeared in the mid-1950's, when scientists began to look at ways that elements of human cognition could be modelled using mathematics. That in itself wasn't novel; humankind had moved on from the abacus. What they were aiming at were higher cognition functions, like learning from examples and extrapolating solutions to problems that the machine had never seen before.
In this series of blog posts, the basics of AI will be reviewed and their application to practical business problems outlined. As with many technologies, it has had its false starts, and the causes of the AI Winter periods will be reviewed, which in turn will give a sense of why it is making such a resurgence.
Next: Part II - How AI Works.
Presentation at Derby University: Artificial Intelligence and the Law