14 May 2018

Data Science: Reinforcement Learning (Just the Quotes)

"A neural network training method based on presenting input vector x and looking at the output vector calculated by the network. If it is considered 'good', then a 'reward' is given to the network in the sense that the existing connection weights get increased, otherwise the network is "punished"; the connection weights, being considered as 'not appropriately set,' decrease." (Nikola K Kasabov, "Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering", 1996)

"A training paradigm where the neural network is presented with a sequence of input data, followed by a reinforcement signal." (Joseph P Bigus, "Data Mining with Neural Networks: Solving Business Problems from Application Development to Decision Support", 1996)

"learning mode in which adaptive changes of the parameters due to reward or punishment depend on the final outcome of a whole sequence of behavior. The results of learning are evaluated by some performance index." (Teuvo Kohonen, "Self-Organizing Maps" 3rd Ed., 2001)

"A learning method which interprets feedback from an environment to learn optimal sets of condition/response relationships for problem solving within that environment" (Pi-Sheng Deng, "Genetic Algorithm Applications to Optimization Modeling", Encyclopedia of Artificial Intelligence, 2009)

"A sub-area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of long-term reward. Reinforcement learning algorithms attempt to find a policy that maps states of the world to the actions the agent ought to take in those states. Differently from supervised learning, in this case there is no target value for each input pattern, only a reward based of how good or bad was the action taken by the agent in the existent environment." (Marley Vellasco et al, "Hierarchical Neuro-Fuzzy Systems" Part II, Encyclopedia of Artificial Intelligence, 2009)

"a type of machine learning in which an agent learns, through its own experience, to navigate through an environment, choosing actions in order to maximize the sum of rewards." (Lisa Torrey & Jude Shavlik, "Transfer Learning",  2010)

"a machine learning technique whereby actions are associated with credits or penalties, sometimes with delay, and whereby, after a series of learning episodes, the learning agent has developed a model of which action to choose in a particular environment, based on the expectation of accumulated rewards." (Apostolos Georgas, "Scientific Workflows for Game Analytics", Encyclopedia of Business Analytics and Optimization", 2014)

"A type of machine learning in which the machine learns what to do by discovering through trial and error the way to maximize a reward." (Gloria Phillips-Wren, "Intelligent Systems to Support Human Decision Making", 2014)

"it stands, in the context of computational learning, for a family of algorithms aimed at approximating the best policy to play in a certain environment (without building an explicit model of it) by increasing the probability of playing actions that improve the rewards received by the agent." (Fernando S Oliveira, "Reinforcement Learning for Business Modeling", 2014)

"a special case of supervised learning in which the cognitive computing system receives feedback on its performance to guide it to a goal or good outcome." (Judith S Hurwitz, "Cognitive Computing and Big Data Analytics", 2015)

"The knowledge is obtained using rewards and punishments which there is an agent (learner) that acts autonomously and receives a scalar reward signal that is used to evaluate the consequences of its actions." (Nuno Pombo et al, "Machine Learning Approaches to Automated Medical Decision Support Systems", 2015)

"It is also known as learning with a critic. The agent takes a sequence of actions and receives a reward/penalty only at the very end, with no feedback during the intermediate actions. Using this limited information, the agent should learn to generate the actions to maximize the reward in later trials. For example, in chess, we do a set of moves, and at the very end, we win or lose the game; so we need to figure out what the actions that led us to this result were and correspondingly credit them." (Ethem Alpaydın, "Machine learning : the new AI", 2016)

"A learning algorithm for a robot or a software agent to take actions in an environment so as to maximize the sum of rewards through trial and error." (Tomohiro Yamaguchi et al, "Analyzing the Goal-Finding Process of Human Learning With the Reflection Subtask", 2018)

"Training/learning method aiming to automatically determine the ideal behavior within a specific context based on rewarding desired behaviors and/or punishing undesired one." (Ioan-Sorin Comşa et al, "Guaranteeing User Rates With Reinforcement Learning in 5G Radio Access Networks", 2019)

"Brach of the Artificial Intelligence field devoted to obtaining optimal control sequences for agents only by interacting with a concrete dynamical system." (Juan Parras & Santiago Zazo, "The Threat of Intelligent Attackers Using Deep Learning: The Backoff Attack Case", 2020)

"Machine learning approaches often used in robotics. A reward is used to teach a system a desired behavior." (Jörg Frochte et al, "Concerning the Integration of Machine Learning Content in Mechatronics Curricula", 2020)

"This area of deep learning includes methods which iterates over various steps in a process to get the desired results. Steps that yield desirable outcomes are content and steps that yield undesired outcomes are reprimanded until the algorithm is able to learn the given optimal process. In unassuming terms, learning is finished on its own or effort on feedback or content-based learning." (Amit K Tyagi & Poonam Chahal, "Artificial Intelligence and Machine Learning Algorithms", 2020)

"A machine learning paradigm that utilizes evaluative feedback to cultivate desired behavior." (Marten H L Kaas, "Raising Ethical Machines: Bottom-Up Methods to Implementing Machine Ethics", 2021)

"Is an area of machine learning that learn for the experience in order to maximize the rewards." (Walaa Alnasser et al, "An Overview on Protecting User Private-Attribute Information on Social Networks", 2021)

"Reinforcement learning is also a subset of AI algorithms which creates independent, self-learning systems through trial and error. Any positive action is assigned a reward and any negative action would result in a punishment. Reinforcement learning can be used in training autonomous vehicles where the goal would be obtaining the maximum rewards." (Vijayaraghavan Varadharajan & Akanksha Rajendra Singh, "Building Intelligent Cities: Concepts, Principles, and Technologies", 2021)

"Reinforcement Learning uses a kind of algorithm that works by trial and error, where the learning is enabled using a feedback loop of 'rewards' and 'punishments'. When the algorithm is fed a dataset, it treats the environment like a game, and is told whether it has won or lost each time it performs an action. In this way, reinforcement learning algorithms build up a picture of the 'moves' that result in success, and those that don't." (Accenture)

No comments:

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.