Modelling of adaptive foraging in swarm robotic systems
Professor Alan Winfield,
Tel: +44 (0) 117 32 83159
As the robots in a swarm have only local perception and very limited local communication abilities, one of the challenges in designing swarm robotic systems with desired collective behaviour is to understand the effect of individual behaviour on the group performance.
This project is dedicated to the design and optimisation of interaction rules for a group of foraging robots that collectively aim to achieve energy efficiency.
Adaptive Collective Foraging
Consider a swarm foraging scenario: a number of food-items are randomly scattered in the arena and, as food is collected, more will grow over time to replenish the supply. A swarm of robots searches for food-items and retrieves them to the nest. Each food-item collected delivers an amount of energy to the swarm, but the activity of foraging also consumes energy. Generally, the more robots engaged in foraging, the more food-items the swarm can collect. However, because food availability is limited, not all robots will be successful even after a long search, while still consuming energy on behalf of the swarm. Moreover, the more robots forage, the more interference occurs between them: each robot takes longer to retrieve a food-item and therefore consumes more energy. Robots must therefore switch between foraging and resting in order to maximise the net energy income of the swarm. The challenge is to achieve this with no centralised control, using robots with only minimal local sensing and communication abilities.
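The trade-off above can be sketched numerically. The model below is purely a hypothetical illustration, not the project's actual energy model: income saturates with the food supply, while interference cost grows roughly quadratically with the number of foragers, so net energy peaks at an intermediate swarm size.

```python
def net_energy(n_foragers, food_rate=10.0, e_food=5.0,
               e_search=1.0, e_interference=0.05):
    """Toy net-energy model (all constants are illustrative assumptions).

    Income saturates at the food replenishment rate; each forager pays a
    fixed search cost plus an interference cost that grows with the
    number of potential robot-robot encounters.
    """
    collected = min(n_foragers, food_rate)            # supply-limited intake
    income = collected * e_food
    cost = (n_foragers * e_search
            + e_interference * n_foragers * (n_foragers - 1))
    return income - cost

# Net energy rises, peaks at an intermediate swarm size, then declines:
best_n = max(range(1, 31), key=lambda n: net_energy(n))
```

With these toy constants the optimum is 10 foragers; beyond that, interference and search costs outweigh the capped food income.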
Figure 1 Finite state machine for foraging
The investigation starts by designing a set of interaction rules for the individual robots, inspired by the self-organisation phenomena widely observed in biological systems, so as to improve energy efficiency at the group level. A threshold-based controller, using two internal time thresholds -- a resting time threshold ( Thr ) and a searching time threshold ( Ths ) -- is introduced to regulate each robot's behaviour. Three kinds of cue -- environmental, internal and social -- are then proposed to adjust these time thresholds in a self-organised manner.
- Environmental cues. A robot that is searching for food and collides with other robots will reduce Ths and increase Thr because "there are already enough foragers in the arena; I'd better go back to the nest sooner so I don't get in the others' way".
- Internal cues. After a successful food retrieval cycle, a robot will reduce Thr because "there may be more food; I'd better go back to work as soon as possible". Alternatively, if a robot fails to find food before its searching time runs out, it will increase Thr because "there seems to be little food available; I'd better rest for longer".
- Social cues. A robot returning home after a successful retrieval lays a success pheromone, increasing its density; the robots resting at home will then reduce Thr and increase Ths because "somebody else has found food, there may be more, so I'd better get back to work sooner". Conversely, on perceiving an increase in the failed-retrieval pheromone, the resting robots will increase Thr and reduce Ths because "somebody else failed to find food; there may be a food shortage, so I'd better stay in the nest for longer".
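The three cue rules above can be sketched as a small controller. The step size and the symmetric increments are illustrative assumptions; the project's actual adjustment factors are the tunable parameters discussed later.

```python
class ThresholdController:
    """Sketch of the cue-driven threshold adaptation (illustrative only)."""

    def __init__(self, th_r=50.0, th_s=100.0, step=5.0):
        self.th_r = th_r   # resting time threshold (Thr)
        self.th_s = th_s   # searching time threshold (Ths)
        self.step = step   # assumed adjustment step size

    def _clamp(self):
        self.th_r = max(0.0, self.th_r)
        self.th_s = max(0.0, self.th_s)

    def environmental_cue(self):
        """Collision while searching: back off to reduce interference."""
        self.th_s -= self.step
        self.th_r += self.step
        self._clamp()

    def internal_cue(self, success):
        """Own retrieval outcome: rest less after success, more after failure."""
        self.th_r += -self.step if success else self.step
        self._clamp()

    def social_cue(self, success_pheromone_rising):
        """Pheromone change perceived while resting in the nest."""
        if success_pheromone_rising:
            self.th_r -= self.step
            self.th_s += self.step
        else:
            self.th_r += self.step
            self.th_s -= self.step
        self._clamp()
```

Each cue nudges the thresholds in opposite directions, which is what later produces the positive and negative feedback at the group level.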
The combination of these cues can give rise to two important phenomena at the group level -- positive feedback and negative feedback -- which are believed to be key components in the emergence of self-organised behaviour in social insect and animal colonies.
We explored strategies using different combinations of these three cues for our foraging robots and carried out experiments with the sensor-based simulation tool Player/Stage. The simulation results show that a robot swarm using these strategies is able to steer the system collectively towards energy optimisation.
Macroscopic Probabilistic Modelling
The adaptation algorithm comes with a number of adjustment parameters that tune the contribution of each cue; so far these have been chosen intuitively. Although an intuitive selection of parameters is acceptable for many engineering solutions, what happens if these parameters change? Does the adaptation algorithm still work? Could a different set of parameters yield better performance? To address these questions, we developed a macroscopic probabilistic model.
The essential idea of the probabilistic modelling approach is to treat the interactions among robots, or between robots and the environment, as stochastic events with the Markov property. A probabilistic finite state machine (PFSM) describes the foraging task at the group level: each block in the PFSM represents a state together with the number of robots in that state, and the transitions between states account for the changes in those numbers. The dynamics of the system can then be captured by a set of difference equations, each tracking the number of robots in one state and how that number changes. The complexity of the model varies with the task the swarm is to accomplish and with the number of states in the finite state machine.
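The difference-equation idea can be illustrated with a minimal two-state PFSM (Resting and Searching only); the transition probabilities here are invented for illustration, not fitted values from the model.

```python
def pfsm_step(n_s, n_r, p_sr, p_rs):
    """One time step of a two-state PFSM difference equation:
        N_S(k+1) = N_S(k) - p_sr*N_S(k) + p_rs*N_R(k)
    and symmetrically for N_R. These are expected (average) robot
    counts, so values may be fractional.
    """
    flow = p_rs * n_r - p_sr * n_s
    return n_s + flow, n_r - flow

# Iterating drives the system to the steady state where the two flows
# balance: p_sr * N_S = p_rs * N_R.
n_s, n_r = 10.0, 0.0
for _ in range(200):
    n_s, n_r = pfsm_step(n_s, n_r, p_sr=0.1, p_rs=0.05)
```

With p_sr = 0.1 and p_rs = 0.05 the balance condition gives N_R = 2 N_S, so a swarm of 10 settles near N_S = 10/3.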
Figure 2 Probabilistic finite state machine for foraging
Following the full PFSM for swarm foraging shown in Figure 2, we can derive a set of difference equations (DEs) describing the changes in the average number of robots in each state. Consider NS(k) first: Figure 2 shows seven transitions affecting the number of robots in state Searching . At each time step, some robots wake from state Resting and resume searching. Meanwhile, robots in state Searching may find a food-item or collide with another robot, transferring to state Grabbing or AvoidanceS respectively; those in state AvoidanceS move back again after τa seconds. Some robots in state Grabbing may lose sight of the food (due to competition) or collide with other robots, and hence return to state Searching ; the latter (in state AvoidanceG ) must execute an avoidance behaviour before doing so. Finally, robots that spend too much time searching without finding anything give up and return to state Resting . All of these transitions increase or decrease the number of robots in state Searching . In summary, the number of robots in state Searching at time step k+1 can be expressed as a difference equation summing these seven flows.
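Based on the seven transitions just listed, the equation has the following structural form. The transition-probability symbols below are placeholders chosen for this sketch, not the notation of reference 5:

```latex
% Structural sketch only: \gamma, p_f, p_c, p_l, p_g are placeholder
% symbols for the per-step transition probabilities.
\begin{aligned}
N_S(k+1) = N_S(k)
  &+ \gamma\, N_R(k)            && \text{woken from Resting} \\
  &- p_f\, N_S(k)               && \text{found food, to Grabbing} \\
  &- p_c\, N_S(k)               && \text{collision, to Avoidance}_S \\
  &+ p_c\, N_S(k-\tau_a)        && \text{back from Avoidance}_S \\
  &+ p_l\, N_G(k)               && \text{lost the food, from Grabbing} \\
  &+ p_c\, N_G(k-\tau_a)        && \text{back from Avoidance}_G \\
  &- p_g\, N_S(k)               && \text{gave up, back to Resting}
\end{aligned}
```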
The meaning of the notation and a more detailed derivation of these DEs can be found in reference 5. Clearly, with the adaptation abilities described above, the resting time threshold Tr and the searching time threshold Ts are no longer fixed for all robots. To handle adaptive swarm foraging, we introduce the concept of private resting and searching time thresholds, together with their counterparts, the public resting and searching time thresholds. The private time thresholds play the role of the original thresholds, i.e. they decide when the transition from one state to another is triggered, while the public time thresholds influence the private ones: a private threshold is always ``inherited'' from the corresponding public threshold. Figures 3 and 4 show the relationship between these private and public time parameters.
Figure 3 Relationship between private and public resting time threshold parameters. Here `inherit' means making a copy, while `merge' refers to updating the public threshold T̂r according to some rules.
Figure 4 Relationship between private and public searching time thresholds. The influence domain of each time threshold is separated by dotted lines. Private and public Ts coexist with private and public Tr.
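The inherit/merge mechanism between public and private thresholds can be sketched as follows. The merge rule used here (an exponential moving average) is an illustrative assumption, not necessarily the update rule of the model.

```python
class PublicThreshold:
    """Swarm-level ('public') time threshold. Private copies are inherited
    from it and later merged back in; the EMA merge is an assumption made
    for illustration."""

    def __init__(self, value, alpha=0.2):
        self.value = value
        self.alpha = alpha   # merge weight for incoming private values

    def inherit(self):
        """A robot entering the state takes a copy of the public value."""
        return self.value

    def merge(self, private_value):
        """Fold an adapted private value back into the public estimate."""
        self.value += self.alpha * (private_value - self.value)

pub_tr = PublicThreshold(50.0)
t_private = pub_tr.inherit()   # robot copies the public Tr
t_private -= 5.0               # ...then adapts its private copy via the cues
pub_tr.merge(t_private)        # merged back: public value drifts toward 45
```

The point of the split is that individual adaptation (via the cues) acts on the private copy, while the public value accumulates the swarm's collective experience.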
Finding mathematical descriptions of the transition probabilities is a further challenge for the macroscopic modelling approach. Although it is sometimes possible to obtain the transition probabilities by running experiments with one or two real robots, this is not always feasible, particularly for probabilities that are time-dependent, as in our case. In chapter 4, we develop a novel geometrical approach for estimating the state transition probabilities and the other time parameters of our macroscopic mathematical model. Three assumptions are used to estimate these parameters:
- uniform distribution of food
- equal chance to occupy any spot in the arena
- uniform change of relative heading between two robots.
We believe that these assumptions can be reasonably well satisfied by slightly tuning the individual behaviours. Under these basic assumptions, we obtain mathematical expressions for all the required transition probabilities and time parameters. Every parameter appearing in these expressions is a known quantity related to the robots or the environmental settings -- for example, the detection range of the bumper sensor, the speed of the robots and the size of the arena. There are no free parameters that need to be obtained by other means.
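To give a flavour of the geometrical approach: under the uniform-position assumption, the per-step probability that a moving robot encounters another robot can be approximated from the area it sweeps. The formula below is a standard geometric collision estimate given purely as an illustration; the arena area and sensor values are hypothetical (the speed and range match the simulation settings in Figure 5).

```python
def p_encounter(n_others, speed, dt, detect_range, arena_area):
    """Per-time-step probability that a searching robot encounters at
    least one of n_others robots, assuming uniformly distributed
    positions. The robot sweeps a strip of area 2*detect_range*speed*dt
    each step (illustrative geometric estimate, not the model of ch. 4).
    """
    swept = 2.0 * detect_range * speed * dt
    p_one = min(1.0, swept / arena_area)        # chance of meeting one robot
    return 1.0 - (1.0 - p_one) ** n_others      # at least one encounter

# Example: 0.15 m/s speed, 0.2 m sensing range, hypothetical 25 m^2 arena
p = p_encounter(n_others=7, speed=0.15, dt=1.0,
                detect_range=0.2, arena_area=25.0)
```

Note that every input is a known robot or environment parameter, which is exactly the property the chapter's estimates rely on.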
Validation and Optimisation
Figure 5 Screenshot of the simulation; robots move at 0.15 m/s. Left: 8 robots engaged in foraging. Right: close-up view of a robot with a food-item in its gripper; IR sensor range 0.2 m, camera field of view 60° with a view distance of 2 m.
The proposed macroscopic model has been validated using the simulation tool Player/Stage; a screenshot is shown in Figure 5. The results show that the model is highly accurate in predicting the net energy of the swarm, not only in the final stage but also at the instantaneous level.
Figure 6 Instantaneous number of uncollected food-items in the arena (left); initially 40 food-items are randomly scattered, and the error bars show the standard deviation over 40 runs. Time to collect 80% of the food-items for swarms of different sizes (right). M(0)=40 , Pnew=0 , τs=100s , τr=0 .
Figure 7 Comparison of the instantaneous energy of the swarm (left) and number of robots in selected states (right) without adaptation mechanism between the macroscopic probabilistic model and the simulation.
Figure 8 Comparison of the instantaneous energy of the swarm (left) and number of robots in selected states (right) with adaptation mechanism between the extended macroscopic probabilistic model and the simulation.
With the mathematical model, the selection of adjustment factors becomes a multi-objective optimisation problem that can be solved with a variety of existing methodologies. A real-coded steady-state genetic algorithm was developed to find the best solutions. Figure 9 shows results for different parameter sets. Clearly, the predicted net energy of the swarm no longer changes linearly over time, even after a long period of evolution. Since the resting and searching time thresholds vary over time in the extended macroscopic model, its dynamics become more complicated: for some sets of adjustment factors the model does not reach a steady state but oscillates. Nevertheless, despite the less satisfactory prediction of the instantaneous net energy in some cases, the GA, using the macroscopic model to evaluate chromosomes, can find near-optimal solutions for the adaptation algorithm. The final net energy of the swarm from the corresponding simulations, shown in Table 1, provides good evidence of this.
- case 1: Pnew=0.040 , both Tr and Ts are adjustable;
- case 2: Pnew=0.045 , both Tr and Ts are adjustable;
- case 3: Pnew=0.045 , Tr is adjustable, but Ts is fixed.
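A minimal real-coded steady-state GA of the kind described might look like the sketch below. The fitness function here is a stand-in quadratic (in the project it would be the net energy predicted by the macroscopic model for a given set of adjustment factors), and the operator choices (tournament selection, blend crossover, Gaussian mutation, replace-worst) are illustrative assumptions.

```python
import random

def steady_state_ga(fitness, bounds, pop_size=20, iters=500, seed=1):
    """Real-coded steady-state GA sketch: each iteration breeds one child
    and replaces the worst individual if the child is fitter."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [fitness(ind) for ind in pop]

    def tournament():
        i, j = rng.randrange(pop_size), rng.randrange(pop_size)
        return pop[i] if fit[i] > fit[j] else pop[j]

    for _ in range(iters):
        p1, p2 = tournament(), tournament()
        child = []
        for k, (lo, hi) in enumerate(bounds):
            # blend crossover between parents, plus small Gaussian mutation
            g = p1[k] + rng.random() * (p2[k] - p1[k])
            g += rng.gauss(0.0, 0.05 * (hi - lo))
            child.append(min(hi, max(lo, g)))   # clamp to the bounds
        f = fitness(child)
        worst = min(range(pop_size), key=fit.__getitem__)
        if f > fit[worst]:                      # steady-state replacement
            pop[worst], fit[worst] = child, f

    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]

# Stand-in fitness with a known optimum at (0.3, 0.7):
best, score = steady_state_ga(
    lambda x: -(x[0] - 0.3) ** 2 - (x[1] - 0.7) ** 2,
    bounds=[(0.0, 1.0), (0.0, 1.0)])
```

Because each model evaluation is cheap compared with a full simulation run, using the macroscopic model as the fitness oracle is what makes this search practical.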
Figure 9 The instantaneous net energy of the swarm with the best set of adjustment factors found by the GA. The error bars show the standard deviation over 10 simulation runs.
Table 1 Best set of adjustment factors found by the GA and the corresponding net energy of the swarm from the model and the simulation