Interactive Camera Network Design Using a Virtual Reality Interface

The traditional literature on camera network design focuses on constructing automated algorithms. These require problem-specific input from experts in order to produce their output. The nature of the required input is highly unintuitive, leading to an impractical workflow for human operators. In this work we focus on developing a virtual reality user interface allowing human operators to manually design camera networks in an intuitive manner. From real world practical examples we conclude that the camera networks designed using this interface are highly competitive with, or sometimes even superior to, those generated by automated algorithms, but the associated workflow is more intuitive and simple. The competitiveness of the human-generated camera networks is remarkable because the structure of the optimization problem is a well known combinatorial NP-hard problem. These results indicate that human operators can be used in challenging geometrical combinatorial optimization problems, given an intuitive visualization of the problem.


INTRODUCTION
In the camera placement problem the goal is to find an optimal configuration of cameras to perform some observation task.Many practical problems can be formulated as an observation task, so it is no suprise that this problem has been studied extensively.For example, the design of surveillance systems for both large buildings and outdoor environments can be formulated as a camera network design problem [5,8].In this formulation the objective is to maximize the camera coverage over a floorplan of the environment.The same problem arises in designing custom tracking systems in Virtual Reality applications.In the photogrammetry community a similar problem is studied, where the goal is to select image acquisition locations from which a 3D reconstruction will result in minimal uncertainty over reconstructed points [9,13,21].In this formulation both coverage over environment points and expected measurement quality are arXiv:1809.07593v1[cs.HC] 20 Sep 2018 important.The same formulation of the problem is also used in the field of inspection planning where reconstruction is generalized to arbitrary measurement acquisition functions [1,4,19].In computer graphics the same problem occurs in view selection where the goal is to select a limited number of views (renders) of an object/scene that together provide the most efficient summary of information [6,7].
In general three distinct steps are important in camera network design: representation of the problem, formulation of the cost/quality function and optimization of this cost/quality function.We will discuss these steps in Section 2 and highlight the fundamental problems in practical settings.We will present a virtual reality user interface where the user is in charge of placing all cameras thereby avoiding all of the traditional steps, without loss of quality.We argue that this last finding is remarkable given the structure of the problem which is also why we choose to cover this aspect in a separate section.
Our main idea, letting users design a camera network is opposite to most approaches trying to automate this design.We will motivate that there are structural problems with automated design approaches that are not addressed properly.In this work, the strong spatial reasoning skills of humans which is crucial in solving this design task, will be enhanced by the limitless possibilities of virtual reality.
In Section 2 we will discuss the structural problems with automated camera design algorithms from both a mathematical and user interactive perspective.Section 3 will elaborate on our proposed interface that will be evaluated in Section 4.

THE AUTOMATED CAMERA NETWORK DESIGN PROBLEM Problem structure
As discussed earlier the goal of camera network design is to find an optimal camera configuration that performs some observation task.In order to design a relevant camera network knowledge about the environment is necessary, traditionally in the form of a CAD model which is frequently available.Further knowledge about the camera geometry and area of interest is strictly necessary.Early work on the art gallery problem tries to position cameras such that the entire area of interest is visible using geometric algorithms [5,14].Approximate solutions are available for specific instances of this type of problem, that typically rely on a discretization of the problem [3].Two things need to be discretized, firstly the area of interest should represented by a finite set of elements ie E = {e 1 , ..., e n } (Environment).Secondly the space of all possible camera locations needs to be a finite set of configurations ie V = {v 1 , ..., v m } (Viewpoints).This discretization reduces the problem of finding a camera configuration that covers the area of interest, to the classical set covering problem (SCP) [2,20].This notion of camera network design is not really useful because it lacks the freedom to model realistic camera network performance models [4].A richer problem that encompasses the SCP is known as submodular function maximization [10].This class of problems is well studied and a lot of complexity results of these problems are known.We will start by introducing a general formulation and proceed by giving concrete examples on how classical camera design modelling choices fit within this formulation.The formulation starts by defining a weight function t : E × N −→ R + that for each environment point returns a quality value as a function of the number of cameras for which the point is visible.This visibility is formalized using a visibility function f : E × 2 V −→ N that computes for each camera configuration U ⊂ V and each environment point e ∈ E the number of cameras f (e, U ) in U viewing e.The last function that we introduce fuses the quality function t with the visibility function f ie д(e, U ) = t(e, f (e, U )).In this formulation the camera network design problem is formulated as maximizing the quality over all environment points with a limited set of cameras: If t(e, −) is a strictly increasing function with non-positive second derivative, then G can be proven to be a monotone submodular function, thus the problem becomes the monotone submodular maximization problem with a cardinality constraint [10].The intuitive notion that the importance of covering an environment point by an increasing number of cameras only decreases, exactly coincides by the technical requirements for t(e, −).
An example of such a t is if it is zero when the number of cameras viewing an environment point is zero and one if this number is larger than zero, then Problem 1 (Eq. 1) becomes the classical set covering problem (SCP).
Submodular function maximization is a combinatorial NP-hard problem that is well studied.An advantage is that there is an optimal algorithm to perform this maximization task with a cardinality constraint for monotone submodular G [12].This algorithm is the known as the greedy algorithm that builds a solution by greedily adding the best cameras to the final solution set.The greedy algorithm yields a tight (1 − 1/e)-approximation 1 , which means that there is an upper bound on the optimal function value that is a constant factor (1/(1 − 1/e)) higher than the value returned by the greedy algorithm.The downside is that it is proven that no algorithm can improve on the greedy algorithm with a polynomial number of steps [12] which is exactly why it is optimal.
It is important to notice that the aforementioned results do not exclude superior optimizers that leverage some additional problem structure, but these algorithms will not generalize well and may be difficult to find.Furthermore the geometric structure is hard to leverage as geometric algorithms have a prohibitive computational complexity [16].These algorithms are therefore not suited in general purpose camera network design applications.

Camera network performance functions
In general, it is very challenging to formally define what exactly makes a camera system good.An additional drawback is that the notion of quality is highly problem specific.An exact formulation is however needed in automated design approaches so a lot of propositions for such functions are available.In this section we will review popular function modelling choices and discuss their impact on the structure of the problem.
A first modelling choice is to encode the notion that it is better for an environment point to be viewed by multiple cameras [8,9,13,20].This corresponds to extending the previously defined function t to arbitrary functions.The simplest function t is to simply count the number of cameras (cameras can have weights) and clip the function above some defined threshold [8] which results in a monotone submodular G. Another choice models the propagation of uncertainty in measuring environment points [9,13].This formulation however does not guarantee a monotone G which makes the greedy algorithm less effective and makes the automated design problem more challenging.Note that function t can be a function dependent on camera or viewpoint information.
Another modeling choice is to assign weights to viewpoints or/and environment points based on some notion of importance [6,8,20].No weighting scheme can fundamentally change the problem structure as long as the weights remain positive.
Finally regularizers can be added to the problem: The positive constant α and positive regularizer function r are used as a tool to discourage the optimizer to choose certain undesired camera configurations [8].An example where this can be useful is a stereo reconstruction camera system.Stereo reconstruction requires overlap in what neighbouring cameras perceive.The regularizer can penalize a lacking of overlap guiding the optimizer to solutions that exhibit this overlap.From the perspective of the problem structure this is however not a good idea.The regularizer completely destroys the monotonicity which will result in a far more challenging optimization problem with unbounded solution quality [10].

Solution strategy
In the literature a lot of solution strategies are provided for the camera network design problem.Next-best-view planning uses an optimizer that builds a solution by subsequently adding the best viewpoint to the solution, which is exactly the greedy algorithm that is optimal for the problem [6,9,19,21].Another approach is to employ an evolutionary optimization strategy [13].A modular relaxation of the true submodular function is also used together with a branch-andbound solution method [8].The latter algorithms does not provide a theoretically better upper bound than the original greedy algorithm, so their performance is highly problem specific.Specialized branch-and-bound methods exist for submodular functions as well, but they suffer from the fact that submodular functions are much harder to bound than modular functions, which limits their practical applicability [10].
In this section we linked the camera network design problem to monotone submodular maximization with a cardinality constraint.We discussed manipulations of the problem seen in camera network design literature and discussed their impact on the structure of the problem.From the structure of the problem we notice that automated algorithms are fundamentally limited in the quality of solution they can provide.An example stressing why we are fundamentally bound to the greedy algorithm is that the number of possible camera systems in an experiment in Section 4 is of the order 10 100 (draw 25 cameras from 10000 candidates).Because the greedy algorithms is optimal, any improved algorithm should address a significant subset of the huge number2 of possible solutions which is not realistic.

User interaction
As discussed earlier the first step in most approaches is to discretize the inherently continuous camera network design problem.Discretizing the area of interest is in most cases straightforward.Strategies such as voxelization and random sampling of points exist and require very limited interaction from a user.Sampling of possible viewpoints is however much more difficult.Any viewpoint from this set can be chosen by the algorithm, so care should be taken that only valid configurations are present.This introduces following problems: (1) The space of all possible camera positions must be represented by a finite number of samples that should be dense enough to accurately represent the problem.(2) The number of samples should be low enough to avoid prohibitively long optimization times.
(3) Cameras cannot be placed at every position so a lot of domain specific knowledge is necessary to select these possible positions.
From an operational perspective this is not ideal because the user that should provide this information to the application needs domain specific knowledge, and must know about sampling strategies which is rare.From a user interaction perspective there are also issues in finding a method that avoids having to select every point manually, but retains a qualitative sampling density.
While the user often has a good intuitive notion of what constitutes a good camera network, the choice of quality function is challenging and highly problem specific.Furthermore many methods have weights that need to be chosen that have a significant impact on the final result.Choosing this function with associated weights requires an intimate knowledge about the mathematics of a problem and the measurement specification of the cameras.And even then, iterations are necessary to align optimizer results with user expectations.
Defining a regularizer r and associated weight α is often required to avoid pathological behaviours of automated optimization algorithms [8] and to encode complicated network requirements.However to our knowledge no literature exists on how to define these, while they will have a crucial impact on the final measurement system.
A general problem with all of the problems associated with user interaction and in our opinion the worst, is that all the information that needs to be provided is very abstract.This abstractness excludes practitioners from the adoption of automated algorithms for the design of camera systems.
User interaction for camera network design has not received a lot of attention in literature.A GUI has been proposed that gives the user tools to perform the required discretization and assign importance weights to environment parts [8].Viewpoint sampling is however limited to a uniform sampling over the region of interest which is not realistic in many real-world cases, and the cost function is fixed.

VIRTUAL REALITY INTERFACE Motivation and overview
The basic principle of our virtual reality interface is simple.The user is placed in the scene of interest together with an initial camera setup.The application will calculate д(e, U ) for each environment point and visualize these values as an interactive colored volume (cloud).The user can manipulate all cameras and see the effect on the environment quality in real time.The user is in charge of performing the optimization, but can apply geometrical reasoning to solve this problem.Real-time color feedback of the values of д(e, U ) allows the user to decide what is important for his application while performing the optimization.
A schematic overview of the difference between our proposed workflow and the traditional workflow is given in Fig. 2. Our proposed workflow avoids the need to perform a challenging discretization step and the step that designs a specific quality function.These steps both require intimate knowledge about both the practical problem and theory of camera design planning.The advantage of the traditional workflow is that a computer can automatically solve the camera network design problem.But in reality results often do not correspond to what is expected by the user.This is mainly because the quality function is not aligned with the intuitive expectation of the user.Another common problem is that many informal notions of quality cannot be directly encoded in the cost function.
The structure of our interface consists of three main parts as shown in Fig. 3.In this section we will give a brief overview of their function but each will gets a more detailed treatment in a later section.The first part is a simulator that handles all geometries, maintains its positions and performs dynamics calculations.The second part is responsible for calculating the coverage and quality for a given camera system.And the final part is responsible for rendering to the virtual reality device and managing user interaction.

Simulator
As a simulator we use a commercially available robot simulator called V-REP [15].Using this simulator has the advantage that its many available features directly translate to features that can be used in our interface.This flexibility of this simulator allows for the modelling of complicated real world dynamic systems up to a high fidelity and the subsequent design of camera systems in these environments.
In our proposed workflow the simulator will also be responsible for the definition and modelling of the specific problem.Firstly, the geometry of the problem should be available in the shape of a triangular mesh.This data format is widely available by the use of CAD software packages in construction and design.When no CAD model is available one could always resort to 3D reconstruction from image data.In our experiments section we will show both cases to highlight the flexibility of our solution.Next we require geometrical knowledge of each camera that can be positioned.This information consists of a perspective angle of the camera, the resolution and the maximum/minimum measurement distance.
Finally, a discretization of the space of interest needs to be performed.We provide a box that can be positioned on the scene of which the size and position can be changed.This box is shown in purple, the left image of Fig. 4. The user can select a discretization resolution which determines in how many voxels the box will be subdivided.From each voxel we select the center point and together these points determine the set E. We choose to represent E as points because geometrical calculations using points are much faster than geometrical objects that have volume.

Interactive quality computation
Interactively calculating quality values over the space of interest is a challenging task.The visibility between all environment points and all cameras in the scene need to be calculated in an environment of arbitrary complexity.To compute this visibility there are two options, the first is ray-tracing and the second is z-buffer visibility calculation.Because the first is slower with today's hardware we use the latter.This algorithm works by rendering the scene for a specific camera.For each rendered image there is an associated z-buffer which stores the depth of each pixel resulting from the rasterization algorithm used to render the image.Next all points of interest are projected to the camera and get {u, v, z}-coordinates.These are pixel locations together with the a depth value.Using these coordinates we can compare the depth value of each point with the depth value of the z-buffer at the same location.If the z-value of the point is smaller than the z-buffer value, the point is visible for the camera.In our results we where able to achieve computation times of the order of 300ms to compute the quality of 50k points of interest for 10 cameras in an environment of 300k triangles.These results will be discussed more formally in the results section, but show the interactivity and scalability the z-buffer approach.In our implementation we use the publically available implementation of this algorithm in the visualization-toolkit (VTK) [18].
As a quality function д we simply count the number of cameras that can view a certain point.In our approach this function helps the user understand what is covered by the camera system and is thus less critical.Specialized functions can be used when they are known but are not required.

Virtual reality process
The final part of our application deals with managing the virtual reality component.This encompasses both the rendering of all information and managing the user interaction.Important in this process is that the obtained frame rate of the rendering procedure is high enough to provide a comfortable virtual reality experience, so the focus of this part is speed.The information we render is: (1) All problem geometry (2) An interactively rendered volume (3) Virtual camera feeds As rendering engine we again use the visualization-toolkit (VTK) [18] which connects to the publicly available openvrapi which in turn connects to popular virtual reality hardware.The engine can handle real time volume rendering by using highly optimized implementation of GPU based ray-casting [11].As a final feature the camera feeds obtained by rendering individual cameras as discussed in Section 3 are displayed as dynamic textures viewable by the user.These feeds contain useful information that the user can leverage during his design task.An example where this can be useful is the design of live stereo reconstruction setups.In these setups it is important than neighboring cameras have enough overlap in what they perceive.In this application the user can see what each camera sees, and insure himself that the required overlap is present.
The user is able to manipulate camera positions by dragging each camera with a controller.The virtual reality thread will update the position inside the simulator.This in turn results in a changing of the camera position in the volume thread.This eventually results in an updated quality function that changes the appearance of the volume.Furthermore the user is able to change his position and scale relative to the scene.We believe that the latter is of fundamental importance to perform the designing task.The user can for example do a rough initialization of the camera setup when the scene is small (zoomed out), and perform more detailed manipulations in a larger scene.
Finally the user is able to choose a color/opacity function.The color/opacity transfer function selects a color and an opacity for every point in the volume as a function of its value.An example function is an opacity of zero if the environment point is visible and red with opacity in the range 0 < o < 1 if a point is visible.This example is exactly the color/opacity function of the right figure of Fig. 1.In our implementation we support the choice of one user defined custom function that can be interactively changed.Our experience is that the use of two functions, one of which shows the quality, and the other what is invisible provides the user all information that is needed.

EXPERIMENTS
In these experiments we will investigate the quality of the user-generated solutions in two complex real-world cases, and the computational performance of our proposed interface to highlight its scalability.In the evaluation of the quality we will compare user-generated solutions with automatically generated solutions.We will investigate these aspects by focusing on three concrete camera design problems, where camera networks have different objectives, scenes have different complexities, etc. Images and basic info about each of the three scenes is depicted in Fig. 4.
We will limit the evaluation of quality to evaluations of different cost functions that do not include regularizers.This limited evaluation is motivated by the fact that automated algorithms blindly follow a cost function, but need to be corrected to give satisfactory solutions.This means that the value of the cost function is composed of an unregularized value minus the regularization penalty.User-generated solutions will inevitably result in solutions with a low regularization cost, because the user chose this solution and is satisfied with the configuration.If we only evaluate the unregularized cost function we therefore give an advantage to automated algorithms and not our proposed approach.
As automated algorithm we will use the well known greedy algorithm [6,9,19,21] on a problem specific cost function.

Optimality : Harbour
In this section we will consider a real-world camera network design task where the camera network is used to guarantee safety.When containers are unloaded from ships they are positioned on the harbour quay.After this unloading workers have to confirm information on the container and thus have to move among these containers.To track these workers a camera system is attached to the crane to ensure that it can be safely operated.From a camera network design perspective this is a structurally dense problem.Many cameras need to be placed over a relatively small area to provide enough redundancy to guarantee safety.The scene used in our examples is depicted in Fig. 4 (left).The orange lines indicate the possible positions of cameras in this problem which is limited to two beams on the crane.To construct a set of possible viewpoints we linearly subdivide each beam in 20 positions, and defined 15 possible orientations, creating a set of 2x20x15 (600) configurations.The goal is to select 10 positions that maximize coverage over the area of interest, defined as the purple box in Fig. 4 and a denser sampling between the containers.This box is discretized in 30k points and between the containers we uniformly sample another 16k points to force the optimizer to focus on areas between the container.The design of a quality function is challenging because there is no true concrete quality function, as a result we will consider multiple quality functions.
In fact we randomly sample quality functions that encode the notion that redundancy is required.We sample quality functions by generating a decreasing sequence of 6 values {s 1 , ..., s 6 } : i s i = 1.Each value s i is multiplied by the number of environment points that is viewed by more than i − 1 cameras and summed together to create a quality function.This set of function is monotonely increasing, bounded (and second derivative < 0) and conveys the notion that redundancy increases quality.In this experiment we will generate random quality functions t i and calculate for each function a camera network U * i .We also have a solution designed by an expert using our interface U * e .The quality of different solutions U * i versus U * e will be evaluated by studying ratio's The former describes how well different quality functions agree on the optimality of designed networks using a different quality function.The latter describes the optimality of the user-generated solution with respect to different possible quality functions.A subset of the obtained ratio's is displayed in Fig. 5. From these results we can conclude that the solutions of different quality functions tend to agree on other quality functions.There is a much larger difference between the optimality of the user-generated camera network with respect to different quality functions.This means that for some quality functions the user-generated scores high, but for others it scores lower.On average automatically generated solutions are 11% better than the user-generated solution with respect to specific quality functions.If we compare the coverage (visible/total environment points) which is also important the user-generated solutions scores equal to or even better than automatically generated solutions.To compare the difference between an automatically generated network and a user-generated network we show the obtained quality volumes for both solutions in Fig. 6.Fig. 6 provides an insight why automatically generated solutions tend to score higher on specific quality functions.The automatically generated solution (right) focuses many cameras at areas of a larger volume.The user-generated solution focuses many cameras on small areas between containers, because these areas form a potential safety hazard which results in a lower overall function value.This indicates that scoring higher on a quality function does not necessarily results in better networks.Based on these results we conclude that the usergenerated solution is at least highly competitive with the automatically generated solution.We consider a safety application, an application where only experts are able to design a network.This is why we only have one user-generated solution generated by an expert.

Optimality : Office
The second case is an office scene where the goal is to place 25 cameras.The scene used in our examples is partly depicted in Fig. 4 (middle).A top view of the entire scene is shown in Fig. 7.The objective of the camera system is to maximally cover the scene.Cameras can be placed at ceiling lever over the entire area of the office.It is also important to note that there is an informal quality of metric that is impossible to encode.This results from practical problems in setting up the camera network, such as cabling possibilities, camera mounting options, etc.This area is relatively large which result in a relatively sparse camera placement problem.It also features a large complicated area in the middle which will be challenging.A sparse camera placement is the ideal setting for automated algorithms.This is because all cameras cover a different area, thus their measurement volumes are disjoint, which in turn implies a limited degree of submodularity.This scene also features many complex occlusions which further sparsifies the problem.For the automated algorithm we uniformly and randomly sample 10k points over the entire office at ceiling level.For each position we generate a random orientation that does not point upward.As cost function we consider the SCP problem because the main concern of the camera network is to cover as much of the scene as possible.Because we consider this problem more accessible to non experts we will provide solutions of 5 users.These users had no prior experience in designing camera networks.Each user was unaware of the automatically generated network, and did not get to see the specific scene before the test started.The user was tasked to maximize the coverage of the camera network.All cameras where placed outside the scene before the test started as to not influence the final solution.All evaluations of coverage are relative to a dense sampling of the environment of 4.8 million samples (resolution of 0.1 meter).The traditional workflow depicted in Fig. 2 already indicated that there often are iteration steps to improve the performance of automated algorithms.During the experiments we noticed that we could improve the performance by increasing the sample size of the viewpoint set.This shows that the obtained results with automated algorithms is dependent on the expert and his knowledge/luck performing the design task.Surprisingly all users in the experiment performed better than the initial result of the automated algorithm.This is surprising in part because the structure of the problem is ideal for automated algorithms.We also expected the scene to be too large (1600m 2 with 25 cameras) for users to keep an overview over everything which is necessary to find good solutions.After improving tweaking of the viewpoint sample size we where able to increase the performance of the automated algorithm, but users with our interface still scored competitive or better.Based on these experimental results we conclude that users with our virtual reality interface are at least highly competitive with automated algorithms, if not better.

Performance
It is important to note that we do not consider this as a formal performance test, there are too many variables that have an effect on performance.The main goal is to convince the reader of the scalability of the interface.The experimental scenes introduced in Fig. 4 have different areas and triangle counts and thus have a different complexity and size.Other parameters that have an effect on the problem complexity are the number of cameras and the number of voxels.In this experiment we will focus on two parameters that indicate performance.The first which we call latency is defined as the time between a user moving a camera and the user seeing its effect on the volume.It is important to note that the latency does not affect the framerate in the virtual reality device because they operate in different computational threads.A second metric is the average framerate of the virtual reality device.
In this section we will evaluate both metrics on our implementation of the presented interface.Even though there is room for optimization of our code with respect to performance, the results will give a lower bound on the achievable performance.All experiments where performed on a computer with Nvidia GTX1070 GPU, intel I7-8700 CPU and sufficient RAM.Both latency and average framerate are recorded during a walk through the scene.The latency is averaged over the entire run, but is independent on the position of the user.
In this experiment we add an additional scene which is the car park depicted in Fig. 4 in the right image.This scene is reconstructed from images and features the highest triangle count with 530k triangles.The images are obtained from the ETH3D 3D reconstruction benchmark [17] 3 and reconstructed using commercially available Autodesk ReCap Photo software.
The main results are summarized in Fig. 8.These results are obtained by positioning 10 cameras of a resolution of 400 by 640 pixels in each scene.Average framerates for all scenes are over 40 FPS.We further noticed that the average framerate was independent of the number of voxels.The effect of the number of voxels on the latency is linear as expected, and remains linear up to a high number of voxels (2 million).We also increased the number of cameras up to 30 cameras and noticed the same linear trend.In the interest of saving space we therefore omitted this result from this work.
The framerates of the virtual reality device are enough to provide a pleasant virtual reality experience, even for scenes of up to 530k triangles and a volume of 2M voxels which qualifies as a large scale problem.The latency measured in our experiments indicates there is a clear linear trade-off 78% 74%  between interactivity and accuracy.A user can therefore choose depending on which is more important for them.

DISCUSSION
The traditional recipe in the camera network design literature or related fields is to automate the design process, and distances users from this process.The motivation for this automation is that the task is too difficult for users to perform qualitatively.We however believe that this difficulty is not necessarily related to the problem structure.Automated algorithms also have structural difficulties with solving these problems as we have shown, and users can rely on strong geometrical reasoning which is not possible for computers.We identified that the problem users have with designing camera networks, is that it is difficult to visualize exactly what is happening.Using virtual reality we where able to visualize the usually invisible and therefore augment users capabilities.The advantage of this approach is that users are much more involved with the task at hand, and the usual abstraction of automated algorithms is unnecessary.An additional advantage is that the user can translate what is important in a specific problem much more easily which results in much less design iterations, because he understands the problem.We believe that the importance of our results transcend the camera network design literature.There are many problems in optimization that can uniquely be solved by experts.Often experts rely on an intuitive understanding of the problem to solve these issues.If this understanding is visual, a translations of this understanding to a visualization in virtual reality can enable non-experts to also solve these problems.We have presented an example of such a translation, for a challenging problem and shown that users are indeed capable of being competitive with automated approaches.
From our experiments we cannot conclude that usersgenerated solutions are systematically better than computergenerated solutions.We believe that if the problem gets more challenging users can improve on the quality of automated algorithms.In further work we will study the submodularorienteering problem, which is a combination of submodular maximization and the travelling salesman problem [22].Automated algorithms are provably less efficient, but the problem is highly geometrical.This problem occurs in robot path planning for inspection tasks on which many communities rely (dimensional metrology, infrared thermography, etc) [1].

CONCLUSION
We started by linking the camera network design problem to the monotone-submodular maximization problem with a cardinalilty constraint.Using this link we can conclude that any automated algorithm solving this problem is structurally limited in the solution quality that can be obtained.Furthermore for automated algorithms to be able to solve the network design problem user interaction is required.This required used interaction is however very abstract which is a big obstacle in the adoption of these algorithms.
In this work we proposed a virtual reality based user interface where the user can solve the camera network design problem manually, by applying geometrical reasoning.The workflow associated with this approach is much more intuitive and allows users without specialized knowledge to design camera networks.However, users with specialized knowledge can solve more specialized problems without having knowledge of automated camera network design algorithms.
From our experiments we concluded that user specified camera networks are highly competitive with automatically generated solutions.We demonstrated this in two structurally different real-world camera network design problems.Our experiments treated automated algorithms favorably by not including regularizers, that are required to make automatically generated solutions more realistically feasible.We also demonstrated the scalability of our approach to solve problems in geometries resulting from 3D reconstructions from images up to high fidelity.

Figure 2 :Figure 3 :
Figure 2: Orange arrows with a book pictograph indicate steps in a workflow that require specialist knowledge by the user.Orange arrows with a brain pictograph require visual reasoning of a user but no specialist knowledge, and the green arrows indicate work performed by a computer.

Figure 4 :
Figure 4: Overview of the various scenes used in our experiments and some associated metrics.The source of the scene is either from a CAD workflow or reconstructed from images.

Figure 5 :
Figure 5: Table that cross evaluates solutions generated for different valid quality functions on these quality functions.The bottom row evaluates the user-generated solution on each quality function, and the right column shows the total coverage for each solution.This table does not represent every quality function we tested, but only a random subset.The total set contains 60 quality functions and their average is shown on the right.

Figure 6 :
Figure 6: Comparison of a user-generated network (left) with an automatically generated network (right).The color values shown as a volume indicate for each point in the of interest its redundancy (how many cameras see the point).

Figure 7 :Figure 8 :
Figure 7: Top view of the office scene with uncovered areas shown in red.The left image displays a user-generated solution while the right shows a computer-generated solution.All percentages shown are coverage percentages so higher is better.The camera network consists of 25 cameras with aspect ration 2:1 and perspective angle of 90 degrees.