Deep Architecture
Damjan Jovanovic
“What could all this stuff mean for architectural design? ... There is a need for new strategies for working, new practices of engagement, new interpretative procedures and deployment tactics.”
A speculative architectural design research by Addin Cui
Deep Architecture: a speculative architectural design research

Forewords  03
Something, 2019  05
Style Transfer: "A Neural Algorithm of Artistic Style"  06
"Observing AI", David Ruy  08
Ornamental Crime, 2019  12
CycleGAN: "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks"  14
Hoaxurbanism, M. Casey Rehm  16
Redirection Geopolitic Cycle, 2018  18
MegaDepth: "MegaDepth: Learning Single-View Depth Prediction from Internet Photos"  20
Some Depth, 2019  22
Normal Map: "Appearance-Preserving Simplification"  30
Some Details, 2019  32
3D GAN: "Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling"  34
BigGAN: "Large Scale GAN Training for High Fidelity Natural Image Synthesis"  36
EBM: "Implicit Generation and Generalization Methods for Energy-Based Models"  38
RNN: "A Neural Representation of Sketch Drawings"  40
Reinforcement Learning: "DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation"  42
Collections: "The Promise of Generative Design"  44
"Architecture Automation", 2018  48
"Content Policy", Damjan Jovanovic  54
A speculative architectural design research by Addin Cui
“The Analytical Engine might act upon other things besides number... Supposing, for instance, that the fundamental relations of pitched sounds in the science of harmony and of musical composition were susceptible of such expression and adaptations, the engine might compose elaborate and scientific pieces of music of any degree of complexity or extent.” -- Ada Lovelace, 1842.

Deep learning deployment in architecture has the potential to dissolve the gap between functional and post-functional methodologies. How can one think of architecture, in an age defined by emerging automation techniques, without completely yielding either to utilitarian algorithms like parametricism and biomimicry, or to the pseudo-randomness of endless form-finding experiments? How do we value ourselves?

Major architectural practices have been investing in highly automated design tools with renowned software companies. Design labor costs will be cut. Design employees will be forced to de-skill and re-skill. Architects will be left with only our aesthetic interpretations as the last frontier. An architect must actively engage with cutting-edge automation. An architect must dance with the uncertainty within, since the neural network is as unpredictable as the human psyche. An architect must prepare for it.

In an attempt to embrace this inevitability, this thesis forges a constructive relationship with the advancement of contemporary deep learning tools. Taking advantage of the excellent performance of recent computer vision developments on two-dimensional tensors, mapping is used as a modeling method. Spatial maps are generated through a depth estimation tool within a CycleGAN framework. Details are then mutated via a high-resolution Style Transfer neural network, combined with a high-fidelity BigGAN. A technique of Appearance Preservation retains the level of detail in a real-time digital environment, to create an immersive interactive experience as designed.
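Read as a system, this pipeline is a chain of image-to-image stages that ends in engine-ready maps. The sketch below is a minimal, runnable stand-in for that chain, not the thesis toolchain itself: the depth, translation and stylization stages are placeholder functions (the project would swap in MegaDepth, a trained CycleGAN generator and a style-transfer model), and only the depth-to-normal conversion is implemented concretely.

```python
# A minimal, hypothetical stand-in for the pipeline described above.
# estimate_depth / translate / stylize are placeholders, not the cited tools.
import torch
import torch.nn.functional as F

def estimate_depth(photo):
    # Placeholder for a MegaDepth-style monocular depth network.
    return photo.mean(dim=1, keepdim=True)

def translate(photo):
    # Placeholder for a CycleGAN generator G: X -> Y.
    return photo

def stylize(photo):
    # Placeholder for Gatys-style neural style transfer.
    return photo

def depth_to_normal(depth):
    # Finite-difference normals from a (1, 1, H, W) depth map.
    dzdx = F.pad(depth[..., :, 1:] - depth[..., :, :-1], (0, 1, 0, 0))
    dzdy = F.pad(depth[..., 1:, :] - depth[..., :-1, :], (0, 0, 0, 1))
    n = torch.cat([-dzdx, -dzdy, torch.ones_like(depth)], dim=1)
    return F.normalize(n, dim=1)  # ready to bake into a normal map

photo = torch.rand(1, 3, 256, 256)    # stand-in input image
depth = estimate_depth(photo)         # spatial map
albedo = stylize(translate(photo))    # mutated detail map
normal = depth_to_normal(depth)       # map for the real-time engine
```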
Forewords
Obscure, 2018, by author. Credit: A Neural Algorithm of Artistic Style, essay by Leon A. Gatys; platform built by Ostagram; Fondation Louis Vuitton by Gehry Partners; Blessed Ludovica Albertoni by Gian Lorenzo Bernini.
Something
In AD 9102, empowered by groundbreaking advancement in quantum computation, a society of universal income finally arrives upon humanity. There is no more concept of work. There is no more concept of salary. There is no more concept of money. There is no more concept of survival. There is only life. Everyone acts as a saint. Everyone has an abundant family. Everyone travels among prosperous united states. Everyone forms a peaceful world. Everyone lives their life. Someone enjoys performance and applause. Someone enjoys overdosing on ECU daily. Someone enjoys abusing artificial slavery. Someone enjoys watching all of them. Where the f are architects? Architects are fed up with perfection. Architects are politicians. Architects are rebels. Architects are seeking fetish. To do what? Something that intrigues others and themselves. Something exfoliated from the norm. Something fast and slow, strong and mild. Something deep.
Figure 2: Images that combine the content of a photograph with the style of several well-known artworks.
A Neural Algorithm of Artistic Style
arXiv:1508.06576v2 [cs.CV] 2 Sep 2015
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
Werner Reichardt Centre for Integrative Neuroscience and Institute of Theoretical Physics, University of Tübingen, Germany; Bernstein Center for Computational Neuroscience, Tübingen, Germany; Graduate School for Neural Information Processing, Tübingen, Germany; Max Planck Institute for Biological Cybernetics, Tübingen, Germany; Department of Neuroscience, Baylor College of Medicine, Houston, TX, USA. Correspondence: leon.gatys@bethgelab.org
In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks [1, 2]. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance-optimised artificial neural networks and biological vision [3-7], our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.
Figure 3: Detailed results for the style of the painting Composition VII by Wassily Kandinsky. The rows show the result of matching the style representation of increasing subsets of the CNN layers (see Methods). We find that the local image structures captured by the style representation increase in size and complexity when including style features from higher layers of the network. This can be explained by the increasing receptive field sizes and feature complexity along the network's processing hierarchy. The columns show different relative weightings between the content and style reconstruction. The number above each column indicates the ratio α/β between content and style reconstruction.
The class of Deep Neural Networks that are most powerful in image processing tasks are called Convolutional Neural Networks. Convolutional Neural Networks consist of layers of small computational units that process visual information hierarchically in a feed-forward manner (Fig 1). Each layer of units can be understood as a collection of image filters, each of which extracts a certain feature from the input image. Thus, the output of a given layer consists of so-called feature maps: differently filtered versions of the input image. When Convolutional Neural Networks are trained on object recognition, they develop a representation of the image that makes object information increasingly explicit along the processing hierarchy. Therefore, along the processing hierarchy of the network, the input image is transformed into representations that increasingly care about the actual content of the image compared to its detailed pixel values. We can directly visualise the information each layer contains about the input image by reconstructing the image only from the feature maps in that layer (Fig 1, content reconstructions, see Methods for details on how to reconstruct the image). Higher layers in the network capture the high-level content in terms of objects and their arrangement in the input image but do not constrain the exact pixel values of the reconstruction (Fig 1, content reconstructions d,e). In contrast, reconstructions from the lower layers simply reproduce the exact pixel values of the original image (Fig 1, content reconstructions a,b,c). We therefore refer to the feature responses in higher layers of the network as the content representation.

To obtain a representation of the style of an input image, we use a feature space originally designed to capture texture information. This feature space is built on top of the filter responses in each layer of the network. It consists of the correlations between the different filter responses over the spatial extent of the feature maps (see Methods for details). By including the feature correlations of multiple layers, we obtain a stationary, multi-scale representation of the input image, which captures its texture information but not the global arrangement.

A Neural Algorithm of Artistic Style Leon A. Gatys, Alexander S. Ecker, Matthias Bethge “In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance optimized artificial neural networks and biological vision, our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.”
Style Transfer
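The "correlations between the different filter responses" that define the style representation are Gram matrices of a layer's feature maps. The sketch below shows this computation and the resulting style loss in PyTorch; the 1/(CHW) normalization, the layer choice and the variable names are illustrative conventions, not the paper's exact weighting.

```python
# A minimal sketch of the Gram-matrix style representation from Gatys et al.
import torch

def gram_matrix(features):
    """features: (batch, channels, H, W) feature maps from one CNN layer."""
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    # Channel-by-channel correlations over the spatial extent of the maps.
    return f @ f.transpose(1, 2) / (c * h * w)

def style_loss(generated_feats, style_feats):
    # Squared distance between Gram matrices, summed over the chosen layers.
    return sum(((gram_matrix(g) - gram_matrix(s)) ** 2).sum()
               for g, s in zip(generated_feats, style_feats))
```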
The recent development of artificial intelligence is producing startling results. To better understand its implications for architecture, my partner Karel Klein and I have been experimenting with style transfer algorithms based on convolutional neural networks. In a recent gallery installation, entitled Apophenia, we generated fake aerial photographs using this AI model. Though we are still in a relatively early stage of research into this topic, we have some initial observations.
Observing AI David Ruy, 2018 Clog: Artificial Intelligence
AI reforms the problem of form versus content. This dichotomy was already fully articulated in ancient Greece two millennia ago. It is the conceptual substrate of religions (bodies and souls), art histories (shapes and meanings), and scientific traditions (objects and forces). What this long history always had as a decisive component is the interpretative function of the human observer for constructing values within its framework — values being the prerequisite for new actions. AI models transform the problem of form versus content into a problem of numerical modeling. The enacting of new actions can then be automated, skipping the middleman — the human beholder. It is nonetheless important to observe that AI models are abstractions. They are not concrete; not the protein-based neuroelectrical apparatuses found inside the bodies of biological entities; not events in the evolution of biological intelligence. They are numerically based conceptual apparatuses that can be freely deployed on any platform that can store and
compute signaling events. They are not brains, they are abstractions of brains. Humans are producing these abstractions. In future versions of software, AI will likely introduce new metaphors of action that will become more important than the concept of "undo." Consider the strange genius of the "undo" metaphor: as you work, your decisions are recorded. You can then move back and forth within the event history of your labor. This astonishing ability to "edit" your event history is commonplace now. But there are even more surprising things on the way. Primitive attempts to embed the knowledge of the expert into software have given us things like "autocorrect." Right now, this is nothing more than reminders to fix typos, grammatical suggestions, and automations of formatting conventions. Might we see new software metaphors such as "show me other options," "make it better," "tell me what you think," or simply, "do it for me?" I'm not thinking of word processors, I'm thinking of design software.
Ultimately, AI is not about substituting human intelligence, but about substituting human labor. Earlier this year, in The New York Times, Thomas Edsall wrote an opinion piece entitled "Robots Can't Vote, But They Helped Elect Trump." He cites a study by MIT that maps where automation technologies are most substituting human labor in the United States. It uncannily correlated with those states that flipped from blue to red in the last presidential election. Automation has been putting people out of work; this is the flip side of technological progress. AI isn't collaborative labor, but slave labor. Dystopian science fiction may have gotten that part right. Unlike sci-fi, however, the important conversation is not whether AI should be given rights, but about equitably redistributing the wealth produced by this automated labor. Whether we'll be enlightened enough as a society to navigate this new frontier remains to be seen.
Neural Art Style Transfer - Google Deep Dream Team, 2015
Neural Art Style Transfer - Ostagram Platform
Neural Art Style Transfer - Deep Dream Generator
Ornamental Crime 2018 Credit: A Neural Algorithm of Artistic Style, essay by Leon A. Gatys et al.; Platform build by Ostagram; Original photograph of Gehry Partners’ LRCBH, by Ken Lane
Exquisite Wood Relief 2018 Credit: A Neural Algorithm of Artistic Style, essay by Leon A. Gatys et al.; Platform build by Ostagram;
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros “Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F: Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.”
CycleGAN
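The constraint F(G(X)) ≈ X quoted above is enforced as an L1 penalty added to the adversarial losses. A minimal sketch, with the generators G and F passed in as callables and λ = 10 as in the paper:

```python
# Cycle-consistency term of CycleGAN: X -> Y -> X and Y -> X -> Y round trips.
import torch

def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
    loss_x = torch.nn.functional.l1_loss(F(G(real_x)), real_x)  # F(G(x)) ≈ x
    loss_y = torch.nn.functional.l1_loss(G(F(real_y)), real_y)  # G(F(y)) ≈ y
    return lam * (loss_x + loss_y)
```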
Hoaxurbanism M. Casey Rehm, Studio Kinch, 2018. Images from Instagram & official website. Credit: CycleGAN essay by Jun-Yan Zhu et al.
IP Redirection Geopolitic study through CycleGAN 2018 Credit: Original Images from Google Earth; CycleGAN essay by Jun-Yan Zhu; CycleGAN build by Xiaowei Hu
CN Fujian Xiamen
US Nevada Las Vegas
US California San Jose
MegaDepth: Learning Single-View Depth Prediction from Internet Photos
Zhengqi Li, Noah Snavely
Department of Computer Science, Cornell University / Cornell Tech

Figure 2: Comparison between MVS depth maps with and without the proposed refinement/cleaning methods. The raw MVS depth maps exhibit depth bleeding or incorrect depth on people; the refined maps correct or remove outlier depths.
Figure 3: Examples of automatic ordinal labeling. Blue mask: foreground (F_ord) derived from semantic segmentation. Red mask: background (B_ord) derived from reconstructed depth.
Table 5: Results on the KITTI test set for various training sets and approaches.
“Single-view depth prediction is a fundamental problem in computer vision. Recently, deep learning methods have led to significant progress, but such methods are limited by the available training data. Current datasets based on 3D sensors have key limitations, including indoor-only images (NYU), small numbers of training examples (Make3D), and sparse sampling (KITTI). We propose to use multi-view Internet photo collections, a virtually unlimited data source, to generate training data via modern structure-from-motion and multi-view stereo (MVS) methods, and present a large depth dataset called MegaDepth based on this idea. Data derived from MVS comes with its own challenges, including noise and unreconstructable objects. We address these challenges with new data cleaning methods, as well as automatically augmenting our data with ordinal depth relations generated using semantic segmentation. We validate the use of large amounts of Internet data by showing that models trained on MegaDepth exhibit strong generalization---not only to novel scenes, but also to other diverse datasets including Make3D, KITTI, and DIW, even when no images from those datasets are seen during training.”
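The "ordinal depth relations" mentioned in the abstract supervise pairs of pixels whose depth ordering is known even where metric depth is not. The sketch below uses the generic log-hinge form of such a loss; MegaDepth's exact robust variant differs in its details, so treat the formulation as illustrative.

```python
# A sketch of an ordinal depth loss over automatically labeled pixel pairs.
import torch

def ordinal_depth_loss(log_depth, pairs, labels):
    """log_depth: (N,) predicted log-depths at sampled pixels.
    pairs: (P, 2) long tensor of pixel indices (i, j).
    labels: (P,) +1 where j should be farther than i, -1 otherwise."""
    diff = log_depth[pairs[:, 0]] - log_depth[pairs[:, 1]]  # L_i - L_j
    # Penalize orderings that contradict the label.
    return torch.log1p(torch.exp(labels * diff)).mean()
```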
OPP Interior MegaDepth 2019 Credit: Original images by David Burdeny; CycleGAN essay by Jun-Yan Zhu; MegaDepth essay and build by Zhengqi Li
Credit: Original image by Alex Powell; CycleGAN essay by Jun-Yan Zhu; CycleGAN build by Xiaowei Hu; MegaDepth essay and build by Zhengqi Li; A Neural Algorithm of Artistic Style, essay by Leon A. Gatys; Platform build by Ostagram;
Exquisite Modernism 2019
One point perspective depth map implementation 2019
Spherical depth map implementation 2019
Figure: successive levels of detail; triangle counts are 7,809, 3,905, 1,951, 975, 488. The run-time system performs a mip-mapped look-up of the normal map.
Appearance-Preserving Simplification
Jonathan Cohen, Marc Olano, Dinesh Manocha
University of North Carolina at Chapel Hill
Abstract
Figure 4: A patch from the leg of an armadillo model and its associated normal map.
Figure 5: Lion model.
a spring system with uniform weights. A side-by-side comparison of various choices of weights in [12] shows that uniform weights produce more evenly-distributed vertices than some other choices. For parameterizations used only with one particular map, it is also possible to allow more area compression where data values are similar. While this technique will generally create reasonable parameterizations, it would be better if there were a way to also guarantee that F(X) is one-to-one, as in the graph drawing literature.
4.2 Creating Texture and Normal Maps
Given a polygonal surface patch, m0, and its 2D parameterization, F, it is straightforward to store per-vertex colors and normals into the appropriate maps using standard rendering software. To create a map, scan convert each triangle of m0, replacing each of its vertex coordinates, Vj, with F(Vj), the texture coordinates of the vertex. For a texture map, apply the Gouraud method for linearly interpolating the colors across the triangles. For a normal map, interpolate the per-vertex normals across the triangles instead (Figure 4). The most important question in creating these maps is what the maximum resolution of the map images should be. To capture all the information from the original mesh, each vertex's data should be stored in a unique texel. We can guarantee this conservatively by choosing 1/d x 1/d for our map resolution, where d is the minimum distance between vertex texture coordinates:

$$d = \min_{V_i, V_j \in m_0,\; i \neq j} \lVert F(V_i) - F(V_j) \rVert \qquad (1)$$
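Equation (1) translates directly into a conservative texel count. A minimal sketch, assuming unit-square texture coordinates and brute-force pairwise distances (fine for a patch, quadratic in vertex count):

```python
# Conservative map resolution from the minimum texel distance d of eq. (1).
import itertools
import math

def map_resolution(uvs):
    """uvs: list of (u, v) vertex texture coordinates in [0, 1] for patch m0."""
    d = min(math.dist(a, b) for a, b in itertools.combinations(uvs, 2))
    side = math.ceil(1.0 / d)  # 1/d x 1/d gives each vertex a unique texel
    return side, side

print(map_resolution([(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]))  # -> (2, 2)
```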
If the vertices of the polygonal surface patch happen to be a uniform sampling of the texture space (e.g. if the polygonal surface patch was generated from a parametric curved surface patch), then the issues of scan conversion and resolution are simplified considerably. Each vertex color (or normal) is simply
We present a new algorithm for appearance-preserving simplification. Not only does it generate a low-polygon-count approximation of a model, but it also preserves the appearance. This is accomplished for a particular display resolution in the sense that we properly sample the surface position, curvature, and color attributes of the input surface. We convert the input surface to a representation that decouples the sampling of these three attributes, storing the colors and normals in texture and normal maps, respectively. Our simplification algorithm employs a new texture deviation metric, which guarantees that these maps shift by no more than a user-specified number of pixels on the screen. The simplification process filters the surface position, while the runtime system filters the colors and normals on a per-pixel basis. We have applied our simplification technique to several large models achieving significant amounts of simplification with little or no loss in rendering quality. CR Categories: I.3.5: Object hierarchies, I.3.7: Color, shading, shadowing, and texture Additional Keywords: simplification, attributes, parameterization, color, normal, texture, maps
The majority of work in the field of simplification has focused on surface approximation algorithms. These algorithms bound the error in surface position only. Such bounds can be used to guarantee a maximum deviation of the object’s silhouette in units of pixels on the screen. While this guarantees that the object will cover the correct pixels on the screen, it says nothing about the final colors of these pixels. Of the few simplification algorithms that deal with the remaining two attributes, most provide some threshold on a maximum or average deviation of these attribute values across the model. While such measures do guarantee adequate sampling of all three attributes, they do not generally allow increased simplification as the object becomes smaller on the screen. These threshold metrics do not incorporate information about the object’s distance from the viewpoint or its area on the screen. As a result of these metrics and of the way we typically represent these appearance attributes, simplification algorithms have been quite restricted in their ability to simplify a surface while preserving its appearance.
1.1 Main Contribution
Figure 6: Texture coordinate deviation and correction on the lion’s tail. Left: 1,740 triangles full resolution. Middle and Right: 0.25 mm maximum image deviation. Middle: 108 triangles, no texture deviation metric. Right: 434 triangles with texture metric.
The order in which these operations are performed has a large impact on the quality of the resulting surface, so simplification algorithms typically choose the operations in order of increasing error according to some metric. This metric may be local or global in nature, and for surface approximation algorithms, it provides some bound or estimate on the error in surface position. The operations to be performed are typically maintained in a priority queue, which is continually updated as the simplification progresses. This basic design is applied by many of the current simplification algorithms, including [6-8, 15]. To incorporate our appearance-preservation approach into such an algorithm, the original algorithm is modified to use our texture deviation metric in addition to its usual error metric. When an edge is collapsed, the error metric of the particular surface approximation algorithm is used to compute a value for Vgen, the surface position of the new vertex (see Figure 3). Our texture deviation metric is then applied to compute a value for vgen, the texture coordinates of the new vertex. For the purpose of computing an edge’s priority, there are several ways to combine the error metrics of surface approximation along with the texture deviation metric, and the appropriate choice depends on the algorithm in question. Several possibilities for such a total error metric include a weighted combination of the two error metrics, the maximum or minimum of the error metrics, or one of the two error metrics taken alone. For instance, when integrating with Garland and Heckbert’s algorithm [6], it would be desirable to take a weighted combination in order to retain the precedence their system accords the topology-preserving collapses over the topology-modifying collapses. Similarly, a weighted combination may be desirable for an integration with Hoppe’s system [15], which already optimizes error terms corresponding to various mesh attributes.
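In code, that design is a greedy loop over a heap of candidate collapses. The sketch below assumes a duck-typed mesh interface (edges(), face_count(), is_valid(), collapse(), edges_around()) and caller-supplied error callables; none of these names come from the paper, which leaves the host simplification algorithm open.

```python
# A sketch of priority-queue edge-collapse simplification with a combined
# surface-error + texture-deviation metric (one weighting option in the text).
import heapq
import itertools

def simplify(mesh, target_faces, surf_err, tex_dev, w_s=1.0, w_t=1.0):
    tie = itertools.count()  # tie-breaker so edges never get compared directly
    cost = lambda e: w_s * surf_err(mesh, e) + w_t * tex_dev(mesh, e)
    heap = [(cost(e), next(tie), e) for e in mesh.edges()]
    heapq.heapify(heap)
    while mesh.face_count() > target_faces and heap:
        _, _, edge = heapq.heappop(heap)
        if not mesh.is_valid(edge):      # stale entry from an earlier collapse
            continue
        # Surface metric chooses Vgen; texture metric chooses its texcoords.
        v_gen = mesh.collapse(edge)
        for e in mesh.edges_around(v_gen):  # re-prioritize affected edges
            heapq.heappush(heap, (cost(e), next(tie), e))
```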
1 INTRODUCTION

Simplification of polygonal surfaces has been an active area of research in computer graphics. The main goal of simplification is to generate a low-polygon-count approximation that maintains the high fidelity of the original model. This involves preserving the model's main features and overall appearance. Typically, there are three appearance attributes that contribute to the overall appearance of a polygonal surface: 1. Surface position, represented by the coordinates of the polygon vertices. 2. Surface curvature, represented by a field of normal vectors across the polygons. 3. Surface color, also represented as a field across the polygons. The number of samples necessary to represent a surface accurately depends on the nature of the model and its area in screen pixels (which is related to its distance from the viewpoint). For a simplification algorithm to preserve the appearance of the input surface, it must guarantee adequate sampling of these three attributes. If it does, we say that it has preserved the appearance with respect to the display resolution.

We present a new algorithm for appearance-preserving simplification. We convert our input surface to a decoupled representation. Surface position is represented in the typical way, by a set of triangles with 3D coordinates stored at the vertices. Surface colors and normals are stored in texture and normal maps, respectively. These colors and normals are mapped to the surface with the aid of a surface parameterization, represented as 2D texture coordinates at the triangle vertices. The surface position is filtered using a standard surface approximation algorithm that makes local, complexity-reducing simplification operations (e.g. edge collapse, vertex removal, etc.). The color and normal attributes are filtered by the run-time system at the pixel level, using standard mip-mapping techniques [1]. Because the colors and normals are now decoupled from the surface position, we employ a new texture deviation metric, which effectively bounds the deviation of a mapped attribute value's position from its correct position on the original surface. We thus guarantee that each attribute is appropriately sampled and mapped to screen-space. The deviation metric necessarily constrains the simplification algorithm somewhat, but it is much less restrictive than retaining sufficient tessellation to accurately represent colors and normals in a standard, per-vertex representation. The preservation of colors using texture maps is possible on all current graphics systems that support real-time texture maps. The preservation of normals using normal maps is possible on prototype machines today, and there are indications that hardware support will soon be more widely available.

Figure 12: Close-up of several levels of detail of the armadillo model (from 249,924 down to 975 triangles, at maximum image deviations from 0.25 mm to 6.6 mm). Top: normal maps. Bottom: per-vertex normals.

E-mail: {cohenj,dm}@cs.unc.edu, olano@engr.sgi.com WWW: http://www.cs.unc.edu/~geom/APS
Figure 1: Bumpy Torus Model. Left: 44,252 triangles full resolution mesh. Middle and Right: 5,531 triangles, 0.25 mm maximum image deviation. Middle: per-vertex normals. Right: normal maps
Appearance-Preserving Simplification Jonathan Cohen, Marc Olano, Dinesh Manocha “We present a new algorithm for appearance-preserving simplification. Not only does it generate a low-polygon-count approximation of a model, but it also preserves the appearance. This is accomplished for a particular display resolution in the sense that we properly sample the surface position, curvature, and color attributes of the input surface. We convert the input surface to a representation that decouples the sampling of these three attributes, storing the colors and normals in texture and normal maps, respectively. Our simplification algorithm employs a new texture deviation metric, which guarantees that these maps shift by no more than a user-specified number of pixels on the screen. The simplification process filters the surface position, while the runtime system filters the colors and normals on a per-pixel basis. We have applied our simplification technique to several large models achieving significant amounts of simplification with little or no loss in rendering quality.”
Normal Map
Spherical normal map and ambient occlusion map implementation I 2019
Real-time sample in Unreal Engine 4, I 2019
3D GAN Implementation on Knight chess pieces 2019 Design Team: Andrew Chittenden, Ben Weisgall, Talbot Schmidt, Daniel Aris, Addin Cui Instructor: M. Casey Rehm
Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, and Joshua B. Tenenbaum “We study the problem of 3D object generation. We propose a novel framework, namely 3D Generative Adversarial Network (3D-GAN), which generates 3D objects from a probabilistic space by leveraging recent advances in volumetric convolutional networks and generative adversarial nets. The benefits of our model are three-fold: first, the use of an adversarial criterion, instead of traditional heuristic criteria, enables the generator to capture object structure implicitly and to synthesize high-quality 3D objects; second, the generator establishes a mapping from a low-dimensional probabilistic space to the space of 3D objects, so that we can sample objects without a reference image or CAD models, and explore the 3D object manifold; third, the adversarial discriminator provides a powerful 3D shape descriptor which, learned without supervision, has wide applications in 3D object recognition. Experiments demonstrate that our method generates high-quality 3D objects, and our unsupervisedly learned features achieve impressive performance on 3D object recognition, comparable with those of supervised learning methods.”
3D GAN
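In outline, the generator maps a latent vector to a voxel grid through volumetric transposed convolutions. The sketch below follows the paper's broad recipe (a 200-dimensional z, successive up-convolution stages, sigmoid occupancy output at 64^3); the channel widths and padding here are plausible choices, not the authors' published configuration.

```python
# A minimal sketch of a 3D-GAN-style volumetric generator.
import torch
import torch.nn as nn

class Generator3D(nn.Module):
    def __init__(self, z_dim=200):
        super().__init__()
        def up(cin, cout, **kw):
            return nn.Sequential(
                nn.ConvTranspose3d(cin, cout, kernel_size=4, **kw),
                nn.BatchNorm3d(cout), nn.ReLU(inplace=True))
        self.net = nn.Sequential(
            up(z_dim, 512, stride=1),               # 1^3 -> 4^3
            up(512, 256, stride=2, padding=1),      # 4^3 -> 8^3
            up(256, 128, stride=2, padding=1),      # 8^3 -> 16^3
            up(128, 64, stride=2, padding=1),       # 16^3 -> 32^3
            nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1),  # 32^3 -> 64^3
            nn.Sigmoid())                           # voxel occupancy in [0, 1]

    def forward(self, z):                           # z: (batch, z_dim)
        return self.net(z.view(*z.shape, 1, 1, 1))
```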
BigGAN Breeder Playground 2019 credit: Joel Simon, GANBreeder platform
Large Scale GAN Training for High Fidelity Natural Image Synthesis Andrew Brock, Jeff Donahue, Karen Simonyan “Despite recent progress in generative image modeling, successfully generating high-resolution, diverse samples from complex datasets such as ImageNet remains an elusive goal. To this end, we train Generative Adversarial Networks at the largest scale yet attempted, and study the instabilities specific to such scale. We find that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the Generator's input. Our modifications lead to models which set the new state of the art in class-conditional image synthesis. When trained on ImageNet at 128x128 resolution, our models (BigGANs) achieve an Inception Score (IS) of 166.5 and Fréchet Inception Distance (FID) of 7.4, improving over the previous best IS of 52.52 and FID of 18.6.”
BigGAN
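The "truncation trick" quoted above is simple to state in code: sample z from a normal distribution but resample any component whose magnitude exceeds a threshold, so that a smaller threshold concentrates samples near the mode. A minimal sketch:

```python
# Truncated latent sampling: lower thresholds favor fidelity over variety.
import torch

def truncated_z(batch, dim, threshold=0.5):
    z = torch.randn(batch, dim)
    while True:
        outliers = z.abs() > threshold
        if not outliers.any():
            return z
        z[outliers] = torch.randn(int(outliers.sum()))  # resample out-of-range values
```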
Implicit Generation and Generalization Methods for Energy-Based Models Yilun Du, Igor Mordatch "Energy based models (EBMs) are appealing due to their generality and simplicity in likelihood modeling, but have been traditionally difficult to train. We present techniques to scale MCMC based EBM training, on continuous neural networks, and show its success on the high-dimensional data domains of ImageNet32x32, ImageNet128x128, CIFAR-10, and robotic hand trajectories, achieving significantly better samples than other likelihood models and on par with contemporary GAN approaches, while covering all modes of the data. We highlight unique capabilities of implicit generation, such as energy compositionality and corrupt image reconstruction and completion. Finally, we show that EBMs generalize well and are able to achieve state-of-the-art out-of-distribution classification, exhibit adversarially robust classification, coherent long term predicted trajectory roll-outs, and generate zero-shot compositions of models."
EBM
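"Implicit generation" here means there is no generator network at all: samples are produced by Langevin-dynamics MCMC on a learned energy function. A minimal sketch, with step counts and coefficients as illustrative values rather than the paper's tuned settings:

```python
# Langevin MCMC sampling from an energy-based model E(x).
import torch

def langevin_sample(energy, x, steps=60, step_size=10.0, noise=0.005):
    """energy: callable mapping a batch x to per-sample scalar energies."""
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy(x).sum(), x)[0]
        # Descend the energy landscape with injected Gaussian noise.
        x = x - step_size * grad + noise * torch.randn_like(x)
    return x.detach()
```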
A Neural Representation of Sketch Drawings David Ha, Douglas Eck "We present sketch-rnn, a recurrent neural network (RNN) able to construct stroke-based drawings of common objects. The model is trained on thousands of crude human-drawn images representing hundreds of classes. We outline a framework for conditional and unconditional sketch generation, and describe new robust training methods for generating coherent sketch drawings in a vector format."
RNN
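The "vector format" sketch-rnn generates is a sequence of pen offsets plus pen-state bits, often called stroke-5. A minimal sketch of the encoding (the helper name and tensor layout are this note's, not the paper's code):

```python
# Encode a drawing into stroke-5 format: (dx, dy, pen_down, pen_up, end).
import torch

def to_stroke5(offsets, pen_lifts):
    """offsets: (T, 2) dx, dy per step; pen_lifts: (T,) 1 where the pen lifts."""
    T = offsets.shape[0]
    s = torch.zeros(T + 1, 5)
    s[:T, :2] = offsets
    s[:T, 2] = 1.0 - pen_lifts   # pen touching paper
    s[:T, 3] = pen_lifts         # pen lifted after this step
    s[T, 4] = 1.0                # end-of-sketch token
    return s
```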
DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation Lex Fridman, Jack Terwilliger, Benedikt Jenik "We present a traffic simulation named DeepTraffic where the planning systems for a subset of the vehicles are handled by a neural network as part of a model-free, off-policy reinforcement learning process. The primary goal of DeepTraffic is to make the hands-on study of deep reinforcement learning accessible to thousands of students, educators, and researchers in order to inspire and fuel the exploration and evaluation of deep Q-learning network variants and hyperparameter configurations through large-scale, open competition. This paper investigates the crowdsourced hyperparameter tuning of the policy network that resulted from the first iteration of the DeepTraffic competition where thousands of participants actively searched through the hyperparameter space."
Reinforcement Learning
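What participants actually tune is a small set of knobs: the slice of road the agent sees, the network shape, and the Q-learning coefficients. The sketch below mirrors that shape with a config dict and the epsilon-greedy rule those knobs feed; all names and values are illustrative, not winning competition entries.

```python
# Illustrative DeepTraffic-style hyperparameters and epsilon-greedy control.
import random

config = {
    "lanes_side": 2, "patches_ahead": 30, "patches_behind": 10,  # state slice
    "hidden_layers": [256, 256],          # policy network shape
    "learning_rate": 1e-3, "gamma": 0.95, "epsilon": 0.05,
}

def epsilon_greedy(q_values, epsilon):
    """q_values: per-action estimates from the policy network."""
    if random.random() < epsilon:                     # explore
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit
```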
“Computational design promises to have a profound, far-reaching impact upon the architecture profession by automating the creation of bespoke building plans that solve specific project challenges. The latest generative design software developed by companies such as Autodesk is capable of independently producing customized architectural plans free of any further human interaction following the setting of initial parameters and the launch of programs. Architects can adjust the software to produce an endless slew of architectural designs that satisfy the specific criteria associated with a given project in terms of cost, constructibility or performance dynamics, at a pace and level of productivity that would be impossible for human beings to match.”
The Promise of Generative Design Marc K. Howe, Danil Nagy
Autodesk
According to Danil Nagy, designer and senior research scientist with Autodesk Research Group’s The Living, generative design programs can play a critical role in helping architects resolve the practical challenges encountered during the design process, such as creating spatial layouts that are more conducive to pedestrian traffic or help to foster colleague interaction, or maximizing exposure of building occupants to daylight and exterior views. "The starting point for our use of generative design was trying to determine which aspects of the architectural design process were the most complex and difficult for humans to think through, and figuring out how to get a computer to work them out for us," said Nagy. "It’s all about isolating those very tricky practical issues, and then using a computer to automate the development of solutions for those problems." An outstanding example of the tremendous potential of generative design as a problem-solving design tool is Autodesk’s development of a new office and research space in Toronto’s MaRS Innovation District. One of the key goals in developing the new office and research space was the creation of a spatial layout that would foster
happenstance interaction between the diverse range of talent setting up shop in the MaRS district. "Our focus was on space planning – we wanted people to travel easily around the office, yet we also wanted to create these zones of congestion where people could interact and meet with each other," said Nagy. "We developed the architectural concept of breaking up shared spaces into amenities zones that in turn break up the office." The researchers realized that generative design was an ideal means for solving the practical hurdles associated with this innovative design mandate. "Once we had the concept, the design task was to arrange all these spaces in the office to basically maximize certain design aspects of the office and minimize some of the difficulties," said Nagy. "We felt that this kind of space planning was the perfect opportunity for the generative design process." The use of generative design in tandem with modern computing power enabled the researchers to attack the design challenge with a level of intensity that would have been impossible for human beings to replicate: "You can go through the feedback loop of sketching
out a design and then testing it yourself maybe just a few times. The computer can iterate through that process extremely quickly, so in this case we did 10,000 design options in just a few days. The computer can then evaluate each one and tell us which ones it thinks are the best. Human architects can then evaluate them on the basis of subjective factors such as aesthetics." In addition to solving vexing practical challenges, generative design could also raise the efficiency and economy of the overall development process by producing designs that improve scheduling and coordination between multiple stakeholders. "There are major potential savings from an efficiency standpoint for the actual building process, because architecture isn’t just about designing the vision of a building in a computer - it’s also about interfacing with contractors," said Nagy. "A lot of efficiency and cost gains come from the scheduling of construction, and coordinating the process of actually getting things built. Generative design could be used to maximize construction efficiency or budget efficiency once other stakeholders like clients and contractors are brought into the design process."
While computational design has the potential to bring myriad benefits to the development process, some observers have expressed deep misgivings about its potential negative impacts. A commonly voiced concern is that computational design could have the same impact upon the architecture profession that automation has wrought in other sectors of the economy, achieving dramatic efficiency and cost gains at the expense of human personnel. According to Nagy, however, members of the architectural profession need not fret about the potential impact of computational design upon their livelihoods, given the indispensable role that human beings play in making decisions with respect to style, aesthetics and the way the interior space of a building is directly experienced by occupants. At the end of the day architecture is an artistic as well as a practical discipline, and uniquely human faculties are still needed to determine if a particular design is satisfactory or pleasing from an aesthetic perspective. For Nagy generative design is about empowering architects as opposed to making them obsolete, by giving them the ability to access virtually endless design options that resolve the practical challenges of projects. "After the generative design process, architects still have to go through the suite of design options generated and look at them using their own intuition to pick the best one," said Nagy. The emergence of generative design does mean, however, that the role of the architect itself will change, and members of the profession
will need to expand their skill sets in order to take full advantage of the tremendous power offered by the latest computational tools. According to Nagy, "We talk about this technology as more of an augmented or collaborative process, and as it develops architects will need a new set of skills to take advantage of it. You're not just designing the spaces, you're also designing the way that you interact with the computer to explore all those different options." For this reason Nagy expects programming knowledge to become an indispensable component of the architectural skill set. "At the cusp of innovation, architects will have to know how to program, and we're already seeing this," he said. "I teach at Columbia GSAPP, and part of the standard curriculum for all incoming architecture students is scripting and programming. You will have to engage with the computer on its own terms, because if we want to use the potential of computers to help us through a design problem we need to be able to present that problem to the computer in its own language. That's why, in the future, computer programming will be a fundamental skill for architects."
The Promise of Generative Design Marc K. Howe, Danil Nagy
The first shock of architecture happened to me when I encountered the CCTV tower in person. How could a building be built like that, I wondered. The exterior pattern looked weirdly twisted and imbalanced, and so did the massing. It could be read as a trapezoid with partial transparency, a continuous loop of chunks, a childish stack of six slightly sheared blocks that happen to be holding themselves up, a block gouged and bitten from two diagonal sides, an angled cantilever bridging between two tilted towers, a pair of squat legs with pigeon feet. There are thousands of ways to interpret it. Some of these formal readings could be foreseen from OMA's early study models; some could not. It revealed an incredibly massive variation within its individuality.

Architecture Automation
Addin Cui, 2018
Instructors: Marrikka Trotter, Sanford Kwinter
Is the CCTV tower somehow related to automation in architecture? It is and it is not. It is on the exact contrary, in the sense that the work itself was an individual work, extremely specific to its owner, a “predominant state television broadcaster” [2] in a “Unitary Marxist-Leninist one-party socialist republic” [3], highly customized in structural and mechanical systems and highly distinguishable in design. The automated process within its ontogenesis might be inside the digital design software, as deep as primary codes and binaries. However, it is extremely related to automation, in the sense that it stood out so much from its context, early 21st-century Beijing, which contained a massive amount of Russian-Chinese Bauhaus buildings that seemed as if their designs should have been fully automated, but never actually were. This duplicative context formed, for contemporary urban Chinese people, a stereotypical conceptual grid of the “contemporary building,” and the CCTV tower was exfoliated from it. At that moment, before my actual architecture journey even started, I speculated an opportunity. If I developed a fully automated system that generates a massive amount of optimal variations from a number of known, standard and tested building formats, which are usually not interesting but economically necessary, I could put most architects out of a job. I could kick them out of their comfort zone with their poor and fake Bauhaus, “European style” and “American style.” I could make them suffer and struggle toward something my optimal system could not come up with, eventually producing more boundary-breaking works. It would
end up with more breathtaking projects like the CCTV, I told myself. Is a fully automated system happening? Yes. It is possibly already happening, thanks to people like Gehry, Dassault and Autodesk. It was Frank Gehry who brought Dassault's advanced system CATIA (Computer-Aided Three-dimensional Interactive Application) from aerospace engineering into architecture, which in turn developed into the concept of BIM (Building Information Modeling). The monopoly Autodesk is taking BIM's lead with Revit for now, and has developed a laboratory prototype in fully generative building production. Some people are getting even more provocative. A Chinese entrepreneur team, XKool, founded by several former OMA and Google employees, is developing a system called “Project Rosetta.” Aside from providing a rather standard service of efficiency- and performance-based optimization algorithms, which is no longer fresh in architecture, they are targeting an automated style-learning system that creates copyrights of styles through neural network models. According to their release conference: “(Author's translation) Project Rosetta is, to analyze and understand the specific designs and styles of an architect through deep learning, which in turn generates a unique design algorithm model, which continues to enrich and reinforce a XKool algorithm model. In fact, Project Rosetta can tailor the algorithm model with complete copyright to the enterprises in need, based on the active deep learning model of specific design styles and logic. Their design thinking
can be universally applicable in multiple projects, maximizing the designer's design value, and revitalizing those unused projects. On the platform of XKool, the user completely owns the copyright of the algorithm model, and can realize the authorization and transaction of the algorithm model.” [5] For XKool, the ambition is to algorithmize all existing architects, in their immanence, from their works, through a centralized deep learning model. It seems my naive speculative idea might be outdated and the future is now. It goes even further than the idea of superseding only the repetitive jobs; it aims to take over the entire architecture industry through a collective, live algorithm model. This is a systematic architecture service toward owners that look to build buildings, more hardcore architecture than ever. Is it an autopoiesis model? No, it is not. A deep learning model is meant to be alive in its haecceity at every moment, constantly evolving itself and consuming designs from the outside. A deep learning model is fundamentally different from a reinforcement learning model that iterates within itself toward a quantified reward goal. A deep learning model relies highly on constant fresh, new and diverse external inputs. Fresh water cannot get into a cup that has already been filled; water has to be poured out first. A deep learning model has the ability to constantly consume new feeds and generate new results, thanks in part to its dropout mechanism, which constantly and randomly drops out its own neurons for a better potential performance. Everything seems really promising for XKool.
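For reference, the dropout mechanism invoked here is a training-time regularizer: each activation is zeroed with probability p and the survivors rescaled, which prevents neurons from co-adapting to any fixed pattern. A minimal sketch (whether this lets a deployed model "constantly consume new feeds" is the author's interpretation):

```python
# Inverted dropout: randomly zero activations and rescale the rest.
import torch

def dropout(activations, p=0.5, training=True):
    if not training or p == 0.0:
        return activations
    keep = (torch.rand_like(activations) > p).float()
    return activations * keep / (1.0 - p)
```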
However, automated architecture is not going to happen very soon. Even if it is able to struggle through the complexity of the architecture forest, it will definitely encounter the same dilemma that almost all other automation systems have dealt with or are dealing with: social ethics. A success rate of 99.9% leaves a 0.1% failure rate, which could affect 7.7 million people within the 7.7 billion earth population of 2018. Considering the length of time that architectural creations run through, the reliability of such a system would first have to be tested at an astronomical scale to reach an acceptably tiny failure rate. Wise monopolies would not risk full responsibility if the liability their product could produce exceeds their own value. In the near future, no high-tech firm would dare to announce a level-5 self-driving system for the same reason. When individuals can distribute responsibility, ethics and human rights discretely, the labor works are likely to continue no matter how tedious they are. People waving flags at the entrances of parking lots in Los Angeles exist for the same reason. A single severe building failure has put a lot of people in jail, so it could put an automation system on pause. It seems that automated architecture might always stay
under human supervision. It might always remain, in essence, a "Computer-Aided Design" process. Suppose we do develop into a more highly semi-automated status in design: it would still have a huge impact on architecture. Some roles will be eliminated from the discipline; some new roles will arise. As speculative surrealists, we need to prepare ourselves better. Kai-Fu Lee, an artificial intelligence venture capital investor who formerly worked in Silicon Valley, demonstrated the importance of creativity and compassion in social roles with a diagram. [7] The author made a reinterpretation of his diagram for several common roles in the architecture discipline. [8] The matrix diagram is divided into quadrants by two axes. Following the original matrix, there are four different human-AI relationships, based on how much creativity (versus optimization) and compassion (versus non-compassion) each role requires. Architecture students generally start from the very center, wandering among all the possibilities; they should see this matrix as an index for speculating on the problem. In quadrant III, the lower left,
AI would perform more excellent work than humans. Humans might have to hand over the workload and responsibility to artificial intelligence corporations that have fully optimized their products towards efficiency and performance. This is happening now: in online and telephone customer services that handle most repetitive inquiries for banks and governments; in facial-recognition security guards for residential buildings in China; in Tesla's chained automated trucks in development; in tumor recognition and prediction for radiology; and so on. In architecture, various software packages have simplified the structural and mechanical scopes of work so much that a person without much engineering training can run finite element simulations that support their formal concepts structurally and environmentally. Moreover, the good old-fashioned programmers of all this software are running into an awkward situation: deep learning neural networks have proven able to debug faster and cleaner than human programmers. In quadrant I, the upper right, humans would maintain absolutely dominant control over AI, according to Kai-Fu Lee's speculation, in roles that require more creativity and compassion. This is the area where the ideal super-architect sits, one who oversees the creative process and strategically negotiates with external forces, with full compassion for and understanding of the social, economic and political intentions carried through their works. In my opinion, Rem Koolhaas is one of them, holding his CCTV tower. In quadrants II and IV, the upper left and lower right, things get tricky. Most architectural
roles lie discretely within these areas. A diagonal red line running through them could be considered a rough frontier of the human-AI relationship, and architecture students stand at its very middle point. As a cutting-edge architectural education institute, the Southern California Institute of Architecture (SCI-Arc) is a palace of creativity. Hundreds of SCI-Arc alumni have become conceptual architects, digital artists and sculptural designers, occupying the far right side of quadrant IV. Architecture's current condition is a non-humanist one, and SCI-Arc is part of this trend. It is surely one way to prepare both along with and against an AI-integrated future of architecture, but maybe not enough. It still seems hard to imagine that anything could automate the sophisticated processes of facade assembly design, parametric system design or the evolution of architectural theory in quadrant IV. Yet quadrant IV is exactly where the game of Go stood: a game whose sophisticated strategies were continuously developed by a huge portion of humanity over thousands of years, long considered a source of wisdom, and whose 19-by-19 matrix had long been shown to be astronomically beyond any exhaustive method short of quantum computation. Go was invented roughly centuries before Stonehenge and documented before the founding of Rome. [9] The Monte Carlo method is the mathematics behind AlphaGo's significant achievement: layers of neurons sample and drop out randomly, building on top of one another.
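The Monte Carlo principle named above can be shown with a toy sketch: approximate a continuous quantity by random discrete sampling rather than exhaustive enumeration. The Python below is only an illustration of that sampling idea, estimating π from random points; it is not AlphaGo's actual Monte Carlo tree search, which couples this principle with deep networks:

```python
import random

# A minimal sketch of the Monte Carlo idea: approximate a continuous
# quantity (here, the area of a quarter circle) by random discrete
# sampling. More samples bring the striated estimate closer to the
# smooth ground truth -- no exhaustive enumeration required.
def estimate_pi(n_samples: int) -> float:
    hits = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()  # random point in the unit square
        if x * x + y * y <= 1.0:                 # inside the quarter circle?
            hits += 1
    return 4.0 * hits / n_samples

for n in (100, 10_000, 1_000_000):
    print(n, estimate_pi(n))  # estimates converge toward 3.14159...
```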
This sampling approach is philosophically similar to the Finite Element method, which exfoliates finite elements from a de facto infinitely continuous matter into striated representatives, selected rather randomly among generic samples, in order to simulate a continuous form to a certain extent. If we see the Greg Lynn curve as a two-dimensional illustration of the Finite Element method run procedurally in reverse, the closest three-dimensional illustrations of the Monte Carlo method so far might be the automated agent systems of Gilles Retsin Architects and Studio Kinch. It is even more puzzling how quadrant II, which requires intensive communication and primary social compassion, might be impacted. Autodesk and teams like XKool are still on their way, with experiments targeting executive architects. What we can do is speculate from the progress of Go after AlphaGo. Go competitions are still being held by Asian officials, but they have become intellectual entertainment events rather than competitions towards stronger gaming strategies. Technology serves as a practical and analytical tool at these events, evaluating the quality of each move and quantifying win-loss predictions. Automobiles have been much faster than human runners for well over a century, yet we all still watch the Olympics, today even from drones. In this light, we should see the future of architecture as a cultural practice [10] that entertains people or, more importantly, moves people. We might never know what the world would look like if Autodesk's and XKool's ambitions were fulfilled. The "Spanish and Italian style" houses might be replaced by varieties
of personally generated Venturi mother's houses or Gehry houses. The Russian-Chinese Bauhaus towers might be turned into sets of anarchitecture pieces, or into another set of CCTV towers. Their forms and designs might all look different from those of today, in 2018, yet they might just as well be the same in immanence. The apartment buildings now emerging in downtown Los Angeles all look different from one another by the criteria of 1950, but they are all standard work by the criteria of today. The crucial relative problem is what an exfoliated work of architecture would perform within the grid of various deep learning network models, and how we, as human beings, should approach it. We should keep ourselves excited about it.
Reference:
[1] Image: Architecture and Urbanism, July 2005 Special Issue, p. 200
[2] Wikipedia, "China Central Television": https://en.wikipedia.org/wiki/China_Central_Television
[3] Wikipedia, "China": https://en.wikipedia.org/wiki/China
[4] GSD lecture by Eric Höweler, 2015: https://youtu.be/M_4hRWvQkUo
[5] XKool release conference, translation by author, June 2017: https://mp.weixin.qq.com/s?__=MjM5MDEyNDE4MQ==&mid=2651355035&idx=1&sn=91ea0a10ade259d93dbe1b15fb5aa856&chksm=bdb546618ac2cf77b98ff9b5e3ebd3dcbcc90b03991b261b31beccdc6edb75f0f3c51f68aa5d&scene=27#wechat_redirect
[6] Collapse of Lotus Riverside Block 7; image: http://s.nextmedia.com/apple/a.php?i=20090628&sec_id=15335&s=0&a=12927819
[7] Kai-Fu Lee, AI Superpowers, 2018: https://www.ted.com/talks/kai_fu_lee_how_ai_can_save_our_humanity?language=en
[8] Diagram by author, 2018
[9] Wikipedia, "History of Go": https://en.wikipedia.org/wiki/History_of_Go
[10] Interview of Yaohua Wang by Gooood.cn, 2018: https://www.gooood.cn/under-35-wang-yaohua.htm
[11] Photograph of the SCI-Arc 2GA Fall 2018 studio final critique, by author, 2018
Content Policy: Something Weird, Automatically and at Scale
Damjan Jovanovic

In a recent piece of writing published on Medium [1], and in the related TED talk [2], the writer and artist James Bridle maps out a particularly weird strand of contemporary online culture. Bridle looks into the world of YouTube content made for children, such as the various "finger family", toy unboxing, "surprise egg" unwrapping and nursery rhyme videos, and the abyss stares back. As it turns out, YouTube content production for young children is a highly specialized and potentially lucrative endeavor, made through a strange combination of human production and machine learning.
These videos have the capacity to mesmerize children for hours through endless repetition and the interchangeability of internal parts; the main tactic is to generate as many different outcomes as possible from a single premise. Once a successful trope is established, an enormous amount of similar content gets made, designed to replicate and expand the content producer's success. This produces thousands of variations, and as the automation works, the content gets progressively stranger, to the point of becoming obscure and, eventually, nonsensical. Even the titles of these videos are nothing like unified semantic wholes meant to be read by a human; rather, they operate like strings of machine-readable tags, in an attempt to capture as much bot traffic as possible. Title examples include "Surprise Play Doh Eggs Peppa Pig Stamper Cars Pocoyo Minecraft Smurfs Kinder Play Doh Sparkle Brilho", "Cars Screamin' Banshee Eats Lightning McQueen Disney Pixar" and "Disney Baby Pop Up Pals Easter Eggs SURPRISE". It is unclear which portion of this content is produced by automation (bots) and which by actual humans, but it is clear that some form of automation is always at play. The videos vary a great deal in quality and production value: some are clearly amateurish, while others employ professional actors, but all of them share a particular strategy for generating clicks and attracting viewership. YouTube content producers rely on advertising for revenue, and advertising relies on visibility; this pushes producers to put out more videos with more variety - it is a numbers game. This enormous quantity of content requires fresh production techniques, a new kind of labor; this produces a new, strange kind of practice. Bridle writes: "This is content production in the age of algorithmic discovery — even if you're a human, you have to end up impersonating the machine." [3] A strange twist, fit for the age of machine learning: in order to produce more content for children, adults become childlike, compelled to act out and imitate a completely alien set of rules that they do not understand. Habituation's new groove.
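The tag-string tactic described above can be caricatured in a few lines of Python. This sketch is purely hypothetical: the keyword lists are invented for illustration, and the real production pipelines are opaque and far more elaborate:

```python
import itertools
import random

# A toy sketch of combinatorial title generation: strings of
# machine-readable tags rather than sentences meant for human readers.
# All keyword lists here are invented for illustration.
characters = ["Peppa Pig", "Lightning McQueen", "Pocoyo", "Smurfs"]
toys       = ["Surprise Eggs", "Play Doh", "Stamper Cars", "Kinder"]
hooks      = ["SURPRISE", "Finger Family", "Nursery Rhymes"]

def generate_titles(n: int) -> list[str]:
    # Every combination of tags is a potential "video"; shuffle and
    # take n of them, concatenated for search bots, not for meaning.
    pool = list(itertools.product(characters, toys, hooks))
    random.shuffle(pool)
    return [" ".join(combo) for combo in pool[:n]]

for title in generate_titles(5):
    print(title)
```

Even this crude generator yields dozens of "outcomes from a single premise"; scale the keyword lists up and the numbers game Bridle describes becomes obvious.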
All of this provides a glimpse into a possible future of education. A strange world emerges in which centuries-old maturation and habituation procedures are intercepted and contaminated by content platforms through machine-learning strategies, without any supervision and outside of any public debate. For us, a crucial question is whether these new practices are reprogramming young minds to read and interpret the world differently than traditional, non-AI-assisted education would. What new
forms of intelligence, new methods of interpretation and new models of engagement will emerge? It is possible that these occurrences present a first glimpse of a profound epistemic shift for design practices, as they point to the radical gap between production and interpretation within the coming regime of cognitive technologies. These machine-learning-enabled practices treat all images and objects as free-floating, polysemic elements in order to produce work which is machine-readable. Ultimately, they work toward decoupling and obfuscating the classic relationships that define all design practices: between process and outcome, and between producer and content. The decoupling happens because machine-learning procedures give no access to the underlying operations; in fact, they go one step beyond the traditional idea of the "black box" as described by Bruno Latour [4]. In this case, the process is opaque even to the engineers, and no direct connection can be made between the underlying mathematics and the surface "meaning" layer. This phenomenon points to a possible new chapter of the digital in architectural design. The early digital regime culminated with parametricism, probably the last genre to maintain an explicit, direct causal relation between the machinic process and the cultural outcome. Parametricism stakes its authority on the capacity to see, understand and deploy the 'big picture', a total process in which mapping, translation and optimization strategies cohere, point to and ultimately (re)produce the 'authentic real'. The parametric model is held together by a belief in the collapse (or sameness) of what the philosopher Wilfrid Sellars called the scientific image and the manifest image [5]. In the age of machine learning and automated content generation, which brings radical opacity, we can see this parametric approach clearly as a form of 'naive realism', one that adheres to a deep belief in causality and in the possibility of clear interpretation between the algorithmic substrate and its effects.

Those days might be over - an abyss has opened, and it is staring back at the design community. Crucially, this abyss presents us not with a problem of knowledge, but of design. If this gap is the one between production and interpretation, then it cannot simply be closed through expert knowledge, as even the experts do not, and cannot, fully know. Our formal systems have finally taken off in flight, revealing their deep alien nature, their non-binding relation with the real and their capacity to usher in a new real. If a central theme of any design method lies in constructing the metaphor between model and narrative, machine learning techniques present us with one authored by non-human agents. We should be very interested in the new possibilities of constructing metaphors that structure and describe relations between formal systems and their effects, in order to understand what kind of attitude is appropriate after the classical one has been played out. The fear is that the attitude of "not knowing" will become the standard trope of the coming machine-learning culture, bringing us back into the domain of pure exegesis and interpretation, thus ushering in a new Dark Ages in which the divine is replaced by AI.

For now, we can at least map out the formal properties and conceptual implications of these phenomena. It is the beginning of an attempt to confront the situation through design. Some properties to note: an enormously large search space of non-abstract (but rather characteristic, recognizable, qualitative) elements; inherent replaceability and interchangeability of features; naming practices that depend on the strange art of keyword/hashtag association; a tendency toward recognizable tropes. Interestingly, these practices mirror some well-known methods within contemporary design - collecting, sampling and mashing - but with a crucial difference: they are, strictly speaking, non-compositional, non-visual, and dependent on tags and keywords rather than formal ideas. What could all this stuff mean for architectural design? It is a question of education, first and foremost. There is a need for new strategies for working, new practices of engagement, new interpretative procedures and deployment tactics. In a world where "being right"
is not enough anymore and where “meaning well” amounts to almost nothing, traditional education cannot help us much if we want to lead. And one possible strategy that could start new modes of engagement is as old as humanity itself: play.
In the words of René Thom, the French mathematician: "Challenging the moralist's fatalism, the player, confronting any situation, thinks that there is always something to be done [...] In the final analysis, what justifies the player's stance is the fact that the only conceivable way to expose a black box is to play with it. Every great technical and scientific success consists of a black box rendered explicit." [6]
Reference:
[1] James Bridle, "Something is Wrong on the Internet", Medium, November 6, 2017: medium.com/@jamesbridle/something-is-wrong-on-the-internet-c39c471271d2
[2] James Bridle, "The nightmare videos of children's YouTube — and what's wrong with the internet today", TED Talk, July 2018: youtube.com/watch?reload=9&v=v9EKV2nSU8w&feature=youtu.be
[3] James Bridle, "Something is Wrong on the Internet", Medium, November 6, 2017
[4] Bruno Latour, Pandora's Hope: Essays on the Reality of Science Studies (Cambridge, MA: Harvard University Press), 304.
[5] Wilfrid Sellars, "Philosophy and the Scientific Image of Man", pp. 35-78. For a longer discussion of this in the realm of simulations, see Luciana Parisi, "Simulations", in Ian Cheng: Live Simulation Exhibition.
[6] René Thom, "At the Boundaries of Man's Power: Play", pp. 12-13
Credits:
Metahaven. (2019). PSYOP. Amsterdam, Netherlands: Stedelijk Museum Amsterdam.
Cohen, J., Olano, M., & Manocha, D. (1998, July). Appearance-preserving simplification. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques (pp. 115-122). ACM.
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015). A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576.
Wu, J., Zhang, C., Xue, T., Freeman, B., & Tenenbaum, J. (2016). Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In Advances in neural information processing systems (pp. 82-90).
Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2223-2232).
Ha, D., & Eck, D. (2017). A neural representation of sketch drawings. arXiv preprint arXiv:1704.03477.
Howe, M. K. (2017, April 5). The Promise of Generative Design. World-Architects. Retrieved from https://www.world-architects.com/
Fridman, L., Terwilliger, J., & Jenik, B. (2018). DeepTraffic: Crowdsourced Hyperparameter Tuning of Deep Reinforcement Learning Systems for Multi-Agent Dense Traffic Navigation. arXiv preprint arXiv:1801.02805.
Li, Z., & Snavely, N. (2018). MegaDepth: Learning single-view depth prediction from internet photos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2041-2050).
Brock, A., Donahue, J., & Simonyan, K. (2018). Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096.
Ruy, D. (2018). Observing AI. CLOG x Artificial Intelligence. New York, NY: CLOG.
Jovanovic, D. (2018). Content Policy: Something Weird, Automatically and at Scale. Offramp 15: Stuff. Los Angeles, CA: SCI-Arc.
Du, Y., & Mordatch, I. (2019). Implicit Generation and Generalization in Energy-Based Models. arXiv preprint arXiv:1903.08689.
Special Thanks to: Damjan Jovanovic, Erik Ghenoiu, David Ruy, Maxime Lefebvre, Marrikka Trotter, Sanford Kwinter, M. Casey Rehm, Benjamin H. Bratton, Devyn Weiser, Andrew Chittenden, Benjamin Weisgall, Talbot Schmidt, Daniel Arismendys, Runze Zhang, Tamara Glants, Vincent Yung, Sijin He
Author: Zhongding Addin Cui Date: April 6th 2019 Location: Southern California Institute of Architecture, Los Angeles, CA