Wilhelm Conrad Röntgen (/ˈrɛntɡən, -dʒən, ˈrʌnt-/; German pronunciation: [ˈvɪlhɛlm ˈʁœntɡən]; 27 March 1845 – 10 February 1923) was a German mechanical engineer and physicist, who, on 8 November 1895, produced and detected electromagnetic radiation in a wavelength range known as X-rays or Röntgen rays, an achievement that earned him the inaugural Nobel Prize in Physics in 1901. In honour of Röntgen's accomplishments, in 2004 the International Union of Pure and Applied Chemistry (IUPAC) named element 111, roentgenium, a radioactive element with multiple unstable isotopes, after him. The unit of measurement roentgen was also named after him.
He was born to Friedrich Conrad Röntgen, a German merchant and cloth manufacturer, and Charlotte Constanze Frowein. At age three his family moved to the Netherlands, where his mother's family lived. Röntgen attended high school at the Utrecht Technical School in Utrecht, Netherlands, where he followed courses for almost two years. In 1865, he was unfairly expelled from high school when one of his teachers intercepted a caricature of one of the teachers, which had in fact been drawn by someone else.
Without a high school diploma, Röntgen could only attend university in the Netherlands as a visitor. In 1865, he tried to attend Utrecht University without having the necessary credentials required for a regular student. Upon hearing that he could enter the Federal Polytechnic Institute in Zurich (today known as the ETH Zurich), he passed the entrance examination and began studies there as a student of mechanical engineering. In 1869, he graduated with a PhD from the University of Zurich; once there, he became a favorite student of Professor August Kundt, whom he followed to the newly founded German Kaiser-Wilhelms-Universität in Strasbourg.
In 1874, Röntgen became a lecturer at the University of Strasbourg. In 1875, he became a professor at the Academy of Agriculture at Hohenheim, Württemberg. He returned to Strasbourg as a professor of physics in 1876, and in 1879, he was appointed to the chair of physics at the University of Giessen. In 1888, he obtained the physics chair at the University of Würzburg, and in 1900 at the University of Munich, by special request of the Bavarian government.
Röntgen had family in Iowa in the United States and planned to emigrate. He accepted an appointment at Columbia University in New York City and bought transatlantic tickets, before the outbreak of World War I changed his plans. He remained in Munich for the rest of his career.
During 1895, at his laboratory in the Würzburg Physical Institute of the University of Würzburg, Röntgen was investigating the external effects from the various types of vacuum tube equipment—apparatuses from Heinrich Hertz, Johann Hittorf, William Crookes, Nikola Tesla and Philipp von Lenard—when an electrical discharge is passed through them. In early November, he was repeating an experiment with one of Lenard's tubes in which a thin aluminium window had been added to permit the cathode rays to exit the tube but a cardboard covering was added to protect the aluminium from damage by the strong electrostatic field that produces the cathode rays. Röntgen knew that the cardboard covering prevented light from escaping, yet he observed that the invisible cathode rays caused a fluorescent effect on a small cardboard screen painted with barium platinocyanide when it was placed close to the aluminium window. It occurred to Röntgen that the Crookes–Hittorf tube, which had a much thicker glass wall than the Lenard tube, might also cause this fluorescent effect.
In the late afternoon of 8 November 1895, Röntgen was determined to test his idea. He carefully constructed a black cardboard covering similar to the one he had used on the Lenard tube. He covered the Crookes–Hittorf tube with the cardboard and attached electrodes to a Ruhmkorff coil to generate an electrostatic charge. Before setting up the barium platinocyanide screen to test his idea, Röntgen darkened the room to test the opacity of his cardboard cover. As he passed the Ruhmkorff coil charge through the tube, he determined that the cover was light-tight and turned to prepare the next step of the experiment. It was at this point that Röntgen noticed a faint shimmering from a bench a few feet away from the tube. To be sure, he tried several more discharges and saw the same shimmering each time. Striking a match, he discovered the shimmering had come from the location of the barium platinocyanide screen he had been intending to use next.
Röntgen speculated that a new kind of ray might be responsible. 8 November was a Friday, so he took advantage of the weekend to repeat his experiments and made his first notes. In the following weeks, he ate and slept in his laboratory as he investigated many properties of the new rays he temporarily termed "X-rays", using the mathematical designation ("X") for something unknown. The new rays came to bear his name in many languages as "Röntgen rays" (and the associated X-ray radiograms as "Röntgenograms").
At one point while he was investigating the ability of various materials to stop the rays, Röntgen brought a small piece of lead into position while a discharge was occurring. Röntgen thus saw the first radiographic image: his own flickering ghostly skeleton on the barium platinocyanide screen. He later reported that it was at this point that he decided to continue his experiments in secrecy, fearing for his professional reputation if his observations were in error.
About six weeks after his discovery, he took a picture—a radiograph—using X-rays of his wife Anna Bertha's hand. When she saw her skeleton she exclaimed "I have seen my death!" He later took a better picture of his friend Albert von Kölliker's hand at a public lecture.
Röntgen's original paper, "On A New Kind of Rays" (Ueber eine neue Art von Strahlen), was published on 28 December 1895. On 5 January 1896, an Austrian newspaper reported Röntgen's discovery of a new type of radiation. Röntgen was awarded an honorary Doctor of Medicine degree from the University of Würzburg after his discovery. He also received the Rumford Medal of the British Royal Society in 1896, jointly with Philipp Lenard, who had already shown that a portion of the cathode rays could pass through a thin film of a metal such as aluminium. Röntgen published a total of three papers on X-rays between 1895 and 1897. Today, Röntgen is considered the father of diagnostic radiology, the medical speciality which uses imaging to diagnose disease.
A collection of his papers is held at the National Library of Medicine in Bethesda, Maryland.
Röntgen was married to Anna Bertha Ludwig for 47 years until her death in 1919 at age 80. They met in 1866 in Zürich at Anna's father's café, Zum Grünen Glas. They got engaged in 1869 and wed in Apeldoorn, Netherlands, on 7 July 1872; the delay was due to Anna being six years Wilhelm's senior and his father not approving of her age or humble background. Their marriage began with financial difficulties, as support from Röntgen's family had ceased. They raised one child, Josephine Bertha Ludwig, whom they adopted at age 6 after her father, Anna's only brother, died in 1887.
He inherited two million Reichsmarks after his father's death. For ethical reasons, Röntgen did not seek patents for his discoveries, holding the view that they should be publicly available without charge. After receiving his Nobel Prize money, Röntgen donated the 50,000 Swedish kronor to research at the University of Würzburg. Although he accepted the honorary degree of Doctor of Medicine, he rejected an offer of lower nobility, or Niederer Adelstitel, declining the preposition von (meaning "of") as a nobiliary particle (i.e., von Röntgen). With the inflation following World War I, Röntgen fell into bankruptcy and spent his final years at his country home at Weilheim, near Munich. Röntgen died on 10 February 1923 from carcinoma of the intestine, also known as colorectal cancer. In keeping with his will, all his personal and scientific correspondence was destroyed upon his death.
In 1901, Röntgen was awarded the first Nobel Prize in Physics. The award was officially "in recognition of the extraordinary services he has rendered by the discovery of the remarkable rays subsequently named after him". Röntgen donated the 50,000 Swedish kronor reward from his Nobel Prize to research at his university, the University of Würzburg. Like Marie and Pierre Curie, Röntgen refused to take out patents related to his discovery of X-rays, as he wanted society as a whole to benefit from practical applications of the phenomenon. Röntgen was also awarded the Barnard Medal for Meritorious Service to Science in 1900.
His honors include:
Rumford Medal (1896)
Matteucci Medal (1896)
Elliott Cresson Medal (1897)
Nobel Prize for Physics (1901)
In November 2004 IUPAC named element number 111 roentgenium (Rg) in his honour. IUPAP adopted the name in November 2011.
In 1907 he became a foreign member of the Royal Netherlands Academy of Arts and Sciences.
Today the Deutsches Röntgen-Museum stands in Remscheid-Lennep, the town where Röntgen was born, 40 kilometres east of Düsseldorf.
In Würzburg, where he discovered X-rays, a non-profit organization maintains his laboratory and provides guided tours to the Röntgen Memorial Site.
World Radiography Day: World Radiography Day is an annual event promoting the role of medical imaging in modern healthcare. It is celebrated on 8 November each year, coinciding with the anniversary of Röntgen's discovery. It was first introduced in 2012 as a joint initiative of the European Society of Radiology, the Radiological Society of North America, and the American College of Radiology.
Röntgen Peak in Antarctica is named after Wilhelm Röntgen.
Minor planet 6401 Roentgen is named after him.
An X-ray, or, much less commonly, X-radiation, is a penetrating form of high-energy electromagnetic radiation. Most X-rays have a wavelength ranging from 10 picometers to 10 nanometers, corresponding to frequencies in the range 30 petahertz to 30 exahertz (30×10^15 Hz to 30×10^18 Hz) and energies in the range 145 eV to 124 keV. X-ray wavelengths are shorter than those of UV rays and typically longer than those of gamma rays. In many languages, X-radiation is referred to as Röntgen radiation, after the German scientist Wilhelm Conrad Röntgen, who discovered it on November 8, 1895. He named it X-radiation to signify an unknown type of radiation. Spellings of X-ray(s) in English include the variants x-ray(s), xray(s), and X ray(s).
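As a rough cross-check on these ranges, photon energy follows directly from wavelength via E = hc/λ (equivalently E = hf). The short Python sketch below is purely illustrative and not part of the original article; the exact boundary values quoted by different sources differ slightly.

# Illustrative only: convert an X-ray wavelength to frequency and photon energy.
h = 6.62607015e-34    # Planck constant, J*s
c = 2.99792458e8      # speed of light, m/s
eV = 1.602176634e-19  # joules per electronvolt

for wavelength_m in (10e-9, 10e-12):   # the 10 nm and 10 pm endpoints
    f = c / wavelength_m               # frequency in Hz
    E = h * f / eV                     # photon energy in eV
    print(f"{wavelength_m:.0e} m -> {f:.1e} Hz, {E:.3g} eV")
# Output: roughly 3e16 Hz (~30 PHz) and ~124 eV at 10 nm,
#         roughly 3e19 Hz (~30 EHz) and ~124 keV at 10 pm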
Before their discovery in 1895, X-rays were just a type of unidentified radiation emanating from experimental discharge tubes. They were noticed by scientists investigating cathode rays produced by such tubes, which are energetic electron beams that were first observed in 1869. Many of the early Crookes tubes (invented around 1875) undoubtedly radiated X-rays, because early researchers noticed effects that were attributable to them, as detailed below. Crookes tubes created free electrons by ionization of the residual air in the tube by a high DC voltage of anywhere between a few kilovolts and 100 kV. This voltage accelerated the electrons coming from the cathode to a high enough velocity that they created X-rays when they struck the anode or the glass wall of the tube.
The earliest experimenter thought to have (unknowingly) produced X-rays was actually William Morgan. In 1785, he presented a paper to the Royal Society of London describing the effects of passing electrical currents through a partially evacuated glass tube, producing a glow created by X-rays. This work was further explored by Humphry Davy and his assistant Michael Faraday.
When Stanford University physics professor Fernando Sanford created his "electric photography", he also unknowingly generated and detected X-rays. From 1886 to 1888, he had studied in the Hermann Helmholtz laboratory in Berlin, where he became familiar with the cathode rays generated in vacuum tubes when a voltage was applied across separate electrodes, as previously studied by Heinrich Hertz and Philipp Lenard. His letter of January 6, 1893 (describing his discovery as "electric photography") to The Physical Review was duly published and an article entitled Without Lens or Light, Photographs Taken With Plate and Object in Darkness appeared in the San Francisco Examiner.
Starting in 1888, Philipp Lenard conducted experiments to see whether cathode rays could pass out of the Crookes tube into the air. He built a Crookes tube with a "window" at the end made of thin aluminium, facing the cathode so the cathode rays would strike it (later called a "Lenard tube"). He found that something came through that would expose photographic plates and cause fluorescence. He measured the penetrating power of these rays through various materials. It has been suggested that at least some of these "Lenard rays" were actually X-rays.
In 1889, Ukrainian-born Ivan Puluj, a lecturer in experimental physics at the Prague Polytechnic who since 1877 had been constructing various designs of gas-filled tubes to investigate their properties, published a paper on how sealed photographic plates became dark when exposed to the emanations from the tubes.
Hermann von Helmholtz formulated mathematical equations for X-rays. He postulated a dispersion theory before Röntgen made his discovery and announcement, basing it on the electromagnetic theory of light. However, he did not work with actual X-rays.
In 1894, Nikola Tesla noticed damaged film in his lab that seemed to be associated with Crookes tube experiments and began investigating this invisible, radiant energy. After Röntgen identified the X-ray, Tesla began making X-ray images of his own using high voltages and tubes of his own design, as well as Crookes tubes.
On November 8, 1895, German physics professor Wilhelm Röntgen stumbled on X-rays while experimenting with Lenard tubes and Crookes tubes and began studying them. He wrote an initial report "On a new kind of ray: A preliminary communication" and on December 28, 1895, submitted it to Würzburg's Physical-Medical Society journal. This was the first paper written on X-rays. Röntgen referred to the radiation as "X", to indicate that it was an unknown type of radiation. The name stuck, although (over Röntgen's great objections) many of his colleagues suggested calling them Röntgen rays. They are still referred to as such in many languages, including German, Hungarian, Ukrainian, Danish, Polish, Bulgarian, Swedish, Finnish, Estonian, Turkish, Russian, Latvian, Lithuanian, Japanese, Dutch, Georgian, Hebrew, and Norwegian. Röntgen received the first Nobel Prize in Physics for his discovery.
There are conflicting accounts of his discovery because Röntgen had his lab notes burned after his death, but this is a likely reconstruction by his biographers: Röntgen was investigating cathode rays from a Crookes tube which he had wrapped in black cardboard so that the visible light from the tube would not interfere, using a fluorescent screen painted with barium platinocyanide. He noticed a faint green glow from the screen, about 1 meter (3.3 ft) away. Röntgen realized some invisible rays coming from the tube were passing through the cardboard to make the screen glow. He found they could also pass through books and papers on his desk. Röntgen threw himself into investigating these unknown rays systematically. Two months after his initial discovery, he published his paper.
Röntgen discovered their medical use when he made an image of his wife's hand on a photographic plate exposed by X-rays. The photograph of his wife's hand was the first photograph of a human body part using X-rays. When she saw the picture, she said "I have seen my death."
The discovery of X-rays caused a veritable sensation. Röntgen's biographer Otto Glasser estimated that, in 1896 alone, as many as 49 essays and 1044 articles about the new rays were published. This was probably a conservative estimate, considering that nearly every newspaper around the world reported extensively on the new discovery, with a magazine such as Science dedicating as many as 23 articles to it in that year alone. Sensationalist reactions to the new discovery included publications linking the new kind of rays to occult and paranormal theories, such as telepathy.
Röntgen immediately noticed X-rays could have medical applications. Along with his 28 December Physical-Medical Society submission, he sent a letter to physicians he knew around Europe (January 1, 1896). News (and the creation of "shadowgrams") spread rapidly with Scottish electrical engineer Alan Archibald Campbell-Swinton being the first after Röntgen to create an X-ray (of a hand). Through February, there were 46 experimenters taking up the technique in North America alone.
The first use of X-rays under clinical conditions was by John Hall-Edwards in Birmingham, England on 11 January 1896, when he radiographed a needle stuck in the hand of an associate. On February 14, 1896, Hall-Edwards was also the first to use X-rays in a surgical operation.
Images by James Green, from "Sciagraphs of British Batrachians and Reptiles" (1897), featuring (from left) Rana esculenta (now Pelophylax lessonae), Lacerta vivipara (now Zootoca vivipara), and Lacerta agilis
In early 1896, several weeks after Röntgen's discovery, Ivan Romanovich Tarkhanov irradiated frogs and insects with X-rays, concluding that the rays "not only photograph, but also affect the living function".[28] At around the same time, the zoological illustrator James Green began to use X-rays to examine fragile specimens. George Albert Boulenger first mentioned this work in a paper he delivered before the Zoological Society of London in May 1896. The book Sciagraphs of British Batrachians and Reptiles (sciagraph is an obsolete name for an X-ray photograph), by Green and James H. Gardiner, with a foreword by Boulenger, was published in 1897.
The first medical X-ray made in the United States was obtained using a discharge tube of Puluj's design. In January 1896, on reading of Röntgen's discovery, Frank Austin of Dartmouth College tested all of the discharge tubes in the physics laboratory and found that only the Puluj tube produced X-rays. This was a result of Puluj's inclusion of an oblique "target" of mica, used for holding samples of fluorescent material, within the tube. On 3 February 1896, Gilman Frost, professor of medicine at the college, and his brother Edwin Frost, professor of physics, exposed the wrist of Eddie McCarthy, whom Gilman had treated some weeks earlier for a fracture, to the X-rays and collected the resulting image of the broken bone on gelatin photographic plates obtained from Howard Langill, a local photographer also interested in Röntgen's work.
Many experimenters, including Röntgen himself in his original experiments, came up with methods to view X-ray images "live" using some form of luminescent screen. Röntgen used a screen coated with barium platinocyanide. On February 5, 1896, live imaging devices were developed by both Italian scientist Enrico Salvioni (his "cryptoscope") and Professor McGie of Princeton University (his "Skiascope"), both using barium platinocyanide. American inventor Thomas Edison started research soon after Röntgen's discovery and investigated materials' ability to fluoresce when exposed to X-rays, finding that calcium tungstate was the most effective substance. In May 1896, he developed the first mass-produced live imaging device, his "Vitascope", later called the fluoroscope, which became the standard for medical X-ray examinations. Edison dropped X-ray research around 1903, before the death of Clarence Madison Dally, one of his glassblowers. Dally had a habit of testing X-ray tubes on his own hands, developing a cancer in them so tenacious that both arms were amputated in a futile attempt to save his life; in 1904, he became the first known death attributed to X-ray exposure. During the time the fluoroscope was being developed, Serbian American physicist Mihajlo Pupin, using a calcium tungstate screen developed by Edison, found that using a fluorescent screen decreased the exposure time it took to create an X-ray for medical imaging from an hour to a few minutes.
In 1901, U.S. President William McKinley was shot twice in an assassination attempt. While one bullet only grazed his sternum, another had lodged somewhere deep inside his abdomen and could not be found. A worried McKinley aide sent word to inventor Thomas Edison to rush an X-ray machine to Buffalo to find the stray bullet. It arrived but was not used. While the shooting itself had not been lethal, gangrene had developed along the path of the bullet, and McKinley died of septic shock due to bacterial infection six days later.
With the widespread experimentation with X-rays by scientists, physicians, and inventors after their discovery in 1895 came many stories of burns, hair loss, and worse in the technical journals of the time. In February 1896, Professor John Daniel and Dr. William Lofland Dudley of Vanderbilt University reported hair loss after Dr. Dudley was X-rayed. A child who had been shot in the head was brought to the Vanderbilt laboratory in 1896. Before trying to find the bullet, an experiment was attempted, for which Dudley "with his characteristic devotion to science" volunteered. Daniel reported that 21 days after taking a picture of Dudley's skull (with an exposure time of one hour), he noticed a bald spot 5 centimeters (2 in) in diameter on the part of his head nearest the X-ray tube: "A plate holder with the plates towards the side of the skull was fastened and a coin placed between the skull and the head. The tube was fastened at the other side at a distance of one-half inch [1.3 cm] from the hair."
In August 1896, Dr. H. D. Hawks, a graduate of Columbia College, suffered severe hand and chest burns from an X-ray demonstration. This was reported in Electrical Review and led to many other reports of problems associated with X-rays being sent in to the publication. Many experimenters, including Elihu Thomson at Edison's lab, William J. Morton, and Nikola Tesla, also reported burns. Elihu Thomson deliberately exposed a finger to an X-ray tube over a period of time and suffered pain, swelling, and blistering. Other effects, including ultraviolet rays and (according to Tesla) ozone, were sometimes blamed for the damage. Many physicians claimed there were no effects from X-ray exposure at all. On August 3, 1905, in San Francisco, California, Elizabeth Fleischman, an American X-ray pioneer, died from complications resulting from her work with X-rays.
Hall-Edwards developed a cancer (then called X-ray dermatitis) sufficiently advanced by 1904 to cause him to write papers and give public addresses on the dangers of X-rays. He lost his personal battle and his left arm had to be amputated at the elbow in 1908, and four fingers on his right arm soon thereafter, leaving only a thumb. He died of cancer in 1926. His left hand is kept at Birmingham University.
The many applications of X-rays immediately generated enormous interest. Workshops began making specialized versions of Crookes tubes for generating X-rays and these first-generation cold cathode or Crookes X-ray tubes were used until about 1920.
A typical early 20th century medical X-ray system consisted of a Ruhmkorff coil connected to a cold cathode Crookes X-ray tube. A spark gap was typically connected to the high voltage side in parallel to the tube and used for diagnostic purposes. The spark gap allowed detecting the polarity of the sparks, measuring voltage by the length of the sparks, thus determining the "hardness" of the vacuum of the tube, and it provided a load in the event the X-ray tube was disconnected. To detect the hardness of the tube, the spark gap was initially opened to the widest setting. While the coil was operating, the operator reduced the gap until sparks began to appear. A tube in which the spark gap began to spark at around 6.4 centimeters (2.5 in) was considered soft (low vacuum) and suitable for thin body parts such as hands and arms. A 13-centimeter (5 in) spark indicated the tube was suitable for shoulders and knees. An 18-to-23-centimeter (7 to 9 in) spark would indicate a higher vacuum suitable for imaging the abdomen of larger individuals. Since the spark gap was connected in parallel to the tube, the spark gap had to be opened until the sparking ceased in order to operate the tube for imaging. Exposure time for photographic plates was around half a minute for a hand to a couple of minutes for a thorax. The plates may have had a small addition of fluorescent salt to reduce exposure times.
Crookes tubes were unreliable. They had to contain a small quantity of gas (invariably air), as current will not flow through such a tube if it is fully evacuated. However, as time passed, the X-rays caused the glass to absorb the gas, causing the tube to generate "harder" X-rays until it soon stopped operating. Larger and more frequently used tubes were provided with devices for restoring the air, known as "softeners". These often took the form of a small side tube that contained a small piece of mica, a mineral that traps relatively large quantities of air within its structure. A small electrical heater heated the mica, causing it to release a small amount of air, thus restoring the tube's efficiency. However, the mica had a limited life, and the restoration process was difficult to control.
In 1904, John Ambrose Fleming invented the thermionic diode, the first kind of vacuum tube. This used a hot cathode that caused an electric current to flow in a vacuum. This idea was quickly applied to X-ray tubes, and hence heated-cathode X-ray tubes, called "Coolidge tubes", completely replaced the troublesome cold cathode tubes by about 1920.
In about 1906, the physicist Charles Barkla discovered that X-rays could be scattered by gases, and that each element had a characteristic X-ray spectrum. He won the 1917 Nobel Prize in Physics for this discovery.
In 1912, Max von Laue, Paul Knipping, and Walter Friedrich first observed the diffraction of X-rays by crystals. This discovery, along with the early work of Paul Peter Ewald, William Henry Bragg, and William Lawrence Bragg, gave birth to the field of X-ray crystallography.
In 1913, Henry Moseley performed crystallography experiments with X-rays emanating from various metals and formulated Moseley's law which relates the frequency of the X-rays to the atomic number of the metal.
The Coolidge X-ray tube was invented the same year by William D. Coolidge. It made possible the continuous emission of X-rays. Modern X-ray tubes are based on this design, often employing rotating targets, which allow significantly higher heat dissipation than static targets and thus a higher X-ray output for use in high-powered applications such as rotational CT scanners.
The use of X-rays for medical purposes (which developed into the field of radiation therapy) was pioneered by Major John Hall-Edwards in Birmingham, England. Then in 1908, he had to have his left arm amputated because of the spread of X-ray dermatitis on his arm.
Medical science also used the motion picture to study human physiology. In 1913, a motion picture was made in Detroit showing a hard-boiled egg inside a human stomach. This early X-ray movie was recorded at a rate of one still image every four seconds. Dr Lewis Gregory Cole of New York was a pioneer of the technique, which he called "serial radiography". In 1918, X-rays were used in association with motion picture cameras to capture the human skeleton in motion. In 1920, it was used to record the movements of tongue and teeth in the study of languages by the Institute of Phonetics in England.
In 1914, Marie Curie developed radiological cars to support soldiers injured in World War I. The cars would allow for rapid X-ray imaging of wounded soldiers so battlefield surgeons could quickly and more accurately operate.
From the early 1920s through to the 1950s, X-ray machines were developed to assist in the fitting of shoes and were sold to commercial shoe stores. Concerns regarding the impact of frequent or poorly controlled use were expressed in the 1950s, leading to the practice's eventual end that decade.
The X-ray microscope was developed during the 1950s.
The Chandra X-ray Observatory, launched on July 23, 1999, has been allowing the exploration of the very violent processes in the universe which produce X-rays. Unlike visible light, which gives a relatively stable view of the universe, the X-ray universe is unstable. It features stars being torn apart by black holes, galactic collisions, and novae, and neutron stars that build up layers of plasma that then explode into space.
An X-ray laser device was proposed as part of the Reagan Administration's Strategic Defense Initiative in the 1980s, but the only test of the device (a sort of laser "blaster" or death ray, powered by a thermonuclear explosion) gave inconclusive results. For technical and political reasons, the overall project (including the X-ray laser) was defunded (though was later revived by the second Bush Administration as National Missile Defense using different technologies).
Phase-contrast X-ray imaging refers to a variety of techniques that use phase information of a coherent X-ray beam to image soft tissues. It has become an important method for visualizing cellular and histological structures in a wide range of biological and medical studies. There are several technologies being used for X-ray phase-contrast imaging, all utilizing different principles to convert phase variations in the X-rays emerging from an object into intensity variations. These include propagation-based phase contrast, Talbot interferometry, refraction-enhanced imaging, and X-ray interferometry. These methods provide higher contrast compared to normal absorption-contrast X-ray imaging, making it possible to see smaller details. A disadvantage is that these methods require more sophisticated equipment, such as synchrotron or microfocus X-ray sources, X-ray optics, and high resolution X-ray detectors.
Yandex Launcher (Russian: Я́ндекс.Ло́нчер) is a free GUI for organizing the workspace on Android smartphones.
According to The Next Web, one of the main distinguishing features of Yandex Launcher is the built-in recommendation service. Machine learning technology provides the basis of the recommendation service, with which Launcher selects apps, games, videos and other forms of content that might interest the user. The key elements of Launcher are the content feed of personal recommendations by Yandex Zen, as well as a system of recommended apps; both elements are built into Launcher and analyze the user's favorite websites and other aspects of their behavior with the aim of creating a unique model of the user's preferences.
Partners of Yandex.News (whose publications are aggregated in the service) are polythematic in nature.
As of August 2016 Yandex.News had around 6700 partners.
For example, as of July 2019 Interfax had 35.9% of visitors from Yandex.News.
The search technology provides local search results in more than 1,400 cities. Yandex Search also features "parallel" search, which presents results from both the main web index and specialized information resources, including news, shopping, blogs, images and videos, on a single page.
Yandex Search is responsive to real-time queries, recognizing when a query requires the most current information, such as breaking news or the most recent post on Twitter on a particular topic. It also contains some additional features: Wizard Answer, which provides additional information (for example, sports results); a spell checker; autocomplete, which suggests queries as you type; an antivirus that detects malware on webpages; and so on.
In May 2010, Yandex launched Yandex.com, a platform for beta testing and improving non-Russian language search.
In 2009, Yandex launched MatrixNet, a new method of machine learning that significantly improves the relevance of search results. It allows the Yandex search engine to take into account a very large number of factors when deciding on the relevance of search results.
Another technology, Spectrum, was launched in 2010. It allows inferring implicit queries and returning matching search results. The system automatically analyses users' searches and identifies objects like personal names, films or cars. Proportions of the search results responding to different user intents are based on the user demand for these results.
With its first release on July 21, 2017, the Brave web browser features Yandex as one of its default search engines.
The search engine consists of three main components:
The search engine is also able to index text inside Shockwave Flash objects (if the text is not placed on an image itself), provided these elements are transferred as a separate page with the MIME type application/x-shockwave-flash, as well as files with the extension .swf.
Yandex has two scanning robots, the "main" robot and the "fast" robot. The first is responsible for the whole Internet; the second indexes sites with frequently changing and updated information (news sites and news agencies). In 2010, the "fast" robot received a new technology called "Orange", developed jointly by the California and Moscow divisions of Yandex.
Since 2009, Yandex has supported Sitemaps technology.
In the server logs, Yandex robots are represented as follows:
Mozilla/5.0 (compatible; YandexAddurl/2.0) - a search robot that indexes pages submitted through the "Add URL" form.
Along with the original "exact form" of the query, Yandex automatically searches for its various variations and formulations.
The Yandex search takes into account the morphology of the Russian language; therefore, regardless of the form of the word in the search query, the search will be performed for all word forms. If morphological analysis is undesirable, you can put an exclamation mark (!) before the word, and the search will then match only that specific form of the word. In addition, the search query practically does not take into account so-called stop words, that is, prepositions, punctuation, pronouns, etc., due to their wide distribution.
As a rule, abbreviations are automatically expanded and spelling is corrected. Synonyms are also searched for (mobile - cellular). The extension of the original user query depends on the context. Expansion does not occur for highly specialized terms, proper names of companies (for example, OJSC "Hippo" versus OJSC "Hippopotamus"), queries with the word "price" added, or queries in exact quotes (queries enclosed in typewriter quotation marks).
Search results for each user are formed individually based on their location, the language of the query, and interests and preferences inferred from previous and current search sessions. However, the key factor in ranking search results is their relevance to the search query. Relevance is determined by a ranking formula, which is constantly updated using machine learning algorithms.
The search is performed in Russian, English, French, German, Ukrainian, Belarusian, Tatar, and Kazakh.
The page with the search results consists of 10 links with short annotations, or "snippets". A snippet includes a text comment, a link, an address, popular sections of the site, pages on social networks, etc. As an alternative to snippets, Yandex introduced a new interface called "Islands" in 2014.
Yandex implements a "parallel search" mechanism, in which, together with a web search, a search is performed on Yandex services such as Catalog, News, Market, Encyclopedias, Images, etc. As a result, in response to a user's request, the system shows not only textual information, but also links to video files, pictures, dictionary entries, etc.
A distinctive feature of the search engine is also the technology of "intent search", meaning a search aimed at solving a problem. Intent-search elements include dialog prompts in case of an ambiguous request, automatic text translation, information about the characteristics of a requested car, etc. For example, for the query "Boris Grebenshchikov - Golden City", the system will show a form for listening to the music online from the Yandex Music service, and for the query "Koroleva St. 12", a fragment of the map with the marked location will be shown.
In 2013, Yandex was considered by some to be the safest search engine at the time and the third most secure among all web resources. By 2016, Yandex had slipped down to third with Google being first.
Checking web pages and warning users appeared on Yandex in 2009: since then, on the search results page, next to a dangerous site there is a note "This site may threaten the security of your computer". Two technologies are used at once to detect threats. The first was purchased from the antivirus vendor Sophos and is based on a signature approach: when accessing a web page, the antivirus system also consults a database of already known viruses and malware. This approach is fast, but practically powerless against new viruses that have not yet entered the database. Therefore, along with the signature approach, Yandex also uses its own antivirus complex, based on an analysis of behavioral factors. The Yandex program, when accessing a site, checks whether the site requested additional files from the browser, redirected it to an extraneous resource, and so on. Thus, if information is received that the site performs certain actions (launching cascading style sheets, JavaScript modules, or complete programs) without user permission, it is placed on the "black list" and in the database of virus signatures. Information about the infection of the site appears in the search results, and the owner of the site receives a notification through the Yandex.Webmaster service. After the first check, Yandex performs a second one, and if the infection is confirmed a second time, checks become more frequent until the threat is eliminated. The total number of infected sites in the Yandex database does not exceed 1%.
Every day in 2013, Yandex checked 23 million web pages (detecting 4,300 dangerous sites) and showed users 8 million warnings.[23] Approximately one billion sites were checked monthly.
For a long time, the key ranking factor for Yandex was the number of third-party links to a particular site. Each page on the Internet was assigned a citation index, similar to the index for authors of scientific articles: the more links, the better. A similar mechanism was implemented in Yandex and in Google's PageRank. In order to prevent cheating, Yandex uses multivariate analysis, in which only 70 of the 800 factors are affected by the number of third-party links. Today, the content of the site and the presence or absence of keywords there, the readability of the text, the name of the domain, its history and the presence of multimedia content play a much greater role.
On December 5, 2013, Yandex announced that it would completely stop taking the link factor into account in the future.
As the user types a query in the search bar, the search engine offers hints in the form of a drop-down list. Hints appear even before the search results appear and allow the user to refine the query, correct the keyboard layout or a typo, or go directly to the site they are looking for. Hints are generated for each user, based in part on the history of their search queries (the My Finds service). In 2012, so-called "Smart Search Hints" appeared, which instantly give out information about basic constants (equator length, speed of light, and so on) and traffic jams, and have a built-in calculator. In addition, a translator was integrated into the Hints (the query "love in French" instantly gives amour, affection), as well as the schedule and results of football matches, exchange rates, weather forecasts and more. You can find out the exact time by asking "what time is it". In 2011, Hints in Yandex search became fully localized for 83 regions of Russia.
In addition to the main search, Hints are built into Yandex.Dictionaries, Yandex.Market, Yandex.Maps, and other Yandex services.
The hint function is a consequence of the development of intent-search technology. It first appeared in Yandex.Bar in August 2007, and in October 2008 it was introduced on the main page of the search engine. Hints are available in both the desktop and mobile versions of the site; Yandex shows its users more than a billion search hints per day.
According to media expert Mikhail Gurevich, Yandex is a “national treasure”, a “strategic product”.
This fact was also recognized in the State Duma of the Russian Federation, where in May 2012 a bill appeared in which Yandex and VKontakte were to be recognized as strategic enterprises, as nationwide disseminators of information. In 2009, President of Russia Dmitry Medvedev initiated the purchase of a "golden share" of Yandex by Sberbank in order to avoid an important nationwide company falling into foreign hands.
In 2012, Yandex overtook Channel One in terms of daily audience, which made Yandex a leader in the domestic media market. In 2013, Yandex confirmed this status, overtaking Channel One in terms of revenue as well.
In 2008, Yandex was the ninth search engine in the world, in 2009 the seventh, and in 2013 the fourth.
One of the factors behind this position is the presence in Russia of a sufficient number of mathematically savvy specialists with a scientific instinct.
By 2002, the word Yandex had become so common that when Arkady Volozh's company demanded the return of the yandex.com domain, which had been bought by third parties, the defendant stated that the word "Yandex" was already synonymous with search and had become a household word in Russia.
Since late 2012, the Yandex search engine has had more users than Google search on the Google Chrome browser in Russia.
2008
2007
2006
In early December, the item "Saved copy" appeared next to each link in the search results; clicking on it takes the user to a full copy of the page in a special archive database (the "Yandex cache").
2005
The ranking algorithm has been improved to increase search accuracy.
It became possible to limit search results by region.
2004
At the end of the year, the study "Some Aspects of Full-Text Search and Ranking in Yandex" (authors Ilya Segalovich and Mikhail Maslov) was published, which revealed certain ranking details of the search engine.
2003
2002
2001
2000
In December 2000, the volume of indexed information reached 355.22 GB.
1990s
The word stands for "yet another indexer" (or "Ya" ("I" in Russian) plus "index"). According to the interpretation of Artemy Lebedev, the name of the search engine is consonant with "yang", the masculine principle.
The yandex.ru search engine was announced by CompTek on September 23, 1997 at the Softool exhibition, although some developments in the field of search (Bible indexing, searching for documents on CD-ROM, site search) were carried out by the company even earlier.
The first index contained information on 5 thousand servers and occupied 4.5 GB.
In the same year, 1997, Yandex search began to be used in the Russian version of Internet Explorer 4.0. It became possible to query in natural language.
As of 1998, Yandex.Search ran on three machines running FreeBSD under Apache: one machine crawled the Internet and indexed documents, one ran the search engine, and one duplicated the search engine.
In 1999, search in categories appeared: a combination of a search engine and a catalog. The search engine was also updated to a new version.
Graphics hardware is computer hardware that generates computer graphics and allows them to be shown on a display, usually using a graphics card (video card) in combination with a device driver to create the images on the screen.
The most important piece of graphics hardware is the graphics card, which is the piece of equipment that renders out all images and sends them to a display. There are two types of graphics cards: integrated and dedicated. An integrated graphics card, usually made by Intel for use in their computers, is bound to the motherboard and shares RAM (random-access memory) with the CPU, reducing the total amount of RAM available. This is undesirable for running programs and applications that use a large amount of video memory. A dedicated graphics card has its own RAM and processor for generating its images, and does not slow down the computer. Dedicated graphics cards also have higher performance than integrated graphics cards. It is possible to have both[2] dedicated and integrated graphics; however, once a dedicated graphics card is installed, the integrated card will no longer function until the dedicated card is removed.
The GPU, or graphics processing unit, is the unit that allows the graphics card to function. It performs a large amount of the work given to the card. The majority of video playback on a computer is controlled by the GPU. Once again, a GPU can be either integrated or dedicated.
A display driver is a piece of software which allows your graphics hardware to communicate with your operating system. Drivers in general allow your computer to utilize parts of itself, and without them, the machine would not function. This is because a graphics device usually communicates in its own, more specialized language, while the computer communicates in a language that largely deals with general commands. A driver is therefore required to translate between the two, converting general commands into specific commands and vice versa, so that each device can understand the instructions and results.
Dedicated graphics cards are not bound to the motherboard, and therefore most are removable, replaceable, or upgradable. They are installed in an expansion slot and connected to the motherboard. On the other hand, an integrated graphics card cannot be changed without buying a new motherboard with a better chip, as they are bound to the motherboard.
The major competing brands in graphics hardware are Nvidia and AMD. Nvidia is known largely in the computer graphics department due to its GeForce brand, whereas AMD is known for its Radeon brand. These two brands account for nearly 100 percent of the graphics hardware market, with Nvidia making 4 billion dollars in revenue and AMD generating 6.5 billion in revenue (through all sales, not specifically graphics cards).
Computer graphics hardware also usually generates a large amount of heat, especially high-end gaming hardware, and requires additional cooling systems to prevent overheating. This may further raise the cost, although some dedicated graphics cards come with built-in fans.
Amdahl's law can be used to calculate how much a computation can be sped up by running part of it in parallel. Amdahl's law is named after Gene Amdahl, who presented the law in 1967. Most developers working with parallel or concurrent systems have an intuitive feel for potential speedup, even without knowing Amdahl's law. Regardless, Amdahl's law may still be useful to know.
I will first explain Amdahl's law mathematically, and then proceed to illustrate Amdahl's law using diagrams.
A program (or algorithm) which can be parallelized can be split up into two parts:
A part which cannot be parallelized
A part which can be parallelized
Imagine a program that processes files from disk. A small part of that program may scan the directory and create a list of files internally in memory. After that, each file is passed to a separate thread for processing. The part that scans the directory and creates the file list cannot be parallelized, but processing the files can.
The total time taken to execute the program in serial (not in parallel) is called T. The time T includes the time of both the non-parallelizable and parallelizable parts. The non-parallelizable part is called B. The parallelizable part is referred to as T - B. The following list sums up these definitions:
T = the total time of the serial execution
B = the total time of the non-parallelizable part
T - B = the total time of the parallelizable part (when executed serially, not in parallel)
From this follows that:
T = B + (T-B)
It may look a bit strange at first that the parallelizable part of the program does not have its own symbol in the equation. However, since the parallelizable part of the equation can be expressed using the total time T and B (the non-parallelizable part), the equation has actually been reduced conceptually, meaning that it contains fewer variables in this form.
It is the parallelizable part, T - B, that can be sped up by executing it in parallel. How much it can be sped up depends on how many threads or CPUs you apply to execute it. The number of threads or CPUs is called N. The fastest the parallelizable part can be executed is thus:
(T - B) / N
Another way to write this is:
(1/N) * (T - B)
Wikipedia uses this version in case you read about Amdahl's law there.
According to Amdahl's law, the total execution time of the program when the parallelizable part is executed using N threads or CPUs is thus:
T(N) = B + (T - B) / N
T(N) means the total execution time with a parallelization factor of N. Thus, T could be written T(1), meaning the total execution time with a parallelization factor of 1. Using T(1) instead of T, Amdahl's law looks like this:
T(N) = B + ( T(1) - B ) / N
It still means the same though.
To better understand Amdahl's law, let's go through a calculation example. The total time to execute a program is set to 1. The non-parallelizable part of the program is 40%, which out of a total time of 1 is equal to 0.4. The parallelizable part is thus equal to 1 - 0.4 = 0.6.
The execution time of the program with a parallelization factor of 2 (2 threads or CPUs executing the parallelizable part, so N is 2) would be:
T(2) = 0.4 + ( 1 - 0.4 ) / 2
= 0.4 + 0.6 / 2
= 0.4 + 0.3
= 0.7
Making the same calculation with a parallelization factor of 5 instead of 2 would look like this:
T(5) = 0.4 + ( 1 - 0.4 ) / 5
= 0.4 + 0.6 / 5
= 0.4 + 0.12
= 0.52
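As a quick sanity check, the formula can be expressed in a few lines of code. This is just an illustrative Python sketch of the formula as stated above, not something from the original text:

def amdahl_time(T, B, N):
    # Total execution time with N threads/CPUs: T(N) = B + (T - B) / N
    return B + (T - B) / N

print(amdahl_time(T=1.0, B=0.4, N=1))   # 1.0  (the serial baseline, T(1))
print(amdahl_time(T=1.0, B=0.4, N=2))   # 0.7  (matches the N = 2 example)
print(amdahl_time(T=1.0, B=0.4, N=5))   # 0.52 (matches the N = 5 example)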
To better understand Amdahl's law I will try to illustrate how the law is derived.
First of all, a program can be broken up into a non-parallelizable part B, and a parallelizable part 1-B, as illustrated by this diagram:
The line with the delimiters at the top is the total time T(1).
Here you see the execution time with a parallelization factor of 2:
Here you see the execution time with a parallelization factor of 3:
From Amdahl's law it follows naturally that the parallelizable part can be executed faster by throwing hardware at it: more threads and more CPUs. The non-parallelizable part, however, can only be executed faster by optimizing the code. Thus, you can increase the speed and parallelizability of your program by optimizing the non-parallelizable part. You might even change the algorithm to have a smaller non-parallelizable part in general, by moving some of the work into the parallelizable part (if possible).
If you optimize the sequential part of a program you can also use Amdahl's law to calculate the execution time of the program after the optimization. If the non-parallelizable part B is optimized by a factor of O, then Amdahl's law looks like this:
T(O,N) = B / O + (1 - B / O) / N
Remember, the non-parallelizable part of the program now takes B / O time, so the parallelizable part takes 1 - B / O time.
If B is 0.4, O is 2 and N is 5, then the calculation looks like this:
T(2,5) = 0.4 / 2 + (1 - 0.4 / 2) / 5
= 0.2 + (1 - 0.4 / 2) / 5
= 0.2 + (1 - 0.2) / 5
= 0.2 + 0.8 / 5
= 0.2 + 0.16
= 0.36
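The same sketch extends naturally to the optimized formula; again, this is an illustrative Python snippet with T(1) normalized to 1, not part of the original text:

def amdahl_time_optimized(B, O, N):
    # T(O, N) = B / O + (1 - B / O) / N, with the total serial time T(1) set to 1
    return B / O + (1 - B / O) / N

print(amdahl_time_optimized(B=0.4, O=2, N=5))   # 0.36, matching the calculation above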
So far we have only used Amdahl's law to calculate the execution time of a program or algorithm after optimization or parallelization. We can also use Amdahl's law to calculate the speedup, meaning how much faster the new algorithm or program is than the old version.
If the time of the old version of the program or algorithm is T, then the speedup will be
Speedup = T / T(O,N)
We often set T to 1 just to calculate the execution time and speedup as a fraction of the old time. The equation then looks like this:
Speedup = 1 / T(O,N)
If we insert the Amdahl's law calculation instead of T(O,N), we get the following formula:
Speedup = 1 / ( B / O + (1 - B / O) / N )
With B = 0.4, O = 2 and N = 5, the calculation becomes:
Speedup = 1 / ( 0.4 / 2 + (1 - 0.4 / 2) / 5)
= 1 / ( 0.2 + (1 - 0.4 / 2) / 5)
= 1 / ( 0.2 + (1 - 0.2) / 5 )
= 1 / ( 0.2 + 0.8 / 5 )
= 1 / ( 0.2 + 0.16 )
= 1 / 0.36
= 2.77777 ...
That means that if you optimize the non-parallelizable (sequential) part by a factor of 2, and parallelize the parallelizable part by a factor of 5, the new optimized version of the program or algorithm would run a maximum of 2.77777 times faster than the old version.
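In code form, the speedup calculation looks like the following; an illustrative Python sketch with T(1) set to 1, mirroring the formula above:

def amdahl_speedup(B, O, N):
    # Speedup = 1 / (B / O + (1 - B / O) / N), relative to the original serial time
    return 1.0 / (B / O + (1 - B / O) / N)

print(amdahl_speedup(B=0.4, O=2, N=5))   # ~2.78, matching the calculation above

Note that as N grows without bound, the speedup approaches 1 / (B / O), which is 5 in this example: the optimized sequential part puts a hard ceiling on the achievable speedup.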
While Amdahl's law enables you to calculate the theoretic speedup of parallelization of an algorithm, don't rely too heavily on such calculations. In practice, many other factors may come into play when you optimize or parallelize an algorithm.
The speed of memory, CPU cache memory, disks, network cards etc. (if disk or network are used) may be a limiting factor too. If a new version of the algorithm is parallelized, but leads to a lot more CPU cache misses, you may not even get the desired x N speedup of using x N CPUs. The same is true if you end up saturating the memory bus, disk or network card or network connection.
My recommendation would be to use Amdahl's law to get an idea about where to optimize, but use a measurement to determine the real speedup of the optimization. Remember, sometimes a highly serialized sequential (single CPU) algorithm may outperform a parallel algorithm, simply because the sequential version has no coordination overhead (breaking down work and building the total again), and because a single CPU algorithm may conform better with how the underlying hardware works (CPU pipelines, CPU cache etc).
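For the measurement step, a minimal and entirely hypothetical Python sketch is shown below; the workload, chunk sizes, and pool size are placeholders chosen only for illustration, not anything taken from the text above:

import time
from multiprocessing import Pool

def work(n):
    # Placeholder CPU-bound task: sum of squares up to n.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    chunks = [2_000_000] * 8

    start = time.perf_counter()
    serial = [work(n) for n in chunks]          # sequential baseline
    t_serial = time.perf_counter() - start

    start = time.perf_counter()
    with Pool(processes=4) as pool:             # parallel version, 4 worker processes
        parallel = pool.map(work, chunks)
    t_parallel = time.perf_counter() - start

    assert serial == parallel
    print(f"serial: {t_serial:.2f}s  parallel: {t_parallel:.2f}s  "
          f"measured speedup: {t_serial / t_parallel:.2f}x")

The measured number will typically fall short of the theoretical Amdahl figure because of the coordination and memory effects mentioned above.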
CPU design is the design engineering task of creating a central processing unit (CPU), a component of computer hardware. It is a subfield of electronics engineering and computer engineering.
As with most complex electronic designs, the logic verification effort (proving that the design does not have bugs) now dominates the project schedule of a CPU.
Key CPU architectural innovations include index register, cache, virtual memory, instruction pipelining, superscalar, CISC, RISC, virtual machine, emulators, microprogram, and stack.
The first CPUs were designed to do mathematical calculations faster and more reliably than human computers.
Each successive generation of CPU might be designed to achieve some of these goals:
Shrinking everything (a "photomask shrink"), resulting in the same number of transistors on a smaller die, improves performance (smaller transistors switch faster), reduces power (smaller wires have less parasitic capacitance) and reduces cost (more CPUs fit on the same wafer of silicon).
Releasing a CPU on the same size die, but with a smaller CPU core, keeps the cost about the same but allows higher levels of integration within one VLSI chip (additional cache, multiple CPUs, or other components), improving performance and reducing overall system cost.
Because there are too many programs to test a CPU's speed on all of them, benchmarks were developed. The most famous benchmarks are the SPECint and SPECfp benchmarks developed by Standard Performance Evaluation Corporation and the ConsumerMark benchmark developed by the Embedded Microprocessor Benchmark Consortium EEMBC.
Some important measurements include:
Some of these measures conflict. In particular, many design techniques that make a CPU run faster make the "performance per watt", "performance per dollar", and "deterministic response" much worse, and vice versa.
There are several different markets in which CPUs are used. Since each of these markets differ in their requirements for CPUs, the devices designed for one market are in most cases inappropriate for the other markets.
The vast majority of revenues generated from CPU sales is for general purpose computing, that is, desktop, laptop, and server computers commonly used in businesses and homes. In this market, the Intel IA-32 architecture dominates, with its rivals PowerPC and SPARC maintaining much smaller customer bases. Yearly, hundreds of millions of IA-32 architecture CPUs are used by this market. A growing percentage of these processors are for mobile implementations such as netbooks and laptops.
Since these devices are used to run countless different types of programs, these CPU designs are not specifically targeted at one type of application or one function. The demands of running a wide range of programs efficiently have made these CPU designs among the most technically advanced, at the cost of being relatively expensive and having high power consumption.
In 1984, most high-performance CPUs required four to five years to develop.
Developing new, high-end CPUs is a very costly proposition. Both the logical complexity (needing very large logic design and logic verification teams and simulation farms with perhaps thousands of computers) and the high operating frequencies (needing large circuit design teams and access to the state-of-the-art fabrication process) account for the high cost of design for this type of chip. The design cost of a high-end CPU will be on the order of US $100 million. Since the design of such high-end chips nominally takes about five years to complete, to stay competitive a company has to fund at least two of these large design teams to release products at the rate of 2.5 years per product generation.
As an example, the typical loaded cost for one computer engineer is often quoted as US$250,000 per year. This includes salary, benefits, CAD tools, computers, office space rent, etc. Assume that 100 engineers are needed to design a CPU and that the project takes 4 years. Then:
Total cost = $250,000 per engineer-year x 100 engineers x 4 years = $100,000,000.
The above amount is just an example. The design teams for modern-day general-purpose CPUs have several hundred members.
Main article: Supercomputer
Scientific computing is a much smaller niche market (in revenue and units shipped). It is used in government research labs and universities. Before 1990, CPU design was often done for this market, but mass market CPUs organized into large clusters have proven to be more affordable. The main remaining area of active hardware design and research for scientific computing is for high-speed data transmission systems to connect mass market CPUs.
As measured by units shipped, most CPUs are embedded in other machinery, such as telephones, clocks, appliances, vehicles, and infrastructure. Embedded processors sell in volumes of many billions of units per year, though mostly at much lower price points than general-purpose processors.
These single-function devices differ from the more familiar general-purpose CPUs in several ways:
Low cost is of utmost importance.
It is important to maintain a low power dissipation as embedded devices often have a limited battery life and it is often impractical to include cooling fans.
To lower system cost, peripherals are integrated with the processor on the same silicon chip.
Keeping peripherals on-chip also reduces power consumption as external GPIO ports typically require buffering so that they can source or sink the relatively high current loads that are required to maintain a strong signal outside of the chip.
Many embedded applications have a limited amount of physical space for circuitry; keeping peripherals on-chip will reduce the space required for the circuit board.
The program and data memories are often integrated on the same chip. When the only allowed program memory is ROM, the device is known as a microcontroller.
For many embedded applications, interrupt latency will be more critical than in some general-purpose processors.
The embedded CPU family with the largest number of total units shipped is the 8051, averaging nearly a billion units per year. The 8051 is widely used because it is very inexpensive. The design time is now roughly zero, because it is widely available as commercial intellectual property. It is now often embedded as a small part of a larger system on a chip. The silicon cost of an 8051 is now as low as US$0.001, because some implementations use as few as 2,200 logic gates and take 0.0127 square millimeters of silicon.
As of 2009, more CPUs are produced using the ARM architecture instruction set than any other 32-bit instruction set. The ARM architecture and the first ARM chip were designed in about one and a half years and 5 man years of work time.
The 32-bit Parallax Propeller microcontroller architecture and the first chip were designed by two people in about 10 man years of work time.
It is believed that the 8-bit AVR architecture and the first AVR microcontroller were conceived and designed by two students at the Norwegian Institute of Technology.
The 8-bit 6502 architecture and the first MOS Technology 6502 chip were designed in 13 months by a group of about 9 people.
The 32-bit Berkeley RISC I and RISC II architectures and the first chips were mostly designed by a series of students as part of a four-quarter sequence of graduate courses. This design became the basis of the commercial SPARC processor design.
For about a decade, every student taking the 6.004 class at MIT was part of a team; each team had one semester to design and build a simple 8-bit CPU out of 7400 series integrated circuits. One team of 4 students designed and built a simple 32-bit CPU during that semester.
Some undergraduate courses require a team of 2 to 5 students to design, implement, and test a simple CPU in an FPGA in a single 15-week semester.
For embedded systems, the highest performance levels are often not needed or desired due to the power consumption requirements. This allows for the use of processors which can be totally implemented by logic synthesis techniques. These synthesized processors can be implemented in a much shorter amount of time, giving quicker time-to-market.
The central processing unit (CPU) is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in use in the computer industry at least since the early 1960s. The form, design and implementation of CPUs have changed dramatically since the earliest examples, but their fundamental operation remains much the same.
On large machines, CPUs require one or more printed circuit boards. On personal computers and small workstations, the CPU is housed in a single silicon chip called a microprocessor. Since the 1970s the microprocessor class of CPUs has almost completely overtaken all other CPU implementations. Modern CPUs are large scale integrated circuits in packages typically less than four centimeters square, with hundreds of connecting pins.
Two typical components of a CPU are the arithmetic logic unit (ALU), which performs arithmetic and logical operations, and the control unit (CU), which extracts instructions from memory and decodes and executes them, calling on the ALU when necessary.
Not all computational systems rely on a central processing unit. An array processor or vector processor has multiple parallel computing elements, with no one unit considered the "center". In the distributed computing model, problems are solved by a distributed interconnected set of processors.
Computers such as the ENIAC had to be physically rewired in order to perform different tasks, which caused these machines to be called "fixed-program computers." Since the term "CPU" is generally defined as a software (computer program) execution device, the earliest devices that could rightly be called CPUs came with the advent of the stored-program computer.
The idea of a stored-program computer was already present in the design of J. Presper Eckert and John William Mauchly's ENIAC, but was initially omitted so that it could be finished sooner. On June 30, 1945, before ENIAC was made, mathematician John von Neumann distributed the paper entitled First Draft of a Report on the EDVAC. It was the outline of a stored-program computer that would eventually be completed in August 1949. EDVAC was designed to perform a certain number of instructions (or operations) of various types. These instructions could be combined to create useful programs for the EDVAC to run. Significantly, the programs written for EDVAC were stored in high-speed computer memory rather than specified by the physical wiring of the computer. This overcame a severe limitation of ENIAC, which was the considerable time and effort required to reconfigure the computer to perform a new task. With von Neumann's design, the program, or software, that EDVAC ran could be changed simply by changing the contents of the memory.
Early CPUs were custom-designed as a part of a larger, sometimes one-of-a-kind, computer. However, this method of designing custom CPUs for a particular application has largely given way to the development of mass-produced processors that are made for many purposes. This standardization began in the era of discrete transistor mainframes and minicomputers and has rapidly accelerated with the popularization of the integrated circuit (IC). The IC has allowed increasingly complex CPUs to be designed and manufactured to tolerances on the order of nanometers. Both the miniaturization and standardization of CPUs have increased the presence of digital devices in modern life far beyond the limited application of dedicated computing machines. Modern microprocessors appear in everything from automobiles to cell phones and children's toys.
While von Neumann is most often credited with the design of the stored-program computer because of his design of EDVAC, others before him, such as Konrad Zuse, had suggested and implemented similar ideas. The so-called Harvard architecture of the Harvard Mark I, which was completed before EDVAC, also utilized a stored-program design using punched paper tape rather than electronic memory. The key difference between the von Neumann and Harvard architectures is that the latter separates the storage and treatment of CPU instructions and data, while the former uses the same memory space for both. Most modern CPUs are primarily von Neumann in design, but elements of the Harvard architecture are commonly seen as well.
Relays and vacuum tubes (thermionic valves) were commonly used as switching elements; a useful computer requires thousands or tens of thousands of switching devices. The overall speed of a system is dependent on the speed of the switches. Tube computers like EDVAC tended to average eight hours between failures, whereas relay computers like the (slower, but earlier) Harvard Mark I failed very rarely. In the end, tube based CPUs became dominant because the significant speed advantages afforded generally outweighed the reliability problems. Most of these early synchronous CPUs ran at low clock rates compared to modern microelectronic designs (see below for a discussion of clock rate). Clock signal frequencies ranging from 100 kHz to 4 MHz were very common at this time, limited largely by the speed of the switching devices they were built with.
The control unit of the CPU contains circuitry that uses electrical signals to direct the entire computer system to carry out stored program instructions. The control unit does not execute program instructions; rather, it directs other parts of the system to do so. The control unit must communicate with both the arithmetic/logic unit and memory.
The design complexity of CPUs increased as various technologies facilitated building smaller and more reliable electronic devices. The first such improvement came with the advent of the transistor. Transistorized CPUs during the 1950s and 1960s no longer had to be built out of bulky, unreliable, and fragile switching elements like vacuum tubes and electrical relays. With this improvement more complex and reliable CPUs were built onto one or several printed circuit boards containing discrete (individual) components.
During this period, a method of manufacturing many transistors in a compact space gained popularity. The integrated circuit (IC) allowed a large number of transistors to be manufactured on a single semiconductor-based die, or "chip." At first only very basic non-specialized digital circuits such as NOR gates were miniaturized into ICs. CPUs based upon these "building block" ICs are generally referred to as "small-scale integration" (SSI) devices. SSI ICs, such as the ones used in the Apollo guidance computer, usually contained up to a few score transistors. To build an entire CPU out of SSI ICs required thousands of individual chips, but still consumed much less space and power than earlier discrete transistor designs. As microelectronic technology advanced, an increasing number of transistors were placed on ICs, thus decreasing the quantity of individual ICs needed for a complete CPU. MSI and LSI (medium- and large-scale integration) ICs increased transistor counts to hundreds, and then thousands.
In 1964 IBM introduced its System/360 computer architecture which was used in a series of computers that could run the same programs with different speed and performance. This was significant at a time when most electronic computers were incompatible with one another, even those made by the same manufacturer. To facilitate this improvement, IBM utilized the concept of a microprogram (often called "microcode"), which still sees widespread usage in modern CPUs. The System/360 architecture was so popular that it dominated the mainframe computer market for decades and left a legacy that is still continued by similar modern computers like the IBM zSeries. In the same year (1964), Digital Equipment Corporation (DEC) introduced another influential computer aimed at the scientific and research markets, the PDP-8. DEC would later introduce the extremely popular PDP-11 line that originally was built with SSI ICs but was eventually implemented with LSI components once these became practical. In stark contrast with its SSI and MSI predecessors, the first LSI implementation of the PDP-11 contained a CPU composed of only four LSI integrated circuits.
Transistor-based computers had several distinct advantages over their predecessors. Aside from facilitating increased reliability and lower power consumption, transistors also allowed CPUs to operate at much higher speeds because of the short switching time of a transistor in comparison to a tube or relay. Thanks to both the increased reliability as well as the dramatically increased speed of the switching elements (which were almost exclusively transistors by this time), CPU clock rates in the tens of megahertz were obtained during this period. Additionally while discrete transistor and IC CPUs were in heavy usage, new high-performance designs like SIMD (Single Instruction Multiple Data) vector processors began to appear. These early experimental designs later gave rise to the era of specialized supercomputers like those made by Cray Inc.
In the 1970s the fundamental inventions by Federico Faggin (silicon-gate MOS ICs with self-aligned gates, along with his new random logic design methodology) changed the design and implementation of CPUs forever. Since the introduction of the first commercially available microprocessor (the Intel 4004) in 1971 and the first widely used microprocessor (the Intel 8080) in 1974, this class of CPUs has almost completely overtaken all other central processing unit implementation methods. Mainframe and minicomputer manufacturers of the time launched proprietary IC development programs to upgrade their older computer architectures, and eventually produced instruction set compatible microprocessors that were backward-compatible with their older hardware and software. Combined with the advent and eventual vast success of the now ubiquitous personal computer, the term CPU is now applied almost exclusively to microprocessors. Several CPUs can be combined in a single processing chip.
Previous generations of CPUs were implemented as discrete components and numerous small integrated circuits (ICs) on one or more circuit boards. Microprocessors, on the other hand, are CPUs manufactured on a very small number of ICs; usually just one. The overall smaller CPU size as a result of being implemented on a single die means faster switching time because of physical factors like decreased gate parasitic capacitance. This has allowed synchronous microprocessors to have clock rates ranging from tens of megahertz to several gigahertz. Additionally, as the ability to construct exceedingly small transistors on an IC has increased, the complexity and number of transistors in a single CPU has increased dramatically. This widely observed trend is described by Moore's law, which has proven to be a fairly accurate predictor of the growth of CPU (and other IC) complexity to date.
While the complexity, size, construction, and general form of CPUs have changed drastically over the past sixty years, it is notable that the basic design and function has not changed much at all. Almost all common CPUs today can be very accurately described as von Neumann stored-program machines. As the aforementioned Moore's law continues to hold true, concerns have arisen about the limits of integrated circuit transistor technology. Extreme miniaturization of electronic gates is causing the effects of phenomena like electromigration and subthreshold leakage to become much more significant. These newer concerns are among the many factors causing researchers to investigate new methods of computing such as the quantum computer, as well as to expand the usage of parallelism and other methods that extend the usefulness of the classical von Neumann model.
The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. The program is represented by a series of numbers that are kept in some kind of computer memory. There are four steps that nearly all CPUs use in their operation: fetch, decode, execute, and writeback.
The first step, fetch, involves retrieving an instruction (which is represented by a number or sequence of numbers) from program memory. The location in program memory is determined by a program counter (PC), which stores a number that identifies the current position in the program. After an instruction is fetched, the PC is incremented by the length of the instruction word in terms of memory units. Often, the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned. This issue is largely addressed in modern processors by caches and pipeline architectures (see below).
The instruction that the CPU fetches from memory is used to determine what the CPU is to do. In the decode step, the instruction is broken up into parts that have significance to other portions of the CPU. The way in which the numerical instruction value is interpreted is defined by the CPU's instruction set architecture (ISA). Often, one group of numbers in the instruction, called the opcode, indicates which operation to perform. The remaining parts of the number usually provide information required for that instruction, such as operands for an addition operation. Such operands may be given as a constant value (called an immediate value), or as a place to locate a value: a register or a memory address, as determined by some addressing mode. In older designs the portions of the CPU responsible for instruction decoding were unchangeable hardware devices. However, in more abstract and complicated CPUs and ISAs, a microprogram is often used to assist in translating instructions into various configuration signals for the CPU. This microprogram is sometimes rewritable so that it can be modified to change the way the CPU decodes instructions even after it has been manufactured.
After the fetch and decode steps, the execute step is performed. During this step, various portions of the CPU are connected so they can perform the desired operation. If, for instance, an addition operation was requested, the arithmetic logic unit (ALU) will be connected to a set of inputs and a set of outputs. The inputs provide the numbers to be added, and the outputs will contain the final sum. The ALU contains the circuitry to perform simple arithmetic and logical operations on the inputs (like addition and bitwise operations). If the addition operation produces a result too large for the CPU to handle, an arithmetic overflow flag in a flags register may also be set.
The final step, writeback, simply "writes back" the results of the execute step to some form of memory. Very often the results are written to some internal CPU register for quick access by subsequent instructions. In other cases results may be written to slower, but cheaper and larger, main memory. Some types of instructions manipulate the program counter rather than directly produce result data. These are generally called "jumps" and facilitate behavior like loops, conditional program execution (through the use of a conditional jump), and functions in programs. Many instructions will also change the state of digits in a "flags" register. These flags can be used to influence how a program behaves, since they often indicate the outcome of various operations. For example, one type of "compare" instruction considers two values and sets a number in the flags register according to which one is greater. This flag could then be used by a later jump instruction to determine program flow.
After the execution of the instruction and writeback of the resulting data, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in the program counter. If the completed instruction was a jump, the program counter will be modified to contain the address of the instruction that was jumped to, and program execution continues normally. In more complex CPUs than the one described here, multiple instructions can be fetched, decoded, and executed simultaneously. This section describes what is generally referred to as the "classic RISC pipeline", which in fact is quite common among the simple CPUs used in many electronic devices (often called microcontrollers). It largely ignores the important role of CPU cache, and therefore the memory access stage of the pipeline.
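To make the fetch, decode, execute, and writeback steps concrete, here is a minimal, hypothetical sketch in Python. The instruction format (tuples of an opcode name and register numbers) and the mnemonics LOADI, ADD, and SUB are invented for illustration and do not correspond to any real instruction set:

    # A toy CPU: "program memory" is a list of (opcode, dest, src1, src2)
    # tuples and the register file is a small list. Purely illustrative.
    def run(program):
        registers = [0] * 4          # R0..R3
        pc = 0                       # program counter
        while pc < len(program):
            instruction = program[pc]    # fetch: read the instruction at PC
            pc += 1                      # advance PC by one instruction
            op, dest, a, b = instruction # decode: split opcode and operands
            if op == "LOADI":            # execute: load an immediate value
                result = a
            elif op == "ADD":            # execute: the "ALU" adds two registers
                result = registers[a] + registers[b]
            elif op == "SUB":
                result = registers[a] - registers[b]
            else:
                raise ValueError("unknown opcode: " + op)
            registers[dest] = result     # writeback: store the result
        return registers

    # R1 = 2, R2 = 3, R0 = R1 + R2  ->  prints [5, 2, 3, 0]
    print(run([("LOADI", 1, 2, None),
               ("LOADI", 2, 3, None),
               ("ADD", 0, 1, 2)]))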
Hardwired into a CPU's design is a list of basic operations it can perform, called an instruction set. Such operations may include adding or subtracting two numbers, comparing numbers, or jumping to a different part of a program. Each of these basic operations is represented by a particular sequence of bits; this sequence is called the opcode for that particular operation. Sending a particular opcode to a CPU will cause it to perform the operation represented by that opcode. To execute an instruction in a computer program, the CPU uses the opcode for that instruction as well as its arguments (for instance the two numbers to be added, in the case of an addition operation). A computer program is therefore a sequence of instructions, with each instruction including an opcode and that operation's arguments.
The actual mathematical operation for each instruction is performed by a subunit of the CPU known as the arithmetic logic unit or ALU. In addition to using its ALU to perform operations, a CPU is also responsible for reading the next instruction from memory, reading data specified in arguments from memory, and writing results to memory.
In many CPU designs, an instruction set will clearly differentiate between operations that load data from memory, and those that perform math. In this case the data loaded from memory is stored in registers, and a mathematical operation takes no arguments but simply performs the math on the data in the registers and writes it to a new register, whose value a separate operation may then write to memory.
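As a hedged illustration of that load/store separation, the hypothetical sequence below keeps memory access (LOAD, STORE) distinct from register-only arithmetic (ADD); the mnemonics, register names, and addresses are invented and shown simply as Python data:

    # In a load/store design, only LOAD and STORE touch memory; ADD works
    # purely on registers. Addresses and register names are illustrative.
    program = [
        ("LOAD",  "R1", 0x10),        # R1 <- memory[0x10]
        ("LOAD",  "R2", 0x14),        # R2 <- memory[0x14]
        ("ADD",   "R3", "R1", "R2"),  # R3 <- R1 + R2 (no memory access)
        ("STORE", 0x18, "R3"),        # memory[0x18] <- R3
    ]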
The way a CPU represents numbers is a design choice that affects the most basic ways in which the device functions. Some early digital computers used an electrical model of the common decimal (base ten) numeral system to represent numbers internally. A few other computers have used more exotic numeral systems like ternary (base three). Nearly all modern CPUs represent numbers in binary form, with each digit being represented by some two-valued physical quantity such as a "high" or "low" voltage.
Related to number representation is the size and precision of numbers that a CPU can represent. In the case of a binary CPU, a bit refers to one significant place in the numbers a CPU deals with. The number of bits (or numeral places) a CPU uses to represent numbers is often called "word size", "bit width", "data path width", or "integer precision" when dealing with strictly integer numbers (as opposed to floating point). This number differs between architectures, and often within different parts of the very same CPU. For example, an 8-bit CPU deals with a range of numbers that can be represented by eight binary digits (each digit having two possible values), that is, 2^8 or 256 discrete numbers. In effect, integer size sets a hardware limit on the range of integers the software run by the CPU can utilize.
Integer range can also affect the number of locations in memory the CPU can address (locate). For example, if a binary CPU uses 32 bits to represent a memory address, and each memory address represents one octet (8 bits), the maximum quantity of memory that CPU can address is 2^32 octets, or 4 GiB. This is a very simple view of CPU address space, and many designs use more complex addressing methods like paging in order to locate more memory than their integer range would allow with a flat address space.
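A quick check of that figure, assuming 32-bit addresses and one octet per address as in the example above:

    address_bits = 32
    addressable_octets = 2 ** address_bits     # 4,294,967,296 distinct addresses
    print(addressable_octets / (1024 ** 3))    # 4.0 GiB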
Higher levels of integer range require more structures to deal with the additional digits, and therefore more complexity, size, power usage, and general expense. It is not at all uncommon, therefore, to see 4- or 8-bit microcontrollers used in modern applications, even though CPUs with much higher range (such as 16, 32, 64, even 128-bit) are available. The simpler microcontrollers are usually cheaper, use less power, and therefore generate less heat, all of which can be major design considerations for electronic devices. However, in higher-end applications, the benefits afforded by the extra range (most often the additional address space) are more significant and often affect design choices. To gain some of the advantages afforded by both lower and higher bit lengths, many CPUs are designed with different bit widths for different portions of the device. For example, the IBM System/370 used a CPU that was primarily 32 bit, but it used 128-bit precision inside its floating point units to facilitate greater accuracy and range in floating point numbers. Many later CPU designs use similar mixed bit width, especially when the processor is meant for general-purpose usage where a reasonable balance of integer and floating point capability is required.
The clock rate is the speed at which a microprocessor executes instructions. Every computer contains an internal clock that regulates the rate at which instructions are executed and synchronizes all the various computer components. The CPU requires a fixed number of clock ticks (or clock cycles) to execute each instruction. The faster the clock, the more instructions the CPU can execute per second.
Most CPUs, and indeed most sequential logic devices, are synchronous in nature. That is, they are designed and operate on assumptions about a synchronization signal. This signal, known as a clock signal, usually takes the form of a periodic square wave. By calculating the maximum time that electrical signals can move in various branches of a CPU's many circuits, the designers can select an appropriate period for the clock signal.
This period must be longer than the amount of time it takes for a signal to move, or propagate, in the worst-case scenario. In setting the clock period to a value well above the worst-case propagation delay, it is possible to design the entire CPU and the way it moves data around the "edges" of the rising and falling clock signal. This has the advantage of simplifying the CPU significantly, both from a design perspective and a component-count perspective. However, it also carries the disadvantage that the entire CPU must wait on its slowest elements, even though some portions of it are much faster. This limitation has largely been compensated for by various methods of increasing CPU parallelism. (see below)
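A rough sketch of that relationship, using purely illustrative numbers rather than figures from any real design:

    # The clock period must exceed the worst-case propagation delay (plus a
    # design margin), which caps the clock frequency.
    worst_case_delay_s = 2.5e-9               # hypothetical slowest path: 2.5 ns
    margin = 1.2                              # hypothetical safety margin
    clock_period_s = worst_case_delay_s * margin
    print(1.0 / clock_period_s / 1e6, "MHz")  # ~333 MHz maximum clock rate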
However, architectural improvements alone do not solve all of the drawbacks of globally synchronous CPUs. For example, a clock signal is subject to the delays of any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical clock signals to be provided in order to avoid delaying a single signal significantly enough to cause the CPU to malfunction. Another major issue as clock rates increase dramatically is the amount of heat that is dissipated by the CPU. The constantly changing clock causes many components to switch regardless of whether they are being used at that time. In general, a component that is switching uses more energy than an element in a static state. Therefore, as clock rate increases, so does heat dissipation, causing the CPU to require more effective cooling solutions.
One method of dealing with the switching of unneeded components is called clock gating, which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not see common usage outside of very low-power designs. One notable recent CPU design that uses extensive clock gating is that of the IBM PowerPC-based Xbox 360, which uses it to reduce the power requirements of the videogame console in which it is used. Another method of addressing some of the problems with a global clock signal is the removal of the clock signal altogether. While removing the global clock signal makes the design process considerably more complex in many ways, asynchronous (or clockless) designs carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. While somewhat uncommon, entire asynchronous CPUs have been built without utilizing a global clock signal. Two notable examples of this are the ARM-compliant AMULET and the MIPS R3000-compatible MiniMIPS. Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous, such as using asynchronous ALUs in conjunction with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous designs can perform at a comparable or better level than their synchronous counterparts, it is evident that they do at least excel in simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them very suitable for embedded computers.
The description of the basic operation of a CPU offered in the previous section describes the simplest form that a CPU can take. This type of CPU, usually referred to as subscalar, operates on and executes one instruction on one or two pieces of data at a time.
This process gives rise to an inherent inefficiency in subscalar CPUs. Since only one instruction is executed at a time, the entire CPU must wait for that instruction to complete before proceeding to the next instruction. As a result, the subscalar CPU gets "hung up" on instructions which take more than one clock cycle to complete execution. Even adding a second execution unit (see below) does not improve performance much; rather than one pathway being hung up, now two pathways are hung up and the number of unused transistors is increased. This design, wherein the CPU's execution resources can operate on only one instruction at a time, can only possibly reach scalar performance (one instruction per clock). However, the performance is nearly always subscalar (less than one instruction per cycle).
Attempts to achieve scalar and better performance have resulted in a variety of design methodologies that cause the CPU to behave less linearly and more in parallel. When referring to parallelism in CPUs, two terms are generally used to classify these design techniques. Instruction level parallelism (ILP) seeks to increase the rate at which instructions are executed within a CPU (that is, to increase the utilization of on-die execution resources), while thread level parallelism (TLP) aims to increase the number of threads (effectively individual programs) that a CPU can execute simultaneously. Each methodology differs both in the way it is implemented and in the relative effectiveness it affords in increasing the CPU's performance for an application.
One of the simplest methods used to accomplish increased parallelism is to begin the first steps of instruction fetching and decoding before the prior instruction finishes executing. This is the simplest form of a technique known as instruction pipelining, and is utilized in almost all modern general-purpose CPUs. Pipelining allows more than one instruction to be executed at any given time by breaking down the execution pathway into discrete stages. This separation can be compared to an assembly line, in which an instruction is made more complete at each stage until it exits the execution pipeline and is retired.
Pipelining does, however, introduce the possibility for a situation where the result of the previous operation is needed to complete the next operation; a condition often termed data dependency conflict. To cope with this, additional care must be taken to check for these sorts of conditions and delay a portion of the instruction pipeline if this occurs. Naturally, accomplishing this requires additional circuitry, so pipelined processors are more complex than subscalar ones (though not very significantly so). A pipelined processor can become very nearly scalar, inhibited only by pipeline stalls (an instruction spending more than one clock cycle in a stage).
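The following is a small, hypothetical simulation of such a stall: a toy four-stage pipeline without forwarding, in which an instruction that reads a register simply waits until the instruction producing that register has completed its writeback. The stage count, mnemonics, and the whole-instruction stall policy are simplifications chosen only for illustration:

    STAGES = 4  # fetch, decode, execute, writeback

    def schedule(instructions):
        """Return (name, start_cycle, finish_cycle) for each instruction."""
        ready = {}          # register -> first cycle its new value can be read
        next_start = 0      # ideal pipelined issue: one instruction per cycle
        plan = []
        for name, reads, writes in instructions:
            # Data dependency: stall until every source register is ready.
            start = max([next_start] + [ready.get(r, 0) for r in reads])
            finish = start + STAGES - 1
            for r in writes:
                ready[r] = finish + 1   # usable the cycle after writeback
            plan.append((name, start, finish))
            next_start = start + 1
        return plan

    program = [
        ("LOAD R1",      [],           ["R1"]),
        ("LOAD R2",      [],           ["R2"]),
        ("ADD R3,R1,R2", ["R1", "R2"], ["R3"]),   # waits for both loads
        ("SUB R4,R3,R1", ["R3", "R1"], ["R4"]),   # waits again for the ADD
    ]
    for name, start, finish in schedule(program):
        print(name, "cycles", start, "-", finish)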
Further improvement upon the idea of instruction pipelining led to the development of a method that decreases the idle time of CPU components even further. Designs that are said to be superscalar include a long instruction pipeline and multiple identical execution units. In a superscalar pipeline, multiple instructions are read and passed to a dispatcher, which decides whether or not the instructions can be executed in parallel (simultaneously). If so they are dispatched to available execution units, resulting in the ability for several instructions to be executed simultaneously. In general, the more instructions a superscalar CPU is able to dispatch simultaneously to waiting execution units, the more instructions will be completed in a given cycle.
Most of the difficulty in the design of a superscalar CPU architecture lies in creating an effective dispatcher. The dispatcher needs to be able to quickly and correctly determine whether instructions can be executed in parallel, as well as dispatch them in such a way as to keep as many execution units busy as possible. This requires that the instruction pipeline is filled as often as possible and gives rise to the need in superscalar architectures for significant amounts of CPU cache. It also makes hazard-avoiding techniques like branch prediction, speculative execution, and out-of-order execution crucial to maintaining high levels of performance. By attempting to predict which branch (or path) a conditional instruction will take, the CPU can minimize the number of times that the entire pipeline must wait until a conditional instruction is completed. Speculative execution often provides modest performance increases by executing portions of code that may not be needed after a conditional operation completes. Out-of-order execution somewhat rearranges the order in which instructions are executed to reduce delays due to data dependencies. Also, in the case of Single Instruction, Multiple Data (SIMD), where a large amount of data of the same type has to be processed, modern processors can disable parts of the pipeline so that when a single instruction is executed many times, the CPU skips the fetch and decode phases, greatly increasing performance on certain occasions, especially in highly repetitive workloads such as video creation and photo processing software.
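As a hedged sketch of one common branch-prediction scheme, here is a two-bit saturating counter in Python; it is a generic textbook illustration, not the predictor of any particular CPU:

    # States 0-1 predict "not taken", states 2-3 predict "taken". The counter
    # moves one step toward each actual outcome, so a single unusual branch
    # does not immediately flip the prediction.
    class TwoBitPredictor:
        def __init__(self):
            self.state = 2                  # start in "weakly taken"

        def predict(self):
            return self.state >= 2          # True means "predict taken"

        def update(self, taken):
            if taken:
                self.state = min(3, self.state + 1)
            else:
                self.state = max(0, self.state - 1)

    predictor = TwoBitPredictor()
    outcomes = [True] * 8 + [False] + [True] * 8   # a loop branch with one exit
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    print(correct, "of", len(outcomes), "predictions correct")   # 16 of 17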
In the case where a portion of the CPU is superscalar and part is not, the part which is not suffers a performance penalty due to scheduling stalls. The Intel P5 Pentium had two superscalar ALUs which could accept one instruction per clock each, but its FPU could not accept one instruction per clock. Thus the P5 was integer superscalar but not floating point superscalar. Intel's successor to the P5 architecture, P6, added superscalar capabilities to its floating point features, and therefore afforded a significant increase in floating point instruction performance.
Both simple pipelining and superscalar design increase a CPU's ILP by allowing a single processor to complete execution of instructions at rates surpassing one instruction per cycle (IPC). Most modern CPU designs are at least somewhat superscalar, and nearly all general purpose CPUs designed in the last decade are superscalar. In later years some of the emphasis in designing high-ILP computers has been moved out of the CPU's hardware and into its software interface, or ISA. The strategy of the very long instruction word (VLIW) causes some ILP to become implied directly by the software, reducing the amount of work the CPU must perform to boost ILP and thereby reducing the design's complexity.
Another strategy of achieving performance is to execute multiple programs or threads in parallel. This area of research is known as parallel computing. In Flynn's taxonomy, this strategy is known as Multiple Instructions-Multiple Data or MIMD.
One technology used for this purpose was multiprocessing (MP). The initial flavor of this technology is known as symmetric multiprocessing (SMP), where a small number of CPUs share a coherent view of their memory system. In this scheme, each CPU has additional hardware to maintain a constantly up-to-date view of memory. By avoiding stale views of memory, the CPUs can cooperate on the same program and programs can migrate from one CPU to another. To increase the number of cooperating CPUs beyond a handful, schemes such as non-uniform memory access (NUMA) and directory-based coherence protocols were introduced in the 1990s. SMP systems are limited to a small number of CPUs while NUMA systems have been built with thousands of processors. Initially, multiprocessing was built using multiple discrete CPUs and boards to implement the interconnect between the processors. When the processors and their interconnect are all implemented on a single silicon chip, the technology is known as a multi-core microprocessor.
It was later recognized that finer-grain parallelism existed within a single program. A single program might have several threads (or functions) that could be executed separately or in parallel. Some of the earliest examples of this technology implemented input/output processing such as direct memory access as a separate thread from the computation thread. A more general approach to this technology was introduced in the 1970s when systems were designed to run multiple computation threads in parallel. This technology is known as multi-threading (MT). This approach is considered more cost-effective than multiprocessing, as only a small number of components within a CPU are replicated to support MT, as opposed to the entire CPU in the case of MP. In MT, the execution units and the memory system including the caches are shared among multiple threads. The downside of MT is that the hardware support for multithreading is more visible to software than that of MP, and thus supervisor software like operating systems has to undergo larger changes to support MT. One type of MT that has been implemented is known as block multithreading, where one thread is executed until it is stalled waiting for data to return from external memory. In this scheme, the CPU quickly switches to another thread which is ready to run, the switch often being done in one CPU clock cycle, as in Sun's UltraSPARC technology. Another type of MT is known as simultaneous multithreading, where instructions of multiple threads are executed in parallel within one CPU clock cycle.
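From the software side, thread-level parallelism looks like independent tasks run concurrently. The sketch below uses Python's standard-library thread pool and assumes the tasks mostly wait (as I/O-bound work does), since CPython threads overlap waiting rather than pure computation; the task itself is just a stand-in:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def task(n):
        # Stand-in for an independent thread of work that mostly waits on I/O.
        time.sleep(0.1)
        return n * n

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(task, range(8)))
    elapsed = time.perf_counter() - start
    # With 4 workers, 8 waits of 0.1 s overlap into roughly 0.2 s instead of 0.8 s.
    print(results, round(elapsed, 2))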
For several decades from the 1970s to early 2000s, the focus in designing high performance general purpose CPUs was largely on achieving high ILP through technologies such as pipelining, caches, superscalar execution, out-of-order execution, etc. This trend culminated in large, power-hungry CPUs such as the Intel Pentium 4. By the early 2000s, CPU designers were thwarted from achieving higher performance from ILP techniques due to the growing disparity between CPU operating frequencies and main memory operating frequencies as well as escalating CPU power dissipation owing to more esoteric ILP techniques.
CPU designers then borrowed ideas from commercial computing markets such as transaction processing, where the aggregate performance of multiple programs, also known as throughput computing, was more important than the performance of a single thread or program.
This reversal of emphasis is evidenced by the proliferation of dual and multiple core CMP (chip-level multiprocessing) designs and notably, Intel's newer designs resembling its less superscalar P6 architecture. Later designs in several processor families exhibit CMP, including the x86-64 Opteron and Athlon 64 X2, the SPARC UltraSPARC T1, IBM POWER4 and POWER5, as well as several video game console CPUs like the Xbox 360's triple-core PowerPC design, and the PS3's 7-core Cell microprocessor.
A less common but increasingly important paradigm of CPUs (and indeed, computing in general) deals with data parallelism. The processors discussed earlier are all referred to as some type of scalar device. As the name implies, vector processors deal with multiple pieces of data in the context of one instruction. This contrasts with scalar processors, which deal with one piece of data for every instruction. Using Flynn's taxonomy, these two schemes of dealing with data are generally referred to as SIMD (single instruction, multiple data) and SISD (single instruction, single data), respectively. The great utility in creating CPUs that deal with vectors of data lies in optimizing tasks that tend to require the same operation (for example, a sum or a dot product) to be performed on a large set of data. Some classic examples of these types of tasks are multimedia applications (images, video, and sound), as well as many types of scientific and engineering tasks. Whereas a scalar CPU must complete the entire process of fetching, decoding, and executing each instruction and value in a set of data, a vector CPU can perform a single operation on a comparatively large set of data with one instruction. Of course, this is only possible when the application tends to require many steps which apply one operation to a large set of data.
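A software analogue of the scalar-versus-vector distinction, assuming NumPy is installed; this illustrates the programming model only, and whether the hardware actually issues SIMD instructions depends on the library build and the CPU:

    import numpy as np

    a = np.arange(100_000, dtype=np.float64)
    b = np.arange(100_000, dtype=np.float64)

    # Scalar style: one addition per loop iteration, element by element.
    scalar_sum = [a[i] + b[i] for i in range(len(a))]

    # Vector style: one expression applies the same operation to the whole
    # array, letting the library use wide SIMD additions where available.
    vector_sum = a + b

    print(vector_sum[:3])   # [0. 2. 4.]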
Most early vector CPUs, such as the Cray-1, were associated almost exclusively with scientific research and cryptography applications. However, as multimedia has largely shifted to digital media, the need for some form of SIMD in general-purpose CPUs has become significant. Shortly after inclusion of floating point execution units started to become commonplace in general-purpose processors, specifications for and implementations of SIMD execution units also began to appear for general-purpose CPUs. Some of these early SIMD specifications like HP's Multimedia Acceleration eXtensions (MAX) and Intel's MMX were integer-only. This proved to be a significant impediment for some software developers, since many of the applications that benefit from SIMD primarily deal with floating point numbers. Progressively, these early designs were refined and remade into some of the common, modern SIMD specifications, which are usually associated with one ISA. Some notable modern examples are Intel's SSE and the PowerPC-related AltiVec (also known as VMX).
The performance or speed of a processor depends on the clock rate (generally given in multiples of hertz) and the instructions per clock (IPC), which together are the factors for the instructions per second (IPS) that the CPU can perform. Many reported IPS values have represented "peak" execution rates on artificial instruction sequences with few branches, whereas realistic workloads consist of a mix of instructions and applications, some of which take longer to execute than others. The performance of the memory hierarchy also greatly affects processor performance, an issue barely considered in MIPS calculations. Because of these problems, various standardized tests, often called "benchmarks" for this purpose, such as SPECint, have been developed to attempt to measure the real effective performance in commonly used applications.
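A worked example of that relationship, with purely illustrative numbers:

    clock_hz = 3.0e9    # a hypothetical 3 GHz clock
    ipc = 2.5           # a hypothetical average of 2.5 instructions per clock
    print(clock_hz * ipc / 1e9, "billion instructions per second")   # 7.5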
Processing performance of computers is increased by using multi-core processors, which essentially is plugging two or more individual processors (called cores in this sense) into one integrated circuit. Ideally, a dual core processor would be nearly twice as powerful as a single core processor. In practice, however, the performance gain is far less, only about 50%, due to imperfect software algorithms and implementation.
Electronic design automation (EDA) is the category of tools for designing and producing electronic systems ranging from printed circuit boards (PCBs) to integrated circuits. This is sometimes referred to as ECAD (electronic computer-aided design) or just CAD. (The printed circuit board and wire wrap articles contain specialized discussions of the EDA used for those technologies.)
The term "EDA" is also used as an umbrella term for computer-aided engineering, computer-aided design and computer-aided manufacturing of electronics in the discipline of electrical engineering. This usage probably originates in the IEEE Design Automation Technical Committee.
This article describes EDA specifically for electronics, and concentrates on EDA used for designing integrated circuits. The segment of the industry that must use EDA consists of the chip designers at semiconductor companies. Large chips are too complex to design by hand.
EDA for electronics has rapidly increased in importance with the continuous scaling of semiconductor technology. (See Moore's Law.) Some users are foundry operators, who operate the semiconductor fabrication facilities, or "fabs", and design-service companies who use EDA software to evaluate an incoming design for manufacturing readiness. EDA tools are also used for programming design functionality into FPGAs.
Before EDA, integrated circuits were designed by hand, and manually laid out. Some advanced shops used geometric software to generate the tapes for the Gerber photoplotter, but even those copied digital recordings of mechanically-drawn components. The process was fundamentally graphic, with the translation from electronics to graphics done manually. The best known company from this era was Calma, whose GDSII format survives.
By the mid-70s, developers were starting to automate the design, and not just the drafting. The first placement and routing (Place and route) tools were developed. The proceedings of the Design Automation Conference cover much of this era.
The next era began more or less with the publication of "Introduction to VLSI Systems" by Carver Mead and Lynn Conway in 1980. This groundbreaking text advocated chip design with programming languages that compiled to silicon. The immediate result was a hundredfold increase in the complexity of the chips that could be designed, with improved access to design verification tools that used logic simulation. Often the chips were not just easier to lay out, but more correct as well, because their designs could be simulated more thoroughly before construction.
The earliest EDA tools were produced academically, and were in the public domain. One of the most famous was the "Berkeley VLSI Tools Tarball", a set of UNIX utilities used to design early VLSI systems. Still widely used is the Espresso heuristic logic minimizer and Magic.
Another crucial development was the formation of MOSIS, a consortium of universities and fabricators that developed an inexpensive way to train student chip designers by producing real integrated circuits. The basic idea was to use reliable, low-cost, relatively low-technology IC processes, and pack a large number of projects per wafer, with just a few copies of each project's chips. Cooperating fabricators either donated the processed wafers, or sold them at cost, seeing the program as helpful to their own long-term growth.
1981 marks the beginning of EDA as an industry. For many years, the larger electronic companies, such as Hewlett Packard, Tektronix, and Intel, had pursued EDA internally. In 1981, managers and developers spun out of these companies to concentrate on EDA as a business. Daisy Systems, Mentor Graphics, and Valid Logic Systems were all founded around this time, and collectively referred to as DMV. Within a few years there were many companies specializing in EDA, each with a slightly different emphasis.
In 1986, Verilog, a popular high-level design language, was first introduced as a hardware description language by Gateway Design Automation. In 1987, the U.S. Department of Defense funded the creation of VHDL as a specification language. Simulators quickly followed these introductions, permitting direct simulation of chip designs: executable specifications. In a few more years, back-ends were developed to perform logic synthesis.
Many of the EDA companies acquire small companies with software or other technology that can be adapted to their core business. Most of the market leaders are rather incestuous amalgamations of many smaller companies. This trend is helped by the tendency of software companies to design tools as accessories that fit naturally into a larger vendor's suite of programs (the "tool flow").
While early EDA focused on digital circuitry, many new tools incorporate analog design, and mixed systems. This is happening because there is now a trend to place entire electronic systems on a single chip.
Current digital flows are extremely modular (see Integrated circuit design, Design closure, and Design flow (EDA)). The front ends produce standardized design descriptions that compile into invocations of "cells", without regard to the cell technology. Cells implement logic or other electronic functions using a particular integrated circuit technology. Fabricators generally provide libraries of components for their production processes, with simulation models that fit standard simulation tools. Analog EDA tools are much less modular, since many more functions are required, they interact more strongly, and the components are (in general) less ideal.
EDA is divided into many (sometimes overlapping) sub-areas. They mostly align with the path of manufacturing from design to mask generation. The following applies to chip/ASIC/FPGA construction but is very similar in character to the areas of printed circuit board design:
Largest companies and their histories
Well before Electronic Design Automation, the use of computers to help with drafting tasks was well established, and software commercially available. For example, Calma, Applicon, and Computervision, established in the late 1960s, sold digitizing and drafting software used for ICs. Zuken Inc. in Japan, established in 1976, sold similar software for PC boards. While these tools were valuable, they did not help with the design portion of the process, which was still done by hand. Design Automation software was developed in the 70s, in academia and within large companies, but it was not until the early 1980s that software to help with the design portion of the process became commercially available.
In 1981, Mentor Graphics was founded by managers from Tektronix, Daisy Systems was founded largely by developers from Intel, and Valid Logic Systems by designers from Lawrence Livermore National Laboratory and Hewlett Packard. Meanwhile, companies such as Calma and Zuken attempted to expand into the design, as well as the drafting, portion of the market.
When EDA started, analysts categorized these companies as a niche within the "computer aided design" market, which was then composed primarily of mechanical drafting tools for conceptualizing bridges, buildings and automobiles. In a few years these fields diverged, and today no companies specialize in both mechanical and electrical design automation.
Cadence Design Systems was founded in the mid 1980s, specializing in physical IC design. Synopsys was founded about the same time frame to productize logic synthesis. Both have grown to be the largest full-line suppliers of EDA tools. Magma Design Automation was founded in 1997 to take advantage of the simplifications possible by building an IC design system from scratch.
Electronic design automation (EDA), also referred to as electronic computer-aided design (ECAD),[1] is a category of software tools for designing electronic systems such as integrated circuits and printed circuit boards. The tools work together in a design flow that chip designers use to design and analyze entire semiconductor chips. Since a modern semiconductor chip can have billions of components, EDA tools are essential for their design; this article in particular describes EDA specifically with respect to integrated circuits (ICs).
EDA for electronics has rapidly increased in importance with the continuous scaling of semiconductor technology.[3] Some users are foundry operators, who operate the semiconductor fabrication facilities ("fabs"), and design-service companies who use EDA software to evaluate an incoming design for manufacturing readiness. EDA tools are also used for programming design functionality into FPGAs (field-programmable gate arrays), customisable integrated circuit designs.
Market capitalization and company name as of December 2011:[5]
Note: EEsof should likely be on this list,[10] but it does not have a market cap as it is the EDA division of Keysight.
Many EDA companies acquire small companies with software or other technology that can be adapted to their core business.[11] Most of the market leaders are amalgamations of many smaller companies, and this trend is helped by the tendency of software companies to design tools as accessories that fit naturally into a larger vendor's suite of programs. While early EDA focused on digital circuitry, many new tools incorporate analog design and mixed systems.[12] This is happening due to a trend to place entire electronic systems on a single chip.
Prior to the development of EDA, integrated circuits were designed by hand and manually laid out. Some advanced shops used geometric software to generate tapes for a Gerber photoplotter, responsible for generating a monochromatic exposure image, but even those copied digital recordings of mechanically drawn components. The process was fundamentally graphic, with the translation from electronics to graphics done manually; the best-known company from this era was Calma, whose GDSII format is still in use today. By the mid-1970s, developers started to automate circuit design in addition to drafting and the first placement and routing tools were developed; as this occurred, the proceedings of the Design Automation Conference catalogued the large majority of the developments of the time.
The next era began following the publication of "Introduction to VLSI Systems" by Carver Mead and Lynn Conway in 1980; this groundbreaking text advocated chip design with programming languages that compiled to silicon. The immediate result was a considerable increase in the complexity of the chips that could be designed, with improved access to design verification tools that used logic simulation. Often the chips were easier to lay out and more likely to function correctly, since their designs could be simulated more thoroughly prior to construction. Although the languages and tools have evolved, this general approach of specifying the desired behavior in a textual programming language and letting the tools derive the detailed physical design remains the basis of digital IC design today.
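As a loose illustration of that idea (not any particular EDA tool or language), the following Python sketch writes the desired behaviour of a small block as ordinary code and mechanically derives the truth-table minterms a synthesis step would start from; the function names and structure are invented for this example.

    # A minimal, hypothetical sketch of "specify the behaviour, let the tool
    # derive the structure": the behaviour is ordinary code, and the tool step
    # here is nothing more than enumerating its truth table.
    from itertools import product

    def behaviour(a, b, c):
        """Desired behaviour of a 1-bit majority voter, written textually."""
        return (a and b) or (a and c) or (b and c)

    def truth_table(fn, n_inputs):
        """Enumerate the behaviour exhaustively; real synthesis tools start
        from an equivalent (if far more compact) representation."""
        return {bits: int(bool(fn(*bits))) for bits in product((0, 1), repeat=n_inputs)}

    table = truth_table(behaviour, 3)
    minterms = [bits for bits, out in table.items() if out == 1]
    print(minterms)  # [(0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1)]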
The earliest EDA tools were produced academically. One of the most famous was the "Berkeley VLSI Tools Tarball", a set of UNIX utilities used to design early VLSI systems. Still widely used are the Espresso heuristic logic minimizer, responsible for reducing circuit complexity, and Magic, a computer-aided design platform. Another crucial development was the formation of MOSIS, a consortium of universities and fabricators that developed an inexpensive way to train student chip designers by producing real integrated circuits. The basic concept was to use reliable, low-cost, relatively low-technology IC processes and pack a large number of projects onto each wafer, with several copies of the chips from each project. Cooperating fabricators either donated the processed wafers or sold them at cost, since they saw the program as helpful to their own long-term growth.
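Espresso itself applies heuristics so that it can handle functions far too large for exact methods; purely as an illustration of the kind of two-level reduction involved, the sketch below uses SymPy's exact SOPform minimizer (not Espresso) on the majority function from the previous example.

    # Illustration only: exact two-level minimization with SymPy, standing in
    # for the heuristic reduction Espresso performs on much larger functions.
    from sympy import symbols
    from sympy.logic import SOPform

    a, b, c = symbols('a b c')
    # Input patterns (a, b, c) for which the 3-input majority function is 1.
    minterms = [[0, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
    print(SOPform([a, b, c], minterms))  # e.g. (a & b) | (a & c) | (b & c)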
The year 1981 marked the beginning of EDA as an industry. For many years, the larger electronic companies, such as Hewlett-Packard, Tektronix and Intel, had pursued EDA internally; managers and developers then began to spin out of these companies to concentrate on EDA as a business. Daisy Systems, Mentor Graphics and Valid Logic Systems were all founded around this time and were collectively referred to as DMV. In 1981, the U.S. Department of Defense additionally began funding the development of VHDL as a hardware description language. Within a few years, there were many companies specializing in EDA, each with a slightly different emphasis.
The first trade show for EDA was held at the Design Automation Conference in 1984. In 1986, Verilog, another popular high-level design language, was introduced as a hardware description language by Gateway Design Automation. Simulators quickly followed these introductions, permitting direct simulation of chip designs and executable specifications. Within several years, back-ends were developed to perform logic synthesis.
Cadence Design Systems was founded in the mid-1980s, specializing in physical IC design, and Synopsys was founded around the same time to productize logic synthesis. Both have grown to be the largest full-line suppliers of EDA tools. Magma Design Automation was founded in 1997 to take advantage of the simplifications possible by building an IC design system from scratch.
Main articles: Integrated circuit design, Design closure, and Design flow (EDA)
Current digital flows are extremely modular, with front ends producing standardized design descriptions that compile into invocations of units similar to cells without regard to their individual technology. Cells implement logic or other electronic functions using a particular integrated circuit technology. Fabricators generally provide libraries of components for their production processes, with simulation models that fit standard simulation tools.
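As a rough sketch of that arrangement (the cell names, areas and delays below are invented, not taken from any real library), a cell library can be pictured as a mapping from cell names to a simulation model plus the physical data a fabricator characterises:

    # Toy illustration of a standard-cell library: each entry pairs a
    # simulation model (the cell's logical behaviour) with placeholder
    # technology data of the kind a fabricator would characterise.
    from dataclasses import dataclass
    from typing import Callable, Dict

    @dataclass
    class Cell:
        model: Callable[..., int]   # simulation model used by the front end
        area_um2: float             # invented physical data
        delay_ns: float             # invented timing data

    LIBRARY: Dict[str, Cell] = {
        "NAND2": Cell(lambda a, b: int(not (a and b)), area_um2=1.1, delay_ns=0.04),
        "INV":   Cell(lambda a: int(not a),            area_um2=0.6, delay_ns=0.02),
    }

    # A front end can target the cells without knowing how the fab builds them:
    y = LIBRARY["INV"].model(LIBRARY["NAND2"].model(1, 1))
    print(y)  # 1 -- an AND function composed from technology-independent cells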
Most analog circuits are still designed in a manual fashion, requiring specialist knowledge that is unique to analog design (such as matching concepts).[2] Hence, analog EDA tools are far less modular: many more functions are required, they interact more strongly, and the components are, in general, less ideal.
EDA for electronics has rapidly increased in importance with the continuous scaling of semiconductor technology.[3] Its users range from foundry operators, who run the semiconductor fabrication facilities ("fabs"), to design-service companies that use EDA software to evaluate an incoming design for manufacturing readiness. EDA tools are also used to program design functionality into FPGAs (field-programmable gate arrays), which are customisable integrated circuits.
A design flow is characterised by the combination of several primary tool components.
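The sketch below is purely illustrative: it shows a design flow as a chain of tool steps, but the stage names, data shapes and behaviour are invented for this example rather than drawn from any specific flow.

    # Invented example of a design flow as composed tool steps; each function
    # is a stand-in for a whole class of EDA tools, not a real implementation.
    def synthesize(rtl_text: str) -> list:
        """Stand-in for logic synthesis: behavioural text in, netlist out."""
        return ["cell_%d" % i for i, _ in enumerate(rtl_text.split())]

    def place_and_route(netlist: list) -> dict:
        """Stand-in for physical design: assign every cell a grid location."""
        return {cell: (i % 10, i // 10) for i, cell in enumerate(netlist)}

    def verify(layout: dict) -> bool:
        """Stand-in for sign-off checks before the design goes to the fab."""
        return all(0 <= x < 10 for x, _ in layout.values())

    layout = place_and_route(synthesize("assign y = a & b ;"))
    print("ready for fabrication:", verify(layout))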
Further information: List of EDA companies
Market capitalization and company name as of December 2011:[5]
Note: EEsof should likely be on this list,[10] but it does not have a market cap as it is the EDA division of Keysight.
Many EDA companies acquire small companies with software or other technology that can be adapted to their core business.[11] Most of the market leaders are amalgamations of many smaller companies, and this trend is helped by the tendency of software companies to design tools as accessories that fit naturally into a larger vendor's suite of programs on digital circuitry; many new tools incorporate analog design and mixed systems.[12] This is happening due to a trend to place entire electronic systems on a single chip.