What does multicore assembly language look like?
Once upon a time, to write x86 assembler, for example, you would have instructions stating "load the EDX register with the value 5", "increment the EDX" register, etc.
With modern CPUs that have 4 cores (or even more), at the machine code level does it just look like there are 4 separate CPUs (i.e. are there just 4 distinct "EDX" registers) ? If so, when you say "increment the EDX register", what determines which CPU's EDX register is incremented? Is there a "CPU context" or "thread" concept in x86 assembler now?
How does communication/synchronization between the cores work?
If you were writing an operating system, what mechanism is exposed via hardware to allow you to schedule execution on different cores? Is it some special priviledged instruction(s)?
If you were writing an optimizing compiler/bytecode VM for a multicore CPU, what would you need to know specifically about, say, x86 to make it generate code that runs efficiently across all the cores?
What changes have been made to x86 machine code to support multi-core functionality?