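(CNN) A Kansas Republican running for Congress in November told the audience at a party meeting this month that "outside of Western civilization, there is only barbarism."
State Sen. Steve Fitzgerald made the comments at a July 2 meeting of the Leavenworth County Republican Party. He is seeking the Republican nomination to replace retiring Rep. Lynn Jenkins. In a speech of more than 30 minutes, Fitzgerald lamented that people believe "Western civilization is the problem," argued that Christendom is "under attack," and went on to say that Planned Parenthood is worse than a Nazi concentration camp.
According to video provided by Democrats, Fitzgerald said: "We are being told that Western civilization is the problem in the world." "Outside of Western civilization, there is only barbarism."
He added: "Our Judeo-Christian ethic is civilization, and it is under attack from within and without. It also goes by a different name, Christendom, and it is under attack. Even talking about it gets you attacked. It has gotten me attacked."
This is not the first time Fitzgerald has made controversial comments.
Earlier this year, after someone made a donation to Planned Parenthood in Fitzgerald's name, he told the organization in a letter that he did not agree with the donation and compared it to the Nazi concentration camp Dachau, where thousands of people were killed during the Holocaust.
Fitzgerald wrote: "This is as bad as, or even worse than, the name of Dachau."
Although both the Kansas Republican Party and the Rabbinical Association of Greater Kansas City asked the candidate to stop making such remarks, Fitzgerald retold the story at the July meeting and escalated his comments.
"So Planned Parenthood is busy waging war on the next generation, and it is winning," he said, before noting that he received some backlash after his letter to Planned Parenthood became public.
"It was a firestorm. I got calls from around the world: 'How dare you? How could you? How could you compare it to Dachau?'" he recalled. "I said, 'You're right, Dachau really wasn't that big a killing camp. These guys have numbers way beyond anything they did.'"
But Fitzgerald then elaborated on his argument, telling the Republicans that history is full of people using and disposing of the lives of others, citing slavery and the treatment of American Indians.
"Ask black people about slavery. Ask the American Indians," he said. "Ask... Of course, the Indians used to kill each other, and white people used to kill each other, too. But if you go back in time, we have never had much guilt about killing each other, especially when you can't fight back."
He added: "Outside of Western civilization, there is only barbarism. Abortion is not compatible with Western civilization. We need to make that clear, we need to get it recognized, and we need to make people understand that what we are talking about is humanity."
Fitzgerald, who did not respond to CNN's request for an interview, is one of seven candidates seeking to represent the Republican Party in the November race. He has positioned himself as an anti-abortion voice in the August 7 Republican primary, and his campaign website describes him as a "repeated denouncer of Planned Parenthood, the largest abortion provider in America."
Although CNN rates the November race as leaning Republican, Democrats have rallied behind former Kansas House Minority Leader Paul Davis in a bid to flip the district, which stretches from north to south and includes some of the Kansas City suburbs. With the help of the Democratic Congressional Campaign Committee, Davis has raised more than $1 million, far more than any Republican candidate.
In the speech, Fitzgerald also gave his views on President Donald Trump.
Although he praised the President's conduct in office, he admitted that at first he "didn't like the man at all."
"I didn't like him at all," he said. "But, really, I love him now. I'm not sure I'd necessarily want to have a beer with him. But I hope he keeps going. Keep doing what you're doing, pal."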
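"We thought that capitalism was locked in an ideological struggle with socialism, but we never really saw that capitalism might be defeated by its own child: technology."
This is how Eric Weinstein, a mathematician and a managing director of Peter Thiel's investment firm Thiel Capital, opens a recent video for BigThink.com. He argues that technology has so transformed our world that "we may need a hybrid model in the future that is more capitalistic than our capitalism today and perhaps even more socialistic than our communism of the past."
That is another way of saying that socialist principles might be the only thing that can save capitalism.
Weinstein's thinking reflects a growing awareness in Silicon Valley of the challenges facing capitalist society. Technology will continue to upend careers, workers across fields will be increasingly displaced, and many of the lost jobs probably will not be replaced.
For that reason, many technologists and entrepreneurs in Silicon Valley are converging on ideas like universal basic income as a way to mitigate the adverse effects of technological innovation.
I reached out to Weinstein to talk about the crisis of capitalism: how we got here, what can be done, and why he thinks that failing to act might lead to societal collapse. His main concern is that the billionaire class (of which he is not himself a part, but which he encounters through his work) has not recognized that society will need to change radically.
"The greatest danger," he told me, "is that the truly rich are increasingly removed from the lives of the rest of us, so that they become very insensitive to the concerns of those who still earn by the hour." As a result, he warned, "the rich may not anticipate the disruptive changes ahead, and we will see the beginning of a revolution as the price of that insensitivity."
You can read our lightly edited conversation below.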
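The term "late capitalism" is popular these days. Do you find it analytically useful?
I find it linguistically accurate and politically provocative. I don't think what comes next will be an absence of markets. I don't think capitalism has failed and will be replaced by anarchy or socialism. I think this may merely be the end of the beginning of capitalism, and that its next stage will carry on many of its basic tenets but in an almost unrecognizable form.
I want to ask you what that next stage might look like, but first I wonder whether you think market capitalism has outlived its usefulness?
I believe that market capitalism, as we have come to understand it, was actually tied to a particular period of time in which certain coincidences held. There was a coincidence between the marginal product of one's labor and one's marginal need to consume at a socially appropriate level. There was also a match between an economy composed mostly of private goods and services, which could be taxed to pay for the minority of public goods and services, whose market price would be far below the collective value of those goods.
Beyond that, there was also a coincidence between the ability to train briefly in one's youth so as to acquire a reliable skill that could be repeated with minor variations throughout a lifetime, leading to what we have typically called a career or profession. I believe many of these coincidences are now breaking down, because they were never actually tied together by any fundamental law.
A big part of what is driving this breakdown is the advance of technology, which you rightly describe as the child of capitalism. Is it possible that the child of capitalism becomes its destroyer?
It's an important question. Since the Industrial Revolution, technology has been a beneficial prime mover, pushing workers from the lowest-value activities into higher-value repetitive behaviors. The problem with the advance of modern computing technology, however, is that it seems to target all repetitive behaviors. If you break all human activity into behaviors that happen only once and do not reset themselves, and behaviors that cycle daily, weekly, monthly, or yearly, you see that technology threatens to eliminate the cyclic behaviors altogether, rather than merely chasing us from low-importance cyclic behaviors to high-value ones.
That trend seems objectively bad for most people, whose jobs consist mostly of simple repetitive actions.
I think that means we have an advantage over computers specifically in the region of the economy based on one-off opportunities. Typically, that is the province of hedge fund managers, creatives, engineers, and anyone who is actually trying to do something they have never done before. What we have never considered is how to move an entire society, dominated by routine, into a one-off economy in which we compete where we have a specific advantage over machines: our ability to do what has never been done before.
That raises a thorny question: the skills this technological economy rewards are not skills most people possess. Perhaps a lot of people simply cannot thrive in this space, no matter how much training or education we provide.
I think that's an interesting question, and it depends a lot on your view of education. Buckminster Fuller (a prominent American author and architect who died in 1983) said something to the effect of, "We are all born geniuses, but we are made mediocre in the course of living." I think that with the hindsight of a few years, we can see that what makes us mediocre is in fact our education.
The problem is that we have an educational system that is basically designed to shape our natural inclination to explore into numbed minds willing to carry out a routine every day. That is because our educational system was designed to produce employable products suited to jobs, but it is precisely those routine jobs that are giving way to one-off creative work.
This is a seemingly simple problem with a very complicated solution.
Part of the question is, how do we retool our current educational system to remind ourselves that the hotel maid who cleans our room may actually be an amateur painter? That our tax accountant may be a midnight playwright? I think what many bureaucrats in Washington don't quite realize is just how much talent and creativity there is across all walks of life.
We don't yet know how to support these activities economically, because many of these plays, books, and inventions will never command a high enough market price, but that is exactly the question that the merging of hypercapitalism and hypersocialism must address.
Let's talk about that. What does a hybrid of capitalism and socialism look like?
I don't think we know what it looks like. I think capitalism needs to become much freer. Certain fields need to undergo a process of radical deregulation in order to give the minority of people capable of our greatest feats of creativity room to experiment and play, because they provide us with the wonders on which our future economy will depend.
By the same token, we have to understand that our population is not a collection of workers to be fed into the capitalist machine, but rather the soul of the nation, whose dignity, well-being, and health must be considered in independent, humanitarian terms. Now, that does not mean that we can simply use state welfare to supply the dignity that the workplace has provided to people.
People will have to be engaged in socially positive activities, but not all of those activities will be able to command a sufficient share of the market at an appropriate level, and so I think we are going to have to add hypersocialist elements on top of hypercapitalism to meet people's dignity and needs.
I agree with most of that, but I'm not sure we are prepared to adapt quickly to these new circumstances. What you are describing is a near-revolutionary shift in politics and culture, and that is not something we can simply do.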
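I believe that once our top creative class is unshackled from those socially negative impediments, they will be able to choose whether capitalism proceeds by evolution or revolution, and I am hopeful that the billionaire class will be driven by self-interest to take the enlightened path and rethink the work that gives the vast majority of citizens their dignity.
Are you confident that the billionaire class is that enlightened? Because I'm not. All of these changes were conceivable years ago, and yet the billionaire class has failed to take any of this seriously over the past several years. As far as I can tell, for them the impulse to innovate and profit overrides every other concern.
It's curious. A few years ago there was a quiet shift, when the smoke-filled rooms stopped laughing off concerns about inequality and even began to embrace them in private. I wish I could say the change was driven by the goodness in the hearts of the most successful people, but I think it was really a recognition that inequality arising from differences in economic contribution cannot explain today's levels of inequality, measured against the human soul and its needs.
I think it is a combination of embarrassment and enlightened self-interest that has this class, several levels above my own, trying to make sure that it does not sow the seeds of a highly destructive societal collapse, and I believe I have seen a genuine personal transformation in quite a few technologists and leading thinkers, who have become deeply concerned about the implications of their work. Very few of them want to be labeled the job killers who destroyed the gains accumulated since the Industrial Revolution.
So I think that, in terms of wanting to leave a socially positive legacy, many of them are motivated to innovate through concepts like universal basic income, having found that Washington is as bereft of new ideas in social matters as it is in new technology.
But how did we let things get so bad? We have known for a long time that political systems tend to collapse without a robust middle class acting as a buffer between the poor and the rich, and yet we rushed headlong into this unsustainable climate.
I have reached a strange stage of my life in which I am about equally likely to fly economy or private. As a result, I have a unique lens on this question. A friend of mine said to me, "The modern airport is the perfect metaphor for the class warfare to come." I asked, "How do you see it?" He said, "The rich in first and business class are seated first so that the poor can be paraded past them in economy and take note of their privilege." I said, "I think the metaphor is better than you give it credit for, because those people in first and business are actually the fake rich. The real rich are always in another terminal or at another airport altogether."
In my view, the greatest danger is that the truly rich, and I'm talking about nine- and ten-figure rich, are increasingly separated from the lives of the rest of us, so that they become very insensitive to the concerns of those who still earn by the hour. As a result, they may not anticipate many of the changes, and we will see the beginnings of revolution as the price of that insensitivity.
However, I hope that as social unrest develops, the current political arrangement, which sets the middle class and the lower tiers of the rich against a resentful middle class and the poor, will come to an end, if only out of a genuine wish to avoid a real threat to the social stability on which they depend.
I guess that's my point. If those with the ability to change things, and the responsibility that comes with it, fail to recognize the emergency until there is no time left to act, what are the rest of us in the middle supposed to do?
Well, it's not as if there has been no warning. I guarantee you that when the Occupy Wall Street demonstrators left the confines of Zuccotti Park and came to the residences of Manhattan's Upper East Side, it immediately focused the minds of those who can deploy a great deal of capital. Thankfully, those protesters were smart enough to realize that peaceful demonstration was the best way to announce the need for social change to those who had not yet done the math.
But if you were one of those Occupy Wall Street protesters who fired off that peaceful warning shot six years ago, and you reflect on what has happened since, do you have any reason to think the message was received? Don't you look around and say, "Nothing has changed"? The casino economy on Wall Street is still humming along. What lesson should be drawn from that?
Well, that puts too much of the blame on the bankers. I mean, the problem is that the Occupy Wall Street protesters and the bankers share a common delusion. Both of them believe the bankers are more powerful in the story than they actually are. The real problem, which our society has yet to face, is that sometime around 1970, we came to the end of several periods of reasonable exponential growth in science, technology, and economics. Since then, we have been struggling with the fact that during the period of post-World War II growth, nearly all of our institutions fundamentally embraced the assumption of growth.
What does that mean, exactly?
It means that all of those institutions, whether they are law firms or universities or the military, have to reckon with a steady state [meaning an economy in which growth and productivity fluctuate only slightly] in which growth cannot be sustained, either by running a Ponzi scheme or by attempting to cannibalize others to achieve a kind of fake growth that keeps those particular institions running. This is the big story that nobody reports. We have a system-wide problem in which embedded growth hypotheses have turned us into scoundrels and liars.
Could you unpack that, because it's a fundamental point, and I want to make sure readers know exactly what you mean when you say that "embedded growth hypotheses" have turned us into "scoundrels and liars."
Sure. Let's say I have a growing law firm in which at any given time there are five associates supporting every partner, and those associates hope to become partners so that they can each hire five associates in turn. That hierarchical approach works well while the law firm is growing, but as soon as the firm reaches a steady state, each partner can really have only one associate, who must wait many years to become a partner upon that partner's retirement. The economic model doesn't work, because the long hours and reduced pay that people are willing to accept in entry-level positions are predicated on taking over a higher-leverage position within a short time. The same thing is repeated with professors and their graduate students. It is frequently repeated in military hierarchies.
It happens almost everywhere, and when exponential growth runs out, each of these institutions has to find some way of either adopting a new business model or continuing to use smoke and mirrors to cannibalize someone else's source of income.
So our entire economy is basically a house of cards, built on outdated assumptions and propped up with gimmicks like quantitative easing. We seem to have gotten very good at avoiding the contradictions of our civilization.
Well, that is the problem. I sometimes call it the Wile E. Coyote effect, because as long as Wile E. Coyote doesn't look down, he stays suspended in midair, even though he has just run off the cliff. But the great danger lies in understanding that everything is flipped. During the 2008 crisis, many commentators said that the markets had suddenly gone crazy, and it was exactly the opposite. The so-called Great Moderation touted by Alan Greenspan, Timothy Geithner, and others was actually a kind of madness, and the 2008 crisis represented a rare break in the insanity, when the markets suddenly woke up and saw what was really going on. So the grave danger is not madness but sanity.
The problem is that when sanity finally comes, the long stretch of madness only compounds the disaster.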
## List Docker CLI commands
docker
docker container --help
## Display Docker version and info
docker --version
docker version
docker info
## Execute Docker image
docker run hello-world
## List Docker images
docker image ls
## List Docker containers (running, all, all in quiet mode)
docker container ls
docker container ls --all
docker container ls -aq
## Run Docker without sudo
sudo groupadd docker
sudo gpasswd -a $USER docker
newgrp docker # or log out/in to activate the changes to groups
docker run hello-world # check that you can run docker without sudo
# Use an official Python runtime as a parent image
FROM python:2.7-slim
# Set the working directory to /app
WORKDIR /app
# Copy the current directory contents into the container at /app
ADD . /app
# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt
# Make port 80 available to the world outside this container
EXPOSE 80
# Define environment variable
ENV NAME World
# Run app.py when the container launches
CMD ["python", "app.py"]
docker build -t friendlyhello .
docker build -t friendlyhello . # Create image using this directory's Dockerfile
docker run -p 4000:80 friendlyhello # Run "friendlyhello" mapping port 4000 to 80
docker run -d -p 4000:80 friendlyhello # Same thing, but in detached mode
docker container ls # List all running containers
docker container ls -a # List all containers, even those not running
docker container stop <hash> # Gracefully stop the specified container
docker container kill <hash> # Force shutdown of the specified container
docker container rm <hash> # Remove specified container from this machine
docker container rm $(docker container ls -a -q) # Remove all containers
docker image ls -a # List all images on this machine
docker image rm <image id> # Remove specified image from this machine
docker image rm $(docker image ls -a -q) # Remove all images from this machine
docker login # Log in this CLI session using your Docker credentials
docker tag <image> username/repository:tag # Tag <image> for upload to registry
docker push username/repository:tag # Upload tagged image to registry
docker run username/repository:tag # Run image from a registry
Differential privacy is probably the hottest research area in data privacy, but it is not immune to criticism. The strongest criticism of differential privacy is that the noise amplitude has to increase as the number of queries increases. In essence, there is a limit on how many times a data set can be queried before it is completely drowned in noise. In fairness, this may be a fundamental price that has to be paid for privacy. However, differential privacy also demands that the total number of queries be known beforehand. No simple retroactive corrective measure has been suggested for the scenario in which a data set owner wants to change the allowed number of queries afterward.
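To make the first criticism concrete, here is a minimal Python sketch. It assumes a fixed total privacy budget that is split evenly across the queries (sequential composition) and that each query is answered with Laplace noise; the function name `per_query_noise_scale` and the numbers are illustrative assumptions, not part of any library.

```python
def per_query_noise_scale(total_epsilon, num_queries, sensitivity=1.0):
    """Laplace scale needed per query when a total budget is split evenly.

    Assumption: each of the num_queries queries gets total_epsilon / num_queries,
    so the Laplace scale sensitivity / epsilon_each grows linearly with num_queries.
    """
    return sensitivity * num_queries / total_epsilon


# Hypothetical total budget of 1.0 spread over 1, 10, and 100 counting queries.
scales = {k: per_query_noise_scale(1.0, k) for k in (1, 10, 100)}
for k, scale in scales.items():
    print(f"{k:>3} queries -> Laplace scale {scale:g} per query")
```

With a fixed budget, the noise per answer grows in proportion to the number of queries, which is why that number has to be fixed in advance.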
Another potential problem of differential privacy is that it is overly pessimistic. The Laplace mechanism relies on the sensitivity of the query function, which is defined for the worst-case scenario and ignores the actual data one possesses. It is quite possible that the pair of “adjacent” data sets that maximizes the difference in the sensitivity definition will never occur. If so, a much lower noise level could be sufficient to provide the same privacy level. Indeed, the definition of differential privacy is essentially “blind” to the actual data, as it requires the chosen condition to be satisfied for all potential data set pairs. It excludes the possibility that one may be better off excluding the worst-case scenarios by trimming the records that are “sensitive” to the query.
Another problem is that the parameter $\epsilon$ has no physical meaning to the data set owner. There is no intuition for what value of $\epsilon$ is sufficient to ensure privacy. In contrast, the corresponding parameters in prior privacy formulations, such as $k$-anonymization and $l$-diversity, can be interpreted intuitively. Once again, the definition itself is essentially “data blind”, to the extent that it is indifferent to changes in the query outcome. This is because the inequality in the definition has to be satisfied not just for all adjacent data set pairs, but also for all subsets of the range of the query function. For example, for the counting query that is most commonly used in explaining differential privacy, the degree of protection is exactly the same whether the counting result is 1 or 10,000. Intuitively, if the query counts the number of AIDS patients living in a particular zip code, we expect that stronger protection of patient privacy is needed when the number of patients is significantly smaller. Finally, because of its data-blind nature, differential privacy completely ignores the possibility of quantifying the utility of a data record. It essentially adds noise evenly to all records even though some records could be more useful than the rest. Moreover, the parameter $\epsilon$, which has no apparent realistic interpretation, is usually predetermined rather arbitrarily and yet controls a hard privacy constraint. No systematic way to trade off privacy and utility (that is, how one should adjust $\epsilon$ if higher utility is desirable) is given.
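If a mechanism $M$ is $\epsilon$-differentially private to begin with, then any further operation $f$ on the output of $M$ is still $\epsilon$-differentially private.
Let’s assume the operation $f$ applied to the output of $M$ is stochastic, as a deterministic operation is just a special case. Then, for any subset $S$ of the range of $f$, we have
$$\Pr(f(M(D_1)) \in S) = \sum_{y} \Pr(f(y) \in S) \Pr(M(D_1) = y) \overset{(a)}{\le} \sum_{y} \Pr(f(y) \in S)\, e^{\epsilon} \Pr(M(D_2) = y) = e^{\epsilon} \Pr(f(M(D_2)) \in S), \tag{1}$$
where $D_1$ and $D_2$ are “adjacent” data sets and $(a)$ is due to the fact that $M$ is $\epsilon$-differentially private. Therefore, $f \circ M$ is $\epsilon$-differentially private.
Actually, the above property should be apparent from the beginning. If it were not satisfied, the definition of privacy would be meaningless, since the “private” output could disclose unauthorized information simply through some malicious processing.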
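On the other hand, if multiple mechanisms are applied sequentially, we expect the privacy of the overall mechanism (as a sequential combination of the multiple accesses) to degrade. We have the following very simple lemma.
If we have two mechanisms $M_1$ and $M_2$ that are $\epsilon_1$- and $\epsilon_2$-differentially private, respectively, the overall mechanism $M = (M_1, M_2)$ is $(\epsilon_1 + \epsilon_2)$-differentially private provided that the two mechanisms are independent.
$$\Pr(M(D_1) = (s_1, s_2)) = \Pr(M_1(D_1) = s_1) \Pr(M_2(D_1) = s_2) \le e^{\epsilon_1} \Pr(M_1(D_2) = s_1)\, e^{\epsilon_2} \Pr(M_2(D_2) = s_2) = e^{\epsilon_1 + \epsilon_2} \Pr(M(D_2) = (s_1, s_2)), \tag{2}$$
where $D_1$ and $D_2$ are “adjacent” data sets.
One can of course generalize the above lemma to a combination of many mechanisms by applying it repeatedly.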
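We can also compose multiple mechanisms in parallel, where each member mechanism can only access a disjoint portion of the data set. Then, we have the following lemma.
If we have two mechanisms $M_1$ and $M_2$ applied to disjoint subsets of a data set, and $M_1$ and $M_2$ are both $\epsilon$-differentially private, then the overall mechanism $M = (M_1, M_2)$ is $\epsilon$-differentially private provided that the member mechanisms are independent.
Let $D_1 = A_1 \cup B_1$ and $D_2 = A_2 \cup B_2$ be “adjacent” data sets, where $M_1$ accesses only the $A$ part and $M_2$ accesses only the $B$ part. Since they are adjacent, we cannot have both $A_1 \neq A_2$ and $B_1 \neq B_2$. Without loss of generality, assume $A_1 \neq A_2$ and $B_1 = B_2$. Then,
$$\Pr(M(D_1) = (s_1, s_2)) = \Pr(M_1(A_1) = s_1) \Pr(M_2(B_1) = s_2) \le e^{\epsilon} \Pr(M_1(A_2) = s_1) \Pr(M_2(B_2) = s_2) = e^{\epsilon} \Pr(M(D_2) = (s_1, s_2)). \tag{3}$$
Again, the parallel composition lemma can obviously be generalized to the case with more than two member mechanisms.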
Consider a data set containing only the gender attribute, and suppose we disclose the population size of each gender. To ensure that the disclosure is differentially private, one can add Laplacian noise to each population size before disclosing it. Let $M_m$ and $M_f$ denote the mechanisms that output the population sizes of males and females, respectively. If each mechanism is $\epsilon$-differentially private, the combined mechanism is also $\epsilon$-differentially private. Note that we have parallel composition here since the gender populations are disjoint.
Consider a setting similar to the previous example, but now we try to disclose the population sizes of a number of diseases, e.g., AIDS, SARS, lung cancer, etc. Again, we add Laplacian noise to each population size before disclosing it, such that each disclosing mechanism is $\epsilon$-differentially private. Let’s say we consider $N$ different diseases and disclose the population size for each of them. The overall mechanism in this case will be $N\epsilon$-differentially private instead of $\epsilon$-differentially private. Note that we have sequential composition here rather than parallel composition since the disease populations are generally not disjoint (e.g., one can have both AIDS and lung cancer).
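The two examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are made up, every counting query is taken to have sensitivity 1, Laplace noise is drawn as the difference of two exponential samples, and `release_counts` is a hypothetical helper rather than a library function.

```python
import random


def laplace_noise(scale, rng):
    """Sample Lap(scale): the difference of two i.i.d. exponentials is Laplace."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_counts(counts, epsilon_per_query, disjoint, rng):
    """Add Lap(1/epsilon) noise to every count and report the overall privacy cost."""
    scale = 1.0 / epsilon_per_query  # counting queries: sensitivity assumed to be 1
    noisy = {name: value + laplace_noise(scale, rng) for name, value in counts.items()}
    if disjoint:
        # Parallel composition: each record affects only one count.
        total_epsilon = epsilon_per_query
    else:
        # Sequential composition: a record may affect every count.
        total_epsilon = len(counts) * epsilon_per_query
    return noisy, total_epsilon


rng = random.Random(0)
gender_counts = {"male": 480, "female": 520}                # hypothetical data
disease_counts = {"AIDS": 12, "SARS": 3, "lung cancer": 40}  # hypothetical data

_, eps_gender = release_counts(gender_counts, 0.5, disjoint=True, rng=rng)
_, eps_disease = release_counts(disease_counts, 0.5, disjoint=False, rng=rng)
print("gender release cost:", eps_gender)
print("disease release cost:", eps_disease)
```

The noise added per count is identical in both cases; only the accounting of the total privacy cost differs.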
For convenience, let’s repeat the definition of differential privacy from the previous post.
The query $q$ is $\epsilon$-differentially private if and only if for any subset $S$ of the range of $q$,
$$\Pr(q(D_1) \in S) \le e^{\epsilon} \Pr(q(D_2) \in S), \tag{1}$$
where $D_1$ and $D_2$ are any two data sets that differ by at most one record, with one data set being a subset of the other.
It is common in the literature to refer to the “private” ways of disclosing the statistics of a data set as privacy mechanisms. To ensure differential privacy, two mechanisms are commonly considered: the Laplace mechanism and the exponential mechanism. But before discussing them, let’s define the sensitivity of a function applied to a data set as follows.
The sensitivity of a function $f$ on a data set, $\Delta f$, is defined by
$$\Delta f = \max_{D_1, D_2} |f(D_1) - f(D_2)|, \tag{2}$$
where $D_1$ and $D_2$ are any two data sets that differ by at most one record, with one data set being a subset of the other.
As one can see from the definition, sensitivity measures, in the extreme case, how much a single record can change the function value of a query. Intuitively, a malicious user has a higher chance of inferring private information from a more sensitive query than from a less sensitive one. Therefore, more protection may be needed for a more sensitive query.
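Instead of returning the true computation, $f(D)$, the Laplace mechanism returns the query result plus Laplacian noise with scale parameter $\Delta f/\epsilon$ (denoted as $\mathrm{Lap}(\Delta f/\epsilon)$). In other words, it returns $q(D) = f(D) + \mathrm{Lap}(\Delta f/\epsilon)$.
It is not difficult to show that (1) (and hence differential privacy) is satisfied. Note that (1) is true for all subsets $S$ if it is true for all singleton sets $\{s\}$, reading the probabilities as densities since the noise is continuous. Therefore, we can restrict ourselves to the latter case. For any $s$, note that
$$\frac{p(q(D_1) = s)}{p(q(D_2) = s)} = \frac{\exp(-\epsilon |s - f(D_1)| / \Delta f)}{\exp(-\epsilon |s - f(D_2)| / \Delta f)} = \exp\left(\frac{\epsilon \left(|s - f(D_2)| - |s - f(D_1)|\right)}{\Delta f}\right) \overset{(a)}{\le} \exp\left(\frac{\epsilon |f(D_1) - f(D_2)|}{\Delta f}\right) \overset{(b)}{\le} e^{\epsilon}, \tag{3}$$
where $(a)$ is due to the triangle inequality ($|x| - |y| \le |x - y|$) and the exponential function being monotonically increasing, and $(b)$ is from the definition of sensitivity.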
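As an illustration of the Laplace mechanism, here is a minimal Python sketch using only the standard library. It assumes a counting query (sensitivity 1) over a handful of made-up ages; `laplace_mechanism` and `count_query` are hypothetical helper names, and the noise is sampled as the difference of two exponential variables, which is Laplace distributed.

```python
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Return true_value plus Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_value + noise


def count_query(records, predicate):
    """Counting query: adding or removing one record changes it by at most 1."""
    return sum(1 for record in records if predicate(record))


rng = random.Random(42)
ages = [39, 44, 28, 25, 30]  # hypothetical records
true_count = count_query(ages, lambda age: age < 30)

# Each call returns a different noisy answer; the noise is zero-mean.
answers = [laplace_mechanism(true_count, 1.0, 0.5, rng) for _ in range(20000)]
mean_answer = sum(answers) / len(answers)
print("true count:", true_count)
print("one noisy answer:", answers[0])
print("average of many noisy answers:", round(mean_answer, 2))
```

Averaging many answers recovers the true count, which is one way to see why repeated queries must be charged against the privacy budget.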
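Now, if the system allows more queries, intuitively more protection (hence more distortion of the query results) will be needed, since there is a larger chance for a malicious user to identify a record owner by analyzing the additional information.
Consider the case when we have two query functions $f_1$ and $f_2$. To ensure that a malicious user can’t take advantage of the joint statistics of the queries to identify a record, we should modify the differential privacy definition in (1) to
$$\Pr((q_1(D_1), q_2(D_1)) \in S) \le e^{\epsilon} \Pr((q_1(D_2), q_2(D_2)) \in S), \tag{4}$$
where $q_1$ and $q_2$ are the modified query output functions for $f_1$ and $f_2$, respectively. Again, $D_1$ and $D_2$ are any two data sets that differ by at most one record, with one data set being a subset of the other.
As in the single query case, we can construct $q_1$ and $q_2$ by simply adding Laplacian noise to the true results. However, we will require larger noise as mentioned earlier. Actually, it turns out that $\mathrm{Lap}((\Delta f_1 + \Delta f_2)/\epsilon)$ Laplacian noise is sufficient. In other words, we may choose $q_1(D) = f_1(D) + \mathrm{Lap}((\Delta f_1 + \Delta f_2)/\epsilon)$ and $q_2(D) = f_2(D) + \mathrm{Lap}((\Delta f_1 + \Delta f_2)/\epsilon)$.
Similar to the single query case, to show that (4) is true for all $S$, it is sufficient to show that it is true for any singleton set $\{(s_1, s_2)\}$. Then,
$$\frac{p(q_1(D_1) = s_1, q_2(D_1) = s_2)}{p(q_1(D_2) = s_1, q_2(D_2) = s_2)} = \exp\left(\frac{\epsilon \left(|s_1 - f_1(D_2)| - |s_1 - f_1(D_1)| + |s_2 - f_2(D_2)| - |s_2 - f_2(D_1)|\right)}{\Delta f_1 + \Delta f_2}\right) \overset{(a)}{\le} \exp\left(\frac{\epsilon \left(|f_1(D_1) - f_1(D_2)| + |f_2(D_1) - f_2(D_2)|\right)}{\Delta f_1 + \Delta f_2}\right) \overset{(b)}{\le} e^{\epsilon}, \tag{5}$$
where $(a)$ is due to the triangle inequality and $(b)$ is again from the definition of sensitivity.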
When an attribute is categorical, we cannot simply add noise to it, so the Laplace mechanism does not apply in that case. To get around this, we introduce the exponential mechanism, which essentially adds noise to the distribution of the attribute rather than to the attribute itself.
Recall that the core idea of a privacy mechanism is to introduce randomness into the query outcome. Therefore, even for a categorical query the output should not be deterministic. For a data set $D$ and a particular query, we may specify a utility function $u(D, r)$ to control the preference over the outputs. Essentially, the larger $u(D, r)$, the more likely $r$ will be the output.
Given a utility function $u$, we output $r$ with probability proportional to $\exp(\epsilon u(D, r) / (2 \Delta u))$, where $\Delta u$ is the sensitivity of $u$ given by
$$\Delta u = \max_{r} \max_{D_1, D_2} |u(D_1, r) - u(D_2, r)|, \tag{6}$$
and $D_1$ and $D_2$ are any two adjacent data sets.
Note that the definition of sensitivity differs from the previous one.
The exponential mechanism described above is $\epsilon$-differentially private. Intuitively, one cannot gather information about an individual record from the distribution of the observed outcomes.
Denote by $q(D)$ the output of the exponential mechanism.
Therefore, the probability of getting $r$ for data set $D$ can be written as
$$\Pr(q(D) = r) = \frac{\exp(\epsilon u(D, r) / (2 \Delta u))}{\sum_{r'} \exp(\epsilon u(D, r') / (2 \Delta u))}. \tag{7}$$
Note that if we shift from $D$ to an adjacent data set $D'$, the change of the numerator is bounded by a factor of $e^{\epsilon/2}$ and the change of the denominator is also bounded by a factor of $e^{\epsilon/2}$. So overall the change in probability is bounded by a factor of $e^{\epsilon}$. Thus the mechanism is $\epsilon$-differentially private.
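The exponential mechanism can be sketched as follows. This is a minimal illustration, assuming the common parameterization in which an output is drawn with probability proportional to exp(epsilon * utility / (2 * sensitivity)); the data, the count-based utility, and the helper names are hypothetical.

```python
import math
import random
from collections import Counter


def exponential_mechanism(data, candidates, utility, sensitivity, epsilon, rng):
    """Draw one candidate with probability proportional to exp(eps * u / (2 * sensitivity))."""
    scores = [epsilon * utility(data, r) / (2.0 * sensitivity) for r in candidates]
    top = max(scores)  # subtracting the max does not change the ratios, only avoids overflow
    weights = [math.exp(score - top) for score in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def count_utility(data, r):
    """Utility of answering r: how many records equal r (sensitivity 1)."""
    return sum(1 for x in data if x == r)


rng = random.Random(1)
data = ["flu"] * 30 + ["cold"] * 10 + ["asthma"] * 5  # hypothetical categorical records
candidates = ["flu", "cold", "asthma"]

# Ask "which diagnosis is most common?" many times to see the output distribution.
draws = Counter(
    exponential_mechanism(data, candidates, count_utility, 1.0, 0.2, rng)
    for _ in range(2000)
)
print(draws.most_common())
```

The most frequent category is returned most of the time, but every category keeps a nonzero probability, so a single answer does not pin down any one record.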
In the next post, we will discuss the effect of composition on privacy mechanisms.
From the point of view of data privacy, we can categorize the attributes of a data set into three different groups: key attributes, sensitive attributes, and quasi-identifiers. Let’s use medical data as an example. Key attributes include attributes such as patient name and social security number. Sensitive attributes can be the diseases a patient has. Quasi-identifiers can be zip code, salary range, and so on. To protect the patients’ privacy, simply removing the key attributes from the database is not sufficient, because a malicious user may take advantage of other prior knowledge to uniquely identify the owner of a record and/or infer sensitive information from the quasi-identifiers.
For example, consider the extremely small data set shown below. Assume that this is a patient data set from a very small community. In the table below, ID and name are the key attributes, age and gender are the quasi-identifiers, and whether the patient has AIDS or not is the sensitive attribute.
ID | Name | Age | Gender | AIDS? |
0001 | Janice | 39 | F | No |
0002 | Sam | 44 | M | No |
0003 | Paul | 28 | M | Yes |
0004 | Alice | 25 | F | Yes |
0005 | Jody | 30 | F | No |
If we are going to disclose the data set for research purposes, of course we should remove the key attributes (ID and name) from the data set. But that is not sufficient. Knowing that Alice is in her 20s, one can infer from the last three columns alone that she is likely to have AIDS.
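The inference can be reproduced with a short Python sketch. The records below are the last three columns of the table above; the `matching` helper and the attacker's background knowledge (Alice is a woman in her 20s) are assumptions made for illustration.

```python
# The released table: key attributes (ID, name) removed, everything else kept.
released = [
    {"age": 39, "gender": "F", "aids": False},
    {"age": 44, "gender": "M", "aids": False},
    {"age": 28, "gender": "M", "aids": True},
    {"age": 25, "gender": "F", "aids": True},
    {"age": 30, "gender": "F", "aids": False},
]


def matching(records, min_age, max_age, gender=None):
    """Records consistent with what the attacker already knows about the target."""
    return [
        r for r in records
        if min_age <= r["age"] <= max_age and (gender is None or r["gender"] == gender)
    ]


# Knowing only that Alice is in her 20s leaves two candidate records, both positive.
in_twenties = matching(released, 20, 29)
# Knowing also that she is female pins her down to a single record.
female_twenties = matching(released, 20, 29, gender="F")

print(len(in_twenties), all(r["aids"] for r in in_twenties))
print(len(female_twenties), female_twenties[0]["aids"])
```

Even without uniquely identifying the record, the attacker learns the sensitive value, because every candidate record shares it.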
Therefore, to ensure the privacy of the record owners, we cannot always return a truthful answer to every query. All privacy-ensuring techniques essentially end up introducing distortion into the query answers. Roughly speaking, we can categorize the privacy-ensuring techniques into two classes: pre-query randomization and post-query randomization. For pre-query randomization, we essentially construct a “fake” database to serve as a surrogate in place of the original one. This can be achieved by adding fake records, deleting true records, distorting records, suppressing some attributes, and generalizing some attributes of records. Many privacy methodologies such as $k$-anonymization and $l$-diversity have been proposed along this line of thought. For example, in terms of $l$-diversity, the conditional entropy of a sensitive attribute $S$ given any quasi-identifier $Q$, $H(S|Q)$, is larger than $\log_2 l$ bits. Note that the conditional entropy here essentially measures the randomness of the sensitive attribute given $Q$ in terms of number of bits. Therefore, the larger the $l$, the harder it is to predict $S$ from $Q$. The advantage of this class of methods is that we can allow users to utilize the data set directly and non-interactively. However, as users are going to use the surrogate data set directly for their studies, the surrogate data set should have all statistics close to those of the original one. The latter, of course, is very hard to guarantee.
If we restrict the queries that can be made by the users, rather than distorting the data set itself, we may distort the query results afterward. The post-query randomization approach is suitable for data sets that only allow interactive access. An example of this approach is the Laplace mechanism for differential privacy. Even though differential privacy can be made possible in a non-interactive setting as well, we will restrict ourselves to the interactive setting in this post. In other words, we assume that users cannot directly access the data set but can only make queries regarding the data set.
Differential privacy is based on the premise that the query result of a “private” data set should not change drastically with an addition or deletion of a single record. Otherwise, there can be a chance for malicious users to infer the identity or sensitive information of a private record. Note that differential privacy is a rather strong condition since it does not assume the methods a malicious user could use in extracting that information. Differential privacy does not try to limit what records can or cannot exist in a data set. Actually, the definition is blind to (independent of) the actual data inside the data set. To ensure that the statistics of query results do not change drastically with adding or deleting a record, the query outcome is typically distorted by the addition of noise. As a result, the query outcome function even for a fixed data set should be considered as a random variable.
Therefore, if the query $q$ is private, for any two data sets $D_1$ and $D_2$ where one data set is equal to the other one with an additional record, we want
$$\Pr(q(D_1) \in S) \approx \Pr(q(D_2) \in S)$$
for all subsets $S$ of the range of $q$. The above formulation can be more precisely characterized with the following definition.
The query $q$ is $\epsilon$-differentially private if and only if for any subset $S$ of the range of $q$,
$$\Pr(q(D_1) \in S) \le e^{\epsilon} \Pr(q(D_2) \in S), \tag{1}$$
where $D_1$ and $D_2$ are any two data sets that differ by at most one record, with one data set being a subset of the other.
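Since (1) has to hold for all “adjacent” data sets $D_1$ and $D_2$ (including when $D_1$ and $D_2$ are swapped), (1) apparently can only be satisfied when $\epsilon \ge 0$ or $\Pr(q(D_1) \in S) = \Pr(q(D_2) \in S) = 0$.
A necessary condition for differential privacy is that the K-L divergence $KL(q(D_1) \| q(D_2))$ has to be no larger than $\epsilon$ for any “adjacent” $D_1$ and $D_2$. Since $\Pr(q(D_1) = s) \le e^{\epsilon} \Pr(q(D_2) = s)$ for any singleton set $\{s\}$, therefore
$$KL(q(D_1) \| q(D_2)) = \sum_{s} \Pr(q(D_1) = s) \ln \frac{\Pr(q(D_1) = s)}{\Pr(q(D_2) = s)} \le \sum_{s} \Pr(q(D_1) = s)\, \epsilon = \epsilon. \tag{2}$$
Note that the K-L divergence essentially measures the difference between the distributions of $q(D_1)$ and $q(D_2)$. Thus, the smaller the $\epsilon$, the harder it is to distinguish between $D_1$ and $D_2$ from the query $q$, and so the more difficult it is to identify the individual record by which $D_1$ and $D_2$ differ.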
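The bound on the K-L divergence can be checked numerically. The sketch below is an illustration under assumptions that are not in the text above: the query is a counting query whose true answers on two adjacent data sets are 2 and 3, the output is perturbed with Laplace noise of scale 1/epsilon, and the integral is approximated with a midpoint Riemann sum.

```python
import math


def laplace_pdf(x, mu, b):
    """Density of a Laplace distribution with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def kl_divergence(mu1, mu2, b, lo=-80.0, hi=80.0, step=0.01):
    """Approximate KL(Lap(mu1, b) || Lap(mu2, b)) with a midpoint Riemann sum."""
    cells = int(round((hi - lo) / step))
    total = 0.0
    for i in range(cells):
        x = lo + (i + 0.5) * step
        p = laplace_pdf(x, mu1, b)
        q = laplace_pdf(x, mu2, b)
        total += p * math.log(p / q) * step
    return total


epsilon = 0.5
scale = 1.0 / epsilon           # sensitivity of a counting query is 1
kl = kl_divergence(2.0, 3.0, scale)
print("K-L divergence:", round(kl, 4), "epsilon:", epsilon)
```

The divergence comes out well below epsilon, consistent with the bound being a necessary condition rather than a tight characterization.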
In the next post, we will talk about two privacy mechanisms to achieve differential privacy, namely the Laplace mechanism and the exponential mechanism.
The above scenario is an example of symmetric cryptography. The problem is that Alice has to share the secret key with Bob somehow. And if some malicious Eve manages to get the secret key, she can eavesdrop on the messages without being caught.
To avoid the above problem, it is much better if we have an “asymmetric” system, where the encryption and decryption keys are different. Alice will then encrypt her message with the encryption key (the public key) and Bob will decipher the ciphertext with the decryption key (the secret key). Basically, when Alice needs to send a secret message to Bob, she will first ask Bob to send her his public key. After receiving Bob’s public key, Alice will encrypt the message with his key. And only Bob can decipher the message with his secret key. Therefore, this is also called public key cryptography.
For the above to work, we need to make sure that while it is easy to compute the original message from the ciphertext with the decryption key, it is computationally infeasible (extremely expensive) to decipher the message without the key. Moreover, we need to make sure that we cannot derive the secret key from the public key, the ciphertext, or the combined information.
The RSA encryption algorithm considers the following math problem
$$c \equiv m^e \pmod{n},$$
where $m$, $e$, and $c$ are the original message, encryption key, and ciphertext, respectively, and $n$ is some very large integer. In plain English, the ciphertext $c$ is the remainder of $m^e$ divided by $n$.
Note that while it is easy to compute $c$ from given $m$ and $e$, it is very difficult to compute the inverse for large $e$ and $n$. Now, the question is what the secret key $d$ should be and what we should do with $d$ so that we can solve for $m$ given $c$. Actually, it is not even clear from the equation alone that the inverse exists or is unique.
Euler’s theorem luckily comes to the rescue. It states that
$$a^{\phi(n)} \equiv 1 \pmod{n}$$ for any integer $a$ that is coprime with $n$. $\phi(n)$ above is Euler’s totient function, and it is the number of positive integers that are both coprime with and smaller than $n$. For example, if $n$ is prime and thus does not share any factors with any numbers smaller than it, then $\phi(n)$ is simply $n - 1$.
From Euler’s theorem, for any arbitrary positive integer $k$, we immediately have
$$m^{k\phi(n) + 1} \equiv m \pmod{n}.$$
So if we pick $d = (k\phi(n) + 1)/e$, we have
$$c^d \equiv m^{ed} \equiv m^{k\phi(n) + 1} \equiv m \pmod{n}.$$ In plain English, we can decipher the message as the remainder of $c^d$ divided by $n$.
Let’s borrow a numerical example from this very nice video. In particular, we will construct $n$ as the product of two prime numbers $p$ and $q$. Let’s choose $p$ and $q$, then $n = pq$ and $\phi(n) = (p - 1)(q - 1)$. Pick $e$ and $k$. Note that $d = (k\phi(n) + 1)/e$, which is an integer as desired.
So to transmit a secret message, Bob should first send the public key $e$ and $n$ to Alice. For example, Alice would like to send the message $m$ to Bob. The ciphertext should then be computed as the remainder of $m^e$ divided by $n$, which gives us $c$. And when Bob receives $c$, he should decipher the message as the remainder of $c^d$ divided by $n$. This gives us back the original message $m$.
A really careful reader here may catch a “bug” in our argument above. Note that Euler’s theorem only holds for a message $m$ that is coprime with $n$. Since $n = pq$, not every integer smaller than $n$ is coprime with it. This suggests that we should limit our message to a subset of the positive integers smaller than $n$. However, it turns out that this is not necessary. A complete proof for RSA can be given using a combination of the Chinese remainder theorem and Fermat’s little theorem (a special case of Euler’s theorem). But that would be another story.
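The whole procedure can be traced in a few lines of Python. The specific numbers below (two small primes, the exponent, the multiplier, and the message) are illustrative assumptions chosen so that everything divides evenly; they are far too small to be secure, and the variable names are not taken from any library.

```python
from math import gcd

# Assumed toy parameters (illustrative only).
p, q = 53, 59
n = p * q                    # modulus, part of the public key
phi = (p - 1) * (q - 1)      # Euler's totient of n for n = p * q
e = 3                        # encryption exponent, must be coprime with phi
k = 2                        # multiplier chosen so that k * phi + 1 is divisible by e

assert gcd(e, phi) == 1
assert (k * phi + 1) % e == 0
d = (k * phi + 1) // e       # decryption exponent (the secret key)

m = 89                       # message, an integer smaller than n
c = pow(m, e, n)             # ciphertext: remainder of m**e divided by n
recovered = pow(c, d, n)     # deciphered: remainder of c**d divided by n
print("n =", n, "phi =", phi, "d =", d)
print("ciphertext =", c, "recovered =", recovered)

# Decryption also works for messages that share a factor with n,
# which is the point of the "bug" discussion above.
all_round_trip = all(pow(pow(x, e, n), d, n) == x for x in range(n))
print("every message below n round-trips:", all_round_trip)
```

The three-argument form of `pow` performs modular exponentiation without ever forming the huge intermediate power.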
As a simple example, the set of all (real) numbers under the addition (plus) operation forms a group.
Consider any subset $H$ of a group $G$, that is, $H$ is composed of some elements of $G$. We call $H$ a subgroup if $H$ itself is a group.
Continuing with our example above, the set of all integers (positive, negative, and zero) under the addition operation forms a subgroup (of the set of all numbers under the same operation).
One simple and amazing result by Lagrange is that for any finite group $G$ and any of its subgroups $H$, the size of $G$ has to be divisible by the size of $H$. The result can be used to prove Euler’s theorem as described later, which in turn is needed to explain the famous RSA cryptosystem.
The proof turns out to be very simple. For any element $a$ in $G$, we can create a set (known as a coset and denoted by $aH$) that includes the element $ah$ if $h$ is in $H$. Note that it is easy to show that any two cosets $aH$ and $bH$ are identical if $b^{-1}a$ is in $H$, where $b^{-1}$ is the inverse of $b$. We will show this in two parts. First, we can show that $aH$ is a subset of $bH$ because for any element $x$ in $aH$, there exists some $h$ in $H$ such that $x = ah = b(b^{-1}a)h$. By assumption, both $b^{-1}a$ and $h$ are in $H$ and thus $(b^{-1}a)h$ is in $H$. Therefore, $x$ is in $bH$ as well and so $aH$ is a subset of $bH$.
Now, let’s show that $bH$ is a subset of $aH$. Note that we can get this for free since $b^{-1}a$ is in $H$, which itself is a group, and thus the inverse $(b^{-1}a)^{-1} = a^{-1}b$ is in $H$ as well (Sanity check: $(b^{-1}a)(a^{-1}b) = e$). From the previous discussion, by just swapping $a$ with $b$, we must have $bH$ being a subset of $aH$ and thus $aH = bH$.
What if $b^{-1}a$ is not in $H$? In this case, $aH$ must not share any element with $bH$. Otherwise, let’s say we have $x$ in both $aH$ and $bH$; then there must be some $h_1$ and $h_2$ in $H$ such that $x = ah_1 = bh_2$. Rearranging the terms, we have $b^{-1}a = h_2 h_1^{-1}$, which is in $H$. This contradicts our original assumption.
Finally, note that any element $a$ of the group $G$ is in the coset $aH$ since $a = ae$ ($H$ is a group and thus has to include the unit element $e$). Combining this with the discussion above, the cosets of $H$ partition the original group $G$, and since every coset has the same size as $H$ (by cancellation, $ah_1 = ah_2$ implies $h_1 = h_2$), the size of $G$ must be divisible by the size of $H$.
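The coset argument can be checked on a small finite example. The sketch below assumes the additive group of integers modulo 12 with the subgroup {0, 4, 8}; this particular group is an illustrative choice, not one used in the text, and cosets are written additively as a + H.

```python
n = 12
G = list(range(n))   # integers modulo 12 under addition (assumed example group)
H = [0, 4, 8]        # a subgroup: closed under addition modulo 12


def coset(a):
    """The coset a + H, stored as a frozenset so that equal cosets compare equal."""
    return frozenset((a + h) % n for h in H)


cosets = {coset(a) for a in G}

# Every coset has the same size as H, and together they cover G without overlap.
same_size = all(len(c) == len(H) for c in cosets)
covers_group = set().union(*cosets) == set(G)
disjoint = sum(len(c) for c in cosets) == len(G)

print(sorted(sorted(c) for c in cosets))
print(same_size, covers_group, disjoint, len(G) == len(cosets) * len(H))
```

Counting elements coset by coset gives the size of the group as the number of cosets times the size of the subgroup, which is exactly the divisibility claim.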
For any positive integer $n$, consider the set $G_n$ containing the positive integers $a$ smaller than $n$ where $a$ and $n$ are coprime, i.e., $a$ and $n$ do not share any factor. For $a$ and $b$ in $G_n$, we can now define the group product $a \cdot b$ as the remainder of $ab$ divided by $n$. To ensure that $G_n$ is indeed a group, recall that we need to show the following
1). Product operation is closed and associative
2). Unit exists
3). Inverse exists
Note that given 1). above, and if the set is finite, we can replace the conditions 2). and 3). by a cancellation rule as described below
2′). Cancellation rule: if $a \cdot b = a \cdot c$ for any elements $a$, $b$, and $c$, then $b = c$.
Since the set is finite, for any element $a$, we must be able to find two different positive integers $i$ and $j$ such that $a^i = a^j$. Without loss of generality, assume $i > j$. By the cancellation rule, we can cancel some $a$ from both sides and thus immediately we have $a^{i-j} = e$ and $a \cdot a^{i-j-1} = e$. And by assumption 1), the operation is closed, so the unit element $e$ and the inverse of $a$, $a^{-1} = a^{i-j-1}$, exist in the set.
Let’s verify that $G_n$ is indeed a group by verifying that Conditions 1). and 2′). are satisfied. First, let’s show that the defined product operation is closed. Assume that $a$ and $b$ are in $G_n$. Therefore, $a$ and $n$ are coprime and so are $b$ and $n$. Immediately we can see that $ab$ (and also its remainder modulo $n$) and $n$ cannot share any common factor and thus are coprime as well. Thus, $a \cdot b$ is in $G_n$ and therefore the defined product is closed.
To show that the product is associative, we should verify that for any $a$, $b$, $c$ in $G_n$, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$. Note that $a \cdot b$ is defined as the remainder of $ab$ divided by $n$. That can be written as $ab - kn$, where $k$ is some non-negative integer. Repeating the same procedure, we may write $(a \cdot b) \cdot c$ as $abc - k'n$ for some non-negative integer $k'$. By a similar token, we may write $a \cdot (b \cdot c)$ as $abc - k''n$ for some non-negative integer $k''$. Since both $(a \cdot b) \cdot c$ and $a \cdot (b \cdot c)$ are remainders that fall between $0$ and $n - 1$, this forces $k' = k''$ and thus the two results are identical. Therefore, the defined product is associative.
Finally, let’s show that the cancellation rule (2′) is satisfied. Consider $a$, $b$, $c$ in $G_n$ and assume that $a \cdot b = a \cdot c$. We must have $a(b - c)$ divisible by $n$. Since $a$ and $n$ are coprime ($a$ is in $G_n$), we must have $b - c$ divisible by $n$. Because $b$ and $c$ both lie between $0$ and $n$, this forces $b = c$. Therefore, the cancellation rule holds, and $G_n$ is indeed a group.
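The closure and cancellation conditions can be checked by brute force for a small modulus. This is a minimal sketch, assuming the modulus 15 as an example; the helper names `units`, `is_closed`, and `cancellation_holds` are hypothetical.

```python
from math import gcd


def units(n):
    """Positive integers smaller than n that are coprime with n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def is_closed(elements, n):
    """Condition 1): the product modulo n of any two elements stays in the set."""
    members = set(elements)
    return all((a * b) % n in members for a in elements for b in elements)


def cancellation_holds(elements, n):
    """Condition 2'): a*b = a*c (mod n) forces b = c, i.e. b -> a*b is one-to-one."""
    return all(
        len({(a * b) % n for b in elements}) == len(elements) for a in elements
    )


n = 15
group = units(n)
print(group)
print(is_closed(group, n), cancellation_holds(group, n))

# For contrast: all nonzero residues modulo 15 do not form a group (3 * 5 = 0 mod 15).
print(is_closed(list(range(1, n)), n))
```

The contrast case shows why the coprimality requirement matters: without it the product can leave the set.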
This article is a bit long, but the end result is pretty neat. It turns out that we have a name for the size of $G_n$: it is called Euler’s totient function, which is often denoted as $\phi(n)$. $\phi(n)$ basically counts the number of positive integers that are both coprime with and smaller than $n$. For example, if $n$ is prime and thus does not share any factors with any numbers smaller than it, then $\phi(n)$ is simply $n - 1$.
With everything discussed so far, we can now readily show Euler’s theorem, which states that
$$a^{\phi(n)} \equiv 1 \pmod{n}$$ for any integer $a$ that is coprime with $n$.
Or to translate it into plain English, if $a$ and $n$ do not share any factor, the remainder of $a^{\phi(n)}$ divided by $n$ is $1$.
If we pick $n$ as some prime $p$, we immediately obtain Fermat’s little theorem from Euler’s theorem since $\phi(p) = p - 1$. That is,
$$a^{p-1} \equiv 1 \pmod{p}.$$
First, note that since $a$ and $n$ are coprime, $a$ (or rather its remainder modulo $n$) is in $G_n$. Consider the set $H = \{a, a^2, a^3, \ldots\}$; it is easy to see that $H$ is a subgroup of $G_n$. Let’s say the size of $H$ is $k$. Then, $k$ should be the minimum positive integer that satisfies $a^k = e$ (otherwise, the size of $H$ won’t be $k$!). Now, by Lagrange’s theorem, the size of $G_n$, $\phi(n)$, should be divisible by the size of $H$, $k$, since $H$ is a subgroup of $G_n$. Thus, we should also have $a^{\phi(n)} = (a^k)^{\phi(n)/k} = e$. Note that the unit element $e$ in the group $G_n$ is actually just $1$. Therefore, the remainder of $a^{\phi(n)}$ divided by $n$ is $1$.
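Both steps of the proof can be verified numerically for small moduli. The sketch below is an illustration only: `phi` counts coprime residues by brute force (fine for small numbers, not an efficient totient), and `order` is a hypothetical helper that returns the size of the subgroup generated by an element.

```python
from math import gcd


def phi(n):
    """Euler's totient by brute force: positive integers below n coprime with n."""
    return sum(1 for a in range(1, n) if gcd(a, n) == 1)


def order(a, n):
    """Smallest positive k with a**k = 1 (mod n); requires gcd(a, n) == 1."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime with n")
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k


# Lagrange: the order of every element divides phi(n).
# Euler: a**phi(n) leaves remainder 1 when divided by n.
checks = [
    (phi(n) % order(a, n) == 0) and pow(a, phi(n), n) == 1
    for n in range(2, 60)
    for a in range(1, n)
    if gcd(a, n) == 1
]
print(phi(15), order(2, 15), pow(2, phi(15), 15))
print("all checks passed:", all(checks))
```

For a prime modulus the same check reduces to Fermat's little theorem, since `phi` then returns one less than the modulus.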
xrandr -s 640x480
I can then recalibrate the stylus with the usual Wacom Tablet setting.